Vassia Pavlaki
National Technical University of Athens
The impact of arithmetic comparisons in data integration and data exchange settings
Abstract
Conjunctive queries constitute a class of queries
characterized by many researchers as "the greatest success story of the
theory of database queries". These queries correspond to the most
common queries in database practice (e.g. SQL select-project-join
queries). They are surprisingly well-behaved in the sense that many
important properties hold for conjunctive queries. However, in practice
users often ask select-project-join queries that involve comparisons in
the selection condition (e.g. price <= 100). The original definition
of conjunctive queries does not allow for comparisons between data
values. For this reason, the class of conjunctive queries was extended
to also include built-in predicates that are arithmetic comparisons.
In this talk I will discuss the impact of arithmetic comparisons in data
integration and data exchange settings. In particular, the problem of
rewriting queries using views becomes more complex when both the query
and the views are conjunctive queries with arithmetic comparisons. For
simple conjunctive queries and views, there exist efficient algorithms
in the literature, which find equivalent rewritings or maximally
contained rewritings. I will discuss the reasons why existing algorithms
cannot be extended in a straightforward way to also handle arithmetic
comparisons. Moreover, I will show that existing data exchange settings
cannot be easily extended to compute a solution and answer a query
posed over the target schema when both the constraints in the setting
and the query include arithmetic comparisons.
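To make the query class in this abstract concrete, here is a minimal Python sketch of a conjunctive query with one arithmetic comparison, evaluated as a select-project-join. The relations Product and Stock, their columns, and the function name are hypothetical and chosen only for illustration; they do not come from the talk.

```python
def cheap_in_stock(products, stock, max_price):
    """Evaluate the (hypothetical) conjunctive query with a comparison:

        Q(name) :- Product(pid, name, price), Stock(pid, qty), price <= max_price

    products: iterable of (pid, name, price) tuples
    stock:    iterable of (pid, qty) tuples
    """
    # Join condition: the shared variable pid must appear in both relations.
    stocked_ids = {pid for pid, _qty in stock}
    # Selection (the arithmetic comparison) and projection onto name.
    names = {
        name
        for pid, name, price in products
        if pid in stocked_ids and price <= max_price
    }
    return sorted(names)


# Illustrative data only.
products = [(1, "pen", 3), (2, "lamp", 120), (3, "mug", 8)]
stock = [(1, 10), (2, 5)]
result = cheap_in_stock(products, stock, 100)
```

Without the condition price <= max_price this is a plain conjunctive query; the comparison is the built-in predicate that the extended class allows.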
Nicoleta Preda
INRIA and Paris-Sud XI University
XML processing in DHT networks
Abstract
The current development of peer-to-peer (P2P) information sharing has
opened the way for supporting high-level data management applications
and in particular structured queries in a P2P setting. An issue of
particular importance is the management of queries over XML data, and in
particular its scaling. We introduce the KadoP platform, which builds
on Distributed Hash Table (DHT) technology to support P2P XML query
processing.
Such an approach raises a number of performance issues. The purpose of
the presentation is two-fold. First, we identify DHT aspects hindering
efficient query processing and propose an array of techniques we
developed to lift these limitations. Second, we extend the technique to
also manage intensional content, and in particular, document includes.
Chong Sun
UC Santa Cruz
Multiway SLCA-based Keyword Search in XML Data
by Chong Sun, Chee-Yong Chan, Amit Goenka
Abstract
Keyword search for smallest lowest common ancestors (SLCAs) in XML data has
recently been proposed as a meaningful way to identify interesting data
nodes in XML data whose subtrees contain an input set of keywords. In this
paper, we generalize this useful search paradigm to support keyword search
beyond the traditional AND semantics to include both AND and OR boolean
operators as well. We first analyze properties of the LCA computation and
propose more efficient algorithms to solve the traditional keyword search
problem (with only AND semantics). We then extend our approach to handle
general keyword search involving combinations of AND and OR boolean
operators. The effectiveness of our new algorithms is demonstrated with a
comprehensive experimental performance study.
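As an illustration of the SLCA notion under AND semantics, the following is a brute-force Python sketch, not the algorithms proposed in the paper. It assumes each XML node is identified by a Dewey label written as a tuple of integers, so that a node's ancestors are exactly the proper prefixes of its label; all function names are hypothetical.

```python
from itertools import product


def lca(a, b):
    """Lowest common ancestor of two Dewey labels: their longest common prefix."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def is_proper_ancestor(a, b):
    """True if label a is a proper prefix (proper ancestor) of label b."""
    return len(a) < len(b) and b[:len(a)] == a


def slca(keyword_lists):
    """Brute-force SLCAs for AND semantics.

    keyword_lists: one list of Dewey labels per keyword, giving the nodes
    that contain that keyword. Exponential in the number of keywords, so
    suitable only for small examples.
    """
    if not keyword_lists:
        return []
    # LCA of every combination that picks one match per keyword.
    candidates = set()
    for combo in product(*keyword_lists):
        node = combo[0]
        for other in combo[1:]:
            node = lca(node, other)
        candidates.add(node)
    # Keep only candidates that have no other candidate below them.
    return sorted(
        c for c in candidates
        if not any(is_proper_ancestor(c, d) for d in candidates)
    )


# Illustrative data only: matches for two keywords in a small tree.
xml_nodes = [(0, 0, 1), (0, 1, 0)]
search_nodes = [(0, 0, 2), (0, 2)]
result = slca([xml_nodes, search_nodes])
```

Here the root (0,) also contains both keywords, but it is discarded because its descendant (0, 0) already does.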
Don Chamberlin
IBM Almaden Research Center
New Standards from W3C: XPath, XQuery, and XSLT
Abstract
On January 23, 2007, the World Wide Web Consortium (W3C) announced a new
suite of XML-related recommendations, including a new query language
called XQuery and significant updates to the widely used XPath and XSLT
languages. This talk will describe the new languages, their
significance, and their relationship to other XML standards. It will
also discuss the W3C design process and some of the influences that
shaped the design of these languages. It will conclude with a look at
some of the ongoing work at W3C relating to XML languages and standards.
Philip Bohannon
Yahoo! Research
"Open-World" Data Cleaning
Abstract
"Data Cleaning" in the database literature typically focuses on the
process of populating a data warehouse: objects should be
de-duplicated based on available evidence, errors such as constraint
violations should be detected and data or query results amended
accordingly, and suitable transformations should be applied so that
the integrated data in the warehouse is as usable as possible. However,
these efforts are "closed-world" in the sense that the availability of
data external to the system is largely ignored. In this talk, I
advocate an expanded scope of "data cleaning" that incorporates "open
world" issues in two ways. First, the value of improved data quality
should be characterized based on the risk caused by dubious or incorrect
data. Second, the potential to make use of higher quality (but more
expensive) external data to improve data quality should be considered.
Various parts of this talk are based on joint work with Michael
Benedikt, Glenn Bruns, Wenfei Fan and Rajeev Rastogi (though any
problems are mine).
I will also give a brief introduction to the Community Systems Research
Group at Yahoo! Research, and some additional examples of the problems
and opportunities we are facing in web data management, particularly
when seeking to support and leverage on-line communities.