UC Santa Cruz University Relations

Abstracts for Seminar Presentations - Spring 2007


Vassia Pavlaki
National Technical University of Athens

The impact of arithmetic comparisons in data integration and data exchange settings

Abstract
Conjunctive queries constitute a class of queries characterized by many researchers as "the greatest success story of the theory of database queries". These queries correspond to the most common queries in database practice (e.g. SQL select-project-join queries). They are surprisingly well-behaved in the sense that many important properties hold for conjunctive queries. However, in practice users often ask select-project-join queries that involve comparisons in the selection condition (e.g. price <= 100). The original definition of conjunctive queries does not allow for comparisons between data values. For this reason, the class of conjunctive queries was extended to also include built-in predicates that are arithmetic comparisons.
In this talk I will discuss the impact of arithmetic comparisons in data integration and data exchange settings. In particular, the problem of rewriting queries using views becomes more complex when both the query and the views are conjunctive queries with arithmetic comparisons. For simple conjunctive queries and views, there exist efficient algorithms in the literature that find equivalent rewritings or maximally contained rewritings. I will discuss the reasons why existing algorithms cannot be extended in a straightforward way to also handle arithmetic comparisons. Moreover, I will show that the existing data exchange setting cannot be easily extended to compute a solution and answer a query posed over the target schema when both the constraints in the setting and the query include arithmetic comparisons.
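To make the query class discussed in this abstract concrete, the following is a minimal sketch of a conjunctive query with one arithmetic comparison, evaluated naively over in-memory relations. The relation names (item, sold), the sample tuples, and the function name are assumptions made up for illustration; they do not come from the talk, and the sketch says nothing about rewriting queries using views.

```python
# Hypothetical relations, invented for this example:
#   item(name, price)   and   sold(name, store)
ITEM = {("pen", 2), ("lamp", 100), ("laptop", 900)}
SOLD = {("pen", "A"), ("laptop", "A"), ("lamp", "B"), ("desk", "B")}


def cheap_items_sold(item, sold, max_price=100):
    """Evaluate q(name, store) :- item(name, price), sold(name, store), price <= max_price.

    The two relational atoms are joined on `name`; the arithmetic
    comparison `price <= max_price` is the extra built-in predicate.
    """
    return sorted({
        (name, store)
        for (name, price) in item
        for (sold_name, store) in sold
        if name == sold_name and price <= max_price
    })


print(cheap_items_sold(ITEM, SOLD))  # [('lamp', 'B'), ('pen', 'A')]
```

Without the comparison this is an ordinary select-project-join query; the comparison is what takes it outside the original definition of conjunctive queries.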


Nicoleta Preda
INRIA and Paris-Sud XI University

XML processing in DHT networks

Abstract
The current development of peer-to-peer (P2P) information sharing has opened the way for supporting high-level data management applications, and in particular structured queries, in a P2P setting. An issue of particular importance is the management of queries over XML data, especially its scaling. We introduce the KadoP platform, which leverages Distributed Hash Table (DHT) technology to support P2P XML query processing. Such an approach raises a number of performance issues. The purpose of the presentation is two-fold. First, we identify DHT aspects hindering efficient query processing and propose an array of techniques we developed to lift these limitations. Second, we extend the technique to also manage intensional content, and in particular, document includes.


Chong Sun
UC Santa Cruz

Multiway SLCA-based Keyword Search in XML Data
by Chong Sun, Chee-Yong Chan, Amit Goenka

Abstract
Keyword search for smallest lowest common ancestors (SLCAs) in XML data has recently been proposed as a meaningful way to identify interesting data nodes in XML data whose subtrees contain an input set of keywords. In this paper, we generalize this useful search paradigm beyond the traditional AND semantics to support keyword search with both AND and OR Boolean operators. We first analyze properties of the LCA computation and propose more efficient algorithms to solve the traditional keyword search problem (with only AND semantics). We then extend our approach to handle general keyword search involving combinations of AND and OR Boolean operators. The effectiveness of our new algorithms is demonstrated with a comprehensive experimental performance study.
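As an illustration of the SLCA notion in this abstract, here is a minimal brute-force sketch for the traditional AND semantics only. It is not the algorithm from the paper: it simply walks the tree once and keeps every element whose subtree contains all keywords while no child subtree does. The sample document, the function name, and the matching rule (keywords match tag names and whitespace-separated text tokens, case-insensitively; tail text is ignored) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
DOC = """
<bib>
  <book><title>XML search</title><author>Sun</author></book>
  <book><title>Databases</title><author>Chan</author></book>
  <article><title>XML</title><author>Chan</author></article>
</bib>
"""


def slca(root, keywords):
    """Return the elements that are SLCAs for the AND of `keywords`."""
    wanted = {k.lower() for k in keywords}
    result = []

    def visit(node):
        # Keywords matched directly at this node (tag name or text tokens).
        found = {node.tag.lower()} & wanted
        found |= set((node.text or "").lower().split()) & wanted
        child_covers_all = False
        for child in node:
            child_found = visit(child)
            if child_found == wanted:
                child_covers_all = True
            found |= child_found
        # Smallest: the subtree covers every keyword, but no child subtree does.
        if found == wanted and not child_covers_all:
            result.append(node)
        return found

    visit(root)
    return result


root = ET.fromstring(DOC)
print([n.tag for n in slca(root, ["xml", "chan"])])  # ['article']
print([n.tag for n in slca(root, ["sun", "chan"])])  # ['bib']
```

Checking only the children is enough because keyword containment is monotone: if any descendant subtree covers all keywords, so does the child subtree on the path to it.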


Don Chamberlin
IBM Almaden Research Center

New Standards from W3C: XPath, XQuery, and XSLT

Abstract
On January 23, 2007, the World Wide Web Consortium (W3C) announced a new suite of XML-related recommendations, including a new query language called XQuery and significant updates to the widely used XPath and XSLT languages. This talk will describe the new languages, their significance, and their relationship to other XML standards. It will also discuss the W3C design process and some of the influences that shaped the design of these languages. It will conclude with a look at some of the ongoing work at W3C relating to XML languages and standards.


Philip Bohannon
Yahoo! Research

"Open-World" Data Cleaning

Abstract
"Data Cleaning" in the database literature typically focuses on the process of populating a data warehouse: objects should be de-duplicated based on available evidence, errors such as constraint violations should be detected and data or query results amended accordingly, and suitable transformations should be applied so that the integrated data in the warehouse is as usable as possible. However, these efforts are "closed-world" in the sense that the availability of data external to the system is largely ignored. In this talk, I advocate an expanded scope of "data cleaning" that incorporates "open-world" issues in two ways. First, the value of improved data quality should be characterized based on the risk caused by dubious or incorrect data. Second, the potential to make use of higher-quality (but more expensive) external data to improve data quality should be considered.
Various parts of this talk are based on joint work with Michael Benedikt, Glenn Bruns, Wenfei Fan and Rajeev Rastogi (though any problems are mine).
I will also give a brief introduction to the Community Systems Research Group at Yahoo! Research, and some additional examples of the problems and opportunities we are facing in web data management, particularly when seeking to support and leverage on-line communities.


Site maintainer: Bogdan Alexe
Last modified: October 31 2006 13:08:46.