Showing papers presented at "Extending Database Technology in 2002"

PDF

Open Access

Book Chapter•DOI•

[...]

Dan Olteanu¹, Holger Meuss¹, Tim Furche¹, François Bry¹•Institutions (1)

24 Mar 2002

TL;DR: Equivalences of XPath 1.0 location paths involving reverse axes, such as anc and prec, are established and used as rewriting rules in an algorithm for transforming location paths with reverse axes into equivalent reverse-axis-free ones.

...read moreread less

Abstract: The location path language XPath is of particular importance for XML applications since it is a core component of many XML processing standards such as XSLT or XQuery. In this paper, based on axis symmetry of XPath, equivalences of XPath 1.0 location paths involving reverse axes, such as anc and prec, are established. These equivalences are used as rewriting rules in an algorithm for transforming location paths with reverse axes into equivalent reverse-axis-free ones. Location paths without reverse axes, as generated by the presented rewriting algorithm, enable efficient SAX-like streamed data processing of XPath.

...read moreread less

257 citations

Book Chapter•DOI•

Tree Pattern Relaxation

[...]

Sihem Amer-Yahia¹, SungRan Cho², Divesh Srivastava¹•Institutions (2)

AT&T Labs¹, Stevens Institute of Technology²

25 Mar 2002

TL;DR: This paper studies the problem of approximate XML query matching, based on tree pattern relaxations, and devise efficient algorithms to evaluate relaxed tree patterns, and designs data pruning algorithms where intermediate query results are filtered dynamically during the evaluation process.

...read moreread less

Abstract: Tree patterns are fundamental to querying tree-structured data like XML Because of the heterogeneity of XML data, it is often more appropriate to permit approximate query matching and return ranked answers, in the spirit of Information Retrieval, than to return only exact answers In this paper, we study the problem of approximate XML query matching, based on tree pattern relaxations, and devise efficient algorithms to evaluate relaxed tree patterns We consider weighted tree patterns, where exact and relaxed weights, associated with nodes and edges of the tree pattern, are used to compute the scores of query answers We are interested in the problem of finding answers whose scores are at least as large as a given threshold We design data pruning algorithms where intermediate query results are filtered dynamically during the evaluation process We develop anoptimization that exploits scores of intermediate results to improve query evaluation efficiency Finally, we show experimentally that our techniques outperform rewriting-based and post-pruning strategies

...read moreread less

208 citations

Book Chapter•DOI•

The Index-Based XXL Search Engine for Querying XML Data with Relevance Ranking

[...]

Anja Theobald, Gerhard Weikum

25 Mar 2002

TL;DR: This paper presents the XXL search engine that supports relevance ranking on XML data XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match as well as semantic-similarity search conditions.

...read moreread less

Abstract: Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a rank list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do, however, not consider the additional information and rich annotations provided by the structure of XML documents and their element names. This paper presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match as well as semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness. XXL is fully implemented as a suite of Java servlets. Experiments with a variety of structurally diverse XML data demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.

...read moreread less

198 citations

Book Chapter•DOI•

Querying with Intrinsic Preferences

[...]

Jan Chomicki¹•Institutions (1)

University at Buffalo¹

25 Mar 2002

TL;DR: This work proposes a logical framework for formulating preferences and its embedding into relational query languages, which makes it possible to formulate different kinds of preferences and to use preferences in querying databases.

...read moreread less

Abstract: The handling of user preferences is becoming an increasingly important issue in present-day information systems. Among others, preferences are used for information filtering and extraction to reduce the volume of data presented to the user. They are also used to keep track of user profiles and formulate policies to improve and automate decision making. We propose a logical framework for formulating preferences and its embedding into relational query languages. The framework is simple, and entirely neutral with respect to the properties of preferences. It makes it possible to formulate different kinds of preferences and to use preferences in querying databases. We demonstrate the usefulness of the framework through numerous examples.

...read moreread less

174 citations

Book Chapter•DOI•

DAML+OIL: A Reason-Able Web Ontology Language

[...]

Ian Horrocks¹•Institutions (1)

University of Manchester¹

27 May 2002

TL;DR: DAML+OIL is an ontology language specifically designed for use on the Web that exploits existing Web standards (XML and RDF), adding the familiar ontological primitives of object oriented and frame based systems, and the formal rigor of a very expressive description logic.

...read moreread less

Abstract: Ontologies are set to play a key role in the "Semantic Web", extending syntactic interoperability to semantic interoperability by providing a source of shared and precisely defined terms. DAML+OIL is an ontology language specifically designed for use on the Web; it exploits existing Web standards (XML and RDF), adding the familiar ontological primitives of object oriented and frame based systems, and the formal rigor of a very expressive description logic. The logical basis of the language means that reasoning services can be provided, both to support ontology design and to make DAML+OIL described Web resources more accessible to automated processes.

...read moreread less

164 citations

Book Chapter•DOI•

Efficient Indexing of Spatiotemporal Objects

[...]

Marios Hadjieleftheriou¹, George Kollios², Vassilis J. Tsotras¹, Dimitrios Gunopulos¹•Institutions (2)

University of California¹, Boston University²

25 Mar 2002

TL;DR: This paper addresses the problem of indexing large volumes of spatiotemporal data by introducing two algorithms for splitting and presenting three algorithms that decide how the splits should be distributed among the objects so that the total empty space is minimized.

...read moreread less

Abstract: Spatiotemporal objects i.e., objects which change their position and/or extent over time, appear in many applications. This paper addresses the problem of indexing large volumes of such data. We consider general object movements and extent changes. We further concentrate on "snapshot" as well as small "interval" historical queries on the gathered data. The obvious approach that approximates spatiotemporal objects with MBRs and uses a traditional multidimensional access method to index them is inefficient. Objects that "live" for long time intervals have large MBRs which introduce a lot of empty space. Clustering long intervals has been dealt in temporal databases by the use of partially persistent indices. What differentiates this problem from traditional temporal indexing is that objects are allowed to move/change during their lifetime. Better methods are thus needed to approximate general spatiotemporal objects. One obvious solution is to introduce artificial splits: the lifetime of a long-lived object is split into smaller consecutive pieces. This decreases the empty space but increases the number of indexed MBRs. We first introduce two algorithms for splitting a given spatiotemporal object. Then, given an upper bound on the total number of possible splits, we present three algorithms that decide how the splits should be distributed among the objects so that the total empty space is minimized.

...read moreread less

158 citations

Book Chapter•DOI•

The Geometry of Uncertainty in Moving Objects Databases

[...]

Goce Trajcevski¹, Ouri Wolfson¹, Fengli Zhang¹, Sam Chamberlain²•Institutions (2)

University of Illinois at Chicago¹, United States Army Research Laboratory²

25 Mar 2002

TL;DR: In this paper, the authors propose to model a trajectory as a 3D cylindrical body and use a set of natural spatio-temporal operators to capture uncertainty, and also devise and analyze algorithms to process the operators.

...read moreread less

Abstract: This work addresses the problem of querying moving objects databases. which capture the inherent uncertainty associated with the location of moving point objects. We address the issue of modeling, constructing, and querying a trajectories database. We propose to model a trajectory as a 3D cylindrical body. The model incorporates uncertainty in a manner that enables efficient querying. Thus our model strikes a balance between modeling power, and computational efficiency. To demonstrate efficiency, we report on experimental results that relate the length of a trajectory to its size in bytes. The experiments were conducted using a real map of the Chicago Metropolitan area.We introduce a set of novel but natural spatio-temporal operators which capture uncertainty, and are used to express spatio-temporal range queries. We also devise and analyze algorithms to process the operators. The operators have been implemented as a part of our DOMINO project.

...read moreread less

132 citations

Book Chapter•DOI•

Estimating Answer Sizes for XML Queries

[...]

Yuqing Wu¹, Jignesh M. Patel¹, H. V. Jagadish¹•Institutions (1)

University of Michigan¹

25 Mar 2002

TL;DR: An extensive experimental evaluation is presented using several XML data sets, both real and synthetic, with a variety of queries, to demonstrate that accurate and robust estimates can be achieved, with limited space, and at a miniscule computational cost.

...read moreread less

Abstract: Estimating the sizes of query results, and intermediate results, is crucial to many aspects of query processing. In particular, it is necessary for effective query optimization. Even at the user level, predictions of the total result size can be valuable in "next-step" decisions, such as query refinement. This paper proposes a technique to obtain query result size estimates effectively in an XML database.Queries in XML frequently specify structural patterns, requiring specific relationships between selected elements. Whereas traditional techniques can estimate the number of nodes (XML elements) that will satisfy a node-specific predicate in the query pattern, such estimates cannot easily be combined to provide estimates for the entire query pattern, since element occurrences are expected to have high correlation.We propose a solution based on a novel histogram encoding of element occurrence position. With such position histograms, we are able to obtain estimates of sizes for complex pattern queries, as well as for simpler intermediate patterns that may be evaluated in alternative query plans, by means of a position histogram join (pH-join) algorithm that we introduce. We extend our technique to exploit schema information regarding allowable structure (the no-overlap property) through the use of a coverage histogram.We present an extensive experimental evaluation using several XML data sets, both real and synthetic, with a variety of queries. Our results demonstrate that accurate and robust estimates can be achieved, with limited space, and at a miniscule computational cost. These techniques have been implemented in the context of the TIMBER native XML database [22] at the University of Michigan.

...read moreread less

116 citations

Book Chapter•DOI•

Designing Functional Dependencies for XML

[...]

Mong Li Lee¹, Tok Wang Ling¹, Wai Lup Low²•Institutions (2)

National University of Singapore¹, DSO National Laboratories²

25 Mar 2002

TL;DR: It is shown how functional dependencies in XML can be verified with a single pass through the XML data, and a platform-independent framework is drawn up to demonstrate how the techniques proposed in this work can enrich the semantics of XML.

...read moreread less

Abstract: Functional dependencies are an integral part of database theory and they form the basis for normalizing relational tables up to BCNF. With the increasing relevance of the data-centric aspects of XML, it is pertinent to study functional dependencies in the context of XML, which will form the basis for further studies into XML keys and normalization. In this work, we investigate the design of functional dependencies in XML databases. We propose FDXML, a notation and DTD for representing functional dependencies in XML. We observe that many databases are hierarchical in nature and the corresponding nested XML data may inevitably contain redundancy. We develop a model based on FDXML to estimate the amount of data replication in XML data. We show how functional dependencies in XML can be verified with a single pass through the XML data, and present supporting experimental results. A platform-independent framework is also drawn up to demonstrate how the techniques proposed in this work can enrich the semantics of XML.

...read moreread less

108 citations

Book Chapter•DOI•

Profit Mining: From Patterns to Actions

[...]

Ke Wang¹, Senqiang Zhou¹, Jiawei Han²•Institutions (2)

Simon Fraser University¹, University of Illinois at Urbana–Champaign²

25 Mar 2002

TL;DR: In this article, a profit mining approach is proposed to reduce the gap between the statistic-based pattern extraction and the value-based decision making in data mining applications, where a set of past transactions and preselected target items are given, and a model for recommending target items and promotion strategies to new customers, with the goal of maximizing the net profit.

...read moreread less

Abstract: A major obstacle in data mining applications is the gap between the statistic-based pattern extraction and the value-based decision making. We present a profit mining approach to reduce this gap. In profit mining, we are given a set of past transactions and pre-selected target items, and we like to build a model for recommending target items and promotion strategies to new customers, with the goal of maximizing the net profit. We identify several issues in profit mining and propose solutions. We evaluate the effectiveness of this approach using data sets of a wide range of characteristics.

...read moreread less

101 citations

Book Chapter•DOI•

Dynamic Queries over Mobile Objects

[...]

Iosif Lazaridis¹, Kriengkrai Porkaew², Sharad Mehrotra¹•Institutions (2)

University of California¹, King Mongkut's University of Technology Thonburi²

25 Mar 2002

TL;DR: New query processing techniques for dynamic queries over mobile objects, i.e., queries that are themselves continuously changing with time, are introduced to address problems of incorporating such objects in databases.

...read moreread less

Abstract: Increasingly applications require the storage and retrieval of spatio-temporal information in a database management system. A type of such information is mobile objects, i.e., objects whose location changes continuously with time. Various techniques have been proposed to address problems of incorporating such objects in databases. In this paper, we introduce new query processing techniques for dynamic queries over mobile objects, i.e., queries that are themselves continuously changing with time. Dynamic queries are natural in situational awareness systems when an observer is navigating through space. All objects visible by the observer must be retrieved and presented to her at very high rates, to ensure a high-quality visualization. We show how our proposed techniques offer a great performance improvement over a traditional approach of multiple instantaneous queries.

...read moreread less

Book Chapter•DOI•

On Efficient Matching of Streaming XML Documents and Queries

[...]

Laks V. S. Lakshmanan¹, Sailaja Parthasarathy²•Institutions (2)

University of British Columbia¹, Sun Microsystems²

25 Mar 2002

TL;DR: In this paper, the authors present a framework where requirements and specifications are both registered with and maintained by a registry, and the registry matches new incoming specifications against requirements, and notifies the owners of the requirements of matches found.

...read moreread less

Abstract: Applications such as online shopping, e-commerce, and supply-chain management require the ability to manage large sets of specifications of products and/or services as well as of consumer requirements, and call for efficient matching of requirements to specifications.Requirements are best viewed as "queries" and specifications as data, often represented in XML. We present a framework where requirements and specifications are both registered with and are maintained by a registry. On a periodical basis, the registry matches new incoming specifications, e.g., of products and services, against requirements, and notifies the owners of the requirements of matches found. This problem is dual to the conventional problem of database query processing in that the size of data (e.g., a document that is streaming by) is quite small compared to the number of registered queries (which can be very large). For performing matches efficiently, we propose the notion of a "requirements index", a notion that is dual to a traditional index. We provide efficient matching algorithms that use the proposed indexes. Our prototype MatchMaker system implementation uses our requirements index-based matching algorithms as a core and provides timely notification service to registered users. We illustrate the effectiveness and scalability of the techniques developed with a detailed set of experiments.

...read moreread less

Book Chapter•DOI•

Efficient Algorithms for Mining Inclusion Dependencies

[...]

Fabien De Marchi¹, Stéphane Lopes, Jean-Marc Petit¹•Institutions (1)

Blaise Pascal University¹

25 Mar 2002

TL;DR: A new data mining algorithm for computing unary INDs is given and a levelwise algorithm is proposed to discover all remaining INDs, where candidate INDs of size i + 1 are generated from Satisfied INDsof size i, (i > 0).

...read moreread less

Abstract: Foreign keys form one of the most fundamental constraints for relational databases. Since they are not always defined in existing databases, algorithms need to be devised to discover foreign keys. One of the underlying problems is known to be the inclusion dependency (IND) inference problem. In this paper a new data mining algorithm for computing unary INDs is given. From unary INDs, we also propose a levelwise algorithmto discover all remaining INDs, where candidate INDs of size i + 1 are generated fromsatisfied INDs of size i, (i > 0).An implementation of these algorithms has been achieved and tested against synthetic databases. Up to our knowledge, this paper is the first one to address in a comprehensive manner this data mining problem, from algorithms to experimental results.

...read moreread less

Book Chapter•DOI•

Schema-Driven Evaluation of Approximate Tree-Pattern Queries

[...]

Torsten Schlieder¹•Institutions (1)

Free University of Berlin¹

25 Mar 2002

TL;DR: A simple query language for XML, which supports hierarchical, Boolean-connected query patterns, is presented and it is shown that the schema-based evaluation outperforms the pruning approach for small values of n.

...read moreread less

Abstract: We present a simple query language for XML, which supports hierarchical, Boolean-connected query patterns. The interpretation of a query is founded on cost-based query transformations: The total cost of a sequence of transformations measures the similarity between the query and the data and is used to rank the results. We introduce two polynomial-time algorithms that efficiently find the best n answers to the query: The first algorithm finds all approximate results, sorts them by increasing cost, and prunes the result list after the nthen try. The second algorithm uses a structural summary -the schema- of the database to estimate the best k transformed queries, which in turn are executed against the database. We compare both approaches and show that the schema-based evaluation outperforms the pruning approach for small values of n. The pruning strategy is the better choice if n is close to the total number of approximate results for the query.

...read moreread less

Book Chapter•DOI•

An Approach to Integrating Query Refinement in SQL

[...]

Michael Ortega-Binderberger¹, Kaushik Chakrabarti¹, Sharad Mehrotra²•Institutions (2)

University of Illinois at Urbana–Champaign¹, University of California, Irvine²

25 Mar 2002

TL;DR: In this article, a query refinement framework and an array of strategies for query refinement are presented to handle user subjectivity in similarity search systems, where the user judges individual result tuples and the system adapts and restructures the query to better reflect the users information need.

...read moreread less

Abstract: With the emergence of applications that require content-based similarity retrieval, techniques to support such a retrieval paradigm over database systems have emerged as a critical area of research. User subjectivity is an important aspect of such queries, i.e., which objects are relevant to the user and which are not depends on the perception of the user. Query refinement is used to handle user subjectivity in similarity search systems. This paper explores how to enhance database systems with query refinement for content-based (similarity) searches in object-relational databases. Query refinement is achieved through relevance feedback where the user judges individual result tuples and the system adapts and restructures the query to better reflect the users information need. We present a query refinement framework and an array of strategies for refinement that address different aspects of the problem. Our experiments demonstrate the effectiveness of the query refinement techniques proposed in this paper.

...read moreread less

Book Chapter•DOI•

Temporal Aggregation over Data Streams Using Multiple Granularities

[...]

Donghui Zhang¹, Dimitrios Gunopulos¹, Vassilis J. Tsotras¹, Bernhard Seeger²•Institutions (2)

University of California¹, University of Marburg²

25 Mar 2002

TL;DR: This paper presents specialized indexing schemes for dynamically and progressively maintaining temporal aggregates and discusses how these schemes can be extended to solve the more general range temporal and spatio-temporal aggregation problems.

...read moreread less

Abstract: Temporal aggregation is an important but costly operation for applications that maintain time-evolving data (data warehouses, temporal databases, etc.). In this paper we examine the problem of computing temporal aggregates over data streams. Such aggregates are maintained using multiple levels of temporal granularities: older data is aggregated using coarser granularities while more recent data is aggregated with finer detail. We present specialized indexing schemes for dynamically and progressively maintaining temporal aggregates. Moreover, these schemes can be parameterized. The levels of granularity as well as their corresponding index sizes (or validity lengths) can be dynamically adjusted. This provides a useful trade-off between aggregation detail and storage space. Analytical and experimental results show the efficiency of the proposed structures. Moreover, we discuss how the indexing schemes can be extended to solve the more general range temporal and spatio-temporal aggregation problems.

...read moreread less

Book Chapter•DOI•

Semantic Analysis of Business Process Executions

[...]

Fabio Casati¹, Ming-Chien Shan¹•Institutions (1)

Hewlett-Packard¹

25 Mar 2002

TL;DR: This paper presents a system and a set of techniques, developed at Hewlett-Packard, that overcome limitations, enabling the use of log data for efficient business-level analysis of business processes.

...read moreread less

Abstract: Business Process Management Systems log a large amount of operational data about processes and about the (human and automated) resources involved in their executions. This information can be analyzed for assessing the quality of business operations, identify problems, and suggest solutions. However, current process analysis systems lack the functionalities required to provide information that can be immediately digested and used by business analysts to take decisions. In this paper we discuss the limitations of existing approaches and we present a system and a set of techniques, developed at Hewlett-Packard, that overcome this limitations, enabling the use of log data for efficient business-level analysis of business processes.

...read moreread less

Book Chapter•DOI•

Efficient Complex Query Support for Multiversion XML Documents

[...]

Shu-Yao Chien¹, Vassilis J. Tsotras¹, Carlo Zaniolo¹, Donghui Zhang¹•Institutions (1)

University of California¹

25 Mar 2002

TL;DR: This paper examines three indexing schemes to efficiently evaluate partial version retrieval queries in this environment and relies on a scheme based on durable node numbers (DNNs) that preserve the order among the XML tree nodes and are invariant with respect to updates.

...read moreread less

Abstract: Managing multiple versions of XML documents represents a critical requirement for many applications. Also, there has been much recent interest in supporting complex queries on XML data (e.g., regular path expressions, structural projections, DIFF queries). In this paper, we examine the problem of supporting efficiently complex queries on multiversioned XML documents. Our approach relies on a scheme based on durable node numbers (DNNs) that preserve the order among the XML tree nodes and are invariant with respect to updates. Using the document's DNNs various complex queries are reduced to combinations of partial version retrieval queries. We examine three indexing schemes to efficiently evaluate partial version retrieval queries in this environment. A thorough performance analysis is then presented to reveal the advantages of each scheme.

...read moreread less

Book Chapter•DOI•

Selectivity Estimation for Spatial Joins with Geometric Selections

[...]

Chengyu Sun¹, Divyakant Agrawal¹, Amr El Abbadi¹•Institutions (1)

University of California¹

25 Mar 2002

TL;DR: In this article, a framework for estimating the selectivity of spatial joins constrained by geometric selections is proposed, where the center piece of the framework is Euler Histogram, which decomposes the estimation process into estimations on vertices, edges and faces.

...read moreread less

Abstract: Spatial join is an expensive operation that is commonly used in spatial database systems. In order to generate efficient query plans for the queries involving spatial join operations, it is crucial to obtain accurate selectivity estimates for these operations. In this paper we introduce a framework for estimating the selectivity of spatial joins constrained by geometric selections. The center piece of the framework is Euler Histogram, which decomposes the estimation process into estimations on vertices, edges and faces. Based on the characteristics of different datasets, different probabilistic models can be plugged into the framework to provide better estimation results. To demonstrate the effectiveness of this framework, we implement it by incorporating two existing probabilistic models, and compare the performance with the Geometric Histogram [1] and the algorithm recently proposed by Mamoulis and Papadias.

...read moreread less

Book Chapter•DOI•

Grouping in XML

[...]

Stelios Paparizos¹, Shurug Al-Khalifa¹, H. V. Jagadish¹, Laks V. S. Lakshmanan², Andrew Nierman¹, Divesh Srivastava³, Yuqing Wu¹ - Show less +3 more•Institutions (3)

University of Michigan¹, University of British Columbia², AT&T Labs³

24 Mar 2002

TL;DR: The techniques described here have been implemented in the TIMBER native XML database system being developed at the University of Michigan.

...read moreread less

Abstract: XML permits repeated and missing sub-elements, and missing attributes. We discuss the consequent implications on grouping, both with respect to specification and with respect to implementation. The techniques described here have been implemented in the TIMBER native XML database system being developed at the University of Michigan.

...read moreread less

Book Chapter•DOI•

Storing and Querying XML Data in Object-Relational DBMSs

[...]

Kanda Runapongsa¹, Jignesh M. Patel¹•Institutions (1)

University of Michigan¹

24 Mar 2002

TL;DR: It is demonstrated that using the XO-Rator algorithm, an ORDBMS is usually more efficient than a Relational DBMS (RDBMS) and that the primary reason for this performance improvement is that the XORator algorithm results in a database that is smaller in size, and queries that usually have fewer number of joins.

...read moreread less

Abstract: As the popularity of eXtensible Markup Language (XML) continues to increase at an astonishing pace, data management systems for storing and querying large repositories of XML data are urgently needed. In this paper, we investigate an Object-Relational DBMS (ORDBMS) for storing and querying XML data. We present an algorithm, called XORator, for mapping XML documents to tables in an ORDBMS. An important part of this mapping is assigning a fragment of an XML document to a new XML data type. We demonstrate that using the XO-Rator algorithm, an ORDBMS is usually more efficient than a Relational DBMS (RDBMS). Based on an actual implementation in DB2 V.7.2, we compare the performance of the XORator algorithm with a well-known algorithm for mapping XML data to an RDBMS. Our experiments show that the XORator algorithm requires less storage space, has much faster loading times, and in most cases can evaluate queries faster. The primary reason for this performance improvement is that the XORator algorithm results in a database that is smaller in size, and queries that usually have fewer number of joins.

...read moreread less

Book Chapter•DOI•

Metrics for XML Document Collections

[...]

Meike Klettke¹, Lars Schneider¹, Andreas Heuer¹•Institutions (1)

University of Rostock¹

24 Mar 2002

TL;DR: In this paper, several metrics for XML document collections are enumerated and their applications are subsequently discussed.

...read moreread less

Abstract: In this paper, several metrics for XML document collections are enumerated and their applications are subsequently discussed.

...read moreread less

Book Chapter•DOI•

Aggregate Processing of Planar Points

[...]

Yufei Tao¹, Dimitris Papadias¹, Jun Zhang¹•Institutions (1)

Hong Kong University of Science and Technology¹

25 Mar 2002

TL;DR: In this paper, the authors show that an aggregate window query can be answered in logarithmic worst-case time by an indexing structure called the aP-tree, and propose efficient cost models that predict the structure size and actual query cost.

...read moreread less

Abstract: Aggregate window queries return summarized information about objects that fall inside a query rectangle (e.g., the number of objects instead of their concrete ids). Traditional approaches for processing such queries usually retrieve considerable extra information, thus compromising the processing cost. The paper addresses this problem for planar points from both theoretical and practical points of view. We show that, an aggregate window query can be answered in logarithmic worst-case time by an indexing structure called the aP-tree. Next we study the practical behavior of the aP-tree and propose efficient cost models that predict the structure size and actual query cost. Extensive experiments show that the aP-tree, while involving more space consumption, accelerates query processing by up to an order of magnitude compared to a specialized method based on R-trees. Furthermore, our cost models are accurate and can be employed for the selection of the most appropriate method, balancing the space and query time tradeoff.

...read moreread less

Book Chapter•DOI•

A Structural Numbering Scheme for XML Data

[...]

Dao Dinh Kha¹, Masatoshi Yoshikawa¹, Masatoshi Yoshikawa², Shunsuke Uemura¹•Institutions (2)

Nara Institute of Science and Technology¹, Nagoya University²

24 Mar 2002

TL;DR: A new numbering scheme based on the UID techniques called multilevel recursive UID (rUID) is introduced, which is robust, scalable and hierarchical and takes into account the XML tree topology.

...read moreread less

Abstract: Identifier generation is a common but crucial task in many XML applications. In addition, the structural information of XML data is essential to evaluate the XML queries. In order to meet both these requirements, several numbering schemes, including the powerful UID technique, have been proposed. In this paper, we introduce a new numbering scheme based on the UID techniques called multilevel recursive UID (rUID). The proposed rUID is robust, scalable and hierarchical. rUID features identifier generation by level and takes into account the XML tree topology. rUID not only enables the computation of the parent node's identifier from the child node's identifier, as in the original UID, but also deals effectively with XML structural update and can be applied to arbitrarily large XML documents. In addition, we investigate the effectiveness of rUID in representing the XPath axes and query processing and briefly discuss other applications of rUID.

...read moreread less

Book Chapter•DOI•

ProPolyne: A Fast Wavelet-Based Algorithm for Progressive Evaluation of Polynomial Range-Sum Queries

[...]

Rolfe R. Schmidt¹, Cyrus Shahabi¹•Institutions (1)

University of Southern California¹

25 Mar 2002

TL;DR: A novel pre-aggregation method called ProPolyne is introduced to evaluate arbitrary polynomial range-sums progressively and shows that this approach of approximating queries rather than compressing data produces consistent and superior approximate results when compared to typical wavelet-based data compression techniques.

...read moreread less

Abstract: Many range aggregate queries can be efficiently derived from a class of fundamental queries: the polynomial range-sums. After demonstrating how any range-sum can be evaluated exactly in the wavelet domain, we introduce a novel pre-aggregation method called ProPolyne to evaluate arbitrary polynomial range-sums progressively. At each step of the computation, ProPolyne makes the best possible wavelet approximation of the submitted query. The result is a data-independent approximate query answering technique which uses data structures that can be maintained efficiently. ProPolyne's performance as an exact algorithm is comparable to the best known MOLAP techniques. Our experimental results show that this approach of approximating queries rather than compressing data produces consistent and superior approximate results when compared to typical wavelet-based data compression techniques.

...read moreread less

Book Chapter•DOI•

Evolving a Set of DTDs According to a Dynamic Set of XML Documents

[...]

Elisa Bertino¹, Giovanna Guerrini², Marco Mesiti³, Luigi Tosetto³•Institutions (3)

University of Milan¹, University of Pisa², University of Genoa³

24 Mar 2002

TL;DR: This paper addresses the problem of evolving a set of DTDs so to obtain a description as precise as possible of the structures of the documents actually stored in a source of XML documents.

...read moreread less

Abstract: In this paper we address the problem of evolving a set of DTDs so to obtain a description as precise as possible of the structures of the documents actually stored in a source of XML documents This problem is highly relevant in such a dynamic and heterogeneous environment as the Web The approach we propose relies on the use of a classification mechanism based on document structure and on the use of data mining association rules to find out frequent structural patterns in data

...read moreread less

Book Chapter•DOI•

Broadcast-Based Data Access in Wireless Environments

[...]

Xu Yang¹, Athman Bouguettaya¹•Institutions (1)

Virginia Tech¹

25 Mar 2002

TL;DR: In this article, the authors present an extensive study of some of the most representative indexing schemes and present a novel adaptive testbed for evaluating wireless data access methods in asymmetric communications.

...read moreread less

Abstract: Broadcast is one of the most suitable forms of information dissemination over wireless networks. It is particularly attractive for resource limited mobile clients in asymmetric communications. To support faster access to information and conserve battery power of mobile clients, a number of indexing schemes have been proposed in recent years. In this paper, we report on our extensive study of some of the most representative indexing schemes. We present a novel adaptive testbed for evaluating wireless data access methods. A comprehensive analytical study of the sample indexing schemes is also presented. Exhaustive simulations of these indexing schemes have been conducted. As a result, selection criteria for the suitability of the indexing schemes for different applications are proposed.

...read moreread less

Book Chapter•DOI•

QoS-Driven Load Shedding on Data Streams

[...]

Nesime Tatbul¹•Institutions (1)

Brown University¹

24 Mar 2002

TL;DR: The scope includes both classical query optimization issues adapted to the stream data environment as well as analysis and resolution of overload situations by intelligently discarding data based on application-dependent quality of service (QoS) information.

...read moreread less

Abstract: In this thesis, we are working on the optimized execution of very large number of continuous queries defined on data streams Our scope includes both classical query optimization issues adapted to the stream data environment as well as analysis and resolution of overload situations by intelligently discarding data based on application-dependent quality of service (QoS) information This paper serves as a prelude to our view of the problem and a promising approach to solve it

...read moreread less

Book Chapter•DOI•

Management of Dynamic Location Information in DOMINO

[...]

Ouri Wolfson¹, Hu Cao¹, Hai Lin¹, Goce Trajcevski¹, Fengli Zhang¹, Naphtali Rishe² - Show less +2 more•Institutions (2)

University of Illinois at Urbana–Champaign¹, Florida International University²

25 Mar 2002

TL;DR: In this article, the authors refer to applications with the above characteristics as moving objects database (MOD) applications, and to queries as the ones mentioned above as MOD queries, and refer to these applications as MOD applications in the context of the digital battlefield (c.f.

...read moreread less

Abstract: Consider a database that represents information about moving objects and their location. For example, for a database representing the location of taxi-cabs a typical query may be: retrieve the free cabs that are currently within 1 mile of 33 N. Michigan Ave., Chicago (to pick-up a customer); or for a trucking company database a typical query may be: retrieve the trucks that are currently within 1 mile of truck ABT312 (which needs assistance); or for a database representing the current location of objects in a battlefield a typical query may be: retrieve the friendly helicopters that are in a given region, or, retrieve the friendly helicopters that are expectedto enter the region within the next 10 minutes. The queries may originate from the moving objects, or from stationary users. We will refer to applications with the above characteristics as moving-objects-database (MOD) applications, and to queries as the ones mentioned above as MOD queries. In the military MOD applications arise in the context of the digital battlefield (c.f [1]), and in the civilian industry they arise in transportation systems. For example, Omnitracs developed by Qualcomm (see [3]) is a commercial system used by the transportation industry, which enables MOD functionality. It provides location management by connecting vehicles (e.g. trucks), via satellites, to company databases. The vehicles are equipped with a Global Positioning System (GPS), and they automatically and periodically report their location.

...read moreread less

Book Chapter•DOI•

Divide-and-Conquer Algorithm for Computing Set Containment Joins

[...]

Sergey Melnik¹, Hector Garcia-Molina¹•Institutions (1)

Stanford University¹

25 Mar 2002

TL;DR: Divide-and-conquer set join (DCJ) as mentioned in this paper is a partitioning algorithm for set containment joins, which is a join between set-valued attributes of two relations whose join condition is specified using the subset (?) operator.

...read moreread less

Abstract: A set containment join is a join between set-valued attributes of two relations, whose join condition is specified using the subset (?) operator. Set containment joins are used in a variety of database applications. In this paper, we propose a novel partitioning algorithm called Divide-and-Conquer Set Join (DCJ) for computing set containment joins efficiently. We show that the divide-and-conquer approach outperforms previously suggested algorithms over a wide range of data sets. We present a detailed analysis of DCJ and previously known algorithms and describe their behavior in an implemented testbed.

...read moreread less