Journal Article

Lore: a database management system for semistructured data

Jason G. McHugh, Serge Abiteboul, Roy Goldman, Dallan Quass, Jennifer Widom
01 Sep 1997, Vol. 26, Iss. 3, pp. 54-66
TL;DR: This paper provides an overview of the Lore system (storage management, indexing, query processing and optimization, and user interfaces), as well as other novel features such as dynamic structural summaries and seamless access to data from external sources.
Abstract: Lore (for Lightweight Object Repository) is a DBMS designed specifically for managing semistructured information. Implementing Lore has required rethinking all aspects of a DBMS, including storage management, indexing, query processing and optimization, and user interfaces. This paper provides an overview of these aspects of the Lore system, as well as other novel features such as dynamic structural summaries and seamless access to data from external sources.

Citations
Journal Article
TL;DR: This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data fusion.
Abstract: The development of the Internet in recent years has made it possible and useful to access many different information systems anywhere in the world to obtain information. While there is much research on the integration of heterogeneous information systems, most commercial systems stop short of the actual integration of available data. Data fusion is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation. This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data fusion, namely, uncertain and conflicting data values. We give an overview and classification of different ways of fusing data and present several techniques based on standard and advanced operators of the relational algebra and SQL. Finally, the article features a comprehensive survey of data integration systems from academia and industry, showing if and how data fusion is performed in each.

1,797 citations


Cites background from "Lore: a database management system ..."

  • ...Lore [McHugh et al. 1997] and Tukwila [Ives et al. 1999] are examples for systems who focus on semistructured data....
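A minimal Python sketch of the fusion step described in the data fusion abstract: records assumed to already share a resolved identifier (the hypothetical `id` field, i.e. duplicate detection is taken as done) are merged into one record per real-world object. Missing values (`None` or an absent attribute) never override known ones, and genuine conflicts are passed to a pluggable resolution function. The majority-vote rule is an illustrative assumption, not one of the article's operators.

```python
from collections import Counter, defaultdict
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def majority_vote(values: List[Any]) -> Any:
    """Hypothetical conflict resolution: pick the most frequent value.

    Counter.most_common orders equal counts by first occurrence, so ties go
    to the value seen first.
    """
    return Counter(values).most_common(1)[0][0]


def fuse(records: Iterable[Record], key: str = "id",
         resolve: Callable[[List[Any]], Any] = majority_vote) -> List[Record]:
    """Fuse records that share the same `key` into one record per object."""
    groups: Dict[Any, List[Record]] = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    fused: List[Record] = []
    for group in groups.values():
        # Union of attribute names across the group, in first-seen order.
        attrs: List[str] = []
        for rec in group:
            for attr in rec:
                if attr not in attrs:
                    attrs.append(attr)
        out: Record = {}
        for attr in attrs:
            # Only non-missing values take part in conflict resolution.
            present = [rec[attr] for rec in group if rec.get(attr) is not None]
            out[attr] = resolve(present) if present else None
        fused.append(out)
    return fused


# Toy input: three source records for object 7 and one for object 9.
sources = [
    {"id": 7, "name": "Ann Lee", "city": "Paris", "phone": None},
    {"id": 7, "name": "Ann Lee", "city": "Lyon", "phone": "555-0101"},
    {"id": 7, "name": "Ann Lee", "city": "Paris"},
    {"id": 9, "name": "Bo Chen", "city": "Oslo"},
]
print(fuse(sources))
# -> [{'id': 7, 'name': 'Ann Lee', 'city': 'Paris', 'phone': '555-0101'}, {'id': 9, 'name': 'Bo Chen', 'city': 'Oslo'}]
```

The missing phone is filled from another source (a completeness goal), while the conflicting city is settled by the resolution rule (a consistency goal).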

Proceedings Article
25 Aug 1997
TL;DR: The theoretical foundations of DataGuides are presented along with an algorithm for their creation and an overview of incremental maintenance, and performance results based on the implementation of DataGuides in the Lore DBMS for semistructured data are provided.
Abstract: In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and sample values, and enabling query optimization. This paper presents the theoretical foundations of DataGuides along with an algorithm for their creation and an overview of incremental maintenance. We provide performance results based on our implementation of DataGuides in the Lore DBMS for semistructured data. We also describe the use of DataGuides in Lore, both in the user interface to enable structure browsing and query formulation, and as a means of guiding the query processor and optimizing query execution.

1,341 citations
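DataGuide creation resembles the classic subset construction that turns a nondeterministic automaton into a deterministic one: each DataGuide node stands for the set of database objects reachable by a label path (its target set), and label paths with the same target set share a node. The Python sketch below assumes a hypothetical in-memory graph given as a dict from object id to `(label, child)` pairs; it builds only the summary and omits the incremental maintenance, statistics, and sample values discussed in the abstract.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical semistructured graph: object id -> list of (label, child id).
Graph = Dict[int, List[Tuple[str, int]]]


def build_dataguide(graph: Graph, root: int):
    """Build a strong-DataGuide-style summary by subset construction.

    Returns (edges, extent): edges maps a DataGuide node to {label: child node},
    extent maps a DataGuide node to its target set of source objects.
    """
    node_of: Dict[FrozenSet[int], int] = {}  # target set -> DataGuide node
    edges: Dict[int, Dict[str, int]] = {}
    extent: Dict[int, FrozenSet[int]] = {}

    def visit(targets: FrozenSet[int]) -> int:
        # Reusing known target sets makes cycles and shared substructure terminate.
        if targets in node_of:
            return node_of[targets]
        node = len(node_of)
        node_of[targets] = node
        extent[node] = targets
        edges[node] = {}
        by_label: Dict[str, Set[int]] = defaultdict(set)
        for obj in targets:
            for label, child in graph.get(obj, []):
                by_label[label].add(child)
        for label in sorted(by_label):
            edges[node][label] = visit(frozenset(by_label[label]))
        return node

    visit(frozenset([root]))
    return edges, extent


# Toy database: object 3 is reachable both as Project and as Member.Project.
graph = {
    0: [("Member", 1), ("Member", 2), ("Project", 3)],
    1: [("Name", 4), ("Age", 5)],
    2: [("Name", 6), ("Project", 3)],
    3: [("Title", 7)],
}
edges, extent = build_dataguide(graph, 0)
print(edges)
# -> {0: {'Member': 1, 'Project': 4}, 1: {'Age': 2, 'Name': 3, 'Project': 4}, 2: {}, 3: {}, 4: {'Title': 5}, 5: {}}
```

Each distinct label path (such as `Member.Name`) appears once in the summary, however many source objects it reaches, which is what makes the structure useful for browsing and query formulation.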

Proceedings Article•
07 Sep 1999
TL;DR: It turns out that the relational approach can handle most (but not all) of the semantics of semi-structured queries over XML data, but is likely to be effective only in some cases.
Abstract: XML is fast emerging as the dominant standard for representing data in the World Wide Web. Sophisticated query engines that allow users to effectively tap the data stored in XML documents will be crucial to exploiting the full power of XML. While there has been a great deal of activity recently proposing new semistructured data models and query languages for this purpose, this paper explores the more conservative approach of using traditional relational database engines for processing XML documents conforming to Document Type Descriptors (DTDs). To this end, we have developed algorithms and implemented a prototype system that converts XML documents to relational tuples, translates semi-structured queries over XML documents to SQL queries over tables, and converts the results to XML. We have qualitatively evaluated this approach using several real DTDs drawn from diverse domains. It turns out that the relational approach can handle most (but not all) of the semantics of semi-structured queries over XML data, but is likely to be effective only in some cases. We identify the causes for these limitations and propose certain extensions to the relational model that would make it more appropriate for processing queries over XML documents.

1,111 citations


Cites background from "Lore: a database management system ..."

  • ...Techniques for building such indices have been proposed in the context of semistructured databases [14]....

  • ...Since an XML document is an example of a semi-structured data set (it is tree-structured, with each node in the tree described by a label), why not use semi-structured query languages and query evaluation techniques? This is indeed a viable approach, and there is considerable activity in the semistructured data community focussed upon exploiting this approach [5,14]....

  • ...There has been a lot of work developing special purpose query engines for semi-structured data [5,14]....
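To make "convert XML documents to relational tuples and translate queries to SQL" concrete, here is a deliberately simplified Python sketch using only the standard library (`xml.etree.ElementTree` and `sqlite3`). It swaps in a generic edge-table mapping, one row per element, instead of the DTD-driven mapping the paper develops, and it only translates child-axis paths such as `bib/book/author` into chains of self-joins. The table name `node` and the helper functions are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET
from typing import List, Optional


def shred(xml_text: str, conn: sqlite3.Connection) -> None:
    """Store every element as one tuple (id, parent, tag, text)."""
    conn.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, tag TEXT, text TEXT)")
    counter = 0

    def walk(elem: ET.Element, parent: Optional[int]) -> None:
        nonlocal counter
        counter += 1
        my_id = counter  # pre-order numbering keeps document order
        text = (elem.text or "").strip() or None
        conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)", (my_id, parent, elem.tag, text))
        for child in elem:
            walk(child, my_id)

    walk(ET.fromstring(xml_text), None)


def path_to_sql(steps: List[str]) -> str:
    """Translate a rooted child-axis path into one self-join per step."""
    last = len(steps) - 1
    tables = ", ".join(f"node n{i}" for i in range(len(steps)))
    conds = ["n0.parent IS NULL"] + [f"n{i}.tag = ?" for i in range(len(steps))]
    conds += [f"n{i}.parent = n{i - 1}.id" for i in range(1, len(steps))]
    return f"SELECT n{last}.text FROM {tables} WHERE {' AND '.join(conds)} ORDER BY n{last}.id"


def query(conn: sqlite3.Connection, steps: List[str]) -> List[str]:
    # The tag parameters bind in step order, matching the '?' placeholders.
    return [row[0] for row in conn.execute(path_to_sql(steps), steps)]


conn = sqlite3.connect(":memory:")
shred("<bib><book><title>First</title><author>Smith</author><author>Jones</author></book>"
      "<book><title>Second</title></book></bib>", conn)
print(path_to_sql(["bib", "book", "author"]))
print(query(conn, ["bib", "book", "author"]))  # -> ['Smith', 'Jones']
```

Each additional path step adds another self-join, which hints at why a schema-oblivious mapping can be costly; a mapping that exploits the DTD can store more of the structure in fewer tables.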

Proceedings Article
03 Jun 2002
TL;DR: This paper proposes a novel holistic twig join algorithm, TwigStack, that uses a chain of linked stacks to compactly represent partial results to root-to-leaf query paths, which are then composed to obtain matches for the twig pattern.
Abstract: XML employs a tree-structured data model, and, naturally, XML queries specify patterns of selection predicates on multiple elements related by a tree structure. Finding all occurrences of such a twig pattern in an XML database is a core operation for XML query processing. Prior work has typically decomposed the twig pattern into binary structural (parent-child and ancestor-descendant) relationships, and twig matching is achieved by: (i) using structural join algorithms to match the binary relationships against the XML database, and (ii) stitching together these basic matches. A limitation of this approach for matching twig patterns is that intermediate result sizes can get large, even when the input and output sizes are more manageable. In this paper, we propose a novel holistic twig join algorithm, TwigStack, for matching an XML query twig pattern. Our technique uses a chain of linked stacks to compactly represent partial results to root-to-leaf query paths, which are then composed to obtain matches for the twig pattern. When the twig pattern uses only ancestor-descendant relationships between elements, TwigStack is I/O and CPU optimal among all sequential algorithms that read the entire input: it is linear in the sum of sizes of the input lists and the final result list, but independent of the sizes of intermediate results. We then show how to use (a modification of) B-trees, along with TwigStack, to match query twig patterns in sub-linear time. Finally, we complement our analysis with experimental results on a range of real and synthetic data, and query twig patterns.

1,014 citations


Cites background from "Lore: a database management system ..."

  • ...Their results showed that the MPMGJN algorithm could outperform standard RDBMS join algorithms by more than an order of magnitude....

  • ...In particular, work done in the Lore DBMS [21, 16, 17], and the Niagara system [19], has considered various aspects of query processing on such data....

  • ...Since a great deal of XML data is expected to be stored in relational database systems (all the major DBMS vendors including Oracle, IBM and Microsoft are providing system support for XML data), our study provides evidence that RDBMS systems need to augment their suite of query processing strategies to include holistic twig joins for efficient XML query processing....
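Holistic twig joins such as TwigStack usually work on elements labeled with a region encoding, for example `(start, end)` positions where u is an ancestor of v exactly when `u.start < v.start` and `v.end < u.end`. The Python sketch below illustrates the chain-of-linked-stacks idea from the TwigStack abstract on the simplest case only, an ancestor-descendant path pattern `q0//q1//...//qk`; it is a simplified path-only variant, not the full TwigStack twig algorithm. It assumes each input stream is sorted by `start` and that no element appears in more than one stream; `match_path` is a hypothetical name.

```python
from typing import List, Tuple

# Region encoding: u is an ancestor of v iff u[0] < v[0] and v[1] < u[1].
Region = Tuple[int, int]


def match_path(streams: List[List[Region]]) -> List[Tuple[Region, ...]]:
    """Find all matches of the ancestor-descendant path q0//q1//...//qk.

    streams[i] holds the regions matching query node qi, sorted by start.
    stacks[i] holds (region, parent_top) entries, where parent_top is the index
    of the top of stacks[i - 1] at push time: every entry at or below that
    index is an ancestor of the region. This is the chain of linked stacks.
    """
    k = len(streams)
    pos = [0] * k
    stacks: List[List[Tuple[Region, int]]] = [[] for _ in range(k)]
    results: List[Tuple[Region, ...]] = []

    def expand(level: int, top: int, suffix: Tuple[Region, ...]) -> None:
        # Combine the partial match with every ancestor candidate at this level.
        for j in range(top + 1):
            region, parent_top = stacks[level][j]
            if level == 0:
                results.append((region,) + suffix)
            else:
                expand(level - 1, parent_top, (region,) + suffix)

    while pos[k - 1] < len(streams[k - 1]):
        # Process the unread element with the smallest start across all streams.
        qmin = min((i for i in range(k) if pos[i] < len(streams[i])),
                   key=lambda i: streams[i][pos[i]][0])
        cur = streams[qmin][pos[qmin]]
        pos[qmin] += 1
        # Pop entries that end before cur starts: they cannot enclose cur or anything later.
        for stack in stacks:
            while stack and stack[-1][0][1] < cur[0]:
                stack.pop()
        if qmin > 0 and not stacks[qmin - 1]:
            continue  # cur has no ancestor matching the previous query node
        parent_top = len(stacks[qmin - 1]) - 1 if qmin > 0 else -1
        if qmin < k - 1:
            stacks[qmin].append((cur, parent_top))
        elif k == 1:
            results.append((cur,))
        else:
            # cur matches the last query node: emit all root-to-leaf combinations.
            expand(k - 2, parent_top, (cur,))
    return results


# Toy document: a(1,14) contains b(2,9), c(10,11), b(12,13);
# b(2,9) contains c(3,4) and b(5,8); b(5,8) contains c(6,7).
a = [(1, 14)]
b = [(2, 9), (5, 8), (12, 13)]
c = [(3, 4), (6, 7), (10, 11)]
print(match_path([a, b, c]))
# -> [((1, 14), (2, 9), (3, 4)), ((1, 14), (2, 9), (6, 7)), ((1, 14), (5, 8), (6, 7))]
```

Partial matches are never materialized: the element `c(10,11)` is discarded as soon as the `b` stack empties, and full matches are enumerated only when a leaf element arrives.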

Proceedings Article
07 Aug 2002
TL;DR: It is shown that, while tree-merge algorithms can in some cases perform comparably to stack-tree algorithms, in many cases they are considerably worse; this behavior is explained by analytical results demonstrating that, on sorted inputs, the stack-tree algorithms have worst-case I/O and CPU complexities linear in the sum of the sizes of inputs and output, while the tree-merge algorithms do not have the same guarantee.
Abstract: XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree structured relationships. The primitive tree structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing. We develop two families of structural join algorithms for this task: tree-merge and stack-tree. The tree-merge algorithms are a natural extension of traditional merge joins and the multi-predicate merge joins, while the stack-tree algorithms have no counterpart in traditional relational join processing. We present experimental results on a range of data and queries using the TIMBER native XML query engine built on top of SHORE. We show that while, in some cases, tree-merge algorithms can have performance comparable to stack-tree algorithms, in many cases they are considerably worse. This behavior is explained by analytical results that demonstrate that, on sorted inputs, the stack-tree algorithms have worst-case I/O and CPU complexities linear in the sum of the sizes of inputs and output, while the tree-merge algorithms do not have the same guarantee.

895 citations
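A sketch of an ancestor-descendant structural join in the stack-tree style described in this abstract, in Python. It assumes both inputs are sorted by `start` and use a `(start, end)` region encoding; a single stack holds the ancestor candidates that still enclose the current document position, so each ancestor-list element is pushed and popped at most once and output pairs come out ordered by descendant. A parent-child variant would also compare level numbers, which this sketch omits; `structural_join_desc` is a hypothetical name.

```python
from typing import List, Tuple

Region = Tuple[int, int]  # (start, end); u is an ancestor of v iff u[0] < v[0] and v[1] < u[1]


def structural_join_desc(alist: List[Region], dlist: List[Region]) -> List[Tuple[Region, Region]]:
    """Return all (ancestor, descendant) pairs, ordered by descendant.

    Both lists must be sorted by start. The stack always holds a chain of
    nested ancestor candidates that enclose the current position.
    """
    out: List[Tuple[Region, Region]] = []
    stack: List[Region] = []
    a = d = 0
    while d < len(dlist) and (a < len(alist) or stack):
        take_anc = a < len(alist) and alist[a][0] < dlist[d][0]
        start = alist[a][0] if take_anc else dlist[d][0]
        # Discard candidates that ended before the next element starts.
        while stack and stack[-1][1] < start:
            stack.pop()
        if take_anc:
            stack.append(alist[a])
            a += 1
        else:
            # Every region still on the stack encloses dlist[d].
            out.extend((anc, dlist[d]) for anc in stack)
            d += 1
    return out


b = [(2, 9), (5, 8), (12, 13)]
c = [(3, 4), (6, 7), (10, 11)]
print(structural_join_desc(b, c))
# -> [((2, 9), (3, 4)), ((2, 9), (6, 7)), ((5, 8), (6, 7))]
```

Unlike a merge join that may rescan parts of the ancestor list, each element here is read once; the work beyond the input scan is proportional to the number of output pairs, consistent with the linear bound stated above.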

References
Journal Article
Amit P. Sheth, James A. Larson
TL;DR: In this paper, the authors define a reference architecture for distributed database management systems from system and schema viewpoints and show how various FDBS architectures can be developed, and define a methodology for developing one of the popular architectures of an FDBS.
Abstract: A federated database system (FDBS) is a collection of cooperating database systems that are autonomous and possibly heterogeneous. In this paper, we define a reference architecture for distributed database management systems from system and schema viewpoints and show how various FDBS architectures can be developed. We then define a methodology for developing one of the popular architectures of an FDBS. Finally, we discuss critical issues related to developing and operating an FDBS.

2,376 citations

Journal Article
Goetz Graefe
TL;DR: This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Abstract: Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate the problem: In order to manipulate large sets of complex objects as efficiently as today's database systems manipulate simple records, query-processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.

1,427 citations
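The "duality of sort- and hash-based set-matching algorithms" mentioned in this survey can be seen in a toy equi-join: a hash join builds a table on one input and probes it with the other, while a merge join sorts both inputs and scans them in step, and both produce the same multiset of result rows. The Python sketch below is an in-memory simplification that assumes rows are tuples and join keys are given by column index; real engines add external sorting, partitioning, and memory management. Both functions are generators, loosely echoing the demand-driven iterator style of plan execution.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Row = Tuple


def hash_join(left: Iterable[Row], right: Iterable[Row], lkey: int, rkey: int) -> Iterator[Row]:
    """Build a hash table on `right`, then stream `left` through it."""
    table: Dict[object, List[Row]] = defaultdict(list)
    for rrow in right:
        table[rrow[rkey]].append(rrow)
    for lrow in left:
        for rrow in table.get(lrow[lkey], ()):
            yield lrow + rrow


def merge_join(left: Iterable[Row], right: Iterable[Row], lkey: int, rkey: int) -> Iterator[Row]:
    """Sort both inputs on the join key, then scan them in step."""
    ls = sorted(left, key=lambda row: row[lkey])
    rs = sorted(right, key=lambda row: row[rkey])
    i = j = 0
    while i < len(ls) and j < len(rs):
        lk, rk = ls[i][lkey], rs[j][rkey]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their cross product.
            i_end, j_end = i, j
            while i_end < len(ls) and ls[i_end][lkey] == lk:
                i_end += 1
            while j_end < len(rs) and rs[j_end][rkey] == rk:
                j_end += 1
            for lrow in ls[i:i_end]:
                for rrow in rs[j:j_end]:
                    yield lrow + rrow
            i, j = i_end, j_end


emp = [(1, "ann", 10), (2, "bo", 20), (3, "cy", 10), (4, "di", 30)]
dept = [(10, "db"), (20, "os"), (40, "ai")]
by_hash = list(hash_join(emp, dept, 2, 0))
by_sort = list(merge_join(emp, dept, 2, 0))
print(sorted(by_hash) == sorted(by_sort))  # -> True (same rows, possibly different order)
```

The two strategies differ mainly in how matching keys are brought together (hash partitioning versus sorted order) and in the order of their output, which is the trade-off a query optimizer weighs.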

Proceedings Article
25 Aug 1997
TL;DR: The theoretical foundations of DataGuides are presented along with an algorithm for their creation and an overview of incremental maintenance, and performance results based on the implementation of DataGuides in the Lore DBMS for semistructured data are provided.
Abstract: In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and sample values, and enabling query optimization. This paper presents the theoretical foundations of DataGuides along with an algorithm for their creation and an overview of incremental maintenance. We provide performance results based on our implementation of DataGuides in the Lore DBMS for semistructured data. We also describe the use of DataGuides in Lore, both in the user interface to enable structure browsing and query formulation, and as a means of guiding the query processor and optimizing query execution.

1,341 citations

Journal Article
Serge Abiteboul, Dallan Quass, Jason G. McHugh, Jennifer Widom, Janet L. Wiener
TL;DR: The main novelties of the Lorel language are the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user.
Abstract: We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular: some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of the ODMG data model and the Lorel language as an extension of OQL. The main novelties of the Lorel language are: (i) the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and (ii) powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user. Lorel also includes a declarative update language. Lorel is implemented as the query language of the Lore prototype database management system at Stanford. Information about Lore can be found at http://www-db.stanford.edu/lore. In addition to presenting the Lorel language in full, this paper briefly describes the Lore system and query processor. We also briefly discuss a second implementation of Lorel on top of a conventional object-oriented database management system, the O2 system.

1,257 citations
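The two Lorel novelties named in this abstract, coercion and path expressions, can be imitated in a few lines of Python. This is a hypothetical toy, not Lore's implementation or Lorel's syntax: objects are nested dicts mapping a label to a list of sub-objects (so labels may repeat and values may differ in type), `%` is used here as a single-label wildcard, and comparisons try to coerce strings to numbers instead of raising a type error. A comparison on a set-valued path succeeds if any reached value matches, an existential reading assumed for this illustration.

```python
from typing import Any, Dict, List, Union

# Hypothetical object model: a complex object maps each label to a list of
# sub-objects; anything else is an atomic value.
Obj = Union[Dict[str, List[Any]], int, float, str]


def eval_path(objs: List[Obj], path: str) -> List[Obj]:
    """Follow a dotted path such as 'Paper.Year'; '%' matches any single label.

    Missing labels simply yield no results instead of an error.
    """
    for label in path.split("."):
        nxt: List[Obj] = []
        for obj in objs:
            if not isinstance(obj, dict):
                continue  # atomic objects have no sub-objects
            for lab, children in obj.items():
                if label == "%" or lab == label:
                    nxt.extend(children)
        objs = nxt
    return objs


def coerce_eq(value: Obj, const: Obj) -> bool:
    """Equality with coercion: if either side is numeric, compare as floats."""
    if isinstance(value, dict) or isinstance(const, dict):
        return False
    if isinstance(value, (int, float)) or isinstance(const, (int, float)):
        try:
            return float(value) == float(const)
        except ValueError:
            return False  # e.g. 'unknown' cannot be read as a number
    return value == const


def where_eq(objs: List[Obj], path: str, const: Obj) -> List[Obj]:
    """Keep objects for which some value reached via `path` equals `const`."""
    return [o for o in objs if any(coerce_eq(v, const) for v in eval_path([o], path))]


# Irregular data: Year stored as int, as string, as a mixed set, or missing.
db = {"Paper": [
    {"Title": ["Alpha"], "Year": [1997]},
    {"Title": ["Beta"], "Year": ["1997"]},
    {"Title": ["Gamma"], "Year": ["unknown", 1996]},
    {"Title": ["Delta"]},
]}
# Roughly, in OQL-like notation: select P.Title from DB.Paper P where P.Year = 1997
hits = where_eq(eval_path([db], "Paper"), "Year", 1997)
print([t for p in hits for t in eval_path([p], "Title")])  # -> ['Alpha', 'Beta']
```

Because missing labels and type mismatches simply fail to match, the same query runs unchanged over irregular data, which is the flexibility the abstract attributes to coercion and path expressions.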

Book
01 May 1997
TL;DR: With this book, standards are defined for object management systems, and this will be the foundational book for object-oriented database products.
Abstract: This book is the first of its kind and is the result of the efforts of a consortium of database companies called the Object Database Management Group (ODMG). With this book, standards are defined for object management systems, and this will be the foundational book for object-oriented database products.

1,231 citations