Lore: a database management system for semistructured data

doi:10.1145/262762.262770

Home
/
Papers
/
Lore: a database management system for semistructured data

Journal Article•DOI•

Lore: a database management system for semistructured data

Jason G. McHugh¹, Serge Abiteboul¹, Roy Goldman¹, Dallas Quass¹, Jennifer Widom¹ - Show less +1 more•Institutions (1)

Stanford University¹

01 Sep 1997-Vol. 26, Iss: 3, pp 54-66

TL;DR: This paper provides an overview of these aspects of the Lore system, as well as other novel features such as dynamic structural summaries and seamless access to data from external sources.

read less

Abstract: Lore (for Lightweight Object Repository) is a DBMS designed specifically for managing semistructured information. Implementing Lore has required rethinking all aspects of a DBMS, including storage management, indexing, query processing and optimization, and user interfaces. This paper provides an overview of these aspects of the Lore system, as well as other novel features such as dynamic structural summaries and seamless access to data from external sources.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Data fusion

[...]

Jens Bleiholder¹, Felix Naumann¹•Institutions (1)

Hasso Plattner Institute¹

15 Jan 2009-ACM Computing Surveys

TL;DR: This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data Fusion.

...read moreread less

Abstract: The development of the Internet in recent years has made it possible and useful to access many different information systems anywhere in the world to obtain information. While there is much research on the integration of heterogeneous information systems, most commercial systems stop short of the actual integration of available data. Data fusion is the process of fusing multiple records representing the same real-world object into a single, consistent, and clean representation.This article places data fusion into the greater context of data integration, precisely defines the goals of data fusion, namely, complete, concise, and consistent data, and highlights the challenges of data fusion, namely, uncertain and conflicting data values. We give an overview and classification of different ways of fusing data and present several techniques based on standard and advanced operators of the relational algebra and SQL. Finally, the article features a comprehensive survey of data integration systems from academia and industry, showing if and how data fusion is performed in each.

...read moreread less

1,797 citations

Cites background from "Lore: a database management system ..."

...Lore [McHugh et al. 1997] and Tukwila [Ives et al. 1999] are examples for systems who focus on semistructured data....
[...]

Proceedings Article•

DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases

[...]

Roy Goldman¹, Jennifer Widom¹•Institutions (1)

Stanford University¹

25 Aug 1997

TL;DR: The theoretical foundations of DataGuides are presented along with an algorithm for their creation and an overview of incremental maintenance, and performance results based on the implementation of dataGuides in the Lore DBMS for semistructured data are provided.

...read moreread less

Abstract: In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and sample values, and enabling query optimization. This paper presents the theoretical foundations of DataGuides along with an algorithm for their creation and an overview of incremental maintenance. We provide performance results based on our implementation of DataGuides in the Lore DBMS for semistructured data. We also describe the use of DataGuides in Lore, both in the user interface to enable structure browsing and query formulation, and as a means of guiding the query processor and optimizing query execution.

...read moreread less

1,341 citations

Proceedings Article•

Relational Databases for Querying XML Documents: Limitations and Opportunities

[...]

Jayavel Shanmugasundaram, Kristin Tufte, Chun Zhang, Gang He, David J. DeWitt, Jeffrey F. Naughton - Show less +2 more

07 Sep 1999

TL;DR: It turns out that the relational approach can handle most (but not all) of the semantics of semi-structured queries over XML data, but is likely to be effective only in some cases.

...read moreread less

Abstract: XML is fast emerging as the dominant standard for representing data in the World Wide Web. Sophisticated query engines that allow users to effectively tap the data stored in XML documents will be crucial to exploiting the full power of XML. While there has been a great deal of activity recently proposing new semistructured data models and query languages for this purpose, this paper explores the more conservative approach of using traditional relational database engines for processing XML documents conforming to Document Type Descriptors (DTDs). To this end, we have developed algorithms and implemented a prototype system that converts XML documents to relational tuples, translates semi-structured queries over XML documents to SQL queries over tables, and converts the results to XML. We have qualitatively evaluated this approach using several real DTDs drawn from diverse domains. It turns out that the relational approach can handle most (but not all) of the semantics of semi-structured queries over XML data, but is likely to be effective only in some cases. We identify the causes for these limitations and propose certain extensions to the relational model that would make it more appropriate for processing queries over XML documents.

...read moreread less

1,111 citations

Cites background from "Lore: a database management system ..."

...Techniques for building such indices have been proposed in the context of semistructured databases [14]....
[...]
...Since an XML document is an example of a semi-structured data set (it is tree-structured, with each node in the tree described by a label), why not use semi-structured query languages and query evaluation techniques? This is indeed a viable approach, and there is considerable activity in the semistructured data community focussed upon exploiting this approach [5,14]....
[...]
...There has been a lot of work developing special purpose query engines for semi-structured data [5,14]....
[...]

Proceedings Article•DOI•

Holistic twig joins: optimal XML pattern matching

[...]

Nicolas Bruno¹, Nick Koudas², Divesh Srivastava²•Institutions (2)

Columbia University¹, AT&T Labs²

03 Jun 2002

TL;DR: This paper proposes a novel holistic twig join algorithm, TwigStack, that uses a chain of linked stacks to compactly represent partial results to root-to-leaf query paths, which are then composed to obtain matches for the twig pattern.

...read moreread less

Abstract: XML employs a tree-structured data model, and, naturally, XML queries specify patterns of selection predicates on multiple elements related by a tree structure. Finding all occurrences of such a twig pattern in an XML database is a core operation for XML query processing. Prior work has typically decomposed the twig pattern into binary structural (parent-child and ancestor-descendant) relationships, and twig matching is achieved by: (i) using structural join algorithms to match the binary relationships against the XML database, and (ii) stitching together these basic matches. A limitation of this approach for matching twig patterns is that intermediate result sizes can get large, even when the input and output sizes are more manageable.In this paper, we propose a novel holistic twig join algorithm, TwigStack, for matching an XML query twig pattern. Our technique uses a chain of linked stacks to compactly represent partial results to root-to-leaf query paths, which are then composed to obtain matches for the twig pattern. When the twig pattern uses only ancestor-descendant relationships between elements, TwigStack is I/O and CPU optimal among all sequential algorithms that read the entire input: it is linear in the sum of sizes of the input lists and the final result list, but independent of the sizes of intermediate results. We then show how to use (a modification of) B-trees, along with TwigStack, to match query twig patterns in sub-linear time. Finally, we complement our analysis with experimental results on a range of real and synthetic data, and query twig patterns.

...read moreread less

1,014 citations

Cites background from "Lore: a database management system ..."

...Their results showed that the MPMGJN algorithm could outperform standard RDBMS join algorithms by more than an order of magni- tude....
[...]
...In particular, work done in the Lore DBMS [21, 16, 17], and the Niagara system [19], has consid- ered various aspects of query processing on such data....
[...]
...Since a great deal of XML data is expected to be stored in relational database systems (all the major DBMS vendors including Oracle, IBM and Microsoft are providing system support for XML data), our study provides evidence that RDBMS systems need to augment their suite of query pro- cessing strategies to include holistic twig joins for efficient XML query processing....
[...]
...In particular, work done in the Lore DBMS [21, 16, 17], and the Niagara system [19], has considered various aspects of query processing on such data....
[...]

Proceedings Article•DOI•

Structural joins: a primitive for efficient XML query pattern matching

[...]

Shurug Al-Khalifa¹, H. V. Jagadish, Nick Koudas¹, Jignesh M. Patel¹, Divesh Srivastava¹, Yuqing Wu¹ - Show less +2 more•Institutions (1)

University of Michigan¹

07 Aug 2002

TL;DR: It is shown that, in some cases, tree-merge algorithms can have performance comparable to stack-tree algorithms, in many cases they are considerably worse, and this behavior is explained by analytical results that demonstrate that, on sorted inputs, the stack- tree algorithms have worst-case I/O and CPU complexities linear in the sum of the sizes of inputs and output, while the tree-MERge algorithms do not have the same guarantee.

...read moreread less

Abstract: XML queries typically specify patterns of selection predicates on multiple elements that have some specified tree structured relationships. The primitive tree structured relationships are parent-child and ancestor-descendant, and finding all occurrences of these relationships in an XML database is a core operation for XML query processing. We develop two families of structural join algorithms for this task: tree-merge and stack-tree. The tree-merge algorithms are a natural extension of traditional merge joins and the multi-predicate merge joins, while the stack-tree algorithms have no counterpart in traditional relational join processing. We present experimental results on a range of data and queries using the TIMBER native XML query engine built on top of SHORE. We show that while, in some cases, tree-merge algorithms can have performance comparable to stack-tree algorithms, in many cases they are considerably worse. This behavior is explained by analytical results that demonstrate that, on sorted inputs, the stack-tree algorithms have worst-case I/O and CPU complexities linear in the sum of the sizes of inputs and output, while the tree-merge algorithms do not have the same guarantee.

...read moreread less

895 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139

Collapse

References

PDF

Open Access

More filters

Journal Article•DOI•

Federated database systems for managing distributed, heterogeneous, and autonomous databases

[...]

Amit P. Sheth, James A. Larson¹•Institutions (1)

Intel¹

01 Sep 1990-ACM Computing Surveys

TL;DR: In this paper, the authors define a reference architecture for distributed database management systems from system and schema viewpoints and show how various FDBS architectures can be developed, and define a methodology for developing one of the popular architectures of an FDBS.

...read moreread less

Abstract: A federated database system (FDBS) is a collection of cooperating database systems that are autonomous and possibly heterogeneous. In this paper, we define a reference architecture for distributed database management systems from system and schema viewpoints and show how various FDBS architectures can be developed. We then define a methodology for developing one of the popular architectures of an FDBS. Finally, we discuss critical issues related to developing and operating an FDBS.

...read moreread less

2,376 citations

Journal Article•DOI•

Query evaluation techniques for large databases

[...]

Goetz Graefe¹•Institutions (1)

Portland State University¹

01 Jun 1993-ACM Computing Surveys

TL;DR: This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.

...read moreread less

Abstract: Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate the problem: In order to manipulate large sets of complex objects as efficiently as today's database systems manipulate simple records, query-processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.

...read moreread less

1,427 citations

Proceedings Article•

DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases

[...]

Roy Goldman¹, Jennifer Widom¹•Institutions (1)

Stanford University¹

25 Aug 1997

...read moreread less

1,341 citations

Journal Article•DOI•

The Lorel Query Language for Semistructured Data

[...]

Serge Abiteboul¹, Dallan Quass¹, Jason G. McHugh¹, Jennifer Widom¹, Janet L. Wiener¹ - Show less +1 more•Institutions (1)

Stanford University¹

01 Apr 1997-International Journal on Digital Libraries

TL;DR: The main novelties of the Lorel language are the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user.

...read moreread less

Abstract: language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular: some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of the ODMG data model and the Lorel language as an extension of OQL. The main novelties of the Lorel language are: (i) the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and (ii) powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user. Lorel also includes a declarative update language. Lorel is implemented as the query language of the Lore prototype database management system at Stanford. Information about Lore can be found at http://www-db.stanford.edu/lore. In addition to presenting the Lorel language in full, this paper briefly describes the Lore system and query processor. We also briefly discuss a second implementation of Lorel on top of a conventional object-oriented database management system, the O2 system.

...read moreread less

1,257 citations

Book•

The object database standard: ODMG 2.0

[...]

R. G. G. Cattell, Douglas K. Barry, Dirk Bartels, Mark Berler, Jeff Eastman, Sophie Gamerman, David Jordan, Adam Springer, Henry Strickland, Drew Wade - Show less +6 more

01 May 1997

TL;DR: With this book, standards are defined for object management systems and this will be the foundational book for object-oriented database product.

...read moreread less

Abstract: This book is the first of its kind and is produced as a result of the efforts by a consortium of database companies called the Object Database Management Group (ODMG). With this book, standards are defined for object management systems and this will be the foundational book for object-oriented database product.

...read moreread less

1,231 citations