Journal ArticleDOI

The Lorel Query Language for Semistructured Data

01 Apr 1997-International Journal on Digital Libraries (Springer-Verlag)-Vol. 1, Iss: 1, pp 68-88
TL;DR: The main novelties of the Lorel language are the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user.
Abstract: We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular: some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of the ODMG data model and the Lorel language as an extension of OQL. The main novelties of the Lorel language are: (i) the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and (ii) powerful path expressions, which permit a flexible form of declarative navigational access and are particularly suitable when the details of the structure are not known to the user. Lorel also includes a declarative update language. Lorel is implemented as the query language of the Lore prototype database management system at Stanford. Information about Lore can be found at http://www-db.stanford.edu/lore. In addition to presenting the Lorel language in full, this paper briefly describes the Lore system and query processor. We also briefly discuss a second implementation of Lorel on top of a conventional object-oriented database management system, the O2 system.
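The two ideas highlighted in the abstract, path expressions over irregular data and coercion between differently typed values, can be illustrated with a small Python sketch. This is not Lorel syntax and not the Lore implementation. The edge-labeled structure, the sample data, and the helper names (`children`, `follow`, `coerce_eq`, `names_with_zip`) are all assumptions made for illustration only.

```python
# Hypothetical, simplified edge-labeled graph: a complex object is a list of
# (label, child) edges; atomic values (str, int, float) are leaves.

def children(node, label):
    """Return the children of node reachable through edges carrying label."""
    if not isinstance(node, list):
        return []  # atomic values have no outgoing edges
    return [child for (lbl, child) in node if lbl == label]

def follow(root, path):
    """Evaluate a dotted path such as 'restaurant.address.zipcode' from root.

    A missing label simply yields no results instead of raising an error,
    which is how the sketch tolerates irregular structure.
    """
    frontier = [root]
    for label in path.split("."):
        frontier = [c for n in frontier for c in children(n, label)]
    return frontier

def coerce_eq(a, b):
    """Compare two values, coercing both to numbers when that is possible."""
    try:
        return float(a) == float(b)
    except (TypeError, ValueError):
        return str(a) == str(b)

# Illustrative data: the zipcode is an int in one place, a nested string in
# another, and absent in the third object.
guide = [
    ("restaurant", [("name", "Chef Chu"), ("zipcode", 94301)]),
    ("restaurant", [("name", "Saigon"), ("address", [("zipcode", "94301")])]),
    ("restaurant", [("name", "Blue Door")]),
]

def names_with_zip(root, zipcode):
    """Names of restaurants whose zipcode, wherever it is stored, matches."""
    result = []
    for r in follow(root, "restaurant"):
        zips = follow(r, "zipcode") + follow(r, "address.zipcode")
        if any(coerce_eq(z, zipcode) for z in zips):
            result.extend(follow(r, "name"))
    return result

print(names_with_zip(guide, 94301))
```

In this sketch the query succeeds for both the integer and the string zipcode, and the restaurant without a zipcode is silently skipped, which mirrors the kind of forgiving behavior the abstract attributes to coercion and path expressions.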
Citations
Proceedings ArticleDOI
03 Jun 2002
TL;DR: The tutorial is focused on some of the theoretical issues that are relevant for data integration: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
Abstract: Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents an overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.

2,716 citations


Cites background from "The Lorel Query Language for Semist..."

  • ...The main difficulty arising in this context is that languages for querying semistructured data enable expressing regular path queries [2, 15, 45]....


01 Jan 2006
TL;DR: There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99].
Abstract: The book Knowledge Discovery in Databases, edited by Piatetsky-Shapiro and Frawley [PSF91], is an early collection of research papers on knowledge discovery from data. The book Advances in Knowledge Discovery and Data Mining, edited by Fayyad, Piatetsky-Shapiro, Smyth, and Uthurusamy [FPSSe96], is a collection of later research results on knowledge discovery and data mining. There have been many data mining books published in recent years, including Predictive Data Mining by Weiss and Indurkhya [WI98], Data Mining Solutions: Methods and Tools for Solving Real-World Problems by Westphal and Blaxton [WB98], Mastering Data Mining: The Art and Science of Customer Relationship Management by Berry and Linoff [BL99], Building Data Mining Applications for CRM by Berson, Smith, and Thearling [BST99], Data Mining: Practical Machine Learning Tools and Techniques by Witten and Frank [WF05], Principles of Data Mining (Adaptive Computation and Machine Learning) by Hand, Mannila, and Smyth [HMS01], The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman [HTF01], Data Mining: Introductory and Advanced Topics by Dunham, and Data Mining: Multimedia, Soft Computing, and Bioinformatics by Mitra and Acharya [MA03]. There are also books containing collections of papers on particular aspects of knowledge discovery, such as Machine Learning and Data Mining: Methods and Applications edited by Michalski, Bratko, and Kubat [MBK98], and Relational Data Mining edited by Dzeroski and Lavrac [De01], as well as many tutorial notes on data mining in major database, data mining and machine learning conferences.

2,591 citations

01 Jan 2007
TL;DR: XML is a versatile markup language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories.
Abstract: XML is a versatile markup language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories. A query language that uses the

2,066 citations

Journal ArticleDOI
TL;DR: This paper surveys the research in the area of Web mining, points out some confusion regarding the usage of the term Web mining, and suggests three Web mining categories, then situates some of the research with respect to these categories.
Abstract: With the huge amount of information available online, the World Wide Web is a fertile area for data mining research. Web mining research is at the crossroads of several research communities, such as databases, information retrieval, and, within AI, especially the sub-areas of machine learning and natural language processing. However, there is a lot of confusion when comparing research efforts from different points of view. In this paper, we survey the research in the area of Web mining, point out some confusion regarding the usage of the term Web mining, and suggest three Web mining categories. Then we situate some of the research with respect to these three categories. We also explore the connection between the Web mining categories and the related agent paradigm. For the survey, we focus on representation issues, on the process, on the learning algorithm, and on the application of the recent works as the criteria. We conclude the paper with some research issues.

1,699 citations


Cites background or methods from "The Lorel Query Language for Semist..."

  • ...There has been some work on query languages for semi-structured data [2; 19] and for the Web [8; 78; 87; 47]....


  • ...[Table 1: Web mining categories (Web Content Mining, Web Structure Mining; DB view: semi-structured data, Web site as DB, edge-labeled graph (OEM)); remaining table contents not recoverable] ...readers can consult the book by Mitchell [89] and the respective papers for the explanation of the methods....


  • ...The DB view mainly uses Object Exchange Model (OEM) [2] that represents semi-structured data by a labeled graph....


  • ...The data in the OEM is viewed as a graph, with objects as the vertices and labels on the edges....


Journal ArticleDOI
TL;DR: The main objective of this survey is to present the work that has been conducted in the area of graph database modeling, concentrating on data structures, query languages, and integrity constraints.
Abstract: Graph database models can be defined as those in which data structures for the schema and instances are modeled as graphs or generalizations of them, and data manipulation is expressed by graph-oriented operations and type constructors. These models took off in the eighties and early nineties alongside object-oriented models. Their influence gradually died out with the emergence of other database models, in particular geographical, spatial, semistructured, and XML. Recently, the need to manage information with graph-like nature has reestablished the relevance of this area. The main objective of this survey is to present the work that has been conducted in the area of graph database modeling, concentrating on data structures, query languages, and integrity constraints.

1,669 citations

References
Journal ArticleDOI
TL;DR: This survey describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.
Abstract: Database management systems will continue to manage large data volumes. Thus, efficient algorithms for accessing and manipulating large sets and sequences will be required to provide acceptable performance. The advent of object-oriented and extensible database systems will not solve this problem. On the contrary, modern data models exacerbate the problem: In order to manipulate large sets of complex objects as efficiently as today's database systems manipulate simple records, query-processing algorithms and software will become more complex, and a solid understanding of algorithm and architectural issues is essential for the designer of database management software. This survey provides a foundation for the design and implementation of query execution facilities in new database management systems. It describes a wide array of practical query evaluation techniques for both relational and postrelational database systems, including iterative execution of complex query evaluation plans, the duality of sort- and hash-based set-matching algorithms, types of parallel query execution and their implementation, and special operators for emerging database application domains.

1,427 citations

Book
01 May 1997
TL;DR: With this book, standards are defined for object management systems, and it is intended as the foundational book for object-oriented database products.
Abstract: This book is the first of its kind and is produced as a result of the efforts by a consortium of database companies called the Object Database Management Group (ODMG). With this book, standards are defined for object management systems, and it is intended as the foundational book for object-oriented database products.

1,231 citations

Proceedings ArticleDOI
06 Mar 1995
TL;DR: An object-based information exchange model and a corresponding query language are defined that are well suited for integration of diverse information sources and used to integrate heterogeneous bibliographic information sources.
Abstract: We address the problem of providing integrated access to diverse and dynamic information sources. We explain how this problem differs from the traditional database integration problem and we focus on one aspect of the information integration problem, namely information exchange. We define an object-based information exchange model and a corresponding query language that we believe are well suited for integration of diverse information sources. We describe how the model and language have been used to integrate heterogeneous bibliographic information sources. We also describe two general-purpose libraries we have implemented for object exchange between clients and servers.

894 citations

Book ChapterDOI
08 Jan 1997
TL;DR: The main purpose of the paper is to isolate the essential aspects of semistructured data, and survey some proposals of models and query languages for semi-structured data.
Abstract: The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in file systems to highly structured in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specific interfaces, or data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure even if the structure is often implicit, and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We call here semi-structured data this data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-oriented as in a relational model or sorted-graph as in object databases. As will be seen later when the notion of semi-structured data is more precisely defined, the need for semi-structured data arises naturally in the context of data integration, even when the data sources are themselves well-structured. Although data integration is an old topic, the need to integrate a wider variety of data formats (e.g., SGML or ASN.1 data) and data found on the Web has brought the topic of semi-structured data to the forefront of research. The main purpose of the paper is to isolate the essential aspects of semi-structured data. We also survey some proposals of models and query languages for semi-structured data. In particular, we consider recent works at Stanford U. and U. Penn on semi-structured data. In both cases, the motivation is found in the integration of heterogeneous data.

878 citations

Book
01 Jul 1993

726 citations