Proceedings Article (Open Access)

Querying Heterogeneous Information Sources Using Source Descriptions

Alon Y. Levy et al.
pp. 251-262
TLDR
The paper describes the Information Manifold, an implemented system that provides uniform access to a heterogeneous collection of more than 100 information sources, many of them on the WWW, together with algorithms that use source descriptions to efficiently prune the set of information sources consulted for a given query.
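The pruning idea mentioned above can be illustrated with a deliberately simplified sketch. As an assumption for illustration only, each source description here names a single mediated-schema relation (a hypothetical `UsedCar`) plus interval constraints on its attributes. A source is kept only if it covers the queried relation and its declared constraints are consistent with the query's. The paper's source descriptions are more expressive than this, so the code shows the general idea rather than the Information Manifold's own algorithm; all class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical, simplified model: a range constraint maps an attribute to a
# closed interval [low, high]; an attribute with no entry is unconstrained.
Range = Tuple[float, float]


@dataclass
class SourceDescription:
    name: str
    relation: str  # mediated-schema relation the source covers
    constraints: Dict[str, Range] = field(default_factory=dict)


@dataclass
class Query:
    relation: str
    constraints: Dict[str, Range] = field(default_factory=dict)


def compatible(a: Dict[str, Range], b: Dict[str, Range]) -> bool:
    """True if some tuple could satisfy both sets of range constraints."""
    for attr in a.keys() & b.keys():
        low = max(a[attr][0], b[attr][0])
        high = min(a[attr][1], b[attr][1])
        if low > high:  # empty intersection: the constraints contradict
            return False
    return True


def relevant_sources(query: Query, sources: List[SourceDescription]) -> List[str]:
    """Keep only sources that cover the queried relation and whose declared
    constraints do not contradict the query's constraints."""
    return [
        s.name
        for s in sources
        if s.relation == query.relation and compatible(s.constraints, query.constraints)
    ]


if __name__ == "__main__":
    inf = float("inf")
    sources = [
        SourceDescription("classic_cars", "UsedCar", {"year": (1900, 1969)}),
        SourceDescription("recent_cars", "UsedCar", {"year": (1990, inf)}),
        SourceDescription("all_cars", "UsedCar"),
        SourceDescription("homes", "RealEstate"),
    ]
    q = Query("UsedCar", {"year": (1995, inf), "price": (0, 20000)})
    print(relevant_sources(q, sources))  # ['recent_cars', 'all_cars']
```

Here `classic_cars` is pruned because its year range cannot overlap the query's, and `homes` is pruned because it covers a different relation; neither source would need to be contacted at all.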
Abstract
We witness a rapid increase in the number of structured information sources that are available online, especially on the WWW. These sources include commercial databases on product information, stock market information, real estate, automobiles, and entertainment. We would like to use the data stored in these databases to answer complex queries that go beyond keyword searches. We face the following challenges: (1) Several information sources store interrelated data, and any query-answering system must understand the relationships between their contents. (2) Many sources are not full-featured database systems and can answer only a small set of queries over their data (for example, forms on the WWW restrict the set of queries one can ask). (3) Since the number of sources is very large, effective techniques are needed to prune the set of information sources accessed to answer a query. (4) The details of interacting with each source vary greatly. We describe the Information Manifold (IM), an implemented system that provides uniform access to a heterogeneous collection of more than 100 information sources, many of them on the WWW. IM tackles the above problems by providing a mechanism to describe declaratively the contents and query capabilities of available information sources. There is a clean separation between the declarative source description and the actual details of interacting with an information source. We describe algorithms that use the source descriptions to prune efficiently the set of information sources for a given query, and practical algorithms to generate executable query plans. The query plans we generate can involve querying several information sources and combining their answers. We also present experimental studies indicating that the architecture and algorithms used in the Information Manifold scale up well to several hundred information sources.
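To make the points about limited query capabilities and executable plans concrete, here is a minimal sketch under stated assumptions. Each source is a hypothetical `Source` object whose restricted capability is modeled as a set of attributes that must be supplied before it can be queried (as with a WWW form). A plan is found by greedily ordering sources so that each one's required inputs are already bound, and answers from several sources are combined with a nested-loop join. The names `Source`, `order_sources`, and `execute`, and the car data, are illustrative inventions; the paper's plan-generation algorithms operate over its own source-description language and are not reproduced here.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, object]


class Source:
    """Hypothetical wrapper for one information source.

    `requires` models a limited query capability (e.g. a WWW form whose
    fields must be filled in); `outputs` are the attributes it returns.
    `rows` stands in for the remote data.
    """

    def __init__(self, name: str, requires: Set[str], outputs: Set[str], rows: List[Row]):
        self.name = name
        self.requires = frozenset(requires)
        self.outputs = frozenset(outputs)
        self.attributes = self.requires | self.outputs
        self.rows = rows

    def query(self, bindings: Row) -> List[Row]:
        missing = self.requires - set(bindings)
        if missing:
            raise ValueError(f"{self.name} needs inputs {sorted(missing)}")
        # Return only rows that agree with every supplied binding.
        return [r for r in self.rows if all(r.get(k) == v for k, v in bindings.items())]


def order_sources(sources: List[Source], bound_at_start: Set[str]) -> Optional[List[Source]]:
    """Greedily order sources so each one's required inputs are bound by the
    query constants or by an earlier source. Because the set of bound
    attributes only grows, greedy choice finds an order whenever one exists."""
    bound = set(bound_at_start)
    remaining = list(sources)
    plan: List[Source] = []
    while remaining:
        ready = next((s for s in remaining if s.requires <= bound), None)
        if ready is None:
            return None  # no executable plan under these capabilities
        plan.append(ready)
        remaining.remove(ready)
        bound |= ready.outputs
    return plan


def execute(plan: List[Source], constants: Row) -> List[Row]:
    """Run the plan left to right, feeding bindings from earlier answers into
    later sources and joining the results (a nested-loop dependent join)."""
    partial: List[Row] = [dict(constants)]
    for src in plan:
        joined: List[Row] = []
        for binding in partial:
            # Pass only the attributes this source understands.
            args = {k: v for k, v in binding.items() if k in src.attributes}
            for row in src.query(args):
                joined.append({**binding, **row})
        partial = joined
    return partial


if __name__ == "__main__":
    listings = Source("listings", {"make"}, {"model", "price"}, [
        {"make": "Volvo", "model": "240", "price": 3000},
        {"make": "Volvo", "model": "850", "price": 5000},
        {"make": "Saab", "model": "900", "price": 4000},
    ])
    reviews = Source("reviews", {"model"}, {"rating"}, [
        {"model": "240", "rating": 4},
        {"model": "900", "rating": 5},
    ])
    plan = order_sources([reviews, listings], {"make"})
    print([s.name for s in plan])  # ['listings', 'reviews']
    print(execute(plan, {"make": "Volvo"}))
    # [{'make': 'Volvo', 'model': '240', 'price': 3000, 'rating': 4}]
```

Even though `reviews` is listed first, the ordering step schedules it only after `listings` has bound `model`, so every source call in the resulting plan supplies the inputs that source requires.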

