Optimizing Queries Across Diverse Data Sources

Open Access, Proceedings Article

Laura M. Haas, +3 more

- pp 276-285

TLDR

This work presents the design and implementation of a query optimizer for Garlic, a middleware system designed to integrate data from a broad range of data sources with very different query capabilities.

Abstract:

Businesses today need to interrelate data stored in diverse systems with differing capabilities, ideally via a single high-level query interface. We present the design of a query optimizer for Garlic [C 95], a middleware system designed to integrate data from a broad range of data sources with very different query capabilities. Garlic’s optimizer extends the rule-based approach of [Loh88] to work in a heterogeneous environment by defining generic rules for the middleware and using wrapper-provided rules to encapsulate the capabilities of each data source. This approach offers great advantages in terms of plan quality, extensibility to new sources, incremental implementation of rules for new sources, and the ability to express the capabilities of a diverse set of sources. We describe the design and implementation of this optimizer, and illustrate its actions through an example.
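The division of labor between generic middleware rules and wrapper-provided rules can be illustrated with a minimal sketch. It is not Garlic's actual rule language or API: `Plan`, `relational_wrapper`, `file_wrapper`, `middleware_filter`, `best_plan`, and all cost numbers are hypothetical names and values chosen for illustration. The sketch assumes each wrapper reports which predicates its source can apply, and that the middleware compensates for the rest before picking the cheapest candidate.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Plan:
    description: str          # human-readable plan shape
    applied: FrozenSet[str]   # predicates this plan has already applied
    cost: float               # made-up cost units, for comparison only


# A wrapper rule maps the requested predicates to candidate plans the source
# can execute. Only the wrapper knows what its source supports (assumption).
WrapperRule = Callable[[FrozenSet[str]], List[Plan]]


def relational_wrapper(preds: FrozenSet[str]) -> List[Plan]:
    # Hypothetical source that can evaluate every predicate itself.
    return [Plan(f"RelScan(push={sorted(preds)})", preds, 10.0 + len(preds))]


def file_wrapper(preds: FrozenSet[str]) -> List[Plan]:
    # Hypothetical source that can only return all of its rows.
    return [Plan("FileScan()", frozenset(), 5.0)]


def middleware_filter(plan: Plan, preds: FrozenSet[str]) -> Plan:
    # Generic middleware rule: apply whatever the source could not.
    missing = preds - plan.applied
    if not missing:
        return plan
    return Plan(
        f"Filter({sorted(missing)}, {plan.description})",
        plan.applied | missing,
        plan.cost + 20.0 * len(missing),
    )


def best_plan(wrapper: WrapperRule, preds: FrozenSet[str]) -> Plan:
    # Complete every wrapper plan with middleware work, then keep the cheapest.
    candidates = [middleware_filter(p, preds) for p in wrapper(preds)]
    return min(candidates, key=lambda p: p.cost)


if __name__ == "__main__":
    query_preds = frozenset({"price < 10"})
    print(best_plan(relational_wrapper, query_preds))
    print(best_plan(file_wrapper, query_preds))
```

In this sketch, adding a new source means writing one more wrapper function. The generic `middleware_filter` rule stays unchanged, which mirrors the extensibility argument made in the abstract.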

Citations

Journal Article

The state of the art in distributed query processing

Donald Kossmann

01 Dec 2000, ACM Computing Surveys

TL;DR: The paper presents the “textbook” architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems, and discusses different kinds of distributed systems such as client-server, middleware (multitier), and heterogeneous database systems and shows how query processing works in these systems.

Journal Article

Eddies: continuously adaptive query processing

Ron Avnur, +1 more

TL;DR: This paper introduces a query processing mechanism called an eddy, which continuously reorders operators in a query plan as it runs, and describes the moments of symmetry during which pipelined joins can be easily reordered, and the synchronization barriers that require inputs from different sources to be coordinated.

Proceedings Article

Reconciling schemas of disparate data sources: a machine-learning approach

AnHai Doan, +2 more

TL;DR: LSD is a system that employs and extends current machine-learning techniques to semi-automatically find semantic mappings between the source schemas and the mediated schema, and its architecture is extensible to additional learners that may exploit new kinds of information.

Proceedings Article

Extracting structured data from Web pages

Arvind Arasu, +1 more

TL;DR: This paper presents an algorithm that takes, as input, a set of template-generated pages, deduces the unknown template used to generate the pages, and extracts, as output, the values encoded in the pages.

Proceedings Article

CrowdDB: answering queries with crowdsourcing

Michael J. Franklin, +4 more

TL;DR: The design of CrowdDB is described; a major change is that the traditional closed-world assumption for query processing does not hold for human input. Important avenues for future work in the development of crowdsourced query processing systems are also outlined.

References

Proceedings Article

QBIC project: querying images by content, using color, texture, and shape

Carlton Wayne Niblack, +8 more

14 Apr 1993, Storage and Retrieval for Image and Vide...

TL;DR: The main algorithms for color, texture, shape, and sketch queries are presented, along with example query results and a discussion of future directions.

Proceedings Article

Access path selection in a relational database management system

P. Griffiths Selinger, +4 more

TL;DR: System R is an experimental database management system developed to carry out research on the relational model of data. It chooses access paths for both simple (single relation) and complex queries (such as joins), given a user specification of desired data as a boolean expression of predicates.

Proceedings Article

Querying Heterogeneous Information Sources Using Source Descriptions

Alon Y. Levy, +2 more

TL;DR: The Information Manifold is an implemented system that provides uniform access to a heterogeneous collection of more than 100 information sources, many of them on the WWW. Algorithms that use the source descriptions to efficiently prune the set of information sources for a given query are described.

Book

The object database standard: ODMG 2.0

R. G. G. Cattell, +9 more

TL;DR: This book defines standards for object management systems and aims to be the foundational book for object-oriented database products.

Proceedings Article

Object exchange across heterogeneous information sources

Yannis Papakonstantinou, +2 more

TL;DR: An object-based information exchange model and a corresponding query language are defined that are well suited for integration of diverse information sources and used to integrate heterogeneous bibliographic information sources.

Related Papers (5)

Querying Heterogeneous Information Sources Using Source Descriptions

Mediators in the architecture of future information systems

The TSIMMIS project: Integration of heterogeneous information sources

Access path selection in a relational database management system

Federated database systems for managing distributed, heterogeneous, and autonomous databases