Answering queries using views: A survey

doi:10.1007/S007780100054

Journal Article•DOI•

Alon Halevy¹•Institutions (1)

University of Washington¹

01 Dec 2001-Vol. 10, Iss: 4, pp 270-294

TL;DR: This article surveys the state of the art on the problem of answering queries using views, describes the algorithms proposed to solve it, and synthesizes the disparate works into a coherent framework.

Abstract: The problem of answering queries using views is to find efficient methods of answering a query using a set of previously defined materialized views over the database, rather than accessing the database relations. The problem has recently received significant attention because of its relevance to a wide variety of data management problems. In query optimization, finding a rewriting of a query using a set of materialized views can yield a more efficient query execution plan. To support the separation of the logical and physical views of data, a storage schema can be described using views over the logical schema. As a result, finding a query execution plan that accesses the storage amounts to solving the problem of answering queries using views. Finally, the problem arises in data integration systems, where data sources can be described as precomputed views over a mediated schema. This article surveys the state of the art on the problem of answering queries using views, and synthesizes the disparate works into a coherent framework. We describe the different applications of the problem, the algorithms proposed to solve it and the relevant theoretical results.
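
As a rough illustration of the problem statement (not an example taken from the survey), the sketch below answers a query from a materialized view instead of from the base relations. The relations `enrolled` and `teaches`, the view `V`, and all function names are hypothetical, and the hand-written rewriting stands in for what a rewriting algorithm would have to find automatically.

```python
# Hypothetical base relations, stored as sets of tuples.
enrolled = {("ann", "db"), ("bob", "db"), ("bob", "os"), ("cat", "ai")}
teaches = {("smith", "db"), ("jones", "os"), ("smith", "ai")}


def materialize_view(enrolled, teaches):
    """V(student, prof, course) :- enrolled(student, course), teaches(prof, course)."""
    return {(s, p, c) for (s, c) in enrolled for (p, c2) in teaches if c == c2}


def query_on_base(enrolled, teaches, prof):
    """Q(student) :- enrolled(student, course), teaches(prof, course)."""
    return {s for (s, c) in enrolled for (p, c2) in teaches if c == c2 and p == prof}


def query_on_view(view, prof):
    """Rewriting Q'(student) :- V(student, prof, course); reads only the view."""
    return {s for (s, p, _course) in view if p == prof}


# The view is computed once; later queries can be answered from it alone.
view = materialize_view(enrolled, teaches)
answer_from_base = query_on_base(enrolled, teaches, "smith")
answer_from_view = query_on_view(view, "smith")
```

In this toy case the rewriting is equivalent to the original query, so both evaluations return the same set of students. In general a view may cover only part of a query or part of the data, and only a contained rewriting may be available.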

Citations

Proceedings Article•DOI•

Data integration: a theoretical perspective

Maurizio Lenzerini¹•Institutions (1)

Sapienza University of Rome¹

03 Jun 2002

TL;DR: The tutorial is focused on some of the theoretical issues that are relevant for data integration: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.

Abstract: Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents an overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.

2,716 citations

Cites background from "Answering queries using views: A su..."

...Generally speaking, the problem is to compute the answer to a query based on a set of views, rather than on the raw data in the database [89, 60]....
[...]
...Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data [60, 61, 89]....
[...]
...integration system have been proposed in the literature, called local-as-view (LAV), and global-as-view (GAV), respectively [89, 60]....
[...]

Journal Article•DOI•

Scientific Workflow Management and the Kepler System

Bertram Ludäscher¹,², Ilkay Altintas², Chad Berkley³, Dan Higgins³, Efrat Jaeger², Matthew B. Jones³, Edward A. Lee⁴, Jing Tao², Yang Zhao⁴•Institutions (4)

University of California, Davis¹, San Diego Supercomputer Center², University of California, Santa Barbara³, University of California, Berkeley⁴

25 Aug 2006-Concurrency and Computation: Practice and Experience

TL;DR: Kepler is a scientific workflow system currently under development across a number of scientific data management projects; it is a community-driven, open source project that welcomes related projects and new contributors.

Abstract: Many scientific disciplines are now data and information driven, and new scientific knowledge is often gained by scientists putting together data analysis and knowledge discovery “pipelines”. A related trend is that more and more scientific communities realize the benefits of sharing their data and computational services, and are thus contributing to a distributed data and computational community infrastructure (a.k.a. “the Grid”). However, this infrastructure is only a means to an end and scientists ideally should be bothered little with its existence. The goal is for scientists to focus on development and use of what we call scientific workflows. These are networks of analytical steps that may involve, e.g., database access and querying steps, data analysis and mining steps, and many other steps including computationally intensive jobs on high performance cluster computers. In this paper we describe characteristics of and requirements for scientific workflows as identified in a number of our application projects. We then elaborate on Kepler, a particular scientific workflow system, currently under development across a number of scientific data management projects. We describe some key features of Kepler and its underlying Ptolemy II system, planned extensions, and areas of future research. Kepler is a community-driven, open source project, and we always welcome related projects and new contributors to join.

1,926 citations

Journal Article•DOI•

Data exchange: semantics and query answering

Ronald Fagin¹, Phokion G. Kolaitis², Renée J. Miller³, Lucian Popa¹•Institutions (3)

IBM¹, University of California, Santa Cruz², University of Toronto³

25 May 2005-Theoretical Computer Science

TL;DR: This paper gives an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that is called universal and shows that a universal solution has no more and no less data than required for data exchange and that it represents the entire space of possible solutions.

1,221 citations

Cites background from "Answering queries using views: A su..."

...It should be noted that such incomplete specification arises naturally in many practical scenarios of data exchange (or data integration for that matter; see [18,21])....
[...]
...systems studied to date are either local-as-view (LAV) systems or global-as-view (GAV) systems [18,21,22]....
[...]

Journal Article•DOI•

The state of the art in distributed query processing

Donald Kossmann¹•Institutions (1)

University of Passau¹

01 Dec 2000-ACM Computing Surveys

TL;DR: The paper presents the “textbook” architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems, and discusses different kinds of distributed systems such as client-server, middleware (multitier), and heterogeneous database systems and shows how query processing works in these systems.

Abstract: Distributed data processing is becoming a reality. Businesses want to do it for many reasons, and they often must do it in order to stay competitive. While much of the infrastructure for distributed data processing is already there (e.g., modern network technology), a number of issues make distributed data processing still a complex undertaking: (1) distributed systems can become very large, involving thousands of heterogeneous sites including PCs and mainframe server machines; (2) the state of a distributed system changes rapidly because the load of sites varies over time and new sites are added to the system; (3) legacy systems need to be integrated—such legacy systems usually have not been designed for distributed data processing and now need to interact with other (modern) systems in a distributed environment. This paper presents the state of the art of query processing for distributed database and information systems. The paper presents the “textbook” architecture for distributed query processing and a series of techniques that are particularly useful for distributed database systems. These techniques include special join techniques, techniques to exploit intraquery parallelism, techniques to reduce communication costs, and techniques to exploit caching and replication of data. Furthermore, the paper discusses different kinds of distributed systems such as client-server, middleware (multitier), and heterogeneous database systems, and shows how query processing works in these systems.

980 citations

Book Chapter•DOI•

Data Exchange: Semantics and Query Answering

Ronald Fagin¹, Phokion G. Kolaitis², Renée J. Miller³, Lucian Popa¹•Institutions (3)

IBM¹, University of California, Santa Cruz², University of Toronto³

08 Jan 2003

TL;DR: The notion of "certain answers" in indefinite databases for the semantics for query answering in data exchange is adopted and the computational complexity of computing the certain answers in this context is investigated.

Abstract: Data exchange is the problem of taking data structured under a source schema and creating an instance of a target schema that reflects the source data as accurately as possible. In this paper, we address foundational and algorithmic issues related to the semantics of data exchange and to query answering in the context of data exchange. These issues arise because, given a source instance, there may be many target instances that satisfy the constraints of the data exchange problem. We give an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that we call universal. A universal solution has no more and no less data than required for data exchange and it represents the entire space of possible solutions. We then identify fairly general, and practical, conditions that guarantee the existence of a universal solution and yield algorithms to compute a canonical universal solution efficiently. We adopt the notion of "certain answers" in indefinite databases for the semantics for query answering in data exchange. We investigate the computational complexity of computing the certain answers in this context and also study the problem of computing the certain answers of target queries by simply evaluating them on a canonical universal solution.

916 citations

Cites background from "Answering queries using views: A su..."

...systems studied to date are either local-as-view (LAV) systems or global-as-view (GAV) systems [18,21,22]....
[...]
...It should be noted that such incomplete specification arises naturally in many practical scenarios of data exchange (or data integration for that matter; see [18,21])....
[...]

References

Book•

Foundations of databases

Serge Abiteboul, Richard Hull, Victor Vianu

02 Dec 1994

TL;DR: This book covers the theoretical foundations of databases: the relational model, relational query languages, constraints, Datalog and recursion, expressiveness and complexity, and topics such as incomplete information, complex values, object databases, and dynamic aspects.

Abstract: A. ANTECHAMBER. Database Systems. The Main Principles. Functionalities. Complexity and Diversity. Past and Future. Ties with This Book. Bibliographic Notes. Theoretical Background. Some Basics. Languages, Computability, and Complexity. Basics from Logic. The Relational Model. The Structure of the Relational Model. Named versus Unnamed Perspectives. Notation. Bibliographic Notes.
B. BASICS: RELATIONAL QUERY LANGUAGES. Conjunctive Queries. Getting Started. Logic-Based Perspectives. Query Composition and Views. Algebraic Perspectives. Adding Union. Bibliographic Notes. Exercises. Adding Negation: Algebra and Calculus. The Relational Algebras. Nonrecursive Datalog with Negation. The Relational Calculus. Syntactic Restrictions for Domain Independence. Aggregate Functions. Digression: Finite Representations of Infinite Databases. Bibliographic Notes. Exercises. Static Analysis and Optimization. Issues in Practical Query Optimization. Global Optimization. Static Analysis of the Relational Calculus. Computing with Acyclic Joins. Bibliographic Notes. Exercises. Notes on Practical Languages. SQL: The Structured Query Language. Query-by-Example and Microsoft Access. Confronting the Real World. Bibliographic Notes. Exercises.
C. CONSTRAINTS. Functional and Join Dependency. Motivation. Functional and Key Dependencies. Join and Multivalued Dependencies. The Chase. Bibliographic Notes. Exercises. Inclusion Dependency. Inclusion Dependency in Isolation. Finite versus Infinite Implication. Nonaxiomatizability of fd's + ind's. Restricted Kinds of Inclusion Dependency. Bibliographic Notes. Exercises. A Larger Perspective. A Unifying Framework. The Chase Revisited. Axiomatization. An Algebraic Perspective. Bibliographic Notes. Exercises. Design and Dependencies. Semantic Data Models. Normal Forms. Universal Relation Assumption. Bibliographic Notes. Exercises.
D. DATALOG AND RECURSION. Datalog. Syntax of Datalog. Model-Theoretic Semantics. Fixpoint Semantics. Proof-Theoretic Approach. Static Program Analysis. Bibliographic Notes. Exercises. Evaluation of Datalog. Seminaive Evaluation. Top-Down Techniques. Magic. Two Improvements. Bibliographic Notes. Exercises. Recursion and Negation. Algebra + While. Calculus + Fixpoint. Datalog with Negation. Equivalence. Recursion in Practical Languages. Bibliographic Notes. Exercises. Negation in Datalog. The Basic Problem. Stratified Semantics. Well-Founded Semantics. Expressive Power. Negation as Failure in Brief. Bibliographic Notes. Exercises.
E. EXPRESSIVENESS AND COMPLEXITY. Sizing up Languages. Queries. Complexity of Queries. Languages and Complexity. Bibliographic Notes. Exercises. First Order, Fixpoint and While. Complexity of First-Order Queries. Expressiveness of First-Order Queries. Fixpoint and While Queries. The Impact of Order. Bibliographic Notes. Exercises. Highly Expressive Languages. While(N)-while with Arithmetic. While(new)-while with New Values. While(uty)-An Untyped Extension of while. Bibliographic Notes. Exercises.
F. FINALE. Incomplete Information. Warm-Up. Weak Representation Systems. Conditional Tables. The Complexity of Nulls. Other Approaches. Bibliographic Notes. Exercises. Complex Values. Complex Value Databases. The Algebra. The Calculus. Examples. Equivalence Theorems. Fixpoint and Deduction. Expressive Power and Complexity. A Practical Query Language for Complex Values. Bibliographic Notes. Exercises. Object Databases. Informal Presentation. Formal Definition of an OODB Model. Languages for OODB Queries. Languages for Methods. Further Issues for OODB's. Bibliographic Notes. Exercises. Dynamic Aspects. Update Languages. Transactional Schemas. Updating Views and Deductive Databases. Active Databases. Temporal Databases and Constraints. Bibliographic Notes. Exercises. Bibliography. Symbol Index. Index.

4,381 citations

"Answering queries using views: A su..." refers background in this paper

...minder of datalog notation and of conjunctive queries [Ull89, AHV95]....
[...]

Book•

Principles of database and knowledge-base systems

Jeffrey D. Ullman

01 Jan 1979

TL;DR: This book goes into the details of database conception and use; it covers relational databases from theory to the algorithms actually used.

Abstract: This book goes into the details of database conception and use; it covers relational databases from theory to the algorithms actually used.

2,475 citations

Journal Article•DOI•

Mediators in the architecture of future information systems

Gio Wiederhold¹•Institutions (1)

Stanford University¹

01 Mar 1992-IEEE Computer

TL;DR: A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications; mediation simplifies, abstracts, reduces, merges, and explains data.

Abstract: For single databases, primary hindrances for end-user access are the volume of data that is becoming available, the lack of abstraction, and the need to understand the representation of the data. When information is combined from multiple databases, the major concern is the mismatch encountered in information representation and structure. Intelligent and active use of information requires a class of software modules that mediate between the workstation applications and the databases. It is shown that mediation simplifies, abstracts, reduces, merges, and explains data. A mediator is a software module that exploits encoded knowledge about certain sets or subsets of data to create information for a higher layer of applications. A model of information processing and information system components is described. The mediator architecture, including mediator interfaces, sharing of mediator modules, distribution of mediators, and triggers for knowledge maintenance, are discussed.

2,441 citations

Journal Article•DOI•

Data cube: a relational aggregation operator generalizing GROUP-BY, CROSS-TAB, and SUB-TOTALS

Jim Gray¹, A. Bosworth¹, A. Layman¹, Hamid Pirahesh²•Institutions (2)

Microsoft¹, IBM²

26 Feb 1996

TL;DR: The data cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers.

Abstract: Data analysis applications typically aggregate data across many dimensions looking for unusual patterns. The SQL aggregate functions and the GROUP BY operator produce zero-dimensional or one-dimensional answers. Applications need the N-dimensional generalization of these operators. The paper defines that operator, called the data cube or simply cube. The cube operator generalizes the histogram, cross-tabulation, roll-up, drill-down, and sub-total constructs found in most report writers. The cube treats each of the N aggregation attributes as a dimension of N-space. The aggregate of a particular set of attribute values is a point in this space. The set of points forms an N-dimensional cube. Super-aggregates are computed by aggregating the N-cube to lower dimensional spaces. Aggregation points are represented by an "infinite value": ALL, so the point (ALL,ALL,...,ALL, sum(*)) represents the global sum of all items. Each ALL value actually represents the set of values contributing to that aggregation.

2,308 citations

Proceedings Article•DOI•

Access path selection in a relational database management system

P. Griffiths Selinger¹, Morton M. Astrahan¹, Donald D. Chamberlin¹, Raymond A. Lorie¹, T. G. Price¹•Institutions (1)

IBM¹

30 May 1979

TL;DR: System R is an experimental database management system developed to carry out research on the relational model of data; it chooses access paths for both simple (single relation) and complex queries (such as joins), given a user specification of desired data as a boolean expression of predicates.

Abstract: In a high level query and data manipulation language such as SQL, requests are stated non-procedurally, without reference to access paths. This paper describes how System R chooses access paths for both simple (single relation) and complex queries (such as joins), given a user specification of desired data as a boolean expression of predicates. System R is an experimental database management system developed to carry out research on the relational model of data. System R was designed and built by members of the IBM San Jose Research Laboratory.

2,082 citations