Proceedings ArticleDOI

Data integration: a theoretical perspective

03 Jun 2002, pp. 233-246
TL;DR: The tutorial is focused on some of the theoretical issues that are relevant for data integration: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
Abstract: Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents an overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.

Citations
Book
05 Jun 2007
TL;DR: The second edition of Ontology Matching has been thoroughly revised and updated to reflect the most recent advances in this quickly developing area, which resulted in more than 150 pages of new content.
Abstract: Ontologies tend to be found everywhere. They are viewed as the silver bullet for many applications, such as database integration, peer-to-peer systems, e-commerce, semantic web services, or social networks. However, in open or evolving systems, such as the semantic web, different parties would, in general, adopt different ontologies. Thus, merely using ontologies, like using XML, does not reduce heterogeneity: it just raises heterogeneity problems to a higher level. Euzenat and Shvaiko's book is devoted to ontology matching as a solution to the semantic heterogeneity problem faced by computer systems. Ontology matching aims at finding correspondences between semantically related entities of different ontologies. These correspondences may stand for equivalence as well as other relations, such as consequence, subsumption, or disjointness, between ontology entities. Many different matching solutions have been proposed so far from various viewpoints, e.g., databases, information systems, and artificial intelligence. The second edition of Ontology Matching has been thoroughly revised and updated to reflect the most recent advances in this quickly developing area, which resulted in more than 150 pages of new content. In particular, the book includes a new chapter dedicated to the methodology for performing ontology matching. It also covers emerging topics, such as data interlinking, ontology partitioning and pruning, context-based matching, matcher tuning, alignment debugging, and user involvement in matching, to mention a few. More than 100 state-of-the-art matching systems and frameworks were reviewed. With Ontology Matching, researchers and practitioners will find a reference book that presents currently available work in a uniform framework. In particular, the work and the techniques presented in this book can be equally applied to database schema matching, catalog integration, XML schema matching and other related problems.
The objectives of the book include presenting (i) the state of the art and (ii) the latest research results in ontology matching by providing a systematic and detailed account of matching techniques and matching systems from theoretical, practical and application perspectives.
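As a rough illustration of the notion described above, a correspondence can be modelled as a typed record pairing entities of two ontologies with a semantic relation and a confidence degree. The entity names, relation symbols, and threshold below are hypothetical; this is a minimal sketch, not the book's actual framework:

```python
from dataclasses import dataclass
from enum import Enum

class Relation(Enum):
    """Semantic relations a correspondence may assert."""
    EQUIVALENCE = "="
    SUBSUMPTION = "<"    # entity1 is subsumed by entity2
    DISJOINTNESS = "!"

@dataclass(frozen=True)
class Correspondence:
    """One matching result between entities of two ontologies."""
    entity1: str       # e.g. an IRI or label from ontology O1
    entity2: str       # e.g. an IRI or label from ontology O2
    relation: Relation
    confidence: float  # degree of confidence in [0, 1]

# An alignment is a set of correspondences (entity names invented).
alignment = {
    Correspondence("o1:Book", "o2:Volume", Relation.EQUIVALENCE, 0.92),
    Correspondence("o1:Novel", "o2:Volume", Relation.SUBSUMPTION, 0.85),
}

# Matcher tuning often thresholds on confidence:
trusted = {c for c in alignment if c.confidence >= 0.9}
```

Making the dataclass frozen keeps correspondences hashable, so an alignment can be a plain set.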

2,579 citations


Cites background or methods from "Data integration: a theoretical per..."

  • ...There have been different formalisations of matching and its result (Bernstein et al. 2000; Lenzerini 2002; Kalfoglou and Schorlemmer 2003b; Bouquet et al. 2004a; Zimmermann et al. 2006; Bellahsene et al. 2011)....

  • ...In particular, it is a GLAV-like framework (Lenzerini 2002) where the alignment is defined as uncertain rules in probabilistic Datalog....

  • ...Query answering is then performed by using these correspondences (mappings) within the Local-as-View (LAV), Global-as-View (GAV), or Global-Local-as-View (GLAV) settings (Lenzerini 2002)....

  • ...Finally, as noticed in (Lenzerini 2002), the main task in these applications is to establish mappings, i.e., to perform the matching operation....

Journal ArticleDOI
TL;DR: The background and state-of-the-art of big data are reviewed, including related technologies as well as representative applications such as enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid.
Abstract: In this paper, we review the background and state-of-the-art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, Internet of Things, data centers, and Hadoop. We then focus on the four phases of the value chain of big data, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid. These discussions aim to provide readers with a comprehensive overview and big picture of this exciting area. This survey is concluded with a discussion of open problems and future directions.

2,303 citations


Cites background from "Data integration: a theoretical per..."

  • ...combination of data from different sources and provides users with a uniform view of data [66]....

Book ChapterDOI
TL;DR: This paper presents a new classification of schema-based matching techniques that builds on top of the state of the art in both schema and ontology matching, distinguishing between approximate and exact techniques at schema-level, and syntactic, semantic, and external techniques at element- and structure-level.
Abstract: Schema and ontology matching is a critical problem in many application domains, such as semantic web, schema/ontology integration, data warehouses, e-commerce, etc. Many different matching solutions have been proposed so far. In this paper we present a new classification of schema-based matching techniques that builds on top of the state of the art in both schema and ontology matching. Some innovations are in introducing new criteria which are based on (i) general properties of matching techniques, (ii) interpretation of input information, and (iii) the kind of input information. In particular, we distinguish between approximate and exact techniques at schema-level, and syntactic, semantic, and external techniques at element- and structure-level. Based on the proposed classification we overview some of the recent schema/ontology matching systems, indicating which part of the solution space they cover. The proposed classification provides a common conceptual basis and, hence, can be used for comparing different existing schema/ontology matching techniques and systems as well as for designing new ones, taking advantage of state of the art solutions.

1,285 citations


Cites methods from "Data integration: a theoretical per..."

  • ...Having identified the relationships between schemas, next step is to use these relationships for the purpose of query answering, for example, using techniques applied in data integration systems, namely Local-as-View (LAV), Global-as-View (GAV), or Global-Local-as-View (GLAV) [43]....

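The LAV/GAV distinction referenced in the quote above can be sketched in a few lines. The relations and data below are invented for illustration; real systems express mappings as conjunctive queries over schemas, not Python functions:

```python
# Invented source relations with different shapes:
s1 = {("Dune", "Villeneuve", 2021)}   # s1(title, director, year)
s2 = {("Alien", "Scott")}             # s2(title, director)

# GAV: the global relation movie(title, director) is *defined as a
# query over the sources*, so a user query is answered by unfolding
# the definition into source accesses.
def movie():
    return {(t, d) for (t, d, _year) in s1} | set(s2)

# A query over the global schema ("all directors"), answered by unfolding:
directors = {d for (_t, d) in movie()}

# LAV reverses the direction: each *source* would instead be described
# as a view over the global schema (e.g. "s1 holds movies together with
# their year"), and query answering then requires rewriting queries
# using views rather than plain unfolding.
```

The GLAV setting generalizes both, relating a query over the sources to a query over the global schema.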
Journal ArticleDOI
TL;DR: This paper gives an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that is called universal and shows that a universal solution has no more and no less data than required for data exchange and that it represents the entire space of possible solutions.

1,221 citations


Cites background or methods from "Data integration: a theoretical per..."

  • ...Note that data exchange settings with tgds as source-to-target dependencies include as special cases both LAV and GAV data integration systems in which the views are sound [Len02] and are defined by conjunctive queries....

  • ...Following the terminology and notation in the recent overview [Len02], a data integration system is a triple 〈G,S,M〉, where G is the global schema, S is the source schema, and M is a set of assertions relating elements of the global schema with elements of the source schema....

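The triple ⟨G, S, M⟩ from the quote above can be mirrored in a toy data structure. The relation names and the string form of the mapping assertions are hypothetical simplifications:

```python
from dataclasses import dataclass

@dataclass
class DataIntegrationSystem:
    """A data integration system as a triple <G, S, M>.
    Schemas are modelled as {relation name: arity}; each mapping
    assertion relates a query over one schema to a query over the
    other (kept as Datalog-style strings in this sketch)."""
    global_schema: dict   # G
    source_schema: dict   # S
    mapping: list         # M

system = DataIntegrationSystem(
    global_schema={"movie": 2},
    source_schema={"s1": 3, "s2": 2},
    # GAV-style assertions: a global relation defined over the sources.
    mapping=["movie(T, D) :- s1(T, D, Y)",
             "movie(T, D) :- s2(T, D)"],
)
```

In a LAV system the assertions would run the other way, describing each source relation as a query over G.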
Journal ArticleDOI
TL;DR: It is conjectured that significant improvements can be obtained only by addressing important challenges for ontology matching; such challenges are presented with insights on how to approach them, thereby aiming to direct research into the most promising tracks and to facilitate the progress of the field.
Abstract: After years of research on ontology matching, it is reasonable to consider several questions: is the field of ontology matching still making progress? Is this progress significant enough to pursue further research? If so, what are the particularly promising directions? To answer these questions, we review the state of the art of ontology matching and analyze the results of recent ontology matching evaluations. These results show a measurable improvement in the field, albeit one whose pace is slowing down. We conjecture that significant improvements can be obtained only by addressing important challenges for ontology matching. We present such challenges with insights on how to approach them, thereby aiming to direct research into the most promising tracks and to facilitate the progress of the field.

1,215 citations


Cites methods from "Data integration: a theoretical per..."

  • ...There are some other parameters that can extend the definition of matching, namely: (i) the use of an input alignment A, which is to be extended; (ii) the matching parameters, for instance, weights, or thresholds; and (iii) external resources, such as common knowledge and domain specific thesauri, see Figure 2....

  • ...For instance, Figure 6 shows two entities from the Agrovoc8 and NAL9 thesauri that had to be matched in the food test case of OAEI-2007....

  • ...A comparison of the results in 2007–2010 for the top-3 systems of each year based on the highest F-measure is shown in Figure 5....

  • ...A comparative summary of the best results of OAEI on the benchmarks is shown in Figure 3. edna is a simple edit distance algorithm on labels, which is used as a baseline....

  • ...To this end, it is vital to identify the minimal background knowledge necessary, e.g., a part of TAP in the example of Figure 6, to resolve a particular problem with sufficiently good results....

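The edna baseline mentioned in the quotes above is described as a simple edit distance on labels. A comparable label matcher can be sketched as follows; the similarity measure here is Python's difflib ratio, a stand-in for edna's actual edit-distance implementation, and the labels and threshold are invented:

```python
from difflib import SequenceMatcher

def label_similarity(a: str, b: str) -> float:
    """Normalized similarity of two labels (1.0 means identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match(labels1, labels2, threshold=0.8):
    """Keep label pairs whose similarity reaches the threshold."""
    return [
        (l1, l2, round(label_similarity(l1, l2), 2))
        for l1 in labels1
        for l2 in labels2
        if label_similarity(l1, l2) >= threshold
    ]

pairs = match(["Author", "Title"], ["author", "titel"])
```

Such string-only baselines are exactly what evaluations like OAEI compare full matching systems against.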
References
BookDOI
01 Jan 2003
TL;DR: The Description Logic Handbook provides a thorough account of the subject, covering all aspects of research in this field, namely theory, implementation, and applications, and can also be used for self-study or as a reference for knowledge representation and artificial intelligence courses.
Abstract: Description logics are embodied in several knowledge-based systems and are used to develop various real-life applications. Now in paperback, The Description Logic Handbook provides a thorough account of the subject, covering all aspects of research in this field, namely: theory, implementation, and applications. Its appeal will be broad, ranging from more theoretically oriented readers, to those with more practically oriented interests who need a sound and modern understanding of knowledge representation systems based on description logics. As well as general revision throughout the book, this new edition presents a new chapter on ontology languages for the semantic web, an area of great importance for the future development of the web. In sum, the book will serve as a unique resource for the subject, and can also be used for self-study or as a reference for knowledge representation and artificial intelligence courses.

5,644 citations

Book
01 Jan 1984
TL;DR: This is the second edition of an account of the mathematical foundations of logic programming, which collects, in a unified and comprehensive manner, the basic theoretical results of the field, which have previously only been available in widely scattered research papers.
Abstract: This is the second edition of an account of the mathematical foundations of logic programming. Its purpose is to collect, in a unified and comprehensive manner, the basic theoretical results of the field, which have previously only been available in widely scattered research papers. In addition to presenting the technical results, the book also contains many illustrative examples and problems. The text is intended to be self-contained, the only prerequisites being some familiarity with PROLOG and knowledge of some basic undergraduate mathematics. The material is suitable either as a reference book for researchers or as a textbook for a graduate course on the theoretical aspects of logic programming and deductive database systems.

4,500 citations


"Data integration: a theoretical per..." refers background in this paper

  • ...Such an expansion is performed by viewing each foreign key constraint r1[X] ⊆ r2[Y], where X and Y are sets of h attributes and Y is a key for r2, as a logic programming [77] rule r′2(...

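The expansion described in the quote above can be sketched for the single-attribute case: a tuple violating the foreign key triggers the addition of a referenced tuple padded with fresh labelled nulls, a chase-like step. The relation names and attribute positions below are hypothetical:

```python
from itertools import count

_fresh = count()

def fresh_null():
    """A labelled null standing for an unknown value."""
    return f"_N{next(_fresh)}"

def expand_foreign_key(r1, x_pos, r2, y_pos, r2_arity):
    """For every r1 tuple whose value at x_pos has no matching r2
    tuple at y_pos, add an r2 tuple with fresh nulls elsewhere."""
    seen = {t[y_pos] for t in r2}
    added = set()
    for t in r1:
        key = t[x_pos]
        if key not in seen:
            added.add(tuple(key if i == y_pos else fresh_null()
                            for i in range(r2_arity)))
            seen.add(key)
    return r2 | added

# emp(name, dept) with dept referencing dept(dname, manager):
emp = {("ann", "sales"), ("bob", "hr")}
dept = {("sales", "carol")}
expanded = expand_foreign_key(emp, x_pos=1, r2=dept, y_pos=0, r2_arity=2)
```

The "hr" department has no tuple in `dept`, so one is invented with a fresh null as its manager, mimicking the existential variable in the head of the logic-programming rule.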
Book
02 Dec 1994
TL;DR: A comprehensive account of the theory of databases, covering relational query languages, constraints, Datalog and recursion, expressiveness and complexity, and advanced topics such as incomplete information, complex values, and object databases.
Abstract: A. ANTECHAMBER. Database Systems. The Main Principles. Functionalities. Complexity and Diversity. Past and Future. Ties with This Book. Bibliographic Notes. Theoretical Background. Some Basics. Languages, Computability, and Complexity. Basics from Logic. The Relational Model. The Structure of the Relational Model. Named versus Unnamed Perspectives. Notation. Bibliographic Notes. B. BASICS: RELATIONAL QUERY LANGUAGES. Conjunctive Queries. Getting Started. Logic-Based Perspectives. Query Composition and Views. Algebraic Perspectives. Adding Union. Bibliographic Notes. Exercises. Adding Negation: Algebra and Calculus. The Relational Algebras. Nonrecursive Datalog with Negation. The Relational Calculus. Syntactic Restrictions for Domain Independence. Aggregate Functions. Digression: Finite Representations of Infinite Databases. Bibliographic Notes. Exercises. Static Analysis and Optimization. Issues in Practical Query Optimization. Global Optimization. Static Analysis of the Relational Calculus. Computing with Acyclic Joins. Bibliographic Notes. Exercises. Notes on Practical Languages. SQL: The Structured Query Language. Query-by-Example and Microsoft Access. Confronting the Real World. Bibliographic Notes. Exercises. C. CONSTRAINTS. Functional and Join Dependency. Motivation. Functional and Key Dependencies. Join and Multivalued Dependencies. The Chase. Bibliographic Notes. Exercises. Inclusion Dependency. Inclusion Dependency in Isolation. Finite versus Infinite Implication. Nonaxiomatizability of fd's + ind's. Restricted Kinds of Inclusion Dependency. Bibliographic Notes. Exercises. A Larger Perspective. A Unifying Framework. The Chase Revisited. Axiomatization. An Algebraic Perspective. Bibliographic Notes. Exercises. Design and Dependencies. Semantic Data Models. Normal Forms. Universal Relation Assumption. Bibliographic Notes. Exercises. D. DATALOG AND RECURSION. Datalog. Syntax of Datalog. Model-Theoretic Semantics. Fixpoint Semantics.
Proof-Theoretic Approach. Static Program Analysis. Bibliographic Notes. Exercises. Evaluation of Datalog. Seminaive Evaluation. Top-Down Techniques. Magic. Two Improvements. Bibliographic Notes. Exercises. Recursion and Negation. Algebra + While. Calculus + Fixpoint. Datalog with Negation. Equivalence. Recursion in Practical Languages. Bibliographic Notes. Exercises. Negation in Datalog. The Basic Problem. Stratified Semantics. Well-Founded Semantics. Expressive Power. Negation as Failure in Brief. Bibliographic Notes. Exercises. E. EXPRESSIVENESS AND COMPLEXITY. Sizing up Languages. Queries. Complexity of Queries. Languages and Complexity. Bibliographic Notes. Exercises. First Order, Fixpoint and While. Complexity of First-Order Queries. Expressiveness of First-Order Queries. Fixpoint and While Queries. The Impact of Order. Bibliographic Notes. Exercises. Highly Expressive Languages. While(N): While with Arithmetic. While(new): While with New Values. While(uty): An Untyped Extension of While. Bibliographic Notes. Exercises. F. FINALE. Incomplete Information. Warm-Up. Weak Representation Systems. Conditional Tables. The Complexity of Nulls. Other Approaches. Bibliographic Notes. Exercises. Complex Values. Complex Value Databases. The Algebra. The Calculus. Examples. Equivalence Theorems. Fixpoint and Deduction. Expressive Power and Complexity. A Practical Query Language for Complex Values. Bibliographic Notes. Exercises. Object Databases. Informal Presentation. Formal Definition of an OODB Model. Languages for OODB Queries. Languages for Methods. Further Issues for OODBs. Bibliographic Notes. Exercises. Dynamic Aspects. Update Languages. Transactional Schemas. Updating Views and Deductive Databases. Active Databases. Temporal Databases and Constraints. Bibliographic Notes. Exercises. Bibliography. Symbol Index. Index.

4,381 citations


"Data integration: a theoretical per..." refers background in this paper

  • ...Indeed, Abiteboul and Duschka [2] showed...

  • ...established in [2]....

  • ...of weakly recursive ILOG [12], even though the latter is not directly related to depen-...

  • ...In [2], it was claimed that in the LAV setting and with conjunctive...

  • ...conjunctive queries with inequalities is a coNP-hard problem [2]....

Journal ArticleDOI
01 Dec 2001
TL;DR: A taxonomy is presented that distinguishes between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers and is intended to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
Abstract: Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.

3,693 citations


"Data integration: a theoretical per..." refers background in this paper

  • ...• How to build an appropriate global schema, and how to discover inter-schema [31] and mapping assertions (LAV or GAV) in the design of a data integration system (see, for instance, [83])....

Journal ArticleDOI

2,428 citations

Trending Questions (1)
What is data integration?

Data integration is the process of merging data from different sources to provide a unified view, focusing on theoretical aspects like modeling, query processing, handling inconsistencies, and query reasoning.