Proceedings ArticleDOI

Data integration: a theoretical perspective

03 Jun 2002, pp. 233-246
TL;DR: The tutorial is focused on some of the theoretical issues that are relevant for data integration: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.
Abstract: Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents an overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.

Citations
Book
05 Jun 2007
TL;DR: The second edition of Ontology Matching has been thoroughly revised and updated to reflect the most recent advances in this quickly developing area, which resulted in more than 150 pages of new content.
Abstract: Ontologies tend to be found everywhere. They are viewed as the silver bullet for many applications, such as database integration, peer-to-peer systems, e-commerce, semantic web services, or social networks. However, in open or evolving systems, such as the semantic web, different parties would, in general, adopt different ontologies. Thus, merely using ontologies, like using XML, does not reduce heterogeneity: it just raises heterogeneity problems to a higher level. Euzenat and Shvaiko's book is devoted to ontology matching as a solution to the semantic heterogeneity problem faced by computer systems. Ontology matching aims at finding correspondences between semantically related entities of different ontologies. These correspondences may stand for equivalence as well as other relations, such as consequence, subsumption, or disjointness, between ontology entities. Many different matching solutions have been proposed so far from various viewpoints, e.g., databases, information systems, and artificial intelligence. The second edition of Ontology Matching has been thoroughly revised and updated to reflect the most recent advances in this quickly developing area, which resulted in more than 150 pages of new content. In particular, the book includes a new chapter dedicated to the methodology for performing ontology matching. It also covers emerging topics, such as data interlinking, ontology partitioning and pruning, context-based matching, matcher tuning, alignment debugging, and user involvement in matching, to mention a few. More than 100 state-of-the-art matching systems and frameworks were reviewed. With Ontology Matching, researchers and practitioners will find a reference book that presents currently available work in a uniform framework. In particular, the work and the techniques presented in this book can be equally applied to database schema matching, catalog integration, XML schema matching and other related problems.
The objectives of the book include presenting (i) the state of the art and (ii) the latest research results in ontology matching by providing a systematic and detailed account of matching techniques and matching systems from theoretical, practical and application perspectives.
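As a rough illustration of the notion described above, a correspondence can be modelled as a typed record pairing entities of two ontologies with a semantic relation and a confidence degree. The entity names, relation symbols, and threshold below are hypothetical; this is a minimal sketch, not the book's actual framework:

```python
from dataclasses import dataclass
from enum import Enum

class Relation(Enum):
    """Semantic relations a correspondence may assert."""
    EQUIVALENCE = "="
    SUBSUMPTION = "<"    # entity1 is subsumed by entity2
    DISJOINTNESS = "!"

@dataclass(frozen=True)
class Correspondence:
    """One matching result between entities of two ontologies."""
    entity1: str       # e.g. an IRI or label from ontology O1
    entity2: str       # e.g. an IRI or label from ontology O2
    relation: Relation
    confidence: float  # degree of confidence in [0, 1]

# An alignment is a set of correspondences (entity names invented).
alignment = {
    Correspondence("o1:Book", "o2:Volume", Relation.EQUIVALENCE, 0.92),
    Correspondence("o1:Novel", "o2:Volume", Relation.SUBSUMPTION, 0.85),
}

# Matcher tuning often thresholds on confidence:
trusted = {c for c in alignment if c.confidence >= 0.9}
```

Making the dataclass frozen keeps correspondences hashable, so an alignment can be a plain set.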

2,579 citations


Cites background or methods from "Data integration: a theoretical per..."

  • ...There have been different formalisations of matching and its result (Bernstein et al. 2000; Lenzerini 2002; Kalfoglou and Schorlemmer 2003b; Bouquet et al. 2004a; Zimmermann et al. 2006; Bellahsene et al. 2011)....

  • ...In particular, it is a GLAV-like framework (Lenzerini 2002) where the alignment is defined as uncertain rules in probabilistic Datalog....

  • ...Query answering is then performed by using these correspondences (mappings) within the Local-as-View (LAV), Global-as-View (GAV), or Global-Local-as-View (GLAV) settings (Lenzerini 2002)....

  • ...Finally, as noticed in (Lenzerini 2002), the main task in these applications is to establish mappings, i.e., to perform the matching operation....

Journal ArticleDOI
TL;DR: The background and state-of-the-art of big data are reviewed, including related technologies as well as representative applications such as enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid.
Abstract: In this paper, we review the background and state-of-the-art of big data. We first introduce the general background of big data and review related technologies, such as cloud computing, Internet of Things, data centers, and Hadoop. We then focus on the four phases of the value chain of big data, i.e., data generation, data acquisition, data storage, and data analysis. For each phase, we introduce the general background, discuss the technical challenges, and review the latest advances. We finally examine several representative applications of big data, including enterprise management, Internet of Things, online social networks, medical applications, collective intelligence, and smart grid. These discussions aim to provide readers with a comprehensive overview and big picture of this exciting area. This survey is concluded with a discussion of open problems and future directions.

2,303 citations


Cites background from "Data integration: a theoretical per..."

  • ...combination of data from different sources and provides users with a uniform view of data [66]....

Book ChapterDOI
TL;DR: This paper presents a new classification of schema-based matching techniques that builds on top of the state of the art in both schema and ontology matching, distinguishing between approximate and exact techniques at schema-level, and syntactic, semantic, and external techniques at element- and structure-level.
Abstract: Schema and ontology matching is a critical problem in many application domains, such as semantic web, schema/ontology integration, data warehouses, e-commerce, etc. Many different matching solutions have been proposed so far. In this paper we present a new classification of schema-based matching techniques that builds on top of the state of the art in both schema and ontology matching. Some innovations are in introducing new criteria which are based on (i) general properties of matching techniques, (ii) interpretation of input information, and (iii) the kind of input information. In particular, we distinguish between approximate and exact techniques at schema-level, and syntactic, semantic, and external techniques at element- and structure-level. Based on the proposed classification we overview some of the recent schema/ontology matching systems, indicating which part of the solution space they cover. The proposed classification provides a common conceptual basis and, hence, can be used for comparing different existing schema/ontology matching techniques and systems as well as for designing new ones, taking advantage of state of the art solutions.

1,285 citations


Cites methods from "Data integration: a theoretical per..."

  • ...Having identified the relationships between schemas, next step is to use these relationships for the purpose of query answering, for example, using techniques applied in data integration systems, namely Local-as-View (LAV), Global-as-View (GAV), or Global-Local-as-View (GLAV) [43]....

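The LAV/GAV distinction referenced in the quote above can be sketched in a few lines. The relations and data below are invented for illustration; real systems express mappings as conjunctive queries over schemas, not Python functions:

```python
# Invented source relations with different shapes:
s1 = {("Dune", "Villeneuve", 2021)}   # s1(title, director, year)
s2 = {("Alien", "Scott")}             # s2(title, director)

# GAV: the global relation movie(title, director) is *defined as a
# query over the sources*, so a user query is answered by unfolding
# the definition into source accesses.
def movie():
    return {(t, d) for (t, d, _year) in s1} | set(s2)

# A query over the global schema ("all directors"), answered by unfolding:
directors = {d for (_t, d) in movie()}

# LAV reverses the direction: each *source* would instead be described
# as a view over the global schema (e.g. "s1 holds movies together with
# their year"), and query answering then requires rewriting queries
# using views rather than plain unfolding.
```

The GLAV setting generalizes both, relating a query over the sources to a query over the global schema.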
Journal ArticleDOI
TL;DR: This paper gives an algebraic specification that selects, among all solutions to the data exchange problem, a special class of solutions that is called universal and shows that a universal solution has no more and no less data than required for data exchange and that it represents the entire space of possible solutions.

1,221 citations


Cites background or methods from "Data integration: a theoretical per..."

  • ...Note that data exchange settings with tgds as source-to-target dependencies include as special cases both LAV and GAV data integration systems in which the views are sound [Len02] and are defined by conjunctive queries....

  • ...Following the terminology and notation in the recent overview [Len02], a data integration system is a triple 〈G,S,M〉, where G is the global schema, S is the source schema, and M is a set of assertions relating elements of the global schema with elements of the source schema....

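The triple ⟨G, S, M⟩ from the quote above can be mirrored in a toy data structure. The relation names and the string form of the mapping assertions are hypothetical simplifications:

```python
from dataclasses import dataclass

@dataclass
class DataIntegrationSystem:
    """A data integration system as a triple <G, S, M>.
    Schemas are modelled as {relation name: arity}; each mapping
    assertion relates a query over one schema to a query over the
    other (kept as Datalog-style strings in this sketch)."""
    global_schema: dict   # G
    source_schema: dict   # S
    mapping: list         # M

system = DataIntegrationSystem(
    global_schema={"movie": 2},
    source_schema={"s1": 3, "s2": 2},
    # GAV-style assertions: a global relation defined over the sources.
    mapping=["movie(T, D) :- s1(T, D, Y)",
             "movie(T, D) :- s2(T, D)"],
)
```

In a LAV system the assertions would run the other way, describing each source relation as a query over G.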
Journal ArticleDOI
TL;DR: It is conjectured that significant improvements can be obtained only by addressing important challenges for ontology matching; such challenges are presented with insights on how to approach them, thereby aiming to direct research into the most promising tracks and to facilitate the progress of the field.
Abstract: After years of research on ontology matching, it is reasonable to consider several questions: is the field of ontology matching still making progress? Is this progress significant enough to pursue further research? If so, what are the particularly promising directions? To answer these questions, we review the state of the art of ontology matching and analyze the results of recent ontology matching evaluations. These results show a measurable improvement in the field, albeit one whose pace is slowing down. We conjecture that significant improvements can be obtained only by addressing important challenges for ontology matching. We present such challenges with insights on how to approach them, thereby aiming to direct research into the most promising tracks and to facilitate the progress of the field.

1,215 citations


Cites methods from "Data integration: a theoretical per..."

  • ...There are some other parameters that can extend the definition of matching, namely: (i) the use of an input alignment A, which is to be extended; (ii) the matching parameters, for instance, weights, or thresholds; and (iii) external resources, such as common knowledge and domain specific thesauri, see Figure 2....

  • ...For instance, Figure 6 shows two entities from the Agrovoc8 and NAL9 thesauri that had to be matched in the food test case of OAEI-2007....

  • ...A comparison of the results in 2007–2010 for the top-3 systems of each year based on the highest F-measure is shown in Figure 5....

  • ...A comparative summary of the best results of OAEI on the benchmarks is shown in Figure 3. edna is a simple edit distance algorithm on labels, which is used as a baseline....

  • ...To this end, it is vital to identify the minimal background knowledge necessary, e.g., a part of TAP in the example of Figure 6, to resolve a particular problem with sufficiently good results....

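The edna baseline mentioned in the quotes above is described as a simple edit distance on labels. A comparable label matcher can be sketched as follows; the similarity measure here is Python's difflib ratio, a stand-in for edna's actual edit-distance implementation, and the labels and threshold are invented:

```python
from difflib import SequenceMatcher

def label_similarity(a: str, b: str) -> float:
    """Normalized similarity of two labels (1.0 means identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match(labels1, labels2, threshold=0.8):
    """Keep label pairs whose similarity reaches the threshold."""
    return [
        (l1, l2, round(label_similarity(l1, l2), 2))
        for l1 in labels1
        for l2 in labels2
        if label_similarity(l1, l2) >= threshold
    ]

pairs = match(["Author", "Title"], ["author", "titel"])
```

Such string-only baselines are exactly what evaluations like OAEI compare full matching systems against.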
References
BookDOI
01 Jan 2003
TL;DR: The Description Logic Handbook provides a thorough account of the subject, covering all aspects of research in this field, namely theory, implementation, and applications, and can also be used for self-study or as a reference for knowledge representation and artificial intelligence courses.
Abstract: Description logics are embodied in several knowledge-based systems and are used to develop various real-life applications. Now in paperback, The Description Logic Handbook provides a thorough account of the subject, covering all aspects of research in this field, namely: theory, implementation, and applications. Its appeal will be broad, ranging from more theoretically oriented readers, to those with more practically oriented interests who need a sound and modern understanding of knowledge representation systems based on description logics. As well as general revision throughout the book, this new edition presents a new chapter on ontology languages for the semantic web, an area of great importance for the future development of the web. In sum, the book will serve as a unique resource for the subject, and can also be used for self-study or as a reference for knowledge representation and artificial intelligence courses.

5,644 citations

Book
01 Jan 1984
TL;DR: This is the second edition of an account of the mathematical foundations of logic programming, which collects, in a unified and comprehensive manner, the basic theoretical results of the field, which have previously only been available in widely scattered research papers.
Abstract: This is the second edition of an account of the mathematical foundations of logic programming. Its purpose is to collect, in a unified and comprehensive manner, the basic theoretical results of the field, which have previously only been available in widely scattered research papers. In addition to presenting the technical results, the book also contains many illustrative examples and problems. The text is intended to be self-contained, the only prerequisites being some familiarity with PROLOG and knowledge of some basic undergraduate mathematics. The material is suitable either as a reference book for researchers or as a textbook for a graduate course on the theoretical aspects of logic programming and deductive database systems.

4,500 citations


"Data integration: a theoretical per..." refers background in this paper

  • ...Such an expansion is performed by viewing each foreign key constraint r1[X] ⊆ r2[Y], where X and Y are sets of h attributes and Y is a key for r2, as a logic programming [77] rule r′2(...

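The expansion described in the quote above can be sketched for the single-attribute case: a tuple violating the foreign key triggers the addition of a referenced tuple padded with fresh labelled nulls, a chase-like step. The relation names and attribute positions below are hypothetical:

```python
from itertools import count

_fresh = count()

def fresh_null():
    """A labelled null standing for an unknown value."""
    return f"_N{next(_fresh)}"

def expand_foreign_key(r1, x_pos, r2, y_pos, r2_arity):
    """For every r1 tuple whose value at x_pos has no matching r2
    tuple at y_pos, add an r2 tuple with fresh nulls elsewhere."""
    seen = {t[y_pos] for t in r2}
    added = set()
    for t in r1:
        key = t[x_pos]
        if key not in seen:
            added.add(tuple(key if i == y_pos else fresh_null()
                            for i in range(r2_arity)))
            seen.add(key)
    return r2 | added

# emp(name, dept) with dept referencing dept(dname, manager):
emp = {("ann", "sales"), ("bob", "hr")}
dept = {("sales", "carol")}
expanded = expand_foreign_key(emp, x_pos=1, r2=dept, y_pos=0, r2_arity=2)
```

The "hr" department has no tuple in `dept`, so one is invented with a fresh null as its manager, mimicking the existential variable in the head of the logic-programming rule.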
Book
02 Dec 1994
TL;DR: A comprehensive account of the theory of databases, covering relational query languages, constraints, Datalog and recursion, expressiveness and complexity, and advanced topics such as incomplete information, complex values, and object databases.
Abstract: A. ANTECHAMBER. Database Systems. The Main Principles. Functionalities. Complexity and Diversity. Past and Future. Ties with This Book. Bibliographic Notes. Theoretical Background. Some Basics. Languages, Computability, and Complexity. Basics from Logic. The Relational Model. The Structure of the Relational Model. Named versus Unnamed Perspectives. Notation. Bibliographic Notes. B. BASICS: RELATIONAL QUERY LANGUAGES. Conjunctive Queries. Getting Started. Logic-Based Perspectives. Query Composition and Views. Algebraic Perspectives. Adding Union. Bibliographic Notes. Exercises. Adding Negation: Algebra and Calculus. The Relational Algebras. Nonrecursive Datalog with Negation. The Relational Calculus. Syntactic Restrictions for Domain Independence. Aggregate Functions. Digression: Finite Representations of Infinite Databases. Bibliographic Notes. Exercises. Static Analysis and Optimization. Issues in Practical Query Optimization. Global Optimization. Static Analysis of the Relational Calculus. Computing with Acyclic Joins. Bibliographic Notes. Exercises. Notes on Practical Languages. SQL: The Structured Query Language. Query-by-Example and Microsoft Access. Confronting the Real World. Bibliographic Notes. Exercises. C. CONSTRAINTS. Functional and Join Dependency. Motivation. Functional and Key Dependencies. Join and Multivalued Dependencies. The Chase. Bibliographic Notes. Exercises. Inclusion Dependency. Inclusion Dependency in Isolation. Finite versus Infinite Implication. Nonaxiomatizability of fd's + ind's. Restricted Kinds of Inclusion Dependency. Bibliographic Notes. Exercises. A Larger Perspective. A Unifying Framework. The Chase Revisited. Axiomatization. An Algebraic Perspective. Bibliographic Notes. Exercises. Design and Dependencies. Semantic Data Models. Normal Forms. Universal Relation Assumption. Bibliographic Notes. Exercises. D. DATALOG AND RECURSION. Datalog. Syntax of Datalog. Model-Theoretic Semantics. Fixpoint Semantics.
Proof-Theoretic Approach. Static Program Analysis. Bibliographic Notes. Exercises. Evaluation of Datalog. Seminaive Evaluation. Top-Down Techniques. Magic. Two Improvements. Bibliographic Notes. Exercises. Recursion and Negation. Algebra + While. Calculus + Fixpoint. Datalog with Negation. Equivalence. Recursion in Practical Languages. Bibliographic Notes. Exercises. Negation in Datalog. The Basic Problem. Stratified Semantics. Well-Founded Semantics. Expressive Power. Negation as Failure in Brief. Bibliographic Notes. Exercises. E. EXPRESSIVENESS AND COMPLEXITY. Sizing up Languages. Queries. Complexity of Queries. Languages and Complexity. Bibliographic Notes. Exercises. First Order, Fixpoint and While. Complexity of First-Order Queries. Expressiveness of First-Order Queries. Fixpoint and While Queries. The Impact of Order. Bibliographic Notes. Exercises. Highly Expressive Languages. While(N): While with Arithmetic. While(new): While with New Values. While(uty): An Untyped Extension of While. Bibliographic Notes. Exercises. F. FINALE. Incomplete Information. Warm-Up. Weak Representation Systems. Conditional Tables. The Complexity of Nulls. Other Approaches. Bibliographic Notes. Exercises. Complex Values. Complex Value Databases. The Algebra. The Calculus. Examples. Equivalence Theorems. Fixpoint and Deduction. Expressive Power and Complexity. A Practical Query Language for Complex Values. Bibliographic Notes. Exercises. Object Databases. Informal Presentation. Formal Definition of an OODB Model. Languages for OODB Queries. Languages for Methods. Further Issues for OODBs. Bibliographic Notes. Exercises. Dynamic Aspects. Update Languages. Transactional Schemas. Updating Views and Deductive Databases. Active Databases. Temporal Databases and Constraints. Bibliographic Notes. Exercises. Bibliography. Symbol Index. Index.

4,381 citations


"Data integration: a theoretical per..." refers background in this paper

  • ...Indeed, Abiteboul and Duschka [2] showed...

  • ...established in [2]....

  • ...of weakly recursive ILOG [12], even though the latter is not directly related to depen-...

  • ...In [2], it was claimed that in the LAV setting and with conjunctive...

  • ...conjunctive queries with inequalities is a coNP-hard problem [2]....

Journal ArticleDOI
01 Dec 2001
TL;DR: A taxonomy is presented that distinguishes between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers and is intended to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
Abstract: Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.

3,693 citations


"Data integration: a theoretical per..." refers background in this paper

  • ...• How to build an appropriate global schema, and how to discover inter-schema [31] and mapping assertions (LAV or GAV) in the design of a data integration system (see, for instance, [83])....

Journal ArticleDOI

2,428 citations

Trending Questions (1)
What is data integration?

Data integration is the process of merging data from different sources to provide a unified view, focusing on theoretical aspects like modeling, query processing, handling inconsistencies, and query reasoning.