Topic

Ontology-based data integration

About: Ontology-based data integration is a research topic. Over its lifetime, 11,065 publications have been published within this topic, receiving 216,888 citations.


Papers
Book Chapter
01 Feb 2001
TL;DR: This paper describes several languages for describing contents of data sources, the tradeoffs between them, and the associated reformulation algorithms.
Abstract: The data integration problem is to provide uniform access to multiple heterogeneous information sources available online (e.g., databases on the WWW). This problem has recently received considerable attention from researchers in the fields of Artificial Intelligence and Database Systems. The data integration problem is complicated by the facts that (1) sources contain closely related and overlapping data, (2) data is stored in multiple data models and schemas, and (3) data sources have differing query processing capabilities. A key element in a data integration system is the language used to describe the contents and capabilities of the data sources. While such a language needs to be as expressive as possible, it should also make it possible to efficiently address the main inference problem that arises in this context: translating a user query formulated over a mediated schema into a query over the local schemas. This paper describes several languages for describing the contents of data sources, the tradeoffs between them, and the associated reformulation algorithms.

282 citations
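The reformulation problem surveyed in the entry above can be made concrete with a small sketch of GAV-style (global-as-view) rewriting, one of the source-description approaches this line of work compares. The relation names, source queries, and mapping below are hypothetical illustrations, not taken from the paper; a minimal Python sketch:

# Illustrative GAV-style reformulation: each mediated-schema relation is
# defined as a view over the sources, so a user query is answered by
# "unfolding" it into queries against the sources themselves.
# All relation and source names below are hypothetical.

from typing import Dict, List

# Mediated relation -> SQL queries over local sources that populate it.
GAV_MAPPING: Dict[str, List[str]] = {
    "Publication(title, year)": [
        "SELECT title, year FROM src_dblp.papers",
        "SELECT name AS title, pub_year AS year FROM src_library.books",
    ],
}

def unfold(mediated_relation: str) -> str:
    """Rewrite a query over the mediated schema as a UNION over the sources."""
    source_queries = GAV_MAPPING.get(mediated_relation)
    if not source_queries:
        raise KeyError(f"No sources describe {mediated_relation!r}")
    return "\nUNION ALL\n".join(source_queries)

if __name__ == "__main__":
    # A user asks for all publications; the system never queries the
    # mediated schema directly, only the unfolded source queries.
    print(unfold("Publication(title, year)"))

In the LAV (local-as-view) alternative also discussed in this literature, sources are instead described as views over the mediated schema, and reformulation becomes the harder problem of answering queries using views rather than the simple unfolding shown here.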

Journal Article
TL;DR: MASTRO is a Java tool for ontology-based data access (OBDA) developed at Sapienza Università di Roma and at the Free University of Bozen-Bolzano that provides optimized algorithms for answering expressive queries, as well as features for intensional reasoning and consistency checking.
Abstract: In this paper we present MASTRO, a Java tool for ontology-based data access (OBDA) developed at Sapienza Università di Roma and at the Free University of Bozen-Bolzano. MASTRO manages OBDA systems in which the ontology is specified in DL-Lite_{A,id}, a logic of the DL-Lite family of tractable Description Logics specifically tailored to ontology-based data access, and is connected to external JDBC-enabled data management systems through semantic mappings that associate SQL queries over the external data to the elements of the ontology. Advanced forms of integrity constraints, which turned out to be very useful in practical applications, are also enabled over the ontologies. Optimized algorithms for answering expressive queries are provided, as well as features for intensional reasoning and consistency checking. MASTRO provides a proprietary API, an OWLAPI compatible interface, and a plugin for the Protégé 4 ontology editor. It has been successfully used in several projects carried out in collaboration with important organizations, on which we briefly comment in this paper.

282 citations
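To make the mapping idea in the MASTRO entry concrete, here is a minimal, hypothetical sketch of OBDA-style mapping and query unfolding in Python: an ontology class is associated with a SQL query over an external relational source, and a concept query is answered by running that SQL. The ontology IRIs, table, and columns are invented for illustration, and this is not MASTRO's mapping language or API.

# Illustrative OBDA-style mapping: an ontology class is associated with a
# SQL query over an external database, and query answering is reduced to
# SQL evaluation. All IRIs, tables, and columns here are hypothetical.
import sqlite3

MAPPINGS = {
    # ontology class IRI -> (SQL query, column holding the instance id)
    "http://example.org/onto#Employee": (
        "SELECT emp_id FROM employees WHERE active = 1",
        "emp_id",
    ),
}

def instances_of(class_iri: str, conn: sqlite3.Connection) -> list[str]:
    """Answer the concept query 'class_iri(x)' by unfolding it into SQL."""
    sql, _id_col = MAPPINGS[class_iri]
    return [f"http://example.org/data#emp/{row[0]}" for row in conn.execute(sql)]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (emp_id INTEGER, active INTEGER)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", [(1, 1), (2, 0), (3, 1)])
    print(instances_of("http://example.org/onto#Employee", conn))

In a real OBDA system such as MASTRO, the unfolding additionally takes the DL-Lite TBox into account: the user query is first rewritten with respect to the ontology axioms and only then translated into SQL over the sources.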

Proceedings Article
01 Jan 1999
TL;DR: The different meanings of the word “integration” in the ontology field are discussed, the main characteristics of the three different processes are identified, and three words are proposed to distinguish among those meanings: integration, merge and use.
Abstract: The word integration has been used with different meanings in the ontology field. This article aims at clarifying the meaning of the word “integration” and presenting some of the relevant work done in integration. We identify three meanings of ontology “integration”: when building a new ontology reusing (by assembling, extending, specializing or adapting) other ontologies already available; when building an ontology by merging several ontologies into a single one that unifies all of them; and when building an application using one or more ontologies. We discuss the different meanings of “integration”, identify the main characteristics of the three different processes, and propose three words to distinguish among those meanings: integration, merge and use.

276 citations
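Of the three senses distinguished in the entry above, the “merge” sense is the easiest to illustrate mechanically. The sketch below assumes rdflib is installed and uses hypothetical file names; it simply takes the union of the triples of several ontologies.

# A small sketch of the "merge" sense distinguished above: combining two
# ontologies into a single RDF graph that contains all of their triples.
# The file names are hypothetical placeholders.
from rdflib import Graph

def merge_ontologies(paths: list[str]) -> Graph:
    """Load several ontology files and return one graph containing all triples."""
    merged = Graph()
    for path in paths:
        merged.parse(path)  # serialization format is guessed from the file extension
    return merged

if __name__ == "__main__":
    g = merge_ontologies(["vehicles.ttl", "engines.ttl"])  # hypothetical files
    print(f"Merged ontology contains {len(g)} triples")

Note that this is only a syntactic union of triples; a genuine ontology merge, in the sense discussed in the paper, also requires aligning and unifying the overlapping concepts of the input ontologies.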

Journal Article
TL;DR: This paper summarises ENVO’s motivation, content, structure, adoption, and governance approach.
Abstract: As biological and biomedical research increasingly reference the environmental context of the biological entities under study, the need for formalisation and standardisation of environment descriptors is growing. The Environment Ontology (ENVO; http://www.environmentontology.org) is a community-led, open project which seeks to provide an ontology for specifying a wide range of environments relevant to multiple life science disciplines and, through an open participation model, to accommodate the terminological requirements of all those needing to annotate data using ontology classes. This paper summarises ENVO’s motivation, content, structure, adoption, and governance approach. The ontology is available from http://purl.obolibrary.org/obo/envo.owl - an OBO format version is also available by switching the file suffix to “obo”.

274 citations
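As a small usage sketch for the entry above: ENVO's OWL release can be loaded from the PURL given in the abstract and searched for classes by label, which is the basic step in annotating data with ontology classes. The keyword, the use of rdflib, and the assumption that the release is serialized as RDF/XML are illustrative; the file is large, so the initial parse can take a while.

# A minimal sketch of using ENVO for annotation: load the OWL release from
# the PURL given above and look up classes whose label mentions a keyword.
# Assumes rdflib is installed and that the release is serialized as RDF/XML.
from rdflib import Graph

ENVO_PURL = "http://purl.obolibrary.org/obo/envo.owl"

QUERY = """
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?cls ?label WHERE {
    ?cls a owl:Class ;
         rdfs:label ?label .
    FILTER(CONTAINS(LCASE(STR(?label)), "soil"))
}
"""

if __name__ == "__main__":
    g = Graph()
    g.parse(ENVO_PURL, format="xml")  # large file; this download/parse is slow
    for cls, label in g.query(QUERY):
        print(cls, label)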

Proceedings Article
09 Jun 2008
TL;DR: This paper describes the first completely self-configuring data integration system, based on the new concept of a probabilistic mediated schema that is automatically created from the data sources, and shows that the system is able to produce high-quality answers with no human intervention.
Abstract: Data integration systems offer a uniform interface to a set of data sources. Despite recent progress, setting up and maintaining a data integration application still requires significant upfront effort in creating a mediated schema and semantic mappings from the data sources to the mediated schema. Many application contexts involving multiple data sources (e.g., the web, personal information management, enterprise intranets) do not require full integration in order to provide useful services, motivating a pay-as-you-go approach to integration. With that approach, a system starts with very few (or inaccurate) semantic mappings, and these mappings are improved over time as deemed necessary. This paper describes the first completely self-configuring data integration system. The goal of our work is to investigate how advanced a starting point we can provide for a pay-as-you-go system. Our system is based on the new concept of a probabilistic mediated schema that is automatically created from the data sources. We automatically create probabilistic schema mappings between the sources and the mediated schema. We describe experiments in multiple domains, including 50-800 data sources, and show that our system is able to produce high-quality answers with no human intervention.

273 citations
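The probabilistic-mediated-schema idea in the entry above can be illustrated with a toy sketch (not the paper's actual algorithm): several candidate mediated schemas are retained, each with a probability, and a query answer is scored by the probability mass of the candidate schemas under which it is produced. The schemas, attributes, and data below are hypothetical.

# Illustrative sketch of query answering over a probabilistic mediated
# schema: answers are weighted by the probability of the candidate
# mediated schemas that yield them. All names and data are hypothetical.
from collections import defaultdict

# Candidate mediated schemas: each maps a mediated attribute to the source
# attributes it groups together, with an assigned probability.
CANDIDATE_SCHEMAS = [
    (0.6, {"phone": ["src1.phone", "src2.contact_no"]}),
    (0.4, {"phone": ["src1.phone"]}),
]

# Source data, keyed by fully qualified source attribute.
SOURCE_DATA = {
    "src1.phone": ["555-0100"],
    "src2.contact_no": ["555-0199"],
}

def query_phone() -> dict[str, float]:
    """Score each answer to 'SELECT phone' by the total probability of the
    candidate schemas under which it is returned."""
    scores: dict[str, float] = defaultdict(float)
    for prob, schema in CANDIDATE_SCHEMAS:
        for attr in schema["phone"]:
            for value in SOURCE_DATA.get(attr, []):
                scores[value] += prob
    return dict(scores)

if __name__ == "__main__":
    # "555-0100" is returned under both schemas (score 1.0); "555-0199"
    # only under the first (score 0.6).
    print(query_phone())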


Network Information

Related Topics (5)

Server: 79.5K papers, 1.4M citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Software development: 73.8K papers, 1.4M citations, 84% related
User interface: 85.4K papers, 1.7M citations, 84% related
Support vector machine: 73.6K papers, 1.7M citations, 83% related
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    37
2022    149
2021    11
2020    11
2019    19
2018    43