Topic

Dataspaces

About: Dataspaces is a research topic. Over its lifetime, 189 publications have been published within this topic, receiving 11,891 citations.

Papers

Journal Article

A survey of approaches to automatic schema matching

Erhard Rahm¹, Philip A. Bernstein²

Leipzig University¹, Microsoft²

01 Dec 2001

TL;DR: A taxonomy is presented that distinguishes between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. It is intended to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.

Abstract: Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification we review some previous match implementations thereby indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
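
To make the taxonomy in this abstract concrete, here is a minimal sketch of one corner of it: a schema-level, element-level, language-based matcher that compares element names only. It is not taken from the paper. The function names, the similarity measure (Python's `difflib.SequenceMatcher`), the 0.6 threshold, and the two toy schemas are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase an element name and drop separators, e.g. 'Cust_Name' -> 'custname'."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def name_similarity(a, b):
    """String similarity in [0, 1] between two normalized element names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_elements(source, target, threshold=0.6):
    """For each source element, keep the most similar target element
    if its name similarity reaches the (assumed) threshold."""
    mapping = {}
    for s in source:
        best = max(target, key=lambda t: name_similarity(s, t))
        if name_similarity(s, best) >= threshold:
            mapping[s] = best
    return mapping


# Toy schemas, invented for this example.
source_schema = ["cust_name", "phone_no", "zip"]
target_schema = ["CustomerName", "PhoneNumber", "PostalCode"]

candidates = match_elements(source_schema, target_schema)
print(candidates)
```

In this toy run the name-based matcher pairs `cust_name` with `CustomerName` and `phone_no` with `PhoneNumber`, but finds nothing for `zip`, since `zip` and `PostalCode` are synonyms with almost no characters in common. Gaps of that kind are where the other matcher classes named in the abstract (instance-level, structure-level, constraint-based) would have to contribute.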

3,693 citations

Proceedings Article

Data integration: a theoretical perspective

Maurizio Lenzerini¹

Sapienza University of Rome¹

03 Jun 2002

TL;DR: The tutorial is focused on some of the theoretical issues that are relevant for data integration: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.

Abstract: Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data. The problem of designing data integration systems is important in current real world applications, and is characterized by a number of issues that are interesting from a theoretical point of view. This document presents an overview of the material to be presented in a tutorial on data integration. The tutorial is focused on some of the theoretical issues that are relevant for data integration. Special attention will be devoted to the following aspects: modeling a data integration application, processing queries in data integration, dealing with inconsistent data sources, and reasoning on queries.

2,716 citations

Journal Article

From databases to dataspaces: a new abstraction for information management

Michael J. Franklin¹, Alon Halevy², David Maier³

University of California, Berkeley¹, University of Washington², Portland State University³

01 Dec 2005

TL;DR: This paper proposes dataspaces and their support systems as a new agenda for data management, which encompasses much of the work going on in data management today, while posing additional research objectives.

Abstract: The development of relational database management systems served to focus the data management community for decades, with spectacular results. In recent years, however, the rapidly-expanding demands of "data everywhere" have led to a field comprised of interesting and productive efforts, but without a central focus or coordinated agenda. The most acute information management challenges today stem from organizations (e.g., enterprises, government agencies, libraries, "smart" homes) relying on a large number of diverse, interrelated data sources, but having no way to manage their dataspaces in a convenient, integrated, or principled fashion. This paper proposes dataspaces and their support systems as a new agenda for data management. This agenda encompasses much of the work going on in data management today, while posing additional research objectives.

723 citations

Proceedings Article

Data integration: the teenage years

Alon Halevy¹, Anand Rajaraman, Joann J. Ordille²

Google¹, Avaya²

01 Sep 2006

650 citations

Journal Article

Data integration with uncertainty

Xin Luna Dong¹, Alon Halevy², Cong Yu³

University of Washington¹, Google², University of Michigan³

23 Sep 2007

TL;DR: The concept of probabilistic schema mappings is introduced, and it is shown that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but it is not known which one it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data.

Abstract: This paper reports our first set of results on managing uncertainty in data integration. We posit that data-integration systems need to handle uncertainty at three levels, and do so in a principled fashion. First, the semantic mappings between the data sources and the mediated schema may be approximate because there may be too many of them to be created and maintained or because in some domains (e.g., bioinformatics) it is not clear what the mappings should be. Second, queries to the system may be posed with keywords rather than in a structured form. Third, the data from the sources may be extracted using information extraction techniques and so may yield imprecise data. As a first step to building such a system, we introduce the concept of probabilistic schema mappings and analyze their formal foundations. We show that there are two possible semantics for such mappings: by-table semantics assumes that there exists a correct mapping but we don't know what it is; by-tuple semantics assumes that the correct mapping may depend on the particular tuple in the source data. We present the query complexity and algorithms for answering queries in the presence of approximate schema mappings, and we describe an algorithm for efficiently computing the top-k answers to queries in such a setting.
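
The difference between the two semantics described in this abstract can be illustrated with a small sketch. It is not the paper's algorithm. It assumes a single target attribute `address`, a hypothetical probabilistic mapping with two candidate source columns, an invented two-row source table, and a simple projection query that returns the distinct values of `address`. The by-tuple case is computed by brute-force enumeration, which is exponential in the number of rows and only suitable for illustration.

```python
from itertools import product
from math import prod

# Hypothetical p-mapping for the target attribute "address":
# (source column, probability that this column is the correct one).
P_MAPPING = [("mailing_addr", 0.6), ("home_addr", 0.4)]

# Toy source table (invented data).
ROWS = [
    {"mailing_addr": "A", "home_addr": "B"},
    {"mailing_addr": "B", "home_addr": "C"},
]


def by_table(rows, p_mapping):
    """One mapping is correct for the whole table: a value's probability is
    the sum of the probabilities of the mappings under which it is returned."""
    answer = {}
    for column, p in p_mapping:
        for value in {row[column] for row in rows}:
            answer[value] = answer.get(value, 0.0) + p
    return answer


def by_tuple(rows, p_mapping):
    """The correct mapping may differ per row: enumerate every assignment of
    mappings to rows and weight it by the product of the chosen probabilities."""
    answer = {}
    for choice in product(p_mapping, repeat=len(rows)):
        weight = prod(p for _, p in choice)
        values = {row[column] for row, (column, _) in zip(rows, choice)}
        for value in values:
            answer[value] = answer.get(value, 0.0) + weight
    return answer


table_answer = by_table(ROWS, P_MAPPING)
tuple_answer = by_tuple(ROWS, P_MAPPING)
print(table_answer)
print(tuple_answer)
```

In this toy data the value `B` is returned under either mapping when one mapping applies to the whole table, so its by-table probability is 1. Under by-tuple semantics `B` is missing when the first row follows `mailing_addr` and the second follows `home_addr`, so its probability drops to about 0.76, while `A` and `C` keep probabilities of about 0.6 and 0.4 in both cases.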

366 citations

Performance Metrics

Papers: 189
Citations: 12,306

No. of papers in the topic in previous years
Year	Papers
2021	5
2020	11
2019	6
2018	3
2017	8
2016	10
