Similarity flooding: a versatile graph matching algorithm and its application to schema matching
Sergey Melnik,Hector Garcia-Molina,Erhard Rahm +2 more
- pp 117-128
Reads0
Chats0
TLDR
This paper presents a matching algorithm based on a fixpoint computation that is usable across different scenarios and conducts a user study, in which the accuracy metric was used to estimate the labor savings that the users could obtain by utilizing the algorithm to obtain an initial matching.Abstract:
Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input, and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and if necessary adjust the results. As a matter of fact, we evaluate the 'accuracy' of the algorithm by counting the number of needed adjustments. We conducted a user study, in which our accuracy metric was used to estimate the labor savings that the users could obtain by utilizing our algorithm to obtain an initial matching. Finally, we illustrate how our matching algorithm is deployed as one of several high-level operators in an implemented testbed for managing information models and mappings.read more
Citations
More filters
Journal ArticleDOI
A survey of approaches to automatic schema matching
Erhard Rahm,Philip A. Bernstein +1 more
TL;DR: A taxonomy is presented that distinguishes between schema-level and instance-level, element- level and structure- level, and language-based and constraint-based matchers and is intended to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
Book
Ontology Matching
Jérôme Euzenat,Pavel Shvaiko +1 more
TL;DR: The second edition of Ontology Matching has been thoroughly revised and updated to reflect the most recent advances in this quickly developing area, which resulted in more than 150 pages of new content.
Proceedings ArticleDOI
SimRank: a measure of structural-context similarity
Glen Jeh,Jennifer Widom +1 more
TL;DR: A complementary approach, applicable in any domain with object-to-object relationships, that measures similarity of the structural context in which objects occur, based on their relationships with other objects is proposed.
Book
Mining of Massive Datasets
TL;DR: This book focuses on practical algorithms that have been used to solve key problems in data mining and which can be used on even the largest datasets, and explains the tricks of locality-sensitive hashing and stream processing algorithms for mining data that arrives too fast for exhaustive processing.
Book ChapterDOI
A survey of schema-based matching approaches
Pavel Shvaiko,Jérôme Euzenat +1 more
TL;DR: This paper presents a new classification of schema-based matching techniques that builds on the top of state of the art in both schema and ontology matching and distinguishes between approximate and exact techniques at schema-level; and syntactic, semantic, and external techniques at element- and structure-level.
References
More filters
Journal ArticleDOI
The anatomy of a large-scale hypertextual Web search engine
Sergey Brin,Lawrence Page +1 more
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Journal Article
The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Sergey Brin,Lawrence Page +1 more
TL;DR: Google as discussed by the authors is a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.
Book
Randomized Algorithms
TL;DR: This book introduces the basic concepts in the design and analysis of randomized algorithms and presents basic tools such as probability theory and probabilistic analysis that are frequently used in algorithmic applications.
Journal ArticleDOI
A survey of approaches to automatic schema matching
Erhard Rahm,Philip A. Bernstein +1 more
TL;DR: A taxonomy is presented that distinguishes between schema-level and instance-level, element- level and structure- level, and language-based and constraint-based matchers and is intended to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
Proceedings Article
Generic Schema Matching with Cupid
TL;DR: This paper proposes a new algorithm, Cupid, that discovers mappings between schema elements based on their names, data types, constraints, and schema structure, using a broader set of techniques than past approaches.