
A survey of schema-based matching approaches

Abstract
Schema and ontology matching is a critical problem in many application domains, such as the semantic web, schema/ontology integration, data warehouses, and e-commerce. Many different matching solutions have been proposed so far. In this paper we present a new classification of schema-based matching techniques that builds on the state of the art in both schema and ontology matching. Its innovations lie in introducing new criteria based on (i) general properties of matching techniques, (ii) interpretation of input information, and (iii) the kind of input information. In particular, we distinguish between approximate and exact techniques at schema-level; and syntactic, semantic, and external techniques at element- and structure-level. Based on the proposed classification we overview some recent schema/ontology matching systems, indicating which part of the solution space they cover. The proposed classification provides a common conceptual basis and, hence, can be used for comparing existing schema/ontology matching techniques and systems as well as for designing new ones, taking advantage of state-of-the-art solutions.


UNIVERSITY
OF TRENTO
DEPARTMENT OF INFORMATION AND COMMUNICATION TECHNOLOGY
38050 Povo Trento (Italy), Via Sommarive 14
http://www.dit.unitn.it
A CLASSIFICATION OF SCHEMA-BASED MATCHING
APPROACHES
Pavel Shvaiko
August 2004
Technical Report # DIT-04-093
Also: in Proceedings of the Meaning Coordination and Negotiation
workshop at ISWC'04


A Classification of Schema-Based Matching
Approaches
Pavel Shvaiko
University of Trento, Povo, Trento, Italy
pavel@dit.unitn.it
Abstract. Schema/ontology matching is a critical problem in many application
domains, such as the semantic web, schema/ontology integration, data warehouses,
e-commerce, and catalog matching. Many diverse solutions to the matching problem
have been proposed so far. In this paper we present a taxonomy of schema-based
matching techniques that builds on previous work on classifying schema matching
approaches. Its innovations lie in introducing new criteria that distinguish
between matching techniques relying on diverse semantic clues. In particular, we
distinguish between heuristic and formal techniques at schema-level; and implicit
and explicit techniques at element- and structure-level. Based on the proposed
classification we overview some recent schema/ontology matching systems,
indicating which part of the solution space they cover.
1 Introduction
Match is a critical operator in many well-known application domains, such as
the semantic web, schema/ontology integration, data warehouses, e-commerce, XML
message mapping, and catalog matching. Many solutions to the matching problem
involve identifying terms in one information source that "match" terms in
another information source. These information sources can be viewed as graph-like
structures containing terms and their inter-relationships, for example database
schemas, taxonomies, or ontologies [14]. The Match operator takes
two graph-like structures as input and produces as output a mapping between the
nodes of the graphs that correspond semantically to each other.
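The input/output contract of the Match operator can be illustrated with a minimal sketch. It assumes a deliberately naive strategy (pairing nodes whose normalised labels are identical); the `SchemaGraph` type, the `match` function, and the node labels are hypothetical names introduced here for illustration and are not taken from any of the surveyed systems.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class SchemaGraph:
    """A graph-like structure: node ids mapped to labels, plus parent -> child edges."""
    labels: Dict[int, str]
    edges: Set[Tuple[int, int]] = field(default_factory=set)


def match(g1: SchemaGraph, g2: SchemaGraph) -> List[Tuple[int, int]]:
    """Toy Match operator: pair nodes whose normalised labels are identical."""
    def norm(label: str) -> str:
        return label.replace("_", " ").strip().lower()

    return [
        (n1, n2)
        for n1, l1 in sorted(g1.labels.items())
        for n2, l2 in sorted(g2.labels.items())
        if norm(l1) == norm(l2)
    ]


# Hypothetical two-node schemas, used only to exercise the function.
a1 = SchemaGraph({1: "Electronics", 2: "Office_Products"}, {(1, 2)})
a2 = SchemaGraph({1: "Electronics", 2: "office products"}, {(1, 2)})
mapping = match(a1, a2)  # list of (node id in a1, node id in a2) pairs
```

A realistic matcher would also exploit the edges and would return more than plain node pairs; the sketch only fixes the shape of the operator: two graphs in, one mapping out.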
Many diverse solutions to the matching problem have been proposed so far,
for example [19, 15, 8, 21, 32, 1, 17, 23, 26, 20]. In this paper we focus only on
schema-based solutions, i.e., matching systems exploiting only intensional
information, not instance data. Although there is a difference between the schema
and ontology matching (alignment) problems (see the next section for details), we
believe that techniques developed for each of them can be of mutual benefit;
therefore we discuss schema and ontology matching as a single problem.
With the emergence and proliferation of the semantic web, the semantics
captured in schemas/ontologies should also be handled at different levels of
detail. Therefore, there is a need to distinguish between schema/ontology
matching techniques relying on diverse semantic clues. In this paper we present
a taxonomy of schema-based matching techniques that builds on the previous
work of E. Rahm and P. Bernstein on classifying schema matching approaches
[28]. Its innovations lie in introducing new criteria that distinguish between
schema/ontology matching techniques relying on diverse semantic clues. In
particular, we distinguish between heuristic and formal techniques at schema-level;
and implicit and explicit techniques at element- and structure-level.

Fig. 1. Two XML schemas

The rest of the paper is organized as follows. Section 2 provides, via an
example, the basic motivation for the schema/ontology matching problem. Section 3
introduces the classification of schema-based approaches and discusses in detail
the possible alternatives. Section 4 overviews some recent schema/ontology
matching solutions in light of the proposed classification, indicating which part
of the solution space they cover. Section 5 reports some conclusions.
2 The Matching Problem
2.1 Motivating Example
To motivate the matching problem, let us use two simple XML schemas that are
shown in Figure 1 and exemplify one of the possible situations which arise, for
example, when resolving a schema integration task.
Suppose an e-commerce company A1 needs to finalize a corporate acquisition
of another company A2. To complete the acquisition we have to integrate
the databases of the two companies. The documents of both companies are stored
according to XML schemas A1 and A2 respectively. Numbers in boxes are the
unique identifiers of the nodes (in the following we sometimes refer to nodes as
elements). A first step in integrating the schemas is to identify candidates to
be merged or to have taxonomic relationships under an integrated schema. This
step is the process of schema matching. For example, the nodes with the label
Office_Products in A1 and in A2 are candidates to be merged, while the
node with the label Digital_Cameras in A2 should be subsumed by the node with
the label Photo_and_Cameras in A1.
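How the two candidates above could feed the integration step can be sketched as follows. The sketch assumes that the matching result is already available as (A1 label, relation, A2 label) triples; the relation symbols `"="` and `">="`, the function name `integration_actions`, and the wording of the generated actions are illustrative assumptions. The labels are written with underscores.

```python
from typing import List, Tuple

# (label in A1, relation, label in A2); "=" means equivalent, ">=" means the A1
# node is more general than (subsumes) the A2 node. Symbols are illustrative.
candidates: List[Tuple[str, str, str]] = [
    ("Office_Products", "=", "Office_Products"),
    ("Photo_and_Cameras", ">=", "Digital_Cameras"),
]


def integration_actions(cands: List[Tuple[str, str, str]]) -> List[str]:
    """Translate each correspondence into a step for building the integrated schema."""
    actions = []
    for a1_label, relation, a2_label in cands:
        if relation == "=":
            actions.append(f"merge A1.{a1_label} with A2.{a2_label}")
        elif relation == ">=":
            actions.append(f"place A2.{a2_label} under A1.{a1_label}")
        else:
            raise ValueError(f"unsupported relation: {relation}")
    return actions


actions = integration_actions(candidates)
```

The point of the sketch is only that equivalence and subsumption lead to different integration steps, which is why the kind of relation returned by a matcher matters.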

2.2 Matching: Syntactic vs. Semantic
In this paper we discuss the problem of matching schemas and ontologies from
a generic perspective, i.e., we analyze the information that matching systems
exploit in order to produce mappings. In this respect, ontology matching
differs substantially from schema matching in the following two areas (among
others, see [25]):
- Database schemas often do not provide explicit semantics for their data.
  Semantics is usually specified explicitly at design-time, but frequently does
  not become part of the database specification and is therefore not available.
  Ontologies are logical systems that themselves incorporate semantics (intuitive
  or formal). For example, in the case of formal semantics we can interpret
  ontology definitions as a set of logical axioms.
- Ontology data models are richer (the number of primitives is higher, and
  they are more complex) than schema data models. For example, OWL [30]
  allows defining inverse properties, transitive properties, disjoint classes,
  and new classes as unions or intersections of other classes.
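The idea of reading ontology definitions as logical axioms can be made concrete with a small sketch. It is plain Python, not an OWL reasoner, and it covers only two kinds of axioms (subclass and disjointness); the class names and the helper functions `entailed_subsumptions` and `are_disjoint` are hypothetical and chosen for illustration.

```python
from typing import FrozenSet, Set, Tuple

# Hypothetical axioms; each pair (sub, sup) reads "every sub is a sup".
subclass_axioms: Set[Tuple[str, str]] = {
    ("Digital_Cameras", "Cameras"),
    ("Cameras", "Photo_and_Cameras"),
}
disjoint_axioms: Set[FrozenSet[str]] = {frozenset({"Cameras", "Office_Products"})}


def entailed_subsumptions(axioms: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Transitive closure of the subclass axioms (subsumption is transitive)."""
    closure = set(axioms)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def are_disjoint(x: str, y: str, subsumptions, disjoint) -> bool:
    """x and y are disjoint if they, or any of their superclasses, are declared disjoint."""
    def supers(cls: str) -> Set[str]:
        return {cls} | {sup for sub, sup in subsumptions if sub == cls}

    return any(frozenset({sx, sy}) in disjoint for sx in supers(x) for sy in supers(y))


closure = entailed_subsumptions(subclass_axioms)
```

Facts that were never stated, such as Digital_Cameras being a subclass of Photo_and_Cameras, follow from the axioms. A database schema that only lists the labels offers no such inference.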
However, ontologies can be viewed as schemas for knowledge bases. Having
defined classes and slots in the ontology, we populate the knowledge base with
instance data [25]. Thus, techniques developed for each problem can
be of interest to the other. On the one hand, schema matching is usually
performed with the help of heuristic techniques that try to guess the semantics
encoded in the schemas. On the other hand, ontology matching systems (primarily)
try to exploit knowledge explicitly encoded in the ontologies. In real-world
applications, schemas/ontologies usually have both well-defined and obscure labels
(terms) and contexts in which they occur; therefore, solutions from both problems
would be mutually beneficial.
Apart from the information that matching systems exploit, the other important
dimension of schema/ontology matching is the form of the result they
produce. Based on these criteria, and following the proposal first introduced in [11],
schema/ontology matching systems can be divided into syntactic and semantic
matching systems. Syntactic matching approaches do not analyze term meaning,
and thus semantics, directly. In these approaches semantic correspondences
are determined using (i) syntactic similarity measures, usually in the [0,1] range,
for example with the help of similarity coefficients [19, 10] or confidence measures
[32]; and (ii) syntax-driven techniques, for instance techniques that consider
labels as strings, see [21, 19, 15]. The first key distinction of the semantic
matching approaches is that mappings are calculated between schema/ontology
elements by computing semantic relations (for example, equivalent (=) or
subsuming elements (⊑, ⊒); see [12] for details). The second key distinction is
that semantic relations are determined by analyzing the meaning (concepts, not
labels as in syntactic matching) that is codified in the elements and the structure
of schemas/ontologies. These ideas are schematically represented in Figure 2.
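The contrast between the two kinds of result can be sketched as follows. The syntactic side uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity coefficient; the semantic side assumes a tiny hand-written table of broader concepts (`BROADER`) in place of real background knowledge. Neither is the method of any cited system, and the ASCII symbols `<=` and `>=` stand in for ⊑ and ⊒.

```python
from difflib import SequenceMatcher
from typing import Set


def syntactic_similarity(label1: str, label2: str) -> float:
    """Similarity coefficient in [0, 1] computed on the label strings alone."""
    return SequenceMatcher(None, label1.lower(), label2.lower()).ratio()


# Hypothetical background knowledge: concept -> its direct broader concepts.
BROADER = {
    "digital cameras": {"cameras"},
    "cameras": {"photo and cameras"},
}


def ancestors(concept: str) -> Set[str]:
    """All concepts broader than the given one, directly or indirectly."""
    seen: Set[str] = set()
    stack = [concept]
    while stack:
        for parent in BROADER.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def semantic_relation(label1: str, label2: str) -> str:
    """Return '=', '<=' (less general), '>=' (more general) or 'unknown'."""
    # Simplification: a normalised label is used directly as the concept key.
    c1 = label1.replace("_", " ").lower()
    c2 = label2.replace("_", " ").lower()
    if c1 == c2:
        return "="
    if c2 in ancestors(c1):
        return "<="
    if c1 in ancestors(c2):
        return ">="
    return "unknown"


score = syntactic_similarity("Digital_Cameras", "Photo_and_Cameras")
relation = semantic_relation("Digital_Cameras", "Photo_and_Cameras")
```

The syntactic matcher yields a number that still has to be thresholded and interpreted, whereas the semantic matcher yields a relation between concepts directly.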
Let us define the matching problem in terms of graphs [11]. A mapping element
is a 4-tuple $\langle ID_{ij}, n1_i, n2_j, R \rangle$, $i = 1, \ldots, N1$; $j = 1, \ldots, N2$; where $ID_{ij}$
