UNIVERSITY
OF TRENTO
DEPARTMENT OF INFORMATION AND COMMUNICATION TECHNOLOGY
38050 Povo – Trento (Italy), Via Sommarive 14
http://www.dit.unitn.it
A CLASSIFICATION OF SCHEMA-BASED MATCHING
APPROACHES
Pavel Shvaiko
August 2004
Technical Report # DIT-04-093
Also: in Proceedings of the Meaning Coordination and Negotiation
workshop at ISWC'04
A Classification of Schema-Based Matching
Approaches
Pavel Shvaiko
University of Trento, Povo, Trento, Italy
pavel@dit.unitn.it
Abstract. Schema/ontology matching is a critical problem in many application
domains, such as the semantic web, schema/ontology integration, data
warehouses, e-commerce, and catalog matching. Many diverse solutions to the
matching problem have been proposed so far. In this paper we present a
taxonomy of schema-based matching techniques that builds on previous work on
classifying schema matching approaches. The innovations lie in introducing new
criteria which distinguish between matching techniques relying on diverse
semantic clues. In particular, we distinguish between heuristic and formal
techniques at schema-level; and implicit and explicit techniques at element-
and structure-level. Based on the proposed classification, we overview some of
the recent schema/ontology matching systems, pointing out which part of the
solution space they cover.
1 Introduction
Match is a critical operator in many well-known application domains, such as
the semantic web, schema/ontology integration, data warehouses, e-commerce, XML
message mapping, and catalog matching. Many solutions to the matching problem
include identifying terms in one information source that "match" terms in
another information source. The information sources can be viewed as graph-like
structures containing terms and their inter-relationships; these might be, for
example, database schemas, taxonomies, or ontologies [14]. The Match operator
takes two graph-like structures as input and produces as output a mapping
between the nodes of the graphs that correspond semantically to each other.
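The input/output shape of the Match operator can be illustrated with a minimal sketch. Everything named here (the Graph class, the match function, the sample labels) is a hypothetical illustration rather than part of any cited system, and the correspondence test used (case-insensitive label equality) is only a placeholder assumption standing in for a real matching technique.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A graph-like structure: node identifiers mapped to term labels, plus edges."""
    labels: dict[int, str]
    edges: set[tuple[int, int]] = field(default_factory=set)


def match(g1: Graph, g2: Graph) -> list[tuple[int, int]]:
    """Return pairs (node of g1, node of g2) judged to correspond.

    Placeholder criterion (an assumption for illustration only):
    two nodes correspond when their labels are equal ignoring case.
    """
    return [
        (n1, n2)
        for n1, label1 in g1.labels.items()
        for n2, label2 in g2.labels.items()
        if label1.lower() == label2.lower()
    ]


# Two tiny hypothetical information sources.
source = Graph({1: "Catalog", 2: "Books"}, {(1, 2)})
target = Graph({1: "catalog", 2: "Music"}, {(1, 2)})
mapping = match(source, target)
```

The sketch fixes only the interface: two graphs in, a set of node correspondences out. The techniques classified in this paper differ in how the correspondence test itself is carried out.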
Many diverse solutions to the matching problem have been proposed so far,
for example [19, 15, 8, 21, 32, 1, 17, 23, 26, 20]. In this paper we focus only
on schema-based solutions, i.e., matching systems exploiting only intensional
information, not instance data. Although there is a difference between the
schema and ontology matching (alignment) problems (see the next section for
details), we believe that techniques developed for each of them can be of
mutual benefit; therefore, we discuss schema and ontology matching as a single
problem.
With the emergence and proliferation of the semantic web, the semantics
captured in schemas/ontologies should also be handled at different levels of
detail. Therefore, there is a need to distinguish between schema/ontology
matching techniques relying on diverse semantic clues. In this paper we present
a taxonomy of schema-based matching techniques that builds on the previous
work of E. Rahm and P. Bernstein on classifying schema matching approaches
[28]. The innovations lie in introducing new criteria which distinguish between
schema/ontology matching techniques relying on diverse semantic clues. In
particular, we distinguish between heuristic and formal techniques at
schema-level; and implicit and explicit techniques at element- and
structure-level.
Fig. 1. Two XML schemas
The rest of the paper is organized as follows. Section 2 provides, via an
example, the basic motivations for the schema/ontology matching problem.
Section 3 introduces the classification of schema-based approaches and
discusses possible alternatives in detail. Section 4 overviews some of the
recent schema/ontology matching solutions in light of the proposed
classification, pointing out which part of the solution space they cover.
Section 5 reports some conclusions.
2 The Matching Problem
2.1 Motivating Example
To motivate the matching problem, let us use two simple XML schemas that are
shown in Figure 1 and exemplify one of the possible situations which arise, for
example, when resolving a schema integration task.
Suppose an e-commerce company A1 needs to finalize a corporate acquisition
of another company A2. To complete the acquisition we have to integrate the
databases of the two companies. The documents of both companies are stored
according to XML schemas A1 and A2 respectively. Numbers in boxes are the
unique identifiers of the nodes (in the following we sometimes refer to nodes
as elements). A first step in integrating the schemas is to identify candidates
to be merged or to have taxonomic relationships under an integrated schema.
This step is referred to as schema matching. For example, the nodes with labels
Office Products in A1 and in A2 are candidates to be merged, while the node
with label Digital Cameras in A2 should be subsumed by the node with label
Photo and Cameras in A1.
2.2 Matching: Syntactic vs. Semantic
In this paper we discuss the problem of matching schemas and ontologies from
a generic perspective, i.e., we analyze the information which is exploited by
matching systems in order to produce mappings. In this respect, ontology
matching differs substantially from schema matching in the following two areas
(among others, see [25]):
• Database schemas often do not provide explicit semantics for their data.
Semantics is usually specified explicitly at design-time, and frequently does
not become part of the database specification; therefore it is not available.
Ontologies are logical systems that themselves incorporate semantics (intuitive
or formal). For example, in the case of formal semantics we can interpret
ontology definitions as a set of logical axioms.
• Ontology data models are richer (the number of primitives is higher, and
they are more complex) than schema data models. For example, OWL [30]
allows defining inverse properties, transitive properties, disjoint classes,
new classes as unions or intersections of other classes, etc.
However, ontologies can be viewed as schemas for knowledge bases. Having
defined classes and slots in the ontology, we populate the knowledge base with
instance data [25]. Thus, techniques developed for each separate problem can
be of interest to the other. On the one hand, schema matching is usually
performed with the help of heuristic techniques trying to guess the semantics
encoded in the schemas. On the other hand, ontology matching systems
(primarily) try to exploit knowledge explicitly encoded in the ontologies. In
real-world applications, schemas/ontologies usually have both well defined and
obscure labels (terms) and contexts in which they occur; therefore, solutions
from both problems would be mutually beneficial.
Apart from the information that matching systems exploit, the other important
dimension of schema/ontology matching is the form of the result they produce.
Based on these criteria, and following the proposal first introduced in [11],
schema/ontology matching systems can be divided into syntactic and semantic
matching systems. Syntactic matching approaches do not analyze term meaning,
and thus semantics, directly. In these approaches semantic correspondences
are determined using (i) syntactic similarity measures, usually in the [0,1]
range, for example, with the help of similarity coefficients [19, 10] or
confidence measures [32]; and (ii) syntax-driven techniques, for instance,
techniques which consider labels as strings, see [21, 19, 15]. The first key
distinction of the semantic matching approaches is that mappings are calculated
between schema/ontology elements by computing semantic relations (for example,
equivalent (=) or subsuming (⊑, ⊒) elements; see [12] for details). The second
key distinction is that semantic relations are determined by analyzing meaning
(concepts, not labels as in syntactic matching) which is codified in the
elements and the structure of schemas/ontologies. These ideas are schematically
represented in Figure 2.
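The difference in the form of the result can be made concrete with a small sketch. It is not taken from any of the cited systems. The syntactic side uses Python's difflib.SequenceMatcher as a stand-in similarity coefficient (an assumption; the cited systems define their own measures). The semantic side uses a hypothetical hand-written table, BROADER, which stands in for the concept-level analysis a real semantic matcher would perform. The function names and relation names are likewise illustrative.

```python
from difflib import SequenceMatcher


def syntactic_similarity(label1: str, label2: str) -> float:
    """Similarity coefficient in [0, 1], computed on labels treated as plain strings."""
    return SequenceMatcher(None, label1.lower(), label2.lower()).ratio()


# Hypothetical background knowledge: each key is directly subsumed by its value.
BROADER = {"digital cameras": "photo and cameras"}


def semantic_relation(concept1: str, concept2: str) -> str:
    """Return a semantic relation between two concepts instead of a number.

    Only the illustrative BROADER table is consulted; anything it does not
    cover is reported as "unknown".
    """
    c1, c2 = concept1.lower(), concept2.lower()
    if c1 == c2:
        return "equivalent"
    if BROADER.get(c1) == c2:
        return "subsumed_by"
    if BROADER.get(c2) == c1:
        return "subsumes"
    return "unknown"


# Syntactic result: a coefficient in [0, 1].
score = syntactic_similarity("Digital Cameras", "Photo and Cameras")
# Semantic result: a relation between the two elements.
relation = semantic_relation("Digital Cameras", "Photo and Cameras")
```

The coefficient only says how alike the two strings are, whereas the relation states how the two elements relate, which is the form of result that semantic matching aims for.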
Let us define the matching problem in terms of graphs [11]. A mapping element
is a 4-tuple $\langle ID_{ij}, n1_i, n2_j, R \rangle$, $i = 1, \ldots, N1$; $j = 1, \ldots, N2$; where $ID_{ij}$