A survey of schema-based matching approaches

doi:10.1007/11603412_5

UNIVERSITY OF TRENTO

DEPARTMENT OF INFORMATION AND COMMUNICATION TECHNOLOGY

38050 Povo – Trento (Italy), Via Sommarive 14

http://www.dit.unitn.it

A CLASSIFICATION OF SCHEMA-BASED MATCHING APPROACHES

Pavel Shvaiko

August 2004

Technical Report # DIT-04-093

Also: in Proceedings of the Meaning Coordination and Negotiation workshop at ISWC'04


A Classification of Schema-Based Matching Approaches

Pavel Shvaiko

University of Trento, Povo, Trento, Italy

pavel@dit.unitn.it

Abstract. Schema/ontology matching is a critical problem in many application domains, such as the semantic web, schema/ontology integration, data warehouses, e-commerce, and catalog matching. Many diverse solutions to the matching problem have been proposed so far. In this paper we present a taxonomy of schema-based matching techniques that builds on previous work on classifying schema matching approaches. Some innovations lie in introducing new criteria that distinguish between matching techniques relying on diverse semantic clues. In particular, we distinguish between heuristic and formal techniques at schema-level, and between implicit and explicit techniques at element- and structure-level. Based on the proposed classification, we overview some of the recent schema/ontology matching systems, indicating which part of the solution space they cover.

1 Introduction

Match is a critical operator in many well-known application domains, such as the semantic web, schema/ontology integration, data warehouses, e-commerce, XML message mapping, and catalog matching. Many solutions to the matching problem involve identifying terms in one information source that "match" terms in another information source. The information sources can be viewed as graph-like structures containing terms and their inter-relationships; these might be, for example, database schemas, taxonomies, or ontologies [14]. The Match operator takes two graph-like structures as input and produces as output a mapping between the nodes of the graphs that correspond semantically to each other.
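As a minimal sketch of the operator's input and output, the following Python fragment models a graph-like structure and a deliberately naive Match that pairs nodes with identical normalized labels. The names `Graph`, `match`, and `norm`, the node identifiers, and the label normalization rule are assumptions made for illustration; they are not taken from any of the systems surveyed here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Graph:
    """A graph-like structure: node id -> label, plus (parent, child) edges."""
    labels: dict[int, str]
    edges: list[tuple[int, int]] = field(default_factory=list)


def norm(label: str) -> str:
    """Hypothetical normalization: lowercase, underscores and extra spaces collapsed."""
    return " ".join(label.lower().replace("_", " ").split())


def match(g1: Graph, g2: Graph) -> list[tuple[int, int]]:
    """Toy Match operator: return (node in g1, node in g2) pairs with equal labels.

    Only labels are compared; the edges are carried along to show the shape
    of the input but are ignored by this naive matcher.
    """
    index: dict[str, list[int]] = {}
    for node, label in g2.labels.items():
        index.setdefault(norm(label), []).append(node)
    return [
        (n1, n2)
        for n1, label in g1.labels.items()
        for n2 in index.get(norm(label), [])
    ]
```

A real matcher would also exploit the structure (the edges) and would return a relation or a similarity value with each pair rather than a bare pair of node identifiers.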

Many diverse solutions to the matching problem have been proposed so far, for example [19, 15, 8, 21, 32, 1, 17, 23, 26, 20]. In this paper we focus only on schema-based solutions, i.e., matching systems that exploit only intensional information, not instance data. Although there is a difference between the schema and ontology matching (alignment) problems (see the next section for details), we believe that techniques developed for each of them can be of mutual benefit; therefore, we discuss schema and ontology matching as a single problem.

With the emergence and proliferation of the semantic web, the semantics captured in schemas/ontologies should also be handled at different levels of detail. Therefore, there is a need to distinguish between schema/ontology matching techniques relying on diverse semantic clues. In this paper we present a taxonomy of schema-based matching techniques that builds on the previous work of E. Rahm and P. Bernstein on classifying schema matching approaches [28]. Some innovations lie in introducing new criteria that distinguish between schema/ontology matching techniques relying on diverse semantic clues. In particular, we distinguish between heuristic and formal techniques at schema-level, and between implicit and explicit techniques at element- and structure-level.

Fig. 1. Two XML schemas

The rest of the paper is organized as follows. Section 2 motivates the schema/ontology matching problem via an example. Section 3 introduces the classification of schema-based approaches and discusses possible alternatives in detail. Section 4 overviews some of the recent schema/ontology matching solutions in light of the proposed classification, indicating which part of the solution space they cover. Section 5 reports some conclusions.

2 The Matching Problem

2.1 Motivating Example

To motivate the matching problem, let us use two simple XML schemas that are

shown in Figure 1 and exemplify one of the possible situations which arise, for

example, when resolving a schema integration task.

Suppose an e-commerce company A1 needs to finalize a corporate acquisition of another company A2. To complete the acquisition we have to integrate the databases of the two companies. The documents of both companies are stored according to XML schemas A1 and A2 respectively. Numbers in boxes are the unique identifiers of the nodes (in the following we sometimes refer to nodes as elements). A first step in integrating the schemas is to identify candidates to be merged or to have taxonomic relationships under an integrated schema. This step is referred to as schema matching. For example, the nodes with labels Office Products in A1 and in A2 are candidates to be merged, while the node with label Digital Cameras in A2 should be subsumed by the node with label Photo and Cameras in A1.

2.2 Matching: Syntactic vs. Semantic

In this paper we discuss the problem of matching schemas and ontologies from a generic perspective, i.e., we analyze the information that matching systems exploit in order to produce mappings. In this respect, ontology matching differs substantially from schema matching in the following two areas (among others, see [25]):

• Database schemas often do not provide explicit semantics for their data. Semantics is usually specified explicitly at design-time, but frequently does not become part of the database specification, and therefore is not available. Ontologies are logical systems that themselves incorporate semantics (intuitive or formal). For example, in the case of formal semantics we can interpret ontology definitions as a set of logical axioms.

• Ontology data models are richer than schema data models (the number of primitives is higher, and they are more complex). For example, OWL [30] allows defining inverse properties, transitive properties, disjoint classes, and new classes as unions or intersections of other classes.

However, ontologies can be viewed as schemas for knowledge bases. Having defined classes and slots in the ontology, we populate the knowledge base with instance data [25]. Thus, techniques developed for each problem can be of interest to the other. On the one hand, schema matching is usually performed with the help of heuristic techniques that try to guess the semantics encoded in the schemas. On the other hand, ontology matching systems (primarily) try to exploit knowledge explicitly encoded in the ontologies. In real-world applications, schemas/ontologies usually have both well-defined and obscure labels (terms) and contexts in which they occur; therefore, solutions from both problems would be mutually beneficial.

Apart from the information that matching systems exploit, the other important dimension of schema/ontology matching is the form of the result they produce. Based on these criteria, and following the proposal first introduced in [11], schema/ontology matching systems can be divided into syntactic and semantic matching systems. Syntactic matching approaches do not analyze term meaning, and thus semantics, directly. In these approaches semantic correspondences are determined using (i) syntactic similarity measures, usually in the [0,1] range, for example similarity coefficients [19, 10] or confidence measures [32]; and (ii) syntax-driven techniques, for instance techniques that treat labels as strings, see [21, 19, 15]. The first key distinction of semantic matching approaches is that mappings are calculated between schema/ontology elements by computing semantic relations (for example, equivalent (=) or subsuming (⊑, ⊒) elements; see [12] for details). The second key distinction is that semantic relations are determined by analyzing the meaning (concepts, not labels as in syntactic matching) that is codified in the elements and the structure of schemas/ontologies. These ideas are schematically represented in Figure 2.
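The contrast between the two kinds of result can be illustrated with a small Python sketch. The syntactic function returns a coefficient in [0, 1] computed from the label strings alone, here with the standard-library `difflib.SequenceMatcher` as a stand-in measure. The semantic function returns a relation between concepts, looked up in a hand-written table. The `BROADER` table, the function names, and the relation names are hypothetical and chosen for illustration only; real semantic matchers derive such knowledge from resources such as thesauri or ontologies rather than from a fixed dictionary.

```python
from difflib import SequenceMatcher


def syntactic_similarity(label1: str, label2: str) -> float:
    """Similarity coefficient in [0, 1] based only on the characters of the labels."""
    return SequenceMatcher(None, label1.lower(), label2.lower()).ratio()


# Hypothetical background knowledge: concept -> set of broader concepts.
BROADER = {
    "digital cameras": {"photo and cameras"},
}


def semantic_relation(concept1: str, concept2: str) -> str:
    """Return a relation between two concepts instead of a number."""
    c1, c2 = concept1.lower(), concept2.lower()
    if c1 == c2:
        return "equivalent"
    if c2 in BROADER.get(c1, set()):
        return "less general"   # concept1 is subsumed by concept2
    if c1 in BROADER.get(c2, set()):
        return "more general"   # concept1 subsumes concept2
    return "unknown"


if __name__ == "__main__":
    pair = ("Digital Cameras", "Photo and Cameras")
    print("syntactic:", round(syntactic_similarity(*pair), 2))
    print("semantic: ", semantic_relation(*pair))
```

The syntactic result says only how alike the two strings are, while the semantic result states how the two concepts relate, which is the information needed to decide between merging two nodes and placing one under the other.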

Let us define the matching problem in terms of graphs [11]. A mapping element is a 4-tuple $\langle ID_{ij}, n1_i, n2_j, R \rangle$, $i = 1, \ldots, N1$; $j = 1, \ldots, N2$; where $ID_{ij}$
