
Ontology Matching: State of the Art and Future Challenges


HAL Id: hal-00917910
https://hal.inria.fr/hal-00917910
Submitted on 12 Dec 2013
Ontology matching: state of the art and future
challenges
Pavel Shvaiko, Jérôme Euzenat
To cite this version:
Pavel Shvaiko, Jérôme Euzenat. Ontology matching: state of the art and future challenges. IEEE
Transactions on Knowledge and Data Engineering, Institute of Electrical and Electronics Engineers,
2013, 25 (1), pp. 158-176. ⟨10.1109/TKDE.2011.253⟩. ⟨hal-00917910⟩

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. X, NO. X, JANUARY 201X 1
Ontology matching:
state of the art and future challenges
Pavel Shvaiko and Jérôme Euzenat
Abstract—After years of research on ontology matching, it is reasonable to consider several questions: is the field of ontology
matching still making progress? Is this progress significant enough to pursue further research? If so, what are the particularly
promising directions? To answer these questions, we review the state of the art of ontology matching and analyze the results of
recent ontology matching evaluations. These results show a measurable improvement in the field, although the pace of this
improvement is slowing down. We conjecture that significant improvements can be obtained only by addressing important challenges for ontology
matching. We present such challenges with insights on how to approach them, thereby aiming to direct research into the most
promising tracks and to facilitate the progress of the field.
Index Terms—Semantic heterogeneity, semantic technologies, ontology matching, ontology alignment, schema matching.
1 INTRODUCTION
The progress of information and communication technologies has made available a huge amount of disparate information. The problem of managing heterogeneity among various information resources is increasing. For example, most of the database research self-assessment reports recognize that the thorny question of semantic heterogeneity, that is, of handling variations in meaning or ambiguity in entity interpretation, remains open [1]. As a consequence, various solutions have been proposed to facilitate dealing with this situation, and specifically, to automate the integration of distributed information sources. Among these, semantic technologies have attracted particular attention. In this paper we focus on one kind of semantic technology, namely, ontology matching.
An ontology typically provides a vocabulary that describes a domain of interest and a specification of the meaning of terms used in the vocabulary. Depending on the precision of this specification, the notion of ontology encompasses several data and conceptual models, including sets of terms, classifications, thesauri, database schemas, or fully axiomatized theories [2]. When several competing ontologies are used in different applications, most often these applications cannot immediately interoperate. In this paper we consider ontologies expressed in OWL as a typical example of a knowledge representation language on which most of the issues can be illustrated. To a large degree, OWL is succeeding as a knowledge representation standard, used, for instance, for building knowledge systems. However, several matching systems discussed in the paper are able to deal with RDFS or SKOS as well.

Pavel Shvaiko is with TasLab, Informatica Trentina SpA, Via G. Gilli 2, 38121 Trento, Italy. E-mail: pavel.shvaiko@infotn.it
Jérôme Euzenat is with INRIA & LIG, 655 avenue de l'Europe, 38334 Saint-Ismier, France. E-mail: jerome.euzenat@inria.fr
Database schemas and ontologies are similar in that both provide a vocabulary of terms and somewhat constrain the meaning of terms used in the vocabulary. Hence, they often share similar matching solutions [3–7]. Therefore, in this paper we discuss approaches that come from the semantic web and artificial intelligence as well as from databases.
Overcoming semantic heterogeneity is typically achieved in two steps, namely: (i) matching entities to determine an alignment, i.e., a set of correspondences, and (ii) interpreting an alignment according to application needs, such as data translation or query answering. We focus only on the matching step.
Ontology matching is a solution to the semantic heterogeneity problem. It finds correspondences between semantically related entities of ontologies. These correspondences can be used for various tasks, such as ontology merging, query answering, or data translation. Thus, matching ontologies enables the knowledge and data expressed with respect to the matched ontologies to interoperate [2]. Diverse solutions for matching have been proposed in the last decades [8, 9]. Several recent surveys [10–16] and books [2, 7] have been written on the topic¹ as well.
As evaluations of recent years indicate, the field of ontology matching has made a measurable improvement, although its pace is slowing down. In order to achieve similar or better results in the forthcoming years, actions have to be taken. We believe this can be done by specifically addressing the promising challenges that we identify as: (i) large-scale matching evaluation, (ii) efficiency of matching techniques, (iii) matching with background knowledge, (iv) matcher selection, combination and tuning, (v) user involvement, (vi) explanation of matching results, (vii) social and collaborative matching, and (viii) alignment management: infrastructure and support.

1. See http://www.ontologymatching.org for more details on the topic.
This article is an expanded and updated version of an earlier invited conference paper [17]. The first contribution of this work is a review of the state of the art backed up with analytical and experimental comparisons. Its second contribution is an in-depth discussion of the challenges in the field, of the recent advances made in the areas of each of the challenges, and an outline of potentially useful approaches to tackle the challenges identified.
The remainder of the paper is organized as follows. Section 2 presents the basics of ontology matching. Section 3 outlines some ontology matching applications. Sections 4 and 5 discuss the state of the art in ontology matching together with analytical and experimental comparisons. Section 6 overviews the challenges of the field, while Sections 7–14 discuss them in detail. Finally, Section 15 provides the major conclusions.
2 THE ONTOLOGY MATCHING PROBLEM
In this section we first discuss a motivating example (§2.1) and then we provide some basics of ontology matching (§2.2).
2.1 Motivating example
In order to illustrate the matching problem, let us use the two simple ontologies, O1 and O2, of Figure 1. Classes are shown in rectangles with rounded corners, e.g., in O1, Book is a specialization (subclass) of Product, while relations are shown without rectangles, such as price being an attribute defined on the integer domain and creator being a property. Albert Camus: La chute is a shared instance. Correspondences are shown as thick arrows that link an entity from O1 with an entity from O2. They are annotated with the relation that is expressed by the correspondence: for example, Person in O1 is less general (⊑) than Human in O2.
Assume that an e-commerce company acquires another one. Technically, this acquisition requires the integration of their information sources, and hence, of the ontologies of these companies. The documents or instance data of the two companies are stored according to ontologies O1 and O2, respectively. In our example these ontologies contain subsumption statements, property specifications and instance descriptions. The first step in integrating ontologies is matching, which identifies correspondences, namely the candidate entities to be merged or to have subsumption relationships under an integrated ontology. Once the correspondences between two ontologies have been determined, they may be used, for instance, for generating query expressions that automatically translate instances of these ontologies under an integrated ontology [18]. For example, the attributes with label title in O1 and in O2 are candidates to be merged, while the class with label Monograph in O2 should be subsumed by the class Product in O1.

Fig. 1: Two simple ontologies and an alignment. (Labels shown in the figure: Product, Book, CD, price, title, doi, creator, author, Person, integer, string, Monograph, Essay, Literary critics, Politics, Biography, Literature, isbn, subject, Human, Writer; shared instance: Albert Camus: La chute.)
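To make the use of correspondences for instance translation concrete, the following is a minimal sketch, not the method of [18]. It assumes that the correspondences of the example are available as two hand-written dictionaries (title = title, Monograph ⊑ Product) and that instances are plain records; the `Instance` class, the `translate_o2_to_o1` function and the `subject` value are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical correspondences read off the motivating example:
# "title" in O2 is equivalent to "title" in O1, and the O2 class
# "Monograph" is subsumed by the O1 class "Product".
ATTRIBUTE_EQUIVALENCES: Dict[str, str] = {"title": "title"}  # O2 attribute -> O1 attribute
CLASS_SUBSUMPTIONS: Dict[str, str] = {"Monograph": "Product"}  # O2 class -> O1 superclass


@dataclass
class Instance:
    cls: str
    attributes: Dict[str, str] = field(default_factory=dict)


def translate_o2_to_o1(inst: Instance) -> Optional[Instance]:
    """Re-express an O2 instance in the O1 vocabulary.

    Returns None when no correspondence lets the instance's class be typed in O1
    (e.g. Human: the alignment only states Person ⊑ Human, not the converse).
    """
    target_cls = CLASS_SUBSUMPTIONS.get(inst.cls)
    if target_cls is None:
        return None
    # Keep only attributes covered by an equivalence correspondence;
    # the others (here "subject") have no O1 counterpart and are dropped.
    translated = {
        ATTRIBUTE_EQUIVALENCES[name]: value
        for name, value in inst.attributes.items()
        if name in ATTRIBUTE_EQUIVALENCES
    }
    return Instance(target_cls, translated)


print(translate_o2_to_o1(Instance("Monograph", {"title": "La chute", "subject": "Literature"})))
# Instance(cls='Product', attributes={'title': 'La chute'})
```

The sketch performs no reasoning: an instance of a subclass of Monograph in O2 would not be translated unless a reasoner first inferred that it is also a Monograph.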
2.2 Problem statement
There have been different formalizations of the matching operation and its result [11, 14, 19–21]. We follow the work in [2], which provided a unified account of the previous works.
The matching operation determines an alignment A′ for a pair of ontologies O1 and O2. Hence, given a pair of ontologies (which can be very simple and contain one entity each), the matching task is that of finding an alignment between these ontologies. There are some other parameters that can extend the definition of matching, namely: (i) the use of an input alignment A, which is to be extended; (ii) the matching parameters, for instance, weights or thresholds; and (iii) external resources, such as common knowledge and domain-specific thesauri (see Figure 2).
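Fig. 2: The ontology matching operation. (The figure shows the matching operation taking ontologies O1 and O2, an input alignment A, parameters and resources, and returning the resulting alignment.)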
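The signature of the matching operation can be sketched as a function of five inputs. The sketch below is a deliberately naive, hypothetical matcher, assuming that entities are identified by their labels, that the only parameter is a similarity threshold, and that the only external resource is a hand-made synonym dictionary; its label similarity (exact match, synonym lookup, otherwise token Jaccard) is an illustrative choice, not a technique from [2].

```python
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]  # (entity of O1, entity of O2), read here as an equivalence


def label_similarity(a: str, b: str, synonyms: Dict[str, Set[str]]) -> float:
    """Toy similarity: 1.0 for equal labels or listed synonyms, else token Jaccard."""
    a, b = a.lower(), b.lower()
    if a == b or b in synonyms.get(a, set()) or a in synonyms.get(b, set()):
        return 1.0
    ta, tb = set(a.split()), set(b.split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def match(o1: Iterable[str], o2: Iterable[str], input_alignment: Set[Pair],
          parameters: Dict[str, float],
          resources: Dict[str, Dict[str, Set[str]]]) -> Set[Pair]:
    """Extend the input alignment with pairs whose similarity reaches the threshold."""
    threshold = parameters.get("threshold", 0.5)
    synonyms = resources.get("synonyms", {})  # external resource; keys are lower case
    result = set(input_alignment)  # the input alignment is extended, never shrunk
    o2 = list(o2)
    for e1 in o1:
        for e2 in o2:
            if label_similarity(e1, e2, synonyms) >= threshold:
                result.add((e1, e2))
    return result
```

For instance, `match({"title", "price"}, {"title", "subject", "cost"}, set(), {"threshold": 0.5}, {"synonyms": {"price": {"cost"}}})` returns the pairs (title, title) and (price, cost), where the label cost is a hypothetical O2 entity. A realistic matcher would also output the relation and a confidence for each pair, as described next.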
We use the terms matching operation, matching task and matching process interchangeably: matching operation focuses on the input and the result; matching task focuses on the goal and the insertion of the task in a wider context; and matching process focuses on its internals.

It can be useful to specifically consider matching more than two ontologies within the same process [22], though this is out of the scope of this paper.
An alignment is a set of correspondences between entities belonging to the matched ontologies. Alignments can be of various cardinalities: 1:1 (one-to-one), 1:m (one-to-many), n:1 (many-to-one) or n:m (many-to-many).
Given two ontologies, a correspondence is a 4-tuple:
$\langle id, e_1, e_2, r \rangle$,
such that:
• $id$ is an identifier for the given correspondence;
• $e_1$ and $e_2$ are entities, e.g., classes and properties of the first and the second ontology, respectively;
• $r$ is a relation, e.g., equivalence (=), more general (⊒), disjointness (⊥), holding between $e_1$ and $e_2$.
The correspondence $\langle id, e_1, e_2, r \rangle$ asserts that the relation $r$ holds between the ontology entities $e_1$ and $e_2$. For example, $\langle id_{7,1}, \mathit{Book}, \mathit{Monograph}, \sqsupseteq \rangle$ asserts that Book in O1 is more general (⊒) than Monograph in O2. Correspondences may also carry metadata, such as the name of the correspondence author. A frequently used metadata element is a confidence in the correspondence (typically in the [0, 1] range). The higher the confidence, the higher the likelihood that the relation holds.
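The definition above maps naturally onto a small data structure. The following sketch assumes that entities are represented by their labels and that confidence is the only metadata; the names `Relation`, `Correspondence`, `filter_by_confidence` and `cardinality` are hypothetical. The `cardinality` helper reports the cardinality actually observed in a given alignment, which is one possible reading of the 1:1, 1:m, n:1 and n:m categories.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import List


class Relation(Enum):
    EQUIVALENCE = "="
    MORE_GENERAL = "⊒"
    LESS_GENERAL = "⊑"
    DISJOINT = "⊥"


@dataclass(frozen=True)
class Correspondence:
    """The 4-tuple <id, e1, e2, r>, plus a confidence assumed to lie in [0, 1]."""
    id: str
    e1: str  # entity of the first ontology
    e2: str  # entity of the second ontology
    relation: Relation
    confidence: float = 1.0


def filter_by_confidence(alignment: List[Correspondence], threshold: float) -> List[Correspondence]:
    """Keep the correspondences whose confidence reaches the threshold."""
    return [c for c in alignment if c.confidence >= threshold]


def cardinality(alignment: List[Correspondence]) -> str:
    """Observed cardinality of an alignment: '1:1', '1:m', 'n:1' or 'n:m'."""
    e1_counts = Counter(c.e1 for c in alignment)
    e2_counts = Counter(c.e2 for c in alignment)
    # An O2 entity linked to several O1 entities makes the left-hand side "n";
    # an O1 entity linked to several O2 entities makes the right-hand side "m".
    left = "n" if any(k > 1 for k in e2_counts.values()) else "1"
    right = "m" if any(k > 1 for k in e1_counts.values()) else "1"
    return f"{left}:{right}"
```

As a usage example with hypothetical confidence values, an alignment made of ⟨id7_1, Book, Monograph, ⊒⟩ (0.8), ⟨c2, title, title, =⟩ (0.95) and ⟨c3, Book, Essay, ⊒⟩ (0.6) is 1:m, because Book appears twice; after `filter_by_confidence(alignment, 0.7)` only the first two remain and the result is 1:1.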
3 APPLICATIONS
Ontology matching is an important operation in traditional applications, e.g., ontology evolution [23], ontology integration [24], data integration [25], and data warehouses [26]. These applications are characterized by heterogeneous models, e.g., database schemas or ontologies, that are analyzed and matched manually or semi-automatically at design time. In such applications, matching is a prerequisite to running the actual system.
There are some emerging applications that can be characterized by their dynamics, such as peer-to-peer information sharing [27], web service composition [28], search [29], and query answering [22]. Such applications, contrary to traditional ones, (ultimately) require a run-time matching operation and take advantage of more explicit conceptual models. A detailed description of these applications, as well as of the requirements they pose to matching, can be found in [2]. We illustrate only some of these applications with the help of two short real-world examples in order to facilitate the comprehension of the forthcoming material.
Cultural heritage. A typical situation consists of having several large thesauri, such as Iconclass² (25,000 entities) and the Aria collection (600 terms) from the Rijksmuseum³. The documents indexed by these thesauri are illuminated manuscripts and masterpieces, i.e., image data. The labels are gloss-like, i.e., sentences or phrases describing the concept, since they have to capture what is depicted on a masterpiece. Examples of labels from Iconclass include "city-view, and landscape with man-made constructions" and "earth, world as celestial body". In contrast to Iconclass, Aria uses simple terms as labels, such as "landscapes", "personifications" and "wild animals". Matching between these thesauri (which can be performed at design time) is required in order to enable integrated access to the masterpieces of both collections. Specifically, alignments can be used as navigation links within a multi-faceted browser to access a collection via thesauri it was not originally indexed with [30].

2. http://www.iconclass.nl/
3. http://www.rijksmuseum.nl/collectie/index.jsp?lang=en
Geo-information (GI). A typical situation at an urban planning department of a public administration consists of a simple keyword-like request for map generation, such as: “hydrography, Trento, January 2011”. This request is a set of terms covering spatial (Trento) and temporal (January 2011) aspects to be addressed while looking for a specific theme, namely hydrography. Handling such a request involves interpreting the user query at run time and creating an alignment between the relevant GI resources, such as those having up-to-date (January 2011) topography and hydrography maps of Trento, in order to ultimately compose these into a single map. Technically, alignments are used in such a setting for query expansion. As far as the thematic part (e.g., hydrography) is concerned, standard matching technology can be widely reused [2, 32–34], while the spatial and temporal counterparts that constitute the specificity of GI applications have not received enough attention so far in the ontology matching field (with exceptions, such as [35, 36]); hence, this gap will have to be covered in the future.
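Query expansion with an alignment can be sketched as follows. This is a minimal illustration, assuming the alignment has already been computed and is given as a dictionary from query terms to aligned terms of another GI resource; the function name `expand_query` and the labels rivers and water bodies are hypothetical. In line with the text, only the thematic term is expanded, while the spatial and temporal terms pass through unchanged.

```python
from typing import Dict, Iterable, List, Set


def expand_query(terms: Iterable[str], alignment: Dict[str, Set[str]]) -> List[str]:
    """Append, after each query term, the terms it is aligned with (no duplicates)."""
    seen: Set[str] = set()
    expanded: List[str] = []
    for term in terms:
        # Aligned terms are sorted only to make the output deterministic.
        for candidate in [term, *sorted(alignment.get(term, set()))]:
            if candidate not in seen:
                seen.add(candidate)
                expanded.append(candidate)
    return expanded


# Hypothetical thematic alignment towards another GI resource's vocabulary.
thematic_alignment = {"hydrography": {"rivers", "water bodies"}}
print(expand_query(["hydrography", "Trento", "January 2011"], thematic_alignment))
# ['hydrography', 'rivers', 'water bodies', 'Trento', 'January 2011']
```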
4 RECENT MATCHING SYSTEMS
We now review several state-of-the-art matching systems (§4.1–§4.7) that appeared in recent years and have not been covered by the previous surveys (§1). Among the several dozen systems that have appeared in these recent years, we selected some that (i) have repeatedly participated in the Ontology Alignment Evaluation Initiative (OAEI) campaigns⁴ (see §5), in order to have a basis for comparison, and (ii) have corresponding archival publications, so that a complete account of these works is also available.
An overview of the considered systems is presented in Table 1. The first half of the table provides a general outlook over the systems. The input column presents the input format used by the systems, the output column describes the cardinality of the computed alignment (see §2.2), the GUI column shows whether a system is equipped with a graphical user interface, and the operation column describes the ways in which a system can process alignments. The second half of the table classifies the available matching methods depending on which kind of data the algorithms work on: strings (terminological), structure (structural), data instances (extensional) or models (semantic). Strings and structures are found in the ontology descriptions, e.g., labels, comments, attributes and their types, and relations of entities with other entities. Instances constitute the actual population of an ontology. Models are the result of semantic interpretation and usually use logical reasoning to deduce correspondences. Table 1 illustrates particular matching methods employed by the systems under consideration. Below, we discuss these systems in more detail.

4. http://oaei.ontologymatching.org

| System | Input | Output | GUI | Operation | Terminological | Structural | Extensional | Semantic |
|---|---|---|---|---|---|---|---|---|
| SAMBO §4.1 | OWL | 1:1 alignments | Yes | Ontology merging | n-gram, edit distance, UMLS, WordNet | Iterative structural similarity based on is-a, part-of hierarchies | Naive Bayes over documents | - |
| Falcon §4.2 | RDFS, OWL | 1:1 alignments | - | - | I-SUB, virtual documents | Structural proximities, clustering, GMO | Object similarity | - |
| DSsim §4.3 | OWL, SKOS | 1:1 alignments | AQUA Q/A [31] | Question answering | Tokenization, Monge-Elkan, Jaccard, WordNet | Graph similarity based on leaves | - | Rule-based fuzzy inference |
| RiMOM §4.4 | OWL | 1:1 alignments | - | - | Edit distance, vector distance, WordNet | Similarity propagation | Vector distance | - |
| ASMOV §4.5 | OWL | n:m alignments | - | - | Tokenization, string equality, Levenshtein distance, WordNet, UMLS | Iterative fix point computation, hierarchical, restriction similarities | Object similarity | Rule-based inference |
| Anchor-Flood §4.6 | RDFS, OWL | 1:1 alignments | - | - | Tokenization, string equality, Winkler-based sim., WordNet | Internal, external similarities; iterative anchor-based similarity propagation | - | - |
| AgreementMaker §4.7 | XML, RDFS, OWL, N3 | n:m alignments | Yes | - | TF·IDF, edit distance, substrings, WordNet | Descendant, sibling similarities | - | - |

TABLE 1: Analytical comparison of the recent matching systems.
4.1 SAMBO (Linköpings U.)
SAMBO is a system for matching and merging biomedical ontologies [37]. It handles ontologies in OWL and outputs 1:1 alignments between concepts and relations. The system uses various similarity-based matchers, including:
• terminological: n-gram, edit distance, and comparison of the lists of words of which the terms are composed. The results of these matchers are combined via a weighted sum with predefined weights;
• structural, through an iterative algorithm that checks whether two concepts occur in similar positions with respect to is-a or part-of hierarchies relative to already matched concepts, with the intuition that the concepts under consideration are likely to be similar as well;
• background knowledge based, using (i) a relationship between the matched entities in UMLS (Unified Medical Language System) [38] and (ii) a corpus of knowledge collected from the published literature, exploited through a naive Bayes classifier.
The results produced by these matchers are combined based on user-defined weights. Then, filtering based on thresholds is applied to come up with an alignment suggestion, which is displayed to the user for feedback (approval, rejection or modification). Once matching has been accomplished, the system can merge the matched ontologies, compute the consequences, check the newly created ontology for consistency, etc. SAMBO has subsequently been extended into a toolkit for the evaluation of ontology matching strategies, called KitAMO [39].
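The combination step (weighted sum of matcher scores followed by threshold filtering) can be sketched as below. This is not SAMBO's code: the trigram Dice coefficient, the length-normalized Levenshtein similarity, the normalization by total weight and the biomedical terms in the example are all assumptions chosen for illustration.

```python
from typing import Dict, List, Set, Tuple


def trigrams(s: str) -> Set[str]:
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)} if len(s) >= 3 else {s}


def ngram_similarity(a: str, b: str) -> float:
    """Dice coefficient over character trigrams."""
    ga, gb = trigrams(a), trigrams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def combined_similarity(a: str, b: str, weights: Dict[str, float]) -> float:
    """Weighted sum of matcher scores, normalized by the total weight."""
    scores = {"ngram": ngram_similarity(a, b), "edit": edit_similarity(a, b)}
    return sum(w * scores[name] for name, w in weights.items()) / sum(weights.values())


def suggest(terms1: List[str], terms2: List[str], weights: Dict[str, float],
            threshold: float) -> List[Tuple[str, str, float]]:
    """Threshold filtering: keep pairs whose combined similarity reaches the threshold."""
    suggestions = []
    for a in terms1:
        for b in terms2:
            score = combined_similarity(a, b, weights)
            if score >= threshold:
                suggestions.append((a, b, score))
    return suggestions
```

With equal weights and a threshold of 0.7, `suggest(["heart valve"], ["heart valves", "lung"], {"ngram": 0.5, "edit": 0.5}, 0.7)` keeps only the pair (heart valve, heart valves), with a score of about 0.93; in SAMBO such suggestions would then be shown to the user for approval, rejection or modification.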
4.2 Falcon (Southeast U.)
Falcon is an automatic divide-and-conquer approach to ontology matching [40]. It handles ontologies in RDFS and OWL. It has been designed with the goal of dealing with large ontologies (of thousands of entities). The approach operates in three phases: (i) partitioning ontologies, (ii) matching blocks, and (iii) discovering alignments. The first phase starts with a structure-based partitioning to separate entities (classes and properties) of each ontology into a set of small clusters. Partitioning is based on structural proximities between classes and properties, e.g., how close the classes are in the hierarchies of rdfs:subClassOf relations, and on an extension of the ROCK agglomerative clustering algorithm [41]. Then it constructs blocks out of these clusters. In the second phase the blocks from distinct ontologies are matched based on anchors (pairs of entities matched in advance), i.e., the more anchors are found between two blocks, the more similar the blocks are. In turn, the anchors are discovered by matching entities with the help of the I-SUB string comparison technique [42].
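The anchor-based block matching of the second phase can be sketched as follows. The sketch takes the anchors as given (it does not implement I-SUB or the partitioning), and it assumes a normalization in which twice the number of anchors shared by two blocks is divided by the total number of anchors touching either block; treat this formula, the function names and the example blocks as illustrative assumptions rather than Falcon's exact definition.

```python
from typing import List, Set, Tuple

Anchor = Tuple[str, str]  # (entity of O1, entity of O2) matched in advance


def block_similarity(block1: Set[str], block2: Set[str], anchors: Set[Anchor]) -> float:
    """Assumed score: 2 * shared anchors / anchors touching either block."""
    shared = sum(1 for x, y in anchors if x in block1 and y in block2)
    touching1 = sum(1 for x, _ in anchors if x in block1)
    touching2 = sum(1 for _, y in anchors if y in block2)
    total = touching1 + touching2
    return 2 * shared / total if total else 0.0


def match_blocks(blocks1: List[Set[str]], blocks2: List[Set[str]], anchors: Set[Anchor],
                 threshold: float) -> List[Tuple[int, int, float]]:
    """Return (i, j, similarity) for every block pair at or above the threshold."""
    pairs = []
    for i, b1 in enumerate(blocks1):
        for j, b2 in enumerate(blocks2):
            sim = block_similarity(b1, b2, anchors)
            if sim >= threshold:
                pairs.append((i, j, sim))
    return pairs


# Hypothetical blocks built from the labels of Figure 1, with one noisy anchor.
blocks1 = [{"Product", "Book", "price"}, {"Person", "creator"}]
blocks2 = [{"Monograph", "title", "isbn"}, {"Human", "Writer"}]
anchors = {("Book", "Monograph"), ("Person", "Human"), ("creator", "title")}
print([(i, j, round(s, 2)) for i, j, s in match_blocks(blocks1, blocks2, anchors, 0.6)])
# [(0, 0, 0.67), (1, 1, 0.67)]
```

The noisy anchor (creator, title) gives the pair of blocks 1 and 0 a score of 0.5, which the threshold filters out; only matched block pairs would then be passed to the third phase.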

References
WordNet: a lexical database for English.
Glenn Shafer. A mathematical theory of evidence.
The Unified Medical Language System (UMLS): integrating biomedical terminology.
A survey of approaches to automatic schema matching.
Data integration: a theoretical perspective.