What future work do the authors mention in the paper "Ontology matching: state of the art and future challenges"?

The authors expect that, as ontology matching technologies are becoming more mature, practitioners will increase their expectations and will want to experiment with them more intensively.

What types of patterns does the semantic verification process examine?

The semantic verification process examines five types of patterns, e.g., disjoint-subsumption contradiction, subsumption incompleteness.

What is the main feature of AgreementMaker?

AgreementMaker is a system comprising a wide range of automatic matchers, an extensible and modular architecture, a multi-purpose user interface, a set of evaluation strategies, and various manual, e.g., visual comparison, and semi-automatic features, e.g., user feedback [52].

How is the system able to handle large-scale ontologies?

This is often achieved through employing various ontology partitioning and anchor-based strategies, such as in Falcon, DSSim or Anchor-Flood.

What is the main difference between the two layers?

The second layer uses structural ontology properties and includes two matchers called descendants similarity inheritance (if two nodes are matched with high similarity, then the similarity between the descendants of those nodes should increase) and siblings similarity contribution (which uses the relationships between sibling concepts) [33].

(Open Access) Ontology Matching: State of the Art and Future Challenges (2013) | Pavel Shvaiko

Q: What are the contributions in "Ontology matching: state of the art and future challenges"?

The authors ask whether the field of ontology matching is still making progress, whether this progress is significant enough to pursue further research and, if so, what the particularly promising directions are. To answer these questions, they review the state of the art of ontology matching and analyze the results of recent ontology matching evaluations. They conjecture that significant improvements can be obtained only by addressing important challenges for ontology matching. They present such challenges with insights on how to approach them, thereby aiming to direct research into the most promising tracks and to facilitate the progress of the field.

Q: What is the first step in integrating ontologies?

The first step in integrating ontologies is matching, which identifies correspondences, namely the candidate entities to be merged or to have subsumption relationships under an integrated ontology.

HAL Id: hal-00917910

https://hal.inria.fr/hal-00917910

Submitted on 12 Dec 2013

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Ontology matching: state of the art and future challenges

Pavel Shvaiko, Jérôme Euzenat

To cite this version:

Pavel Shvaiko, Jérôme Euzenat. Ontology matching: state of the art and future challenges. IEEE Transactions on Knowledge and Data Engineering, Institute of Electrical and Electronics Engineers, 2013, 25 (1), pp.158-176. 10.1109/TKDE.2011.253. hal-00917910

IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. X, NO. X, JANUARY 201X 1

Ontology matching: state of the art and future challenges

Pavel Shvaiko and Jérôme Euzenat

Abstract—After years of research on ontology matching, it is reasonable to consider several questions: is the field of ontology matching still making progress? Is this progress significant enough to pursue further research? If so, what are the particularly promising directions? To answer these questions, we review the state of the art of ontology matching and analyze the results of recent ontology matching evaluations. These results show a measurable improvement in the field, although its pace is slowing down. We conjecture that significant improvements can be obtained only by addressing important challenges for ontology matching. We present such challenges with insights on how to approach them, thereby aiming to direct research into the most promising tracks and to facilitate the progress of the field.

Index Terms—Semantic heterogeneity, semantic technologies, ontology matching, ontology alignment, schema matching.


1 INTRODUCTION

The progress of information and communication technologies has made available a huge amount of disparate information. The problem of managing heterogeneity among various information resources is increasing. For example, most of the database research self-assessment reports recognize that the thorny question of semantic heterogeneity, that is, of handling variations in meaning or ambiguity in entity interpretation, remains open [1]. As a consequence, various solutions have been proposed to facilitate dealing with this situation, and specifically, to automate integration of distributed information sources. Among these, semantic technologies have attracted particular attention. In this paper we focus on one kind of semantic technology, namely, ontology matching.

An ontology typically provides a vocabulary that describes a domain of interest and a specification of the meaning of terms used in the vocabulary. Depending on the precision of this specification, the notion of ontology encompasses several data and conceptual models, including sets of terms, classifications, thesauri, database schemas, or fully axiomatized theories [2]. When several competing ontologies are used in different applications, most often these applications cannot immediately interoperate. In this paper we consider ontologies expressed in OWL as a typical example of a knowledge representation language on which most of the issues can be illustrated. OWL has largely succeeded as a knowledge representation standard and is used, for instance, for building knowledge systems. However, several matching systems discussed in the paper are able to deal with RDFS or SKOS as well.

• Pavel Shvaiko is with TasLab, Informatica Trentina SpA, Via G. Gilli 2, 38121 Trento, Italy. E-mail: pavel.shvaiko@infotn.it

• Jérôme Euzenat is with INRIA & LIG, 655 avenue de l'Europe, 38334 Saint-Ismier, France. E-mail: jerome.euzenat@inria.fr

Database schemas and ontologies are similar in that both provide a vocabulary of terms and somewhat constrain the meaning of terms used in the vocabulary. Hence, they often share similar matching solutions [3–7]. Therefore, we discuss in this paper approaches that come from semantic web and artificial intelligence as well as from databases.

Overcoming semantic heterogeneity is typically achieved in two steps, namely: (i) matching entities to determine an alignment, i.e., a set of correspondences, and (ii) interpreting an alignment according to application needs, such as data translation or query answering. We focus only on the matching step.

Ontology matching is a solution to the semantic heterogeneity problem. It finds correspondences between semantically related entities of ontologies. These correspondences can be used for various tasks, such as ontology merging, query answering, or data translation. Thus, matching ontologies enables the knowledge and data expressed with respect to the matched ontologies to interoperate [2]. Diverse solutions for matching have been proposed in the last decades [8, 9]. Several recent surveys [10–16] and books [2, 7] have been written on the topic¹ as well.

As evaluations of the recent years indicate, the field of ontology matching has made a measurable improvement, although its pace is slowing down. In order to achieve similar or better results in the forthcoming years, actions have to be taken. We believe this can be done through addressing specifically promising challenges that we identify as: (i) large-scale matching evaluation, (ii) efficiency of matching techniques, (iii) matching with background knowledge, (iv) matcher selection, combination and tuning, (v) user involvement, (vi) explanation of matching results, (vii) social and collaborative matching, (viii) alignment management: infrastructure and support.

1. See http://www.ontologymatching.org for more details on the topic.

This article is an expanded and updated version of an earlier invited conference paper [17]. The first contribution of this work is a review of the state of the art backed up with analytical and experimental comparisons. Its second contribution is an in-depth discussion of the challenges in the field, of the recent advances made in the areas of each of the challenges, and an outline of potentially useful approaches to tackle the challenges identified.

The remainder of the paper is organized as follows. Section 2 presents the basics of ontology matching. Section 3 outlines some ontology matching applications. Sections 4 and 5 discuss the state of the art in ontology matching together with analytical and experimental comparisons. Section 6 overviews the challenges of the field, while Sections 7–14 discuss them in detail. Finally, Section 15 provides the major conclusions.

2 THE ONTOLOGY MATCHING PROBLEM

In this section we first discuss a motivating example (§2.1) and then we provide some basics of ontology matching (§2.2).

2.1 Motivating example

In order to illustrate the matching problem let us use the two simple ontologies, O1 and O2, of Figure 1. Classes are shown in rectangles with rounded corners, e.g., in O1, Book being a specialization (subclass) of Product, while relations are shown without such rectangles, such as price being an attribute defined on the integer domain and creator being a property. Albert Camus: La chute is a shared instance. Correspondences are shown as thick arrows that link an entity from O1 with an entity from O2. They are annotated with the relation that is expressed by the correspondence: for example, Person in O1 is less general (⊑) than Human in O2.

Assume that an e-commerce company acquires another one. Technically, this acquisition requires the integration of their information sources, and hence, of the ontologies of these companies. The documents or instance data of both companies are stored according to ontologies O1 and O2, respectively. In our example these ontologies contain subsumption statements, property specifications and instance descriptions. The first step in integrating ontologies is matching, which identifies correspondences, namely the candidate entities to be merged or to have subsumption relationships under an integrated ontology. Once the correspondences between two ontologies have been determined, they may be used, for instance, for generating query expressions that automatically translate instances of these ontologies under an integrated ontology [18]. For example, the attributes with labels title in O1 and in O2 are the candidates to be merged, while the class with label Monograph in O2 should be subsumed by the class Product in O1.

Fig. 1: Two simple ontologies and an alignment.

2.2 Problem statement

There have been different formalizations of the matching operation and its result [11, 14, 19–21]. We follow the work in [2] that provided a unified account over the previous works.

The matching operation determines an alignment A′ for a pair of ontologies O1 and O2. Hence, given a pair of ontologies (which can be very simple and contain one entity each), the matching task is that of finding an alignment between these ontologies. There are some other parameters that can extend the definition of matching, namely: (i) the use of an input alignment A, which is to be extended; (ii) the matching parameters, for instance, weights, or thresholds; and (iii) external resources, such as common knowledge and domain specific thesauri, see Figure 2.

Fig. 2: The ontology matching operation.

We use interchangeably the terms matching operation, thereby focussing on the input and the result; matching task, thereby focussing on the goal and the insertion of the task in a wider context; and matching process, thereby focussing on its internals.


It can be useful to specifically consider matching more than two ontologies within the same process [22], though this is out of the scope of this paper.

An alignment is a set of correspondences between entities belonging to the matched ontologies. Alignments can be of various cardinalities: 1:1 (one-to-one), 1:m (one-to-many), n:1 (many-to-one) or n:m (many-to-many).

Given two ontologies, a correspondence is a 4-tuple: $\langle id, e_1, e_2, r \rangle$, such that:

• $id$ is an identifier for the given correspondence;

• $e_1$ and $e_2$ are entities, e.g., classes and properties of the first and the second ontology, respectively;

• $r$ is a relation, e.g., equivalence (=), more general (⊒), disjointness (⊥), holding between $e_1$ and $e_2$.

The correspondence $\langle id, e_1, e_2, r \rangle$ asserts that the relation $r$ holds between the ontology entities $e_1$ and $e_2$. For example, $\langle id_{7,1}, \mathit{Book}, \mathit{Monograph}, \sqsupseteq \rangle$ asserts that Book in O1 is more general (⊒) than Monograph in O2. Correspondences have some associated metadata, such as the correspondence author name. A frequently used metadata element is a confidence in the correspondence (typically in the [0, 1] range). The higher the confidence, the higher the likelihood that the relation holds.
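The definitions above can be sketched as a small data structure. This is only an illustration, not the representation used by any particular matching system: the class name `Correspondence`, the helper functions, the identifiers `c1` to `c3` and the confidence values are all assumptions made for the example. The entity names and relations come from the motivating example of §2.1.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """One correspondence <id, e1, e2, r> with confidence as metadata."""
    id: str
    e1: str                   # entity of the first ontology
    e2: str                   # entity of the second ontology
    r: str                    # relation symbol, e.g. "=", "⊒", "⊑", "⊥"
    confidence: float = 1.0   # typically in the [0, 1] range


def filter_by_confidence(alignment, threshold):
    """Keep only correspondences whose confidence reaches the threshold."""
    return [c for c in alignment if c.confidence >= threshold]


def cardinality(alignment):
    """Classify an alignment as '1:1', '1:m', 'n:1' or 'n:m'."""
    left = Counter(c.e1 for c in alignment)
    right = Counter(c.e2 for c in alignment)
    one_to_many = any(n > 1 for n in left.values())   # an e1 maps to several e2
    many_to_one = any(n > 1 for n in right.values())  # an e2 is reached from several e1
    if one_to_many and many_to_one:
        return "n:m"
    if one_to_many:
        return "1:m"
    if many_to_one:
        return "n:1"
    return "1:1"


# Hypothetical identifiers and confidences; entities follow Figure 1.
alignment = [
    Correspondence("c1", "Book", "Monograph", "⊒", 0.9),
    Correspondence("c2", "Person", "Human", "⊑", 0.8),
    Correspondence("c3", "title", "title", "=", 1.0),
]
print(cardinality(alignment))
```

Treating an alignment as a plain list keeps the sketch short; a real system would likely also record which ontologies were matched and further metadata such as the correspondence author.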

3 APPLICATIONS

Ontology matching is an important operation in traditional applications, e.g., ontology evolution [23], ontology integration [24], data integration [25], and data warehouses [26]. These applications are characterized by heterogeneous models, e.g., database schemas or ontologies, that are analyzed and matched manually or semi-automatically at design time. In such applications, matching is a prerequisite to running the actual system.

There are some emerging applications that can be characterized by their dynamics, such as peer-to-peer information sharing [27], web service composition [28], search [29], and query answering [22]. Such applications, contrary to traditional ones, require (ultimately) a run time matching operation and take advantage of more explicit conceptual models. A detailed description of these applications as well as of the requirements they pose to matching can be found in [2]. We illustrate only some of these applications with the help of two short real-world examples in order to facilitate the comprehension of the forthcoming material.

Cultural heritage. A typical situation consists of having several large thesauri, such as Iconclass² (25,000 entities) and the Aria collection (600 terms) from the Rijksmuseum³. The documents indexed by these thesauri are illuminated manuscripts and masterpieces, i.e., image data. The labels are gloss-like, i.e., sentences or phrases describing the concept, since they have to capture what is depicted on a masterpiece. Examples of labels from Iconclass include “city-view, and landscape with man-made constructions” and “earth, world as celestial body”. In contrast to Iconclass, Aria uses simple terms as labels. Examples of these include “landscapes”, “personifications” and “wild animals”. Matching between these thesauri (that can be performed at design time) is required in order to enable an integrated access to the masterpieces of both collections. Specifically, alignments can be used as navigation links within a multi-faceted browser to access a collection via thesauri it was not originally indexed with [30].

2. http://www.iconclass.nl/

3. http://www.rijksmuseum.nl/collectie/index.jsp?lang=en

Geo-information (GI). A typical situation at an urban planning department of a public administration consists of a simple keyword-like request for map generation, such as: “hydrography, Trento, January 2011”. This request is a set of terms covering spatial (Trento) and temporal (January 2011) aspects to be addressed while looking for a specific theme, namely hydrography. Handling such a request involves interpreting the user query at run time and creating an alignment between the relevant GI resources, such as those having up-to-date (January 2011) topography and hydrography maps of Trento, in order to ultimately compose these into a single one. Technically, alignments are used in such a setting for query expansion. As for the thematic part, e.g., hydrography, standard matching technology can be widely reused [2, 32–34], while the spatial and temporal counterparts that constitute the specificity of GI applications have not received enough attention so far in the ontology matching field (with exceptions, such as [35, 36]), and hence, this gap will have to be covered in the future.
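As a rough illustration of how an alignment might drive query expansion for the thematic part of such a request, consider the sketch below. The function `expand_query` and the aligned terms `rivers` and `lakes` are hypothetical; they are not taken from any actual GI resource and only show the mechanism.

```python
def expand_query(terms, alignment):
    """Return the query terms, each followed by the terms it is aligned with."""
    expanded = []
    for term in terms:
        if term not in expanded:
            expanded.append(term)
        # Add every aligned term once, keeping the original order.
        for target in alignment.get(term, []):
            if target not in expanded:
                expanded.append(target)
    return expanded


# Hypothetical alignment between the query vocabulary and a GI resource.
alignment = {"hydrography": ["rivers", "lakes"]}
query = ["hydrography", "Trento", "January 2011"]
print(expand_query(query, alignment))
```

Only the thematic term is expanded here. The spatial and temporal terms pass through unchanged, which reflects the gap noted in the paragraph above.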

4 RECENT MATCHING SYSTEMS

We now review several state of the art matching systems (§4.1–§4.7) that appeared in the recent years and have not been covered by the previous surveys (§1). Among the several dozens of systems that have appeared in these recent years, we selected some which (i) have repeatedly participated in the Ontology Alignment Evaluation Initiative (OAEI)⁴ campaigns (see §5) in order to have a basis for comparisons and (ii) have corresponding archival publications, hence the complete account of these works is also available.

4. http://oaei.ontologymatching.org

An overview of the considered systems is presented in Table 1. The first half of the table provides a general outlook over the systems. The input column presents the input format used by the systems, the output column describes the cardinality of the computed alignment (see §2.2), the GUI column shows if a system is equipped with a graphical user interface, and the operation column describes the ways in which a system can process alignments. The second half of the table classifies the available matching methods depending on which kind of data the algorithms work on: strings (terminological), structure (structural), data instances (extensional) or models (semantics). Strings and structures are found in the ontology descriptions, e.g., labels, comments, attributes and their types, relations of entities with other entities. Instances constitute the actual population of an ontology. Models are the result of semantic interpretation and usually use logic reasoning to deduce correspondences. Table 1 illustrates particular matching methods employed by the systems under consideration. Below, we discuss these systems in more detail.

| System | Input | Output | GUI | Operation | Terminological | Structural | Extensional | Semantic |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SAMBO (§4.1) | OWL | 1:1 alignments | Yes | Ontology merging | n-gram, edit distance, UMLS, WordNet | Iterative structural similarity based on is-a, part-of hierarchies | Naive Bayes over documents | - |
| Falcon (§4.2) | RDFS, OWL | 1:1 alignments | - | - | I-SUB, virtual documents | Structural proximities, clustering, GMO | Object similarity | - |
| DSSim (§4.3) | OWL, SKOS | 1:1 alignments | AQUA Q/A [31] | Question answering | Tokenization, Monge-Elkan, Jaccard, WordNet | Graph similarity based on leaves | - | Rule-based fuzzy inference |
| RiMOM (§4.4) | OWL | 1:1 alignments | - | - | Edit distance, vector distance, WordNet | Similarity propagation | Vector distance | - |
| ASMOV (§4.5) | OWL | n:m alignments | - | - | Tokenization, string equality, Levenshtein distance, WordNet, UMLS | Iterative fix point computation, hierarchical, restriction similarities | Object similarity | Rule-based inference |
| Anchor-Flood (§4.6) | RDFS, OWL | 1:1 alignments | - | - | Tokenization, string equality, Winkler-based sim., WordNet | Internal, external similarities; iterative anchor-based similarity propagation | - | - |
| AgreementMaker (§4.7) | XML, RDFS, OWL, N3 | n:m alignments | Yes | - | TF·IDF, edit distance, substrings, WordNet | Descendant, sibling similarities | - | - |

TABLE 1: Analytical comparison of the recent matching systems.

4.1 SAMBO (Linköpings U.)

SAMBO is a system for matching and merging biomedical ontologies [37]. It handles ontologies in OWL and outputs 1:1 alignments between concepts and relations. The system uses various similarity-based matchers, including:

• terminological: n-gram, edit distance, comparison of the lists of words of which the terms are composed. The results of these matchers are combined via a weighted sum with pre-defined weights;

• structural, through an iterative algorithm that checks if two concepts occur in similar positions with respect to is-a or part-of hierarchies relative to already matched concepts, with the intuition that the concepts under consideration are likely to be similar as well;

• background knowledge based, using (i) a relationship between the matched entities in UMLS (Unified Medical Language System) [38] and (ii) a corpus of knowledge collected from the published literature exploited through a naive Bayes classifier.

The results produced by these matchers are combined based on user-defined weights. Then, filtering based on thresholds is applied to come up with an alignment suggestion, which is further displayed to the user for feedback (approval, rejection or modification). Once matching has been accomplished, the system can merge the matched ontologies, compute the consequences, check the newly created ontology for consistency, etc. SAMBO has been subsequently extended into a toolkit for evaluation of ontology matching strategies, called KitAMO [39].
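The weighted combination and threshold filtering described above can be sketched as follows. This is not SAMBO's code: it covers only two terminological matchers (a trigram Dice similarity and a normalized edit distance), and the function names, weights and threshold are assumptions chosen for illustration. The structural and background knowledge matchers are left out.

```python
def ngrams(s, n=3):
    """Set of character n-grams of s (the whole string if it is shorter than n)."""
    s = s.lower()
    if len(s) < n:
        return {s}
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Dice coefficient over the n-gram sets of a and b."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def edit_distance(a, b):
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a, b):
    """Edit distance normalized into a [0, 1] similarity."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def combined_similarity(a, b, weights):
    """Weighted sum of the matcher scores, normalized by the total weight."""
    scores = {"ngram": ngram_similarity(a, b), "edit": edit_similarity(a, b)}
    return sum(weights[k] * scores[k] for k in weights) / sum(weights.values())


def suggest(terms1, terms2, weights, threshold):
    """Alignment suggestion: all pairs whose combined score passes the threshold."""
    suggestions = []
    for a in terms1:
        for b in terms2:
            score = combined_similarity(a, b, weights)
            if score >= threshold:
                suggestions.append((a, b, score))
    return suggestions


weights = {"ngram": 0.5, "edit": 0.5}  # hypothetical weights
print(suggest(["title", "price"], ["title", "isbn"], weights, threshold=0.8))
```

In an interactive setting, the list returned by `suggest` would be what is shown to the user for approval, rejection or modification.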

4.2 Falcon (Southeast U.)

Falcon is an automatic divide-and-conquer approach to ontology matching [40]. It handles ontologies in RDFS and OWL. It has been designed with the goal of dealing with large ontologies (of thousands of entities). The approach operates in three phases: (i) partitioning ontologies, (ii) matching blocks, and (iii) discovering alignments. The first phase starts with a structure-based partitioning to separate entities (classes and properties) of each ontology into a set of small clusters. Partitioning is based on structural proximities between classes and properties, e.g., how close the classes are in the hierarchies of rdfs:subClassOf relations, and on an extension of the Rock agglomerative clustering algorithm [41]. Then it constructs blocks out of these clusters. In the second phase the blocks from distinct ontologies are matched based on anchors (pairs of entities matched in advance), i.e., the more anchors are found between two blocks, the more similar the blocks are. In turn, the anchors are discovered by matching entities with the help of the I-SUB string comparison technique [42].
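The anchor-based block matching of the second phase can be sketched as below. Several simplifications are assumed: the blocks are taken as already computed, Python's `difflib.SequenceMatcher` stands in for I-SUB (it is a different string measure), and block similarity is a raw anchor count rather than whatever normalized measure Falcon actually uses. All function names and thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def find_anchors(entities1, entities2, threshold=0.9):
    """Pairs of entity names that are near-identical as strings.

    SequenceMatcher is a stand-in for the I-SUB measure.
    """
    return {
        (a, b)
        for a in entities1
        for b in entities2
        if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold
    }


def block_similarity(block1, block2, anchors):
    """Number of anchors with one end in each block (more anchors, more similar)."""
    return sum(1 for a, b in anchors if a in block1 and b in block2)


def match_blocks(blocks1, blocks2, anchors, min_anchors=1):
    """Return (i, j, count) for block pairs that share enough anchors."""
    pairs = []
    for i, b1 in enumerate(blocks1):
        for j, b2 in enumerate(blocks2):
            count = block_similarity(b1, b2, anchors)
            if count >= min_anchors:
                pairs.append((i, j, count))
    return pairs


# Hypothetical blocks built from the entities of the motivating example.
blocks1 = [{"Book", "title", "price"}, {"Person", "creator"}]
blocks2 = [{"Monograph", "title", "isbn"}, {"Human", "Writer"}]
anchors = find_anchors(set().union(*blocks1), set().union(*blocks2))
print(match_blocks(blocks1, blocks2, anchors))
```

Only the block pairs returned by `match_blocks` would then be passed to the more expensive alignment discovery phase, which is how partitioning keeps large ontologies tractable.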
