
On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE

TLDR
In this article, a model supporting dynamic heterogeneous workflow process interconnection is proposed to co-ordinate geographically distributed business processes, in order to strengthen awareness inside virtual enterprises, to facilitate multinational e-transactions, etc.
Abstract
Process interconnection mechanisms are necessary to co-ordinate geographically distributed business processes, in order to strengthen awareness inside virtual enterprises, to facilitate multinational e-transactions, etc. At present, existing business process modelling and enactment systems (workflow systems, project management tools, shared agendas, to-do lists, etc.) have mainly been developed to suit the internal needs of enterprises; thus, most of these systems are not adapted to inter-enterprise co-operation. As we are interested in workflow processes, we aim, through this paper, to provide a model supporting dynamic heterogeneous workflow process interconnection. We consider the interconnection of enterprise workflow processes as the management of a workflow of workflows in which several heterogeneous workflow systems coexist. This paper introduces our process interconnection model, its implementation, and its validation through experimentation.

Zurich Open Repository and Archive
University of Zurich
University Library
Strickhofstrasse 39
CH-8057 Zurich
www.zora.uzh.ch
Year: 2003

Breaking the deadlock
Rinaldi, Fabio; Kaljurand, K; Dowdall, J; Hess, M
DOI: https://doi.org/10.1007/b94348
Posted at the Zurich Open Repository and Archive, University of Zurich
ZORA URL: https://doi.org/10.5167/uzh-19102
Conference or Workshop Item
Originally published at:
Rinaldi, Fabio; Kaljurand, K; Dowdall, J; Hess, M (2003). Breaking the deadlock. In: ODBASE 2003 (International Conference on Ontologies, Databases and Applications of SEmantics), Catania, Italy, 2003, 876-888.
DOI: https://doi.org/10.1007/b94348

Breaking the Deadlock
Fabio Rinaldi, Kaarel Kaljurand, James Dowdall, and Michael Hess
Institute of Computational Linguistics,
University of Zürich,
Winterthurerstrasse 190, CH-8057 Zürich,
Switzerland
{rinaldi}@cl.unizh.ch
Abstract. Many of the proposed approaches to the semantic web have a substantial drawback. They are all based on the idea that web pages (or, more generally, resources) will contain semantic annotations that would allow remote agents to access them. However, the problem of creating these annotations is seldom addressed. Manual creation of the annotations is not a feasible option, except in a few experimental cases.
We propose an approach based on Language Processing techniques that addresses this issue, at least for textual resources (which still constitute the vast majority of the material available on the web). Documents are analyzed fully automatically and converted into a semantic annotation, which can then be stored together with the original documents. It is this annotation that constitutes the machine-understandable resource that remote agents can query. A semi-automatic approach is also considered, in which the system suggests candidate annotations and the user simply has to approve or reject them. Advantages and drawbacks of both approaches are discussed.
1 Introduction
The major purpose of activities in the Semantic Web area is to help users better locate,
organize, and process content, irrespective of its physical location and of the way it is
presented. Adding machine-understandable semantics to web resources will make them
processable by software agents, and ultimately make them more useful to all of us.
There is a wealth of research efforts focusing on the foundations of the semantic web [8], and in particular on the problem of how to represent the semantic information carried by web resources (be they structured databases, unstructured natural language documents, or a combination of both). The XML-based Resource Description Framework [14] is the standardized Semantic Web language; however, it is really meant for use by computers, not humans. The same applies to all the extensions that have been proposed, such as RDF Schema [2], which provides a basic type system for use in RDF models, or DAML+OIL [4], which provides a language with well-defined semantics for the specification of ontologies.
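
To make the flavour of such machine-oriented annotations concrete, the following is a minimal sketch of a few RDF statements written in Python, assuming the rdflib library is available; the "ex" vocabulary, the resource identifiers, and the properties are invented for illustration and are not taken from any of the standards cited above.

# Hypothetical example: a few RDF triples describing a web resource.
# The ex: vocabulary and all identifiers are invented for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/vocab#")

g = Graph()
g.bind("ex", EX)

doc = URIRef("http://example.org/docs/maintenance-manual-ch12")
g.add((doc, RDF.type, EX.TechnicalDocument))
g.add((doc, RDFS.label, Literal("Maintenance manual, chapter 12")))
g.add((doc, EX.describes, EX.HydraulicPump))

# Serialize to Turtle so the annotation can be stored next to the document.
print(g.serialize(format="turtle"))
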
However, there seems to be significantly less interest in the problem of how to help users in the transition from conventional web pages to richly annotated semantic web resources. The major barrier to a wider adoption of the Semantic Web proposals is a classic deadlock problem [11]. On the one hand, significant additional effort is required to add semantic annotations to existing (or newly created) web resources, and people are not willing to pay this price until they can see a clear benefit from it. On the other hand, software agents that can reap the benefit of richer annotations will not be useful (and thus there will be fewer incentives to develop them) until a “critical mass” of semantically annotated web resources has been achieved.
Current efforts to tackle this problem seem to focus on the development of user-friendly editors for semantic annotations: details of XML/RDF should be hidden behind GUI authoring tools. Users do not need (and do not want) to get in contact with XML/RDF. However, this approach defeats the purpose of the Semantic Web vision: to make the web more effective for users by making it machine-understandable. Instead, it makes the web less effective for users, by forcing them to add machine-level markup (albeit shielded by an effective GUI). Unless the users can see a real benefit, they will not be motivated to adopt such editors and be prepared to pay the price (in terms of the additional effort that might be required).
The benefits of the semantic web should come for free to most of the users: semantic markup should be a by-product of normal computer use. There is a real need to lower the barrier of entry: the vast majority of users cannot be expected to understand and use formal ontologies. In order to achieve interoperability between software agents, a lot of human understandability has been sacrificed: precise ontologies and formally defined semantics are foreign concepts to the average user.
As a very large proportion of existing web resources are represented by human-readable documentation, we believe that a possible way to break the deadlock mentioned above is to start using available information extraction tools to enrich the documents with automatically generated annotations. In this paper we propose an approach based on natural language processing (NLP) techniques, geared towards the creation of semantic annotations, starting from the available textual documents.
One of the motivations behind the semantic web movement was that computers are not powerful enough to process (and understand) natural language. Therefore, machine-understandable information should be added to web resources. This is still true: it would be unfeasible to process the enormous amounts of textual resources that are added to the web every day (let alone process all the existing web content). However, it is technically possible (and practically conceivable) to have specialised editors that process (in a transparent fashion) textual resources as the users publish them on the web, and add semantic annotations automatically extracted from the documents. In other words, the idea is to move the problem from the consumer of the information to the producer.
As Natural Language is the means of information access most users are comfortable with, we will also discuss possible ways to access the information encoded in the semantic annotations. Given a user question phrased in natural language, existing tools can convert it into the same kind of annotations as those stored in the documents. A new type of software agent (or search engine) might then be capable of retrieving those web pages whose annotations match those derived from the user question.
The approach presented in this paper is based on our previous work in the area of Question Answering, resulting in the ExtrAns system [22]. Specific research in the area of Question Answering has been promoted in the last few years in particular by the Question Answering track of the Text REtrieval Conference (TREC-QA) competitions [26]. ExtrAns uses a combination of robust natural language processing technology and dedicated terminology processing [19, 20] to create a Knowledge Base, containing a semantic representation for the propositional content of the documents [23]. Our research group has been working in the area of Question Answering for a few years, targeting different domains, such as the Aircraft Maintenance Manual (AMM) of a large aircraft [22] or a computer manual [15].

[Fig. 1. Offline analysis of documents: Document → Term Processing (Thesaurus) / Linguistic Processing → MLF Generator → Document KB.]
In a recently started EU project (“Parmenides”), focusing on the integration of Information Extraction and Data Mining techniques, we aim at exploiting the work done in the ExtrAns system by moving from the system-specific semantic representation (Minimal Logical Forms) to a semantic representation based on W3C standards, like RDF. A secondary aim might be to explore possible synergies with the standardization effort of the ISO TC37/SC4 committee in the domain of linguistic annotations [21].
We will first briefly describe our past work resulting in the ExtrAns system (section 2), then describe the annotations that we aim at generating automatically in the Parmenides project (section 3). The following section (4) will describe in detail the approach that we propose in order to automatically create semantic annotations for textual web resources. Finally, in section (5) we explore advantages and disadvantages of the proposed methodologies, and describe our current work and suggestions for future development.
2 ExtrAns
In this section we briefly describe the linguistic processing performed in the ExtrAns system; extended details can be found in [22]. An initial phase of syntactic analysis, based on the Link Grammar parser [24], is followed by a transformation of the dependency-based syntactic structures generated by the parser into a semantic representation based on Minimal Logical Forms, or MLFs [15]. As the name suggests, the MLF of a sentence does not attempt to encode the full semantics of the sentence. Currently the MLFs encode the semantic dependencies between the open-class words of the sentences (nouns, verbs, adjectives, and adverbs) plus prepositional phrases. The notation used has been designed to incrementally incorporate additional information if needed. Thus, other modules of the NLP system can add new information without having to remove old information.
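
As an illustration only (the predicate names below are simplified and do not reproduce the exact ExtrAns notation), a flat MLF for a short sentence can be sketched in Python as a list of predicates that share variables:

# Simplified sketch of a flat Minimal Logical Form for the sentence
#   "The operator removes the access panel."
# Predicate names and structure are illustrative, not the exact ExtrAns notation.
mlf = [
    ("object", "operator", "o1", ["x1"]),      # entity x1, described as an operator
    ("object", "access_panel", "o2", ["x2"]),  # entity x2, a packed multi-word term
    ("evt", "remove", "e1", ["x1", "x2"]),     # event e1 relating x1 (agent) and x2 (patient)
]

# Because the representation is flat, later modules can add information
# monotonically, without rewriting the predicates already produced.
mlf.append(("tense", "present", "e1"))
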
We have chosen a computationally intensive approach, which allows a deeper linguistic analysis to be performed, at the cost of higher processing time. Such costs are negligible in the case of a single sentence (like a user query) but rapidly become impractical in the case of the analysis of a large document set. The approach we take is to analyse all the documents in an off-line stage (see figure 1) and store a representation of their contents (the MLFs) in a Knowledge Base. In an on-line phase, the MLF which results from the analysis of the user query is matched in the KB against the stored representations, locating those MLFs that best answer the query. At this point the system can locate in the original documents the sentences from which the MLFs were generated (see figure 2).
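
The split between the two stages can be sketched as follows; here `analyse` stands in for the whole parsing-and-MLF-generation pipeline, and the overlap-based ranking is a placeholder rather than the actual ExtrAns matcher:

# Sketch of the off-line/on-line split described above (not the ExtrAns code).
from typing import Callable, List, Tuple

MLF = frozenset  # a flat MLF approximated here as a set of predicate strings

def build_kb(sentences: List[str],
             analyse: Callable[[str], MLF]) -> List[Tuple[MLF, str]]:
    """Off-line stage: analyse every document sentence once and store its MLF."""
    return [(analyse(s), s) for s in sentences]

def best_answer(query: str,
                kb: List[Tuple[MLF, str]],
                analyse: Callable[[str], MLF]) -> str:
    """On-line stage: analyse only the query and return the closest stored sentence."""
    q = analyse(query)
    mlf, sentence = max(kb, key=lambda entry: len(q & entry[0]))
    return sentence
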
One of the most serious problems that we have encountered in processing technical documentation is the syntactic ambiguity generated by multi-word units, in particular technical terms. Any generic parser, unless developed specifically for the domain at hand, will have serious problems dealing with them. On the one hand, it is likely that they contain tokens that do not correspond to any word in the parser’s lexicon; on the other, their syntactic structure is highly ambiguous (alternative internal structures, as well as possible undesired combinations with neighbouring tokens). In fact, it is possible to show that, when all the terminology of the domain is available, a much more efficient approach is to pack the multi-word units into single lexical tokens prior to syntactic analysis [5]. In our case, such an approach brings a reduction in the complexity of parsing of almost 50%.
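
A greedy longest-match packer is one simple way to realise this step; the terminology below is an invented toy list, not the actual domain term bank:

# Sketch of packing known multi-word terms into single lexical tokens before
# parsing. Greedy longest-match; the mini-terminology below is invented.
TERMS = {("access", "panel"), ("hydraulic", "pump"), ("circuit", "breaker", "panel")}
MAX_LEN = max(len(t) for t in TERMS)

def pack_terms(tokens):
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):  # try longest match first
            if tuple(tokens[i:i + n]) in TERMS:
                out.append("_".join(tokens[i:i + n]))           # one token for the parser
                i += n
                break
        else:                                                   # no multi-word term found
            out.append(tokens[i])
            i += 1
    return out

print(pack_terms("open the circuit breaker panel near the access panel".split()))
# -> ['open', 'the', 'circuit_breaker_panel', 'near', 'the', 'access_panel']
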
During the process described above, terms are gathered into WordNet-style synsets and organized into a taxonomy. During the analysis of documents and queries, if a term belonging to a synset is identified, it is replaced by its synset identifier, which then allows retrieval using any other term in the same synset. This amounts to an implicit ‘terminological normalization’ for the domain, where the synset identifier can be taken as a reference to the ‘concept’ that each of the terms in the synset describes [10]. In this way any term contained in a user query is automatically mapped to all its variants.
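
The normalization step can be pictured as a simple lookup from any term variant to a shared synset identifier; the identifiers and variants below are invented for illustration:

# Sketch of the implicit 'terminological normalization': every variant of a
# term maps to the same synset identifier. IDs and variants are invented.
SYNSETS = {
    "syn_0042": {"access_panel", "access_door", "inspection_panel"},
    "syn_0107": {"hydraulic_pump", "hyd_pump"},
}
TERM_TO_SYNSET = {term: sid for sid, variants in SYNSETS.items() for term in variants}

def normalize(token: str) -> str:
    """Replace a recognised term by its synset identifier; pass other tokens through."""
    return TERM_TO_SYNSET.get(token, token)

assert normalize("inspection_panel") == normalize("access_door") == "syn_0042"
assert normalize("remove") == "remove"   # non-terms are left unchanged
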
When an answer cannot be located with the approach described so far, the system is capable of ‘relaxing’ the query, gradually expanding the set of acceptable answers. A first step consists of including hyponyms and hyperonyms of terms in the query. If the query extended with this ontological information fails to find an exact answer, the system returns the sentence (or set of sentences) whose MLF is semantically closest to the MLF of the question. Semantic closeness is measured here in terms of overlap of logical forms; the use of flat expressions for the MLFs allows for a quick computation of this overlap after unifying the variables of the question with those of the answer candidate. The current algorithm for approximate matching compares pairs of MLF predicates and returns 0 or 1 on the basis of whether the predicates unify or not. An alternative that is worth exploring is the use of ontological information to compute a measure based on the ontological distance between words, i.e. by exploring their shared information content [18].
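
A rough sketch of the two relaxation steps follows, with an invented mini-taxonomy and a deliberately crude stand-in for unification (two predicates count as unifying when they share the predicate name and a compatible concept); it is illustrative only, not the ExtrAns matching algorithm:

# Sketch of query relaxation and 0/1 approximate matching (illustrative only).
TAXONOMY = {"syn_pump": {"syn_hydraulic_pump", "syn_fuel_pump"}}   # parent -> hyponyms

def expand(concepts):
    """Step 1: extend the query concepts with their hyponyms and hyperonyms."""
    expanded = set(concepts)
    for parent, children in TAXONOMY.items():
        if parent in concepts:
            expanded |= children          # add hyponyms
        if children & concepts:
            expanded.add(parent)          # add the hyperonym
    return expanded

def overlap(query_mlf, candidate_mlf):
    """Step 2: approximate match, one point per query predicate that 'unifies'."""
    concepts = expand({c for _, c in query_mlf})
    return sum(1 for name, c in query_mlf
               if any(name == n2 and (c == c2 or c2 in concepts)
                      for n2, c2 in candidate_mlf))

# e.g. a query about a "pump" still matches a sentence about a "hydraulic pump":
q = [("evt", "syn_remove"), ("object", "syn_pump")]
s = [("evt", "syn_remove"), ("object", "syn_hydraulic_pump")]
assert overlap(q, s) == 2
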

Citations
Proceedings Article

Metadata creation system for mobile images

TL;DR: The main findings were that the creation process could be implemented with current technology and it facilitated the creation of semantic metadata at the time of image capture.
Journal Article

Infectious Diseases: Preparing for the Future

TL;DR: A recent Foresight project report analyzes technological and policy priorities for meeting future challenges of infectious diseases affecting humans, plants, and animals.
Book

The Foundations for Provenance on the Web

TL;DR: This monograph contends that provenance can and should reliably be tracked and exploited on the Web, and investigates the necessary foundations to achieve such a vision, as well as identifying an open approach and a model for provenance.
Book Chapter

SPADE: support for provenance auditing in distributed environments

TL;DR: The system has been designed to decouple the collection, storage, and querying of provenance metadata, with a novel provenance kernel that mediates between the producers and consumers of provenance information, and handles the persistent storage of records.
Book Chapter

Linking lexical resources and ontologies on the semantic web with lemon

TL;DR: It is shown that the adoption of Semantic Web standards can provide added value for lexicon models by supporting a rich axiomatization of linguistic categories that can be used to constrain the usage of the model and to perform consistency checks.
Frequently Asked Questions (17)
Q1. What have the authors contributed in "Breaking the deadlock"?

The authors propose an approach based on Language Processing techniques that addresses this issue, at least for textual resources (which still constitute the vast majority of the material available on the web). A semi-automatic approach is also considered, in which the system suggests candidate annotations and the user simply has to approve or reject them. Advantages and drawbacks of both approaches are discussed.

Despite the still experimental level of the current implementation, the authors are confident that the ideas described in this paper provide a powerful (and extremely useful) contribution to the future developments of the Semantic Web. The authors are certain that they will witness in the near future a deeper convergence of the Semantic Web and the Natural Language Processing communities, towards the common goal of easing the information access bottleneck to web resources.

The next step of processing involves the addition of basic linguistic information: documents are tokenized, morphologically analyzed and tagged.

The result of this phase of analysis is a representation of the propositional content of the sentences, as minimal logical forms.

In general terms, the project is concerned with organisational knowledge management; specifically, with developing an ontology-driven, systematic approach to integrating the entire process of information gathering, processing and analysis.

The annotation scheme is intended to work as the project’s lingua franca: all the modules will be required to accept as input and generate as output documents conformant to the (agreed) annotation scheme.

Some of the NIST-supported competitive evaluations (e.g. MUC) greatly benefited from the existence of scoring tools, which could automatically compare the results of each participant against a gold standard. 

Simple string-based matching might suffice in some cases of named entities; however, in more complex cases, complex pronominal resolution algorithms are needed.

The first step of processing is going to be a conversion from the source-specific document format to the agreed Parmenides format. 

Broadly speaking, structural annotations are concerned with the organization of documents into sub-units, such as sections, titles, paragraphs and sentences.

This is in fact one of the advantages of using XML: many readily available off-the-shelf tools can be used for parsing and filtering the XML annotations, according to the needs of each module. 

This conversion is based on a set of source-specific wrappers [13], which transform the original document into the XML structural annotations previously described.