
On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE

TLDR
In this article, a model supporting dynamic heterogeneous workflow process interconnection is proposed to co-ordinate geographically distributed business processes, in order to strengthen awareness inside virtual enterprises, to facilitate multinational e-transactions, etc.
Abstract
Process interconnection mechanisms are necessary to co-ordinate geographically distributed business processes, in order to strengthen awareness inside virtual enterprises, to facilitate multinational e-transactions, etc. At present, existing business process modelling and enactment systems (workflow systems, project management tools, shared agendas, to-do lists, etc.) have mainly been developed to suit the internal needs of enterprises; thus, most of these systems are not adapted to inter-enterprise co-operation. As we are interested in workflow processes, we aim, through this paper, to provide a model supporting dynamic heterogeneous workflow process interconnection. We consider the interconnection of enterprise workflow processes as the management of a workflow of workflows in which several heterogeneous workflow systems coexist. This paper introduces our process interconnection model, its implementation, and its validation through experimentation.

Zurich Open Repository and Archive
University of Zurich
University Library
Strickhofstrasse 39
CH-8057 Zurich
www.zora.uzh.ch
Year: 2003

Breaking the deadlock
Rinaldi, Fabio; Kaljurand, K; Dowdall, J; Hess, M
DOI: https://doi.org/10.1007/b94348
Posted at the Zurich Open Repository and Archive, University of Zurich
ZORA URL: https://doi.org/10.5167/uzh-19102
Conference or Workshop Item
Originally published at:
Rinaldi, Fabio; Kaljurand, K; Dowdall, J; Hess, M (2003). Breaking the deadlock. In: ODBASE 2003 (International Conference on Ontologies, Databases and Applications of SEmantics), Catania, Italy, 2003, 876-888.
DOI: https://doi.org/10.1007/b94348

Breaking the Deadlock
Fabio Rinaldi, Kaarel Kaljurand, James Dowdall, and Michael Hess
Institute of Computational Linguistics,
University of Zürich,
Winterthurerstrasse 190, CH-8057 Zürich,
Switzerland
{rinaldi}@cl.unizh.ch
Abstract. Many of the proposed approaches to the semantic web have a substantial drawback. They are all based on the idea that web pages (or, more generally, resources) will contain semantic annotations that would allow remote agents to access them. However, the problem of creating these annotations is seldom addressed. Manual creation of the annotations is not a feasible option, except in a few experimental cases.
We propose an approach based on Language Processing techniques that addresses this issue, at least for textual resources (which still constitute the vast majority of the material available on the web). Documents are analyzed fully automatically and converted into a semantic annotation, which can then be stored together with the original documents. It is this annotation that constitutes the machine-understandable resource that remote agents can query. A semi-automatic approach is also considered, in which the system suggests candidate annotations and the user simply has to approve or reject them. Advantages and drawbacks of both approaches are discussed.
1 Introduction
The major purpose of activities in the Semantic Web area is to help users better locate,
organize, and process content, irrespective of its physical location and of the way it is
presented. Adding machine-understandable semantics to web resources will make them
processable by software agents, and ultimately make them more useful to all of us.
There is a wealth of research efforts focusing on the foundations of the semantic web [8], and in particular on the problem of how to represent the semantic information carried by web resources (be they structured databases, unstructured natural language documents, or a combination of both). The XML-based Resource Description Framework [14] is the standardized Semantic Web language; however, it is really meant for use by computers, not humans. The same applies to all the extensions that have been proposed, such as RDF Schema [2], which provides a basic type system for use in RDF models, or DAML+OIL [4], which provides a language with well-defined semantics for the specification of ontologies.
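
To make the flavour of such machine-oriented annotations concrete, the following is a minimal sketch of a few RDF statements written in Python, assuming the rdflib library is available; the "ex" vocabulary, the resource identifiers, and the properties are invented for illustration and are not taken from any of the standards cited above.

# Hypothetical example: a few RDF triples describing a web resource.
# The ex: vocabulary and all identifiers are invented for illustration.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/vocab#")

g = Graph()
g.bind("ex", EX)

doc = URIRef("http://example.org/docs/maintenance-manual-ch12")
g.add((doc, RDF.type, EX.TechnicalDocument))
g.add((doc, RDFS.label, Literal("Maintenance manual, chapter 12")))
g.add((doc, EX.describes, EX.HydraulicPump))

# Serialize to Turtle so the annotation can be stored next to the document.
print(g.serialize(format="turtle"))
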
However, there seems to be significantly less interest in the problem of how to help users in the transition from conventional web pages to richly annotated semantic web resources. The major barrier to a wider adoption of the Semantic Web proposals is a classic deadlock problem [11]. On the one hand, significant additional effort is required to add semantic annotations to existing (or newly created) web resources, and people are not willing to pay this price until they can see a clear benefit from it. On the other hand, software agents that can reap the benefit of richer annotations will not be useful (and thus there will be fewer incentives to develop them) until a “critical mass” of semantically annotated web resources has been achieved.
Current efforts to tackle this problem seem to focus on the development of user-friendly editors for semantic annotations: details of XML/RDF should be hidden behind GUI authoring tools. Users do not need (and do not want) to get in contact with XML/RDF. However, this approach defeats the purpose of the Semantic Web vision: to make the web more effective for users by making it machine-understandable. Instead, it makes the web less effective for users, by forcing them to add machine-level markup (albeit shielded by an effective GUI). Unless the users can see a real benefit, they will not be motivated to adopt such editors and be prepared to pay the price (in terms of the additional effort that might be required).
The benefits of the semantic web should come for free to most of the users: semantic markup should be a by-product of normal computer use. There is a real need to lower the barrier of entry: the vast majority of users cannot be expected to understand and use formal ontologies. In order to achieve interoperability between software agents, a lot of human understandability has been sacrificed: precise ontologies and formally defined semantics are foreign concepts to the average user.
As a very large proportion of existing web resources are represented by human-readable documentation, we believe that a possible way to break the deadlock mentioned above is to start using available information extraction tools to enrich the documents with automatically generated annotations. In this paper we propose an approach based on natural language processing (NLP) techniques, geared towards the creation of semantic annotations, starting from the available textual documents.
One of the motivations behind the semantic web movement was that computers are not powerful enough to process (and understand) natural language. Therefore, machine-understandable information should be added to web resources. This is still true: it would be unfeasible to process the enormous amounts of textual resources that are added to the web every day (let alone process all the existing web content). However, it is technically possible (and practically conceivable) to have specialised editors that process (in a transparent fashion) textual resources as the users publish them on the web, and add semantic annotations automatically extracted from the documents. In other words, the idea is to move the problem from the consumer of the information to the producer.
As Natural Language is the means of information access most users are comfortable with, we will also discuss possible ways to access the information encoded in the semantic annotations. Given a user question phrased in natural language, existing tools can convert it into the same kind of annotations as those stored in the documents. A new type of software agent (or search engine) might then be capable of retrieving those web pages whose annotations match those derived from the user question.
The approach presented in this paper is based on our previous work in the area of Question Answering, resulting in the ExtrAns system [22]. Specific research in the area of Question Answering has been promoted in the last few years in particular by the Question Answering track of the Text REtrieval Conference (TREC-QA) competitions [26]. ExtrAns uses a combination of robust natural language processing technology and dedicated terminology processing [19, 20] to create a Knowledge Base, containing a semantic representation for the propositional content of the documents [23]. Our research group has been working in the area of Question Answering for a few years, targeting different domains, such as the Aircraft Maintenance Manual (AMM) of a large aircraft [22] or a computer manual [15].

[Fig. 1. Offline analysis of documents: Document → Term Processing (Thesaurus) / Linguistic Processing → MLF Generator → Document KB.]
In a recently started EU project (“Parmenides”), focusing on the integration of Information Extraction and Data Mining techniques, we aim at exploiting the work done in the ExtrAns system by moving from the system-specific semantic representation (Minimal Logical Forms) to a semantic representation based on W3C standards, like RDF. A secondary aim might be to explore possible synergies with the standardization effort of the ISO TC37/SC4 committee in the domain of linguistic annotations [21].
We will first briefly describe our past work resulting in the ExtrAns system (section 2), then describe the annotations that we aim at generating automatically in the Parmenides project (section 3). The following section (4) will describe in detail the approach that we propose in order to automatically create semantic annotations for textual web resources. Finally, in section (5) we explore advantages and disadvantages of the proposed methodologies, and describe our current work and suggestions for future development.
2 ExtrAns
In this section we briefly describe the linguistic processing performed in the ExtrAns system; extended details can be found in [22]. An initial phase of syntactic analysis, based on the Link Grammar parser [24], is followed by a transformation of the dependency-based syntactic structures generated by the parser into a semantic representation based on Minimal Logical Forms, or MLFs [15]. As the name suggests, the MLF of a sentence does not attempt to encode the full semantics of the sentence. Currently the MLFs encode the semantic dependencies between the open-class words of the sentences (nouns, verbs, adjectives, and adverbs) plus prepositional phrases. The notation used has been designed to incrementally incorporate additional information if needed. Thus, other modules of the NLP system can add new information without having to remove old information.
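
As an illustration only (the predicate names below are simplified and do not reproduce the exact ExtrAns notation), a flat MLF for a short sentence can be sketched in Python as a list of predicates that share variables:

# Simplified sketch of a flat Minimal Logical Form for the sentence
#   "The operator removes the access panel."
# Predicate names and structure are illustrative, not the exact ExtrAns notation.
mlf = [
    ("object", "operator", "o1", ["x1"]),      # entity x1, described as an operator
    ("object", "access_panel", "o2", ["x2"]),  # entity x2, a packed multi-word term
    ("evt", "remove", "e1", ["x1", "x2"]),     # event e1 relating x1 (agent) and x2 (patient)
]

# Because the representation is flat, later modules can add information
# monotonically, without rewriting the predicates already produced.
mlf.append(("tense", "present", "e1"))
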
We have chosen a computationally intensive approach, which allows a deeper linguistic analysis to be performed, at the cost of higher processing time. Such costs are negligible in the case of a single sentence (like a user query) but rapidly become impractical in the case of the analysis of a large document set. The approach we take is to analyse all the documents in an off-line stage (see figure 1) and store a representation of their contents (the MLFs) in a Knowledge Base. In an on-line phase, the MLF which results from the analysis of the user query is matched in the KB against the stored representations, locating those MLFs that best answer the query. At this point the system can locate in the original documents the sentences from which the MLFs were generated (see figure 2).
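
The split between the two stages can be sketched as follows; here `analyse` stands in for the whole parsing-and-MLF-generation pipeline, and the overlap-based ranking is a placeholder rather than the actual ExtrAns matcher:

# Sketch of the off-line/on-line split described above (not the ExtrAns code).
from typing import Callable, List, Tuple

MLF = frozenset  # a flat MLF approximated here as a set of predicate strings

def build_kb(sentences: List[str],
             analyse: Callable[[str], MLF]) -> List[Tuple[MLF, str]]:
    """Off-line stage: analyse every document sentence once and store its MLF."""
    return [(analyse(s), s) for s in sentences]

def best_answer(query: str,
                kb: List[Tuple[MLF, str]],
                analyse: Callable[[str], MLF]) -> str:
    """On-line stage: analyse only the query and return the closest stored sentence."""
    q = analyse(query)
    mlf, sentence = max(kb, key=lambda entry: len(q & entry[0]))
    return sentence
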
One of the most serious problems that we have encountered in processing technical documentation is the syntactic ambiguity generated by multi-word units, in particular technical terms. Any generic parser, unless developed specifically for the domain at hand, will have serious problems dealing with them. On the one hand, it is likely that they contain tokens that do not correspond to any word in the parser’s lexicon; on the other, their syntactic structure is highly ambiguous (alternative internal structures, as well as possible undesired combinations with neighbouring tokens). In fact, it is possible to show that, when all the terminology of the domain is available, a much more efficient approach is to pack the multi-word units into single lexical tokens prior to syntactic analysis [5]. In our case, such an approach brings a reduction in the complexity of parsing of almost 50%.
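
A greedy longest-match packer is one simple way to realise this step; the terminology below is an invented toy list, not the actual domain term bank:

# Sketch of packing known multi-word terms into single lexical tokens before
# parsing. Greedy longest-match; the mini-terminology below is invented.
TERMS = {("access", "panel"), ("hydraulic", "pump"), ("circuit", "breaker", "panel")}
MAX_LEN = max(len(t) for t in TERMS)

def pack_terms(tokens):
    out, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):  # try longest match first
            if tuple(tokens[i:i + n]) in TERMS:
                out.append("_".join(tokens[i:i + n]))           # one token for the parser
                i += n
                break
        else:                                                   # no multi-word term found
            out.append(tokens[i])
            i += 1
    return out

print(pack_terms("open the circuit breaker panel near the access panel".split()))
# -> ['open', 'the', 'circuit_breaker_panel', 'near', 'the', 'access_panel']
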
During the process described above, terms are gathered into WordNet-style synsets and organized into a taxonomy. During the analysis of documents and queries, if a term belonging to a synset is identified, it is replaced by its synset identifier, which then allows retrieval using any other term in the same synset. This amounts to an implicit ‘terminological normalization’ for the domain, where the synset identifier can be taken as a reference to the ‘concept’ that each of the terms in the synset describes [10]. In this way any term contained in a user query is automatically mapped to all its variants.
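
The normalization step can be pictured as a simple lookup from any term variant to a shared synset identifier; the identifiers and variants below are invented for illustration:

# Sketch of the implicit 'terminological normalization': every variant of a
# term maps to the same synset identifier. IDs and variants are invented.
SYNSETS = {
    "syn_0042": {"access_panel", "access_door", "inspection_panel"},
    "syn_0107": {"hydraulic_pump", "hyd_pump"},
}
TERM_TO_SYNSET = {term: sid for sid, variants in SYNSETS.items() for term in variants}

def normalize(token: str) -> str:
    """Replace a recognised term by its synset identifier; pass other tokens through."""
    return TERM_TO_SYNSET.get(token, token)

assert normalize("inspection_panel") == normalize("access_door") == "syn_0042"
assert normalize("remove") == "remove"   # non-terms are left unchanged
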
When an answer cannot be located with the approach described so far, the system is capable of ‘relaxing’ the query, gradually expanding the set of acceptable answers. A first step consists of including hyponyms and hyperonyms of terms in the query. If the query extended with this ontological information fails to find an exact answer, the system returns the sentence (or set of sentences) whose MLF is semantically closest to the MLF of the question. Semantic closeness is measured here in terms of overlap of logical forms; the use of flat expressions for the MLFs allows for a quick computation of this overlap after unifying the variables of the question with those of the answer candidate. The current algorithm for approximate matching compares pairs of MLF predicates and returns 0 or 1 on the basis of whether the predicates unify or not. An alternative that is worth exploring is the use of ontological information to compute a measure based on the ontological distance between words, i.e. by exploring their shared information content [18].
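
A rough sketch of the two relaxation steps follows, with an invented mini-taxonomy and a deliberately crude stand-in for unification (two predicates count as unifying when they share the predicate name and a compatible concept); it is illustrative only, not the ExtrAns matching algorithm:

# Sketch of query relaxation and 0/1 approximate matching (illustrative only).
TAXONOMY = {"syn_pump": {"syn_hydraulic_pump", "syn_fuel_pump"}}   # parent -> hyponyms

def expand(concepts):
    """Step 1: extend the query concepts with their hyponyms and hyperonyms."""
    expanded = set(concepts)
    for parent, children in TAXONOMY.items():
        if parent in concepts:
            expanded |= children          # add hyponyms
        if children & concepts:
            expanded.add(parent)          # add the hyperonym
    return expanded

def overlap(query_mlf, candidate_mlf):
    """Step 2: approximate match, one point per query predicate that 'unifies'."""
    concepts = expand({c for _, c in query_mlf})
    return sum(1 for name, c in query_mlf
               if any(name == n2 and (c == c2 or c2 in concepts)
                      for n2, c2 in candidate_mlf))

# e.g. a query about a "pump" still matches a sentence about a "hydraulic pump":
q = [("evt", "syn_remove"), ("object", "syn_pump")]
s = [("evt", "syn_remove"), ("object", "syn_hydraulic_pump")]
assert overlap(q, s) == 2
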

Citations
Proceedings Article

Metadata creation system for mobile images

TL;DR: The main findings were that the creation process could be implemented with current technology and it facilitated the creation of semantic metadata at the time of image capture.
Journal Article

Infectious Diseases: Preparing for the Future

TL;DR: A recent Foresight project report analyzes technological and policy priorities for meeting future challenges of infectious diseases affecting humans, plants, and animals.
Book

The Foundations for Provenance on the Web

TL;DR: This monograph contends that provenance can and should reliably be tracked and exploited on the Web, and investigates the necessary foundations to achieve such a vision, as well as identifying an open approach and a model for provenance.
Book Chapter

SPADE: support for provenance auditing in distributed environments

TL;DR: The system has been designed to decouple the collection, storage, and querying of provenance metadata, with a novel provenance kernel that mediates between the producers and consumers of provenance information, and handles the persistent storage of records.
Book Chapter

Linking lexical resources and ontologies on the semantic web with lemon

TL;DR: It is shown that the adoption of Semantic Web standards can provide added value for lexicon models by supporting a rich axiomatization of linguistic categories that can be used to constrain the usage of the model and to perform consistency checks.
Frequently Asked Questions (17)
Q1. What have the authors contributed in "Breaking the deadlock"?

The authors propose an approach based on Language Processing techniques that addresses this issue, at least for textual resources (which still constitute the vast majority of the material available on the web). A semi-automatic approach is also considered, in which the system suggests candidate annotations and the user simply has to approve or reject them. Advantages and drawbacks of both approaches are discussed.

Despite the still experimental level of the current implementation, the authors are confident that the ideas described in this paper provide a powerful (and extremely useful) contribution to the future developments of the Semantic Web. The authors are certain that they will witness in the near future a deeper convergence of the Semantic Web and the Natural Language Processing communities, towards the common goal of easing the information access bottleneck to web resources.

The next step of processing involves the addition of basic linguistic information: documents are tokenized, morphologically analyzed and tagged.

The result of this phase of analysis is a representation of the propositional content of the sentences, as minimal logical forms.

In general terms, the project is concerned with organisational knowledge management; specifically, with developing an ontology-driven, systematic approach to integrating the entire process of information gathering, processing and analysis.

The annotation scheme is intended to work as the project’s lingua franca: all the modules will be required to accept as input and generate as output documents conformant to the (agreed) annotation scheme.

Some of the NIST-supported competitive evaluations (e.g. MUC) greatly benefited from the existence of scoring tools, which could automatically compare the results of each participant against a gold standard. 

Simple string-based matching might suffice in some cases of named entities; however, in more complex cases, complex pronominal resolution algorithms are needed.

The first step of processing is going to be a conversion from the source-specific document format to the agreed Parmenides format. 

Broadly speaking, structural annotations are concerned with the organization of documents into sub-units, such as sections, titles, paragraphs and sentences.

This is in fact one of the advantages of using XML: many readily available off-the-shelf tools can be used for parsing and filtering the XML annotations, according to the needs of each module. 

This conversion is based on a set of source-specific wrappers [13], which transform the original document into the XML structural annotations previously described.