
Ontology-driven document enrichment



Open Research Online
The Open University’s repository of research publications
and other research outputs
Ontology-driven document enrichment: principles,
tools and applications
Journal Item
How to cite:
Motta, Enrico; Buckingham Shum, Simon and Domingue, John (2000). Ontology-driven document enrichment:
principles, tools and applications. International Journal of Human-Computer Studies, 52(6) pp. 1071–1109.
© 2000 Academic Press
Version: Accepted Manuscript
Link(s) to article on publisher’s website:
http://dx.doi.org/doi:10.1006/ijhc.2000.0384
Copyright and Moral Rights for the articles on this site are retained by the individual authors and/or other copyright
owners. For more information on Open Research Online’s data policy on reuse of materials please consult the policies
page.
oro.open.ac.uk

To appear in the International Journal of Human-Computer Studies
Ontology-Driven Document Enrichment:
Principles, Tools and Applications
Enrico Motta, Simon Buckingham Shum and John Domingue
Knowledge Media Institute
The Open University
Walton Hall, MK7 6AA
Milton Keynes, UK
{e.motta, s.buckingham.shum, j.b.domingue}@open.ac.uk
Abstract. In this paper we present an approach to document enrichment, which
consists of developing and integrating formal knowledge models with archives of
documents, to provide intelligent knowledge retrieval and (possibly) additional
knowledge-intensive services, beyond what is currently available using 'standard'
information retrieval and search facilities. Our approach is ontology-driven, in the
sense that the construction of the knowledge model is carried out in a top-down
fashion, by populating a given ontology, rather than in a bottom-up fashion, by
annotating a particular document. In the paper we give an overview of the approach
and we examine the various types of issues (e.g., modelling, organizational and user
interface issues) which need to be tackled to effectively deploy our approach in the
workplace. We also discuss a number of technologies we have developed
to support ontology-driven document enrichment and we illustrate our ideas in the
domains of electronic news publishing, scholarly discourse and medical guidelines.
1. INTRODUCTION
An important activity in knowledge management is "to convert text to knowledge" (O’Leary,
1998). This activity is central to knowledge management for two reasons: i) work practices and
information flow in organizations tend to be document-centred and ii) documents themselves do
not normally exhibit the amount of structure required to support semantically-aware search
engines or other forms of intelligent services. For these reasons there has been much interest in
technology to support the specification of structured information in textual documents, especially
web pages. The web standardisation community has focused on the underlying representational
infrastructure: XML (XML, 1999) has been proposed as the basic annotation formalism to
support the specification of structured information in web pages, while RDF builds on the XML
syntax to provide a standard declarative representation, which allows users to express semantic
relationships between items on the Web. Approaches such as Ontobroker (Fensel et al., 1998)
and SHOE (Heflin et al., 1998) provide formalisms and associated interpreters which make it
possible to embed knowledge representation structures in web pages and use them to perform
inferences.
In this paper we look at the wider issues concerning "the conversion of text to knowledge" and
discuss a comprehensive approach to document enrichment (Sumner et al., 1998), which we are
trying out in a number of projects here at the Knowledge Media Institute. The approach is
characterized in terms of a set of activities, with associated informal guidelines. In the paper we
also describe a number of technologies, which we have developed to support our approach to
document-centred knowledge management. These technologies include a knowledge modelling
language¹, form-based interfaces for adding and retrieving knowledge from a model, and a web-based
browser/editor, which supports the collaborative development of knowledge models over
the World-Wide-Web. Finally, we discuss the application of our approach to three domains:
electronic news publishing (Domingue and Motta, 1999), scholarly discourse (Buckingham
Shum et al., 1999) and medical guidelines (PatMan, 1998).
The paper is organized as follows: in the next section we give an overview of our approach, in
terms of the underlying methodological assumptions and the associated process model. In
section 3 we describe the technology we have developed to support the approach. In sections 4,
5 and 6 we discuss the application of the approach to the three aforementioned domains. Finally,
in sections 7 and 8 we discuss related work and reiterate the main contributions of this paper.
2. ONTOLOGY-DRIVEN DOCUMENT ENRICHMENT
Our approach is ontology-driven, in the sense that the construction of the knowledge model is
carried out in a top-down fashion, by populating a given ontology (Gruber, 1993), rather than in a
bottom-up fashion, by annotating a particular document. Figure 1 underlines this point
graphically, by emphasizing that the construction of a knowledge model is driven by a pre-
existing ontology, a set of documents and other sources of knowledge, such as appropriate
(human) experts. Following Gruber, we use the term “ontology” to indicate “a specification of a
reusable conceptualization”. More simply, an ontology can be seen as providing a vocabulary for
describing a range of models. For instance, an ontology for medical guidelines provides a
generic set of concepts and relations (e.g., medical condition, diagnostic guideline, guideline user
type), which can then be instantiated for particular guidelines to build guideline-specific models,
in domains such as stroke management or the prevention of pressure ulcers.
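The relationship between a generic ontology and a guideline-specific model can be illustrated with a small Python sketch. This is only a minimal illustration under stated assumptions, not the knowledge modelling language discussed in this paper: the names OntologyClass and Instance, the slot names, and the value "clinician" are hypothetical and chosen only to mirror the concepts mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass(frozen=True)
class OntologyClass:
    """A generic concept in the ontology, with the slots its instances may fill."""
    name: str
    slots: Tuple[str, ...]


@dataclass
class Instance:
    """A domain-specific individual created by populating an ontology class."""
    of: OntologyClass
    values: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Only slots declared in the ontology may be used: the ontology
        # supplies the vocabulary, the model supplies the content.
        unknown = set(self.values) - set(self.of.slots)
        if unknown:
            raise ValueError(f"slots not in ontology: {sorted(unknown)}")


# Generic guideline ontology (hypothetical slot names).
MEDICAL_CONDITION = OntologyClass("medical-condition", ("label",))
DIAGNOSTIC_GUIDELINE = OntologyClass(
    "diagnostic-guideline", ("addresses-condition", "guideline-user-type")
)

# Guideline-specific model for one domain, built by instantiation.
stroke = Instance(MEDICAL_CONDITION, {"label": "stroke"})
stroke_guideline = Instance(
    DIAGNOSTIC_GUIDELINE,
    {"addresses-condition": "stroke", "guideline-user-type": "clinician"},
)
```

In this sketch, a second model (for instance, for pressure ulcer prevention) would reuse the same two ontology classes and differ only in the values supplied.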
An ontology-driven approach to model construction affords several advantages. Instantiating an
ontology is usually simpler and speedier than developing a model from scratch. In addition,
because an ontology makes explicit the conceptualization underlying a particular model, it
becomes easier to maintain and reuse the model, and to make it interoperate with other
components. Finally, reasoning modules can be associated with an ontology and these are then
applicable to all models built by instantiating the ontology in question. For instance, in the case
of medical guidelines, one can envisage building ontology-specific guideline verification tools,
which can then be used to verify individual guidelines developed by instantiating the same
generic guideline ontology.
¹ Here we use the term “knowledge modelling” as a short form for “knowledge-level modelling”, an expression
introduced by Allen Newell (1982) to describe models of knowledge-intensive behaviour which abstract from the
way this behaviour is implemented and focus instead on the knowledge employed by an agent and the goals the
agent is trying to achieve.
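The idea of a verification tool written once against the ontology and reused for every model can be sketched as follows. This is a hedged illustration only: the dictionary layout, the function missing_slots, and the example values are assumptions made for the sketch, and a completeness check is just one simple kind of verification.

```python
from typing import Dict, List

# Hypothetical ontology: each class lists the slots a complete instance must fill.
GUIDELINE_ONTOLOGY: Dict[str, List[str]] = {
    "diagnostic-guideline": ["medical-condition", "guideline-user-type"],
}


def missing_slots(
    ontology: Dict[str, List[str]], class_name: str, instance: Dict[str, str]
) -> List[str]:
    """Return the required slots the instance leaves unfilled, in ontology order."""
    return [slot for slot in ontology[class_name] if not instance.get(slot)]


# Two models built from the same ontology; the same check applies to both.
stroke_model = {"medical-condition": "stroke", "guideline-user-type": "clinician"}
pressure_ulcer_model = {"medical-condition": "pressure ulcer"}

print(missing_slots(GUIDELINE_ONTOLOGY, "diagnostic-guideline", stroke_model))
# prints []
print(missing_slots(GUIDELINE_ONTOLOGY, "diagnostic-guideline", pressure_ulcer_model))
# prints ['guideline-user-type']
```

The check refers only to the ontology, never to a particular guideline, which is why it carries over unchanged to any model that instantiates the same ontology.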
Figure 1. Ontology-driven Document Enrichment
Because our model construction process is ontology-driven, we prefer to use the term
"enrichment" (Sumner et al., 1998), rather than "conversion" or "annotation", to refer to the
process of associating a formal model with a document (or set of documents). In general, a
representation, whether formal, graphical or textual, can be enriched in several different ways -
e.g., i) by providing information about the context in which it was created, ii) by linking it to
related artefacts of the same nature, or iii) by linking it to related artefacts of a different nature.
Although in our document-centred knowledge management work we provide multiple forms of
document enrichment, such as associating discussion spaces with documents (Sumner and
Buckingham Shum, 1998), in this paper we will primarily concentrate on the association of
formal knowledge models with documents².
Thus, an important facet of an ontology-centred approach to document enrichment is that the
formalised knowledge is not meant to be a translation of what is informally specified in the
associated document. Hence the knowledge model typically plays a different role from the
associated text. For instance, in the medical guideline scenario the knowledge model helps to
verify that all the kinds of knowledge expected to be found in a document describing a medical
guideline are indeed there. In the scholarly discourse scenario the knowledge model is meant to
capture the meta-knowledge required to structure academic debates (e.g., theory X contradicts
theory Y), which is often expressed only implicitly in publications (i.e., acquiring it typically
requires some interpretation effort) and is not modelled at all in traditional libraries. In a
nutshell, the emphasis in our approach is on identifying the added value (in terms of enabling
semantic retrieval and document indexing capabilities, or other reasoning services), which can be
provided by a formalised knowledge model. Our methodology comprises the following six steps.
1. Identify use scenario.
2. Characterize viewpoint for ontology.
3. Develop the ontology.
4. Perform ontology-driven model construction.
5. Customise query interface for semantic knowledge retrieval.
6. Develop additional reasoning services on top of knowledge model.
These steps are briefly described in the next sub-sections.
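As a rough illustration of what a populated knowledge model and a semantic retrieval query (steps 4 and 5) might look like in the scholarly discourse scenario mentioned above, consider the following Python sketch. All names here are hypothetical: the Claim record, the "supports" relation, the identifiers, and the documents_where function are assumptions introduced for the example, not part of the tools described in this paper.

```python
from typing import List, NamedTuple


class Claim(NamedTuple):
    """A typed relation between two items of scholarly discourse."""
    source: str    # e.g. a theory put forward in a publication
    relation: str  # e.g. "contradicts"
    target: str
    document: str  # the document that this statement enriches


# Hypothetical knowledge model: statements entered by interpreting documents.
MODEL: List[Claim] = [
    Claim("theory-X", "contradicts", "theory-Y", "paper-1"),
    Claim("theory-Z", "supports", "theory-Y", "paper-2"),
    Claim("theory-W", "contradicts", "theory-Y", "paper-3"),
]


def documents_where(model: List[Claim], relation: str, target: str) -> List[str]:
    """Return documents whose enrichment relates something to `target` via `relation`."""
    return [c.document for c in model if c.relation == relation and c.target == target]


print(documents_where(MODEL, "contradicts", "theory-Y"))
# prints ['paper-1', 'paper-3']
```

A keyword search could not answer this query reliably, since the contradiction is typically only implicit in the text; here it is retrievable because it was stated explicitly when the model was populated.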
2.1 Identify Use Scenario
At this stage the services to be delivered by the knowledge management system are defined. In
particular, issues of feasibility and cost are investigated. Addressing the latter involves
answering questions such as: “What is the added value provided by the knowledge model,
considering the non-trivial costs associated with the development and instantiation of an
ontology?”, “Is there a need for a ‘full-blown’ knowledge model and for going beyond the
facilities provided by off-the-shelf search engines?”, “What additional reasoning services will be
provided, beyond deductive knowledge retrieval?”. Addressing feasibility issues requires
assessing (among other things) whether it is realistic to expect the target user community
to perform document enrichment or whether specialized human editors will be needed. The
latter solution introduces a significant bottleneck in the process and moreover assumes that
introducing a central editor into model development is actually feasible. This is definitely not the
case in some of our application domains. For instance, in the scholarly discourse scenario our
² Having said that, the medical guideline scenario described in section 6 does integrate a formal model with a set of
discussion spaces, to provide multiple forms of document enrichment.

References
Journal article: A translation approach to portable ontology specifications
TL;DR: This paper describes a mechanism for defining ontologies that are portable over representation systems, basing Ontolingua itself on an ontology of domain-independent, representational idioms.
Journal article: Ontologies: principles, methods and applications
TL;DR: This paper outlines a methodology for developing and evaluating ontologies, first discussing informal techniques concerning such issues as scoping, handling ambiguity, reaching agreement and producing definitions, and then considering a more formal approach.
Proceedings article: Letizia: an agent that assists web browsing
TL;DR: Letizia is a user interface agent that assists a user browsing the World Wide Web by automating a browsing strategy consisting of a best-first search augmented by heuristics inferring user interest from browsing behavior.
Book: Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project
Frequently Asked Questions (11)
Q1. What have the authors contributed in "Ontology-driven document enrichment: principles, tools and applications" ?

In this paper the authors present an approach to document enrichment, which consists of developing and integrating formal knowledge models with archives of documents, to provide intelligent knowledge retrieval and (possibly) additional knowledge-intensive services, beyond what is currently available using 'standard' information retrieval and search facilities. Their approach is ontology-driven, in the sense that the construction of the knowledge model is carried out in a top-down fashion, by populating a given ontology, rather than in a bottom-up fashion, by annotating a particular document. In the paper the authors give an overview of the approach and they examine the various types of issues (e.g., modelling, organizational and user interface issues) which need to be tackled to effectively deploy their approach in the workplace. In addition the authors also discuss a number of technologies they have developed to support ontology-driven document enrichment and they illustrate their ideas in the domains of electronic news publishing, scholarly discourse and medical guidelines.

Because their model construction process is ontology-driven, the authors prefer to use the term "enrichment" (Sumner et al., 1998), rather than "conversion" or "annotation", to refer to the process of associating a formal model to a document (or set of documents). 

For instance, in the case of electronic publishing, the ontology is used to enrich news items, which are submitted either through email or through a web-based form. 

In the scholarly discourse scenario the knowledge model is meant to capture the meta-knowledge required to structure academic debates (e.g., theory X contradicts theory Y), which is often expressed only implicitly in publications (i.e., acquiring it typically requires some interpretation effort) and is not modelled at all in traditional libraries. 

The design of the ontology was based on the analysis of scholarly articles from a range of different fields, and took about two person weeks’ effort. 

These technologies include a knowledge modelling language, form-based interfaces for adding and retrieving knowledge from a model, and a web-based browser/editor, which supports the collaborative development of knowledge models over the World-Wide-Web.

In addition to the need for better search and retrieval facilities, the experience of a day-to-day use of Planet over more than two years has highlighted a number of other issues. 

A key to successful knowledge management is to integrate these different media to provide the appropriate services in the relevant scenarios.

It might appear paradoxical to propose the use of ontologies to support scholarly communities in managing their knowledge, since conflicting worldviews, evidence and frames of reference lie at the heart of research and debate. 

i) given that WebOnto has been designed to be as easy to use as possible and ii) in many cases end users want to inspect an ontology directly (for instance, to gain a better understanding of the underlying organization), the authors usually include pointers to WebOnto in their application interfaces. 

Because an ontology makes explicit the conceptualization underlying a particular model, it becomes easier to maintain, reuse and interoperate the model with other components.