Showing papers presented at "Human Language Technology in 2001"

Proceedings Article•DOI•

Identification of relevant terms to support the construction of domain ontologies

Paola Velardi¹, Michele Missikoff, Roberto Basili²•Institutions (2)

Sapienza University of Rome¹, University of Rome Tor Vergata²

06 Jul 2001

TL;DR: A text mining technique to aid an Ontology Engineer to identify the important concepts in a Domain Ontology is described.

Abstract: Though the utility of domain Ontologies is now widely acknowledged in the IT (Information Technology) community, several barriers must be overcome before Ontologies become practical and useful tools. One important achievement would be to reduce the cost of identifying and manually entering several thousand concept descriptions. This paper describes a text mining technique to help an Ontology Engineer identify the important concepts in a Domain Ontology.

96 citations

Proceedings Article•DOI•

The form is the substance: classification of genres in text

Nigel Dewdney, Carol VanEss-Dykema, Richard MacMillan¹•Institutions (1)

Mitre Corporation¹

06 Jul 2001

TL;DR: This work investigates text classification by format style, i.e. "genre", and demonstrates, by complementing topic classification, that it can significantly improve retrieval of information.

Abstract: Categorization of text in IR has traditionally focused on topic. As use of the Internet and e-mail increases, categorization has become a key area of research as users demand methods of prioritizing documents. This work investigates text classification by format style, i.e. "genre", and demonstrates, by complementing topic classification, that it can significantly improve retrieval of information. The paper compares use of presentation features to word features, and the combination thereof, using Naive Bayes, C4.5 and SVM classifiers. Results show use of combined feature sets with SVM yields 92% classification accuracy in sorting seven genres.

95 citations

Proceedings Article•DOI•

Component-based multimodal dialog interfaces for mobile knowledge creation

Georg Niklfeld, Robert Finan, Michael Pucher

06 Jul 2001

TL;DR: It is argued that multimodal dialog systems and the naturalized mobile access to company data they offer will trigger a new knowledge management practice of importance for knowledge-intensive companies.

Abstract: This paper addresses two related topics: Firstly, it presents building-blocks for flexible multimodal dialog interfaces based on standardized components (VoiceXML, XML) to indicate that thanks to well-supported standardizations, mobile multimodal interfaces to heterogeneous data sources are becoming ready for mass-market deployment, provided that adequate modularization is respected. Secondly, this is put in the perspective of a discussion of knowledge management in firms, and the paper argues that multimodal dialog systems and the naturalized mobile access to company data they offer will trigger a new knowledge management practice of importance for knowledge-intensive companies.

32 citations

Proceedings Article•DOI•

The automatic generation of formal annotations in a multimedia indexing and searching environment

Thierry Declerck, Peter Wittenburg, Hamish Cunningham¹•Institutions (1)

University of Sheffield¹

06 Jul 2001

TL;DR: The MUMIS Project (Multimedia Indexing and Searching Environment) is described, concerned with the development and integration of base technologies, demonstrated within a laboratory prototype, to support automated multimedia indexing and to facilitate search and retrieval from multimedia databases.

Abstract: We describe in this paper the MUMIS Project (Multimedia Indexing and Searching Environment), which is concerned with the development and integration of base technologies, demonstrated within a laboratory prototype, to support automated multimedia indexing and to facilitate search and retrieval from multimedia databases. We stress the role that linguistically motivated annotations, coupled with domain-specific information, can play within this environment. The project will demonstrate that innovative technology components can operate on multilingual, multisource, and multimedia information and create a meaningful and queryable database.

25 citations

Proceedings Article•DOI•

Semi-automatic practical ontology construction by using a thesaurus, computational dictionaries, and large corpora

Sin-Jae Kang¹, Jong-Hyeok Lee¹•Institutions (1)

Pohang University of Science and Technology¹

06 Jul 2001

TL;DR: The Kadokawa thesaurus is extended by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations, which enables a natural language processing system to resolve semantic ambiguities by making inferences with the concept network of the ontology.

Abstract: This paper presents a method for semi-automatically constructing a practical ontology from various resources. In order to acquire a reasonably practical ontology in a limited time and with less manpower, we extend the Kadokawa thesaurus by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations. The former can be obtained by converting valency information and case frames from previously built computational dictionaries used in machine translation. The latter can be acquired from concept co-occurrence information, which is extracted automatically from large corpora. The ontology stores rich semantic constraints among 1,110 concepts, and enables a natural language processing system to resolve semantic ambiguities by making inferences with the concept network of the ontology. In our practical machine translation system, our ontology-based word sense disambiguation method achieved an 8.7% improvement over methods which do not use an ontology for Korean translation.

23 citations

Proceedings Article•DOI•

GIST-IT: summarizing email using linguistic knowledge and machine learning

Evelyne Tzoukermann¹, Smaranda Muresan², Judith L. Klavans²•Institutions (2)

Alcatel-Lucent¹, Columbia University²

06 Jul 2001

TL;DR: This paper presents a system for the automatic extraction of salient information from email messages, thus providing the gist of their meaning, and combines shallow linguistic processing with machine learning to extract phrasal units that are representative of email content.

Abstract: We present a system for the automatic extraction of salient information from email messages, thus providing the gist of their meaning. Dealing with email raises several challenges that we address in this paper: heterogeneous data in terms of length and topic. Our method combines shallow linguistic processing with machine learning to extract phrasal units that are representative of email content. The GIST-IT application is fully implemented and embedded in an active mailbox platform. Evaluation was performed over three machine learning paradigms.

19 citations

Proceedings Article•DOI•

Gathering knowledge for a question answering system from heterogeneous information sources

Boris Katz¹, Jimmy Lin¹, Sue Felshin¹•Institutions (1)

Massachusetts Institute of Technology¹

06 Jul 2001

TL;DR: This work proposes an architecture for a collaborative question answering system that contains four primary components: an annotations system for storing knowledge, a ternary expression representation of language, a transformational rule system for handling some complexities of language, and a collaborative mechanism by which ordinary users can contribute new knowledge by teaching the system new information.

Abstract: Although vast amounts of information are available electronically today, no effective information access mechanism exists to provide humans with convenient information access. A general, open-domain question answering system is a solution to this problem. We propose an architecture for a collaborative question answering system that contains four primary components: an annotations system for storing knowledge, a ternary expression representation of language, a transformational rule system for handling some complexities of language, and a collaborative mechanism by which ordinary users can contribute new knowledge by teaching the system new information. We have developed an initial prototype, called Webnotator, with which to test these ideas.

19 citations

Proceedings Article•DOI•

Knowledge portals

Steffen Staab¹•Institutions (1)

Karlsruhe Institute of Technology¹

06 Jul 2001

TL;DR: In this paper, the authors use ontologies as a conceptual backbone for providing, accessing and structuring information in a comprehensive approach for building and maintaining knowledge portals, and use HLT to reduce the costs of ontology engineering and narrow the gap between finding knowledge in texts and providing it to the portal.

Abstract: Knowledge portals provide views onto domain-specific information on the World Wide Web, thus helping their users find relevant, domain-specific information. The construction of intelligent access and the provisioning of information to knowledge portals, however, remained an ad hoc task requiring extensive manual editing and maintenance by the knowledge portal providers. In order to diminish these efforts we use ontologies as a conceptual backbone for providing, accessing and structuring information in a comprehensive approach for building and maintaining knowledge portals. We have built several experimental and one commercial knowledge portal for knowledge management tasks such as skill management and corporate history analysis that show how our approach is used in practice. This practice, however, has exhibited a number of bottlenecks, many of which could be avoided or at least diminished by Human Language Technology. We have used HLT in order to reduce the costs of ontology engineering and in order to narrow the gap between finding knowledge in texts and providing it to the portal.

13 citations

Proceedings Article•DOI•

Human language technologies for knowledge management: challenges and opportunities

Mark T. Maybury¹•Institutions (1)

Mitre Corporation¹

06 Jul 2001

TL;DR: The central role a range of human language technologies play in the emerging discipline of knowledge management is outlined and several grand challenges are articulated.

Abstract: This paper outlines the central role a range of human language technologies play in the emerging discipline of knowledge management. We articulate several grand challenges, illustrate some early successes, and recommend areas of continued research.

10 citations

Proceedings Article•DOI•

Multilingual authoring: the NAMIC approach

Roberto Basili¹, Maria Teresa Pazienza¹, Fabio Massimo Zanzotto¹, Roberta Catizone², Andrea Setzer², Nick Webb², Yorick Wilks², Lluís Padró³, German Rigau³•Institutions (3)

University of Rome Tor Vergata¹, University of Sheffield², Polytechnic University of Catalonia³

06 Jul 2001

TL;DR: This paper proposes an approach to processing/structuring text so that Multilingual Authoring (creating hypertext links) can be effectively carried out.

Abstract: With increasing amounts of electronic information available, and the increase in the variety of languages used to produce documents of the same type, the problem of how to manage similar documents in different languages arises. This paper proposes an approach to processing/structuring text so that Multilingual Authoring (creating hypertext links) can be effectively carried out. This work, funded by the European Union, is applied to the Multilingual Authoring of news agency text. We have applied methods from Natural Language Processing, especially Information Extraction technology, to both monolingual and Multilingual Authoring.

8 citations

Proceedings Article•DOI•

Using HLT for acquiring, retrieving and publishing knowledge in AKT: position paper

Kalina Bontcheva¹, Christopher Brewster¹, Fabio Ciravegna¹, Hamish Cunningham¹, Louise Guthrie¹, Robert Gaizauskas¹, Yorick Wilks¹•Institutions (1)

University of Sheffield¹

06 Jul 2001

Abstract: AKT is a major research project applying a variety of technologies to knowledge management. Knowledge is a dynamic, ubiquitous resource, which is to be found equally in an expert's head, under terabytes of data, or explicitly stated in manuals. AKT will extend knowledge management technologies to exploit the potential of the semantic web, covering the use of knowledge over its entire lifecycle, from acquisition to maintenance and deletion. In this paper we discuss how HLT will be used in AKT and how the use of HLT will affect different areas of KM, such as knowledge acquisition, retrieval and publishing.

Proceedings Article•DOI•

Document fusion for comprehensive event description

Christof Monz¹•Institutions (1)

University of Amsterdam¹

06 Jul 2001

TL;DR: A fully implemented system for fusing related news stories into a single comprehensive description of an event using a computationally feasible and robust notion of entailment for comparing information stemming from different documents is described.

Abstract: This paper describes a fully implemented system for fusing related news stories into a single comprehensive description of an event. The basic components and the underlying algorithm are explained. The system uses a computationally feasible and robust notion of entailment for comparing information stemming from different documents. We discuss the issue of evaluating document fusion and provide some preliminary results.

Proceedings Article•DOI•

Adapting and extending lexical resources in a dialogue system

Ana García-Serrano¹, Paloma Martínez², Luis Rodrigo¹•Institutions (2)

Technical University of Madrid¹, Charles III University of Madrid²

06 Jul 2001

TL;DR: This paper presents the adaptation and customization of two lexical resources, the Brill tagger (Brill, 1992) and EuroWordNet (Vossen et al., 1998), for use in the ADVICE project, which is devoted to building an intelligent virtual reality sales and service system that uses human language technology.

Abstract: This paper presents the adaptation and customization of two lexical resources, the Brill tagger (Brill, 1992) and EuroWordNet (Vossen et al., 1998), for use in the ADVICE project, which is devoted to building an intelligent virtual reality sales and service system that uses human language technology.

Proceedings Article•DOI•

Automatic augmentation of translation dictionary with database terminologies in multilingual query interpretation

Hodong Lee¹, Jong Cheol Park¹•Institutions (1)

KAIST¹

06 Jul 2001

TL;DR: This work proposes to disambiguate the senses of the source lexical items by automatically augmenting a simple translation dictionary with database terminologies and describes an implemented multilingual query interpretation system in a combinatory categorial grammar framework.

Abstract: In interpreting multilingual queries to databases whose domain information is described in a particular language, we must address the problem of word sense disambiguation. Since full-fledged semantic classification information is difficult to construct either automatically or manually for this purpose, we propose to disambiguate the senses of the source lexical items by automatically augmenting a simple translation dictionary with database terminologies and describe an implemented multilingual query interpretation system in a combinatory categorial grammar framework.

Proceedings Article•DOI•

What are the points?: what are the stances? decanting for question-driven retrieval and executive summarization

Jean-François Delannoy¹•Institutions (1)

University of Ottawa¹

06 Jul 2001

TL;DR: Decanter illustrates a heuristic approach to extraction for information retrieval and question answering, with emphasis on the argumentative dimension, to address three types of questions: "What are the points?", "Based on what?", and "What are the comments?".

Abstract: Decanter illustrates a heuristic approach to extraction for information retrieval and question answering. Generic information about argumentative text is found and stored, easing user-focused, question-driven access to the core information. The emphasis is placed on the argumentative dimension, to address in particular three types of questions: "What are the points?", "Based on what?", and "What are the comments?". The areas of application of this approach include question answering, information retrieval, summarization, critical thinking, and assistance with speed reading.

Proceedings Article•DOI•

Crosslingual language technologies for knowledge creation and knowledge sharing

Hans Uszkoreit

06 Jul 2001

TL;DR: A large and fast growing part of corporate knowledge is encoded in electronic texts, and classical information retrieval helps to sort and find information in large libraries of documents by matching strings of characters.

Abstract: A large and fast-growing part of corporate knowledge is encoded in electronic texts. Although digital information repositories are becoming truly multimedial, human language will remain the only medium for preserving and sharing complex concepts, experiences and ideas. It is also the only medium suited for expressing metainformation. For a human reader a text has a rich structure; for a data processing machine it is merely a string of symbols. Classical information retrieval helps to sort and find information in large libraries of documents by matching strings of characters. Effective information management is a building block of modern knowledge management. However, language technology can contribute much more than methods for finding information.
