Showing papers presented at "Human Language Technology in 2001"


Proceedings ArticleDOI
06 Jul 2001
TL;DR: A text mining technique to help an Ontology Engineer identify the important concepts in a Domain Ontology is described.
Abstract: Though the utility of domain Ontologies is now widely acknowledged in the IT (Information Technology) community, several barriers must be overcome before Ontologies become practical and useful tools. One important achievement would be to reduce the cost of identifying and manually entering several thousand concept descriptions. This paper describes a text mining technique to help an Ontology Engineer identify the important concepts in a Domain Ontology.

96 citations


Proceedings ArticleDOI
06 Jul 2001
TL;DR: This work investigates text classification by format style, i.e. "genre", and demonstrates, by complementing topic classification, that it can significantly improve retrieval of information.
Abstract: Categorization of text in IR has traditionally focused on topic. As use of the Internet and e-mail increases, categorization has become a key area of research as users demand methods of prioritizing documents. This work investigates text classification by format style, i.e. "genre", and demonstrates, by complementing topic classification, that it can significantly improve retrieval of information. The paper compares use of presentation features to word features, and the combination thereof, using Naive Bayes, C4.5 and SVM classifiers. Results show use of combined feature sets with SVM yields 92% classification accuracy in sorting seven genres.

95 citations
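
The combined feature set described in the genre-classification abstract above can be illustrated with a minimal sketch. The abstract does not list the actual presentation features, so the cues below (line count, mean line length, uppercase ratio, question marks, blank-line ratio) and all function names are assumptions chosen for illustration. The sketch stops at feature extraction; the resulting dictionary would then be passed to a classifier such as Naive Bayes, C4.5, or an SVM.

```python
import re
from collections import Counter


def word_features(text):
    """Bag-of-words counts; keys are prefixed so they cannot collide
    with presentation feature names."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {f"word={token}": count for token, count in Counter(tokens).items()}


def presentation_features(text):
    """A few layout and punctuation cues (hypothetical choices,
    not the feature set used in the paper)."""
    lines = text.splitlines() or [""]
    n_chars = max(len(text), 1)  # avoid division by zero on empty input
    return {
        "pres=line_count": len(lines),
        "pres=mean_line_length": sum(len(line) for line in lines) / len(lines),
        "pres=upper_ratio": sum(ch.isupper() for ch in text) / n_chars,
        "pres=question_marks": text.count("?"),
        "pres=blank_line_ratio": sum(not line.strip() for line in lines) / len(lines),
    }


def combined_features(text):
    """Merge both feature families into one feature dictionary."""
    features = word_features(text)
    features.update(presentation_features(text))
    return features


sample = "FAQ\n\nWhat is genre?\nA style of document."
features = combined_features(sample)
```

Prefixing the keys keeps the two families separable, so the same extraction code can produce word-only, presentation-only, or combined inputs for a comparison like the one the abstract reports.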


Proceedings ArticleDOI
06 Jul 2001
TL;DR: It is argued that multimodal dialog systems and the naturalized mobile access to company data they offer will trigger a new knowledge management practice of importance for knowledge-intensive companies.
Abstract: This paper addresses two related topics: Firstly, it presents building-blocks for flexible multimodal dialog interfaces based on standardized components (VoiceXML, XML) to indicate that thanks to well-supported standardizations, mobile multimodal interfaces to heterogeneous data sources are becoming ready for mass-market deployment, provided that adequate modularization is respected. Secondly, this is put in the perspective of a discussion of knowledge management in firms, and the paper argues that multimodal dialog systems and the naturalized mobile access to company data they offer will trigger a new knowledge management practice of importance for knowledge-intensive companies.

32 citations


Proceedings ArticleDOI
06 Jul 2001
TL;DR: The MU-MIS Project (Multimedia Indexing and Searching Environment) is described, concerned with the development and integration of base technologies, demonstrated within a laboratory prototype, to support automated multimedia indexing and to facilitate search and retrieval from multimedia databases.
Abstract: We describe in this paper the MU-MIS Project (Multimedia Indexing and Searching Environment), which is concerned with the development and integration of base technologies, demonstrated within a laboratory prototype, to support automated multimedia indexing and to facilitate search and retrieval from multimedia databases. We stress the role linguistically motivated annotations, coupled with domain-specific information, can play within this environment. The project will demonstrate that innovative technology components can operate on multilingual, multisource, and multimedia information and create a meaningful and queryable database.

25 citations


Proceedings ArticleDOI
06 Jul 2001
TL;DR: The Kadokawa thesaurus is extended by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations, which enables a natural language processing system to resolve semantic ambiguities by making inferences with the concept network of the ontology.
Abstract: This paper presents a method for semi-automatically constructing a practical ontology from various resources. In order to acquire a reasonably practical ontology in a limited time and with less manpower, we extend the Kadokawa thesaurus by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations. The former can be obtained by converting valency information and case frames from previously-built computational dictionaries used in machine translation. The latter can be acquired from concept co-occurrence information, which is extracted automatically from large corpora. The ontology stores rich semantic constraints among 1,110 concepts, and enables a natural language processing system to resolve semantic ambiguities by making inferences with the concept network of the ontology. In our practical machine translation system, our ontology-based word sense disambiguation method achieved an 8.7% improvement over methods that do not use an ontology for Korean translation.

23 citations
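
As a rough illustration of how concept co-occurrence information, as mentioned in the ontology abstract above, could support word sense disambiguation, the sketch below picks the candidate concept that co-occurs most often with the concepts found in the context. This is a deliberately simplified stand-in, not the paper's inference method: the concept labels, the counts, and the names `COOCCURRENCE`, `SENSES`, and `disambiguate` are all invented for the example.

```python
# Hypothetical co-occurrence counts between concept pairs. In practice
# these would be extracted from a large corpus.
COOCCURRENCE = {
    frozenset({"FINANCE", "MONEY"}): 12,
    frozenset({"RIVER", "WATER"}): 9,
    frozenset({"FINANCE", "WATER"}): 1,
}

# Hypothetical mapping from an ambiguous word to its candidate concepts.
SENSES = {"bank": ["FINANCE", "RIVER"]}


def disambiguate(word, context_concepts):
    """Return the candidate concept with the highest total
    co-occurrence count against the context concepts."""
    def score(sense):
        return sum(
            COOCCURRENCE.get(frozenset({sense, concept}), 0)
            for concept in context_concepts
        )
    return max(SENSES[word], key=score)


near_water = disambiguate("bank", ["WATER"])
near_money = disambiguate("bank", ["MONEY"])
```

Unordered pairs are stored as `frozenset` keys so that a lookup does not depend on the order in which the two concepts are given.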


Proceedings ArticleDOI
06 Jul 2001
TL;DR: This paper presents a system for the automatic extraction of salient information from email messages, thus providing the gist of their meaning, and combines shallow linguistic processing with machine learning to extract phrasal units that are representative of email content.
Abstract: We present a system for the automatic extraction of salient information from email messages, thus providing the gist of their meaning. Dealing with email raises several challenges that we address in this paper: heterogeneous data in terms of length and topic. Our method combines shallow linguistic processing with machine learning to extract phrasal units that are representative of email content. The GIST-IT application is fully implemented and embedded in an active mailbox platform. Evaluation was performed over three machine learning paradigms.

19 citations


Proceedings ArticleDOI
06 Jul 2001
TL;DR: This work proposes an architecture for a collaborative question answering system that contains four primary components: an annotations system for storing knowledge, a ternary expression representation of language, a transformational rule system for handling some complexities of language, and a collaborative mechanism by which ordinary users can contribute new knowledge by teaching the system new information.
Abstract: Although vast amounts of information are available electronically today, no effective information access mechanism exists to provide humans with convenient information access. A general, open-domain question answering system is a solution to this problem. We propose an architecture for a collaborative question answering system that contains four primary components: an annotations system for storing knowledge, a ternary expression representation of language, a transformational rule system for handling some complexities of language, and a collaborative mechanism by which ordinary users can contribute new knowledge by teaching the system new information. We have developed an initial prototype, called Webnotator, with which to test these ideas.

19 citations
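
A ternary expression, as named in the question answering abstract above, can be thought of as a subject-relation-object triple. The sketch below shows one way such triples might be stored, taught by users, and queried with partial patterns. It is not Webnotator's implementation: the `Ternary` and `KnowledgeBase` classes, the `teach` and `ask` methods, and the relation names are assumptions made for illustration.

```python
from typing import NamedTuple


class Ternary(NamedTuple):
    """A subject-relation-object triple."""
    subject: str
    relation: str
    obj: str


class KnowledgeBase:
    """A minimal in-memory store of ternary expressions (hypothetical)."""

    def __init__(self):
        self.facts = []

    def teach(self, subject, relation, obj):
        """Add a new fact, ignoring exact duplicates."""
        fact = Ternary(subject, relation, obj)
        if fact not in self.facts:
            self.facts.append(fact)

    def ask(self, subject=None, relation=None, obj=None):
        """Return all facts matching the pattern; None acts as a wildcard."""
        pattern = (subject, relation, obj)
        return [
            fact for fact in self.facts
            if all(p is None or p == v for p, v in zip(pattern, fact))
        ]


kb = KnowledgeBase()
kb.teach("Mozart", "born-in", "Salzburg")
kb.teach("Mozart", "composed", "The Magic Flute")
answers = kb.ask(subject="Mozart", relation="born-in")
```

A question such as "Where was Mozart born?" would first have to be mapped to the pattern `("Mozart", "born-in", None)`; that mapping is the job of the language analysis and transformational rules the abstract describes, and it is left out of this sketch.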


Proceedings ArticleDOI
06 Jul 2001
TL;DR: In this paper, the authors use ontologies as a conceptual backbone for providing, accessing and structuring information in a comprehensive approach for building and maintaining knowledge portals, and use HLT to reduce the costs of ontology engineering and narrow the gap between finding knowledge in texts and providing it to the portal.
Abstract: Knowledge portals provide views onto domain-specific information on the World Wide Web, thus helping their users find relevant, domain-specific information. The construction of intelligent access and the provisioning of information to knowledge portals, however, remained an ad hoc task requiring extensive manual editing and maintenance by the knowledge portal providers. In order to diminish these efforts we use ontologies as a conceptual backbone for providing, accessing and structuring information in a comprehensive approach for building and maintaining knowledge portals. We have built several experimental and one commercial knowledge portal for knowledge management tasks such as skill management and corporate history analysis that show how our approach is used in practice. This practice, however, has exhibited a number of bottlenecks, many of which could be avoided or at least diminished by Human Language Technology. We have used HLT in order to reduce the costs of ontology engineering and in order to narrow the gap between finding knowledge in texts and providing it to the portal.

13 citations


Proceedings ArticleDOI
06 Jul 2001
TL;DR: The central role a range of human language technologies play in the emerging discipline of knowledge management is outlined and several grand challenges are articulated.
Abstract: This paper outlines the central role a range of human language technologies play in the emerging discipline of knowledge management. We articulate several grand challenges, illustrate some early successes, and recommend areas of continued research.

10 citations


Proceedings ArticleDOI
06 Jul 2001
TL;DR: This paper proposes an approach to processing/structuring text so that Multilingual Authoring (creating hypertext links) can be effectively carried out.
Abstract: With increasing amounts of electronic information available, and the increase in the variety of languages used to produce documents of the same type, the problem of how to manage similar documents in different languages arises. This paper proposes an approach to processing/structuring text so that Multilingual Authoring (creating hypertext links) can be effectively carried out. This work, funded by the European Union, is applied to the Multilingual Authoring of news agency text. We have applied methods from Natural Language Processing, especially Information Extraction technology, to both monolingual and Multilingual Authoring.

8 citations


Proceedings ArticleDOI
06 Jul 2001
Abstract: AKT is a major research project applying a variety of technologies to knowledge management. Knowledge is a dynamic, ubiquitous resource, which is to be found equally in an expert's head, under terabytes of data, or explicitly stated in manuals. AKT will extend knowledge management technologies to exploit the potential of the semantic web, covering the use of knowledge over its entire lifecycle, from acquisition to maintenance and deletion. In this paper we discuss how HLT will be used in AKT and how the use of HLT will affect different areas of KM, such as knowledge acquisition, retrieval and publishing.

Proceedings ArticleDOI
06 Jul 2001
TL;DR: A fully implemented system for fusing related news stories into a single comprehensive description of an event using a computationally feasible and robust notion of entailment for comparing information stemming from different documents is described.
Abstract: This paper describes a fully implemented system for fusing related news stories into a single comprehensive description of an event. The basic components and the underlying algorithm are explained. The system uses a computationally feasible and robust notion of entailment for comparing information stemming from different documents. We discuss the issue of evaluating document fusion and provide some preliminary results.

Proceedings ArticleDOI
06 Jul 2001
TL;DR: This paper presents the adaptation and customization of two lexical resources, the Brill tagger (Brill, 1992) and EuroWordNet (Vossen et al., 1998), for use in the ADVICE project, which is devoted to building an intelligent virtual reality sales and service system that uses human language technology.
Abstract: This paper presents the adaptation and customization of two lexical resources, the Brill tagger (Brill, 1992) and EuroWordNet (Vossen et al., 1998), for use in the ADVICE project, which is devoted to building an intelligent virtual reality sales and service system that uses human language technology.

Proceedings ArticleDOI
Hodong Lee, Jong Cheol Park
06 Jul 2001
TL;DR: This work proposes to disambiguate the senses of the source lexical items by automatically augmenting a simple translation dictionary with database terminologies and describes an implemented multilingual query interpretation system in a combinatory categorial grammar framework.
Abstract: In interpreting multilingual queries to databases whose domain information is described in a particular language, we must address the problem of word sense disambiguation. Since full-fledged semantic classification information is difficult to construct either automatically or manually for this purpose, we propose to disambiguate the senses of the source lexical items by automatically augmenting a simple translation dictionary with database terminologies and describe an implemented multilingual query interpretation system in a combinatory categorial grammar framework.

Proceedings ArticleDOI
06 Jul 2001
TL;DR: Decanter illustrates a heuristic approach to extraction for information retrieval and question answering, with emphasis on the argumentative dimension, to address three types of questions: "What are the points?", "Based on what?", and "What are the comments?".
Abstract: Decanter illustrates a heuristic approach to extraction for information retrieval and question answering. Generic information about argumentative text is found and stored, easing user-focused, question-driven access to the core information. The emphasis is placed on the argumentative dimension, to address in particular three types of questions: "What are the points?", "Based on what?", and "What are the comments?". The areas of application of this approach include question answering, information retrieval, summarization, critical thinking and assistance to speed reading.

Proceedings ArticleDOI
06 Jul 2001
TL;DR: A large and fast growing part of corporate knowledge is encoded in electronic texts, and classical information retrieval helps to sort and find information in large libraries of documents by matching strings of characters.
Abstract: A large and fast-growing part of corporate knowledge is encoded in electronic texts. Although digital information repositories are becoming truly multimedial, human language will remain the only medium for preserving and sharing complex concepts, experiences and ideas. It is also the only medium suited for expressing metainformation. For a human reader a text has a rich structure; for a data processing machine it is merely a string of symbols. Classical information retrieval helps to sort and find information in large libraries of documents by matching strings of characters. Effective information management is a building block of modern knowledge management. However, language technology can contribute much more than methods for finding information.