
Showing papers on "Natural language understanding published in 1998"


Patent
11 Aug 1998
TL;DR: In this paper, a customer service system includes a natural language device, a remote device remotely coupled to the natural language device over a network, and a database coupled to the natural language device.
Abstract: A customer service system includes a natural language device, a remote device remotely coupled to the natural language device over a network and a database coupled to the natural language device. The database has a plurality of answers stored on it that are indexed to natural language keys. The natural language device implements a natural language understanding system. The natural language device receives a natural language question over the network from the remote device. The question is analyzed using the natural language understanding system. Based on the analysis, the database is then queried. An answer to the question is received based on the query, and the answer is provided to the remote device over the network.
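As a rough illustration of the pipeline the patent describes (analyze the question, query a database indexed by natural-language keys, return the answer), here is a minimal Python sketch; the index contents and the analyze/answer_question helpers are invented for illustration and are not the patented system.

import re

# Answers indexed by natural-language keys (here: sets of keywords); contents are invented.
ANSWER_INDEX = {
    frozenset({"reset", "password"}): "Use the 'Forgot password' link on the sign-in page.",
    frozenset({"return", "order"}): "Returns are accepted within 30 days of delivery.",
    frozenset({"track", "shipment"}): "Enter your tracking number on the carrier's site.",
}

def analyze(question: str) -> set[str]:
    """Stand-in for the natural language understanding step: extract content words."""
    stopwords = {"how", "do", "i", "can", "my", "a", "an", "the", "to", "is", "what"}
    tokens = re.findall(r"[a-z]+", question.lower())
    return {t for t in tokens if t not in stopwords}

def answer_question(question: str) -> str:
    """Query the index with the analyzed question and return the best-matching answer."""
    keys = analyze(question)
    best = max(ANSWER_INDEX, key=lambda k: len(k & keys), default=None)
    if best is None or not (best & keys):
        return "Sorry, no answer found."
    return ANSWER_INDEX[best]

if __name__ == "__main__":
    print(answer_question("How do I reset my password?"))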

298 citations


Journal ArticleDOI
TL;DR: Describes what is involved in natural language generation and how evaluation has figured in work in this area to date; a particular text generation application is examined, along with the issues raised in assessing its performance on a variety of dimensions.

77 citations


Journal ArticleDOI
TL;DR: A niching technique called GAS is introduced which dynamically creates a subpopulation structure (taxonomic chart) using a radius function instead of a single radius, and a “cooling” method similar to simulated annealing is presented.

66 citations


Journal ArticleDOI
TL;DR: It is demonstrated that for limited applications, a stochastic method outperforms a well-tuned rule-based component and the human effort can be limited to the task of data labeling, which is much simpler than the design, maintenance and extension of the grammar rules.

47 citations


Proceedings ArticleDOI
Wu Jiangqin, Gao Wen, Song Yibo, Liu Wei, Pang Bo
12 Oct 1998
TL;DR: In this paper the process of building a simple word-level sign language recognition system is presented, and the method for recognizing sign language words is also proposed.
Abstract: Human beings usually interact with each other either via a natural language channel such as speech and writing, or via body language, e.g. hand gestures, head gestures, facial expression, lip motion and so on. As a part of natural language understanding, sign language recognition is very important. On one hand, it is one of the main methods of human-computer interaction in virtual reality; on the other hand, it is an auxiliary tool that allows deaf people to communicate with hearing people through a computer. In this paper the process of building a simple word-level sign language recognition system is presented, and a method for recognizing sign language words is proposed.

45 citations


Journal ArticleDOI
TL;DR: It is argued that in a multilingual environment, linguistic ontologies should be designed as interfaces between domain conceptualizations and linguistic knowledge bases and described how the distinction between conceptual and linguistic semantics may assist in reaching this objective.
Abstract: Natural language understanding systems have to exploit various kinds of knowledge in order to represent the meaning behind texts. Getting this knowledge in place is often such a huge enterprise that it is tempting to look for systems that can discover such knowledge automatically. We describe how the distinction between conceptual and linguistic semantics may assist in reaching this objective, provided that distinguishing between them is not done too rigorously. We present several examples to support this view and argue that in a multilingual environment, linguistic ontologies should be designed as interfaces between domain conceptualizations and linguistic knowledge bases.

43 citations


Patent
04 Nov 1998
TL;DR: In this paper, a system and method for continuous speech recognition (CSR) is optimized to reduce processing time for connected word grammars bounded by semantically null words, which can be achieved by performing only the minimal amount of computation required to produce an exact N-best list of semantically meaningful words (N-best list of salient words).
Abstract: A system and method for continuous speech recognition (CSR) is optimized to reduce processing time for connected word grammars bounded by semantically null words. The savings, which reduce processing time both during the forward and the backward passes of the search, as well as during rescoring, are achieved by performing only the minimal amount of computation required to produce an exact N-best list of semantically meaningful words (N-best list of salient words). This departs from standard Spoken Language System modeling, in which any notion of meaning is handled by the Natural Language Understanding (NLU) component. By expanding the task of the recognizer component from a simple acoustic match to allow semantic information to be fed to the recognizer, significant processing-time savings are achieved, making it possible to run an increased number of speech recognition channels in parallel for improved performance, which may enhance users' perception of value and quality of service.
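The idea of an N-best list of salient words can be pictured, very loosely, as collapsing recognizer hypotheses that differ only in semantically null words. The Python sketch below is illustrative only; the word lists, hypotheses, and scores are invented and this is not the patented search procedure.

# Hypotheses that differ only in semantically null words collapse to one salient entry,
# so fewer distinct hypotheses need to be kept. All data below are invented.
SEMANTICALLY_NULL = {"uh", "um", "please", "the", "a", "i", "would", "like", "to"}

def salient(hypothesis: list[str]) -> tuple[str, ...]:
    """Project a word hypothesis onto its semantically meaningful words."""
    return tuple(w for w in hypothesis if w not in SEMANTICALLY_NULL)

def nbest_salient(scored_hypotheses: list[tuple[float, list[str]]], n: int):
    """Collapse hypotheses with identical salient projections, keeping the best score."""
    best: dict[tuple[str, ...], float] = {}
    for score, hyp in scored_hypotheses:
        key = salient(hyp)
        if key not in best or score > best[key]:
            best[key] = score
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]

hyps = [
    (-12.1, "i would like to fly to boston please".split()),
    (-12.4, "uh i would like to fly to boston".split()),
    (-13.0, "i would like to fly to austin".split()),
]
print(nbest_salient(hyps, 2))  # two distinct salient readings: fly/boston and fly/austin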

36 citations


Proceedings ArticleDOI
10 Aug 1998
TL;DR: GSG is presented: an empathic computer system for the rapid deployment of NLU front-ends and their dynamic customization by non-expert end-users; three learning methods are employed in the acquisition of semantic mappings from unseen data.
Abstract: A critical path in the development of natural language understanding (NLU) modules lies in the difficulty of defining a mapping from words to semantics: usually it takes on the order of years of highly skilled labor to develop a semantic mapping, e.g., in the form of a semantic grammar, that is comprehensive enough for a given domain. Yet, due to the very nature of human language, such mappings invariably fail to achieve full coverage on unseen data. Acknowledging the impossibility of stating a priori all the surface forms by which a concept can be expressed, we present GSG: an empathic computer system for the rapid deployment of NLU front-ends and their dynamic customization by non-expert end-users. Given a new domain for which an NLU front-end is to be developed, two stages are involved. In the authoring stage, GSG aids the developer in the construction of a simple domain model and a kernel analysis grammar. Then, in the run-time stage, GSG provides the end-user with an interactive environment in which the kernel grammar is dynamically extended. Three learning methods are employed in the acquisition of semantic mappings from unseen data: (i) parser predictions, (ii) a hidden understanding model, and (iii) end-user paraphrases. A baseline version of GSG has been implemented and preliminary experiments show promising results.

35 citations


Proceedings Article
01 Jan 1998
TL;DR: The NLU system allows users to switch languages seamlessly in a single session without requiring any switch in the speech recognition system, making system development, maintenance, and portability vastly superior to systems custom-built for each language.
Abstract: In this paper we describe our initial efforts in building a natural language understanding (NLU) system across multiple languages. The system allows users to switch languages seamlessly in a single session without requiring any switch in the speech recognition system. Context dependence is maintained across sentences, even when the user changes languages. Towards this end we have begun building a universal speech recognizer for the English and French languages. We experiment with a universal phonology for both French and English with a novel mechanism to handle language-dependent variations. Our best results so far show about 5% relative performance degradation for English relative to a unilingual English system and a 9% relative degradation in French relative to a unilingual French system. The NLU system uses the same statistical understanding algorithms for each language, making system development, maintenance, and portability vastly superior to systems custom-built for each language.

29 citations


Proceedings ArticleDOI
21 Mar 1998
TL;DR: This work describes the design and implementation of natural language sensing for autonomous agents such as VMattie, an autonomous clerical agent that lives in a UNIX system, communicates with humans via e-mail in natural language with no agreed-upon protocol, and autonomously carries out her tasks without human intervention.
Abstract: In sufficiently narrow domains, natural language understanding may be achieved via an analysis of surface features without the use of a traditional symbolic parser. We illustrate this notion by describing the natural language sensing in Virtual Mattie (VMattie), an autonomous clerical agent. VMattie "lives" in a UNIX system, communicates with humans via e-mail in natural language with no agreed-upon protocol, and autonomously carries out her tasks without human intervention. VMattie's limited domain allows for surface-level natural language processing. VMattie's language understanding module has been implemented as a Copycat-like architecture, though her understanding takes place differently. The mechanism includes a slipnet storing domain knowledge and a pool of codelet templates for building and verifying understanding. We describe the design and implementation of natural language sensing for autonomous agents such as VMattie.

29 citations



Journal ArticleDOI
TL;DR: This hybrid system is more sophisticated in tackling language disambiguation problems, using linguistic clues from disparate sources as well as incorporating context effects into the sentence analysis, and is potentially more powerful than systems relying on a single processing paradigm.
Abstract: Natural language understanding involves the simultaneous consideration of a large number of different sources of information. Traditional methods employed in language analysis have focused on developing powerful formalisms to represent syntactic or semantic structures along with rules for transforming language into these formalisms. However, they make use of only small subsets of knowledge. This article describes how to use the whole range of information through a neurosymbolic architecture which is a hybridization of a symbolic network and subsymbol vectors generated from a connectionist network. Besides initializing the symbolic network with prior knowledge, the subsymbol vectors are used to enhance the system's capability in disambiguation and provide flexibility in sentence understanding. The model captures a diversity of information including word associations, syntactic restrictions, case-role expectations, semantic rules and context. It attains highly interactive processing by representing knowledge in an associative network on which actual semantic inferences are performed. An integrated use of previously analyzed sentences in understanding is another important feature of our model. The model dynamically selects one hypothesis among multiple hypotheses. This notion is supported by three simulations which show that the degree of disambiguation relies on both the amount of linguistic rules and the semantic-associative information available to support the inference processes in natural language understanding. Unlike many similar systems, our hybrid system is more sophisticated in tackling language disambiguation problems, using linguistic clues from disparate sources as well as incorporating context effects into the sentence analysis. It is potentially more powerful than systems relying on a single processing paradigm.
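A much-simplified illustration of mixing a symbolic cue (a case-role expectation rule) with a subsymbolic similarity score for word-sense disambiguation; this is not the paper's architecture, and the senses, vectors, and weighting below are invented.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Subsymbolic part: toy context vectors for two senses of "bank" and for the sentence context.
SENSE_VECTORS = {"bank/finance": [0.9, 0.1, 0.0], "bank/river": [0.1, 0.0, 0.9]}
context_vector = [0.7, 0.2, 0.1]   # e.g. derived from words like "deposit", "money", "account"

# Symbolic part: a case-role rule votes for the sense that fits the verb's object slot.
def rule_score(sense: str, verb: str) -> float:
    return 1.0 if (verb == "deposit" and sense == "bank/finance") else 0.0

def disambiguate(verb: str, alpha: float = 0.5) -> str:
    """Blend the symbolic vote with the vector similarity and pick the best sense."""
    scores = {
        sense: alpha * rule_score(sense, verb) + (1 - alpha) * cosine(vec, context_vector)
        for sense, vec in SENSE_VECTORS.items()
    }
    return max(scores, key=scores.get)

print(disambiguate("deposit"))   # -> bank/finance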

Proceedings Article
01 Sep 1998
TL;DR: The natural language generation community is a thriving one, with a research base that has been developing steadily--although perhaps at a slower pace because of the smaller size of the community--for just as long as work in natural language understanding.
Abstract: There are two sides to natural language processing. On the one hand, work in natural language understanding is concerned with the mapping from some surface representation of linguistic material, expressed as speech or text, to an underlying representation of the meaning carried by that surface representation. But there is also the question of how one maps from some underlying representation of meaning into text or speech: this is the domain of natural language generation. Whether our end-goal is the construction of artifacts that use natural languages intelligently, the formal characterization of phenomena in human languages, or the computational modeling of the human language processing mechanism, we cannot ignore the fact that language is both spoken (or written) and heard (or read). Both are equally large and important problems, but the literature contains much less work on natural language generation (NLG) than it does on natural language understanding (NLU). There are many reasons why this might be so, although clearly an important one is that researchers in natural language understanding in some sense start out with a more well-defined task: the input is known, and there is a lot of it around. This is not the case in natural language generation: there, it is the desired output that is known, but the input is an unknown; and while the world is awash with text waiting to be processed, there are fewer instances of what we might consider appropriate inputs for the process of natural language generation. For researchers in the field, this highlights the fundamental question that always has to be asked: What do we generate from? Despite this problem, the natural language generation community is a thriving one, with a research base that has been developing steadily--although perhaps at a slower pace because of the smaller size of the community--for just as long as work in natural language understanding. It should not be forgotten that much of NLP has its origins in the early work on machine translation in the 1950s, and that to carry out machine translation, one has not only to analyze existing texts but also to generate new ones. The early machine translation experiments, however, did not recognize the problems that give modern work in NLG its particular character. The first significant pieces of work in the field appeared during the 1970s; in particular, Goldman's work on the problem of lexicalizing underlying conceptual material (Goldman 1974) and

Proceedings Article
01 Jan 1998
TL;DR: C5.0, a decision tree generator, was used to create a rule base for a natural language understanding system, and the generated rule base performed as well as lay persons, but worse than physicians.
Abstract: As natural language processing systems become more frequent in clinical use, methods for interpreting the output of these programs become increasingly important. These methods require the effort of a domain expert, who must build specific queries and rules for interpreting the processor output. Knowledge discovery and data mining tools can be used instead of a domain expert to automatically generate these queries and rules. C5.0, a decision tree generator, was used to create a rule base for a natural language understanding system. A general-purpose natural language processor using this rule base was tested on a set of 200 chest radiograph reports. When a small set of reports, classified by physicians, was used as the training set, the generated rule base performed as well as lay persons, but worse than physicians. When a larger set of reports, using ICD9 coding to classify the set, was used for training the system, the rule base performed worse than the physicians and lay persons. It appears that a larger, more accurate training set is needed to increase performance of the method.
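The approach (inducing interpretation rules over labeled reports with a decision-tree learner) can be approximated with any tree learner; the sketch below uses scikit-learn's CART implementation as a stand-in for C5.0, with a tiny invented training set and invented labels, so it illustrates the method rather than reproducing the study.

# Decision-tree rule induction over report text, as a rough stand-in for C5.0.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text

reports = [
    "right lower lobe opacity consistent with pneumonia",
    "no acute cardiopulmonary abnormality",
    "patchy infiltrate suspicious for pneumonia",
    "clear lungs no infiltrate",
]
labels = ["positive", "negative", "positive", "negative"]   # invented physician-style labels

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(reports)
tree = DecisionTreeClassifier(max_depth=3).fit(X, labels)

# The learned tree can be read off as interpretation rules.
print(export_text(tree, feature_names=list(vectorizer.get_feature_names_out())))
print(tree.predict(vectorizer.transform(["left lobe pneumonia suspected"])))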

Journal ArticleDOI
TL;DR: Conceptual graph applications to biomedical data and concept representation, classification systems, information retrieval, and natural language understanding and processing are presented.
Abstract: The basis of conceptual graphs theory is an ontology of types of concepts. Concepts issued from the ontology are interlinked by semantic relationships and constitute canonical conceptual graphs. Canonical graphs may be combined to derive new conceptual graphs by means of formation rules. This formalism allows knowledge representation to be separated into a conceptual level and a domain-dependent level, and enables a representation to be shared and reused. This paper presents conceptual graph applications to biomedical data and concept representation, classification systems, information retrieval, and natural language understanding and processing. A discussion of the unifying role conceptual graphs theory plays in the implementation of knowledge-based systems is also presented.
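A minimal, assumed data structure for conceptual graphs, with one formation rule (joining two canonical graphs on a shared concept type); the concept types and relation names are illustrative and not drawn from the paper.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Concept:
    type: str            # e.g. "Patient", "Fracture"
    referent: str = "*"  # generic referent by default

@dataclass
class ConceptualGraph:
    # relations are (relation-name, source Concept, target Concept) triples
    relations: frozenset = field(default_factory=frozenset)

    def join(self, other: "ConceptualGraph", on_type: str) -> "ConceptualGraph":
        """Formation rule: merge two canonical graphs on a concept of the given type."""
        shared = next(c for (_, s, d) in self.relations for c in (s, d) if c.type == on_type)
        merged = set(self.relations)
        for rel, src, dst in other.relations:
            src = shared if src.type == on_type else src
            dst = shared if dst.type == on_type else dst
            merged.add((rel, src, dst))
        return ConceptualGraph(frozenset(merged))

patient = Concept("Patient", "p1")
g1 = ConceptualGraph(frozenset({("agnt", Concept("Fracture"), patient)}))
g2 = ConceptualGraph(frozenset({("loc", Concept("Patient"), Concept("Femur"))}))
print(g1.join(g2, on_type="Patient").relations)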

01 Aug 1998
TL;DR: The transformational approach presented is compared with more traditional grammatical and parsing approaches and is used in a first prototype of a natural language interface for a theatre information and booking system.
Abstract: Natural language understanding in dialogue systems is taken to consist of two subsequent processes: rewrite and understand. The rewriting process is a transformational process in which natural language utterances are mapped via a sequence of context-sensitive string-to-string transformations onto some semantic normal form. In the understand process an interpretation of the semantic form is made, with respect to a particular dialogue state. The rewrite process is completely independent of the state of the dialogue. The transformational approach presented is compared with more traditional grammatical and parsing approaches. The approach is used in a first prototype of a natural language interface for a theatre information and booking system.
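The rewrite step can be pictured as an ordered list of string-to-string transformations applied in sequence; the rules and the normal form below are invented for a theatre-booking flavour and are not the paper's rule set.

import re

# Invented rewrite rules mapping utterance variants onto a toy semantic normal form.
REWRITE_RULES = [
    (r"\b(i would like|i want|can i have)\b", "REQUEST"),
    (r"\b(two|2) tickets\b", "TICKETS=2"),
    (r"\bfor (\w+day)\b", r"DAY=\1"),
    (r"\b(please|uh|um)\b", ""),   # drop semantically empty material
    (r"[,.!?]", ""),               # strip punctuation
    (r"\s+", " "),                 # normalize whitespace
]

def rewrite(utterance: str) -> str:
    s = utterance.lower()
    for pattern, replacement in REWRITE_RULES:
        s = re.sub(pattern, replacement, s)
    return s.strip()

print(rewrite("Uh, I would like two tickets for Saturday, please"))
# -> "REQUEST TICKETS=2 DAY=saturday"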

Journal ArticleDOI
TL;DR: This paper presents a possible solution for the text inference problem (extracting information unstated in a text, but implied) that takes advantage of a semantic English dictionary available in electronic form, which provides the basis for the development of a large linguistic knowledge base.
Abstract: This paper presents a possible solution for the text inference problem: extracting information that is unstated in a text but implied. Text inference is central to natural language applications such as information extraction and dissemination, text understanding, summarization, and translation. Our solution takes advantage of a semantic English dictionary available in electronic form that provides the basis for the development of a large linguistic knowledge base. The inference algorithm consists of a set of highly parallel search methods that, when applied to the knowledge base, find contexts in which sentences are interpreted. These contexts reveal information relevant to the text. Implementation, results, and parallelism analysis are discussed.
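One way to picture the context search is a breadth-first walk over dictionary-derived relations that links two content words of a text; the toy knowledge base, relation names, and find_context helper below are invented, not the paper's algorithm.

from collections import deque

# Invented dictionary-style relations; the paper's knowledge base comes from an electronic dictionary.
KB = {
    "waiter": [("works-in", "restaurant")],
    "restaurant": [("serves", "food"), ("has", "menu")],
    "menu": [("lists", "dish")],
    "dish": [("is-a", "food")],
}

def find_context(start: str, goal: str, max_depth: int = 4):
    """Return one chain of relations connecting start to goal, if any."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        word, path = queue.popleft()
        if word == goal:
            return path
        if len(path) >= max_depth:
            continue
        for rel, nxt in KB.get(word, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(word, rel, nxt)]))
    return None

# "The waiter brought the food": the inferred context links the two content words.
print(find_context("waiter", "food"))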

Book ChapterDOI
04 Dec 1998
TL;DR: Accommodating all these capabilities into a “Grand Unified Representation” is, it is maintained, a prerequisite for solving the most difficult problems in Artificial Intelligence, including natural language understanding.
Abstract: Context Vectors are fixed-length vector representations useful for document retrieval and word sense disambiguation. Context vectors were motivated by four goals: (1) capture “similarity of use” among words (“car” is similar to “auto”, but not similar to “hippopotamus”); (2) quickly find constituent objects (e.g., documents that contain specified words); (3) generate context vectors automatically from an unlabeled corpus; (4) use context vectors as input to standard learning algorithms. Context Vectors lack, however, a natural way to represent syntax, discourse, or logic. Accommodating all these capabilities into a “Grand Unified Representation” is, we maintain, a prerequisite for solving the most difficult problems in Artificial Intelligence, including natural language understanding.
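A minimal sketch of goals (1) and (3): build co-occurrence vectors from an unlabeled corpus and compare words by cosine similarity. The three-sentence corpus and the window size are illustrative only; this is not the paper's construction.

import math
from collections import Counter, defaultdict

corpus = [
    "the car drove down the road".split(),
    "the auto drove along the street".split(),
    "the hippopotamus swam in the river".split(),
]

window = 2
vectors: dict[str, Counter] = defaultdict(Counter)
for sentence in corpus:
    for i, word in enumerate(sentence):
        # count neighbors within the window as the word's context vector
        for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
            if j != i:
                vectors[word][sentence[j]] += 1

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

print(cosine(vectors["car"], vectors["auto"]))          # relatively high
print(cosine(vectors["car"], vectors["hippopotamus"]))  # lower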

Journal ArticleDOI
TL;DR: The authors adopted the description logic system BACK to achieve a common representation of information from the two sources and facilitate comparison; a sub-category grammar was adapted to achieve automatic classification in BACK, and a bidirectional chart parser was adapted to operate with this grammar.

Patent
30 Sep 1998
TL;DR: In this paper, a probabilistic model of lexical semantics is used to determine the most probable concept or meaning associated with a sentence or phrase, in order to generate concept codes from free-text medical data.
Abstract: A natural language understanding system is described which provides for the generation of concept codes from free-text medical data. A probabilistic model of lexical semantics, in the preferred embodiment of the invention implemented by means of a Bayesian network, is used to determine the most probable concept or meaning associated with a sentence or phrase. The inventive method and system include the steps of checking for synonyms (301), checking spelling (302), performing syntactic parsing (303), transforming text into its 'deep' or semantic form (304), and performing a semantic analysis based on a probabilistic model of lexical semantics (305). In the preferred embodiment of the invention, spell checking and transformational processing, as well as semantic analysis, make use of semantic probabilistic determinations.
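A much-simplified stand-in for the probabilistic step: a naive Bayes score rather than a full Bayesian network, choosing the most probable concept for a phrase. The concept labels and probability tables below are invented; real systems would emit standardized concept codes.

import math

# Invented toy model: priors over two candidate concepts and word likelihoods under each.
PRIOR = {"concept:chest_pain": 0.5, "concept:pneumonia": 0.5}
LIKELIHOOD = {
    "concept:chest_pain": {"chest": 0.4, "pain": 0.4, "cough": 0.05},
    "concept:pneumonia": {"chest": 0.2, "cough": 0.4, "infiltrate": 0.3},
}
SMOOTH = 0.01   # probability assumed for words unseen under a concept

def most_probable_concept(phrase: str) -> str:
    words = phrase.lower().split()
    def log_score(concept: str) -> float:
        table = LIKELIHOOD[concept]
        return math.log(PRIOR[concept]) + sum(math.log(table.get(w, SMOOTH)) for w in words)
    return max(PRIOR, key=log_score)

print(most_probable_concept("productive cough with infiltrate"))   # -> concept:pneumonia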


Book ChapterDOI
16 Aug 1998
TL;DR: The Authoring Assistant, a component of the pSAT system for authoring problems for the PAT Algebra tutor, is described; it addresses issues of natural language understanding and the burden placed on the author.
Abstract: In some domains, including those requiring natural language understanding, we cannot build a system that can complete the entire task. One way to deal with such cases is to encode the result of understanding the problem description along with traces of the process used to understand that description. This places the natural language understanding portion of the system in the authoring system, as opposed to the run-time system (the one used by the student). This solution, however, puts a burden on the author, who must now verify that the problem encoding reflects the desired understanding. We describe the Authoring Assistant, a component of the pSAT system [11] for authoring problems for the PAT Algebra tutor [7] which addresses these issues.

Proceedings Article
01 Jan 1998
TL;DR: This interface utilizes robust natural language understanding and resolves some of the ambiguities in natural language by means of gesture input to create a natural language and gesture interface to a mobile robot.
Abstract: The Intelligent Multimodal Multimedia and the Adaptive Systems Groups at the Navy Center for Applied Research in Artificial Intelligence have been investigating a natural language and gesture interface to a mobile robot. Our interface utilizes robust natural language understanding and resolves some of the ambiguities in natural language by means of gesture input. The natural language and gestural information is integrated with knowledge of a particular environment, and appropriate robotic responses are generated.

Proceedings ArticleDOI
Li Li, Deborah A. Dahl, Lewis M. Norton, Marcia C. Linebarger, Dongdong Chen
10 Aug 1998
TL;DR: The ETE assists in the management and analysis of the thousands of complex data structures created during natural language processing of a large corpus using relational database technology in a network environment.
Abstract: The Natural Language Understanding Engine Test Environment (ETE) is a GUI software tool that aids in the development and maintenance of large, modular, natural language understanding (NLU) systems. Natural language understanding systems are composed of modules (such as part-of-speech taggers, parsers and semantic analyzers) which are difficult to test individually because of the complexity of their output data structures. Not only are the output data structures of the internal modules complex, but also many thousands of test items (messages or sentences) are required to provide a reasonable sample of the linguistic structures of a single human language, even if the language is restricted to a particular domain. The ETE assists in the management and analysis of the thousands of complex data structures created during natural language processing of a large corpus using relational database technology in a network environment.
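The general idea of keeping per-module outputs in a relational store, so regressions can be queried across corpus runs, can be sketched as follows; the schema, table name, and use of SQLite are assumptions for illustration and are not the ETE's actual design.

import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE module_output (
        sentence_id INTEGER,
        run_id      TEXT,
        module      TEXT,      -- e.g. 'tagger', 'parser', 'semantics'
        output      TEXT       -- serialized data structure
    )
""")

def record(sentence_id: int, run_id: str, module: str, structure) -> None:
    """Store one module's output structure for one test sentence."""
    conn.execute(
        "INSERT INTO module_output VALUES (?, ?, ?, ?)",
        (sentence_id, run_id, module, json.dumps(structure)),
    )

record(1, "run-a", "tagger", [["the", "DT"], ["flight", "NN"], ["leaves", "VBZ"]])
record(1, "run-b", "tagger", [["the", "DT"], ["flight", "NN"], ["leaves", "NNS"]])

# Find sentences whose tagger output changed between two runs.
rows = conn.execute("""
    SELECT a.sentence_id FROM module_output a JOIN module_output b
      ON a.sentence_id = b.sentence_id AND a.module = b.module
    WHERE a.run_id = 'run-a' AND b.run_id = 'run-b' AND a.output <> b.output
""").fetchall()
print(rows)   # -> [(1,)]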

Journal ArticleDOI
TL;DR: An object-oriented lexical representation language that encodes linguistic and semantic information uniformly as classes and objects, and an efficient bottom-up parsing method for UCG using a selection-sets technique, are described.
Abstract: This paper describes an object-oriented lexical representation language based on Unification Categorial Grammar (UCG) that encodes linguistic and semantic information uniformly as classes and objects, and an efficient bottom-up parsing method for UCG using a selection-sets technique. The lexical representation language, implemented in the logic and object-oriented programming language LIFE, introduces several new information-sharing mechanisms to enable natural, declarative, modular and economical construction of large and complex computational lexicons. The selection sets are deduced from a transformation between UCG and Context-Free Grammar (CFG) and are used to reduce the search space for the table-driven algorithm. Experimental tests on a spoken English corpus show that the hierarchical lexicon achieves a dramatic reduction in redundant information and that selection sets significantly improve parsing of UCG, with polynomial time complexity.

Proceedings ArticleDOI
10 Aug 1998
TL;DR: This paper describes how client-server architecture was used to make a large volume of lexical information and a large knowledge base available to the system at development and/or run time.
Abstract: Knowledge acquisition is a serious bottleneck for natural language understanding systems. For this reason, large-scale linguistic resources have been compiled and made available by organizations such as the Linguistic Data Consortium (Comlex) and Princeton University (WordNet). Systems making use of these resources can greatly accelerate the development process by avoiding the need for the developer to re-create this information. In this paper we describe how we integrated these large-scale linguistic resources into our natural language understanding system. Client-server architecture was used to make a large volume of lexical information and a large knowledge base available to the system at development and/or run time. We discuss issues of achieving compatibility between these disparate resources.
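A hedged sketch of the client-server arrangement: a thin lexical server answers lookups so the NLU system need not load the resources itself. The in-memory lexicon, port, and handler names are invented; in a real deployment the server would be backed by resources such as WordNet and Comlex.

import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Tiny in-memory stand-in for a large lexical resource.
LEXICON = {
    "book": {"pos": ["noun", "verb"], "senses": ["a written work", "to reserve"]},
    "bank": {"pos": ["noun"], "senses": ["financial institution", "side of a river"]},
}

class LexiconHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        word = self.path.lstrip("/")
        entry = LEXICON.get(word, {})
        body = json.dumps({"word": word, **entry}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):   # keep the example quiet
        pass

server = HTTPServer(("localhost", 8765), LexiconHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: the NLU system queries the server rather than loading the resources itself.
with urllib.request.urlopen("http://localhost:8765/book") as resp:
    print(json.loads(resp.read()))

server.shutdown()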