
Showing papers on "Natural language understanding published in 2001"


Patent
23 Jul 2001
TL;DR: In this article, a probabilistic model of lexical semantics, implemented by means of a Bayesian network, is used to determine the most probable concept or meaning associated with a sentence or phrase.
Abstract: A natural language understanding system is described to provide generation of concept codes from free-text medical data. A probabilistic model of lexical semantics, implemented by means of a Bayesian network, is used to determine the most probable concept or meaning associated with a sentence or phrase. The inventive method and system includes the steps of checking for synonyms, checking spelling, performing syntactic parsing, transforming text to its “deep” or semantic form, and performing a semantic analysis based on a probabilistic model of lexical semantics.

118 citations


01 Jan 2001
TL;DR: A prototype of a cognitive tutor that understands students' explanations and provides feedback has been implemented; it uses a knowledge-based approach to natural language understanding and is entering a phase of pilot testing.
Abstract: Self-explanation is an effective metacognitive strategy, as a number of cognitive science studies have shown. In a previous study we showed that self-explanation can be supported effectively in a cognitive tutor for geometry problem solving. In that study, students explained their own problem-solving steps by selecting from a menu the name of a problem-solving principle that justifies the step. They learned with greater understanding, as compared to students who did not explain their reasoning. Currently, we are working toward testing the hypothesis that students will learn even better when they provide explanations in their own words rather than selecting them from a menu. We have implemented a prototype of a cognitive tutor that understands students' explanations and provides feedback. The tutor uses a knowledge-based approach to natural language understanding. We are entering a phase of pilot testing, both for the purpose of assessing the coverage of the natural language understanding component and for gaining insight into the kinds of dialog strategies that are needed.

88 citations


Patent
14 Sep 2001
TL;DR: In this article, a Monte Carlo method for use with natural language understanding and speech recognition language models is described; its steps include identifying at least one phrase embedded in a body of text, where the phrase belongs to a phrase class.
Abstract: A Monte Carlo method for use with natural language understanding and speech recognition language models can include a series of steps. The steps can include identifying at least one phrase embedded in a body of text wherein the phrase can belong to a phrase class. An additional attribute corresponding to the identified phrase can be determined. The body of text can be copied and the identified phrase can be replaced with a different phrase selected from a plurality of phrases. The different phrase can belong to the phrase class and correspond to the attribute.

65 citations
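The replacement step the patent describes can be sketched as follows. This is a simplified reading under stated assumptions: the phrase class (city names) and its attribute (here, country, which replacements must preserve) are invented for illustration, not taken from the patent.

```python
import random

# Hypothetical phrase class: city names, each tagged with an attribute
# (the country) that any replacement must share.
CITY_CLASS = {
    "boston": "us", "denver": "us", "seattle": "us",
    "london": "uk", "leeds": "uk",
}

def replace_phrase(text, phrase, rng):
    """Copy the text and replace `phrase` with a different member of its
    phrase class that carries the same attribute (the Monte Carlo step)."""
    attribute = CITY_CLASS[phrase]
    candidates = [p for p, a in CITY_CLASS.items()
                  if a == attribute and p != phrase]
    return text.replace(phrase, rng.choice(candidates))

rng = random.Random(0)
sentence = "show flights from boston to denver"
variant = replace_phrase(sentence, "boston", rng)
```

Repeating the copy-and-replace step over many seed sentences yields an expanded training corpus for the language model.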


Proceedings ArticleDOI
02 Jun 2001
TL;DR: It is shown that naive Bayes classification can be used to identify non-native utterances of English, and that classification of errorful speech recognizer hypotheses is more accurate than classification of perfect transcriptions.
Abstract: Native and non-native use of language differs, depending on the proficiency of the speaker, in clear and quantifiable ways. It has been shown that customizing the acoustic and language models of a natural language understanding system can significantly improve handling of non-native input; in order to make such a switch, however, the nativeness status of the user must be known. In this paper, we show that naive Bayes classification can be used to identify non-native utterances of English. The advantage of our method is that it relies on text, not on acoustic features, and can be used when the acoustic source is not available. We demonstrate that both read and spontaneous utterances can be classified with high accuracy, and that classification of errorful speech recognizer hypotheses is more accurate than classification of perfect transcriptions. We also characterize part-of-speech sequences that play a role in detecting non-native speech.

54 citations
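The classification approach in the paper above can be sketched with a small unigram naive Bayes model. The training utterances and labels here are invented toy data; the paper trains on real transcriptions and speech recognizer hypotheses, and also uses part-of-speech features.

```python
import math
from collections import Counter

# Toy labeled utterances (invented); a real system trains on large corpora.
TRAIN = [
    ("i would like a flight to boston please", "native"),
    ("could you show me the cheapest fare", "native"),
    ("i want go to boston tomorrow", "nonnative"),
    ("please you give me flight information", "nonnative"),
]

def train(examples):
    """Collect per-class word counts for a unigram naive Bayes model."""
    counts = {"native": Counter(), "nonnative": Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts

def classify(counts, text):
    """Return the class maximizing the add-one-smoothed log-likelihood."""
    vocab = set()
    for c in counts.values():
        vocab.update(c)
    def score(label):
        total = sum(counts[label].values())
        return sum(math.log((counts[label][w] + 1) / (total + len(vocab)))
                   for w in text.split())
    return max(counts, key=score)

model = train(TRAIN)
```

Because the model relies only on text, it applies equally to transcriptions and to recognizer output, which is the advantage the authors emphasize.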


Patent
22 Feb 2001
TL;DR: In this paper, a probabilistic error-tolerant natural language understanding method is proposed, in which the process of language understanding is divided into concept parsing and concept sequence comparison steps.
Abstract: A method of probabilistic error-tolerant natural language understanding. The process of language understanding is divided into concept parsing and concept sequence comparison steps. The concept parse uses a parser driven by a concept grammar to construct a concept parse forest set from the results of speech recognition. The concept sequence comparison uses an error-tolerant interpreter to compare the hypothetical concept sequences contained in the concept parse forest set with the exemplary concept sequences in the system's database. The most probable concept sequence is found and converted into a semantic frame that expresses the intention of the user. The whole process is guided by a probability-oriented scoring function. When an error occurs in speech recognition and a correct concept sequence cannot be formed, the position of the error is determined and the error is recovered according to the scoring function to reduce its negative effect.

54 citations
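The comparison step above can be sketched as alignment between a hypothesized concept sequence and stored exemplars. The concept labels and the plain edit-distance criterion below are illustrative assumptions; the patent uses a probability-oriented scoring function rather than unweighted distance.

```python
def edit_distance(hyp, ref):
    """Levenshtein distance between two concept sequences."""
    d = [[i + j if i * j == 0 else 0 for j in range(len(ref) + 1)]
         for i in range(len(hyp) + 1)]
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]))
    return d[-1][-1]

# Hypothetical exemplar concept sequences from the system database.
EXEMPLARS = [["QUERY", "ORIGIN", "DEST"], ["QUERY", "TIME"]]

def best_match(hypothesis):
    """Pick the exemplar concept sequence closest to the hypothesis,
    standing in for the error-tolerant interpreter's scoring step."""
    return min(EXEMPLARS, key=lambda ex: edit_distance(hypothesis, ex))
```

The alignment also locates where the sequences diverge, which is the information the method exploits to recover from recognition errors.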


Journal ArticleDOI
TL;DR: In this article, an exploratory study designed to assess the feasibility of two new technologies for the treatment of aphasic sentence processing disorders: a computerised therapy program incorporating natural language understanding (NLU), software which enables the computer to understand spoken utterances; and an augmentative communication system functioning primarily as a processing prosthesis, which allows patients to construct spoken sentences piecemeal and maintain elements already produced.
Abstract: Described here is an exploratory study designed to assess the feasibility of two new technologies for the treatment of aphasic sentence processing disorders: a computerised therapy programme incorporating natural language understanding (NLU), software which enables the computer to understand spoken utterances; and an augmentative communication system functioning primarily as a “processing prosthesis”, which allows patients to construct spoken sentences piecemeal and maintain elements already produced. Five agrammatic patients participated in a series of studies incorporating one or both of these technologies, and made language gains (ranging from modest to quite marked) following independent home use of the software. We hypothesise that the therapy and communication systems played complementary roles in this process, with the former training and/or priming specific grammatical structures and the latter providing the processing support necessary for these structures to be practised under normal conditions.

35 citations


01 Jan 2001
TL;DR: This chapter describes the role of language engineering techniques in text mining, a discipline focusing on information extraction from free texts, and concentrates on a specific application in the domain of medicine.
Abstract: In this chapter, we describe the role of language engineering techniques in text mining, a discipline focusing on information extraction from free texts. Indeed, text mining expands the idea of data mining in structured databases towards information discovering in natural language documents. After an introduction of the various tools and techniques that are available for text mining from the linguistic engineering point of view, we concentrate on a specific application in the domain of medicine.

27 citations


Proceedings ArticleDOI
07 May 2001
TL;DR: This work focuses on extending small amounts of language model training data by integrating semantic classes that were created for a natural language understanding module by converting finite state parses of a training corpus into a probabilistic context free grammar and subsequently generating artificial data from the contextfree grammar.
Abstract: When dialogue system developers tackle a new domain, much effort is required; the development of different parts of the system usually proceeds independently. Yet it may be profitable to coordinate development efforts between different modules. We focus our efforts on extending small amounts of language model training data by integrating semantic classes that were created for a natural language understanding module. By converting finite state parses of a training corpus into a probabilistic context free grammar and subsequently generating artificial data from the context free grammar, we can significantly reduce perplexity and automatic speech recognition (ASR) word error for situations with little training data. Experiments are presented using data from the ATIS and DARPA Communicator travel corpora.

26 citations
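The generation step in the paper above can be sketched by sampling from a small probabilistic context-free grammar. The grammar here is a hand-written toy in the spirit of the ATIS domain, not the grammar derived from the authors' finite state parses.

```python
import random

# Minimal hypothetical PCFG: nonterminal -> list of (expansion, probability).
PCFG = {
    "S":       [(["show", "FLIGHTS", "from", "CITY", "to", "CITY"], 1.0)],
    "FLIGHTS": [(["flights"], 0.6), (["all", "flights"], 0.4)],
    "CITY":    [(["boston"], 0.5), (["denver"], 0.3), (["seattle"], 0.2)],
}

def generate(symbol, rng):
    """Recursively expand a nonterminal by sampling a rule in proportion
    to its probability; terminals pass through unchanged."""
    if symbol not in PCFG:
        return [symbol]
    rules, weights = zip(*PCFG[symbol])
    rhs = rng.choices(rules, weights=weights)[0]
    return [word for s in rhs for word in generate(s, rng)]

rng = random.Random(1)
artificial = [" ".join(generate("S", rng)) for _ in range(3)]
```

Sampling many such sentences yields an artificial corpus that can be pooled with the scarce real data when training the recognizer's language model.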


Journal ArticleDOI
TL;DR: The author examines help-related dialogue to show how reports of troubles that often appear ambiguous and vague can be better understood by looking at the sequential design of speakers' turn constructions.
Abstract: Developers of dialogue systems must confront the complexities of natural language. The purpose of this paper is to demonstrate how “sequence package” analysis, as a novel approach, can help to improve natural language understanding. Such an approach would go beyond the standard grammatical formalisms represented in most dialogue systems, to include context-dependent utterance sequences that are shaped by the unfolding talk. What is then comprised in a sequence package is a series of related turn construction units and turns that make up either single or multiple episodes of talk, and sometimes an entire conversation. The author examines help-related dialogue to show how reports of troubles that often appear ambiguous and vague can be better understood by looking at the sequential design of speakers' turn constructions. Subtle features found in troubles-related talk that are important, but often overlooked, may be identified by mapping out the sequence package arrangement of the talk. For example, a caller's need for vital empathic support, before he or she can be ready to receive help, might be hard to detect if the caller only provides hidden, and possibly contradictory, signs of emotional distress. Or a patient might be unclear and somewhat inconsistent when trying to describe his or her chief complaint in the course of a medical interview. Thus, an analysis of sequence packages can potentially uncover crucial information often buried in the talk. In designing dialogue systems that model spontaneous speech, a sequence package analysis might serve as a basic component of natural language systems.

25 citations


01 Jan 2001
TL;DR: The authors are pilot-testing a prototype system that analyzes student explanations, stated in the students' own words, recognizes the types of omissions typically seen in these explanations, and provides feedback.
Abstract: We are engaged in a research project to create a tutorial dialogue system that helps students to explain the reasons behind their problem-solving actions, in order to help them learn with greater understanding. Currently, we are pilottesting a prototype system that is able to analyze student explanations, stated in their own words, recognize the types of omissions that we typically see in these explanations, and provide feedback. The system takes a knowledge-based approach to natural language understanding and uses a statistical text classifier as a backup. The main features are: robust parsing, logic-based representation of semantic content, representation of pedagogical content knowledge in the form of a hierarchy of partial and complete explanations, and reactive dialogue management. A preliminary evaluation study indicates that the knowledge-based natural language component correctly classifies 80% of explanations and produces a reasonable classification for all but 6% of explanations.

20 citations


Book ChapterDOI
01 Jan 2001
TL;DR: In spoken language understanding systems, the interface between automatic speech recognition on the one hand and natural language understanding on the other hand often consists of word graphs: a compact representation of the sequences of words that the automaticspeech recognition component hypothesises for a given utterance.
Abstract: In spoken language understanding systems, the interface between automatic speech recognition on the one hand and natural language understanding on the other hand often consists of word graphs: a compact representation of the sequences of words that the automatic speech recognition component hypothesises for a given utterance.

Book ChapterDOI
11 Sep 2001
TL;DR: This work presents a two-level stochastic model approach to the construction of the natural language understanding component of a dialog system in the domain of database queries, which answers queries about a railway timetable in Spanish.
Abstract: Over the last few years, stochastic models have been widely used in natural language understanding modeling. Almost all of these works are based on the definition of segments of words as basic semantic units for the stochastic semantic models. In this work, we present a two-level stochastic model approach to the construction of the natural language understanding component of a dialog system in the domain of database queries. This approach treats the problem in a way similar to the stochastic approach for the detection of syntactic structures (Shallow Parsing or Chunking) in natural language sentences; in this case, however, the stochastic semantic language models are based on the detection of some semantic units from the user turns of the dialog. We give the results of the application of this approach to the construction of the understanding component of a dialog system which answers queries about a railway timetable in Spanish.

Journal Article
TL;DR: The paper is focused on the Natural Language Understanding and Dialogue Management Agents, and discusses their integration over a global agent architecture (which includes Action and Knowledge Managers, Speech Input/Output components and HomeSetup controllers).
Abstract: This paper presents the main characteristics of an Agent-based Architecture for the design and implementation of a Spoken Dialogue System. From a theoretical point of view, the system is based on the Information State Update approach, in particular, the system aims at the management of Natural Command Language Dialogue Moves in a Home Machine Environment. Specifically, the paper is focused on the Natural Language Understanding and Dialogue Management Agents, and discusses their integration over a global agent architecture (which includes Action and Knowledge Managers, Speech Input/Output components and HomeSetup controllers).

Proceedings ArticleDOI
03 Dec 2001
TL;DR: This work investigates opportunities and constraints for the integration of intelligent component technologies into VoiceXML-based systems: such components will solve tasks from both sides of natural language processing, natural language understanding/analysis as well as natural language generation.
Abstract: VoiceXML offers the prospect of a streamlined deployment process of voice interfaces for commercial applications, similar to the ease of Web development. This in turn promises increased dynamism in the field of speech and language research. We investigate opportunities and constraints for the integration of intelligent component technologies into VoiceXML-based systems: such components will solve tasks from both sides of natural language processing, natural language understanding/analysis as well as natural language generation. We ask what role these fields originating from AI research will play in the VoiceXML environment. For a more detailed report including also the issues of multilinguality and natural language interfaces see (Mittendorfer et al., 2001).

Proceedings Article
01 Jan 2001
TL;DR: A robust parsing scheme is proposed, which integrates the following methods, to cope with the severe spoken linguistic phenomena, such as garbage, repetition, ellipsis, word disordering, fragment and ill form.
Abstract: The rule-based parsing is a prevalent method for the natural language understanding (NLU) and has been introduced in dialogue systems for spoken language processing (SLP). However, additional measures must be taken to cope with the severe spoken linguistic phenomena, such as garbage, repetition, ellipsis, word disordering, fragment and ill form, which frequently occur in the spoken language. We propose in this paper a robust parsing scheme, which integrates the following methods.

Journal ArticleDOI
TL;DR: This article proposes the method FNLU (Filtering based on Natural Language Understanding) including the algorithms for Extracting Typical Phrase, Calculating feature vector, Mining threshold vector, Objective judging and Subjective judging.
Abstract: Web document filtering is an important aspect in information security. The traditional strategy based on simple keywords matching often leads to low accuracy of discrimination. This article proposes the method FNLU (Filtering based on Natural Language Understanding) including the algorithms for Extracting Typical Phrase, Calculating feature vector, Mining threshold vector, Objective judging and Subjective judging. The experimental result shows that the algorithms are efficient.

01 Jan 2001
TL;DR: The model constitutes an integrated system involving a cyclical process of parsing, semantic adjudication and filtering and it is argued that the model successfully explains certain garden path phenomena in English.
Abstract: This thesis presents a model of incremental natural language understanding based on the grammatical formalism known as Combinatory Categorial Grammar. The model constitutes an integrated system involving a cyclical process of parsing, semantic adjudication and filtering. The motivating data for the model are the well-known observations about garden path effects in human sentence processing, and particularly the fact that the presence and strength of the garden path effect is influenced by the referential context in which the sentence is uttered, as well as the actual lexical items selected. It is argued that the model successfully explains certain garden path phenomena in English. The model has been implemented in the Java programming language.

PatentDOI
Mark E. Epstein1
TL;DR: A method for processing dual tone multi-frequency signals for use with a natural language understanding system can include several steps that determine a meaning from the text equivalent.
Abstract: A method for processing dual tone multi-frequency signals for use with a natural language understanding system can include several steps. The step of determining whether an audio input signal is a dual tone multi-frequency signal or a human speech signal can be included. If the audio input signal is determined to be a dual tone multi-frequency signal, the audio input signal can be converted to at least one text equivalent. Also, the step of providing the at least one text equivalent to a natural language understanding system can be included. The natural language understanding system can determine a meaning from the text equivalent.

Patent
Rajesh Balchandran1, Mark E. Epstein1
31 Jan 2001
TL;DR: This patent proposes a multi-pass method for processing text for use with a natural language understanding system, including steps such as determining at least one contextual marker in the text and identifying a referent in a question in the text.
Abstract: A multi-pass method for processing text for use with a natural language understanding system can include a series of steps. The steps can include determining at least one contextual marker in the text and identifying a referent in a question in the text. In a separate referent mapping pass through the text, the method can include classifying the identified referent as a particular type of referent using the contextual marker and the identified referent.

Patent
Mark E. Epstein1
31 Jan 2001
TL;DR: In this article, a method of configuring classes in a natural language understanding (NLU) system can include the steps of assigning a unique value to members of a class in the NLU system.
Abstract: A method of configuring classes in a natural language understanding (NLU) system. The method can include the steps of assigning a unique value to members of a class in the NLU system. The step of generating possible substrings from the members in the class also can be included. Additionally, for each generated substring having at least one term in common with one of the members in the class, the step of associating with the generated substring the unique value assigned to the member can be included.
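The steps above can be sketched as follows. This is a simplified reading of the patent: the class of airline names and its member values are invented for illustration, and the association step is reduced to mapping each substring of a member to that member's value (by construction, such substrings share a term with the member).

```python
# Hypothetical NLU class: airline names, each assigned a unique value.
MEMBERS = {"american airlines": 1, "united airlines": 2, "delta": 3}

def substrings(member):
    """Generate all contiguous word substrings of a class member."""
    words = member.split()
    return {" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)}

def substring_values(members):
    """Associate each generated substring with the unique values of the
    members it was generated from (the association step in the patent)."""
    table = {}
    for member, value in members.items():
        for sub in substrings(member):
            table.setdefault(sub, set()).add(value)
    return table

TABLE = substring_values(MEMBERS)
```

A partial match such as "airlines" then maps to the set of candidate members it could denote, which a later disambiguation step can narrow down.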

Book ChapterDOI
TL;DR: This paper addresses the key question of this book by applying the chaotic dynamics found in biological brains to the design of a strictly sequential artificial neural network-based natural language understanding (NLU) system.
Abstract: This paper addresses the key question of this book by applying the chaotic dynamics found in biological brains to the design of a strictly sequential artificial neural network-based natural language understanding (NLU) system. The discussion is in three parts. The first part argues that, for NLU, two foundational principles of generative linguistics, mainstream cognitive science, and much of artificial intelligence -that natural language strings have complex syntactic structure processed by structure-sensitive algorithms, and that this syntactic structure determines string semantics- are unnecessary, and that it is sufficient to process strings purely as symbol sequences. The second part then describes neuroscientific work which identifies chaotic attractor trajectory in state space as the fundamental principle of brain function at a level above that of the individual neuron, and which indicates that sensory processing, and perhaps higher cognition more generally, are implemented by cooperating attractor sequence processes. Finally, the third part sketches a possible application of this neuroscientific work to the design of a sequential NLU system.

Patent
Mark E. Epstein1
11 Aug 2001
TL;DR: The authors applied a context free grammar to the text input to determine substrings and corresponding parse trees, and examined each possible substring using an inventory of queries corresponding to the CFG.
Abstract: A method and system for use in a natural language understanding system for including grammars within a statistical parser. The method involves a series of steps. The invention receives a text input. The invention applies a first context free grammar to the text input to determine substrings and corresponding parse trees, wherein the substrings and corresponding parse trees further correspond to the first context free grammar. Additionally, the invention can examine each possible substring using an inventory of queries corresponding to the CFG.

Journal ArticleDOI
TL;DR: This system carries on a conversation with the user in Arabic or English or a combination of the two and attempts to help the user use the Novell network operating system, called the Dialoguer.
Abstract: We have developed a bilingual interface to the Novell network operating system, called the Dialoguer. This system carries on a conversation with the user in Arabic or English or a combination of the two and attempts to help the user use the Novell network operating system. Learning to use an operating system is a major barrier in starting to use computers. There is no single standard for operating systems which makes it difficult for novice users to learn a new operating system. With the proliferation of client–server environments, users will eventually end up using one network operating system or another. These problems motivated our choice of an area to work in and they have made it easy to find real users to test our system. This system is both an expert system and a natural language interface. The system embodies expert knowledge of the operating system commands and of a large variety of plans that the user may want to carry out. The system also contains a natural language understanding component and a response generation component. The Dialoguer makes extensive use of case frame tables in both components. Algorithms for handling a bilingual dialogue are one of the important contributions of this paper along with the Arabic case frames.

Book ChapterDOI
TL;DR: This account is distinguished from previous theories of computer program comprehension by its emphasis on the social and economic perspective, and by its recognition of the similarities between computer program understanding and natural language understanding.
Abstract: This paper explores how computer programmers extract meaning from the computer program texts that they read. This issue is examined from the perspective that program reading is governed by a number of economic choices, since resources, particularly cognitive resources, are severely constrained. These economic choices are informed by the reader's existing belief set, which includes beliefs pertaining to the overlapping and enclosing social groups to which the program reader, the original programmer, and the program's users belong. Membership within these social groups, which may be as specific as the set of programmers working within a particular organization or as general as the members of a particular nation or cultural group, implies a set of shared knowledge that characterizes membership in the social group. This shared knowledge includes both linguistic and non-linguistic components and is what ultimately provides the interpretative context in which meaning is constructed. This account is distinguished from previous theories of computer program comprehension by its emphasis on the social and economic perspective, and by its recognition of the similarities between computer program understanding and natural language understanding.

Book ChapterDOI
10 Dec 2001
TL;DR: An approach to building a commonsense ontology for language understanding using language itself as a design guide using Frege's conception of compositional semantics and the idea of type inferences in strongly-typed, polymorphic programming languages is suggested.
Abstract: It is by now widely accepted that a number of tasks in natural language understanding (NLU) require the storage of and reasoning with a vast amount of background (commonsense) knowledge. While several efforts have been made to build such ontologies, a consensus on a scientific methodology for ontological design is yet to emerge. In this paper we suggest an approach to building a commonsense ontology for language understanding using language itself as a design guide. The idea is rooted in Frege's conception of compositional semantics and is related to the idea of type inferences in strongly-typed, polymorphic programming languages. The method proposed seems to (i) resolve the problem of multiple inheritance; (ii) suggest an explanation for polysemy and metaphor; and (iii) provide a step towards establishing a systematic approach to ontological design.

Journal Article
TL;DR: A new model selection method based on natural language understanding and a genetic algorithm is proposed in this paper, on the basis of an evaluation of several current well-known model selection methods.
Abstract: A new model selection method based on natural language understanding and a genetic algorithm is proposed in this paper, on the basis of an evaluation of several current well-known model selection methods.

Proceedings ArticleDOI
Owen Rambow1
06 Jul 2001
TL;DR: This paper will argue that NLG can and does profit from corpus-based methods, and the resistance to corpus- based approaches in NLG may have more to do with the fact that in many NLG applications the output to be generated is extremely limited.
Abstract: In computational linguistics, the 1990s were characterized by the rapid rise to prominence of corpus-based methods in natural language understanding (NLU). These methods include statistical and machine-learning approaches. In natural language generation (NLG), in the meantime, there was little work using statistical and machine learning approaches. Some researchers felt that the kind of ambiguities that appeared to profit from corpus-based approaches in NLU did not exist in NLG: if the input is adequately specified, then all the rules that map to a correct output can also be explicitly specified. However, this paper will argue that this view is not correct, and NLG can and does profit from corpus-based methods. The resistance to corpus-based approaches in NLG may have more to do with the fact that in many NLG applications (such as report or description generation) the output to be generated is extremely limited. As is the case with NLU, if the language is limited, hand-crafted methods are adequate and successful. Thus, it is not a surprise that the first use of corpus-based techniques, at ISI (Knight and Hatzivassiloglou, 1995; Langkilde and Knight, 1998), was motivated by the use of NLG not in "traditional" NLG applications, but in machine translation, in which the range of output language is (potentially) much larger.

Book ChapterDOI
01 Jan 2001
TL;DR: This article explains the techniques used and presents performance data for current state-of-the-art speech recognition and text-to-speech mechanisms, and does not address natural language understanding or generation.
Abstract: The fields of speech recognition and speech production (text-to-speech) have made great progress since the early 1990s. The technologies are now at the point of becoming commercially viable, and a number of products are currently available. The early years of the twenty-first century should see a marked increase in the proliferation of these technologies into consumer products. Progress in both technologies has been characterized by rapid incremental improvement rather than any theoretical breakthroughs. Both technologies are based on stochastic models and have benefited greatly from increased computing and data resources. This article explains the techniques used and presents performance data for current state-of-the-art systems. It concentrates on the basic speech-to-text and text-to-speech mechanisms, and does not address natural language understanding or generation.

Proceedings ArticleDOI
Géraldine Damnati1
09 Dec 2001
TL;DR: The attempt is not to evaluate natural language understanding but to propose a more appropriate evaluation of speech recognition, by making use of semantic information to define the notion of critical errors.
Abstract: Evaluating a speech recognition system is a key issue towards understanding its deficiencies and focusing potential improvements on useful aspects. When a system is designed for a given application, it is particularly relevant to have an evaluation procedure that reflects the role of the system in this application. Evaluating continuous speech recognition through word error rate is not completely appropriate when the speech recognizer is used as spoken dialogue system input. Some errors are particularly harmful, when they concern content words for example, while some others do not have any impact on the following comprehension step. The attempt is not to evaluate natural language understanding but to propose a more appropriate evaluation of speech recognition, by making use of semantic information to define the notion of critical errors.