
Showing papers on "Question answering published in 1994"


Proceedings ArticleDOI
01 Aug 1994
TL;DR: The experiments show that on average a current generation natural language system provides better retrieval performance than expert searchers using a Boolean retrieval system when searching full-text legal materials.
Abstract: The results of experiments comparing the relative performance of natural language and Boolean query formulations are presented. The experiments show that on average a current generation natural language system provides better retrieval performance than expert searchers using a Boolean retrieval system when searching full-text legal materials. Methodological issues are reviewed and the effect of database size on query formulation strategy is discussed.

140 citations


Book
01 Nov 1994
Abstract: This paper surveys some of the fundamental problems in natural language (NL) understanding (syntax, semantics, pragmatics, and discourse) and the current approaches to solving them. Some recent developments in NL processing include increased emphasis on corpus-based rather than example- or intuition-based work, attempts to measure the coverage and effectiveness of NL systems, dealing with discourse and dialogue phenomena, and attempts to use both analytic and stochastic knowledge. Critical areas for the future include grammars that are appropriate to processing large amounts of real language; automatic (or at least semiautomatic) methods for deriving models of syntax, semantics, and pragmatics; self-adapting systems; and integration with speech processing. Of particular importance are techniques that can be tuned to such requirements as full versus partial understanding and spoken language versus text. Portability (the ease with which one can configure an NL system for a particular application) is one of the largest barriers to application of this technology.

Natural language (NL) understanding by computer began in the 1950s as a discipline closely related to linguistics. It has evolved to incorporate aspects of many other disciplines (such as artificial intelligence, computer science, and lexicography). Yet it continues to be the Holy Grail of those who try to make computers deal intelligently with one of the most complex characteristics of human beings: language. Language is so fundamental to humans, and so ubiquitous, that fluent use of it is often considered almost synonymous with intelligence. Given that, it is not surprising that computers have difficulty with natural language. Nonetheless, many people seem to think it should be easy for computers to deal with human language, just because they themselves do so easily.

Research in both speech recognition (i.e., literal transcription of spoken words) and language processing (i.e., understanding the meaning of a sequence of words) has been going on for decades. But quite recently, speech recognition started to make the transition from the laboratory to widespread successful use in a large number of different kinds of systems. What is responsible for this technology transition? Two key features that have allowed the development of successful speech recognition systems are (i) a simple general description of the speech recognition problem (which results in a simple general way to measure the performance of recognizers) and (ii) a simple general way to automatically train a recognizer on a new vocabulary or corpus. Together, these features helped to open the floodgates to the successful, widespread application of speech recognition technology. Many of the papers in this volume, particularly those by Makhoul and Schwartz (1), Jelinek (2), Levinson (3), Oberteuffer (4), Weinstein (5), and Wilpon (6), attest to this fact. But it is important to distinguish "language understanding" from "recognizing speech," so it is natural to ask why the same path has not been followed in natural language understanding.
In natural language processing (NLP), as we shall see, there is no easy way to define the problem being solved (which results in difficulty evaluating the performance of NL systems), and there is currently no general way for NL systems to automatically learn the information they need to deal effectively with new words, new meanings, new grammatical structures, and new domains. Some aspects of language understanding seem tantalizingly similar to problems that have been solved (or at least attacked) in speech recognition, but other aspects seem to emphasize differences that may never allow the same solutions to be used for both problems. This paper briefly touches on some of the history of NLP, the types of NLP and their applications, current problem areas and suggested solutions, and areas for future work.

A BRIEF HISTORY OF NLP

NLP has a long, diverse history. One way of looking at that history is as a sequence of application areas, each of which has been the primary focus of research efforts in the computational linguistics community, and each of which has produced different techniques for language understanding. A number of excellent references are available that survey the field in various ways (7-11). In the 1950s, machine translation was the first area to receive considerable attention, only to be abandoned when it was discovered that, although it was easy to get computers to map one word string to another, the problem of translating between one natural language and another was much too complex to be expressible as such a mapping. In the 1960s the focus turned to question answering. To "understand" and respond to typed questions, most NL systems used a strongly knowledge-based approach, attempting to encode knowledge for use by a system capable of producing an in-depth analysis of the input question. That analysis would then be used to retrieve the answer to the question from a database. In the 1970s interest broadened from database interfaces to other kinds of application systems, but the focus was still on the kinds of natural language that would be produced by a person interacting with a computer system: typed queries or commands issued one at a time by the person, each of which needed to be understood completely in order to produce the correct response. That is, virtually every word in the input had some effect on the meaning that the system produced. This tended to result in systems that, for each sentence they were given, either succeeded perfectly or failed completely. The 1980s saw the first commercialization of research that was done in the previous two decades: natural language

107 citations


Proceedings Article
11 Oct 1994
TL;DR: This paper presents a meta-analysis of several methods that simulate (or approximate) the representation of the content of both queries and documents relative to the semantic conceptual structure of a text.
Abstract: Accuracy in information retrieval, that is, achieving both high recall and precision, is challenging because the relationship between natural language and semantic conceptual structure is not straightforward. However, effective retrieval requires that the semantic conceptual structure (or content) of both queries and documents be known. Natural language processing is one way to determine the content of a text. But, due to the complexity involved in natural language processing, various methods have been used which simulate (or approximate) representation of the content of both queries and documents.
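Where full linguistic analysis is too costly, content can be approximated. As a generic illustration of such an approximation (not the specific methods the paper surveys), the sketch below reduces a query and a document to bags of crudely stemmed words and scores their overlap; the suffix-stripping rule and the overlap measure are simplifying assumptions.

```python
# A minimal sketch of approximating "content" by a bag of crude word stems,
# as one stand-in for full natural language analysis. The suffix-stripping
# rule and overlap score are illustrative assumptions, not the paper's methods.
import re
from collections import Counter

def crude_stem(word: str) -> str:
    # Strip a few common English suffixes; a real system would use
    # a proper stemmer or deeper linguistic analysis.
    return re.sub(r"(ing|ed|es|s)$", "", word.lower())

def content(text: str) -> Counter:
    return Counter(crude_stem(w) for w in re.findall(r"[a-zA-Z]+", text))

def overlap(query: str, document: str) -> int:
    q, d = content(query), content(document)
    return sum(min(q[t], d[t]) for t in q)  # shared stem occurrences

doc = "Effective retrieval requires knowing the conceptual content of documents."
print(overlap("retrieving conceptual contents", doc))  # -> 2
```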

106 citations


Proceedings Article
01 Jan 1994

78 citations


Proceedings ArticleDOI
13 Oct 1994
TL;DR: This paper describes a method for converting a document image into character shape codes and word shape tokens, a representation shown to be sufficient for determining which of 23 languages a document is written in, using only a small number of features.
Abstract: Many documents are available to a computer only as images from paper. However, most natural language processing systems expect their input as character-coded text, which may be difficult or expensive to extract accurately from the page. We describe a method for converting a document image into character shape codes and word shape tokens. We believe that this representation, which is both cheap and robust, is sufficient for many NLP tasks. In this paper, we show that the representation is sufficient for determining which of 23 languages the document is written in, using only a small number of features, with greater than 90% accuracy overall.
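The general flavor of the method can be sketched as follows. The particular shape alphabet below (A for tall characters, x for x-height characters, g for descenders, i for dotted letters) is a simplified guess at the spirit of the paper's coding scheme, not its exact definition.

```python
# A minimal sketch of mapping text to character shape codes and word shape
# tokens. The code alphabet here is an illustrative simplification.
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")

def shape_code(ch: str) -> str:
    if ch.isupper() or ch in ASCENDERS:
        return "A"      # tall characters
    if ch in DESCENDERS:
        return "g"      # characters extending below the baseline
    if ch in "ij":
        return "i"      # dotted characters
    if ch.islower():
        return "x"      # plain x-height characters
    return ch           # digits and punctuation left as-is

def word_shape_tokens(text: str):
    return ["".join(shape_code(c) for c in w) for w in text.split()]

print(word_shape_tokens("The quick brown fox jumps"))
# -> ['AAx', 'gxixA', 'Axxxx', 'Axx', 'gxxgx']
```

Language identification would then be a classification over statistics of such tokens, never requiring accurate character recognition.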

66 citations


Proceedings ArticleDOI
13 Oct 1994
TL;DR: It is demonstrated that the use of syntactic compounds in the representation of database documents as well as in the user queries, coupled with an appropriate term weighting strategy, can considerably improve the effectiveness of retrospective search.
Abstract: We report on the results of a series of experiments with a prototype text retrieval system which uses relatively advanced natural language processing techniques in order to enhance the effectiveness of statistical document retrieval. In this paper we show that large-scale natural language processing (hundreds of millions of words and more) is not only required for better retrieval, but is also doable, given appropriate resources. In particular, we demonstrate that the use of syntactic compounds in the representation of database documents as well as in the user queries, coupled with an appropriate term weighting strategy, can considerably improve the effectiveness of retrospective search. The experiments reported here were conducted on the TIPSTER database in connection with the Text REtrieval Conference series (TREC).
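As a rough illustration of indexing with syntactic compounds, the sketch below adds adjacent head-modifier pairs to a document's index terms; the simple tag-pair rule stands in for the paper's actual syntactic analysis.

```python
# A minimal sketch of adding syntactic compound terms to a document's index
# terms alongside single words. The adjacent adjective-noun / noun-noun rule
# is an illustrative stand-in for real syntactic analysis.
def index_terms(tagged: list[tuple[str, str]]) -> list[str]:
    terms = [w for w, _ in tagged]
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 in ("JJ", "NN") and t2 == "NN":
            terms.append(f"{w1}_{w2}")   # compound term joins the index
    return terms

print(index_terms([("joint", "JJ"), ("venture", "NN"), ("agreement", "NN")]))
# -> ['joint', 'venture', 'agreement', 'joint_venture', 'venture_agreement']
```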

58 citations




Proceedings ArticleDOI
Peter Anick1
01 Aug 1994
TL;DR: The challenges of tuning an IR system to the domain of computer troubleshooting, where user queries tend to be very short and natural language query terms are intermixed with terminology from a variety of technical sublanguages are considered.
Abstract: There has been much research in full-text information retrieval on automated and semi-automated methods of query expansion to improve the effectiveness of user queries. In this paper we consider the challenges of tuning an IR system to the domain of computer troubleshooting, where user queries tend to be very short and natural language query terms are intermixed with terminology from a variety of technical sublanguages. A number of heuristic techniques for domain knowledge acquisition are described in which the complementary contributions of query log data and corpus analysis are exploited. We discuss the implications of sublanguage domain tuning for run-time query expansion tools and document indexing, arguing that the conventional devices for more purely “natural language” domains may be inadequate.
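A hedged sketch of what run-time expansion over such acquired sublanguage knowledge might look like; the tiny expansion table below is hypothetical, whereas the paper derives this knowledge from query logs and corpus analysis rather than by hand.

```python
# A minimal sketch of run-time query expansion for a troubleshooting domain.
# The expansion table is a hypothetical illustration.
EXPANSIONS = {
    "crash": ["hang", "freeze", "abort"],
    "boot": ["startup", "power-on"],
    "scsi": ["disk", "controller"],   # sublanguage jargon alongside plain terms
}

def expand_query(query: str) -> list[str]:
    terms = query.lower().split()
    expanded = list(terms)
    for t in terms:
        expanded.extend(EXPANSIONS.get(t, []))
    return expanded

print(expand_query("system crash during boot"))
# -> ['system', 'crash', 'during', 'boot', 'hang', 'freeze', 'abort',
#     'startup', 'power-on']
```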

34 citations


Book
01 Jan 1994
TL;DR: This volume offers 13 contributions by scientists from the fields of computer science, artificial intelligence, and computational linguistics, as well as articles that deal with both coarse-grained (symbolic) and fine-grained (connectionist) approaches to parallel processing.
Abstract: Parallel processing is not only a general topic of interest for computer scientists and researchers in artificial intelligence, but it is gaining more and more attention in the community of scientists studying natural language and its processing (computational linguists, AI researchers, psychologists). The growing need to integrate large divergent bodies of knowledge in natural language processing applications, or the belief that massively parallel systems are the only ones capable of handling the complexities and subtleties of natural language, are just two examples of the reasons for this increasing interest. This volume offers 13 contributions by scientists from the fields of computer science, artificial intelligence, and computational linguistics. The chapters provide an extensive introduction to the field, as well as articles that deal with both coarse-grained (symbolic) and fine-grained (connectionist) approaches. Along another axis, theoretical and methodological, as well as empirical and implementational issues are treated, and both language analysis and language generation are covered.

29 citations


Journal ArticleDOI
TL;DR: The proposed system provides the capability to assess whether a database contains information pertinent to a subject of interest by evaluating each comment in the database via a fuzzy evaluator that attributes a fuzzy membership value indicating its relationship to the subject.
Abstract: Describes a question-answering system based on fuzzy logic. The proposed system provides the capability to assess whether a database contains information pertinent to a subject of interest by evaluating each comment in the database via a fuzzy evaluator that attributes a fuzzy membership value indicating its relationship to the subject. An assessment is provided for the database as a whole regarding its pertinence to the subject of interest, and consequently comments that are considered irrelevant to the subject may be discarded. The system has been developed for the examination of databases that were created, for bookkeeping purposes, during the development of the IBM 4381 computer systems, to assess whether such databases contain information pertinent to the functional changes that occurred during the development cycle. The system, however, can be applied with minimal changes to a variety of circumstances, provided that the fundamental assumptions underlying the development of the membership functions are respected in the new application. Its applicability without modifications, assuming the same subject of interest, is granted for databases with characteristics similar to those of the original database for which the system was developed.
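The core idea can be sketched as follows; the keyword-ratio membership function and the optimistic aggregate below are illustrative assumptions, not the paper's actual membership functions.

```python
# A minimal sketch of the fuzzy-evaluation idea: each database comment gets a
# membership value in [0, 1] for a subject of interest, and the database as a
# whole is scored by aggregating those values. The membership function here
# is an illustrative assumption.
SUBJECT_TERMS = {"functional", "change", "redesign", "modification"}

def membership(comment: str) -> float:
    words = comment.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w.strip(".,") in SUBJECT_TERMS)
    return min(1.0, 2.0 * hits / len(words))    # saturate at full membership

def assess(comments: list[str], threshold: float = 0.2):
    values = [membership(c) for c in comments]
    relevant = [c for c, v in zip(comments, values) if v >= threshold]
    overall = max(values, default=0.0)          # optimistic aggregate
    return overall, relevant

comments = ["Functional change to the adder logic.", "Weekly status meeting."]
overall, kept = assess(comments)
print(f"{overall:.2f}", kept)   # irrelevant comments are discarded
```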

Book ChapterDOI
16 Aug 1994
TL;DR: A knowledge base consisting of over 12,000 case frames for verbs and a large number of other linguistic patterns that reveal conceptual relations was constructed and used to process a Wall Street Journal database covering a period of three years.
Abstract: This paper describes our large-scale effort to build a conceptual Information Retrieval system that converts a large volume of natural language text into Conceptual Graph representation by means of knowledge-based processing. In order to automatically extract concepts and conceptual relations between concepts from texts, we constructed a knowledge base consisting of over 12,000 case frames for verbs and a large number of other linguistic patterns that reveal conceptual relations. They were used to process a Wall Street Journal database covering a period of three years. We describe our methods for constructing the knowledge base, how the linguistic knowledge is used to process the text, and how the retrieval system makes use of the rich representation of documents and information needs.
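A minimal sketch of the case-frame idea follows; the frame format and the AGNT/OBJ relation names are illustrative (in the spirit of conceptual graphs) rather than the paper's actual knowledge-base format.

```python
# A minimal sketch of using a verb case frame to extract conceptual relations
# from a parsed clause. Frame format and relation names are illustrative.
CASE_FRAMES = {
    "acquire": {"subject": "AGNT", "object": "OBJ"},   # X acquires Y
    "merge":   {"subject": "AGNT", "with": "PTNT"},
}

def extract_relations(verb: str, args: dict[str, str]):
    frame = CASE_FRAMES.get(verb)
    if frame is None:
        return []   # unknown verb: no conceptual relations extracted
    return [(verb.upper(), role, args[slot])
            for slot, role in frame.items() if slot in args]

# A clause like "IBM acquires Lotus", assumed already parsed into slots:
print(extract_relations("acquire", {"subject": "IBM", "object": "Lotus"}))
# -> [('ACQUIRE', 'AGNT', 'IBM'), ('ACQUIRE', 'OBJ', 'Lotus')]
```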

01 Jan 1994
TL;DR: This paper deals with a natural language dialogue tool for supporting the database design process using a moderated dialogue for drawing the designer's attention to objects in order to extract comprehensive information about the domain.
Abstract: This paper deals with a natural language dialogue tool for supporting the database design process. We want to illustrate how natural language (German) can be used for obtaining a skeleton design and for supporting the acquisition of semantics of the prospective database. The approach is based on the assumption that verbs form a central part in defining the meaning of sentences and imply semantic roles in the sentences which have to be filled by objects. We are using a moderated dialogue for drawing the designer's attention to these objects in order to extract comprehensive information about the domain.

Journal ArticleDOI
TL;DR: The level of correct interpretation achieved in this study is far above any previously reported and suggests that existing natural language query systems may be practical.
Abstract: The study reported here involved the use of the natural language query system INTELLECT. It evaluated the level of correct interpretation to investigate whether the use of such a system is practical. Two sets of queries generated by two groups of senior-level business students were used. Questions from the first set were generated by "naive" students who were untrained and not aware that they were providing queries which were to be executed by a computer. Students from the second group attended a short lecture and understood that they were to generate natural language queries to be executed by a computer. INTELLECT's lexicon was augmented in stages. The level of correct interpretation achieved in this study is far above any previously reported and suggests that existing natural language query systems may be practical. Key factors in the accuracy of interpretation were user training and iterative lexicon enhancement.

15 Dec 1994
TL;DR: This dissertation addresses the knowledge-engineering bottleneck for a natural language processing task called "information extraction" and presents a system called AutoSlog, which automatically constructs dictionaries for information extraction, given an appropriate training corpus.
Abstract: Knowledge-based natural language processing systems have achieved good success with many tasks, but they often require many person-months of effort to build an appropriate knowledge base. As a result, they are not portable across domains. This knowledge-engineering bottleneck must be addressed before knowledge-based systems will be practical for real-world applications. This dissertation addresses the knowledge-engineering bottleneck for a natural language processing task called "information extraction". A system called AutoSlog is presented which automatically constructs dictionaries for information extraction, given an appropriate training corpus. In the domain of terrorism, AutoSlog created a dictionary using a training corpus and five person-hours of effort that achieved 98% of the performance of a hand-crafted dictionary that took approximately 1500 person-hours to build. This dissertation also describes three algorithms that use information extraction to support high-precision text classification. As more information becomes available on-line, intelligent information retrieval will be crucial in order to navigate the information highway efficiently and effectively. The approach presented here represents a compromise between keyword-based techniques and in-depth natural language processing. The text classification algorithms classify texts with high accuracy by using an underlying information extraction system to represent linguistic phrases and contexts. Experiments in the terrorism domain suggest that increasing the amount of linguistic context can improve performance. Both AutoSlog and the text classification algorithms are evaluated in three domains: terrorism, joint ventures, and microelectronics. An important aspect of this dissertation is that AutoSlog and the text classification systems can be easily ported across domains.
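The flavor of automatic dictionary construction can be sketched as follows; the single "target immediately before a known verb" heuristic below is a drastic, hypothetical simplification of AutoSlog's actual heuristics.

```python
# A minimal sketch of the AutoSlog idea: given a training sentence and a
# target string known to be a relevant answer (e.g., a perpetrator name),
# propose an extraction pattern anchored on a nearby verb. The lone heuristic
# here is an illustrative simplification, not AutoSlog's real rule set.
def propose_pattern(sentence: str, target: str, verbs: set[str]):
    words = sentence.rstrip(".").split()
    if target not in words:
        return None
    i = words.index(target)
    # If the target immediately precedes a known verb, propose a
    # "<subject> VERB" pattern with the target slot left open.
    if i + 1 < len(words) and words[i + 1] in verbs:
        return f"<target> {words[i + 1]}"
    return None

VERBS = {"was", "bombed", "kidnapped"}
print(propose_pattern("terrorists bombed the embassy", "terrorists", VERBS))
# -> '<target> bombed'
```

A dictionary would accumulate such proposed patterns over a whole annotated corpus, with a human quickly filtering the bad ones.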


01 Jan 1994
TL;DR: This work presents an augmentation to the representation of variables so that variables are not atomic terms, which leads to an extended, more "natural" logical language whose use and representations are consistent with the use of variables in natural language.
Abstract: We define a knowledge representation and inference formalism that is well suited to natural language processing. In this formalism every subformula of a formula is closed. We motivate this by observing that any formal language with (potentially) open sentences is an inappropriate medium for the representation of natural language sentences. Open sentences in such languages are a consequence of the separation of variables from their quantifier and type constraints, typically in the antecedents of rules. This is inconsistent with the use of descriptions and noun phrases corresponding to variables in language. Variables in natural language are constructions that are typed and quantified as they are used. A consequence of this is that variables in natural language may be freely reused in dialog. This leads to the use of pronouns and discourse phenomena such as ellipsis involving reuse of entire subformulas. We present an augmentation to the representation of variables so that variables are not atomic terms. These "structured" variables are typed and quantified as they are defined and used. This leads to an extended, more "natural" logical language whose use and representations are consistent with the use of variables in natural language. Structured variables simplify the tasks associated with natural language processing and generation, by localizing noun phrase processing. The formalism is defined in terms of a propositional semantic network, starting from nodes and arcs connecting nodes, subsumption, matching, to inference. It allows the resolution of some representational difficulties with certain classes of natural language sentences (e.g. the so-called "donkey" sentences and sentences involving branching quantifiers). Reuse phenomena, such as pronominalization and ellipsis, are captured in the representation by structure-sharing. A major advantage of this structured representation of variables is that it allows a form of terminological and derived subsumption similar to surface reasoning in natural language.

Journal ArticleDOI
01 Oct 1994
TL;DR: Ways in which GDSSs can be improved by using a natural language interface to allow group members to communicate with deeper-level information systems using human languages are described.
Abstract: Group Decision Support Systems (GDSSs) are currently used primarily as surface-level discussion tools. That is, current GDSSs do not allow group members to easily access information in deeper levels, such as the data base, model base, and application programs. This paper describes ways in which GDSSs can be improved by using a natural language interface to allow group members to communicate with deeper-level information systems using human languages. The system consists of a database, a model base, application programs, and a natural language interface. The system is designed both to route questions to appropriate subsystems and to translate these questions into the computer language controlling these subsystems. Finally, experimental results demonstrate the feasibility of the technique.
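A minimal sketch of the routing-and-translation idea; the routing keywords and the generated commands below are hypothetical stand-ins for the paper's design.

```python
# A minimal sketch of routing a natural language question to the appropriate
# deeper-level subsystem and translating it into that subsystem's command
# language. Keywords and generated commands are hypothetical.
def route_question(question: str) -> tuple[str, str]:
    q = question.lower()
    if "forecast" in q or "model" in q:
        return "model_base", f"RUN_MODEL({question!r})"
    if "how many" in q or "list" in q:
        return "database", "SELECT COUNT(*) FROM orders;"  # stand-in query
    return "application", question  # fall through to application programs

print(route_question("How many orders shipped last week?"))
# -> ('database', 'SELECT COUNT(*) FROM orders;')
```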

Journal ArticleDOI
TL;DR: It is argued that adequate information retrieval in hospital records will have to rely on the exploitation of conceptual knowledge in those records rather than on superficial string searches; a retrieval system called CONIR, which attempts to realise this, is presented.

01 Jan 1994
TL;DR: This paper briefly surveys the activity of the Knowledge Representation and Reasoning group at IRST for Natural Language Processing and describes two Description Logic based systems to be used in large Natural Language dialogue architectures.
Abstract: This paper briefly surveys the activity of the Knowledge Representation and Reasoning group at IRST for Natural Language Processing. We have developed two Description Logic based systems to be used in large Natural Language dialogue architectures. The functional interaction of such KR systems with the other modules is briefly described. Then, several qualifying extensions of the basic systems are introduced, and their usefulness for natural language applications is explained.


Book
01 Nov 1994
TL;DR: Surveying three areas of recent progress in particular: part-of-speech tagging, stochastic parsing, and lexical semantics, this paper finds that hybrid methods combining new empirical corpus-based methods with traditional symbolic methods are in use.
Abstract: The field of natural language processing (NLP) has seen a dramatic shift in both research direction and methodology in the past several years. In the past, most work in computational linguistics tended to focus on purely symbolic methods. Recently, more and more work is shifting toward hybrid methods that combine new empirical corpus-based methods, including the use of probabilistic and information-theoretic techniques, with traditional symbolic methods. This work is made possible by the recent availability of linguistic databases that add rich linguistic annotation to corpora of natural language text. Already, these methods have led to a dramatic improvement in the performance of a variety of NLP systems with similar improvement likely in the coming years. This paper focuses on these trends, surveying in particular three areas of recent progress: part-of-speech tagging, stochastic parsing, and lexical semantics.
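As a concrete taste of the corpus-based methods surveyed, the sketch below estimates per-word tag frequencies from a tiny annotated corpus and tags new text with each word's most frequent tag; real stochastic taggers add tag-sequence (n-gram) models on top, and the corpus here is purely illustrative.

```python
# A minimal sketch of corpus-based part-of-speech tagging: count per-word tag
# frequencies in an annotated corpus, then tag new text with each word's most
# frequent tag. The tiny corpus and the NN default are illustrative.
from collections import Counter, defaultdict

annotated = [("the", "DT"), ("dog", "NN"), ("runs", "VBZ"),
             ("the", "DT"), ("run", "NN"), ("dog", "NN")]

counts: dict[str, Counter] = defaultdict(Counter)
for word, tag_ in annotated:
    counts[word][tag_] += 1

def tag(sentence: list[str]) -> list[tuple[str, str]]:
    return [(w, counts[w].most_common(1)[0][0] if w in counts else "NN")
            for w in sentence]   # unknown words default to NN

print(tag("the dog runs".split()))
# -> [('the', 'DT'), ('dog', 'NN'), ('runs', 'VBZ')]
```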

Journal ArticleDOI
25 Apr 1994
TL;DR: These ten contributions describe the major technical ideas underlying many of the significant advances in natural language processing over the last decade, focusing in particular on the challenges in areas such as knowledge representation, reasoning, planning, and integration of multiple knowledge sources, where NLP and AI research intersect.
Abstract: From the Publisher: These ten contributions describe the major technical ideas underlying many of the significant advances in natural language processing over the last decade, focusing in particular on the challenges in areas such as knowledge representation, reasoning, planning, and integration of multiple knowledge sources, where NLP and AI research intersect. Included are chapters that deal with all the main aspects of natural language processing, from analysis to interpretation to generation. Fruitful new relations between language research and AI, such as the use of statistical decision techniques in speech and language processing, are also discussed.

Journal ArticleDOI
TL;DR: This special series on natural-language processing is an attempt to bring language processing and its applications into focus, to demonstrate techniques that have recently been applied to real-world problems, to identify research ripe for practical exploitation, and to illustrate some promising combinations of natural- language processing with other emerging technologies.
Abstract: Processing natural language such as English has always been one of the central research issues of artificial intelligence, both because of the key role language plays in human intelligence and because of the wealth of potential applications. Many of the knowledge representation and inference techniques that have been applied successfully in knowledge-based systems were originally developed for processing natural language, but the language-processing applications themselves have always seemed far from being realized. The special series on natural-language processing is an attempt to bring language processing and its applications into focus: to demonstrate techniques that have recently been applied to real-world problems, to identify research ripe for practical exploitation, and to illustrate some promising combinations of natural-language processing with other emerging technologies. Each of the four articles in the series provides some insight into the state of the art and conveys the practical significance of recent research in the field.

Journal ArticleDOI
TL;DR: A range of systems developed in the domain of machine learning and natural language processing are referenced and reviewed; each system is categorised into either a symbolic or connectionist paradigm, and its characteristics and limitations are described.
Abstract: A fundamental issue in natural language processing is the prerequisite of an enormous quantity of preprogrammed knowledge concerning both the language and the domain under examination. Manual acquisition of this knowledge is tedious and error-prone. Development of an automated acquisition process would prove invaluable.

Journal ArticleDOI
Scott P. Robertson1
TL;DR: A highly interactive model is described in which an expectation-driven parser generates multiple question candidates, including partially-specified candidates, and three experiments on human question answering provide evidence that working memory load during question reading is affected by processes related to answer retrieval.

Book ChapterDOI
16 Aug 1994
TL;DR: The semantics of conceptual graphs with respect to context have been clarified in recent years; however, problems remain in their application.
Abstract: The semantics of conceptual graphs (CGs) with respect to context have been clarified in recent years; however, problems remain in their application. Contexts are especially useful in representing the large amounts of knowledge suitable for information retrieval. Some examples of the use of contexts are presented, and some questions are raised about how contexts can be used effectively. The discussion deals with natural language representation, reasoning, and context management issues.

Journal ArticleDOI
TL;DR: The author considers how the Natural Language Group designed the ALFresco prototype to provide information on 14th-century Italian frescoes and monuments and to suggest other masterpieces that might interest the user.
Abstract: Two prototype information-access applications show how the integration of natural language and hypermedia produces systems that allow users and systems to exchange more information. The author considers how the Natural Language Group designed the ALFresco prototype to provide information on 14th-century Italian frescoes and monuments and to suggest other masterpieces that might interest the user. He also discusses the MAIA project, which integrates components developed by IRST researchers in speech recognition, natural language, knowledge representation, vision, reasoning, and other areas of AI.

Proceedings Article
11 Oct 1994
TL;DR: It is demonstrated that proper term weighting is at least as important as term selection, and that different types of terms (e.g., words, phrases, names), and terms derived by different means, must be treated differently for maximum benefit in retrieval.
Abstract: In information retrieval, the content of a document may be represented as a collection of terms: words, stems, phrases, or other units derived or inferred from the text of the document. These terms are usually weighted to indicate their importance within the document, which can then be viewed as a vector in an N-dimensional space. In this paper we demonstrate that proper term weighting is at least as important as term selection, and that different types of terms (e.g., words, phrases, names), and terms derived by different means (e.g., statistical, linguistic), must be treated differently for maximum benefit in retrieval. We report results of selected experiments with our prototype natural language information retrieval system performed in connection with the second Text REtrieval Conference (TREC-2) using a 550-Mbyte Wall Street Journal database and a 300-Mbyte San Jose Mercury database.
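The differential-weighting point can be illustrated with a tf-idf sketch; the per-type boost factors below are assumptions for illustration, not the weighting scheme tuned in the paper.

```python
# A minimal sketch of differential term weighting: tf-idf scores, with phrase
# and name terms boosted relative to single words. Boost factors are assumed.
import math

TYPE_BOOST = {"word": 1.0, "phrase": 1.5, "name": 2.0}  # illustrative factors

def weight(tf: int, df: int, n_docs: int, term_type: str) -> float:
    idf = math.log(n_docs / df)
    return TYPE_BOOST[term_type] * tf * idf

# Same raw counts, different term types -> different weights:
for t, ty in [("joint venture", "phrase"), ("venture", "word")]:
    print(t, round(weight(tf=3, df=100, n_docs=100_000, term_type=ty), 2))
# -> joint venture 31.08 / venture 20.72
```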

Proceedings ArticleDOI
13 Oct 1994
TL;DR: This paper presents Delphi, the natural language component of the BBN Spoken Language System, a domain-independent natural language question answering system that is solidly based on linguistic principles, yet which is also robust to ungrammatical input.
Abstract: This paper presents Delphi, the natural language component of the BBN Spoken Language System. Delphi is a domain-independent natural language question answering system that is solidly based on linguistic principles, yet which is also robust to ungrammatical input. It includes a domain-independent, broad-coverage grammar of English. Analysis components include an agenda-based best-first parser and a fallback component for partial understanding that works by fragment combination. Delphi has been formally evaluated in the ARPA Spoken Language program's ATIS (Airline Travel Information System) domain, and has performed well. Delphi has also been ported to a spoken language demonstration system in an Air Force Resource Management domain. We discuss results of the evaluation as well as the porting process.
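A hedged sketch of the agenda-based best-first idea with a fragment fallback, in the spirit of the Delphi description above; the toy grammar, rule scores, and fallback policy are illustrative assumptions, not BBN's actual design.

```python
# A minimal sketch of an agenda-based best-first parser: constituents pop off
# a priority queue best-score-first and combine with adjacent chart entries.
# If no full parse is found, the longest fragments are returned instead.
import heapq

GRAMMAR = {("Det", "N"): ("NP", 0.9),   # binary rules with scores (assumed)
           ("V", "NP"): ("VP", 0.8),
           ("NP", "VP"): ("S", 1.0)}
LEXICON = {"the": "Det", "agent": "N", "fare": "N", "shows": "V"}

def parse(tokens):
    # Agenda items: (negated score, start, end, label); best score pops first.
    agenda = [(-1.0, i, i + 1, LEXICON[w]) for i, w in enumerate(tokens)]
    heapq.heapify(agenda)
    chart = {}  # (start, end, label) -> best score seen
    while agenda:
        neg, i, j, label = heapq.heappop(agenda)
        if (i, j, label) in chart:
            continue
        chart[(i, j, label)] = -neg
        if label == "S" and (i, j) == (0, len(tokens)):
            return [(0, len(tokens), "S")]        # full parse found
        # Combine the new constituent with adjacent chart entries.
        for (k, l, other), sc in list(chart.items()):
            if l == i and (other, label) in GRAMMAR:
                parent, p = GRAMMAR[(other, label)]
                heapq.heappush(agenda, (-(sc * -neg * p), k, j, parent))
            if k == j and (label, other) in GRAMMAR:
                parent, p = GRAMMAR[(label, other)]
                heapq.heappush(agenda, (-(sc * -neg * p), i, l, parent))
    # Fallback for partial understanding: return the longest fragments.
    return sorted(chart, key=lambda e: e[0] - e[1])[:2]

print(parse("the agent shows the fare".split()))  # -> [(0, 5, 'S')]
```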