
Showing papers on "Question answering published in 1993"


Patent
12 Oct 1993
TL;DR: A natural language processing system uses unformatted naturally occurring text and generates a subject vector representation of the text, which may be an entire document or a part thereof, such as its title, a paragraph, a clause, or a sentence.
Abstract: A natural language processing system uses unformatted naturally occurring text and generates a subject vector representation of the text, which may be an entire document or a part thereof such as its title, a paragraph, clause, or a sentence therein. The subject codes which are used are obtained from a lexical database and the subject code(s) for each word in the text is looked up and assigned from the database. The database may be a dictionary or other word resource which has a semantic classification scheme as designators of subject domains. Various meanings or senses of a word may have assigned thereto multiple, different subject codes and psycholinguistically justified sense meaning disambiguation is used to select the most appropriate subject field code. Preferably, an ordered set of sentence level heuristics is used which is based on the statistical probability or likelihood of one of the plurality of codes being the most appropriate one of the plurality. The subject codes produce a weighted, fixed-length vector (regardless of the length of the document) which represents the semantic content thereof and may be used for various purposes such as information retrieval, categorization of texts, machine translation, document detection, question answering, and generally for extracting knowledge from the document. The system has particular utility in classifying documents by their general subject matter and retrieving documents relevant to a query.
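
To make the abstract's pipeline concrete, here is a minimal sketch in Python. It is an illustration of the idea only, not the patented method: the lexicon, the subject codes, and the ambiguity-weighting rule standing in for the sentence-level heuristics are all invented for the example.

    from collections import Counter

    # Hypothetical lexicon: each word carries the subject codes of its senses.
    LEXICON = {
        "court":  ["LAW", "SPORT"],
        "ruling": ["LAW"],
        "tennis": ["SPORT"],
    }
    ALL_CODES = ["LAW", "SPORT", "MEDICINE"]  # fixed vocabulary of subject fields

    def subject_vector(text):
        """Map free text to a fixed-length, weighted subject-code vector."""
        counts = Counter()
        for word in text.lower().split():
            codes = LEXICON.get(word, [])
            # Crude stand-in for sense disambiguation: weight each candidate
            # code by the inverse of the word's ambiguity.
            for code in codes:
                counts[code] += 1.0 / len(codes)
        total = sum(counts.values()) or 1.0
        # Fixed length regardless of document length.
        return [counts[c] / total for c in ALL_CODES]

    print(subject_vector("the court issued a ruling"))  # LAW outweighs SPORT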

452 citations


Proceedings ArticleDOI
01 Jul 1993
TL;DR: The methods hypothesize noun phrases that are likely to be the answer and present the user with relevant text in which they are marked, focussing the user's attention appropriately; the methods are illustrated in a broad domain: answering general-knowledge questions using an on-line encyclopedia.
Abstract: Robust linguistic methods are applied to the task of answering closed-class questions using a corpus of natural language. The methods are illustrated in a broad domain: answering general-knowledge questions using an on-line encyclopedia. A closed-class question is a question stated in natural language, which assumes some definite answer typified by a noun phrase rather than a procedural answer. The methods hypothesize noun phrases that are likely to be the answer, and present the user with relevant text in which they are marked, focussing the user's attention appropriately. Furthermore, the sentences of matching text that are shown to the user are selected to confirm phrase relations implied by the question, rather than being selected solely on the basis of word frequency. The corpus is accessed via an information retrieval (IR) system that supports boolean search with proximity constraints. Queries are automatically constructed from the phrasal content of the question, and passed to the IR system to find relevant text. Then the relevant text is itself analyzed; noun phrase hypotheses are extracted and new queries are independently made to confirm phrase relations for the various hypotheses. The methods are currently being implemented in a system called MURAX, and although this process is not complete, it is sufficiently advanced for an interim evaluation to be presented.
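
A toy rendition of the two retrieval steps described above, sketched in Python under stated assumptions: the stop list, the NEAR operator syntax, and the capitalization heuristic for noun phrases are placeholders, not MURAX's actual linguistic machinery.

    import re

    STOPWORDS = {"who", "what", "wrote", "the", "that", "is", "a", "of"}

    def to_boolean_query(question, window=10):
        """Build a proximity-constrained boolean query from content words."""
        terms = [w for w in re.findall(r"[a-z]+", question.lower())
                 if w not in STOPWORDS]
        return f" NEAR/{window} ".join(terms)

    def noun_phrase_hypotheses(passage):
        """Very rough stand-in for NP extraction: runs of capitalized words."""
        return [m.strip() for m in re.findall(r"(?:[A-Z][a-z]+ ?)+", passage)]

    question = "Who wrote the novel Moby Dick?"
    print(to_boolean_query(question))   # novel NEAR/10 moby NEAR/10 dick
    print(noun_phrase_hypotheses("Moby Dick was written by Herman Melville."))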

176 citations


Proceedings ArticleDOI
01 Dec 1993
TL;DR: The InfoCrystal is introduced that can be used both as a visualization tool and a visual query language to help users search for information and specify Boolean as well as vector-space queries graphically.
Abstract: This demonstration introduces the InfoCrystal, which can be used both as a visualization tool and a visual query language to help users search for information. The InfoCrystal visualizes all the possible binary as well as continuous relationships among N concepts. Users can assign relevance weights to the concepts and set a threshold to select relationships of interest. The InfoCrystal allows users to specify Boolean as well as vector-space queries graphically. Arbitrarily complex queries can be created by using InfoCrystals as building blocks and organizing them in a hierarchical structure. The InfoCrystal enables users to explore and filter information in a flexible, dynamic, and interactive way.
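
The cell geometry can be mimicked without graphics: each interior cell of the InfoCrystal stands for one non-empty subset of the N query concepts. The sketch below enumerates those cells and applies user weights and a threshold; the concepts, weights, and scoring rule are invented for illustration.

    from itertools import combinations

    concepts = ["health", "pollution", "policy"]
    weights  = {"health": 0.5, "pollution": 0.3, "policy": 0.2}

    # One cell per non-empty subset: documents matching exactly that
    # combination of concepts (2**N - 1 cells in total).
    cells = [frozenset(c) for n in range(1, len(concepts) + 1)
             for c in combinations(concepts, n)]

    def cell_score(cell):
        """Relevance of a cell under the user's concept weights."""
        return sum(weights[c] for c in cell)

    threshold = 0.6
    selected = [sorted(cell) for cell in cells if cell_score(cell) >= threshold]
    print(selected)  # only the concept combinations worth inspecting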

157 citations


Journal Article
TL;DR: The focus of the effort is the development of SPECIALIST, an experimental natural language processing system for the biomedical domain that includes a broad coverage parser supported by a large lexicon, modules that provide access to the extensive Unified Medical Language System Knowledge Sources, and a retrieval module that permits experiments in information retrieval.
Abstract: This paper describes efforts to provide access to the free text in biomedical databases. The focus of the effort is the development of SPECIALIST, an experimental natural language processing system for the biomedical domain. The system includes a broad coverage parser supported by a large lexicon, modules that provide access to the extensive Unified Medical Language System (UMLS) Knowledge Sources, and a retrieval module that permits experiments in information retrieval. The UMLS Metathesaurus and Semantic Network provide a rich source of biomedical concepts and their interrelationships. Investigations have been conducted to determine the type of information required to effect a map between the language of queries and the language of relevant documents. Mappings are never straightforward and often involve multiple inferences.

91 citations




Journal ArticleDOI
TL;DR: This system accepts an Arabic sentence (a declarative statement or a question to be answered) and generates the appropriate output for the user; it uses a frame-based technique to represent the knowledge base of a radiation domain.
Abstract: This system accepts an Arabic sentence (a declarative statement or a question to be answered) and generates the appropriate output for the user. The proposed system can be considered an Arabic implementation of a general natural language processing system. It uses a frame-based technique to represent the knowledge base of a radiation domain.

63 citations


01 Jan 1993
TL;DR: The design of and experimentation with the Knowledge Query and Manipulation Language (KQML) are described; KQML is a new language and protocol for exchanging information and knowledge, developed as part of an effort aimed at developing techniques and methodology for building large-scale knowledge bases which are sharable and reusable.
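
KQML messages are Lisp-style performatives with keyword parameters. The helper below renders one in that surface syntax; the parameter names follow the published KQML drafts, while the agent names and the KIF query are invented for the example.

    def kqml(performative, **params):
        """Render a KQML message in its Lisp-like surface syntax."""
        fields = " ".join(f":{k.replace('_', '-')} {v}" for k, v in params.items())
        return f"({performative} {fields})"

    # An illustrative query from one agent to another.
    message = kqml("ask-one",
                   sender="client1",
                   receiver="stock-server",
                   language="KIF",
                   ontology="NYSE-TICKS",
                   reply_with="q1",
                   content='"(PRICE XYZ ?price)"')
    print(message)
    # (ask-one :sender client1 :receiver stock-server :language KIF
    #  :ontology NYSE-TICKS :reply-with q1 :content "(PRICE XYZ ?price)")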

56 citations


Book ChapterDOI
28 Apr 1993
TL;DR: Guided by the identification of major generation challenges viewed from the angles of knowledge-based systems and cognitive psychology, some new directions for future research are sketched.
Abstract: Current research in natural language generation is situated in a computational linguistics tradition that was founded several decades ago. We critically analyse some of the architectural assumptions underlying existing systems and point out some problems in the domains of text planning and lexicalization. Guided by the identification of major generation challenges viewed from the angles of knowledge-based systems and cognitive psychology, we sketch some new directions for future research.

53 citations



11 Jan 1993
TL;DR: It is argued that most instructions don't exactly mirror the agent's knowledge, but are understood by accommodating them in the context of the general plan the agent is considering; the accommodation process is guided by the goal(s) that the agent is trying to achieve.
Abstract: Human agents are extremely flexible in dealing with Natural Language instructions. I argue that most instructions don't exactly mirror the agent's knowledge, but are understood by accommodating them in the context of the general plan the agent is considering; the accommodation process is guided by the goal(s) that the agent is trying to achieve. Therefore a NL system which interprets instructions must be able to recognize and/or hypothesize goals; it must make use of a flexible knowledge representation system, able to support the specialized inferences necessary to deal with input action descriptions that do not exactly match the stored knowledge. The data that support my claim are Purpose Clauses (PCs), infinitival constructions as in "Do α to do β", and Negative Imperatives. I present a pragmatic analysis of both PCs and Negative Imperatives. Furthermore, I analyze the computational consequences of PCs, in terms of the relations between actions PCs express, and of the inferences an agent has to perform to understand PCs. I propose an action representation formalism that provides the required flexibility. It has two components. The Terminological Box (TBox) encodes linguistic knowledge about actions, and is expressed by means of the hybrid system CLASSIC. To guarantee that the primitives of the representation are linguistically motivated, I derive them from Jackendoff's work on Conceptual Structures. The Action Library encodes planning knowledge about actions. The action terms used in the plans are those defined in the TBox. Finally, I present an algorithm that implements inferences necessary to understand "Do α to do β", and supported by the formalism I propose. In particular, I show how the TBox classifier is used to infer whether α can be assumed to match one of the substeps in the plan for β, and how expectations necessary for the match to hold are computed.
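
The matching inference in the last paragraph can be caricatured in a few lines. This is only a sketch of the idea, not the CLASSIC-based classifier: actions are reduced to (verb, object-type) pairs, and subsumption to a hand-written subtype table.

    # Hypothetical subtype table standing in for TBox classification.
    SUBTYPES = {"front-door": "door", "door": "barrier"}

    def subsumes(general, specific):
        """True if `specific` is the same type as, or a subtype of, `general`."""
        while specific is not None:
            if specific == general:
                return True
            specific = SUBTYPES.get(specific)
        return False

    def accommodate(alpha, plan_for_beta):
        """Find the plan substep that the instructed action alpha can match."""
        verb, obj = alpha
        for step in plan_for_beta:
            if verb == step[0] and subsumes(step[1], obj):
                return step
        return None

    plan_for_enter_room = [("open", "door"), ("walk-through", "doorway")]
    # "Open the front door to enter the room": alpha matches the first substep.
    print(accommodate(("open", "front-door"), plan_for_enter_room))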

01 Jun 1993
TL;DR: A relatively large-scale lexical knowledge base that is constructed automatically from an online version of Longman's Dictionary of Contemporary English, and that is currently used in the authors' NL analysis system to direct phrasal attachments is described.
Abstract: We propose combining dictionary-based and example-based natural language (NL) processing techniques in a framework that we believe will provide substantive enhancements to NL analysis systems. The centerpiece of this framework is a relatively large-scale lexical knowledge base that we have constructed automatically from an online version of Longman's Dictionary of Contemporary English (LDOCE), and that is currently used in our NL analysis system to direct phrasal attachments. After discussing the effective use of example-based processing in hybrid NL systems, we compare recent dictionary-based and example-based work, and identify the aspects of this work that are included in the proposed framework. We then describe the methods employed in automatically creating our lexical knowledge base from LDOCE, and its current and planned use as a large-scale example base in our NL analysis system. This knowledge base is structured as a highly interconnected network of words linked by semantic relations such as is_a, has_part, location_of, typical_object, and is_for. We claim that within the proposed hybrid framework, it provides a uniquely rich source of information for use during NL analysis.
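
The network's shape is easy to picture with a small sketch; the entries below are invented rather than extracted from LDOCE, but they use the relation labels named in the abstract.

    # Words linked by labeled semantic relations, as in the described
    # knowledge base (toy entries, not actual LDOCE extractions).
    NETWORK = {
        "car":    {"is_a": ["vehicle"], "has_part": ["engine", "wheel"]},
        "engine": {"is_a": ["machine"]},
        "garage": {"is_for": ["car"], "is_a": ["building"]},
    }

    def related(word, relation):
        return NETWORK.get(word, {}).get(relation, [])

    # Such links can vote on phrasal attachment: in "park the car in the
    # garage", the garage--car connection (is_for) supports attaching the
    # prepositional phrase to the verb phrase rather than to "car".
    print(related("garage", "is_for"))  # ['car']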

Journal ArticleDOI
TL;DR: A knowledge representation and inference formalism, based on an intensional propositional semantic network, in which variables are structured terms consisting of quantifier, type, and other information, which leads to an extended, more “natural” formalism whose use and representations are consistent with the use of variables in natural language.
Abstract: We describe a knowledge representation and inference formalism, based on an intensional propositional semantic network, in which variables are structured terms consisting of quantifier, type, and other information. This has three important consequences for natural language processing. First, it leads to an extended, more “natural” formalism whose use and representations are consistent with the use of variables in natural language in two ways: the structure of representations mirrors the structure of the language, and it allows re-use phenomena such as pronouns and ellipsis. Second, the formalism allows the specification of description subsumption as a partial ordering on related concepts (variable nodes in a semantic network) that relates more general concepts to more specific instances of that concept, as is done in language. Finally, this structured variable representation simplifies the resolution of some representational difficulties with certain classes of natural language sentences, namely, donkey sentences and sentences involving branching quantifiers. The implementation of this formalism is called ANALOG (A NAtural LOGIC) and its utility for natural language processing tasks is illustrated.
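
A minimal rendering of such a structured variable, assuming invented field names (ANALOG's actual node structure is richer):

    from dataclasses import dataclass, field

    @dataclass
    class StructuredVariable:
        """A variable that carries its quantifier and type restriction
        with it, in the spirit of ANALOG's variable nodes."""
        name: str
        quantifier: str              # e.g. "every" or "some"
        var_type: str                # restriction, e.g. "farmer"
        constraints: list = field(default_factory=list)

    # "Every farmer who owns a donkey beats it": the pronoun "it" can
    # re-use the donkey term directly, instead of forcing a re-scoping
    # of the quantifiers as in standard logical form.
    donkey = StructuredVariable("y", "some", "donkey")
    farmer = StructuredVariable("x", "every", "farmer", [("owns", donkey)])
    print(farmer.quantifier, farmer.var_type)  # every farmer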

Book ChapterDOI
02 Jan 1993
TL;DR: It is argued that one of the crucial issues facing future natural language systems is the development of knowledge representation formalisms that can effectively handle ambiguity.
Abstract: Current natural language understanding systems generally maintain a strict division between the parsing processes and the representation that supports general reasoning about the world. This paper examines why these two forms of processing are separated, determines the current advantages of this approach, and identifies its inherent limitations. I will point out some fundamental problems with the models as they are defined today and suggest some important directions of research in natural language and knowledge representation. In particular, I will argue that one of the crucial issues facing future natural language systems is the development of knowledge representation formalisms that can effectively handle ambiguity.

Proceedings Article
01 Jan 1993
TL;DR: This paper reports on some recent developments in the authors' natural language text retrieval system, whose backbone is a traditional statistical engine that builds inverted index files from pre-processed documents and then searches and ranks the documents in response to user queries.
Abstract: This paper reports on some recent developments in our natural language text retrieval system. The system uses advanced natural language processing techniques to enhance the effectiveness of term-based document retrieval. The backbone of our system is a traditional statistical engine which builds inverted index files from pre-processed documents and then searches and ranks the documents in response to user queries. Natural language processing is used to (1) preprocess the documents in order to extract content-carrying terms, (2) discover inter-term dependencies and build a conceptual hierarchy specific to the database domain, and (3) process users' natural language requests into effective search queries. While the general design of the system has not changed since the TREC-2 conference, we nonetheless replaced several components and added a number of new features which are described in the present paper.
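
The statistical backbone the abstract mentions reduces to a familiar structure. Here is a minimal inverted-index sketch; ranking by matched-term count is a placeholder for the system's actual term weighting.

    from collections import defaultdict

    def build_index(docs):
        """Inverted index: term -> set of documents containing it."""
        index = defaultdict(set)
        for doc_id, text in docs.items():
            for term in text.lower().split():
                index[term].add(doc_id)
        return index

    def search(index, query):
        """Rank documents by how many query terms they contain."""
        hits = defaultdict(int)
        for term in query.lower().split():
            for doc_id in index.get(term, ()):
                hits[doc_id] += 1
        return sorted(hits, key=hits.get, reverse=True)

    docs = {1: "term based document retrieval", 2: "natural language queries"}
    index = build_index(docs)
    print(search(index, "document retrieval"))  # [1]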

01 Jan 1993
TL;DR: This dissertation provides an analysis of how the content and organization of explanations function to achieve communicative goals under potentially conflicting constraints, and applies this analysis to the design of a planner for generation of explanations by computer.
Abstract: The dissertation provides an analysis of how the content and organization of explanations function to achieve communicative goals under potentially conflicting constraints, and applies this analysis to the design of a planner for generation of explanations by computer. An implementation of this planner as a multimedia question answering system is described. The functional analysis has four major subparts: (1) A theory of the kinds of knowledge that can provide the basis for "informatively satisfying" responses to a given question. (2) A theory of context sensitive constraints on the choice between alternate domain models that compete as the basis for answering a given question. (3) A theory of how supplemental explanations aid the comprehension and retention of the primary explanation. (4) A theory of how the sequencing of the parts of an explanation enhances the communicative functionality of those parts. The functional aspects of explanation just outlined imply a variety of explanation planning subtasks having distinct information processing requirements. A planning architecture is presented that matches these planning subtasks to appropriate mechanisms: (1) Top-down goal refinement translates queries into specifications of relevant knowledge on which a response can be based. (2) Prioritized preferences restrict competing domain models to those that are expected to be both informative and comprehensible to the questioner at a given point in the dialogue. (3) Plan critics examine the evolving plan and post new goals to supplement the explanation as needed. (4) A constrained graph traversal mechanism sequences the parts of an explanation in a manner respecting certain functional relationships between the parts. Contributions include: (1) the clarification and integration of a variety of functional aspects of explanatory text, (2) an analysis of the roles and limitations of various explanation planning mechanisms, (3) the design of a flexible explanation planner that applies various constraints on explanation independently of each other, and (4) an approach to selection between multiple domain models that is more general than previous approaches. Together these contributions clarify the correspondence between knowledge about communication, planning tasks, and types of discourse structure and provide improved interactive explanation capabilities.

Journal ArticleDOI
TL;DR: Lexfix, a vocabulary correction and standardization system, has been designed to improve keyword-based retrieval on free-form text fields in GM's Technical Assistance System (TAS) database, which contains about 300,000 cases of vehicle symptoms and repair information.
Abstract: The use of natural language processing and machine learning techniques to help interpret, characterize, and standardize data, thereby enhancing the extraction of knowledge from diagnostic databases, is discussed. In particular, Lexfix, a vocabulary correction and standardization system, has been designed to improve keyword-based retrieval on free-form text fields in GM's Technical Assistance System (TAS) database, which contains about 300,000 cases of vehicle symptoms and repair information. Also implemented was a natural language parser called TASLink, designed to interpret various kinds of ill-formed English, particularly free-form descriptions of vehicle faults. Inferule, an inductive machine-learning system that infers diagnostic rules from database cases containing information about vehicle symptoms and their solutions, is also described.
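
A vocabulary-correction step of the kind Lexfix performs can be approximated with the standard library alone. The canonical terms and abbreviation table below are invented; the real system's tables came from GM's repair records.

    from difflib import get_close_matches

    CANONICAL = ["transmission", "alternator", "windshield"]
    ABBREVIATIONS = {"trans": "transmission", "alt": "alternator"}

    def standardize(token):
        """Map a free-form token to a canonical vocabulary entry."""
        token = token.lower()
        if token in ABBREVIATIONS:
            return ABBREVIATIONS[token]
        match = get_close_matches(token, CANONICAL, n=1, cutoff=0.8)
        return match[0] if match else token

    print(standardize("transmision"))  # 'transmission' (misspelling repaired)
    print(standardize("alt"))          # 'alternator'   (abbreviation expanded)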

Proceedings Article
01 Jan 1993
TL;DR: This article describes the current development of a multilingual natural language system, strongly oriented towards the semantics of the domain, with special emphasis on building a domain model and establishing direct links with the language platform.
Abstract: Natural Language Understanding (NLU) is a rapidly growing field in medical informatics. Its potential for tomorrow's applications is considerable. However, it is limited by its ability to ground its components on a solid model of the domain. This opens the way for the emergence of the discipline of medical domain modelling, as part of the vast field of Knowledge Base (KB) engineering. This article describes the current development of a multilingual natural language system, strongly oriented towards the semantics of the domain. Special emphasis is presently given to the task of building a domain model and to establishing direct links with the language platform. The result is a model-driven NLU system. Numerous benefits are expected in the long term.

Proceedings ArticleDOI
21 Mar 1993
TL;DR: Most of the unknown words in texts which degrade the performance of natural language processing systems are proper nouns, yet proper nouns are also recognized as a crucial source of information for identifying a topic in a text, extracting contents from a text, or detecting relevant documents in information retrieval.
Abstract: Most of the unknown words in texts which degrade the performance of natural language processing systems are proper nouns. On the other hand, proper nouns are recognized as a crucial source of information for identifying a topic in a text, extracting contents from a text, or detecting relevant documents in information retrieval (Rau, 1991).

01 Feb 1993
TL;DR: A representation called transition space is presented, which portrays events in terms of "transitions," or collections of changes expressible in everyday language, and a program called PATHFINDER is described, which uses the transition space representation to perform causal reconstruction on simplified English descriptions of physical activity.
Abstract: Causal reconstruction is the task of reading a written causal description of a physical behavior, forming an internal model of the described activity, and demonstrating comprehension through question answering. This task is difficult because written descriptions often do not specify exactly how referenced events fit together. This article (1) characterizes the causal reconstruction problem, (2) presents a representation called transition space, which portrays events in terms of "transitions," or collections of changes expressible in everyday language, and (3) describes a program called PATHFINDER, which uses the transition space representation to perform causal reconstruction on simplified English descriptions of physical activity.
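
The flavor of the representation is easy to sketch: an event becomes a bundle of changes to named quantities, stated in an everyday change vocabulary. Everything below (the quantities, the change labels, the chaining test) is invented for illustration.

    # Two events described as transitions: attribute -> type of change.
    push  = {"force on block": "appear", "speed of block": "increase"}
    slide = {"speed of block": "not-change", "position of block": "change"}

    def shares_quantities(event_a, event_b):
        """A first, crude cue that two described events may connect:
        they mention some of the same changing quantities."""
        return bool(set(event_a) & set(event_b))

    print(shares_quantities(push, slide))  # True: both mention the block's speed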

Book ChapterDOI
01 Jan 1993
TL;DR: A document production tool that appears to a user as a word processor but also acts as an expert system shell with frame and rule representations supporting deductive inference, developed as an alternative approach to the dissemination of knowledge bases.
Abstract: This paper is written in a document production tool that appears to a user as a word processor but also acts as an expert system shell with frame and rule representations supporting deductive inference. The electronic version of the document is active, providing typographic text and page layout facilities, versioning, hypermedia sound and movies, hypertext links, and knowledge structures represented in a visual language. It can be read as a hypermedia document and also interrogated as a knowledge-based system for problem-solving. The paper version of the document, which you are now reading, is produced by printing the electronic version. It loses its active functionality but continues to act as a record of the knowledge in the document. The overall technology has been developed as an alternative approach to the dissemination of knowledge bases. It also provides a different interface to knowledge-based systems that emulates document interfaces with which many users are already familiar.

Proceedings Article
01 Jan 1993
TL;DR: This work demonstrates that conceptual graphs are a suitable means to model end-users' queries on the basis of the thesaurus and the semantic network of the UMLS project.
Abstract: Information retrieval in large information databases is a non-deterministic process which generally requires a sequence of search steps. One of the main problems end-users face is efficiently parsing their questions into the query language that the computer systems allow. Conceptual graphs were initially designed for natural language analysis and understanding. Due to their closeness to semantic networks, their expressiveness is powerful enough to be applied to knowledge representation and use by computer systems. This work demonstrates that conceptual graphs are a suitable means to model end-users' queries on the basis of the thesaurus and the semantic network of the UMLS project.
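
A conceptual-graph query can be reduced to labeled arcs and matched against a small semantic network. The concept types and the is-a links below are invented stand-ins for the UMLS thesaurus and Semantic Network.

    # The query "inflammation located in the liver" as one graph arc.
    query_arc = ("Inflammation", "location", "Liver")

    # Toy is-a hierarchy standing in for the UMLS sources.
    IS_A = {"Hepatitis": "Inflammation", "Liver": "Organ"}

    def fits(node, wanted):
        """A node satisfies a queried concept if it is that concept
        or (one step of) a specialization of it."""
        return node == wanted or IS_A.get(node) == wanted

    def matches(fact, pattern):
        return (fits(fact[0], pattern[0]) and fact[1] == pattern[1]
                and fits(fact[2], pattern[2]))

    document_arc = ("Hepatitis", "location", "Liver")
    print(matches(document_arc, query_arc))  # True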

Book ChapterDOI
06 Sep 1993
TL;DR: This thesaurus, together with the expert rules for structuring the document, provides the user with an analytical document representation and hypertext links will be supplemented in order to offer the lawyer a convenient tool for efficient information retrieval.
Abstract: Our legal expert system KONTERM contains a selective thesaurus and a knowledge base for the automatic representation of the structure and the contents of the document. The thesaurus takes into account the necessary degree of formalism of legal language, thereby overcoming the untidiness of natural language, and automatically represents a lawyer's expert knowledge of legal terminology. The required selectivity of the individual descriptors is achieved by distinguishing between precise legal terms and words with fuzzy meanings, as well as by detecting hidden word senses. This thesaurus, together with the expert rules for structuring the document, provides the user with an analytical document representation. In addition, hypertext links will be supplemented in order to offer the lawyer a convenient tool for efficient information retrieval.

Journal ArticleDOI
TL;DR: The representation demonstrates that the Boolean semantics of natural language can be successfully modeled, in both representation and inference, by knowledge representation formalisms with Boolean semantics.
Abstract: A formal, computational, semantically clean representation of natural language is presented. This representation captures the fact that logical inferences in natural language crucially depend on the semantic relation of entailment between sentential constituents such as determiner, noun, adjective, adverb, preposition, and verb phrases.


Journal ArticleDOI
TL;DR: This article examined the relationship between question parsing and memory retrieval, and found that the question component expressing the presupposition was more likely to be read before or after the main concept being queried.

ReportDOI
01 Jun 1993
TL;DR: This project explored dialogue patterns in two corpora, graduate students tutoring undergraduates in research methods and high school students tutoring 7th graders in algebra, and analyzed pedagogical strategies, feedback mechanisms, question asking, question answering, and pragmatic assumptions during the tutoring process.
Abstract: One-to-one tutoring is more effective than alternative training methods, yet there have been few attempts to examine the process of naturalistic tutoring. This project explored dialogue patterns in two corpora: graduate students tutoring undergraduates in research methods, and high school students tutoring 7th graders in algebra. We analyzed pedagogical strategies, feedback mechanisms, question asking, question answering, and pragmatic assumptions during the tutoring process. One pervasive dialogue pattern was a five-step frame: (1) tutor asks question, (2) student answers question, (3) tutor gives short feedback on answer quality, (4) tutor and student collaboratively improve on answer quality, and (5) tutor assesses the student's understanding of the answer. Tutor questions were primarily motivated by curriculum scripts and the process of coaching students through exemplar problems -- rarely by attempts to diagnose and remediate the student's idiosyncratic knowledge deficits. Dialogue patterns were simulated by two computational models: a recurrent connectionist network and a recursive transition network. These models capture the systematicity in the sequential ordering of speech act categories; that is, to what extent does a model accurately predict the category of speech act N+1, given speech acts 1 through N? Keywords: question asking, question answering, tutoring, learning.
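
The prediction task posed at the end can be illustrated with a model far simpler than either of the report's two: first-order transition counts over speech-act categories. The dialogue below is a fabricated sequence following the five-step frame.

    from collections import defaultdict

    dialogue = ["tutor-question", "student-answer", "tutor-feedback",
                "collaborate", "tutor-assess", "tutor-question",
                "student-answer", "tutor-feedback"]

    # Count category-to-category transitions (speech act N -> N+1).
    transitions = defaultdict(lambda: defaultdict(int))
    for prev, nxt in zip(dialogue, dialogue[1:]):
        transitions[prev][nxt] += 1

    def predict_next(prev):
        """Most likely category for speech act N+1 given act N."""
        options = transitions[prev]
        return max(options, key=options.get) if options else None

    print(predict_next("tutor-question"))  # 'student-answer'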

Proceedings ArticleDOI
01 Dec 1993
TL;DR: The results of this study have successfully been applied to a Data Dictionary of a large Dutch company and it was possible to generate Natural Language (Dutch) sentences from the definitions of the words in the lexicon, replacing the fixed strings originally put in the Data Dictionary.
Abstract: Data Dictionaries (DD) contain crucial information about the (technical) meaning of words used in a certain company. In linguistics, a lexicon contains syntactic and semantic information about words used in society. In this paper we study the possibility of structuring a Data Dictionary as if it were a lexicon. The results of this study have successfully been applied to a Data Dictionary of a large Dutch company. It was possible to generate Natural Language (Dutch) sentences from the definitions of the words (concepts) in the lexicon, replacing the fixed strings originally put in the Data Dictionary.
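
Generating a definition sentence from a structured entry, rather than storing a fixed string, can be shown in a few lines. The entry format and the English template are invented (the actual system generated Dutch).

    # A data-dictionary entry structured like a lexicon entry.
    LEXICON = {
        "policyholder": {
            "category": "noun",
            "genus": "person",
            "differentia": "holds an insurance policy with the company",
        },
    }

    def define(term):
        """Generate a definition sentence from the structured entry."""
        entry = LEXICON[term]
        return f"A {term} is a {entry['genus']} who {entry['differentia']}."

    print(define("policyholder"))
    # A policyholder is a person who holds an insurance policy with the company.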


01 Jan 1993
TL;DR: This material is based upon work supported by the U.S. Department of Commerce under cooperative agreement number EEC-9209623; opinions expressed are those of the author(s) and do not necessarily reflect those of the sponsor.
Abstract: This material is based upon work supported by the U.S. Department of Commerce under cooperative agreement number EEC-9209623. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect those of the sponsor.