
Showing papers on "Natural language published in 1990"


Journal ArticleDOI
TL;DR: Reviewing other arguments and data, the authors conclude that there is every reason to believe that a specialization for grammar evolved by a conventional neo-Darwinian process.
Abstract: Many people have argued that the evolution of the human language faculty cannot be explained by Darwinian natural selection. Chomsky and Gould have suggested that language may have evolved as the by-product of selection for other abilities or as a consequence of as-yet unknown laws of growth and form. Others have argued that a biological specialization for grammar is incompatible with every tenet of Darwinian theory – that it shows no genetic variation, could not exist in any intermediate forms, confers no selective advantage, and would require more evolutionary time and genomic space than is available. We examine these arguments and show that they depend on inaccurate assumptions about biology or language or both. Evolutionary theory offers clear criteria for when a trait should be attributed to natural selection: complex design for some function, and the absence of alternative processes capable of explaining such complexity. Human language meets these criteria: Grammar is a complex mechanism tailored to the transmission of propositional structures through a serial interface. Autonomous and arbitrary grammatical phenomena have been offered as counterexamples to the position that language is an adaptation, but this reasoning is unsound: Communication protocols depend on arbitrary conventions that are adaptive as long as they are shared. Consequently, language acquisition in the child should systematically differ from language evolution in the species, and attempts to analogize them are misleading. Reviewing other arguments and data, we conclude that there is every reason to believe that a specialization for grammar evolved by a conventional neo-Darwinian process.

2,002 citations



Book
01 Jan 1990
TL;DR: Focusing on the structure of meaning in English sentences at a "subatomic" level - that is, a level below the one most theories accept as basic or "atomic" - Parsons asserts that the semantics of simple English sentences require logical forms somewhat more complex than is normally assumed in natural language semantics.
Abstract: This extended investigation of the semantics of event (and state) sentences in their various forms is a major contribution to the semantics of natural language, simultaneously encompassing important issues in linguistics, philosophy, and logic. It develops the view that the logical forms of simple English sentences typically contain quantification over events or states and shows how this view can account for a wide variety of semantic phenomena. Focusing on the structure of meaning in English sentences at a "subatomic" level - that is, a level below the one most theories accept as basic or "atomic" - Parsons asserts that the semantics of simple English sentences require logical forms somewhat more complex than is normally assumed in natural language semantics. His articulation of underlying event theory explains a wide variety of apparently diverse semantic characteristics of natural language, and his development of the theory shows the importance of seeing the distinction between events and states. Parsons also demonstrates that verbs indicate kinds of actions rather than specific, individual actions, and argues that verb phrases depend on modifiers to make their function and meaning in a sentence specific. An appendix gives many of the details needed to formalize the theory discussed in the body of the text and provides a series of templates that permit the generation of atomic formulas of English. Terence Parsons is Professor of Philosophy and Dean of Humanities at the University of California, Irvine.

1,437 citations


Proceedings ArticleDOI
24 Jun 1990
TL;DR: This pilot marks the first full-scale attempt to collect a corpus to measure progress in Spoken Language Systems that include both a speech and natural language component and provides guidelines for future efforts.
Abstract: Speech research has made tremendous progress in the past using the following paradigm:
• define the research problem,
• collect a corpus to objectively measure progress, and
• solve the research problem.
Natural language research, on the other hand, has typically progressed without the benefit of any corpus of data with which to test research hypotheses. We describe the Air Travel Information System (ATIS) pilot corpus, a corpus designed to measure progress in Spoken Language Systems that include both a speech and natural language component. This pilot marks the first full-scale attempt to collect such a corpus and provides guidelines for future efforts.

842 citations


Journal ArticleDOI
TL;DR: A novel kind of language model which reflects short-term patterns of word use by means of a cache component (analogous to cache memory in hardware terminology) is presented; the model also contains a 3-gram component of the traditional type.
Abstract: Speech-recognition systems must often decide between competing ways of breaking up the acoustic input into strings of words. Since the possible strings may be acoustically similar, a language model is required; given a word string, the model returns its linguistic probability. Several Markov language models are discussed. A novel kind of language model which reflects short-term patterns of word use by means of a cache component (analogous to cache memory in hardware terminology) is presented. The model also contains a 3-gram component of the traditional type. The combined model and a pure 3-gram model were tested on samples drawn from the Lancaster-Oslo/Bergen (LOB) corpus of English text. The relative performance of the two models is examined, and suggestions for future improvements are made.
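
To make the cache idea concrete, here is a minimal Python sketch of a cache-augmented language model: a conventional 3-gram probability is interpolated with the relative frequency of the word in a short window of recent history. The interpolation weight, cache size, and uniform fallback are illustrative assumptions, not the authors' exact formulation.

from collections import Counter, deque

class CacheTrigramLM:
    # Interpolates a static 3-gram model with a cache of recently used
    # words. A sketch under assumed parameters, not the paper's exact model.
    def __init__(self, trigram_probs, vocab_size, cache_size=200, lam=0.1):
        self.trigram_probs = trigram_probs      # (w1, w2, w3) -> P(w3 | w1, w2)
        self.vocab_size = vocab_size
        self.cache = deque(maxlen=cache_size)   # short-term word history
        self.lam = lam                          # weight of the cache component

    def prob(self, w1, w2, w3):
        # Static component, with a uniform fallback for unseen trigrams.
        p_tri = self.trigram_probs.get((w1, w2, w3), 1.0 / self.vocab_size)
        # Cache component: relative frequency of w3 in the recent history.
        p_cache = Counter(self.cache)[w3] / len(self.cache) if self.cache else 0.0
        return (1 - self.lam) * p_tri + self.lam * p_cache

    def observe(self, word):
        self.cache.append(word)                 # update short-term history

lm = CacheTrigramLM({("of", "the", "text"): 0.2}, vocab_size=50000)
p = lm.prob("of", "the", "text")
lm.observe("text")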

555 citations


Journal ArticleDOI
TL;DR: In this article, a new method for evaluating the grammatical complexity of preschool natural language corpora, the Index of Productive Syntax, is introduced: occurrences of 56 syntactic and morphological forms are counted, yielding a total score and subscores for noun phrases, verb phrases, questions/negations, and sentence structures.
Abstract: A new method for evaluating the grammatical complexity of preschool natural language corpora is introduced. In the Index of Productive Syntax, occurrences of 56 syntactic and morphological forms are counted, yielding a total score and subscores for noun phrases, verb phrases, questions/negations, and sentence structures. Development of the index and analyses of its reliability and age-sensitivity when applied to language samples of 2- to 4-year-olds are described. Some advantages and limitations of the index as a research and clinical instrument are also discussed.
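
The scoring idea can be illustrated with a short sketch: scan each utterance for predefined syntactic and morphological forms, tally them by category, and report a total plus per-category subscores. The two regex "forms" below are invented stand-ins for the index's 56 actual items, and real IPSyn scoring has additional rules (such as credit caps) not modeled here.

import re

# Hypothetical stand-ins for two of the index's 56 forms, grouped by subscale.
FORMS = {
    "noun_phrases": [r"\b(?:a|an|the)\s+\w+"],   # determiner + noun
    "verb_phrases": [r"\b\w+ing\b"],             # progressive verb form
}

def score_corpus(utterances):
    # Count each form's occurrences, accumulating per-category subscores.
    subscores = {cat: 0 for cat in FORMS}
    for utt in utterances:
        for cat, patterns in FORMS.items():
            for pat in patterns:
                subscores[cat] += len(re.findall(pat, utt.lower()))
    return sum(subscores.values()), subscores

total, subs = score_corpus(["the dog is running", "a ball"])  # total of 3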

480 citations


Journal ArticleDOI
TL;DR: The future of natural language text processing is examined through the SCISOR prototype, which draws on artificial intelligence techniques and applies them to financial news items through a combination of bottom-up and top-down processing.
Abstract: The future of natural language text processing is examined in the SCISOR prototype. Drawing on artificial intelligence techniques, and applying them to financial news items, this powerful tool illustrates some of the future benefits of natural language analysis through a combination of bottom-up and top-down processing.

433 citations


Book
01 Jan 1990
TL;DR: Referential Practice as discussed by the authors is an anthropological study of language use in a contemporary Yucatec Maya community, which examines the routine conversational practices in which Maya speakers make reference to themselves and to each other, to their immediate contexts, and to their world.
Abstract: "Referential Practice" is an anthropological study of language use in a contemporary Maya community. It examines the routine conversational practices in which Maya speakers make reference to themselves and to each other, to their immediate contexts, and to their world. Drawing on extensive fieldwork in Oxkutzcab, Yucatan, William F. Hanks develops a sociocultural approach to reference in natural languages. The core of this approach lies in treating speech as a social engagement and reference as a practice through which actors orient themselves in the world. The conceptual framework derives from cultural anthropology, linguistic pragmatics, interpretive sociology, and cognitive semantics. As his central case, Hanks undertakes a comprehensive analysis of deixis-linguistic forms that fix reference in context, such as English "I, you, this, that, here, " and "there." He shows that Maya deixis is a basic cultural construct linking language with body space, domestic space, agricultural and ritual practices, and other fields of social activity. Using this as a guide to ethnographic description, he discovers striking regularities in person reference and modes of participation, the role of perception in reference, and varieties of spatial orientation, including locative deixis. Traditionally considered a marginal area in linguistics and virtually untouched in the ethnographic literature, the study of referential deixis becomes in Hanks's treatment an innovative and revealing methodology. "Referential Practice" is the first full-length study of actual deictic use in a non-Western language, the first in-depth study of speech practice in Yucatec Maya culture, and the first detailed account of the relation between routine conversation, embodiment, and ritual discourse."

413 citations


Proceedings ArticleDOI
03 Apr 1990
TL;DR: The N-best algorithm is a time-synchronous Viterbi-style beam search procedure that is guaranteed to find the N most likely whole sentence alternatives that are within a given beam of the most likely sentence.
Abstract: A search algorithm that provides a simple, clean, and efficient interface between the speech and natural language components of a spoken language system is introduced. The N-best algorithm is a time-synchronous Viterbi-style beam search procedure that is guaranteed to find the N most likely whole sentence alternatives that are within a given beam of the most likely sentence. The computation is linear with the length of the utterance, and faster than linear in N. When used together with a first-order statistical grammar, the correct sentence is usually within the first few sentence choices. The output of the algorithm, which is an ordered set of sentence hypotheses with acoustic and language model scores, can easily be processed by natural language knowledge sources without the huge expansion of the search space that would be needed to include all possible knowledge sources in a top-down search. In experiments using a first-order statistical language model, the average rank of the correct answer was 1.8, and the correct answer was within the first 24 choices 99% of the time.
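
A compact sketch of the N-best idea, under the simplifying assumption that the acoustic input has already been reduced to per-frame word candidates with acoustic scores (so the within-word Viterbi bookkeeping of the real algorithm is elided): hypotheses are extended time-synchronously, pruned to a beam around the best partial score, and only the top N survive each step.

from heapq import nlargest

def nbest_search(frames, bigram_logp, n=10, beam=20.0):
    # frames: one list of (word, acoustic_logp) candidates per time step.
    # bigram_logp(prev, w): first-order language model score.
    hyps = [((), 0.0)]                        # (word sequence, total log score)
    for candidates in frames:
        extended = [
            (seq + (w,), score + ac + bigram_logp(seq[-1] if seq else "<s>", w))
            for seq, score in hyps
            for w, ac in candidates
        ]
        best = max(s for _, s in extended)
        # Prune hypotheses outside the beam, then keep the N best.
        survivors = [(seq, s) for seq, s in extended if s >= best - beam]
        hyps = nlargest(n, survivors, key=lambda h: h[1])
    return hyps                               # ordered sentence hypotheses

frames = [[("show", -1.0), ("a", -2.5)], [("flights", -1.2), ("lights", -1.4)]]
print(nbest_search(frames, lambda prev, w: -0.5, n=2))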

356 citations


Proceedings ArticleDOI
20 Aug 1990
TL;DR: A variant of TAGs, called synchronous TAGs, is presented; synchronous TAGs characterize correspondences between languages and allow TAGs to be used beyond their role in syntax proper.
Abstract: The unique properties of tree-adjoining grammars (TAG) present a challenge for the application of TAGs beyond the limited confines of syntax, for instance, to the task of semantic interpretation or automatic translation of natural language. We present a variant of TAGs, called synchronous TAGs, which characterize correspondences between languages. The formalism's intended usage is to relate expressions of natural languages to their associated semantics represented in a logical form language, or to their translates in another natural language; in summary, we intend it to allow TAGs to be used beyond their role in syntax proper. We discuss the application of synchronous TAGs to concrete examples, mentioning in passing some computational issues that arise in its interpretation.

Patent
16 Aug 1990
TL;DR: A method and apparatus for computer text retrieval which involves annotating at least selected text subdivisions, preferably with natural language questions, assertions or noun phrases, is described.
Abstract: A method and apparatus for computer text retrieval which involves annotating at least selected text subdivisions, preferably with natural language questions, assertions or noun phrases. However, the annotations may also initially be generated in a structured form. Annotations are, if required, converted to a structured form and are stored in that form. Searching for relevant text subdivisions involves entering a query in natural language or structured form, converting natural language queries to structured form, matching the structured form query against stored annotations and retrieving text subdivisions connected to matched annotations. The annotation process may be aided by utilizing various techniques for automatically or semiautomatically generating the annotations.
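
A toy illustration of the retrieval loop the patent describes: annotations and queries are both reduced to a structured form and matched by overlap. Here the structured form is simply a bag of content words, which is an assumption for illustration; the patent leaves the structured form more general.

def to_structured(text):
    # Toy stand-in for the conversion step: reduce an annotation or a
    # query to a bag of content words.
    stop = {"the", "a", "an", "of", "is", "how", "what", "does"}
    return frozenset(w for w in text.lower().split() if w not in stop)

def retrieve(query, annotated_sections):
    # Match the structured query against stored structured annotations
    # and return the text subdivisions behind the best matches.
    q = to_structured(query)
    scored = [(len(q & ann), section)
              for ann, section in annotated_sections if q & ann]
    return [section for _, section in sorted(scored, reverse=True)]

index = [(to_structured("How is the engine oil changed?"), "Section 4.2 ...")]
print(retrieve("changing engine oil", index))   # -> ['Section 4.2 ...']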

Book
03 Dec 1990
TL;DR: The author's own model is presented, which infers new goals from user utterances and integrates them into the system's model of the user's plan, incrementally expanding and adding detail to its beliefs about what the information seeker wants to do.
Abstract: From the Publisher: In most current natural language systems, each query is treated as an isolated request for information regardless of its context in dialogue. Sandra Carberry addresses the problem of creating computational strategies that can improve user-computer communication by assimilating ongoing dialogue and reasoning on the acquired knowledge. Plan Recognition in Natural Language Dialogue critically examines plan recognition: the inference of an agent's goals and how he or she intends to achieve them. It describes significant models of plan inference and presents in detail the author's own model, which infers new goals from user utterances and integrates them into the system's model of the user's plan, incrementally expanding and adding detail to its beliefs about what the information seeker wants to do. Carberry then outlines computational strategies for interpreting two kinds of problematic utterances: utterances that violate the pragmatic rules of the system's world model and intersentential elliptical fragments. She also suggests directions for future research. Sandra Carberry is Assistant Professor in the Department of Computer and Information Sciences at the University of Delaware. Plan Recognition in Natural Language Dialogue is included in the ACL-MIT Press Series in Natural Language Processing edited by Aravind Joshi.

Book ChapterDOI
01 Oct 1990
TL;DR: This chapter provides an overview of L2 writing process research, including (1) its relationship to first language (L1) research, (2) a survey of L2 studies, (3) a summary of recurring issues, and (4) suggestions for future research.
Abstract: Not too long ago, second language acquisition theorist Stephen Krashen claimed that “studies of second language writing are sadly lacking” (1984: 41). To be sure, to that date, few studies had yet been shared with the second language research community, but many studies were being conducted at that time, and shortly thereafter these studies became a part of the growing body of literature on second language (L2) writing research. Of particular concern here is research on second language writing processes. This chapter provides an overview of L2 writing process research, including (1) its relationship to first language (L1) research, (2) a survey of L2 studies, (3) a summary of recurring issues, and (4) suggestions for future research. A few product-based studies are also included because they corroborate the findings of process-oriented research and because many L2 studies include both product- and process-based data (see Connor 1987). The relationship of L1 research to L2 research Second language composition textbooks abound, and, as Silva points out in the first chapter of this volume, approaches to teaching L2 writing exist in plenty, supported by ardent, even “evangelical,” advocates and readily accessible materials. Second language composition instruction is, then, well established and much of it follows theory. However, L2 composition teaching has generally not been based on theoretically derived insights gained from L2 composition research, because until the 1980s there was not much L2 research to draw upon in building theory or planning classes.

Book ChapterDOI
TL;DR: This paper discusses three different but related large-scale computational methods to transform Mrds into Mtds, using the Longman Dictionary of Contemporary English (Ldoce) as the Mrd; all three methods require some handcoding of initial information but are largely automatic.
Abstract: Machine readable dictionaries (Mrds) contain knowledge about language and the world essential for tasks in natural language processing (Nlp). However, this knowledge, collected and recorded by lexicographers for human readers, is not presented in a manner for Mrds to be used directly for Nlp tasks. What is badly needed are machine tractable dictionaries (Mtds): Mrds transformed into a format usable for Nlp. This paper discusses three different but related large-scale computational methods to transform Mrds into Mtds. The Mrd used is The Longman Dictionary of Contemporary English (Ldoce). The three methods differ in the amount of knowledge they start with and the kinds of knowledge they provide. All require some handcoding of initial information but are largely automatic. Method I, a statistical approach, uses the least handcoding. It generates “relatedness” networks for words in Ldoce and presents a method for doing partial word sense disambiguation. Method II employs the most handcoding because it develops and builds lexical entries for a very carefully controlled defining vocabulary of 2,000 word senses (1,000 words). The payoff is that the method will provide an Mtd containing highly structured semantic information. Method III requires the handcoding of a grammar and the semantic patterns used by its parser, but not the handcoding of any lexical material. This is because the method builds up lexical material from sources wholly within Ldoce. The information extracted is a set of sources of information, individually weak, but which can be combined to give a strong and determinate linguistic data base.
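
The statistical flavor of Method I can be sketched as follows: treat each dictionary definition as a context, count how often word pairs co-occur across definitions, and normalize to get a relatedness score. The Dice-style normalization and the toy definitions are assumptions for illustration; the paper's actual statistic differs in detail.

from collections import defaultdict
from itertools import combinations

def relatedness_network(definitions):
    # definitions: headword -> tokenized definition text. Two words are
    # scored as related in proportion to how often they co-occur in
    # definitions (Dice-like normalization; an assumption).
    cooc, freq = defaultdict(int), defaultdict(int)
    for tokens in definitions.values():
        words = sorted(set(tokens))
        for w in words:
            freq[w] += 1
        for a, b in combinations(words, 2):
            cooc[(a, b)] += 1
    return {pair: 2 * c / (freq[pair[0]] + freq[pair[1]])
            for pair, c in cooc.items()}

net = relatedness_network({
    "bank": ["place", "money", "kept"],
    "teller": ["person", "money", "bank"],
})
print(net[("bank", "money")])   # ~0.67: one co-occurrence, modest frequencies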

Journal ArticleDOI
TL;DR: Various linguistic approaches proposed for document analysis in information retrieval environments are summarized, including standard syntactic methods to generate complex content identifiers, and the use of semantic know-how obtained from machine-readable dictionaries, and from specially constructed knowledge bases.
Abstract: This study summarizes various linguistic approaches proposed for document analysis in information retrieval environments. Included are standard syntactic methods to generate complex content identifiers, and the use of semantic know-how obtained from machine-readable dictionaries, and from specially constructed knowledge bases. Certain syntactic term phrase generation systems are examined in detail and their usefulness for text analysis purposes is evaluated.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: An approach to implementing spoken language systems that takes full advantage of syntactic and semantic constraints provided by a natural language processing component in the speech understanding task and provides a tractable search space is discussed.
Abstract: An approach to implementing spoken language systems is discussed. This approach takes full advantage of syntactic and semantic constraints provided by a natural language processing component in the speech understanding task and provides a tractable search space. The results indicate that the approach is a promising one for large-vocabulary spoken language systems. Parse times within a factor of 20 of real time are achieved for high-perplexity syntactic grammars with resulting hidden Markov model recognition computational requirements (2500 active words/frame) that are well within the capability of high-speed multiprocessor computers or special-purpose speech recognition hardware.

Book
01 Jan 1990
TL;DR: Norbert Hornstein shows how Reichenbach's basic ideas can be combined with poverty-of-stimulus considerations to yield a restricted account of possible tense in natural language, and proposes a theory of natural-language tense that will be responsive to the language-acquisition problem.
Abstract: How do humans acquire, at a very early age and from fragmentary and haphazard data, the complex patterns of their native language? This is the logical problem of language acquisition, and it is the question that directs the search for an innate universal grammar. "As Time Goes By" extends the search by proposing a theory of natural-language tense that will be responsive to the language-acquisition problem. The core of the theory is a revision of Hans Reichenbach's theory of tense, modified to satisfy a variety of serious objections that have been brought against it. Hornstein shows how Reichenbach's basic ideas can be combined with poverty-of-stimulus considerations to yield a restricted account of possible tense in natural language. The clearly written discussion proceeds step by step from simple observations and principles to far-reaching conclusions involving complex data carefully selected and persuasively presented. The topics covered include adverbial modification, temporal adjuncts, conditionals, and sequence of tense. Throughout, Hornstein focuses on the logical problem of language acquisition, highlighting the importance of explanatory adequacy and the role of syntactic representations in determining intricate properties of semantic interpretation. Norbert Hornstein is Associate Professor of Linguistics at the University of Maryland.

Proceedings ArticleDOI
04 Nov 1990
TL;DR: The development of a natural language knowledge retrieval (NLKR) system for building an intelligent help prototype and the current implementation of an NLKR system, called the GKR (graph-based knowledge retrieval) system, are described.
Abstract: The development of a natural language knowledge retrieval (NLKR) system for building an intelligent help prototype is outlined. The objectives of NLKR systems are defined. A general architecture, the main components of which are a semantic interpreter, a query processor, and a natural language generator, is proposed. The current implementation of an NLKR system, called the GKR (graph-based knowledge retrieval) system, is described. The GKR system is based on the use of conceptual graphs as a knowledge representation scheme. Query processing is treated as a graph matching process, and retrieval is performed by a semantic-based search. The query processor uses an algorithm that automatically generates additional queries (subgoals) in conceptual graph form. The GKR system is implemented in Prolog. Its possible integration into hypertext environments is described. Related work is discussed.
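
The GKR system itself is implemented in Prolog over conceptual graphs; as a rough illustration of the graph-matching view of query processing, the following Python sketch encodes graphs as sets of concept-relation-concept triples and answers a query when its triples (with a wildcard concept) are covered by a stored graph. The triple encoding and the wildcard are simplifying assumptions.

def matches(query, graph):
    # A stored graph (a set of concept-relation-concept triples) answers
    # a query graph when every query triple is covered, with "?" acting
    # as a wildcard concept.
    def covers(f, q):
        return all(qt in ("?", ft) for qt, ft in zip(q, f))
    return all(any(covers(f, q) for f in graph) for q in query)

kb = [
    {("print-command", "agent", "user"), ("print-command", "object", "file")},
]
query = {("print-command", "object", "?")}
print([g for g in kb if matches(query, g)])   # the matching graph is retrieved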

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A system that automatically acquires a language model for a particular task from semantic-level information is described, in contrast to systems with predefined vocabulary and syntax.
Abstract: A system that automatically acquires a language model for a particular task from semantic-level information is described. This is in contrast to systems with predefined vocabulary and syntax. The purpose of the system is to map spoken or typed input into a machine action. To accomplish this task a medium-grain neural network is used. An adaptive training procedure is introduced for estimating the connection weights. It has the advantages of rapid, single-pass and order-invariant learning. The resulting weights have information-theoretic significance and do not require gradient search techniques for their estimation. The system was experimentally evaluated on three text-based tasks: a three-class inward-call manager with an acquired vocabulary of over 1600 words, a 15-action subset of the DARPA Resource Manager with an acquired vocabulary of over 700 words, and discrimination between idiomatic phrases meaning yes or no.
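
One plausible reading of single-pass, order-invariant training with information-theoretic weights is sketched below: co-occurrence counts between input words and actions are accumulated in one pass (counts do not depend on presentation order) and converted to pointwise mutual information weights; classification sums the weights. The PMI choice and the toy call-routing data are assumptions, not the paper's exact network.

import math
from collections import defaultdict

def train(pairs):
    # One pass over (text, action) pairs accumulating counts; counts are
    # order-invariant, so the procedure is single-pass and order-invariant.
    # Weights are word/action pointwise mutual information (an assumption).
    wa, w, a, n = defaultdict(int), defaultdict(int), defaultdict(int), 0
    for text, action in pairs:
        for word in text.lower().split():
            wa[(word, action)] += 1; w[word] += 1; a[action] += 1; n += 1
    weights = {k: math.log(c * n / (w[k[0]] * a[k[1]])) for k, c in wa.items()}
    return weights, list(a)

def classify(text, weights, actions):
    # Map input to the action whose summed word-association weights win.
    score = lambda act: sum(weights.get((wd, act), 0.0)
                            for wd in text.lower().split())
    return max(actions, key=score)

weights, actions = train([("transfer me to sales", "route-sales"),
                          ("billing question please", "route-billing")])
print(classify("question about my bill please", weights, actions))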

Journal ArticleDOI
TL;DR: This article examined revision in controlled L1 and L2 writing tasks and found that while proficient writers are capable of transferring their revision processes across languages, they are also capable of adapting some of those processes to new problems imposed by a second language.
Abstract: investigated this relationship in the context of the revising process. This article examines revision in controlled L1 and L2 writing tasks. Four advanced ESL writers with differing first language backgrounds wrote two argumentative essays in their native languages and two in English. Revisions were then analyzed for specific discourse and linguistic features. The results, for the most part, indicate striking similarities across languages. However, some differences are noted, suggesting that while proficient writers are capable of transferring their revision processes across languages, they are also capable of adapting some of those processes to new problems imposed by a second language. Until quite recently, process-oriented research in ESL writing was dominated by studies that sought to identify those strategies and behaviors second language writers share and do not share with their first language counterparts (Raimes, 1985). Such comparisons between primary and nonprimary speakers of English are, of course, valuable if ESL researchers and teachers are to determine the relevance of first language pedagogy for second language writers. Yet they do not represent the only direction for comparative studies with consequences for theory and practice. Researchers are now beginning to examine ESL writers' behaviors and strategies in both their first and second languages, a much needed area of investigation that Zamel (1984) has claimed may have implications for the notion that composing processes are universal and transitive across languages. While most of these crosslanguage studies have focused on general surveys of composing processes (Chelala, 1981) or concentrated on text planning (Lay, 1982, 1983; Jones & Tetroe, 1987), to date, only one significant study has been conducted on revising (Gaskill, 1987). Yet "revision," as

Patent
27 Aug 1990
TL;DR: The parser constructs sentences which are valid according to the grammar out of words accepted by the predictor, and the predictor accesses only the lexicon and the knowledge base, if one is used, to determine the valid next input words.
Abstract: A method for parsing for natural languages includes a grammar and a lexicon. A knowledge base may be used to define elements in the lexicon. A processor receives single words input by a user and adds them to a sentence under construction. Valid next words are predicted after each received input word. The preferred system has two major components: a parser and a predictor. The predictor accesses only the lexicon and the knowledge base, if one is used, to determine the valid next input words. The parser constructs sentences which are valid according to the grammar out of words accepted by the predictor.
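
A minimal sketch of the predict-then-parse loop: given a toy grammar and lexicon (both hypothetical), the predictor computes which words can validly extend the sentence under construction. The brute-force enumeration below is only for illustration; the patent's predictor consults the lexicon and knowledge base directly rather than expanding the grammar.

GRAMMAR = {            # toy context-free grammar (hypothetical)
    "S":  [["NP", "VP"]],
    "NP": [["the", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {"N": {"valve", "pump"}, "V": {"open", "close"}}

def valid_next_words(prefix):
    # Predictor sketch: every word that can extend `prefix` toward a
    # grammatical sentence, found by brute-force enumeration.
    def expand(symbols, depth=6):
        if not symbols or depth == 0:
            yield []
            return
        head, rest = symbols[0], symbols[1:]
        if head in GRAMMAR:
            for rhs in GRAMMAR[head]:
                yield from expand(rhs + rest, depth - 1)
        else:
            for word in LEXICON.get(head, {head}):
                for tail in expand(rest, depth):
                    yield [word] + tail
    return {s[len(prefix)] for s in expand(["S"])
            if len(s) > len(prefix) and s[:len(prefix)] == list(prefix)}

print(valid_next_words(["the", "valve"]))   # -> {'open', 'close'}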

Book ChapterDOI
01 Oct 1990
TL;DR: The field of writing assessment has developed considerably in the last quarter of the century, but many if not most people in North America believed that writing could be validly tested by an indirect test of writing, and many became convinced of the hopelessness of direct writing assessment.
Abstract: The field of writing assessment has developed considerably in the last quarter of the century. Twenty years or so ago, many if not most people in North America (to a lesser extent in Great Britain and Australia) believed that writing could be validly tested by an indirect test of writing. As we enter the 1990s, however, they have not only been defeated but also chased from the battlefield. This change is the result of social pressure from schools, colleges, and parents, who argued that failure to learn and practice writing reasonable lengths of text in school was leading to declining literacy levels and to a college-entry population that could not think critically about intellectual ideas and academic material. In 1970 researchers had begun to respond to these social pressures, but there were serious questions about the levels of reliability that could be achieved on a direct test of writing (these same questions had been primarily responsible for the disfavor writing had fallen into as a test method from the 1940s on). What was happening in the field of writing assessment then was the kind of transatlantic conflict of philosophies that we have become familiar with in many areas of English as a second language (ESL) teaching. North American research emphasized the failure of direct writing tests to achieve score reliability levels that could compete with score reliabilities on multiple choice items, and many, perhaps even most people became convinced of the hopelessness of direct writing assessment.

Journal ArticleDOI
TL;DR: The paper discusses the issues, both general to language and task-specific, involved in front end design, and the way these have been addressed, concentrating on the work of the last decade. It indicates that future progress will depend on the one hand on general advances in natural language processing, and on the other on expanding the capabilities of traditional databases.
Abstract: This paper reviews the current state of the art in natural language access to databases. This has been a long-standing area of work in natural language processing. But though some commercial systems are now available, providing front ends has proved much harder than was expected, and the necessary limitations on front ends have to be recognized. The paper discusses the issues, both general to language and task-specific, involved in front end design, and the way these have been addressed, concentrating on the work of the last decade. The focus is on the central process of translating a natural language question into a database query, but other supporting functions are also covered. The points are illustrated by the use of a single example application. The paper concludes with an evaluation of the current state, indicating that future progress will depend on the one hand on general advances in natural language processing, and on the other on expanding the capabilities of traditional databases.
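
The central translation step, from a natural language question to a database query, can be caricatured in a few lines. Real front ends parse into an intermediate meaning representation before generating the query; the pattern-matching shortcut, the table and column names, and the two question templates below are all invented for illustration.

import re

# Hypothetical schema and question templates for the illustration.
PATTERNS = [
    (r"which employees work in (\w+)",
     "SELECT name FROM employees WHERE dept = '{0}'"),
    (r"how many employees work in (\w+)",
     "SELECT COUNT(*) FROM employees WHERE dept = '{0}'"),
]

def to_query(question):
    # Map a question onto a SQL query by pattern; questions outside the
    # front end's coverage fall through to None.
    q = question.lower().rstrip("?")
    for pattern, template in PATTERNS:
        m = re.match(pattern, q)
        if m:
            return template.format(*m.groups())
    return None

print(to_query("Which employees work in sales?"))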

Book
29 Jun 1990
TL;DR: The relation between language, mind, and reality is discussed, with chapters on the direct interpretation of natural languages (including situation theory), the syntactic base for interpretation, internal representations and natural language use, and the language faculty and cognition.
Abstract: Preface
Part I. Introduction: The Relation Between Language, Mind, and Reality
Part II. On The Direct Interpretation of Natural Languages:
1. Contexts, models, and meanings: a note on the data of semantics (James Higginbotham)
2. Facts in situation theory: representation, psychology, or reality? (Robin Cooper)
3. Relational interpretation (Elisabet Engdahl)
Part III. On The Syntactic Base For Interpretation:
4. Bound variable anaphora (Robert May)
5. On implicit arguments (Michael Brody and M. Rita Manzini)
Part IV. On Internal Representations and Natural Language Use:
6. Representation and relevance (Deirdre Wilson and Dan Sperber)
7. Implicature, explicature, and truth-theoretic semantics (Robyn Carston)
8. 'So' as a constraint on relevance (Diane Blakemore)
Part V. The Language Faculty and Cognition:
9. On the grammar-cognition interface: the principle of full interpretation (Ruth M. Kempson)
Index

01 May 1990
TL;DR: This work has defined and implemented a schema specification and recognition language for the TACITUS natural language system and gives examples of the use of this schema language in a diagnostic task, an application involving data base entry from messages, and a script recognition task.
Abstract: Many seemingly very different application tasks for natural language systems can be viewed as a matter of inferring the instance of a prespecified schema from the information in the text and the knowledge base. We have defined and implemented a schema specification and recognition language for the TACITUS natural language system. This effort entailed adding operators sensitive to resource bounds to the first-order predicate calculus accepted by a theorem-prover. We give examples of the use of this schema language in a diagnostic task, an application involving data base entry from messages, and a script recognition task, and we consider further possible developments.

Journal ArticleDOI
TL;DR: The language can be viewed as a very high-level special-purpose language that moves the user several levels away from the underlying programming language (e.g. LISP, as in this case, or Pascal or C).

Proceedings Article
29 Jul 1990
TL;DR: The generalized mutual information statistic is derived, the parsing algorithm is described, and results and sample output from the parser are presented.
Abstract: The purpose of this paper is to characterize a constituent boundary parsing algorithm, using an information-theoretic measure called generalized mutual information, which serves as an alternative to traditional grammar-based parsing methods. This method is based on the hypothesis that constituent boundaries can be extracted from a given sentence (or word sequence) by analyzing the mutual information values of the part of speech n-grams within the sentence. This hypothesis is supported by the performance of an implementation of this parsing algorithm which determines a recursive unlabeled bracketing of unrestricted English text with a relatively low error rate. This paper derives the generalized mutual information statistic, describes the parsing algorithm, and presents results and sample output from the parser.
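
The core hypothesis lends itself to a short sketch: estimate pointwise mutual information between adjacent part-of-speech tags from a corpus and hypothesize constituent boundaries where the value dips. The toy tag corpus is invented, and the paper's "generalized" statistic extends this pairwise version to longer n-grams.

import math
from collections import Counter

def boundary_scores(pos_corpus, sentence):
    # Pointwise mutual information between adjacent part-of-speech tags,
    # estimated from a (toy) tagged corpus; low values suggest that a
    # constituent boundary falls between the two tags.
    unigrams = Counter(t for sent in pos_corpus for t in sent)
    bigrams = Counter(b for sent in pos_corpus for b in zip(sent, sent[1:]))
    n1, n2 = sum(unigrams.values()), sum(bigrams.values())
    def mi(a, b):
        if bigrams[(a, b)] == 0:
            return float("-inf")
        return math.log((bigrams[(a, b)] / n2) /
                        ((unigrams[a] / n1) * (unigrams[b] / n1)))
    return [(a, b, mi(a, b)) for a, b in zip(sentence, sentence[1:])]

corpus = [["DT", "NN", "VB", "DT", "NN"], ["DT", "NN", "VB"]]
print(boundary_scores(corpus, ["DT", "NN", "VB", "DT", "NN"]))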

Proceedings ArticleDOI
01 Jan 1990
TL;DR: The kind of application that the text categorization shell, TCS, can produce is characterized and how it meets its design goals are described, and examples of applications built with TCS are given.
Abstract: The kind of application that the text categorization shell, TCS, can produce is characterized. Many of its applications have great commercial value. The design goals for TCS are discussed, and other approaches to text categorization in the light of these goals are examined. The TCS and how it meets its design goals are described, and examples of applications built with TCS are given. A text-categorization application developed with TCS consists of the TCS run-time system and a rule base. The rule base defines what categories the application can assign to texts and contains rules that make the categorization decisions for particular texts. The data-driven nature of TCS allows it to satisfy fully the requirements of ease of application development, portability to other applications and maintainability.
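
The run-time-system-plus-rule-base split can be illustrated with a small sketch in which each rule pairs trigger words with a category; the rule format and the example rules are hypothetical, since TCS's actual rule language is richer.

# Illustrative rule base: each rule pairs trigger words with a category.
RULES = [
    ({"merger", "acquisition"}, "mergers-and-acquisitions"),
    ({"earnings", "quarter"},   "earnings-reports"),
]

def categorize(text, rules=RULES, threshold=1):
    # The run-time system stays fixed; the rule base alone defines the
    # application. A rule fires when enough of its triggers appear.
    words = set(text.lower().split())
    return [cat for triggers, cat in rules
            if len(triggers & words) >= threshold]

print(categorize("The merger creates the largest supplier in Europe"))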

Proceedings ArticleDOI
05 Feb 1990
TL;DR: ViewSystem provides an object-oriented query language and a method language with universal computational power and a distinguished set of class constructors for deriving classes from underlying classes.
Abstract: In order to overcome practical integration problems, ViewSystem, an object-oriented programming environment with dedicated integration operators by which syntactically and semantically heterogeneous information can be integrated incrementally, has been developed. ViewSystem provides an object-oriented query language and a method language with universal computational power and a distinguished set of class constructors for deriving classes from underlying classes. To process queries against derived classes, a hybrid approach combining the techniques of query decomposition and materialization is proposed.