Open AccessPosted Content
Natural language parsing as statistical pattern recognition
Reads0
Chats0
TLDR
This work proposes an automatic method for acquiring a statistical parser from a set of parsed sentences which takes advantage of some initial linguistic input, but avoids the pitfalls of the iterative and seemingly endless grammar development process.Abstract:
Traditional natural language parsers are based on rewrite rule systems developed in an arduous, time-consuming manner by grammarians. A majority of the grammarian's efforts are devoted to the disambiguation process, first hypothesizing rules which dictate constituent categories and relationships among words in ambiguous sentences, and then seeking exceptions and corrections to these rules.
In this work, I propose an automatic method for acquiring a statistical parser from a set of parsed sentences which takes advantage of some initial linguistic input, but avoids the pitfalls of the iterative and seemingly endless grammar development process. Based on distributionally-derived and linguistically-based features of language, this parser acquires a set of statistical decision trees which assign a probability distribution on the space of parse trees given the input sentence. These decision trees take advantage of significant amount of contextual information, potentially including all of the lexical information in the sentence, to produce highly accurate statistical models of the disambiguation process. By basing the disambiguation criteria selection on entropy reduction rather than human intuition, this parser development method is able to consider more sentences than a human grammarian can when making individual disambiguation rules.
In experiments between a parser, acquired using this statistical framework, and a grammarian's rule-based parser, developed over a ten-year period, both using the same training material and test sentences, the decision tree parser significantly outperformed the grammar-based parser on the accuracy measure which the grammarian was trying to maximize, achieving an accuracy of 78% compared to the grammar-based parser's 69%.read more
Citations
More filters
Book
Foundations of Statistical Natural Language Processing
TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Book
Speech and Language Processing
Dan Jurafsky,James Martin +1 more
TL;DR: It is now clear that HAL's creator, Arthur C. Clarke, was a little optimistic in predicting when an artificial agent such as HAL would be avail-able as discussed by the authors.
Journal ArticleDOI
Expectation-based syntactic comprehension
TL;DR: A simple information-theoretic characterization of processing difficulty as the work incurred by resource reallocation during parallel, incremental, probabilistic disambiguation in sentence comprehension is proposed, and its equivalence to the theory of Hale is demonstrated.
Book
Artificial Intelligence: A New Synthesis
TL;DR: Intelligent agents are employed as the central characters in this new introductory text and Nilsson gradually increases their cognitive horsepower to illustrate the most important and lasting ideas in AI.
Journal ArticleDOI
Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey
TL;DR: This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art.
References
More filters
Book
Elements of information theory
Thomas M. Cover,Joy A. Thomas +1 more
TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
Book
Classification and regression trees
TL;DR: The methodology used to construct tree structured rules is the focus of a monograph as mentioned in this paper, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
Journal ArticleDOI
Aspects of the Theory of Syntax
Ann S. Ferebee,Noam Chomsky +1 more
TL;DR: Methodological preliminaries of generative grammars as theories of linguistic competence; theory of performance; organization of a generative grammar; justification of grammar; descriptive and explanatory theories; evaluation procedures; linguistic theory and language learning.
Book
Aspects of the Theory of Syntax
TL;DR: Generative grammars as theories of linguistic competence as discussed by the authors have been used as a theory of performance for language learning. But they have not yet been applied to the problem of language modeling.