scispace - formally typeset
Open AccessPosted Content

Natural language parsing as statistical pattern recognition

Reads0
Chats0
TLDR
This work proposes an automatic method for acquiring a statistical parser from a set of parsed sentences which takes advantage of some initial linguistic input, but avoids the pitfalls of the iterative and seemingly endless grammar development process.
Abstract
Traditional natural language parsers are based on rewrite rule systems developed in an arduous, time-consuming manner by grammarians. A majority of the grammarian's efforts are devoted to the disambiguation process, first hypothesizing rules which dictate constituent categories and relationships among words in ambiguous sentences, and then seeking exceptions and corrections to these rules. In this work, I propose an automatic method for acquiring a statistical parser from a set of parsed sentences which takes advantage of some initial linguistic input, but avoids the pitfalls of the iterative and seemingly endless grammar development process. Based on distributionally-derived and linguistically-based features of language, this parser acquires a set of statistical decision trees which assign a probability distribution on the space of parse trees given the input sentence. These decision trees take advantage of significant amount of contextual information, potentially including all of the lexical information in the sentence, to produce highly accurate statistical models of the disambiguation process. By basing the disambiguation criteria selection on entropy reduction rather than human intuition, this parser development method is able to consider more sentences than a human grammarian can when making individual disambiguation rules. In experiments between a parser, acquired using this statistical framework, and a grammarian's rule-based parser, developed over a ten-year period, both using the same training material and test sentences, the decision tree parser significantly outperformed the grammar-based parser on the accuracy measure which the grammarian was trying to maximize, achieving an accuracy of 78% compared to the grammar-based parser's 69%.

read more

Citations
More filters
Book

Foundations of Statistical Natural Language Processing

TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Book

Speech and Language Processing

Dan Jurafsky, +1 more
TL;DR: It is now clear that HAL's creator, Arthur C. Clarke, was a little optimistic in predicting when an artificial agent such as HAL would be avail-able as discussed by the authors.
Journal ArticleDOI

Expectation-based syntactic comprehension

TL;DR: A simple information-theoretic characterization of processing difficulty as the work incurred by resource reallocation during parallel, incremental, probabilistic disambiguation in sentence comprehension is proposed, and its equivalence to the theory of Hale is demonstrated.
Book

Artificial Intelligence: A New Synthesis

TL;DR: Intelligent agents are employed as the central characters in this new introductory text and Nilsson gradually increases their cognitive horsepower to illustrate the most important and lasting ideas in AI.
Journal ArticleDOI

Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey

TL;DR: This paper surveys existing work on decision tree construction, attempting to identify the important issues involved, directions the work has taken and the current state of the art.
References
More filters
Book

Elements of information theory

TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
Book

Classification and regression trees

Leo Breiman
TL;DR: The methodology used to construct tree structured rules is the focus of a monograph as mentioned in this paper, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.
Journal ArticleDOI

Aspects of the Theory of Syntax

TL;DR: Methodological preliminaries of generative grammars as theories of linguistic competence; theory of performance; organization of a generative grammar; justification of grammar; descriptive and explanatory theories; evaluation procedures; linguistic theory and language learning.
Book

Aspects of the Theory of Syntax

Noam Chomsky
TL;DR: Generative grammars as theories of linguistic competence as discussed by the authors have been used as a theory of performance for language learning. But they have not yet been applied to the problem of language modeling.