A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text
Kenneth Church
pp. 136-143
TLDR
The author used a linear-time dynamic programming algorithm to find an assignment of parts of speech to words that optimizes the product of (a) lexical probabilities (probability of observing part of speech i given word i) and (b) contextual probabilities (probability of observing part of speech i given n following parts of speech).
Abstract:
A program that tags each word in an input sentence with the most likely part of speech has been written. The program uses a linear-time dynamic programming algorithm to find an assignment of parts of speech to words that optimizes the product of (a) lexical probabilities (probability of observing part of speech i given word i) and (b) contextual probabilities (probability of observing part of speech i given n following parts of speech). Program performance is encouraging; a 400-word sample is presented and is judged to be 99.5% correct.
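The optimization described in the abstract can be illustrated with a minimal sketch. It is not the author's program: it assumes n = 1 (each tag is conditioned on only the single following tag), and the function name `tag_sentence`, the tables `LEXICAL` and `CONTEXTUAL`, the `END` marker, the `floor` value for unseen tag pairs, and every probability below are hypothetical values chosen for illustration. The sentence is scanned from right to left, and for each candidate tag only the best-scoring continuation is kept, so the work grows linearly with sentence length for a fixed tag set.

```python
import math

# Hypothetical lexical probabilities P(tag | word); the numbers are made up.
LEXICAL = {
    "I": {"PPSS": 0.99, "NP": 0.01},
    "see": {"VB": 0.99, "UH": 0.01},
    "a": {"AT": 0.99, "IN": 0.01},
    "bird": {"NN": 1.0},
}

# Hypothetical contextual probabilities P(tag | following tag), with n = 1.
CONTEXTUAL = {
    ("PPSS", "VB"): 0.3,
    ("VB", "AT"): 0.2,
    ("AT", "NN"): 0.4,
    ("NN", "END"): 0.2,
}


def tag_sentence(words, lexical, contextual, end="END", floor=1e-6):
    """Return the tags maximizing prod_i P(t_i | w_i) * P(t_i | t_{i+1})."""
    # best[tag] = (log score, linked path) for the best tagging of the
    # suffix processed so far whose first tag is `tag`.
    best = {end: (0.0, None)}
    for word in reversed(words):
        new_best = {}
        for tag, p_lex in lexical[word].items():
            candidates = []
            for next_tag, (score, path) in best.items():
                # Unseen tag pairs get a small floor probability (an assumption).
                p_ctx = contextual.get((tag, next_tag), floor)
                total = score + math.log(p_lex) + math.log(p_ctx)
                candidates.append((total, (tag, path)))
            new_best[tag] = max(candidates, key=lambda c: c[0])
        best = new_best
    # Unroll the linked path, which already runs from first word to last.
    _, path = max(best.values(), key=lambda c: c[0])
    tags = []
    while path is not None:
        tag, path = path
        tags.append(tag)
    return tags


print(tag_sentence("I see a bird".split(), LEXICAL, CONTEXTUAL))
```

Log probabilities are summed rather than multiplying raw probabilities, which avoids numeric underflow on long sentences and leaves the maximizing assignment unchanged.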
Citations
Journal Article
Word association norms, mutual information, and lexicography
Kenneth Church, Patrick Hanks
TL;DR: The proposed measure, the association ratio, estimates word association norms directly from computer readable corpora, making it possible to estimate norms for tens of thousands of words.
Proceedings Article
Feature-rich part-of-speech tagging with a cyclic dependency network
TL;DR: A new part-of-speech tagger is presented that demonstrates the following ideas: explicit use of both preceding and following tag contexts via a dependency network representation, broad use of lexical features, and effective use of priors in conditional loglinear models.
Journal Article
An empirical study of smoothing techniques for language modeling
TL;DR: This work surveys the most widely used smoothing algorithms for n-gram language modeling, presents an extensive empirical comparison of several of these techniques, including those described by Jelinek and Mercer (1980), and introduces methodologies for analyzing smoothing algorithm efficacy in detail.
Proceedings Article
Predicting the Semantic Orientation of Adjectives
TL;DR: A log-linear regression model uses constraints from conjunctions to predict whether conjoined adjectives are of same or different orientations, achieving 82% accuracy in this task when each conjunction is considered independently.
Proceedings Article
A Simple Rule-Based Part of Speech Tagger
TL;DR: This work presents a simple rule-based part-of-speech tagger that automatically acquires its rules and tags with accuracy comparable to stochastic taggers, demonstrating that the stochastic method is not the only viable method for part-of-speech tagging.
References
Journal Article
Three models for the description of language
TL;DR: It is found that no finite-state Markov process that produces symbols with transition from state to state can serve as an English grammar, and the particular subclass of such processes that produce n-order statistical approximations to English do not come closer, with increasing n, to matching the output of an English grammar.
Journal Article
Frequency Analysis of English Usage: Lexicon and Grammar. By W. Nelson Francis and Henry Kučera with the assistance of Andrew W. Mackie. Boston: Houghton Mifflin. 1982. x + 561.
Journal Article
Theory of Syntactic Recognition for Natural Languages
TL;DR: A theory of syntactic recognition for natural language can be found in this article, where the authors make use of the deterministic hypothesis that the syntax can be parsed by a mechanism which operates "strictly deterministically" in that it does not simulate a non-deterministic machine.
Book
Collins COBUILD English Language Dictionary
TL;DR: This is a dictionary of English as it is actually used and is also written and presented in plain English, enabling easier and earlier use of a monolingual dictionary.
The Automatic Grammatical Tagging of the LOB Corpus
TL;DR: An account of the automatic grammatical tagging of the LOB (Lancaster-Oslo/Bergen) Corpus of British English, with special reference to the methods of tagging the authors have adopted.