Open Access, Journal Article

Word association norms, mutual information, and lexicography

Kenneth Church, +1 more
01 Mar 1990, Vol. 16, Iss. 1, pp. 22-29
Abstract
The term word association is used in a very particular sense in the psycholinguistic literature. (Generally speaking, subjects respond quicker than normal to the word nurse if it follows a highly associated word such as doctor.) We will extend the term to provide the basis for a statistical description of a variety of interesting linguistic phenomena, ranging from semantic relations of the doctor/nurse type (content word/content word) to lexico-syntactic co-occurrence constraints between verbs and prepositions (content word/function word). This paper will propose an objective measure, based on the information-theoretic notion of mutual information, for estimating word association norms from computer-readable corpora. (The standard method of obtaining word association norms, testing a few thousand subjects on a few hundred words, is both costly and unreliable.) The proposed measure, the association ratio, estimates word association norms directly from computer-readable corpora, making it possible to estimate norms for tens of thousands of words.
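As a rough illustration of how a mutual-information-style association score can be estimated from a tokenized corpus, the sketch below computes log2(P(x, y) / (P(x) P(y))) for ordered word pairs. The abstract does not spell out the estimation details, so the function name `association_ratios`, the `window` parameter, and the choice to normalize every count by the corpus size are assumptions made for this example and should not be read as the authors' implementation.

```python
import math
from collections import Counter


def association_ratios(tokens, window=5, min_count=1):
    """Return {(x, y): log2(P(x, y) / (P(x) * P(y)))} for ordered pairs.

    P(x) and P(y) are unigram counts divided by N, the corpus size.
    P(x, y) is the number of times y follows x within `window` words,
    also divided by N. Both choices are assumptions of this sketch.
    """
    n = len(tokens)
    unigram = Counter(tokens)
    pair = Counter()
    for i, x in enumerate(tokens):
        # y ranges over the next (window - 1) tokens after x,
        # so (x, y) and (y, x) are counted separately.
        for y in tokens[i + 1 : i + window]:
            pair[(x, y)] += 1

    scores = {}
    for (x, y), f_xy in pair.items():
        if f_xy < min_count:
            continue  # skip pairs seen too rarely to trust
        p_xy = f_xy / n
        p_x = unigram[x] / n
        p_y = unigram[y] / n
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    corpus = "the doctor called the nurse and the doctor thanked the nurse".split()
    for (x, y), score in sorted(association_ratios(corpus, window=3).items()):
        print(f"{x:>8} {y:<8} {score:+.2f}")
```

Because pairs are counted in order, the score for (x, y) can differ from the score for (y, x). A larger `min_count` is a simple guard against unstable scores for very rare pairs.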


Citations
Proceedings Article

Mining and summarizing customer reviews

TL;DR: This research aims to mine and to summarize all the customer reviews of a product, and proposes several novel techniques to perform these tasks.
Journal Article

A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.

TL;DR: A new general theory of acquired similarity and knowledge representation, latent semantic analysis (LSA), is presented and used to successfully simulate such learning and several other psycholinguistic phenomena.
Proceedings Article

A Comparative Study on Feature Selection in Text Categorization

TL;DR: This paper finds strong correlations between the DF, IG, and CHI values of a term and suggests that DF thresholding, the simplest method with the lowest computational cost, can be reliably used instead of IG or CHI when computing those measures is too expensive.
Posted Content

Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

TL;DR: A simple unsupervised learning algorithm for classifying reviews as recommended (thumbs up) or not recommended (thumbs down); a review is classified as recommended if the average semantic orientation of its phrases is positive.
Book

Speech and Language Processing

Dan Jurafsky, +1 more
TL;DR: It is now clear that HAL's creator, Arthur C. Clarke, was a little optimistic in predicting when an artificial agent such as HAL would be available.