Distributional clustering of English words

doi:10.3115/981574.981598

Proceedings Article (Open Access)

- pp 183-190

TLDR

A method for clustering words according to their distribution in particular syntactic contexts is described and evaluated experimentally. Words are represented by the relative frequency distributions of the contexts in which they appear, and relative entropy between those distributions is used as the similarity measure for clustering.

Abstract:

We describe and evaluate experimentally a method for clustering words according to their distribution in particular syntactic contexts. Words are represented by the relative frequency distributions of contexts in which they appear, and relative entropy between those distributions is used as the similarity measure for clustering. Clusters are represented by average context distributions derived from the given words according to their probabilities of cluster membership. In many cases, the clusters can be thought of as encoding coarse sense distinctions. Deterministic annealing is used to find lowest distortion sets of clusters: as the annealing parameter increases, existing clusters become unstable and subdivide, yielding a hierarchical "soft" clustering of the data. Clusters are used as the basis for class models of word cooccurrence, and the models evaluated with respect to held-out test data.
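The representation and clustering step described in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, gives every word equal weight when averaging, and uses hypothetical names (`relative_entropy`, `soft_cluster`, `beta` for the annealing parameter, `init` for optional starting centroids). Membership probabilities are taken proportional to exp(-beta * D(word || centroid)), a common choice for soft clustering that the abstract itself does not spell out.

```python
import numpy as np


def relative_entropy(p, q, eps=1e-12):
    """Relative entropy D(p || q) in nats between two context distributions.

    Terms where p is zero contribute nothing; q is floored at eps
    (an assumption of this sketch) to avoid division by zero.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0
    return float(np.sum(p[support] * np.log(p[support] / np.maximum(q[support], eps))))


def soft_cluster(word_dists, n_clusters, beta, init=None, n_iter=50, seed=0):
    """Soft clustering of word context distributions at a fixed beta.

    word_dists: array of shape (n_words, n_contexts), each row sums to 1.
    Returns (member, centroids): member[w, c] is the membership probability
    of word w in cluster c; centroids[c] is an average context distribution.
    """
    word_dists = np.asarray(word_dists, dtype=float)
    if init is None:
        # Assumed initialisation: slightly perturbed copies of the mean distribution.
        rng = np.random.default_rng(seed)
        mean = word_dists.mean(axis=0)
        centroids = mean * (1.0 + 0.01 * rng.random((n_clusters, word_dists.shape[1])))
        centroids /= centroids.sum(axis=1, keepdims=True)
    else:
        centroids = np.asarray(init, dtype=float).copy()

    for _ in range(n_iter):
        # Distortion of each word with respect to each centroid.
        dist = np.array([[relative_entropy(p, c) for c in centroids] for p in word_dists])
        # Membership probabilities: normalised exp(-beta * distortion), computed stably.
        logits = -beta * dist
        logits -= logits.max(axis=1, keepdims=True)
        member = np.exp(logits)
        member /= member.sum(axis=1, keepdims=True)
        # Centroids: membership-weighted averages of the word distributions.
        weights = member / member.sum(axis=0, keepdims=True)
        centroids = weights.T @ word_dists
    return member, centroids
```

Running `soft_cluster` at a sequence of increasing `beta` values, each run starting from the previous centroids, is one way to imitate an annealing schedule. At very small `beta` all centroids stay close to the overall mean distribution, and sharper memberships only emerge as `beta` grows.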

Citations

Journal Article

An introduction to variable and feature selection

Isabelle Guyon, +1 more

- 01 Mar 2003 -

Journal of Machine Learning Research

TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.

Journal Article

A neural probabilistic language model

Yoshua Bengio, +3 more

- 01 Mar 2003 -

Journal of Machine Learning Research

TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.

Journal Article

Probabilistic latent semantic indexing

Thomas Hofmann

TL;DR: Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data.

Proceedings Article

An Information-Theoretic Definition of Similarity

Dekang Lin

TL;DR: This work presents an information-theoretic definition of similarity that is applicable as long as there is a probabilistic model and demonstrates how this definition can be used to measure the similarity in a number of different domains.

Book

Semi-Supervised Learning

Olivier Chapelle, +2 more

TL;DR: Semi-supervised learning (SSL) is the middle ground between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no label data are given).

References

Journal Article

Maximum likelihood from incomplete data via the EM algorithm

Arthur P. Dempster, +2 more

- 01 Sep 1977 -

Journal of the Royal Statistical Society...

Book

Elements of information theory

Thomas M. Cover, +1 more

TL;DR: The authors examine the role of entropy, inequalities, and randomness in the design and construction of codes.

Book

Pattern classification and scene analysis

Richard O. Duda, +1 more

TL;DR: This book provides a unified, comprehensive, and up-to-date treatment of both statistical and descriptive methods for pattern recognition, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

Journal Article

Class-based n-gram models of natural language

Peter Fitzhugh Brown, +4 more

- 01 Dec 1992 -

Computational Linguistics

TL;DR: This work addresses the problem of predicting a word from previous words in a sample of text and discusses n-gram models based on classes of words, finding that these models are able to extract classes that have the flavor of either syntactically based groupings or semantically based groupings, depending on the nature of the underlying statistics.

Proceedings Article

A Stochastic Parts Program and Noun Phrase Parser for Unrestricted Text

Kenneth Church

TL;DR: The author used a linear-time dynamic programming algorithm to find an assignment of parts of speech to words that optimizes the product of (a) lexical probabilities (probability of observing part of speech i given word i) and (b) contextual probabilities (probability of observing n following parts of speech).
