Journal ArticleDOI

Estimation of probabilities from sparse data for the language model component of a speech recognizer

S. Katz
- 01 Mar 1987 - 
- Vol. 35, Iss. 3, pp. 400–401
TLDR
The model offers, via a nonlinear recursive procedure, a computation- and space-efficient solution to the problem of estimating probabilities from sparse data, and compares favorably to other proposed methods.
Abstract
A novel type of m-gram language model is described. The model offers, via a nonlinear recursive procedure, a computation- and space-efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. While the method was developed for and successfully implemented in the IBM Real Time Speech Recognizers, its generality makes it applicable in other areas where the problem of estimating probabilities from sparse data arises.
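The nonlinear recursive procedure the abstract describes is what is now known as Katz back-off: m-gram counts above a threshold are trusted, small counts are discounted via Turing's formula, and the freed probability mass backs off to the (m-1)-gram estimate. A minimal bigram-order sketch, assuming a toy whitespace-tokenized corpus (the function name `katz_bigram`, the cutoff `k`, and the simplified discounting guard are illustrative choices, not the paper's exact recipe):

```python
from collections import Counter

def katz_bigram(tokens, k=5):
    """Sketch of Katz back-off at the bigram level.

    Counts r <= k are discounted with Turing's formula; the probability
    mass freed this way is redistributed to unseen bigrams in proportion
    to their unigram probabilities.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    N = len(tokens)
    n = Counter(bigrams.values())  # frequency of frequencies over bigram counts

    def discount(r):
        # Turing-discounted count r* = (r + 1) * n_{r+1} / n_r for small r.
        if r > k or n.get(r, 0) == 0 or n.get(r + 1, 0) == 0:
            return r
        d = (r + 1) * n[r + 1] / n[r]
        return d if d < r else r  # guard: on tiny corpora GT can exceed r

    def prob(prev, w):
        c_big = bigrams.get((prev, w), 0)
        c_prev = unigrams.get(prev, 0)
        p_uni = unigrams.get(w, 0) / N
        if c_big > 0:
            return discount(c_big) / c_prev  # discounted relative frequency
        if c_prev == 0:
            return p_uni  # unknown context: fall back to unigram outright
        # Back-off: mass left over after discounting all seen successors
        seen = {b: r for (a, b), r in bigrams.items() if a == prev}
        left = 1.0 - sum(discount(r) for r in seen.values()) / c_prev
        denom = sum(unigrams[u] for u in unigrams if u not in seen) / N
        return left * p_uni / denom if denom > 0 else 0.0

    return prob
```

For any context, the discounted seen-bigram probabilities and the backed-off unseen-bigram probabilities sum to one, which is the normalization property the recursive procedure is designed to preserve.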


Citations
Book

Deep Learning

TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and to understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games.
Journal ArticleDOI

A neural probabilistic language model

TL;DR: The authors propose to learn a distributed representation for words that allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, i.e., sentences expressible in terms of those representations.
Journal ArticleDOI

Unsupervised Learning by Probabilistic Latent Semantic Analysis

TL;DR: This paper proposes a temperature-controlled version of the Expectation-Maximization algorithm for model fitting, which has shown excellent performance in practice and yields a more principled approach with a solid foundation in statistical inference.
Journal ArticleDOI

Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language

TL;DR: In this paper, a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content is presented, and experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge counting approach.
Journal ArticleDOI

An empirical study of smoothing techniques for language modeling

TL;DR: This work surveys the most widely used algorithms for smoothing n-gram language models, presents an extensive empirical comparison of several of these smoothing techniques, including the one described by Jelinek and Mercer (1980), and introduces methodologies for analyzing smoothing-algorithm efficacy in detail.
References
Book ChapterDOI

An Empirical Bayes Approach to Statistics

TL;DR: In this paper, a random variable whose probability distribution depends in a known way on an unknown real parameter A is considered; A itself is drawn from an a priori distribution function and is assumed to take discrete values.
Journal ArticleDOI

Estimation of probabilities in the language model of the IBM speech recognition system

TL;DR: The predictive power of the model thus fitted is compared, by means of its experimental perplexity, to that of the model fitted by the Jelinek-Mercer deleted estimator and by the Turing-Good formulas for the probabilities of unseen or rarely seen events.
Journal ArticleDOI

On Turing's formula for word probabilities

TL;DR: It is remarkable that Turing's formula can be obtained by significantly different statistical methods; three ways of deriving it are compared.
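Turing's formula, the subject of the reference above and the discounting rule in Katz's model, adjusts the count of an item seen r times to r* = (r + 1) n_{r+1} / n_r, where n_r is the number of distinct items observed exactly r times, and reserves mass n_1 / N for unseen items. A small sketch (the function name and the fallback for counts with no n_{r+1} data are illustrative assumptions):

```python
from collections import Counter

def good_turing(samples):
    """Turing's formula for adjusted counts and unseen mass.

    Returns a dict of adjusted counts r* = (r + 1) * n_{r+1} / n_r
    and the estimated total probability of unseen items, n_1 / N.
    """
    counts = Counter(samples)
    n = Counter(counts.values())  # n_r: how many distinct items occur r times
    N = len(samples)
    adjusted = {}
    for item, r in counts.items():
        if n.get(r + 1, 0) > 0:
            adjusted[item] = (r + 1) * n[r + 1] / n[r]
        else:
            adjusted[item] = r  # no n_{r+1} data: leave the largest counts alone
    p_unseen = n.get(1, 0) / N  # singleton mass estimates the unseen probability
    return adjusted, p_unseen
```

The striking consequence is that the adjusted count of a singleton shrinks below 1, and the mass given up by rare items is exactly what estimates the probability of events never observed at all.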