Journal ArticleDOI

Estimation of probabilities from sparse data for the language model component of a speech recognizer

S. Katz
- 01 Mar 1987 - 
- Vol. 35, Iss. 3, pp. 400–401
TLDR
The model offers, via a nonlinear recursive procedure, a computation- and space-efficient solution to the problem of estimating probabilities from sparse data, and compares favorably to other proposed methods.
Abstract
A novel type of m-gram language model is described. The model offers, via a nonlinear recursive procedure, a computation- and space-efficient solution to the problem of estimating probabilities from sparse data. This solution compares favorably to other proposed methods. While the method was developed for and successfully implemented in the IBM Real Time Speech Recognizers, its generality makes it applicable in other areas where the problem of estimating probabilities from sparse data arises.
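The nonlinear recursive procedure the abstract describes is what is now known as Katz back-off: m-gram counts above a threshold are trusted, small counts are discounted via Turing's formula, and the freed probability mass backs off to the (m-1)-gram estimate. A minimal bigram-order sketch, assuming a toy whitespace-tokenized corpus (the function name `katz_bigram`, the cutoff `k`, and the simplified discounting guard are illustrative choices, not the paper's exact recipe):

```python
from collections import Counter

def katz_bigram(tokens, k=5):
    """Sketch of Katz back-off at the bigram level.

    Counts r <= k are discounted with Turing's formula; the probability
    mass freed this way is redistributed to unseen bigrams in proportion
    to their unigram probabilities.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    N = len(tokens)
    n = Counter(bigrams.values())  # frequency of frequencies over bigram counts

    def discount(r):
        # Turing-discounted count r* = (r + 1) * n_{r+1} / n_r for small r.
        if r > k or n.get(r, 0) == 0 or n.get(r + 1, 0) == 0:
            return r
        d = (r + 1) * n[r + 1] / n[r]
        return d if d < r else r  # guard: on tiny corpora GT can exceed r

    def prob(prev, w):
        c_big = bigrams.get((prev, w), 0)
        c_prev = unigrams.get(prev, 0)
        p_uni = unigrams.get(w, 0) / N
        if c_big > 0:
            return discount(c_big) / c_prev  # discounted relative frequency
        if c_prev == 0:
            return p_uni  # unknown context: fall back to unigram outright
        # Back-off: mass left over after discounting all seen successors
        seen = {b: r for (a, b), r in bigrams.items() if a == prev}
        left = 1.0 - sum(discount(r) for r in seen.values()) / c_prev
        denom = sum(unigrams[u] for u in unigrams if u not in seen) / N
        return left * p_uni / denom if denom > 0 else 0.0

    return prob
```

For any context, the discounted seen-bigram probabilities and the backed-off unseen-bigram probabilities sum to one, which is the normalization property the recursive procedure is designed to preserve.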


Citations
Book

Deep Learning

TL;DR: Deep learning is a form of machine learning that enables computers to learn from experience and to understand the world in terms of a hierarchy of concepts; it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and video games.
Journal ArticleDOI

A neural probabilistic language model

TL;DR: The authors propose to learn a distributed representation for words that allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, i.e., sentences expressible in terms of those representations.
Journal ArticleDOI

Unsupervised Learning by Probabilistic Latent Semantic Analysis

TL;DR: This paper proposes a temperature-controlled version of the Expectation-Maximization algorithm for model fitting, which has shown excellent performance in practice and yields a more principled approach with a solid foundation in statistical inference.
Journal ArticleDOI

Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language

TL;DR: In this paper, a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content is presented, and experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge counting approach.
Journal ArticleDOI

An empirical study of smoothing techniques for language modeling

TL;DR: This work surveys the most widely used algorithms for smoothing n-gram language models, presents an extensive empirical comparison of several of these smoothing techniques, including the one described by Jelinek and Mercer (1980), and introduces methodologies for analyzing smoothing-algorithm efficacy in detail.
References
Book ChapterDOI

An Empirical Bayes Approach to Statistics

TL;DR: In this paper, a random variable whose probability distribution depends in a known way on an unknown real parameter A is considered; A itself is drawn from an a priori distribution function and is assumed to take discrete values.
Journal ArticleDOI

Estimation of probabilities in the language model of the IBM speech recognition system

TL;DR: The predictive power of the model thus fitted is compared, by means of its experimental perplexity, to that of the model fitted by the Jelinek-Mercer deleted estimator and by the Turing-Good formulas for the probabilities of unseen or rarely seen events.
Journal ArticleDOI

On Turing's formula for word probabilities

TL;DR: It is remarkable that Turing's formula can be obtained by significantly different statistical methods; three ways of deriving it are compared.
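Turing's formula, the subject of the reference above and the discounting rule in Katz's model, adjusts the count of an item seen r times to r* = (r + 1) n_{r+1} / n_r, where n_r is the number of distinct items observed exactly r times, and reserves mass n_1 / N for unseen items. A small sketch (the function name and the fallback for counts with no n_{r+1} data are illustrative assumptions):

```python
from collections import Counter

def good_turing(samples):
    """Turing's formula for adjusted counts and unseen mass.

    Returns a dict of adjusted counts r* = (r + 1) * n_{r+1} / n_r
    and the estimated total probability of unseen items, n_1 / N.
    """
    counts = Counter(samples)
    n = Counter(counts.values())  # n_r: how many distinct items occur r times
    N = len(samples)
    adjusted = {}
    for item, r in counts.items():
        if n.get(r + 1, 0) > 0:
            adjusted[item] = (r + 1) * n[r + 1] / n[r]
        else:
            adjusted[item] = r  # no n_{r+1} data: leave the largest counts alone
    p_unseen = n.get(1, 0) / N  # singleton mass estimates the unseen probability
    return adjusted, p_unseen
```

The striking consequence is that the adjusted count of a singleton shrinks below 1, and the mass given up by rare items is exactly what estimates the probability of events never observed at all.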