Proceedings ArticleDOI
Comparison of part-of-speech and automatically derived category-based language models for speech recognition
Thomas Niesler, E.W.D. Whittaker, Philip C. Woodland +2 more
Vol. 1, pp. 177–180
TLDR
This paper compares several category-based language models, each combined with a word-based trigram by linear interpolation; the largest improvement is obtained with a model using automatically determined categories.
Abstract
This paper compares various category-based language models when used in conjunction with a word-based trigram by means of linear interpolation. Categories corresponding to parts-of-speech as well as automatically clustered groupings are considered. The category-based model employs variable-length n-grams and permits each word to belong to multiple categories. Relative word error rate reductions of between 2 and 7% over the baseline are achieved in N-best rescoring experiments on the Wall Street Journal corpus. The largest improvement is obtained with a model using automatically determined categories. Perplexities continue to decrease as the number of different categories is increased, but improvements in the word error rate reach an optimum.
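The interpolation scheme described in the abstract can be sketched as follows. This is a toy illustration with made-up probabilities and an arbitrary interpolation weight, not the paper's actual models; since a word may belong to several categories, the category model sums over its memberships:

```python
def category_prob(word, p_cat_given_history, p_word_given_cat):
    # A word may belong to multiple categories, so sum
    # P(c | history) * P(word | c) over all categories c.
    return sum(p_cat_given_history[c] * p_word_given_cat[c].get(word, 0.0)
               for c in p_cat_given_history)

def interpolated_prob(p_trigram, p_category, lam=0.7):
    # Linear interpolation of the word trigram and the category model.
    # lam is a hypothetical weight; in practice it is tuned on held-out data.
    return lam * p_trigram + (1.0 - lam) * p_category

# Toy example: "bank" belongs to two categories.
p_cat_given_history = {"NOUN": 0.6, "VERB": 0.4}
p_word_given_cat = {"NOUN": {"bank": 0.05}, "VERB": {"bank": 0.01}}
p_cat = category_prob("bank", p_cat_given_history, p_word_given_cat)  # 0.034
p_mix = interpolated_prob(0.02, p_cat)                                # 0.0242
```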
Citations
Journal ArticleDOI
A neural probabilistic language model
TL;DR: The authors propose learning a distributed representation for words, which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences that can be expressed in terms of these representations.
Proceedings Article
Hierarchical Probabilistic Neural Network Language Model.
Frederic Morin, Yoshua Bengio +1 more
TL;DR: A hierarchical decomposition of the conditional probabilities, constrained by prior knowledge extracted from the WordNet semantic hierarchy, yields a speed-up of about 200 in both training and recognition.
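The speed-up comes from replacing a flat softmax over the vocabulary with a product of binary decisions along a tree path. A minimal sketch, with a hard-coded toy tree and a constant standing in for the learned per-node predictor (a real implementation derives the tree from, e.g., WordNet or clustering and predicts each branch from the context):

```python
# word -> path of (node, went_left) binary decisions in a toy tree
TREE_PATHS = {
    "cat": [("root", True), ("n1", True)],
    "dog": [("root", True), ("n1", False)],
    "run": [("root", False)],
}

def p_left(node, context):
    # Stand-in for a learned per-node binary classifier.
    return 0.6

def hierarchical_prob(word, context=None):
    # P(word | context) is the product of branch probabilities on its path,
    # so the cost is O(tree depth) instead of O(vocabulary size).
    prob = 1.0
    for node, went_left in TREE_PATHS[word]:
        p = p_left(node, context)
        prob *= p if went_left else (1.0 - p)
    return prob

# Probabilities over the toy vocabulary sum to one:
# cat: 0.36, dog: 0.24, run: 0.4
```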
Book ChapterDOI
Neural Probabilistic Language Models
TL;DR: This work proposes to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, and incorporates this new language model into a state-of-the-art speech recognizer of conversational speech.
Posted Content
A Bit of Progress in Language Modeling
TL;DR: A combination of all techniques together to a Katz smoothed trigram model with no count cutoffs achieves perplexity reductions between 38 and 50% (1 bit of entropy), depending on training data size, as well as a word error rate reduction of 8.9%.
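The "1 bit of entropy" phrasing follows from the standard relation between perplexity and per-word cross-entropy: halving perplexity corresponds to exactly a one-bit reduction in entropy.

```latex
\mathrm{PP} = 2^{H}, \qquad
H = -\frac{1}{N}\sum_{i=1}^{N}\log_2 P(w_i \mid w_1 \ldots w_{i-1}),
\qquad \frac{\mathrm{PP}}{2} = 2^{H-1}.
```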
References
Journal ArticleDOI
Estimation of probabilities from sparse data for the language model component of a speech recognizer
TL;DR: The model offers, via a nonlinear recursive procedure, a computation and space efficient solution to the problem of estimating probabilities from sparse data, and compares favorably to other proposed methods.
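The back-off idea behind this line of work can be illustrated with a deliberately simplified sketch: it uses a fixed penalty weight rather than the nonlinear recursive discounting the paper proposes, and the counts and alpha value below are hypothetical.

```python
def backoff_prob(w, h, counts, alpha=0.4):
    # Simplified back-off: use the relative frequency of the longest
    # observed history; otherwise recurse on a shortened history with a
    # fixed penalty alpha. (Real back-off models redistribute discounted
    # probability mass instead of applying a constant penalty.)
    ngram = h + (w,)
    if counts.get(h, 0) > 0 and counts.get(ngram, 0) > 0:
        return counts[ngram] / counts[h]
    if not h:                 # word unseen even as a unigram
        return 0.0
    return alpha * backoff_prob(w, h[1:], counts, alpha)

# Toy counts: the key () holds the total token count.
counts = {
    (): 10,
    ("the",): 4, ("cat",): 3, ("sat",): 3,
    ("the", "cat"): 2,
    ("cat", "sat"): 1,
}
p_seen   = backoff_prob("cat", ("the",), counts)  # bigram observed: 2/4
p_backed = backoff_prob("sat", ("the",), counts)  # backs off: 0.4 * 3/10
```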
Proceedings ArticleDOI
The design for the wall street journal-based CSR corpus
Douglas B. Paul, Janet M. Baker +1 more
TL;DR: This paper presents the motivating goals, acoustic data design, text processing steps, lexicons, and testing paradigms incorporated into the multi-faceted WSJ CSR Corpus, a corpus containing significant quantities of both speech data and text data.
Proceedings Article
Statistical Language Modeling using the CMU-Cambridge Toolkit
Philip Clarkson, Ronald Rosenfeld +1 more
TL;DR: The CMU Statistical Language Modeling toolkit was released in order to facilitate the construction and testing of bigram and trigram language models; the paper outlines the technology as implemented in the toolkit.
Journal ArticleDOI
On structuring probabilistic dependences in stochastic language modelling
TL;DR: The problem of stochastic language modelling is studied from the viewpoint of introducing suitable structures into the conditional probability distributions; the structures considered include nonlinear interpolation as an alternative to linear interpolation, equivalence classes for word histories and single words, and cache memory and word associations.