Proceedings ArticleDOI

Phrase bigrams for continuous speech recognition

E.P. Giachin
Vol. 1, pp. 225–228
TLDR
Two procedures for automatically determining frequent phrases (within the framework of a probabilistic language model) in an unlabeled training set of written sentences are discussed; one procedure is optimal in that it minimises the set perplexity.
Abstract
In some speech recognition tasks, such as man-machine dialogue systems, the spoken sentences include several recurrent phrases. A bigram language model does not adequately represent these phrases because it underestimates their probability. A better approach consists of modeling phrases as if they were individual dictionary elements. They are inserted as additional entries into the word lexicon, on which bigrams are finally computed. This paper discusses two procedures for automatically determining frequent phrases (within the framework of a probabilistic language model) in an unlabeled training set of written sentences. One procedure is optimal in that it minimises the set perplexity. The other, based on information-theoretic criteria, ensures that the resulting model has high statistical robustness. The two procedures are tested on a 762-word spontaneous speech recognition task. They give similar results and provide a moderate improvement over standard bigrams.
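The perplexity-driven variant lends itself to a compact illustration. Below is a minimal Python sketch, with all names and the candidate-generation and stopping details chosen by us rather than taken from the paper: frequent adjacent word pairs are greedily merged into single phrase tokens whenever the merge lowers the training-set perplexity of an unsmoothed bigram model, with perplexity normalised by the original word count so corpora with merged phrases remain comparable.

```python
import math
from collections import Counter

def bigram_log_prob(sentences):
    """Total log-probability of `sentences` under an unsmoothed MLE
    bigram model trained on the same data (sentence-start padded)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return sum(c * math.log(c / unigrams[prev])
               for (prev, _cur), c in bigrams.items())

def merge(sentences, pair):
    """Join every adjacent occurrence of `pair` into one phrase token."""
    joined = "_".join(pair)
    out = []
    for sent in sentences:
        new, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) == pair:
                new.append(joined)
                i += 2
            else:
                new.append(sent[i])
                i += 1
        out.append(new)
    return out

def build_phrase_lexicon(sentences, max_phrases=50, top_k=20):
    """Greedily add the phrase whose merge most lowers per-word
    perplexity; stop when no candidate helps. A rough stand-in for
    the paper's perplexity-minimising procedure, not its algorithm."""
    n_words = sum(len(s) for s in sentences)  # fixed normaliser
    ppl = lambda ss: math.exp(-bigram_log_prob(ss) / n_words)
    phrases = []
    for _ in range(max_phrases):
        # rank the most frequent adjacent pairs as merge candidates
        candidates = Counter((s[i], s[i + 1])
                             for s in sentences for i in range(len(s) - 1))
        best_pair, best_ppl = None, ppl(sentences)
        for pair, _n in candidates.most_common(top_k):
            p = ppl(merge(sentences, pair))
            if p < best_ppl:
                best_pair, best_ppl = pair, p
        if best_pair is None:
            break
        sentences = merge(sentences, best_pair)
        phrases.append(" ".join(best_pair))
    return phrases, sentences
```

On a dialogue corpus this tends to surface recurrent openings such as "I would like", merged here into tokens like "I_would_like" that then behave as ordinary lexicon entries when bigrams are recomputed.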


Citations
Journal ArticleDOI

How may I help you?

TL;DR: This paper focuses on the task of automatically routing telephone calls based on a user's fluently spoken response to the open-ended prompt of “How may I help you?”.
Journal ArticleDOI

Grammar fragment acquisition using syntactic and semantic clustering

TL;DR: A method and apparatus are provided for automatically acquiring grammar fragments for recognizing and understanding fluently spoken language.
Journal ArticleDOI

Toward a unified approach to statistical language modeling for Chinese

TL;DR: This article presents a unified approach to Chinese statistical language modeling, which automatically and consistently gathers a high-quality training data set from the Web, creates a high-quality lexicon, segments the training data using this lexicon, and compresses the language model by using the maximum likelihood principle, which is consistent with trigram model training.
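The segmentation step in such a pipeline is often a simple lexicon-driven pass. As a hedged illustration only (the article's actual segmenter is not described in this summary), a standard forward maximum-match baseline in Python:

```python
def max_match(text, lexicon, max_len=6):
    """Greedy forward maximum-match segmentation: at each position,
    take the longest lexicon entry that matches the text; fall back
    to a single character for out-of-lexicon material."""
    out, i = [], 0
    while i < len(text):
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon or j == i + 1:
                out.append(text[i:j])
                i = j
                break
    return out

# e.g. max_match("北京大学生", {"北京", "大学", "大学生", "北京大学"})
# -> ["北京大学", "生"]
```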
Posted Content

Metrics for evaluating dialogue strategies in a spoken language system

TL;DR: A new metric, implicit recovery, is defined; it measures the ability of a dialogue manager to deal with errors at different levels of analysis, and it is compared with two different approaches to dialogue repair strategies.
Proceedings ArticleDOI

Improving speech understanding by incorporating database constraints and dialogue history

TL;DR: This work shows how knowledge sources available in the course of a man-machine dialogue may be utilized within a stochastic framework to improve speech understanding.
References
Proceedings ArticleDOI

On smoothing techniques for bigram-based natural language modelling

TL;DR: It is shown that the leaving-one-out method in combination with the maximum likelihood criterion can be efficiently used for the optimal estimation of interpolation parameters.
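The leaving-one-out idea can be shown in a few lines. A minimal sketch under our own simplifications (grid search instead of a closed-form or EM solution, and an add-one unigram floor to keep the logarithm finite; the paper's exact formulation is not reproduced here): the weight of a bigram/unigram interpolation is chosen to maximise the likelihood of each training bigram when its own occurrence is removed from the counts, which simulates unseen events on the training data itself.

```python
import math
from collections import Counter

def loo_interpolation_weight(tokens, grid=100):
    """Estimate lam in P(w|v) = lam * f(w|v) + (1 - lam) * f(w)
    by leave-one-out maximum likelihood over the training tokens."""
    big = Counter(zip(tokens, tokens[1:]))
    uni = Counter(tokens)
    N, V = len(tokens), len(uni)
    best_lam, best_ll = 0.5, -math.inf
    for k in range(1, grid):          # lam in (0, 1), endpoints excluded
        lam = k / grid
        ll = 0.0
        for (v, w), c in big.items():
            # leave one occurrence of (v, w) out of the counts
            p_big = (c - 1) / (uni[v] - 1) if uni[v] > 1 else 0.0
            # add-one-smoothed unigram floor, always positive
            p_uni = uni[w] / (N - 1 + V)
            ll += c * math.log(lam * p_big + (1 - lam) * p_uni)
        if ll > best_ll:
            best_lam, best_ll = lam, ll
    return best_lam

# e.g. loo_interpolation_weight("a b a b a c".split())
```

The key move is scoring each bigram with its count reduced by one: singleton bigrams then receive zero bigram probability, so the optimisation is forced to reserve mass for the unigram back-off, exactly the effect smoothing is meant to achieve.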
Proceedings Article

Towards better language models for spontaneous speech.

Bernhard Suhm, +1 more
TL;DR: Several methods to improve the language models of the speech decoder of the speech translation system for spontaneous spoken dialogs attempt to take advantage of natural equivalence word classes, frequently occurring word phrases, and discourse structure.