Finding consensus among words: Lattice-based word error minimization

Open Access Proceedings Article

TLDR

A new algorithm for finding the hypothesis in a recognition lattice that is expected to minimize the word error rate (WER) is described, which overcomes the mismatch between the word-based performance metric and the standard MAP scoring paradigm that is sentence-based.

Abstract:

We describe a new algorithm for finding the hypothesis in a recognition lattice that is expected to minimize the word error rate (WER). Our approach thus overcomes the mismatch between the word-based performance metric and the standard MAP scoring paradigm that is sentence-based, and that can lead to sub-optimal recognition results. To this end we first find a complete alignment of all words in the recognition lattice, identifying mutually supporting and competing word hypotheses. Finally, a new sentence hypothesis is formed by concatenating the words with maximal posterior probabilities. Experimentally, this approach leads to a significant WER reduction in a large vocabulary recognition task.
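
The mismatch between sentence-level MAP scoring and word-level error can be illustrated with a toy example. The sketch below is not the paper's lattice alignment procedure: it assumes a small N-best list of equal-length hypotheses, so that words align trivially by position, and it assumes the sentence posteriors are already known. The names `map_hypothesis`, `slot_posteriors`, `consensus_hypothesis`, and the `<eps>` marker are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

EPS = "<eps>"  # hypothetical marker for "no word in this slot" (a deletion)


def map_hypothesis(nbest):
    """Sentence-level MAP: return the single most probable word sequence."""
    words, _ = max(nbest, key=lambda item: item[1])
    return list(words)


def slot_posteriors(nbest):
    """Sum sentence posteriors per word at each position.

    Assumption: all hypotheses have the same length, so position i in one
    hypothesis competes with position i in every other hypothesis.
    """
    length = len(nbest[0][0])
    slots = [defaultdict(float) for _ in range(length)]
    for words, posterior in nbest:
        if len(words) != length:
            raise ValueError("toy alignment needs equal-length hypotheses")
        for slot, word in zip(slots, words):
            slot[word] += posterior
    return slots


def consensus_hypothesis(slots):
    """Concatenate the highest-posterior word of every slot, skipping EPS."""
    best = (max(slot, key=slot.get) for slot in slots)
    return [word for word in best if word != EPS]


# Toy N-best list: (word sequence, sentence posterior); posteriors sum to 1.
nbest = [
    (("it", "was", "nice"), 0.4),
    (("he", "was", "mice"), 0.3),
    (("he", "has", "nice"), 0.3),
]

print("MAP:      ", " ".join(map_hypothesis(nbest)))
print("Consensus:", " ".join(consensus_hypothesis(slot_posteriors(nbest))))
```

In this toy list the MAP choice is "it was nice", while the per-slot choice is "he was nice": "he" has more total posterior mass than "it" even though no single hypothesis containing it is the most probable. The consensus output need not appear in the N-best list at all.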

Citations

Journal Article

Finding consensus in speech recognition: word error minimization and other applications of confusion networks

Lidia Mangu, +2 more

- 01 Oct 2000 -

Computer Speech & Language

TL;DR: A new framework for distilling information from word lattices is described to improve the accuracy of the speech recognition output and obtain a more perspicuous representation of a set of alternative hypotheses.

Book

Application of Hidden Markov Models in Speech Recognition

Mark J. F. Gales, +1 more

TL;DR: The aim of this review is first to present the core architecture of an HMM-based LVCSR system and then to describe the various refinements needed to achieve state-of-the-art performance.

Journal Article

The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management

Steve Young, +6 more

- 01 Apr 2010 -

Computer Speech & Language

TL;DR: This paper explains how Partially Observable Markov Decision Processes (POMDPs) can provide a principled mathematical framework for modelling the inherent uncertainty in spoken dialogue systems and describes a form of approximation called the Hidden Information State model which does scale and which can be used to build practical systems.

Posterior probability decoding, confidence estimation and system combination

Gunnar Evermann, +1 more

TL;DR: The word lattices produced by the Viterbi decoder were used to generate confusion networks, which provide a compact representation of the most likely word hypotheses and their associated word posterior probabilities.

Journal Article

A Productivity Test of Statistical Machine Translation Post-Editing in a Typical Localisation Context

Mirko Plitt, +1 more

- 01 Jan 2010 -

The Prague Bulletin of Mathematical Ling...

TL;DR: A productivity test of statistical machine translation post-editing in a typical localisation context shows a productivity increase for each participant, with significant variance across individuals.

References

Journal Article

Pattern Classification and Scene Analysis.

Ulf Grenander, +2 more

- 01 Sep 1974 -

Journal of the American Statistical Asso...

Book

Pattern classification and scene analysis

Richard O. Duda, +1 more

TL;DR: In this book, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprocessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.

Proceedings Article

SWITCHBOARD: telephone speech corpus for research and development

J.J. Godfrey, +2 more

TL;DR: SWITCHBOARD is a large multispeaker corpus of conversational speech and text which should be of interest to researchers in speaker authentication and large vocabulary speech recognition.

Journal Article

A Maximum Likelihood Approach to Continuous Speech Recognition

Lalit R. Bahl, +2 more

- 01 Feb 1983 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This paper describes a number of statistical models for use in speech recognition, with special attention to determining the parameters for such models from sparse data, and describes two decoding methods appropriate for constrained artificial languages and one appropriate for more realistic decoding tasks.

Proceedings Article

A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)

Jonathan G. Fiscus

TL;DR: The NIST Recognizer Output Voting Error Reduction (ROVER) system was developed to produce a composite automatic speech recognition (ASR) output, when the outputs of multiple ASR systems are available, with a lower error rate than any of the individual systems.
