scispace - formally typeset
Open AccessProceedings Article

Finding consensus among words : Lattice-based word error minimization

TLDR
A new algorithm for finding the hypothesis in a recognition lattice that is expected to minimize the word error rate (WER) is described, which overcomes the mismatch between the word-based performance metric and the standard MAP scoring paradigm that is sentence-based.
Abstract
We describe a new algorithm for finding the hypothesis in a recognition lattice that is expected to minimize the word error rate (WER). Our approach thus overcomes the mismatch between the word-based performance metric and the standard MAP scoring paradigm that is sentence-based, and that can lead to sub-optimal recognition results. To this end we first find a complete alignment of all words in the recognition lattice, identifying mutually supporting and competing word hypotheses. Finally, a new sentence hypothesis is formed by concatenating the words with maximal posterior probabilities. Experimentally, this approach leads to a significant WER reduction in a large vocabulary recognition task.

read more

Citations
More filters
Journal ArticleDOI

Finding consensus in speech recognition: word error minimization and other applications of confusion networks☆

TL;DR: A new framework for distilling information from word lattices is described to improve the accuracy of the speech recognition output and obtain a more perspicuous representation of a set of alternative hypotheses.
Book

Application of Hidden Markov Models in Speech Recognition

TL;DR: The aim of this review is first to present the core architecture of a HMM-based LVCSR system and then to describe the various refinements which are needed to achieve state-of-the-art performance.
Journal ArticleDOI

The Hidden Information State model: A practical framework for POMDP-based spoken dialogue management

TL;DR: This paper explains how Partially Observable Markov Decision Processes (POMDPs) can provide a principled mathematical framework for modelling the inherent uncertainty in spoken dialogue systems and describes a form of approximation called the Hidden Information State model which does scale and which can be used to build practical systems.

Posterior probability decoding, confidence estimation and system combination

TL;DR: The word lattices produced by the Viterbi decoder were used to generate confusion networks, which provide a compact representation of the most likely word hypotheses and their associated word posterior probabilities.
Journal ArticleDOI

A Productivity Test of Statistical Machine Translation Post-Editing in a Typical Localisation Context

TL;DR: A Productivity Test of Statistical Machine Translation Post-Editing in a Typical Localisation Context and results show a productivity increase for each participant, with significant variance across inviduals.
References
More filters
Book

Pattern classification and scene analysis

TL;DR: In this article, a unified, comprehensive and up-to-date treatment of both statistical and descriptive methods for pattern recognition is provided, including Bayesian decision theory, supervised and unsupervised learning, nonparametric techniques, discriminant analysis, clustering, preprosessing of pictorial data, spatial filtering, shape description techniques, perspective transformations, projective invariants, linguistic procedures, and artificial intelligence techniques for scene analysis.
Proceedings ArticleDOI

SWITCHBOARD: telephone speech corpus for research and development

TL;DR: SWITCHBOARD as mentioned in this paper is a large multispeaker corpus of conversational speech and text which should be of interest to researchers in speaker authentication and large vocabulary speech recognition.
Journal ArticleDOI

A Maximum Likelihood Approach to Continuous Speech Recognition

TL;DR: This paper describes a number of statistical models for use in speech recognition, with special attention to determining the parameters for such models from sparse data, and describes two decoding methods appropriate for constrained artificial languages and one appropriate for more realistic decoding tasks.
Proceedings ArticleDOI

A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)

TL;DR: The NIST Recognizer Output Voting Error Reduction (ROVER) system as discussed by the authors was developed at NIST to produce a composite automatic speech recognition (ASR) system output when the outputs of multiple ASR systems are available, and for which the composite ASR output has a lower error rate than any of the individual systems.
Related Papers (5)