A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)

doi:10.1109/ASRU.1997.659110

Jonathan G. Fiscus

pp. 347-354

TLDR

The NIST Recognizer Output Voting Error Reduction (ROVER) system produces a composite automatic speech recognition (ASR) output from the outputs of multiple ASR systems. In many cases, the composite output has a lower error rate than any of the individual systems.

Abstract:

Describes a system developed at NIST to produce a composite automatic speech recognition (ASR) system output when the outputs of multiple ASR systems are available, and for which, in many cases, the composite ASR output has a lower error rate than any of the individual systems. The system implements a "voting" or rescoring process to reconcile differences in ASR system outputs. We refer to this system as the NIST Recognizer Output Voting Error Reduction (ROVER) system. As additional knowledge sources are added to an ASR system (e.g. acoustic and language models), error rates are typically decreased. This paper describes a post-recognition process which models the output generated by multiple ASR systems as independent knowledge sources that can be combined and used to generate an output with reduced error rate. To accomplish this, the outputs of multiple ASR systems are combined into a single, minimal-cost word transition network (WTN) via iterative applications of dynamic programming (DP) alignments. The resulting network is searched by an automatic rescoring or "voting" process that selects the output sequence with the lowest score.
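
The two stages named in the abstract, iterative DP alignment into a WTN followed by voting, can be illustrated with a minimal Python sketch. This is not the NIST implementation. It assumes a WTN can be represented as a list of slots holding one word per system, that a `NULL` symbol marks a missing word, that a substitution costs 0 when the new word already appears in the slot and 1 otherwise, and that voting is a plain majority count with ties going to the earliest system. The function names (`align_into_wtn`, `build_wtn`, `vote`) are hypothetical, and the abstract itself does not specify the scoring function.

```python
from collections import Counter

NULL = "@"  # assumed placeholder for "no word here"; the symbol is our choice


def align_into_wtn(wtn, hyp, n_prev):
    """Merge one more hypothesis into a WTN using edit-distance DP.

    wtn: list of slots, each a list of n_prev words (one per earlier system).
    hyp: list of words from the new system.
    Returns a new WTN whose slots each hold n_prev + 1 words.
    """
    rows, cols = len(wtn) + 1, len(hyp) + 1
    # cost[i][j] = minimal cost of aligning the first i slots with the first j words
    cost = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        cost[i][0] = i
    for j in range(1, cols):
        cost[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if hyp[j - 1] in wtn[i - 1] else 1  # assumed cost model
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match / substitution
                             cost[i - 1][j] + 1,        # new system has no word
                             cost[i][j - 1] + 1)        # new system has an extra word

    # Backtrace from the bottom-right corner to recover the merged slots.
    merged = []
    i, j = len(wtn), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if hyp[j - 1] in wtn[i - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                merged.append(wtn[i - 1] + [hyp[j - 1]])
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            merged.append(wtn[i - 1] + [NULL])
            i -= 1
        else:
            merged.append([NULL] * n_prev + [hyp[j - 1]])
            j -= 1
    merged.reverse()
    return merged


def build_wtn(hyps):
    """Fold the hypotheses into a single WTN, one alignment at a time."""
    wtn = [[w] for w in hyps[0]]
    for k, hyp in enumerate(hyps[1:], start=1):
        wtn = align_into_wtn(wtn, hyp, k)
    return wtn


def vote(wtn):
    """Pick the most frequent entry per slot; drop slots won by NULL."""
    out = []
    for slot in wtn:
        word = Counter(slot).most_common(1)[0][0]  # ties: earliest system wins
        if word != NULL:
            out.append(word)
    return out


# Three made-up system outputs, each with one error against "a b c".
hyps = [["x", "b", "c"], ["a", "y", "c"], ["a", "b", "z"]]
print(vote(build_wtn(hyps)))  # ['a', 'b', 'c']
```

In this toy input every system makes a different single-word error, so the majority in each slot recovers a word sequence that none of the individual systems produced. That is the situation in which combining outputs can reduce the error rate; when systems make the same errors, voting cannot correct them.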
