Proceedings Article

A post-processing system to yield reduced word error rates: Recognizer Output Voting Error Reduction (ROVER)

Jonathan G. Fiscus
pp. 347-354
TLDR
The NIST Recognizer Output Voting Error Reduction (ROVER) system produces a composite automatic speech recognition (ASR) output from the outputs of multiple ASR systems; in many cases, the composite output has a lower error rate than any of the individual systems.
Abstract
This paper describes a system developed at NIST that produces a composite automatic speech recognition (ASR) output when the outputs of multiple ASR systems are available; in many cases, the composite output has a lower error rate than any of the individual systems. The system implements a "voting" or rescoring process to reconcile differences among the ASR system outputs. We refer to this system as the NIST Recognizer Output Voting Error Reduction (ROVER) system. As additional knowledge sources (e.g., acoustic and language models) are added to an ASR system, error rates typically decrease. The paper presents a post-recognition process that treats the outputs of multiple ASR systems as independent knowledge sources that can be combined to generate an output with a reduced error rate. To accomplish this, the outputs of multiple ASR systems are combined into a single, minimal-cost word transition network (WTN) via iterative applications of dynamic programming (DP) alignments. The resulting network is searched by an automatic rescoring or "voting" process that selects the output sequence with the lowest score.
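
The two stages named in the abstract (iterative DP alignment into a word transition network, then voting) can be illustrated with a minimal Python sketch. This is not the NIST implementation. It assumes unit edit costs, plain word lists without timing or confidence information, a simple majority vote standing in for the rescoring step, and a hypothetical `@` symbol for an empty arc. The function names and the three input hypotheses are invented for illustration.

```python
from collections import Counter

NULL = "@"  # marker for an empty arc; the symbol itself is an assumption


def align_into_network(network, hyp, n_prev):
    """Merge word list `hyp` into `network` with unit-cost edit-distance DP.

    `network` is a list of slots; each slot holds one word (or NULL) for
    each of the `n_prev` systems merged so far.
    """
    rows, cols = len(network), len(hyp)
    # cost[i][j]: cheapest alignment of the first i slots with the first j words
    cost = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        cost[i][0] = i
    for j in range(1, cols + 1):
        cost[0][j] = j
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            sub = 0 if hyp[j - 1] in network[i - 1] else 1
            cost[i][j] = min(cost[i - 1][j - 1] + sub,  # match or substitution
                             cost[i - 1][j] + 1,        # slot has no new word
                             cost[i][j - 1] + 1)        # new word opens a slot
    # Trace back from the end, extending each slot with the new system's arc.
    merged, i, j = [], rows, cols
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if hyp[j - 1] in network[i - 1] else 1
            if cost[i][j] == cost[i - 1][j - 1] + sub:
                merged.append(network[i - 1] + [hyp[j - 1]])
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            merged.append(network[i - 1] + [NULL])
            i -= 1
        else:
            merged.append([NULL] * n_prev + [hyp[j - 1]])
            j -= 1
    merged.reverse()
    return merged


def rover_sketch(hypotheses):
    """Build the network from all hypotheses, then vote slot by slot."""
    network = [[word] for word in hypotheses[0]]
    for n_prev, hyp in enumerate(hypotheses[1:], start=1):
        network = align_into_network(network, hyp, n_prev)
    output = []
    for slot in network:
        # Most frequent arc wins; ties go to the earliest system in the list.
        word, _ = Counter(slot).most_common(1)[0]
        if word != NULL:
            output.append(word)
    return network, output


# Three made-up system outputs, each containing one error.
hyps = [
    "i want to fly too boston".split(),
    "i want fly to boston".split(),
    "eye want to fly to boston".split(),
]
network, best = rover_sketch(hyps)
print(" ".join(best))
```

In this toy input, each hypothesis contains a different single error, so the per-slot vote can outvote every one of them. Real systems often make correlated errors, which is one reason the abstract claims an improvement only "in many cases".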
