
Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.


Papers
Proceedings Article
19 Mar 1984
TL;DR: Results for a speaker-dependent connected-digit speech recognition task with a base error rate of 1.6% show that preprocessing noisy unknown speech at a 10 dB signal-to-noise ratio reduces the error rate from 42% to 10%.
Abstract: Acoustic noise suppression is treated as a problem of finding the minimum mean square error estimate of the speech spectrum from a noisy version. This estimate equals the expected value of its conditional distribution given the noisy spectral value, the mean noise power and the mean speech power. It is shown that speech is not Gaussian. This results in an optimal estimate which is a non-linear function of the spectral magnitude. This function differs from the Wiener filter, especially at high instantaneous signal-to-noise ratios. Since both speech and Gaussian noise have a uniform phase distribution, the optimal estimator of the phase equals the noisy phase. The paper describes how the estimator can be calculated directly from noise-free speech. It describes how to find the optimal estimator for the complex spectrum, the magnitude, the squared magnitude, the log magnitude, and the root-magnitude spectra. Results for a speaker-dependent connected-digit speech recognition task with a base error rate of 1.6% show that preprocessing the noisy unknown speech with a 10 dB signal-to-noise ratio reduces the error rate from 42% to 10%. If the template data are also preprocessed in the same way, the error rate reduces to 2.1%, thus recovering 99% of the recognition performance lost due to noise.
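The "99% recovered" figure in the abstract above can be checked with a little arithmetic. The sketch below assumes that "performance lost due to noise" means the gap between the noisy error rate and the base error rate; the function name `recovered_fraction` is illustrative and does not come from the paper.

```python
def recovered_fraction(base_err, noisy_err, processed_err):
    """Share of the noise-induced error increase that processing removes.

    Assumption: the loss due to noise is (noisy_err - base_err).
    """
    lost = noisy_err - base_err
    regained = noisy_err - processed_err
    return regained / lost


# Error rates (in percent) quoted in the abstract.
BASE, NOISY = 1.6, 42.0

unknown_only = recovered_fraction(BASE, NOISY, 10.0)  # only the unknown speech is preprocessed
both = recovered_fraction(BASE, NOISY, 2.1)           # templates are preprocessed as well

print(f"unknown only: {unknown_only:.1%}, templates too: {both:.1%}")
```

Under this reading, preprocessing the templates as well recovers about 98.8% of the lost performance, which rounds to the 99% stated in the abstract.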

138 citations

Journal Article
TL;DR: This work presents one of the first robust LVCSR systems that uses a syllable-level acoustic unit for LVCSR on telephone-bandwidth speech; it exceeds the performance of a comparable triphone system in terms of both word error rate (WER) and complexity.
Abstract: Most large vocabulary continuous speech recognition (LVCSR) systems in the past decade have used a context-dependent (CD) phone as the fundamental acoustic unit. We present one of the first robust LVCSR systems that uses a syllable-level acoustic unit for LVCSR on telephone-bandwidth speech. This effort is motivated by the inherent limitations in phone-based approaches, namely the lack of an easy and efficient way for modeling long-term temporal dependencies. A syllable unit spans a longer time frame, typically three phones, thereby offering a more parsimonious framework for modeling pronunciation variation in spontaneous speech. We present encouraging results which show that a syllable-based system exceeds the performance of a comparable triphone system both in terms of word error rate (WER) and complexity. The WER of the best syllabic system reported here is 49.1% on a standard Switchboard evaluation, a small improvement over the triphone system. We also report results on a much smaller recognition task, OGI Alphadigits, which was used to validate some of the benefits syllables offer over triphones. The syllable-based system exceeds the performance of the triphone system by nearly 20%, an impressive accomplishment since the alphadigits application consists mostly of phone-level minimal pair distinctions.

137 citations

Journal Article
TL;DR: This work describes different approaches to developing a biometric technique based on the human iris, using Gabor filters and Hamming distance; the last proposed approach is translation, rotation, and scale invariant.

137 citations

Proceedings Article
15 Apr 2007
TL;DR: The AMI transcription system for speech in meetings, developed in collaboration by five research groups, includes generic techniques such as discriminative and speaker adaptive training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, maximum likelihood linear regression, and phone posterior based features, as well as techniques specifically designed for meeting data.
Abstract: This paper describes the AMI transcription system for speech in meetings developed in collaboration by five research groups. The system includes generic techniques such as discriminative and speaker adaptive training, vocal tract length normalisation, heteroscedastic linear discriminant analysis, maximum likelihood linear regression, and phone posterior based features, as well as techniques specifically designed for meeting data. These include segmentation and cross-talk suppression, beam-forming, domain adaptation, Web-data collection, and channel adaptive training. The system was improved by more than 20% relative in word error rate compared to our previous system and was used in the NIST RT'06 evaluations where it was found to yield competitive performance.
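A "relative" improvement in word error rate is measured against the earlier system's WER rather than in absolute percentage points. The sketch below illustrates the distinction; the two WER values are hypothetical and are not taken from the AMI paper, which does not state them in this abstract.

```python
def relative_wer_reduction(old_wer, new_wer):
    """Relative reduction: the WER drop expressed as a fraction of the old WER."""
    return (old_wer - new_wer) / old_wer


# Hypothetical WERs in percent, chosen only for illustration (NOT from the paper).
old_wer, new_wer = 40.0, 31.0

absolute_drop = old_wer - new_wer                        # in percentage points
relative_drop = relative_wer_reduction(old_wer, new_wer)

print(f"absolute: {absolute_drop:.1f} points, relative: {relative_drop:.1%}")
```

With these illustrative numbers, a 9-point absolute drop corresponds to a 22.5% relative reduction, which is the kind of figure meant by "more than 20% relative".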

137 citations

Proceedings Article
07 May 2001
TL;DR: This work proposes a method for using the World Wide Web to acquire trigram estimates for statistical language modeling, and shows that the interpolated models improve speech recognition word error rate significantly over a small test set.
Abstract: We propose a method for using the World Wide Web to acquire trigram estimates for statistical language modeling. We submit an N-gram as a phrase query to Web search engines. The search engines return the number of Web pages containing the phrase, from which the N-gram count is estimated. The N-gram counts are then used to form Web-based trigram probability estimates. We discuss the properties of such estimates, and methods to interpolate them with traditional corpus based trigram estimates. We show that the interpolated models improve speech recognition word error rate significantly over a small test set.
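The pipeline described in the abstract above (page counts, then Web-based trigram estimates, then interpolation with corpus estimates) might be sketched as follows. Several things here are assumptions rather than the paper's method: the page counts are hypothetical values held in a dictionary instead of being fetched from a search engine, the Web estimate is a plain relative frequency, and the combination is a fixed-weight linear interpolation with a hypothetical weight `lam`.

```python
def web_trigram_prob(trigram, page_counts):
    """Relative-frequency estimate from page counts: c(w1 w2 w3) / c(w1 w2).

    Assumption: page counts stand in for N-gram counts, as the abstract describes;
    the exact estimator used in the paper is not given there.
    """
    w1, w2, w3 = trigram
    history_count = page_counts.get((w1, w2), 0)
    if history_count == 0:
        return 0.0  # no evidence for the history; caller falls back on the corpus model
    return page_counts.get((w1, w2, w3), 0) / history_count


def interpolate(p_corpus, p_web, lam=0.5):
    """Linear interpolation of corpus and Web estimates (weight `lam` is hypothetical)."""
    return lam * p_corpus + (1.0 - lam) * p_web


# Hypothetical page counts, as if returned by phrase queries.
page_counts = {
    ("recognize", "speech"): 2000,
    ("recognize", "speech", "signals"): 50,
}

p_web = web_trigram_prob(("recognize", "speech", "signals"), page_counts)
p_mix = interpolate(p_corpus=0.01, p_web=p_web, lam=0.7)
print(p_web, p_mix)
```

In practice the interpolation weight would be tuned on held-out data, and the Web estimate would need smoothing for trigrams whose history returns no pages.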

137 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 88% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Convolutional neural network: 74.7K papers, 2M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528