Topic

Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.
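Word error rate itself is a word-level edit-distance metric: the minimum number of substitutions, deletions, and insertions needed to turn the hypothesis into the reference transcript, divided by the number of reference words. A minimal Python sketch of that computation (illustrative only, not taken from any of the papers listed below):

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed via word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edit operations to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # all deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # all insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # substitution / match
                          d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1)        # insertion
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ≈ 0.167
```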


Papers
Proceedings ArticleDOI
22 May 2011
TL;DR: It is shown that acoustic model adaptation yields an average relative word error rate (WER) reduction of 36.99% and that pronunciation lexicon adaptation (PLA) further reduces the relative WER by an average of 8.29% on a large vocabulary task of over 1500 words for six speakers with severe to moderate dysarthria.
Abstract: Dysarthria is a motor speech disorder resulting from neurological damage to the part of the brain that controls the physical production of speech. It is, in part, characterized by pronunciation errors that include deletions, substitutions, insertions, and distortions of phonemes. These errors follow consistent intra-speaker patterns that we exploit through acoustic and lexical model adaptation to improve automatic speech recognition (ASR) on dysarthric speech. We show that acoustic model adaptation yields an average relative word error rate (WER) reduction of 36.99% and that pronunciation lexicon adaptation (PLA) further reduces the relative WER by an average of 8.29% on a large vocabulary task of over 1500 words for six speakers with severe to moderate dysarthria. PLA also shows an average relative WER reduction of 7.11% on speaker-dependent models evaluated using 5-fold cross-validation.
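The relative reductions quoted above follow the usual definition: baseline WER minus adapted WER, divided by baseline WER. A small Python sketch of that arithmetic, using a hypothetical baseline WER that is not a figure from the paper:

```python
def relative_wer_reduction(baseline_wer, adapted_wer):
    """Relative WER reduction = (baseline - adapted) / baseline."""
    return (baseline_wer - adapted_wer) / baseline_wer

# Hypothetical baseline of 60% WER, chosen only for illustration.
baseline = 0.60
after_acoustic = baseline * (1 - 0.3699)   # 36.99% relative reduction -> ~37.8% WER
after_pla = after_acoustic * (1 - 0.0829)  # further 8.29% relative reduction -> ~34.7% WER
print(relative_wer_reduction(baseline, after_pla))  # combined relative reduction ≈ 0.422
```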

71 citations

Proceedings Article
01 Jan 2008
TL;DR: Results show that Automatic Speech Recognition Word Error Rates for elderly voices are significantly higher than those of adult voices; maximum likelihood linear regression (MLLR) based speaker adaptation on ageing voices improves the WER, though performance remains considerably lower than for adult voices.
Abstract: This paper presents the results of a longitudinal study of ASR performance on ageing voices. Experiments were conducted on the audio recordings of the proceedings of the Supreme Court of the United States (SCOTUS). Results show that the Automatic Speech Recognition (ASR) Word Error Rates (WERs) for elderly voices are significantly higher than those of adult voices. The word error rate increases gradually as the age of the elderly speakers increases. Use of maximum likelihood linear regression (MLLR) based speaker adaptation on ageing voices improves the WER, though the performance is still considerably lower compared to adult voices. Speaker adaptation, however, reduces the increase in WER with age during old age. Index Terms: ageing voices, longitudinal study, SCOTUS corpus, MLLR
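MLLR adapts a speaker-independent acoustic model by applying affine transforms, estimated by maximum likelihood from a small amount of adaptation data, to the Gaussian mean vectors. A hedged NumPy sketch of applying one such transform; the ML estimation of A and b from adaptation statistics is omitted, and all dimensions are toy values rather than anything from the paper:

```python
import numpy as np

def apply_mllr_mean_transform(means, A, b):
    """Apply a single MLLR regression-class transform mu' = A @ mu + b
    to a stack of Gaussian mean vectors (one row per Gaussian)."""
    return means @ A.T + b

# Toy dimensions for illustration only.
dim, n_gauss = 3, 4
rng = np.random.default_rng(0)
means = rng.normal(size=(n_gauss, dim))              # speaker-independent means
A = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))  # ML-estimated transform (assumed given here)
b = 0.05 * rng.normal(size=dim)                      # ML-estimated bias (assumed given here)
print(apply_mllr_mean_transform(means, A, b).shape)  # (4, 3) adapted means
```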

71 citations

Journal ArticleDOI
01 Jan 2001
TL;DR: It is found that more detailed analysis of the error rate is necessary in order to judge the performance of the learning and classification method.
Abstract: In this paper, we empirically compare the performance of neural nets and decision trees on a data set for the detection of defects in welding seams. This data set was created by image feature extraction procedures working on digitized X-ray films. We introduce a framework for distinguishing classification methods. We found that a more detailed analysis of the error rate is necessary in order to judge the performance of a learning and classification method. However, the error rate cannot be the only criterion for comparing different learning methods; selection is a more complex process involving further criteria, which we describe in this paper.

71 citations

Proceedings ArticleDOI
Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong
20 Aug 2017
TL;DR: In this paper, the posterior probabilities generated by the source domain model are used in lieu of labels to train the target-domain model for domain adaptation without transcribed data in the target domain.
Abstract: High accuracy speech recognition requires a large amount of transcribed data for supervised training. In the absence of such data, domain adaptation of a well-trained acoustic model can be performed, but even here, high accuracy usually requires significant labeled data from the target domain. In this work, we propose an approach to domain adaptation that does not require transcriptions but instead uses a corpus of unlabeled parallel data, consisting of pairs of samples from the source domain of the well-trained model and the desired target domain. To perform adaptation, we employ teacher/student (T/S) learning, in which the posterior probabilities generated by the source-domain model can be used in lieu of labels to train the target-domain model. We evaluate the proposed approach in two scenarios, adapting a clean acoustic model to noisy speech and adapting an adults’ speech acoustic model to children’s speech. Significant improvements in accuracy are obtained, with reductions in word error rate of up to 44% over the original source model without the need for transcribed data in the target domain. Moreover, we show that increasing the amount of unlabeled data results in additional model robustness, which is particularly beneficial when using simulated training data in the target domain.
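In T/S learning of this kind, the student (target-domain) model is trained so that its posteriors match the teacher's outputs on the parallel data, which amounts to a cross-entropy loss against the teacher's soft targets. A sketch of that loss in Python/PyTorch; the batch size, class count, and model outputs below are hypothetical placeholders, not the models or data pipeline from the paper:

```python
import torch
import torch.nn.functional as F

def ts_loss(student_logits, teacher_logits):
    """Teacher/student loss: cross-entropy between the teacher's soft
    posteriors (used in lieu of labels) and the student's predictions."""
    teacher_post = F.softmax(teacher_logits, dim=-1).detach()  # no gradient through the teacher
    student_logp = F.log_softmax(student_logits, dim=-1)
    return -(teacher_post * student_logp).sum(dim=-1).mean()

# Toy shapes: a batch of 8 frames, 1000 output classes (assumed sizes).
teacher_logits = torch.randn(8, 1000)                        # teacher run on source-domain (e.g. clean) features
student_logits = torch.randn(8, 1000, requires_grad=True)    # student run on parallel target-domain (e.g. noisy) features
loss = ts_loss(student_logits, teacher_logits)
loss.backward()
```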

70 citations

Proceedings ArticleDOI
15 Mar 1999
TL;DR: There appears to be an optimal ratio of training patterns to parameters of around 25:1 in these circumstances, and doubling the training data and system size appears to provide diminishing returns of error rate reduction for the largest systems.
Abstract: We have trained and tested a number of large neural networks for the purpose of emission probability estimation in large vocabulary continuous speech recognition. In particular, the problem under test is the DARPA Broadcast News task. Our goal here was to determine the relationship between training time, word error rate, size of the training set, and size of the neural network. In all cases, the network architecture was quite simple, comprising a single large hidden layer with an input window consisting of feature vectors from 9 frames around the current time, with a single output for each of 54 phonetic categories. Thus far, simultaneous increases to the size of the training set and the neural network improve performance; in other words, more data helps, as does the training of more parameters. We continue to be surprised that such a simple system works as well as it does for complex tasks. Given a limitation in training time, however, there appears to be an optimal ratio of training patterns to parameters of around 25:1 in these circumstances. Additionally, doubling the training data and system size appears to provide diminishing returns of error rate reduction for the largest systems.
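Given a single hidden layer, a 9-frame input window, and 54 outputs, the patterns-to-parameters bookkeeping behind the reported 25:1 ratio is straightforward arithmetic. A Python sketch is below; the per-frame feature dimension and hidden-layer size are assumed values for illustration, since the excerpt does not specify them:

```python
def mlp_parameter_count(feat_dim, context_frames, hidden_units, outputs):
    """Parameters of a single-hidden-layer MLP, including bias terms."""
    input_dim = feat_dim * context_frames
    return (input_dim * hidden_units + hidden_units) + (hidden_units * outputs + outputs)

# Only the 9-frame window and 54 outputs come from the abstract; the rest are assumptions.
params = mlp_parameter_count(feat_dim=13, context_frames=9, hidden_units=2000, outputs=54)
frames_needed = 25 * params               # the ~25:1 training-patterns-to-parameters ratio
hours = frames_needed / (100 * 3600)      # assuming 100 feature frames per second
print(params, frames_needed, round(hours, 1))
```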

70 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations (88% related)
Feature extraction: 111.8K papers, 2.1M citations (86% related)
Convolutional neural network: 74.7K papers, 2M citations (85% related)
Artificial neural network: 207K papers, 4.5M citations (84% related)
Cluster analysis: 146.5K papers, 2.9M citations (83% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528