scispace - formally typeset
Topic

Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.


Papers
Journal ArticleDOI
22 May 2011
TL;DR: It is shown that it is possible to train a gender-independent discriminative model that achieves state-of-the-art accuracy, comparable to that of a gender-dependent system, saving memory and execution time both in training and in testing.
Abstract: This work presents a new and efficient approach to discriminative speaker verification in the i-vector space. We illustrate the development of a linear discriminative classifier that is trained to discriminate between the hypothesis that a pair of feature vectors in a trial belongs to the same speaker or to different speakers. This approach is an alternative to the usual discriminative setup that discriminates between a speaker and all the other speakers. We use a discriminative classifier based on a Support Vector Machine (SVM) that is trained to estimate the parameters of a symmetric quadratic function approximating a log-likelihood ratio score, without explicit modeling of the i-vector distributions as in the generative Probabilistic Linear Discriminant Analysis (PLDA) models. Training these models is feasible because it is not necessary to expand the i-vector pairs, which would be expensive or even infeasible for medium-sized training sets. The results of experiments performed on the tel-tel extended core condition of the NIST 2010 Speaker Recognition Evaluation are competitive with those obtained by generative models in terms of normalized Detection Cost Function and Equal Error Rate. Moreover, we show that it is possible to train a gender-independent discriminative model that achieves state-of-the-art accuracy, comparable to that of a gender-dependent system, saving memory and execution time both in training and in testing.
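The symmetric quadratic pairwise score described in this abstract can be illustrated with a small sketch. The parameter matrices `Lam` and `Gam`, the vector `c`, and the bias `k` below are hypothetical stand-ins for the parameters the SVM would learn; this is not the authors' implementation, only a demonstration that such a function is symmetric in the two i-vectors of a trial:

```python
import numpy as np

def pairwise_score(x, y, Lam, Gam, c, k):
    """Symmetric quadratic score for an i-vector pair (x, y):
    s = x'Lam y + y'Lam x + x'Gam x + y'Gam y + (x + y)'c + k.
    By construction it is invariant to swapping x and y, so it can
    approximate a log-likelihood ratio for a same/different-speaker trial."""
    return (x @ Lam @ y + y @ Lam @ x
            + x @ Gam @ x + y @ Gam @ y
            + (x + y) @ c + k)

# Toy 3-dimensional example with arbitrary (assumed) parameters.
rng = np.random.default_rng(0)
Lam = rng.standard_normal((3, 3))
Gam = rng.standard_normal((3, 3))
c = rng.standard_normal(3)
x, y = rng.standard_normal(3), rng.standard_normal(3)

s_xy = pairwise_score(x, y, Lam, Gam, c, 0.0)
s_yx = pairwise_score(y, x, Lam, Gam, c, 0.0)
assert np.isclose(s_xy, s_yx)  # symmetry: trial order does not matter
```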

110 citations

Proceedings ArticleDOI
Kartik Audhkhasi, Brian Kingsbury, Bhuvana Ramabhadran, George Saon, Michael Picheny
15 Apr 2018
TL;DR: In this paper, a joint word-character A2W model was proposed to learn to first spell the word and then recognize it, achieving a word error rate of 8.8%/13.9% on the Hub5-2000 Switchboard/CallHome test sets without any decoder, pronunciation lexicon, or externally-trained language model.
Abstract: Direct acoustics-to-word (A2W) models in the end-to-end paradigm have received increasing attention compared to conventional subword-based automatic speech recognition models using phones, characters, or context-dependent hidden Markov model states. This is because A2W models recognize words from speech without any decoder, pronunciation lexicon, or externally-trained language model, making training and decoding with such models simple. Prior work has shown that A2W models require orders of magnitude more training data in order to perform comparably to conventional models. Our work also showed this accuracy gap when using the English Switchboard-Fisher data set. This paper describes a recipe to train an A2W model that closes this gap and is on par with state-of-the-art sub-word based models. We achieve a word error rate of 8.8%/13.9% on the Hub5-2000 Switchboard/CallHome test sets without any decoder or language model. We find that model initialization, training data order, and regularization have the most impact on A2W model performance. Next, we present a joint word-character A2W model that learns to first spell the word and then recognize it. This model provides a rich output to the user instead of simple word hypotheses, making it especially useful in the case of words unseen or rarely seen during training.
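Word error rate itself, the metric quoted in results like the 8.8%/13.9% above, is the Levenshtein edit distance between the reference and hypothesis word sequences divided by the reference length. A minimal sketch of the standard computation:

```python
def wer(ref, hyp):
    """Word error rate: (substitutions + deletions + insertions) / len(ref),
    computed via Levenshtein edit distance over word sequences."""
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j          # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,  # match / substitution
                          d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1)        # insertion
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat".split(), "the cat sat".split()))
# 0.0 — a perfect hypothesis
print(wer("the cat sat".split(), "a cat sat down".split()))
# 1 substitution + 1 insertion over 3 reference words = 2/3
```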

110 citations

Proceedings ArticleDOI
03 Aug 2010
TL;DR: This work proposes and validates a novel physics-based method to detect images recaptured from printed material using only a single image, and shows that the classifier generalizes to contrast-enhanced recaptured images and LCD screen recaptured images without re-training, demonstrating the robustness of the approach.
Abstract: Face recognition is an increasingly popular method for user authentication. However, face recognition is susceptible to playback attacks. Therefore, a reliable way to detect malicious attacks is crucial to the robustness of the system. We propose and validate a novel physics-based method to detect images recaptured from printed material using only a single image. Micro-textures present in printed paper manifest themselves in the specular component of the image. Features extracted from this component allow a linear SVM classifier to achieve a 2.2% False Acceptance Rate and a 13% False Rejection Rate (6.7% Equal Error Rate). We also show that the classifier generalizes to contrast-enhanced recaptured images and LCD screen recaptured images without re-training, demonstrating the robustness of our approach.
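The Equal Error Rate quoted above is the operating point where the false-acceptance and false-rejection rates coincide. A sketch of how it can be read off from classifier scores, using hypothetical overlapping score distributions rather than the paper's data:

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """Sweep candidate thresholds over all observed scores and return the
    rate at the threshold where false-acceptance (impostors scoring at or
    above threshold) and false-rejection (genuines scoring below) are closest."""
    best_gap, best_rate = 1.0, None
    for t in np.sort(np.concatenate([genuine, impostor])):
        far = np.mean(impostor >= t)   # false acceptances
        frr = np.mean(genuine < t)     # false rejections
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate

# Hypothetical score distributions (assumed, not from the paper):
# genuine users score higher on average than recapture attacks.
rng = np.random.default_rng(1)
genuine = rng.normal(2.0, 1.0, 1000)
impostor = rng.normal(0.0, 1.0, 1000)
eer = equal_error_rate(genuine, impostor)
assert 0.0 < eer < 0.5  # well-separated but overlapping distributions
```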

110 citations

Journal ArticleDOI
TL;DR: An i-vector representation based on bottleneck (BN) features is presented for language identification (LID), and the resulting LID performance is significantly improved with the proposed BN-feature-based i-vector representation.
Abstract: An i-vector representation based on bottleneck (BN) features is presented for language identification (LID). In the proposed system, the BN features are extracted from a deep neural network, which can effectively mine the contextual information embedded in speech frames. The i-vector representation of each utterance is then obtained by applying a total variability approach on the BN features. The resulting performance of LID has been significantly improved with the proposed BN feature based i-vector representation. Compared with the state-of-the-art techniques, the equal error rate is relatively reduced by about 40% on the National Institute of Standards and Technology (NIST) 2009 evaluation sets.
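A "relative" reduction, as quoted above, compares the change to the baseline rather than subtracting percentage points. A tiny sketch with hypothetical EER values (the abstract does not state the absolute numbers):

```python
def relative_reduction(baseline, improved):
    """Fractional improvement over the baseline: (old - new) / old."""
    return (baseline - improved) / baseline

# Hypothetical EERs: a 40% relative reduction from a 5.0% baseline
# lands at 3.0% EER — an absolute drop of only 2 percentage points.
baseline_eer, new_eer = 5.0, 3.0
print(relative_reduction(baseline_eer, new_eer))  # 0.4
```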

110 citations

Journal ArticleDOI
TL;DR: An extension of the Central Limit Theorem based on Lindeberg condition is adopted here to derive the distribution of the number of design samples with wrong sign estimate and subsequently determine the maximum error rate for failure probability estimates.
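The Central Limit Theorem argument in this TL;DR underlies standard Monte Carlo estimation of a failure probability: the estimator is a sample mean of indicator variables, so its error is asymptotically normal. A hedged sketch with a toy limit-state function (the paper's actual setting concerns sign errors from surrogate models, which this does not reproduce):

```python
import math
import random

def mc_failure_probability(limit_state, sampler, n, z=1.96):
    """Monte Carlo estimate of P(limit_state(x) < 0), with a CLT-based
    95% confidence half-width z * sqrt(p * (1 - p) / n)."""
    failures = sum(1 for _ in range(n) if limit_state(sampler()) < 0)
    p = failures / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, half_width

# Toy example: standard normal input, failure when x > 2 (g = 2 - x < 0).
# The true failure probability is 1 - Phi(2), about 0.0228.
rng = random.Random(42)
p, hw = mc_failure_probability(lambda x: 2.0 - x, lambda: rng.gauss(0, 1), 100_000)
assert 0.015 < p < 0.03   # estimate lands near the true value
assert hw < 0.01          # confidence interval shrinks as 1/sqrt(n)
```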

110 citations


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
88% related
Feature extraction
111.8K papers, 2.1M citations
86% related
Convolutional neural network
74.7K papers, 2M citations
85% related
Artificial neural network
207K papers, 4.5M citations
84% related
Cluster analysis
146.5K papers, 2.9M citations
83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528