Topic

Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.
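
Word error rate is conventionally computed from a Levenshtein (edit-distance) alignment between a reference transcript and a recognizer hypothesis, as WER = (substitutions + deletions + insertions) / number of reference words. A minimal sketch of that computation is given below; the function name and the example strings are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a standard Levenshtein (edit-distance) dynamic program."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i           # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j           # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,               # substitution (or match)
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution (sat -> sit) and one deletion (the): 2 / 6 ≈ 0.333
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```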


Papers
Proceedings ArticleDOI
13 May 2002
TL;DR: An approach to close the gap between text-dependent and text-independent speaker verification performance is presented, and results on the 2001 NIST extended data task show that this approach can produce an equal error rate of less than 1%.
Abstract: In this paper we present an approach to close the gap between text-dependent and text-independent speaker verification performance. Text-constrained GMM-UBM systems are created using word segmentations produced by a LVCSR system on conversational speech allowing the system to focus on speaker differences over a constrained set of acoustic units. Results on the 2001 NIST extended data task show this approach can be used to produce an equal error rate of < 1 %.

91 citations
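
The equal error rate reported above is the operating point of a detection system at which the false-rejection rate on target trials equals the false-acceptance rate on impostor trials. Below is a minimal sketch of estimating it from two score lists by sweeping a decision threshold; this is a simplification, not the NIST evaluation tooling, and all names and the toy scores are illustrative.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Sweep a decision threshold over all observed scores and return the
    point where false-rejection and false-acceptance rates are closest."""
    target = np.asarray(target_scores)
    impostor = np.asarray(impostor_scores)
    thresholds = np.concatenate([target, impostor])
    best_gap, eer = np.inf, None
    for t in thresholds:
        frr = np.mean(target < t)      # targets rejected at threshold t
        far = np.mean(impostor >= t)   # impostors accepted at threshold t
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

# Toy example: well-separated target and impostor scores give a low EER.
rng = np.random.default_rng(0)
eer = equal_error_rate(rng.normal(2.0, 1.0, 1000), rng.normal(-2.0, 1.0, 1000))
print(f"EER ~ {eer:.3%}")
```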

Proceedings ArticleDOI
Xie Chen, Adam Eversole, Gang Li, Dong Yu, Frank Seide
09 Sep 2012
TL;DR: It is shown that the pipelined approximation to BP, which parallelizes computation with respect to layers, is an efficient way of utilizing multiple GPGPU cards in a single server.
Abstract: The Context-Dependent Deep-Neural-Network HMM, or CD-DNN-HMM, is a recently proposed acoustic-modeling technique for HMM-based speech recognition that can greatly outperform conventional Gaussian-mixture based HMMs. For example, a CD-DNN-HMM trained on the 2000h Fisher corpus achieves a 14.4% word error rate on the Hub5'00-FSH speaker-independent phone-call transcription task, compared to 19.6% obtained by a state-of-the-art, conventional discriminatively trained GMM-based HMM. That CD-DNN-HMM, however, took 59 days to train on a modern GPGPU; the immense computational cost of the minibatch-based back-propagation (BP) training is a major roadblock. Unlike the familiar Baum-Welch training for conventional HMMs, BP cannot be efficiently parallelized across data. In this paper we show that the pipelined approximation to BP, which parallelizes computation with respect to layers, is an efficient way of utilizing multiple GPGPU cards in a single server. Using 2 and 4 GPGPUs, we achieve a 1.9 and 3.3 times end-to-end speed-up, at parallelization efficiency of 0.95 and 0.82, respectively, at no loss of recognition accuracy.

91 citations
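
Pipelined back-propagation, as described in the abstract, assigns consecutive layer groups to different GPUs so that at any moment each GPU works on a different minibatch; the price is that a layer's weights are updated with gradients that are a few minibatches stale. The single-process numpy sketch below only simulates that schedule for a two-layer toy network; there are no real GPUs, and the array names, sizes, and learning rate are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: the target is the sum of the 4 inputs.
X = rng.normal(size=(64, 4, 4))          # 64 minibatches of 4 samples, 4 features
Y = X.sum(axis=2, keepdims=True)         # shape (64, 4, 1)

# "GPU 0" owns layer 0, "GPU 1" owns layer 1 (plain numpy arrays here).
W0 = rng.normal(scale=0.5, size=(4, 8))
W1 = rng.normal(scale=0.1, size=(8, 1))
lr = 0.05

act_queue, grad_queue = [], []           # model the pipeline between the two "GPUs"

for step, (x, y) in enumerate(zip(X, Y)):
    # GPU 0, forward pass on the newest minibatch.
    pre = x @ W0
    h = np.maximum(pre, 0.0)             # ReLU hidden layer
    act_queue.append((x, pre, y))

    # GPU 1 works on the minibatch that GPU 0 forwarded at the previous step.
    if len(act_queue) > 1:
        x_old, pre_old, y_old = act_queue.pop(0)
        h_old = np.maximum(pre_old, 0.0)
        out = h_old @ W1
        d_out = 2.0 * (out - y_old) / len(y_old)           # MSE gradient
        grad_queue.append((x_old, pre_old, d_out @ W1.T))   # send gradient back
        W1 -= lr * (h_old.T @ d_out)                        # GPU 1 updates at once

    # GPU 0, backward pass: the gradient it receives is for a minibatch that is
    # now two steps old, so its update lags behind - the pipelined approximation.
    if len(grad_queue) > 1:
        x_old, pre_old, d_h = grad_queue.pop(0)
        d_pre = d_h * (pre_old > 0)                          # ReLU derivative
        W0 -= lr * (x_old.T @ d_pre)

# Show that the toy pipeline still trains despite the delayed updates.
pred = np.maximum(X.reshape(-1, 4) @ W0, 0.0) @ W1
print("final MSE:", float(np.mean((pred - Y.reshape(-1, 1)) ** 2)))
```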

Proceedings ArticleDOI
01 Jun 2000
TL;DR: A phonetic tied-mixture model for efficient large-vocabulary continuous speech recognition is presented; it enables the decoder to perform efficient Gaussian pruning, and computing only two out of 64 mixture components is found to cause no loss of accuracy.
Abstract: A phonetic tied-mixture (PTM) model for efficient large vocabulary continuous speech recognition is presented. It is synthesized from context-independent phone models with 64 mixture components per state by assigning different mixture weights according to the shared states of triphones. Mixtures are then re-estimated for optimization. The model achieves a word error rate of 7.0% on a 20,000-word newspaper dictation task, comparable to the best figure obtained by triphone models of much higher resolution. Compared with conventional PTMs that share Gaussians across all states, the proposed model is easily trained and reliably estimated. Furthermore, the model enables the decoder to perform efficient Gaussian pruning. It is found that computing only two out of 64 components causes no loss of accuracy. Several pruning methods are proposed and compared, and the best one reduces the computation to about 20%.

91 citations
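
In the phonetic tied-mixture model above, many triphone states share one codebook of 64 Gaussians and differ only in their mixture weights, so per-frame component scores can be computed once and reused, and only the best few components need to enter each state's likelihood. A rough sketch of that scoring step follows; for brevity it scores all components before pruning, whereas a real decoder would preselect them cheaply, and all names and dimensions are illustrative.

```python
import numpy as np

def tied_mixture_loglik(frame, means, variances, log_weights, top_k=2):
    """Approximate per-state log-likelihoods for a phonetic tied-mixture model.

    All states sharing this codebook use the same Gaussians (means, variances)
    but different mixture weights (log_weights, one row per tied state).
    Component scores are computed once per frame and only the top_k components
    are kept - the Gaussian pruning described in the abstract.
    """
    # Diagonal-covariance Gaussian log-densities, one per shared component.
    diff = frame - means                                    # (n_comp, dim)
    comp_ll = -0.5 * np.sum(diff ** 2 / variances
                            + np.log(2 * np.pi * variances), axis=1)

    # Keep only the best-scoring components for every state.
    keep = np.argsort(comp_ll)[-top_k:]

    # Per-state log-likelihood via log-sum-exp over the surviving components.
    scores = log_weights[:, keep] + comp_ll[keep]           # (n_states, top_k)
    m = scores.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))).ravel()

# Toy codebook: 64 shared Gaussians, 39-dim features, 3 tied states.
rng = np.random.default_rng(0)
n_comp, dim, n_states = 64, 39, 3
means = rng.normal(size=(n_comp, dim))
variances = np.full((n_comp, dim), 1.0)
weights = rng.dirichlet(np.ones(n_comp), size=n_states)
print(tied_mixture_loglik(rng.normal(size=dim), means, variances,
                          np.log(weights), top_k=2))
```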

Journal ArticleDOI
TL;DR: A new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed, which addresses some limitations of HMMs while maintaining many of the aspects which have made them successful.
Abstract: Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed. This paradigm addresses some limitations of HMMs while maintaining many of the aspects which have made them successful. In particular, the acoustic modeling problem is reformulated in a data-driven, sparse, augmented space to increase discrimination. Acoustic context modeling is explicitly integrated to handle the sequential phenomena of the speech signal. We present an efficient framework for estimating these models that ensures scalability and generality. In the TIMIT phone recognition task, a phone error rate of 23.0% was recorded on the full test set, a significant improvement over comparable HMM-based systems.

91 citations
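
For readers unfamiliar with conditional random fields: a linear-chain CRF scores a label sequence with per-frame emission scores plus label-transition scores and normalizes by a partition function computed with the forward algorithm. The ACRFs in the paper work in an augmented, sparse feature space, but the underlying recursion is the same. Below is a minimal sketch of the plain linear-chain case; all names and the toy numbers are illustrative.

```python
import numpy as np

def crf_sequence_logprob(emissions, transitions, labels):
    """Log p(labels | observations) under a linear-chain CRF.

    emissions[t, y]    - per-frame score for label y at time t
    transitions[y, y'] - score for moving from label y to label y'
    This is the plain linear-chain case; ACRFs augment the feature space,
    but the forward recursion below is the common core.
    """
    T, n_labels = emissions.shape

    # Score of the given label sequence.
    path_score = emissions[0, labels[0]]
    for t in range(1, T):
        path_score += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]

    # Log partition function via the forward algorithm in log space.
    alpha = emissions[0].copy()                              # (n_labels,)
    for t in range(1, T):
        scores = alpha[:, None] + transitions + emissions[t][None, :]
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))
    m = alpha.max()
    log_z = m + np.log(np.exp(alpha - m).sum())

    return path_score - log_z

# Toy example: 5 frames, 3 phone labels.
rng = np.random.default_rng(0)
emissions = rng.normal(size=(5, 3))
transitions = rng.normal(size=(3, 3))
print(crf_sequence_logprob(emissions, transitions, labels=[0, 0, 1, 2, 2]))
```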

Journal ArticleDOI
TL;DR: A polynomial-time algorithm for the construction and training of a class of multilayer perceptrons for classification that uses linear programming models to incrementally generate the hidden layer in a restricted higher-order perceptron.

91 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 88% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Convolutional neural network: 74.7K papers, 2M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528