Topic

Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.
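
Word error rate is conventionally computed from a Levenshtein (edit-distance) alignment between a reference transcript and a recognizer hypothesis, as WER = (substitutions + deletions + insertions) / number of reference words. A minimal sketch of that computation is given below; the function name and the example strings are illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a standard Levenshtein (edit-distance) dynamic program."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i           # delete all remaining reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j           # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub,               # substitution (or match)
                           dp[i - 1][j] + 1,  # deletion
                           dp[i][j - 1] + 1)  # insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# One substitution (sat -> sit) and one deletion (the): 2 / 6 ≈ 0.333
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```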


Papers
Proceedings ArticleDOI
13 May 2002
TL;DR: An approach to close the gap between text-dependent and text-independent speaker verification performance is presented, and results on the 2001 NIST extended data task show that this approach can produce an equal error rate of less than 1%.
Abstract: In this paper we present an approach to close the gap between text-dependent and text-independent speaker verification performance. Text-constrained GMM-UBM systems are created using word segmentations produced by a LVCSR system on conversational speech allowing the system to focus on speaker differences over a constrained set of acoustic units. Results on the 2001 NIST extended data task show this approach can be used to produce an equal error rate of < 1 %.

91 citations
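
The equal error rate reported above is the operating point of a detection system at which the false-rejection rate on target trials equals the false-acceptance rate on impostor trials. Below is a minimal sketch of estimating it from two score lists by sweeping a decision threshold; this is a simplification, not the NIST evaluation tooling, and all names and the toy scores are illustrative.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Sweep a decision threshold over all observed scores and return the
    point where false-rejection and false-acceptance rates are closest."""
    target = np.asarray(target_scores)
    impostor = np.asarray(impostor_scores)
    thresholds = np.concatenate([target, impostor])
    best_gap, eer = np.inf, None
    for t in thresholds:
        frr = np.mean(target < t)      # targets rejected at threshold t
        far = np.mean(impostor >= t)   # impostors accepted at threshold t
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

# Toy example: well-separated target and impostor scores give a low EER.
rng = np.random.default_rng(0)
eer = equal_error_rate(rng.normal(2.0, 1.0, 1000), rng.normal(-2.0, 1.0, 1000))
print(f"EER ~ {eer:.3%}")
```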

Proceedings ArticleDOI
Xie Chen, Adam Eversole, Gang Li, Dong Yu, Frank Seide
09 Sep 2012
TL;DR: It is shown that the pipelined approximation to BP, which parallelizes computation with respect to layers, is an efficient way of utilizing multiple GPGPU cards in a single server.
Abstract: The Context-Dependent Deep-Neural-Network HMM, or CD-DNN-HMM, is a recently proposed acoustic-modeling technique for HMM-based speech recognition that can greatly outperform conventional Gaussian-mixture based HMMs. For example, a CD-DNN-HMM trained on the 2000h Fisher corpus achieves a 14.4% word error rate on the Hub5'00-FSH speaker-independent phone-call transcription task, compared to 19.6% obtained by a state-of-the-art, conventional discriminatively trained GMM-based HMM. That CD-DNN-HMM, however, took 59 days to train on a modern GPGPU; the immense computational cost of the minibatch-based back-propagation (BP) training is a major roadblock. Unlike the familiar Baum-Welch training for conventional HMMs, BP cannot be efficiently parallelized across data. In this paper we show that the pipelined approximation to BP, which parallelizes computation with respect to layers, is an efficient way of utilizing multiple GPGPU cards in a single server. Using 2 and 4 GPGPUs, we achieve a 1.9 and 3.3 times end-to-end speed-up, at parallelization efficiency of 0.95 and 0.82, respectively, at no loss of recognition accuracy.

91 citations
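
Pipelined back-propagation, as described in the abstract, assigns consecutive layer groups to different GPUs so that at any moment each GPU works on a different minibatch; the price is that a layer's weights are updated with gradients that are a few minibatches stale. The single-process numpy sketch below only simulates that schedule for a two-layer toy network; there are no real GPUs, and the array names, sizes, and learning rate are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: the target is the sum of the 4 inputs.
X = rng.normal(size=(64, 4, 4))          # 64 minibatches of 4 samples, 4 features
Y = X.sum(axis=2, keepdims=True)         # shape (64, 4, 1)

# "GPU 0" owns layer 0, "GPU 1" owns layer 1 (plain numpy arrays here).
W0 = rng.normal(scale=0.5, size=(4, 8))
W1 = rng.normal(scale=0.1, size=(8, 1))
lr = 0.05

act_queue, grad_queue = [], []           # model the pipeline between the two "GPUs"

for step, (x, y) in enumerate(zip(X, Y)):
    # GPU 0, forward pass on the newest minibatch.
    pre = x @ W0
    h = np.maximum(pre, 0.0)             # ReLU hidden layer
    act_queue.append((x, pre, y))

    # GPU 1 works on the minibatch that GPU 0 forwarded at the previous step.
    if len(act_queue) > 1:
        x_old, pre_old, y_old = act_queue.pop(0)
        h_old = np.maximum(pre_old, 0.0)
        out = h_old @ W1
        d_out = 2.0 * (out - y_old) / len(y_old)           # MSE gradient
        grad_queue.append((x_old, pre_old, d_out @ W1.T))   # send gradient back
        W1 -= lr * (h_old.T @ d_out)                        # GPU 1 updates at once

    # GPU 0, backward pass: the gradient it receives is for a minibatch that is
    # now two steps old, so its update lags behind - the pipelined approximation.
    if len(grad_queue) > 1:
        x_old, pre_old, d_h = grad_queue.pop(0)
        d_pre = d_h * (pre_old > 0)                          # ReLU derivative
        W0 -= lr * (x_old.T @ d_pre)

# Show that the toy pipeline still trains despite the delayed updates.
pred = np.maximum(X.reshape(-1, 4) @ W0, 0.0) @ W1
print("final MSE:", float(np.mean((pred - Y.reshape(-1, 1)) ** 2)))
```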

Proceedings ArticleDOI
01 Jun 2000
TL;DR: A phonetic tied-mixture model for efficient large-vocabulary continuous speech recognition is presented; it enables the decoder to perform efficient Gaussian pruning, and computing only two out of 64 mixture components is found to cause no loss of accuracy.
Abstract: A phonetic tied-mixture (PTM) model for efficient large vocabulary continuous speech recognition is presented. It is synthesized from context-independent phone models with 64 mixture components per state by assigning different mixture weights according to the shared states of triphones. Mixtures are then re-estimated for optimization. The model achieves a word error rate of 7.0% on a 20,000-word newspaper dictation task, comparable to the best figure obtained by triphone models of much higher resolution. Compared with conventional PTMs that share Gaussians across all states, the proposed model is easily trained and reliably estimated. Furthermore, the model enables the decoder to perform efficient Gaussian pruning. It is found that computing only two out of 64 components causes no loss of accuracy. Several pruning methods are proposed and compared, and the best one reduces the computation to about 20%.

91 citations
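
In the phonetic tied-mixture model above, many triphone states share one codebook of 64 Gaussians and differ only in their mixture weights, so per-frame component scores can be computed once and reused, and only the best few components need to enter each state's likelihood. A rough sketch of that scoring step follows; for brevity it scores all components before pruning, whereas a real decoder would preselect them cheaply, and all names and dimensions are illustrative.

```python
import numpy as np

def tied_mixture_loglik(frame, means, variances, log_weights, top_k=2):
    """Approximate per-state log-likelihoods for a phonetic tied-mixture model.

    All states sharing this codebook use the same Gaussians (means, variances)
    but different mixture weights (log_weights, one row per tied state).
    Component scores are computed once per frame and only the top_k components
    are kept - the Gaussian pruning described in the abstract.
    """
    # Diagonal-covariance Gaussian log-densities, one per shared component.
    diff = frame - means                                    # (n_comp, dim)
    comp_ll = -0.5 * np.sum(diff ** 2 / variances
                            + np.log(2 * np.pi * variances), axis=1)

    # Keep only the best-scoring components for every state.
    keep = np.argsort(comp_ll)[-top_k:]

    # Per-state log-likelihood via log-sum-exp over the surviving components.
    scores = log_weights[:, keep] + comp_ll[keep]           # (n_states, top_k)
    m = scores.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))).ravel()

# Toy codebook: 64 shared Gaussians, 39-dim features, 3 tied states.
rng = np.random.default_rng(0)
n_comp, dim, n_states = 64, 39, 3
means = rng.normal(size=(n_comp, dim))
variances = np.full((n_comp, dim), 1.0)
weights = rng.dirichlet(np.ones(n_comp), size=n_states)
print(tied_mixture_loglik(rng.normal(size=dim), means, variances,
                          np.log(weights), top_k=2))
```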

Journal ArticleDOI
TL;DR: A new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed, which addresses some limitations of HMMs while maintaining many of the aspects which have made them successful.
Abstract: Acoustic modeling based on hidden Markov models (HMMs) is employed by state-of-the-art stochastic speech recognition systems. Although HMMs are a natural choice to warp the time axis and model the temporal phenomena in the speech signal, their conditional independence properties limit their ability to model spectral phenomena well. In this paper, a new acoustic modeling paradigm based on augmented conditional random fields (ACRFs) is investigated and developed. This paradigm addresses some limitations of HMMs while maintaining many of the aspects which have made them successful. In particular, the acoustic modeling problem is reformulated in a data-driven, sparse, augmented space to increase discrimination. Acoustic context modeling is explicitly integrated to handle the sequential phenomena of the speech signal. We present an efficient framework for estimating these models that ensures scalability and generality. In the TIMIT phone recognition task, a phone error rate of 23.0% was recorded on the full test set, a significant improvement over comparable HMM-based systems.

91 citations
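
For readers unfamiliar with conditional random fields: a linear-chain CRF scores a label sequence with per-frame emission scores plus label-transition scores and normalizes by a partition function computed with the forward algorithm. The ACRFs in the paper work in an augmented, sparse feature space, but the underlying recursion is the same. Below is a minimal sketch of the plain linear-chain case; all names and the toy numbers are illustrative.

```python
import numpy as np

def crf_sequence_logprob(emissions, transitions, labels):
    """Log p(labels | observations) under a linear-chain CRF.

    emissions[t, y]    - per-frame score for label y at time t
    transitions[y, y'] - score for moving from label y to label y'
    This is the plain linear-chain case; ACRFs augment the feature space,
    but the forward recursion below is the common core.
    """
    T, n_labels = emissions.shape

    # Score of the given label sequence.
    path_score = emissions[0, labels[0]]
    for t in range(1, T):
        path_score += transitions[labels[t - 1], labels[t]] + emissions[t, labels[t]]

    # Log partition function via the forward algorithm in log space.
    alpha = emissions[0].copy()                              # (n_labels,)
    for t in range(1, T):
        scores = alpha[:, None] + transitions + emissions[t][None, :]
        m = scores.max(axis=0)
        alpha = m + np.log(np.exp(scores - m).sum(axis=0))
    m = alpha.max()
    log_z = m + np.log(np.exp(alpha - m).sum())

    return path_score - log_z

# Toy example: 5 frames, 3 phone labels.
rng = np.random.default_rng(0)
emissions = rng.normal(size=(5, 3))
transitions = rng.normal(size=(3, 3))
print(crf_sequence_logprob(emissions, transitions, labels=[0, 0, 1, 2, 2]))
```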

Journal ArticleDOI
TL;DR: A polynomial-time algorithm for the construction and training of a class of multilayer perceptrons for classification that uses linear programming models to incrementally generate the hidden layer in a restricted higher-order perceptron.

91 citations


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 88% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Convolutional neural network: 74.7K papers, 2M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528