Topic: Word error rate

About: Word error rate is a research topic. Over its lifetime, 11,939 publications have been published within this topic, receiving 298,031 citations.
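Word error rate is conventionally computed from the minimum edit distance between a reference transcript and a recognizer's hypothesis, counting substitutions, deletions, and insertions relative to the number of reference words. Below is a minimal, illustrative Python sketch of that standard computation; the function name and the example sentences are arbitrary, not tied to any particular toolkit.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed via Levenshtein distance over word sequences."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + sub,   # match / substitution
                          d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1)          # insertion
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words ~ 0.167
```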


Papers
Book Chapter
TL;DR: Two new context-dependent phonetic units are introduced: function-word-dependent phone models, which focus on the most difficult subvocabulary; and generalized triphones, which combine similar triphones on the basis of an information-theoretic measure.
Abstract: Context-dependent phone models are applied to speaker-independent continuous speech recognition and shown to be effective in this domain. Several previously proposed context-dependent models are evaluated, and two new context-dependent phonetic units are introduced: function-word-dependent phone models, which focus on the most difficult subvocabulary; and generalized triphones, which combine similar triphones on the basis of an information-theoretic measure. The subword clustering procedure used for generalized triphones can find the optimal number of models, given a fixed amount of training data. It is shown that context-dependent modeling reduces the error rate by as much as 60%.

228 citations
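The information-theoretic measure mentioned in the entry above is commonly described as the count-weighted increase in entropy of the output distributions when two triphone models are merged; similar triphones are then clustered greedily by picking the cheapest merges. The Python sketch below illustrates that merge cost on hypothetical discrete codebook counts; it is a simplified illustration, not the authors' implementation.

```python
import math

def entropy(dist):
    """Entropy (in bits) of a discrete output distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def merge_cost(counts_a, counts_b):
    """Weighted entropy increase incurred by merging two triphone models,
    each represented here by hypothetical codebook-symbol counts."""
    n_a, n_b = sum(counts_a), sum(counts_b)
    merged = [a + b for a, b in zip(counts_a, counts_b)]
    h_a = entropy([c / n_a for c in counts_a])
    h_b = entropy([c / n_b for c in counts_b])
    h_m = entropy([c / (n_a + n_b) for c in merged])
    return (n_a + n_b) * h_m - n_a * h_a - n_b * h_b

# Two triphone models of the same base phone with made-up counts:
a = [40, 10, 5]
b = [35, 12, 8]
print(merge_cost(a, b))  # a small cost suggests good candidates for one generalized triphone
```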

Patent
TL;DR: In this paper, a language generator for a speech recognition apparatus scores a word-series hypothesis by combining individual scores for each word in the hypothesis; the score for a single word combines the estimated conditional probability of occurrence of a first class of words containing the word being scored, given a context comprising the other words in the word-series hypothesis, with the estimated conditional probability of the word being scored given the occurrence of that class and of the context.
Abstract: A language generator for a speech recognition apparatus scores a word-series hypothesis by combining individual scores for each word in the hypothesis. The hypothesis score for a single word comprises a combination of the estimated conditional probability of occurrence of a first class of words comprising the word being scored, given the occurrence of a context comprising the words in the word-series hypothesis other than the word being scored, and the estimated conditional probability of occurrence of the word being scored given the occurrence of the first class of words, and given the occurrence of the context. An apparatus and method are provided for classifying multiple series of words for the purpose of obtaining useful hypothesis scores in the language generator and speech recognition apparatus.

227 citations
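The per-word score described in the entry above has the form P(class(w) | context) * P(w | class(w), context). The toy Python sketch below uses two simplifying assumptions, that the context term reduces to the previous word's class (a class bigram) and that the word term is context-free class membership; the table values are invented and none of this is claimed to be the patent's exact estimator.

```python
# Hypothetical word classes and probability tables for illustration only.
word_class = {"the": "DET", "a": "DET", "cat": "NOUN", "dog": "NOUN", "runs": "VERB"}
p_class_given_prev = {("<s>", "DET"): 0.4, ("DET", "NOUN"): 0.6, ("NOUN", "VERB"): 0.5}
p_word_given_class = {("the", "DET"): 0.5, ("a", "DET"): 0.3,
                      ("cat", "NOUN"): 0.10, ("dog", "NOUN"): 0.12,
                      ("runs", "VERB"): 0.2}

def hypothesis_score(words):
    """Combine per-word scores: P(class | previous class) * P(word | class)."""
    score, prev_class = 1.0, "<s>"
    for w in words:
        c = word_class[w]
        score *= p_class_given_prev.get((prev_class, c), 1e-6)
        score *= p_word_given_class.get((w, c), 1e-6)
        prev_class = c
    return score

print(hypothesis_score(["the", "cat", "runs"]))
```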

Proceedings Article, 25 Aug 2013
TL;DR: It is found that with randomly initialized weights the squared-error-based ANN does not converge to a good local optimum, whereas with a good initialization from pre-training, a few additional fine-tuning iterations with the SE criterion reduce the word error rate of the best CE-trained system.
Abstract: In this paper we investigate the error criteria that are optimized during the training of artificial neural networks (ANN). We compare the bounds of the squared error (SE) and the cross-entropy (CE) criteria, which are the most popular choices in state-of-the-art implementations. The evaluation is performed on automatic speech recognition (ASR) and handwriting recognition (HWR) tasks using a hybrid HMM-ANN model. We find that with randomly initialized weights, the squared error based ANN does not converge to a good local optimum. However, with a good initialization by pre-training, the word error rate of our best CE trained system could be reduced from 30.9% to 30.5% on the ASR task, and from 22.7% to 21.9% on the HWR task by performing a few additional “fine-tuning” iterations with the SE criterion.

226 citations
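The two criteria compared above differ only in the loss applied to the network's softmax outputs. The small NumPy sketch below shows both for a one-hot target; the logits and labels are made up, and this is illustrative rather than the authors' training setup.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(y, t):
    """CE criterion: -sum_k t_k * log(y_k) for a one-hot target t."""
    return -np.sum(t * np.log(y + 1e-12))

def squared_error(y, t):
    """SE criterion: sum_k (y_k - t_k)^2."""
    return np.sum((y - t) ** 2)

logits = np.array([2.0, 0.5, -1.0])   # hypothetical output-layer activations
target = np.array([1.0, 0.0, 0.0])    # one-hot phone/state label
y = softmax(logits)
print(cross_entropy(y, target), squared_error(y, target))
# A CE-trained network can then be fine-tuned by continuing gradient descent
# on the SE criterion, which is the scheme the paper's last experiments describe.
```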


Journal Article
TL;DR: This paper introduces a neural network architecture that performs multichannel filtering in the first layer of the network, and shows that this network learns to be robust to varying target speaker direction of arrival, performing as well as a model that is given oracle knowledge of the true target speaker direction.
Abstract: Multichannel automatic speech recognition (ASR) systems commonly separate speech enhancement, including localization, beamforming, and postfiltering, from acoustic modeling. In this paper, we perform multichannel enhancement jointly with acoustic modeling in a deep neural network framework. Inspired by beamforming, which leverages differences in the fine time structure of the signal at different microphones to filter energy arriving from different directions, we explore modeling the raw time-domain waveform directly. We introduce a neural network architecture, which performs multichannel filtering in the first layer of the network, and show that this network learns to be robust to varying target speaker direction of arrival, performing as well as a model that is given oracle knowledge of the true target speaker direction. Next, we show how performance can be improved by factoring the first layer to separate the multichannel spatial filtering operation from a single channel filterbank which computes a frequency decomposition. We also introduce an adaptive variant, which updates the spatial filter coefficients at each time frame based on the previous inputs. Finally, we demonstrate that these approaches can be implemented more efficiently in the frequency domain. Overall, we find that such multichannel neural networks give a relative word error rate improvement of more than 5% compared to a traditional beamforming-based multichannel ASR system and more than 10% compared to a single channel waveform model.

221 citations
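The factored first layer described above can be pictured as a set of per-channel spatial filters whose outputs are summed for each look direction and then passed through a shared single-channel filterbank. The NumPy sketch below illustrates that structure on raw waveforms with arbitrary, untrained filter shapes; the function name and all dimensions are assumptions for illustration, not the paper's model or parameters.

```python
import numpy as np

def factored_multichannel_layer(x, spatial_filters, filterbank):
    """x: (channels, samples) raw waveforms.
    spatial_filters: (looks, channels, taps) short FIR filters, one per channel and look direction.
    filterbank: (filters, taps) single-channel filterbank shared across look directions.
    Returns an array of shape (looks, filters, output_samples)."""
    looks, channels, _ = spatial_filters.shape
    outputs = []
    for p in range(looks):
        # Spatial filtering: filter each channel and sum (a learned, beamformer-like step).
        beamformed = sum(np.convolve(x[c], spatial_filters[p, c], mode="valid")
                         for c in range(channels))
        # Shared spectral decomposition of the resulting single-channel signal.
        feats = np.stack([np.convolve(beamformed, f, mode="valid") for f in filterbank])
        outputs.append(feats)
    return np.stack(outputs)

# Hypothetical shapes: 2 microphones, 3 look directions, 8 filterbank channels.
x = np.random.randn(2, 400)
spatial = np.random.randn(3, 2, 25) * 0.1
fbank = np.random.randn(8, 50) * 0.1
print(factored_multichannel_layer(x, spatial, fbank).shape)  # (3, 8, 327)
```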


Network Information
Related Topics (5)
Deep learning: 79.8K papers, 2.1M citations, 88% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Convolutional neural network: 74.7K papers, 2M citations, 85% related
Artificial neural network: 207K papers, 4.5M citations, 84% related
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Performance Metrics
No. of papers in the topic in previous years:

Year    Papers
2023    271
2022    562
2021    640
2020    643
2019    633
2018    528