Author

W M. Fisher

Bio: W. M. Fisher is an academic researcher at the National Institute of Standards and Technology. The author has contributed to research on topics including language modeling and benchmark evaluation, has an h-index of 8, and has co-authored 10 publications receiving 2,576 citations.

Papers
Proceedings ArticleDOI
03 Apr 1990
TL;DR: The development of tools for the analysis of benchmark speech recognition system tests and studies of an alternative to the alignment process presently used in the DARPA/NIST scoring software are reported.
Abstract: The development of tools for the analysis of benchmark speech recognition system tests is reported. One development is a tool implementing two statistical significance tests. Another involves studies of an alternative to the alignment process presently used in the DARPA/NIST scoring software, which minimizes a weighted sum of elementary word error types; the alternative process instead minimizes a measure of phonological implausibility. The purpose in developing a standard implementation of these tools is to make them uniformly available to system developers.

127 citations
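The alignment the paper describes can be illustrated with a short dynamic-programming sketch. This is not the DARPA/NIST scoring software itself; the cost weights and the error-count bookkeeping below are illustrative placeholders for the "weighted sum of elementary word error types" the abstract refers to.

```python
# Illustrative sketch (not the NIST implementation): align a recognizer
# hypothesis to a reference transcript by minimizing a weighted sum of
# word-level error types, as in standard WER-style scoring.
def align(ref, hyp, sub_cost=4, ins_cost=3, del_cost=3):
    """Return (total cost, substitutions, insertions, deletions)."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = (cost, subs, inss, dels) for ref[:i] vs hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):  # all-deletion column
        c = dp[i - 1][0]
        dp[i][0] = (c[0] + del_cost, c[1], c[2], c[3] + 1)
    for j in range(1, m + 1):  # all-insertion row
        c = dp[0][j - 1]
        dp[0][j] = (c[0] + ins_cost, c[1], c[2] + 1, c[3])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                cand = [(dp[i - 1][j - 1][0], dp[i - 1][j - 1], 'ok')]
            else:
                cand = [(dp[i - 1][j - 1][0] + sub_cost, dp[i - 1][j - 1], 's')]
            cand.append((dp[i][j - 1][0] + ins_cost, dp[i][j - 1], 'i'))
            cand.append((dp[i - 1][j][0] + del_cost, dp[i - 1][j], 'd'))
            cost, prev, op = min(cand, key=lambda t: t[0])
            s, ins, d = prev[1], prev[2], prev[3]
            if op == 's':
                s += 1
            elif op == 'i':
                ins += 1
            elif op == 'd':
                d += 1
            dp[i][j] = (cost, s, ins, d)
    return dp[n][m]
```

The phonologically motivated alternative studied in the paper would replace the fixed per-word costs with a measure of phonological implausibility between the mismatched words.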

Proceedings ArticleDOI
21 Mar 1993
TL;DR: These tests were reported on and discussed in detail at the Spoken Language Systems Technology Workshop held at the Massachusetts Institute of Technology, January 20-22, 1993.
Abstract: This paper documents benchmark tests implemented within the DARPA Spoken Language Program from November 1992 through January 1993. Tests were conducted using the Wall Street Journal-based Continuous Speech Recognition (WSJ-CSR) corpus and the Air Travel Information System (ATIS) corpus collected by the Multi-Site ATIS Data Collection Working (MADCOW) Group. The WSJ-CSR tests consist of tests of large-vocabulary (lexicons of 5,000 to more than 20,000 words) continuous speech recognition systems. The ATIS tests consist of tests of (1) ATIS-domain spontaneous speech (lexicons typically less than 2,000 words), (2) natural language understanding, and (3) spoken language understanding. These tests were reported on and discussed in detail at the Spoken Language Systems Technology Workshop held at the Massachusetts Institute of Technology, January 20-22, 1993.

111 citations

Proceedings ArticleDOI
08 Mar 1994
TL;DR: This paper reports results obtained in benchmark tests conducted within the ARPA Spoken Language program in November and December of 1993, including foreign participants from Canada, France, Germany, and the United Kingdom.
Abstract: This paper reports results obtained in benchmark tests conducted within the ARPA Spoken Language program in November and December of 1993. In addition to ARPA contractors, participants included a number of "volunteers", including foreign participants from Canada, France, Germany, and the United Kingdom. The body of the paper is limited to an outline of the structure of the tests and presents highlights and discussion of selected results. Detailed tabulations of reported "official" results, and additional explanatory text appears in the Appendix.

105 citations

Proceedings Article
12 Apr 2000
TL;DR: The process used to identify and implement the time-adaptive language model is detailed, along with the results of the experiment in terms of its effect on word error rate, out-of-vocabulary rate, and retrieval accuracy (mean average precision).
Abstract: This paper describes experiments implemented at NIST in adapting language models over time to improve recognition of broadcast news recorded over many months. These experiments were designed specifically to improve the utility of automatically generated transcripts for retrieval applications. To evaluate the potential of the approach, a time-adaptive automatic speech recognition run was implemented to support the 1999 TREC Spoken Document Retrieval (SDR) Track - more than 500 hours of broadcast news sampled across 5 months. The accuracy of retrieval for several systems using the time-adaptive system transcripts was evaluated against transcripts produced by virtually the same recognition system with a fixed language model. This paper details the process we employed to identify and implement the time-adaptive language model and discusses the results of the experiment in terms of its effect on word error rate, out-of-vocabulary rate, and retrieval accuracy (mean average precision).

47 citations
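The core idea of adapting a language model over time can be sketched very simply: interpolate a static model estimated on the original training set with a component re-estimated from recently arrived transcripts. The unigram model, interpolation weight, and class below are illustrative placeholders, not NIST's actual system, which operated on a full recognizer.

```python
from collections import Counter

# Hedged sketch of a time-adaptive language model: a static unigram
# component is interpolated with counts from recent transcripts, and the
# recent component grows as new material arrives over the months.
class AdaptiveUnigramLM:
    def __init__(self, base_text, lam=0.7):
        self.base = Counter(base_text.split())
        self.recent = Counter()
        self.lam = lam  # weight on the static component

    def update(self, new_text):
        """Fold newly arrived transcripts into the adaptive component."""
        self.recent.update(new_text.split())

    def prob(self, word):
        def p(counts):
            total = sum(counts.values())
            return counts[word] / total if total else 0.0
        return self.lam * p(self.base) + (1 - self.lam) * p(self.recent)
```

In this toy setting, a word unseen in the static data (an out-of-vocabulary word) gains nonzero probability as soon as it appears in the recent stream, which is exactly the effect the paper measures via the out-of-vocabulary rate.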


Cited by
Journal ArticleDOI
TL;DR: This paper presents the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling, and observes that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
Abstract: Several variants of the long short-term memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful functional ANalysis Of VAriance framework. In total, we summarize the results of 5,400 experimental runs (approximately 15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.

4,746 citations
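The two components the study singles out, the forget gate and the output activation function, are easy to locate in one step of a standard LSTM cell. The scalar-valued sketch below is for illustration only; the gate weights are toy placeholders, not trained parameters.

```python
import math

# Minimal sketch of one time step of a standard LSTM cell (the baseline
# the paper finds hard to beat), in pure Python for clarity.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """One LSTM step for scalar input/state.
    W maps gate name -> (w_x, w_h, bias); gates: i=input, f=forget,
    o=output, g=candidate. Returns (h, c)."""
    i = sigmoid(W['i'][0] * x + W['i'][1] * h_prev + W['i'][2])
    f = sigmoid(W['f'][0] * x + W['f'][1] * h_prev + W['f'][2])
    o = sigmoid(W['o'][0] * x + W['o'][1] * h_prev + W['o'][2])
    g = math.tanh(W['g'][0] * x + W['g'][1] * h_prev + W['g'][2])
    c = f * c_prev + i * g   # forget gate f controls memory retention
    h = o * math.tanh(c)     # output activation (tanh) shapes the state
    return h, c
```

Removing the forget gate (fixing f = 1) or the output activation (dropping the tanh on c) are exactly the kinds of single-component ablations the eight variants in the paper explore.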

Proceedings Article
01 Jan 2005
TL;DR: In this article, a modified, full gradient version of the LSTM learning algorithm was used for framewise phoneme classification, using the TIMIT database, and the results support the view that contextual information is crucial to speech processing, and suggest that bidirectional networks outperform unidirectional ones.
Abstract: In this paper, we present bidirectional Long Short Term Memory (LSTM) networks, and a modified, full gradient version of the LSTM learning algorithm. We evaluate Bidirectional LSTM (BLSTM) and several other network architectures on the benchmark task of framewise phoneme classification, using the TIMIT database. Our main findings are that bidirectional networks outperform unidirectional ones, and Long Short Term Memory (LSTM) is much faster and also more accurate than both standard Recurrent Neural Nets (RNNs) and time-windowed Multilayer Perceptrons (MLPs). Our results support the view that contextual information is crucial to speech processing, and suggest that BLSTM is an effective architecture with which to exploit it.

3,028 citations

Journal ArticleDOI
TL;DR: In this article, a modified, full gradient version of the LSTM learning algorithm was used for framewise phoneme classification, using the TIMIT database, and the results support the view that contextual information is crucial to speech processing, and suggest that bidirectional networks outperform unidirectional ones.

2,200 citations

Book
09 Feb 2012
TL;DR: The thesis contributes a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.
Abstract: Recurrent neural networks are powerful sequence learners. They are able to incorporate context information in a flexible way, and are robust to localised distortions of the input data. These properties make them well suited to sequence labelling, where input sequences are transcribed with streams of labels. The aim of this thesis is to advance the state-of-the-art in supervised sequence labelling with recurrent networks. Its two main contributions are (1) a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and (2) an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.

2,101 citations
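The thesis's output layer handles unknown input-label alignments through a many-to-one mapping from framewise paths to label sequences: repeats are merged, then blanks dropped. The sketch below shows only that mapping, under the assumption of a single blank symbol; training additionally sums path probabilities with dynamic programming, which is not shown here.

```python
# Illustrative sketch of the CTC-style collapse: a framewise path over
# labels plus a blank symbol maps to a label sequence by first merging
# consecutive repeats and then removing blanks.
BLANK = '-'

def collapse(path):
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return ''.join(out)
```

Note that "abba" can be produced by many paths ("--ab-bb--a", "aabb-ba-", ...); it is this many-to-one structure that frees the network from needing a presegmented alignment.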

Proceedings Article
07 Dec 2015
TL;DR: The authors extend an attention-based recurrent sequence generator with location-awareness for the TIMIT phoneme recognition task, yielding a model that is robust to long utterances and reduces the phoneme error rate (PER) to 17.6%.
Abstract: Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis [1,2] and image caption generation [3]. We extend the attention-mechanism with features needed for speech recognition. We show that while an adaptation of the model used for machine translation in [2] reaches a competitive 18.7% phoneme error rate (PER) on the TIMIT phoneme recognition task, it can only be applied to utterances which are roughly as long as the ones it was trained on. We offer a qualitative explanation of this failure and propose a novel and generic method of adding location-awareness to the attention mechanism to alleviate this issue. The new method yields a model that is robust to long inputs and achieves 18% PER in single utterances and 20% in 10-times longer (repeated) utterances. Finally, we propose a change to the attention mechanism that prevents it from concentrating too much on single frames, which further reduces PER to 17.6% level.

1,574 citations
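The location-awareness idea can be sketched in a few lines: features extracted from the previous step's attention weights (here by a simple 1-D convolution) are added to the content-based scores before the softmax, so the model knows where it attended last. All weights below are toy placeholders; the actual model learns the convolution filters and scoring parameters jointly.

```python
import math

# Hedged sketch of location-aware attention scoring: content-based scores
# over encoder frames are augmented with features convolved from the
# previous alignment, then normalized with a softmax.
def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def location_aware_weights(content_scores, prev_weights, kernel, loc_scale=1.0):
    """Combine content scores with location features from prev_weights.
    kernel: odd-length 1-D convolution filter over the previous alignment."""
    k = len(kernel) // 2
    n = len(prev_weights)
    loc = []
    for t in range(n):
        v = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - k
            if 0 <= idx < n:
                v += w * prev_weights[idx]
        loc.append(v)
    scores = [c + loc_scale * l for c, l in zip(content_scores, loc)]
    return softmax(scores)
```

Because the location term is tied to the previous alignment rather than to frame content, it keeps attention moving coherently even on utterances much longer than those seen in training, which is the failure mode the paper addresses.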