Author

Jonathan G. Fiscus

Bio: Jonathan G. Fiscus is an academic researcher from the National Institute of Standards and Technology. The author has contributed to research in topics: TRECVID & Closed captioning. The author has an h-index of 32 and has co-authored 60 publications receiving 7,264 citations.


Papers
Proceedings ArticleDOI
14 Dec 1997
TL;DR: Describes the NIST Recognizer Output Voting Error Reduction (ROVER) system, which produces a composite automatic speech recognition (ASR) output when the outputs of multiple ASR systems are available; in many cases the composite output has a lower error rate than any of the individual systems.
Abstract: Describes a system developed at NIST to produce a composite automatic speech recognition (ASR) system output when the outputs of multiple ASR systems are available, and for which, in many cases, the composite ASR output has a lower error rate than any of the individual systems. The system implements a "voting" or rescoring process to reconcile differences in ASR system outputs. We refer to this system as the NIST Recognizer Output Voting Error Reduction (ROVER) system. As additional knowledge sources are added to an ASR system (e.g. acoustic and language models), error rates are typically decreased. This paper describes a post-recognition process which models the output generated by multiple ASR systems as independent knowledge sources that can be combined and used to generate an output with a reduced error rate. To accomplish this, the outputs of multiple ASR systems are combined into a single, minimal-cost word transition network (WTN) via iterative applications of dynamic programming (DP) alignments. The resulting network is searched by an automatic rescoring or "voting" process that selects the output sequence with the lowest score.
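
To make the voting stage concrete, here is a minimal sketch in Python. It assumes the system outputs have already been aligned into equal-length word slots (the paper builds that alignment iteratively with DP over a word transition network) and uses ROVER's simplest scheme, voting by frequency of occurrence; the function name and data layout are illustrative, not NIST's code.

```python
from collections import Counter

# Sketch of ROVER's "voting" stage (illustrative, not NIST's implementation).
# Each hypothesis is a list of word slots produced by a prior alignment step;
# None marks a deletion in that slot.

def rover_vote(aligned_hyps):
    """Pick, for each slot, the word most systems agree on."""
    composite = []
    for slot in zip(*aligned_hyps):
        best_word, _ = Counter(slot).most_common(1)[0]
        if best_word is not None:        # a majority "deletion" emits nothing
            composite.append(best_word)
    return composite

hyps = [
    ["the", "cat", "sat",  None],
    ["the", "hat", "sat", "down"],
    ["the", "cat", "sat", "down"],
]
print(rover_vote(hyps))   # ['the', 'cat', 'sat', 'down']
```

The real system also supports voting schemes that weight word confidence scores, which is where the "lowest score" search in the abstract comes in.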

1,188 citations

01 Jan 2013
TL;DR: The TREC Video Retrieval Evaluation (TRECVID) 2012 was a TREC-style video analysis and retrieval evaluation whose goal remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation.
Abstract: The TREC Video Retrieval Evaluation (TRECVID) 2012 was a TREC-style video analysis and retrieval evaluation, the goal of which remains to promote progress in content-based exploitation of digital video via open, metrics-based evaluation. Over the last ten years this effort has yielded a better understanding of how systems can effectively accomplish such processing and how one can reliably benchmark their performance. TRECVID is funded by NIST and other US government agencies. Many organizations and individuals worldwide contribute significant time and effort.

582 citations

01 Jan 2006
TL;DR: The paper describes the evaluation task posed to Spoken Term Detection systems, the evaluation methodologies, the Arabic, English and Mandarin evaluation corpora, and the results of the evaluation.
Abstract: This paper presents the pilot evaluation of Spoken Term Detection technologies, held during the latter part of 2006. Spoken Term Detection systems rapidly detect the presence of a term, which is a sequence of words consecutively spoken, in a large audio corpus of heterogeneous speech material. The paper describes the evaluation task posed to Spoken Term Detection systems, the evaluation methodologies, the Arabic, English and Mandarin evaluation corpora, and the results of the evaluation. Ten participants submitted systems for the evaluation.
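
As a toy illustration of the matching step only, the sketch below finds consecutive occurrences of a multi-word term in a time-aligned 1-best transcript. Real STD systems search lattices or phonetic indexes rather than 1-best text, and the evaluation scored system-reported detections; the data layout here is an assumption for illustration.

```python
# Toy sketch of spoken term detection over a time-aligned word transcript
# (illustrative only; real systems search lattices/indexes, not 1-best text).

def find_term(transcript, term):
    """transcript: list of (word, start_sec, end_sec); term: list of words."""
    hits = []
    n = len(term)
    for i in range(len(transcript) - n + 1):
        window = transcript[i:i + n]
        if [w for w, _, _ in window] == term:
            hits.append((window[0][1], window[-1][2]))  # (start, end) of hit
    return hits

trans = [("spoken", 0.0, 0.4), ("term", 0.4, 0.7), ("detection", 0.7, 1.3)]
print(find_term(trans, ["spoken", "term"]))   # [(0.0, 0.7)]
```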

252 citations


Cited by
Proceedings Article
01 Jan 2002
TL;DR: The functionality of the SRILM toolkit is summarized and its design and implementation is discussed, highlighting ease of rapid prototyping, reusability, and combinability of tools.
Abstract: SRILM is a collection of C++ libraries, executable programs, and helper scripts designed to allow both production of and experimentation with statistical language models for speech recognition and other applications. SRILM is freely available for noncommercial purposes. The toolkit supports creation and evaluation of a variety of language model types based on N-gram statistics, as well as several related tasks, such as statistical tagging and manipulation of N-best lists and word lattices. This paper summarizes the functionality of the toolkit and discusses its design and implementation, highlighting ease of rapid prototyping, reusability, and combinability of tools.
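
SRILM itself is driven from the command line (its ngram-count and ngram tools train and score models). The Python below is only a conceptual sketch of the underlying N-gram computation, using add-one smoothing for brevity where SRILM offers far better discounting methods; it is not SRILM's API.

```python
from collections import Counter
import math

# Conceptual sketch of what an N-gram LM toolkit computes (not SRILM's API):
# train bigram counts, then score a sentence with add-one smoothing.

def train_bigrams(sentences):
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(toks)
        bigrams.update(zip(toks, toks[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    vocab = len(unigrams)
    toks = ["<s>"] + sentence.split() + ["</s>"]
    lp = 0.0
    for h, w in zip(toks, toks[1:]):
        lp += math.log((bigrams[(h, w)] + 1) / (unigrams[h] + vocab))
    return lp

uni, bi = train_bigrams(["the cat sat", "the cat ran"])
print(log_prob("the cat sat", uni, bi))
```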

4,904 citations

Journal ArticleDOI
TL;DR: This paper presents the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling; it observes that the studied hyperparameters are virtually independent and derives guidelines for their efficient adjustment.
Abstract: Several variants of the long short-term memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful functional ANalysis Of VAriance (fANOVA) framework. In total, we summarize the results of 5400 experimental runs (≈15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
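
To make the components the study ranks concrete, here is one step of a standard ("vanilla") LSTM cell in NumPy, with the forget gate and the output activation (tanh) that the paper found most critical marked in comments. Weight names and shapes are generic, not taken from the paper.

```python
import numpy as np

# One step of a standard LSTM cell. W_* / b_* are generic parameters,
# x is the input, (h, c) the previous hidden and cell state.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, W, b):
    z = np.concatenate([x, h])
    i = sigmoid(W["i"] @ z + b["i"])    # input gate
    f = sigmoid(W["f"] @ z + b["f"])    # forget gate (found most critical)
    o = sigmoid(W["o"] @ z + b["o"])    # output gate
    g = np.tanh(W["g"] @ z + b["g"])    # candidate cell update
    c_new = f * c + i * g
    h_new = o * np.tanh(c_new)          # output activation (also critical)
    return h_new, c_new

n_in, n_hid = 4, 3
rng = np.random.default_rng(0)
W = {k: rng.normal(size=(n_hid, n_in + n_hid)) for k in "ifog"}
b = {k: np.zeros(n_hid) for k in "ifog"}
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```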

4,746 citations

12 Sep 2016
TL;DR: Introduces WaveNet, a deep neural network for generating raw audio waveforms; the model can be efficiently trained on data with tens of thousands of samples per second of audio and can also be employed as a discriminative model, returning promising results for phoneme recognition.
Abstract: This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that it can be efficiently trained on data with tens of thousands of samples per second of audio. When applied to text-to-speech, it yields state-of-the-art performance, with human listeners rating it as significantly more natural sounding than the best parametric and concatenative systems for both English and Mandarin. A single WaveNet can capture the characteristics of many different speakers with equal fidelity, and can switch between them by conditioning on the speaker identity. When trained to model music, we find that it generates novel and often highly realistic musical fragments. We also show that it can be employed as a discriminative model, returning promising results for phoneme recognition.
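
The architectural idea that makes this conditioning tractable is stacking dilated causal convolutions, so each output sample depends only on past samples while the receptive field grows exponentially with depth. Below is a toy NumPy sketch of one such layer; the shapes, filter size, and dilation schedule are illustrative assumptions, not DeepMind's implementation.

```python
import numpy as np

# Toy dilated *causal* convolution: left-padding by the dilation keeps the
# layer causal, so output[t] depends only on input[<= t].

def causal_dilated_conv(x, w, dilation):
    """x: 1-D signal; w: 2-tap filter (past tap, current tap)."""
    past = np.concatenate([np.zeros(dilation), x[:-dilation]])  # shift right
    return w[0] * past + w[1] * x

x = np.arange(8, dtype=float)
y = x
for d in (1, 2, 4):               # exponentially growing receptive field
    y = np.tanh(causal_dilated_conv(y, np.array([0.5, 0.5]), d))
print(y)
```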

3,248 citations

Proceedings Article
01 Jan 2005
TL;DR: In this article, a modified, full gradient version of the LSTM learning algorithm was used for framewise phoneme classification, using the TIMIT database, and the results support the view that contextual information is crucial to speech processing, and suggest that bidirectional networks outperform unidirectional ones.
Abstract: In this paper, we present bidirectional Long Short Term Memory (LSTM) networks, and a modified, full gradient version of the LSTM learning algorithm. We evaluate Bidirectional LSTM (BLSTM) and several other network architectures on the benchmark task of framewise phoneme classification, using the TIMIT database. Our main findings are that bidirectional networks outperform unidirectional ones, and Long Short Term Memory (LSTM) is much faster and also more accurate than both standard Recurrent Neural Nets (RNNs) and time-windowed Multilayer Perceptrons (MLPs). Our results support the view that contextual information is crucial to speech processing, and suggest that BLSTM is an effective architecture with which to exploit it.
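
Bidirectionality itself is simple: run one recurrent pass forward and one backward over the sequence, then combine the two hidden states at each frame. In the minimal sketch below a plain tanh-RNN step stands in for the LSTM cell to keep it short; names and shapes are illustrative.

```python
import numpy as np

# Minimal sketch of the bidirectional idea: forward and backward recurrent
# passes, hidden states concatenated per frame for framewise classification.

def rnn_pass(xs, W_x, W_h):
    h = np.zeros(W_h.shape[0])
    out = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h @ h)
        out.append(h)
    return out

def bidirectional(xs, params_f, params_b):
    fwd = rnn_pass(xs, *params_f)
    bwd = rnn_pass(xs[::-1], *params_b)[::-1]   # backward pass, re-reversed
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

rng = np.random.default_rng(1)
xs = [rng.normal(size=5) for _ in range(4)]      # 4 frames, 5 features each
pf = (rng.normal(size=(3, 5)), rng.normal(size=(3, 3)))
pb = (rng.normal(size=(3, 5)), rng.normal(size=(3, 3)))
print(len(bidirectional(xs, pf, pb)[0]))         # 6 = 3 fwd + 3 bwd
```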

3,028 citations

Journal ArticleDOI
TL;DR: Journal version of the preceding framewise phoneme classification study: a modified, full gradient version of the LSTM learning algorithm was evaluated on the TIMIT database, with results supporting the view that contextual information is crucial to speech processing and that bidirectional networks outperform unidirectional ones.

2,200 citations