Home
/
Authors
/
Timothy J. Hazen

Author

Timothy J. Hazen

Other affiliations: Johns Hopkins University, Bundelkhand University, Vassar College ...read more

Bio: Timothy J. Hazen is an academic researcher from Microsoft. The author has contributed to research in topics: Speaker recognition & Speech processing. The author has an hindex of 34, co-authored 82 publications receiving 4738 citations. Previous affiliations of Timothy J. Hazen include Johns Hopkins University & Bundelkhand University.

Papers published on a yearly basis

2021
2020
2019
2018
2015
2013
2012
2011
2010
2009
2008
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1994
1993

Papers

PDF

Open Access

More filters

Journal Article•DOI•

JUPlTER: a telephone-based conversational interface for weather information

[...]

Victor W. Zue¹, Stephanie Seneff¹, James Glass¹, Joseph Polifroni¹, Christine Pao¹, Timothy J. Hazen¹, Lee Hetherington¹ - Show less +3 more•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2000-IEEE Transactions on Speech and Audio Processing

TL;DR: The purpose of this paper is to describe the development effort of JUPITER in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting.

...read moreread less

Abstract: In early 1997, our group initiated a project to develop JUPITER, a conversational interface that allows users to obtain worldwide weather forecast information over the telephone using spoken dialogue. It has served as the primary research platform for our group on many issues related to human language technology, including telephone-based speech recognition, robust language understanding, language generation, dialogue modeling, and multilingual interfaces. Over a two year period since coming online in May 1997, JUPITER has received, via a toll-free number in North America, over 30000 calls (totaling over 180000 utterances), mostly from naive users. The purpose of this paper is to describe our development effort in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting. We also present some evaluation results on the system and its components.

...read moreread less

697 citations

Journal Article•DOI•

Discriminative Training for Large-Vocabulary Speech Recognition Using Minimum Classification Error

[...]

Erik McDermott¹, Timothy J. Hazen², J.-P Le Roux³, Atsushi Nakamura¹, Shigeru Katagiri¹ - Show less +1 more•Institutions (3)

Nippon Telegraph and Telephone¹, Massachusetts Institute of Technology², Pierre-and-Marie-Curie University³

01 Jan 2007-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: This article reports significant gains in recognition performance and model compactness as a result of discriminative training based on MCE training applied to HMMs, in the context of three challenging large-vocabulary speech recognition tasks.

...read moreread less

Abstract: The minimum classification error (MCE) framework for discriminative training is a simple and general formalism for directly optimizing recognition accuracy in pattern recognition problems. The framework applies directly to the optimization of hidden Markov models (HMMs) used for speech recognition problems. However, few if any studies have reported results for the application of MCE training to large-vocabulary, continuous-speech recognition tasks. This article reports significant gains in recognition performance and model compactness as a result of discriminative training based on MCE training applied to HMMs, in the context of three challenging large-vocabulary (up to 100 k word) speech recognition tasks: the Corpus of Spontaneous Japanese lecture speech transcription task, a telephone-based name recognition task, and the MIT Jupiter telephone-based conversational weather information task. On these tasks, starting from maximum likelihood (ML) baselines, MCE training yielded relative reductions in word error ranging from 7% to 20%. Furthermore, this paper evaluates the use of different methods for optimizing the MCE criterion function, as well as the use of precomputed recognition lattices to speed up training. An overview of the MCE framework is given, with an emphasis on practical implementation issues

...read moreread less

581 citations

Proceedings Article•DOI•

Query-by-example spoken term detection using phonetic posteriorgram templates

[...]

Timothy J. Hazen¹, Wade Shen¹, Christopher White²•Institutions (2)

Massachusetts Institute of Technology¹, Johns Hopkins University²

01 Dec 2009

TL;DR: A query-by-example approach to spoken term detection in audio files designed for low-resource situations in which limited or no in-domain training material is available and accurate word-based speech recognition capability is unavailable.

...read moreread less

Abstract: This paper examines a query-by-example approach to spoken term detection in audio files. The approach is designed for low-resource situations in which limited or no in-domain training material is available and accurate word-based speech recognition capability is unavailable. Instead of using word or phone strings as search terms, the user presents the system with audio snippets of desired search terms to act as the queries. Query and test materials are represented using phonetic posteriorgrams obtained from a phonetic recognition system. Query matches in the test data are located using a modified dynamic time warping search between query templates and test utterances. Experiments using this approach are presented using data from the Fisher corpus.

...read moreread less

305 citations

Journal Article•DOI•

Robust Speaker Recognition in Noisy Conditions

[...]

Ji Ming¹, Timothy J. Hazen², James Glass², Douglas A. Reynolds²•Institutions (2)

Queen's University Belfast¹, Massachusetts Institute of Technology²

01 Jul 2007-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: This paper describes a method that combines multicondition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics, and is found to achieve lower error rates.

...read moreread less

Abstract: This paper investigates the problem of speaker identification and verification in noisy conditions, assuming that speech signals are corrupted by environmental noise, but knowledge about the noise characteristics is not available. This research is motivated in part by the potential application of speaker recognition technologies on handheld devices or the Internet. While the technologies promise an additional biometric layer of security to protect the user, the practical implementation of such systems faces many challenges. One of these is environmental noise. Due to the mobile nature of such systems, the noise sources can be highly time-varying and potentially unknown. This raises the requirement for noise robustness in the absence of information about the noise. This paper describes a method that combines multicondition model training and missing-feature theory to model noise with unknown temporal-spectral characteristics. Multicondition training is conducted using simulated noisy data with limited noise variation, providing a ldquocoarserdquo compensation for the noise, and missing-feature theory is applied to refine the compensation by ignoring noise variation outside the given training conditions, thereby reducing the training and testing mismatch. This paper is focused on several issues relating to the implementation of the new model for real-world applications. These include the generation of multicondition training data to model noisy speech, the combination of different training data to optimize the recognition performance, and the reduction of the model's complexity. The new algorithm was tested using two databases with simulated and realistic noisy speech data. The first database is a redevelopment of the TIMIT database by rerecording the data in the presence of various noise types, used to test the model for speaker identification with a focus on the varieties of noise. The second database is a handheld-device database collected in realistic noisy conditions, used to further validate the model for real-world speaker verification. The new model is compared to baseline systems and is found to achieve lower error rates.

...read moreread less

277 citations

Journal Article•DOI•

Recognition confidence scoring and its use in speech understanding systems

[...]

Timothy J. Hazen¹, Stephanie Seneff¹, Joseph Polifroni¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Jan 2002-Computer Speech & Language

TL;DR: This paper presents an approach to recognition confidence scoring and a set of techniques for integrating confidence scores into the understanding and dialogue components of a speech understanding system and demonstrates a relative reduction in concept error rate.

...read moreread less

179 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17

Collapse

Cited by

PDF

Open Access

More filters

Proceedings Article•DOI•

Librispeech: An ASR corpus based on public domain audio books

[...]

Vassil Panayotov¹, Guoguo Chen¹, Daniel Povey¹, Sanjeev Khudanpur¹•Institutions (1)

Johns Hopkins University¹

19 Apr 2015

TL;DR: It is shown that acoustic models trained on LibriSpeech give lower error rate on the Wall Street Journal (WSJ) test sets than models training on WSJ itself.

...read moreread less

Abstract: This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks that are part of the LibriVox project, and contains 1000 hours of speech sampled at 16 kHz. We have made the corpus freely available for download, along with separately prepared language-model training data and pre-built language models. We show that acoustic models trained on LibriSpeech give lower error rate on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself. We are also releasing Kaldi scripts that make it easy to build these systems.

...read moreread less

4,770 citations

Journal Article•DOI•

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

[...]

George E. Dahl¹, Dong Yu², Li Deng², Alex Acero²•Institutions (2)

University of Toronto¹, Microsoft²

01 Jan 2012-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output that can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs.

...read moreread less

Abstract: We propose a novel context-dependent (CD) model for large-vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively that can aid in optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CD-DNN-HMMs to LVSR, and analyze the effects of various modeling choices on performance. Experiments on a challenging business search dataset demonstrate that CD-DNN-HMMs can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs, with an absolute sentence accuracy improvement of 5.8% and 9.2% (or relative error reduction of 16.0% and 23.2%) over the CD-GMM-HMMs trained using the minimum phone error rate (MPE) and maximum-likelihood (ML) criteria, respectively.

...read moreread less

3,120 citations

Book•

Distributed Systems: Principles and Paradigms

[...]

Andrew S. Tanenbaum, Maarten van Steen¹•Institutions (1)

VU University Amsterdam¹

01 Jan 2001

TL;DR: Intended for use in a senior/graduate level distributed systems course or by professionals, this text systematically shows how distributed systems are designed and implemented in real systems.

...read moreread less

Abstract: From the Publisher: Andrew Tanenbaum and Maarten van Steen cover the principles, advanced concepts, and technologies of distributed systems in detail, including: communication, replication, fault tolerance, and security. Intended for use in a senior/graduate level distributed systems course or by professionals, this text systematically shows how distributed systems are designed and implemented in real systems. Written in the superb writing style of other Tanenbaum books, the material also features unique accessibility and a wide variety of real-world examples and case studies, such as NFS v4, CORBA, DOM, Jini, and the World Wide Web. FEATURES Detailed coverage of seven key principles. An introductory chapter followed by a chapter devoted to each key principle: communication, processes, naming, synchronization, consistency and replication, fault tolerance, and security, including unique comprehensive coverage of middleware models. Four chapters devoted to state-of-the-art real-world examples of middleware. Covers object-based systems, document-based systems, distributed file systems, and coordination-based systems including CORBA, DCOM, Globe, NFS v4, Coda, the World Wide Web, and Jini. Excellent coverage of timely, advanced, distributed systems topics: Security, payment systems, recent Internet and Web protocols, scalability, and caching and replication. NEW-The Prentice Hall Companion Website for this book contains PowerPoint slides, figures in various file formats, and other teaching aids, and a link to the author's Web site.

...read moreread less

2,011 citations

Patent•

Intelligent Automated Assistant

[...]

Thomas R. Gruber¹, Adam Cheyer¹, Dag Kittlaus¹, Didier Rene Guzzoni¹, Christopher Dean Brigham¹, Richard Donald Giuli¹, Marcello Bastea-Forte¹, Harry J. Saddler¹ - Show less +4 more•Institutions (1)

Apple Inc.¹

11 Jan 2011

TL;DR: In this article, an intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions.

...read moreread less

Abstract: An intelligent automated assistant system engages with the user in an integrated, conversational manner using natural language dialog, and invokes external services when appropriate to obtain information or perform various actions. The system can be implemented using any of a number of different platforms, such as the web, email, smartphone, and the like, or any combination thereof. In one embodiment, the system is based on sets of interrelated domains and tasks, and employs additional functionally powered by external services with which the system can interact.

...read moreread less

1,462 citations

Journal Article•DOI•

An overview of text-independent speaker recognition: From features to supervectors

[...]

Tomi Kinnunen¹, Haizhou Li²•Institutions (2)

University of Eastern Finland¹, Institute for Infocomm Research Singapore²

01 Jan 2010-Speech Communication

TL;DR: This paper starts with the fundamentals of automatic speaker recognition, concerning feature extraction and speaker modeling and elaborate advanced computational techniques to address robustness and session variability.

...read moreread less

1,433 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse