Author

Victor W. Zue

Bio: Victor W. Zue is an academic researcher at the Massachusetts Institute of Technology. His research focuses on topics such as spoken language and natural language. He has an h-index of 36 and has co-authored 147 publications receiving 9,097 citations.


Papers
Dataset
01 Jan 1993
TL;DR: The TIMIT corpus contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences, including time-aligned orthographic, phonetic, and word transcriptions as well as a 16-bit, 16 kHz speech waveform file for each utterance.
Abstract: The TIMIT corpus of read speech is designed to provide speech data for acoustic-phonetic studies and for the development and evaluation of automatic speech recognition systems. TIMIT contains broadband recordings of 630 speakers of eight major dialects of American English, each reading ten phonetically rich sentences. The TIMIT corpus includes time-aligned orthographic, phonetic, and word transcriptions as well as a 16-bit, 16 kHz speech waveform file for each utterance. Corpus design was a joint effort among the Massachusetts Institute of Technology (MIT), SRI International (SRI), and Texas Instruments, Inc. (TI). The speech was recorded at TI, transcribed at MIT, and verified and prepared for CD-ROM production by the National Institute of Standards and Technology (NIST). The TIMIT corpus transcriptions have been hand verified. Test and training subsets, balanced for phonetic and dialectal coverage, are specified. Tabular computer-searchable information is included as well as written documentation.
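Because the transcriptions are time-aligned at the sample level, a few lines of code suffice to recover phone boundaries in seconds. Below is a minimal sketch, assuming TIMIT's usual .PHN layout (one "begin-sample end-sample phone" triple per line) and the corpus's 16 kHz sample rate; the file path is illustrative.

```python
# Minimal sketch of reading a TIMIT time-aligned phonetic transcription (.PHN).
# Assumes the standard layout: one "begin_sample end_sample phone" triple per
# line, with sample indices at the corpus's 16 kHz rate. The path is illustrative.

SAMPLE_RATE = 16000  # TIMIT waveforms are 16-bit, 16 kHz

def read_phn(path):
    """Return (start_sec, end_sec, phone) triples from a .PHN file."""
    segments = []
    with open(path) as f:
        for line in f:
            begin, end, phone = line.split()
            segments.append((int(begin) / SAMPLE_RATE,
                             int(end) / SAMPLE_RATE,
                             phone))
    return segments

for start, end, phone in read_phn("TRAIN/DR1/FCJF0/SA1.PHN"):
    print(f"{phone:>4s}  {start:6.3f}-{end:6.3f} s")
```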

2,096 citations

Book
01 Dec 1997

829 citations

Journal ArticleDOI
TL;DR: The purpose of this paper is to describe the development effort of JUPITER in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting.
Abstract: In early 1997, our group initiated a project to develop JUPITER, a conversational interface that allows users to obtain worldwide weather forecast information over the telephone using spoken dialogue. It has served as the primary research platform for our group on many issues related to human language technology, including telephone-based speech recognition, robust language understanding, language generation, dialogue modeling, and multilingual interfaces. Over a two-year period since coming online in May 1997, JUPITER has received, via a toll-free number in North America, over 30,000 calls (totaling over 180,000 utterances), mostly from naive users. The purpose of this paper is to describe our development effort in terms of the underlying human language technologies as well as other system-related issues such as utterance rejection and content harvesting. We also present some evaluation results on the system and its components.
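Utterance rejection, one of the system issues the paper highlights, amounts to declining to act on low-confidence recognition output. The sketch below illustrates the idea only; the threshold, the Hypothesis shape, and the prompts are assumptions, not JUPITER's actual scoring.

```python
# Toy illustration of utterance rejection: accept a recognition hypothesis
# only if its confidence clears a threshold; otherwise ask the caller to
# rephrase. The threshold value and Hypothesis shape are assumptions, not
# taken from the JUPITER paper.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    text: str
    confidence: float  # e.g. a normalized recognizer score in [0, 1]

REJECT_THRESHOLD = 0.55  # in a real system, tuned on held-out calls

def handle(hyp: Hypothesis) -> str:
    if hyp.confidence < REJECT_THRESHOLD:
        return "Sorry, I didn't catch that. Could you repeat your request?"
    return f"Looking up: {hyp.text}"

print(handle(Hypothesis("what is the weather in Boston", 0.91)))
print(handle(Hypothesis("whether boss town", 0.32)))
```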

697 citations

Proceedings Article
01 Jan 1997
TL;DR: The past decade has witnessed the emergence of a new breed of human-computer interfaces that combines several human language technologies to enable humans to converse with computers using spoken dialogue for information access, creation, and processing.
Abstract: The past decade has witnessed the emergence of a new breed of human-computer interfaces that combines several human language technologies to enable humans to converse with computers using spoken dialogue for information access, creation and processing. In this paper, we introduce the nature of these conversational interfaces and describe the underlying human language technologies on which they are based. After summarizing some of the recent progress in this area around the world, we discuss development issues faced by researchers creating these kinds of systems and present some of the ongoing and unmet research challenges in this field.
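The "several human language technologies" these interfaces combine are conventionally chained: recognition, understanding, dialogue management, generation, synthesis. A schematic sketch of that decomposition follows; every component is a stub, and the names are illustrative rather than taken from the paper.

```python
# Schematic pipeline behind a spoken-dialogue interface: speech recognition,
# language understanding, dialogue management, language generation, synthesis.
# Every component here is a stub; the decomposition itself is the point.

def recognize(audio: bytes) -> str:            # ASR: audio -> word string
    return "will it rain in seattle tomorrow"

def understand(words: str) -> dict:            # NLU: words -> meaning frame
    return {"intent": "weather", "city": "seattle", "date": "tomorrow"}

def manage(frame: dict, state: dict) -> dict:  # dialogue manager: choose action
    state.update(frame)
    return {"act": "inform", "forecast": "rain likely"}

def generate(action: dict) -> str:             # NLG: action -> sentence
    return f"Tomorrow in Seattle: {action['forecast']}."

def synthesize(text: str) -> bytes:            # TTS stub
    return text.encode()

state: dict = {}
audio_in = b"..."  # caller's speech
print(synthesize(generate(manage(understand(recognize(audio_in)), state))))
```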

359 citations

Proceedings Article
01 Jan 1998
TL;DR: This paper documents the changes to GALAXY that led to this first reference architecture, which uses a scripting language for flow control to provide flexible interaction among the servers, along with a set of libraries to support rapid prototyping of new servers.
Abstract: GALAXY is a client-server architecture for accessing on-line information using spoken dialogue that we introduced at ICSLP94. It has served as the testbed for developing human language technologies for our group for several years. Recently, we have initiated a significant redesign of the GALAXY architecture to make it easier for many researchers to develop their own applications, using either exclusively their own servers or intermixing them with servers developed by others. This redesign was done in part due to the fact that GALAXY has been designated as the first reference architecture for the new DARPA Communicator Program. The purpose of this paper is to document the changes to GALAXY that led to this first reference architecture, which makes use of a scripting language for flow control to provide flexible interaction among the servers, and a set of libraries to support rapid prototyping of new servers. We describe the new reference architecture in some detail, and report on the current status of its development.
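The core idea of the redesign, a central hub whose flow control lives in a script rather than in the servers, can be caricatured in a few lines. The sketch below is an invented approximation; the actual GALAXY Communicator hub scripting language is far richer than a simple server list.

```python
# Toy hub-and-spoke dispatcher in the spirit of the GALAXY redesign: servers
# register handlers with a hub, and a small "script" lists which server runs
# at each step. The script format and server names are invented here.

class Hub:
    def __init__(self):
        self.servers = {}

    def register(self, name, handler):
        self.servers[name] = handler

    def run(self, script, frame):
        for server_name in script:                    # flow control lives in
            frame = self.servers[server_name](frame)  # the script, not servers
        return frame

hub = Hub()
hub.register("recognizer", lambda f: {**f, "words": "weather in boston"})
hub.register("parser",     lambda f: {**f, "intent": "weather", "city": "boston"})
hub.register("backend",    lambda f: {**f, "answer": "sunny, 25 C"})
hub.register("generator",  lambda f: {**f, "reply": f"It is {f['answer']} in {f['city']}."})

script = ["recognizer", "parser", "backend", "generator"]
print(hub.run(script, {"audio": "..."})["reply"])
```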

315 citations


Cited by
Journal ArticleDOI
TL;DR: This paper presents the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling; it observes that the studied hyperparameters are virtually independent and derives guidelines for their efficient adjustment.
Abstract: Several variants of the long short-term memory (LSTM) architecture for recurrent neural networks have been proposed since its inception in 1995. In recent years, these networks have become the state-of-the-art models for a variety of machine learning problems. This has led to a renewed interest in understanding the role and utility of various computational components of typical LSTM variants. In this paper, we present the first large-scale analysis of eight LSTM variants on three representative tasks: speech recognition, handwriting recognition, and polyphonic music modeling. The hyperparameters of all LSTM variants for each task were optimized separately using random search, and their importance was assessed using the powerful functional ANalysis Of VAriance framework. In total, we summarize the results of 5400 experimental runs (≈15 years of CPU time), which makes our study the largest of its kind on LSTM networks. Our results show that none of the variants can improve upon the standard LSTM architecture significantly, and demonstrate the forget gate and the output activation function to be its most critical components. We further observe that the studied hyperparameters are virtually independent and derive guidelines for their efficient adjustment.
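For reference, one step of the standard (vanilla) LSTM cell that the study found so hard to beat is sketched below, including the forget gate and tanh output activation the paper identifies as its most critical components; the weight shapes follow the usual stacked-gate convention, and the initialization is arbitrary.

```python
# One step of a standard (vanilla) LSTM cell, the baseline the study found
# hard to improve on. The forget gate f and the tanh output activation are
# the two components the paper identifies as most critical.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """x: input (n_in,); h_prev/c_prev: previous hidden/cell state (n_hid,).
    W: (4*n_hid, n_in), U: (4*n_hid, n_hid), b: (4*n_hid,), stacked as i,f,o,g."""
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)   # input, forget, output gates
    c = f * c_prev + i * np.tanh(g)                # cell state update
    h = o * np.tanh(c)                             # tanh output activation
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 8, 16
W = rng.normal(0, 0.1, (4 * n_hid, n_in))
U = rng.normal(0, 0.1, (4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h, c = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, U, b)
print(h.shape, c.shape)  # (16,) (16,)
```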

4,746 citations

Book
01 Jan 2000
TL;DR: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora, to demonstrate how the same algorithm can be used for speech recognition and word-sense disambiguation.
Abstract: From the Publisher: This book takes an empirical approach to language processing, based on applying statistical and other machine-learning algorithms to large corpora. Methodology boxes are included in each chapter, and each chapter is built around one or more worked examples that demonstrate its main idea. The book covers the fundamental algorithms of the various fields, whether originally proposed for spoken or written language, to demonstrate how the same algorithm can be used for both speech recognition and word-sense disambiguation. It emphasizes web and other practical applications as well as scientific evaluation, and it is useful as a reference for professionals in any of the areas of speech and language processing.

3,794 citations

Book
01 Dec 1999
TL;DR: It is now clear that HAL's creator, Arthur C. Clarke, was a little optimistic in predicting when an artificial agent such as HAL would be available.
Abstract: …is one of the most recognizable characters in 20th century cinema. HAL is an artificial agent capable of such advanced language behavior as speaking and understanding English, and at a crucial moment in the plot, even reading lips. It is now clear that HAL's creator, Arthur C. Clarke, was a little optimistic in predicting when an artificial agent such as HAL would be available. But just how far off was he? What would it take to create at least the language-related parts of HAL? We call programs like HAL that converse with humans in natural…

3,077 citations

ReportDOI
TL;DR: This paper found that computer capital substitutes for workers in performing cognitive and manual tasks that can be accomplished by following explicit rules, and complements workers in non-routine problem-solving and complex communications tasks.
Abstract: We apply an understanding of what computers do to study how computerization alters job skill demands. We argue that computer capital (1) substitutes for workers in performing cognitive and manual tasks that can be accomplished by following explicit rules; and (2) complements workers in performing nonroutine problem-solving and complex communications tasks. Provided these tasks are imperfect substitutes, our model implies measurable changes in the composition of job tasks, which we explore using representative data on task input for 1960 to 1998. We find that within industries, occupations and education groups, computerization is associated with reduced labor input of routine manual and routine cognitive tasks and increased labor input of nonroutine cognitive tasks. Translating task shifts into education demand, the model can explain sixty percent of the estimated relative demand shift favoring college labor during 1970 to 1998. Task changes within nominally identical occupations account for almost half of this impact.

2,843 citations

Book
09 Feb 2012
TL;DR: This thesis contributes a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.
Abstract: Recurrent neural networks are powerful sequence learners. They are able to incorporate context information in a flexible way, and are robust to localised distortions of the input data. These properties make them well suited to sequence labelling, where input sequences are transcribed with streams of labels. The aim of this thesis is to advance the state-of-the-art in supervised sequence labelling with recurrent networks. Its two main contributions are (1) a new type of output layer that allows recurrent networks to be trained directly for sequence labelling tasks where the alignment between the inputs and the labels is unknown, and (2) an extension of the long short-term memory network architecture to multidimensional data, such as images and video sequences.
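The first contribution is the connectionist temporal classification (CTC) output layer, whose central device is a many-to-one collapse from per-frame label paths to label sequences: merge repeated labels, then delete blanks. A minimal sketch of that mapping follows; the full loss additionally sums path probabilities with a forward-backward recursion, omitted here.

```python
# The many-to-one mapping at the heart of the CTC output layer: collapse a
# frame-level label path by merging repeats and then deleting blanks. The
# full CTC loss sums probabilities over all paths that collapse to the
# target sequence, via a forward-backward recursion not shown here.
from itertools import groupby

BLANK = "_"

def ctc_collapse(path):
    """Map a frame-level label path to its label sequence."""
    merged = [label for label, _ in groupby(path)]         # 1) merge repeats
    return [label for label in merged if label != BLANK]   # 2) drop blanks

# Both frame-level paths below decode to the same word, "cat":
print(ctc_collapse(list("cc_aaa_t")))   # ['c', 'a', 't']
print(ctc_collapse(list("_c_a__tt")))   # ['c', 'a', 't']
```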

2,101 citations