Author

Victor W. Zue

Bio: Victor W. Zue is an academic researcher from the Massachusetts Institute of Technology. The author has contributed to research on topics including spoken language and audio mining. The author has an h-index of 7 and has co-authored 12 publications receiving 753 citations.

Papers
Journal Article · DOI
TL;DR: The experiences of MIT researchers in collecting two large speech databases with somewhat complementary objectives, TIMIT and VOYAGER, are described.

570 citations

Book Chapter · DOI
01 Jan 1996
TL;DR: This chapter describes the transcription and alignment, performed at MIT, of the TIMIT database, which consists of 6,300 sentences from 630 speakers, represents over 5 hours of speech material, and was recorded by researchers at TI.
Abstract (publisher summary): The TIMIT acoustic-phonetic database was designed jointly by researchers at MIT, TI, and SRI. It was intended to provide a rich collection of acoustic-phonetic and phonological data, to be used for basic research as well as for the development and evaluation of speech recognition systems. A total of 450 MIT sentences are used in the TIMIT database. These were generated by hand in an iterative fashion, with the goal of making them phonetically rich. To aid the sentence generation process, Webster's Pocket Dictionary, which contains nearly 20,000 words, was used: words or word sequences containing particular phone pairs could be retrieved from this dictionary automatically, which greatly facilitated the database design process. The database consists of a total of 6,300 sentences from 630 speakers, representing over 5 hours of speech material, and was recorded by researchers at TI. This chapter describes the transcription and alignment of the TIMIT database, which was performed at MIT.

105 citations
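The dictionary lookup described in this abstract, retrieving words that contain a given phone pair, amounts to an inverted index from phone pairs to words. A minimal Python sketch, assuming a tiny hypothetical lexicon with ARPAbet-style pronunciations; the entries, `build_phone_pair_index`, and `words_with_pair` are illustrative and not taken from the chapter, and the sketch indexes only pairs inside a word, whereas the original process also covered word sequences:

```python
from collections import defaultdict

# Hypothetical miniature lexicon: word -> phone sequence (ARPAbet-style symbols).
# The actual design process used Webster's Pocket Dictionary (~20,000 words);
# these pronunciations are assumptions for illustration only.
LEXICON = {
    "she": ["sh", "iy"],
    "had": ["hh", "ae", "d"],
    "dark": ["d", "aa", "r", "k"],
    "suit": ["s", "uw", "t"],
    "greasy": ["g", "r", "iy", "s", "iy"],
}


def build_phone_pair_index(lexicon):
    """Map each within-word adjacent phone pair to the set of words containing it."""
    index = defaultdict(set)
    for word, phones in lexicon.items():
        for left, right in zip(phones, phones[1:]):
            index[(left, right)].add(word)
    return index


def words_with_pair(index, left, right):
    """Return the words containing the phone pair, sorted for stable output."""
    # .get() avoids inserting empty entries into the defaultdict
    return sorted(index.get((left, right), set()))


index = build_phone_pair_index(LEXICON)
print(words_with_pair(index, "r", "iy"))
print(words_with_pair(index, "aa", "r"))
```

A sentence designer could query such an index for phone pairs not yet covered by the current sentence set and pick candidate words from the result.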

Proceedings Article · DOI
01 Mar 1984
TL;DR: A system for automatically aligning phonetic transcriptions with continuous speech has been developed; in an evaluation, 93% of the segments are mapped to only one phoneme, and in 70% of cases the offset between the boundary found by the system and the one marked by a hand transcriber is less than 10 ms.
Abstract: A system for automatic alignment of phonetic transcriptions with continuous speech has been developed. The speech signal is first segmented into broad classes using a non-parametric pattern classifier. A knowledge-based dynamic programming algorithm then aligns the broad classes with the phonetic transcriptions. These broad classes provide "islands of reliability" for more detailed segmentation and refinement of boundaries. By doing alignment at the phonetic level, the system can often tolerate inter- and intra-speaker variability. The system was evaluated on sixty sentences spoken by three speakers, two male and one female. In that evaluation, 93% of the segments are mapped to only one phoneme, and 70% of the time the offset between the boundary found by the automatic alignment system and the one marked by a hand transcriber is less than 10 ms. The performance can be improved by applying more heuristic rules.

64 citations
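The two-stage idea in this abstract (coarse broad-class segments, then dynamic programming against the phoneme string) can be sketched as a monotonic alignment in which each segment absorbs one or more consecutive phonemes. This is a generic Python illustration, not the paper's knowledge-based algorithm: the broad-class table, the mismatch cost, and `MERGE_PENALTY` are assumptions, and boundary times are ignored.

```python
import math

# Hypothetical phoneme -> broad-class table; the paper's actual classes are not listed here.
BROAD_CLASS = {
    "aa": "sonorant", "iy": "sonorant", "ae": "sonorant",
    "r": "sonorant", "l": "sonorant", "w": "sonorant",
    "m": "nasal", "n": "nasal",
    "s": "fricative", "sh": "fricative", "f": "fricative",
    "d": "stop", "t": "stop", "k": "stop",
}
MERGE_PENALTY = 0.5  # assumed cost for each extra phoneme absorbed by one segment


def group_cost(seg_class, phones):
    """Cost of mapping one broad-class segment onto a run of consecutive phonemes."""
    mismatches = sum(BROAD_CLASS[p] != seg_class for p in phones)
    return mismatches + MERGE_PENALTY * (len(phones) - 1)


def align(segments, phones):
    """Monotonic DP: each segment covers >= 1 consecutive phonemes; each phoneme is covered once.

    Returns a list of (segment_index, [phonemes]) pairs.
    """
    n_seg, n_ph = len(segments), len(phones)
    if n_seg > n_ph:
        raise ValueError("this sketch needs at least one phoneme per segment")
    cost = [[math.inf] * (n_ph + 1) for _ in range(n_seg + 1)]
    back = [[0] * (n_ph + 1) for _ in range(n_seg + 1)]
    cost[0][0] = 0.0
    for i in range(1, n_seg + 1):
        for j in range(i, n_ph + 1):
            # Try letting segment i-1 cover phones[k:j]
            for k in range(i - 1, j):
                c = cost[i - 1][k] + group_cost(segments[i - 1], phones[k:j])
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, k
    # Backtrace from the state where all segments and phonemes are consumed
    result, j = [], n_ph
    for i in range(n_seg, 0, -1):
        k = back[i][j]
        result.append((i - 1, phones[k:j]))
        j = k
    return result[::-1]


def fraction_single_phoneme(alignment):
    """Share of segments mapped to exactly one phoneme."""
    return sum(len(group) == 1 for _, group in alignment) / len(alignment)


alignment = align(["stop", "sonorant", "stop"], ["d", "aa", "r", "k"])
print(alignment)
print(fraction_single_phoneme(alignment))
```

For the word "dark" (d aa r k) with detected segments stop, sonorant, stop, the sketch assigns aa and r to the single sonorant segment, so two of the three segments map to exactly one phoneme, the same kind of statistic as the 93% figure in the abstract.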

Proceedings Article · DOI
21 Mar 1993
TL;DR: In this paper, the VOYAGER spoken language system was ported to Japanese, and the structure of the system was reorganized so that language-dependent information is separated from the core engine as much as possible.
Abstract: This paper describes our initial efforts at porting the VOYAGER spoken language system to Japanese. In the process we have reorganized the structure of the system so that language-dependent information is separated from the core engine as much as possible. For example, this information is encoded in tabular or rule-based form for the natural language understanding and generation components. The internal system manager, discourse and dialogue component, and database are all maintained in language-transparent form. Once the generation component was ported, data were collected from 40 native speakers of Japanese using a wizard collection paradigm. A portion of these data was used to train the natural language and segment-based speech recognition components. The system obtained an overall understanding accuracy of 52% on the test data, which is similar to our previously reported results for English [1].

30 citations
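The separation described in this abstract, a language-transparent core that emits meaning representations and per-language tables that render them, can be illustrated with a small table-driven generator. A minimal Python sketch; `Frame`, the template and lexicon tables, and the toy database are hypothetical and do not reflect VOYAGER's actual data structures:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical language-independent meaning representation produced by the core."""
    act: str
    category: str
    street: str


# Hypothetical language-dependent tables: generation templates and a small lexicon.
TEMPLATES = {
    "en": {"nearest": "The nearest {category} is on {street}."},
    "ja": {"nearest": "一番近い{category}は{street}にあります。"},
}
LEXICON = {
    "en": {"restaurant": "restaurant", "bank": "bank"},
    "ja": {"restaurant": "レストラン", "bank": "銀行"},
}


def core_answer(category: str) -> Frame:
    """Stand-in for the language-transparent core: a toy lookup that returns a frame."""
    database = {"restaurant": "Massachusetts Avenue", "bank": "Main Street"}  # made-up data
    return Frame(act="nearest", category=category, street=database[category])


def generate(frame: Frame, lang: str) -> str:
    """Render a frame using only the tables for the requested language."""
    template = TEMPLATES[lang][frame.act]
    return template.format(category=LEXICON[lang][frame.category], street=frame.street)


frame = core_answer("bank")
print(generate(frame, "en"))
print(generate(frame, "ja"))
```

In this sketch, adding a language means adding entries to `TEMPLATES` and `LEXICON`; `core_answer` and `Frame` stay unchanged.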


Cited by
Journal Article · DOI
TL;DR: This review examines how common paralinguistic speech characteristics are affected by depression and suicidality, and how this information is applied in classification and prediction systems.

607 citations

Journal Article · DOI
TL;DR: The model successfully performed speech segmentation, word discovery and visual categorization from spontaneous infant-directed speech paired with video images of single objects, demonstrating the possibility of using state-of-the-art techniques from sensory pattern recognition and machine learning to implement cognitive models which can process raw sensor data without the need for human transcription or labeling.

591 citations

Journal Article · DOI
TL;DR: The experiences of MIT researchers in collecting two large speech databases with somewhat complementary objectives, TIMIT and VOYAGER, are described.

570 citations

Book
14 Jan 2010
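TL;DR: This volume surveys language technology in thirteen chapters covering spoken and written language input, language analysis and understanding, language generation, spoken output, discourse and dialogue, document processing, multilinguality, multimodality, transmission and storage, mathematical methods, language resources, and evaluation, followed by a glossary.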
Abstract:
1. Spoken language input: Ronald Cole, Victor Zue, Wayne Ward, Melvyn J. Hunt, Richard M. Stern, Renato De Mori, Fabio Brugnara, Salim Roukos, Sadaoki Furui and Patti Price
2. Written language input: Joseph Mariani, Sargur N. Srihari, Rohini K. Srihari, Richard G. Casey, Abdel Belaid, Claudie Faure, Eric Lecolinet, Isabelle Guyon, Colin Warwick and Rejean Plamondon
3. Language analysis and understanding: Annie Zaenen, Hans Uszkoreit, Fred Karlsson, Lauri Karttunen, Antonio Sanfilippo, Stephen F. Pulman, Fernando Pereira and Ted Briscoe
4. Language generation: Hans Uszkoreit, Eduard Hovy, Gertjan van Noord, Gunter Neumann and John Bateman
5. Spoken output technologies: Ronald Cole, Yoshinori Sagisaka, Christophe d'Alessandro, Jean-Sylvain Lienard, Richard Sproat, Kathleen R. McKeown and Johanna D. Moore
6. Discourse and dialogue: Hans Uszkoreit, Barbara Grosz, Donia Scott, Hans Kamp, Phil Cohen and Egidio Giachin
7. Document processing: Annie Zaenen, Per-Kristian Halvorsen, Donna Harman, Peter Schauble, Alan Smeaton, Paul Jacobs, Karen Sparck Jones, Robert Dale, Richard H. Wojcik and James E. Hoard
8. Multilinguality: Annie Zaenen, Martin Kay, Christian Boitet, Christian Fluhr, Alexander Waibel, Yeshwant K. Muthusamy and A. Lawrence Spitz
9. Multimodality: Joseph Mariani, James L. Flanagan, Gerard Ligozat, Wolfgang Wahlster, Yacine Bellik, Alan J. Goldschen, Christian Benoit, Dominic W. Massaro and Michael M. Cohen
10. Transmission and storage: Victor Zue, Isabel Trancoso, Bishnu S. Atal, Nikil S. Jayant and Dirk Van Compernolle
11. Mathematical methods: Ronald Cole, Hans Uszkoreit, Steve Levinson, John Makhoul, Aravind Joshi, Herve Bourlard, Nelson Morgan, Ronald M. Kaplan and John Bridle
12. Language resources: Ronald Cole, Antonio Zampolli, Eva Ejerhed, Ken Church, Lori Lamel, Ralph Grishman, Nicoletta Calzolari, Christian Galinski and Gerhard Budin
13. Evaluation: Joseph Mariani, Lynette Hirschman, Henry S. Thompson, Beth Sundheim, John Hutchins, Ezra Black, Margaret King, David S. Pallett, Adrian Fourcin, Louis C. W. Pols, Sharon Oviatt, Herman J. M. Steeneken and Junichi Kanai
Glossary; Citation index; Index.

569 citations

Journal Article · DOI
TL;DR: It was found that talkers with larger vowel spaces were generally more intelligible than talkers with reduced spaces, and that a substantial portion of the variability in normal speech intelligibility is traceable to specific acoustic-phonetic characteristics of the talker.

535 citations
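The TL;DR above does not say how vowel space was measured; one common measure is the area of the polygon formed by the corner vowels in the F1-F2 plane, computed with the shoelace formula. A Python sketch under that assumption, with made-up formant values for two hypothetical talkers (not data from the paper):

```python
# Hypothetical corner-vowel formants (F1, F2) in Hz for two talkers; illustrative only.
CORNER_VOWEL_ORDER = ["i", "ae", "aa", "u"]  # traversal order around the quadrilateral

TALKER_A = {"i": (300, 2300), "ae": (700, 1800), "aa": (750, 1100), "u": (320, 900)}
TALKER_B = {"i": (350, 2000), "ae": (600, 1700), "aa": (650, 1200), "u": (370, 1100)}


def polygon_area(points):
    """Shoelace formula for a simple polygon whose vertices are given in order."""
    total = 0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def vowel_space_area(formants):
    """Area (Hz^2) of the corner-vowel polygon in the F2 x F1 plane."""
    points = [(formants[v][1], formants[v][0]) for v in CORNER_VOWEL_ORDER]
    return polygon_area(points)


for name, talker in [("A", TALKER_A), ("B", TALKER_B)]:
    print(name, vowel_space_area(talker))
```

With these invented values, talker A has the larger vowel space; under the finding summarized above, such a talker would be expected to be more intelligible, though the sketch itself says nothing about intelligibility.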