Author

Stephanie Seneff

Bio: Stephanie Seneff is an academic researcher at the Massachusetts Institute of Technology. The author has contributed to research in topics: Spoken language & Natural language. The author has an h-index of 4 and has co-authored 5 publications receiving 610 citations.

Papers
Journal ArticleDOI
TL;DR: The experiences of researchers at MIT in the collection of two large speech databases, TIMIT and VOYAGER, are described; the two efforts have somewhat complementary objectives.

570 citations

Dissertation
01 Jan 1985
TL;DR: The approach of the thesis is to process the incoming speech signal through a system which models what is known about peripheral auditory processing, and then to apply a synchrony measure to accentuate the spectral attributes that are known to be important for the identification of the phonetic content of speech.
Abstract: There has been a substantial interest in the last few decades in the problem of training computers to recognize human speech. In spite of the concentrated efforts of conscientious teams of researchers, however, the solution remains elusive, unless the task is kept so restricted as to be uninteresting. These discouraging results may be due in part to the fact that researchers in the past paid little attention to models for human processing of auditory signals to guide the design of speech front-end processing strategies. The picture is rapidly changing at the present time, although we have not yet realized any direct benefits from the available models. Voiced speech sounds are characterized in the spectral domain by prominent peaks at specific frequencies that correspond to certain resonances in the vocal tract. The frequencies of these "formants" convey most of the information necessary to identify the phonetic content. The peripheral level of the auditory system performs a frequency analysis, but also compresses the dynamic range of input stimuli. The net effect is to reduce the prominence of spectral peaks, relative to those obtained through standard Fourier analysis. Recent research on the response of a large population of auditory nerve fibers in the cat's ear to speech-like stimuli [Sachs and Young, 1979, 1980] has demonstrated that mean rate response alone does not in general convey adequate information to show clearly the frequencies of the formants. However, a significant amount of information is retained in the patterns of firing which is lost when a simple count of the number of spikes per unit time is derived. Sachs and Young, and others, have suggested that a form of processing that measures the synchrony in the response to certain periodicities may be able to accentuate peaks in the spectrum.
This thesis concerns the development of a specific strategy for such synchrony detection, and its application to the two separate tasks of spectral analysis and estimation of the fundamental frequency of voicing. The approach of the thesis is to process the incoming speech signal through a system which models what is known about peripheral auditory processing, and then to apply a synchrony measure to accentuate the spectral attributes that are known to be important for the identification of the phonetic content of speech. The design of the synchrony measure is motivated in large part by a preconceived notion of what represents a 'good' result. The main criteria were that peaks in the original speech spectrum should be preserved, but amplitude information, particularly general

65 citations

Proceedings ArticleDOI
21 Mar 1993
TL;DR: In this paper, the VOYAGER spoken language system was ported to Japanese, and the structure of the system was reorganized so that language-dependent information is separated from the core engine as much as possible.
Abstract: This paper describes our initial efforts at porting the VOYAGER spoken language system to Japanese. In the process we have reorganized the structure of the system so that language dependent information is separated from the core engine as much as possible. For example, this information is encoded in tabular or rule-based form for the natural language understanding and generation components. The internal system manager, discourse and dialogue component, and database are all maintained in language transparent form. Once the generation component was ported, data were collected from 40 native speakers of Japanese using a wizard collection paradigm. A portion of these data was used to train the natural language and segment-based speech recognition components. The system obtained an overall understanding accuracy of 52% on the test data, which is similar to our earlier reported results for English [1].
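The separation the abstract describes, with language-dependent information in tables or rules and a language-transparent core, might look schematically like the following. This is purely illustrative; the actual VOYAGER internals are not given in the abstract, and every name below is hypothetical.

```python
# Hypothetical sketch: per-language generation tables kept apart from a core
# engine that knows only semantic frame types, loosely in the spirit of the
# table-driven generation component the paper describes.
TEMPLATES = {
    "en": {"distance": "{place} is about {km} km away."},
    "ja": {"distance": "{place}まで約{km}キロです。"},
}

def generate(frame, lang):
    """Language-transparent core: looks up a language table by frame type."""
    return TEMPLATES[lang][frame["type"]].format(**frame["slots"])

frame = {"type": "distance", "slots": {"place": "MIT", "km": 2}}
print(generate(frame, "en"))  # -> MIT is about 2 km away.
print(generate(frame, "ja"))
```

Porting to a new language then means adding one table rather than touching the engine, which is the point of the reorganization reported in the paper.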

30 citations

Proceedings Article
01 Jan 1993
TL;DR: This paper describes the initial efforts at porting the VOYAGER spoken language system to Japanese, and has reorganized the structure of the system so that language dependent information is separated from the core engine as much as possible.

16 citations

Proceedings ArticleDOI
31 Oct 1991
TL;DR: Spoken language interfaces offer significant benefits over conventional user interfaces for certain classes of applications, particularly hands-busy or eyes-busy applications, where typed input and/or visual displays may not be possible or convenient.
Abstract: This paper describes research on spoken language interfaces for interactive problem solving. A spoken language interface combines speech recognition technology with language understanding technology to provide an application-specific interface. The interface converts acoustic input (speech) into a series of words, which are interpreted to produce the appropriate response and/or action. The system response may be spoken, or it may be in the form of a display, as appropriate to the needs of the user. Spoken language interfaces offer significant benefits over conventional user interfaces for certain classes of applications, particularly hands-busy or eyes-busy applications, where typed input and/or visual displays may not be possible or convenient. To illustrate this, we present two examples of spoken language interfaces developed at MIT: an interactive system for urban navigation, VOYAGER, and an air travel planning system, ATIS. The VOYAGER system currently runs in a few times real time and is able to provide answers for more than 50% of user queries for untrained users.
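The pipeline the abstract outlines, acoustic input to words to interpretation to response, can be made concrete with stubbed stages. This is a toy sketch of the architecture, not either system's implementation; real recognizers and parsers are far more involved, and the stub outputs are invented.

```python
# Illustrative stages of a spoken language interface; each stage is a stub.
def recognize(audio: bytes) -> list[str]:
    """Speech recognition: acoustic input -> word sequence (stubbed)."""
    return ["how", "far", "is", "mit"]

def understand(words: list[str]) -> dict:
    """Language understanding: words -> application-specific meaning."""
    if "far" in words:
        return {"intent": "distance", "place": words[-1]}
    return {"intent": "unknown"}

def respond(meaning: dict) -> str:
    """Produce a spoken or displayed response, as appropriate to the user."""
    if meaning["intent"] == "distance":
        return f"Looking up the distance to {meaning['place']}."
    return "Sorry, I did not understand."

print(respond(understand(recognize(b""))))
```

Keeping the stages behind narrow interfaces like this is what lets the same understanding and response components serve either spoken or displayed output.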

Cited by
Journal ArticleDOI
TL;DR: In this article, a constant Q transform with a constant ratio of center frequency to resolution has been proposed to obtain a constant pattern in the frequency domain for sounds with harmonic frequency components.
Abstract: The frequencies that have been chosen to make up the scale of Western music are geometrically spaced. Thus the discrete Fourier transform (DFT), although extremely efficient in its fast Fourier transform implementation, yields components which do not map efficiently to musical frequencies, because the frequency components calculated with the DFT are separated by a constant frequency difference and computed with a constant resolution. A calculation similar to a discrete Fourier transform but with a constant ratio of center frequency to resolution has been made; this is a constant Q transform, equivalent to a 1/24-oct filter bank. Thus there are two frequency components for each musical note, so that two adjacent notes in the musical scale played simultaneously can be resolved anywhere in the musical frequency range. This transform has been plotted against log(frequency) to obtain a constant pattern in the frequency domain for sounds with harmonic frequency components. This is compared to the conventio...
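The recipe in the abstract can be sketched directly: geometrically spaced center frequencies, a constant quality factor Q = f/Δf, and 24 bins per octave so that each semitone gets two components. The following is a naive O(N·K) illustration of that idea, not the paper's efficient algorithm, and the parameter values (f_min, window choice) are arbitrary assumptions.

```python
import numpy as np

def cqt(x, fs, f_min=55.0, bins_per_octave=24, n_octaves=3):
    """Naive constant-Q transform: geometrically spaced bins, each analyzed
    over a window whose length keeps Q = f_k / delta_f constant."""
    Q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)   # constant quality factor
    out = []
    for k in range(bins_per_octave * n_octaves):
        f_k = f_min * 2 ** (k / bins_per_octave)   # geometric spacing
        n_k = int(np.ceil(Q * fs / f_k))           # window shrinks as f_k grows
        n = np.arange(n_k)
        kernel = np.hamming(n_k) * np.exp(-2j * np.pi * f_k * n / fs)
        out.append(np.abs(np.dot(x[:n_k], kernel)) / n_k)
    return np.array(out)

fs = 8000
t = np.arange(fs) / fs                 # one second of signal
x = np.sin(2 * np.pi * 220 * t)        # A3 at 220 Hz
spectrum = cqt(x, fs)
peak_bin = int(np.argmax(spectrum))
# With f_min = 55 Hz and 24 bins/octave, 220 Hz (two octaves up) lands at bin 48.
print(peak_bin)
```

Because the window length n_k scales as 1/f_k, every bin has the same ratio of center frequency to bandwidth, which is exactly what makes the pattern of a harmonic sound shift-invariant on a log-frequency axis.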

890 citations

Journal Article
TL;DR: Alk-3-en-1-ols are produced in good yields from isobutylene and formaldehyde in the presence of organic carboxylic acid salts of Group IB metals.
Abstract: The yield of alkenols and cycloalkenols is substantially improved by carrying out the reaction of olefins with formaldehyde in the presence of selected catalysts. In accordance with one embodiment, alk-3-en-1-ols are produced in good yields from isobutylene and formaldehyde in the presence of organic carboxylic acid salts of Group IB metals.

851 citations

Journal ArticleDOI
TL;DR: Reviews how common paralinguistic speech characteristics are affected by depression and suicidality, and how this information can be applied in classification and prediction systems.

607 citations

Book
14 Jan 2010
TL;DR: A multi-author survey covering spoken and written language input, language analysis and understanding, language generation, spoken output technologies, discourse and dialogue, document processing, multilinguality, multimodality, mathematical methods, language resources, and evaluation.
Abstract:
1. Spoken language input: Ronald Cole, Victor Zue, Wayne Ward, Melvyn J. Hunt, Richard M. Stern, Renato De Mori, Fabio Brugnara, Salim Roukos, Sadaoki Furui and Patti Price
2. Written language input: Joseph Mariani, Sargur N. Srihari, Rohini K. Srihari, Richard G. Casey, Abdel Belaid, Claudie Faure, Eric Lecolinet, Isabelle Guyon, Colin Warwick and Rejean Plamondon
3. Language analysis and understanding: Annie Zaenen, Hans Uszkoreit, Fred Karlsson, Lauri Karttunen, Antonio Sanfilippo, Stephen F. Pulman, Fernando Pereira and Ted Briscoe
4. Language generation: Hans Uszkoreit, Eduard Hovy, Gertjan van Noord, Gunter Neumann and John Bateman
5. Spoken output technologies: Ronald Cole, Yoshinori Sagisaka, Christophe d'Alessandro, Jean-Sylvain Lienard, Richard Sproat, Kathleen R. McKeown and Johanna D. Moore
6. Discourse and dialogue: Hans Uszkoreit, Barbara Grosz, Donia Scott, Hans Kamp, Phil Cohen and Egidio Giachin
7. Document processing: Annie Zaenen, Per-Kristian Halvorsen, Donna Harman, Peter Schauble, Alan Smeaton, Paul Jacobs, Karen Sparck Jones, Robert Dale, Richard H. Wojcik and James E. Hoard
8. Multilinguality: Annie Zaenen, Martin Kay, Christian Boitet, Christian Fluhr, Alexander Waibel, Yeshwant K. Muthusamy and A. Lawrence Spitz
9. Multimodality: Joseph Mariani, James L. Flanagan, Gerard Ligozat, Wolfgang Wahlster, Yacine Bellik, Alan J. Goldschen, Christian Benoit, Dominic W. Massaro and Michael M. Cohen
10. Transmission and storage: Victor Zue, Isabel Trancoso, Bishnu S. Atal, Nikil S. Jayant and Dirk Van Compernolle
11. Mathematical methods: Ronald Cole, Hans Uszkoreit, Steve Levinson, John Makhoul, Aravind Joshi, Herve Bourlard, Nelson Morgan, Ronald M. Kaplan and John Bridle
12. Language resources: Ronald Cole, Antonio Zampolli, Eva Ejerhed, Ken Church, Lori Lamel, Ralph Grishman, Nicoletta Calzolari, Christian Galinski and Gerhard Budin
13. Evaluation: Joseph Mariani, Lynette Hirschman, Henry S. Thompson, Beth Sundheim, John Hutchins, Ezra Black, Margaret King, David S. Pallett, Adrian Fourcin, Louis C. W. Pols, Sharon Oviatt, Herman J. M. Steeneken and Junichi Kanai
Glossary. Citation index. Index.

569 citations

Journal ArticleDOI
TL;DR: It was found that talkers with larger vowel spaces were generally more intelligible than talkers with reduced spaces, and that a substantial portion of variability in normal speech intelligibility is traceable to specific acoustic-phonetic characteristics of the talker.

535 citations