Author

Stephanie Seneff

Bio: Stephanie Seneff is an academic researcher at the Massachusetts Institute of Technology. The author has contributed to research in topics: Spoken language & Natural language. The author has an h-index of 4 and has co-authored 5 publications receiving 610 citations.

Papers
Journal ArticleDOI
TL;DR: The experiences of researchers at MIT in the collection of two large speech databases, TIMIT and VOYAGER, are described; the two efforts have somewhat complementary objectives.

570 citations

Dissertation
01 Jan 1985
TL;DR: The approach of the thesis is to process the incoming speech signal through a system which models what is known about peripheral auditory processing, and then to apply a synchrony measure to accentuate the spectral attributes that are known to be important for the identification of the phonetic content of speech.
Abstract: There has been a substantial interest in the last few decades in the problem of training computers to recognize human speech. In spite of the concentrated efforts of conscientious teams of researchers, however, the solution remains elusive, unless the task is kept so restricted as to be uninteresting. These discouraging results may be due in part to the fact that researchers in the past paid little attention to models for human processing of auditory signals to guide the design of speech front-end processing strategies. The picture is rapidly changing at the present time, although we have not yet realized any direct benefits from the available models. Voiced speech sounds are characterized in the spectral domain by prominent peaks at specific frequencies that correspond to certain resonances in the vocal tract. The frequencies of these "formants" convey most of the information necessary to identify the phonetic content. The peripheral level of the auditory system performs a frequency analysis, but also compresses the dynamic range of input stimuli. The net effect is to reduce the prominence of spectral peaks, relative to those obtained through standard Fourier analysis. Recent research on the response of a large population of auditory nerve fibers in the cat's ear to speech-like stimuli [Sachs and Young, 1979, 1980] has demonstrated that mean rate response alone does not in general convey adequate information to show clearly the frequencies of the formants. However, a significant amount of information is retained in the patterns of firing which is lost when a simple count of the number of spikes per unit time is derived. Sachs and Young, and others, have suggested that a form of processing that measures the synchrony in the response to certain periodicities may be able to accentuate peaks in the spectrum.
This thesis concerns the development of a specific strategy for such synchrony detection, and its application to the two separate tasks of spectral analysis and estimation of the fundamental frequency of voicing. The approach of the thesis is to process the incoming speech signal through a system which models what is known about peripheral auditory processing, and then to apply a synchrony measure to accentuate the spectral attributes that are known to be important for the identification of the phonetic content of speech. The design of the synchrony measure is motivated in large part by a preconceived notion of what represents a 'good' result. The main criteria were that peaks in the original speech spectrum should be preserved, but amplitude information, particularly general

65 citations

Proceedings ArticleDOI
21 Mar 1993
TL;DR: In this paper, the VOYAGER spoken language system was ported to Japanese, and the structure of the system was reorganized so that language-dependent information is separated from the core engine as much as possible.
Abstract: This paper describes our initial efforts at porting the VOYAGER spoken language system to Japanese. In the process we have reorganized the structure of the system so that language dependent information is separated from the core engine as much as possible. For example, this information is encoded in tabular or rule-based form for the natural language understanding and generation components. The internal system manager, discourse and dialogue component, and database are all maintained in language transparent form. Once the generation component was ported, data were collected from 40 native speakers of Japanese using a wizard collection paradigm. A portion of these data was used to train the natural language and segment-based speech recognition components. The system obtained an overall understanding accuracy of 52% on the test data, which is similar to our earlier reported results for English [1].
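The separation the abstract describes, with language-dependent information in tables or rules and a language-transparent core, might look schematically like the following. This is purely illustrative; the actual VOYAGER internals are not given in the abstract, and every name below is hypothetical.

```python
# Hypothetical sketch: per-language generation tables kept apart from a core
# engine that knows only semantic frame types, loosely in the spirit of the
# table-driven generation component the paper describes.
TEMPLATES = {
    "en": {"distance": "{place} is about {km} km away."},
    "ja": {"distance": "{place}まで約{km}キロです。"},
}

def generate(frame, lang):
    """Language-transparent core: looks up a language table by frame type."""
    return TEMPLATES[lang][frame["type"]].format(**frame["slots"])

frame = {"type": "distance", "slots": {"place": "MIT", "km": 2}}
print(generate(frame, "en"))  # -> MIT is about 2 km away.
print(generate(frame, "ja"))
```

Porting to a new language then means adding one table rather than touching the engine, which is the point of the reorganization reported in the paper.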

30 citations

Proceedings Article
01 Jan 1993
TL;DR: This paper describes the initial efforts at porting the VOYAGER spoken language system to Japanese, and has reorganized the structure of the system so that language dependent information is separated from the core engine as much as possible.

16 citations

Proceedings ArticleDOI
31 Oct 1991
TL;DR: Spoken language interfaces offer significant benefits over conventional user interfaces for certain classes of applications, particularly hands-busy or eyes-busy applications, where typed input and/or visual displays may not be possible or convenient.
Abstract: This paper describes research on spoken language interfaces for interactive problem solving. A spoken language interface combines speech recognition technology with language understanding technology to provide an application-specific interface. The interface converts acoustic input (speech) into a series of words, which are interpreted to produce the appropriate response and/or action. The system response may be spoken, or it may be in the form of a display, as appropriate to the needs of the user. Spoken language interfaces offer significant benefits over conventional user interfaces for certain classes of applications, particularly hands-busy or eyes-busy applications, where typed input and/or visual displays may not be possible or convenient. To illustrate this, we present two examples of spoken language interfaces developed at MIT: an interactive system for urban navigation, VOYAGER, and an air travel planning system, ATIS. The VOYAGER system currently runs in a few times real time and is able to provide answers for more than 50% of user queries for untrained users.
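The pipeline the abstract outlines, acoustic input to words to interpretation to response, can be made concrete with stubbed stages. This is a toy sketch of the architecture, not either system's implementation; real recognizers and parsers are far more involved, and the stub outputs are invented.

```python
# Illustrative stages of a spoken language interface; each stage is a stub.
def recognize(audio: bytes) -> list[str]:
    """Speech recognition: acoustic input -> word sequence (stubbed)."""
    return ["how", "far", "is", "mit"]

def understand(words: list[str]) -> dict:
    """Language understanding: words -> application-specific meaning."""
    if "far" in words:
        return {"intent": "distance", "place": words[-1]}
    return {"intent": "unknown"}

def respond(meaning: dict) -> str:
    """Produce a spoken or displayed response, as appropriate to the user."""
    if meaning["intent"] == "distance":
        return f"Looking up the distance to {meaning['place']}."
    return "Sorry, I did not understand."

print(respond(understand(recognize(b""))))
```

Keeping the stages behind narrow interfaces like this is what lets the same understanding and response components serve either spoken or displayed output.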

Cited by
Journal ArticleDOI
TL;DR: In this article, a constant Q transform with a constant ratio of center frequency to resolution has been proposed to obtain a constant pattern in the frequency domain for sounds with harmonic frequency components.
Abstract: The frequencies that have been chosen to make up the scale of Western music are geometrically spaced. Thus the discrete Fourier transform (DFT), although extremely efficient in its fast Fourier transform implementation, yields components which do not map efficiently to musical frequencies, because the frequency components calculated with the DFT are separated by a constant frequency difference and computed with a constant resolution. A calculation similar to a discrete Fourier transform but with a constant ratio of center frequency to resolution has been made; this is a constant Q transform, equivalent to a 1/24-oct filter bank. Thus there are two frequency components for each musical note, so that two adjacent notes in the musical scale played simultaneously can be resolved anywhere in the musical frequency range. This transform has been plotted against log(frequency) to obtain a constant pattern in the frequency domain for sounds with harmonic frequency components. This is compared to the conventio...
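The recipe in the abstract can be sketched directly: geometrically spaced center frequencies, a constant quality factor Q = f/Δf, and 24 bins per octave so that each semitone gets two components. The following is a naive O(N·K) illustration of that idea, not the paper's efficient algorithm, and the parameter values (f_min, window choice) are arbitrary assumptions.

```python
import numpy as np

def cqt(x, fs, f_min=55.0, bins_per_octave=24, n_octaves=3):
    """Naive constant-Q transform: geometrically spaced bins, each analyzed
    over a window whose length keeps Q = f_k / delta_f constant."""
    Q = 1.0 / (2 ** (1.0 / bins_per_octave) - 1)   # constant quality factor
    out = []
    for k in range(bins_per_octave * n_octaves):
        f_k = f_min * 2 ** (k / bins_per_octave)   # geometric spacing
        n_k = int(np.ceil(Q * fs / f_k))           # window shrinks as f_k grows
        n = np.arange(n_k)
        kernel = np.hamming(n_k) * np.exp(-2j * np.pi * f_k * n / fs)
        out.append(np.abs(np.dot(x[:n_k], kernel)) / n_k)
    return np.array(out)

fs = 8000
t = np.arange(fs) / fs                 # one second of signal
x = np.sin(2 * np.pi * 220 * t)        # A3 at 220 Hz
spectrum = cqt(x, fs)
peak_bin = int(np.argmax(spectrum))
# With f_min = 55 Hz and 24 bins/octave, 220 Hz (two octaves up) lands at bin 48.
print(peak_bin)
```

Because the window length n_k scales as 1/f_k, every bin has the same ratio of center frequency to bandwidth, which is exactly what makes the pattern of a harmonic sound shift-invariant on a log-frequency axis.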

890 citations

Journal Article
TL;DR: Alk-3-en-1-ols are produced in good yields from isobutylene and formaldehyde in the presence of organic carboxylic acid salts of Group IB metals.
Abstract: The yield of alkenols and cycloalkenols is substantially improved by carrying out the reaction of olefins with formaldehyde in the presence of selected catalysts. In accordance with one embodiment, alk-3-en-1-ols are produced in good yields from isobutylene and formaldehyde in the presence of organic carboxylic acid salts of Group IB metals.

851 citations

Journal ArticleDOI
TL;DR: Reviews how common paralinguistic speech characteristics are affected by depression and suicidality, and how this information can be applied in classification and prediction systems.

607 citations

Book
14 Jan 2010
TL;DR: A multi-author survey covering spoken and written language input, language analysis and understanding, language generation, spoken output technologies, discourse and dialogue, document processing, multilinguality, multimodality, mathematical methods, language resources, and evaluation.
Abstract:
1. Spoken language input: Ronald Cole, Victor Zue, Wayne Ward, Melvyn J. Hunt, Richard M. Stern, Renato De Mori, Fabio Brugnara, Salim Roukos, Sadaoki Furui and Patti Price
2. Written language input: Joseph Mariani, Sargur N. Srihari, Rohini K. Srihari, Richard G. Casey, Abdel Belaid, Claudie Faure, Eric Lecolinet, Isabelle Guyon, Colin Warwick and Rejean Plamondon
3. Language analysis and understanding: Annie Zaenen, Hans Uszkoreit, Fred Karlsson, Lauri Karttunen, Antonio Sanfilippo, Stephen F. Pulman, Fernando Pereira and Ted Briscoe
4. Language generation: Hans Uszkoreit, Eduard Hovy, Gertjan van Noord, Gunter Neumann and John Bateman
5. Spoken output technologies: Ronald Cole, Yoshinori Sagisaka, Christophe d'Alessandro, Jean-Sylvain Lienard, Richard Sproat, Kathleen R. McKeown and Johanna D. Moore
6. Discourse and dialogue: Hans Uszkoreit, Barbara Grosz, Donia Scott, Hans Kamp, Phil Cohen and Egidio Giachin
7. Document processing: Annie Zaenen, Per-Kristian Halvorsen, Donna Harman, Peter Schauble, Alan Smeaton, Paul Jacobs, Karen Sparck Jones, Robert Dale, Richard H. Wojcik and James E. Hoard
8. Multilinguality: Annie Zaenen, Martin Kay, Christian Boitet, Christian Fluhr, Alexander Waibel, Yeshwant K. Muthusamy and A. Lawrence Spitz
9. Multimodality: Joseph Mariani, James L. Flanagan, Gerard Ligozat, Wolfgang Wahlster, Yacine Bellik, Alan J. Goldschen, Christian Benoit, Dominic W. Massaro and Michael M. Cohen
10. Transmission and storage: Victor Zue, Isabel Trancoso, Bishnu S. Atal, Nikil S. Jayant and Dirk Van Compernolle
11. Mathematical methods: Ronald Cole, Hans Uszkoreit, Steve Levinson, John Makhoul, Aravind Joshi, Herve Bourlard, Nelson Morgan, Ronald M. Kaplan and John Bridle
12. Language resources: Ronald Cole, Antonio Zampolli, Eva Ejerhed, Ken Church, Lori Lamel, Ralph Grishman, Nicoletta Calzolari, Christian Galinski and Gerhard Budin
13. Evaluation: Joseph Mariani, Lynette Hirschman, Henry S. Thompson, Beth Sundheim, John Hutchins, Ezra Black, Margaret King, David S. Pallett, Adrian Fourcin, Louis C. W. Pols, Sharon Oviatt, Herman J. M. Steeneken and Junichi Kanai
Glossary. Citation index. Index.

569 citations

Journal ArticleDOI
TL;DR: It was found that talkers with larger vowel spaces were generally more intelligible than talkers with reduced spaces, and that a substantial portion of variability in normal speech intelligibility is traceable to specific acoustic-phonetic characteristics of the talker.

535 citations