Conference

International Conference on Spoken Language Processing

About: The International Conference on Spoken Language Processing is an academic conference. It publishes mainly in the areas of speech processing and speech synthesis. Over its lifetime, the conference has published 802 papers, which have received 19,622 citations.

Topics: Speech processing, Speech synthesis, Speaker recognition, Hidden Markov model, Natural language

Papers

Proceedings Article

A compact model for speaker-adaptive training

Tasos Anastasakos¹, J. McDonough, Richard Schwartz, John Makhoul

Northeastern University¹

03 Oct 1996

TL;DR: A novel approach to estimating the parameters of continuous density HMMs for speaker-independent (SI) continuous speech recognition that jointly annihilates the inter-speaker variation and estimates the HMM parameters of the SI acoustic models.

Abstract: We formulate a novel approach to estimating the parameters of continuous density HMMs for speaker-independent (SI) continuous speech recognition. It is motivated by the fact that variability in SI acoustic models is attributed to both phonetic variation and variation among the speakers of the training population, the latter being independent of the information content of the speech signal. These two variation sources are decoupled, and the proposed method jointly annihilates the inter-speaker variation and estimates the HMM parameters of the SI acoustic models. We compare the proposed training algorithm to the common SI training paradigm within the context of supervised adaptation. We show that the proposed acoustic models are more efficiently adapted to the test speakers, thus achieving significant overall word error rate reductions of 19% and 25% for 20K and 5K vocabulary tasks respectively.

586 citations

Proceedings Article

Recognizing emotion in speech

Frank Dellaert¹, Thomas Polzin, Alex Waibel

Carnegie Mellon University¹

03 Oct 1996

TL;DR: A new method of extracting prosodic features from speech, based on a smoothing spline approximation of the pitch contour, is presented; combined with majority voting of subspace specialists, it yields classification performance that is close to human performance on the task.

Abstract: The paper explores several statistical pattern recognition techniques to classify utterances according to their emotional content. The authors have recorded a corpus containing emotional speech with over 1000 utterances from different speakers. They present a new method of extracting prosodic features from speech, based on a smoothing spline approximation of the pitch contour. To make maximal use of the limited amount of training data available, they introduce a novel pattern recognition technique: majority voting of subspace specialists. Using this technique, they obtain classification performance that is close to human performance on the task.
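
The abstract names "majority voting of subspace specialists" without giving details. As a rough illustration only, the sketch below assumes each specialist is a classifier that sees a subset of the feature vector and casts one vote, with the most common label winning. The `Specialist` type, the `majority_vote` function, and the tie-breaking behaviour are hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a specialist is a pair of
# (indices of the features it looks at, classifier over that subspace).
Specialist = Tuple[Sequence[int], Callable[[List[float]], str]]


def majority_vote(specialists: Sequence[Specialist], features: Sequence[float]) -> str:
    """Return the label chosen by the largest number of specialists.

    Assumption: every specialist has equal weight. Ties are resolved by
    Counter.most_common, i.e. in favour of the label that was voted first.
    """
    votes = Counter()
    for indices, classify in specialists:
        # Project the full feature vector onto this specialist's subspace.
        subspace = [features[i] for i in indices]
        votes[classify(subspace)] += 1
    return votes.most_common(1)[0][0]
```

A caller would build the list of specialists from whatever classifiers and feature subsets are available, then pass one feature vector per utterance to `majority_vote`.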

521 citations

Proceedings Article

The MBROLA project: towards a set of high quality speech synthesizers free of use for non commercial purposes

Thierry Dutoit, Vincent Pagel¹, Nicolas Pierret, F. Bataille, O. van der Vrecken

Faculté polytechnique de Mons¹

03 Oct 1996

TL;DR: The MBROLA project, initiated by the Faculté Polytechnique de Mons (Belgium), aims to obtain a set of speech synthesizers for as many voices, languages and dialects as possible, free of use for non-commercial and non-military applications.

Abstract: The aim of the MBROLA project, initiated by the Faculté Polytechnique de Mons (Belgium), is to obtain a set of speech synthesizers for as many voices, languages and dialects as possible, free of use for non-commercial and non-military applications. The ultimate goal is to boost academic research on speech synthesis, and particularly on prosody generation, known as one of the biggest challenges taken up by text-to-speech synthesizers for the years to come. Central to the MBROLA project is MBROLA 2.00, a speech synthesizer based on the concatenation of diphones. Executable files of this synthesizer have been made freely available for many computers/operating systems, as well as a first diphone database for a French male voice. We describe the terms of participation in the project, as a user, as an associated developer, or as a database provider.

505 citations

Proceedings Article

A form-based dialogue manager for spoken language applications

David Goddeau¹, Helen Meng, J. Polifroni, Stephanie Seneff, Senis Busayapongchai

Massachusetts Institute of Technology¹

03 Oct 1996

TL;DR: An alternative dialogue planning algorithm based on the notion of filling in an electronic form, or "E-form", is applied to searching a database of used car advertisements; the authors found over 70% compliance in answering specific system prompts.

Abstract: A popular approach to dialogue management is based on a finite state model, where user utterances trigger transitions between the dialogue states, and these states, in turn, determine the system's response. The paper describes an alternative dialogue planning algorithm based on the notion of filling in an electronic form, or "E-form". Each slot has associated prompts that guide the user through the dialogue, and a priority that determines the order in which the system tries to acquire information. These slots can be optional or mandatory. However, the user is not restricted to following the system's lead, and is free to ignore the prompts and take the initiative in the dialogue. The E-form based dialogue planner has been used in an application to search a database of used car advertisements. The goal is to assist the user in selecting, from this database, a small list of cars which match their constraints. For a large number of dialogues collected from over 600 naive users, we found over 70% compliance in answering specific system prompts.
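
The slot mechanism described in the abstract (prompts, priorities, optional versus mandatory slots, and user initiative) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Slot` and `EForm` names, the convention that a lower priority number is asked first, and the choice to prompt only for mandatory slots are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class Slot:
    name: str
    prompt: str
    priority: int            # assumption: lower number = asked earlier
    mandatory: bool = True
    value: Optional[str] = None


class EForm:
    """Hypothetical E-form: a set of slots filled in over the dialogue."""

    def __init__(self, slots: Iterable[Slot]) -> None:
        self.slots: Dict[str, Slot] = {s.name: s for s in slots}

    def update(self, parsed: Dict[str, str]) -> None:
        # The user may take the initiative: fill every slot mentioned in
        # the utterance, whether or not the system prompted for it.
        for name, value in parsed.items():
            if name in self.slots:
                self.slots[name].value = value

    def is_complete(self) -> bool:
        # Assumption: the form is ready once all mandatory slots are filled.
        return all(s.value is not None for s in self.slots.values() if s.mandatory)

    def next_prompt(self) -> Optional[str]:
        # Prompt for the unfilled mandatory slot with the best priority;
        # return None when nothing mandatory is missing.
        missing = [s for s in self.slots.values() if s.mandatory and s.value is None]
        if not missing:
            return None
        return min(missing, key=lambda s: s.priority).prompt
```

In contrast to a finite state model, nothing here fixes the order of user input: `update` accepts any subset of slots at any turn, and `next_prompt` simply recomputes what is still missing.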

330 citations

Journal Article

Phonotactics and Syllable Stress: Implications for the Processing of Spoken Nonsense Words

Michael S. Vitevitch¹, Paul A. Luce¹, Jan Charles‐Luce¹, David Kemmerer¹

University at Buffalo¹

01 Jan 1997

TL;DR: Two experiments using bisyllabic CVCCVC nonsense words that varied in phonotactic probability and stress placement were conducted to examine the influences of phonotactic and metrical information on spoken word recognition.

Abstract: Two experiments using bisyllabic CVCCVC nonsense words that varied in phonotactic probability and stress placement were conducted to examine the influences of phonotactic and metrical information on spoken word recognition. Experiment 1 examined participants' intuitions about the phonological "goodness" of nonsense words. Experiment 2 examined processing times for the same stimuli in a speeded auditory repetition task. The results of both studies provide further evidence that the phonotactic configuration and stress placement of spoken stimuli have important implications for the representation and processing of spoken words.

313 citations

Performance metrics

Papers: 802
Citations: 19,622

No. of papers from the Conference in previous years
Year	Papers
2015	1
2008	1
2007	3
2006	21
2004	19
2002	42