Conference

International Conference on Spoken Language Processing 

About: International Conference on Spoken Language Processing is an academic conference. It publishes mainly in the areas of speech processing and speech synthesis. Over its lifetime, the conference has published 802 papers, which have received 19622 citations.


Papers
Proceedings Article
03 Oct 1996
TL;DR: A novel approach to estimating the parameters of continuous density HMMs for speaker-independent (SI) continuous speech recognition that jointly annihilates the inter-speaker variation and estimates the HMM parameters of the SI acoustic models.
Abstract: We formulate a novel approach to estimating the parameters of continuous density HMMs for speaker-independent (SI) continuous speech recognition. It is motivated by the fact that variability in SI acoustic models is attributed to both phonetic variation and variation among the speakers of the training population, that is independent of the information content of the speech signal. These two variation sources are decoupled and the proposed method jointly annihilates the inter-speaker variation and estimates the HMM parameters of the SI acoustic models. We compare the proposed training algorithm to the common SI training paradigm within the context of supervised adaptation. We show that the proposed acoustic models are more efficiently adapted to the test speakers, thus achieving significant overall word error rate reductions of 19% and 25% for 20K and 5K vocabulary tasks respectively.

586 citations

Proceedings Article
03 Oct 1996
TL;DR: A new method of extracting prosodic features from speech, based on a smoothing spline approximation of the pitch contour, is presented, which obtains classification performance that is close to human performance on the task.
Abstract: The paper explores several statistical pattern recognition techniques to classify utterances according to their emotional content. The authors have recorded a corpus containing emotional speech with over 1000 utterances from different speakers. They present a new method of extracting prosodic features from speech, based on a smoothing spline approximation of the pitch contour. To make maximal use of the limited amount of training data available, they introduce a novel pattern recognition technique: majority voting of subspace specialists. Using this technique, they obtain classification performance that is close to human performance on the task.

521 citations

Proceedings Article
03 Oct 1996
TL;DR: The aim of the MBROLA project, initiated by the Faculte Polytechnique de Mons (Belgium), is to obtain a set of speech synthesizers for as many voices, languages and dialects as possible, free of use for non-commercial and non-military applications.
Abstract: The aim of the MBROLA project, initiated by the Faculte Polytechnique de Mons (Belgium), is to obtain a set of speech synthesizers for as many voices, languages and dialects as possible, free of use for non-commercial and non-military applications. The ultimate goal is to boost academic research on speech synthesis, and particularly on prosody generation, known as one of the biggest challenges taken up by text-to-speech synthesizers for the years to come. Central to the MBROLA project is MBROLA 2.00, a speech synthesizer based on the concatenation of diphones. Executable files of this synthesizer have been made freely available for many computers/operating systems, as well as a first diphone database for a French male voice. We describe the terms of participation to the project, as a user, as an associated developer, or as a database provider.

505 citations

Proceedings Article
03 Oct 1996
TL;DR: An alternative dialogue planning algorithm based on the notion of filling in an electronic form, or "E-form", is described. It was used in an application to search a database of used car advertisements, where users showed over 70% compliance in answering specific system prompts.
Abstract: A popular approach to dialogue management is based on a finite state model, where user utterances trigger transitions between the dialogue states, and these states, in turn, determine the system's response. The paper describes an alternative dialogue planning algorithm based on the notion of filling in an electronic form, or "E-form". Each slot has associated prompts that guide the user through the dialogue, and a priority that determines the order in which the system tries to acquire information. These slots can be optional or mandatory. However, the user is not restricted to follow the system's lead, and is free to ignore the prompts and take the initiative in the dialogue. The E-form based dialogue planner has been used in an application to search a database of used car advertisements. The goal is to assist the user in selecting, from this database, a small list of cars which match their constraints. For a large number of dialogues collected from over 600 naive users, we found over 70% compliance in answering specific system prompts.

330 citations
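
The slot mechanism described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `Slot` and `EForm`, the convention that a lower priority number is asked first, the rule that the form is complete once all mandatory slots are filled, and the example slot names and prompts are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Slot:
    """One field of the form (hypothetical structure)."""
    name: str
    prompt: str
    priority: int            # assumed convention: lower number is asked first
    mandatory: bool = True
    value: Optional[str] = None


class EForm:
    """Holds the slots and decides which one to prompt for next."""

    def __init__(self, slots):
        self.slots = {s.name: s for s in slots}

    def fill(self, updates):
        # Accept any values the user volunteers, whether or not
        # they answer the slot the system last prompted for.
        for name, value in updates.items():
            if name in self.slots:
                self.slots[name].value = value

    def next_slot(self):
        # The empty slot with the best priority, or None if all are filled.
        empty = [s for s in self.slots.values() if s.value is None]
        return min(empty, key=lambda s: s.priority) if empty else None

    def is_complete(self):
        # Assumed rule: only mandatory slots must be filled.
        return all(s.value is not None
                   for s in self.slots.values() if s.mandatory)


# Hypothetical used-car form.
form = EForm([
    Slot("make", "What make of car are you looking for?", priority=1),
    Slot("max_price", "What is your maximum price?", priority=2),
    Slot("color", "Do you have a preferred color?", priority=3,
         mandatory=False),
])

first = form.next_slot()             # system would prompt for "make"
form.fill({"max_price": "5000"})     # user ignores the prompt, takes the initiative
second = form.next_slot()            # "make" is still the best empty slot
form.fill({"make": "Honda"})
third = form.next_slot()             # only the optional "color" slot remains
done = form.is_complete()            # all mandatory slots are filled
```

In this sketch the user's out-of-order answer is simply recorded, and the planner recomputes the next prompt from whatever slots remain empty, which is one plausible way to support the mixed initiative the abstract describes.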

Journal Article
01 Jan 1997
TL;DR: Two experiments using bisyllabic CVCCVC nonsense words that varied in phonotactic probability and stress placement were conducted to examine the influences of phonotactic and metrical information on spoken word recognition.
Abstract: Two experiments using bisyllabic CVCCVC nonsense words that varied in phonotactic probability and stress placement were conducted to examine the influences of phonotactic and metrical information on spoken word recognition. Experiment 1 examined participants' intuitions about the phonological "goodness" of nonsense words. Experiment 2 examined processing times for the same stimuli in a speeded auditory repetition task. The results of both studies provide further evidence that the phonotactic configuration and stress placement of spoken stimuli have important implications for the representation and processing of spoken words.
Keywords: phonotactics, syllable stress

313 citations

Performance
Metrics
No. of papers from the Conference in previous years
Year    Papers
2015    1
2008    1
2007    3
2006    21
2004    19
2002    42