International Conference on Spoken Language Processing
About: The International Conference on Spoken Language Processing is an academic conference. It publishes mainly in the areas of speech processing and speech synthesis. To date, the conference has published 802 papers, which have received 19,622 citations.
Topics: Speech processing, Speech synthesis, Speaker recognition, Hidden Markov model, Natural language
Papers
03 Oct 1996
TL;DR: A novel approach to estimating the parameters of continuous density HMMs for speaker-independent (SI) continuous speech recognition that jointly annihilates the inter-speaker variation and estimates the HMM parameters of the SI acoustic models.
Abstract: We formulate a novel approach to estimating the parameters of continuous density HMMs for speaker-independent (SI) continuous speech recognition. It is motivated by the fact that variability in SI acoustic models is attributed to both phonetic variation and variation among the speakers of the training population, which is independent of the information content of the speech signal. These two variation sources are decoupled, and the proposed method jointly annihilates the inter-speaker variation and estimates the HMM parameters of the SI acoustic models. We compare the proposed training algorithm to the common SI training paradigm within the context of supervised adaptation. We show that the proposed acoustic models are more efficiently adapted to the test speakers, thus achieving significant overall word error rate reductions of 19% and 25% for 20K and 5K vocabulary tasks, respectively.
586 citations
03 Oct 1996
TL;DR: A new method of extracting prosodic features from speech, based on a smoothing spline approximation of the pitch contour, is presented, which obtains classification performance that is close to human performance on the task.
Abstract: The paper explores several statistical pattern recognition techniques to classify utterances according to their emotional content. The authors have recorded a corpus of emotional speech containing over 1,000 utterances from different speakers. They present a new method of extracting prosodic features from speech, based on a smoothing spline approximation of the pitch contour. To make maximal use of the limited amount of training data available, they introduce a novel pattern recognition technique: majority voting of subspace specialists. Using this technique, they obtain classification performance that is close to human performance on the task.
521 citations
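The abstract above does not spell out how the subspace specialists are built or combined. As a rough illustration of the voting idea only, the Python sketch below trains one simple classifier per feature subset and returns the label that most of them agree on. The nearest-centroid specialists, the choice of feature subsets, and the names `train_centroids`, `predict_one` and `majority_vote` are all assumptions made for this example, not the authors' method.

```python
from collections import Counter, defaultdict


def train_centroids(X, y, dims):
    """Compute the per-class mean over the feature indices in `dims`."""
    sums = defaultdict(lambda: [0.0] * len(dims))
    counts = Counter(y)
    for row, label in zip(X, y):
        for j, d in enumerate(dims):
            sums[label][j] += row[d]
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict_one(centroids, dims, row):
    """Pick the class whose centroid is closest within this subspace."""
    def sq_dist(centroid):
        return sum((row[d] - centroid[j]) ** 2 for j, d in enumerate(dims))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def majority_vote(X, y, subspaces, row):
    """Let one specialist per subspace vote, then return the most common label."""
    votes = [
        predict_one(train_centroids(X, y, dims), dims, row)
        for dims in subspaces
    ]
    return Counter(votes).most_common(1)[0][0]
```

Using an odd number of specialists for a two-class problem avoids tied votes; with ties, `Counter.most_common` simply returns the label that was counted first.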
03 Oct 1996
TL;DR: The MBROLA project, initiated by the Faculté Polytechnique de Mons (Belgium), aims to obtain a set of speech synthesizers for as many voices, languages and dialects as possible, free to use for non-commercial and non-military applications.
Abstract: The aim of the MBROLA project, initiated by the Faculté Polytechnique de Mons (Belgium), is to obtain a set of speech synthesizers for as many voices, languages and dialects as possible, free to use for non-commercial and non-military applications. The ultimate goal is to boost academic research on speech synthesis, and particularly on prosody generation, known as one of the biggest challenges taken up by text-to-speech synthesizers for the years to come. Central to the MBROLA project is MBROLA 2.00, a speech synthesizer based on the concatenation of diphones. Executable files of this synthesizer have been made freely available for many computers/operating systems, as well as a first diphone database for a French male voice. We describe the terms of participation in the project, as a user, as an associated developer, or as a database provider.
505 citations
03 Oct 1996
TL;DR: An alternative dialogue planning algorithm based on the notion of filling in an electronic form, or "E-form", is used in an application to search a database of used car advertisements; in dialogues collected from over 600 naive users, compliance in answering specific system prompts exceeded 70%.
Abstract: A popular approach to dialogue management is based on a finite state model, where user utterances trigger transitions between the dialogue states, and these states, in turn, determine the system's response. The paper describes an alternative dialogue planning algorithm based on the notion of filling in an electronic form, or "E-form". Each slot in the form has associated prompts that guide the user through the dialogue, and a priority that determines the order in which the system tries to acquire information. These slots can be optional or mandatory. However, the user is not required to follow the system's lead, and is free to ignore the prompts and take the initiative in the dialogue. The E-form based dialogue planner has been used in an application to search a database of used car advertisements. The goal is to assist the user in selecting, from this database, a small list of cars which match their constraints. For a large number of dialogues collected from over 600 naive users, we found over 70% compliance in answering specific system prompts.
330 citations
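The slot-filling idea in the abstract above can be pictured with a small Python sketch. It is only a minimal illustration under stated assumptions: the class names `Slot` and `EForm`, the rule that a lower priority number is asked first, and the rule that the form is complete once every mandatory slot is filled are all hypothetical choices, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Slot:
    name: str
    prompt: str
    priority: int              # assumption: lower number = asked earlier
    mandatory: bool = True
    value: Optional[str] = None


class EForm:
    def __init__(self, slots: List[Slot]) -> None:
        self.slots: Dict[str, Slot] = {s.name: s for s in slots}

    def fill(self, **values: str) -> None:
        """Record whatever the user supplied, prompted or not."""
        for name, value in values.items():
            if name in self.slots:
                self.slots[name].value = value

    def next_prompt(self) -> Optional[str]:
        """Return the prompt of the highest-priority empty slot, or None."""
        empty = [s for s in self.slots.values() if s.value is None]
        if not empty:
            return None
        return min(empty, key=lambda s: s.priority).prompt

    def is_complete(self) -> bool:
        """Assumed stopping rule: every mandatory slot has a value."""
        return all(
            s.value is not None for s in self.slots.values() if s.mandatory
        )
```

Because `fill` accepts any slot at any time, a user who ignores the current prompt and volunteers, say, a price limit still makes progress; the planner then simply asks for the next empty slot in priority order.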
01 Jan 1997
TL;DR: Two experiments using bisyllabic CVCCVC nonsense words that varied in phonotactic probability and stress placement were conducted to examine the influences of phonotactic and metrical information on spoken word recognition.
Abstract: Two experiments using bisyllabic CVCCVC nonsense words that varied in phonotactic probability and stress placement were conducted to examine the influences of phonotactic and metrical information on spoken word recognition. Experiment 1 examined participants' intuitions about the phonological "goodness" of nonsense words. Experiment 2 examined processing times for the same stimuli in a speeded auditory repetition task. The results of both studies provide further evidence that the phonotactic configuration and stress placement of spoken stimuli have important implications for the representation and processing of spoken words.
313 citations