
Showing papers in "Speech Communication in 1997"


Journal ArticleDOI
TL;DR: This paper focuses on the task of automatically routing telephone calls based on a user's fluently spoken response to the open-ended prompt of “How may I help you?”.

664 citations
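
As a rough illustration of the task only (not the method of this paper), routing can be viewed as scoring the transcribed caller utterance against salient phrases associated with each destination; the phrases, weights and route names below are invented for the example.

```python
# Hypothetical phrase table and routes, for illustration only.
SALIENT_PHRASES = {
    "collect":  {"collect call": 2.0, "call collect": 2.0},
    "billing":  {"my bill": 2.0, "credit card": 1.5, "charge this": 1.5},
    "operator": {"talk to an operator": 2.0, "speak to someone": 1.0},
}

def route_call(transcript: str) -> str:
    """Pick the destination whose salient phrases best cover the transcript."""
    text = transcript.lower()
    scores = {dest: sum(w for phrase, w in table.items() if phrase in text)
              for dest, table in SALIENT_PHRASES.items()}
    best, best_score = max(scores.items(), key=lambda kv: kv[1])
    # Fall back to a human operator when nothing salient was recognised.
    return best if best_score > 0 else "operator"

print(route_call("yes I would like to make a collect call please"))  # -> collect
```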


Journal ArticleDOI
TL;DR: Comparisons suggest that the human-machine performance gap can be reduced by basic research on improving low-level acoustic-phonetic modeling, on improving robustness to noise and channel variability, and on more accurately modeling spontaneous speech.

606 citations


Journal ArticleDOI
TL;DR: Experimental results show that MMIE optimisation of system structure and parameters can yield useful increases in recognition accuracy, and that the use of lattices makes MMIE training practicable for very complex recognition systems and large training sets.

203 citations
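
For context, the MMIE criterion maximised in this kind of training can be written (in generic notation, not necessarily that of the paper) as the ratio below; the denominator sum over all competing word sequences w is what word lattices make tractable.

```latex
\mathcal{F}_{\mathrm{MMIE}}(\lambda) \;=\;
\sum_{r=1}^{R} \log
\frac{p_{\lambda}(\mathbf{O}_r \mid \mathcal{M}_{w_r})\, P(w_r)}
     {\sum_{w} p_{\lambda}(\mathbf{O}_r \mid \mathcal{M}_{w})\, P(w)}
```

Here O_r is the r-th training utterance, w_r its reference transcription, M_w the composite model for word sequence w, and P(w) the language model probability.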


Journal ArticleDOI
TL;DR: Findings of “motor-equivalent” trading relations between the contributions of two constrictions to the same acoustic transfer function provide preliminary support for the idea that segmental control is based on acoustic or auditory-perceptual goals.

191 citations


Journal ArticleDOI
TL;DR: In this paper, a target-based control model of speech production using Feldman's Equilibrium Point Hypothesis is presented; it is evaluated through simulations of articulatory movements during vowel-to-vowel sequences with a 2D biomechanical tongue model.

141 citations


Journal ArticleDOI
TL;DR: None of the acoustic features examined can by itself clearly discriminate between the two speaking styles, and the performance of both the speakers and the listeners varied enormously.

118 citations


Journal ArticleDOI
TL;DR: An outline of the properties of the human voice source in connected speech is developed on the basis of the transformed LF-model and its frequency-domain correspondences, which allows maximal descriptive power with a limited number of parameters.

115 citations


Journal ArticleDOI
TL;DR: Voiced/unvoiced and unvoiced/voiced error rates and pitch estimation errors are reported in detail for the proposed PDA and the reference system on three speech databases.

106 citations
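
As a minimal sketch of what a PDA does (a generic autocorrelation detector, not the algorithm proposed in the paper), the voicing decision and pitch estimate for one frame can be computed as follows; the threshold and search range are arbitrary.

```python
import numpy as np

def autocorr_pda(frame, fs, fmin=60.0, fmax=400.0, voicing_threshold=0.3):
    """Toy autocorrelation pitch detector with a voiced/unvoiced decision.

    Returns (is_voiced, f0_hz).  Threshold and lag search range are arbitrary.
    """
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return False, 0.0
    ac /= ac[0]                                   # normalise so ac[0] == 1
    lo, hi = int(fs / fmax), min(int(fs / fmin), len(ac) - 1)
    lag = lo + int(np.argmax(ac[lo:hi + 1]))
    if ac[lag] < voicing_threshold:               # weak periodicity -> unvoiced
        return False, 0.0
    return True, fs / lag

# Example: a voiced 150 Hz frame vs. a noise frame.
fs = 16000
t = np.arange(int(0.04 * fs)) / fs
print(autocorr_pda(np.sin(2 * np.pi * 150 * t), fs))   # (True, ~150 Hz)
print(autocorr_pda(np.random.randn(len(t)), fs))       # usually (False, 0.0)
```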


Journal ArticleDOI
TL;DR: The perception of voicing in stops was found to rely strongly on phase information, while the perception of place of articulation was mainly determined by amplitude information; this demonstrates that phonetically different signals can be constructed by combining the same short-time amplitude spectra with different phase spectra.

101 citations


Journal ArticleDOI
TL;DR: It is suggested that many of the advantages to be gained from interaction between speech production and speech recognition communities will develop from integrating production models with the probabilistic analysis-by-synthesis strategy currently used by the technology community.

98 citations


Journal ArticleDOI
TL;DR: A new frequency-domain parameter, the Parabolic Spectral Parameter (PSP), for the quantification of the glottal volume velocity waveform is presented, and its performance is compared to three commonly used time-based parameters and to one previously developed frequency-domain method.
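
A rough sketch of the underlying idea, fitting a parabola to the low-frequency harmonic levels of one glottal flow period; the specific normalisation that defines the actual PSP is not reproduced here.

```python
import numpy as np

def parabolic_spectral_fit(one_period, n_harmonics=8):
    """Sketch: take the spectrum of ONE pitch period of the glottal flow, so
    FFT bin k is the k-th harmonic, and fit y(k) = a*k^2 + b to the harmonic
    levels in dB.  The curvature 'a' (dB per harmonic^2) quantifies how fast
    the low-frequency spectrum decays.  Illustration only, not the PSP itself.
    """
    x = np.asarray(one_period, dtype=float)
    mag = np.abs(np.fft.rfft(x))
    k = np.arange(1, n_harmonics + 1)
    level_db = 20.0 * np.log10(mag[k] + 1e-12)
    A = np.column_stack([k.astype(float) ** 2, np.ones(len(k))])
    (a, b), *_ = np.linalg.lstsq(A, level_db, rcond=None)
    return a, b

# Example: one period (100 samples) of a crude glottal-pulse-like waveform.
pulse = np.maximum(0.0, np.sin(np.pi * np.arange(100) / 60.0)) ** 2
print(parabolic_spectral_fit(pulse))
```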

Journal ArticleDOI
TL;DR: This paper shows how an articulatory model, able to produce acoustic signals from articulatory motion, can learn to speak, i.e. coordinate its movements in such a way that it utters meaningful sequences of sounds belonging to a given language.

Journal ArticleDOI
TL;DR: A general formulation of the multigram model is presented, applicable to single or multiple parallel strings of data with either discrete or continuous values, and is used to infer a set of variable-length acoustic units directly from speech data.
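
The decoding step of such a model can be illustrated with a small dynamic-programming sketch that segments a discrete string into variable-length units so as to maximise the product of unit probabilities; EM training of those probabilities, and the continuous-valued case, are not shown.

```python
import math

def best_multigram_segmentation(sequence, unit_probs, max_len=4):
    """Find the segmentation of `sequence` into variable-length units (keys of
    `unit_probs`, tuples mapped to probabilities) that maximises the product
    of unit probabilities, via Viterbi-style dynamic programming.
    """
    n = len(sequence)
    best = [(-math.inf, None)] * (n + 1)     # (log-prob, backpointer)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            unit = tuple(sequence[end - length:end])
            p = unit_probs.get(unit)
            if p is None or best[end - length][0] == -math.inf:
                continue
            score = best[end - length][0] + math.log(p)
            if score > best[end][0]:
                best[end] = (score, end - length)
    if best[n][0] == -math.inf:
        return None
    units, pos = [], n                       # backtrace
    while pos > 0:
        prev = best[pos][1]
        units.append(tuple(sequence[prev:pos]))
        pos = prev
    return list(reversed(units))

units = {("a",): 0.3, ("b",): 0.2, ("a", "b"): 0.4, ("b", "a", "a"): 0.1}
print(best_multigram_segmentation("abbaa", units))  # [('a','b'), ('b','a','a')]
```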

Journal ArticleDOI
TL;DR: Bifurcations in two-mass models of the vocal folds are analyzed, and the effect of incorporating the vocal tract on the bifurcation diagrams is studied, in order to relate these features to the underlying nonlinear dynamical system.
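
For orientation, a two-mass model couples two damped mass-spring oscillators (the lower and upper portions of the fold) through a coupling stiffness and drives them aerodynamically; in generic form (notation mine, not the paper's):

```latex
m_1 \ddot{x}_1 + r_1 \dot{x}_1 + k_1 x_1 + k_c\,(x_1 - x_2) = F_1(x_1, x_2, P_s), \\
m_2 \ddot{x}_2 + r_2 \dot{x}_2 + k_2 x_2 + k_c\,(x_2 - x_1) = F_2(x_1, x_2, P_s).
```

The driving forces F_i follow from the Bernoulli pressure in the glottal channel, with P_s the subglottal pressure; sweeping control parameters such as P_s or left-right asymmetry of masses and stiffnesses is what produces bifurcation diagrams of the kind studied here.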

Journal ArticleDOI
TL;DR: Whether the first words are similar to babbling in all respects was evaluated in four subjects, using a database of 152 hours of audio recordings; a tendency towards increasing use of labial consonants relative to alveolar consonants was observed.

Journal ArticleDOI
TL;DR: The need for standardisation in speech synthesizers, and how this will help system builders make better use of synthesis, is discussed, along with the features of SSML (which is based on SGML, the Standard Generalized Markup Language).

Journal ArticleDOI
TL;DR: It is argued that spoken language interfaces (SLIs) are essential to making this vision of ubiquitous access to multimedia communication services between people and machines a reality.

Journal ArticleDOI
TL;DR: The study reveals that in human-machine interactions, both discourse segmentation and utterance purpose can have particular prosodic correlates, although speakers also mark this information through choice of wording.

Journal ArticleDOI
Anne Cutler
TL;DR: The study of spoken-language processing by human listeners requires cross-linguistic comparison, and aspects of the universal processing model are revealed by analysis of language-specific effects.

Journal ArticleDOI
TL;DR: The decision rule is implemented in a multi-level approach combining a state-of-the-art speech recognizer with an N-best search algorithm, which is also described in this article.

Journal ArticleDOI
TL;DR: The results obtained prove that HMM adaptation and preprocessing techniques can be advantageously combined to improve Automatic Speech Recognition (ASR) robustness and show that spectral subtraction improves speech detection under noisy GSM conditions.
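
A minimal sketch of magnitude spectral subtraction (a generic textbook variant, not necessarily the exact preprocessing used in the paper): the noise magnitude spectrum is estimated from an assumed speech-free stretch at the start of the signal, subtracted from each frame, and the frames are resynthesised with the noisy phase.

```python
import numpy as np

def spectral_subtraction(noisy, fs, noise_seconds=0.25,
                         frame_len=256, hop=128, floor=0.02, oversub=1.0):
    """Generic magnitude spectral subtraction with overlap-add resynthesis.

    Assumes the first `noise_seconds` of `noisy` contain no speech, and that
    the signal is at least a few frames long.
    """
    win = np.hanning(frame_len)
    starts = range(0, len(noisy) - frame_len, hop)
    spectra = [np.fft.rfft(noisy[i:i + frame_len] * win) for i in starts]
    n_noise = max(1, int(noise_seconds * fs) // hop)
    noise_mag = np.mean([np.abs(s) for s in spectra[:n_noise]], axis=0)

    out = np.zeros(len(noisy))
    for start, s in zip(starts, spectra):
        mag, phase = np.abs(s), np.angle(s)
        # Subtract the noise estimate, keeping a small spectral floor.
        clean_mag = np.maximum(mag - oversub * noise_mag, floor * mag)
        out[start:start + frame_len] += np.fft.irfft(
            clean_mag * np.exp(1j * phase), n=frame_len)
    # Hann analysis window at 50% overlap sums to ~1, so plain overlap-add
    # approximately reconstructs the signal.
    return out
```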

Journal ArticleDOI
TL;DR: The Dial-Your-Disc system is presented, an interactive system that supports browsing through a large database of musical information and generates a spoken monologue once a musical composition has been selected.

Journal ArticleDOI
TL;DR: The analysis of the average long-term spectrum of the successfully filtered sequences reveals a combined effect of equalization and band selection that provides insight into TSSP filtering; it is also shown that, when supplementary differential parameters are not used, the recognition rate can be improved even for clean speech simply by properly filtering the TSSPs.
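
Reading TSSP as the time sequence of each spectral parameter (an assumption about the acronym), the kind of filtering involved can be sketched as applying one FIR filter along the time axis of a feature matrix; the filter below is an arbitrary zero-DC band-pass, not one evaluated in the paper.

```python
import numpy as np

def filter_parameter_tracks(features, b):
    """Apply the same FIR filter `b` along the time axis of a
    (frames x coefficients) feature matrix, i.e. filter each spectral
    parameter's time trajectory independently.
    """
    feats = np.asarray(features, dtype=float)
    out = np.empty_like(feats)
    for j in range(feats.shape[1]):
        out[:, j] = np.convolve(feats[:, j], b, mode="same")
    return out

# Example: an anti-symmetric filter (zero DC gain) removes a constant
# per-utterance offset (slowly varying channel) from fake cepstral features.
frames = np.random.randn(200, 13) + 5.0
b = np.array([1.0, 0.5, 0.0, -0.5, -1.0]) / 2.0
filtered = filter_parameter_tracks(frames, b)
print(abs(filtered.mean()) < abs(frames.mean()))   # offset is attenuated
```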

Journal ArticleDOI
TL;DR: The RailTel system developed at LIMSI to provide vocal access to static train timetable information in French is described, and a field trial carried out to assess the technical adequacy of available speech technology for interactive services is described.

Journal ArticleDOI
TL;DR: It is shown empirically that an automated spoken questionnaire could successfully collect and recognize census data, and that subjects preferred the spoken system to written questionnaires.

Journal ArticleDOI
TL;DR: It is shown that a small set of modality properties is surprisingly powerful in justifying, supporting and correcting the set of claims, and it is argued that their power could be made available to systems and interface designers who have to make modality choices during the early design of speech-related systems and interfaces.

Journal ArticleDOI
TL;DR: The results indicate the validity of the prominence based approach as an interface between linguistics and acoustics, and two algorithms to transform prominence values to prosodic parameters are evaluated.

Journal ArticleDOI
TL;DR: A discriminant analysis of jitter time series extracted from 279 sustained vocoids shows that the jitter features which separately describe the predictable and random components better characterise healthy and dysphonic speakers than a traditional jitter feature.
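
A sketch of the kind of measurements involved: a traditional local jitter value and a crude split of the period perturbation into a predictable part (a least-squares autoregressive fit of the period track) and a random residual. The decomposition actually used in the paper is not reproduced here.

```python
import numpy as np

def jitter_features(periods_ms, order=2):
    """From a sequence of fundamental period durations (ms), return
    (traditional local jitter, predictable perturbation, random perturbation),
    all relative to the mean period.  Illustration only.
    """
    T = np.asarray(periods_ms, dtype=float)
    mean_T = T.mean()

    # Traditional local jitter: mean absolute difference of consecutive
    # periods, relative to the mean period.
    local_jitter = np.mean(np.abs(np.diff(T))) / mean_T

    # Predict each period from its `order` predecessors by least squares.
    X = np.column_stack(
        [T[order - k - 1: len(T) - k - 1] for k in range(order)]
        + [np.ones(len(T) - order)])
    y = T[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    residual = y - fitted

    predictable = np.std(fitted) / mean_T     # slowly varying, predictable part
    random_part = np.std(residual) / mean_T   # remaining random perturbation
    return local_jitter, predictable, random_part

# Example: a slow (predictable) drift plus small random perturbations.
rng = np.random.default_rng(0)
T = 10.0 + 0.2 * np.sin(0.2 * np.arange(200)) + 0.02 * rng.standard_normal(200)
print(jitter_features(T))
```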

Journal ArticleDOI
TL;DR: Several modifications to the Klatt synthesizer may improve synthesis of pathological voices, including providing jitter and shimmer parameters; updating synthesis parameters as a function of period, rather than absolute time; modeling diplophonia with independent parameters for fundamental frequency and amplitude variations; providing a parameter to increase low-frequency energy; and adding more pole-zero pairs.
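
To make the first two proposed parameters concrete, the sketch below draws per-period F0 and amplitude values with a given jitter (in percent) and shimmer (in dB), i.e. the perturbations are updated once per fundamental period rather than at fixed time steps. It is an illustration only, not Klatt-synthesizer code.

```python
import numpy as np

def perturb_per_period(f0_hz, amp, n_periods, jitter_pct=1.0, shimmer_db=0.5,
                       rng=None):
    """Generate per-period F0 and amplitude tracks with random jitter
    (percent of F0) and shimmer (dB of amplitude)."""
    rng = np.random.default_rng() if rng is None else rng
    f0_track = f0_hz * (1.0 + (jitter_pct / 100.0)
                        * rng.standard_normal(n_periods))
    amp_track = amp * 10.0 ** ((shimmer_db / 20.0)
                               * rng.standard_normal(n_periods))
    return f0_track, amp_track

f0s, amps = perturb_per_period(120.0, 1.0, n_periods=50,
                               jitter_pct=2.0, shimmer_db=1.0)
print(f0s[:3], amps[:3])
```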

Journal ArticleDOI
TL;DR: The performance of telephone speech recognition using Bayesian adaptation is shown to be superior to that using maximum-likelihood adaptation and the affine transformation is also demonstrated to be significantly better than the bias transformation.
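
To illustrate the difference between the two transform families (not the paper's Bayesian estimation), the sketch below estimates a full affine transform y ≈ Ax + b between adaptation frames and a reference by plain least squares; a bias-only transform would fix A = I and estimate only b.

```python
import numpy as np

def estimate_affine(source, target):
    """Least-squares estimate of y ~ A x + b mapping source frames to target
    frames (rows are frames).  A Bayesian (MAP) variant would add a prior
    over (A, b); that is not shown here.
    """
    X = np.column_stack([source, np.ones(len(source))])   # append 1 for bias
    W, *_ = np.linalg.lstsq(X, target, rcond=None)         # (d+1) x d
    return W[:-1].T, W[-1]                                  # A, b

# Toy check: recover a known affine distortion of random "cepstral" frames.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 4))
A_true = np.eye(4) + 0.1 * rng.standard_normal((4, 4))
b_true = rng.standard_normal(4)
y = x @ A_true.T + b_true
A_hat, b_hat = estimate_affine(x, y)
print(np.allclose(A_hat, A_true), np.allclose(b_hat, b_true))  # True True
```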