Showing papers in "Speech Communication in 2008"


Journal ArticleDOI
TL;DR: A novel estimation algorithm is presented that demonstrates high accuracy on a variety of databases and studies the impact of the maximum approximation in training and transcription, the interaction of model size parameters, n-best list generation, confidence measures, and phoneme-to-grapheme conversion.

705 citations


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the MLE-based mapping with dynamic features can significantly improve the mapping performance compared with the MMSE-based mapping in both the articulatory-to-acoustic mapping and the inversion mapping.

251 citations


Journal ArticleDOI
TL;DR: It was concluded that sarcasm in speech can be characterized by a specific pattern of prosodic cues in addition to textual cues, and that these acoustic characteristics can be influenced by language used by the speaker.

199 citations


Journal ArticleDOI
TL;DR: Analysis of the gain function of the proposed spectral subtraction algorithm indicated that it possesses properties similar to those of the traditional MMSE algorithm, and objective evaluation showed that it performed significantly better than the traditional spectral subtractive algorithm.

194 citations


Journal ArticleDOI
TL;DR: A new approach is presented for extracting and representing prosodic features directly from the speech signal, with the syllable-like unit chosen as the basic unit for representing the prosodic characteristics.

190 citations


Journal ArticleDOI
TL;DR: The SAFE corpus (situation analysis in a fictional and emotional corpus) based on fiction movies is developed and a task-dependent annotation strategy which has the particularity to describe simultaneously the emotion and the situation evolution in context is defined.

184 citations


Journal ArticleDOI
TL;DR: The two-way mimicry target is presented, a model for measuring how well a human-computer dialogue mimics or replicates some aspect of human-human dialogue, including human flaws and inconsistencies.

145 citations


Journal ArticleDOI
TL;DR: How dialogue management is handled in industry is discussed and to what extent current state-of-the-art machine learning methods can be of practical benefit to application developers who are deploying commercial production systems is critically evaluated.

143 citations


Journal ArticleDOI
TL;DR: A statistical approach for the development of a dialog manager and for learning optimal dialog strategies based on a classification procedure that considers all of the previous history of the dialog to select the next system answer is presented.

119 citations


Journal ArticleDOI
TL;DR: This paper proposed a new physiological feature which emphasizes individual information for text-independent speaker identification by using a non-uniform subband processing strategy to emphasize the physiological information involved in speech production.

111 citations


Journal ArticleDOI
TL;DR: Investigation of whether filled pauses affect listeners' predictions about the complexity of upcoming phrases in Japanese found that FPs cause listeners to expect that the speaker is going to refer to something that is likely to be expressed by a relatively long or complex constituent.

Journal ArticleDOI
TL;DR: The development of, and the first experiments in, a Spanish to sign language translation system in a real domain are described, focusing on the sentences spoken by an official when assisting people applying for or renewing their Identity Card.

Journal ArticleDOI
TL;DR: A novel feature extraction algorithm, perceptual-MVDR (PMVDR), which computes cepstral coefficients from the speech signal, is proposed and shown to model the speech spectrum better than traditional feature extraction approaches.

Journal ArticleDOI
TL;DR: It is proposed that the history of user-system interaction and the neutral speaking style of users be automatically included in the annotation of emotions, making use of novel techniques for acoustic normalization and dialogue context annotation.

Journal ArticleDOI
TL;DR: In four experiments in which listeners were presented with novel categories based on vowels of Dutch, feedback was either available or not; this comparison showed supervised learning to be significantly superior to unsupervised learning.

Journal ArticleDOI
TL;DR: A new approach is presented to adapt the energy and spectral parameters of HMMs as well as their time derivatives to the modifications by the speech input in a reverberant environment to combine the adaptation to background noise and unknown frequency characteristics.

Journal ArticleDOI
TL;DR: The ripple-enhanced power spectrum based method (REPS) and the use of instantaneous frequency (IF) enable us to refine the accuracy of the F0 estimates, and the degree of dominance defined based on the IF is introduced as a robust voicing decision measure.

Journal ArticleDOI
TL;DR: This paper reports the results of an experiment in which speakers spoke to a simulated speech recognizer and received text feedback about what had been "recognized"; the speech was coded for adaptations associated with hyperarticulate speech: speaking rate and phonetically clear speech.

Journal ArticleDOI
TL;DR: It is shown that a model-based joint uncertainty decoding approach does not suffer from this limitation, unlike the front-end forms, and is more computationally attractive.

Journal ArticleDOI
TL;DR: A recently acquired magnetic resonance imaging database including almost all classes of European Portuguese sounds, excluding taps and trills, is presented and analyzed, and European Portuguese stops proved less resistant to coarticulatory effects than fricatives.

Journal ArticleDOI
TL;DR: Results indicated that English listeners automatically detect the emotional significance of prosody when expressed in a foreign language, although activation of emotional meanings in a foreign language may require more exposure to prosodic information than when listening to the native language.

Journal ArticleDOI
TL;DR: Experimental results indicated that the classical prosodic features, i.e., F0 and duration, were effective for discriminating groups of paralinguistic information expressing intentions, and accounted for 57% of the global detection rate, in a task of discriminating seven groups of paralinguistic information.

Journal ArticleDOI
TL;DR: This work presents a text-independent automatic phone segmentation algorithm based on the Bayesian Information Criterion, and uses a computationally inexpensive maximum likelihood approach for parameter estimation to evaluate the efficiency and demonstrate that the proposed adjustments yield significant performance improvement in noisy environments.

Journal ArticleDOI
TL;DR: It is shown that - although an accurate prediction of individual ratings is not yet possible with such models - they may still be used for taking decisions on component optimization, and are thus helpful tools for the system developer.

Journal ArticleDOI
TL;DR: The study established that listeners can discriminate different smile types and indicated that listeners utilize prototypical ideals to discern whether a person is smiling, regardless of whether the speaker is actually smiling.

Journal ArticleDOI
TL;DR: A study of recovering punctuation marks and capitalization information from European Portuguese broadcast news speech transcriptions, using finite state transducers automatically built from language models, and maximum entropy models.

Journal ArticleDOI
TL;DR: The objective is to discover differences between speaker groups in F0 low-frequency modulations and show that Parkinson's disease has different effects on the voice of male and female speakers.

Journal ArticleDOI
TL;DR: A three-layer model is introduced in which five categories of expressive speech constitute the top layer, semantic primitives the middle layer, and acoustic features the bottom layer, showing significant relationships between expressive speech, semantic primitives, and acoustic features.

Journal ArticleDOI
TL;DR: This work investigates how to create and evaluate the best state space representations for a Reinforcement Learning model to learn an optimal dialogue control strategy and presents three metrics for evaluating the impact of different state models.

Journal ArticleDOI
TL;DR: The paper presents a novel solution that combines both audio and visual information to estimate acoustic SNR and relates the use of visual information in the current system to its role in recent simultaneous speaker intelligibility studies, where, as well as providing phonetic content, it triggers 'informational masking release', helping the listener to attend selectively to the target speech stream.