
Showing papers in "Speech Communication in 2011"


Journal ArticleDOI
TL;DR: The basic phenomena reflected in the last fifteen years of research are addressed, with commentary on databases, modelling and annotation, the unit of analysis and prototypicality, and automatic processing, including discussions of features, classification, robustness, evaluation, and implementation and system integration.

671 citations


Journal ArticleDOI
TL;DR: Modulation spectral features are proposed for the automatic recognition of human affective information from speech; they yield a substantial improvement in recognition performance when used to augment prosodic features, which have been used extensively for emotion recognition.

359 citations
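A minimal sketch of the idea behind modulation spectral features, assuming a NumPy/SciPy setup: take the magnitude spectrogram (the per-band temporal envelopes), apply a second Fourier analysis along time, and pool the resulting modulation energies. The frame sizes, band groupings, and normalisation below are placeholders, not the filterbank configuration used in the paper.

```python
import numpy as np
from scipy.signal import stft

def modulation_spectral_features(x, fs, n_mod_bands=8):
    """Rough sketch: energy distribution of sub-band envelope modulations."""
    # 1) acoustic-frequency analysis: magnitude spectrogram (per-band envelopes)
    _, _, X = stft(x, fs=fs, nperseg=512, noverlap=384)
    env = np.abs(X)
    # 2) modulation-frequency analysis: FFT of each envelope trajectory
    env = env - env.mean(axis=1, keepdims=True)
    M = np.abs(np.fft.rfft(env, axis=1))        # (acoustic band, modulation freq)
    # 3) pool modulation energies into a few coarse modulation bands
    edges = np.linspace(0, M.shape[1], n_mod_bands + 1, dtype=int)
    feats = np.array([M[:, a:b].sum() for a, b in zip(edges[:-1], edges[1:])])
    return feats / (feats.sum() + 1e-12)        # normalised modulation energy profile
```

A profile of this kind, describing how envelope energy is distributed over modulation frequency, is the sort of feature that would then be combined with prosodic features for the emotion classifier.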


Journal ArticleDOI
TL;DR: The results of the oracle experiments show that accurate phase spectrum estimates can contribute considerably to speech quality, and that using mismatched analysis windows when computing the magnitude and phase spectra provides significant improvements in both objective and subjective speech quality.

357 citations
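The mismatched-window idea can be illustrated with a short STFT sketch: the magnitude is taken from one analysis window and the phase from another before resynthesis. The window pair and overlap below are arbitrary placeholders rather than the optimised choices reported in the paper.

```python
import numpy as np
from scipy.signal import stft, istft, get_window

def mismatched_window_analysis(x, fs, nperseg=512, noverlap=384):
    """Sketch: combine the magnitude from one STFT analysis window with the
    phase from another, then resynthesise."""
    _, _, X_mag = stft(x, fs, window=get_window('hamming', nperseg),
                       nperseg=nperseg, noverlap=noverlap)
    _, _, X_phs = stft(x, fs, window=get_window('boxcar', nperseg),
                       nperseg=nperseg, noverlap=noverlap)
    # ... an enhancement gain would normally modify the magnitude here ...
    X = np.abs(X_mag) * np.exp(1j * np.angle(X_phs))
    _, x_hat = istft(X, fs, window=get_window('hamming', nperseg),
                     nperseg=nperseg, noverlap=noverlap)
    return x_hat
```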


Journal ArticleDOI
TL;DR: In this article, a hierarchical computational structure is proposed for emotion recognition; it maps an input speech utterance into one of multiple emotion classes through successive layers of binary classification.

291 citations
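As an illustration of such a hierarchy, the sketch below chains binary classifiers so that an utterance-level feature vector is routed through successive yes/no decisions down to a leaf emotion class. The two-level grouping (active vs. passive, then within-group) and the scikit-learn classifiers are assumptions for illustration, not the structure or models used in the article.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

class HierarchicalEmotionClassifier:
    """Sketch of a two-layer binary decision hierarchy:
    layer 1: active vs. passive emotions, layer 2: class within each group."""
    def __init__(self):
        self.root = LogisticRegression(max_iter=1000)
        self.active = LogisticRegression(max_iter=1000)   # e.g. anger vs. happiness
        self.passive = LogisticRegression(max_iter=1000)  # e.g. sadness vs. neutral

    def fit(self, X, y):
        y = np.asarray(y)
        is_active = np.isin(y, ['anger', 'happiness']).astype(int)
        self.root.fit(X, is_active)
        self.active.fit(X[is_active == 1], y[is_active == 1])
        self.passive.fit(X[is_active == 0], y[is_active == 0])
        return self

    def predict(self, X):
        top = self.root.predict(X)
        out = np.empty(len(X), dtype=object)
        idx_a, idx_p = top == 1, top == 0
        if idx_a.any():
            out[idx_a] = self.active.predict(X[idx_a])
        if idx_p.any():
            out[idx_p] = self.passive.predict(X[idx_p])
        return out
```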


Journal ArticleDOI
TL;DR: This study investigates auditory-model-based DOA estimation, emphasizing known features and limitations of auditory binaural processing, such as high temporal resolution, a restricted frequency range for exploiting temporal fine structure, and a limited range over which interaural time delay can be compensated.

127 citations


Journal ArticleDOI
TL;DR: The performance of a spoken dialogue system that provides substantive dynamic responses to automatically detected user affective states is evaluated and a detailed system error analysis is presented that reveals challenges for real-time affect detection and adaptation.

110 citations


Journal ArticleDOI
TL;DR: The present study elaborates on the exploitation of both linguistic and acoustic feature modeling for anger classification, evaluating classification success with the F1 measure in addition to overall accuracy figures.

100 citations


Journal ArticleDOI
TL;DR: Three new objective measures are proposed for predicting the intelligibility of processed speech in noisy conditions; they use a critical-band spectral representation of the clean and noise-suppressed signals and are based on measuring the SNR loss incurred after the corrupted signal passes through a speech enhancement algorithm.

88 citations
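A very rough sketch of an SNR-loss style measure is given below, under the assumption that the loss in each band is the drop in band SNR caused by the enhancement residual. The published measures use specific critical-band filters, SNR limits, and band weightings that are not reproduced here.

```python
import numpy as np
from scipy.signal import stft

def mean_snr_loss(clean, enhanced, noisy, fs, n_bands=20):
    """Sketch of a band-SNR-loss style intelligibility measure (illustrative only)."""
    def band_power(x):
        _, _, X = stft(x, fs, nperseg=512, noverlap=256)
        P = np.abs(X) ** 2
        edges = np.linspace(0, P.shape[0], n_bands + 1, dtype=int)
        # per-frame power in each (here: uniform, not critical) band
        return np.stack([P[a:b].sum(axis=0) for a, b in zip(edges[:-1], edges[1:])])

    S, Y, D = band_power(clean), band_power(enhanced), band_power(noisy)
    eps = 1e-12
    snr_in = 10 * np.log10(S / (np.abs(D - S) + eps) + eps)    # clean vs. additive noise
    snr_out = 10 * np.log10(S / (np.abs(Y - S) + eps) + eps)   # clean vs. enhancement error
    loss = np.clip(snr_in - snr_out, 0, 30)                    # limit the per-band range
    return loss.mean()
```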


Journal ArticleDOI
TL;DR: Gracie is the first spoken dialog system that recognizes a user's emotional state from his or her speech and responds with appropriate emotional coloring, showing that dialog systems can tap into this important level of interpersonal interaction with today's technology.

84 citations


Journal ArticleDOI
TL;DR: Two procedures for the calculation of forensic likelihood ratios were tested on the same set of acoustic-phonetic data; the performance of the fused GMM-UBM system was much better than that of the fused MVKD system.

82 citations
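For the GMM-UBM procedure, the score of a questioned recording is essentially a log-likelihood ratio between a suspect model and a universal background model. The sketch below uses scikit-learn's GaussianMixture as a stand-in; the MAP adaptation and logistic-regression calibration/fusion used in the study are omitted.

```python
from sklearn.mixture import GaussianMixture

def gmm_ubm_log_lr(ubm_feats, suspect_feats, trace_feats, n_components=64):
    """Sketch: average frame log-likelihood ratio between a suspect-specific GMM
    and a universal background model (UBM) for a questioned (trace) recording."""
    ubm = GaussianMixture(n_components, covariance_type='diag',
                          max_iter=200).fit(ubm_feats)
    # simple stand-in for MAP adaptation: initialise on the UBM's solution
    spk = GaussianMixture(n_components, covariance_type='diag',
                          means_init=ubm.means_, weights_init=ubm.weights_,
                          max_iter=20).fit(suspect_feats)
    return (spk.score_samples(trace_feats) - ubm.score_samples(trace_feats)).mean()
```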


Journal ArticleDOI
TL;DR: Results of classification experiments show that the spectral centroid features consistently and significantly outperform a baseline system employing MFCC, pitch, and intensity features, and that fusing an SCF-based system with an SCA-based system yields a relative reduction in error rate.
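A sketch of subband spectral centroid frequency (SCF) and spectral centroid amplitude (SCA) features for a single frame is given below; the band edges and the exact SCA formulation are illustrative choices and may differ from the paper's definitions.

```python
import numpy as np

def spectral_centroid_features(frame, fs, n_bands=8, n_fft=512):
    """Sketch of per-subband spectral centroid frequency (SCF) and
    spectral centroid amplitude (SCA) for one analysis frame."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame)), n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    edges = np.linspace(0, len(spec), n_bands + 1, dtype=int)
    scf, sca = [], []
    for a, b in zip(edges[:-1], edges[1:]):
        w, f = spec[a:b], freqs[a:b]
        denom = w.sum() + 1e-12
        scf.append((f * w).sum() / denom)   # centroid frequency of the band
        sca.append((w * w).sum() / denom)   # one common amplitude-weighted formulation
    return np.array(scf), np.array(sca)
```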

Journal ArticleDOI
TL;DR: Using the database, some basic configuration choices of speech synthesis, such as waveform sampling frequency and auditory frequency warping scale, are revisited with the aim of improving speaker similarity, an acknowledged weakness of current HMM-based speech synthesisers.

Journal ArticleDOI
TL;DR: To stimulate expressively rich and vivid conversation, the "4-frame cartoon sorting task" was devised, and the perceived emotional states of speakers could be estimated accurately from the speech parameters in most cases.

Journal ArticleDOI
TL;DR: Data indicate that practice patterns have a significant effect on the fluency characteristics of public speaking performance, as speakers who started practicing earlier were less disfluent than those who started later.

Journal ArticleDOI
TL;DR: The results from objective experiments and blind subjective listening tests using the NOIZEUS corpus show that the MDKF (with clean speech parameters) outperforms all of the acoustic- and time-domain enhancement methods that were evaluated, including the time-domain Kalman filter with clean speech parameters.
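The modulation-domain idea can be sketched with a scalar Kalman filter tracking the temporal envelope of one spectral bin; the MDKF itself uses higher-order AR models of clean-speech modulation (hence the "clean speech parameters"), which this random-walk stand-in does not reproduce.

```python
import numpy as np

def kalman_smooth_envelope(env, var_process=1e-3, var_obs=1e-2):
    """Sketch: first-order Kalman filter applied to one spectral bin's temporal
    envelope (the 'modulation domain'), with fixed placeholder variances."""
    x_hat, p = env[0], 1.0
    out = np.empty_like(env)
    for t, z in enumerate(env):
        p = p + var_process                 # predict (random-walk state model)
        k = p / (p + var_obs)               # Kalman gain
        x_hat = x_hat + k * (z - x_hat)     # update with the noisy envelope sample
        p = (1.0 - k) * p
        out[t] = x_hat
    return out
```

In a full enhancer, a filter of this kind would be applied to every frequency bin of the noisy magnitude spectrogram before recombining with the noisy phase.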

Journal ArticleDOI
TL;DR: The main advantages of the proposed TS-BASE/WF are its effectiveness in dealing with non-stationary multiple-source interference signals and its success in preserving binaural cues after processing, as confirmed by comprehensive objective and subjective evaluations in different acoustical spatial configurations.

Journal ArticleDOI
TL;DR: The experimental results show that the proposed envelope and phase based features can improve recognition rates in clean and noisy conditions compared to the reference MFCC-based recognizer.

Journal ArticleDOI
TL;DR: This paper proposes novel spectrally weighted mel-frequency cepstral coefficient (WMFCC) features for emotion recognition from speech and evaluates them together with standard spectral and prosodic features using HMM-based classifiers on the spontaneous FAU Aibo emotional speech corpus.

Journal ArticleDOI
TL;DR: This paper describes efforts to transfer feature extraction and statistical modeling techniques from the fields of speaker and language identification to the related field of emotion recognition, and shows how Gaussian mixture modeling techniques can be applied on top of the extracted features.

Journal ArticleDOI
TL;DR: It is shown that the complex cepstrum causal-anticausal decomposition can be effectively used for glottal flow estimation when specific windowing criteria are met and has the potential to be used for voice quality analysis.

Journal ArticleDOI
TL;DR: An analysis based on phoneme confusions for both feature types suggests that spectro-temporal and purely spectral features carry complementary information.

Journal ArticleDOI
TL;DR: Major identified accent-specific cues include the devoicing of voiced stop consonants, the "rolled r", and schwa fronting or raising; these can contribute to improved pronunciation modeling in automatic speech recognition of accented speech.

Journal ArticleDOI
TL;DR: A noisy-speech enhancement method is presented that combines linear prediction (LP) residual weighting in the time domain with spectral processing in the frequency domain to provide better noise suppression as well as better enhancement in the speech regions.
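The time-domain half of such a scheme rests on the LP residual, obtained by inverse filtering with the estimated predictor. The sketch below shows that step only; the paper's residual weighting function and the complementary frequency-domain processing are not reproduced.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lp_residual(x, order=10):
    """Sketch: linear-prediction residual via autocorrelation LPC and inverse filtering."""
    # autocorrelation method for the LP coefficients
    r = np.correlate(x, x, mode='full')[len(x) - 1:len(x) + order]
    a = solve_toeplitz(r[:order], r[1:order + 1])      # predictor coefficients
    inv_filter = np.concatenate(([1.0], -a))           # A(z) = 1 - sum_k a_k z^-k
    residual = lfilter(inv_filter, [1.0], x)           # e[n] = A(z) x[n]
    # a weighted residual could then be passed back through 1/A(z), e.g.:
    # enhanced = lfilter([1.0], inv_filter, weight * residual)
    return residual, inv_filter
```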

Journal ArticleDOI
TL;DR: The results show that turn-taking cues realized with a synthetic voice affect the judgements similarly to the corresponding human versions, and that there is no difference in reaction times between the two conditions.

Journal ArticleDOI
TL;DR: A class of hierarchical directed graphical models is developed and applied on the task of recognizing affective categories from prosody in both acted and natural speech, achieving rates within nearly 10% of human recognition accuracy despite only focusing on prosody.

Journal ArticleDOI
TL;DR: Evaluations of listener motion strategies demonstrated that two strategies were particularly effective for localisation; one was simply to move towards the most likely source location, which is beneficial in increasing the signal-to-noise ratio, particularly in reverberant conditions.

Journal ArticleDOI
TL;DR: Experiments on a word-level emphasis synthesis task show that all context-adaptive training approaches can outperform the standard full-context-dependent HMM approach, with the MLLR-based system achieving the best performance.

Journal ArticleDOI
TL;DR: This paper proposes a resampling technique, namely utterance partitioning with acoustic vector resampling (UP-AVR), to mitigate the data imbalance problem in GMM-SVM systems.
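The core of UP-AVR can be sketched as follows: randomly permute an enrollment utterance's feature vectors and split them into sub-utterances, repeating with several permutations, so that one utterance yields many target-class examples (each of which would be turned into a GMM supervector for the SVM). The partition and resampling counts below are placeholders.

```python
import numpy as np

def utterance_partitioning_avr(frames, n_partitions=4, n_resamplings=3, seed=0):
    """Sketch of utterance partitioning with acoustic vector resampling (UP-AVR)."""
    rng = np.random.default_rng(seed)
    subs = [frames]                               # keep the full utterance as well
    for _ in range(n_resamplings):
        order = rng.permutation(len(frames))      # resample the frame order
        for part in np.array_split(frames[order], n_partitions):
            subs.append(part)                     # each part acts as a sub-utterance
    return subs
```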

Journal ArticleDOI
TL;DR: Comparisons of French-speaking children and adults indicate that, whereas general cognitive maturation has some influence on the development of perceptual categorization, domain-specific effects also play a role, the structural complexity of the categories being one of them.

Journal ArticleDOI
TL;DR: Results indicate that listeners place a great deal of perceptual importance on the presence of artifacts and discontinuities in the speech, somewhat less importance on aspects of segmental quality, and very little importance on stress/intonation appropriateness.