
Showing papers in "Speech Communication in 2007"


Journal ArticleDOI
TL;DR: A noisy speech corpus is developed suitable for evaluation of speech enhancement algorithms encompassing four classes of algorithms: spectral subtractive, subspace, statistical-model based and Wiener-type algorithms.

634 citations
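
For orientation, spectral subtraction (the first of the four algorithm classes named above) is the simplest to state: subtract an estimated noise magnitude spectrum from each noisy frame and resynthesise with the noisy phase. A minimal illustrative sketch in Python; the frame length, hop, and flooring constant are assumptions, not settings from the paper:

```python
import numpy as np

def spectral_subtract(noisy, noise_only, frame_len=512, hop=256, floor=0.002):
    """Basic magnitude spectral subtraction (illustrative sketch, not the paper's code)."""
    win = np.hanning(frame_len)
    # Average noise magnitude spectrum from a noise-only segment
    mags = [np.abs(np.fft.rfft(noise_only[i:i + frame_len] * win))
            for i in range(0, len(noise_only) - frame_len, hop)]
    noise_mag = np.mean(mags, axis=0)

    out = np.zeros(len(noisy))
    for i in range(0, len(noisy) - frame_len, hop):
        spec = np.fft.rfft(noisy[i:i + frame_len] * win)
        mag = np.abs(spec) - noise_mag            # subtract noise estimate
        mag = np.maximum(mag, floor * noise_mag)  # floor to curb musical noise
        # Resynthesise with the noisy phase, then overlap-add
        out[i:i + frame_len] += np.fft.irfft(mag * np.exp(1j * np.angle(spec)),
                                             n=frame_len)
    return out
```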


Journal ArticleDOI
TL;DR: Current advances related to automatic speech recognition (ASR) and spoken language systems and deficiencies in dealing with variation naturally present in speech are outlined.

507 citations


Journal ArticleDOI
TL;DR: An image-based, text-free evaluation system is presented that provides intuitive assessment of emotion primitives and yields high inter-evaluator agreement; speaker-dependent modeling of emotion expression is proposed, since the emotion primitives are particularly suited for capturing dynamics and intrinsic variations in emotion expression.

309 citations


Journal ArticleDOI
TL;DR: This research aims to improve the automatic perception of vocal emotion in two ways: by comparing two emotional speech data sources (natural, spontaneous emotional speech versus acted or portrayed emotional speech), and by examining two classification methods that have not previously been applied to this task: stacked generalisation and unweighted voting.

305 citations
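
For concreteness, the two combination schemes named above could be set up as follows with scikit-learn; the base learners and data here are placeholders, not the paper's actual configuration:

```python
from sklearn.ensemble import StackingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder base classifiers over acoustic emotion features
base = [("svm", SVC(probability=True)),
        ("tree", DecisionTreeClassifier()),
        ("logreg", LogisticRegression(max_iter=1000))]

# Unweighted vote: simple majority over the base classifiers' predictions
voter = VotingClassifier(estimators=base, voting="hard")

# Stacked generalisation: a meta-learner trained on base-level outputs
stacker = StackingClassifier(estimators=base,
                             final_estimator=LogisticRegression(max_iter=1000))
# voter.fit(X_train, y_train); stacker.fit(X_train, y_train)
```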


Journal ArticleDOI
TL;DR: It was found that, compared with British adult-directed speech, vowels were equivalently hyperarticulated in infant- and foreigner-directed speech, and that these linguistic modifications are independent of vocal pitch and affective valence.

215 citations


Journal ArticleDOI
TL;DR: The results show that the proposed approach always outperforms the use of transformations in the feature space and yields even better results when combined with linear input transformations.

173 citations


Journal ArticleDOI
TL;DR: The development of a gender-independent laugh detector is described, with the aim of enabling automatic emotion recognition. Acoustic measurements showed differences between laughter and speech in mean pitch and in the ratio of the durations of unvoiced to voiced portions, indicating that these prosodic features are indeed useful for discriminating laughter from speech.

169 citations
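
The two prosodic cues reported there are straightforward to extract; a rough sketch using librosa's pYIN pitch tracker (the pitch range and sampling rate below are assumptions, not the paper's settings):

```python
import numpy as np
import librosa

def laugh_cues(path):
    """Mean pitch (Hz) and unvoiced-to-voiced duration ratio (illustrative sketch)."""
    y, sr = librosa.load(path, sr=16000)
    # pYIN returns per-frame f0 (NaN when unvoiced) and a voicing decision
    f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=500, sr=sr)
    mean_pitch = float(np.nanmean(f0))
    n_voiced = int(np.count_nonzero(voiced))
    uv_ratio = (len(voiced) - n_voiced) / max(n_voiced, 1)
    return mean_pitch, uv_ratio
```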


Journal ArticleDOI
TL;DR: It is shown that the SP-SDW-MWF is more robust against signal model errors than the GSC, and that the block-structured step size matrix gives rise to faster convergence and better tracking performance than the diagonal step size matrix, at only a slightly higher computational cost.

167 citations


Journal ArticleDOI
TL;DR: The implementation and evaluation of an open-domain unit selection speech synthesis engine designed to be flexible enough to encourage further unit selection research and allow rapid voice development by users with minimal speech synthesis knowledge and experience are presented.

161 citations
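
At its core, unit selection of this kind chooses the unit sequence minimising a sum of target and join costs, usually by Viterbi search over the candidate lattice; schematically (this is the standard Hunt-and-Black formulation, not necessarily the paper's notation):

```latex
\hat{u}_{1:N} \;=\; \arg\min_{u_{1:N}}
  \sum_{i=1}^{N} C^{t}(t_i, u_i) \;+\; \sum_{i=2}^{N} C^{j}(u_{i-1}, u_i)
```

where C^t(t_i, u_i) scores how well candidate unit u_i matches the target specification t_i, and C^j penalises acoustic mismatch at each concatenation point.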


Journal ArticleDOI
TL;DR: This study significantly improved the intelligibility of one speaker's dysarthric vowels from 48% to 54%, as evaluated by a vowel identification task using 64 CVC stimuli judged by 24 listeners.

161 citations


Journal ArticleDOI
TL;DR: The robustness of approaches to the automatic classification of emotions in speech is addressed and it is suggested that existing approaches are efficient enough to handle larger amounts of training data without any reduction in classification accuracy.

Journal ArticleDOI
TL;DR: The use of several methods for speaker adaptive acoustic modeling to cope with inter-speaker spectral variability and to improve recognition performance for children proved to be effective in recognition of read speech with a vocabulary of about 11k words.

Journal ArticleDOI
TL;DR: An overview of past and present efforts to link human and automatic speech recognition research is provided, together with a review of the literature on the performance difference between machines and human listeners.

Journal ArticleDOI
TL;DR: The research findings indicated that Persian apologies are as formulaic in their pragmatic structure as English apologies, and that the values assigned to the two context-external variables had a significant effect on the frequency of intensifiers across situations.

Journal ArticleDOI
TL;DR: Using a single set of speaker-independent, noise-level-independent parameters, the model was able not only to predict the intelligibility of individual speakers to a remarkable degree, but also to account for most of the token-wise intelligibility of the letter keywords.

Journal ArticleDOI
TL;DR: Overall results indicate that SNR and SSNR improvements for the proposed approach are comparable to those of the Ephraim-Malah filter, with BWT enhancement giving the best results of all methods under the noisiest conditions (-10 dB and -5 dB input SNR).
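
For reference, segmental SNR (SSNR) is the per-frame SNR averaged over frames, usually clamped to a fixed range; a minimal sketch (the frame size and the customary clamp limits are conventions assumed here, not necessarily this paper's):

```python
import numpy as np

def snr_and_ssnr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0):
    """Global SNR and segmental SNR in dB (illustrative sketch)."""
    noise = clean - enhanced
    snr = 10 * np.log10(np.sum(clean ** 2) / (np.sum(noise ** 2) + 1e-12))
    seg = []
    for i in range(0, len(clean) - frame_len + 1, frame_len):
        c = clean[i:i + frame_len]
        n = noise[i:i + frame_len]
        s = 10 * np.log10(np.sum(c ** 2) / (np.sum(n ** 2) + 1e-12) + 1e-12)
        seg.append(float(np.clip(s, lo, hi)))   # clamp extreme frames
    return snr, float(np.mean(seg))
```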

Journal ArticleDOI
TL;DR: Chirp group delay representations, which can be guaranteed to be spike-free, are shown to be potentially useful for improving automatic speech recognition (ASR) performance, and one application to feature extraction for ASR is presented.
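
The group delay is the negative frequency derivative of the phase spectrum; "chirp" group delay evaluates it on a circle |z| = rho != 1 rather than on the unit circle, keeping zeros away from the evaluation contour. A sketch using the standard n*x[n] identity (rho and the FFT size are assumed parameters, not the paper's):

```python
import numpy as np

def chirp_group_delay(x, rho=1.05, nfft=512):
    """Group delay evaluated on |z| = rho (illustrative sketch).

    tau(w) = (X_R*Y_R + X_I*Y_I) / |X|^2, with Y the transform of n*x[n];
    scaling x[n] by rho**(-n) moves the evaluation circle to |z| = rho.
    """
    n = np.arange(len(x))
    xr = x * rho ** (-n)
    X = np.fft.rfft(xr, nfft)
    Y = np.fft.rfft(n * xr, nfft)
    return (X.real * Y.real + X.imag * Y.imag) / (np.abs(X) ** 2 + 1e-12)
```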

Journal ArticleDOI
TL;DR: Results confirm that lexical masking occurs only when some words in the babble are detectable, and suggest that different levels of linguistic information can be extracted from background babble and cause different types of linguistic competition for target-word identification.

Journal ArticleDOI
TL;DR: The present study demonstrates that the assumed distribution of the DFT coefficients can have a significant effect on the quality of the enhanced speech, and derives the MMSE estimator under speech presence uncertainty and a Laplacian statistical model.

Journal ArticleDOI
TL;DR: It is demonstrated that speech recognition error rates for interactive read aloud can be reduced by more than 50% through a combination of advances in both statistical language and acoustic modeling.

Journal ArticleDOI
TL;DR: It is argued that progress is hampered by the fragmentation of the field across many different disciplines, coupled with a failure to create an integrated view of the fundamental mechanisms that underpin one organism's ability to communicate with another.

Journal ArticleDOI
TL;DR: It is shown that the "decision-directed" approach for speech spectral variance estimation can have an important bias at low SNRs, which generally leads to too much speech suppression.
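
For reference, the decision-directed a priori SNR estimate under discussion is conventionally written as (notation follows common usage, not necessarily this paper's):

```latex
\hat{\xi}(k,l) \;=\; \alpha\,\frac{|\hat{A}(k,l-1)|^{2}}{\lambda_{d}(k,l-1)}
\;+\; (1-\alpha)\,\max\{\gamma(k,l)-1,\;0\}
```

where \hat{A}(k,l-1) is the previous frame's amplitude estimate, \lambda_d the noise variance, \gamma the a posteriori SNR, and \alpha a smoothing weight typically near 0.98; the heavy reliance on the previous frame's estimate is what produces the low-SNR bias analysed here.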

Journal ArticleDOI
TL;DR: The results provide support for an autosegmental-metrical account of the intonational phonology of French in which the early rise is a bitonal (LH) phrase accent that serves as a cue to content word beginnings.

Journal ArticleDOI
TL;DR: The results suggest that in addition to content cues, voice cues can be used by Chinese listeners to release speech from masking by other talkers.

Journal ArticleDOI
TL;DR: This study proposes a new feature vector that will allow better classification of emotional/stressed states and achieves good discrimination between neutral, angry, loud and Lombard states for the simulated domain of the Speech Under Simulated and Actual Stress (SUSAS) database.

Journal ArticleDOI
TL;DR: This paper reviews the progress of Thai speech technology in five areas of research: fundamental analyses and tools, text-to-speech synthesis (TTS), automatic speech recognition (ASR), speech applications, and language resources.

Journal ArticleDOI
TL;DR: Results show that the proposed system produces more coherent fragments across different conditions than a conventional fragment generation approach, resulting in significantly better recognition accuracy.

Journal ArticleDOI
TL;DR: A probabilistic algorithm for phrase stress assignment accounts for both prominence and constituency prosodic relations by considering the coupling between a dependency-grammar system of markers and constituent-size constraints, which copes with intra- and inter-speaker prosodic variability.

Journal ArticleDOI
TL;DR: It is shown from continuous Korean-English speech recognition experiments that the proposed method achieves an average word error rate reduction of 12.75% compared with a speech recognition system whose baseline acoustic models are trained on native speech.
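
Assuming the 12.75% figure is a relative reduction (the usual convention, though the abstract does not state this explicitly), it is computed as

```latex
\text{WERR} \;=\; \frac{\mathrm{WER}_{\text{baseline}} - \mathrm{WER}_{\text{adapted}}}{\mathrm{WER}_{\text{baseline}}} \times 100\%
```

so a hypothetical baseline of 20.0% WER would correspond to roughly 17.45% after adaptation.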

Journal ArticleDOI
TL;DR: Results from the analysis of Japanese vowel data suggested that contraction and relaxation of the three subdivisions of the genioglossus play a dominant role in forming tongue shapes for vowels.