Showing papers in "Speech Communication in 2012"


Journal ArticleDOI
TL;DR: A class of linear transformation techniques based on block-wise transformation of MFLE, which effectively decorrelate the filter bank log energies and capture speech information efficiently, is studied.

389 citations
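
The block-wise idea in the summary above can be illustrated with a minimal sketch. It assumes that MFLE stands for mel filter bank log energies and uses an orthonormal DCT-II on non-overlapping blocks as one possible transform; the block layout, the function names `dct_matrix` and `block_dct`, and the choice of DCT are illustrative assumptions, not the paper's stated method.

```python
import numpy as np

def dct_matrix(n):
    """Return the orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]   # output (frequency) index
    m = np.arange(n)[None, :]   # input (channel) index
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    mat[0, :] /= np.sqrt(2.0)   # scale the DC row so that rows are orthonormal
    return mat

def block_dct(log_energies, block_size):
    """Apply the DCT separately to consecutive non-overlapping blocks
    along the last axis (hypothetical block layout)."""
    x = np.asarray(log_energies, dtype=float)
    if x.shape[-1] % block_size != 0:
        raise ValueError("number of channels must be a multiple of block_size")
    d = dct_matrix(block_size)
    blocks = x.reshape(*x.shape[:-1], -1, block_size)
    return (blocks @ d.T).reshape(x.shape)

# Example: one frame with 8 filter bank channels, split into two blocks of 4.
frame = np.array([1.0, 1.0, 1.0, 1.0, 0.5, 1.5, 0.5, 1.5])
coeffs = block_dct(frame, block_size=4)
```

Because each block is transformed by an orthonormal matrix, the total energy of the frame is preserved, and a block with constant log energy maps to a single non-zero coefficient.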


Journal ArticleDOI
TL;DR: Three speaking-aid systems are proposed that enhance three different types of EL speech signals: EL speech, EL speech using an air-pressure sensor (EL-air speech), and silent EL speech which is produced with a new sound source unit that generates signals with extremely low energy.

190 citations


Journal ArticleDOI
TL;DR: Analysis of the data indicates that the rhythmic class distinctions under consideration finely correlate with differences in the way these languages instantiate two prosodic timing processes, namely, the durational marking of prosodic heads, and pre-final lengthening at prosodic boundaries.

103 citations


Journal ArticleDOI
TL;DR: A complete framework for speaker indexing is proposed, which aims to be domain-independent, parameter-free, and applicable to both online and offline applications.

97 citations


Journal ArticleDOI
TL;DR: A composite measure is developed based on linearly combining a salient subset of the proposed measures with conventional prosodic parameters and can achieve correlation with subjective intelligibility ratings as high as 0.97; thus the measure can serve as an accurate indicator of dysarthric speech intelligibility.

87 citations
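
As a rough illustration of what "linearly combining" measures and correlating the result with subjective ratings involves, the sketch below fits combination weights by ordinary least squares and computes a Pearson correlation. The fitting procedure, the function names, and the toy numbers are assumptions for illustration only; the summary does not say how the paper's weights were obtained, and none of the values below come from the paper.

```python
import numpy as np

def _design(features):
    """Prepend a column of ones (intercept) to the feature matrix."""
    f = np.asarray(features, dtype=float)
    return np.column_stack([np.ones(len(f)), f])

def fit_composite(features, ratings):
    """Least-squares weights (intercept first) for a linear composite measure."""
    w, *_ = np.linalg.lstsq(_design(features), np.asarray(ratings, dtype=float), rcond=None)
    return w

def composite_score(features, weights):
    """Evaluate the linear composite for each row of features."""
    return _design(features) @ weights

def pearson(a, b):
    """Pearson correlation coefficient between two 1-D sequences."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic toy data: 5 speakers, 2 acoustic measures each (made-up values).
features = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
ratings = [0.5 + 2 * f0 - 1 * f1 for f0, f1 in features]

weights = fit_composite(features, ratings)
r = pearson(composite_score(features, weights), ratings)
```

In practice the correlation would be evaluated on held-out speakers, since a composite fitted and scored on the same data overstates agreement with the ratings.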


Journal ArticleDOI
TL;DR: Formant frequency data for /l/ in 23 languages/dialects where the consonant may be typically clear or dark show that the two varieties of /l/ are set in contrast mostly in the context of /i/ but also next to /a/, and that a few languages/dialects may exhibit intermediate degrees of darkness in the consonant.

82 citations


Journal ArticleDOI
TL;DR: A Neurogram Similarity Index Measure (NSIM) is presented that automates this inspection process, and translates the response pattern differences into a bounded discrimination metric and represents an important step in validating the use of auditory nerve models to predict speech intelligibility.

81 citations


Journal ArticleDOI
TL;DR: The proposed MMSE modulation magnitude estimator is shown to have better noise suppression than MMSE acoustic magnitude estimation, and improved speech quality compared to other modulation domain based enhancement methods considered.

73 citations


Journal ArticleDOI
Okko Räsänen
TL;DR: This work reviews a number of existing computational studies concentrated on the question of how spoken language can be learned from continuous speech in the absence of linguistically or phonetically motivated background knowledge, a situation faced by human infants when they first attempt to learn their native language.

61 citations


Journal ArticleDOI
TL;DR: A corpus of affective speech is developed based on one lexically neutral utterance, a prosody transplantation method is applied, and logistic regression is used to analyze the categorical data; differences in the identification of these two affective categories are observed.

59 citations


Journal ArticleDOI
TL;DR: The dynamic glottal-edge detection applied here is based on the local region-based framework proposed by Lankton and Tannenbaum (2008), which allows the foreground and background to be modeled in terms of smaller local regions, instead of representing them with global statistics.

Journal ArticleDOI
TL;DR: The impact of vocal effort variability on the performance of an isolated-word recognizer is shown and effective means of improving the system's robustness are tested.

Journal ArticleDOI
TL;DR: The TNO-Gaming Corpus is described that contains spontaneous vocal and facial expressions elicited via a multiplayer videogame and that includes emotion annotations obtained via self-report and observation by outside observers.

Journal ArticleDOI
TL;DR: This paper designs a data-driven 3D talking head system for articulatory animations with synthesized articulator dynamics at the phoneme level, and investigates visual synthesis methods, including a phoneme-based articulatory model with a modified blending method.

Journal ArticleDOI
TL;DR: Investigation of the extent to which listeners are able to discriminate between bilingual talkers in three language pairs shows listeners also extend this to Finnish and Mandarin, languages that are quite distinct from English from a genetic and phonetic similarity perspective.

Journal ArticleDOI
TL;DR: A new hierarchical classification technique whose structure is based on NMDS, called Data-Driven Dimensional Emotion Classification (3DEC), is proposed; it significantly outperforms the competitors in the much more interesting and important case of speaker-independent emotion classification.

Journal ArticleDOI
TL;DR: This work proposes using Wavelet-Packet Cepstral Coefficients (WPCCs) as an alternative way to do filter-bank energy-based feature extraction (FE) for automatic speech recognition (ASR).

Journal ArticleDOI
TL;DR: The proposed strategy exhibits increased efficiency in radio-programme content segmentation and classification, which is one of the most demanding audio semantics tasks, and can be easily adapted in broader audio detection and classification problems, including additional real-world speech-communication demanding scenarios.

Journal ArticleDOI
TL;DR: A database model based on MySQL, a relational database management system, is proposed and recommended for the storage and accessibility of various speech disorder data, including signals, clinical evaluations and patients' information.

Journal ArticleDOI
TL;DR: The data show that emotional prosody has a rapid impact on gaze behavior during social information processing, but that prosodic meanings can be overridden by semantic cues when linguistic information is task relevant.

Journal ArticleDOI
TL;DR: It is suggested that vocal emotional expressions with similar valence are processed with category specificity, and that discrete emotion knowledge implicitly affects the processing of emotional faces between sensory modalities.

Journal ArticleDOI
TL;DR: This paper exploits semi-supervised learning with the co-training algorithm for automatic detection of coarse-level representation of prosodic events such as pitch accent, intonational phrase boundaries, and break indices.

Journal ArticleDOI
TL;DR: The recognition and discrimination performance achieved with voiced words is compared to that achieved with whispered words; the stability of the internal representation of speech improves with GPR across the range of values used in these experiments.

Journal ArticleDOI
TL;DR: A model based on the definition of disfluency and the concept of underlying fluent sentences is proposed; the results show that it is possible to synthesise filled pauses without decreasing the overall naturalness of the system, and users stated that the speech produced is even more natural than that produced without filled pauses.

Journal ArticleDOI
TL;DR: This paper presents the results of an experimental study in which PreLingua was applied to a population with voice disorders and pathologies in special education centers in Spain and Colombia; the study showed improvements in the voice capabilities of a remarkable number of users and the ability of the tool to educate impaired users with voice alterations.

Journal ArticleDOI
TL;DR: Analysis of an evaluation investigating the perception of quality and speaking style of HMM-based voices confirms that speech with conversational characteristics is instrumental for listeners to perceive successful integration of conversational speech phenomena in synthetic speech.

Journal ArticleDOI
TL;DR: Two methods are presented: a new method for acoustic-to-articulatory inversion, which estimates positions of the vocal tract from acoustics using a nonlinear Hammerstein system, and a new method in which acoustic-based hypotheses are re-evaluated according to the likelihoods of their articulatory realizations in task-dynamics.

Journal ArticleDOI
TL;DR: The listening tests showed maximum improvement in speech perception for a compression factor of 0.6, with an improvement of 9%-21% in the recognition scores for consonants and a significant reduction in response times.

Journal ArticleDOI
George Saon, Hagen Soltau
TL;DR: It is suggested that significant gains can be obtained for small amounts of training data even after feature and model-space discriminative training.

Journal ArticleDOI
TL;DR: A set of tools to analyze inconsistencies observed in a Cat_ToBI labeling experiment is presented, and the results reveal agreement rates for this study that are comparable to previous ToBI inter-reliability tests.