
Showing papers in "Speech Communication in 2010"


Journal ArticleDOI
TL;DR: This paper starts with the fundamentals of automatic speaker recognition, covering feature extraction and speaker modeling, and then elaborates on advanced computational techniques for addressing robustness and session variability.

1,433 citations
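For readers new to the area, here is a minimal sketch of the classical pipeline such a tutorial covers: enroll one Gaussian mixture model per speaker on MFCC frames, then score a test utterance against each model. The function names are illustrative, and scikit-learn's GaussianMixture stands in for the paper's modeling techniques:

```python
# Minimal sketch of a classic GMM-based speaker recognition pipeline
# (enrollment + scoring). Assumes each utterance has already been reduced
# to a frame-by-coefficient MFCC matrix; all names here are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

def enroll(mfcc_frames: np.ndarray, n_components: int = 64) -> GaussianMixture:
    """Fit a Gaussian mixture model to one speaker's MFCC frames."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
    gmm.fit(mfcc_frames)          # mfcc_frames: shape (n_frames, n_coeffs)
    return gmm

def score(gmm: GaussianMixture, test_frames: np.ndarray) -> float:
    """Average per-frame log-likelihood of the test utterance under the model."""
    return gmm.score(test_frames)

# Identification: pick the enrolled speaker whose model best explains the test data.
# speakers = {"alice": enroll(mfcc_a), "bob": enroll(mfcc_b)}
# best = max(speakers, key=lambda s: score(speakers[s], mfcc_test))
```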


Journal ArticleDOI
TL;DR: The article first outlines the emergence of the silent speech interface from the fields of speech production, automatic speech processing, speech pathology research, and telecommunications privacy issues, and then follows with a presentation of demonstrator systems based on seven different types of technologies.

436 citations


Journal ArticleDOI
TL;DR: In this article, the authors reviewed experimental studies on non-native listening in adverse conditions, organized around three principal contributory factors: the task facing listeners, the effect of adverse conditions on speech, and the differences among listener populations.

241 citations


Journal ArticleDOI
TL;DR: Initial results from a study of continuous speech production with instantaneous acoustic feedback show that the BCI user was able to improve his control over an artificial speech synthesizer both within and across recording sessions, which validates the potential of the intracortical microelectrode-based approach for providing a speech prosthesis that can allow much more rapid communication rates.

220 citations


Journal ArticleDOI
TL;DR: This work introduces a more fine-grained yet robust set of spectral features: statistics of Mel-Frequency Cepstral Coefficients computed over three phoneme type classes of interest (stressed vowels, unstressed vowels, and consonants) in the utterance.

210 citations
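A short sketch of how such class-conditioned spectral statistics might be computed, assuming a frame-level phoneme-class alignment is available; the pooling choice (mean and standard deviation per class) is an illustrative assumption, not necessarily the paper's exact statistics:

```python
# Sketch of per-class spectral statistics: pool MFCC frames by phoneme class
# (stressed vowel / unstressed vowel / consonant) and summarize each pool.
# Assumes a frame-level alignment is available; names are illustrative.
import numpy as np

CLASSES = ("stressed_vowel", "unstressed_vowel", "consonant")

def class_statistics(mfcc: np.ndarray, frame_labels: list[str]) -> np.ndarray:
    """mfcc: (n_frames, n_coeffs); frame_labels: phoneme class per frame.
    Returns one feature vector concatenating mean and std per class."""
    feats = []
    labels = np.asarray(frame_labels)
    for cls in CLASSES:
        frames = mfcc[labels == cls]
        if len(frames) == 0:                  # class absent from the utterance
            feats.append(np.zeros(2 * mfcc.shape[1]))
        else:
            feats.append(np.concatenate([frames.mean(axis=0), frames.std(axis=0)]))
    return np.concatenate(feats)
```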


Journal ArticleDOI
TL;DR: A segmental vocoder driven by ultrasound and optical images of the tongue and lips for a "silent speech interface" application, usable either by a laryngectomized patient or for silent communication.

177 citations


Journal ArticleDOI
TL;DR: The results indicate that a properly chosen modulation frame duration provides a good compromise between different types of spectral distortion, namely musical noise and temporal slurring, and that, given such a selection, the proposed modulation spectral subtraction does not suffer from the musical noise artifacts typically associated with acoustic spectral subtraction.

174 citations
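To make the idea concrete, here is a rough sketch of spectral subtraction carried out in the modulation domain: the magnitude trajectory of each acoustic-frequency bin is itself short-time Fourier analyzed, a noise magnitude estimate is subtracted, and the trajectory is resynthesized. The frame size, spectral floor, and noise estimate below are illustrative assumptions, not the paper's settings:

```python
# Sketch of modulation-domain spectral subtraction over an acoustic
# magnitude spectrogram. All parameter values are illustrative.
import numpy as np
from scipy.signal import stft, istft

def modulation_subtract(mag, frame_rate, noise_mag, beta=0.002, nperseg=32):
    """mag, noise_mag: acoustic magnitude spectrograms, shape (n_bins, n_frames);
    frame_rate: acoustic frame rate in Hz. Returns an enhanced spectrogram."""
    out = np.zeros_like(mag)
    for k in range(mag.shape[0]):                    # each acoustic-frequency bin
        _, _, M = stft(mag[k], fs=frame_rate, nperseg=nperseg)       # modulation STFT
        _, _, N = stft(noise_mag[k], fs=frame_rate, nperseg=nperseg)
        clean = np.abs(M) - np.abs(N)                # magnitude subtraction
        clean = np.maximum(clean, beta * np.abs(M))  # spectral floor limits musical noise
        _, x = istft(clean * np.exp(1j * np.angle(M)), fs=frame_rate, nperseg=nperseg)
        n = min(len(x), mag.shape[1])
        out[k, :n] = x[:n]
    return np.maximum(out, 0.0)                      # magnitudes stay non-negative
```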


Journal ArticleDOI
TL;DR: It is proposed that doubly confusable pairs, rather than high neighborhood density, may better explain phonetic neighborhood errors in human speech processing.

164 citations


Journal ArticleDOI
TL;DR: The new approach of phonetic feature bundling for modeling coarticulation in EMG-based speech recognition is described, and results are reported on the EMG-PIT corpus, a recently collected multiple-speaker large-vocabulary database of silent and audible EMG speech recordings.

161 citations


Journal ArticleDOI
TL;DR: An optimal formulation for the widely used greedy maximum marginal relevance (MMR) algorithm is introduced and a system which finds an optimal selection of utterances covering as many unique important concepts as possible is described.

105 citations
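For reference, a compact sketch of the greedy MMR selection that the paper's exact formulation replaces: at each step, pick the utterance that best trades off relevance to the query against redundancy with what has already been selected. Cosine similarity and all parameter values here are illustrative:

```python
# Greedy maximum marginal relevance (MMR) selection over candidate
# utterance vectors. `lam` balances relevance vs. redundancy.
import numpy as np

def mmr_select(candidates, query_vec, k, lam=0.7):
    """candidates: list of vectors; returns indices of the k selected items."""
    def sim(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr_score(i):
            relevance = sim(candidates[i], query_vec)
            redundancy = max((sim(candidates[i], candidates[j]) for j in selected),
                             default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)   # greedy step: locally best utterance
        selected.append(best)
        remaining.remove(best)
    return selected
```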


Journal ArticleDOI
TL;DR: The general conclusion is that age-related changes in F1 may be compensatory to offset a physiologically induced decline in F0 and thereby maintain a relatively constant auditory distance between F0 and F1.

Journal ArticleDOI
Stefan Kopp
TL;DR: In this paper, the authors argue that natural human face-to-face communication is characterized by inter-personal coordination, and that one major step in this direction is embodied coordination: mutual adaptations mediated by flexible modules for the top-down production and bottom-up perception of expressive conversational behavior, which ground in and coalesce in the same sensorimotor structures.

Journal ArticleDOI
TL;DR: Assessment of listeners' sensitivity to nonnative speaker status when potential segmental, grammatical, and lexical cues were removed indicates that temporal properties, pitch, and voice quality probably played a role in the listeners' judgments.

Journal ArticleDOI
TL;DR: It is confirmed that co-speech gestures develop with age in the context of narrative activity and play a crucial role in discourse cohesion and the framing of verbal utterances.

Journal ArticleDOI
TL;DR: The present study predicts speech intelligibility by combining a psychoacoustically validated model of auditory preprocessing with a simple central stage that describes the similarity of the test signal with the corresponding reference signal at a level of the internal representation of the signals.
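A toy illustration of that central-stage idea, assuming both signals have the same length and using a log-spectrogram as a crude stand-in for the psychoacoustic front end (the paper's auditory preprocessing is far more detailed):

```python
# Sketch: map test and reference signals to "internal representations"
# via a placeholder front end, then predict intelligibility from their
# similarity (correlation). Parameters are illustrative.
import numpy as np
from scipy.signal import stft

def internal_representation(x: np.ndarray, fs: int) -> np.ndarray:
    _, _, Z = stft(x, fs=fs, nperseg=512)
    return np.log(np.abs(Z) + 1e-6)   # crude stand-in for an auditory model

def similarity_index(test: np.ndarray, ref: np.ndarray, fs: int) -> float:
    """Correlation between internal representations of equal-length signals;
    higher values predict higher intelligibility."""
    a = internal_representation(test, fs).ravel()
    b = internal_representation(ref, fs).ravel()
    a -= a.mean()
    b -= b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```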

Journal ArticleDOI
TL;DR: Two series of experiments are described that examine audiovisual face-to-face interaction between naive human viewers and either a human interlocutor or a virtual conversational agent, to analyze the interplay between speech activity and mutual gaze patterns during mediated face-to-face interactions.

Journal ArticleDOI
TL;DR: Experiments show that using the data selection approach for discriminative training yields disappointing performance improvements on data that is mismatched to the training data type of the seed model; however, the directed manual transcription approach can yield significant improvements in recognition accuracy on all types of data.

Journal ArticleDOI
TL;DR: The analysis shows that, although the HMM method produces significantly better neutral speech, the two methods produce emotional speech of similar quality, except for emotions having context-dependent prosodic patterns.

Journal ArticleDOI
TL;DR: The authors present a set of audiovisual VCV stimuli produced with an audiovisual talking head that can display all speech articulators, including the tongue, in an augmented speech mode, and conclude that these tongue reading capabilities could be used for applications in the domains of speech therapy for speech-retarded children, perception and production rehabilitation of hearing-impaired children, and pronunciation training for second language learners.

Journal ArticleDOI
TL;DR: A set of experiments on both speech recognition and speech synthesis based on surface electromyography is described, and the lessons learned about the characteristics of the EMG signal in these domains are discussed.

Journal ArticleDOI
TL;DR: Results indicate that both the similarity between the target and noise and the language experience of the listeners contribute to the amount of interference listeners experience when listening to speech in the presence of speech noise.

Journal ArticleDOI
TL;DR: The overall results of the visual-visual matching experiments showed that people could discriminate same- from different-prosody sentences with a high degree of accuracy, which supports the proposal that rigid head motion provides an important visual cue to prosody.

Journal ArticleDOI
TL;DR: Small but significant results partially supported the predictions, suggesting a link between eyebrow raising and spoken language; possible linguistic functions are proposed, namely the structuring and emphasising of information in the verbal message.

Journal ArticleDOI
TL;DR: The results indicate that non-native speakers pay less attention to lexical information, and relatively more attention to acoustic detail, than previously thought, and suggest that the penetrability of the speech system by cognitive factors depends on listeners' proficiency with the language, and especially on their level of lexical-semantic knowledge.

Journal ArticleDOI
TL;DR: An automatic intonation assessment system for second language learning, based on a top-down scheme, is proposed that gives an averaged subjective-objective score correlation as high as 0.88; a stress assessment system is also presented, combining intonation and energy contour estimation.

Journal ArticleDOI
TL;DR: PARADE is applied to a front-end processing technique for automatic speech recognition (ASR) that employs a robust feature extraction method called SPADE (Subband based Periodicity and Aperiodicity DEcomposition), and it is confirmed that PARADE can improve the performance of front-end processing for ASR.

Journal ArticleDOI
TL;DR: Three types of silent-speech enhancement systems were developed that enable voice-impaired people to communicate verbally using body-conducted vocal-tract resonance signals and statistical conversion; listening tests demonstrated that weak body-conducted resonance sounds can be transformed into intelligible whispered speech sounds.

Journal ArticleDOI
TL;DR: Exposure to syllables incorporating visual and/or acoustic tongue-related phonemes induced greater excitability of the left tongue primary motor cortex as early as 100-200 ms after the consonantal onset of the acoustically presented syllable, providing evidence that both visual and auditory modalities specifically modulate activity in the tongue primary motor cortex at an early stage of audiovisual speech perception.

Journal ArticleDOI
TL;DR: Results demonstrate how text and acoustic inputs both contribute to the prediction of articulatory movements in the method used, including the effectiveness of context-dependent modeling, the role of supplementary acoustic input, and the appropriateness of certain model structures for the unified acoustic-articulatory models.

Journal ArticleDOI
TL;DR: The HMM framework is used to model speech prosody and to perform initial syntactic and/or semantic level processing of the input speech in parallel with standard speech recognition; N-best rescoring based on syntactic-level word-stress unit alignment was shown to increase the number of correctly recognized words.