Topic
Viseme
About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.
Papers published on a yearly basis
Papers
22 May 2011
TL;DR: A visual speech synthesizer providing midsagittal and front views of the vocal tract is presented to help language learners correct their mispronunciations.
Abstract: This paper presents a visual speech synthesizer that provides midsagittal and front views of the vocal tract to help language learners correct their mispronunciations. We adopt a set of allophonic rules to determine the visualization of allophonic variations. We also implement coarticulation by decomposing a viseme (the visualization of all articulators) into viseme components (visualizations of the tongue, lips, jaw, and velum separately). Viseme components are morphed independently while temporally adjacent articulations are taken into account. A subjective evaluation involving 6 subjects with linguistic backgrounds shows that 54% of their responses prefer having allophonic variations incorporated.
18 citations
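The independent morphing of viseme components described in the abstract above can be sketched as a per-articulator interpolation. This is a minimal illustrative sketch, not the paper's implementation; the component names and parameter vectors are hypothetical placeholders.

```python
import numpy as np

def morph_components(start, end, t):
    """Linearly interpolate each articulator component independently.

    start, end: dicts mapping component name -> parameter vector
    t: interpolation fraction in [0, 1]
    """
    # Each component (tongue, lips, jaw, velum) is morphed on its own,
    # which allows temporally adjacent articulations to influence
    # different articulators at different times.
    return {name: (1 - t) * start[name] + t * end[name] for name in start}

# Hypothetical parameter vectors for two visemes (values are illustrative).
viseme_a = {"tongue": np.array([0.2, 0.5]), "lips": np.array([0.8, 0.1]),
            "jaw": np.array([0.3]), "velum": np.array([0.0])}
viseme_b = {"tongue": np.array([0.6, 0.1]), "lips": np.array([0.2, 0.9]),
            "jaw": np.array([0.7]), "velum": np.array([1.0])}

# Morph halfway between the two visemes.
mid = morph_components(viseme_a, viseme_b, 0.5)
```

In a full synthesizer, each component would follow its own timing curve rather than a shared fraction `t`, which is what makes per-component decomposition useful for coarticulation.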
01 Jan 1995
18 citations
22 Apr 2013
TL;DR: Emotion recognition experiments on the IEMOCAP corpus validate the effectiveness of the proposed feature- and model-level compensation approaches at both the viseme and utterance levels.
Abstract: Along with emotions, modulation of the lexical content is an integral aspect of spontaneously produced facial expressions. The verbal content therefore introduces undesired variability into facial emotion recognition, especially in continuous frame-by-frame analysis of spontaneous human interactions. This study proposes feature-level and model-level compensation approaches to address this problem. The feature-level compensation scheme builds on trajectory-based modeling of facial features and a whitening transformation of the trajectories, aiming to normalize the lexicon-dependent patterns observed in the trajectories. The model-level compensation approach builds viseme-dependent emotional classifiers to incorporate the lexical variability. Emotion recognition experiments on the IEMOCAP corpus validate the effectiveness of the proposed techniques at both the viseme and utterance levels. The accuracies of viseme-level and utterance-level emotion recognition increase by 2.73% (5.9% relative) and 5.82% (11% relative), respectively, over a lexicon-independent baseline. These improvements are statistically significant.
18 citations
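The whitening transformation used in the feature-level compensation scheme above can be sketched as follows. This is an illustrative ZCA-style whitening of feature trajectories, assuming trajectories are stacked as rows of a sample matrix; the feature choice and segmentation are assumptions, not details from the paper.

```python
import numpy as np

def whiten(X, eps=1e-8):
    """ZCA-whiten X (n_samples x n_features): zero mean, identity covariance.

    Decorrelating the features removes systematic (e.g. lexicon-dependent)
    correlation structure from the trajectories.
    """
    Xc = X - X.mean(axis=0)                      # center each feature
    cov = np.cov(Xc, rowvar=False)               # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)       # symmetric eigendecomposition
    # W = V diag(1/sqrt(lambda)) V^T maps the data to identity covariance.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W

# Hypothetical correlated facial-feature trajectories (illustrative data).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4))
Xw = whiten(X)
# The covariance of Xw is (numerically) the identity matrix.
```

ZCA whitening is chosen here because it keeps each whitened feature maximally close to the original one, which is convenient when the features have articulatory interpretations.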
01 Jan 1997
TL;DR: The spatio-temporal characteristics of the closure/opening movements realising these consonantal targets were studied relative to the lip height (LH) parameter, together with the temporal relationships between this articulatory movement and the co-produced acoustic signal.
Abstract: In order to identify the Italian consonantal visemes, verify the results of perceptive tests, and elaborate rules for bimodal synthesis and recognition, the 3D lip target shapes (lip height, lip width, lower lip protrusion) for all 21 Italian consonants were determined. Moreover, the spatio-temporal characteristics of the closure/opening movements realising these consonantal targets were studied relative to the lip height (LH) parameter, together with the temporal relationships between this articulatory movement and the co-produced acoustic signal.
18 citations