Topic

Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications on this topic have received 17,889 citations.


Papers
Proceedings ArticleDOI
22 May 2011
TL;DR: A visual speech synthesizer providing midsagittal and front views of the vocal tract is presented to help language learners correct their mispronunciations.
Abstract: This paper presents a visual speech synthesizer that provides midsagittal and front views of the vocal tract to help language learners correct their mispronunciations. We adopt a set of allophonic rules to determine the visualization of allophonic variations. We also implement coarticulation by decomposing a viseme (the visualization of all articulators) into viseme components (separate visualizations of the tongue, lips, jaw, and velum). Viseme components are morphed independently while the temporally adjacent articulations are taken into account. A subjective evaluation involving 6 subjects with a linguistics background shows that 54% of their responses preferred having the allophonic variations incorporated.

18 citations
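The component-wise morphing described in this abstract lends itself to simple independent interpolation. Below is a minimal sketch, assuming each articulator can be reduced to a single normalised pose value; the Viseme class and morph function are illustrative inventions, not the paper's actual articulator models:

```python
# Minimal sketch of component-wise viseme morphing. Reducing each
# articulator to one scalar pose in [0, 1] is an assumption for brevity.
from dataclasses import dataclass

@dataclass
class Viseme:
    tongue: float  # normalised pose of each articulator, in [0, 1]
    lips: float
    jaw: float
    velum: float

def morph(prev: Viseme, nxt: Viseme, t: float) -> Viseme:
    """Interpolate each articulator independently at time fraction t in [0, 1].

    Morphing per component, rather than per whole viseme, is what lets
    one articulator anticipate the next target while another still holds
    the current one, i.e. coarticulation.
    """
    def lerp(a: float, b: float) -> float:
        return a + t * (b - a)
    return Viseme(lerp(prev.tongue, nxt.tongue),
                  lerp(prev.lips, nxt.lips),
                  lerp(prev.jaw, nxt.jaw),
                  lerp(prev.velum, nxt.velum))

# Example: halfway between a bilabial closure and a following open vowel.
print(morph(Viseme(0.2, 1.0, 0.1, 0.0), Viseme(0.5, 0.1, 0.8, 0.3), 0.5))
```

A natural extension of this sketch is a per-articulator timing offset on t, so that, say, the velum reaches its next target earlier than the lips.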

Proceedings ArticleDOI
22 Apr 2013
TL;DR: Emotion recognition experiments on the IEMOCAP corpus validate the effectiveness of the proposed feature-level and model-level compensation approaches at both the viseme and utterance levels.
Abstract: Along with emotions, modulation by the lexical content is an integral aspect of spontaneously produced facial expressions. The verbal content therefore introduces undesired variability into the facial emotion recognition problem, especially in continuous frame-by-frame analysis of spontaneous human interactions. This study proposes feature-level and model-level compensation approaches to address this problem. The feature-level compensation scheme builds on a trajectory-based model of the facial features and a whitening transformation of the trajectories, which aims to normalize the lexicon-dependent patterns observed in them. The model-level compensation approach builds viseme-dependent emotion classifiers to account for the lexical variability. Emotion recognition experiments on the IEMOCAP corpus validate the effectiveness of the proposed techniques at both the viseme and utterance levels: viseme-level and utterance-level recognition accuracies increase by 2.73% (5.9% relative) and 5.82% (11% relative), respectively, over a lexicon-independent baseline, and these improvements are statistically significant.

18 citations
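The whitening transformation at the heart of the feature-level compensation can be sketched generically. The following is a standard ZCA-style whitening over a (frames x features) trajectory array with made-up shapes and random data; the paper's exact facial features and trajectory model are not reproduced here:

```python
# Illustrative ZCA-style whitening of facial-feature trajectories.
import numpy as np

def whiten(trajectories: np.ndarray) -> np.ndarray:
    """Whiten (n_frames, n_features) trajectory data.

    Decorrelating the features and scaling them to unit variance is the
    mechanism by which correlated, lexicon-dependent structure is
    normalized out of the trajectories.
    """
    X = trajectories - trajectories.mean(axis=0)   # centre each feature
    cov = np.cov(X, rowvar=False)                  # feature covariance
    eigvals, eigvecs = np.linalg.eigh(cov)         # symmetric eigendecomposition
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + 1e-8)) @ eigvecs.T
    return X @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                      # fake 6-D feature trajectory
Xw = whiten(X)
print(np.round(np.cov(Xw, rowvar=False), 2))       # approximately the identity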

01 Jan 1997
TL;DR: The spatio-temporal characteristics of the closure/opening movements that realise these consonantal targets were studied with respect to the lip height (LH) parameter, together with the temporal relationships between this articulatory movement and the co-produced acoustic signal.
Abstract: In order to identify the Italian consonantal visemes, verify the results of perception tests, and derive rules for bimodal synthesis and recognition, the 3D lip target shapes (lip height, lip width, lower lip protrusion) for all 21 Italian consonants were determined. In addition, the spatio-temporal characteristics of the closure/opening movements realising these consonantal targets were studied with respect to the lip height (LH) parameter, together with the temporal relationships between this articulatory movement and the co-produced acoustic signal.

18 citations
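The 3D lip-target representation described above maps naturally onto small parameter vectors. Here is a sketch with hypothetical placeholder values, not the measured Italian data, showing the target representation and a simple measure of the closing movement along LH:

```python
# Sketch of the 3-D lip-target representation: each consonantal viseme is
# a (lip height, lip width, lower lip protrusion) vector in mm.
# All numbers below are invented placeholders.
import numpy as np

targets = {
    "p": np.array([0.0, 48.0, 2.0]),   # bilabial closure: LH = 0
    "f": np.array([4.0, 46.0, 6.0]),   # labiodental
    "t": np.array([9.0, 50.0, 1.0]),   # dental
}

def closure_extent(lh_trajectory: np.ndarray) -> float:
    """Extent of the closing movement along the lip-height (LH) parameter:
    the drop from the preceding opening peak to the consonantal minimum."""
    return float(lh_trajectory.max() - lh_trajectory.min())

# Fake LH trajectory (mm) across a vowel-consonant-vowel sequence.
lh = np.array([12.0, 10.5, 6.0, 1.0, 0.0, 2.5, 8.0, 11.0])
print(closure_extent(lh))   # 12.0
```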


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations, 78% related
Feature vector: 48.8K papers, 954.4K citations, 76% related
Feature extraction: 111.8K papers, 2.1M citations, 75% related
Feature (computer vision): 128.2K papers, 1.7M citations, 74% related
Unsupervised learning: 22.7K papers, 1M citations, 73% related
Performance Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  7
2022  12
2021  13
2020  39
2019  19
2018  22