Topic

Viseme

About: Viseme is a research topic. A viseme is the visual counterpart of a phoneme: the characteristic mouth and lip shape that accompanies a speech sound. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers

Proceedings ArticleDOI
22 Aug 1999
TL;DR: This paper presents a system that automatically combines emotional cues with phonemes to generate emotional visual speech on a synthetic human face.
Abstract: The animation of a three-dimensional synthetic human face has been the object of much research in recent years. Many systems now exist for this purpose, relying on the artistic and animation skills of animators. Methods for generating lip movements to accompany a speech soundtrack have also been developed. These systems rely on extracting phonemes from the speech signal and converting them to "visemes", or visual lip shapes, for a synthetic human face. The generation of human emotional expressions has likewise been developed in recent years. This paper combines some of these developments to present a system that automatically combines emotional cues with phonemes to generate emotional visual speech on a synthetic human face.

2 citations
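The phoneme-to-viseme-plus-emotion pipeline this abstract describes can be illustrated with a minimal sketch. The mapping table, blend-shape channel names, and the additive emotion weighting below are assumptions for illustration only; the paper does not specify its implementation:

```python
# Minimal sketch of a phoneme -> viseme -> emotional-blend pipeline, assuming
# a blend-shape face rig. The mapping table, channel names, and the additive
# emotion weighting are illustrative assumptions, not the paper's method.

# A reduced phoneme-to-viseme lookup; many phonemes share one lip shape.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "ae": "open",
    "iy": "spread", "uw": "rounded",
}

def blend_frame(viseme_pose, emotion_pose, emotion_weight=0.3):
    """Layer an emotional pose onto a viseme pose, channel by channel."""
    channels = set(viseme_pose) | set(emotion_pose)
    return {
        c: viseme_pose.get(c, 0.0) + emotion_weight * emotion_pose.get(c, 0.0)
        for c in channels
    }

# Example: the viseme for /m/ with a "smile" emotion layered on top.
viseme_name = PHONEME_TO_VISEME["m"]                 # -> "bilabial"
viseme_pose = {"jaw_open": 0.1, "lip_press": 0.8}
smile_pose = {"mouth_corner_up": 0.9, "lip_press": -0.2}
print(viseme_name, blend_frame(viseme_pose, smile_pose))
```

Keeping the emotional pose as a separate additive layer, rather than baking it into each viseme, is one plausible way to combine the two cue streams "appropriately and automatically" as the abstract puts it.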

Posted ContentDOI
31 Jul 2020-bioRxiv
TL;DR: Face rotation was used to disrupt linguistic but not temporal visual cues, and a pitch-modulation detection task with upright and inverted faces matching either the target or the masker speech tested whether performance differences could be explained by binding, an early multisensory integration mechanism distinct from traditional late integration.
Abstract: When listening is difficult, seeing the face of the talker aids speech comprehension. Faces carry both temporal (low-level physical correspondence of mouth movement and auditory speech) and linguistic (learned physical correspondences of mouth shape (viseme) and speech sound (phoneme)) cues. Listeners participated in two experiments investigating how these cues may be used to process sentences when maskers are present. In Experiment I, faces were rotated to disrupt linguistic but not temporal cue correspondence. Listeners suffered a deficit in speech comprehension when the faces were rotated, indicating that visemes are processed in a rotation-dependent manner, and that linguistic cues aid comprehension. In Experiment II, listeners were asked to detect pitch modulation in the target speech with upright and inverted faces that either matched the target or masker speech such that performance differences could be explained by binding, an early multisensory integration mechanism distinct from traditional late integration. Performance in this task replicated previous findings that temporal integration induces binding, but there was no behavioral evidence for a role of linguistic cues in binding. Together these experiments point to temporal cues providing a speech processing benefit through binding and linguistic cues providing a benefit through late integration.

2 citations

Journal ArticleDOI
TL;DR: A new lip-synchronization algorithm for realistic applications is proposed, which can generate facial movements synchronized with audio produced from natural speech or through a text-to-speech engine.
Abstract: Speech is one of the most important means of interaction between humans, so much avatar research focuses on this area. Creating animated speech requires a facial model capable of representing the myriad shapes the human face assumes during speech, as well as a method to produce the correct shape at the correct time. One of the main challenges is to create precise lip movements for the avatar and synchronize them with recorded audio. This paper proposes a new lip-synchronization algorithm for realistic applications, which can generate facial movements synchronized with audio produced from natural speech or through a text-to-speech engine. The method requires an animator to construct animations using a canonical set of visemes for all pairwise combinations of a reduced phoneme set. These animations are then stitched together smoothly to construct the final animation.

2 citations
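A minimal sketch of the pairwise stitching this abstract describes: pre-animated clips, one per consecutive viseme pair, are concatenated with a short cross-fade so the joins stay smooth. The frame format, clip contents, and linear cross-fade are illustrative assumptions, not the paper's actual animation data or blending:

```python
# Minimal sketch: stitch pre-built viseme-pair clips with a short cross-fade.
# Clip data, the per-frame dict format, and the linear blend are assumptions.

def crossfade(a, b, overlap):
    """Linearly blend the last `overlap` frames of clip a into the first of b."""
    blended = []
    for i, (fa, fb) in enumerate(zip(a[-overlap:], b[:overlap])):
        t = (i + 1) / (overlap + 1)
        blended.append({c: (1 - t) * fa.get(c, 0.0) + t * fb.get(c, 0.0)
                        for c in set(fa) | set(fb)})
    return a[:-overlap] + blended + b[overlap:]

def stitch(viseme_sequence, transition_clips, overlap=1):
    """Concatenate the pre-built clip for each consecutive viseme pair."""
    out = []
    for pair in zip(viseme_sequence, viseme_sequence[1:]):
        clip = transition_clips[pair]
        out = crossfade(out, clip, overlap) if out else list(clip)
    return out

# Tiny dummy clips: each frame is a dict of blend-shape weights.
clips = {
    ("bilabial", "open"): [{"jaw_open": 0.1}, {"jaw_open": 0.5}, {"jaw_open": 0.9}],
    ("open", "rounded"):  [{"jaw_open": 0.9}, {"lip_round": 0.5}, {"lip_round": 1.0}],
}
frames = stitch(["bilabial", "open", "rounded"], clips)
print(len(frames), frames)
```

Animating clips per viseme pair rather than per single viseme is what lets this style of approach capture coarticulation: each transition is animated in the context of its neighbor, so only the seams between clips need blending.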


Network Information
Related Topics (5)
Topic                        Papers     Citations    Related
Vocabulary                   44.6K      941.5K       78%
Feature vector               48.8K      954.4K       76%
Feature extraction           111.8K     2.1M         75%
Feature (computer vision)    128.2K     1.7M         74%
Unsupervised learning        22.7K      1M           73%
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22