
Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Journal ArticleDOI
TL;DR: This work presents a procedural audio-driven speech animation method for interactive virtual characters that automatically generates lip-synchronized animation able to drive any three-dimensional virtual character.
Abstract: We present a procedural audio-driven speech animation method for interactive virtual characters. Given any audio with its respective speech transcript, we automatically generate lip-synchronized speech animation that can drive any three-dimensional virtual character. The realism of the animation is enhanced by studying the emotional features of the audio signal and their effect on mouth movements. We also propose a coarticulation model that takes various linguistic rules into account. The generated animation is configurable by the user through control parameters such as viseme types, intensities, and coarticulation curves. We compare our approach against two lip-synchronized speech animation generators; our results show that our method surpasses them in terms of user preference.

13 citations
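A minimal sketch of the kind of viseme blending with coarticulation curves the abstract describes is given below, assuming a raised-cosine attack/decay curve per viseme; the curve shape, the default ramp length, and all names (VisemeEvent, blend_visemes) are illustrative assumptions rather than the paper's actual model.

```python
import math
from dataclasses import dataclass

@dataclass
class VisemeEvent:
    viseme: str       # viseme label, e.g. "AA", "M", "F"
    start: float      # onset time in seconds (e.g. from forced alignment)
    end: float        # offset time in seconds
    intensity: float  # user-configurable peak weight in [0, 1]

def coarticulation_weight(t: float, ev: VisemeEvent, ramp: float = 0.08) -> float:
    """Raised-cosine attack/decay around the viseme interval (assumed curve shape)."""
    if t < ev.start - ramp or t > ev.end + ramp:
        return 0.0
    if t < ev.start:                          # attack ramp
        x = (t - (ev.start - ramp)) / ramp
    elif t > ev.end:                          # decay ramp
        x = ((ev.end + ramp) - t) / ramp
    else:                                     # steady plateau
        x = 1.0
    return ev.intensity * 0.5 * (1.0 - math.cos(math.pi * x))

def blend_visemes(t: float, events: list) -> dict:
    """Blend-shape weights at time t; overlapping ramps model coarticulation."""
    weights = {}
    for ev in events:
        w = coarticulation_weight(t, ev)
        if w > 0.0:
            weights[ev.viseme] = max(weights.get(ev.viseme, 0.0), w)
    total = sum(weights.values())
    if total > 1.0:                           # keep the rig's weights normalized
        weights = {k: v / total for k, v in weights.items()}
    return weights
```

A character rig would evaluate blend_visemes at each frame and apply the resulting weights to its viseme blend shapes; the user-facing controls from the abstract (viseme type, intensity, coarticulation curve) map onto the VisemeEvent fields and the ramp parameter.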

Proceedings ArticleDOI
10 Dec 2007
TL;DR: Preliminary experiments show that two approaches, audio-visual integration based on missing feature theory and a biologically inspired coarse-to-fine grouping of phonemes and visemes, drastically improve the noise robustness of audio-visual speech recognition.
Abstract: Audio-visual speech recognition (AVSR) is a promising approach to improving the noise robustness of speech recognition in the real world. A phoneme and a viseme are used as the auditory and visual units for AVSR, respectively. In the real world, however, they are often misclassified due to input noise. To solve this problem, we propose two approaches. One is audio-visual integration based on missing feature theory, which copes with missing or unreliable audio and visual features during recognition. The other is a biologically inspired approach: phoneme and viseme grouping based on coarse-to-fine recognition. Preliminary experiments show that these approaches drastically improve the noise robustness of AVSR.

12 citations
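To make the missing-feature integration concrete, the sketch below masks unreliable features out of each stream's likelihood and weights the streams by their fraction of reliable features. The diagonal-Gaussian observation model and the reliability-proportional weighting are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def masked_log_likelihood(x, mean, var, mask):
    """Diagonal-Gaussian log-likelihood over the features marked reliable in mask."""
    m = mask.astype(bool)
    if not m.any():
        return 0.0                      # no reliable features: stream is uninformative
    d = x[m] - mean[m]
    return float(-0.5 * np.sum(d * d / var[m] + np.log(2 * np.pi * var[m])))

def avsr_score(x_audio, x_video, model_audio, model_video, mask_audio, mask_video):
    """Combine the streams, weighting each by its fraction of reliable features."""
    ll_a = masked_log_likelihood(x_audio, model_audio[0], model_audio[1], mask_audio)
    ll_v = masked_log_likelihood(x_video, model_video[0], model_video[1], mask_video)
    w_a, w_v = mask_audio.mean(), mask_video.mean()
    z = (w_a + w_v) or 1.0              # avoid division by zero when both masks are empty
    return (w_a * ll_a + w_v * ll_v) / z
```

A recognizer would evaluate avsr_score once per candidate phoneme/viseme class and pick the highest-scoring class.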

Dissertation
01 Nov 2008
TL;DR: A large section of this thesis is dedicated to analyzing the performance of the new visual speech unit model compared with that attained by standard (MPEG-4) viseme models.
Abstract: This dissertation presents a new learning-based representation, referred to as the Visual Speech Unit (VSU), for visual speech recognition (VSR). The automated recognition of human speech using only features from the visual domain has become a significant research topic that plays an essential role in the development of many multimedia systems such as audio-visual speech recognition (AVSR), mobile phone applications, human-computer interaction (HCI) and sign language recognition. The inclusion of lip visual information is opportune since it can improve the overall accuracy of audio or hand recognition algorithms, especially when such systems operate in environments characterized by a high level of acoustic noise. The main components of the developed VSR system (a) segment the mouth region of interest, (b) extract the visual features from the real-time input video and (c) identify the visual speech units. The major difficulty associated with VSR systems resides in identifying the smallest elements in the image sequences that represent the lip movements in the visual domain. The proposed Visual Speech Unit extends the standard viseme model currently applied for VSR: it includes not only the data associated with the articulation of the visemes but also the transitory information between consecutive visemes. A large section of this thesis is dedicated to analyzing the performance of the new VSU model compared with standard (MPEG-4) viseme models. Two experimental results indicate that:
1. The developed VSR system achieved 80-90% correct recognition when applied to the identification of 60 classes of VSUs, while the recognition rate for the standard set of MPEG-4 visemes was only 62-72%.
2. When 15 words are identified with either VSUs or visemes as the visual speech element, word-recognition accuracy based on VSUs is 7-12% higher than the accuracy based on visemes.

12 citations
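The central VSU idea, treating the transition between consecutive visemes as a modeling unit in its own right, can be sketched in a few lines. The "A->B" transition naming and this simple interleaving are illustrative assumptions, not the thesis's exact construction.

```python
# Hedged sketch: extend a viseme sequence with explicit transition units so a
# recognizer also models the inter-viseme articulation (the VSU idea above).
def visemes_to_vsus(visemes):
    """Interleave steady-state visemes with transition units."""
    vsus = []
    for i, v in enumerate(visemes):
        vsus.append(v)                             # steady-state unit
        if i + 1 < len(visemes):
            vsus.append(f"{v}->{visemes[i + 1]}")  # transition unit
    return vsus

# Example: ['p', 'ah', 't'] -> ['p', 'p->ah', 'ah', 'ah->t', 't']
print(visemes_to_vsus(["p", "ah", "t"]))
```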

Proceedings ArticleDOI
01 Nov 2011
TL;DR: This paper designs a lip synchronization system for the authors' humanoid robot using the Microsoft Speech API (SAPI), builds sixteen lip shapes to perform all the visemes, and implements the whole system on the humanoid robot head to demonstrate its success.
Abstract: The most common way for humans to understand each other is through communication, and the same holds for human-robot interaction. The ability to talk is one of the most important technologies in the field of intelligent robotics. Talking involves two basic types of signal: auditory and visual. Speech synthesis and speech recognition are the crucial abilities for the auditory signal; likewise, lip synchronization is the key technique for the visual signal. Lifelike lip synchronization can contribute to the improvement of human-robot interaction. In this paper, we design a lip synchronization system for our humanoid robot using the Microsoft Speech API (SAPI). With thirty degrees of freedom in total (twelve for the mouth), we build sixteen lip shapes to perform all the visemes. The proposed system's precise, lifelike lip synchronization gives users a favorable impression of the robot and increases its approachability. Finally, the whole system is implemented on our humanoid robot head to demonstrate its success.

12 citations
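A system like this typically subscribes to SAPI's viseme events, which report a viseme ID from 0 to 21 as synthesis proceeds, and maps each ID onto one of the robot's lip shapes. The sketch below shows one possible condensation of the 22 SAPI IDs into sixteen shape names; the grouping, the shape names, and the send_pose callback are illustrative assumptions, not the authors' published table.

```python
# Illustrative many-to-one mapping from SAPI viseme IDs (0-21) to sixteen
# hypothetical lip-shape names; the authors' actual grouping is an assumption.
SAPI_TO_LIP_SHAPE = {
    0: "rest",                                     # silence
    1: "wide_open", 2: "wide_open",                # ae/ax/ah, aa
    3: "round_open",                               # ao
    4: "mid_open", 5: "mid_open", 12: "mid_open",  # ey/eh/uh, er, h
    6: "smile",                                    # y/iy/ih/ix
    7: "pucker",                                   # w/uw
    8: "rounded", 9: "rounded",                    # ow, aw
    10: "rounded", 11: "rounded",                  # oy, ay
    13: "r_round",                                 # r
    14: "tongue_up",                               # l
    15: "narrow",                                  # s/z
    16: "protruded",                               # sh/ch/jh/zh
    17: "tongue_teeth",                            # th/dh
    18: "lip_teeth",                               # f/v
    19: "alveolar",                                # t/d/n
    20: "velar",                                   # k/g/ng
    21: "closed",                                  # p/b/m
}

def on_viseme_event(viseme_id: int, send_pose) -> None:
    """Forward a SAPI viseme event to the mouth-servo controller."""
    shape = SAPI_TO_LIP_SHAPE.get(viseme_id, "rest")
    send_pose(shape)   # send_pose: hypothetical robot-side callback
```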


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations (78% related)
Feature vector: 48.8K papers, 954.4K citations (76% related)
Feature extraction: 111.8K papers, 2.1M citations (75% related)
Feature (computer vision): 128.2K papers, 1.7M citations (74% related)
Unsupervised learning: 22.7K papers, 1M citations (73% related)
Performance Metrics
No. of papers in the topic in previous years:
2023: 7
2022: 12
2021: 13
2020: 39
2019: 19
2018: 22