Topic

Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic receiving 17889 citations.


Papers
Journal ArticleDOI
01 Aug 2005
TL;DR: An efficient system for realistic speech animation is proposed, which supports all steps of the animation pipeline, from the capture or design of 3-D head models up to the synthesis and editing of the performance.
Abstract: An efficient system for realistic speech animation is proposed. The system supports all steps of the animation pipeline, from the capture or design of 3-D head models up to the synthesis and editing of the performance. This pipeline is fully 3-D, which yields high flexibility in the use of the animated character. Real detailed 3-D face dynamics, observed at video frame rate for thousands of points on the face of speaking actors, underpin the realism of the facial deformations. These are given a compact and intuitive representation via independent component analysis (ICA). Performances amount to trajectories through this ‘viseme space’. When asked to animate a face, the system replicates the ‘visemes’ that it has learned, and adds the necessary co-articulation effects. Realism has been improved through comparisons with motion-captured ground truth. Faces for which no 3-D dynamics could be observed can be animated nonetheless. Their visemes are adapted automatically to their physiognomy by localising the face in a ‘face space’.
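The core representation here can be sketched in a few lines: each frame of a performance is a small weight vector over learned basis deformations (ICA components in the paper), and the animated face is reconstructed as the mean face plus a weighted sum of those deformations. The sizes and the random stand-in data below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# A performance is a trajectory through a low-dimensional "viseme space":
# each frame is a small weight vector over basis deformations. The basis
# would come from ICA over tracked face points; here it is random stand-in data.
n_points = 1000          # 3-D points tracked on the face (assumed)
n_components = 8         # dimensionality of the viseme space (assumed)
n_frames = 25            # frames in the performance (assumed)

rng = np.random.default_rng(0)
mean_face = rng.normal(size=3 * n_points)                # neutral face, flattened xyz
basis = rng.normal(size=(n_components, 3 * n_points))    # one deformation per component

# Trajectory: per-frame weights over the basis (in the paper, the ICA projection
# of observed face motion).
trajectory = rng.normal(scale=0.1, size=(n_frames, n_components))

# Reconstruct the animated face for every frame: mean + weighted deformations.
frames = mean_face + trajectory @ basis
```

The compactness is the point: a performance is stored as n_frames × n_components weights instead of n_frames × 3 × n_points coordinates.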

22 citations

Dissertation
01 Jan 2002

22 citations

Proceedings ArticleDOI
23 Jun 1997
TL;DR: The perceptual boundaries of speech reading and multimedia technology, which are the constraints that affect speech reading performance, are investigated, and conclusions on the relationship between viseme groupings, accuracy of viseme recognition, and presentation rate are drawn.
Abstract: In the future, multimedia technology will be able to provide video frame rates equal to or better than 30 frames per second (FPS). Until that time, the hearing-impaired community will be using band-limited communication systems over unshielded twisted-pair copper wiring. As a result, multimedia communication systems will use a coder/decoder (CODEC) to compress the video and audio signals for transmission. For these systems to be usable by the hearing-impaired community, the algorithms within the CODEC have to be designed to account for the perceptual boundaries of the hearing impaired. We investigate the perceptual boundaries of speech reading and multimedia technology, which are the constraints that affect speech reading performance. We analyze and draw conclusions on the relationship between viseme groupings, accuracy of viseme recognition, and presentation rate. These results are critical in the design of multimedia systems for the hearing impaired.
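The viseme-grouping analysis mentioned above can be illustrated concretely: phonemes that look alike on the lips are collapsed into one viseme class, so a recognizer scored at the viseme level forgives within-group confusions. The groups and confusion data below are a common illustrative consonant grouping and made-up responses, not the paper's classes or results.

```python
# Hypothetical many-to-one phoneme -> viseme-class mapping (illustrative only).
VISEME_GROUPS = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "t": "alveolar", "d": "alveolar", "s": "alveolar", "z": "alveolar",
}

def accuracy(pairs, level):
    """Fraction correct over (truth, response) pairs at 'phoneme' or 'viseme' level."""
    if level == "viseme":
        pairs = [(VISEME_GROUPS[a], VISEME_GROUPS[b]) for a, b in pairs]
    return sum(a == b for a, b in pairs) / len(pairs)

# A speech reader often confuses /p/ with /b/, but rarely /p/ with /f/.
responses = [("p", "b"), ("b", "m"), ("f", "f"), ("t", "d"), ("s", "s")]
phoneme_acc = accuracy(responses, "phoneme")   # only exact matches count
viseme_acc = accuracy(responses, "viseme")     # within-group confusions forgiven
```

Scoring the same responses at both levels makes the effect of the grouping on measured accuracy explicit.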

22 citations

01 Jan 1999
TL;DR: A three-dimensional facial model is combined with a commercial audio text-to-speech synthesizer; the intelligibility of both natural and synthetic auditory speech, the confusion patterns of consonants, and the identification of the Finnish visemes are examined.
Abstract: We describe our Finnish audio-visual speech synthesizer, its evaluation, and possible improvements. We have combined a three-dimensional facial model with a commercial audio text-to-speech synthesizer. The visual speech is based on a letter-to-viseme mapping, and the animation is created by linear interpolation between the visemes. An intelligibility test was run to quantify the benefit of seeing the synthetic and natural face on hearing the synthetic and natural voice presented at different signal-to-noise ratios. Both natural and synthetic faces improved the intelligibility of both natural and synthetic auditory speech. We examined the confusion patterns of consonants and the identification of the Finnish visemes. We also propose how the viseme repertoire of the talking head can be improved.
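The letter-to-viseme pipeline described above can be sketched minimally: map each letter to a viseme target, then linearly interpolate between successive targets to produce animation keyframes. The letters and two-parameter mouth shapes below are illustrative assumptions, not the synthesizer's actual viseme set.

```python
# Hypothetical letter -> (jaw_open, lip_round) viseme targets (illustrative).
VISEMES = {
    "a": (0.8, 0.2),
    "u": (0.3, 0.9),
    "m": (0.0, 0.5),
}

def lerp(p, q, t):
    """Linear interpolation between two viseme targets, t in [0, 1]."""
    return tuple(a + (b - a) * t for a, b in zip(p, q))

def animate(letters, steps_per_viseme=4):
    """Yield mouth-shape keyframes for a letter sequence."""
    targets = [VISEMES[c] for c in letters]
    for p, q in zip(targets, targets[1:]):
        for i in range(steps_per_viseme):
            yield lerp(p, q, i / steps_per_viseme)
    yield targets[-1]          # hold the final viseme

frames = list(animate("ama"))
```

Linear interpolation is the simplest choice; it is also why such systems benefit from an explicit co-articulation model, since real mouth shapes do not move in straight lines between targets.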

22 citations

Journal Article
TL;DR: The phoneme lipreading system's word accuracy outperforms the viseme-based system's word accuracy; however, the phoneme system achieved lower accuracy at the unit level, which shows the importance of the dictionary for decoding classification outputs into words.
Abstract: There is debate over whether phoneme or viseme units are the most effective for a lipreading system. Some studies use phoneme units even though phonemes describe unique short sounds; other studies have tried to improve lipreading accuracy by focusing on visemes, with varying results. We compare the performance of a lipreading system by modeling visual speech using either 13 viseme or 38 phoneme units. We report the accuracy of our system at both word and unit levels. The evaluation task is large-vocabulary continuous speech using the TCD-TIMIT corpus. We complete our visual speech modeling via hybrid DNN-HMMs, and our visual speech decoder is a Weighted Finite-State Transducer (WFST). We use DCT and Eigenlips features as representations of the mouth ROI image. The phoneme lipreading system's word accuracy outperforms the viseme-based system's word accuracy. However, the phoneme system achieved lower accuracy at the unit level, which shows the importance of the dictionary for decoding classification outputs into words.
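The phoneme-versus-viseme trade-off comes from the many-to-one phoneme-to-viseme mapping: distinct words can collapse to identical viseme strings, leaving the decoder's dictionary to disambiguate them. A toy sketch, with a hypothetical three-class mapping that is not TCD-TIMIT's actual viseme set:

```python
# Hypothetical many-to-one phoneme -> viseme mapping (illustrative only).
PHONEME_TO_VISEME = {
    "p": "V1", "b": "V1", "m": "V1",   # bilabials look alike on the lips
    "ae": "V2",
    "t": "V3", "d": "V3",
}

def to_visemes(phonemes):
    """Collapse a phoneme string to its viseme string."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]

pat = ["p", "ae", "t"]
bat = ["b", "ae", "t"]
mad = ["m", "ae", "d"]

# Distinct at the phoneme level, identical at the viseme level: "pat", "bat",
# and "mad" become homophenes, so a viseme-unit decoder cannot tell them apart
# without dictionary and language-model constraints.
```

This is consistent with the reported result: fewer, coarser units are easier to classify (higher unit accuracy), but they carry less word-level information, so phoneme units win on word accuracy.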

22 citations


Network Information
Related Topics (5)
Vocabulary
44.6K papers, 941.5K citations
78% related
Feature vector
48.8K papers, 954.4K citations
76% related
Feature extraction
111.8K papers, 2.1M citations
75% related
Feature (computer vision)
128.2K papers, 1.7M citations
74% related
Unsupervised learning
22.7K papers, 1M citations
73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22