
Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Journal ArticleDOI
TL;DR: This study investigated the relationship between speech style (clearly produced versus plain citation-form speech) and the motion of visible articulators, and found significant effects of speech style as well as of speaker gender and the saliency of visual speech cues.

23 citations

Dissertation
01 Jan 2004
TL;DR: In this paper, a method for automatic multimodal person authentication using speech, face and visual speech modalities is presented. The method uses motion information to localize the face region, which is then processed in the YCrCb color space to determine the locations of the eyes.
Abstract: This paper presents a method for automatic multimodal person authentication using speech, face and visual speech modalities. The proposed method uses motion information to localize the face region, and the face region is processed in the YCrCb color space to determine the locations of the eyes. The system models the non-lip region of the face using a Gaussian distribution, which is used to estimate the center of the mouth. Facial and visual speech features are extracted using multiscale morphological erosion and dilation operations, respectively; the facial features are extracted relative to the locations of the eyes, and the visual speech features relative to the locations of the eyes and mouth. Acoustic features are derived from the speech signal and are represented by weighted linear prediction cepstral coefficients (WLPCC). Autoassociative neural network (AANN) models are used to capture the distributions of the extracted acoustic, facial and visual speech features. The evidence from the speech, face and visual speech models is combined using a weighting rule, and the result is used to accept or reject the identity claim of the subject. The performance of the system is evaluated on newsreaders in TV broadcast news data, achieving an equal error rate (EER) of about 0.45% for 50 subjects.
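As a rough illustration of the final decision step, the sketch below combines per-modality confidence scores with a weighting rule and thresholds the fused result. This is a minimal sketch only: the paper does not spell out its weighting rule here, and the function names, weights, threshold and score values are all hypothetical.

```python
import numpy as np

def fuse_scores(scores, weights):
    # Normalise the weights so they sum to 1, then take the weighted sum
    # of the per-modality scores (speech, face, visual speech).
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    return float(np.dot(weights, np.asarray(scores, dtype=float)))

def verify(scores, weights, threshold=0.5):
    # Accept the identity claim only if the fused evidence clears a threshold.
    return fuse_scores(scores, weights) >= threshold

# Hypothetical confidence scores from the speech, face and visual speech
# models for one identity claim; weights and threshold are also invented.
print(verify([0.91, 0.78, 0.85], weights=[0.5, 0.3, 0.2]))  # True
```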

22 citations

Journal ArticleDOI
TL;DR: A novel scheme is presented for implementing a language-independent audio-driven facial animation system given a speech recognition system for just one language, in this case English.
Abstract: This paper describes a morphing-based, audio-driven facial animation system. Based on an incoming audio stream, a face image is animated with full lip synchronization and synthesized expressions. A novel scheme is presented for implementing a language-independent audio-driven facial animation system given a speech recognition system for just one language, in our case English. The method can also be used for text-to-audio-visual speech synthesis. Visemes are synthesized in new expressions so that animations with different facial expressions can be generated. Given an incoming audio stream and still pictures of a face representing different visemes, an animation sequence is constructed using optical flow between the visemes. The presented techniques improve the lip synchronization and naturalness of the animated video.
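The optical-flow morphing idea can be sketched with OpenCV's dense Farneback flow: warp one viseme still a fraction of the way along the flow field toward the other, then cross-fade. This is a minimal approximation of the technique, not the paper's pipeline; the backward-warping shortcut, frame count and file names are assumptions.

```python
import cv2
import numpy as np

def morph_visemes(img_a, img_b, n_frames=5):
    # Dense optical flow from viseme A to viseme B (Farneback method).
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_a.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    frames = []
    for t in np.linspace(0.0, 1.0, n_frames):
        # Backward-warp A a fraction t of the way along the flow field
        # (an approximation, since the flow is sampled on A's grid),
        # then cross-fade with B so the final frame equals B exactly.
        map_x = (grid_x - t * flow[..., 0]).astype(np.float32)
        map_y = (grid_y - t * flow[..., 1]).astype(np.float32)
        warped = cv2.remap(img_a, map_x, map_y, cv2.INTER_LINEAR)
        frames.append(cv2.addWeighted(warped, 1.0 - t, img_b, t, 0))
    return frames

# Hypothetical file names for two viseme stills of the same face.
frames = morph_visemes(cv2.imread("viseme_aa.png"), cv2.imread("viseme_oo.png"))
```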

22 citations

Proceedings ArticleDOI
14 Jul 2014
TL;DR: It is found that both audio and visual speech production are affected by varying the rate of speech; however, the effect is significantly more prominent in visual speech.
Abstract: The speed at which an utterance is spoken affects both the duration of the speech and the positions of the articulators. Consequently, the sounds that are produced are modified, as are the position and appearance of the lips, teeth, tongue and other visible articulators. We describe an experiment designed to measure the effect of variable speaking rate on audio and visual speech by comparing sequences of phonemes and dynamic visemes appearing in the same sentences spoken at different speeds. We find that both audio and visual speech production are affected by varying the rate of speech; however, the effect is significantly more prominent in visual speech.
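One simple way to compare label sequences for the same sentence at two speaking rates is a normalised edit distance between them, which counts the insertions, deletions and substitutions the rate change induces. The sketch below is not the paper's measurement protocol, and the viseme labels are hypothetical.

```python
def edit_distance(a, b):
    # Levenshtein distance between two label sequences.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

# Hypothetical viseme sequences for one sentence at normal vs. fast rate.
normal = ["p", "ah", "t", "er"]
fast   = ["p", "ah", "er"]
print(edit_distance(normal, fast) / len(normal))  # normalised change, 0.25
```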

22 citations

Journal ArticleDOI
TL;DR: A neural network-based lip reading system is designed to lip read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training; it achieves a significantly improved performance, with a 15% lower word error rate.
Abstract: In this paper, a neural network-based lip reading system is proposed. The system is lexicon-free and uses purely visual cues. With only a limited number of visemes as classes to recognise, the system is designed to lip read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training. The system has been tested on the challenging BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art work in lip reading sentences, the system achieves a significantly improved performance, with a 15% lower word error rate. In addition, experiments with videos of varying illumination show that the proposed model is robust to varying levels of lighting. The main contributions of this paper are: 1) the classification of visemes in continuous speech using a specially designed transformer with a unique topology; 2) the use of visemes as a classification schema for lip reading sentences; and 3) the conversion of visemes to words using perplexity analysis. All of these contributions serve to enhance the accuracy of lip reading sentences. The paper also provides an essential survey of the research area.
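The viseme-to-word step via perplexity analysis can be illustrated with a toy example: since one viseme class maps to several candidate words (homophemes), enumerate the word sequences consistent with the recognised visemes and keep the one a language model scores as least perplexing. The candidate lists and bigram probabilities below are invented for illustration; the paper's actual language model and mappings are not shown here.

```python
import math
from itertools import product

# Each viseme class maps to several visually identical candidate words.
candidates = [["bat", "mat", "pat"], ["man", "ban"]]

# Toy bigram log-probabilities standing in for a real language model.
bigram_logp = {
    ("<s>", "bat"): -2.0, ("<s>", "mat"): -2.3, ("<s>", "pat"): -2.5,
    ("bat", "man"): -0.7, ("mat", "man"): -2.2, ("pat", "man"): -2.5,
    ("bat", "ban"): -2.9, ("mat", "ban"): -3.0, ("pat", "ban"): -3.1,
}

def perplexity(words):
    # Per-word perplexity under the toy bigram model.
    logp = 0.0
    for prev, cur in zip(["<s>"] + words[:-1], words):
        logp += bigram_logp.get((prev, cur), -10.0)  # floor for unseen pairs
    return math.exp(-logp / len(words))

# Pick the candidate word sequence with the lowest perplexity.
best = min((list(seq) for seq in product(*candidates)), key=perplexity)
print(best)  # -> ['bat', 'man']
```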

22 citations


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations (78% related)
Feature vector: 48.8K papers, 954.4K citations (76% related)
Feature extraction: 111.8K papers, 2.1M citations (75% related)
Feature (computer vision): 128.2K papers, 1.7M citations (74% related)
Unsupervised learning: 22.7K papers, 1M citations (73% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22