
Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Journal ArticleDOI
TL;DR: This study investigated the relationship between speech style (clearly produced versus plain citation-form speech) and the motion of visible articulators, and found significant effects of speech style as well as of speaker gender and the saliency of visual speech cues.

23 citations

Dissertation
01 Jan 2004
TL;DR: In this paper, a method for automatic multimodal person authentication using speech, face and visual speech modalities is presented. The method uses motion information to localize the face region, which is then processed in the YCrCb color space to determine the locations of the eyes.
Abstract: This paper presents a method for automatic multimodal person authentication using speech, face and visual speech modalities. The proposed method uses motion information to localize the face region, and the face region is processed in the YCrCb color space to determine the locations of the eyes. The system models the non-lip region of the face using a Gaussian distribution, which is used to estimate the center of the mouth. Facial and visual speech features are extracted using multiscale morphological erosion and dilation operations, respectively; the facial features are extracted relative to the locations of the eyes, and the visual speech features relative to the locations of the eyes and mouth. Acoustic features are derived from the speech signal and are represented by weighted linear prediction cepstral coefficients (WLPCC). Autoassociative neural network (AANN) models are used to capture the distributions of the extracted acoustic, facial and visual speech features. The evidence from the speech, face and visual speech models is combined using a weighting rule, and the result is used to accept or reject the identity claim of the subject. The performance of the system is evaluated on newsreaders in TV broadcast news data, achieving an equal error rate (EER) of about 0.45% for 50 subjects.
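As a rough illustration of the final decision step, the sketch below combines per-modality confidence scores with a weighting rule and thresholds the fused result. This is a minimal sketch only: the paper does not spell out its weighting rule here, and the function names, weights, threshold and score values are all hypothetical.

```python
import numpy as np

def fuse_scores(scores, weights):
    # Normalise the weights so they sum to 1, then take the weighted sum
    # of the per-modality scores (speech, face, visual speech).
    weights = np.asarray(weights, dtype=float)
    weights /= weights.sum()
    return float(np.dot(weights, np.asarray(scores, dtype=float)))

def verify(scores, weights, threshold=0.5):
    # Accept the identity claim only if the fused evidence clears a threshold.
    return fuse_scores(scores, weights) >= threshold

# Hypothetical confidence scores from the speech, face and visual speech
# models for one identity claim; weights and threshold are also invented.
print(verify([0.91, 0.78, 0.85], weights=[0.5, 0.3, 0.2]))  # True
```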

22 citations

Journal ArticleDOI
TL;DR: A novel scheme is presented for implementing a language-independent audio-driven facial animation system given a speech recognition system for just one language, in this case English.
Abstract: This paper describes a morphing-based, audio-driven facial animation system. Based on an incoming audio stream, a face image is animated with full lip synchronization and synthesized expressions. A novel scheme is presented for implementing a language-independent audio-driven facial animation system given a speech recognition system for just one language, in our case English. The method can also be used for text-to-audio-visual speech synthesis. Visemes are synthesized in new expressions so that animations with different facial expressions can be generated. Given an incoming audio stream and still pictures of a face representing different visemes, an animation sequence is constructed using optical flow between the visemes. The presented techniques improve the lip synchronization and naturalness of the animated video.
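The optical-flow morphing idea can be sketched with OpenCV's dense Farneback flow: warp one viseme still a fraction of the way along the flow field toward the other, then cross-fade. This is a minimal approximation of the technique, not the paper's pipeline; the backward-warping shortcut, frame count and file names are assumptions.

```python
import cv2
import numpy as np

def morph_visemes(img_a, img_b, n_frames=5):
    # Dense optical flow from viseme A to viseme B (Farneback method).
    gray_a = cv2.cvtColor(img_a, cv2.COLOR_BGR2GRAY)
    gray_b = cv2.cvtColor(img_b, cv2.COLOR_BGR2GRAY)
    flow = cv2.calcOpticalFlowFarneback(gray_a, gray_b, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray_a.shape
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    frames = []
    for t in np.linspace(0.0, 1.0, n_frames):
        # Backward-warp A a fraction t of the way along the flow field
        # (an approximation, since the flow is sampled on A's grid),
        # then cross-fade with B so the final frame equals B exactly.
        map_x = (grid_x - t * flow[..., 0]).astype(np.float32)
        map_y = (grid_y - t * flow[..., 1]).astype(np.float32)
        warped = cv2.remap(img_a, map_x, map_y, cv2.INTER_LINEAR)
        frames.append(cv2.addWeighted(warped, 1.0 - t, img_b, t, 0))
    return frames

# Hypothetical file names for two viseme stills of the same face.
frames = morph_visemes(cv2.imread("viseme_aa.png"), cv2.imread("viseme_oo.png"))
```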

22 citations

Proceedings ArticleDOI
14 Jul 2014
TL;DR: It is found that both audio and visual speech production are affected by varying the rate of speech; however, the effect is significantly more prominent in visual speech.
Abstract: The speed at which an utterance is spoken affects both the duration of the speech and the positions of the articulators. Consequently, the sounds that are produced are modified, as are the position and appearance of the lips, teeth, tongue and other visible articulators. We describe an experiment designed to measure the effect of variable speaking rate on audio and visual speech by comparing sequences of phonemes and dynamic visemes appearing in the same sentences spoken at different speeds. We find that both audio and visual speech production are affected by varying the rate of speech; however, the effect is significantly more prominent in visual speech.
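One simple way to compare label sequences for the same sentence at two speaking rates is a normalised edit distance between them, which counts the insertions, deletions and substitutions the rate change induces. The sketch below is not the paper's measurement protocol, and the viseme labels are hypothetical.

```python
def edit_distance(a, b):
    # Levenshtein distance between two label sequences.
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

# Hypothetical viseme sequences for one sentence at normal vs. fast rate.
normal = ["p", "ah", "t", "er"]
fast   = ["p", "ah", "er"]
print(edit_distance(normal, fast) / len(normal))  # normalised change, 0.25
```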

22 citations

Journal ArticleDOI
TL;DR: A neural network-based lip reading system is designed to lip read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training; it achieves a significantly improved performance, with a 15% lower word error rate.
Abstract: In this paper, a neural network-based lip reading system is proposed. The system is lexicon-free and uses purely visual cues. With only a limited number of visemes as classes to recognise, the system is designed to lip read sentences covering a wide range of vocabulary and to recognise words that may not be included in system training. The system has been tested on the challenging BBC Lip Reading Sentences 2 (LRS2) benchmark dataset. Compared with state-of-the-art work in lip reading sentences, the system achieves a significantly improved performance, with a 15% lower word error rate. In addition, experiments with videos of varying illumination show that the proposed model is robust to varying levels of lighting. The main contributions of this paper are: 1) the classification of visemes in continuous speech using a specially designed transformer with a unique topology; 2) the use of visemes as a classification schema for lip reading sentences; and 3) the conversion of visemes to words using perplexity analysis. All of these contributions serve to enhance the accuracy of lip reading sentences. The paper also provides an essential survey of the research area.
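The viseme-to-word step via perplexity analysis can be illustrated with a toy example: since one viseme class maps to several candidate words (homophemes), enumerate the word sequences consistent with the recognised visemes and keep the one a language model scores as least perplexing. The candidate lists and bigram probabilities below are invented for illustration; the paper's actual language model and mappings are not shown here.

```python
import math
from itertools import product

# Each viseme class maps to several visually identical candidate words.
candidates = [["bat", "mat", "pat"], ["man", "ban"]]

# Toy bigram log-probabilities standing in for a real language model.
bigram_logp = {
    ("<s>", "bat"): -2.0, ("<s>", "mat"): -2.3, ("<s>", "pat"): -2.5,
    ("bat", "man"): -0.7, ("mat", "man"): -2.2, ("pat", "man"): -2.5,
    ("bat", "ban"): -2.9, ("mat", "ban"): -3.0, ("pat", "ban"): -3.1,
}

def perplexity(words):
    # Per-word perplexity under the toy bigram model.
    logp = 0.0
    for prev, cur in zip(["<s>"] + words[:-1], words):
        logp += bigram_logp.get((prev, cur), -10.0)  # floor for unseen pairs
    return math.exp(-logp / len(words))

# Pick the candidate word sequence with the lowest perplexity.
best = min((list(seq) for seq in product(*candidates)), key=perplexity)
print(best)  # -> ['bat', 'man']
```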

22 citations


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations (78% related)
Feature vector: 48.8K papers, 954.4K citations (76% related)
Feature extraction: 111.8K papers, 2.1M citations (75% related)
Feature (computer vision): 128.2K papers, 1.7M citations (74% related)
Unsupervised learning: 22.7K papers, 1M citations (73% related)
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22