Topic
Viseme
About: Viseme is a research topic. Over its lifetime, 865 publications have appeared within this topic, receiving 17,889 citations.
Papers
01 Jan 2016
TL;DR: The aim of this chapter is to give a comprehensive overview of current state-of-the-art parametric methods for realistic facial modelling and animation.
Abstract: Facial modelling is a fundamental technique in a variety of applications in computer graphics, computer vision and pattern recognition areas. As 3D technologies evolved over the years, the quality of facial modelling greatly improved. To enhance the modelling quality and controllability of the model further, parametric methods, which represent or manipulate facial attributes (e.g. identity, expression, viseme) with a set of control parameters, have been proposed in recent years. The aim of this chapter is to give a comprehensive overview of current state-of-the-art parametric methods for realistic facial modelling and animation.
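The parametric idea described above — representing facial attributes such as identity, expression, and viseme with a set of control parameters — can be sketched as a toy blendshape-style model. All names and offset values below are illustrative, not taken from any specific system:

```python
# Minimal sketch of a parametric face model: a face is a neutral shape
# plus a weighted sum of per-attribute offsets (hypothetical data).

NEUTRAL = [0.0, 0.0, 0.0]            # toy "face": three vertex coordinates
OFFSETS = {                          # per-attribute displacement bases
    "smile": [0.2, 0.0, 0.1],
    "viseme_aa": [0.0, 0.5, 0.0],
}

def synthesize(weights):
    """Blend the neutral face with weighted attribute offsets."""
    face = list(NEUTRAL)
    for name, w in weights.items():
        for i, d in enumerate(OFFSETS[name]):
            face[i] += w * d
    return face

print(synthesize({"smile": 1.0, "viseme_aa": 0.5}))  # [0.2, 0.25, 0.1]
```

Varying the weights independently is what gives such models their controllability: each parameter maps to one interpretable facial attribute.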
30 Dec 2017
TL;DR: A model of the audiovisual system based on hidden Markov models is proposed that recognizes speech in real time and provides a recognition tool for conditions where other means may not be possible, for example in the absence of an audio component.
Abstract: A model of the audiovisual system based on hidden Markov models is proposed, which allows recognizing speech in real time. The model provides a recognition tool that can be used in conditions where other means may not be possible, for example, in the absence of an audio component. The model is investigated and tested on the example of digit recognition, and the expected results are obtained.
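The abstract does not give the model's internals, so the following is only a generic sketch of the Viterbi decoding step that HMM-based recognizers of this kind rely on, over a hypothetical two-state model (all probabilities are invented for illustration):

```python
import math

# Toy two-state HMM: hidden states "sil"(ence) and "speech";
# observations are coarse audio-energy labels "low"/"high".
STATES = ["sil", "speech"]
START = {"sil": 0.8, "speech": 0.2}
TRANS = {"sil": {"sil": 0.7, "speech": 0.3},
         "speech": {"sil": 0.2, "speech": 0.8}}
EMIT = {"sil": {"low": 0.9, "high": 0.1},
        "speech": {"low": 0.3, "high": 0.7}}

def viterbi(obs):
    """Return the most likely hidden-state path for an observation list."""
    # Work in log space to avoid numerical underflow on long sequences.
    score = {s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}
    path = {s: [s] for s in STATES}
    for o in obs[1:]:
        new_score, new_path = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][o])
            new_path[s] = path[prev] + [s]
        score, path = new_score, new_path
    best = max(STATES, key=score.get)
    return path[best]

print(viterbi(["low", "high", "high"]))  # ['sil', 'speech', 'speech']
```

In an audiovisual recognizer the emission model would score joint audio and lip (viseme) features rather than a single symbolic observation, but the decoding step is the same.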
01 Jan 2009
TL;DR: This article investigated older and younger control participants' susceptibility to an audio-visual speech illusion known as the McGurk effect, which occurs when an incongruent phoneme [ba] and viseme [ga] are perceived as a new fused percept [da]. The authors hypothesized that if older persons integrate information from different senses more than younger ones, their susceptibility to the illusion should be higher, and the results confirmed this.
Abstract: In the present work we investigated older and younger control participants' susceptibility to an audio-visual speech illusion known as the McGurk effect, which occurs when an incongruent phoneme [ba] and viseme [ga] are perceived as a new fused percept [da]. We hypothesized that if older persons integrate information from different senses more than younger ones, their susceptibility to the illusion should be higher. The results confirmed this hypothesis. We suggest that difficulty in focusing on one channel (audition) while simultaneously perceiving inputs from other channels (vision) is the reason for this enhanced integration.
TL;DR: In this article, the authors used facial motion capture technology to obtain dynamic lip viseme feature data during the stop's forming-block, continuing-block, and removing-block phases, as well as during co-articulation with vowels in the CV structure.
Abstract: In the study of articulatory phonetics, lip shape and tongue position are the focus for linguists. To reveal the physiological characteristics of lip shape during pronunciation, the author takes the Tibetan Xiahe dialect as the research object and defines the speaker's facial parameter feature points according to the MPEG-4 international standard. Most importantly, the author uses facial motion capture technology to obtain dynamic lip viseme feature data during the stop's forming-block, continuing-block, and removing-block phases and its co-articulation with vowels in the CV structure. Analysis shows that, during the stop's forming-block phase, the distribution of lip-shape change differs across sounds articulated at different places. In co-articulation with [a], the reverse effect is greater than the forward effect, which is consistent with conclusions obtained for many languages by other scholars using other experimental methods. The study also found that, in the process of pronunciation, the movement of each speaker's lip physiological characteristics is random to a certain extent, but when different speakers pronounce the same sound, the changing trend of the lip-shape characteristics remains consistent.
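As a rough illustration of working with tracked lip feature points of this kind, the sketch below computes two common lip-shape measures, vertical aperture and mouth width, from 2D landmark positions. The point names and coordinates are invented, not the paper's MPEG-4 data:

```python
import math

def dist(a, b):
    """Euclidean distance between two 2D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

# Hypothetical (x, y) pixel positions of four lip landmarks in one
# captured frame -- illustrative values, not measured data.
frame = {
    "upper_lip_mid": (100.0, 80.0),
    "lower_lip_mid": (100.0, 96.0),
    "mouth_corner_l": (84.0, 88.0),
    "mouth_corner_r": (116.0, 88.0),
}

aperture = dist(frame["upper_lip_mid"], frame["lower_lip_mid"])  # vertical opening
width = dist(frame["mouth_corner_l"], frame["mouth_corner_r"])   # mouth width
print(aperture, width)  # 16.0 32.0
```

Tracking such scalar measures frame by frame is one simple way to compare lip-shape trajectories across speakers and articulation phases.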
15 Jan 2023
TL;DR: In this paper, a parametric viseme fitting algorithm is proposed to extract viseme parameters from speech videos; the resulting curves correlate better with phonemes and are thus more controllable and friendly to animators.
Abstract: We present a novel audio-driven facial animation approach that can generate realistic lip-synchronized 3D facial animations from input audio. Our approach learns viseme dynamics from speech videos, produces animator-friendly viseme curves, and supports multilingual speech inputs. The core of our approach is a novel parametric viseme fitting algorithm that utilizes phoneme priors to extract viseme parameters from speech videos. With the guidance of phonemes, the extracted viseme curves correlate better with phonemes and are thus more controllable and friendly to animators. To support multilingual speech inputs and generalize to unseen voices, we take advantage of deep audio feature models pretrained on multiple languages to learn the mapping from audio to viseme curves. Our audio-to-curves mapping achieves state-of-the-art performance even when the input audio suffers from distortions of volume, pitch, speed, or noise. Lastly, a viseme scanning approach for acquiring high-fidelity viseme assets is presented for efficient speech animation production. We show that the predicted viseme curves can be applied to different viseme-rigged characters to yield personalized animations with realistic and natural facial motions. Our approach is artist-friendly and can be easily integrated into typical animation production workflows, including blendshape- or bone-based animation.
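As a hedged sketch of how per-viseme weight curves of this kind might drive a rig, the toy code below samples keyframed viseme curves at a time point to produce blendshape-style weights. The curve names and keyframe data are illustrative, not the paper's output:

```python
def sample(curve, t):
    """Linearly interpolate a keyframed curve [(time, value), ...] at t."""
    if t <= curve[0][0]:
        return curve[0][1]
    for (t0, v0), (t1, v1) in zip(curve, curve[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    return curve[-1][1]  # past the last keyframe: hold the final value

# Hypothetical viseme curves, e.g. as an audio-to-curve model might emit:
# a bilabial viseme fading out while an open-vowel viseme fades in.
curves = {
    "viseme_PP": [(0.0, 0.0), (0.1, 1.0), (0.2, 0.0)],
    "viseme_AA": [(0.1, 0.0), (0.2, 1.0), (0.3, 0.0)],
}

t = 0.15  # sample mid-crossfade
weights = {name: sample(c, t) for name, c in curves.items()}
print(weights)  # both viseme weights are ~0.5 at the crossfade point
```

Applying the sampled weights to a character's viseme blendshapes at each frame is the final step; because the curves are keyed per viseme, an animator can edit them directly, which matches the paper's stated goal of animator-friendly output.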