Topic
Viseme
About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.
Papers
10 Sep 2007
TL;DR: A new automatic approach for lip and point-of-interest localization on a speaker's face is presented, based both on the color information of the mouth and on a geometric model of the lips; this hybrid design makes the method more tolerant to noise and artifacts in the image.
Abstract: Motivated by humans' ability to lipread, the visual component is considered to contribute information to the speech recognition system. Lip-reading is the perception of speech based purely on observing the talker's lip movements. The major difficulty of a lip-reading system is the extraction of the visual speech descriptors. To perform this task, it is necessary to carry out automatic localization and tracking of the labial gestures. We present in this paper a new automatic approach for lip and point-of-interest localization on a speaker's face, based both on the color information of the mouth and on a geometric model of the lips. This hybrid solution makes our method more tolerant to noise and artifacts in the image. Experiments revealed that our lip POI localization approach for lip-reading purposes is promising. The presented results show that our system recognizes 94.64% of French visemes.
17 citations
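The paper's hybrid color-plus-geometry method is not specified in enough detail here to reproduce, but the color cue alone can be sketched: lip pixels are redder relative to green than the surrounding skin, so thresholding a pseudo-hue measure yields a rough mouth mask. The `lip_mask` function, the 0.6 threshold, and the synthetic image below are illustrative assumptions, not the authors' actual parameters.

```python
import numpy as np

def lip_mask(rgb: np.ndarray, threshold: float = 0.6) -> np.ndarray:
    """Rough lip segmentation from color: threshold the pseudo-hue
    r / (r + g), which is higher for lip pixels than for skin.
    (Measure and threshold are illustrative, not the paper's.)"""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    pseudo_hue = r / (r + g + 1e-6)   # avoid division by zero
    return pseudo_hue > threshold

# Synthetic test image: skin-colored background with a redder "lip" patch.
img = np.zeros((10, 10, 3), dtype=np.uint8)
img[...] = (210, 160, 140)        # skin-like color, pseudo-hue ~0.57
img[4:6, 3:7] = (200, 80, 90)     # lip-like color, pseudo-hue ~0.71
mask = lip_mask(img)              # True only on the lip patch
```

In the paper, a mask like this would be only the first stage; the geometric lip model would then refine the region and locate the points of interest.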
29 Jan 1998
TL;DR: In this paper, a probabilistic mapping between static speech sounds and pseudo-articulator positions is proposed, which can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Abstract: A speech-processing method is described that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator positions is described. The method for learning this mapping uses a set of training data composed only of speech sounds. The described speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
17 citations
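As a rough illustration of the patent's idea (not its actual method), one can map each static sound to its expected pseudo-articulator position under a probabilistic mapping, then smooth the resulting sequence over time. The mapping table, the position values, and the moving-average smoother below are all invented for illustration.

```python
import numpy as np

# Hypothetical probabilistic mapping P(position | sound) over two candidate
# pseudo-articulator positions (all numbers are illustrative).
positions = np.array([0.0, 1.0])
p_pos_given_sound = {
    "a": np.array([0.9, 0.1]),
    "i": np.array([0.2, 0.8]),
    "u": np.array([0.5, 0.5]),
}

def expected_positions(sounds):
    """Map each static sound to its expected pseudo-articulator position."""
    return np.array([p_pos_given_sound[s] @ positions for s in sounds])

def smooth(seq, window=3):
    """Moving-average smoothing, standing in for the patent's method of
    producing smooth pseudo-articulator trajectories."""
    kernel = np.ones(window) / window
    return np.convolve(seq, kernel, mode="same")

raw = expected_positions(["a", "i", "u", "a"])   # [0.1, 0.8, 0.5, 0.1]
trajectory = smooth(raw)                          # smoothed trajectory
```

The point of the smoothing step matches the abstract: static per-sound estimates jump discontinuously, whereas real articulators move continuously between targets.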
01 Jan 2008
TL;DR: With the development of new trends in human-machine interfaces, animated feature films and video games, better avatars and virtual agents are required, along with new automatic approaches to synthesize realistic animation that captures and resembles the complex relationship between communicative channels such as speech, facial expression and head motion.
Abstract: With the development of new trends in human-machine interfaces, animated feature films and video games, better avatars and virtual agents are required that more accurately mimic how humans communicate and interact. Gestures and speech are jointly used to express intended messages. The tone and energy of the speech, facial expression, rigid head motion and hand motion combine in a non-trivial manner as they unfold in natural human interaction. Given that the use of large motion capture datasets is expensive and can only be applied in planned scenarios, new automatic approaches are required to synthesize realistic animation that captures and resembles the complex relationship between these communicative channels. One useful and practical approach is the use of acoustic features to generate gestures, exploiting the link between gestures and speech. Since the shape of the lips is determined by the underlying articulation, acoustic features have been used to generate visual visemes that match the spoken sentences [4, 5, 12, 17]. Likewise, acoustic features have been used to synthesize facial expressions [11, 30], exploiting the fact that the same muscles used for articulation also affect the shape of the face [44, 46]. One important gesture that has received less attention than other aspects of facial animation is rigid head motion. Head motion is important not only to acknowledge active listening or replace verbal information (e.g. "nod"), but also for many aspects of human
17 citations
TL;DR: This work presents an implementation of real-time, language-independent lip synchronization based on the classification of the speech signal into visemes using neural networks (NNs), and improves real-time lip synchronization by using a genetic algorithm to obtain a near-optimal NN topology.
17 citations
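The NN topology and the genetic-algorithm search from this work are not reproduced here, but the core step, classifying acoustic feature vectors into viseme classes with a trainable network, can be sketched with a minimal softmax classifier on toy data. All data, dimensions and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "acoustic feature" vectors with two well-separated viseme classes
# (e.g. an open-mouth vs a closed-mouth viseme; purely illustrative data).
X = np.vstack([rng.normal(-1, 0.3, (50, 2)), rng.normal(1, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# A single softmax layer stands in for the paper's NN classifier.
W = np.zeros((2, 2))
b = np.zeros(2)
for _ in range(200):                      # plain gradient descent
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = p.copy()
    grad[np.arange(len(y)), y] -= 1       # dL/dlogits for cross-entropy
    W -= 0.1 * X.T @ grad / len(y)
    b -= 0.1 * grad.mean(axis=0)

pred = (X @ W + b).argmax(axis=1)         # predicted viseme class per frame
accuracy = (pred == y).mean()
```

In the paper's setting, each predicted viseme class would drive the mouth shape of the animated face for that audio frame; the genetic algorithm's role is to search over hidden-layer configurations rather than fix one by hand as done here.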
03 Mar 2003
TL;DR: In this paper, speech signals are classified into two broad classes of speech production, whispered speech and normally phonated speech, and it is argued that distinguishing them avoids the erroneous results that occur when typical automated speech processing systems encounter non-typical speech such as whispered speech.
Abstract: Method and apparatus for the classification of speech signals. Speech is classified into two broad classes of speech production: whispered speech and normally phonated speech. Speech classified in this manner yields increased performance of automated speech processing systems, because the erroneous results that occur when typical automated speech processing systems encounter non-typical speech, such as whispered speech, are avoided.
17 citations
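The patent's classifier is not described in enough detail here to reproduce, but the underlying acoustic distinction can be sketched: normally phonated speech is quasi-periodic at the glottal pitch, while whispered speech is noise-like, so a normalized autocorrelation peak in the pitch range crudely separates the two. The function name, thresholds and synthetic signals below are all illustrative assumptions, not the patent's method.

```python
import numpy as np

def is_phonated(frame, sr=8000, fmin=80, fmax=400, threshold=0.5):
    """Crude voicing detector: whispered speech lacks glottal periodicity,
    so a strong autocorrelation peak at a pitch-range lag signals phonation.
    (Parameter values are illustrative.)"""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac /= ac[0] + 1e-12                   # normalize so ac[0] == 1
    lo, hi = sr // fmax, sr // fmin       # lag range for 80-400 Hz pitch
    return ac[lo:hi].max() > threshold

sr = 8000
t = np.arange(sr // 4) / sr               # 0.25 s of signal
voiced = np.sin(2 * np.pi * 150 * t)      # periodic: stands in for phonation
rng = np.random.default_rng(0)
whisper = rng.normal(0, 1, len(t))        # noise-like: stands in for whisper
```

A real system would apply a frame-level decision like this before choosing which recognition models to use, which is the performance gain the abstract describes.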