Topic

Viseme

About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.


Papers
Proceedings ArticleDOI
10 Sep 2007
TL;DR: A new automatic approach for lip and point-of-interest localization on a speaker's face is presented, based both on the color information of the mouth and on a geometric model of the lips; this hybrid solution makes the method more tolerant to noise and artifacts in the image.
Abstract: Motivated by humans' ability to lipread, the visual component is considered to provide useful information to the speech recognition system. Lip-reading is the perception of speech based purely on observing the talker's lip movements. The major difficulty of a lip-reading system is the extraction of the visual speech descriptors; to ensure this task it is necessary to carry out automatic localization and tracking of the labial gestures. We present in this paper a new automatic approach for lip and point-of-interest (POI) localization on a speaker's face, based both on the color information of the mouth and on a geometric model of the lips. This hybrid solution makes our method more tolerant to noise and artifacts in the image. Experiments revealed that our lip POI localization approach for lip-reading purposes is promising. The presented results show that our system recognizes 94.64% of French visemes.
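The color-plus-geometry idea can be illustrated with a minimal sketch (not the authors' implementation): threshold a pseudo-hue map of a face crop, where lip pixels show red dominating green, then keep the blob whose bounding box matches a mouth-like aspect ratio. The threshold and aspect-ratio bounds below are assumptions.

```python
# Hedged sketch of color-based lip localization with a crude geometric prior.
# Not the paper's method; threshold and shape bounds are illustrative guesses.
import cv2
import numpy as np

def locate_lips(face_bgr, ph_thresh=0.56):
    """Return an (x, y, w, h) bounding box for the lip region, or None."""
    b, g, r = cv2.split(face_bgr.astype(np.float32))
    pseudo_hue = r / (r + g + 1e-6)        # lips: red dominates green
    mask = (pseudo_hue > ph_thresh).astype(np.uint8) * 255
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    best, best_area = None, 0.0
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        area = cv2.contourArea(c)
        # crude geometric prior: the mouth is wider than it is tall
        if h > 0 and 1.5 < w / h < 5.0 and area > best_area:
            best, best_area = (x, y, w, h), area
    return best
```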

17 citations

Patent
29 Jan 1998
TL;DR: In this paper, a probabilistic mapping between static speech sounds and pseudo-articulator positions is proposed, which can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Abstract: Speech processing is obtained that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator positions is described. The method for learning this mapping uses a set of training data composed only of speech sounds. The said speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
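One way to read the "smooth sequences of pseudo-articulator positions" idea is as a decode over a grid of candidate positions that trades per-frame posterior probability against a smoothness penalty. The Viterbi-style sketch below assumes that framing; the patent may formulate the mapping differently, and the grid, weights, and function names are illustrative.

```python
# Hedged sketch: choose a position sequence that balances per-frame
# log-probabilities against a squared-jump smoothness penalty, via a
# Viterbi-style dynamic program over a discrete position grid.
import numpy as np

def smooth_position_sequence(log_probs, positions, smooth_weight=1.0):
    """log_probs: (T, K) log P(position_k | sound_t); positions: (K,) grid."""
    T, K = log_probs.shape
    # transition penalty grows with the squared jump between consecutive positions
    jump_cost = smooth_weight * (positions[None, :] - positions[:, None]) ** 2
    score = log_probs[0].copy()
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        total = score[:, None] - jump_cost + log_probs[t][None, :]  # (prev, next)
        back[t] = np.argmax(total, axis=0)
        score = total[back[t], np.arange(K)]
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return positions[np.array(path[::-1])]
```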

17 citations

Book ChapterDOI
01 Jan 2008
TL;DR: With the development of new trends in human-machine interfaces, animated feature films and video games, better avatars and virtual agents are required, along with new automatic approaches to synthesize realistic animation that captures and resembles the complex relationship between these communicative channels.
Abstract: With the development of new trends in human-machine interfaces, animated feature films and video games, better avatars and virtual agents are required that more accurately mimic how humans communicate and interact. Gestures and speech are jointly used to express intended messages. The tone and energy of the speech, facial expression, rigid head motion and hand motion combine in a non-trivial manner as they unfold in natural human interaction. Given that the use of large motion capture datasets is expensive and can only be applied in planned scenarios, new automatic approaches are required to synthesize realistic animation that captures and resembles the complex relationship between these communicative channels. One useful and practical approach is the use of acoustic features to generate gestures, exploiting the link between gestures and speech. Since the shape of the lips is determined by the underlying articulation, acoustic features have been used to generate visual visemes that match the spoken sentences [4, 5, 12, 17]. Likewise, acoustic features have been used to synthesize facial expressions [11, 30], exploiting the fact that the same muscles used for articulation also affect the shape of the face [44, 46]. One important gesture that has received less attention than other aspects of facial animation is rigid head motion. Head motion is important not only to acknowledge active listening or replace verbal information (e.g. "nod"), but also for many aspects of human …

17 citations

Journal ArticleDOI
TL;DR: This work presents an implementation of real time, language independent lip synchronization based on the classification of the speech signal into visemes using neural networks (NNs), and improves real time lip synchronization by using a genetic algorithm for obtaining a near optimal NN topology.
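At its core, the classification step amounts to a frame-level mapping from acoustic features to a viseme class. A minimal PyTorch sketch of such a classifier follows; the paper's actual feature set, network topology (which it selects with a genetic algorithm), and training procedure are not reproduced, and the layer sizes below are assumptions.

```python
# Hedged sketch of a frame-level viseme classifier: acoustic features in,
# viseme class out. Feature and class counts are illustrative assumptions.
import torch
import torch.nn as nn

N_FEATURES = 26   # e.g. MFCCs plus deltas per audio frame (assumed)
N_VISEMES = 14    # size of the viseme inventory (assumed)

model = nn.Sequential(
    nn.Linear(N_FEATURES, 64),
    nn.ReLU(),
    nn.Linear(64, N_VISEMES),
)

def classify_frame(features):
    """Return the predicted viseme index for one acoustic feature frame."""
    with torch.no_grad():
        logits = model(torch.as_tensor(features, dtype=torch.float32))
        return int(logits.argmax())
```

In a real-time lip-sync pipeline, the predicted viseme index per frame would drive the corresponding mouth shape of the animated face.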

17 citations

Patent
03 Mar 2003
TL;DR: In this patent, speech signals are classified into two broad classes of speech production, whispered speech and normally phonated speech, so that the erroneous results that occur when typical automated speech processing systems encounter non-typical speech, such as whispered speech, can be avoided.
Abstract: Method and apparatus for the classification of speech signals. Speech is classified into two broad classes of speech production: whispered speech and normally phonated speech. Speech classified in this manner will yield increased performance of automated speech processing systems, because the erroneous results that occur when typical automated speech processing systems encounter non-typical speech, such as whispered speech, will be avoided.
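A crude illustration of the whispered-versus-phonated distinction: normally phonated speech is voiced, so short frames show a strong normalized autocorrelation peak in the pitch range, while whispered speech does not. The heuristic below is only a sketch under that assumption, not the patent's classifier; the thresholds are invented for illustration.

```python
# Hedged sketch: flag a frame as whisper-like when its normalized
# autocorrelation shows no strong peak in the typical pitch-lag range.
import numpy as np

def is_whispered(frame, sr=16000, voicing_thresh=0.3):
    """frame: 1-D array of samples (e.g. a 25 ms window at 16 kHz)."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return True                       # silent frame: treat as unvoiced
    ac = ac / ac[0]                       # normalized autocorrelation
    lo, hi = sr // 400, sr // 60          # lags covering roughly 60-400 Hz pitch
    peak = ac[lo:hi].max() if hi > lo else 0.0
    return peak < voicing_thresh          # weak periodicity -> whisper-like
```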

17 citations


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations, 78% related
Feature vector: 48.8K papers, 954.4K citations, 76% related
Feature extraction: 111.8K papers, 2.1M citations, 75% related
Feature (computer vision): 128.2K papers, 1.7M citations, 74% related
Unsupervised learning: 22.7K papers, 1M citations, 73% related
Performance Metrics
No. of papers in the topic in previous years
Year: Papers
2023: 7
2022: 12
2021: 13
2020: 39
2019: 19
2018: 22