Topic
Viseme
About: Viseme is a research topic. Over its lifetime, 865 publications have been published within this topic, receiving 17,889 citations.
Papers
10 Sep 2007
TL;DR: A new automatic approach for lip and point-of-interest localization on a speaker's face is presented, based both on the color information of the mouth and on a geometric model of the lips; this hybrid design makes the method more tolerant to noise and artifacts in the image.
Abstract: Motivated by humans' ability to lipread, the visual component is considered to contribute information to the speech recognition system. Lip-reading is the perception of speech based purely on observing the talker's lip movements. The major difficulty of a lip-reading system is the extraction of the visual speech descriptors. To perform this task, it is necessary to carry out automatic localization and tracking of the labial gestures. We present in this paper a new automatic approach for lip and point-of-interest localization on a speaker's face, based both on the color information of the mouth and on a geometric model of the lips. This hybrid solution makes our method more tolerant to noise and artifacts in the image. Experiments revealed that our lip POI localization approach for lip-reading purposes is promising. The presented results show that our system recognizes 94.64% of French visemes.
17 citations
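The paper's hybrid color-plus-geometry method is not specified in enough detail here to reproduce, but the color cue alone can be sketched: lip pixels are redder relative to green than the surrounding skin, so thresholding a pseudo-hue measure yields a rough mouth mask. The `lip_mask` function, the 0.6 threshold, and the synthetic image below are illustrative assumptions, not the authors' actual parameters.

```python
import numpy as np

def lip_mask(rgb: np.ndarray, threshold: float = 0.6) -> np.ndarray:
    """Rough lip segmentation from color: threshold the pseudo-hue
    r / (r + g), which is higher for lip pixels than for skin.
    (Measure and threshold are illustrative, not the paper's.)"""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    pseudo_hue = r / (r + g + 1e-6)   # avoid division by zero
    return pseudo_hue > threshold

# Synthetic test image: skin-colored background with a redder "lip" patch.
img = np.zeros((10, 10, 3), dtype=np.uint8)
img[...] = (210, 160, 140)        # skin-like color, pseudo-hue ~0.57
img[4:6, 3:7] = (200, 80, 90)     # lip-like color, pseudo-hue ~0.71
mask = lip_mask(img)              # True only on the lip patch
```

In the paper, a mask like this would be only the first stage; the geometric lip model would then refine the region and locate the points of interest.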
29 Jan 1998
TL;DR: In this paper, a probabilistic mapping between static speech sounds and pseudo-articulator positions is proposed, which can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
Abstract: A speech-processing method is described that, given a probabilistic mapping between static speech sounds and pseudo-articulator positions, allows sequences of speech sounds to be mapped to smooth sequences of pseudo-articulator positions. In addition, a method for learning a probabilistic mapping between static speech sounds and pseudo-articulator positions is described. The method for learning this mapping uses a set of training data composed only of speech sounds. The described speech processing can be applied to various speech analysis tasks, including speech recognition, speaker recognition, speech coding, speech synthesis, and voice mimicry.
17 citations
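As a rough illustration of the patent's idea (not its actual method), one can map each static sound to its expected pseudo-articulator position under a probabilistic mapping, then smooth the resulting sequence over time. The mapping table, the position values, and the moving-average smoother below are all invented for illustration.

```python
import numpy as np

# Hypothetical probabilistic mapping P(position | sound) over two candidate
# pseudo-articulator positions (all numbers are illustrative).
positions = np.array([0.0, 1.0])
p_pos_given_sound = {
    "a": np.array([0.9, 0.1]),
    "i": np.array([0.2, 0.8]),
    "u": np.array([0.5, 0.5]),
}

def expected_positions(sounds):
    """Map each static sound to its expected pseudo-articulator position."""
    return np.array([p_pos_given_sound[s] @ positions for s in sounds])

def smooth(seq, window=3):
    """Moving-average smoothing, standing in for the patent's method of
    producing smooth pseudo-articulator trajectories."""
    kernel = np.ones(window) / window
    return np.convolve(seq, kernel, mode="same")

raw = expected_positions(["a", "i", "u", "a"])   # [0.1, 0.8, 0.5, 0.1]
trajectory = smooth(raw)                          # smoothed trajectory
```

The point of the smoothing step matches the abstract: static per-sound estimates jump discontinuously, whereas real articulators move continuously between targets.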
01 Jan 2008
TL;DR: With the development of new trends in human-machine interfaces, animated feature films and video games, better avatars and virtual agents are required, along with new automatic approaches to synthesize realistic animation that captures and resembles the complex relationship between communicative channels such as speech, facial expression and head motion.
Abstract: With the development of new trends in human-machine interfaces, animated feature films and video games, better avatars and virtual agents are required that more accurately mimic how humans communicate and interact. Gestures and speech are jointly used to express intended messages. The tone and energy of the speech, facial expression, rigid head motion and hand motion combine in a non-trivial manner as they unfold in natural human interaction. Given that the use of large motion capture datasets is expensive and can only be applied in planned scenarios, new automatic approaches are required to synthesize realistic animation that captures and resembles the complex relationship between these communicative channels. One useful and practical approach is the use of acoustic features to generate gestures, exploiting the link between gestures and speech. Since the shape of the lips is determined by the underlying articulation, acoustic features have been used to generate visual visemes that match the spoken sentences [4, 5, 12, 17]. Likewise, acoustic features have been used to synthesize facial expressions [11, 30], exploiting the fact that the same muscles used for articulation also affect the shape of the face [44, 46]. One important gesture that has received less attention than other aspects of facial animation is rigid head motion. Head motion is important not only to acknowledge active listening or replace verbal information (e.g. "nod"), but also for many aspects of human
17 citations
TL;DR: This work presents an implementation of real-time, language-independent lip synchronization based on the classification of the speech signal into visemes using neural networks (NNs), and improves real-time lip synchronization by using a genetic algorithm to obtain a near-optimal NN topology.
17 citations
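The NN topology and the genetic-algorithm search from this work are not reproduced here, but the core step, classifying acoustic feature vectors into viseme classes with a trainable network, can be sketched with a minimal softmax classifier on toy data. All data, dimensions and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "acoustic feature" vectors with two well-separated viseme classes
# (e.g. an open-mouth vs a closed-mouth viseme; purely illustrative data).
X = np.vstack([rng.normal(-1, 0.3, (50, 2)), rng.normal(1, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# A single softmax layer stands in for the paper's NN classifier.
W = np.zeros((2, 2))
b = np.zeros(2)
for _ in range(200):                      # plain gradient descent
    logits = X @ W + b
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    grad = p.copy()
    grad[np.arange(len(y)), y] -= 1       # dL/dlogits for cross-entropy
    W -= 0.1 * X.T @ grad / len(y)
    b -= 0.1 * grad.mean(axis=0)

pred = (X @ W + b).argmax(axis=1)         # predicted viseme class per frame
accuracy = (pred == y).mean()
```

In the paper's setting, each predicted viseme class would drive the mouth shape of the animated face for that audio frame; the genetic algorithm's role is to search over hidden-layer configurations rather than fix one by hand as done here.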
03 Mar 2003
TL;DR: In this paper, speech signals are classified into two broad classes of speech production, whispered speech and normally phonated speech, and it is argued that distinguishing them avoids the erroneous results that occur when typical automated speech processing systems encounter non-typical speech such as whispered speech.
Abstract: Method and apparatus for the classification of speech signals. Speech is classified into two broad classes of speech production: whispered speech and normally phonated speech. Speech classified in this manner yields increased performance of automated speech processing systems, because the erroneous results that occur when typical automated speech processing systems encounter non-typical speech, such as whispered speech, are avoided.
17 citations
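The patent's classifier is not described in enough detail here to reproduce, but the underlying acoustic distinction can be sketched: normally phonated speech is quasi-periodic at the glottal pitch, while whispered speech is noise-like, so a normalized autocorrelation peak in the pitch range crudely separates the two. The function name, thresholds and synthetic signals below are all illustrative assumptions, not the patent's method.

```python
import numpy as np

def is_phonated(frame, sr=8000, fmin=80, fmax=400, threshold=0.5):
    """Crude voicing detector: whispered speech lacks glottal periodicity,
    so a strong autocorrelation peak at a pitch-range lag signals phonation.
    (Parameter values are illustrative.)"""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac /= ac[0] + 1e-12                   # normalize so ac[0] == 1
    lo, hi = sr // fmax, sr // fmin       # lag range for 80-400 Hz pitch
    return ac[lo:hi].max() > threshold

sr = 8000
t = np.arange(sr // 4) / sr               # 0.25 s of signal
voiced = np.sin(2 * np.pi * 150 * t)      # periodic: stands in for phonation
rng = np.random.default_rng(0)
whisper = rng.normal(0, 1, len(t))        # noise-like: stands in for whisper
```

A real system would apply a frame-level decision like this before choosing which recognition models to use, which is the performance gain the abstract describes.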