scispace - formally typeset
Topic

Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic receiving 17889 citations.


Papers
Proceedings Article
21 Oct 2005
TL;DR: This master's thesis investigates automatic lip synchronization, a method for generating an animation of a 3D human face model in which the animation is driven only by a speech signal.
Abstract: This master's thesis investigates automatic lip synchronization, a method for generating an animation of a 3D human face model in which the animation is driven only by a speech signal. The whole process is completely automatic and starts from the speech signal. Automatic lip synchronization consists of two main parts: audio-to-visual mapping and face synthesis. The thesis proposes and implements a system for the automatic lip synchronization of synthetic 3D avatars based only on speech input. The speech signal is classified into viseme classes using neural networks, whose topology is automatically configured using genetic algorithms. Visemes, the visual representations of phonemes defined in MPEG-4 FA, are used for face synthesis. The system is adapted to the specifics of the Croatian language. Detailed system validation based on three different evaluation methods is performed, and potential applications of these technologies are discussed in detail. The method is suitable for real-time and offline applications, and it is speaker-independent and multilingual.
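The audio-to-visual mapping described above ends in a many-to-one collapse from phonemes to MPEG-4 FA viseme classes. A minimal sketch of that lookup stage, assuming the standard MPEG-4 grouping of 14 visemes plus neutral; the exact SAMPA symbols and groupings below are illustrative, not taken from the thesis:

```python
# Sketch of the phoneme-to-viseme stage: after a classifier assigns
# each speech frame a phoneme, a many-to-one table collapses phonemes
# into MPEG-4 FA viseme classes for face synthesis. The grouping below
# follows the common MPEG-4 viseme table but is illustrative only.

MPEG4_VISEMES = {
    0: [],                      # neutral / silence
    1: ["p", "b", "m"],
    2: ["f", "v"],
    3: ["T", "D"],              # "th" sounds
    4: ["t", "d"],
    5: ["k", "g"],
    6: ["tS", "dZ", "S"],
    7: ["s", "z"],
    8: ["n", "l"],
    9: ["r"],
    10: ["A:"],
    11: ["e"],
    12: ["I"],
    13: ["Q"],
    14: ["U"],
}

# Invert to a phoneme -> viseme lookup used at synthesis time.
PHONEME_TO_VISEME = {
    ph: vis for vis, phones in MPEG4_VISEMES.items() for ph in phones
}

def to_viseme_sequence(phonemes):
    """Map a phoneme sequence to viseme indices (unknowns -> neutral 0)."""
    return [PHONEME_TO_VISEME.get(ph, 0) for ph in phonemes]

print(to_viseme_sequence(["p", "A:", "t"]))  # -> [1, 10, 4]
```

Because the mapping is many-to-one, distinct phonemes such as /p/, /b/ and /m/ share one mouth shape, which is what makes viseme classes a coarser but visually sufficient target for animation.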

18 citations

Journal ArticleDOI
TL;DR: An attempt to realize lip reading using techniques of image processing and pattern recognition; the results showed a remarkable capability of lip reading for a small number of words.
Abstract: Lip reading is a standard method that enables the deaf to understand other people's speech through visual information. Acquiring the technique, however, requires great effort and a long time, and an educational system for teaching it has not yet been established. This paper describes an attempt to realize lip reading using techniques of image processing and pattern recognition, in which we aim at clarifying the possibilities and limitations inherent in lip reading. Although our final goal is the realization of speech understanding, this paper deals only with recognition of vowels and words of the Japanese language as a first step. The front or side view of the mouth is taken with a TV camera, and feature values of the lip shape are extracted. Discrimination of the five vowels was performed by the maximum-likelihood method, and the vowels were correctly discriminated in more than about 80% of cases. Moreover, word recognition based upon the vowel discrimination was performed. The results showed a remarkable capability of lip reading for a small number of words. Finally, several problems are discussed in relation to the actual lip reading of the deaf.

18 citations

Proceedings ArticleDOI
20 Aug 2006
TL;DR: It is shown that automatic speech recognition serves as a good means to objectify and quantify global speech outcome of children with CLP.
Abstract: Cleft lip and palate (CLP) may cause functional limitations even after adequate surgical and non-surgical treatment, speech disorders being one of them. Until now, no objective means to determine and quantify intelligibility has existed. An automatic speech recognition system was applied to 31 recordings of CLP children who spoke a German standard test for articulation disorders. The speech recognition system was trained on normal adult speakers' and children's speech. A subjective evaluation of intelligibility was performed by a panel of 3 experts and compared with the automatic speech evaluation. The automatic speech recognition yielded word accuracies between 1.2% and 75.8% (48.0% ± 19.6%) with sufficient discrimination, and it complied with the experts' rating of intelligibility. Thus we show that automatic speech recognition serves as a good means to objectify and quantify the global speech outcome of children with CLP.
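The study above scores intelligibility by the recognizer's word accuracy, which is conventionally 1 minus the word error rate computed from an edit-distance alignment. A minimal sketch of that metric, assuming the standard Levenshtein formulation; the function name is ours, not from the paper's toolchain:

```python
# Word accuracy = 1 - (S + D + I) / N, where S, D, I are word
# substitutions, deletions, and insertions from an edit-distance
# alignment, and N is the reference word count.

def word_accuracy(reference, hypothesis):
    """Compute word accuracy via Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return 1.0 - dp[len(ref)][len(hyp)] / len(ref)

print(round(word_accuracy("the cat sat", "the sat"), 2))  # -> 0.67
```

Note that heavy insertion errors can drive this value below zero, which is why very low accuracies such as the 1.2% reported above are plausible for severely disordered speech.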

17 citations

Patent
21 Feb 2017
TL;DR: In this paper, a system and method for training a set of expression and neutral convolutional neural networks using a single performance mapped to the set of known phonemes and visemes in the form of predetermined sentences and facial expressions is described.
Abstract: There is disclosed a system and method for training a set of expression and neutral convolutional neural networks using a single performance mapped to a set of known phonemes and visemes in the form of predetermined sentences and facial expressions. Then, subsequent training of the convolutional neural networks can occur using temporal data derived from audio data within the original performance mapped to a set of professionally created three-dimensional animations. Thereafter, with sufficient training, the expression and neutral convolutional neural networks can generate facial animations from facial image data in real time without individual-specific training.

17 citations

Proceedings ArticleDOI
06 Jul 2003
TL;DR: A novel method for generating the viseme sequence is presented, which uses viseme-based acoustic models, instead of the usual phone-based acoustic models, to align the input speech signal, resulting in higher accuracy and speed of the alignment procedure.
Abstract: Speech-driven lip synthesis is an interesting and important step toward human-computer interaction. An incoming speech signal is time-aligned using a speech recognizer to generate a phonetic sequence, which is then converted to the corresponding viseme sequence to be animated. We present a novel method for generating the viseme sequence that uses viseme-based acoustic models, instead of the usual phone-based acoustic models, to align the input speech signal. This results in higher accuracy and speed of the alignment procedure and allows a much simpler implementation of the speech-driven lip synthesis system, as it completely obviates the need for an acoustic-unit-to-visual-unit conversion. We show, through various experiments, that the proposed method yields about 53% relative improvement in classification accuracy and about 52% reduction in the time required to compute alignments.
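The "relative improvement" figures quoted above compare viseme-based against phone-based alignment. One common reading of such a figure is relative error reduction; the numbers below are made up for illustration and are not taken from the paper:

```python
# Relative error reduction: the fraction of the baseline's error that
# the new method removes. E.g. if phone-based alignment misclassified
# 30% of frames and viseme-based alignment 14%, the relative reduction
# is about 53% (illustrative figures, not the paper's data).

def relative_error_reduction(old_error, new_error):
    """(old - new) / old, i.e. the share of baseline error eliminated."""
    return (old_error - new_error) / old_error

print(round(relative_error_reduction(0.30, 0.14), 2))  # -> 0.53
```

Relative figures like this can look large even when the absolute gain is modest, which is worth keeping in mind when comparing across papers.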

17 citations


Network Information
Related Topics (5)
Vocabulary
44.6K papers, 941.5K citations
78% related
Feature vector
48.8K papers, 954.4K citations
76% related
Feature extraction
111.8K papers, 2.1M citations
75% related
Feature (computer vision)
128.2K papers, 1.7M citations
74% related
Unsupervised learning
22.7K papers, 1M citations
73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22