scispace - formally typeset
Topic

Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic receiving 17889 citations.


Papers
Proceedings Article
21 Oct 2005
TL;DR: This master's thesis investigates automatic lip synchronization, a method for generating an animation of a 3D human face model in which the animation is driven only by a speech signal.
Abstract: This master's thesis investigates automatic lip synchronization, a method for generating an animation of a 3D human face model in which the animation is driven only by a speech signal. The whole process is completely automatic and starts from the speech signal. Automatic lip synchronization consists of two main parts: audio-to-visual mapping and face synthesis. The thesis proposes and implements a system for the automatic lip synchronization of synthetic 3D avatars based only on speech input. The speech signal is classified into viseme classes using neural networks, whose topology is automatically configured using genetic algorithms. Visemes, the visual representations of phonemes defined in MPEG-4 FA, are used for face synthesis. The system is adapted to the specifics of the Croatian language. Detailed system validation based on three different evaluation methods is performed, and potential applications of these technologies are discussed in detail. The method is suitable for real-time and offline applications, and it is speaker-independent and multilingual.
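The audio-to-visual mapping described above ends in a many-to-one collapse from phonemes to MPEG-4 FA viseme classes. A minimal sketch of that lookup stage, assuming the standard MPEG-4 grouping of 14 visemes plus neutral; the exact SAMPA symbols and groupings below are illustrative, not taken from the thesis:

```python
# Sketch of the phoneme-to-viseme stage: after a classifier assigns
# each speech frame a phoneme, a many-to-one table collapses phonemes
# into MPEG-4 FA viseme classes for face synthesis. The grouping below
# follows the common MPEG-4 viseme table but is illustrative only.

MPEG4_VISEMES = {
    0: [],                      # neutral / silence
    1: ["p", "b", "m"],
    2: ["f", "v"],
    3: ["T", "D"],              # "th" sounds
    4: ["t", "d"],
    5: ["k", "g"],
    6: ["tS", "dZ", "S"],
    7: ["s", "z"],
    8: ["n", "l"],
    9: ["r"],
    10: ["A:"],
    11: ["e"],
    12: ["I"],
    13: ["Q"],
    14: ["U"],
}

# Invert to a phoneme -> viseme lookup used at synthesis time.
PHONEME_TO_VISEME = {
    ph: vis for vis, phones in MPEG4_VISEMES.items() for ph in phones
}

def to_viseme_sequence(phonemes):
    """Map a phoneme sequence to viseme indices (unknowns -> neutral 0)."""
    return [PHONEME_TO_VISEME.get(ph, 0) for ph in phonemes]

print(to_viseme_sequence(["p", "A:", "t"]))  # -> [1, 10, 4]
```

Because the mapping is many-to-one, distinct phonemes such as /p/, /b/ and /m/ share one mouth shape, which is what makes viseme classes a coarser but visually sufficient target for animation.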

18 citations

Journal ArticleDOI
TL;DR: An attempt to realize lip reading using techniques of image processing and pattern recognition; the results showed a remarkable capability of lip reading for a small number of words.
Abstract: Lip reading is a standard method that enables the deaf to understand other people's speech through visual information. Acquiring the technique, however, requires great effort and a long time, and an educational system for teaching it has not yet been established. This paper describes an attempt to realize lip reading using techniques of image processing and pattern recognition, in which we aim at clarifying the possibilities and limitations inherent in lip reading. Although our final goal is the realization of speech understanding, this paper deals only with recognition of vowels and words of the Japanese language as a first step. The front or side view of the mouth is taken with a TV camera, and feature values of the lip shape are extracted. Discrimination of the five vowels was performed by the maximum-likelihood method, and the vowels were correctly discriminated in more than about 80% of cases. Moreover, word recognition based upon the vowel discrimination was performed. The results showed a remarkable capability of lip reading for a small number of words. Finally, several problems are discussed in relation to the actual lip reading of the deaf.

18 citations

Proceedings ArticleDOI
20 Aug 2006
TL;DR: It is shown that automatic speech recognition serves as a good means to objectify and quantify global speech outcome of children with CLP.
Abstract: Cleft lip and palate (CLP) may cause functional limitations even after adequate surgical and non-surgical treatment, speech disorders being one of them. Until now, no objective means to determine and quantify intelligibility has existed. An automatic speech recognition system was applied to 31 recordings of CLP children who spoke a German standard test for articulation disorders. The speech recognition system was trained on normal adult speakers' and children's speech. A subjective evaluation of intelligibility was performed by a panel of 3 experts and compared with the automatic speech evaluation. The automatic speech recognition yielded word accuracies between 1.2% and 75.8% (48.0% ± 19.6%) with sufficient discrimination, and it complied with the experts' rating of intelligibility. Thus we show that automatic speech recognition serves as a good means to objectify and quantify the global speech outcome of children with CLP.
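The study above scores intelligibility by the recognizer's word accuracy, which is conventionally 1 minus the word error rate computed from an edit-distance alignment. A minimal sketch of that metric, assuming the standard Levenshtein formulation; the function name is ours, not from the paper's toolchain:

```python
# Word accuracy = 1 - (S + D + I) / N, where S, D, I are word
# substitutions, deletions, and insertions from an edit-distance
# alignment, and N is the reference word count.

def word_accuracy(reference, hypothesis):
    """Compute word accuracy via Levenshtein distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return 1.0 - dp[len(ref)][len(hyp)] / len(ref)

print(round(word_accuracy("the cat sat", "the sat"), 2))  # -> 0.67
```

Note that heavy insertion errors can drive this value below zero, which is why very low accuracies such as the 1.2% reported above are plausible for severely disordered speech.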

17 citations

Patent
21 Feb 2017
TL;DR: In this paper, a system and method for training a set of expression and neutral convolutional neural networks using a single performance mapped to the set of known phonemes and visemes in the form of predetermined sentences and facial expressions is described.
Abstract: There is disclosed a system and method for training a set of expression and neutral convolutional neural networks using a single performance mapped to a set of known phonemes and visemes in the form of predetermined sentences and facial expressions. Then, subsequent training of the convolutional neural networks can occur using temporal data derived from audio data within the original performance mapped to a set of professionally created three-dimensional animations. Thereafter, with sufficient training, the expression and neutral convolutional neural networks can generate facial animations from facial image data in real time without individual-specific training.

17 citations

Proceedings ArticleDOI
06 Jul 2003
TL;DR: A novel method for generating the viseme sequence is presented, which uses viseme-based acoustic models, instead of the usual phone-based acoustic models, to align the input speech signal, resulting in higher accuracy and speed of the alignment procedure.
Abstract: Speech-driven lip synthesis is an interesting and important step toward human-computer interaction. An incoming speech signal is time-aligned using a speech recognizer to generate a phonetic sequence, which is then converted to the corresponding viseme sequence to be animated. We present a novel method for generating the viseme sequence that uses viseme-based acoustic models, instead of the usual phone-based acoustic models, to align the input speech signal. This results in higher accuracy and speed of the alignment procedure and allows a much simpler implementation of the speech-driven lip synthesis system, as it completely obviates the need for an acoustic-unit-to-visual-unit conversion. We show, through various experiments, that the proposed method yields about 53% relative improvement in classification accuracy and about 52% reduction in the time required to compute alignments.
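The "relative improvement" figures quoted above compare viseme-based against phone-based alignment. One common reading of such a figure is relative error reduction; the numbers below are made up for illustration and are not taken from the paper:

```python
# Relative error reduction: the fraction of the baseline's error that
# the new method removes. E.g. if phone-based alignment misclassified
# 30% of frames and viseme-based alignment 14%, the relative reduction
# is about 53% (illustrative figures, not the paper's data).

def relative_error_reduction(old_error, new_error):
    """(old - new) / old, i.e. the share of baseline error eliminated."""
    return (old_error - new_error) / old_error

print(round(relative_error_reduction(0.30, 0.14), 2))  # -> 0.53
```

Relative figures like this can look large even when the absolute gain is modest, which is worth keeping in mind when comparing across papers.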

17 citations


Network Information
Related Topics (5)
Vocabulary
44.6K papers, 941.5K citations
78% related
Feature vector
48.8K papers, 954.4K citations
76% related
Feature extraction
111.8K papers, 2.1M citations
75% related
Feature (computer vision)
128.2K papers, 1.7M citations
74% related
Unsupervised learning
22.7K papers, 1M citations
73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22