Topic

Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic receiving 17889 citations.


Papers
Journal ArticleDOI
20 Dec 2013
TL;DR: In this article, a vision system is used to track lip movements, distinguish 11 types of phonemes, and classify each into its corresponding viseme group.
Abstract: Deaf and hard of hearing people often have problems understanding and lip reading other people. Deaf and hard of hearing people usually feel left out of conversation, and sometimes they are actually ignored by other people. There are a variety of ways a hearing-impaired person can communicate and gain access to information. Communication support includes both technical and human aids. Human aids include interpreters, lip-readers and note-takers. Interpreters translate the Sign Language and must therefore be qualified. In this paper, a vision system is used to track movements of the lip. In the experiment, the proposed system successfully differentiates 11 types of phonemes and classifies them into their respective viseme groups. Using the proposed system, hearing-impaired persons could practise pronunciation by themselves, without support from an instructor.
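The core idea — many phonemes are visually indistinguishable on the lips, so they collapse into a smaller set of viseme groups — can be sketched as a simple lookup. The groupings below are illustrative assumptions, not the paper's exact 11 classes:

```python
# Illustrative phoneme-to-viseme lookup. Phonemes that look alike on the
# lips share one viseme group (e.g. /p/, /b/, /m/ are all bilabial closures).
# These groupings are assumed for the sketch, not taken from the paper.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "th": "dental", "dh": "dental",
    "s": "alveolar", "z": "alveolar",
    "ch": "postalveolar", "sh": "postalveolar",
}

def classify_viseme(phoneme: str) -> str:
    """Map a recognized phoneme to its viseme group."""
    return PHONEME_TO_VISEME.get(phoneme, "unknown")

print(classify_viseme("b"))  # bilabial
print(classify_viseme("v"))  # labiodental
```

In the paper's setting, a lip-tracking front end would first hypothesize the phoneme from mouth shape before this grouping step.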

8 citations

Journal ArticleDOI
TL;DR: The blending algorithm enables animators to script their animations at higher, more user-friendly levels or to use the results of artificial intelligence and computational psychological methods to generate and manage expressive, autonomous or near-autonomous virtual characters, without having to rely on performance-based methods.
Abstract: This paper describes the generation and realistic blending of emotional facial expressions, visual speech, facial poses and other non-emotional secondary facial expressions on 3D computer graphical head models. The generation and blending of these expressions is done by means of the mathematical formulation of a psychological theory of facial expression generation. In total, 23 emotional expressions, 21 emotion blends, 19 visemes, 342 viseme blends and 37 secondary expressions and postures have been modelled, which can result in an infinite number of realistic facial expressions, due to the blending of these entities at different intensities that may vary continuously with time. The blending algorithm enables animators to script their animations at higher, more user-friendly levels or to use the results of artificial intelligence and computational psychological methods to generate and manage expressive, autonomous or near-autonomous virtual characters, without having to rely on performance-based methods.
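The blending of expressions at continuously varying intensities can be pictured as an intensity-weighted sum over a blendshape-style rig. This is a minimal sketch under that assumption; the names, dimensions, and linear-sum formulation are illustrative, not the paper's psychological model:

```python
# Minimal sketch of intensity-weighted expression blending on a
# blendshape-style face rig (names and dimensions are illustrative).
# Each expression is a displacement vector over the mesh; the blended
# face is the neutral pose plus the weighted sum of displacements.

def blend(neutral, expressions, intensities):
    """neutral: list of floats; expressions: dict name -> displacement list;
    intensities: dict name -> weight in [0, 1]. Returns the blended pose."""
    out = list(neutral)
    for name, disp in expressions.items():
        w = intensities.get(name, 0.0)
        for i, d in enumerate(disp):
            out[i] += w * d
    return out

neutral = [0.0, 0.0, 0.0]
expressions = {"smile": [1.0, 0.5, 0.0], "viseme_oo": [0.0, -0.2, 0.8]}
pose = blend(neutral, expressions, {"smile": 0.5, "viseme_oo": 1.0})
print([round(x, 6) for x in pose])  # [0.5, 0.05, 0.8]
```

Animating the intensity weights over time is what yields the continuum of expressions the abstract describes.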

8 citations

Book ChapterDOI
TL;DR: A text-to-audiovisual speech synthesizer system incorporating the head and eye movements, and methods for introducing nonverbal mechanisms in visual speech communication such as eye blinks and head nods are described.
Abstract: This paper describes a text-to-audiovisual speech synthesizer system incorporating the head and eye movements. The face is modeled using a set of images of a human subject. Visemes, that are a set of lip images of the phonemes, are extracted from a recorded video. A smooth transition between visemes is achieved by morphing along the correspondence between the visemes obtained by optical flows. This paper also describes methods for introducing nonverbal mechanisms in visual speech communication such as eye blinks and head nods. For eye movements, a simple mask based approach is used. View morphing is used to generate the head movement. A complete audiovisual sequence is constructed by concatenating the viseme transitions and synchronizing the visual stream with the audio stream. An effort has been made to integrate all these features into a single system, which takes text, head and eye movement parameters and produces the audiovisual stream.
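The smooth transition between viseme images can be approximated, in its simplest form, by a cross-dissolve. This sketch only interpolates pixel intensities on tiny grayscale grids; the paper's method additionally warps pixels along optical-flow correspondences, which this deliberately omits:

```python
# Simplified stand-in for viseme morphing: a plain cross-dissolve between
# two viseme "images" (tiny grayscale grids here). Real view morphing also
# warps pixels along flow correspondences; this only blends intensities.

def cross_dissolve(img_a, img_b, t):
    """Linearly interpolate two equal-sized grayscale images, t in [0, 1]."""
    return [[(1 - t) * a + t * b for a, b in zip(ra, rb)]
            for ra, rb in zip(img_a, img_b)]

closed_lips = [[0, 0], [0, 0]]          # lips-closed viseme (all dark)
open_lips = [[255, 255], [255, 255]]    # lips-open viseme (all bright)
print(cross_dissolve(closed_lips, open_lips, 0.5))
# [[127.5, 127.5], [127.5, 127.5]]
```

Concatenating such transitions per phoneme, and syncing them to the audio track, gives the kind of audiovisual stream the abstract describes.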

8 citations

Proceedings Article
01 Jan 1989
TL;DR: An experimental speech recognition system for German, built on an existing technology reported elsewhere (1), recognizes complete sentences when the words are spoken with a small pause in between.
Abstract: An experimental speech recognition system was developed for German using an already existing technology reported elsewhere (1). The system recognizes complete sentences when the words are spoken with a small pause in between. The user has to train the system in advance by uttering 110 short sentences. The system's vocabulary is presently limited to about 1300 words, covering 58% of the running words in newspaper text from the commercial discourse domain. The system uses 60 allophones and a statistical trigram model built from a text corpus of 14 million words. The recognition accuracy is over 95%.
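The statistical trigram model mentioned above estimates how likely a word is given its two predecessors. A hedged sketch of the maximum-likelihood version follows; a real system trained on a 14-million-word corpus would need smoothing for unseen trigrams, which this omits:

```python
# Maximum-likelihood trigram model from raw counts (no smoothing).
# P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2)
from collections import Counter

def train_trigram(words):
    tri = Counter(zip(words, words[1:], words[2:]))
    bi = Counter(zip(words, words[1:]))
    def prob(w1, w2, w3):
        return tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return prob

corpus = "the dog ran the dog sat the cat ran".split()
p = train_trigram(corpus)
print(p("the", "dog", "ran"))  # 0.5: "the dog" occurs twice, once before "ran"
```

During recognition, such probabilities rescore competing word hypotheses so that fluent word sequences win over acoustically similar but implausible ones.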

8 citations


Network Information
Related Topics (5)
Vocabulary
44.6K papers, 941.5K citations
78% related
Feature vector
48.8K papers, 954.4K citations
76% related
Feature extraction
111.8K papers, 2.1M citations
75% related
Feature (computer vision)
128.2K papers, 1.7M citations
74% related
Unsupervised learning
22.7K papers, 1M citations
73% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    7
2022    12
2021    13
2020    39
2019    19
2018    22