
Viseme

About: Viseme is a research topic. Over the lifetime, 865 publications have been published within this topic receiving 17889 citations.


Papers
Proceedings ArticleDOI
08 Sep 2016
TL;DR: This paper examines methods to improve visual speech synthesis from a text input using a deep neural network (DNN), and reveals the importance of frame-level information, which avoids discontinuities in the visual feature sequence and produces a smooth, realistic output.
Abstract: This paper examines methods to improve visual speech synthesis from a text input using a deep neural network (DNN). Two representations of the input text are considered: phoneme sequences and dynamic viseme sequences. From these sequences, contextual features are extracted that include information at varying linguistic levels, from the frame level to the utterance level. These are extracted over a broad sliding window that captures context, and the resulting features are input to the DNN to estimate visual features. Experiments first compare the accuracy of these visual features against an HMM baseline, establishing that both the phoneme and dynamic viseme systems perform better, with the best performance obtained by a combined phoneme-dynamic viseme system. An investigation into the features then reveals the importance of frame-level information, which avoids discontinuities in the visual feature sequence and produces a smooth, realistic output.
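The window-and-regress pipeline the abstract describes can be sketched as follows. The dimensions, edge padding, and the toy one-hidden-layer network are illustrative assumptions, not the authors' configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_frames, feat_dim = 20, 8   # per-frame linguistic features (illustrative)
half_win = 2                 # context frames on each side of the current frame
vis_dim = 4                  # visual feature dimension (illustrative)

frames = rng.normal(size=(n_frames, feat_dim))

# Sliding window: pad the sequence at both ends, then stack the
# 2*half_win + 1 consecutive frames into one context vector per frame.
padded = np.pad(frames, ((half_win, half_win), (0, 0)), mode="edge")
context = np.stack(
    [padded[t:t + 2 * half_win + 1].ravel() for t in range(n_frames)]
)

# A toy one-hidden-layer network standing in for the paper's DNN regressor.
W1 = rng.normal(size=(context.shape[1], 16)) * 0.1
W2 = rng.normal(size=(16, vis_dim)) * 0.1
hidden = np.tanh(context @ W1)
visual = hidden @ W2         # estimated visual features, one row per frame

print(context.shape, visual.shape)   # (20, 40) (20, 4)
```

The per-frame context vector is what lets the network produce outputs that vary smoothly from frame to frame, which is the effect the paper attributes to frame-level information.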

3 citations

01 Jan 2007
TL;DR: This poster presents a novel technique to automatically synthesise lip motion trajectories given some text and speech, using a time series stochastic model called the "trajectory hidden Markov model".
Abstract: Lip synchronisation is essential to make character animation believable. In this poster we present a novel technique to automatically synthesise lip motion trajectories given some text and speech. Our work distinguishes itself from other work by not using visemes (visual counterparts of phonemes). The lip motion trajectories are directly modelled using a time series stochastic model called the "trajectory hidden Markov model". Its parameter generation algorithm can produce motion trajectories that are used to drive control points on the lips directly.
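The parameter generation step of a trajectory HMM can be illustrated with an MLPG-style sketch: the model supplies per-frame means and precisions for static and delta (velocity) features, and solving one linear system yields a static trajectory consistent with both. All values and the delta window below are illustrative assumptions, not the poster's model:

```python
import numpy as np

T = 10
mu_static = np.linspace(0.0, 1.0, T)   # per-frame static means (toy values)
mu_delta = np.zeros(T)                 # per-frame velocity means
prec_static = np.full(T, 1.0)          # inverse variances (precisions)
prec_delta = np.full(T, 4.0)

# W stacks the static window (identity) and a central-difference delta window.
I = np.eye(T)
D = np.zeros((T, T))
for t in range(T):
    D[t, max(t - 1, 0)] -= 0.5
    D[t, min(t + 1, T - 1)] += 0.5
W = np.vstack([I, D])

mu = np.concatenate([mu_static, mu_delta])
P = np.diag(np.concatenate([prec_static, prec_delta]))

# Maximum-likelihood trajectory: solve (W' P W) c = W' P mu for c.
c = np.linalg.solve(W.T @ P @ W, W.T @ P @ mu)
print(np.round(c, 3))
```

The delta precisions penalise abrupt frame-to-frame changes, which is how this family of models produces smooth control-point trajectories rather than a sequence of independent per-frame estimates.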

3 citations

Proceedings ArticleDOI
20 Oct 2011
TL;DR: The main idea is to extract and capture a viseme from video of a human talking and the phonemic scripts inside this video, and to generate a talking-head animation video by synchronizing the timestamp of each phoneme to the concatenated visemes.
Abstract: We consider the problem of making lip movement for an animated talking character, which consumes workload and cost during the animation development process. The main idea is to extract and capture a viseme from the video of a human talking and the phonemic scripts inside this video. After that, we generate a talking-head animation video by synchronizing the timestamp of each phoneme to the concatenated visemes. The results of experimental tests are reported, indicating good accuracy.
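The synchronization step (mapping each time-stamped phoneme to a viseme class and concatenating the results into a track) might look like the following sketch; the phoneme-to-viseme table and the merging rule are hypothetical, not the paper's mapping:

```python
# Hypothetical phoneme-to-viseme table: phonemes with similar visual
# appearance share one viseme class.
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "iy": "spread", "uw": "rounded",
}

def build_viseme_track(phoneme_script):
    """phoneme_script: list of (phoneme, start_sec, end_sec) tuples."""
    track = []
    for phoneme, start, end in phoneme_script:
        viseme = PHONEME_TO_VISEME.get(phoneme, "neutral")
        # Merge consecutive identical visemes so the mouth holds its shape.
        if track and track[-1][0] == viseme:
            track[-1] = (viseme, track[-1][1], end)
        else:
            track.append((viseme, start, end))
    return track

script = [("p", 0.00, 0.08), ("aa", 0.08, 0.25), ("m", 0.25, 0.33)]
print(build_viseme_track(script))
# [('bilabial', 0.0, 0.08), ('open', 0.08, 0.25), ('bilabial', 0.25, 0.33)]
```

Each output segment carries the phoneme's original timing, so the captured viseme clips can simply be played back over those intervals.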

3 citations

DOI
13 Dec 2016
TL;DR: This work defines an Indonesian viseme set and the associated mouth shapes via a text-input segmentation system, takes one affection type as an additional input, and uses trigram HMMs to generate a viseme sequence for natural speech of Indonesian sentences with affection.
Abstract: In communication using text input, visemes (visual phonemes) are derived from groups of phonemes with similar visual appearances. The hidden Markov model (HMM) is a popular mathematical approach for sequence classification tasks such as speech recognition. For speech emotion recognition, an HMM is trained for each emotion and an unknown sample is classified according to the model that best explains the derived feature sequence. The Viterbi algorithm is used with an HMM to find the most probable sequence of hidden states behind the observable states. In the first stage of this work, we define an Indonesian viseme set and the associated mouth shapes through a text-input segmentation system. In the second stage, one affection type is chosen as an input to the system. In the last stage, we experiment with trigram HMMs to generate the viseme sequence used to synchronize mouth shapes and lip movements. The whole system is interconnected in a sequence and produces a viseme sequence for natural speech of Indonesian sentences with affection. Experiments show that the proposed approach yields about an 82.19% relative improvement in classification accuracy.
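The Viterbi decoding the abstract refers to is the textbook dynamic program below: given transition and emission probabilities, it recovers the most probable hidden-state sequence for an observation sequence. The two-state toy model and its probabilities are illustrative, not the paper's trigram viseme model:

```python
import numpy as np

states = ["closed", "open"]                 # toy hidden viseme states
start = np.log([0.6, 0.4])
trans = np.log([[0.7, 0.3], [0.4, 0.6]])    # trans[i, j] = P(next=j | cur=i)
emit = np.log([[0.9, 0.1], [0.2, 0.8]])     # emit[i, o] = P(obs=o | state=i)
obs = [0, 1, 1]                             # observed symbol indices

T, N = len(obs), len(states)
delta = np.full((T, N), -np.inf)            # best log-prob ending in state j
back = np.zeros((T, N), dtype=int)          # backpointers
delta[0] = start + emit[:, obs[0]]
for t in range(1, T):
    for j in range(N):
        scores = delta[t - 1] + trans[:, j]
        back[t, j] = int(np.argmax(scores))
        delta[t, j] = scores[back[t, j]] + emit[j, obs[t]]

# Backtrack from the best final state.
path = [int(np.argmax(delta[-1]))]
for t in range(T - 1, 0, -1):
    path.append(back[t, path[-1]])
path.reverse()
print([states[i] for i in path])            # ['closed', 'open', 'open']
```

Working in log probabilities avoids numerical underflow on long sequences, which matters when decoding a full sentence's worth of frames.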

3 citations

Proceedings Article
01 Jan 2002
TL;DR: This study investigates the extent to which a localist-distributive hybrid formal model of human memory replicates observed behavioral patterns in perception and recognition of appropriately coded language data.
Abstract: This study investigates the extent to which a localist-distributive hybrid formal model of human memory replicates observed behavioral patterns in perception and recognition of appropriately coded language data. Extending previous research that considered for modeled memorization only items with uniform, undefined randomly generated featural specifications, a MINERVA2 simulation was trained to recognize linguistic events and categories at both acoustic-phonetic and phonological-featural processing levels. Results of both test conditions parallel two important effects observed in behavioral data and are discussed with respect to speech perception as well as human memory research.
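In Hintzman's MINERVA 2 formulation, which the simulation builds on, retrieval activates each stored trace by the cube of its similarity to the probe and sums the activations into an echo intensity used for recognition decisions. The toy feature vectors below are illustrative, not the study's coded language data:

```python
import numpy as np

rng = np.random.default_rng(1)
memory = rng.choice([-1, 0, 1], size=(50, 20))   # 50 traces, 20 features
probe = memory[3].copy()                          # probe matching a stored item

def similarity(trace, probe):
    # Normalise by features that are nonzero in either vector.
    relevant = (trace != 0) | (probe != 0)
    n = relevant.sum()
    return (trace * probe).sum() / n if n else 0.0

# Activation is the cubed similarity; cubing makes close matches dominate.
acts = np.array([similarity(t, probe) ** 3 for t in memory])
echo_intensity = acts.sum()      # basis for an old/new recognition judgement
echo_content = acts @ memory     # activation-weighted composite of all traces
print(round(float(echo_intensity), 3))
```

Because the probe is an exact copy of trace 3, that trace's activation is 1.0 and dominates both the intensity and the retrieved echo content.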

3 citations


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations (78% related)
Feature vector: 48.8K papers, 954.4K citations (76% related)
Feature extraction: 111.8K papers, 2.1M citations (75% related)
Feature (computer vision): 128.2K papers, 1.7M citations (74% related)
Unsupervised learning: 22.7K papers, 1M citations (73% related)
Performance Metrics
Number of papers in the topic in previous years:

Year  Papers
2023       7
2022      12
2021      13
2020      39
2019      19
2018      22