Journal ArticleDOI

Visual Speech Synthesis by Morphing Visemes

TL;DR
MikeTalk is a text-to-audiovisual speech synthesizer built from visemes, a small set of images spanning a large range of mouth shapes; by morphing between visemes according to phoneme and timing information from a text-to-speech synthesizer, the visual speech stream is synchronized with the audio speech stream, giving the impression of a photorealistic talking face.
Abstract
We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.
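The abstract describes a concrete pipeline: compute dense correspondences between viseme images with optical flow, morph along those correspondences to build each transition, and concatenate transitions at a rate set by the phoneme durations from the TTS front end. The sketch below illustrates that flow of data; it is a minimal illustration under stated assumptions, not MikeTalk's actual code, and the viseme/flow dictionaries and the (viseme, duration) track it consumes are assumed inputs.

```python
# Minimal sketch of a viseme-morphing pipeline (illustrative only).
import numpy as np
from scipy.ndimage import map_coordinates

def warp(img, flow):
    """Backward-warp a grayscale image by a dense flow field of shape (H, W, 2)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys + flow[..., 1], xs + flow[..., 0]])
    return map_coordinates(img, coords, order=1, mode='nearest')

def viseme_transition(src, dst, flow_fwd, flow_bwd, n_frames):
    """Morph from viseme image `src` to `dst` along precomputed flows.
    `flow_fwd` maps src -> dst, `flow_bwd` maps dst -> src.  This is a crude
    approximation of flow-based morphing: warp both endpoints part of the way
    toward each other, then cross-dissolve."""
    frames = []
    for t in np.linspace(0.0, 1.0, n_frames):
        a = warp(src, -t * flow_fwd)            # move src a fraction t toward dst
        b = warp(dst, -(1.0 - t) * flow_bwd)    # move dst a fraction (1-t) toward src
        frames.append((1.0 - t) * a + t * b)    # cross-dissolve the two warps
    return frames

def synthesize_utterance(viseme_track, visemes, flows, fps=30):
    """Concatenate viseme transitions.  `viseme_track` is a list of
    (viseme_id, duration_seconds) pairs, as would be derived from the phoneme
    and timing stream of a TTS front end; `visemes` maps ids to images and
    `flows[(a, b)]` holds the precomputed flow from viseme a to viseme b."""
    video = []
    for (a, _), (b, dur_b) in zip(viseme_track, viseme_track[1:]):
        n = max(2, int(round(dur_b * fps)))     # transition length from phone duration
        video.extend(viseme_transition(visemes[a], visemes[b],
                                       flows[(a, b)], flows[(b, a)], n))
    return video
```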


Citations
Journal ArticleDOI

Face transfer with multilinear models

TL;DR: Face Transfer is a method for mapping videorecorded performances of one individual to facial animations of another, based on a multilinear model of 3D face meshes that separably parameterizes the space of geometric variations due to different attributes.
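As a worked illustration of what "separably parameterizes" means here, a multilinear model expresses each face mesh as mode products of a core tensor with per-attribute parameter vectors; the mode ordering and attribute names below are schematic, not taken from the paper.

```latex
% Schematic multilinear (Tucker-style) face model: vertex positions are a
% multilinear function of separate attribute parameters, e.g. identity and expression.
\[
  \mathbf{f} \;=\; \mathcal{C} \times_2 \mathbf{w}_{\mathrm{id}}^{\top}
                   \times_3 \mathbf{w}_{\mathrm{expr}}^{\top},
\]
```

where $\mathcal{C}$ is the core tensor, $\times_n$ denotes the mode-$n$ product, and $\mathbf{w}_{\mathrm{id}}, \mathbf{w}_{\mathrm{expr}}$ are per-attribute weight vectors; transfer edits one factor (e.g. expression) while holding the others fixed.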
Journal ArticleDOI

Reanimating Faces in Images and Video

TL;DR: A method for photo-realistic animation is presented that can be applied to any face shown in a single image or a video; it allows for head rotations and speech in the original sequence, but requires neither of these motions.
Proceedings ArticleDOI

Trainable videorealistic speech animation

TL;DR: This work describes how to create with machine learning techniques a generative, videorealistic, and speech animation module that looks like a video camera recording of the subject.
Journal ArticleDOI

You Said That?: Synthesising Talking Faces from Audio

TL;DR: An encoder–decoder convolutional neural network model is developed that uses a joint embedding of the face and audio to generate synthesised talking-face video frames; methods are also proposed to re-dub videos by visually blending the generated face into the source video frame using a multi-stream CNN model.
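To make the idea of a joint face–audio embedding concrete, the sketch below shows one possible encoder–decoder arrangement in PyTorch. The module name, layer sizes, the 13×35 MFCC window, and the 64×64 output resolution are all illustrative assumptions, not the architecture from the paper.

```python
# Minimal sketch of a joint-embedding encoder-decoder (illustrative only).
import torch
import torch.nn as nn

class TalkingFaceSketch(nn.Module):
    def __init__(self, emb_dim=256):
        super().__init__()
        # Identity encoder: a still image of the target face -> embedding.
        self.face_enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )
        # Audio encoder: a window of MFCC features -> embedding.
        self.audio_enc = nn.Sequential(
            nn.Flatten(), nn.Linear(13 * 35, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )
        # Decoder: concatenated joint embedding -> synthesized 64x64 frame.
        self.dec = nn.Sequential(
            nn.Linear(2 * emb_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, face_img, audio_feats):
        z = torch.cat([self.face_enc(face_img), self.audio_enc(audio_feats)], dim=1)
        return self.dec(z)
```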
References
Journal ArticleDOI

Determining optical flow

TL;DR: In this paper, a method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image, and an iterative implementation is shown which successfully computes the optical flow for a number of synthetic image sequences.
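The method summarized here is the classic Horn–Schunck formulation: a brightness-constancy constraint combined with a global smoothness prior, minimized over the dense flow field. In the standard notation:

```latex
% Horn-Schunck: brightness constancy plus a global smoothness term,
% minimized over the flow field (u, v).
\[
  I_x u + I_y v + I_t = 0 \qquad \text{(brightness constancy)},
\]
\[
  \min_{u,v} \iint \Big[ (I_x u + I_y v + I_t)^2
    + \alpha^2 \big( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \big) \Big]\, dx\, dy,
\]
```

where $I_x, I_y, I_t$ are the spatial and temporal image derivatives and $\alpha$ weights the smoothness term; the iterative scheme repeatedly updates $(u, v)$ from local flow averages until convergence.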
Journal ArticleDOI

The Laplacian Pyramid as a Compact Image Code

TL;DR: A technique for image encoding is described in which local operators of many scales but identical shape serve as the basis functions; the code tends to enhance salient image features and is well suited for many image analysis tasks as well as for image compression.
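The construction behind this code is simple to sketch: build a Gaussian pyramid by repeated blur-and-downsample, and store at each level the difference between the image and the upsampled next level. The snippet below is a minimal sketch using OpenCV's pyrDown/pyrUp; the original paper uses a specific 5-tap generating kernel, whereas OpenCV applies its own fixed Gaussian-like filter.

```python
# Minimal sketch of Laplacian pyramid analysis and reconstruction (illustrative only).
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    """Return [L0, ..., L_{n-1}, G_n]: band-pass levels plus the final low-pass residual."""
    pyramid, current = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)   # band-pass: detail lost by downsampling
        current = down
    pyramid.append(current)            # low-pass residual
    return pyramid

def reconstruct(pyramid):
    """Invert the pyramid: upsample the residual and add back each band-pass level."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = cv2.pyrUp(current, dstsize=(band.shape[1], band.shape[0])) + band
    return current
```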
Journal ArticleDOI

Performance of optical flow techniques

TL;DR: Several optical flow techniques are compared; the comparisons are primarily empirical, concentrating on the accuracy, reliability, and density of the velocity measurements, and show that performance can differ significantly among the techniques the authors implemented.
Book ChapterDOI

Active Appearance Models

TL;DR: A novel method of interpreting images using an Active Appearance Model (AAM), a statistical model of the shape and grey-level appearance of the object of interest which can generalise to almost any valid example.
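The "statistical model of shape and grey-level appearance" referred to here is, in the standard AAM formulation, a pair of linear PCA models:

```latex
% Standard AAM shape and texture models: linear in low-dimensional parameters.
\[
  \mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s \mathbf{b}_s,
  \qquad
  \mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g \mathbf{b}_g,
\]
```

where $\bar{\mathbf{x}}, \bar{\mathbf{g}}$ are the mean shape and texture, $\mathbf{P}_s, \mathbf{P}_g$ are PCA eigenvector matrices, and $\mathbf{b}_s, \mathbf{b}_g$ (or a single combined appearance parameter vector driving both) are adjusted during fitting to minimize the residual between the synthesized model and the target image.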