Journal ArticleDOI

Visual Speech Synthesis by Morphing Visemes

TL;DR
MikeTalk is a text-to-audiovisual speech synthesizer built from visemes, a small set of images spanning a large range of mouth shapes; by morphing between visemes according to phoneme and timing information from a text-to-speech synthesizer, the visual speech stream is synchronized with the audio speech stream, giving the impression of a photorealistic talking face.
Abstract
We present MikeTalk, a text-to-audiovisual speech synthesizer which converts input text into an audiovisual speech stream. MikeTalk is built using visemes, which are a small set of images spanning a large range of mouth shapes. The visemes are acquired from a recorded visual corpus of a human subject which is specifically designed to elicit one instantiation of each viseme. Using optical flow methods, correspondence from every viseme to every other viseme is computed automatically. By morphing along this correspondence, a smooth transition between viseme images may be generated. A complete visual utterance is constructed by concatenating viseme transitions. Finally, phoneme and timing information extracted from a text-to-speech synthesizer is exploited to determine which viseme transitions to use, and the rate at which the morphing process should occur. In this manner, we are able to synchronize the visual speech stream with the audio speech stream, and hence give the impression of a photorealistic talking face.
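The abstract describes a concrete pipeline: compute dense correspondences between viseme images with optical flow, morph along those correspondences to build each transition, and concatenate transitions at a rate set by the phoneme durations from the TTS front end. The sketch below illustrates that flow of data; it is a minimal illustration under stated assumptions, not MikeTalk's actual code, and the viseme/flow dictionaries and the (viseme, duration) track it consumes are assumed inputs.

```python
# Minimal sketch of a viseme-morphing pipeline (illustrative only).
import numpy as np
from scipy.ndimage import map_coordinates

def warp(img, flow):
    """Backward-warp a grayscale image by a dense flow field of shape (H, W, 2)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys + flow[..., 1], xs + flow[..., 0]])
    return map_coordinates(img, coords, order=1, mode='nearest')

def viseme_transition(src, dst, flow_fwd, flow_bwd, n_frames):
    """Morph from viseme image `src` to `dst` along precomputed flows.
    `flow_fwd` maps src -> dst, `flow_bwd` maps dst -> src.  This is a crude
    approximation of flow-based morphing: warp both endpoints part of the way
    toward each other, then cross-dissolve."""
    frames = []
    for t in np.linspace(0.0, 1.0, n_frames):
        a = warp(src, -t * flow_fwd)            # move src a fraction t toward dst
        b = warp(dst, -(1.0 - t) * flow_bwd)    # move dst a fraction (1-t) toward src
        frames.append((1.0 - t) * a + t * b)    # cross-dissolve the two warps
    return frames

def synthesize_utterance(viseme_track, visemes, flows, fps=30):
    """Concatenate viseme transitions.  `viseme_track` is a list of
    (viseme_id, duration_seconds) pairs, as would be derived from the phoneme
    and timing stream of a TTS front end; `visemes` maps ids to images and
    `flows[(a, b)]` holds the precomputed flow from viseme a to viseme b."""
    video = []
    for (a, _), (b, dur_b) in zip(viseme_track, viseme_track[1:]):
        n = max(2, int(round(dur_b * fps)))     # transition length from phone duration
        video.extend(viseme_transition(visemes[a], visemes[b],
                                       flows[(a, b)], flows[(b, a)], n))
    return video
```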


Citations
Journal ArticleDOI

Face transfer with multilinear models

TL;DR: Face Transfer is a method for mapping videorecorded performances of one individual to facial animations of another, based on a multilinear model of 3D face meshes that separably parameterizes the space of geometric variations due to different attributes.
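As a worked illustration of what "separably parameterizes" means here, a multilinear model expresses each face mesh as mode products of a core tensor with per-attribute parameter vectors; the mode ordering and attribute names below are schematic, not taken from the paper.

```latex
% Schematic multilinear (Tucker-style) face model: vertex positions are a
% multilinear function of separate attribute parameters, e.g. identity and expression.
\[
  \mathbf{f} \;=\; \mathcal{C} \times_2 \mathbf{w}_{\mathrm{id}}^{\top}
                   \times_3 \mathbf{w}_{\mathrm{expr}}^{\top},
\]
```

where $\mathcal{C}$ is the core tensor, $\times_n$ denotes the mode-$n$ product, and $\mathbf{w}_{\mathrm{id}}, \mathbf{w}_{\mathrm{expr}}$ are per-attribute weight vectors; transfer edits one factor (e.g. expression) while holding the others fixed.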
Journal ArticleDOI

Reanimating Faces in Images and Video

TL;DR: A method for photo-realistic animation is presented that can be applied to any face shown in a single image or a video; it allows for head rotations and speech in the original sequence, but requires neither of these motions.
Proceedings ArticleDOI

Trainable videorealistic speech animation

TL;DR: This work describes how to create with machine learning techniques a generative, videorealistic, and speech animation module that looks like a video camera recording of the subject.
Journal ArticleDOI

You Said That?: Synthesising Talking Faces from Audio

TL;DR: An encoder–decoder convolutional neural network model is developed that uses a joint embedding of the face and audio to generate synthesised talking-face video frames; methods are also proposed to re-dub videos by visually blending the generated face into the source video frame using a multi-stream CNN model.
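To make the idea of a joint face–audio embedding concrete, the sketch below shows one possible encoder–decoder arrangement in PyTorch. The module name, layer sizes, the 13×35 MFCC window, and the 64×64 output resolution are all illustrative assumptions, not the architecture from the paper.

```python
# Minimal sketch of a joint-embedding encoder-decoder (illustrative only).
import torch
import torch.nn as nn

class TalkingFaceSketch(nn.Module):
    def __init__(self, emb_dim=256):
        super().__init__()
        # Identity encoder: a still image of the target face -> embedding.
        self.face_enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, emb_dim),
        )
        # Audio encoder: a window of MFCC features -> embedding.
        self.audio_enc = nn.Sequential(
            nn.Flatten(), nn.Linear(13 * 35, 256), nn.ReLU(),
            nn.Linear(256, emb_dim),
        )
        # Decoder: concatenated joint embedding -> synthesized 64x64 frame.
        self.dec = nn.Sequential(
            nn.Linear(2 * emb_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, face_img, audio_feats):
        z = torch.cat([self.face_enc(face_img), self.audio_enc(audio_feats)], dim=1)
        return self.dec(z)
```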
References
Journal ArticleDOI

Determining optical flow

TL;DR: In this paper, a method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image, and an iterative implementation is shown which successfully computes the optical flow for a number of synthetic image sequences.
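The method summarized here is the classic Horn–Schunck formulation: a brightness-constancy constraint combined with a global smoothness prior, minimized over the dense flow field. In the standard notation:

```latex
% Horn-Schunck: brightness constancy plus a global smoothness term,
% minimized over the flow field (u, v).
\[
  I_x u + I_y v + I_t = 0 \qquad \text{(brightness constancy)},
\]
\[
  \min_{u,v} \iint \Big[ (I_x u + I_y v + I_t)^2
    + \alpha^2 \big( \lVert \nabla u \rVert^2 + \lVert \nabla v \rVert^2 \big) \Big]\, dx\, dy,
\]
```

where $I_x, I_y, I_t$ are the spatial and temporal image derivatives and $\alpha$ weights the smoothness term; the iterative scheme repeatedly updates $(u, v)$ from local flow averages until convergence.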
Journal ArticleDOI

The Laplacian Pyramid as a Compact Image Code

TL;DR: A technique for image encoding is described in which local operators of many scales but identical shape serve as the basis functions; the code tends to enhance salient image features and is well suited for many image analysis tasks as well as for image compression.
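The construction behind this code is simple to sketch: build a Gaussian pyramid by repeated blur-and-downsample, and store at each level the difference between the image and the upsampled next level. The snippet below is a minimal sketch using OpenCV's pyrDown/pyrUp; the original paper uses a specific 5-tap generating kernel, whereas OpenCV applies its own fixed Gaussian-like filter.

```python
# Minimal sketch of Laplacian pyramid analysis and reconstruction (illustrative only).
import cv2
import numpy as np

def laplacian_pyramid(img, levels=4):
    """Return [L0, ..., L_{n-1}, G_n]: band-pass levels plus the final low-pass residual."""
    pyramid, current = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(current)
        up = cv2.pyrUp(down, dstsize=(current.shape[1], current.shape[0]))
        pyramid.append(current - up)   # band-pass: detail lost by downsampling
        current = down
    pyramid.append(current)            # low-pass residual
    return pyramid

def reconstruct(pyramid):
    """Invert the pyramid: upsample the residual and add back each band-pass level."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = cv2.pyrUp(current, dstsize=(band.shape[1], band.shape[0])) + band
    return current
```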
Journal ArticleDOI

Performance of optical flow techniques

TL;DR: Several optical flow techniques are compared; the comparisons are primarily empirical, concentrating on the accuracy, reliability, and density of the velocity measurements, and show that performance can differ significantly among the techniques the authors implemented.
Book ChapterDOI

Active Appearance Models

TL;DR: A novel method of interpreting images using an Active Appearance Model (AAM), a statistical model of the shape and grey-level appearance of the object of interest which can generalise to almost any valid example.
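The "statistical model of shape and grey-level appearance" referred to here is, in the standard AAM formulation, a pair of linear PCA models:

```latex
% Standard AAM shape and texture models: linear in low-dimensional parameters.
\[
  \mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s \mathbf{b}_s,
  \qquad
  \mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g \mathbf{b}_g,
\]
```

where $\bar{\mathbf{x}}, \bar{\mathbf{g}}$ are the mean shape and texture, $\mathbf{P}_s, \mathbf{P}_g$ are PCA eigenvector matrices, and $\mathbf{b}_s, \mathbf{b}_g$ (or a single combined appearance parameter vector driving both) are adjusted during fitting to minimize the residual between the synthesized model and the target image.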