scispace - formally typeset
Journal ArticleDOI

Audiovisual speech synthesis

TLDR
The paper discusses the evaluation of audiovisual speech synthesizers, it elaborates on the hardware requirements for performing visual speech synthesis and it describes some important future directions that should stimulate the use of audiolabeled speech synthesis technology in real-life applications.
About
This article is published in Speech Communication.The article was published on 2015-02-01. It has received 60 citations till now. The article focuses on the topics: Speech processing & Speech corpus.

read more

Citations
More filters
Journal ArticleDOI

Digital processing of speech signals

Journal ArticleDOI

Synthesizing Obama: learning lip sync from audio

TL;DR: Given audio of President Barack Obama, a high quality video of him speaking with accurate lip sync is synthesized, composited into a target video clip, and a recurrent neural network learns the mapping from raw audio features to mouth shapes to produce photorealistic results.
Journal ArticleDOI

Audio-driven facial animation by joint end-to-end learning of pose and emotion

TL;DR: This work presents a machine learning technique for driving 3D facial animation by audio input in real time and with low latency, and simultaneously discovers a compact, latent code that disambiguates the variations in facial expression that cannot be explained by the audio alone.
Journal ArticleDOI

JALI: an animator-centric viseme model for expressive lip synchronization

TL;DR: A system that, given an input audio soundtrack and speech transcript, automatically generates expressive lip-synchronized facial animation that is amenable to further artistic refinement, and that is comparable with both performance capture and professional animator output is presented.
Book ChapterDOI

MEAD: A Large-Scale Audio-Visual Dataset for Emotional Talking-Face Generation

TL;DR: The Multi-view Emotional Audio-visual Dataset (MEAD) is built, a talking-face video corpus featuring 60 actors and actresses talking with eight different emotions at three different intensity levels that could benefit a number of different research fields including conditional generation, cross-modal understanding and expression recognition.
References
More filters
Journal ArticleDOI

Least squares quantization in PCM

TL;DR: In this article, the authors derived necessary conditions for any finite number of quanta and associated quantization intervals of an optimum finite quantization scheme to achieve minimum average quantization noise power.
Journal ArticleDOI

Determining optical flow

TL;DR: In this paper, a method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image, and an iterative implementation is shown which successfully computes the Optical Flow for a number of synthetic image sequences.
Journal ArticleDOI

LIII. On lines and planes of closest fit to systems of points in space

TL;DR: This paper is concerned with the construction of planes of closest fit to systems of points in space and the relationships between these planes and the planes themselves.

Least Squares Quantization in PCM

TL;DR: The corresponding result for any finite number of quanta is derived; that is, necessary conditions are found that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy.
Book

Fundamentals of speech recognition

TL;DR: This book presents a meta-modelling framework for speech recognition that automates the very labor-intensive and therefore time-heavy and therefore expensive and expensive process of manually modeling speech.
Related Papers (5)