Journal ArticleDOI
Audiovisual speech synthesis
TL;DR
The paper discusses the evaluation of audiovisual speech synthesizers, elaborates on the hardware requirements for performing visual speech synthesis, and describes important future directions that should stimulate the use of audiovisual speech synthesis technology in real-life applications.
About:
This article was published in Speech Communication on 2015-02-01. It has received 60 citations to date. The article focuses on the topics: Speech processing & Speech corpus.
Citations
Journal ArticleDOI
Synthesizing Obama: learning lip sync from audio
TL;DR: Given audio of President Barack Obama, a high quality video of him speaking with accurate lip sync is synthesized, composited into a target video clip, and a recurrent neural network learns the mapping from raw audio features to mouth shapes to produce photorealistic results.
Journal ArticleDOI
Audio-driven facial animation by joint end-to-end learning of pose and emotion
TL;DR: This work presents a machine learning technique for driving 3D facial animation by audio input in real time and with low latency, and simultaneously discovers a compact, latent code that disambiguates the variations in facial expression that cannot be explained by the audio alone.
Journal ArticleDOI
JALI: an animator-centric viseme model for expressive lip synchronization
TL;DR: A system is presented that, given an input audio soundtrack and speech transcript, automatically generates expressive lip-synchronized facial animation that is amenable to further artistic refinement and is comparable with both performance capture and professional animator output.
Book ChapterDOI
MEAD: A Large-Scale Audio-Visual Dataset for Emotional Talking-Face Generation
Kaisiyuan Wang, Qianyi Wu, Linsen Song, Zhuoqian Yang, Wayne Wu, Chen Qian, Ran He, Yu Qiao, Chen Change Loy, et al.
TL;DR: The Multi-view Emotional Audio-visual Dataset (MEAD) is built, a talking-face video corpus featuring 60 actors and actresses talking with eight different emotions at three different intensity levels that could benefit a number of different research fields including conditional generation, cross-modal understanding and expression recognition.
References
Journal ArticleDOI
Least squares quantization in PCM
TL;DR: In this article, the authors derived necessary conditions for any finite number of quanta and associated quantization intervals of an optimum finite quantization scheme to achieve minimum average quantization noise power.
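The two necessary conditions summarized above — a nearest-neighbor partition of the signal range and quanta placed at the conditional mean of each cell — can be alternated to train a quantizer. A minimal 1-D sketch follows; the function name `lloyd_max` and the quantile-based initialization are illustrative choices, not from the paper:

```python
import numpy as np

def lloyd_max(samples, k, iters=50):
    """1-D Lloyd-Max quantizer sketch: alternate the two necessary
    conditions -- nearest-neighbor partition (thresholds at midpoints
    between quanta) and centroid quanta (each level moved to the mean
    of its cell) -- to reduce average quantization noise power."""
    samples = np.asarray(samples, dtype=float)
    # initialize the k quanta at evenly spaced sample quantiles
    levels = np.quantile(samples, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # decision thresholds: midpoints between adjacent quanta
        edges = (levels[:-1] + levels[1:]) / 2
        cells = np.searchsorted(edges, samples)  # cell index per sample
        # centroid condition: move each quantum to its cell mean
        for j in range(k):
            members = samples[cells == j]
            if members.size:
                levels[j] = members.mean()
    return levels

# example: 4-level quantizer for a uniform source on [0, 1]
rng = np.random.default_rng(0)
q = lloyd_max(rng.uniform(0.0, 1.0, 10000), k=4)
```

For a uniform source the fixed point is the evenly spaced quantizer, so the learned levels should lie near 0.125, 0.375, 0.625, and 0.875.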
Journal ArticleDOI
Determining optical flow
TL;DR: In this paper, a method for finding the optical flow pattern is presented which assumes that the apparent velocity of the brightness pattern varies smoothly almost everywhere in the image; an iterative implementation is shown which successfully computes the optical flow for a number of synthetic image sequences.
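The smoothness assumption above leads to a pair of coupled update equations for the flow field (u, v), iterated from local averages. A compact sketch under simplifying assumptions (forward frame difference for the temporal gradient, periodic 4-neighbor averaging; parameter names are illustrative):

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, iters=200):
    """Horn-Schunck optical flow sketch: brightness constancy plus a
    global smoothness term, solved by iterating the coupled updates
    u = ubar - Ix*t, v = vbar - Iy*t with
    t = (Ix*ubar + Iy*vbar + It) / (alpha^2 + Ix^2 + Iy^2)."""
    I1 = I1.astype(float)
    I2 = I2.astype(float)
    Ix = np.gradient(I1, axis=1)   # spatial gradients of frame 1
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1                   # simple temporal difference

    def local_avg(f):
        # 4-neighbor average with periodic boundary (illustrative choice)
        return (np.roll(f, 1, 0) + np.roll(f, -1, 0)
                + np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4

    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(iters):
        ubar, vbar = local_avg(u), local_avg(v)
        t = (Ix * ubar + Iy * vbar + It) / (alpha**2 + Ix**2 + Iy**2)
        u = ubar - Ix * t
        v = vbar - Iy * t
    return u, v

# synthetic test: a horizontal sinusoid shifted one pixel to the right
N = 32
x = np.arange(N)
I1 = np.tile(np.sin(2 * np.pi * x / N), (N, 1))
I2 = np.roll(I1, 1, axis=1)
u, v = horn_schunck(I1, I2)
```

On this synthetic pair the recovered horizontal flow should be positive on average, drifting toward the true one-pixel shift as the iterations proceed.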
Journal ArticleDOI
LIII. On lines and planes of closest fit to systems of points in space
TL;DR: This paper is concerned with constructing the line or plane of closest fit to a system of points in space, i.e. the line or plane that minimizes the sum of squared perpendicular distances from the points.
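The plane of closest fit described above passes through the centroid of the points, with its normal along the direction of least variance — the eigenvector of the covariance matrix with the smallest eigenvalue. A minimal sketch, assuming 3-D points as rows of an array (the function name is illustrative):

```python
import numpy as np

def plane_of_closest_fit(points):
    """Best-fit plane through a point cloud: passes through the
    centroid, normal to the covariance eigenvector with the smallest
    eigenvalue, minimizing total squared perpendicular distance."""
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = centered.T @ centered / len(points)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues ascending
    normal = eigvecs[:, 0]                  # least-variance direction
    return centroid, normal

# example: points lying exactly in the z = 0 plane
pts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [2, 3, 0], [3, 1, 0]])
c, n = plane_of_closest_fit(pts)
```

For the example the fitted normal is the z-axis (up to sign), since the points have zero variance in z. This construction is the geometric core of principal component analysis.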
Book
Fundamentals of speech recognition
TL;DR: This book presents a meta-modelling framework for speech recognition that automates the labor-intensive, and therefore expensive, process of manually modeling speech.