Arbitrary Talking Face Generation via Attentional Audio-Visual Coherence Learning

17 Dec 2018

arXiv: Computer Vision and Pattern Recog...

TLDR

A novel arbitrary talking face generation framework is proposed that discovers audio-visual coherence via an Asymmetric Mutual Information Estimator (AMIE), together with a Dynamic Attention (DA) block that selectively focuses on the lip area of the input image during training to further enhance lip synchronization.

Abstract:

Talking face generation aims to synthesize a face video with precise lip synchronization and smooth transitions of facial motion over the entire video, given a speech clip and a facial image. Most existing methods focus on either disentangling the information in a single image or learning temporal information between frames. However, cross-modality coherence between audio and video information has not been well addressed during synthesis. In this paper, we propose a novel arbitrary talking face generation framework that discovers audio-visual coherence via the proposed Asymmetric Mutual Information Estimator (AMIE). In addition, we propose a Dynamic Attention (DA) block that selectively focuses on the lip area of the input image during the training stage to further enhance lip synchronization. Experimental results on the benchmark LRW and GRID datasets surpass state-of-the-art methods on prevalent metrics, with robust high-resolution synthesis across gender and pose variations.
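
The abstract does not say how AMIE is computed, so the following is background only and not the authors' estimator. Neural mutual-information estimators are often built on the Donsker-Varadhan lower bound, I(X; Y) >= E_joint[T(x, y)] - log E_marginals[exp(T(x, y))], where T is a critic that scores paired versus mismatched samples. The sketch below evaluates that bound on a toy pair of correlated Gaussian variables, using a closed-form critic in place of a learned audio-visual network. All function names (`dv_lower_bound`, `gaussian_critic`, `demo`) and parameter values are hypothetical and chosen for illustration.

```python
import math
import random


def dv_lower_bound(scores_joint, scores_marginal):
    """Donsker-Varadhan lower bound on mutual information.

    scores_joint: critic scores T(x, y) on paired samples.
    scores_marginal: critic scores on mismatched (shuffled) pairs.
    """
    # Stable log-mean-exp: subtract the maximum before exponentiating.
    m = max(scores_marginal)
    mean_exp = sum(math.exp(s - m) for s in scores_marginal) / len(scores_marginal)
    log_mean_exp = m + math.log(mean_exp)
    return sum(scores_joint) / len(scores_joint) - log_mean_exp


def gaussian_critic(x, y, rho):
    """Log density ratio log p(x, y) / (p(x) p(y)) for a standard
    bivariate normal with correlation rho (the optimal critic here)."""
    quad = rho ** 2 * x ** 2 - 2.0 * rho * x * y + rho ** 2 * y ** 2
    return -0.5 * math.log(1.0 - rho ** 2) - quad / (2.0 * (1.0 - rho ** 2))


def demo(rho=0.6, n=20000, seed=0):
    """Return (estimated bound, analytic mutual information) in nats."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_scale = math.sqrt(1.0 - rho ** 2)
    ys = [rho * x + noise_scale * rng.gauss(0.0, 1.0) for x in xs]

    # Shuffling y breaks the pairing and approximates p(x) p(y).
    ys_shuffled = ys[:]
    rng.shuffle(ys_shuffled)

    joint = [gaussian_critic(x, y, rho) for x, y in zip(xs, ys)]
    marginal = [gaussian_critic(x, y, rho) for x, y in zip(xs, ys_shuffled)]

    true_mi = -0.5 * math.log(1.0 - rho ** 2)
    return dv_lower_bound(joint, marginal), true_mi


if __name__ == "__main__":
    estimate, true_mi = demo()
    print(f"DV estimate: {estimate:.4f}  analytic MI: {true_mi:.4f}")
```

In a learned setting the critic would be a trainable network over audio and image features, and the bound would be maximized by gradient ascent. How AMIE makes this estimate asymmetric is not described in the abstract and is not modeled here.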

