Proceedings Article

StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN

TLDR
In this paper, a pre-trained StyleGAN is used for one-shot talking face generation: the latent feature space of StyleGAN is investigated and some excellent spatial transformation properties are discovered, enabling high-resolution videos driven by video or audio.
Abstract
One-shot talking face generation aims at synthesizing a high-quality talking face video from an arbitrary portrait image, driven by a video or an audio segment. One challenging quality factor is the resolution of the output video: higher resolution conveys more details. In this work, we investigate the latent feature space of a pre-trained StyleGAN and discover some excellent spatial transformation properties. Based on this observation, we explore the possibility of using a pre-trained StyleGAN to break through the resolution limit of training datasets. We propose a novel unified framework based on a pre-trained StyleGAN that enables a set of powerful functionalities, i.e., high-resolution video generation, disentangled control by driving video or audio, and flexible face editing. Our framework elevates the resolution of the synthesized talking face to 1024×1024 for the first time, even though the training dataset has a lower resolution. We design a video-based motion generation module and an audio-based one, which can be plugged into the framework either individually or jointly to drive the video generation. The predicted motion is used to transform the latent features of StyleGAN for visual animation. To compensate for the transformation distortion, we propose a calibration network as well as a domain loss to refine the features. Moreover, our framework allows two types of facial editing, i.e., global editing via GAN inversion and intuitive editing based on 3D morphable models. Comprehensive experiments show superior video quality, flexible controllability, and editability over state-of-the-art methods.
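The core mechanism the abstract describes (warping intermediate StyleGAN feature maps with a predicted motion flow, then refining them with a calibration network) can be sketched in a few lines. This is a minimal PyTorch illustration, not the authors' implementation; the channel count, the flow convention (normalized grid offsets), and the calibration design are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp_features(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a (B, C, H, W) feature map by a (B, 2, H, W) flow field of
    normalized offsets, as in flow-based feature animation."""
    b, _, h, w = feat.shape
    # Identity sampling grid in [-1, 1], as expected by grid_sample.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, h, device=feat.device),
        torch.linspace(-1, 1, w, device=feat.device),
        indexing="ij",
    )
    identity = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    grid = identity + flow.permute(0, 2, 3, 1)  # offset the grid by the motion
    return F.grid_sample(feat, grid, align_corners=True)

class Calibration(nn.Module):
    """Hypothetical residual refinement net compensating warping distortion."""
    def __init__(self, channels: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, warped: torch.Tensor) -> torch.Tensor:
        return warped + self.net(warped)  # residual correction of features
```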



Citations
Proceedings Article

Depth-Aware Generative Adversarial Network for Talking Head Video Generation

TL;DR: Introduces a self-supervised face-depth learning method that automatically recovers dense 3D facial geometry (i.e., depth) from face videos without requiring any expensive 3D annotation data.
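As a rough illustration of how depth can be learned from raw face videos, here is a minimal sketch of the common self-supervised recipe (photometric reconstruction between nearby frames); the paper's actual formulation may differ, and `depth_net` and `reproject` are hypothetical stand-ins for its networks.

```python
import torch.nn.functional as F

def photometric_depth_loss(frame_t, frame_s, depth_net, reproject):
    """frame_t, frame_s: (B, 3, H, W) nearby frames from one face video.
    depth_net predicts per-pixel depth; reproject turns depth plus the
    relative pose into a (B, H, W, 2) sampling grid in [-1, 1]."""
    depth = depth_net(frame_t)                 # (B, 1, H, W), no 3D labels
    grid = reproject(depth, frame_t, frame_s)  # geometry-induced warp
    warped = F.grid_sample(frame_s, grid, align_corners=True)
    # If depth (and pose) are correct, warping the source frame should
    # reproduce the target frame, so appearance alone supervises geometry.
    return F.l1_loss(warped, frame_t)
```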
Proceedings Article

Self-supervised Learning of Adversarial Example: Towards Good Generalizations for Deepfake Detection

TL;DR: Addresses generalizable deepfake detection from a simple principle: a generalizable representation should be sensitive to diverse types of forgeries. The work synthesizes augmented forgeries from a pool of forgery configurations and strengthens this sensitivity by enforcing the model to predict the forgery configuration.
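A toy sketch of that principle, assuming region blending as the forgery augmentation; the configuration pool, the blending routine, and the model interface here are invented for illustration and are not the paper's method.

```python
import random
import torch
import torch.nn as nn

# Hypothetical pool of forgery configurations (region and blend strength).
FORGERY_CONFIGS = [
    {"region": "mouth", "alpha": 0.5},
    {"region": "eyes",  "alpha": 0.8},
    {"region": "full",  "alpha": 0.6},
]

def synthesize_forgery(face_a, face_b, cfg):
    """Toy forgery: blend a region of face_b into face_a per the config."""
    h = face_a.shape[-2]
    rows = {"mouth": slice(2 * h // 3, h),
            "eyes": slice(h // 4, h // 2),
            "full": slice(0, h)}[cfg["region"]]
    fake = face_a.clone()
    fake[..., rows, :] = (cfg["alpha"] * face_b[..., rows, :]
                          + (1 - cfg["alpha"]) * face_a[..., rows, :])
    return fake

def training_step(model, face_a, face_b):
    """Train the model to predict *which* configuration produced the fake,
    forcing sensitivity to diverse forgery artifacts."""
    idx = random.randrange(len(FORGERY_CONFIGS))
    fake = synthesize_forgery(face_a, face_b, FORGERY_CONFIGS[idx])
    target = torch.full((fake.shape[0],), idx, dtype=torch.long,
                        device=fake.device)
    return nn.functional.cross_entropy(model(fake), target)
```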
Journal Article

Next3D: Generative Neural Texture Rasterization for 3D-Aware Head Avatars

TL;DR: Proposes a 3D representation called Generative Texture-Rasterized Tri-planes, which learns generative neural textures on top of parametric mesh templates and projects them into three orthogonal feature planes through rasterization, forming a tri-plane feature representation for volume rendering.
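For context, the tri-plane feature lookup that such a representation feeds into volume rendering can be sketched as follows (EG3D-style); how Next3D rasterizes neural textures onto the planes is not shown, and the shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def sample_triplane(planes: torch.Tensor, pts: torch.Tensor) -> torch.Tensor:
    """planes: (B, 3, C, H, W) axis-aligned feature planes (xy, xz, yz).
    pts: (B, N, 3) query points in [-1, 1]^3. Returns (B, N, C) features."""
    coords = [pts[..., [0, 1]], pts[..., [0, 2]], pts[..., [1, 2]]]
    feats = 0
    for i, uv in enumerate(coords):
        # grid_sample wants a (B, H_out, W_out, 2) grid; treat the N query
        # points as a one-row grid and bilinearly sample each plane.
        f = F.grid_sample(planes[:, i], uv.unsqueeze(1), align_corners=True)
        feats = feats + f.squeeze(2).permute(0, 2, 1)  # accumulate (B, N, C)
    return feats
```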
Journal Article

Diffused Heads: Diffusion Models Beat GANs on Talking-Face Generation

TL;DR: Proposes an autoregressive diffusion model that generates videos of a realistic talking human head, hallucinating head movements and facial expressions such as blinks while preserving a given background.
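A minimal sketch of autoregressive frame-by-frame sampling with a diffusion model, conditioned on an identity frame, audio features, and the previously generated frame; the `denoiser` interface and the DDIM-style update here are assumptions, not the paper's exact sampler.

```python
import torch

@torch.no_grad()
def generate_video(denoiser, id_frame, audio_feats, n_frames, steps, alphas_bar):
    """alphas_bar: 1-D tensor of cumulative noise-schedule products."""
    frames = []
    for t_idx in range(n_frames):
        x = torch.randn_like(id_frame)             # start each frame from noise
        prev = frames[-1] if frames else id_frame  # autoregressive context
        for s in reversed(range(steps)):           # deterministic DDIM loop
            a = alphas_bar[s]
            eps = denoiser(x, s, id_frame, prev, audio_feats[t_idx])
            x0 = (x - (1 - a).sqrt() * eps) / a.sqrt()  # predicted clean frame
            a_prev = alphas_bar[s - 1] if s > 0 else torch.tensor(1.0)
            x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
        frames.append(x0)
    return torch.stack(frames, dim=1)              # (B, T, C, H, W)
```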
Journal Article

3D GAN Inversion with Facial Symmetry Prior

TL;DR: Proposes a novel method to improve 3D GAN inversion by introducing a facial symmetry prior, which helps obtain a robust and reasonable geometry shape during the inversion process.
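A minimal sketch of how a symmetry prior can enter an inversion objective: besides reconstructing the input view, the optimized latent is rendered from the mirrored camera and compared with the horizontally flipped image. The renderer interface is an assumption, not the paper's API.

```python
import torch
import torch.nn.functional as F

def inversion_loss(render, latent, image, cam, mirror_cam, w_sym=0.5):
    """render(latent, cam) -> image; mirror_cam is the pose reflected
    across the face's vertical symmetry plane."""
    recon = render(latent, cam)
    loss = F.l1_loss(recon, image)
    # Symmetry prior: the mirrored view should match the horizontally
    # flipped input, constraining geometry on the unseen side of the face.
    mirrored = render(latent, mirror_cam)
    loss = loss + w_sym * F.l1_loss(mirrored, torch.flip(image, dims=[-1]))
    return loss
```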