Open Access Posted Content

X2Face: A network for controlling face generation by using images, audio, and pose codes

TL;DR
A neural network model is proposed that controls the pose and expression of a given face using another face or another modality (e.g. audio), and that can be used for lightweight, sophisticated video and image editing.
Abstract
The objective of this paper is a neural network model that controls the pose and expression of a given face, using another face or modality (e.g. audio). This model can then be used for lightweight, sophisticated video and image editing. We make the following three contributions. First, we introduce a network, X2Face, that can control a source face (specified by one or more frames) using another face in a driving frame, producing a generated frame with the identity of the source frame but the pose and expression of the face in the driving frame. Second, we propose a method for training the network in a fully self-supervised manner using a large collection of video data. Third, we show that the generation process can be driven by other modalities, such as audio or pose codes, without any further training of the network. The generation results for driving a face with another face are compared to state-of-the-art self-supervised and supervised methods. We show that our approach is more robust than other methods, as it makes fewer assumptions about the input data. We also show examples of using our framework for video face editing.
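The abstract describes warp-based generation: rather than synthesising pixels from scratch, the generated frame is produced by re-sampling the source pixels at locations predicted from the driving signal. Below is a minimal numpy sketch of that core sampling operation only — `bilinear_sample` is a hypothetical stand-in for the differentiable sampler, and the hand-built grids replace what the X2Face networks would actually predict.

```python
import numpy as np

def bilinear_sample(image, grid):
    """Sample a grayscale `image` (H, W) at float coordinates `grid` (H, W, 2),
    where grid[..., 0] is the source row and grid[..., 1] the source column.
    Warp-based generation produces the output frame by re-arranging source
    pixels with such a grid instead of synthesising them directly."""
    H, W = image.shape
    r = np.clip(grid[..., 0], 0, H - 1)
    c = np.clip(grid[..., 1], 0, W - 1)
    r0, c0 = np.floor(r).astype(int), np.floor(c).astype(int)
    r1, c1 = np.minimum(r0 + 1, H - 1), np.minimum(c0 + 1, W - 1)
    wr, wc = r - r0, c - c0
    # Interpolate between the four surrounding pixels.
    top = image[r0, c0] * (1 - wc) + image[r0, c1] * wc
    bot = image[r1, c0] * (1 - wc) + image[r1, c1] * wc
    return top * (1 - wr) + bot * wr

# The identity grid reproduces the source; a shifted grid "drives" the pixels.
img = np.arange(16, dtype=float).reshape(4, 4)
rows, cols = np.meshgrid(np.arange(4.0), np.arange(4.0), indexing="ij")
identity = np.stack([rows, cols], axis=-1)
```

In the full model, one network maps the source frame(s) to an identity-preserving embedded face and a second network maps the driving frame (or audio/pose code) to the sampling grid; both are trained end-to-end through this differentiable sampler.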


Citations
Proceedings Article

A morphable model for the synthesis of 3D faces

Volker Blanz, Thomas Vetter
Journal Article

Text-based editing of talking-head video

TL;DR: This work proposes a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts).
Journal Article

Neural style-preserving visual dubbing

TL;DR: In this article, a recurrent generative adversarial network (GAN) is used to capture the spatio-temporal co-activation of facial expressions, enabling the facial expressions of the target actor to be generated and modified while preserving their style.
Proceedings Article

StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN

TL;DR: In this paper, a pre-trained StyleGAN is used for one-shot, high-resolution talking face generation; the latent feature space of StyleGAN is investigated and found to have useful spatial transformation properties.
References
Proceedings Article

Image-to-Image Translation with Conditional Adversarial Networks

TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems, and the approach is demonstrated to be effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
Proceedings Article
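The pix2pix formulation combines the conditional adversarial term with a weighted L1 reconstruction term. The sketch below states that combined generator objective with numpy scalars; `pix2pix_generator_loss` and the weight `lam` are illustrative names, not the paper's code.

```python
import numpy as np

def pix2pix_generator_loss(d_fake, fake, target, lam=100.0):
    """Sketch of a pix2pix-style generator objective: an adversarial term
    (the conditional discriminator should score the generated image as real)
    plus a lambda-weighted L1 term pulling the output toward the target."""
    eps = 1e-12                          # avoid log(0)
    adv = -np.log(d_fake + eps).mean()   # want D(x, G(x)) -> 1
    l1 = np.abs(fake - target).mean()    # L1 encourages low-frequency fidelity
    return adv + lam * l1

# A generator that matches the target and fools D incurs (near-)zero loss;
# a poor generator is penalised by both terms.
target = np.ones((2, 2))
good = pix2pix_generator_loss(np.array([1.0]), target, target)
bad = pix2pix_generator_loss(np.array([0.5]), np.zeros((2, 2)), target)
```

The large L1 weight reflects the design choice that the adversarial term supplies high-frequency realism while L1 anchors the overall structure.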

Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks

TL;DR: CycleGAN learns a mapping G : X → Y such that the distribution of images G(X) is indistinguishable from the distribution of Y under an adversarial loss; because this mapping is under-constrained, it is coupled with an inverse mapping F : Y → X and a cycle-consistency loss enforcing F(G(x)) ≈ x.
Proceedings Article
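The cycle-consistency idea is compact enough to state directly: after a round trip through both mappings, each sample should return to itself. A minimal numpy sketch, using toy invertible functions in place of learned networks:

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """L1 cycle-consistency loss in the CycleGAN style: F(G(x)) should
    recover x and G(F(y)) should recover y, where G maps domain X -> Y
    and F maps Y -> X."""
    return np.abs(F(G(x)) - x).mean() + np.abs(G(F(y)) - y).mean()

# Toy mappings that are exact inverses: a perfect cycle gives zero loss.
G = lambda a: 2.0 * a + 1.0
F = lambda b: (b - 1.0) / 2.0
x = np.array([0.0, 1.0, 2.0])
y = np.array([3.0, 5.0, 7.0])
```

In the full method this term is added to the two adversarial losses; on its own it only constrains the mappings to be mutual inverses, not to match the target distributions.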

Poisson image editing

TL;DR: Using generic interpolation machinery based on solving Poisson equations, a variety of novel tools are introduced for seamless editing of image regions, which permits the seamless importation of both opaque and transparent source image regions into a destination region.
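The interpolation machinery referred to here solves a discrete Poisson equation: inside the edited region the result keeps the gradients of the source patch while matching the destination on the boundary. A simplified numpy sketch using Jacobi iteration (the paper solves the same sparse linear system directly; `poisson_blend` is an illustrative name and the mask is assumed not to touch the image border):

```python
import numpy as np

def poisson_blend(dst, src, mask, iters=500):
    """Seamless cloning in the spirit of Poisson image editing: inside
    `mask`, iterate toward the solution of the discrete Poisson equation
    with the Laplacian of `src` as guidance and `dst` as the boundary."""
    f = dst.astype(float).copy()
    src = src.astype(float)
    # Discrete Laplacian of the source: 4*g(p) minus its 4-neighbours.
    lap = (4 * src[1:-1, 1:-1] - src[:-2, 1:-1] - src[2:, 1:-1]
           - src[1:-1, :-2] - src[1:-1, 2:])
    inside = mask[1:-1, 1:-1]
    for _ in range(iters):
        nb = f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]
        # Jacobi update: 4*f(p) - sum(neighbours of f) = lap(src) at p.
        f[1:-1, 1:-1] = np.where(inside, (nb + lap) / 4.0, f[1:-1, 1:-1])
    return f

# Pasting a flat (zero-gradient) patch into a constant destination changes
# nothing: the region inherits the destination's appearance seamlessly.
dst = np.ones((6, 6))
src = np.zeros((6, 6))
mask = np.zeros((6, 6), dtype=bool)
mask[2:4, 2:4] = True
out = poisson_blend(dst, src, mask)
```

This is why the technique imports source *structure* rather than source *colour*: only gradients of the patch survive, while absolute intensities are dictated by the surrounding destination.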
Journal Article

Dlib-ml: A Machine Learning Toolkit

TL;DR: dlib-ml contains an extensible linear algebra toolkit with built-in BLAS support, implementations of algorithms for performing inference in Bayesian networks, and kernel-based methods for classification, regression, clustering, anomaly detection, and feature ranking.
Proceedings Article

InfoGAN: interpretable representation learning by information maximizing generative adversarial nets

TL;DR: InfoGAN is an information-theoretic extension of the GAN that learns disentangled representations in a completely unsupervised manner; on the CelebA face dataset it discovers visual concepts such as hair styles, the presence of eyeglasses, and emotions.