Open Access · Posted Content
X2Face: A network for controlling face generation by using images, audio, and pose codes
TLDR
A neural network model is proposed that controls the pose and expression of a given face using another face or another modality (e.g. audio); the model can be used for lightweight, sophisticated video and image editing.
Abstract:
The objective of this paper is a neural network model that controls the pose and expression of a given face, using another face or modality (e.g. audio). This model can then be used for lightweight, sophisticated video and image editing.
We make the following three contributions. First, we introduce a network, X2Face, that can control a source face (specified by one or more frames) using another face in a driving frame to produce a generated frame with the identity of the source frame but the pose and expression of the face in the driving frame. Second, we propose a method for training the network fully self-supervised using a large collection of video data. Third, we show that the generation process can be driven by other modalities, such as audio or pose codes, without any further training of the network.
The generation results for driving a face with another face are compared to state-of-the-art self-supervised/supervised methods. We show that our approach is more robust than other methods, as it makes fewer assumptions about the input data. We also show examples of using our framework for video face editing.
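X2Face generates each output frame by learning where to sample from the source frame, rather than synthesizing pixels from scratch. A minimal NumPy sketch of bilinear warping with a dense sampling grid illustrates this kind of operation; the grid here is hand-built (in the paper it is predicted by the driving network), and all names are illustrative, not the authors' code:

```python
import numpy as np

def bilinear_warp(source, grid):
    """Warp a source image with a dense sampling grid.

    source: (H, W) grayscale image.
    grid:   (H, W, 2) array of (x, y) sampling coordinates into `source`,
            one per output pixel (the kind of field a driving network
            could predict).
    """
    H, W = source.shape
    x, y = grid[..., 0], grid[..., 1]
    x0 = np.clip(np.floor(x).astype(int), 0, W - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 2)
    wx, wy = x - x0, y - y0
    top = (1 - wx) * source[y0, x0] + wx * source[y0, x0 + 1]
    bot = (1 - wx) * source[y0 + 1, x0] + wx * source[y0 + 1, x0 + 1]
    return (1 - wy) * top + wy * bot

# The identity grid leaves the image unchanged; deforming the grid
# re-poses the source while keeping its appearance (identity).
H, W = 4, 4
src = np.arange(H * W, dtype=float).reshape(H, W)
xs, ys = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
identity = np.stack([xs, ys], axis=-1)
out = bilinear_warp(src, identity)
```

Because the output is a differentiable function of the grid, such a sampler can be trained end-to-end with the self-supervised photometric loss the paper describes.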
Citations
Journal Article
Learning report on "Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks"
Journal ArticleDOI
Text-based editing of talking-head video
Ohad Fried,Ayush Tewari,Michael Zollhöfer,Adam Finkelstein,Eli Shechtman,Dan B. Goldman,Kyle Genova,Zeyu Jin,Christian Theobalt,Maneesh Agrawala +9 more
TL;DR: This work proposes a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts).
Journal ArticleDOI
Neural style-preserving visual dubbing
Hyeongwoo Kim,Mohamed Elgharib,Michael Zollhöfer,Hans-Peter Seidel,Thabo Beeler,Christian Richardt,Christian Theobalt +6 more
TL;DR: In this article, a recurrent generative adversarial network (GAN) is used to capture the spatio-temporal co-activation of facial expressions and enables generating and modifying the facial expressions of the target actor while preserving their style.
Proceedings ArticleDOI
StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN
Fei Yin,Yong Zhang,Xiaodong Cun,Ming Cao,Yanbo Fan,Xuan-Yi Wang,Qingyan Bai,Baoyuan Wu,Jue Wang,Yujiu Yang +9 more
TL;DR: In this paper, a pre-trained StyleGAN is used for one-shot, high-resolution talking face generation driven by video or audio; the latent feature space of StyleGAN is investigated and found to have excellent spatial transformation properties.
References
Proceedings ArticleDOI
Image-to-Image Translation with Conditional Adversarial Networks
TL;DR: Conditional adversarial networks are investigated as a general-purpose solution to image-to-image translation problems and it is demonstrated that this approach is effective at synthesizing photos from label maps, reconstructing objects from edge maps, and colorizing images, among other tasks.
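The pix2pix objective combines the conditional adversarial loss with an L1 term pulling the generator toward the ground truth (λ = 100 in the paper). A toy numeric sketch; the discriminator scores below are made-up probabilities, not a trained model:

```python
import numpy as np

# pix2pix:  G* = arg min_G max_D  L_cGAN(G, D) + lam * L_L1(G)
def d_loss(d_real, d_fake):
    # D wants D(x, y) -> 1 on real pairs and D(x, G(x)) -> 0 on fakes.
    return -(np.log(d_real).mean() + np.log(1.0 - d_fake).mean())

def g_loss(d_fake, g_out, target, lam=100.0):
    # G wants to fool D and to stay close to the ground truth in L1.
    return -np.log(d_fake).mean() + lam * np.abs(g_out - target).mean()

d_real = np.array([0.9, 0.8])   # D's scores on real (input, target) pairs
d_fake = np.array([0.2, 0.1])   # D's scores on (input, G(input)) pairs
g_out = np.array([0.5, 0.7])    # toy generator output pixels
target = np.array([0.5, 0.7])   # matching ground truth -> zero L1 term
dl = d_loss(d_real, d_fake)
gl = g_loss(d_fake, g_out, target)
```

With a confident discriminator, the fooled generator's loss is dominated by the adversarial term here; in practice the L1 term stabilizes training and keeps outputs near the target.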
Proceedings ArticleDOI
Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks
TL;DR: CycleGAN learns a mapping G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss; because this mapping is under-constrained, it is coupled with an inverse mapping F: Y → X and a cycle-consistency loss enforcing F(G(X)) ≈ X.
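The cycle-consistency term can be illustrated with toy invertible maps standing in for the generators; G and F below are hypothetical linear functions, not the CNNs trained in the paper:

```python
import numpy as np

# Toy stand-ins for the two generators, G: X -> Y and F: Y -> X.
def G(x):                      # hypothetical forward mapping
    return 2.0 * x + 1.0

def F(y):                      # hypothetical inverse mapping
    return (y - 1.0) / 2.0

def cycle_loss(x, y):
    # L1 reconstruction penalty ||F(G(x)) - x||_1 + ||G(F(y)) - y||_1,
    # added on top of the adversarial losses to constrain the mapping.
    return np.abs(F(G(x)) - x).mean() + np.abs(G(F(y)) - y).mean()

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])
loss = cycle_loss(x, y)        # 0.0 here: F inverts G exactly
```

When F is not an exact inverse of G, this loss is positive, which is the pressure that rules out mappings that match the target distribution while scrambling content.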
Proceedings ArticleDOI
Poisson image editing
TL;DR: Using generic interpolation machinery based on solving Poisson equations, a variety of novel tools are introduced for seamless editing of image regions, which permits the seamless importation of both opaque and transparent source image regions into a destination region.
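A toy version of that interpolation machinery, assuming grayscale images and a plain Jacobi solver: the discrete Poisson equation Δf = Δsrc is solved inside the masked region, with the destination supplying the boundary values (the paper develops the same idea with richer guidance fields):

```python
import numpy as np

def poisson_blend(src, dst, mask, iters=2000):
    # Solve the discrete Poisson equation  Δf = Δsrc  inside `mask`,
    # with f = dst on the boundary, via Jacobi iteration.
    f = dst.astype(float).copy()
    f[mask] = src[mask]            # naive paste: visible seam at mask edge
    lap = np.zeros_like(f)         # discrete Laplacian of the source
    lap[1:-1, 1:-1] = (src[:-2, 1:-1] + src[2:, 1:-1] +
                       src[1:-1, :-2] + src[1:-1, 2:] -
                       4.0 * src[1:-1, 1:-1])
    inside = mask[1:-1, 1:-1]
    for _ in range(iters):
        avg = 0.25 * (f[:-2, 1:-1] + f[2:, 1:-1] +
                      f[1:-1, :-2] + f[1:-1, 2:] - lap[1:-1, 1:-1])
        f[1:-1, 1:-1] = np.where(inside, avg, f[1:-1, 1:-1])
    return f

# A flat (all-zero) source pasted into a brighter destination: the blend
# keeps the source's zero gradients but matches the destination at the
# mask boundary, so the hard seam of the naive paste disappears.
src = np.zeros((8, 8))
dst = np.full((8, 8), 10.0)
mask = np.zeros((8, 8), dtype=bool)
mask[2:6, 2:6] = True
out = poisson_blend(src, dst, mask)
```

Production implementations solve the same sparse linear system directly rather than by fixed-point iteration, but the boundary-conditions-plus-guidance-field structure is identical.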
Journal ArticleDOI
Dlib-ml: A Machine Learning Toolkit
TL;DR: dlib-ml contains an extensible linear algebra toolkit with built-in BLAS support, as well as implementations of algorithms for performing inference in Bayesian networks and kernel-based methods for classification, regression, clustering, anomaly detection, and feature ranking.
Proceedings Article
InfoGAN: interpretable representation learning by information maximizing generative adversarial nets
TL;DR: InfoGAN is an information-theoretic extension of the GAN that learns disentangled representations in a completely unsupervised manner; on the CelebA face dataset it discovers visual concepts that include hair styles, presence of eyeglasses, and emotions.
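InfoGAN maximizes a variational lower bound on the mutual information between the latent code c and the generated image. For a categorical code the bound is easy to evaluate directly; the auxiliary network Q's outputs below are made-up numbers, not a trained model:

```python
import numpy as np

# InfoGAN's lower bound on I(c; G(z, c)):
#   L_I(G, Q) = E_{c ~ p(c), x ~ G(z, c)}[ log Q(c | x) ] + H(c),
# where Q approximates the posterior over the latent code c.
def mi_lower_bound(codes, q_probs, prior):
    h_c = -np.sum(prior * np.log(prior))   # entropy of the code prior
    log_q = np.log(q_probs[np.arange(len(codes)), codes])
    return log_q.mean() + h_c

prior = np.array([0.5, 0.5])               # uniform 2-way code, H(c) = log 2
codes = np.array([0, 1, 0, 1])             # codes fed into the generator
q_probs = np.where(np.eye(2)[codes] > 0, 0.99, 0.01)  # Q nearly certain
bound = mi_lower_bound(codes, q_probs, prior)
```

The bound is capped at H(c) and approaches it as Q recovers the code perfectly, which is what pushes the generator to keep c's information visible in the image.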