scispace - formally typeset
Open AccessProceedings ArticleDOI

Learning Detailed Face Reconstruction from a Single Image

Reads0
Chats0
TLDR
This work proposes to leverage the power of convolutional neural networks to produce a highly detailed face reconstruction from a single image, and introduces an end- to-end CNN framework which derives the shape in a coarse-to-fine fashion.
Abstract
Reconstructing the detailed geometric structure of a face from a given image is a key to many computer vision and graphics applications, such as motion capture and reenactment. The reconstruction task is challenging as human faces vary extensively when considering expressions, poses, textures, and intrinsic geometries. While many approaches tackle this complexity by using additional data to reconstruct the face of a single subject, extracting facial surface from a single image remains a difficult problem. As a result, single-image based methods can usually provide only a rough estimate of the facial geometry. In contrast, we propose to leverage the power of convolutional neural networks to produce a highly detailed face reconstruction from a single image. For this purpose, we introduce an end-to-end CNN framework which derives the shape in a coarse-to-fine fashion. The proposed architecture is composed of two main blocks, a network that recovers the coarse facial geometry (CoarseNet), followed by a CNN that refines the facial features of that geometry (FineNet). The proposed networks are connected by a novel layer which renders a depth image given a mesh in 3D. Unlike object recognition and detection problems, there are no suitable datasets for training CNNs to perform face geometry reconstruction. Therefore, our training regime begins with a supervised phase, based on synthetic images, followed by an unsupervised phase that uses only unconstrained facial images. The accuracy and robustness of the proposed model is demonstrated by both qualitative and quantitative evaluation tests.

read more

Citations
More filters
Proceedings Article

A morphable model for the synthesis of 3D faces

Matthew Turk
Proceedings ArticleDOI

Differentiable Volumetric Rendering: Learning Implicit 3D Representations Without 3D Supervision

TL;DR: This work proposes a differentiable rendering formulation for implicit shape and texture representations, showing that depth gradients can be derived analytically using the concept of implicit differentiation, and finds that this method can be used for multi-view 3D reconstruction, directly resulting in watertight meshes.
Journal ArticleDOI

Deep video portraits

TL;DR: In this paper, a generative neural network with a novel space-time architecture is proposed to transfer the full 3D head position, head rotation, face expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor.
Book ChapterDOI

Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network

TL;DR: Yadira et al. as mentioned in this paper proposed a simple convolutional neural network to regress the 3D shape of a complete face from a single 2D image, which can reconstruct full facial geometry along with semantic meaning.
Proceedings ArticleDOI

Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning

TL;DR: This work proposes a truly differentiable rendering framework that is able to directly render colorized mesh using differentiable functions and back-propagate efficient supervision signals to mesh vertices and their attributes from various forms of image representations, including silhouette, shading and color images.
References
More filters
Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Journal ArticleDOI

Generative Adversarial Nets

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously train: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Proceedings ArticleDOI

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Proceedings ArticleDOI

Rapid object detection using a boosted cascade of simple features

TL;DR: A machine learning approach for visual object detection which is capable of processing images extremely rapidly and achieving high detection rates and the introduction of a new image representation called the "integral image" which allows the features used by the detector to be computed very quickly.
Proceedings ArticleDOI

Deep face recognition

TL;DR: It is shown how a very large scale dataset can be assembled by a combination of automation and human in the loop, and the trade off between data purity and time is discussed.
Related Papers (5)