scispace - formally typeset
Open Access · Proceedings Article · DOI

MoFA: Model-Based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction

TLDR
A novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image and can be trained end-to-end in an unsupervised manner, which renders training on very large real-world data feasible.
Abstract
In this work we propose a novel model-based deep convolutional autoencoder that addresses the highly challenging problem of reconstructing a 3D human face from a single in-the-wild color image. To this end, we combine a convolutional encoder network with an expert-designed generative model that serves as decoder. The core innovation is the differentiable parametric decoder that encapsulates image formation analytically based on a generative model. Our decoder takes as input a code vector with exactly defined semantic meaning that encodes detailed face pose, shape, expression, skin reflectance and scene illumination. Due to this new way of combining CNN-based and model-based face reconstruction, the CNN-based encoder learns to extract semantically meaningful parameters from a single monocular input image. For the first time, a CNN encoder and an expert-designed generative model can be trained end-to-end in an unsupervised manner, which renders training on very large (unlabeled) real-world data feasible. The obtained reconstructions compare favorably to current state-of-the-art approaches in terms of quality and richness of representation.
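The pipeline the abstract describes — a CNN encoder that maps an image to a semantically structured code vector, followed by a differentiable, expert-designed generative decoder — can be sketched as below. This is a minimal illustrative sketch, not the paper's actual components: the dimensions, the single linear layer standing in for the deep CNN encoder, and the random morphable-model bases are all assumptions, and the decoder stops at per-vertex attributes rather than implementing the full analytic image formation (rigid pose, projection, spherical-harmonics shading, rasterization) used in MoFA.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not the paper's actual values).
N_VERTS = 100                          # toy mesh vertex count
DIM_SHAPE, DIM_EXPR, DIM_REFL = 80, 64, 80
DIM_POSE, DIM_ILLUM = 6, 27            # rotation+translation; 3x9 SH coefficients
CODE_DIM = DIM_POSE + DIM_SHAPE + DIM_EXPR + DIM_REFL + DIM_ILLUM

# Stand-in "encoder": one linear map from image pixels to the code vector.
# In the paper this is a deep CNN trained end-to-end.
W_enc = rng.standard_normal((CODE_DIM, 64 * 64 * 3)) * 0.01

def encode(image):
    """Map a flattened color image to a semantically structured code vector."""
    code = W_enc @ image.ravel()
    splits = np.cumsum([DIM_POSE, DIM_SHAPE, DIM_EXPR, DIM_REFL])
    pose, shape, expr, refl, illum = np.split(code, splits)
    return pose, shape, expr, refl, illum

# Stand-in generative model: linear bases for geometry and reflectance,
# analogous to a 3D morphable model (mean plus principal components).
mean_geo = rng.standard_normal(3 * N_VERTS)
B_shape = rng.standard_normal((3 * N_VERTS, DIM_SHAPE))
B_expr = rng.standard_normal((3 * N_VERTS, DIM_EXPR))
mean_refl = rng.standard_normal(3 * N_VERTS)
B_refl = rng.standard_normal((3 * N_VERTS, DIM_REFL))

def decode(pose, shape, expr, refl, illum):
    """Differentiable decoder: every step is a linear/analytic operation,
    so gradients of a photometric loss can flow back through the code
    to the encoder. Pose/illumination use and rendering are omitted here."""
    geometry = mean_geo + B_shape @ shape + B_expr @ expr
    reflectance = mean_refl + B_refl @ refl
    return geometry.reshape(N_VERTS, 3), reflectance.reshape(N_VERTS, 3)

image = rng.random((64, 64, 3))
pose, shape, expr, refl, illum = encode(image)
verts, colors = decode(pose, shape, expr, refl, illum)
```

The key property this sketch illustrates is that the code vector has a fixed, named layout (pose, shape, expression, reflectance, illumination), so every entry the encoder produces has an exact semantic meaning, and the decoder consumes it analytically rather than through learned weights.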



Citations
Posted Content

Personalized Face Modeling for Improved Face Reconstruction and Motion Retargeting

TL;DR: Experimental results show that the proposed end-to-end framework accurately captures fine-grained facial dynamics in a wide range of conditions and efficiently decouples the learned face model from facial motion, resulting in more accurate face reconstruction and facial motion retargeting than state-of-the-art methods.
Posted Content

A Review of 3D Face Reconstruction From a Single Image.

TL;DR: This paper reviews the recent literature on 3D face reconstruction from a single image, a problem that has attracted substantial research attention and a large number of publications.
Proceedings ArticleDOI

Spatially Multi-conditional Image Generation

TL;DR: TLAM, the method proposed in this paper, uses a transformer-like architecture operating pixel-wise that receives the available labels as input tokens and merges them in a learned homogeneous label space; the merged labels are then used for image generation via conditional generative adversarial training.
Book ChapterDOI

Learning 3D Face Reconstruction with a Pose Guidance Network

TL;DR: Li et al. propose a self-supervised approach to monocular 3D face reconstruction with a pose guidance network (PGN) that can learn from both faces with fully labeled 3D landmarks and unlimited unlabeled in-the-wild face images.
Proceedings ArticleDOI

Screen-space Regularization on Differentiable Rasterization

TL;DR: Wang et al. propose a screen-space regularization method that targets the unbalanced deformation caused by limited viewpoints, and apply the regularization to both multi-view deformation and single-view reconstruction tasks.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: This paper achieves state-of-the-art image classification with a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Journal ArticleDOI

Reducing the Dimensionality of Data with Neural Networks

TL;DR: This article describes an effective way of initializing the weights that allows deep autoencoder networks to learn low-dimensional codes that work much better than principal component analysis as a tool for reducing the dimensionality of data.
Proceedings ArticleDOI

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Proceedings ArticleDOI

Deep Learning Face Attributes in the Wild

TL;DR: This paper proposes a novel deep learning framework for attribute prediction in the wild that cascades two CNNs, LNet and ANet, which are fine-tuned jointly with attribute tags but pre-trained differently.