Adversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and Image-to-Image Translation from Unpaired Supervision
Hsiao-Yu Fish Tung,Adam W. Harley,William Seto,Katerina Fragkiadaki +3 more
- pp 4364-4372
TLDR
Adversarial Inverse Graphics networks (AIGNs) are proposed, weakly supervised neural network models that combine feedback from rendering their predictions, with distribution matching between their predictions and a collection of ground-truth factors, and outperform models supervised by only paired annotations.Abstract:
Researchers have developed excellent feed-forward models that learn to map images to desired outputs, such as to the images’ latent factors, or to other images, using supervised learning. Learning such mappings from unlabelled data, or improving upon supervised models by exploiting unlabelled data, remains elusive. We argue that there are two important parts to learning without annotations: (i) matching the predictions to the input observations, and (ii) matching the predictions to known priors. We propose Adversarial Inverse Graphics networks (AIGNs): weakly supervised neural network models that combine feedback from rendering their predictions, with distribution matching between their predictions and a collection of ground-truth factors. We apply AIGNs to 3D human pose estimation and 3D structure and egomotion estimation, and outperform models supervised by only paired annotations. We further apply AIGNs to facial image transformation using super-resolution and inpainting renderers, while deliberately adding biases in the ground-truth datasets. Our model seamlessly incorporates such biases, rendering input faces towards young, old, feminine, masculine or Tom Cruiselike equivalents (depending on the chosen bias), or adding lip and nose augmentations while inpainting concealed lips and noses.read more
Citations
More filters
Proceedings ArticleDOI
High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs
TL;DR: In this paper, a new method for synthesizing high-resolution photo-realistic images from semantic label maps using conditional generative adversarial networks (conditional GANs) is presented.
Proceedings ArticleDOI
End-to-End Recovery of Human Shape and Pose
TL;DR: This work introduces an adversary trained to tell whether human body shape and pose parameters are real or not using a large database of 3D human meshes, and produces a richer and more useful mesh representation that is parameterized by shape and 3D joint angles.
Proceedings Article
Scene Representation Networks: Continuous 3D-Structure-Aware Neural Scene Representations
TL;DR: The proposed Scene Representation Networks (SRNs), a continuous, 3D-structure-aware scene representation that encodes both geometry and appearance, are demonstrated by evaluating them for novel view synthesis, few-shot reconstruction, joint shape and appearance interpolation, and unsupervised discovery of a non-rigid face model.
Proceedings ArticleDOI
3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training
TL;DR: It is demonstrated that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints and back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data is introduced.
Proceedings ArticleDOI
RepNet: Weakly Supervised Training of an Adversarial Reprojection Network for 3D Human Pose Estimation
Bastian Wandt,Bodo Rosenhahn +1 more
TL;DR: RepNet as discussed by the authors proposes a reprojection network to reproject the estimated 3D pose back to 2D, which results in a loss function that can be used for weakly supervised training.
References
More filters
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan,Andrew Zisserman +1 more
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan,Andrew Zisserman +1 more
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings ArticleDOI
Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks
TL;DR: CycleGAN as discussed by the authors learns a mapping G : X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss.
Proceedings ArticleDOI
Are we ready for autonomous driving? The KITTI vision benchmark suite
TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when being moved outside the laboratory to the real world.