Open Access, Proceedings Article

Neural Body Fitting: Unifying Deep Learning and Model Based Human Pose and Shape Estimation

TLDR
Neural Body Fitting (NBF) integrates a statistical body model as a layer within a CNN, leveraging both reliable bottom-up body part segmentation and robust top-down body model constraints.
Abstract
Direct prediction of 3D body pose and shape parameters remains a challenge even for highly parameterized deep learning models. Mapping from the plain 2D image space to the representation of the prediction space is difficult, perspective ambiguities make the loss function noisy, and training data is scarce. In this paper, we propose a novel approach, Neural Body Fitting (NBF), that integrates a statistical body model as a layer within a CNN, leveraging both reliable bottom-up body part segmentation and robust top-down body model constraints. NBF is fully differentiable and can be trained end-to-end from both 2D and 3D annotations. In detailed experiments, we analyze how the components of our model improve performance and present a robust, easy-to-use, end-to-end trainable framework for 3D human pose estimation from single 2D images.
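To make the idea of a body model used as a network layer more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy linear model (`TEMPLATE`, `BASIS`) standing in for the statistical body model, a simple pinhole camera, and mean squared error losses; all names, sizes, and the focal length are hypothetical. It shows how predicted parameters could be turned into 3D joints, projected to 2D, and scored against 2D annotations, with an optional 3D term when 3D annotations exist.

```python
import numpy as np

NUM_JOINTS = 4   # toy value, chosen only for illustration
NUM_PARAMS = 6   # toy size of the pose/shape parameter vector

rng = np.random.default_rng(0)
# Hypothetical linear "body model": joints = TEMPLATE + BASIS @ params.
# The z offset keeps the toy body in front of the camera.
TEMPLATE = rng.normal(size=(NUM_JOINTS, 3)) + np.array([0.0, 0.0, 5.0])
BASIS = 0.1 * rng.normal(size=(NUM_JOINTS, 3, NUM_PARAMS))


def body_model_layer(params):
    """Map a parameter vector to 3D joint positions (toy linear model)."""
    return TEMPLATE + BASIS @ params


def project(joints_3d, focal_length=1000.0):
    """Pinhole projection of camera-frame 3D joints to 2D image points."""
    return focal_length * joints_3d[:, :2] / joints_3d[:, 2:3]


def fitting_loss(params, target_2d, target_3d=None, w_3d=1.0):
    """2D reprojection loss, plus a 3D joint loss when 3D labels exist."""
    joints_3d = body_model_layer(params)
    loss = np.mean((project(joints_3d) - target_2d) ** 2)
    if target_3d is not None:
        loss += w_3d * np.mean((joints_3d - target_3d) ** 2)
    return loss
```

Every step here is composed of differentiable operations, so the same structure written in an automatic differentiation framework would allow gradients of the loss to flow back through the body model to whatever network predicts `params`. Passing `target_3d=None` corresponds to a training sample that has only 2D annotations.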


Citations
Proceedings Article

PIFu: Pixel-Aligned Implicit Function for High-Resolution Clothed Human Digitization

TL;DR: Pixel-aligned Implicit Function (PIFu) aligns pixels of 2D images with the global context of their corresponding 3D object to produce high-resolution surfaces, including largely unseen regions such as the back of a person.
Proceedings Article

Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop

TL;DR: SPIN uses a deep network to initialize an iterative optimization routine that fits the body model to 2D joints within the training loop; the fitted estimate is subsequently used to supervise the network.
Proceedings Article

VIBE: Video Inference for Human Body Pose and Shape Estimation

TL;DR: This work defines a novel temporal network architecture with a self-attention mechanism and shows that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels.
Proceedings Article

Expressive Body Capture: 3D Hands, Face, and Body From a Single Image

TL;DR: This work computes a 3D model of human body pose, hand pose, and facial expression from a single monocular image using SMPL-X, a body model trained on thousands of 3D scans.
Proceedings Article

Implicit Functions in Feature Space for 3D Shape Reconstruction and Completion

TL;DR: This paper proposes Implicit Feature Networks (IF-Nets), which deliver continuous outputs, can handle multiple topologies, and complete shapes from missing or sparse input data while retaining the desirable properties of recent learned implicit functions.