Learning to Fuse 2D and 3D Image Cues for Monocular Body Pose Estimation

doi:10.1109/ICCV.2017.425

Open Access Proceedings Article

Bugra Tekin, +3 more

- pp 3961-3970

TLDR

This paper proposes a trainable fusion scheme that learns how to combine 2D and 3D image cues rather than relying on a hand-designed rule, which yields significant improvements on standard 3D human pose estimation benchmarks.

Abstract:

Most recent approaches to monocular 3D human pose estimation rely on Deep Learning. They typically involve regressing from an image to either 3D joint coordinates directly or 2D joint locations from which 3D coordinates are inferred. Both approaches have their strengths and weaknesses and we therefore propose a novel architecture designed to deliver the best of both worlds by performing both simultaneously and fusing the information along the way. At the heart of our framework is a trainable fusion scheme that learns how to fuse the information optimally instead of being hand-designed. This yields significant improvements upon the state-of-the-art on standard 3D human pose estimation benchmarks.
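The idea of learning a fusion weight rather than fixing it by hand can be illustrated with a minimal sketch. This is not the paper's architecture: the scalar cues, the single sigmoid-gated weight, the toy data, and the names `fuse` and `learn_fusion_weight` are all assumptions made for illustration only.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def fuse(x2d, x3d, w):
    """Convex combination of a 2D-derived and a 3D-derived cue, with w in (0, 1)."""
    return w * x2d + (1.0 - w) * x3d


def learn_fusion_weight(cues_2d, cues_3d, targets, steps=200, lr=1.0):
    """Fit one scalar fusion weight by gradient descent on mean squared error."""
    a = 0.0  # unconstrained logit; w = sigmoid(a) starts at 0.5
    n = len(targets)
    for _ in range(steps):
        w = sigmoid(a)
        grad = 0.0
        for x2d, x3d, t in zip(cues_2d, cues_3d, targets):
            err = fuse(x2d, x3d, w) - t
            # Chain rule: d(fused)/da = w * (1 - w) * (x2d - x3d)
            grad += 2.0 * err * w * (1.0 - w) * (x2d - x3d)
        a -= lr * grad / n
    return sigmoid(a)


# Toy data (made up): targets are generated with a "true" weight of 0.25,
# so a hand-picked weight of 0.5 would be suboptimal.
cues_2d = [1.0, 2.0, -1.0, 0.5]
cues_3d = [0.0, 1.0, 1.0, -0.5]
targets = [fuse(x2d, x3d, 0.25) for x2d, x3d in zip(cues_2d, cues_3d)]

learned_w = learn_fusion_weight(cues_2d, cues_3d, targets)
print(round(learned_w, 3))
```

On this toy problem the learned weight should move from its initial value of 0.5 toward the generating value of 0.25. A real network would learn such weights jointly with the feature extractors and over feature maps rather than scalars.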

Citations

Proceedings Article

End-to-End Recovery of Human Shape and Pose

Angjoo Kanazawa, +3 more

TL;DR: This work introduces an adversary trained to tell whether human body shape and pose parameters are real or not using a large database of 3D human meshes, and produces a richer and more useful mesh representation that is parameterized by shape and 3D joint angles.

Journal Article

VNect: real-time 3D human pose estimation with a single RGB camera

Dushyant Mehta, +8 more

TL;DR: This paper proposes a fully convolutional pose formulation that regresses 2D and 3D joint positions jointly in real time and does not require tightly cropped input frames.

Proceedings Article

Learning to Reconstruct 3D Human Pose and Shape via Model-Fitting in the Loop

Nikos Kolotouros, +3 more

TL;DR: SPIN uses a deep network to initialize an iterative optimization routine that fits the body model to 2D joints within the training loop; the fitted estimate is subsequently used to supervise the network.

Journal Article

VNect: Real-time 3D Human Pose Estimation with a Single RGB Camera

Dushyant Mehta, +8 more

- 03 May 2017 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents the first real-time method to capture the full global 3D skeletal pose of a human in a stable, temporally consistent manner using a single RGB camera and shows that the approach is more broadly applicable than RGB-D solutions, i.e., it works for outdoor scenes, community videos, and low quality commodity RGB cameras.

Proceedings Article

Learning to Estimate 3D Human Pose and Shape from a Single Color Image

Georgios Pavlakos, +3 more

TL;DR: This work addresses the problem of estimating the full body 3D human pose and shape from a single color image and proposes an efficient and effective direct prediction method based on ConvNets, incorporating a parametric statistical body shape model (SMPL) within an end-to-end framework.

References

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: This paper investigates the effect of convolutional network depth on accuracy in the large-scale image recognition setting and shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

Book Chapter

U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger, +2 more

TL;DR: This work proposes a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

Posted Content

U-Net: Convolutional Networks for Biomedical Image Segmentation

Olaf Ronneberger, +2 more

- 18 May 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: It is shown that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.

Book Chapter

Stacked Hourglass Networks for Human Pose Estimation

Alejandro Newell, +2 more

TL;DR: This work introduces a novel convolutional network architecture for the task of human pose estimation that is described as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions.

Related Papers (5)

Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments

Catalin Ionescu, +3 more

- 01 Jul 2014 -

IEEE Transactions on Pattern Analysis an...

A Simple Yet Effective Baseline for 3d Human Pose Estimation

2D Human Pose Estimation: New Benchmark and State of the Art Analysis

Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

VNect: real-time 3D human pose estimation with a single RGB camera