Proceedings ArticleDOI

Monocular 3D Human Pose Estimation by Predicting Depth on Joints

TLDR
The empirical evaluation on the Human3.6M and HHOI datasets demonstrates the advantage of combining the global 2D skeleton and local image patches for depth prediction, and the superior quantitative and qualitative performance relative to state-of-the-art methods.
Abstract
This paper aims at estimating full-body 3D human poses from monocular images, where the biggest challenge is the inherent ambiguity introduced by lifting the 2D pose into 3D space. We propose a novel framework that reduces this ambiguity by predicting the depth of human joints from 2D human joint locations and body part images. Our approach is built on a two-level hierarchy of Long Short-Term Memory (LSTM) networks which can be trained end-to-end. The first level consists of two components: 1) a skeleton-LSTM which learns depth information from global human skeleton features; 2) a patch-LSTM which utilizes the local image evidence around joint locations. Both networks have a tree structure defined on the kinematic relations of the human skeleton, so information at different joints is broadcast through the whole skeleton in a top-down fashion. The two networks are first pre-trained separately on different data sources and then aggregated in the second level for final depth prediction. The empirical evaluation on the Human3.6M and HHOI datasets demonstrates the advantage of combining the global 2D skeleton and local image patches for depth prediction, and our superior quantitative and qualitative performance relative to state-of-the-art methods.
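The top-down broadcast the abstract describes can be sketched in a few lines: each joint runs an LSTM cell whose recurrent input is its parent's hidden and cell state, so the root's information propagates down the kinematic tree, and a linear readout maps each joint's hidden state to a scalar depth. This is a minimal, single-network NumPy sketch; the joint set, feature size, hidden size, and all weights are illustrative assumptions, not the paper's actual architecture or parameters.

```python
import numpy as np

# Hypothetical kinematic tree for illustration: joint name -> parent name.
PARENT = {
    "pelvis": None,
    "spine": "pelvis", "neck": "spine", "head": "neck",
    "l_hip": "pelvis", "l_knee": "l_hip", "l_ankle": "l_knee",
    "r_hip": "pelvis", "r_knee": "r_hip", "r_ankle": "r_knee",
}
D, H = 4, 8  # per-joint feature size and hidden size (illustrative)

rng = np.random.default_rng(0)
W = rng.standard_normal((4 * H, D)) * 0.1   # input weights, gates [i, f, o, g]
U = rng.standard_normal((4 * H, H)) * 0.1   # recurrent (parent-state) weights
b = np.zeros(4 * H)
w_out = rng.standard_normal(H) * 0.1        # hidden state -> scalar joint depth

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev):
    """One LSTM step whose recurrent state comes from the parent joint."""
    z = W @ x + U @ h_prev + b
    i, f, o, g = np.split(z, 4)
    c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
    return sigmoid(o) * np.tanh(c), c

def predict_depths(joint_feats):
    """Top-down traversal: the root starts from zero state, every other joint
    conditions on its parent's (h, c), broadcasting information down the tree."""
    h, c, depth = {}, {}, {}
    def visit(j):
        p = PARENT[j]
        h_p = h[p] if p is not None else np.zeros(H)
        c_p = c[p] if p is not None else np.zeros(H)
        h[j], c[j] = lstm_cell(joint_feats[j], h_p, c_p)
        depth[j] = float(w_out @ h[j])
        for child in (k for k, par in PARENT.items() if par == j):
            visit(child)
    visit("pelvis")
    return depth

feats = {j: rng.standard_normal(D) for j in PARENT}  # stand-in joint features
depths = predict_depths(feats)
```

In the paper this structure is instantiated twice (skeleton features and image-patch features) and the two are fused by a second-level network; the sketch shows only the shared tree-recurrence idea.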


Citations
Proceedings ArticleDOI

Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation

TL;DR: A novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images, devising a simple and effective compression method to drastically reduce the size of the volumetric heatmap representation.
Posted Content

Ordinal Depth Supervision for 3D Human Pose Estimation

TL;DR: In this article, the authors propose to use a weaker supervision signal provided by the ordinal depths of human joints, which can be acquired by human annotators for a wide range of images and poses.
Journal ArticleDOI

3D Human Pose Machines with Self-Supervised Learning

TL;DR: Zhang et al. propose a self-supervised correction mechanism to learn the intrinsic structures of human poses from abundant images, which involves two dual learning tasks, i.e., 2D-to-3D pose transformation and 3D-to-2D pose projection, serving as a bridge between 3D and 2D human poses in a form of free self-supervision for accurate 3D human pose estimation.
Posted Content

Self-Supervised Learning of 3D Human Pose using Multi-view Geometry

TL;DR: EpipolarPose is presented, a self-supervised learning method for 3D human pose estimation which does not need any 3D ground-truth data or camera extrinsics, together with a new performance measure, Pose Structure Score (PSS), a scale-invariant, structure-aware measure of the structural plausibility of a pose with respect to its ground truth.
Posted Content

DenseRaC: Joint 3D Pose and Shape Estimation by Dense Render-and-Compare

TL;DR: A novel end-to-end framework for jointly estimating 3D human pose and body shape from a monocular RGB image; in addition, a large-scale synthetic dataset is constructed from web-crawled MoCap sequences, 3D scans and animations.
References
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Proceedings ArticleDOI

A unified architecture for natural language processing: deep neural networks with multitask learning

TL;DR: This work describes a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense using a language model.
Proceedings ArticleDOI

Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields

TL;DR: This work uses Part Affinity Fields (PAFs), a nonparametric representation, to learn to associate body parts with individuals in the image, achieving state-of-the-art performance on the MPII Multi-Person benchmark.
Book ChapterDOI

Stacked Hourglass Networks for Human Pose Estimation

TL;DR: This work introduces a novel convolutional network architecture for the task of human pose estimation that is described as a “stacked hourglass” network based on the successive steps of pooling and upsampling that are done to produce a final set of predictions.