Monocular 3D Human Pose Estimation by Predicting Depth on Joints

doi:10.1109/ICCV.2017.373

Proceedings Article

Bruce Xiaohan Nie, +2 more

- pp 3467-3475

TLDR

The empirical evaluation on the Human3.6M and HHOI datasets demonstrates the advantage of combining the global 2D skeleton and local image patches for depth prediction, as well as superior quantitative and qualitative performance relative to state-of-the-art methods.

Abstract:

This paper aims at estimating full-body 3D human poses from monocular images, where the biggest challenge is the inherent ambiguity introduced by lifting the 2D pose into 3D space. We propose a novel framework that reduces this ambiguity by predicting the depth of human joints from 2D joint locations and body part images. Our approach is built on a two-level hierarchy of Long Short-Term Memory (LSTM) networks that can be trained end-to-end. The first level consists of two components: 1) a skeleton-LSTM, which learns depth information from global human skeleton features; 2) a patch-LSTM, which utilizes the local image evidence around joint locations. Both networks have a tree structure defined on the kinematic relations of the human skeleton, so the information at different joints is broadcast through the whole skeleton in a top-down fashion. The two networks are first pre-trained separately on different data sources and then aggregated in the second level for final depth prediction. The empirical evaluation on the Human3.6M and HHOI datasets demonstrates the advantage of combining the global 2D skeleton and local image patches for depth prediction, and our superior quantitative and qualitative performance relative to state-of-the-art methods.
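The top-down, tree-structured propagation described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the five-joint tree `PARENT`, the scalar `lstm_cell` with fixed, untrained weights and a single shared gate pre-activation, and the weighted blend `alpha` standing in for the second level. It only shows how a state computed at a parent joint can flow down to its children in two parallel networks whose per-joint outputs are then combined.

```python
import math

# Hypothetical 5-joint kinematic tree: joint -> parent (the root has parent None).
PARENT = {"pelvis": None, "spine": "pelvis", "head": "spine",
          "l_hip": "pelvis", "l_knee": "l_hip"}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell(x, h_prev, c_prev, w=0.5, u=0.5):
    """Scalar LSTM-style cell with fixed, untrained weights (illustrative only).

    For brevity all three gates share one pre-activation z.
    """
    z = w * x + u * h_prev
    i = f = o = sigmoid(z)      # input, forget and output gates
    g = math.tanh(z)            # candidate cell value
    c = f * c_prev + i * g
    h = o * math.tanh(c)
    return h, c


def top_down_order(parent):
    """Order joints so that every parent precedes its children (valid trees only)."""
    order, placed = [], set()
    while len(order) < len(parent):
        for joint, par in parent.items():
            if joint not in placed and (par is None or par in placed):
                order.append(joint)
                placed.add(joint)
    return order


def tree_lstm(features, parent):
    """Pass state from the root down to the leaves; return the hidden value per joint."""
    state = {}
    for joint in top_down_order(parent):
        # The root has no parent, so it starts from a zero state.
        h_prev, c_prev = state.get(parent[joint], (0.0, 0.0))
        state[joint] = lstm_cell(features[joint], h_prev, c_prev)
    return {joint: h for joint, (h, _) in state.items()}


def predict_depth(skeleton_feats, patch_feats, parent, alpha=0.5):
    """Assumed second level: blend the two first-level outputs for each joint."""
    h_skel = tree_lstm(skeleton_feats, parent)
    h_patch = tree_lstm(patch_feats, parent)
    return {j: alpha * h_skel[j] + (1 - alpha) * h_patch[j] for j in parent}


# Toy inputs: one scalar feature per joint for each first-level network.
skeleton_feats = {j: 0.1 * k for k, j in enumerate(PARENT)}
patch_feats = {j: -0.2 for j in PARENT}
depth = predict_depth(skeleton_feats, patch_feats, PARENT)
```

In this sketch, changing the feature at `pelvis` changes the output at `l_knee`, while changing a leaf feature leaves the root untouched, which is the one-directional flow that "top-down" implies.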

Citations

Journal Article

Video Salient Object Detection via Fully Convolutional Networks

Wenguan Wang, +2 more

- 01 Jan 2018 -

IEEE Transactions on Image Processing

TL;DR: This paper proposes a deep video saliency network consisting of two modules that capture spatial and temporal saliency information, respectively; the network directly produces spatio-temporal saliency inference without time-consuming optical flow computation.

Book Chapter

Integral Human Pose Regression

Xiao Sun, +4 more

TL;DR: In this paper, a simple integral operation relates and unifies the heat map representation and joint regression, thus avoiding the non-differentiable post-processing and quantization error of human pose estimation.
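The integral operation mentioned in this summary can be sketched as an expectation over heat map locations. The sketch below assumes the heat map is normalised with a softmax before the expectation is taken; the function names `soft_argmax_2d` and `hard_argmax_2d` are hypothetical and chosen only for this illustration.

```python
import math


def soft_argmax_2d(heatmap):
    """Expected (x, y) location under a softmax-normalised heat map."""
    peak = max(max(row) for row in heatmap)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


def hard_argmax_2d(heatmap):
    """Grid index of the maximum: quantised to pixels and not differentiable."""
    _, r, c = max((v, r, c)
                  for r, row in enumerate(heatmap)
                  for c, v in enumerate(row))
    return c, r


# Two equal peaks in the middle row, at columns 0 and 2.
heatmap = [[0.0, 0.0, 0.0],
           [5.0, 0.0, 5.0],
           [0.0, 0.0, 0.0]]
soft_xy = soft_argmax_2d(heatmap)
hard_xy = hard_argmax_2d(heatmap)
```

The soft version returns continuous coordinates and is a smooth function of the heat map values, whereas the hard version can only return integer grid positions.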

Proceedings Article

3D Human Pose Estimation in the Wild by Adversarial Learning

Wei Yang, +5 more

TL;DR: An adversarial learning framework is proposed, which distills the 3D human pose structures learned from the fully annotated dataset to in-the-wild images with only 2D pose annotations and designs a geometric descriptor, which computes the pairwise relative locations and distances between body joints, as a new information source for the discriminator.

Proceedings Article

Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation

Hao-Shu Fang, +4 more

TL;DR: This paper proposes a pose grammar to tackle the problem of 3D human pose estimation, which takes 2D pose as input and learns a generalized 2D-3D mapping function and enforces high-level constraints over human poses.

Proceedings Article

Single-Shot Multi-person 3D Pose Estimation from Monocular RGB

Dushyant Mehta, +6 more

TL;DR: This work proposes a new single-shot method for multi-person 3D pose estimation in general scenes from a monocular RGB camera. It uses novel occlusion-robust pose-maps (ORPM), which enable full-body pose inference even under strong partial occlusions by other people and objects in the scene.

References

Proceedings Article

Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation

Jonathan Tompson, +3 more

TL;DR: This paper proposes a hybrid architecture consisting of a deep Convolutional Network and a Markov Random Field (MRF) for articulated human pose estimation in monocular images.

Proceedings Article

Deep convolutional neural fields for depth estimation from a single image

Fayao Liu, +2 more

TL;DR: This paper proposes a deep convolutional neural field model for depth estimation from a single image, aiming to jointly explore the capacity of deep CNN and continuous CRF.

Proceedings Article

Saliency-aware geodesic video object segmentation

Wenguan Wang, +2 more

TL;DR: This work introduces an unsupervised, geodesic-distance-based salient video object segmentation method that incorporates saliency as a prior for the object via the computation of robust geodesic measurements and builds global appearance models for foreground and background.

Proceedings Article

Towards unified depth and semantic prediction from a single image

Peng Wang, +5 more

TL;DR: This work proposes a unified framework for joint depth and semantic prediction that effectively leverages the advantages of both tasks and provides the state-of-the-art results.

Proceedings Article

Learning effective human pose estimation from inaccurate annotation

Sam Johnson, +1 more

TL;DR: A significant increase in pose estimation accuracy is demonstrated, while simultaneously reducing computational expense by a factor of 10, and a dataset of 10,000 highly articulated poses is contributed.

Related Papers (5)

Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments

Catalin Ionescu, +3 more

- 01 Jul 2014 -

IEEE Transactions on Pattern Analysis an...

2D Human Pose Estimation: New Benchmark and State of the Art Analysis

A Simple Yet Effective Baseline for 3d Human Pose Estimation

Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose