Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation

Open Access Proceedings Article

Hao-Shu Fang, +4 more

- Vol. 32, Iss: 1, pp 6821-6828

TLDR

This paper proposes a pose grammar for 3D human pose estimation: the model takes a 2D pose as input, learns a generalized 2D-3D mapping function, and enforces high-level constraints over human poses.

Abstract:

In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation. Our model directly takes a 2D pose as input and learns a generalized 2D-3D mapping function. The proposed model consists of a base network, which efficiently captures pose-aligned features, and a hierarchy of Bi-directional RNNs (BRNNs) on top, which explicitly incorporates knowledge about human body configuration (i.e., kinematics, symmetry, motor coordination). The proposed model thus enforces high-level constraints over human poses. In learning, we develop a pose sample simulator to augment training samples in virtual camera views, which further improves our model's generalizability. We validate our method on public 3D human pose benchmarks and propose a new evaluation protocol for the cross-view setting to verify the generalization capability of different methods. We empirically observe that most state-of-the-art methods encounter difficulty in this setting, while our method handles it well.
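The idea of augmenting training samples in virtual camera views can be illustrated with a minimal sketch. This is not the authors' simulator, whose details the abstract does not give. It assumes a pinhole camera placed on a ring around the subject, a world frame with z pointing up, a 17-joint skeleton, and an arbitrary focal length; the names `look_at_rotation` and `simulate_view` are hypothetical.

```python
import numpy as np

def look_at_rotation(cam_pos, target):
    """World-to-camera rotation for a camera at cam_pos looking at target.

    Assumes the world z axis is up. Rows are the camera's right, down and
    forward axes expressed in world coordinates.
    """
    forward = target - cam_pos
    forward = forward / np.linalg.norm(forward)
    up = np.array([0.0, 0.0, 1.0])
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    down = np.cross(forward, right)
    return np.stack([right, down, forward])

def simulate_view(pose_world, azimuth, radius=4.0, height=1.5, focal=1000.0):
    """Project a 3D pose (J x 3, world frame) into one virtual camera view.

    The camera sits on a ring of the given radius and height around the pose
    centroid (illustrative values, not taken from the paper). Returns the 2D
    pose in pixel units relative to the principal point, and the 3D pose in
    camera coordinates, i.e. one (input, target) training pair.
    """
    center = pose_world.mean(axis=0)
    offset = np.array([radius * np.cos(azimuth), radius * np.sin(azimuth), height])
    cam_pos = center + offset
    R = look_at_rotation(cam_pos, center)
    pose_cam = (pose_world - cam_pos) @ R.T               # rigid transform into the camera frame
    pose_2d = focal * pose_cam[:, :2] / pose_cam[:, 2:3]  # perspective division
    return pose_2d, pose_cam

# Placeholder pose: random joints stand in for a real motion-capture sample.
rng = np.random.default_rng(0)
pose_world = rng.uniform(-0.5, 0.5, size=(17, 3))

# Eight virtual views spaced evenly around the subject.
azimuths = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
pairs = [simulate_view(pose_world, az) for az in azimuths]
```

Each element of `pairs` is a 2D pose together with its 3D pose in that view's camera frame, so a single captured pose yields several 2D-3D training pairs seen from viewpoints absent from the original recording.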

Citations

Proceedings Article

3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training

Dario Pavllo, +3 more

TL;DR: It is demonstrated that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. Back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data, is also introduced.

Book Chapter

Learning Human-Object Interactions by Graph Parsing Neural Networks

Siyuan Qi, +4 more

TL;DR: This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos with the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end.

Proceedings Article

Semantic Graph Convolutional Networks for 3D Human Pose Regression

Long Zhao, +4 more

TL;DR: This paper proposes Semantic Graph Convolutional Networks (SemGCN), a novel neural network architecture for regression tasks with graph-structured data. SemGCN learns to capture semantic information, such as local and global node relationships, that is not explicitly represented in the graph.

Proceedings Article

Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks

Yujun Cai, +6 more

TL;DR: This paper proposes a novel graph-based method for 3D human body and 3D hand pose estimation from a short sequence of 2D joint detections, in which domain knowledge about human hand (body) configurations is explicitly incorporated into the graph convolutional operations to meet the specific demands of 3D pose estimation.

Proceedings Article

Monocular Total Capture: Posing Face, Body, and Hands in the Wild

Donglai Xiang, +2 more

TL;DR: This paper uses 3D Part Orientation Fields (POFs) to encode the 3D orientations of all body parts in the common 2D image space, and predicts POFs with a Fully Convolutional Network, along with the joint confidence maps.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: This paper proposes a residual learning framework to ease the training of networks that are substantially deeper than those used previously; an ensemble of these residual nets won 1st place on the ILSVRC 2015 classification task.

Proceedings Article

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, +1 more

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Journal Article

Bidirectional recurrent neural networks

Mike Schuster, +1 more

- 01 Nov 1997 -

IEEE Transactions on Signal Processing

TL;DR: It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.

Proceedings Article

2D Human Pose Estimation: New Benchmark and State of the Art Analysis

Mykhaylo Andriluka, +3 more

TL;DR: A novel benchmark "MPII Human Pose" is introduced that makes a significant advance in terms of diversity and difficulty, a contribution that is required for future developments in human body models.

Posted Content

Stacked Hourglass Networks for Human Pose Estimation

Alejandro Newell, +2 more

- 22 Mar 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work proposes stacked hourglass networks for human pose estimation, in which features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. Repeated bottom-up, top-down processing with intermediate supervision is shown to be critical to improving the network's performance.

Related Papers (5)

Human3.6M: Large Scale Datasets and Predictive Methods for 3D Human Sensing in Natural Environments

Catalin Ionescu, +3 more

- 01 Jul 2014 -

IEEE Transactions on Pattern Analysis an...

A Simple Yet Effective Baseline for 3d Human Pose Estimation

Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose

Monocular 3D Human Pose Estimation in the Wild Using Improved CNN Supervision

Stacked Hourglass Networks for Human Pose Estimation