Open Access Proceedings Article

Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation

TLDR
This paper proposes a pose grammar for 3D human pose estimation: the model takes a 2D pose as input, learns a generalized 2D-3D mapping function, and enforces high-level constraints over human poses.
Abstract
In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation. Our model directly takes a 2D pose as input and learns a generalized 2D-3D mapping function. The proposed model consists of a base network, which efficiently captures pose-aligned features, and a hierarchy of Bi-directional RNNs (BRNNs) on top that explicitly incorporates knowledge of human body configuration (i.e., kinematics, symmetry, motor coordination). The proposed model thus enforces high-level constraints over human poses. In learning, we develop a pose sample simulator to augment training samples in virtual camera views, which further improves our model's generalizability. We validate our method on public 3D human pose benchmarks and propose a new evaluation protocol for the cross-view setting to verify the generalization capability of different methods. We empirically observe that most state-of-the-art methods encounter difficulty in this setting, while our method handles such challenges well.
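The idea of augmenting training samples in virtual camera views can be illustrated with a small sketch. This is not the authors' simulator: it assumes a simple pinhole camera that orbits the subject about the vertical axis, and every name and default value here (`rotate_about_y`, `project_pinhole`, `simulate_views`, `focal_length`, `camera_distance`) is hypothetical and chosen only for illustration.

```python
import math


def rotate_about_y(point, azimuth_rad):
    """Rotate a 3D point (x, y, z) about the vertical (y) axis."""
    x, y, z = point
    c, s = math.cos(azimuth_rad), math.sin(azimuth_rad)
    return (c * x + s * z, y, -s * x + c * z)


def project_pinhole(point, focal_length, camera_distance):
    """Perspective-project a 3D point with an assumed pinhole camera.

    The camera looks along +z and sits camera_distance units in front
    of the pose origin (an assumption of this sketch).
    """
    x, y, z = point
    depth = z + camera_distance
    if depth <= 0:
        raise ValueError("joint lies behind the virtual camera")
    return (focal_length * x / depth, focal_length * y / depth)


def simulate_views(pose_3d, azimuths_deg, focal_length=1000.0, camera_distance=5.0):
    """Return one (2D pose, 3D pose) training pair per virtual camera azimuth.

    The 3D pose in each pair is expressed in the rotated, camera-aligned
    frame, so each 2D input is paired with a consistent 3D target.
    """
    samples = []
    for deg in azimuths_deg:
        angle = math.radians(deg)
        rotated = [rotate_about_y(joint, angle) for joint in pose_3d]
        pose_2d = [project_pinhole(joint, focal_length, camera_distance)
                   for joint in rotated]
        samples.append((pose_2d, rotated))
    return samples


# Toy two-joint "pose" viewed from two virtual cameras.
toy_pose = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pairs = simulate_views(toy_pose, azimuths_deg=[0, 90])
```

Each returned pair could then serve as one additional 2D-to-3D training sample; a realistic simulator would also need to model camera elevation, intrinsics, and 2D detection noise, none of which are specified in the abstract.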


Citations
Proceedings Article

3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training

TL;DR: It is demonstrated that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. Back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data, is also introduced.
Book Chapter

Learning Human-Object Interactions by Graph Parsing Neural Networks

TL;DR: This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos with the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end.
Proceedings Article

Semantic Graph Convolutional Networks for 3D Human Pose Regression

TL;DR: This paper proposes Semantic Graph Convolutional Networks (SemGCN), a novel neural network architecture for regression tasks with graph-structured data that learns to capture semantic information, such as local and global node relationships, that is not explicitly represented in the graph.
Proceedings Article

Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks

TL;DR: A novel graph-based method to tackle the problem of 3D human body and 3D hand pose estimation from a short sequence of 2D joint detections, where domain knowledge about the human hand (body) configurations is explicitly incorporated into the graph convolutional operations to meet the specific demand of the 3D pose estimation.
Proceedings Article

Monocular Total Capture: Posing Face, Body, and Hands in the Wild

TL;DR: This work uses 3D Part Orientation Fields (POFs) to encode the 3D orientations of all body parts in the common 2D image space, and predicts the POFs with a Fully Convolutional Network, along with the joint confidence maps.
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: This paper proposes a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the approach won 1st place on the ILSVRC 2015 classification task.
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Journal Article

Bidirectional recurrent neural networks

TL;DR: It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.
Proceedings Article

2D Human Pose Estimation: New Benchmark and State of the Art Analysis

TL;DR: A novel benchmark "MPII Human Pose" is introduced that makes a significant advance in terms of diversity and difficulty, a contribution that is required for future developments in human body models.
Posted Content

Stacked Hourglass Networks for Human Pose Estimation

TL;DR: Stacked hourglass networks are proposed for human pose estimation: features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body, and repeated bottom-up, top-down processing with intermediate supervision is shown to be critical to improving the network's performance.