Open Access Proceedings Article
Learning Pose Grammar to Encode Human Body Configuration for 3D Pose Estimation
Hao-Shu Fang, Yuanlu Xu, Wenguan Wang, Xiaobai Liu, Song-Chun Zhu
Vol. 32, Iss. 1, pp. 6821-6828
TLDR
This paper proposes a pose grammar for 3D human pose estimation. The model takes 2D pose as input, learns a generalized 2D-3D mapping function, and enforces high-level constraints over human poses.
Abstract:
In this paper, we propose a pose grammar to tackle the problem of 3D human pose estimation. Our model directly takes 2D pose as input and learns a generalized 2D-3D mapping function. The proposed model consists of a base network, which efficiently captures pose-aligned features, and a hierarchy of Bi-directional RNNs (BRNNs) on top that explicitly incorporates knowledge of human body configuration (i.e., kinematics, symmetry, motor coordination). The proposed model thus enforces high-level constraints over human poses. In learning, we develop a pose sample simulator to augment training samples in virtual camera views, which further improves the generalizability of our model. We validate our method on public 3D human pose benchmarks and propose a new evaluation protocol in a cross-view setting to verify the generalization capability of different methods. We empirically observe that most state-of-the-art methods encounter difficulty in this setting, while our method handles it well.
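The abstract does not say how the pose sample simulator works, so the following is only a minimal sketch of the general idea of generating 2D-3D training pairs from virtual camera views. It assumes a simple pinhole camera that orbits the subject about the vertical axis; the function names (`rotate_y`, `project`, `simulate_view`, `augment`) and the focal length, image center, and camera distance are hypothetical values chosen for illustration, not taken from the paper.

```python
import math
import random


def rotate_y(point, angle):
    """Rotate a 3D point (x, y, z) about the vertical (y) axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def project(point, focal=1000.0, center=(500.0, 500.0)):
    """Pinhole projection of a camera-space point to pixel coordinates.

    `focal` and `center` are hypothetical intrinsics, not values from the paper.
    """
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (focal * x / z + center[0], focal * y / z + center[1])


def simulate_view(pose_3d, azimuth, distance=4000.0):
    """Return (pose_2d, pose_3d_cam) as seen from one virtual camera.

    pose_3d:  list of (x, y, z) joints in millimetres, roughly centred on the origin.
    azimuth:  camera angle around the vertical axis, in radians.
    distance: assumed camera-to-subject distance in millimetres.
    """
    pose_cam = []
    for joint in pose_3d:
        x, y, z = rotate_y(joint, azimuth)
        # Shift the rotated subject along the optical axis so it sits in front of the camera.
        pose_cam.append((x, y, z + distance))
    pose_2d = [project(joint) for joint in pose_cam]
    return pose_2d, pose_cam


def augment(pose_3d, n_views, rng=None):
    """Sample `n_views` random azimuths and return one 2D-3D pair per view."""
    rng = rng or random.Random(0)
    return [
        simulate_view(pose_3d, rng.uniform(0.0, 2.0 * math.pi))
        for _ in range(n_views)
    ]
```

Each returned pair couples a projected 2D pose with its camera-space 3D pose, so a 2D-to-3D regressor could be trained on views that the original capture setup never recorded. A fuller simulator would presumably also vary elevation, distance, and intrinsics.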
Citations
Proceedings Article
3D Human Pose Estimation in Video With Temporal Convolutions and Semi-Supervised Training
TL;DR: It is demonstrated that 3D poses in video can be effectively estimated with a fully convolutional model based on dilated temporal convolutions over 2D keypoints. Back-projection, a simple and effective semi-supervised training method that leverages unlabeled video data, is also introduced.
Book Chapter
Learning Human-Object Interactions by Graph Parsing Neural Networks
TL;DR: This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images and videos with the Graph Parsing Neural Network (GPNN), a framework that incorporates structural knowledge while being differentiable end-to-end.
Proceedings Article
Semantic Graph Convolutional Networks for 3D Human Pose Regression
TL;DR: This paper proposes Semantic Graph Convolutional Networks (SemGCN), a novel neural network architecture for regression tasks on graph-structured data. SemGCN learns to capture semantic information, such as local and global node relationships, that is not explicitly represented in the graph.
Proceedings Article
Exploiting Spatial-Temporal Relationships for 3D Pose Estimation via Graph Convolutional Networks
TL;DR: A novel graph-based method to tackle the problem of 3D human body and 3D hand pose estimation from a short sequence of 2D joint detections, where domain knowledge about the human hand (body) configurations is explicitly incorporated into the graph convolutional operations to meet the specific demand of the 3D pose estimation.
Proceedings Article
Monocular Total Capture: Posing Face, Body, and Hands in the Wild
TL;DR: This paper uses 3D Part Orientation Fields (POFs) to encode the 3D orientations of all body parts in the common 2D image space. The POFs are predicted by a fully convolutional network, along with the joint confidence maps.