Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos
Tomas Pfister, Karen Simonyan, James Charles, Andrew Zisserman
- pp 538-552
TL;DR
This work is, to the authors' knowledge, the first to use ConvNets for estimating human pose in videos. It introduces a new network that exploits temporal information from multiple frames, leading to better performance.
Abstract:
Our objective is to efficiently and accurately estimate the upper body pose of humans in gesture videos. To this end, we build on the recent successful applications of deep convolutional neural networks (ConvNets). Our novelties are: (i) our method is the first to our knowledge to use ConvNets for estimating human pose in videos; (ii) a new network that exploits temporal information from multiple frames, leading to better performance; (iii) showing that pre-segmenting the foreground of the video improves performance; and (iv) demonstrating that even without foreground segmentations, the network learns to abstract away from the background and can estimate the pose even in the presence of a complex, varying background.
Citations
Journal Article
Deep learning for visual understanding
TL;DR: The state-of-the-art in deep learning algorithms in computer vision is reviewed by highlighting the contributions and challenges from over 210 recent research papers, and the future trends and challenges in designing and training deep neural networks are summarized.
Book Chapter
Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image
Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter V. Gehler, Javier Romero, Michael J. Black
TL;DR: In this article, the authors estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image by fitting a statistical body shape model to the 2D joints.
Posted Content
Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image
Federica Bogo, Angjoo Kanazawa, Christoph Lassner, Peter V. Gehler, Javier Romero, Michael J. Black
TL;DR: The first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image is described, showing superior pose accuracy with respect to the state of the art.
Proceedings Article
Flowing ConvNets for Human Pose Estimation in Videos
TL;DR: This work proposes a ConvNet architecture that benefits from temporal context by combining information across multiple frames using optical flow. It outperforms a number of other architectures, including one that uses optical flow solely at the input layers, one that regresses joint coordinates directly, and one that predicts heatmaps without spatial fusion.
Book Chapter
Human Pose Estimation via Convolutional Part Heatmap Regression
TL;DR: This article proposes a cascaded CNN architecture for learning part relationships and spatial context, and for robustly inferring pose even under severe part occlusions. However, the performance of the proposed architecture is limited.
References
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: The authors achieve state-of-the-art performance with a deep convolutional neural network that consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Proceedings Article
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TL;DR: R-CNN combines CNNs with bottom-up region proposals to localize and segment objects. When labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.
Posted Content
Rich feature hierarchies for accurate object detection and semantic segmentation
TL;DR: This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.