Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos

doi:10.1007/978-3-319-16865-4_35

Open AccessBook ChapterDOI

Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos

- pp 538-552

TLDR

This work is the first to their knowledge to use ConvNets for estimating human pose in videos and introduces a new network that exploits temporal information from multiple frames, leading to better performance.

Abstract:

Our objective is to efficiently and accurately estimate the upper body pose of humans in gesture videos. To this end, we build on the recent successful applications of deep convolutional neural networks (ConvNets). Our novelties are: (i) our method is the first to our knowledge to use ConvNets for estimating human pose in videos; (ii) a new network that exploits temporal information from multiple frames, leading to better performance; (iii) showing that pre-segmenting the foreground of the video improves performance; and (iv) demonstrating that even without foreground segmentations, the network learns to abstract away from the background and can estimate the pose even in the presence of a complex, varying background.

Citations

PDF

Open Access

More filters

Journal ArticleDOI

Deep learning for visual understanding

Yanming Guo, +5 more

- 26 Apr 2016 -

Neurocomputing

TL;DR: The state-of-the-art in deep learning algorithms in computer vision is reviewed by highlighting the contributions and challenges from over 210 recent research papers, and the future trends and challenges in designing and training deep neural networks are summarized.

...read moreread less

Book ChapterDOI

Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

Federica Bogo, +7 more

TL;DR: In this article, the authors estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image by fitting a statistical body shape model to the 2D joints.

...read moreread less

Posted Content

Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

Federica Bogo, +7 more

- 27 Jul 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: The first method to automatically estimate the 3D pose of the human body as well as its 3D shape from a single unconstrained image is described, showing superior pose accuracy with respect to the state of the art.

...read moreread less

Proceedings ArticleDOI

Flowing ConvNets for Human Pose Estimation in Videos

Tomas Pfister, +2 more

TL;DR: This work proposes a ConvNet architecture that is able to benefit from temporal context by combining information across the multiple frames using optical flow and outperforms a number of others, including one that uses optical flow solely at the input layers, one that regresses joint coordinates directly, and one that predicts heatmaps without spatial fusion.

...read moreread less

Book ChapterDOI

Human Pose Estimation via Convolutional Part Heatmap Regression

Adrian Bulat, +1 more

TL;DR: In this article, a CNN cascaded architecture was proposed for learning part relationships and spatial context, and robustly inferring pose even for the case of severe part occlusions. But the performance of the proposed architecture is limited.

...read moreread less

Collapse

References

PDF

Open Access

More filters

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

...read moreread less

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

...read moreread less

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

...read moreread less

Proceedings ArticleDOI

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Ross Girshick, +3 more

TL;DR: RCNN as discussed by the authors combines CNNs with bottom-up region proposals to localize and segment objects, and when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.

...read moreread less

Posted Content

Rich feature hierarchies for accurate object detection and semantic segmentation

Ross Girshick, +3 more

- 11 Nov 2013 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.

...read moreread less

Collapse

Deep Convolutional Neural Networks for Efficient Pose Estimation in Gesture Videos

Citations

Deep learning for visual understanding

Keep It SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

Keep it SMPL: Automatic Estimation of 3D Human Pose and Shape from a Single Image

Flowing ConvNets for Human Pose Estimation in Videos

Human Pose Estimation via Convolutional Part Heatmap Regression

References

ImageNet Classification with Deep Convolutional Neural Networks

Very Deep Convolutional Networks for Large-Scale Image Recognition

Very Deep Convolutional Networks for Large-Scale Image Recognition

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

Rich feature hierarchies for accurate object detection and semantic segmentation

Related Papers (5)

DeepPose: Human Pose Estimation via Deep Neural Networks

ImageNet Classification with Deep Convolutional Neural Networks

2D Human Pose Estimation: New Benchmark and State of the Art Analysis

Stacked Hourglass Networks for Human Pose Estimation

Convolutional Pose Machines