Multi-scale Attention Aided Multi-Resolution Network for Human Pose Estimation

doi:10.1007/978-3-030-34869-4_50

Book Chapter

pp. 461-472

TLDR

This paper proposes Refinement Net, a network that regresses predicted heatmaps to 2D joint locations to remove ambiguities in the predicted positions, and experiments with three levels of attention schemes: global, heatmap and multi-resolution.

Abstract:

In this paper, we propose attention maps at various scales on top of a multi-resolution feature-extractor baseline network for human pose estimation. The baseline network captures information across scales through a repeated bottom-up, top-down approach using successive pooling and up-sampling. We propose a network named Refinement Net that regresses the predicted heatmaps to 2D joint locations to remove ambiguities in the predicted positions. We experiment with three levels of attention schemes: global, heatmap and multi-resolution. Attention masks help generate a basin of attraction that helps the network decide where to “look”. The proposed network performs on par with state-of-the-art two-dimensional pose estimation methods on the MPII dataset.
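
Two ideas from the abstract can be illustrated with a small NumPy sketch: a spatial attention mask that reweights a feature map, and the conversion of a predicted heatmap into a 2D joint location. This is only a sketch under stated assumptions. The abstract does not specify the Refinement Net architecture, so a soft-argmax (a simple, non-learned expectation over the heatmap) is swapped in as a stand-in for the learned regression, and the mask is a plain spatial softmax rather than any of the paper's three attention schemes. All function names here are hypothetical.

```python
import numpy as np


def spatial_softmax(scores):
    """Normalise an (H, W) score map into a mask whose entries sum to 1."""
    flat = scores.reshape(-1)
    flat = flat - flat.max()  # subtract the max for numerical stability
    weights = np.exp(flat)
    return (weights / weights.sum()).reshape(scores.shape)


def apply_attention(features, scores):
    """Reweight (C, H, W) features with one mask shared across channels.

    Assumption: a single spatial mask is broadcast over all channels.
    """
    mask = spatial_softmax(scores)
    return features * mask[None, :, :]


def soft_argmax(heatmap):
    """Expected (x, y) position under the softmax of an (H, W) heatmap.

    Stand-in for a learned heatmap-to-coordinate regressor; not the
    paper's Refinement Net.
    """
    probs = spatial_softmax(heatmap)
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]  # row (y) and column (x) index grids
    return float((probs * xs).sum()), float((probs * ys).sum())


# Toy heatmap with one sharp peak at column 5, row 2.
heatmap = np.zeros((8, 8))
heatmap[2, 5] = 20.0
x, y = soft_argmax(heatmap)

# Toy feature map reweighted by the same scores.
features = np.ones((4, 8, 8))
attended = apply_attention(features, heatmap)
```

With a sharply peaked heatmap, the soft-argmax lands very close to the peak, while a flatter or multi-modal heatmap pulls the estimate toward the weighted mean of its modes. That ambiguity is the kind of problem a learned refinement stage is meant to address.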
