Book Chapter

Multi-scale Attention Aided Multi-Resolution Network for Human Pose Estimation

TLDR
This paper proposes Refinement Net, a network that regresses predicted heatmaps to 2D joint locations to remove ambiguities in the predicted positions, and experiments with three levels of attention schemes: global, heatmap, and multi-resolution.
Abstract
In this paper, we propose attention maps at various scales on top of a multi-resolution feature-extractor baseline network for human pose estimation. The baseline network captures information across scales through repeated bottom-up and top-down processing, using successive pooling and up-sampling. We propose a network named Refinement Net that regresses the predicted heatmaps to 2D joint locations to remove ambiguities in the predicted positions. We experiment with three levels of attention schemes: global, heatmap, and multi-resolution. Attention masks help generate a basin of attraction that helps the network decide where to “look”. The performance of the proposed network is on par with state-of-the-art two-dimensional pose estimation methods on the MPII dataset.
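The two ideas in the abstract, a spatial attention mask and the mapping from a heatmap to a 2D joint location, can be illustrated with a minimal sketch. This is not the authors' Refinement Net, which is a learned network whose architecture is not described here. Instead, the sketch swaps in soft-argmax, a common closed-form way to read a sub-pixel location off a heatmap. The function names `spatial_softmax`, `soft_argmax`, and `apply_mask`, and the `temperature` parameter, are hypothetical and chosen for illustration only.

```python
import math


def spatial_softmax(heatmap, temperature=1.0):
    """Normalize a 2D score map into a mask whose entries sum to 1."""
    flat = [v / temperature for row in heatmap for v in row]
    peak = max(flat)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in flat]
    total = sum(exps)
    width = len(heatmap[0])
    return [[exps[r * width + c] / total for c in range(width)]
            for r in range(len(heatmap))]


def soft_argmax(heatmap, temperature=1.0):
    """Expected (x, y) location under the softmax-normalized heatmap.

    Assumption: this stands in for a learned heatmap-to-coordinate
    regressor; it is not the method proposed in the paper.
    """
    mask = spatial_softmax(heatmap, temperature)
    x = sum(p * c for row in mask for c, p in enumerate(row))
    y = sum(p * r for r, row in enumerate(mask) for p in row)
    return x, y


def apply_mask(features, mask):
    """Element-wise product of one feature channel with an attention mask."""
    return [[f * m for f, m in zip(f_row, m_row)]
            for f_row, m_row in zip(features, mask)]


# A 3x4 heatmap with a strong response at row 1, column 2.
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 20.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
joint_x, joint_y = soft_argmax(heatmap)
attended = apply_mask(heatmap, spatial_softmax(heatmap))
```

Here the mask concentrates almost all of its weight on the peak, so the recovered location is close to column 2, row 1, and multiplying features by the mask suppresses responses away from that point. A lower `temperature` sharpens the mask; a higher one spreads attention over a wider region.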

