Cascaded Pyramid Network for Multi-person Pose Estimation

doi:10.1109/CVPR.2018.00742

Open Access, Proceedings Article

Yilun Chen, +5 more

- pp 7103-7112

TLDR

This paper presents a novel network structure called Cascaded Pyramid Network (CPN) that aims to alleviate the problem of "hard" keypoints, such as occluded or invisible ones, and achieves state-of-the-art results on the COCO keypoint benchmark, with an average precision of 73.0 on test-dev.

Abstract:

Multi-person pose estimation has improved considerably in recent years, especially with the development of convolutional neural networks. However, many challenging cases remain, such as occluded keypoints, invisible keypoints and complex backgrounds, which cannot be well addressed. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN) which aims to alleviate the problem posed by these "hard" keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network which can successfully localize the "simple" keypoints like eyes and hands but may fail to precisely recognize the occluded or invisible keypoints. Our RefineNet explicitly handles the "hard" keypoints by integrating all levels of feature representations from the GlobalNet together with an online hard keypoint mining loss. In general, to address the multi-person pose estimation problem, a top-down pipeline is adopted: a detector first generates a set of human bounding boxes, and our CPN then localizes keypoints in each human bounding box. Based on the proposed algorithm, we achieve state-of-the-art results on the COCO keypoint benchmark, with an average precision of 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset, a 19% relative improvement over the 60.5 from the COCO 2016 keypoint challenge. Code and the person detection results used will be made publicly available for further research.
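The online hard keypoint mining loss mentioned in the abstract can be illustrated with a small sketch. The abstract does not say how hard keypoints are selected, so this sketch assumes that the loss is computed per keypoint heatmap and that only the keypoints with the largest loss contribute. The names `keypoint_l2_losses`, `ohkm_loss` and the parameter `top_k` are hypothetical, and plain Python lists stand in for the tensors a real training framework would use.

```python
from typing import List, Sequence

# One heatmap is an H x W grid of floats; a pose is one heatmap per keypoint.
Heatmap = Sequence[Sequence[float]]


def keypoint_l2_losses(pred: Sequence[Heatmap],
                       target: Sequence[Heatmap]) -> List[float]:
    """Return the mean squared error of each keypoint's heatmap."""
    losses = []
    for p_map, t_map in zip(pred, target):
        squared = [(p - t) ** 2
                   for p_row, t_row in zip(p_map, t_map)
                   for p, t in zip(p_row, t_row)]
        losses.append(sum(squared) / len(squared))
    return losses


def ohkm_loss(pred: Sequence[Heatmap],
              target: Sequence[Heatmap],
              top_k: int) -> float:
    """Average the loss over only the top_k hardest keypoints.

    Assumption: "hard" means "largest per-keypoint loss in this sample".
    """
    losses = keypoint_l2_losses(pred, target)
    if not 1 <= top_k <= len(losses):
        raise ValueError("top_k must be between 1 and the number of keypoints")
    hardest = sorted(losses, reverse=True)[:top_k]
    return sum(hardest) / top_k


# Toy example: three keypoints, each with a 1 x 2 heatmap.
pred = [[[0.0, 0.0]], [[1.0, 0.0]], [[0.5, 0.5]]]
target = [[[0.0, 0.0]], [[0.0, 0.0]], [[0.0, 0.0]]]
print(keypoint_l2_losses(pred, target))   # [0.0, 0.5, 0.25]
print(ohkm_loss(pred, target, top_k=2))   # 0.375
```

With `top_k=2`, the perfectly predicted first keypoint is ignored and only the two worst keypoints drive the loss, which is the sense in which training effort concentrates on the "hard" cases.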

Citations

Posted Content

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Zhe Cao, +3 more

- 24 Nov 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents an approach to efficiently detect the 2D pose of multiple people in an image using a nonparametric representation, which it refers to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image.

Proceedings Article

Deep High-Resolution Representation Learning for Human Pose Estimation

Ke Sun, +3 more

TL;DR: This paper proposes a network that maintains high-resolution representations through the whole process of human pose estimation and empirically demonstrates the effectiveness of the network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.

Journal Article

OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields

Zhe Cao, +4 more

- 01 Jan 2021 -

IEEE Transactions on Pattern Analysis an...

TL;DR: OpenPose uses Part Affinity Fields (PAFs) to learn to associate body parts with individuals in the image, achieving high accuracy and real-time performance.

Journal Article

SECOND: Sparsely Embedded Convolutional Detection

Yan Yan, +2 more

- 06 Oct 2018 -

Sensors

TL;DR: An improved sparse convolution method for Voxel-based 3D convolutional networks is investigated, which significantly increases the speed of both training and inference and introduces a new form of angle loss regression to improve the orientation estimation performance.

Book Chapter

Simple Baselines for Human Pose Estimation and Tracking

Bin Xiao, +2 more

TL;DR: In this article, the authors provide simple and effective baseline methods for pose estimation, which are helpful for inspiring and evaluating new ideas for the field and achieve state-of-the-art results on challenging benchmarks.

References

Proceedings Article

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: This paper proposes a residual learning framework that eases the training of networks substantially deeper than those used previously; the approach won 1st place on the ILSVRC 2015 classification task.

Journal Article

Gradient-based learning applied to document recognition

Yann LeCun, +6 more

TL;DR: This paper shows that gradient-based learning can synthesize a complex decision surface that classifies high-dimensional patterns such as handwritten characters, and proposes graph transformer networks (GTNs) so that multi-module document recognition systems can be trained globally.

Journal Article

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images. At the time of publication it had been run annually since 2010, attracting participation from more than fifty institutions.

Book Chapter

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.

Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 04 Jun 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.

Related Papers (5)

Stacked Hourglass Networks for Human Pose Estimation

Deep Residual Learning for Image Recognition

Microsoft COCO: Common Objects in Context

Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields

Deep High-Resolution Representation Learning for Human Pose Estimation