Cascaded Pyramid Network for Multi-person Pose Estimation
Yilun Chen, Zhicheng Wang, Yuxiang Peng, Zhiqiang Zhang, Gang Yu, Jian Sun
pp. 7103-7112
TLDR
A novel network structure called Cascaded Pyramid Network (CPN) is presented. It targets "hard" keypoints, such as occluded or invisible ones, and achieves state-of-the-art results on the COCO keypoint benchmark, with an average precision of 73.0.
Abstract:
Multi-person pose estimation has improved substantially in recent years, especially with the development of convolutional neural networks. However, many challenging cases, such as occluded keypoints, invisible keypoints and complex backgrounds, are still not handled well. In this paper, we present a novel network structure called Cascaded Pyramid Network (CPN), which aims to address these "hard" keypoints. More specifically, our algorithm includes two stages: GlobalNet and RefineNet. GlobalNet is a feature pyramid network that can successfully localize "simple" keypoints such as eyes and hands, but may fail to precisely recognize occluded or invisible keypoints. RefineNet explicitly handles the "hard" keypoints by integrating the feature representations from all levels of GlobalNet, together with an online hard keypoint mining loss. To address the full multi-person pose estimation problem, we adopt a top-down pipeline: a detector first generates a set of human bounding boxes, and CPN then localizes the keypoints within each bounding box. With the proposed algorithm, we achieve state-of-the-art results on the COCO keypoint benchmark, with an average precision (AP) of 73.0 on the COCO test-dev dataset and 72.1 on the COCO test-challenge dataset. The latter is a 19% relative improvement over the 60.5 AP from the COCO 2016 keypoint challenge. The code and the person detection results we used will be made publicly available for further research.
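The online hard keypoint mining idea can be illustrated with a small sketch. This is a hypothetical, stdlib-only Python example, not the authors' implementation. It assumes that each keypoint's loss is the mean squared error between a predicted and a target heatmap, and that only the `top_k` keypoints with the largest losses are averaged. The names `keypoint_l2_loss` and `ohkm_loss` and the default `top_k=8` are illustrative assumptions.

```python
from typing import Sequence

# Assumption: a heatmap is an H x W grid stored as a sequence of float rows.
Heatmap = Sequence[Sequence[float]]


def keypoint_l2_loss(pred: Heatmap, target: Heatmap) -> float:
    """Mean squared error between one predicted and one target heatmap."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    if count == 0:
        raise ValueError("heatmaps must not be empty")
    return total / count


def ohkm_loss(preds: Sequence[Heatmap], targets: Sequence[Heatmap], top_k: int = 8) -> float:
    """Hypothetical online hard keypoint mining loss.

    Computes one loss per keypoint, keeps only the top_k largest ("hard")
    ones and averages them. Keypoints outside the top_k contribute nothing,
    so in a real training loop they would receive no gradient from this term.
    top_k is an assumed hyperparameter, not a value stated in the abstract.
    """
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same number of keypoints")
    if not 1 <= top_k <= len(preds):
        raise ValueError("top_k must be between 1 and the number of keypoints")
    per_keypoint = [keypoint_l2_loss(p, t) for p, t in zip(preds, targets)]
    hardest = sorted(per_keypoint, reverse=True)[:top_k]
    return sum(hardest) / top_k


if __name__ == "__main__":
    # Three keypoints with 1 x 2 heatmaps; the per-keypoint losses are 0.0, 0.5 and 2.0.
    preds = [[[0.0, 0.0]], [[1.0, 0.0]], [[0.0, 2.0]]]
    targets = [[[0.0, 0.0]], [[0.0, 0.0]], [[0.0, 0.0]]]
    print(ohkm_loss(preds, targets, top_k=2))
```

With `top_k=2`, the easy keypoint (loss 0.0) is dropped and the result is (2.0 + 0.5) / 2 = 1.25. Averaging only the largest losses keeps training focused on the keypoints the network currently localizes worst, rather than letting many easy keypoints dominate the average.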