A Coarse-Fine Network for Keypoint Localization

doi:10.1109/ICCV.2017.329

Proceedings ArticleDOI

A Coarse-Fine Network for Keypoint Localization

Shaoli Huang, +2 more

- pp 3047-3056

Chats0

TLDR

A coarse-fine network that exploits multi-level supervisions for keypoint localization and achieves 72.2% AP on the 2016 COCO Keypoints Challenge dataset, which is an 18% improvement over the winning entry.

Abstract:

We propose a coarse-fine network (CFN) that exploits multi-level supervisions for keypoint localization. Recently, convolutional neural networks (CNNs)-based methods have achieved great success due to the powerful hierarchical features in CNNs. These methods typically use confidence maps generated from ground-truth keypoint locations as supervisory signals. However, while some keypoints can be easily located with high accuracy, many of them are hard to localize due to appearance ambiguity. Thus, using strict supervision often fails to detect keypoints that are difficult to locate accurately To target this problem, we develop a keypoint localization network composed of several coarse detector branches, each of which is built on top of a feature layer in a CNN, and a fine detector branch built on top of multiple feature layers. We supervise each branch by a specified label map to explicate a certain supervision strictness level. All the branches are unified principally to produce the final accurate keypoint locations. We demonstrate the efficacy, efficiency, and generality of our method on several benchmarks for multiple tasks including bird part localization and human body pose estimation. Especially, our method achieves 72.2% AP on the 2016 COCO Keypoints Challenge dataset, which is an 18% improvement over the winning entry.

Citations

PDF

Open Access

More filters

Proceedings ArticleDOI

Deep High-Resolution Representation Learning for Human Pose Estimation

Ke Sun, +3 more

TL;DR: This paper proposes a network that maintains high-resolution representations through the whole process of human pose estimation and empirically demonstrates the effectiveness of the network through the superior pose estimation results over two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.

...read moreread less

Posted Content

Deep High-Resolution Representation Learning for Visual Recognition

Jingdong Wang, +11 more

- 20 Aug 2019 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: The superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, is shown, suggesting that the HRNet is a stronger backbone for computer vision problems.

...read moreread less

Proceedings ArticleDOI

Cascaded Pyramid Network for Multi-person Pose Estimation

Yilun Chen, +5 more

TL;DR: A novel network structure called Cascaded Pyramid Network (CPN) is presented which targets to relieve the problem from these "hard" keypoints, with state-of-art results on the COCO keypoint benchmark, with average precision at 73.0.

...read moreread less

Journal ArticleDOI

Deep High-Resolution Representation Learning for Visual Recognition

Jingdong Wang, +11 more

- 01 Oct 2021 -

IEEE Transactions on Pattern Analysis an...

TL;DR: The High-Resolution Network (HRNet) as mentioned in this paper maintains high-resolution representations through the whole process by connecting the high-to-low resolution convolution streams in parallel and repeatedly exchanging the information across resolutions.

...read moreread less

Book ChapterDOI

PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model

George Papandreou, +5 more

TL;DR: In this article, a CNN is used to detect individual keypoints and predict their relative displacements, allowing them to group keypoints into person pose instances and then associate semantic person pixels with their corresponding person instance, delivering instance-level person segmentations.

...read moreread less

Collapse

References

PDF

Open Access

More filters

Proceedings ArticleDOI

Deep Residual Learning for Image Recognition

Kaiming He, +3 more

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

...read moreread less

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

...read moreread less

Proceedings ArticleDOI

Going deeper with convolutions

Christian Szegedy, +8 more

TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

...read moreread less

Proceedings ArticleDOI

Histograms of oriented gradients for human detection

Navneet Dalal, +1 more

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.

...read moreread less

Book ChapterDOI

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.

...read moreread less

Collapse

A Coarse-Fine Network for Keypoint Localization

Citations

Deep High-Resolution Representation Learning for Human Pose Estimation

Deep High-Resolution Representation Learning for Visual Recognition

Cascaded Pyramid Network for Multi-person Pose Estimation

Deep High-Resolution Representation Learning for Visual Recognition

PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model

References

Deep Residual Learning for Image Recognition

Very Deep Convolutional Networks for Large-Scale Image Recognition

Going deeper with convolutions

Histograms of oriented gradients for human detection

Microsoft COCO: Common Objects in Context

Related Papers (5)

Microsoft COCO: Common Objects in Context

Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields

Stacked Hourglass Networks for Human Pose Estimation

Deep Residual Learning for Image Recognition

Mask R-CNN