Associative Embedding: End-to-End Learning for Joint Detection and Grouping

Proceedings Article (Open Access)

Alejandro Newell, +2 more

- Vol. 30, pp 2278-2288

TLDR

Associative embedding supervises convolutional neural networks to perform detection and grouping jointly, and it can be integrated into any state-of-the-art network architecture that produces pixel-wise predictions.

Abstract:

We introduce associative embedding, a novel method for supervising convolutional neural networks for the task of detection and grouping. A number of computer vision problems can be framed in this manner, including multi-person pose estimation, instance segmentation, and multi-object tracking. Grouping of detections is usually achieved with multi-stage pipelines; instead, we propose an approach that teaches a network to simultaneously output detections and group assignments. This technique can be easily integrated into any state-of-the-art network architecture that produces pixel-wise predictions. We show how to apply this method to multi-person pose estimation and report state-of-the-art performance on the MPII and MS-COCO datasets.
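The abstract describes the idea only at a high level. The following is a minimal NumPy sketch, assuming one scalar (1-D) tag per joint read from a per-joint tag map: a pull term draws the tags of one person's joints toward their mean (that person's reference tag), and a push term, a Gaussian of the distance between reference tags, separates different people. The function names `grouping_loss` and `group_detections`, the pair normalization, `sigma`, and `threshold` are illustrative assumptions, and the greedy decoder is a simplification rather than the matching procedure used in the paper.

```python
import numpy as np


def grouping_loss(tag_maps, people, sigma=1.0):
    """Pull/push grouping loss on 1-D tags (illustrative sketch).

    tag_maps: float array of shape (K, H, W), one tag map per joint type.
    people:   one entry per person, each a list of (k, y, x) ground-truth
              joint locations (joint type k, row y, column x).
    """
    refs, pull = [], 0.0
    for joints in people:
        tags = np.array([tag_maps[k, y, x] for k, y, x in joints])
        ref = tags.mean()                  # reference tag for this person
        pull += np.sum((tags - ref) ** 2)  # pull the person's tags together
        refs.append(ref)
    n = len(refs)
    if n == 0:
        return 0.0
    pull /= n
    push = 0.0
    if n > 1:
        refs = np.array(refs)
        diff = refs[:, None] - refs[None, :]
        gauss = np.exp(-diff ** 2 / (2 * sigma ** 2))
        # Mean over ordered pairs of distinct people (diagonal of ones removed).
        push = (gauss.sum() - n) / (n * (n - 1))
    return float(pull + push)


def group_detections(detections, threshold=0.5):
    """Greedy grouping by tag distance (a simplification, not the paper's matching).

    detections: list of (joint_type, score, tag) tuples.
    Returns a list of groups, each a dict mapping joint_type -> (score, tag).
    """
    groups = []
    for k, score, tag in sorted(detections, key=lambda d: -d[1]):
        best, best_dist = None, threshold
        for g in groups:
            if k in g:  # a person has at most one joint of each type
                continue
            dist = abs(tag - np.mean([t for _, t in g.values()]))
            if dist < best_dist:
                best, best_dist = g, dist
        if best is None:
            groups.append({k: (score, tag)})  # start a new person
        else:
            best[k] = (score, tag)
    return groups
```

In training, a term like `grouping_loss` would be added to the detection (heatmap) loss. The decoder shows why well-separated reference tags matter: once they are far apart, assigning a detection to a person reduces to a nearest-tag lookup.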

Citations

Posted Content

Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Zhe Cao, +3 more

- 24 Nov 2016 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents an approach to efficiently detect the 2D pose of multiple people in an image using a nonparametric representation, which it refers to as Part Affinity Fields (PAFs), to learn to associate body parts with individuals in the image.

Proceedings Article

Deep High-Resolution Representation Learning for Human Pose Estimation

Ke Sun, +3 more

TL;DR: This paper proposes a network that maintains high-resolution representations throughout the whole process of human pose estimation and empirically demonstrates its effectiveness with superior pose estimation results on two benchmark datasets: the COCO keypoint detection dataset and the MPII Human Pose dataset.

Journal Article

OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields

Zhe Cao, +4 more

- 01 Jan 2021 -

IEEE Transactions on Pattern Analysis an...

TL;DR: OpenPose uses Part Affinity Fields (PAFs) to learn to associate body parts with individuals in the image, achieving high accuracy and real-time performance.

Posted Content

Objects as Points

Xingyi Zhou, +2 more

- 16 Apr 2019 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: The center-point-based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding-box-based detectors; it performs competitively with sophisticated multi-stage methods and runs in real time.

Journal Article

Deep Learning for Generic Object Detection: A Survey

Li Liu, +7 more

- 01 Feb 2020 -

International Journal of Computer Vision

TL;DR: A comprehensive survey of recent advances in generic object detection brought about by deep learning, covering detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.

References

Book Chapter

Microsoft COCO: Common Objects in Context

Tsung-Yi Lin, +7 more

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the broader context of scene understanding, built by gathering images of complex everyday scenes that contain common objects in their natural context.

Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

Tomas Mikolov, +4 more

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.

Proceedings Article

Normalized cuts and image segmentation

Jianbo Shi, +1 more

TL;DR: This work treats image segmentation as a graph partitioning problem and proposes a novel global criterion, the normalized cut, for segmenting the graph, which measures both the total dissimilarity between the different groups as well as the total similarity within the groups.

Journal Article

A tutorial on spectral clustering

Ulrike von Luxburg

- 01 Dec 2007 -

Statistics and Computing

TL;DR: The tutorial presents the most common spectral clustering algorithms, derives them from scratch using several different approaches, and discusses their advantages and disadvantages.

Proceedings Article

Distributed Representations of Sentences and Documents

Quoc V. Le, +1 more

TL;DR: Paragraph Vector is an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents, and its construction gives the algorithm the potential to overcome the weaknesses of bag-of-words models.

Related Papers (5)

Microsoft COCO: Common Objects in Context

Deep Residual Learning for Image Recognition

Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields

Stacked Hourglass Networks for Human Pose Estimation

Mask R-CNN