OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

Citations

Posted Content

Objects as Points

Xingyi Zhou¹, Dequan Wang², Philipp Krähenbühl¹

University of Texas at Austin¹, University of California, Berkeley²

16 Apr 2019, arXiv: Computer Vision and Pattern Recognition

TL;DR: The center-point-based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding-box-based detectors; it performs competitively with sophisticated multi-stage methods and runs in real time.

Abstract: Detection identifies objects as axis-aligned boxes in an image. Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each. This is wasteful, inefficient, and requires additional post-processing. In this paper, we take a different approach. We model an object as a single point: the center point of its bounding box. Our detector uses keypoint estimation to find center points and regresses to all other object properties, such as size, 3D location, orientation, and even pose. Our center-point-based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding-box-based detectors. CenterNet achieves the best speed-accuracy trade-off on the MS COCO dataset, with 28.1% AP at 142 FPS, 37.4% AP at 52 FPS, and 45.1% AP with multi-scale testing at 1.4 FPS. We use the same approach to estimate 3D bounding boxes in the KITTI benchmark and human pose on the COCO keypoint dataset. Our method performs competitively with sophisticated multi-stage methods and runs in real time.
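The abstract describes detection as finding center points on a keypoint heatmap and reading the remaining object properties (here, only box size) at those points. A minimal pure-Python sketch of that decoding step follows. It is not the authors' implementation: the 3x3 local-maximum test used to pick peaks, the `threshold` value, and the names `decode_centers`, `heatmap` and `sizes` are assumptions for illustration, and sub-pixel offsets, output stride and per-class heatmaps are omitted.

```python
from typing import List, Tuple

# (x1, y1, x2, y2, score) in heatmap-pixel coordinates
Box = Tuple[float, float, float, float, float]


def decode_centers(
    heatmap: List[List[float]],
    sizes: List[List[Tuple[float, float]]],
    threshold: float = 0.3,
) -> List[Box]:
    """Convert a center-point heatmap plus per-pixel (width, height) into boxes.

    A pixel is kept as an object center if its score is at least `threshold`
    and no pixel in its 3x3 neighbourhood scores higher (local-peak test).
    Offsets, output stride and multiple classes are deliberately omitted.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    boxes: List[Box] = []
    for y in range(rows):
        for x in range(cols):
            score = heatmap[y][x]
            if score < threshold:
                continue
            neighbourhood = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
            ]
            if score < max(neighbourhood):
                continue  # a neighbour is stronger, so this is not a peak
            w, h = sizes[y][x]  # regressed size read at the center pixel
            boxes.append((x - w / 2, y - h / 2, x + w / 2, y + h / 2, score))
    # strongest detections first
    return sorted(boxes, key=lambda b: b[4], reverse=True)


if __name__ == "__main__":
    heat = [[0.0] * 5 for _ in range(5)]
    heat[1][1], heat[1][2], heat[3][3] = 0.9, 0.5, 0.7
    size_map = [[(2.0, 4.0)] * 5 for _ in range(5)]
    print(decode_centers(heat, size_map))
```

In the usage example, the 0.5 pixel next to the 0.9 peak is suppressed because it is not a local maximum, so only two boxes are returned. The peak test plays the role that box-level non-maximum suppression plays in anchor-based detectors, which is one reading of the abstract's remark that enumerating candidate boxes "requires additional post-processing".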

1,899 citations

Cites background from "OpenPose: Realtime Multi-Person 2D ..."

...Object detection is then a standard keypoint estimation problem [3,39,60]....
...Object detection powers many vision tasks like instance segmentation [7, 21, 32], pose estimation [3, 15, 39], tracking [24, 27], and action recognition [5]....

Journal Article

Cellpose: a generalist algorithm for cellular segmentation

Carsen Stringer, Timothy C. Wang, Michalis Michaelos, Marius Pachitariu

01 Jan 2021, Nature Methods

TL;DR: This work introduces a generalist, deep learning-based segmentation method called Cellpose, which can precisely segment cells from a wide range of image types and does not require model retraining or parameter adjustments.

Abstract: Many biological applications require the segmentation of cell bodies, membranes and nuclei from microscopy images. Deep learning has enabled great progress on this problem, but current methods are specialized for images that have large training datasets. Here we introduce a generalist, deep learning-based segmentation method called Cellpose, which can precisely segment cells from a wide range of image types and does not require model retraining or parameter adjustments. Cellpose was trained on a new dataset of highly varied images of cells, containing over 70,000 segmented objects. We also demonstrate a three-dimensional (3D) extension of Cellpose that reuses the two-dimensional (2D) model and does not require 3D-labeled data. To support community contributions to the training data, we developed software for manual labeling and for curation of the automated results. Periodically retraining the model on the community-contributed data will ensure that Cellpose improves constantly.

947 citations

Proceedings Article

VIBE: Video Inference for Human Body Pose and Shape Estimation

Muhammed Kocabas¹, Nikos Athanasiou¹, Michael J. Black¹

Max Planck Society¹

14 Jun 2020

TL;DR: This work defines a novel temporal network architecture with a self-attention mechanism and shows that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels.

Abstract: Human motion is fundamental to understanding behavior. Despite progress on single-image 3D pose and shape estimation, existing video-based state-of-the-art methods fail to produce accurate and natural motion sequences due to a lack of ground-truth 3D motion data for training. To address this problem, we propose "Video Inference for Body Pose and Shape Estimation" (VIBE), which makes use of an existing large-scale motion capture dataset (AMASS) together with unpaired, in-the-wild, 2D keypoint annotations. Our key novelty is an adversarial learning framework that leverages AMASS to discriminate between real human motions and those produced by our temporal pose and shape regression networks. We define a novel temporal network architecture with a self-attention mechanism and show that adversarial training, at the sequence level, produces kinematically plausible motion sequences without in-the-wild ground-truth 3D labels. We perform extensive experimentation to analyze the importance of motion and demonstrate the effectiveness of VIBE on challenging 3D pose estimation datasets, achieving state-of-the-art performance. Code and pretrained models are available at https://github.com/mkocabas/VIBE

687 citations

Cites methods from "OpenPose: Realtime Multi-Person 2D ..."

...PennAction [69] and PoseTrack [3] are the only ground-truth 2D video datasets we use, while InstaVariety [30] and Kinetics-400 [31] are pseudo ground-truth datasets annotated using a 2D keypoint detector [12, 35]....

Proceedings Article

Soft Rasterizer: A Differentiable Renderer for Image-Based 3D Reasoning

Shichen Liu¹, Weikai Chen², Tianye Li¹, Hao Li¹

University of Southern California¹, Institute for Creative Technologies²

03 Apr 2019

TL;DR: This work proposes a truly differentiable rendering framework that is able to directly render colorized mesh using differentiable functions and back-propagate efficient supervision signals to mesh vertices and their attributes from various forms of image representations, including silhouette, shading and color images.

Abstract: Rendering bridges the gap between 2D vision and 3D scenes by simulating the physical process of image formation. By inverting such a renderer, one can think of a learning approach to infer 3D information from 2D images. However, standard graphics renderers involve a fundamental discretization step called rasterization, which prevents the rendering process from being differentiable and hence from being learned. Unlike the state-of-the-art differentiable renderers, which only approximate the rendering gradient in back-propagation, we propose a truly differentiable rendering framework that is able to (1) directly render colorized meshes using differentiable functions and (2) back-propagate efficient supervision signals to mesh vertices and their attributes from various forms of image representations, including silhouette, shading and color images. The key to our framework is a novel formulation that views rendering as an aggregation function that fuses the probabilistic contributions of all mesh triangles with respect to the rendered pixels. This formulation enables our framework to flow gradients to occluded and far-range vertices, which previous state-of-the-art methods cannot achieve. We show that by using the proposed renderer, one can achieve significant improvement in 3D unsupervised single-view reconstruction, both qualitatively and quantitatively. Experiments also demonstrate that our approach is able to handle challenging tasks in image-based shape fitting, which remain nontrivial for existing differentiable renderers. Code is available at https://github.com/ShichenLiu/SoftRas.
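The abstract frames rendering as an aggregation of per-triangle probabilistic contributions to each pixel. A minimal 2D forward-pass sketch for the silhouette case follows. It assumes, based on our reading of the SoftRas paper rather than this abstract, that each triangle's contribution is a sigmoid of the signed squared distance from the pixel to the triangle boundary divided by a sharpness parameter `sigma`, and that silhouettes are aggregated as 1 - prod(1 - D_j). Color and depth aggregation are omitted, no gradients are computed, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def _segment_dist2(p: Point, a: Point, b: Point) -> float:
    """Squared Euclidean distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length2 = dx * dx + dy * dy
    t = 0.0 if length2 == 0 else ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    cx, cy = a[0] + t * dx, a[1] + t * dy
    return (p[0] - cx) ** 2 + (p[1] - cy) ** 2


def _inside(p: Point, tri: Triangle) -> bool:
    """True if p lies inside or on the boundary of the triangle."""
    signs = []
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        signs.append((b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]))
    return all(s >= 0 for s in signs) or all(s <= 0 for s in signs)


def triangle_probability(p: Point, tri: Triangle, sigma: float) -> float:
    """Soft coverage of pixel p by one triangle: sigmoid(+/- d^2 / sigma)."""
    d2 = min(_segment_dist2(p, tri[i], tri[(i + 1) % 3]) for i in range(3))
    sign = 1.0 if _inside(p, tri) else -1.0
    return _sigmoid(sign * d2 / sigma)


def soft_silhouette(p: Point, triangles: Sequence[Triangle], sigma: float = 1e-2) -> float:
    """Aggregate all triangles: the pixel is empty only if no triangle covers it."""
    empty = 1.0
    for tri in triangles:
        empty *= 1.0 - triangle_probability(p, tri, sigma)
    return 1.0 - empty


if __name__ == "__main__":
    tri = ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
    print(soft_silhouette((0.25, 0.25), [tri]))  # well inside: close to 1
    print(soft_silhouette((0.5, 0.0), [tri]))    # on an edge: exactly 0.5
    print(soft_silhouette((2.0, 2.0), [tri]))    # far outside: close to 0
```

Unlike a hard inside/outside test, the soft coverage changes continuously as a pixel crosses a triangle edge (it equals 0.5 on the edge from both sides), and every triangle contributes to every pixel. That is the property the abstract credits with letting gradients reach occluded and far-range vertices; `sigma` trades sharpness of the rendered silhouette against how far that influence extends.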

566 citations

Cites background or methods from "OpenPose: Realtime Multi-Person 2D ..."

...To obtain 3D pose, shape priors [1, 29] have been incorporated to minimize the shape fitting errors in recent approaches [3, 4, 18, 2]....
...By detecting the 2D key points, great progress has been made to estimate the 2D poses [32, 4, 47]....

Proceedings Article

Pose-Guided Feature Alignment for Occluded Person Re-Identification

Jiaxu Miao¹, Yu Wu¹, Ping Liu¹, Yuhang Ding¹, Yi Yang¹

University of Technology, Sydney¹

01 Oct 2019

TL;DR: This paper introduces a novel method named Pose-Guided Feature Alignment (PGFA), which exploits pose landmarks to disentangle the useful information from the occlusion noise; it largely outperforms existing person re-id methods on three occlusion datasets while retaining top performance on two holistic datasets.

Abstract: Persons are often occluded by various obstacles in person retrieval scenarios. Previous person re-identification (re-id) methods either overlook this issue or resolve it based on an extreme assumption. To alleviate the occlusion problem, we propose to detect the occluded regions and explicitly exclude those regions during feature generation and matching. In this paper, we introduce a novel method named Pose-Guided Feature Alignment (PGFA), exploiting pose landmarks to disentangle the useful information from the occlusion noise. During the feature construction stage, our method utilizes human landmarks to generate attention maps. The generated attention maps indicate whether a specific body part is occluded and guide our model to attend to the non-occluded regions. During matching, we explicitly partition the global feature into parts and use the pose landmarks to indicate which partial features belong to the target person. Only the visible regions are utilized for retrieval. In addition, we construct a large-scale dataset for the Occluded Person Re-ID problem, namely Occluded-DukeMTMC, which is by far the largest dataset for Occluded Person Re-ID. Extensive experiments are conducted on our constructed occluded re-id dataset, two partial re-id datasets, and two commonly used holistic re-id datasets. Our method largely outperforms existing person re-id methods on three occlusion datasets while retaining top performance on two holistic datasets.
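The matching step in the abstract (partition the global feature into parts and compare only visible regions) can be illustrated as a part-wise distance restricted to parts visible in both images. This is a hypothetical simplification, not PGFA itself: it assumes per-part feature vectors and boolean visibility flags are already available (for example, from thresholding pose-landmark confidences), uses a plain mean Euclidean distance, and omits the attention-map branch and the global feature. The function name `visible_part_distance` is illustrative.

```python
import math
from typing import Optional, Sequence


def visible_part_distance(
    query_parts: Sequence[Sequence[float]],
    query_visible: Sequence[bool],
    gallery_parts: Sequence[Sequence[float]],
    gallery_visible: Sequence[bool],
) -> Optional[float]:
    """Mean Euclidean distance over body parts visible in both images.

    Parts hidden in either image are skipped, so occluders never enter the
    comparison. Returns None when the two images share no visible part.
    """
    shared = [
        i for i in range(len(query_parts))
        if query_visible[i] and gallery_visible[i]
    ]
    if not shared:
        return None
    total = sum(math.dist(query_parts[i], gallery_parts[i]) for i in shared)
    return total / len(shared)


if __name__ == "__main__":
    query = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    gallery = [[3.0, 4.0], [1.0, 1.0], [0.0, 0.0]]
    # part 2 is occluded in the query, part 1 in the gallery: only part 0 counts
    print(visible_part_distance(query, [True, True, False], gallery, [True, False, True]))
```

Averaging over the shared parts, rather than summing, keeps distances comparable between pairs that share different numbers of visible parts; how the real method weights or combines part-level and global scores is not specified in the abstract.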

352 citations
