Open Access, Proceedings Article

Bottom-Up Human Pose Estimation Via Disentangled Keypoint Regression

TLDR
The authors propose disentangled keypoint regression (DEKR), a method that adopts adaptive convolutions through a pixel-wise spatial transformer to activate the pixels in the keypoint regions and learn representations from them.
Abstract
In this paper, we are interested in the bottom-up paradigm of estimating human poses from an image. We study the dense keypoint regression framework, which was previously inferior to the keypoint detection and grouping framework. Our motivation is that regressing keypoint positions accurately requires learning representations that focus on the keypoint regions. We present a simple yet effective approach, named disentangled keypoint regression (DEKR). We adopt adaptive convolutions through a pixel-wise spatial transformer to activate the pixels in the keypoint regions and accordingly learn representations from them. We use a multi-branch structure for separate regression: each branch learns a representation with dedicated adaptive convolutions and regresses one keypoint. The resulting disentangled representations are able to attend to their respective keypoint regions, and thus the keypoint regression is spatially more accurate. We empirically show that the proposed direct regression method outperforms keypoint detection and grouping methods and achieves superior bottom-up pose estimation results on two benchmark datasets, COCO and CrowdPose. The code and models are available at https://github.com/HRNet/DEKR.
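
The multi-branch regression described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a plain linear map stands in for the adaptive convolutions, and the equal channel split, the per-branch weights, and the helper names (`split_channels`, `regress_offsets`, `decode_pose`) are assumptions made for illustration. It assumes each pixel regresses one (dx, dy) offset per keypoint and that a pose is read out by adding those offsets to the pixel's position.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical per-branch parameters: one weight vector for dx, one for dy.
BranchWeights = Tuple[Sequence[float], Sequence[float]]


def split_channels(features: Sequence[float], num_keypoints: int) -> List[List[float]]:
    """Partition one pixel's feature vector into an equal slice per branch."""
    if len(features) % num_keypoints != 0:
        raise ValueError("channel count must be divisible by num_keypoints")
    step = len(features) // num_keypoints
    return [list(features[i * step:(i + 1) * step]) for i in range(num_keypoints)]


def regress_offsets(features: Sequence[float],
                    branch_weights: Sequence[BranchWeights]) -> List[Point]:
    """Each branch sees only its own slice and regresses one keypoint offset.

    Assumption: a linear map replaces the adaptive convolutions of the paper.
    """
    slices = split_channels(features, len(branch_weights))
    offsets = []
    for feat, (wx, wy) in zip(slices, branch_weights):
        dx = sum(f * w for f, w in zip(feat, wx))
        dy = sum(f * w for f, w in zip(feat, wy))
        offsets.append((dx, dy))
    return offsets


def decode_pose(center: Point, offsets: Sequence[Point]) -> List[Point]:
    """Turn per-keypoint offsets at a pixel into absolute keypoint positions."""
    cx, cy = center
    return [(cx + dx, cy + dy) for dx, dy in offsets]


# Toy example: 4 feature channels, 2 keypoints, so 2 channels per branch.
features = [1.0, 2.0, 0.5, -1.0]
weights = [
    ([1.0, 0.0], [0.0, 1.0]),  # branch for keypoint 0
    ([2.0, 0.0], [0.0, 3.0]),  # branch for keypoint 1
]
offsets = regress_offsets(features, weights)
pose = decode_pose((10.0, 20.0), offsets)
```

Because each branch reads only its own slice, changing the channels assigned to keypoint 1 cannot affect the offset regressed for keypoint 0, which is the sense of "disentangled" this sketch tries to convey.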


Citations
Journal Article

Revealing the Dark Secrets of Masked Image Modeling

TL;DR: This paper compares MIM with the long-dominant supervised pre-trained models from two perspectives, visualizations and experiments, to uncover their key representational differences. It finds that MIM models can perform significantly better than their supervised counterparts on geometric and motion tasks with weak semantics and on fine-grained classification tasks.
Proceedings Article

End-to-End Multi-Person Pose Estimation with Transformers

TL;DR: The proposed PETR method views pose estimation as a hierarchical set prediction problem and effectively removes the need for many hand-crafted modules like RoI cropping, NMS and grouping post-processing, and largely overcomes the feature misalignment difficulty in pose estimation and improves the performance considerably.
Proceedings Article

Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

TL;DR: This work designs LitePose, an efficient single-branch architecture for pose estimation, and introduces two simple approaches to enhance its capacity: a fusion deconv head and large kernel convs.
Proceedings Article

Fast and Flexible Human Pose Estimation with HyperPose

TL;DR: HyperPose provides expressive Python APIs that enable developers to easily customise pose estimation algorithms for their applications, together with a model inference engine highly optimized for real-time pose estimation.
Proceedings Article

NIMBLE: A Non-rigid Hand Model with Bones and Muscles

TL;DR: NIMBLE is a novel parametric hand model that includes the missing key components, bringing 3D hand models to a new level of realism. By enforcing the inner bones and muscles to match anatomic and kinematic rules, NIMBLE can animate 3D hands to new poses at unprecedented realism.
References
Proceedings Article

Adam: A Method for Stochastic Optimization

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Book Chapter

Microsoft COCO: Common Objects in Context

TL;DR: This work presents a new dataset with the goal of advancing the state of the art in object recognition by placing it in the context of the broader question of scene understanding. The dataset gathers images of complex everyday scenes containing common objects in their natural context.
Proceedings Article

Mask R-CNN

TL;DR: This work presents a conceptually simple, flexible, and general framework for object instance segmentation, which extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.
Journal Article

Representation Learning: A Review and New Perspectives

TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.
Proceedings Article

Spatial transformer networks

TL;DR: This work introduces a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network, and can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps.