Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image

doi:10.1109/icra46639.2022.9812299

Open Access Proceedings Article

TLDR

This paper proposes a single-stage, keypoint-based approach for category-level object pose estimation that performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions.

Abstract:

Prior work on 6-DoF object pose estimation has largely focused on instance-level processing, in which a textured CAD model is available for each object being detected. Category-level 6-DoF pose estimation represents an important step toward developing robotic vision systems that operate in unstructured, real-world scenarios. In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation that operates on unknown object instances within a known category using a single RGB image as input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions. These quantities are estimated in a sequential fashion, leveraging the recent idea of convGRU for propagating information from easier tasks to those that are more difficult. We favor simplicity in our design choices: generic cuboid vertex coordinates, single-stage network, and monocular RGB input. We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric (27.6% higher than the MobilePose single-stage approach and 7.1% higher than the related two-stage approach).
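The abstract ties the 2D keypoints to the vertices of a bounding cuboid described by relative dimensions and a 6-DoF pose. As a rough illustration of that geometry only (not the paper's network or its pose solver), the sketch below builds the eight vertices of an origin-centred cuboid and projects them into the image with a pinhole camera. The function names, the vertex ordering, and the camera intrinsics are assumptions made for this example.

```python
from itertools import product


def cuboid_vertices(rel_dims):
    """Return the 8 corners of an origin-centred cuboid.

    rel_dims is an (x, y, z) tuple of relative dimensions; the corner
    ordering is an arbitrary choice for this sketch.
    """
    sx, sy, sz = rel_dims
    return [(dx * sx / 2, dy * sy / 2, dz * sz / 2)
            for dx, dy, dz in product((-1, 1), repeat=3)]


def project(points, rotation, translation, focal, principal):
    """Apply a rigid transform (R, t), then a pinhole projection."""
    fx, fy = focal
    cx, cy = principal
    pixels = []
    for p in points:
        # Camera-frame coordinates: R @ p + t, written out row by row.
        x, y, z = (sum(r * c for r, c in zip(row, p)) + t
                   for row, t in zip(rotation, translation))
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


# Hypothetical pose and intrinsics: no rotation, object 5 units ahead.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
corners = cuboid_vertices((1.0, 2.0, 1.0))
keypoints = project(corners, identity, (0.0, 0.0, 5.0),
                    (500.0, 500.0), (320.0, 240.0))
```

Going the other way, from detected 2D keypoints and cuboid dimensions back to a pose, is the harder inverse problem; a perspective-n-point solver is one common choice for it, though the abstract does not say which method the authors use.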

Citations

Zero-Shot Category-Level Object Pose Estimation

i2c-net: Using Instance-Level Neural Networks for Monocular Category-Level 6D Pose Estimation

Non-Prehensile Manipulation Actions and Visual 6D Pose Estimation for Fruit Grasping Based on Tactile Sensing
