WS-OPE: Weakly Supervised 6-D Object Pose Regression Using Relative Multi-Camera Pose Constraints

doi:10.1109/lra.2022.3146924

Journal Article

Vol. 7, Iss. 2, pp. 3703-3710

TLDR

In this article, the authors use 2-D bounding boxes and object sizes as the only labels and constrain the problem with multiple images of known relative poses during training, which leads to better learning of 6-D pose embeddings in comparison to fully supervised methods.

Abstract:

Precise annotation of 6-D poses in real data is intricate and time-consuming, yet it is an essential requirement for training pose estimation pipelines. To avoid this problem, we propose a scalable, end-to-end approach to 6-D pose regression with weak supervision. Our method requires neither 3-D models nor 6-D object poses as ground truth. Instead, we use 2-D bounding boxes and object sizes as the only labels and constrain the problem with multiple images of known relative poses during training. A novel Rotated-IoU loss ties a pose prediction from one image to the labeled 2-D bounding boxes of the corresponding object in other views. Our rotation estimation combines an initial coarse pose classification with an offset regression using a continuous rotation parametrization that allows for direct pose estimation. At test time, the model still uses only a single image to predict a 6-D pose. We observe that the multi-view constraints and our rotation representation used during training lead to better learning of 6-D pose embeddings in comparison to fully supervised methods. Experiments on several datasets show that the proposed method is capable of predicting poses of good quality, despite being trained with only weak labels. Direct pose regression without the need for a subsequent refinement stage thereby ensures real-time performance.
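
The multi-view constraint described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation, and it does not reproduce the Rotated-IoU loss. It only shows, under stated assumptions, how a pose predicted in one camera could be carried into a second camera with a known relative pose and turned into a 2-D box that is comparable to a bounding-box label there. The 6-value rotation encoding (Gram-Schmidt orthonormalization of two 3-vectors) is one common continuous parametrization and is assumed here; the abstract does not say which one the paper uses. All function names, the pinhole intrinsics `K`, and the numeric values are hypothetical.

```python
import numpy as np

def rotation_from_6d(v):
    """Map a 6-vector to a rotation matrix via Gram-Schmidt (assumed parametrization)."""
    a1, a2 = np.asarray(v[:3], float), np.asarray(v[3:], float)
    b1 = a1 / np.linalg.norm(a1)
    a2 = a2 - np.dot(b1, a2) * b1          # remove the component along b1
    b2 = a2 / np.linalg.norm(a2)
    b3 = np.cross(b1, b2)                  # right-handed third axis
    return np.stack([b1, b2, b3], axis=1)  # basis vectors as columns

def make_pose(R, t):
    """Build a 4x4 homogeneous transform from rotation R and translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def transfer_pose(T_a_obj, T_b_a):
    """Express an object pose given in camera A in camera B, using the known relative pose."""
    return T_b_a @ T_a_obj

def box_corners(size):
    """Eight corners of an object-centered 3-D box with the given (sx, sy, sz) size."""
    signs = np.array([[x, y, z] for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)], float)
    return 0.5 * signs * np.asarray(size, float)

def project_bbox(T_cam_obj, size, K):
    """Project the 3-D box into the image and return its axis-aligned 2-D box."""
    pts = (T_cam_obj[:3, :3] @ box_corners(size).T).T + T_cam_obj[:3, 3]
    uv = (K @ pts.T).T
    uv = uv[:, :2] / uv[:, 2:3]            # perspective division
    return np.array([uv[:, 0].min(), uv[:, 1].min(), uv[:, 0].max(), uv[:, 1].max()])

# Hypothetical example: pose predicted in camera A, checked against camera B.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
size = (0.2, 0.2, 0.2)                                  # object size label (metres)
R_pred = rotation_from_6d([1.0, 0.0, 0.0, 0.0, 1.0, 0.0])
T_a_obj = make_pose(R_pred, [0.0, 0.0, 2.0])            # predicted pose in camera A
T_b_a = make_pose(np.eye(3), [0.1, 0.0, 0.0])           # known relative camera pose
bbox_a = project_bbox(T_a_obj, size, K)
bbox_b = project_bbox(transfer_pose(T_a_obj, T_b_a), size, K)
```

In a training setup along these lines, `bbox_b` would be compared with the labeled 2-D box in view B, so that an error in the pose predicted from view A shows up as a box mismatch in the other views.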
