Open Access, Proceedings Article

GDR-Net: Geometry-Guided Direct Regression Network for Monocular 6D Object Pose Estimation

TLDR
This paper proposes GDR-Net, a geometry-guided direct regression network that learns the 6D pose in an end-to-end manner from dense correspondence-based intermediate geometric representations.
Abstract
6D pose estimation from a single RGB image is a fundamental task in computer vision. The current top-performing deep learning-based methods rely on an indirect strategy: they first establish 2D-3D correspondences between coordinates in the image plane and the object coordinate system, and then apply a variant of the PnP/RANSAC algorithm. However, this two-stage pipeline is not end-to-end trainable and is therefore hard to employ in the many tasks that require differentiable poses. On the other hand, methods based on direct regression are currently inferior to geometry-based methods. In this work, we perform an in-depth investigation of both direct and indirect methods, and propose a simple yet effective Geometry-guided Direct Regression Network (GDR-Net) to learn the 6D pose in an end-to-end manner from dense correspondence-based intermediate geometric representations. Extensive experiments show that our approach remarkably outperforms state-of-the-art methods on the LM, LM-O and YCB-V datasets. Code is available at https://git.io/GDR-Net.
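To make the indirect strategy more concrete, the following minimal sketch shows what a set of 2D-3D correspondences looks like and how a candidate pose can be scored against them by its reprojection error, which is the kind of quantity a PnP/RANSAC solver typically works with. It is not taken from the GDR-Net code base: the function names, the toy cube model, the camera intrinsics and the poses are all hypothetical values chosen for illustration, and no actual PnP solver or network is implemented here.

```python
import math

def rotation_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the camera z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def project(point_obj, R, t, fx, fy, cx, cy):
    """Map a 3D point from object coordinates to pixel coordinates (pinhole model)."""
    # Rigid transform: object frame -> camera frame.
    xc = [sum(R[i][j] * point_obj[j] for j in range(3)) + t[i] for i in range(3)]
    # Perspective division followed by the intrinsics.
    return (fx * xc[0] / xc[2] + cx, fy * xc[1] / xc[2] + cy)

def mean_reprojection_error(points_obj, pixels, R, t, fx, fy, cx, cy):
    """Average pixel distance between observed 2D points and projected 3D points."""
    total = 0.0
    for p, (u_obs, v_obs) in zip(points_obj, pixels):
        u, v = project(p, R, t, fx, fy, cx, cy)
        total += math.hypot(u - u_obs, v - v_obs)
    return total / len(points_obj)

# Hypothetical intrinsics and object model (corners of a 10 cm cube, in metres).
FX, FY, CX, CY = 500.0, 500.0, 320.0, 240.0
MODEL_POINTS = [(x, y, z) for x in (-0.05, 0.05)
                for y in (-0.05, 0.05)
                for z in (-0.05, 0.05)]

# Hypothetical ground-truth pose used to synthesise the 2D side of the correspondences.
R_TRUE = rotation_z(math.radians(30.0))
T_TRUE = [0.02, -0.01, 0.5]
PIXELS = [project(p, R_TRUE, T_TRUE, FX, FY, CX, CY) for p in MODEL_POINTS]

# Score the true pose and a pose whose rotation is off by 10 degrees.
err_true = mean_reprojection_error(MODEL_POINTS, PIXELS, R_TRUE, T_TRUE, FX, FY, CX, CY)
err_wrong = mean_reprojection_error(
    MODEL_POINTS, PIXELS, rotation_z(math.radians(40.0)), T_TRUE, FX, FY, CX, CY
)
print(f"true pose error:  {err_true:.3f} px")
print(f"wrong pose error: {err_wrong:.3f} px")
```

In this toy setup the correct pose reproduces the observed pixels, while the perturbed pose does not. A geometric solver searches for the pose that minimises such an error, whereas a direct regression approach, as described in the abstract, predicts the pose with a network instead.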


Citations
Proceedings Article

OnePose: One-Shot Object Pose Estimation without CAD Models

TL;DR: A new graph attention network is proposed that directly matches 2D interest points in the query image with the 3D points in the SfM model, resulting in efficient and robust pose estimation; the method can stably detect and track 6D poses of everyday household objects in real time.
Proceedings Article

ZebraPose: Coarse to Fine Surface Encoding for 6DoF Object Pose Estimation

TL;DR: This work presents a discrete descriptor that represents the object surface densely by incorporating a hierarchical binary grouping, and proposes a coarse-to-fine training strategy that enables fine-grained correspondence prediction for 6DoF pose estimation.
Proceedings Article

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

TL;DR: EPro-PnP is proposed, a probabilistic PnP layer for general end-to-end pose estimation that outputs a distribution of pose on the SE(3) manifold, essentially bringing categorical Softmax to the continuous domain.
Posted Content

SO-Pose: Exploiting Self-Occlusion for Direct 6D Pose Estimation

TL;DR: In this paper, a two-layer representation for 3D objects is proposed to enhance the accuracy of end-to-end 6D pose estimation by using self-occlusion.
Proceedings Article

GPV-Pose: Category-level Object Pose Estimation via Geometry-guided Point-wise Voting

TL;DR: GPV-Pose is proposed, a novel framework for robust category-level pose estimation that harnesses geometric insights to enhance the learning of category-level pose-sensitive features. It produces superior results to state-of-the-art competitors on common public benchmarks while almost achieving real-time inference speed at 20 FPS.
References
Proceedings Article

Faster R-CNN: towards real-time object detection with region proposal networks

TL;DR: Ren et al. propose a region proposal network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Posted Content

YOLOv3: An Incremental Improvement.

TL;DR: The authors present some updates to YOLO!
Proceedings Article

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

TL;DR: This paper designs a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input and provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing.
Posted Content

YOLOv4: Optimal Speed and Accuracy of Object Detection

TL;DR: This work uses new features (WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, DropBlock regularization, and CIoU loss) and combines some of them to achieve state-of-the-art results: 43.5% AP on the MS COCO dataset at a real-time speed of ~65 FPS on a Tesla V100.