Open Access, Proceedings Article

Single-Stage Keypoint-Based Category-Level Object Pose Estimation from an RGB Image

TLDR
This paper proposes a single-stage, keypoint-based approach for category-level object pose estimation that performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions.
Abstract
Prior work on 6-DoF object pose estimation has largely focused on instance-level processing, in which a textured CAD model is available for each object being detected. Category-level 6-DoF pose estimation represents an important step toward developing robotic vision systems that operate in unstructured, real-world scenarios. In this work, we propose a single-stage, keypoint-based approach for category-level object pose estimation that operates on unknown object instances within a known category using a single RGB image as input. The proposed network performs 2D object detection, detects 2D keypoints, estimates 6-DoF pose, and regresses relative bounding cuboid dimensions. These quantities are estimated in a sequential fashion, leveraging the recent idea of convGRU for propagating information from easier tasks to those that are more difficult. We favor simplicity in our design choices: generic cuboid vertex coordinates, single-stage network, and monocular RGB input. We conduct extensive experiments on the challenging Objectron benchmark, outperforming state-of-the-art methods on the 3D IoU metric (27.6% higher than the MobilePose single-stage approach and 7.1% higher than the related two-stage approach).
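
The relationship between a bounding cuboid, its relative dimensions, a 6-DoF pose, and the 2D keypoints mentioned in the abstract can be illustrated with a minimal sketch. This is not the network or the authors' code: the function names, the vertex ordering, the camera intrinsics, and the example pose are all assumptions chosen for illustration, and a simple pinhole camera model is assumed.

```python
from itertools import product

def cuboid_vertices(rel_dims):
    """Return the 8 corners of an origin-centred cuboid.

    rel_dims is a hypothetical (x, y, z) tuple of relative extents;
    the vertex ordering is an arbitrary choice for this sketch.
    """
    sx, sy, sz = rel_dims
    return [(dx * sx / 2, dy * sy / 2, dz * sz / 2)
            for dx, dy, dz in product((-1, 1), repeat=3)]

def project(points, rotation, translation, fx, fy, cx, cy):
    """Apply a rigid transform (rotation, translation), then a pinhole projection."""
    keypoints = []
    for p in points:
        # Object frame -> camera frame.
        cam = [sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
               for i in range(3)]
        # Camera frame -> pixel coordinates.
        keypoints.append((fx * cam[0] / cam[2] + cx,
                          fy * cam[1] / cam[2] + cy))
    return keypoints

# Assumed example values: identity rotation, object 5 units in front of the camera.
IDENTITY = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
verts = cuboid_vertices((1.0, 0.5, 2.0))
keypoints = project(verts, IDENTITY, (0.0, 0.0, 5.0),
                    fx=600.0, fy=600.0, cx=320.0, cy=240.0)
```

A keypoint-based estimator works in the opposite direction: given detected 2D keypoints and the cuboid's relative dimensions, a pose is sought whose projected vertices match the detections.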


Citations
Book Chapter

Zero-Shot Category-Level Object Pose Estimation

TL;DR: This paper proposes a zero-shot, category-level pose estimation method based on semantic correspondences from a self-supervised vision transformer.
Journal Article

i2c-net: Using Instance-Level Neural Networks for Monocular Category-Level 6D Pose Estimation

TL;DR: This article proposes i2c-net, which extracts the 6D pose of multiple objects belonging to different categories, starting from an instance-level pose estimation network and relying only on RGB images.
Journal Article

Non-Prehensile Manipulation Actions and Visual 6D Pose Estimation for Fruit Grasping Based on Tactile Sensing

Marco Costanzo, +1 more - 25 Jun 2023
TL;DR: This article proposes a grasp controller based on tactile sensing that pushes aside items hindering the grasp of a detected fruit. Pushing from an initial location to a target location is performed by a model predictive controller that accounts for the unavoidable delay in the perception and computing pipeline of the robotic system.
References
Proceedings Article

Focal Loss for Dense Object Detection

TL;DR: This paper proposes to address the extreme foreground-background class imbalance encountered during training of dense detectors by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples, and develops a novel Focal Loss, which focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
Journal Article

Direct Linear Transformation from Comparator Coordinates into Object Space Coordinates in Close-Range Photogrammetry

TL;DR: This paper develops a method for photogrammetric data reduction that requires neither fiducial marks nor initial approximations for the inner and outer orientation parameters of the camera.
Proceedings Article

Deformable ConvNets V2: More Deformable, Better Results

TL;DR: This work presents a reformulation of Deformable Convolutional Networks that improves its ability to focus on pertinent image regions, through increased modeling power and stronger training, and guides network training via a proposed feature mimicking scheme that helps the network to learn features that reflect the object focus and classification power of R-CNN features.
Proceedings Article

PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes

TL;DR: PoseCNN estimates the 3D translation of an object by localizing its center in the image and predicting its distance from the camera, and estimates the 3D rotation by regressing to a quaternion representation.
Proceedings Article

Deep Layer Aggregation

TL;DR: Deep layer aggregation iteratively and hierarchically merges the feature hierarchy to produce networks with better accuracy and fewer parameters. Experiments across architectures and tasks show that it improves recognition and resolution compared to existing branching and merging schemes.