Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview
TLDR
This article presents a comprehensive review of recent progress in deep learning-based object pose detection and tracking, taking monocular RGB/RGBD data as input and covering three major tasks: instance-level detection, category-level detection, and monocular object pose tracking.
Abstract:
Object pose detection and tracking has recently attracted increasing attention due to its wide applications in many areas, such as autonomous driving, robotics, and augmented reality. Among methods for object pose detection and tracking, deep learning is the most promising one, having shown better performance than others. However, surveys of the latest developments in deep learning-based methods are lacking. Therefore, this study presents a comprehensive review of recent progress in object pose detection and tracking along the deep learning technical route. To achieve a more thorough introduction, the scope of this study is limited to methods taking monocular RGB/RGBD data as input and covering three major tasks: instance-level monocular object pose detection, category-level monocular object pose detection, and monocular object pose tracking. Metrics, datasets, and methods for both detection and tracking are presented in detail. Comparative results of current state-of-the-art methods on several publicly available datasets are also presented, together with insightful observations and inspiring future research directions.
Citations
Journal ArticleDOI
V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer
TL;DR: In this article, a holistic attention model, namely V2X-ViT, is proposed to fuse information across on-road agents (i.e., vehicles and infrastructure) to improve the perception performance of autonomous vehicles.
Book ChapterDOI
Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image
TL;DR: Zhang et al. propose to directly predict object-level depth from a monocular RGB image by deforming the category-level shape prior into object-level depth and the canonical NOCS representation.
Journal ArticleDOI
i2c-net: Using Instance-Level Neural Networks for Monocular Category-Level 6D Pose Estimation
TL;DR: In this article, a method is proposed to extract the 6D pose of multiple objects belonging to different categories, starting from an instance-level pose estimation network and relying only on RGB images.
Book ChapterDOI
Projecting Product-Aware Cues as Assembly Intentions for Human-Robot Collaboration
TL;DR: In this paper, the authors propose a generalizable information construct for projecting assembly intentions that can cope with different part geometries, using a digital thread framework for on-demand, run-time computation and retrieval of bounding boxes from product CAD models.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks substantially deeper than those used previously; their model won 1st place in the ILSVRC 2015 classification task.
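The core idea of residual learning can be sketched in a few lines of plain Python: rather than learning a full mapping H(x) directly, each block learns the residual F(x) = H(x) - x and outputs F(x) + x through an identity shortcut. The function names below are illustrative, not from the paper's code.

```python
def residual_block(x, transform):
    """Apply a learned transform F and add the identity shortcut: F(x) + x."""
    fx = transform(x)                        # F(x), e.g. two conv layers in ResNet
    return [f + xi for f, xi in zip(fx, x)]  # element-wise skip connection

# If F is the zero mapping, the block reduces to the identity, which is
# one intuition for why very deep residual stacks remain easy to optimize.
out = residual_block([1.0, 2.0, 3.0], lambda v: [0.0] * len(v))
```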
Journal ArticleDOI
Long short-term memory
TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
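The "constant error carousel" mentioned above refers to the additive update of the cell state, which lets gradients flow across many time steps. A minimal sketch of one scalar LSTM step, with illustrative (untrained) weights:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; w holds illustrative gate weights."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev)    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev)    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev)    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev)  # candidate cell update
    c = f * c_prev + i * g   # additive cell update: the constant error carousel
    h = o * math.tanh(c)     # hidden state exposed to the rest of the network
    return h, c
```

Because the cell state `c` is updated additively (gated by `f` and `i`) rather than repeatedly squashed, error signals can propagate over long lags without vanishing.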
Journal ArticleDOI
Distinctive Image Features from Scale-Invariant Keypoints
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Proceedings ArticleDOI
You Only Look Once: Unified, Real-Time Object Detection
TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Journal ArticleDOI
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.