Open Access · Journal Article · DOI

Deep Learning on Monocular Object Pose Detection and Tracking: A Comprehensive Overview

21 Nov 2022
Vol. 55, Iss. 4, pp. 1-40
TLDR
A comprehensive review of recent progress in deep learning-based object pose detection and tracking is presented in this article. The authors take monocular RGB/RGBD data as input and cover three major tasks: instance-level monocular object pose detection, category-level monocular object pose detection, and monocular object pose tracking.
Abstract
Object pose detection and tracking has recently attracted increasing attention due to its wide applications in many areas, such as autonomous driving, robotics, and augmented reality. Among methods for object pose detection and tracking, deep learning-based approaches are the most promising, having shown better performance than others. However, a survey of the latest developments in deep learning-based methods has been lacking. Therefore, this study presents a comprehensive review of recent progress in object pose detection and tracking along the deep learning technical route. To keep the introduction thorough, the scope of this study is limited to methods that take monocular RGB/RGBD data as input, covering three major tasks: instance-level monocular object pose detection, category-level monocular object pose detection, and monocular object pose tracking. Metrics, datasets, and methods for both detection and tracking are presented in detail. Comparative results of current state-of-the-art methods on several publicly available datasets are also presented, together with insightful observations and inspiring future research directions.
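Among the evaluation metrics such a survey typically covers, a widely used instance-level measure is ADD: the average distance between object model points transformed by the predicted and ground-truth 6D poses. The snippet below is a minimal NumPy sketch of that metric; the function name and array shapes are illustrative, not taken from the paper.

```python
import numpy as np

def add_metric(model_points, R_pred, t_pred, R_gt, t_gt):
    """ADD: mean distance between model points under the predicted pose
    (R_pred, t_pred) and the ground-truth pose (R_gt, t_gt); lower is better."""
    pred = model_points @ R_pred.T + t_pred   # (N, 3) points in camera frame
    gt = model_points @ R_gt.T + t_gt
    return np.linalg.norm(pred - gt, axis=1).mean()
```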



Citations
Journal Article · DOI

V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer

TL;DR: In this article, a holistic attention model, namely V2X-ViT, is proposed to fuse information across on-road agents (i.e., vehicles and infrastructure) to improve the perception performance of autonomous vehicles.
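As a rough illustration of the fusion idea only (not the actual V2X-ViT architecture), the sketch below uses a standard multi-head self-attention layer to mix per-agent feature tokens; the class name, dimensions, and mean pooling are assumptions.

```python
import torch
import torch.nn as nn

class AgentFusion(nn.Module):
    """Illustrative fusion of per-agent feature tokens with self-attention."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, agent_feats):            # (batch, num_agents, dim)
        fused, _ = self.attn(agent_feats, agent_feats, agent_feats)
        return fused.mean(dim=1)               # pooled, fused feature per batch item
```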
Book Chapter · DOI

Object Level Depth Reconstruction for Category Level 6D Object Pose Estimation from Monocular RGB Image

Yaou Liu
TL;DR: In this paper, Zhang et al. propose to directly predict object-level depth from a monocular RGB image by deforming the category-level shape prior into object-level depth and the canonical NOCS representation.
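Once per-pixel NOCS coordinates and object-level depth are available, category-level pose is commonly recovered by aligning the canonical coordinates with the back-projected 3D points via a similarity transform (the Umeyama algorithm). The sketch below is a generic implementation of that alignment step, not code from the paper.

```python
import numpy as np

def umeyama_alignment(src, dst):
    """Similarity transform (s, R, t) with dst ≈ s * R @ src + t, estimated
    from corresponding point sets src (e.g. NOCS coords) and dst (camera-space
    points), both of shape (N, 3). Standard Umeyama least-squares solution."""
    mu_src, mu_dst = src.mean(0), dst.mean(0)
    src_c, dst_c = src - mu_src, dst - mu_dst
    cov = dst_c.T @ src_c / src.shape[0]       # 3x3 cross-covariance
    U, D, Vt = np.linalg.svd(cov)
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0                         # avoid reflections
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / src.shape[0]
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_dst - s * R @ mu_src
    return s, R, t
```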
Journal Article · DOI

i2c-net: Using Instance-Level Neural Networks for Monocular Category-Level 6D Pose Estimation

TL;DR: In this article, a method is proposed to extract the 6D pose of multiple objects belonging to different categories, starting from an instance-level pose estimation network and relying only on RGB images.
Book Chapter · DOI

Projecting Product-Aware Cues as Assembly Intentions for Human-Robot Collaboration

TL;DR: In this paper, the authors propose a generalizable information construct for projecting assembly intentions that can cope with different part geometries, and they use a digital thread framework for on-demand, run-time computation and retrieval of the relevant bounding boxes from product CAD models.
References
Proceedings Article · DOI

Deep Residual Learning for Image Recognition

TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; this framework won first place in the ILSVRC 2015 classification task.
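The core idea is the identity shortcut: each block learns a residual that is added back to its input, which makes very deep networks easier to optimize. Below is a minimal PyTorch sketch of such a block (not the full ResNet architecture; layer sizes are illustrative).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Minimal residual block: output = ReLU(F(x) + x)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)                 # residual (skip) connection
```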
Journal Article · DOI

Long short-term memory

TL;DR: A novel, efficient, gradient-based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
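In modern frameworks the LSTM cell, including its gated memory that carries gradients across long time lags, is available off the shelf. A minimal PyTorch usage sketch follows; tensor sizes are arbitrary placeholders.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
x = torch.randn(8, 100, 32)                    # (batch, time steps, features)
output, (h_n, c_n) = lstm(x)                   # output: (8, 100, 64)
print(output.shape, h_n.shape, c_n.shape)
```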
Journal Article · DOI

Distinctive Image Features from Scale-Invariant Keypoints

TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
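A typical application is matching SIFT keypoints between two views with Lowe's ratio test, for example with OpenCV as sketched below; image paths are placeholders, and an OpenCV build that includes SIFT (standard in recent versions) is assumed.

```python
import cv2

img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

matcher = cv2.BFMatcher()
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe's ratio test
print(f"{len(good)} reliable matches")
```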
Proceedings Article · DOI

You Only Look Once: Unified, Real-Time Object Detection

TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
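YOLO frames detection as a single regression over a grid: each cell predicts a fixed number of boxes plus class probabilities. The sketch below illustrates how a YOLOv1-style output tensor could be decoded into boxes; the tensor layout and threshold are assumptions for illustration, not the reference implementation.

```python
import numpy as np

def decode_yolo_grid(pred, S=7, B=2, conf_thresh=0.25):
    """Decode a (S, S, B*5 + C) grid: each cell holds B boxes (x, y, w, h, conf)
    plus class probabilities shared by the cell."""
    boxes = []
    for i in range(S):
        for j in range(S):
            cell = pred[i, j]
            cls_probs = cell[B * 5:]
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                if conf < conf_thresh:
                    continue
                # (x, y) are offsets within the cell; (w, h) are relative to the image.
                cx, cy = (j + x) / S, (i + y) / S
                boxes.append((cx, cy, w, h, conf, int(np.argmax(cls_probs))))
    return boxes
```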
Journal Article · DOI

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges the RPN and Fast R-CNN into a single network by sharing their convolutional features.
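Pre-trained Faster R-CNN models, with the RPN and detection head sharing one backbone, are available directly in torchvision (the weights argument below assumes torchvision 0.13 or newer). A minimal usage sketch with a random placeholder image:

```python
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)                # placeholder RGB image in [0, 1]
with torch.no_grad():
    predictions = model([image])               # list with one dict per image
print(predictions[0]["boxes"].shape, predictions[0]["labels"][:5])
```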