Open Access · Posted Content

3D Bounding Box Estimation Using Deep Learning and Geometry

TL;DR
Although conceptually simple, this method outperforms more complex and computationally expensive approaches that leverage semantic segmentation, instance-level segmentation, and flat ground priors, and produces state-of-the-art results for 3D viewpoint estimation on the Pascal 3D+ dataset.
Abstract
We present a method for 3D object detection and pose estimation from a single image. In contrast to current techniques that only regress the 3D orientation of an object, our method first regresses relatively stable 3D object properties using a deep convolutional neural network and then combines these estimates with geometric constraints provided by a 2D object bounding box to produce a complete 3D bounding box. The first network output estimates the 3D object orientation using a novel hybrid discrete-continuous loss, which significantly outperforms the L2 loss. The second output regresses the 3D object dimensions, which have relatively little variance compared to alternatives and can often be predicted for many object types. These estimates, combined with the geometric constraints on translation imposed by the 2D bounding box, enable us to recover a stable and accurate 3D object pose. We evaluate our method on the challenging KITTI object detection benchmark, both on the official metric of 3D orientation estimation and on the accuracy of the obtained 3D bounding boxes. Although conceptually simple, our method outperforms more complex and computationally expensive approaches that leverage semantic segmentation, instance-level segmentation, flat ground priors, and sub-category detection. Our discrete-continuous loss also produces state-of-the-art results for 3D viewpoint estimation on the Pascal 3D+ dataset.
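As a rough sketch of the hybrid discrete-continuous ("MultiBin") orientation loss described above: the angle range is split into bins, a classifier picks the bin containing the ground-truth angle, and a residual rotation relative to the bin center is regressed via its sine and cosine. The code below is a minimal reading of that idea, not the authors' implementation; the function and argument names (multibin_loss, bin_logits, bin_sin_cos, w) are assumptions, and it simplifies the paper's overlapping bins by regressing the residual for the single nearest bin only.

```python
import torch
import torch.nn.functional as F

def multibin_loss(bin_logits, bin_sin_cos, gt_angle, bin_centers, w=1.0):
    """Hybrid discrete-continuous orientation loss (MultiBin-style sketch).

    bin_logits:  (B, n_bins) confidence that the angle falls in each bin
    bin_sin_cos: (B, n_bins, 2) predicted (sin, cos) of the residual per bin
    gt_angle:    (B,) ground-truth orientation in radians
    bin_centers: (n_bins,) center angle of each bin
    """
    # Discrete part: classify which bin the ground-truth angle falls into
    # (simplification: only the nearest bin, ignoring bin overlap).
    delta = gt_angle[:, None] - bin_centers[None, :]
    delta = torch.atan2(torch.sin(delta), torch.cos(delta))  # wrap to [-pi, pi]
    gt_bin = delta.abs().argmin(dim=1)
    conf_loss = F.cross_entropy(bin_logits, gt_bin)

    # Continuous part: maximize cos(residual - predicted residual) for the
    # ground-truth bin, i.e. minimize the negative cosine distance.
    idx = torch.arange(gt_angle.size(0), device=gt_angle.device)
    sin_cos = F.normalize(bin_sin_cos[idx, gt_bin], dim=1)   # unit (sin, cos)
    residual = delta[idx, gt_bin]
    loc_loss = -(sin_cos[:, 0] * torch.sin(residual) +
                 sin_cos[:, 1] * torch.cos(residual)).mean()

    return conf_loss + w * loc_loss
```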


Citations
Posted Content

nuScenes: A multimodal dataset for autonomous driving

TL;DR: nuScenes is the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars, and 1 lidar, all with a full 360-degree field of view.
Posted Content

Objects as Points

TL;DR: The center-point-based approach, CenterNet, is end-to-end differentiable, simpler, faster, and more accurate than corresponding bounding-box-based detectors; it performs competitively with sophisticated multi-stage methods while running in real time.
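A frequently cited detail of the center-point formulation is that detection reduces to finding local maxima in a per-class center heatmap, with a 3x3 max-pool standing in for non-maximum suppression. The sketch below illustrates that decoding step; the function name and top-k interface are illustrative assumptions, not the CenterNet reference code.

```python
import torch
import torch.nn.functional as F

def extract_center_peaks(heatmap, k=100):
    """Pick the top-k local maxima of a center heatmap of shape (B, C, H, W).

    A 3x3 max-pool acts as non-maximum suppression: a location survives
    only if it equals the maximum of its own neighborhood.
    """
    pooled = F.max_pool2d(heatmap, kernel_size=3, stride=1, padding=1)
    peaks = heatmap * (pooled == heatmap).float()
    scores, flat_idx = peaks.flatten(1).topk(k)  # top-k over all C*H*W cells
    _, _, H, W = heatmap.shape
    cls = flat_idx // (H * W)          # class channel of each peak
    ys = flat_idx % (H * W) // W       # row in the feature map
    xs = flat_idx % W                  # column in the feature map
    return scores, cls, ys, xs
```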
Journal ArticleDOI

SECOND: Sparsely Embedded Convolutional Detection

TL;DR: An improved sparse convolution method for voxel-based 3D convolutional networks is investigated, which significantly increases the speed of both training and inference; a new form of angle loss regression is also introduced to improve orientation estimation performance.
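The angle loss mentioned above is commonly summarized as regressing the sine of the angle error, which makes orientations that differ by pi (a box flipped front-to-back) cost nothing; the paper pairs this with a separate direction classifier. A minimal sketch of the regression term, with an illustrative function name:

```python
import torch
import torch.nn.functional as F

def sine_angle_loss(pred_angle, gt_angle):
    """Smooth-L1 on sin(pred - gt): zero loss for orientations that differ
    by pi, so the regression target is unambiguous for symmetric boxes."""
    return F.smooth_l1_loss(torch.sin(pred_angle - gt_angle),
                            torch.zeros_like(pred_angle))
```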
Posted Content

PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud

TL;DR: Extensive experiments on the 3D detection benchmark of the KITTI dataset show that the proposed architecture outperforms state-of-the-art methods by remarkable margins while using only the point cloud as input.
Proceedings ArticleDOI

SSD-6D: Making RGB-Based 3D Detection and 6D Pose Estimation Great Again

TL;DR: In this paper, a novel method for detecting 3D model instances and estimating their 6D pose from RGB data in a single shot is presented, which outperforms state-of-the-art methods that leverage RGBD data on multiple challenging datasets.
References
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Proceedings ArticleDOI

You Only Look Once: Unified, Real-Time Object Detection

TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Posted Content

Rich feature hierarchies for accurate object detection and semantic segmentation

TL;DR: This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Book ChapterDOI

SSD: Single Shot MultiBox Detector

TL;DR: SSD discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature-map location, and combines predictions from multiple feature maps with different resolutions to naturally handle objects of various sizes.
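To make "default boxes over different aspect ratios and scales" concrete, here is a simplified generator for a single square feature map. It is a sketch under assumptions (one scale per map; the extra box SSD adds for aspect ratio 1 is omitted), and the names are illustrative:

```python
import itertools
import math

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """SSD-style default boxes as (cx, cy, w, h) in [0, 1] coordinates,
    one box per aspect ratio at every cell of a square feature map."""
    boxes = []
    for i, j in itertools.product(range(fmap_size), repeat=2):
        cx, cy = (j + 0.5) / fmap_size, (i + 0.5) / fmap_size  # cell center
        for ar in aspect_ratios:
            boxes.append((cx, cy, scale * math.sqrt(ar), scale / math.sqrt(ar)))
    return boxes
```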