Deep Cuboid Detection: Beyond 2D Bounding Boxes

Open Access, Posted Content

Debidatta Dwibedi, +3 more

30 Nov 2016

arXiv: Computer Vision and Pattern Recog...

TL;DR

This work proposes an end-to-end deep learning system that detects cuboids (box-like objects) across many semantic categories, localizing each one with a 2D bounding box together with the cuboid's corners.

Abstract:

We present a Deep Cuboid Detector which takes a consumer-quality RGB image of a cluttered scene and localizes all 3D cuboids (box-like objects). Contrary to classical approaches which fit a 3D model from low-level cues like corners, edges, and vanishing points, we propose an end-to-end deep learning system to detect cuboids across many semantic categories (e.g., ovens, shipping boxes, and furniture). We localize cuboids with a 2D bounding box, and simultaneously localize the cuboid's corners, effectively producing a 3D interpretation of box-like objects. We refine keypoints by pooling convolutional features iteratively, improving the baseline method significantly. Our deep learning cuboid detector is trained in an end-to-end fashion and is suitable for real-time applications in augmented reality (AR) and robotics.
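The abstract describes each detection as a 2D bounding box plus the cuboid's corner keypoints, with the keypoints refined by pooling convolutional features iteratively. The Python sketch below shows one possible way to represent such an output and to structure the refinement loop; it is not the authors' implementation. It assumes eight corners per cuboid, corners encoded relative to their bounding box, and a hypothetical `stage` callable standing in for "pool convolutional features inside the current box and regress residual corrections". All names (`encode_corners`, `decode_corners`, `iterative_refine`, `Stage`) are illustrative.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Point = Tuple[float, float]              # (x, y)

# Hypothetical refinement stage: a real model would pool convolutional features
# inside `box` (e.g. with RoI pooling) and regress residual corrections for the
# box and the box-relative corners. Here it is just a caller-supplied function.
Stage = Callable[[Box, List[Point]], Tuple[Box, List[Point]]]

NUM_CORNERS = 8  # assumption: a cuboid is described by its 8 projected corners


def encode_corners(box: Box, corners: List[Point]) -> List[Point]:
    """Map pixel corners to box-relative coordinates: (0, 0) is the box's top-left, (1, 1) its bottom-right."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [((x - x1) / w, (y - y1) / h) for x, y in corners]


def decode_corners(box: Box, rel: List[Point]) -> List[Point]:
    """Inverse of encode_corners: map box-relative coordinates back to pixels."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return [(x1 + u * w, y1 + v * h) for u, v in rel]


def iterative_refine(box: Box, rel_corners: List[Point], stage: Stage,
                     num_iters: int = 2) -> Tuple[Box, List[Point]]:
    """Apply `stage` at the current box estimate `num_iters` times, adding its residuals.

    Returns the final box and the corners decoded to pixel coordinates.
    """
    if len(rel_corners) != NUM_CORNERS:
        raise ValueError(f"expected {NUM_CORNERS} corners, got {len(rel_corners)}")
    for _ in range(num_iters):
        d_box, d_rel = stage(box, rel_corners)
        box = tuple(b + d for b, d in zip(box, d_box))
        rel_corners = [(u + du, v + dv) for (u, v), (du, dv) in zip(rel_corners, d_rel)]
    return box, decode_corners(box, rel_corners)
```

One motivation for the box-relative encoding in this sketch is that the regression targets stay in a similar range regardless of object size; and because each iteration calls `stage` with the updated box, a later pass can pool features from a better-aligned region.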

Citations

Posted Content

Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling

Jia Zheng, +5 more

01 Aug 2019

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper presents a new synthetic dataset, Structured3D, with the aim of providing large-scale photo-realistic images with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks, and takes advantage of the availability of professional interior designs to automatically extract 3D structures from them.

Book Chapter (DOI)

Structured3D: A Large Photo-Realistic Dataset for Structured 3D Modeling

Jia Zheng, +5 more

TL;DR: Structured3D is a large-scale photo-realistic image dataset with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks, including room layout estimation.

Patent

Deep learning system for cuboid detection

Tomasz Malisiewicz, +3 more

TL;DR: A deep cuboid detector, which can include a plurality of convolutional and non-convolutional layers of a trained convolutional neural network, can be used for simultaneous cuboid detection and keypoint localization in monocular images.

Patent

Augmented reality display device with deep learning sensors

Andrew Rabinovich, +2 more

TL;DR: A hydra neural network uses the different types of sensor data from a head-mounted augmented reality (AR) device to determine an event from a plurality of events.

Patent

Methods and systems of performing object pose estimation

Mahdi Rad, +2 more

TL;DR: A regressor is trained on a set of images to predict the two-dimensional projections of an object's 3D bounding box in a plurality of poses.

References

Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky, +2 more

TL;DR: The authors train a large deep convolutional neural network that achieves state-of-the-art ImageNet classification performance; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Proceedings Article (DOI)

You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon, +3 more

TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.

Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

04 Jun 2015

arXiv: Computer Vision and Pattern Recog...

TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) that generates high-quality region proposals, which Fast R-CNN then uses for detection.

Related Papers (5)

Blocks world revisited: image understanding using qualitative geometry and mechanics

Instance-Aware Semantic Segmentation via Multi-task Network Cascades

Geometric reasoning for single image structure recovery

Fully convolutional networks for semantic segmentation

Articulated Human Detection with Flexible Mixtures of Parts