Open Access · Posted Content

Deep Cuboid Detection: Beyond 2D Bounding Boxes

TLDR
This work proposes an end-to-end deep learning system that detects cuboids (box-like objects) across many semantic categories, localizing each one with a 2D bounding box and, simultaneously, its corner keypoints.
Abstract
We present a Deep Cuboid Detector which takes a consumer-quality RGB image of a cluttered scene and localizes all 3D cuboids (box-like objects). Contrary to classical approaches which fit a 3D model from low-level cues like corners, edges, and vanishing points, we propose an end-to-end deep learning system to detect cuboids across many semantic categories (e.g., ovens, shipping boxes, and furniture). We localize cuboids with a 2D bounding box, and simultaneously localize the cuboid's corners, effectively producing a 3D interpretation of box-like objects. We refine keypoints by pooling convolutional features iteratively, improving the baseline method significantly. Our deep learning cuboid detector is trained in an end-to-end fashion and is suitable for real-time applications in augmented reality (AR) and robotics.
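To make the output described in the abstract concrete, the following is a minimal sketch of how a single detection (a 2D bounding box plus the cuboid's corner keypoints) might be represented. The names `CuboidDetection`, `encode_corners`, and `decode_corners` are hypothetical, and the choice to express corners as offsets from the box center normalized by the box width and height is an assumption made for illustration, not something stated in the abstract.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class CuboidDetection:
    """Hypothetical container for one detected cuboid."""
    box: Box              # 2D bounding box around the cuboid
    corners: List[Point]  # projected cuboid corners in image pixels
    score: float          # detection confidence


def encode_corners(box: Box, corners: List[Point]) -> List[Point]:
    """Express corners as offsets from the box center, scaled by box size.

    Assumed parametrization: a corner on the box edge maps to +/-0.5.
    """
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    w, h = x_max - x_min, y_max - y_min
    return [((x - cx) / w, (y - cy) / h) for x, y in corners]


def decode_corners(box: Box, offsets: List[Point]) -> List[Point]:
    """Invert encode_corners: map normalized offsets back to pixels."""
    x_min, y_min, x_max, y_max = box
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    w, h = x_max - x_min, y_max - y_min
    return [(cx + dx * w, cy + dy * h) for dx, dy in offsets]
```

Under this assumed encoding, a regressor would predict the normalized offsets, and `decode_corners` would map them back to image coordinates using the detected box.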

Citations
Posted Content

Structured3D: A Large Photo-realistic Dataset for Structured 3D Modeling

TL;DR: This paper presents a new synthetic dataset, Structured3D, with the aim of providing large-scale photo-realistic images with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks, and takes advantage of the availability of professional interior designs to automatically extract 3D structures from them.
Book Chapter

Structured3D: A Large Photo-Realistic Dataset for Structured 3D Modeling

TL;DR: Structured3D is a large-scale photo-realistic image dataset with rich 3D structure annotations for a wide spectrum of structured 3D modeling tasks, including room layout estimation.
Patent

Deep learning system for cuboid detection

TL;DR: A deep cuboid detector performs simultaneous cuboid detection and keypoint localization in monocular images; the detector can include a plurality of convolutional and non-convolutional layers of a trained convolutional neural network.
Patent

Augmented reality display device with deep learning sensors

TL;DR: A hydra neural network is used to determine an event from a plurality of events using different types of sensor data from a head-mounted augmented reality (AR) device.
Patent

Methods and systems of performing object pose estimation

TL;DR: A regressor is trained, on a training set of images, to predict the two-dimensional projections of the 3D bounding box of an object in a plurality of poses.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art performance on ImageNet classification.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Proceedings Article

You Only Look Once: Unified, Real-Time Object Detection

TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) that generates high-quality region proposals, which Fast R-CNN then uses for detection.