Proceedings ArticleDOI
Sparsity Invariant CNNs
TL;DR: This paper proposes a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation, and demonstrates the benefits of the proposed network architecture in synthetic and real experiments with respect to various baseline approaches.
Abstract: In this paper, we consider convolutional neural networks operating on sparse inputs with an application to depth upsampling from sparse laser scan data. First, we show that traditional convolutional networks perform poorly when applied to sparse data even when the location of missing data is provided to the network. To overcome this problem, we propose a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation. We demonstrate the benefits of the proposed network architecture in synthetic and real experiments with respect to various baseline approaches. Compared to dense baselines, the proposed sparse convolution network generalizes well to novel datasets and is invariant to the level of sparsity in the data. For our evaluation, we derive a novel dataset from the KITTI benchmark, comprising 93k depth annotated RGB images. Our dataset allows for training and evaluating depth upsampling and depth prediction techniques in challenging real-world settings and will be made available upon publication.
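The core idea of the abstract — convolving only over pixels where a measurement exists and renormalizing by the number of valid pixels under the kernel — can be sketched as follows. This is a minimal single-channel NumPy illustration of the idea, not the authors' implementation; the function name and the loop-based layout are my own for clarity.

```python
import numpy as np

def sparse_conv2d(x, mask, kernel, eps=1e-8):
    """Sketch of a sparsity-invariant convolution on one channel.

    x      : (H, W) input; values at invalid pixels are ignored
    mask   : (H, W) binary map, 1 where a measurement exists
    kernel : (k, k) convolution weights

    Only valid pixels contribute, and the response is normalized by
    the count of valid pixels in each window, so the output does not
    shrink as the input gets sparser.
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x * mask, pad)   # zero out invalid pixels before summing
    mp = np.pad(mask, pad)
    H, W = x.shape
    out = np.zeros((H, W))
    new_mask = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            xw = xp[i:i + k, j:j + k]
            mw = mp[i:i + k, j:j + k]
            # weighted sum over valid pixels, divided by the valid count
            out[i, j] = np.sum(kernel * xw) / (np.sum(mw) + eps)
            # an output pixel is valid if any input pixel in the window was
            new_mask[i, j] = mw.max()
    return out, new_mask
```

With an all-ones kernel this computes a masked average, so a constant image comes back unchanged regardless of how many pixels are dropped from the mask — the sparsity invariance the paper describes.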
Citations
Book ChapterDOI
Image Inpainting for Irregular Holes Using Partial Convolutions
TL;DR: This work proposes the use of partial convolutions, where the convolution is masked and renormalized to be conditioned on only valid pixels, and outperforms other methods for irregular masks.
Proceedings ArticleDOI
Digging Into Self-Supervised Monocular Depth Estimation
TL;DR: In this paper, the authors propose a set of improvements which together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods; they demonstrate the effectiveness of each component in isolation and show high-quality, state-of-the-art results on the KITTI benchmark.
Posted Content
Digging Into Self-Supervised Monocular Depth Estimation
TL;DR: It is shown that a surprisingly simple model, and associated design choices, lead to superior predictions, and together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods.
Proceedings ArticleDOI
Multi-Task Multi-Sensor Fusion for 3D Object Detection
TL;DR: An end-to-end learnable architecture that reasons about 2D and 3D object detection as well as ground estimation and depth completion is presented that leads the KITTI benchmark on 2D, 3D and bird's eye view object detection, while being real-time.
Proceedings ArticleDOI
PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation
TL;DR: PointFusion as mentioned in this paper is a generic 3D object detection method that leverages both image and 3D point cloud information, which predicts multiple 3D box hypotheses and their confidences using the input 3D points as spatial anchors.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; it won 1st place in the ILSVRC 2015 classification task.
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: As discussed by the authors, state-of-the-art performance was achieved by a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Proceedings ArticleDOI
Going deeper with convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich
TL;DR: Inception as mentioned in this paper is a deep convolutional neural network architecture that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
Proceedings ArticleDOI
Fully convolutional networks for semantic segmentation
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Proceedings ArticleDOI
Are we ready for autonomous driving? The KITTI vision benchmark suite
TL;DR: The autonomous driving platform is used to develop novel challenging benchmarks for the tasks of stereo, optical flow, visual odometry/SLAM, and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory to the real world.