Proceedings Article

Bounding Boxes, Segmentations and Object Coordinates: How Important is Recognition for 3D Scene Flow Estimation in Autonomous Driving Scenarios?

TLDR
The paper investigates the importance of recognition granularity for 3D scene flow estimation, from coarse 2D bounding box estimates over 2D instance segmentations to fine-grained 3D object part predictions, and observes that the instance segmentation cue is by far the strongest in the authors' setting.
Abstract
Existing methods for 3D scene flow estimation often fail in the presence of large displacement or local ambiguities, e.g., at texture-less or reflective surfaces. However, these challenges are omnipresent in dynamic road scenes, which are the focus of this work. Our main contribution is to overcome these 3D motion estimation problems by exploiting recognition. In particular, we investigate the importance of recognition granularity, from coarse 2D bounding box estimates over 2D instance segmentations to fine-grained 3D object part predictions. We compute these cues using CNNs trained on a newly annotated dataset of stereo images and integrate them into a CRF-based model for robust 3D scene flow estimation, an approach we term Instance Scene Flow. We analyze the importance of each recognition cue in an ablation study and observe that the instance segmentation cue is by far the strongest in our setting. We demonstrate the effectiveness of our method on the challenging KITTI 2015 scene flow benchmark, where we achieve state-of-the-art performance at the time of submission.
