Proceedings Article

Bounding Boxes, Segmentations and Object Coordinates: How Important is Recognition for 3D Scene Flow Estimation in Autonomous Driving Scenarios?

TLDR
This paper investigates the importance of recognition granularity, from coarse 2D bounding box estimates over 2D instance segmentations to fine-grained 3D object part predictions, and observes that the instance segmentation cue is by far the strongest in the authors' setting.
Abstract
Existing methods for 3D scene flow estimation often fail in the presence of large displacement or local ambiguities, e.g., at texture-less or reflective surfaces. However, these challenges are omnipresent in dynamic road scenes, which is the focus of this work. Our main contribution is to overcome these 3D motion estimation problems by exploiting recognition. In particular, we investigate the importance of recognition granularity, from coarse 2D bounding box estimates over 2D instance segmentations to fine-grained 3D object part predictions. We compute these cues using CNNs trained on a newly annotated dataset of stereo images and integrate them into a CRF-based model for robust 3D scene flow estimation - an approach we term Instance Scene Flow. We analyze the importance of each recognition cue in an ablation study and observe that the instance segmentation cue is by far strongest, in our setting. We demonstrate the effectiveness of our method on the challenging KITTI 2015 scene flow benchmark where we achieve state-of-the-art performance at the time of submission.

Citations
Proceedings Article

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

TL;DR: PWC-Net uses the current optical flow estimate to warp the CNN features of the second image, builds a cost volume from the warped features and the features of the first image, and processes that cost volume with a CNN to estimate the optical flow; it achieves state-of-the-art performance on the MPI Sintel final pass and KITTI 2015 benchmarks.
Proceedings Article

GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose

Zhichao Yin, +1 more
TL;DR: GeoNet proposes an adaptive geometric consistency loss to increase robustness towards outliers and non-Lambertian regions, which resolves occlusions and texture ambiguities effectively.
Posted Content

GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose

TL;DR: An adaptive geometric consistency loss is proposed to increase robustness towards outliers and non-Lambertian regions, which resolves occlusions and texture ambiguities effectively. The method achieves state-of-the-art results in all three tasks, performing better than previous unsupervised methods and comparably with supervised ones.
Proceedings Article

M3D-RPN: Monocular 3D Region Proposal Network for Object Detection

TL;DR: M3D-RPN is able to significantly improve the performance of both monocular 3D Object Detection and Bird's Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.
Book Chapter

SegStereo: Exploiting Semantic Information for Disparity Estimation

TL;DR: This paper suggests that appropriately incorporating semantic cues can greatly rectify predictions in commonly used disparity estimation frameworks, and proposes a unified model, SegStereo, which employs semantic features from segmentation and introduces a semantic softmax loss to improve the accuracy of predicted disparity maps.
References
Book Chapter

Joint estimation of motion, structure and geometry from stereo sequences

TL;DR: A joint energy functional is proposed that integrates spatial and temporal information from two subsequent image pairs subject to an unknown stereo setup, together with a normalisation of image and stereo constraints such that deviations from model assumptions can be interpreted in a geometrical way.
Proceedings Article

Beat the MTurkers: Automatic Image Labeling from Weak 3D Supervision

TL;DR: 3D information is exploited to automatically generate very accurate object segmentations from annotated 3D bounding boxes, using a binary Markov random field that exploits appearance models, stereo and/or noisy point clouds, a repository of 3D CAD models, and topological constraints.
Proceedings Article

A Fully-Connected Layered Model of Foreground and Background Flow

TL;DR: This work formulates a fully-connected layered model that enables global reasoning about the complicated segmentations of real objects and combines it with a layered flow model, finding that the long-range connections greatly improve segmentation into figure-ground layers compared with locally connected MRF models.
Proceedings Article

Multi-view scene flow estimation: A view centered variational approach

TL;DR: A unified global energy functional is proposed that incorporates the information from the available sequences and simultaneously recovers both depth and scene flow; the functional is successfully minimized even though the optimization problem is non-convex.
Proceedings Article

SphereFlow: 6 DoF Scene Flow from RGB-D Pairs

TL;DR: By reasoning in terms of patches under 6 DoF rigid body motions in 3D, this work succeeds in obtaining compelling results at displacements large and small without relying on either of two simplifying assumptions that pervade much of the earlier literature: brightness constancy or local surface planarity.