Bounding Boxes, Segmentations and Object Coordinates: How Important is Recognition for 3D Scene Flow Estimation in Autonomous Driving Scenarios?

doi:10.1109/ICCV.2017.281

Proceedings Article

- pp 2593-2602

TLDR

The paper investigates the importance of recognition granularity, from coarse 2D bounding box estimates through 2D instance segmentations to fine-grained 3D object part predictions, and observes that the instance segmentation cue is by far the strongest in the authors' setting.

Abstract:

Existing methods for 3D scene flow estimation often fail in the presence of large displacement or local ambiguities, e.g., at texture-less or reflective surfaces. However, these challenges are omnipresent in dynamic road scenes, which is the focus of this work. Our main contribution is to overcome these 3D motion estimation problems by exploiting recognition. In particular, we investigate the importance of recognition granularity, from coarse 2D bounding box estimates over 2D instance segmentations to fine-grained 3D object part predictions. We compute these cues using CNNs trained on a newly annotated dataset of stereo images and integrate them into a CRF-based model for robust 3D scene flow estimation - an approach we term Instance Scene Flow. We analyze the importance of each recognition cue in an ablation study and observe that the instance segmentation cue is by far strongest, in our setting. We demonstrate the effectiveness of our method on the challenging KITTI 2015 scene flow benchmark where we achieve state-of-the-art performance at the time of submission.

Citations

Proceedings Article

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume

Deqing Sun, +3 more

TL;DR: PWC-Net uses the current optical flow estimate to warp the CNN features of the second image, builds a cost volume from the warped features and the features of the first image, and processes that cost volume with a CNN to estimate the optical flow. It achieves state-of-the-art performance on the MPI Sintel final pass and KITTI 2015 benchmarks.

Proceedings Article

GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose

Zhichao Yin, +1 more

TL;DR: GeoNet proposes an adaptive geometric consistency loss that increases robustness to outliers and non-Lambertian regions and effectively resolves occlusions and texture ambiguities.

Posted Content

GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose

Zhichao Yin, +1 more

- 06 Mar 2018 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: An adaptive geometric consistency loss is proposed to increase robustness to outliers and non-Lambertian regions, which effectively resolves occlusions and texture ambiguities. The method achieves state-of-the-art results in all three tasks, performing better than previous unsupervised methods and comparably with supervised ones.

Proceedings Article

M3D-RPN: Monocular 3D Region Proposal Network for Object Detection

Garrick Brazil, +1 more

TL;DR: M3D-RPN is able to significantly improve the performance of both monocular 3D Object Detection and Bird's Eye View tasks within the KITTI urban autonomous driving dataset, while efficiently using a shared multi-class model.

Book Chapter

SegStereo: Exploiting Semantic Information for Disparity Estimation

Guorun Yang, +4 more

TL;DR: This paper suggests that appropriate incorporation of semantic cues can greatly rectify prediction in commonly-used disparity estimation frameworks and proposes a unified model SegStereo, which employs semantic features from segmentation and introduces semantic softmax loss, which helps improve the prediction accuracy of disparity maps.

References

Book Chapter

Joint estimation of motion, structure and geometry from stereo sequences

Levi Valgaerts, +5 more

TL;DR: A joint energy functional is proposed that integrates spatial and temporal information from two subsequent image pairs subject to an unknown stereo setup, together with a normalisation of image and stereo constraints such that deviations from model assumptions can be interpreted in a geometrical way.

Proceedings Article

Beat the MTurkers: Automatic Image Labeling from Weak 3D Supervision

Liang-Chieh Chen, +2 more

TL;DR: 3D information is exploited to automatically generate very accurate object segmentations given annotated 3D bounding boxes in a binary Markov random field which exploits appearance models, stereo and/or noisy point clouds, a repository of 3D CAD models as well as topological constraints.

Proceedings Article

A Fully-Connected Layered Model of Foreground and Background Flow

Deqing Sun, +4 more

TL;DR: This work formulates a fully-connected layered model that enables global reasoning about the complicated segmentations of real objects and combines it with a layered flow model. The long-range connections greatly improve segmentation into figure-ground layers compared with locally connected MRF models.

Proceedings Article

Multi-view scene flow estimation: A view centered variational approach

Tali Basha, +2 more

TL;DR: A unified global energy functional is proposed to incorporate the information from the available sequences and simultaneously recover both depth and scene flow. The functional is minimized successfully even though the optimization problem is non-convex.

Proceedings Article

SphereFlow: 6 DoF Scene Flow from RGB-D Pairs

Michael Hornacek, +2 more

TL;DR: By reasoning in terms of patches under 6 DoF rigid body motions in 3D, this work succeeds in obtaining compelling results at displacements large and small without relying on either of two simplifying assumptions that pervade much of the earlier literature: brightness constancy or local surface planarity.

Related Papers (5)

Object scene flow for autonomous vehicles

Are we ready for autonomous driving? The KITTI vision benchmark suite

FlowNet: Learning Optical Flow with Convolutional Networks

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume