Bounding Boxes, Segmentations and Object Coordinates: How Important is Recognition for 3D Scene Flow Estimation in Autonomous Driving Scenarios?

doi:10.1109/ICCV.2017.281

Proceedings Article

pp. 2593-2602

TLDR

The importance of recognition granularity is investigated, from coarse 2D bounding box estimates through 2D instance segmentations to fine-grained 3D object part predictions, and it is observed that the instance segmentation cue is by far the strongest in the authors' setting.

Abstract:

Existing methods for 3D scene flow estimation often fail in the presence of large displacements or local ambiguities, e.g., at texture-less or reflective surfaces. However, these challenges are omnipresent in dynamic road scenes, which are the focus of this work. Our main contribution is to overcome these 3D motion estimation problems by exploiting recognition. In particular, we investigate the importance of recognition granularity, from coarse 2D bounding box estimates through 2D instance segmentations to fine-grained 3D object part predictions. We compute these cues using CNNs trained on a newly annotated dataset of stereo images and integrate them into a CRF-based model for robust 3D scene flow estimation, an approach we term Instance Scene Flow. We analyze the importance of each recognition cue in an ablation study and observe that the instance segmentation cue is by far the strongest in our setting. We demonstrate the effectiveness of our method on the challenging KITTI 2015 scene flow benchmark, where we achieve state-of-the-art performance at the time of submission.
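To make concrete why an instance segmentation cue helps, the sketch below illustrates the piecewise-rigid motion assumption common in object-level scene flow models (such as Object Scene Flow, listed under related papers): each object instance, and typically the background under camera ego-motion, is assumed to move rigidly. Given a mask, plus one rotation R and translation t per instance, the 3D flow of every pixel in that instance follows directly. This is an illustration under stated assumptions, not the authors' CRF inference. The function name `rigid_scene_flow`, the (H, W, 3) point layout, and the convention that id 0 denotes the background are all hypothetical.

```python
import numpy as np


def rigid_scene_flow(points, instance_ids, motions):
    """Per-pixel 3D scene flow under a piecewise-rigid motion assumption.

    points:       (H, W, 3) array of 3D points in the reference frame
                  (e.g. triangulated from a stereo disparity map).
    instance_ids: (H, W) int array; by this sketch's convention 0 is the
                  background and k > 0 is object instance k.
    motions:      dict mapping instance id -> (R, t), where R is a 3x3
                  rotation matrix and t a length-3 translation vector.

    Returns an (H, W, 3) array of 3D displacements. Pixels whose id has
    no entry in `motions` keep a zero flow vector.
    """
    flow = np.zeros(points.shape, dtype=float)
    for k, (R, t) in motions.items():
        mask = instance_ids == k   # boolean mask selecting instance k
        X = points[mask]           # (N, 3) points belonging to instance k
        # Rigid motion X' = R X + t applied row-wise; flow is X' - X.
        flow[mask] = X @ R.T + t - X
    return flow


# Usage: a 2x2 image whose top row is static background and whose
# bottom row is one object translating 1 unit along the z axis.
points = np.arange(12, dtype=float).reshape(2, 2, 3)
ids = np.array([[0, 0], [1, 1]])
motions = {
    0: (np.eye(3), np.zeros(3)),                # background: no ego-motion
    1: (np.eye(3), np.array([0.0, 0.0, 1.0])),  # object: pure translation
}
flow = rigid_scene_flow(points, ids, motions)
# flow[1, 0] and flow[1, 1] are both [0, 0, 1]; the top row is all zeros.
```

The design point is that, once pixels are grouped by instance, a handful of rigid-motion parameters explains the motion of many pixels, which is one reason an accurate instance mask can help in textureless or reflective regions where per-pixel matching is ambiguous.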

Citations


Posted Content

iPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects

Omid Jafari et al.

05 Dec 2017

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work presents the first deep learning-based system that estimates accurate poses for partly occluded objects from RGB-D and RGB input with a new instance-aware pipeline that decomposes 6D object pose estimation into a sequence of simpler steps, where each step removes specific aspects of the problem.


Posted Content

Mono-SF: Multi-View Geometry Meets Single-View Depth for Monocular Scene Flow Estimation of Dynamic Traffic Scenes

Fabian Brickwedde et al.

17 Aug 2019

arXiv: Computer Vision and Pattern Recog...

TL;DR: A novel monocular 3D scene flow estimation method, called Mono-SF, is proposed that jointly estimates the 3D geometry and motion of the scene by combining multi-view geometry and single-view depth information.


Proceedings Article

A Deep Temporal Fusion Framework for Scene Flow Using a Learnable Motion Model and Occlusions

René Schuster et al.

TL;DR: This work proposes a novel data-driven approach for temporal fusion of scene flow estimates in a multi-frame setup to overcome the issue of occlusion. It provides a fast multi-frame extension for a variety of scene flow estimators, which outperforms the underlying dual-frame approaches.


Posted Content

L3DOC: Lifelong 3D Object Classification

Yuyang Liu et al.

12 Dec 2019

arXiv: Computer Vision and Pattern Recog...

TL;DR: This is the first work to use lifelong learning for the 3D object classification task without model fine-tuning or retraining. The core idea of the proposed L3DOC model is to factorize PointNet from the perspective of lifelong learning, capturing and storing the shared point-knowledge in a layer-wise tensor factorization architecture.


Proceedings Article

Two-Stage Adaptive Object Scene Flow Using Hybrid CNN-CRF Model

Congcong Li et al.

TL;DR: This paper proposes a two-stage adaptive object scene flow estimation method using a hybrid CNN-CRF model, which benefits from both high-quality features and structured modelling capability.




Related Papers (5)

Object scene flow for autonomous vehicles

Are we ready for autonomous driving? The KITTI vision benchmark suite

FlowNet: Learning Optical Flow with Convolutional Networks

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume