Proceedings Article
Bounding Boxes, Segmentations and Object Coordinates: How Important is Recognition for 3D Scene Flow Estimation in Autonomous Driving Scenarios?
Aseem Behl, Omid Hosseini Jafari, Siva Karthik Mustikovela, Hassan Abu Alhaija, Carsten Rother, Andreas Geiger
pp. 2593-2602
TLDR
The importance of recognition granularity is investigated, from coarse 2D bounding box estimates over 2D instance segmentations to fine-grained 3D object part predictions, and the instance segmentation cue is observed to be by far the strongest in the authors' setting.
Abstract:
Existing methods for 3D scene flow estimation often fail in the presence of large displacements or local ambiguities, e.g., at texture-less or reflective surfaces. However, these challenges are omnipresent in dynamic road scenes, which are the focus of this work. Our main contribution is to overcome these 3D motion estimation problems by exploiting recognition. In particular, we investigate the importance of recognition granularity, from coarse 2D bounding box estimates over 2D instance segmentations to fine-grained 3D object part predictions. We compute these cues using CNNs trained on a newly annotated dataset of stereo images and integrate them into a CRF-based model for robust 3D scene flow estimation, an approach we term Instance Scene Flow. We analyze the importance of each recognition cue in an ablation study and observe that the instance segmentation cue is by far the strongest in our setting. We demonstrate the effectiveness of our method on the challenging KITTI 2015 scene flow benchmark, where we achieve state-of-the-art performance at the time of submission.
Citations
Posted Content
UPFlow: Upsampling Pyramid for Unsupervised Optical Flow Learning
TL;DR: This work designs a self-guided upsample module to tackle the interpolation blur caused by bilinear upsampling between pyramid levels, and proposes a pyramid distillation loss that supervises intermediate levels by distilling the finest-level flow into pseudo labels.
Proceedings Article
Learning to Segment Rigid Motions from Two Frames
Gengshan Yang, Deva Ramanan
TL;DR: This work proposes a modular network, whose architecture is motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field, and achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
Book Chapter
Image-to-Voxel Model Translation with Conditional Adversarial Networks
TL;DR: The framework is evaluated on 3D shape datasets, showing that it delivers robust 3D scene reconstruction results that compete with and surpass the state of the art for scenes containing multiple non-rigid objects.
Proceedings Article
RAFT-3D: Scene Flow using Rigid-Motion Embeddings
Zachary Teed, Jia Deng
TL;DR: The authors propose a new deep architecture for scene flow that builds on the RAFT model developed for optical flow but iteratively updates a dense field of pixelwise SE(3) motions instead of 2D motion.
Feature Learning for Scene Flow Estimation from LIDAR
Arash K. Ushani, Ryan M. Eustice
TL;DR: An encoding network is trained to learn features from an occupancy grid that discriminate between matching and non-matching locations across successive timesteps; these features are then used to estimate scene flow.
References
Proceedings Article
Fast R-CNN
TL;DR: The authors propose Fast R-CNN, a Fast Region-based Convolutional Network method for object detection that employs several innovations to improve training and testing speed while also increasing detection accuracy, achieving a higher mAP on PASCAL VOC 2012 than R-CNN.
Posted Content
Fast R-CNN
TL;DR: This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection that builds on previous work to efficiently classify object proposals using deep convolutional networks.
Posted Content
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell
TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Proceedings Article
Are we ready for autonomous driving? The KITTI vision benchmark suite
TL;DR: The authors use their autonomous driving platform to develop novel, challenging benchmarks for stereo, optical flow, visual odometry/SLAM and 3D object detection, revealing that methods ranking high on established datasets such as Middlebury perform below average when moved outside the laboratory into the real world.
Proceedings Article
Caffe: Convolutional Architecture for Fast Feature Embedding
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell
TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.