End-to-End Learning of Geometry and Context for Deep Stereo Regression
Alex Kendall, Hayk Martirosyan, Saumitro Dasgupta, Peter Henry, +3 more
pp. 66-75
TLDR
A novel deep learning architecture for regressing disparity from a rectified pair of stereo images is proposed, leveraging knowledge of the problem's geometry to form a cost volume using deep feature representations and incorporating contextual information using 3-D convolutions over this volume.
Abstract
We propose a novel deep learning architecture for regressing disparity from a rectified pair of stereo images. We leverage knowledge of the problem's geometry to form a cost volume using deep feature representations. We learn to incorporate contextual information using 3-D convolutions over this volume. Disparity values are regressed from the cost volume using a proposed differentiable soft argmin operation, which allows us to train our method end-to-end to sub-pixel accuracy without any additional post-processing or regularization. We evaluate our method on the Scene Flow and KITTI datasets and on KITTI we set a new state-of-the-art benchmark, while being significantly faster than competing approaches.
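The soft argmin operation described in the abstract can be made concrete with a minimal NumPy sketch. This is an illustrative reconstruction, not the paper's implementation: it takes a softmax over negated matching costs along the disparity axis and returns the probability-weighted sum of candidate disparities, which is differentiable and yields sub-pixel estimates. The function name and the toy cost values below are invented for the example.

```python
import numpy as np

def soft_argmin(cost_volume, axis=0):
    """Differentiable soft argmin over the disparity axis.

    Rather than a hard argmin over matching costs, take a softmax of the
    negated costs and use the resulting probabilities to weight each
    candidate disparity index, giving a sub-pixel disparity estimate.
    """
    neg = -np.asarray(cost_volume, dtype=np.float64)
    neg = neg - neg.max(axis=axis, keepdims=True)  # numerical stability
    expn = np.exp(neg)
    probs = expn / expn.sum(axis=axis, keepdims=True)
    disparities = np.arange(probs.shape[axis], dtype=np.float64)
    shape = [1] * probs.ndim
    shape[axis] = -1
    return (probs * disparities.reshape(shape)).sum(axis=axis)

# Toy 1-D cost slice: the cost is lowest near disparities 2 and 3,
# so the soft argmin lands between them (sub-pixel).
costs = np.array([5.0, 4.0, 0.5, 0.6, 4.0])
print(soft_argmin(costs))
```

Because the whole operation is built from differentiable primitives (softmax and a weighted sum), gradients flow through it to the cost volume, which is what allows end-to-end training without hand-crafted post-processing.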
Citations
Proceedings ArticleDOI
Unsupervised Learning of Depth and Ego-Motion from Video
TL;DR: In this paper, an unsupervised learning framework for the task of monocular depth and camera motion estimation from unstructured video sequences is presented, which uses single-view depth and multiview pose networks with a loss based on warping nearby views to the target using the computed depth and pose.
Proceedings ArticleDOI
Pyramid Stereo Matching Network
Jia-Ren Chang, Yong-Sheng Chen, +1 more
TL;DR: PSMNet is a pyramid stereo matching network consisting of two main modules: spatial pyramid pooling and a 3D CNN that regularizes the cost volume using stacked hourglass networks in conjunction with intermediate supervision.
Posted Content
Digging Into Self-Supervised Monocular Depth Estimation
TL;DR: It is shown that a surprisingly simple model, and associated design choices, lead to superior predictions, and together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods.
Book ChapterDOI
MVSNet: Depth inference for unstructured multi-view stereo
TL;DR: This work presents an end-to-end deep learning architecture for depth map inference from multi-view images that flexibly adapts arbitrary N-view inputs using a variance-based cost metric that maps multiple features into one cost feature.
Proceedings ArticleDOI
GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose
Zhichao Yin, Jianping Shi, +1 more
TL;DR: GeoNet proposes an adaptive geometric consistency loss to increase robustness towards outliers and non-Lambertian regions, which resolves occlusions and texture ambiguities effectively.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: A deep convolutional neural network, consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieved state-of-the-art performance on ImageNet classification.
Proceedings ArticleDOI
Fully convolutional networks for semantic segmentation
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Proceedings ArticleDOI
Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
TL;DR: R-CNN combines CNNs with bottom-up region proposals to localize and segment objects; when labeled training data is scarce, supervised pre-training on an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.
Proceedings Article
Neural Machine Translation by Jointly Learning to Align and Translate
TL;DR: It is conjectured that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.