Deep Virtual Stereo Odometry: Leveraging Deep Depth Prediction for Monocular Direct Sparse Odometry
Nan Yang, Rui Wang, Jörg Stückler, Daniel Cremers
- pp 835-852
TLDR
The Deep Virtual Stereo Odometry incorporates deep depth predictions into Direct Sparse Odometry (DSO) as direct virtual stereo measurements and designs a novel deep network that refines predicted depth from a single image in a two-stage process.
Abstract:
Monocular visual odometry approaches that rely purely on geometric cues are prone to scale drift and require sufficient motion parallax in successive frames for motion estimation and 3D reconstruction. In this paper, we propose to leverage deep monocular depth prediction to overcome limitations of geometry-based monocular visual odometry. To this end, we incorporate deep depth predictions into Direct Sparse Odometry (DSO) as direct virtual stereo measurements. For depth prediction, we design a novel deep network that refines predicted depth from a single image in a two-stage process. We train our network in a semi-supervised way on photoconsistency in stereo images and on consistency with accurate sparse depth reconstructions from Stereo DSO. Our deep predictions outperform state-of-the-art approaches for monocular depth on the KITTI benchmark. Moreover, our Deep Virtual Stereo Odometry clearly exceeds previous monocular and deep-learning-based methods in accuracy. It even achieves comparable performance to the state-of-the-art stereo methods, while relying on only a single camera.
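The abstract mentions two ideas that a short example can make concrete: turning a predicted disparity into metric depth for a rectified stereo rig, and scoring a disparity map by photoconsistency between the stereo images. The following is a minimal sketch under stated assumptions, not the authors' network loss or DSO's energy term. The function names, the plain L1 error, the row-wise linear interpolation, and the convention that a left pixel at column x matches the right pixel at column x - d are all illustrative choices.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert disparity (pixels) to depth (metres) for a rectified stereo pair.

    Uses depth = focal_length * baseline / disparity. The eps clamp is an
    assumption added here to avoid division by zero.
    """
    return focal_px * baseline_m / np.maximum(disparity, eps)


def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d(x, y).

    Hypothetical helper: grayscale images only, linear interpolation along
    each row, source coordinates clamped to the image border.
    """
    right = np.asarray(right, dtype=np.float64)
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)
    warped = np.empty((h, w), dtype=np.float64)
    for y in range(h):
        src_x = np.clip(xs - disparity[y], 0, w - 1)
        warped[y] = np.interp(src_x, xs, right[y])
    return warped


def photoconsistency_l1(left, right, disparity):
    """Mean absolute difference between the left image and the warped right image."""
    return float(np.mean(np.abs(left - warp_right_to_left(right, disparity))))


if __name__ == "__main__":
    # Synthetic example: a horizontal intensity ramp and a constant disparity of 2 px.
    right = np.tile(np.arange(8, dtype=np.float64), (4, 1))
    true_disp = np.full((4, 8), 2.0)
    left = warp_right_to_left(right, true_disp)

    print(photoconsistency_l1(left, right, true_disp))          # correct disparity
    print(photoconsistency_l1(left, right, np.zeros((4, 8))))   # wrong disparity
    print(disparity_to_depth(np.array([35.0]), focal_px=700.0, baseline_m=0.5))
```

A disparity map that explains the stereo pair yields a low photoconsistency error, which is what makes the error usable as a training signal without ground-truth depth. Learned systems typically use richer error terms than the plain L1 difference shown here.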
Citations
Proceedings ArticleDOI
Digging Into Self-Supervised Monocular Depth Estimation
TL;DR: In this paper, the authors propose a set of improvements that together yield both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods. They demonstrate the effectiveness of each component in isolation and show high-quality, state-of-the-art results on the KITTI benchmark.
Posted Content
Digging Into Self-Supervised Monocular Depth Estimation
TL;DR: It is shown that a surprisingly simple model, and associated design choices, lead to superior predictions, and together result in both quantitatively and qualitatively improved depth maps compared to competing self-supervised methods.
Proceedings ArticleDOI
D3VO: Deep Depth, Deep Pose and Deep Uncertainty for Monocular Visual Odometry
TL;DR: This paper proposes a self-supervised monocular depth estimation network trained on stereo videos without any external supervision, which aligns the training image pairs to similar lighting conditions using predictive brightness transformation parameters.
Proceedings ArticleDOI
3D Packing for Self-Supervised Monocular Depth Estimation
TL;DR: This paper proposes a self-supervised monocular depth estimation method combining geometry with a new deep network, PackNet, learned only from unlabeled monocular videos, which leverages symmetrical packing and unpacking blocks to jointly learn to compress and decompress detail-preserving representations using 3D convolutions.
Journal ArticleDOI
CubeSLAM: Monocular 3-D Object SLAM
Shichao Yang, Sebastian Scherer
TL;DR: The SLAM method achieves state-of-the-art monocular camera pose estimation and, at the same time, improves 3-D object detection accuracy.
References
Proceedings ArticleDOI
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; their results won 1st place on the ILSVRC 2015 classification task.
Proceedings Article
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Journal ArticleDOI
Image quality assessment: from error visibility to structural similarity
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and it is evaluated against both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
Proceedings ArticleDOI
Fully convolutional networks for semantic segmentation
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Book
Multiple view geometry in computer vision
Richard Hartley, Andrew Zisserman
TL;DR: In this article, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.