Monocular Depth Estimation through Virtual-world Supervision and Real-world SfM Self-Supervision.

Open Access, Posted Content

22 Mar 2021

arXiv: Computer Vision and Pattern Recog...

TLDR

This article proposes monocular depth estimation through virtual-world supervision (MonoDEVS) and real-world SfM self-supervision. It compensates for the limitations of SfM self-supervision by leveraging virtual-world images with accurate semantic and depth supervision and by addressing the virtual-to-real domain gap.

Abstract:

Depth information is essential for on-board perception in autonomous driving and driver assistance. Monocular depth estimation (MDE) is very appealing because it puts appearance and depth in direct pixelwise correspondence without further calibration. The best MDE models are based on Convolutional Neural Networks (CNNs) trained in a supervised manner, i.e., assuming pixelwise ground truth (GT). Usually, this GT is acquired at training time through a calibrated multi-modal suite of sensors. However, using only a monocular system at training time as well is cheaper and more scalable. This is possible by relying on structure-from-motion (SfM) principles to generate self-supervision. Nevertheless, problems of camouflaged objects, visibility changes, static-camera intervals, textureless areas, and scale ambiguity diminish the usefulness of such self-supervision. In this paper, we perform monocular depth estimation by virtual-world supervision (MonoDEVS) and real-world SfM self-supervision. We compensate for the limitations of SfM self-supervision by leveraging virtual-world images with accurate semantic and depth supervision and by addressing the virtual-to-real domain gap. Our MonoDEVSNet outperforms previous MDE CNNs trained on monocular and even stereo sequences.
