VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization

doi:10.1109/CVPR.2017.284

Proceedings Article

- pp 2652-2660

TLDR

This paper proposes a recurrent model for 6-DoF localization of video clips. Even with short sequences, the model smooths the pose estimates and drastically reduces the localization error.

Abstract:

Machine learning techniques, namely convolutional neural networks (CNN) and regression forests, have recently shown great promise in performing 6-DoF localization of monocular images. However, in most cases image sequences, rather than only single images, are readily available. Despite this, none of the proposed learning-based approaches exploit the valuable constraint of temporal smoothness, often leading to situations where the per-frame error is larger than the camera motion. In this paper we propose a recurrent model for performing 6-DoF localization of video clips. We find that, even by considering only short sequences (20 frames), the pose estimates are smoothed and the localization error can be drastically reduced. Finally, we consider means of obtaining probabilistic pose estimates from our model. We evaluate our method on openly available real-world autonomous driving and indoor localization datasets.
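
The abstract refers to the localization error of 6-DoF pose estimates without saying how it is measured. The following is a minimal sketch of one common way to measure it, under assumptions that do not come from the paper: each pose is taken to be a 3D position plus a unit quaternion in (w, x, y, z) order, and the function names are hypothetical. It is not the authors' evaluation code.

```python
import math


def position_error(p_est, p_true):
    """Euclidean distance between estimated and true camera positions."""
    return math.dist(p_est, p_true)


def rotation_error_deg(q_est, q_true):
    """Angle in degrees of the rotation separating two unit quaternions.

    Quaternions are assumed to be (w, x, y, z) and already normalized.
    The absolute value handles the fact that q and -q encode the same rotation.
    """
    dot = abs(sum(a * b for a, b in zip(q_est, q_true)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


def mean_clip_errors(estimated, ground_truth):
    """Mean position and rotation error over a clip.

    Both arguments are equal-length lists of (position, quaternion) pairs,
    one pair per frame (an assumed layout, for illustration only).
    """
    pos = [position_error(pe, pt)
           for (pe, _), (pt, _) in zip(estimated, ground_truth)]
    rot = [rotation_error_deg(qe, qt)
           for (_, qe), (_, qt) in zip(estimated, ground_truth)]
    return sum(pos) / len(pos), sum(rot) / len(rot)
```

Comparing the per-frame position error against the distance the camera travels between consecutive frames shows whether the error exceeds the camera motion, the situation the abstract describes.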

Citations

Proceedings Article

Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions

Torsten Sattler, +14 more

TL;DR: This paper introduces the first benchmark datasets specifically designed for analyzing the impact of day-night changes, weather and seasonal variations on visual localization, and points to sequence-based localization approaches and better local features as avenues for future work.

Journal Article

The ApolloScape Open Dataset for Autonomous Driving and Its Application

Xinyu Huang, +5 more

- 01 Oct 2020 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This paper provides a sensor fusion scheme integrating camera videos, consumer-grade motion sensors (GPS/IMU), and a 3D semantic map in order to achieve robust self-localization and semantic segmentation for autonomous driving.

Proceedings Article

Geometry-Aware Learning of Maps for Camera Localization

Samarth Brahmbhatt, +4 more

TL;DR: In this article, the authors propose to represent maps as a deep neural network called MapNet, which enables learning a data-driven map representation and fuses images with other sensory inputs, such as visual odometry and GPS, for camera localization.

Proceedings Article

CodeSLAM - Learning a Compact, Optimisable Representation for Dense Visual SLAM

Michael Bloesch, +4 more

TL;DR: In this paper, a dense representation of scene geometry is presented, which is conditioned on the intensity data from a single image and generated from a code consisting of a small number of parameters. The representation is demonstrated in a keyframe-based monocular dense SLAM system.

Proceedings Article

L3-Net: Towards Learning Based LiDAR Localization for Autonomous Driving

Lu Weixin, +4 more

TL;DR: This work innovatively implements the use of various deep neural network structures to establish a learning-based LiDAR localization system that achieves centimeter-level localization accuracy, comparable to prior state-of-the-art systems with hand-crafted pipelines.

References

Journal Article

Long short-term memory

Sepp Hochreiter, +1 more

- 01 Nov 1997 -

Neural Computation

TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.

Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan, +1 more

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Proceedings Article

Going deeper with convolutions

Christian Szegedy, +8 more

TL;DR: This paper proposes Inception, a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

Journal Article

Bidirectional recurrent neural networks

Mike Schuster, +1 more

- 01 Nov 1997 -

IEEE Transactions on Signal Processing

TL;DR: It is shown how the proposed bidirectional structure can be easily modified to allow efficient estimation of the conditional posterior probability of complete symbol sequences without making any explicit assumption about the shape of the distribution.

Proceedings Article

KinectFusion: Real-time dense surface mapping and tracking

Richard Newcombe, +9 more

TL;DR: A system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware, which fuses all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real time.

Related Papers (5)

PoseNet: A Convolutional Network for Real-Time 6-DOF Camera Relocalization

Geometric Loss Functions for Camera Pose Regression with Deep Learning

Scene Coordinate Regression Forests for Camera Relocalization in RGB-D Images

Deep Residual Learning for Image Recognition

ORB-SLAM: A Versatile and Accurate Monocular SLAM System