VidLoc: A Deep Spatio-Temporal Model for 6-DoF Video-Clip Relocalization

TLDR
This paper proposes a recurrent model for 6-DoF localization of video clips. Even when only short sequences (20 frames) are considered, the pose estimates are smoothed and the localization error can be drastically reduced.
Abstract
Machine learning techniques, namely convolutional neural networks (CNNs) and regression forests, have recently shown great promise in performing 6-DoF localization of monocular images. However, in most cases image sequences, rather than only single images, are readily available. Despite this, none of the proposed learning-based approaches exploits the valuable constraint of temporal smoothness, often leading to situations where the per-frame error is larger than the camera motion. In this paper we propose a recurrent model for performing 6-DoF localization of video clips. We find that, even by considering only short sequences (20 frames), the pose estimates are smoothed and the localization error can be drastically reduced. Finally, we consider means of obtaining probabilistic pose estimates from our model. We evaluate our method on openly available real-world autonomous driving and indoor localization datasets.
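The intuition behind the temporal-smoothness constraint can be illustrated with a toy example. The sketch below is not the paper's recurrent model: it swaps in a plain sliding-window average over 20 frames, applied to a simulated one-dimensional camera track. The function names, the step size (0.05 m per frame), and the noise level (0.3 m) are all assumptions chosen so that the per-frame error exceeds the camera motion, as the abstract describes.

```python
import random


def simulate_track(n_frames, step, noise_sigma, seed=0):
    """Return (true positions, noisy per-frame estimates) along one axis.

    All values are hypothetical; they are not taken from the paper.
    """
    rng = random.Random(seed)
    truth = [i * step for i in range(n_frames)]
    noisy = [x + rng.gauss(0.0, noise_sigma) for x in truth]
    return truth, noisy


def smooth(estimates, window=20):
    """Average each estimate with its neighbours inside a sliding window.

    This is a simple stand-in for temporal smoothing, not a recurrent network.
    The window is truncated at the start and end of the sequence.
    """
    half = window // 2
    out = []
    for i in range(len(estimates)):
        lo = max(0, i - half)
        hi = min(len(estimates), i + half)
        out.append(sum(estimates[lo:hi]) / (hi - lo))
    return out


def mean_abs_error(truth, estimates):
    """Mean absolute position error over all frames."""
    return sum(abs(t - e) for t, e in zip(truth, estimates)) / len(truth)


# Camera moves 0.05 m per frame; independent estimates are off by ~0.3 m.
truth, noisy = simulate_track(n_frames=200, step=0.05, noise_sigma=0.3)
raw_error = mean_abs_error(truth, noisy)
smoothed_error = mean_abs_error(truth, smooth(noisy))
print(f"per-frame error: {raw_error:.3f} m, smoothed error: {smoothed_error:.3f} m")
```

In this toy setting, averaging over neighbouring frames cancels much of the independent per-frame noise. A learned recurrent model could in principle exploit the same redundancy without assuming a fixed averaging window.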

