Open Access Proceedings ArticleDOI

Deep learning features at scale for visual place recognition

TLDR
This paper trains, at large scale, two CNN architectures for the specific place recognition task and employs a multi-scale feature encoding method to generate condition- and viewpoint-invariant features.
Abstract
The success of deep learning techniques in the computer vision domain has triggered a range of initial investigations into their utility for visual place recognition, all using generic features from networks that were trained for other types of recognition tasks. In this paper, we train, at large scale, two CNN architectures for the specific place recognition task and employ a multi-scale feature encoding method to generate condition- and viewpoint-invariant features. To enable this training to occur, we have developed a massive Specific PlacEs Dataset (SPED) with hundreds of examples of place appearance change at thousands of different places, as opposed to the semantic place type datasets currently available. This new dataset enables us to set up a training regime that interprets place recognition as a classification problem. We comprehensively evaluate our trained networks on several challenging benchmark place recognition datasets and demonstrate that they achieve an average 10% increase in performance over other place recognition algorithms and pre-trained CNNs. By analyzing the network responses and their differences from pre-trained networks, we provide insights into what a network learns when training for place recognition, and what these results signify for future research in this area.
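The abstract describes a training regime that interprets place recognition as a classification problem: each training image is labeled with the ID of the specific place it depicts, and the network is trained with a softmax over place IDs. A minimal numpy sketch of that loss, with illustrative toy logits (the actual network architectures and the SPED dataset are not reproduced here):

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max before exponentiating for numerical stability.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, place_ids):
    # Standard classification loss: negative log-probability of the true place ID,
    # averaged over the batch. Place recognition as classification means each
    # distinct place is one output class.
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(place_ids)), place_ids]))

# Toy batch: 3 images, 4 candidate places (values are illustrative only).
logits = np.array([[2.0, 0.1, 0.1, 0.1],
                   [0.1, 2.0, 0.1, 0.1],
                   [0.1, 0.1, 2.0, 0.1]])
labels = np.array([0, 1, 2])
loss = cross_entropy(logits, labels)
```

In this framing, condition invariance comes from the data rather than the loss: the hundreds of appearance-change examples per place in SPED force the network to map all conditions of one place to the same class.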


Citations
Proceedings ArticleDOI

When Deep Meets Shallow: Subspace-Based Multi-View Fusion for Instance-Level Image Retrieval

TL;DR: A subspace-based multi-view fusion strategy in which a shared subspace is uncovered from the original high-dimensional features, yielding a compact latent representation suited to various real-time robotic vision tasks, e.g. place recognition and scene description.
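The general idea behind subspace-based fusion can be illustrated with a small sketch: concatenate the per-image feature views, center them, and factor out a low-dimensional shared subspace via SVD. This is only a generic illustration under assumed dimensions (the feature matrices and 16-D subspace below are hypothetical, not the cited paper's exact method):

```python
import numpy as np

rng = np.random.default_rng(0)
deep = rng.normal(size=(100, 64))     # hypothetical deep features, one row per image
shallow = rng.normal(size=(100, 32))  # hypothetical hand-crafted (shallow) features

views = np.hstack([deep, shallow])    # stack the two views side by side
views = views - views.mean(axis=0)    # center before factorization
_, _, vt = np.linalg.svd(views, full_matrices=False)
latent = views @ vt[:16].T            # project onto a compact 16-D shared subspace
```

The compact `latent` vectors can then be compared by nearest-neighbor search, which is what makes such representations attractive for real-time place recognition.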
Proceedings ArticleDOI

RGB2LIDAR: Towards Solving Large-Scale Cross-Modal Visual Localization

TL;DR: A novel joint-embedding method that combines appearance and semantic cues from both modalities to handle drastic cross-modal variations, representing a significant advance over prior work in both performance and scale.
Proceedings ArticleDOI

LatentSLAM: unsupervised multi-sensor representation learning for localization and mapping

TL;DR: An unsupervised representation learning method that yields low-dimensional latent state descriptors usable with RatSLAM; the approach applies to any sensor modality, such as camera images, radar range-Doppler maps, and lidar scans.
Posted Content

CAMAL: Context-Aware Multi-scale Attention framework for Lightweight Visual Place Recognition

TL;DR: A lightweight CNN-based VPR technique that captures multi-layer, context-aware attention robust to changing environments and viewpoints, and outperforms state-of-the-art VPR methods at lower memory and resource utilization.
Posted Content

Same Features, Different Day: Weakly Supervised Feature Learning for Seasonal Invariance

TL;DR: Deja-Vu, a weakly supervised approach to learning season-invariant features that requires no pixel-wise ground truth; it is trained to produce "similar" dense feature maps for corresponding locations despite environmental changes.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: A deep convolutional neural network consisting of five convolutional layers, some followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieving state-of-the-art classification performance on ImageNet.
Proceedings ArticleDOI

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

TL;DR: R-CNN combines CNNs with bottom-up region proposals to localize and segment objects; when labeled training data is scarce, supervised pre-training on an auxiliary task followed by domain-specific fine-tuning yields a significant performance boost.
Posted Content

Rich feature hierarchies for accurate object detection and semantic segmentation

TL;DR: This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%.
Posted Content

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Proceedings ArticleDOI

Dimensionality Reduction by Learning an Invariant Mapping

TL;DR: This work presents a method - called Dimensionality Reduction by Learning an Invariant Mapping (DrLIM) - for learning a globally coherent nonlinear function that maps the data evenly to the output manifold.
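DrLIM's mapping is trained with a contrastive loss on pairs: similar pairs are pulled together in the output space, while dissimilar pairs are pushed apart until they exceed a margin. A minimal sketch of that pairwise loss on an embedding distance `d` (the margin value is illustrative):

```python
def contrastive_loss(d, similar, margin=1.0):
    # DrLIM-style contrastive loss on the distance d between two embeddings:
    # similar pairs are penalized by their squared distance (pulled together),
    # dissimilar pairs are penalized only while closer than the margin
    # (pushed apart), so well-separated dissimilar pairs contribute zero loss.
    if similar:
        return 0.5 * d ** 2
    return 0.5 * max(0.0, margin - d) ** 2
```

The margin is what makes the mapping globally coherent: dissimilar pairs already separated by more than the margin exert no gradient, so the function need not scatter them arbitrarily far apart.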