Proceedings ArticleDOI

Global localization and relative pose estimation based on scale-invariant features

23 Aug 2004 - Vol. 4, pp 319-322
TL;DR: This work describes a vision-based hybrid localization scheme based on scale-invariant keypoints, demonstrates the efficiency of the location recognition approach, and presents a closed-form solution to relative pose recovery for the case of planar motion and unknown focal length of the camera.
Abstract: The capability of maintaining the pose of the mobile robot is central for basic navigation and map building tasks. In this work we describe a vision-based hybrid localization scheme based on scale-invariant keypoints. In the first stage, topological localization is accomplished by matching the keypoints detected in the current view against a database of model views. Once the best match has been found, the relative pose between the model view and the current image is recovered. We demonstrate the efficiency of the location recognition approach and present a closed-form solution to relative pose recovery for the case of planar motion and unknown focal length of the camera. The approach is demonstrated on several examples of indoor environments.
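As a toy illustration of the first, topological stage (not the paper's implementation, which matches 128-D SIFT descriptors), the model view whose stored keypoints best match the current view can be selected by counting nearest-descriptor matches; `count_matches`, `localize`, and the 2-D "descriptors" below are invented for the sketch:

```python
import math

def count_matches(query_desc, model_desc, max_dist=0.5):
    """Count query keypoint descriptors whose nearest model descriptor
    lies within max_dist (a crude stand-in for real SIFT matching)."""
    return sum(
        1 for q in query_desc
        if min(math.dist(q, m) for m in model_desc) <= max_dist
    )

def localize(query_desc, database):
    """Topological localization: pick the model view with most matches."""
    return max(database, key=lambda name: count_matches(query_desc, database[name]))

# Toy 2-D "descriptors"; real SIFT descriptors are 128-dimensional.
database = {
    "corridor": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "office":   [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}
query = [(5.1, 5.0), (6.0, 5.1)]
print(localize(query, database))  # office
```

Once the best model view is identified this way, the second stage recovers the relative pose from the matched keypoint pairs.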


Citations
Proceedings ArticleDOI
15 May 2006
TL;DR: A 3D SLAM system using information from an actuated laser scanner and camera installed on a mobile robot to detect loop closure events using a novel appearance-based retrieval system that is robust to repetitive visual structure and provides a probabilistic measure of confidence.
Abstract: This paper describes a 3D SLAM system using information from an actuated laser scanner and camera installed on a mobile robot. The laser samples the local geometry of the environment and is used to incrementally build a 3D point-cloud map of the workspace. Sequences of images from the camera are used to detect loop closure events (without reference to the internal estimates of vehicle location) using a novel appearance-based retrieval system. The loop closure detection is robust to repetitive visual structure and provides a probabilistic measure of confidence. The images suggesting loop closure are then further processed with their corresponding local laser scans to yield putative Euclidean image-image transformations. We show how naive application of this transformation to effect the loop closure can lead to catastrophic linearization errors and go on to describe a way in which gross, pre-loop closing errors can be successfully annulled. We demonstrate our system working in a challenging, outdoor setting containing substantial loops and beguiling, gently curving traversals. The results are overlaid on an aerial image to provide a ground truth comparison with the estimated map. The paper concludes with an extension into the multi-robot domain in which 3D maps resulting from distinct SLAM sessions (no common reference frame) are combined without recourse to mutual observation.

378 citations

Proceedings ArticleDOI
10 Apr 2007
TL;DR: This work presents a visual localization and map-learning system that relies on vision only and that is able to incrementally learn to recognize the different rooms of an apartment from any robot position.
Abstract: Localization for low cost humanoid or animal-like personal robots has to rely on cheap sensors and has to be robust to user manipulations of the robot. We present a visual localization and map-learning system that relies on vision only and that is able to incrementally learn to recognize the different rooms of an apartment from any robot position. This system is inspired by visual categorization algorithms called bag of words methods that we modified to make fully incremental and to allow a user-interactive training. Our system is able to reliably recognize the room in which the robot is after a short training time and is stable for long term use. Empirical validation on a real robot and on an image database acquired in real environments are presented.

263 citations

Journal ArticleDOI
TL;DR: This paper extends the loop closing technique to a multi-robot mapping problem in which the outputs of several uncoordinated, SLAM-enabled robots are fused without requiring inter-vehicle observations or a-priori frame alignment.
Abstract: This paper is concerned with "loop closing" for mobile robots. Loop closing is the problem of correctly asserting that a robot has returned to a previously visited area. It is a particularly hard but important component of the Simultaneous Localization and Mapping (SLAM) problem. Here a mobile robot explores an a-priori unknown environment performing on-the-fly mapping while the map is used to localize the vehicle. Many SLAM implementations look to internal map and vehicle estimates (p.d.fs) to make decisions about whether a vehicle is revisiting a previously mapped area or is exploring a new region of workspace. We suggest that one of the reasons loop closing is hard in SLAM is precisely because these internal estimates can, despite best efforts, be in gross error. The "loop closer" we propose, analyze and demonstrate makes no recourse to the metric estimates of the SLAM system it supports and aids---it is entirely independent. At regular intervals the vehicle captures the appearance of the local scene (with camera and laser). We encode the similarity between all possible pairings of scenes in a "similarity matrix". We then pose the loop closing problem as the task of extracting statistically significant sequences of similar scenes from this matrix. We show how suitable analysis (introspection) and decomposition (remediation) of the similarity matrix allows for the reliable detection of loops despite the presence of repetitive and visually ambiguous scenes. We demonstrate the technique supporting a SLAM system driven by scan-matching laser data in a variety of settings. Some of the outdoor settings are beyond the capability of the SLAM system itself in which case GPS was used to provide a ground truth. We further show how the techniques can equally be applied to detect loop closure using spatial images taken with a scanning laser. 
We conclude with an extension of the loop closing technique to a multi-robot mapping problem in which the outputs of several, uncoordinated and SLAM-enabled robots are fused without requiring inter-vehicle observations or a-priori frame alignment.
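The similarity-matrix idea can be sketched in a few lines. The `detect_loops` helper and the binary similarity function below are simplifications invented for illustration; the paper uses richer appearance similarity and statistical-significance analysis of the matrix rather than a fixed threshold:

```python
def similarity_matrix(scenes, sim):
    """Pairwise similarity between all captured scenes."""
    return [[sim(a, b) for b in scenes] for a in scenes]

def detect_loops(M, threshold=0.8, min_len=3):
    """Report runs of consecutive high-similarity scene pairs
    (i..i+run-1 similar to j..j+run-1), which suggest a revisit."""
    n = len(M)
    loops = []
    for i in range(n):
        for j in range(i + min_len, n):  # skip near-diagonal self-similarity
            run = 0
            while i + run < n and j + run < n and M[i + run][j + run] >= threshold:
                run += 1
            if run >= min_len:
                loops.append((i, j, run))
    return loops

# Toy trajectory: places A..E are visited, then A, B, C are revisited.
scenes = list("ABCDEABC")
M = similarity_matrix(scenes, lambda a, b: 1.0 if a == b else 0.0)
print(detect_loops(M))  # [(0, 5, 3)]
```

Requiring a *sequence* of similar scenes, rather than a single similar pair, is what gives robustness to individually ambiguous views.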

244 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a system for autonomous mobile robot navigation with only an omnidirectional camera as sensor, which is able to build automatically and robustly accurate topologically organized environment maps of a complex, natural environment.
Abstract: In this work we present a novel system for autonomous mobile robot navigation. With only an omnidirectional camera as sensor, this system is able to build automatically and robustly accurate topologically organised environment maps of a complex, natural environment. It can localise itself using such a map at each moment, both at startup (the kidnapped robot problem) and using knowledge of former localisations. The topological nature of the map is similar to the intuitive maps humans use, is memory-efficient and enables fast and simple path planning towards a specified goal. We developed a real-time visual servoing technique to steer the system along the computed path. A key technology making this all possible is the novel fast wide baseline feature matching, which yields an efficient description of the scene, with a focus on man-made environments.

189 citations

Journal ArticleDOI
TL;DR: This work proposes a new method for learning overcomplete dictionaries that are adapted to the joint representation of stereo images and applies the learning algorithm to the case of omnidirectional images, where they learn scales of atoms in a parametric dictionary.
Abstract: One of the major challenges in multi-view imaging is the definition of a representation that reveals the intrinsic geometry of the visual information. Sparse image representations with overcomplete geometric dictionaries offer a way to efficiently approximate these images, such that the multi-view geometric structure becomes explicit in the representation. However, the choice of a good dictionary in this case is far from obvious. We propose a new method for learning overcomplete dictionaries that are adapted to the joint representation of stereo images. We first formulate a sparse stereo image model where the multi-view correlation is described by local geometric transforms of dictionary elements (atoms) in two stereo views. A maximum-likelihood (ML) method for learning stereo dictionaries is then proposed, where a multi-view geometry constraint is included in the probabilistic model. The ML objective function is optimized using the expectation-maximization algorithm. We apply the learning algorithm to the case of omnidirectional images, where we learn scales of atoms in a parametric dictionary. The resulting dictionaries provide better performance in the joint representation of stereo omnidirectional images as well as improved multi-view feature matching. We finally discuss and demonstrate the benefits of dictionary learning for distributed scene representation and camera pose estimation.
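As a hedged sketch of the sparse approximation that such overcomplete dictionaries support, the greedy matching-pursuit loop below is a generic stand-in (it is not the paper's ML/EM learning algorithm), and the tiny orthonormal dictionary is invented for illustration:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, dictionary, n_atoms=2):
    """Greedy sparse approximation: repeatedly pick the (unit-norm) atom
    most correlated with the residual and subtract its projection."""
    residual = list(signal)
    code = {}  # atom index -> coefficient
    for _ in range(n_atoms):
        best = max(range(len(dictionary)),
                   key=lambda k: abs(dot(residual, dictionary[k])))
        coef = dot(residual, dictionary[best])
        code[best] = code.get(best, 0.0) + coef
        residual = [r - coef * d for r, d in zip(residual, dictionary[best])]
    return code, residual

# Orthonormal toy dictionary; real overcomplete dictionaries have more
# atoms than signal dimensions, which is what makes learning them hard.
D = [(1.0, 0.0), (0.0, 1.0)]
code, res = matching_pursuit((3.0, 4.0), D)
print(code)  # {1: 4.0, 0: 3.0}
print(res)   # [0.0, 0.0]
```

The learning problem the paper addresses is the step this sketch takes for granted: choosing the atoms themselves so that corresponding features in the two stereo views map onto geometrically transformed versions of the same atom.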

29 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
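The nearest-neighbor matching step described above, with Lowe's ratio test for rejecting ambiguous matches, can be sketched as follows; the toy 2-D descriptors stand in for real 128-D SIFT descriptors, and the 0.8 ratio is the commonly used value rather than anything mandated by the paper:

```python
import math

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Accept a match only if the nearest neighbor is significantly
    closer than the second nearest; ambiguous matches are discarded."""
    matches = []
    for i, a in enumerate(desc_a):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

desc_a = [(0.0, 0.0), (2.5, 2.5), (10.0, 10.0)]
desc_b = [(0.1, 0.0), (5.0, 5.0), (9.9, 10.0)]
# (2.5, 2.5) is nearly equidistant from two candidates, so it is dropped.
print(ratio_test_matches(desc_a, desc_b))  # [(0, 0), (2, 2)]
```

In the full recognition pipeline, the surviving matches then vote in a Hough transform and are verified by a least-squares pose fit.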

46,906 citations


"Global localization and relative po..." cites this paper for methods

  • ...More detailed discussion about enforcing the separation between the features, sampling of the scale space and improvement in feature localization can be found in [8, 4]. Once the location and scale have been assigned to candidate keypoints, the dominant orientation is computed by determining peaks in the orientation histogram of its local neighborhood. (Figure 1. Examples of scale invariant keypoints.)...

  • ...In this paper we examine the effectiveness of scale-invariant (SIFT) features proposed by [8]....

  • ...Commonly used representations are responses to banks of filters [17], multi-dimensional histograms [12, 7], local Fourier-transforms [15] and affine invariant feature descriptors [8]....

  • ...Our approach is motivated by the recent advances in object recognition using local scale invariant features proposed by [8] and adopts the strategy for localization by means of location recognition....

Journal ArticleDOI
TL;DR: This paper addresses the problem of retrieving images from large image databases using local grayvalue invariants computed at automatically detected interest points; indexing allows for efficient retrieval from a database of more than 1,000 images.
Abstract: This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.

1,756 citations


"Global localization and relative po..." cites this paper for background

  • ...The use of the local feature detectors in the context of object recognition has been demonstrated successfully by several researchers in the past [13, 11]....

Proceedings ArticleDOI
20 Jun 2009
TL;DR: This paper proposes a prototype based model that successfully combines local and global discriminative information and significantly outperforms a state of the art classifier on the indoor scene recognition task.
Abstract: Indoor scene recognition is a challenging open problem in high level vision. Most scene recognition models that work well for outdoor scenes perform poorly in the indoor domain. The main difficulty is that while some indoor scenes (e.g. corridors) can be well characterized by global spatial properties, others (e.g. bookstores) are better characterized by the objects they contain. More generally, to address the indoor scene recognition problem we need a model that can exploit local and global discriminative information. In this paper we propose a prototype based model that can successfully combine both sources of information. To test our approach we created a dataset of 67 indoor scene categories (the largest available) covering a wide range of domains. The results show that our approach can significantly outperform a state of the art classifier for the task.

1,517 citations

Book
14 Nov 2003
TL;DR: In this paper, the authors introduce the geometry of 3D vision, that is, the reconstruction of 3-D models of objects from a collection of 2-D images, and develop practical reconstruction algorithms and discuss possible extensions of the theory.
Abstract: This book introduces the geometry of 3-D vision, that is, the reconstruction of 3-D models of objects from a collection of 2-D images. It details the classic theory of two view geometry and shows that a more proper tool for studying the geometry of multiple views is the so-called rank consideration of the multiple view matrix. It also develops practical reconstruction algorithms and discusses possible extensions of the theory.

1,136 citations

Proceedings ArticleDOI
01 Sep 2002
TL;DR: This work introduces a family of features which use groups of interest points to form geometrically invariant descriptors of image regions, ensuring robust matching between images in which there are large changes in viewpoint, scale and illumination.
Abstract: This paper approaches the problem of finding correspondences between images in which there are large changes in viewpoint, scale and illumination. Recent work has shown that scale-space 'interest points' may be found with good repeatability in spite of such changes. Furthermore, the high entropy of the surrounding image regions means that local descriptors are highly discriminative for matching. For descriptors at interest points to be robustly matched between images, they must be as far as possible invariant to the imaging process. In this work we introduce a family of features which use groups of interest points to form geometrically invariant descriptors of image regions. Feature descriptors are formed by resampling the image relative to canonical frames defined by the points. In addition to robust matching, a key advantage of this approach is that each match implies a hypothesis of the local 2D (projective) transformation. This allows us to immediately reject most of the false matches using a Hough transform. We reject remaining outliers using RANSAC and the epipolar constraint. Results show that dense feature matching can be achieved in a few seconds of computation on 1GHz Pentium III machines.
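The RANSAC outlier-rejection step can be illustrated generically. The sketch below fits a 2-D line rather than the epipolar geometry used in the paper, and the function name and data are invented for the example:

```python
import random

def ransac_line(points, n_iters=200, tol=0.1, seed=0):
    """Generic RANSAC: repeatedly fit a line through two random points
    and keep the model with the largest inlier set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # skip degenerate (vertical) samples
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Five collinear points plus two gross outliers.
pts = [(float(i), float(i)) for i in range(5)] + [(1.0, 9.0), (3.0, -4.0)]
print(len(ransac_line(pts)))  # 5 — the outliers are rejected
```

Fitting with the epipolar constraint instead of a line changes only the minimal sample size and the residual function; the sample-score-keep loop is the same.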

723 citations


"Global localization and relative po..." cites this paper for background

  • ...More detailed discussion about enforcing the separation between the features, sampling of the scale space and improvement in feature localization can be found in [8, 4]....
