Proceedings ArticleDOI

Global localization and relative pose estimation based on scale-invariant features

23 Aug 2004 - Vol. 4, pp 319-322
TL;DR: This work describes a vision-based hybrid localization scheme based on scale-invariant keypoints, demonstrates the efficiency of the location recognition approach, and presents a closed-form solution to relative pose recovery for the case of planar motion and unknown focal length of the camera.
Abstract: The capability of maintaining the pose of the mobile robot is central for basic navigation and map building tasks. In this work we describe a vision-based hybrid localization scheme based on scale-invariant keypoints. In the first stage, topological localization is accomplished by matching the keypoints detected in the current view against a database of model views. Once the best match has been found, the relative pose between the model view and the current image is recovered. We demonstrate the efficiency of the location recognition approach and present a closed-form solution to relative pose recovery for the case of planar motion and unknown focal length of the camera. The approach is demonstrated on several examples of indoor environments.
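As a toy illustration of the first, topological stage (not the paper's implementation, which matches 128-D SIFT descriptors), the model view whose stored keypoints best match the current view can be selected by counting nearest-descriptor matches; `count_matches`, `localize`, and the 2-D "descriptors" below are invented for the sketch:

```python
import math

def count_matches(query_desc, model_desc, max_dist=0.5):
    """Count query keypoint descriptors whose nearest model descriptor
    lies within max_dist (a crude stand-in for real SIFT matching)."""
    return sum(
        1 for q in query_desc
        if min(math.dist(q, m) for m in model_desc) <= max_dist
    )

def localize(query_desc, database):
    """Topological localization: pick the model view with most matches."""
    return max(database, key=lambda name: count_matches(query_desc, database[name]))

# Toy 2-D "descriptors"; real SIFT descriptors are 128-dimensional.
database = {
    "corridor": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "office":   [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}
query = [(5.1, 5.0), (6.0, 5.1)]
print(localize(query, database))  # office
```

Once the best model view is identified this way, the second stage recovers the relative pose from the matched keypoint pairs.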


Citations
Proceedings ArticleDOI
15 May 2006
TL;DR: A 3D SLAM system using information from an actuated laser scanner and camera installed on a mobile robot to detect loop closure events using a novel appearance-based retrieval system that is robust to repetitive visual structure and provides a probabilistic measure of confidence.
Abstract: This paper describes a 3D SLAM system using information from an actuated laser scanner and camera installed on a mobile robot. The laser samples the local geometry of the environment and is used to incrementally build a 3D point-cloud map of the workspace. Sequences of images from the camera are used to detect loop closure events (without reference to the internal estimates of vehicle location) using a novel appearance-based retrieval system. The loop closure detection is robust to repetitive visual structure and provides a probabilistic measure of confidence. The images suggesting loop closure are then further processed with their corresponding local laser scans to yield putative Euclidean image-image transformations. We show how naive application of this transformation to effect the loop closure can lead to catastrophic linearization errors and go on to describe a way in which gross, pre-loop closing errors can be successfully annulled. We demonstrate our system working in a challenging, outdoor setting containing substantial loops and beguiling, gently curving traversals. The results are overlaid on an aerial image to provide a ground truth comparison with the estimated map. The paper concludes with an extension into the multi-robot domain in which 3D maps resulting from distinct SLAM sessions (no common reference frame) are combined without recourse to mutual observation.

378 citations

Proceedings ArticleDOI
10 Apr 2007
TL;DR: This work presents a visual localization and map-learning system that relies on vision only and that is able to incrementally learn to recognize the different rooms of an apartment from any robot position.
Abstract: Localization for low cost humanoid or animal-like personal robots has to rely on cheap sensors and has to be robust to user manipulations of the robot. We present a visual localization and map-learning system that relies on vision only and that is able to incrementally learn to recognize the different rooms of an apartment from any robot position. This system is inspired by visual categorization algorithms called bag of words methods that we modified to make fully incremental and to allow a user-interactive training. Our system is able to reliably recognize the room in which the robot is after a short training time and is stable for long term use. Empirical validation on a real robot and on an image database acquired in real environments are presented.

263 citations

Journal ArticleDOI
TL;DR: This paper extends the loop closing technique to a multi-robot mapping problem in which the outputs of several uncoordinated, SLAM-enabled robots are fused without requiring inter-vehicle observations or a-priori frame alignment.
Abstract: This paper is concerned with "loop closing" for mobile robots. Loop closing is the problem of correctly asserting that a robot has returned to a previously visited area. It is a particularly hard but important component of the Simultaneous Localization and Mapping (SLAM) problem. Here a mobile robot explores an a-priori unknown environment performing on-the-fly mapping while the map is used to localize the vehicle. Many SLAM implementations look to internal map and vehicle estimates (p.d.fs) to make decisions about whether a vehicle is revisiting a previously mapped area or is exploring a new region of workspace. We suggest that one of the reasons loop closing is hard in SLAM is precisely because these internal estimates can, despite best efforts, be in gross error. The "loop closer" we propose, analyze and demonstrate makes no recourse to the metric estimates of the SLAM system it supports and aids---it is entirely independent. At regular intervals the vehicle captures the appearance of the local scene (with camera and laser). We encode the similarity between all possible pairings of scenes in a "similarity matrix". We then pose the loop closing problem as the task of extracting statistically significant sequences of similar scenes from this matrix. We show how suitable analysis (introspection) and decomposition (remediation) of the similarity matrix allows for the reliable detection of loops despite the presence of repetitive and visually ambiguous scenes. We demonstrate the technique supporting a SLAM system driven by scan-matching laser data in a variety of settings. Some of the outdoor settings are beyond the capability of the SLAM system itself in which case GPS was used to provide a ground truth. We further show how the techniques can equally be applied to detect loop closure using spatial images taken with a scanning laser. 
We conclude with an extension of the loop closing technique to a multi-robot mapping problem in which the outputs of several, uncoordinated and SLAM-enabled robots are fused without requiring inter-vehicle observations or a-priori frame alignment.
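The similarity-matrix idea can be sketched in a few lines. The `detect_loops` helper and the binary similarity function below are simplifications invented for illustration; the paper uses richer appearance similarity and statistical-significance analysis of the matrix rather than a fixed threshold:

```python
def similarity_matrix(scenes, sim):
    """Pairwise similarity between all captured scenes."""
    return [[sim(a, b) for b in scenes] for a in scenes]

def detect_loops(M, threshold=0.8, min_len=3):
    """Report runs of consecutive high-similarity scene pairs
    (i..i+run-1 similar to j..j+run-1), which suggest a revisit."""
    n = len(M)
    loops = []
    for i in range(n):
        for j in range(i + min_len, n):  # skip near-diagonal self-similarity
            run = 0
            while i + run < n and j + run < n and M[i + run][j + run] >= threshold:
                run += 1
            if run >= min_len:
                loops.append((i, j, run))
    return loops

# Toy trajectory: places A..E are visited, then A, B, C are revisited.
scenes = list("ABCDEABC")
M = similarity_matrix(scenes, lambda a, b: 1.0 if a == b else 0.0)
print(detect_loops(M))  # [(0, 5, 3)]
```

Requiring a *sequence* of similar scenes, rather than a single similar pair, is what gives robustness to individually ambiguous views.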

244 citations

Journal ArticleDOI
TL;DR: In this paper, the authors present a system for autonomous mobile robot navigation with only an omnidirectional camera as sensor, which is able to build automatically and robustly accurate topologically organized environment maps of a complex, natural environment.
Abstract: In this work we present a novel system for autonomous mobile robot navigation. With only an omnidirectional camera as sensor, this system is able to build automatically and robustly accurate topologically organised environment maps of a complex, natural environment. It can localise itself using such a map at each moment, both at startup (the kidnapped robot problem) and using knowledge of former localisations. The topological nature of the map is similar to the intuitive maps humans use, is memory-efficient and enables fast and simple path planning towards a specified goal. We developed a real-time visual servoing technique to steer the system along the computed path. A key technology making this all possible is the novel fast wide baseline feature matching, which yields an efficient description of the scene, with a focus on man-made environments.

189 citations

Journal ArticleDOI
TL;DR: This work proposes a new method for learning overcomplete dictionaries that are adapted to the joint representation of stereo images and applies the learning algorithm to the case of omnidirectional images, where they learn scales of atoms in a parametric dictionary.
Abstract: One of the major challenges in multi-view imaging is the definition of a representation that reveals the intrinsic geometry of the visual information. Sparse image representations with overcomplete geometric dictionaries offer a way to efficiently approximate these images, such that the multi-view geometric structure becomes explicit in the representation. However, the choice of a good dictionary in this case is far from obvious. We propose a new method for learning overcomplete dictionaries that are adapted to the joint representation of stereo images. We first formulate a sparse stereo image model where the multi-view correlation is described by local geometric transforms of dictionary elements (atoms) in two stereo views. A maximum-likelihood (ML) method for learning stereo dictionaries is then proposed, where a multi-view geometry constraint is included in the probabilistic model. The ML objective function is optimized using the expectation-maximization algorithm. We apply the learning algorithm to the case of omnidirectional images, where we learn scales of atoms in a parametric dictionary. The resulting dictionaries provide better performance in the joint representation of stereo omnidirectional images as well as improved multi-view feature matching. We finally discuss and demonstrate the benefits of dictionary learning for distributed scene representation and camera pose estimation.
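As a hedged sketch of the sparse approximation that such overcomplete dictionaries support, the greedy matching-pursuit loop below is a generic stand-in (it is not the paper's ML/EM learning algorithm), and the tiny orthonormal dictionary is invented for illustration:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matching_pursuit(signal, dictionary, n_atoms=2):
    """Greedy sparse approximation: repeatedly pick the (unit-norm) atom
    most correlated with the residual and subtract its projection."""
    residual = list(signal)
    code = {}  # atom index -> coefficient
    for _ in range(n_atoms):
        best = max(range(len(dictionary)),
                   key=lambda k: abs(dot(residual, dictionary[k])))
        coef = dot(residual, dictionary[best])
        code[best] = code.get(best, 0.0) + coef
        residual = [r - coef * d for r, d in zip(residual, dictionary[best])]
    return code, residual

# Orthonormal toy dictionary; real overcomplete dictionaries have more
# atoms than signal dimensions, which is what makes learning them hard.
D = [(1.0, 0.0), (0.0, 1.0)]
code, res = matching_pursuit((3.0, 4.0), D)
print(code)  # {1: 4.0, 0: 3.0}
print(res)   # [0.0, 0.0]
```

The learning problem the paper addresses is the step this sketch takes for granted: choosing the atoms themselves so that corresponding features in the two stereo views map onto geometrically transformed versions of the same atom.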

29 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
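The nearest-neighbor matching step described above, with Lowe's ratio test for rejecting ambiguous matches, can be sketched as follows; the toy 2-D descriptors stand in for real 128-D SIFT descriptors, and the 0.8 ratio is the commonly used value rather than anything mandated by the paper:

```python
import math

def ratio_test_matches(desc_a, desc_b, ratio=0.8):
    """Accept a match only if the nearest neighbor is significantly
    closer than the second nearest; ambiguous matches are discarded."""
    matches = []
    for i, a in enumerate(desc_a):
        dists = sorted((math.dist(a, b), j) for j, b in enumerate(desc_b))
        if len(dists) >= 2 and dists[0][0] < ratio * dists[1][0]:
            matches.append((i, dists[0][1]))
    return matches

desc_a = [(0.0, 0.0), (2.5, 2.5), (10.0, 10.0)]
desc_b = [(0.1, 0.0), (5.0, 5.0), (9.9, 10.0)]
# (2.5, 2.5) is nearly equidistant from two candidates, so it is dropped.
print(ratio_test_matches(desc_a, desc_b))  # [(0, 0), (2, 2)]
```

In the full recognition pipeline, the surviving matches then vote in a Hough transform and are verified by a least-squares pose fit.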

46,906 citations


"Global localization and relative po..." cites this paper for methods

  • ...More detailed discussion about enforcing the separation between the features, sampling of the scale space and improvement in feature localization can be found in [8, 4]. Once the location and scale have been assigned to candidate keypoints, the dominant orientation is computed by determining peaks in the orientation histogram of its local neighborhood. (Figure 1. Examples of scale invariant keypoints.)...

  • ...In this paper we examine the effectiveness of scale-invariant (SIFT) features proposed by [8]....

  • ...Commonly used representations are responses to banks of filters [17], multi-dimensional histograms [12, 7], local Fourier-transforms [15] and affine invariant feature descriptors [8]....

  • ...Our approach is motivated by the recent advances in object recognition using local scale invariant features proposed by [8] and adopts the strategy for localization by means of location recognition....

Journal ArticleDOI
TL;DR: This paper addresses the problem of retrieving images from large image databases using local grayvalue invariants computed at automatically detected interest points; indexing allows for efficient retrieval from a database of more than 1,000 images.
Abstract: This paper addresses the problem of retrieving images from large image databases. The method is based on local grayvalue invariants which are computed at automatically detected interest points. A voting algorithm and semilocal constraints make retrieval possible. Indexing allows for efficient retrieval from a database of more than 1,000 images. Experimental results show correct retrieval in the case of partial visibility, similarity transformations, extraneous features, and small perspective deformations.

1,756 citations


"Global localization and relative po..." cites this paper for background

  • ...The use of the local feature detectors in the context of object recognition has been demonstrated successfully by several researchers in the past [13, 11]....

Proceedings ArticleDOI
20 Jun 2009
TL;DR: This paper proposes a prototype based model that successfully combines local and global discriminative information and significantly outperforms a state of the art classifier on the indoor scene recognition task.
Abstract: Indoor scene recognition is a challenging open problem in high level vision. Most scene recognition models that work well for outdoor scenes perform poorly in the indoor domain. The main difficulty is that while some indoor scenes (e.g. corridors) can be well characterized by global spatial properties, others (e.g. bookstores) are better characterized by the objects they contain. More generally, to address the indoor scene recognition problem we need a model that can exploit local and global discriminative information. In this paper we propose a prototype based model that can successfully combine both sources of information. To test our approach we created a dataset of 67 indoor scene categories (the largest available) covering a wide range of domains. The results show that our approach can significantly outperform a state of the art classifier for the task.

1,517 citations

Book
14 Nov 2003
TL;DR: In this paper, the authors introduce the geometry of 3D vision, that is, the reconstruction of 3-D models of objects from a collection of 2-D images, and develop practical reconstruction algorithms and discuss possible extensions of the theory.
Abstract: This book introduces the geometry of 3-D vision, that is, the reconstruction of 3-D models of objects from a collection of 2-D images. It details the classic theory of two view geometry and shows that a more proper tool for studying the geometry of multiple views is the so-called rank consideration of the multiple view matrix. It also develops practical reconstruction algorithms and discusses possible extensions of the theory.

1,136 citations

Proceedings ArticleDOI
01 Sep 2002
TL;DR: This work introduces a family of features which use groups of interest points to form geometrically invariant descriptors of image regions, ensuring robust matching between images in which there are large changes in viewpoint, scale and illumination.
Abstract: This paper approaches the problem of finding correspondences between images in which there are large changes in viewpoint, scale and illumination. Recent work has shown that scale-space 'interest points' may be found with good repeatability in spite of such changes. Furthermore, the high entropy of the surrounding image regions means that local descriptors are highly discriminative for matching. For descriptors at interest points to be robustly matched between images, they must be as far as possible invariant to the imaging process. In this work we introduce a family of features which use groups of interest points to form geometrically invariant descriptors of image regions. Feature descriptors are formed by resampling the image relative to canonical frames defined by the points. In addition to robust matching, a key advantage of this approach is that each match implies a hypothesis of the local 2D (projective) transformation. This allows us to immediately reject most of the false matches using a Hough transform. We reject remaining outliers using RANSAC and the epipolar constraint. Results show that dense feature matching can be achieved in a few seconds of computation on 1GHz Pentium III machines.
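The RANSAC outlier-rejection step can be illustrated generically. The sketch below fits a 2-D line rather than the epipolar geometry used in the paper, and the function name and data are invented for the example:

```python
import random

def ransac_line(points, n_iters=200, tol=0.1, seed=0):
    """Generic RANSAC: repeatedly fit a line through two random points
    and keep the model with the largest inlier set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iters):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # skip degenerate (vertical) samples
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Five collinear points plus two gross outliers.
pts = [(float(i), float(i)) for i in range(5)] + [(1.0, 9.0), (3.0, -4.0)]
print(len(ransac_line(pts)))  # 5 — the outliers are rejected
```

Fitting with the epipolar constraint instead of a line changes only the minimal sample size and the residual function; the sample-score-keep loop is the same.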

723 citations


"Global localization and relative po..." cites this paper for background

  • ...More detailed discussion about enforcing the separation between the features, sampling of the scale space and improvement in feature localization can be found in [8, 4]....
