Omnidirectional Vision based Topological Navigation
Toon Goedemé¹, Marnix Nuttin², Tinne Tuytelaars¹, and Luc Van Gool¹,³
¹ESAT - PSI - VISICS, University of Leuven, Belgium
²PMA, University of Leuven, Belgium
³BIWI, ETH Zürich, Switzerland
Abstract
In this work we present a novel system for autonomous mobile robot navigation. With only an omnidirectional camera as sensor, this system is able to build automatically and robustly accurate topologically organised environment maps of a complex, natural environment. It can localise itself using such a map at each moment, including both at startup (kidnapped robot) or using knowledge of former localisations. The topological nature of the map is similar to the intuitive maps humans use, is memory-efficient and enables fast and simple path planning towards a specified goal. We developed a real-time visual servoing technique to steer the system along the computed path.
A key technology making this all possible is the novel fast wide baseline feature matching, which yields an efficient description of the scene, with a focus on man-made environments.
1 Introduction
1.1 Application
This paper describes a total navigation solution for mobile robots. It enables a mobile robot to efficiently localise itself and navigate in a large man-made environment, which can be indoor, outdoor or a combination of both. For instance, the inside of a house, an entire university campus or even a small city are all within its scope.
Traditionally, sensors other than cameras are used for robot navigation, such as GPS and laser scanners. Because GPS (and Galileo likewise) needs a direct line of sight to the satellites [38], it cannot be used indoors or in narrow city-centre streets, i.e. the very conditions we foresee in our application. Time-of-flight laser scanners are widely applicable, but are expensive and voluminous, even when the scanning field is restricted to a horizontal plane. The latter only yields a poor world representation, with the risk of not detecting essential obstacles such as table tops.
Contact address: toon.goedeme@esat.kuleuven.be
Figure 1: Left: the robotic wheelchair platform. Right: the omnidirectional camera, composed of a colour camera and a hyperbolic mirror.
That is why we aim at a vision-only solution to navigation. Vision is, in comparison with these other sensors, much more informative. Moreover, cameras are quite compact and increasingly cheap. We observe also that many biological species, in particular migratory birds, mainly use their visual sensors for navigation. We chose an omnidirectional camera as visual sensor because of its wide field of view and thus the rich information content of the images it acquires. For the time being, we added a range-sensing device for obstacle detection, but this is to be replaced by an omnidirectional vision range estimator under development [31].
Our method works with natural environments. That means that the environment does not have to be modified for navigation in any way. Indeed, adding artificial markers to every room in a house or to an entire city seems neither feasible nor desirable.
In contrast to classical navigation methods, we chose a topological representation of the environment rather than a metrical one, because of its resemblance to the intuitive system humans use for navigation, its flexibility, wide usability and memory efficiency, and the ease of map building and path planning it offers.
The targeted application of this research is the visual guidance of electric wheelchairs for severely disabled people. In particular, the target group is people unable to give detailed steering commands to navigate around their homes and local city neighbourhoods. If it is possible for them to perform complicated navigational tasks by giving only simple commands, their autonomy can be greatly enhanced. For most of them, such an increase of mobility and independence from other people is very welcome.
Our test platform and camera are shown in fig. 1. The price of such a robotic wheelchair is a serious issue. With our method, the only additional hardware required is a laptop (or an equivalent embedded processor), a webcam, a mirror and (for the time being) some ultrasound sensors. Because of the increased independence of the users, the cost of personal helpers is reduced, making the robotic wheelchair even more economically feasible.
1.2 Method overview
An overview of the navigation method presented is given in fig. 2. The system can be subdivided into three parts: map building, localisation and locomotion.
The map building stage has to be gone through only once, to train the system in a new environment. The mobile system is led through all parts of the environment, while it takes images at a constant rate (in our set-up, one per second). Later, this large set of omnidirectional images is automatically analysed and converted into a topological map of the environment, which is stored in the system's memory and will be used when the system is actually in use.
Figure 2: Overview of the navigation method
The next stage is localisation. When the system is powered up somewhere in the environment, it takes a new image with its camera. This image is rapidly compared with all the images in the environment map, and a hypothesis is formed about the present location of the mobile robot. This hypothesis is refined using Bayes' rule as soon as the robot starts to move and new images come in.
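This Bayesian refinement can be sketched as a discrete filter over the map's places: a motion step spreads belief to neighbouring places, and an observation step reweights each place by the similarity of the current image to that place's map image. The following is a minimal illustration of the principle under those assumptions, with hypothetical place names, not the paper's exact formulation.

```python
def bayes_update(belief, neighbours, similarity):
    """One localisation step over a topological map.

    belief:      dict place -> prior probability of being there
    neighbours:  dict place -> adjacent places (including the place itself)
    similarity:  dict place -> likelihood of the current image at that place
    """
    # Motion model: belief diffuses uniformly to adjacent places.
    predicted = {p: 0.0 for p in belief}
    for p, b in belief.items():
        for q in neighbours[p]:
            predicted[q] += b / len(neighbours[p])
    # Observation model: Bayes' rule with the image-similarity likelihood.
    posterior = {p: predicted[p] * similarity[p] for p in belief}
    total = sum(posterior.values())
    return {p: v / total for p, v in posterior.items()}
```

Repeating this step as the robot moves concentrates the belief on the true place, resolving the kidnapped-robot ambiguity over time.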
The final stage is locomotion. When the present location of the robot is known and a goal position is communicated by the user to the robot, a path can be planned towards that goal using the map. The planned route is specified as a sequence of map images, serving as a reference for what the robot should subsequently see if on course. This path is executed by means of a visual servoing algorithm: each time, a visual homing procedure is executed towards the location where the next path image was taken.
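Planning over such a map reduces to shortest-path search on a graph whose nodes are map images and whose edges are traversable links. A minimal sketch with hypothetical place names and edge costs; the paper's own planner may differ in detail:

```python
import heapq

def plan_path(edges, start, goal):
    """Cheapest sequence of map places from start to goal (Dijkstra).
    edges: dict place -> iterable of (neighbour, cost) pairs."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, place, path = heapq.heappop(queue)
        if place == goal:
            return path  # the sequence of reference images to servo along
        if place in visited:
            continue
        visited.add(place)
        for nxt, c in edges.get(place, ()):
            if nxt not in visited:
                heapq.heappush(queue, (cost + c, nxt, path + [nxt]))
    return None  # goal unreachable from start
```

The returned node sequence corresponds directly to the list of reference images handed to the visual servoing stage.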
The main contributions of this paper are:
1. a fast wide baseline matching technique, which allows efficient, online comparison of images,
2. a method to construct a topological map, which is robust to self-similarities in the environment thanks to the use of Dempster-Shafer evidence collection,
3. a visual servoing algorithm which is robust to occlusions and tracking losses,
4. the integration of all these components in an operational system.
The remainder of this paper is organised as follows. The next section gives an overview of the related work. In section 3, our core image analysis and matching technique is explained: fast wide baseline matching. The sections thereafter describe the different stages of our approach. Section 4 discusses the map building process, section 5 explains the localisation method, section 6 describes the path planning, and section 7 details the visual servoing algorithm. We end with an overview of experimental results (section 8) and a conclusion (section 9).
2 Related Work
2.1 Image comparison
A good image comparison method is of utmost importance in a vision-based navigation approach. Global methods compute a measure using all the pixels of the entire image. Although these methods are fast, they cannot cope with e.g. occlusions and severe viewpoint changes. On the other hand, techniques that work at a local scale, extracting and recognising local features, can be made robust to these effects. The traditional disadvantage of these local techniques is time complexity. In our approach, we combine novel global and local approaches, resulting in fast and accurate image comparison.
2.1.1 Global techniques
Many researchers use global image comparison techniques. Straightforward global methods like histogram-based matching, used by Ulrich and Nourbakhsh [53], do not seem distinctive enough for our application. Stricker [47] proposed a method based on the Fourier-Mellin transform to compare images. Unfortunately, the baseline cannot be large, which restricts that method to tracking. Another popular technique is the use of an eigenspace decomposition of the training images [20], which yields a compact database. However, these methods proved not useful in general situations because they are not robust enough against occlusions and illumination changes. That is why Jogan et al. [21] and Bischof et al. [4] developed PCA-based image comparisons that are robust against partial occlusions and varying illumination, respectively.
2.1.2 Local techniques
A solution for coping with partial occlusions is to compare local regions in the images. The big question is how to detect these local features, also known as visual landmarks.
A simple solution is to add artificial markers to strategically chosen places in the world. To make these features easily detectable with a normal camera, they are given special (individual) photometric appearances (for instance coloured patterns [37], LEDs [1] or even 2D barcodes [41]). Using such artificial markers is perfectly possible for some applications, but often difficult. Navigation through an entire city or inside someone's house are examples of cases where pasting these markers all over the place is hardly feasible and in no case desirable.
That is why, in this project, we use natural landmarks, extracted from the scene itself, without modifications. Moreover, the extraction of these landmarks must be automatic and robust against changes in viewpoint and illumination, to ensure the detection of these landmarks under as many circumstances as possible.
Many researchers have proposed algorithms for natural landmark detection. Mostly, local regions are defined around interest points in the images. The characterisation of these local regions with descriptor vectors enables the regions to be compared across images. Differences between approaches lie in the way in which interest points, local image regions, and descriptor vectors are extracted. An early example is the work of Schmid and Mohr [42], where geometric invariance was still under image rotations only. Scaling was handled by using circular regions of several sizes. Lowe et al. [27] extended these ideas to real scale-invariance. More general affine invariance has been achieved in the work of Tuytelaars & Van Gool [51, 52], Matas et al. [28], and Mikolajczyk & Schmid [30].
Although these methods are capable of finding high-quality correspondences, most of them are too slow for use in a real-time mobile robot algorithm. That is why we propose a much faster alternative, as explained in section 3.
2.2 Map structure
Many researchers have proposed different ways to represent the environment perceived by vision sensors. We can order all possible map organisations by metrical detail: from dense 3D over sparse 3D to topological maps. We believe that the topological end of this spectrum offers the best opportunities.
2.2.1 Dense 3D maps
One approach is building dense 3D models out of the incoming visual data [39, 34]. Such an approach has some disadvantages. It is computationally and memory demanding, and cannot cope with planar and ill-textured parts of the environment such as walls. Nevertheless, these structures are omnipresent in our application, and collisions with them need to be avoided.
2.2.2 Sparse 3D maps
One way to reduce the computational burden is to abstract the visual data. Instead of modelling a dense 3D model containing billions of voxels, a sparse 3D model is built containing only special features, i.e. visual landmarks.
Examples of researchers solving the navigation problem with sparse 3D maps of natural landmarks are Se et al. [43] and Davison [8]. They position natural features in a metrical frame, which is as big as the entire mapped environment. Although less so than the dense 3D variant, these methods are still computationally demanding for large environments, since their complexity is quadratic in the number of features in the model. Also, for larger models the metric error accumulates, so that feature positions drift away.
2.2.3 Topological maps
As a matter of fact, the need for explicit 3D maps in navigation is questionable. One step further in the abstraction of environment information is the introduction of topological maps. The psychological experiments of Bülthoff et al. [5] show that people rely more on a topological map than on a metrical one for their navigation. In these topological maps, places are locally described as a configuration of natural landmarks. These places form the nodes of the graph-like map, and are interconnected by traversable paths. Other researchers [54, 53, 23] have also opted for topological maps, mainly because they scale better to real-world applications than metrical, deterministic representations, given the complexity of unstructured environments. Other advantages are the ease of path planning in such a map and the absence of drift.
2.3 Topological map building
Vale [54] developed a clustering-based method for automatically building a topological environment map out of a set of images. Unfortunately, his method is only suited to image comparison techniques that are a metric function (which does not hold for the similarity measure we use), and does not give correct results if self-similarities are present in the environment, i.e. places that are different but look similar.
Various probabilistic approaches to the topological map building problem are very popular. [40], for instance, use Bayesian inference to find the topological structure that best explains a set of panoramic observations, while [45] fit hidden Markov models to the data. If the state transition model of this HMM is extended with robot action data, the latter can be modelled using a partially observable Markov decision process or POMDP, as in [22] and [50]. [55] solve the map building problem using graph cuts.
In contrast to these global topology fitting approaches, an alternative way is detecting loop closings. During a ride through the environment, sensor data is recorded. Because it is known that the driven path is traversable, an initial topological representation consists of one long edge between start and end node. Then, extra links are created where a certain place is revisited, i.e. an equivalent sensor reading occurs twice in the sequence. This is called a loop closing. A correct topological map results if all loop closing links are added.
Also in loop closing, probabilistic methods have been introduced to cope with the uncertainty of link hypotheses and to avoid links at self-similarities. [7], for instance, use Bayesian inference. [3] recently introduced Dempster-Shafer probability theory into loop closing, which has the advantage that ignorance can be modelled and no prior knowledge is needed. Their approach is promising, but limited to simple sensors and environments. In this paper, we present a new framework for loop closing using rich visual sensors in natural complex environments, which is also based on Dempster-Shafer theory.
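Dempster's rule of combination, the core of such evidence collection, fuses two mass functions while keeping mass explicitly on the ignorance hypothesis (the whole frame). A minimal sketch over a two-element frame {link, no-link}; the mass values below are hypothetical, for illustration only:

```python
def combine(m1, m2):
    """Dempster's rule of combination for mass functions.
    Keys are frozensets over the frame of discernment; mass assigned
    to the full frame models ignorance."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # mass assigned to contradictory evidence
    # Normalise by the non-conflicting mass (Dempster's normalisation).
    return {h: v / (1.0 - conflict) for h, v in combined.items()}
```

For example, two independent observations each supporting a loop-closing link with some mass left on ignorance combine into a stronger belief in the link, without ever requiring a prior.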
2.4 Visual Servoing
As explained in section 6, the execution of a path using such a topological environment map boils down to a series of visual servoing operations between places defined by images.
Cartwright and Collett [6] proposed the so-called bearing-only 'snapshot' model, inspired by the visual homing behaviour of insects such as bees and ants. Their proposed algorithm consists of the construction of a home vector, computed as the average of landmark displacement vectors. Franz et al. [13] analysed the computational foundations of this method and derived its error and convergence properties. They conclude that every visual homing method based solely on bearing angles of landmarks, like this one, inevitably depends on basic assumptions such as equal landmark distances, an isotropic landmark distribution or the availability of an external compass reference. Unfortunately, because none of these assumptions generally hold in our targeted application, we propose an alternative approach.
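For comparison, the bearing-only snapshot idea can be sketched as averaging the displacements of unit vectors pointing at matched landmarks in the current and home views. This is a simplified illustration of the Cartwright-Collett principle under the above assumptions (bearings in radians, hypothetical data), not the exact insect model:

```python
import math

def home_vector(current_bearings, home_bearings):
    """Average displacement between matched landmark bearings.
    Each landmark contributes the displacement of its unit bearing
    vector from the current view to the home snapshot."""
    dx = dy = 0.0
    for cur, home in zip(current_bearings, home_bearings):
        dx += math.cos(home) - math.cos(cur)
        dy += math.sin(home) - math.sin(cur)
    n = len(current_bearings)
    return dx / n, dy / n
```

When the robot stands at the home position, all bearing displacements vanish and the home vector is zero; away from home, it points roughly toward it only under the restrictive assumptions Franz et al. identify.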
If both image dimensions are taken into account, not limiting the available information to the bearing angle, the most obvious choice is working via epipolar geometry estimation (e.g. [51, 2]). Unfortunately, for perspective cameras this problem is in many cases ill-conditioned, although Svoboda [48] proved that motion estimation with omnidirectional images is much better conditioned. That is why we chose a method based on omnidirectional epipolar geometry. Other work in this field is the research of Mariottini et al. [29], who split the homing procedure into a rotation phase and a translation phase; but this approach cannot be used in our application because of the non-smooth robot motion it produces.
3 Fast wide baseline matching
The novel technique we use for image comparison is fast wide baseline matching. This key technique enables the extraction of natural landmarks and image comparison for our map building, localisation and visual servoing algorithms.
We use a combination of two different kinds of wide baseline features, namely a rotation reduced and colour enhanced form of Lowe's SIFT features [27], and the invariant column segments we developed [15]. These techniques extract local regions in each image, and describe these regions with a vector of measures which are invariant to image deformations and illumination changes. Across different images, similar regions can be found by comparing these descriptors. This makes it possible to find correspondences between images taken from very different positions, or under different lighting conditions. The crux of the matter is that the extraction of these regions can be done beforehand on each image separately, rather than during the matching. Database images can be processed off-line, so that the images themselves do not have to be available at the time of matching with another image.
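Once every image has been reduced off-line to a set of invariant descriptor vectors, finding correspondences is nearest-neighbour search between descriptor sets. A common sketch of such matching with Lowe-style ratio filtering (the 0.8 threshold and the toy descriptors are hypothetical; the paper's own matching criteria may differ):

```python
def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return index pairs (i, j) where desc_a[i]'s nearest neighbour in
    desc_b is sufficiently closer than its second-nearest (ratio test)."""
    def dist2(u, v):
        # Squared Euclidean distance between two descriptor vectors.
        return sum((x - y) ** 2 for x, y in zip(u, v))
    matches = []
    for i, da in enumerate(desc_a):
        ranked = sorted((dist2(da, db), j) for j, db in enumerate(desc_b))
        # Accept only unambiguous matches: best clearly beats second best.
        if len(ranked) > 1 and ranked[0][0] < (ratio ** 2) * ranked[1][0]:
            matches.append((i, ranked[0][1]))
    return matches
```

Because the descriptors are precomputed per image, this comparison step is all that has to run online, which is what makes the wide baseline matching fast enough for navigation.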
3.1 Camera motion constraint
The camera we use is a catadioptric system, consisting of an upward-looking camera with a hyperboloidal mirror mounted above it. The result is a field of view of 360° in the horizontal direction and more than 180° in the vertical direction. The disadvantage is that these images contain severe distortions, as seen for instance in fig. 5.
We presume the robot to move in one horizontal plane. The optical axis of the camera is oriented vertically. In other words, allowed movements consist of translations in the plane and rotation around a vertical axis; see also figure 3.
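Under this constraint a robot pose is an element of SE(2): a position in the plane plus a heading angle, and relative motions between images compose accordingly. A minimal sketch of that composition (the pose values are hypothetical):

```python
import math

def compose(p, q):
    """Compose two planar poses (x, y, theta): apply motion q in p's frame."""
    x, y, th = p
    qx, qy, qth = q
    return (x + qx * math.cos(th) - qy * math.sin(th),
            y + qx * math.sin(th) + qy * math.cos(th),
            (th + qth) % (2 * math.pi))
```

For instance, driving one unit forward while heading 90° to the left moves the robot one unit along the world y-axis, which is the three-degree-of-freedom motion model the planar constraint allows.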