Omnidirectional Vision based Topological Navigation
Toon Goedemé¹, Marnix Nuttin², Tinne Tuytelaars¹, and Luc Van Gool¹,³
¹ESAT - PSI - VISICS, University of Leuven, Belgium
²PMA, University of Leuven, Belgium
³BIWI, ETH Zürich, Switzerland
Abstract
In this work we present a novel system for autonomous mobile robot navigation. With only an omnidirectional camera as sensor, this system is able to build automatically and robustly accurate topologically organised environment maps of a complex, natural environment. It can localise itself using such a map at each moment, including both at startup (kidnapped robot) or using knowledge of former localisations. The topological nature of the map is similar to the intuitive maps humans use, is memory-efficient and enables fast and simple path planning towards a specified goal. We developed a real-time visual servoing technique to steer the system along the computed path.
A key technology making this all possible is the novel fast wide baseline feature matching, which yields an efficient description of the scene, with a focus on man-made environments.
1 Introduction
1.1 Application
This paper describes a total navigation solution for mobile robots. It enables a mobile robot to efficiently localise itself and navigate in a large man-made environment, which can be indoor, outdoor or a combination of both. For instance, the inside of a house, an entire university campus or even a small city are all within its scope.
Traditionally, sensors other than cameras are used for robot navigation, such as GPS and laser scanners. Because GPS (and Galileo likewise) needs a direct line of sight to the satellites [38], it cannot be used indoors or in narrow city-centre streets, i.e. the very conditions we foresee in our application. Time-of-flight laser scanners are widely applicable, but are expensive and voluminous, even when the scanning field is restricted to a horizontal plane. The latter only yields a poor world representation, with the risk of not detecting essential obstacles such as table tops.
Contact address: toon.goedeme@esat.kuleuven.be
Figure 1: Left: the robotic wheelchair platform. Right: the omnidirectional camera, composed of a colour camera and a hyperbolic mirror.
That is why we aim at a vision-only solution to navigation. Vision is, in comparison with these other sensors, much more informative. Moreover, cameras are quite compact and increasingly cheap. We observe also that many biological species, in particular migratory birds, mainly use their visual sensors for navigation. We chose an omnidirectional camera as visual sensor because of its wide field of view and thus the rich information content of the images it acquires. For the time being, we added a range-sensing device for obstacle detection, but this is to be replaced by an omnidirectional vision range estimator under development [31].
Our method works with natural environments. That means that the environment does not have to be modified for navigation in any way. Indeed, adding artificial markers to every room in a house or to an entire city seems neither feasible nor desirable.
In contrast to classical navigation methods, we chose a topological representation of the environment rather than a metrical one, because of its resemblance to the intuitive system humans use for navigation, its flexibility, wide usability and memory efficiency, and the ease of map building and path planning it offers.
The targeted application of this research is the visual guidance of electric wheelchairs for severely disabled people. In particular, the target group is people unable to give detailed steering commands to navigate around their homes and local city neighbourhoods. If it is possible for them to perform complicated navigational tasks by giving only simple commands, their autonomy can be greatly enhanced. For most of them, such an increase of mobility and independence from other people is very welcome.
Our test platform and camera are shown in fig. 1. The price of such a robotic wheelchair is a serious issue. With our method, the only additional hardware required is a laptop (or an equivalent embedded processor), a webcam, a mirror and (for the time being) some ultrasound sensors. Because of the increased independence of the users, the cost of personal helpers is reduced, making the robotic wheelchair even more economically feasible.
1.2 Method overview
An overview of the navigation method presented is given in fig. 2. The system can be subdivided into three parts: map building, localisation and locomotion.
The map building stage has to be gone through only once, to train the system in a new environment. The mobile system is led through all parts of the environment, while it takes images at a constant rate (in our set-up, one per second). Later, this large set of omnidirectional images is automatically analysed and converted into a topological map of the environment, which is stored in the system's memory and will be used when the system is actually in use.
Figure 2: Overview of the navigation method
The next stage is localisation. When the system is powered up somewhere in the environment, it takes a new image with its camera. This image is rapidly compared with all the images in the environment map, and a hypothesis is formed about the present location of the mobile robot. This hypothesis is refined using Bayes' rule as soon as the robot starts to move and new images come in.
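This Bayesian refinement can be sketched as a discrete filter over the map's places: a motion step spreads belief to neighbouring places, and an observation step reweights each place by the similarity of the current image to that place's map image. The following is a minimal illustration of the principle under those assumptions, with hypothetical place names, not the paper's exact formulation.

```python
def bayes_update(belief, neighbours, similarity):
    """One localisation step over a topological map.

    belief:      dict place -> prior probability of being there
    neighbours:  dict place -> adjacent places (including the place itself)
    similarity:  dict place -> likelihood of the current image at that place
    """
    # Motion model: belief diffuses uniformly to adjacent places.
    predicted = {p: 0.0 for p in belief}
    for p, b in belief.items():
        for q in neighbours[p]:
            predicted[q] += b / len(neighbours[p])
    # Observation model: Bayes' rule with the image-similarity likelihood.
    posterior = {p: predicted[p] * similarity[p] for p in belief}
    total = sum(posterior.values())
    return {p: v / total for p, v in posterior.items()}
```

Repeating this step as the robot moves concentrates the belief on the true place, resolving the kidnapped-robot ambiguity over time.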
The final stage is locomotion. When the present location of the robot is known and a goal position is communicated by the user to the robot, a path can be planned towards that goal using the map. The planned route is specified as a sequence of map images, serving as a reference for what the robot should subsequently see if on course. This path is executed by means of a visual servoing algorithm: each time, a visual homing procedure is executed towards the location where the next path image was taken.
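Planning over such a map reduces to shortest-path search on a graph whose nodes are map images and whose edges are traversable links. A minimal sketch with hypothetical place names and edge costs; the paper's own planner may differ in detail:

```python
import heapq

def plan_path(edges, start, goal):
    """Cheapest sequence of map places from start to goal (Dijkstra).
    edges: dict place -> iterable of (neighbour, cost) pairs."""
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, place, path = heapq.heappop(queue)
        if place == goal:
            return path  # the sequence of reference images to servo along
        if place in visited:
            continue
        visited.add(place)
        for nxt, c in edges.get(place, ()):
            if nxt not in visited:
                heapq.heappush(queue, (cost + c, nxt, path + [nxt]))
    return None  # goal unreachable from start
```

The returned node sequence corresponds directly to the list of reference images handed to the visual servoing stage.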
The main contributions of this paper are:
1. a fast wide baseline matching technique, which allows efficient, online comparison of images,
2. a method to construct a topological map, which is robust to self-similarities in the environment thanks to the use of Dempster-Shafer evidence collection,
3. a visual servoing algorithm which is robust to occlusions and tracking losses,
4. the integration of all these components in an operational system.
The remainder of this paper is organised as follows. The next section gives an overview of the related work. In section 3, our core image analysis and matching technique is explained: fast wide baseline matching. The sections thereafter describe the different stages of our approach. Section 4 discusses the map building process, section 5 explains the localisation method, section 6 describes the path planning, and section 7 details the visual servoing algorithm. We end with an overview of experimental results (section 8) and a conclusion (section 9).
2 Related Work
2.1 Image comparison
A good image comparison method is of utmost importance in a vision-based navigation approach. Global methods compute a measure using all the pixels of the entire image. Although these methods are fast, they cannot cope with e.g. occlusions and severe viewpoint changes. On the other hand, techniques that work at a local scale, extracting and recognising local features, can be made robust to these effects. The traditional disadvantage of these local techniques is time complexity. In our approach, we combine novel global and local approaches, resulting in fast and accurate image comparison.
2.1.1 Global techniques
Many researchers use global image comparison techniques. Straightforward global methods like histogram-based matching, used by Ulrich and Nourbakhsh [53], do not seem distinctive enough for our application. Stricker [47] proposed a method based on the Fourier-Mellin transform to compare images. Unfortunately, the baseline cannot be large, which restricts that method to tracking. Another popular technique is the use of an eigenspace decomposition of the training images [20], which yields a compact database. However, these methods proved not useful in general situations because they are not robust enough against occlusions and illumination changes. That is why Jogan et al. [21] and Bischof et al. [4] developed PCA-based image comparisons that are robust against partial occlusions and varying illumination, respectively.
2.1.2 Local techniques
A solution for coping with partial occlusions is to compare local regions in the images. The big question is how to detect these local features, also known as visual landmarks.
A simple solution is to add artificial markers to strategically chosen places in the world. To make these features easily detectable with a normal camera, they are given special (individual) photometric appearances (for instance coloured patterns [37], LEDs [1] or even 2D barcodes [41]). Using such artificial markers is perfectly possible for some applications, but often difficult. Navigation through an entire city or inside someone's house are examples of cases where pasting these markers all over the place is hardly feasible and in no case desirable.
That is why, in this project, we use natural landmarks, extracted from the scene itself, without modifications. Moreover, the extraction of these landmarks must be automatic and robust against changes in viewpoint and illumination, to ensure the detection of these landmarks under as many circumstances as possible.
Many researchers have proposed algorithms for natural landmark detection. Mostly, local regions are defined around interest points in the images. The characterisation of these local regions with descriptor vectors enables the regions to be compared across images. Differences between approaches lie in the way in which interest points, local image regions, and descriptor vectors are extracted. An early example is the work of Schmid and Mohr [42], where geometric invariance was still under image rotations only. Scaling was handled by using circular regions of several sizes. Lowe et al. [27] extended these ideas to real scale-invariance. More general affine invariance has been achieved in the work of Tuytelaars & Van Gool [51, 52], Matas et al. [28], and Mikolajczyk & Schmid [30].
Although these methods are capable of finding high-quality correspondences, most of them are too slow for use in a real-time mobile robot algorithm. That is why we propose a much faster alternative, as explained in section 3.
2.2 Map structure
Many researchers have proposed different ways to represent the environment perceived by vision sensors. We can order all possible map organisations by metrical detail: from dense 3D over sparse 3D to topological maps. We believe that the topological end of this spectrum offers the best opportunities.
2.2.1 Dense 3D maps
One approach is building dense 3D models out of the incoming visual data [39, 34]. Such an approach has some disadvantages. It is computationally and memory demanding, and cannot cope with planar and ill-textured parts of the environment such as walls. Nevertheless, these structures are omnipresent in our application, and collisions with them need to be avoided.
2.2.2 Sparse 3D maps
One way to reduce the computational burden is to abstract the visual data. Instead of modelling a dense 3D model containing billions of voxels, a sparse 3D model is built containing only special features, i.e. visual landmarks.
Examples of researchers solving the navigation problem with sparse 3D maps of natural landmarks are Se et al. [43] and Davison [8]. They position natural features in a metrical frame, which is as big as the entire mapped environment. Although less so than the dense 3D variant, these methods are still computationally demanding for large environments, since their complexity is quadratic in the number of features in the model. Also, for larger models the metric error accumulates, so that feature positions drift away.
2.2.3 Topological maps
As a matter of fact, the need for explicit 3D maps in navigation is questionable. One step further in the abstraction of environment information is the introduction of topological maps. The psychological experiments of Bülthoff et al. [5] show that people rely more on a topological map than on a metrical one for their navigation. In these topological maps, places are locally described as a configuration of natural landmarks. These places form the nodes of the graph-like map, and are interconnected by traversable paths. Other researchers [54, 53, 23] have also opted for topological maps, mainly because they scale better to real-world applications than metrical, deterministic representations, given the complexity of unstructured environments. Other advantages are the ease of path planning in such a map and the absence of drift.
2.3 Topological map building
Vale [54] developed a clustering-based method for automatically building a topological environment map out of a set of images. Unfortunately, his method is only suited to image comparison techniques that are a metric function (which does not hold for the similarity measure we use), and does not give correct results if self-similarities are present in the environment, i.e. places that are different but look similar.
Various probabilistic approaches to the topological map building problem are very popular. [40], for instance, use Bayesian inference to find the topological structure that best explains a set of panoramic observations, while [45] fit hidden Markov models to the data. If the state transition model of this HMM is extended with robot action data, the latter can be modelled using a partially observable Markov decision process or POMDP, as in [22] and [50]. [55] solve the map building problem using graph cuts.
In contrast to these global topology fitting approaches, an alternative way is detecting loop closings. During a ride through the environment, sensor data is recorded. Because it is known that the driven path is traversable, an initial topological representation consists of one long edge between start and end node. Then, extra links are created where a certain place is revisited, i.e. an equivalent sensor reading occurs twice in the sequence. This is called a loop closing. A correct topological map results if all loop closing links are added.
Also in loop closing, probabilistic methods have been introduced to cope with the uncertainty of link hypotheses and to avoid links at self-similarities. [7], for instance, use Bayesian inference. [3] recently introduced Dempster-Shafer probability theory into loop closing, which has the advantage that ignorance can be modelled and no prior knowledge is needed. Their approach is promising, but limited to simple sensors and environments. In this paper, we present a new framework for loop closing using rich visual sensors in natural complex environments, which is also based on Dempster-Shafer theory.
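Dempster's rule of combination, the core of such evidence collection, fuses two mass functions while keeping mass explicitly on the ignorance hypothesis (the whole frame). A minimal sketch over a two-element frame {link, no-link}; the mass values below are hypothetical, for illustration only:

```python
def combine(m1, m2):
    """Dempster's rule of combination for mass functions.
    Keys are frozensets over the frame of discernment; mass assigned
    to the full frame models ignorance."""
    combined = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb  # mass assigned to contradictory evidence
    # Normalise by the non-conflicting mass (Dempster's normalisation).
    return {h: v / (1.0 - conflict) for h, v in combined.items()}
```

For example, two independent observations each supporting a loop-closing link with some mass left on ignorance combine into a stronger belief in the link, without ever requiring a prior.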
2.4 Visual Servoing
As explained in section 6, the execution of a path using such a topological environment map boils down to a series of visual servoing operations between places defined by images.
Cartwright and Collett [6] proposed the so-called bearing-only 'snapshot' model, inspired by the visual homing behaviour of insects such as bees and ants. Their proposed algorithm consists of the construction of a home vector, computed as the average of landmark displacement vectors. Franz et al. [13] analysed the computational foundations of this method and derived its error and convergence properties. They conclude that every visual homing method based solely on bearing angles of landmarks, like this one, inevitably depends on basic assumptions such as equal landmark distances, an isotropic landmark distribution or the availability of an external compass reference. Unfortunately, because none of these assumptions generally hold in our targeted application, we propose an alternative approach.
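For comparison, the bearing-only snapshot idea can be sketched as averaging the displacements of unit vectors pointing at matched landmarks in the current and home views. This is a simplified illustration of the Cartwright-Collett principle under the above assumptions (bearings in radians, hypothetical data), not the exact insect model:

```python
import math

def home_vector(current_bearings, home_bearings):
    """Average displacement between matched landmark bearings.
    Each landmark contributes the displacement of its unit bearing
    vector from the current view to the home snapshot."""
    dx = dy = 0.0
    for cur, home in zip(current_bearings, home_bearings):
        dx += math.cos(home) - math.cos(cur)
        dy += math.sin(home) - math.sin(cur)
    n = len(current_bearings)
    return dx / n, dy / n
```

When the robot stands at the home position, all bearing displacements vanish and the home vector is zero; away from home, it points roughly toward it only under the restrictive assumptions Franz et al. identify.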
If both image dimensions are taken into account, not limiting the available information to the bearing angle, the most obvious choice is working via epipolar geometry estimation (e.g. [51, 2]). Unfortunately, for perspective cameras this problem is in many cases ill-conditioned, although Svoboda [48] proved that motion estimation with omnidirectional images is much better conditioned. That is why we chose a method based on omnidirectional epipolar geometry. Other work in this field is the research of Mariottini et al. [29], who split the homing procedure into a rotation phase and a translation phase; but this approach cannot be used in our application because of the non-smooth robot motion it produces.
3 Fast wide baseline matching
The novel technique we use for image comparison is fast wide baseline matching. This key technique enables the extraction of natural landmarks and image comparison for our map building, localisation and visual servoing algorithms.
We use a combination of two different kinds of wide baseline features, namely a rotation reduced and colour enhanced form of Lowe's SIFT features [27], and the invariant column segments we developed [15]. These techniques extract local regions in each image, and describe these regions with a vector of measures which are invariant to image deformations and illumination changes. Across different images, similar regions can be found by comparing these descriptors. This makes it possible to find correspondences between images taken from very different positions, or under different lighting conditions. The crux of the matter is that the extraction of these regions can be done beforehand on each image separately, rather than during the matching. Database images can be processed off-line, so that the images themselves do not have to be available at the time of matching with another image.
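Once every image has been reduced off-line to a set of invariant descriptor vectors, finding correspondences is nearest-neighbour search between descriptor sets. A common sketch of such matching with Lowe-style ratio filtering (the 0.8 threshold and the toy descriptors are hypothetical; the paper's own matching criteria may differ):

```python
def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Return index pairs (i, j) where desc_a[i]'s nearest neighbour in
    desc_b is sufficiently closer than its second-nearest (ratio test)."""
    def dist2(u, v):
        # Squared Euclidean distance between two descriptor vectors.
        return sum((x - y) ** 2 for x, y in zip(u, v))
    matches = []
    for i, da in enumerate(desc_a):
        ranked = sorted((dist2(da, db), j) for j, db in enumerate(desc_b))
        # Accept only unambiguous matches: best clearly beats second best.
        if len(ranked) > 1 and ranked[0][0] < (ratio ** 2) * ranked[1][0]:
            matches.append((i, ranked[0][1]))
    return matches
```

Because the descriptors are precomputed per image, this comparison step is all that has to run online, which is what makes the wide baseline matching fast enough for navigation.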
3.1 Camera motion constraint
The camera we use is a catadioptric system, consisting of an upward-looking camera with a hyperboloidal mirror mounted above it. The result is a field of view of 360° in the horizontal direction and more than 180° in the vertical direction. The disadvantage is that these images contain severe distortions, as seen for instance in fig. 5.
We presume the robot to move in one horizontal plane. The optical axis of the camera is oriented vertically. In other words, allowed movements consist of translations in the plane and rotation around a vertical axis; see also figure 3.
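Under this constraint a robot pose is an element of SE(2): a position in the plane plus a heading angle, and relative motions between images compose accordingly. A minimal sketch of that composition (the pose values are hypothetical):

```python
import math

def compose(p, q):
    """Compose two planar poses (x, y, theta): apply motion q in p's frame."""
    x, y, th = p
    qx, qy, qth = q
    return (x + qx * math.cos(th) - qy * math.sin(th),
            y + qx * math.sin(th) + qy * math.cos(th),
            (th + qth) % (2 * math.pi))
```

For instance, driving one unit forward while heading 90° to the left moves the robot one unit along the world y-axis, which is the three-degree-of-freedom motion model the planar constraint allows.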