What contributions have the authors mentioned in the paper "Appearance-based segmentation of indoors/outdoors sequences of spherical views" ?

In a previous work [ 11 ], the authors addressed metric and topological representations thanks to a multi-cameras system which allows building of dense visual maps of large scale 3D environments. The authors aim at detecting the changes in the structural properties of the scene during navigation. The work presented here is a further step toward a semantic representation.

What are the future works mentioned in the paper "Appearance-based segmentation of indoors/outdoors sequences of spherical views" ?

For future work, the authors plan to improve the algorithm's robustness to illumination conditions following [ 6 ], as well as its rotation independence. A de-rotation mechanism can be applied, since rotations can be estimated from the spectra.

What is the way to detect change-points?

The segmentation algorithm relies on an efficient change-point detection based on multi-hypothesis testing and allowing constant time computation.

What is the purpose of the algorithm?

In the longer term, the segmentation algorithm could be coupled with a loop closure detection algorithm in order to improve change-point localization stability, and with a semantic level by adding place classification and labelling.

What is the meaning of spherical harmonics?

$P_l^m$ corresponds to the associated Legendre polynomials with $x \in [-1, 1]$ such that:

$$P_l^m(x) = \frac{(-1)^m}{2^l\, l!}\,(1 - x^2)^{m/2}\,\frac{d^{l+m}}{dx^{l+m}}(x^2 - 1)^l \tag{3}$$

Every function defined on the sphere surface can be decomposed in a sum of spherical harmonics as follows:

$$f = \sum_{l \in \mathbb{N}} \sum_{|m| \le l} f_l^m\, Y_l^m \tag{4}$$

The $f_l^m$ coefficients are obtained from a function $f$ by:

$$f_l^m = \int_{\eta \in S^2} f(\eta)\, \overline{Y_l^m}(\eta)\, d\eta \tag{5}$$

If $f_l^m = 0$ for all $l > L$, $f$ is said to be band limited with a bandwidth $L$. The coefficients set $f_l^m$ is called the spherical Fourier transform or the spectrum of $f$.

What is the description of a place?

As descriptors are based on appearance frequencies, when the robot approaches walls, frequencies become lower and a new topological place is defined.

What is the density function for each hypothesis?

Let’s assume the density functions under each hypothesis, i.e. $f_0$ and $f_1$, follow a multivariate normal distribution:

$$f_0 \sim \mathcal{N}(\mu_0, \Sigma_0), \qquad f_1 \sim \mathcal{N}(\mu_1, \Sigma_1) \tag{10}$$

As each hypothesis is characteristic of one topological place, density functions characterize the structural parameters of topological places.

What is the definition of spherical harmonics?

Spherical harmonics being a frequency description of the spherical image, the authors propose to directly use the spectrum as a structure descriptor.

(Open Access) Appearance-based segmentation of indoors/outdoors sequences of spherical views (2013) | Alexandre Chapoulie

HAL Id: hal-00845450

https://hal.inria.fr/hal-00845450

Submitted on 17 Jul 2013


Appearance-based segmentation of indoors/outdoors sequences of spherical views

Alexandre Chapoulie, Patrick Rives, David Filliat

To cite this version:

Alexandre Chapoulie, Patrick Rives, David Filliat. Appearance-based segmentation of indoors/outdoors sequences of spherical views. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, IROS’2013, Nov 2013, Tokyo, Japan. pp.1946-1951. ⟨hal-00845450⟩


Abstract—Navigating in large scale, complex and dynamic environments requires reliable representations able to capture metric, topological and semantic aspects of the scene for supporting path planning and real time motion control. In a previous work [11], we addressed metric and topological representations thanks to a multi-camera system which allows building of dense visual maps of large scale 3D environments. The map is a set of locally accurate spherical panoramas related by a 6-DoF pose graph. The work presented here is a further step toward a semantic representation. We aim at detecting the changes in the structural properties of the scene during navigation. Structural properties are estimated online using a global descriptor relying on spherical harmonics, which are particularly well-fitted to capture properties in spherical views. A change-point detection algorithm based on a statistical Neyman-Pearson test allows us to find optimal transitions between topological places. Results are presented and discussed for both indoor and outdoor experiments.

I. INTRODUCTION

Navigating in large scale, complex and dynamic environments is a challenging task for autonomous mobile robots. Reliable representations able to capture metric, topological and semantic aspects of the scene have to be built for supporting path planning and real time motion control algorithms [14]. It is usual to define three levels of representation as illustrated in fig. 1. Metric representation is used at the control level in the design of trajectory tracking algorithms [4]. Topological representation captures the environment accessibility properties in a graph structure and provides a first level of abstraction allowing complex navigation tasks in large scale environments [21]. Semantic representation consists in adding information about the places represented by nodes in the graph used at the topological level. The semantic information can be simply the name of a place [16] or its main characteristic such as office or corridor [24]. The added information can also refer to the presence of objects or other kinds of information linked to the place. This level, with a higher degree of abstraction, allows us to specify context-based navigation tasks in terms of queries [7].

In [11], we addressed metric and topological representation levels thanks to a multi-camera system onboard a man-driven car which allows building of dense visual maps of large scale 3D environments. As in Google Street View [23], the map is composed of a set of locally accurate spherical panoramas (fig. 2) built online along the car trajectory. The spherical views are related by a 6-DoF pose graph estimated using a direct multi-view registration technique [12].

INRIA Sophia Antipolis - Méditerranée, 2004 route des Lucioles - BP 93, 06902 Sophia Antipolis, France, firstname.lastname@inria.fr

ENSTA ParisTech, 32 Boulevard Victor, 75739 Paris, France, david.filliat@ensta-paristech.fr

Fig. 1. Navigation-based representation

Fig. 2. Example of spherical view (Inria Campus Dataset).

The work presented here is a further step toward a semantic representation of the scene. We aim at detecting changes in the scene structural properties (such as textures, appearance, frequency and orientation of the straight lines, curvatures, repeated patterns) during navigation. A place, in this work, is therefore associated with a segment of the robot trajectory where the scene is sufficiently self-similar, i.e. has the same structural properties extracted from the spherical views. The main advantage of this definition is that it fits both indoor and outdoor environments, so the topological graph can be partitioned into meaningful places. Such a partition also increases the efficiency of loop closure algorithms [10] and can be viewed as a first step toward semantic labeling of the environment.

In [3], we presented preliminary results where the structural properties were estimated using a global descriptor called GIST, specially modified to deal with spherical images. Given our place definition, GIST appears more suitable than local descriptors like SIFT used in [17] and [25]. Without additional constraints, local descriptors have difficulty representing the global consistency of the environment. Since its introduction [15], GIST has been used multiple times in image-based learning algorithms and in robotics for place recognition and loop closure detection [13] or for indoor region classification [18]. Despite these good properties, GIST is not well suited to capturing the richness of the spherical representation because the spatial periodicity of the sphere is partially lost. In this paper, we propose a novel representation relying on spherical harmonics, which are particularly well-fitted to capture the structural properties in spherical views.

In the following, section II presents the representation based on spherical harmonics. Section III is devoted to the detection of statistical changes in the scene structural properties. Experimental results for indoor and outdoor environments are provided in section IV. The proposed method is discussed in section V.

II. SPHERICAL HARMONICS

Spherical harmonics are similar to the 2D Fourier transform but defined on the sphere surface, and they take full advantage of the spherical representation. Notably, the complete spatial periodicity of the sphere is integrated into the spherical harmonics computation. They have already shown their usefulness in robotics for localization [5] and for visual odometry [9]. Spherical harmonics will be used here to define a new scene structure descriptor.

A. Definition

In this paper, we only detail the application of spherical harmonics to our problem. Further mathematical details about spherical harmonics can be found in [2], [1], [8].

The unit sphere $S^2$ included in $\mathbb{R}^3$ is parametrized using spherical coordinates. An element $\eta$ of $S^2$ is written:

$$\eta = \left(\cos(\theta)\sin(\phi),\ \sin(\theta)\sin(\phi),\ \cos(\phi)\right) \tag{1}$$

The spherical harmonics are defined by:

$$Y_l^m(\eta) = \sqrt{\frac{2l + 1}{4\pi}\,\frac{(l - m)!}{(l + m)!}}\; P_l^{|m|}(\cos(\phi))\, e^{jm\theta} \tag{2}$$

with $l \in \mathbb{N}$ and $|m| \le l$, where $l$ is the band number corresponding to a frequency and $m$ is an orientation parameter. $P_l^m$ corresponds to the associated Legendre polynomials with $x \in [-1, 1]$ such that:

$$P_l^m(x) = \frac{(-1)^m}{2^l\, l!}\,(1 - x^2)^{m/2}\,\frac{d^{l+m}}{dx^{l+m}}(x^2 - 1)^l \tag{3}$$

Every function defined on the sphere surface can be decomposed in a sum of spherical harmonics as follows:

$$f = \sum_{l \in \mathbb{N}} \sum_{|m| \le l} f_l^m\, Y_l^m \tag{4}$$

The $f_l^m$ coefficients are obtained from a function $f$ by:

$$f_l^m = \int_{\eta \in S^2} f(\eta)\, \overline{Y_l^m}(\eta)\, d\eta \tag{5}$$

If $f_l^m = 0$ for all $l > L$, $f$ is said to be band limited with a bandwidth $L$. The coefficients set $f_l^m$ is called the spherical Fourier transform or the spectrum of $f$. The first five spherical harmonics bands are displayed in fig. 3.

Fig. 3. The first five spherical harmonics bands are presented as unsigned spherical functions from the origin and by color on the unit sphere. Green corresponds to positive values and red to negative values. (From [8])

Due to the integral, exact computation of the $f_l^m$ coefficients can be very time consuming. Just as the fast Fourier transform exists for the classical Fourier transform, there exists a fast method to compute those coefficients, based on Monte Carlo integration, precomputed tables and the properties of the associated Legendre polynomials. This method is widely used in computer graphics for real-time lighting rendering. Further details can be found in [8].
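To make the Monte Carlo idea concrete, here is a minimal sketch in Python (not the authors' Matlab code). It draws uniform samples on the sphere and averages $f(\eta)\,\overline{Y_l^m}(\eta)$ to approximate the coefficient integral. The function names `assoc_legendre`, `sph_harm` and `mc_spectrum` are hypothetical, no precomputed tables are used, and negative orders follow the standard orthonormal convention $Y_l^{-m} = (-1)^m \overline{Y_l^m}$, which is an assumption rather than something stated in the text.

```python
import cmath
import math
import random


def assoc_legendre(l, m, x):
    """Associated Legendre polynomial P_l^m(x) for 0 <= m <= l, by recurrence."""
    pmm = 1.0
    if m > 0:
        somx2 = math.sqrt(max(0.0, 1.0 - x * x))
        fact = 1.0
        for _ in range(m):
            pmm *= -fact * somx2  # builds (-1)^m (2m-1)!! (1-x^2)^(m/2)
            fact += 2.0
    if l == m:
        return pmm
    pmmp1 = x * (2 * m + 1) * pmm
    if l == m + 1:
        return pmmp1
    for ll in range(m + 2, l + 1):
        pll = ((2 * ll - 1) * x * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
        pmm, pmmp1 = pmmp1, pll
    return pmmp1


def sph_harm(l, m, theta, phi):
    """Complex spherical harmonic; theta is the azimuth, phi the colatitude."""
    am = abs(m)
    norm = math.sqrt((2 * l + 1) / (4 * math.pi)
                     * math.factorial(l - am) / math.factorial(l + am))
    y = norm * assoc_legendre(l, am, math.cos(phi)) * cmath.exp(1j * am * theta)
    # Assumed convention for negative orders: Y_l^{-m} = (-1)^m conj(Y_l^m).
    return y if m >= 0 else (-1) ** am * y.conjugate()


def mc_spectrum(f, bands, n_samples=20000, seed=0):
    """Monte Carlo estimate of the coefficients f_l^m for l = 0 .. bands-1."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_samples):
        z = rng.uniform(-1.0, 1.0)  # uniform cos(phi) gives uniform area density
        points.append((rng.uniform(0.0, 2 * math.pi), math.acos(z)))
    weight = 4 * math.pi / n_samples  # sphere area divided by sample count
    spectrum = {}
    for l in range(bands):
        for m in range(-l, l + 1):
            acc = sum(f(t, p) * sph_harm(l, m, t, p).conjugate() for t, p in points)
            spectrum[(l, m)] = weight * acc
    return spectrum
```

For a spherical image, `f(theta, phi)` would return the pixel intensity in that direction, and calling `mc_spectrum(f, bands=5)` would yield the 25 coefficients of the first five bands.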

B. Spherical harmonics as environment structure description

Assuming that environment structure information is contained in the spherical image frequencies, pixel intensities can be chosen as the sample values $f(x_i)$ of the function $f$. Spherical harmonics being a frequency description of the spherical image, we propose to directly use the spectrum as a structure descriptor. Frequency information corresponds to band number $l$ and orientation information to parameter $m$ (the higher $l$ is, the higher the frequency is, see fig. 3). The spectrum coefficients $f_l^m$ are stacked into a vector which constitutes the global structure descriptor.

The number of bands used is an important parameter. In the case of the 2D discrete Fourier transform, the spectrum size is constrained by the image size. In the case of spherical harmonics, nothing constrains the required number of bands. The number of coefficients grows as the square of the number of bands: the descriptor size is $S = l^2$. In fig. 3, $l = 5$ and we have $l^2 = 25$ coefficients.

In computer graphics, only three bands are used due to an exponential attenuation in bands of higher frequencies [8]. For our study, there is no such attenuation and it is hard to determine the required number of bands. In [5], precise localization is achieved using only the first five bands. Since we seek a global description of the environment, the first five bands should provide sufficient information.

III. CHANGE-POINT DETECTION ALGORITHM

A. Hypotheses and assumptions

According to our definition of a place as a set of positions from which the environment structure is similar, we aim to detect the significant changes in the global descriptor value along the sequence of spherical views. This can be viewed as novelty detection as used in [19] or [20] for vehicle safeguarding or as change-point detection as used in [17] and [16] for landmark detection and place labelling. Change-point detection is based on hypothesis testing:

• Null hypothesis $H_0$ is the normal situation in which the observed parameters stick to the previous model.

• Alternate hypothesis $H_1$ is the alternate situation where parameters vary from the previous model.

The change-point detection algorithm evaluates the monitored parameters and determines when a switch occurs from hypothesis $H_0$ to hypothesis $H_1$.

Let us assume a set of independent input observations:

$$X_0, X_1, \ldots, X_{\tau-1}, X_\tau, \ldots, X_t \tag{6}$$

Assume that the input observations $X_0, \ldots, X_{\tau-1}$ are independent random variables with a probability density function $f_0$, while the observations $X_\tau, \ldots$ are independent random variables with a probability density function $f_1$.

Let us assume that $f_0$ is the probability density function under hypothesis $H_0$ and $f_1$ under $H_1$. Suppose we have observations $X_0, \ldots, X_t$ up to an instant $t$ and we test the above hypotheses for these observations. The log-likelihood ratio (eq. 7) indicates whether the value $X_t$ is better explained by $f_0$ or $f_1$:

$$\ell(X_t) = \ln\frac{f_1(X_t)}{f_0(X_t)} \tag{7}$$

The Neyman-Pearson lemma for a simple hypothesis test, as used in [22], defines the uniformly most powerful test as the one rejecting the null hypothesis $H_0$ whenever:

$$\sum_{j=\tau}^{t} \ln\frac{f_1(X_j)}{f_0(X_j)} > \nu \tag{8}$$

The above equation leads to the simple hypothesis test:

$$t_a = \min\left\{t : \arg\max_{0 \le \tau \le t} \sum_{j=\tau}^{t} \ln\frac{f_1(X_j)}{f_0(X_j)} > \nu\right\} \tag{9}$$

where $\nu$ is the threshold controlling the detection sensitivity. The $\arg\max$ over $0 \le \tau \le t$ returns the instant $\tau$ giving the maximum dissimilarity between $f_0$ and $f_1$. $t$ being the current instant, $t_a$ will be either $t$, leading to no change-point detection, or $\tau$, which is the exact change-point instant.
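The test of eq. 9 can be sketched in a few lines of Python. The version below is only an illustration under simplifying assumptions: observations are scalars, and both densities are normal with known parameters (`mu0`, `sd0`, `mu1`, `sd1` are hypothetical inputs, whereas the paper estimates the densities from data). At each instant $t$ it scans every candidate $\tau$, which also shows why the cost grows quickly with the number of observations.

```python
import math


def log_ratio(x, mu0, sd0, mu1, sd1):
    """ln(f1(x) / f0(x)) for two scalar normal densities."""
    def log_pdf(v, mu, sd):
        return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((v - mu) / sd) ** 2
    return log_pdf(x, mu1, sd1) - log_pdf(x, mu0, sd0)


def detect_change(xs, mu0, sd0, mu1, sd1, nu):
    """Return (t, tau) for the first t whose best suffix sum exceeds nu, else None."""
    ratios = [log_ratio(x, mu0, sd0, mu1, sd1) for x in xs]
    for t in range(len(ratios)):
        best_tau, best_sum, running = t, -math.inf, 0.0
        for tau in range(t, -1, -1):  # grow the sum over j = tau .. t backwards
            running += ratios[tau]
            if running > best_sum:
                best_tau, best_sum = tau, running
        if best_sum > nu:
            return t, best_tau
    return None


# A noiseless step from 0 to 3 at index 20 is flagged two samples later.
print(detect_change([0.0] * 20 + [3.0] * 20, 0.0, 1.0, 3.0, 1.0, nu=10.0))  # (22, 20)
```

The returned pair separates the detection instant `t` from the estimated change-point `tau`, which mirrors the delay between the two discussed in the text.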

This algorithm gives the exact change-point instants, but it needs a delay to evaluate the probability density function $f_1$. The computation time is very low for a small $t$ but increases rapidly with the number of observations. No assumption is made concerning $H_1$, and the probability density functions $f_0$ and $f_1$ always need to be estimated for all the change-points $\tau$ tested over all observations.

Let’s assume the density functions under each hypothesis, i.e. $f_0$ and $f_1$, follow a multivariate normal distribution:

$$f_0 \sim \mathcal{N}(\mu_0, \Sigma_0), \qquad f_1 \sim \mathcal{N}(\mu_1, \Sigma_1) \tag{10}$$

As each hypothesis is characteristic of one topological place, density functions characterize the structural parameters of topological places. The mean vector represents the most probable set of structural parameters. The covariance matrix represents the tolerance on the parameter distribution inside a topological place. Two issues arise concerning the estimation of the distribution parameters:

• A sufficient number of samples is necessary to ensure a well-conditioned density function estimation, and in particular the positive semi-definiteness of the covariance matrix.

• Density function estimation requires independent and identically distributed (i.i.d.) samples. Independence is assumed, following the independent input observations assumption of the Neyman-Pearson lemma. Gathering samples at approximately constant distance intervals (constant time gathering with a minimal distance condition between samples) gives approximately identical distribution. This simple method avoids accumulating samples at low or null speed.
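The minimal-distance condition from the second point can be sketched as a simple gate on incoming samples. This is an illustrative reading, not the authors' code: `gate_by_distance` is a hypothetical name, poses are assumed to be planar `(x, y)` positions in meters, and the default of 0.015 m is the minimal distance between samples reported for the experiments.

```python
import math


def gate_by_distance(poses, descriptors, min_dist=0.015):
    """Keep a descriptor only if the robot moved at least min_dist since the last kept one."""
    kept, last = [], None
    for (x, y), descriptor in zip(poses, descriptors):
        if last is None or math.hypot(x - last[0], y - last[1]) >= min_dist:
            kept.append(descriptor)
            last = (x, y)
    return kept


# Samples taken while the robot is stopped or creeping are dropped.
poses = [(0.0, 0.0), (0.0, 0.0), (0.01, 0.0), (0.02, 0.0), (0.02, 0.0), (0.04, 0.0)]
print(gate_by_distance(poses, ["a", "b", "c", "d", "e", "f"]))  # ['a', 'd', 'f']
```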

B. Online application

As explained previously, the algorithm rapidly becomes time consuming and only one change-point detection is possible for a complete set of input observations. In order to alleviate those limitations, we introduce a fixed size sliding window over the signal made up of the input observations (fig. 4). The first half of the sliding window corresponds to the normal hypothesis $H_0$ while the second half corresponds to the alternate hypothesis $H_1$. The change-point hypothesis is then tested only at the sliding window center. Each time the robot acquires a new observation, the signal is expanded with a new input. The sliding window always encompasses the $N$ last input observations. Older observations, already analysed, are forgotten. We finally obtain an approximation (due to incomplete signal observation) of the exact change-point.

This simple trick brings many advantages. The most obvious ones are constant time change-point detection and dynamic signal analysis, leading to an online algorithm. Moreover, one of the most important is multiple hypothesis testing, which allows many change-points over the signal, contrary to the original Neyman-Pearson algorithm formulation.

Fig. 4. Sliding window used in the estimation process.

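Considering the hypotheses about the density functions and the sliding window trick, the Neyman-Pearson final equation results in:

$$\sum_{j=t-N/2}^{t} \ln\frac{f_1(X_j)}{f_0(X_j)} = \frac{N}{4}\left[\ln\frac{|\Sigma_0|}{|\Sigma_1|} + \mu_0^{T}\Sigma_0^{-1}\mu_0 + \mu_1^{T}\Sigma_1^{-1}\mu_1 - 2\mu_0^{T}\Sigma_0^{-1}\mu_1\right] + \frac{1}{2}\sum_{j=t-N/2}^{t} X_j^{T}\left(\Sigma_0^{-1} - \Sigma_1^{-1}\right)X_j \tag{11}$$

The cross term uses the fact that $\mu_1$ is the sample mean of the observations in the second half of the window.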

The equation contains three terms:

• First term is linked to distribution spreads. The term is canceled for equal spreads.

• Second term approximately corresponds (because of impossible factorization) to the squared difference between distribution means.

• Last term is the sum of the squared observations weighted by the spread difference between the density functions. The term is canceled for equal spreads.
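A scalar version of the sliding-window score may help to see how these terms behave. The sketch below is a simplification, not the authors' implementation: observations are single numbers instead of descriptor vectors, so covariances reduce to variances, and the score is accumulated directly as a sum of log-likelihood ratios over the second half of the window rather than in the expanded form of equation 11. The names `window_score` and `online_scores` and the regularizer `eps` are assumptions made for the example; the default window of 80 observations is the size reported for the experiments.

```python
import math
from collections import deque
from statistics import fmean, pvariance


def window_score(window, eps=1e-9):
    """Sum over the second half of ln(f1/f0), with f0 and f1 fitted on each half."""
    half = len(window) // 2
    first, second = window[:half], window[half:]
    mu0, var0 = fmean(first), pvariance(first) + eps  # eps guards a zero variance
    mu1, var1 = fmean(second), pvariance(second) + eps
    score = 0.0
    for x in second:
        score += 0.5 * math.log(var0 / var1)  # spread term, zero for equal spreads
        score += 0.5 * ((x - mu0) ** 2 / var0 - (x - mu1) ** 2 / var1)
    return score


def online_scores(stream, n=80):
    """Yield one score per new observation once the window holds n samples."""
    window = deque(maxlen=n)  # older observations are forgotten automatically
    for x in stream:
        window.append(x)
        if len(window) == n:
            yield window_score(list(window))


# Deterministic toy signal: small ripple around 0, then around 5 from index 200.
stream = [0.1 * (-1) ** i for i in range(200)] + [5.0 + 0.1 * (-1) ** i for i in range(200)]
scores = list(online_scores(stream))
print(scores.index(max(scores)) + 40)  # window center at the score maximum: 200
```

Each score costs a fixed amount of work whatever the length of the stream, which is the constant time property mentioned above.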

As stated before, the equation computes a value linked to the difference between two distributions. The greater the difference, the higher the value. In our case, this leads to change-point detection indicating a change in the structural parameters, which corresponds to a transition between two topological places.

An example of signal obtained with equation 11, made up of the change-point values, is displayed in fig. 5. The signal is filtered in the time domain with a simple Gaussian filter (parameters: µ = 0, σ = N/10) in order to reduce the signal noise. The peak detection mechanism relies on the peak magnitude relative to the minima flanking the peak. The threshold (ν = 0.4) is then applied to the peak amplitude and not to the peak maximum value. This makes the peak detection less sensitive to noise.
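One possible reading of this filtering and peak detection step is sketched below. It is an assumption-laden illustration: the text does not specify how the flanking minima are found, so here each peak is compared with the higher of the two valleys reached by walking downhill on either side, and the Gaussian kernel is truncated at three standard deviations and renormalized at the signal borders. `gaussian_smooth` and `detect_peaks` are hypothetical names.

```python
import math


def gaussian_smooth(signal, sigma):
    """Zero-mean Gaussian filter, truncated at 3 sigma, renormalized at the borders."""
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(signal)):
        acc = norm = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(signal):
                acc += w * signal[j]
                norm += w
        smoothed.append(acc / norm)
    return smoothed


def detect_peaks(signal, nu):
    """Indices of local maxima whose height above the flanking minima exceeds nu."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] >= signal[i + 1]:
            left = i
            while left > 0 and signal[left - 1] <= signal[left]:
                left -= 1  # walk downhill to the left valley
            right = i
            while right < len(signal) - 1 and signal[right + 1] <= signal[right]:
                right += 1  # walk downhill to the right valley
            amplitude = signal[i] - max(signal[left], signal[right])
            if amplitude > nu:
                peaks.append(i)
    return peaks


print(detect_peaks([0, 0.1, 0, 1.0, 0, 0.2, 0.15, 0.9, 0.1], nu=0.4))  # [3, 7]
```

With a window of N = 80 observations, the settings in the text would correspond to `detect_peaks(gaussian_smooth(scores, sigma=8), nu=0.4)`, assuming the scores are scaled so that a threshold of 0.4 is meaningful.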

Considering the aforementioned density function estimation constraints, the sliding window has to be sufficiently large for a correct estimation. For the experiments, the window size is 80 observations. As the minimal distance between two samples is 0.015 m, the sliding window spatial size is 1.2 m. Each density function is then estimated over a distance of 0.6 m. These values satisfy the requisites for density estimation but have consequences on the experiment, as two change-points closer than 0.6 m cannot both be detected. This distance is a reasonable trade-off between minimal environment size for structural parameters extraction and minimal detectable topological place. For environments changing slowly, the window can be larger.

Fig. 5. Sample signal obtained with the change-point detection algorithm combined with spherical harmonics approach for structural parameters description. Detected peaks are marked with red dots.

IV. EXPERIMENTAL RESULTS

This section presents experimental results for topological segmentation in indoor and outdoor environments. Testing different kinds of environment aims to show that the method is generic and robust to context change. Using various kinds of camera for spherical view acquisition furthermore highlights the genericity of the spherical concept. The indoor experiment was carried out in the Robotic Hall at INRIA Sophia Antipolis using a Neobotix MP-500 platform equipped with a paracatadioptric camera. In the outdoor experiment, a man-driven vehicle equipped with the multi-camera system described in [11] was used. The trajectory was about 600 meters across the INRIA Sophia Antipolis research center.

The whole code is written in Matlab without being specifically optimized. Spherical harmonics spectrum computation requires 290 ms using the implementation described above (the sphere is sampled with 62500 uniformly distributed samples). The change-point detection algorithm runs in 10 ms. The complete algorithm then runs online in about 300 ms (acquisition up to 3.3 Hz). However, the spherical harmonics spectrum code is highly parallelizable and might take great advantage of a C/C++ parallel implementation.

A. Indoor experiment analysis

Figure 6 presents the robot trajectory and the detected change-points. It is first interesting to notice that all change-points correspond to important structure variations such as doorsteps or room volume variation (i.e. passing from a nook to a more open space). The trajectory in the wide space is segmented very little.

The easiest way to validate a topological place segmentation algorithm is to consider the doorsteps case. This case is illustrated by images 2680, 3480, 5328, 10455, 11954 and 12322 where change-points are precisely localized at doorsteps. The examples illustrated by images 996, 1401 and 2044 correspond to room volume variations. Images 996 and 1401 show the robot moving from a narrow space to a wider space. Image 2044 shows the opposite case, when the robot leaves a wide environment to enter a quite narrow place similar to a corridor. Images 6376 and 6624 correspond to the detection of changes in the objects present in the
