Appearance-based segmentation of indoors/outdoors sequences of spherical views

Abstract
Navigating in large-scale, complex and dynamic environments requires reliable representations able to capture metric, topological and semantic aspects of the scene to support path planning and real-time motion control. In a previous work [11], we addressed metric and topological representations thanks to a multi-camera system which allows building dense visual maps of large-scale 3D environments. The map is a set of locally accurate spherical panoramas related by a 6-DoF pose graph. The work presented here is a further step toward a semantic representation. We aim at detecting the changes in the structural properties of the scene during navigation. Structural properties are estimated online using a global descriptor relying on spherical harmonics, which are particularly well suited to capturing properties in spherical views. A change-point detection algorithm based on a statistical Neyman-Pearson test allows us to find optimal transitions between topological places. Results are presented and discussed for both indoor and outdoor experiments.


HAL Id: hal-00845450
https://hal.inria.fr/hal-00845450
Submitted on 17 Jul 2013
Alexandre Chapoulie, Patrick Rives, David Filliat
To cite this version:
Alexandre Chapoulie, Patrick Rives, David Filliat. Appearance-based segmentation of indoors/outdoors sequences of spherical views. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, IROS'2013, Nov 2013, Tokyo, Japan. pp.1946-1951. hal-00845450

Alexandre Chapoulie¹, Patrick Rives¹ and David Filliat²
I. INTRODUCTION
Navigating in large-scale, complex and dynamic environments is a challenging task for autonomous mobile robots. Reliable representations able to capture metric, topological and semantic aspects of the scene have to be built to support path planning and real-time motion control algorithms [14]. It is usual to define three levels of representation, as illustrated in fig. 1. Metric representation is used at the control level in the design of trajectory tracking algorithms [4]. Topological representation captures the accessibility properties of the environment in a graph structure and provides a first level of abstraction allowing complex navigation tasks in large-scale environments [21]. Semantic representation consists in adding information about the places represented by nodes in the graph used at the topological level. The semantic information can simply be the name of a place [16] or its main characteristic, such as office or corridor [24]. The added information can also refer to the presence of objects or to other kinds of information linked to the place. This level, with a higher degree of abstraction, allows us to specify context-based navigation tasks in terms of queries [7].
In [11], we addressed the metric and topological representation levels thanks to a multi-camera system onboard a manually driven car, which allows building dense visual maps of large-scale 3D environments. As in Google Street View [23], the map is composed of a set of locally accurate spherical panoramas (fig. 2) built online along the car trajectory. The spherical views are related by a 6-DoF pose graph estimated using a direct multi-view registration technique [12].
Fig. 1. Navigation-based representation
Fig. 2. Example of spherical view (Inria Campus Dataset).
¹ INRIA Sophia Antipolis - Méditerranée, 2004 route des Lucioles - BP 93, 06902 Sophia Antipolis, France, firstname.lastname@inria.fr
² ENSTA ParisTech, 32 Boulevard Victor, 75739 Paris, France, david.filliat@ensta-paristech.fr
The work presented here is a further step toward a semantic representation of the scene. We aim at detecting changes in the structural properties of the scene (such as textures, appearance, frequency and orientation of the straight lines, curvatures, repeated patterns) during navigation. A place, in this work, is therefore associated with a segment of the robot trajectory where the scene is sufficiently self-similar, i.e. has the same structural properties extracted from the spherical views. The main advantage of this definition is that it fits both indoor and outdoor environments, so that the topological graph can be partitioned into meaningful places. Such a partition also provides advantages such as increasing the efficiency of loop closure algorithms [10] and can be viewed as a first step toward semantic labeling of the environment.
In [3], we presented preliminary results where the structural properties were estimated using a global descriptor called GIST, specially modified to deal with spherical images. Given our place definition, GIST appears better suited than local descriptors like SIFT used in [17] and [25]. Without additional constraints, local descriptors have difficulty representing the global consistency of the environment. Since its introduction [15], GIST has been used multiple times in image-based learning algorithms and in robotics for place recognition and loop closure detection [13] or for indoor region classification [18]. Despite these good properties, GIST is not well adapted to capture the richness of the spherical representation because the spatial periodicity of the sphere is partially lost. In this paper, we propose a novel representation relying on spherical harmonics, which are particularly well suited to capturing the structural properties in spherical views.
In the following, section II presents the representation based on spherical harmonics. Section III is devoted to the detection of statistical changes in the structural properties of the scene. Experimental results for indoor and outdoor environments are provided in section IV. The proposed method is discussed in section V.
II. SPHERICAL HARMONICS
Spherical harmonics are similar to the 2D Fourier transform but are defined on the sphere surface and take full advantage of the spherical representation. Notably, the complete spatial periodicity of the sphere is integrated into the spherical harmonics computation. They have already shown their usefulness in robotics for localization [5] and for visual odometry [9]. Spherical harmonics will be used here to define a new scene structure descriptor.
A. Definition
In this paper, we only detail the application of spherical
harmonics to our problem. Further mathematical details
about spherical harmonics can be found in [2], [1], [8].
The unit sphere $S^2$ included in $\mathbb{R}^3$ is parametrized using spherical coordinates. An element $\eta$ of $S^2$ is written:
$$\eta = \big(\cos(\theta)\sin(\phi),\ \sin(\theta)\sin(\phi),\ \cos(\phi)\big)^T \qquad (1)$$
The spherical harmonics are defined by:
$$Y_l^m(\eta) = \sqrt{\frac{2l+1}{4\pi}\,\frac{(l-m)!}{(l+m)!}}\; P_l^{|m|}(\cos(\phi))\, e^{jm\theta} \qquad (2)$$
with $l \in \mathbb{N}$ and $|m| \le l$, where $l$ is the band number corresponding to a frequency and $m$ is an orientation parameter. $P_l^m$ corresponds to the associated Legendre polynomials with $x \in [-1, 1]$ such that:
$$P_l^m(x) = \frac{(-1)^m (1-x^2)^{m/2}}{2^l\, l!}\, \frac{d^{\,l+m}}{dx^{\,l+m}} (x^2-1)^l \qquad (3)$$
Every function defined on the sphere surface can be decomposed into a sum of spherical harmonics as follows:
$$f = \sum_{l \in \mathbb{N}} \sum_{|m| \le l} f_l^m\, Y_l^m \qquad (4)$$
The $f_l^m$ coefficients are obtained from a function $f$ by:
$$f_l^m = \int_{\eta \in S^2} f(\eta)\, \overline{Y_l^m}(\eta)\, d\eta \qquad (5)$$
If $f_l^m = 0$ for all $l > L$, $f$ is said to be band limited with a bandwidth $L$. The set of coefficients $f_l^m$ is called the spherical Fourier transform or the spectrum of $f$. The first five spherical harmonics bands are displayed in fig. 3.
Fig. 3. The first five spherical harmonics bands are presented as unsigned
spherical functions from the origin and by color on the unit sphere. Green
corresponds to positive values and red to negative values. (From [8])
Due to the integral, the exact computation of the $f_l^m$ coefficients can be very time consuming. Just as the fast Fourier transform exists for the classical Fourier transform, there exists a fast method to compute those coefficients, based on Monte Carlo integration, precomputed tables and the properties of the associated Legendre polynomials. This method is widely used in computer graphics for real-time lighting rendering. Further details can be found in [8].
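To make the Monte Carlo idea concrete, here is a minimal Python/NumPy sketch. It is not the authors' Matlab implementation: it omits the precomputed tables of [8], and the function names (`assoc_legendre`, `sph_harm`, `sample_sphere`, `spectrum`) are hypothetical. It estimates the integral of eq. 5 as the sphere area times the sample mean of $f(\eta)\overline{Y_l^m}(\eta)$ over uniformly drawn directions. As an assumption, the normalisation factor uses $|m|$ in the factorials so that every harmonic has unit norm, including those with negative $m$.

```python
import math
import numpy as np


def assoc_legendre(l, m, x):
    """Associated Legendre polynomial P_l^m(x) for 0 <= m <= l (sign as in eq. 3)."""
    # Start of the recurrence: P_m^m(x) = (-1)^m (2m-1)!! (1 - x^2)^(m/2)
    pmm = np.ones_like(x)
    if m > 0:
        somx2 = np.sqrt(np.clip(1.0 - x * x, 0.0, None))
        fact = 1.0
        for _ in range(m):
            pmm = -pmm * fact * somx2
            fact += 2.0
    if l == m:
        return pmm
    # P_{m+1}^m(x) = x (2m+1) P_m^m(x)
    pmmp1 = x * (2 * m + 1) * pmm
    # Upward recurrence in the band number
    for ll in range(m + 2, l + 1):
        pmm, pmmp1 = pmmp1, ((2 * ll - 1) * x * pmmp1 - (ll + m - 1) * pmm) / (ll - m)
    return pmmp1


def sph_harm(l, m, theta, phi):
    """Complex harmonic Y_l^m; theta is the azimuth, phi the polar angle (eq. 1)."""
    am = abs(m)  # assumption: |m| in the normalisation keeps unit norm for m < 0
    norm = math.sqrt((2 * l + 1) / (4.0 * math.pi)
                     * math.factorial(l - am) / math.factorial(l + am))
    return norm * assoc_legendre(l, am, np.cos(phi)) * np.exp(1j * m * theta)


def sample_sphere(n, rng):
    """Draw n directions uniformly distributed on the unit sphere."""
    theta = 2.0 * np.pi * rng.random(n)
    phi = np.arccos(1.0 - 2.0 * rng.random(n))
    return theta, phi


def spectrum(values, theta, phi, n_bands):
    """Monte Carlo estimate of f_l^m for l < n_bands, stacked into one vector."""
    weight = 4.0 * np.pi / values.size  # sphere area divided by the sample count
    coeffs = [weight * np.sum(values * np.conj(sph_harm(l, m, theta, phi)))
              for l in range(n_bands) for m in range(-l, l + 1)]
    return np.array(coeffs)


rng = np.random.default_rng(0)
theta, phi = sample_sphere(62500, rng)
f = sph_harm(1, 0, theta, phi).real  # toy "image": a single harmonic
desc = spectrum(f, theta, phi, n_bands=5)
print(desc.shape)  # (25,)
```

With the toy input above, the entry for $(l, m) = (1, 0)$ (index 2 of the vector) should be close to 1 and all other entries close to 0, up to the Monte Carlo error, which decreases as the inverse square root of the number of samples.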
B. Spherical harmonics as environment structure description
Assuming that environment structure information is contained in the spherical image frequencies, pixel intensities can be chosen as the sample values $f(x_i)$ of the function $f$. Spherical harmonics being a frequency description of the spherical image, we propose to directly use the spectrum as a structure descriptor. Frequency information corresponds to the band number $l$ and orientation information to the parameter $m$ (the higher $l$ is, the higher the frequency is, see fig. 3). The spectrum coefficients $f_l^m$ are stacked into a vector which constitutes the global structure descriptor.
The number of bands used is an important parameter. In the case of the 2D discrete Fourier transform, the spectrum size is constrained by the image size. In the case of spherical harmonics, nothing constrains the required number of bands. The number of coefficients grows as the square of the number of bands: with $l$ bands, the descriptor size is $S_d = l^2$. In fig. 3, $l = 5$ and we have $l^2 = 25$ coefficients.
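The quadratic growth can be checked by counting the orders in each band. The short sum below assumes, as in eq. 4, that band $l$ contains the $2l + 1$ orders $m = -l, \dots, l$, and writes $n_b$ (a symbol introduced here for illustration only) for the number of bands retained, i.e. bands $0$ to $n_b - 1$:

$$S_d = \sum_{l=0}^{n_b-1} (2l+1) = 2\,\frac{(n_b-1)\,n_b}{2} + n_b = n_b^2, \qquad \text{e.g. } n_b = 5:\ 1 + 3 + 5 + 7 + 9 = 25.$$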
In computer graphics, only three bands are used due to an exponential attenuation in bands of higher frequencies [8]. For our study, there is no such attenuation and it is hard to determine the required number of bands. In [5], precise localization is achieved using only the first five bands. Since we seek a global description of the environment, the first five bands should provide sufficient information.
III. CHANGE-POINT DETECTION ALGORITHM
A. Hypotheses and assumptions
According to our definition of a place as a set of positions from which the environment structure is similar, we aim to detect the significant changes in the global descriptor value along the sequence of spherical views. This can be viewed as novelty detection, as used in [19] or [20] for vehicle safeguarding, or as change-point detection, as used in [17] and [16] for landmark detection and place labelling. Change-point detection is based on hypothesis testing:
• Null hypothesis $H_0$ is the normal situation in which the observed parameters stick to the previous model.
• Alternate hypothesis $H_1$ is the alternate situation where parameters vary from the previous model.
The change-point detection algorithm evaluates the monitored parameters and determines when a switch occurs from hypothesis $H_0$ to hypothesis $H_1$.
Let us assume a set of independent input observations:
$$X_1, X_2, \dots, X_{\tau-1}, X_\tau, \dots, X_t \qquad (6)$$
Assume that the input observations $X_1, \dots, X_{\tau-1}$ are independent random variables with a probability density function $f_0(X_j)$, while the observations $X_\tau, \dots$ are independent random variables with a probability density function $f_1(X_j)$. Let us assume that $f_0$ is the probability density function under hypothesis $H_0$ and $f_1$ under $H_1$. Suppose we have observations $X_1, \dots, X_t$ up to an instant $t$ and we test the above hypotheses for these observations. The log-likelihood ratio (eq. 7) indicates whether the value $X_j$ is more likely to belong to $f_1$ or to $f_0$.
$$s_j = \ln\frac{f_1(X_j)}{f_0(X_j)} \qquad (7)$$
The Neyman-Pearson lemma for a simple hypothesis test, as used in [22], defines the uniformly most powerful test as the one rejecting the null hypothesis $H_0$ whenever:
$$S_\tau^t = \sum_{j=\tau}^{t} \ln\frac{f_1(X_j)}{f_0(X_j)} = \sum_{j=\tau}^{t} s_j > \nu \qquad (8)$$
The above equation leads to the simple hypothesis test:
$$t_c = \min\Big\{t : \arg\max_{0 \le \tau \le t} \sum_{j=\tau}^{t} \ln\frac{f_1(X_j)}{f_0(X_j)} > \nu\Big\} \qquad (9)$$
where $\nu$ is the threshold controlling the detection sensitivity. The term $\arg\max_{0 \le \tau \le t} \sum_{j=\tau}^{t} \ln\frac{f_1(X_j)}{f_0(X_j)} > \nu$ returns the instant $\tau$ giving the maximum dissimilarity between $f_0$ and $f_1$. $t$ being the current instant, $t_c$ will be either $t$, leading to no change-point detection, or $\tau$, which is the exact change-point instant.
This algorithm gives the exact change-point instants, but it needs a delay to evaluate the probability density function $f_1$. The computation time is very low for small $t$ but increases rapidly with the number of observations. No assumptions are made concerning $H_0$, and the probability density functions $f_0$ and $f_1$ always need to be estimated for every candidate change-point $\tau$ tested over all observations.
Let us assume that the density functions under each hypothesis, i.e. $f_0$ and $f_1$, follow a multivariate normal distribution:
$$f_0 \sim \mathcal{N}(\mu_0, \Sigma_0), \qquad f_1 \sim \mathcal{N}(\mu_1, \Sigma_1) \qquad (10)$$
As each hypothesis is characteristic of one topological place, the density functions characterize the structural parameters of topological places. The mean vector represents the most probable set of structural parameters. The covariance matrix represents the tolerance on the distribution of the parameters inside a topological place. Two issues arise concerning the estimation of the distribution parameters:
• A sufficient number of samples is necessary to ensure a well-conditioned density function estimation, and in particular the positive semi-definiteness of the covariance matrix.
• Density function estimation requires independent and identically distributed (i.i.d.) samples. Independence is assumed due to the independent input observations assumption of the Neyman-Pearson lemma. Gathering samples at approximately constant distance intervals (constant-time gathering with a minimal distance condition between samples) gives an approximately identical distribution. This simple method avoids the accumulation of samples at low or null speed.
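The distance-gated gathering described in the second item can be sketched as follows. This is an illustrative Python sketch under our own assumptions: the function name `gather_samples` and the representation of robot positions as coordinate tuples are hypothetical, and the 0.015 m default is the minimal distance quoted in the online-application discussion.

```python
import math


def gather_samples(positions, min_dist=0.015):
    """Return the indices of the observations that are kept.

    Observations arrive at a constant rate; a new one is accepted only once
    the robot has moved at least min_dist (metres) from the last kept sample,
    so that no samples accumulate while the robot is slow or stopped.
    """
    kept = []
    last = None
    for i, p in enumerate(positions):
        if last is None or math.dist(p, last) >= min_dist:
            kept.append(i)
            last = p
    return kept


# Robot stopped at the origin, then moving along x.
track = [(0.0, 0.0), (0.0, 0.0), (0.01, 0.0), (0.02, 0.0), (0.02, 0.0), (0.04, 0.0)]
print(gather_samples(track))  # [0, 3, 5]
```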
B. Online application
As explained previously, the algorithm rapidly becomes time consuming and only one change-point detection is possible for a complete set of input observations. In order to alleviate those limitations, we introduce a fixed-size sliding window over the signal made up of the input observations (fig. 4). The first half of the sliding window corresponds to the normal hypothesis $H_0$ while the second half corresponds to the alternate hypothesis $H_1$. The change-point hypothesis is then tested only at the center of the sliding window. Each time the robot acquires a new observation, the signal is extended with a new input. The sliding window always encompasses the last $N$ input observations. Older observations, already analysed, are forgotten. We finally obtain an approximation (due to the incomplete observation of the signal) of the exact change-point.
This simple trick brings many advantages. The most obvious ones are constant-time change-point detection and dynamic signal analysis, leading to an online algorithm. Moreover, one of the most important is multiple hypothesis testing, which allows many change-points to be detected over the signal, contrary to the original formulation of the Neyman-Pearson algorithm.

Fig. 4. Sliding window used in the estimation process.
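The bookkeeping of the fixed-size window can be sketched in a few lines of Python. This is only an illustration: the generator name `sliding_windows` is hypothetical, and reporting the index of the first observation of the second half as the tested center is our own convention.

```python
from collections import deque


def sliding_windows(observations, size):
    """Yield (center, window) each time a full window of `size` items is available.

    `center` is the index of the first observation of the second half, i.e. the
    candidate change-point tested for that window. Older observations are
    dropped automatically by the bounded deque.
    """
    window = deque(maxlen=size)
    for t, obs in enumerate(observations):
        window.append(obs)
        if len(window) == size:
            yield t - size // 2 + 1, list(window)


for center, win in sliding_windows(range(6), 4):
    print(center, win)
# 2 [0, 1, 2, 3]
# 3 [1, 2, 3, 4]
# 4 [2, 3, 4, 5]
```

Because the deque never holds more than `size` observations, the work per new observation does not grow with the length of the sequence, which is the constant-time property mentioned above.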
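Considering the hypotheses about the density functions and the sliding window trick, the final Neyman-Pearson equation becomes:
$$S_\tau^t = \frac{N}{4}\ln\frac{|\Sigma_0|}{|\Sigma_1|} + \frac{N}{4}\left(\mu_0^T\Sigma_0^{-1}\mu_0 + \mu_1^T\Sigma_1^{-1}\mu_1 - 2\mu_0^T\Sigma_0^{-1}\mu_1\right) + \frac{1}{2}\sum_{j=t-N/2}^{t} X_j^T\left(\Sigma_0^{-1} - \Sigma_1^{-1}\right)X_j \qquad (11)$$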
The equation contains three terms:
• The first term is linked to the distribution spreads. It cancels for equal spreads.
• The second term approximately corresponds (because factorization is impossible) to the squared difference between the distribution means.
• The last term is the sum of the squared observations weighted by the spread difference between the density functions. It cancels for equal spreads.
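The three terms can be evaluated directly from one window of descriptors. The Python/NumPy sketch below is illustrative only and rests on several assumptions of ours: descriptors are real-valued row vectors, the helper names `gaussian_fit` and `change_score` are hypothetical, a small ridge is added to each covariance to keep it invertible, and the sum runs over exactly the $N/2$ observations of the second half. Under that last assumption, and because $\mu_1$ is the sample mean of the second half, the closed form equals the direct sum of the log-likelihood ratios of eq. 8.

```python
import numpy as np


def gaussian_fit(samples, ridge=1e-6):
    """Sample mean and covariance; the ridge (our assumption) keeps it invertible."""
    mu = samples.mean(axis=0)
    sigma = np.cov(samples, rowvar=False) + ridge * np.eye(samples.shape[1])
    return mu, sigma


def change_score(window):
    """Evaluate the three terms of eq. 11 on a window of N descriptors (N even)."""
    n = window.shape[0]
    first, second = window[: n // 2], window[n // 2:]
    mu0, sigma0 = gaussian_fit(first)    # model of the H0 half
    mu1, sigma1 = gaussian_fit(second)   # model of the H1 half
    inv0, inv1 = np.linalg.inv(sigma0), np.linalg.inv(sigma1)
    # Term 1: ratio of the distribution spreads (log-determinants for stability)
    spread = 0.25 * n * (np.linalg.slogdet(sigma0)[1] - np.linalg.slogdet(sigma1)[1])
    # Term 2: approximately the squared difference between the means
    means = 0.25 * n * (mu0 @ inv0 @ mu0 + mu1 @ inv1 @ mu1 - 2.0 * mu0 @ inv0 @ mu1)
    # Term 3: squared observations weighted by the difference of inverse covariances
    quad = 0.5 * np.einsum("ij,jk,ik->", second, inv0 - inv1, second)
    return spread + means + quad


rng = np.random.default_rng(0)
same = rng.normal(0.0, 1.0, size=(80, 3))                 # one homogeneous place
shifted = np.vstack([rng.normal(0.0, 1.0, size=(40, 3)),  # place A ...
                     rng.normal(5.0, 1.0, size=(40, 3))])  # ... then place B
print(change_score(same), change_score(shifted))
```

In this toy example, the window that straddles the two synthetic places should give a much larger score than the homogeneous one, since the mean term dominates when the two halves have different means.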
As stated before, we can observe that the equation computes a value linked to the difference between two distributions. The greater the difference is, the higher the value is. In our case, this leads to change-point detection indicating a change in the structural parameters, which corresponds to a transition between two topological places.
An example of the signal obtained with equation 11, made up of the change-point values, is displayed in fig. 5. The signal is filtered in the time domain with a simple Gaussian filter (parameters: µ = 0, σ = N/10) in order to reduce the signal noise. The peak detection mechanism relies on the peak magnitude relative to the minima flanking the peak. The threshold (ν = 0.4) is then applied to the peak amplitude and not to the peak maximum value. This results in a peak detection that is less sensitive to noise.
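A possible reading of this filtering and peak-picking step is sketched below in Python/NumPy. Several details are assumptions rather than facts from the text: the kernel is truncated at three standard deviations, the amplitude is measured against the higher of the two flanking minima, and the function names `gaussian_smooth` and `detect_peaks` are hypothetical. The scale on which ν = 0.4 applies is not specified here, so the threshold is left as a parameter.

```python
import numpy as np


def gaussian_smooth(signal, sigma):
    """Convolve with a zero-mean Gaussian kernel truncated at 3 sigma (assumption)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    return np.convolve(signal, kernel, mode="same")


def detect_peaks(signal, threshold):
    """Indices of local maxima whose height above the higher of the two
    flanking minima exceeds the threshold."""
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]:
            # Walk downhill on each side to reach the flanking minima.
            left = i
            while left > 0 and signal[left - 1] <= signal[left]:
                left -= 1
            right = i
            while right < len(signal) - 1 and signal[right + 1] <= signal[right]:
                right += 1
            amplitude = signal[i] - max(signal[left], signal[right])
            if amplitude > threshold:
                peaks.append(i)
    return peaks


toy = np.array([0.0, 1.0, 0.0, 0.2, 0.1, 2.0, 0.0])
print(detect_peaks(toy, threshold=0.4))  # [1, 5]
```

In the toy signal the small bump at index 3 is rejected because it rises only 0.1 above its higher flanking minimum, even though its absolute value is non-zero. This is the reason for thresholding the amplitude instead of the peak value. For a window of N = 80 observations, the smoothing call would be `gaussian_smooth(signal, sigma=8)`.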
Considering the aforementioned density function estimation constraints, the sliding window has to be sufficiently large for a correct estimation. For the experiments, the window size is 80 observations. As the minimal distance between two samples is 0.015 m, the spatial size of the sliding window is 1.2 m. Each density function is then estimated over a distance of 0.6 m. These values satisfy the requirements for density estimation but have consequences on the experiment, as two change-points closer than 0.6 m cannot both be detected. This distance is a reasonable trade-off between the minimal environment size for structural parameter extraction and the minimal detectable topological place. For slowly changing environments, the window can be larger.
Fig. 5. Sample signal obtained with the change-point detection algorithm combined with the spherical harmonics approach for structural parameters description. Detected peaks are marked with red dots.
IV. EXPERIMENTAL RESULTS
This section presents experimental results for topological segmentation in indoor and outdoor environments. Testing different kinds of environments aims to show that the method is generic and robust to context changes. Using various kinds of cameras for spherical view acquisition furthermore highlights the generality of the spherical representation. The indoor experiment was carried out in the Robotic Hall at INRIA Sophia Antipolis using a Neobotix MP-500 platform equipped with a paracatadioptric camera. In the outdoor experiment, a manually driven vehicle equipped with the multi-camera system described in [11] was used. The trajectory was about 600 meters long across the INRIA Sophia Antipolis research center.
The whole code is written in Matlab without specific optimization. Spherical harmonics spectrum computation requires 290 ms using the implementation described above (the sphere is sampled with 62500 uniformly distributed samples). The change-point detection algorithm runs in 10 ms. The complete algorithm thus runs online in about 300 ms per observation (acquisition at up to 3.3 Hz). However, the spherical harmonics spectrum code is highly parallelizable and might benefit greatly from a parallel C/C++ implementation.
A. Indoor experiment analysis
Figure 6 presents the robot trajectory and the detected change-points. It is first interesting to notice that all change-points correspond to important structure variations such as doorsteps or room volume variations (i.e. passing from a nook to a more open space). The trajectory in the wide space is segmented very little.
The easiest way to validate a topological place segmentation algorithm is to consider the doorsteps case. This case is illustrated by images 2680, 3480, 5328, 10455, 11954 and 12322, where change-points are precisely localized at doorsteps. The examples illustrated by images 996, 1401 and 2044 correspond to room volume variations. Images 996 and 1401 show the robot moving from a narrow space to a wider space. Image 2044 shows the opposite case, when the robot leaves a wide environment to enter a rather narrow place similar to a corridor. Images 6376 and 6624 correspond to the detection of changes in the objects present in the
