Proceedings ArticleDOI

A compact spherical RGBD keyframe-based representation

TL;DR: An environmental representation approach based on hybrid metric and topological maps as a key component for mobile robot navigation is proposed and an uncertainty error model propagation is formulated for outlier rejection and data fusion.
Abstract: This paper proposes an environmental representation approach based on hybrid metric and topological maps as a key component for mobile robot navigation. Focus is made on an ego-centric pose graph structure by the use of keyframes to capture the local properties of the scene. With the aim of reducing data redundancy and suppressing sensor noise whilst maintaining a dense compact representation of the environment, neighbouring augmented spheres are fused into a single representation. To this end, an uncertainty error model propagation is formulated for outlier rejection and data fusion, enhanced with the notion of landmark stability over time. Finally, our algorithm is tested thoroughly on a newly developed wide-angle 360° field of view (FOV) spherical sensor, where improvements in trajectory drift, compactness and tracking error are demonstrated.

Summary (3 min read)

Introduction

  • Visual mapping is a required capability for autonomous robots and a key component for long term navigation and localisation.
  • On the other hand, this metric representation is prone to drift phenomena which become significant over extended trajectories.
  • This representation not only brings about a good abstraction level of the environment but common tasks such as homing, navigation, exploration and path planning become more efficient.
  • Accurate and compact 3D environment modelling and reconstruction has drawn increased interest within the vision and robotics community over the years, as it is perceived as a vital tool for Visual SLAM techniques in realising tasks such as localisation, navigation, exploration and path planning [3].

II. METHODOLOGY AND CONTRIBUTIONS

  • The authors' aim is concentrated around building ego-centric topometric maps represented as a graph of keyframes, spread by spherical RGBD nodes.
  • This not only reduces data redundancy but also helps in suppressing sensor noise whilst contributing significantly to drift reduction.
  • This work is directly related to two previous syntheses of [4] and [12].
  • Then the uncertainty error model is presented followed by the fusion stage.

III. PRELIMINARIES

  • The basic environment representation consists of a set of spheres acquired over time together with a set of rigid transforms T ∈ SE(3) connecting adjacent spheres (e.g. Tij links Sj and Si) – this representation is well described in [14].
  • The inverse transform g−1 corresponds to the spherical projection model.
  • Point coordinates correspondences between spheres are given by the warping function w, under observability conditions at different viewpoints.
  • In the following, spherical RGBD registration and keyframe based environment representations are introduced.

A. Spherical Registration and Keyframe Selection

  • The relative location between raw spheres is obtained using a visual spherical registration procedure [12] [7].
  • The linearization of the over-determined system of equations (2) leads to a classic iterative Least Mean Squares (ILMS) solution.
  • Furthermore for computational efficiency, one can choose a subset of more informative pixels (salient points) that yield enough constraints over the 6DOF, without compromising the accuracy of the pose estimate.
  • This simple registration procedure, applied to each sequential pair of spheres, allows the scene structure to be represented, but is subject to cumulative VO errors and scalability issues due to long-term frame accumulation.
  • A criterion based on a differential entropy approach [9] has been applied in this work for keyframe selection.

IV. SPHERICAL UNCERTAINTY PROPAGATION AND MODEL FUSION

  • The authors' approach to topometric map building is an egocentric representation operating locally on sensor data.
  • The concept of proximity used to combine information is evaluated mainly with the entropy similarity criterion after the registration procedure.
  • Instead of performing a complete bundle adjustment over all parameters, including poses and structure, for the full set of raw spheres close to the keyframe model S∗, the procedure is done incrementally in two stages.

A. Warped Sphere Uncertainty

  • This section aims to represent the confidence of the elements in Sw, which clearly depends on the combination of the a priori pixel position, the depth and the pose errors over a set of geometric and projective operations – the warping function as in (1).
  • Before introducing these two terms, let us represent the uncertainty due to the combination of pose T and Cartesian 3D point q errors.
  • The uncertainty index σ²Dw is then the normalized covariance given by σ²Dw(p) = σ²Dt(p)/(qw(p,T)⊤qw(p,T))² (8). Finally, under the assumption of Lambertian surfaces, the photometric component is simply Iw(p∗) = I(w(p∗,T)) and its uncertainty σ²I is set by a robust weighting function on the error using Huber's M-estimator as in [14].

B. Spherical Keyframe Fusion

  • Before combining the keyframe reference model S∗ with the transformed observation Sw, a probabilistic test is performed to exclude outlier pixel measurements from Sw, allowing fusion to occur only if the raw observation agrees with its corresponding value in S∗.
  • Hence, the tuples A = {D∗,Dw} and B = {I∗, Iw} are defined as the sets of model-predicted and measured depth and intensity values respectively.
  • Finally, let a class c : D∗(p) = Dw(p) relate to the case where the measurement value agrees with its corresponding observation value.

C. Dynamic 3D points filtering

  • So far, the problem of data fusion of consistent estimates in a local model has been addressed.
  • These points exhibit erratic behaviours along the trajectory and as a matter of fact, they are highly unstable.
  • There are however different levels of “dynamicity” as mentioned in [11].
  • Points/landmarks observed can exhibit a gradual degradation over time, while others may undergo a sudden brutal change – the case of an occlusion for example.
  • The probabilistic framework for data association developed in section IV-B is a perfect fit to filter out inconsistent data.

D. Application to Saliency map

  • Instead of naively dropping out points below a certain threshold, e.g. p(stable) < 0.8, they are better pruned out of a saliency map [13].
  • The underlying algorithm is outlined in Algorithm 1.
  • This happens when the keyframe criterion based on an entropy ratio α [7][9] is reached.
  • Firstly, the notion of uncertainty is incorporated in spherical pixel tracking.
  • Eventually, between an updated model at time t0 and the following re-initialised one, at tn, an optimal mix of information sharing happens between the two.

V. EXPERIMENTS AND RESULTS

  • A new sensor for a large field of view RGBD image acquisition has been used in this work.
  • The chosen configuration offers the advantage of creating full 360° RGBD images of the scene isometrically, i.e. the same solid angle is assigned to each pixel.
  • The authors experimental test bench consists of the spherical sensor embarked on a mobile experimental platform and driven around in an indoor office building environment for a first learning phase whilst spherical RGBD data is acquired online and registered in a database.
  • This is even more emphasized by inspecting the 3D structure of the reconstructed environment, as shown in figures 4 (centre) and 4 (right), where the two images correspond to the sequences without and with fusion (method 1 and method 2 respectively).
  • The threshold for α is generally heuristically tuned.

VI. CONCLUSION

  • A framework for hybrid metric and topological maps in a single compact skeletal pose graph representation has been proposed.
  • Two methods have been experimented and the importance of data fusion has been highlighted with the benefits of reducing data noise, redundancy, tracking drift as well as maintaining a compact environment representation using keyframes.
  • The authors' activities are centered around building robust and stable environment representations for lifelong navigation and map building.


A Compact Spherical RGBD Keyframe-based Representation
Tawsif Gokhool, Renato Martins, Patrick Rives and Noëla Despré

Abstract: This paper proposes an environmental representation approach based on hybrid metric and topological maps as a key component for mobile robot navigation. Focus is made on an ego-centric pose graph structure by the use of keyframes to capture the local properties of the scene. With the aim of reducing data redundancy and suppressing sensor noise whilst maintaining a dense compact representation of the environment, neighbouring augmented spheres are fused into a single representation. To this end, an uncertainty error model propagation is formulated for outlier rejection and data fusion, enhanced with the notion of landmark stability over time. Finally, our algorithm is tested thoroughly on a newly developed wide-angle 360° field of view (FOV) spherical sensor, where improvements in trajectory drift, compactness and tracking error are demonstrated.
I. INTRODUCTION
Visual mapping is a required capability for autonomous robots and a key component for long-term navigation and localisation. Inherent limitations of GPS, in terms of dropping accuracy in densely populated urban environments or lack of observability in indoor settings, have propelled the applications of visual mapping techniques. Moreover, the rich content provided by vision sensors maximises the description of the environment.

A metric map emerges from a locally defined reference frame, and all information acquired along the trajectory is then represented in relation to this reference frame. Hence, localisation in a metric map is represented by a pose obtained from frame-to-frame odometry. The benefit of this approach is its precision at the local level, allowing precise navigation. On the other hand, this representation is prone to drift phenomena which become significant over extended trajectories. Furthermore, the amount of information that is accumulated introduces considerable problems in memory management due to limited capacity.

By contrast, a topological map provides a good trade-off solution to the above-mentioned issues. Distinct places of the environment are defined by nodes and joined together by edges representing the accessibility between places, generating a graph. This representation not only brings about a good abstraction level of the environment, but common tasks such as homing, navigation, exploration and path planning become more efficient. However, the lack of metric information may render some of these tasks precision deficient, as only rough estimates are available. Eventually, to maximise the benefits of these two complementary approaches, it is necessary to strike the right balance between metric and topological maps, which leads us to the term topo-metric maps as in [1].

*This work is funded by EADS/Airbus under contract number N.7128 and by CNPq of Brazil under contract number 216026/2013-0.
T. Gokhool, R. Martins and P. Rives are with INRIA Sophia-Antipolis, France ({tawsif.gokhool, renato-jose.martins, patrick.rives}@inria.fr). Noëla Despré is with Airbus Defence and Space, Toulouse, France (noela.despre@astrium.eads.net).

Fig. 1. Normal consistency between the raw and improved spherical keyframe model of one of the nodes of the skeletal pose graph. The colours in the figure encode surface normal orientations; better consistency is achieved when local planar regions depict the same colour.
In this context, accurate and compact 3D environment modelling and reconstruction has drawn increased interest within the vision and robotics community over the years, as it is perceived as a vital tool for Visual SLAM techniques in realising tasks such as localisation, navigation, exploration and path planning [3]. Planar representations of the environment achieve fast localisation [6], but they ignore other useful information in the scene. Space discretisation using volumetric methods such as signed distance functions [17], combined with sparse representations using octrees [18], has received widespread attention recently due to its reconstruction quality, but presents certain caveats: storage capacity and computational burden restrict its application to limited scales. To provide alternatives, a different approach consisting of multi-view rendering was proposed in [15]: depth maps captured in a window of n views are rendered in a central view taken as the reference frame and are fused based on a stability analysis of the projected measurements. On a similar note, recent works undertaken in [12] adopted a multi-keyframe fusion approach by forward warping and blending to create a novel synthesized depth map.

When exploring vast-scale environments, many frames sharing redundant information clutter the memory space considerably. Furthermore, performing frame-to-frame registration introduces drift in the trajectory due to uncertainty in the estimated pose, as pointed out in [9]. The idea of retaining keyframes based on predefined criteria proves very useful within a sparse skeletal pose graph, which is the chosen model for our compact environment representation.
II. METHODOLOGY AND CONTRIBUTIONS
Our aim is concentrated around building ego-centric topometric maps represented as a graph of keyframes, spread by spherical RGBD nodes. A locally defined geo-referenced frame is taken as an initial model, which is then refined over the course of the trajectory by neighbouring frames' rendering. This not only reduces data redundancy but also helps in suppressing sensor noise whilst contributing significantly to drift reduction. A generic uncertainty propagation model is devised and leaned upon a data association framework for discrepancy detection between the measured and observed data. We build upon the above two concepts to introduce the notion of landmark stability over time. This is an important aspect for intelligent 3D point selection, which serves to provide better potential candidates for the subsequent frame-to-keyframe motion estimation task.

This work is directly related to two previous syntheses, [4] and [12]. The former differs from our approach in the sense that it is a sparse feature-based technique and only considers depth information propagation. On the other hand, the latter adopts a similar dense-based method to ours, but without considering error models, and a simpler Mahalanobis test defines the hypothesis for outlier rejection. Additionally, we build on a previous work of [13], which consists of building a saliency map for each reference model, by adding the concept of stability of the underlying 3D structure.

Key contributions of this work are outlined as follows:
• development of a generic spherical uncertainty error propagation model, which can be easily adapted to other sensor models (e.g. perspective RGBD cameras)
• a coherent dense outlier rejection and data fusion framework relying on the proposed uncertainty model, yielding more precise spherical keyframes
• dynamic 3D points are tracked along the trajectory and are pruned out by skimming down a saliency map

The rest of this paper is organised as follows: basic concepts are first introduced and shall serve as the backbone to allow a smooth transition between respective sections. Then the uncertainty error model is presented, followed by the fusion stage. Subsequently, the problem of dynamic 3D points is tackled, leading to a direct application of the saliency map. An experiments and results section verifies all of the above derived concepts before wrapping up with a final conclusion and some perspectives for future work.
III. PRELIMINARIES
An augmented spherical image $S = \{I, D\}$ is composed of $I \in [0,1]^{m \times n}$ as pixel intensities and $D \in \mathbb{R}^{m \times n}$ as the depth information for each pixel in $I$. The basic environment representation consists of a set of spheres acquired over time together with a set of rigid transforms $T \in SE(3)$ connecting adjacent spheres (e.g. $T_{ij}$ links $S_j$ and $S_i$); this representation is well described in [14].

The spherical images are encoded in a 2D image, and the mapping from the image pixel coordinates $p$ and depth to Cartesian coordinates is given by $g : (u, v, 1) \mapsto q$, $g(p) = \rho\, q_S(p)$, with $q_S$ being the point representation on the unit sphere $\mathbb{S}^2$ and $\rho = D(p)$ the radial depth. The inverse transform $g^{-1}$ corresponds to the spherical projection model.

Point coordinate correspondences between spheres are given by the warping function $w$, under observability conditions at different viewpoints. Given a pixel coordinate $p^*$, its coordinate $p$ in another sphere related by a rigid transform $T$ is given by a screw transform of the 3D point $q = g(p^*)$ followed by a spherical projection:

$$p = w(p^*, T) = g^{-1}\!\left( [I \;\; 0]\, T^{-1} \begin{bmatrix} g(p^*) \\ 1 \end{bmatrix} \right), \qquad (1)$$

where $I$ is a $(3 \times 3)$ identity matrix and $0$ is a $(3 \times 1)$ zero vector; a short code sketch of these operations is given below. In the following, spherical RGBD registration and keyframe-based environment representations are introduced.
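To make the projection pair $g$, $g^{-1}$ and the warp $w$ of eq. (1) concrete, here is a minimal Python sketch. It assumes a simple azimuth/elevation pixel parameterization; the paper's isometric spherical layout differs, so `g`, `g_inv` and `warp` are illustrative stand-ins rather than the authors' exact mappings.

```python
import numpy as np

def g(p, depth):
    """Back-project pixel p = (azimuth, elevation) with radial depth rho
    to a 3D point q = rho * q_S (assumed angular parameterization)."""
    theta, phi = p
    q_s = np.array([np.cos(phi) * np.cos(theta),
                    np.cos(phi) * np.sin(theta),
                    np.sin(phi)])              # unit-sphere ray q_S
    return depth * q_s

def g_inv(q):
    """Spherical projection g^-1: 3D point -> (azimuth, elevation)."""
    theta = np.arctan2(q[1], q[0])
    phi = np.arcsin(q[2] / np.linalg.norm(q))
    return np.array([theta, phi])

def warp(p_star, depth, T):
    """w(p*, T) of eq. (1): back-project, apply [I 0] T^-1, re-project."""
    q_h = np.append(g(p_star, depth), 1.0)     # homogeneous point [g(p*); 1]
    q = (np.linalg.inv(T) @ q_h)[:3]           # [I 0] T^-1 [g(p*); 1]
    return g_inv(q)
```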
A. Spherical Registration and Keyframe Selection
The relative location between raw spheres is obtained using a visual spherical registration procedure [12][7]. Given a first pose estimate $\hat{T}$ of $T = \hat{T}T(x)$, the sphere-to-sphere registration problem consists of incrementally estimating a 6 degrees of freedom (DOF) pose $x \in \mathbb{R}^6$, through the minimization of a robust hybrid photometric and geometric error cost function $F_S = \frac{1}{2}\|e_I\|^2_{W_I} + \frac{\lambda^2}{2}\|e_\rho\|^2_{W_D}$, which can be written explicitly as:

$$F_S = \frac{1}{2} \sum_{p^*} W_I(p^*) \Big( I\big(w(p^*, \hat{T}T(x))\big) - I^*(p^*) \Big)^2 + \frac{\lambda^2}{2} \sum_{p^*} W_D(p^*) \Big( n^\top \big( g(w(p^*, \hat{T}T(x))) - \hat{T}T(x)\, g^*(p^*) \big) \Big)^2 \qquad (2)$$

where $w(\cdot)$ is the warping function as in (1), $\lambda$ is a tuning parameter to effectively balance the two cost functions, $W_I$ and $W_D$ are weights relating the confidence of each measure, and $n$ is the normal computed from the cross product of adjacent points from $q(p^*)$. The linearization of the over-determined system of equations (2) leads to a classic iterative Least Mean Squares (ILMS) solution. Furthermore, for computational efficiency, one can choose a subset of more informative pixels (salient points) that yield enough constraints over the 6 DOF without compromising the accuracy of the pose estimate.
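To make the structure of eq. (2) concrete, the following sketch evaluates the two residuals for a single salient pixel, reusing the hypothetical `g` and `warp` helpers above; `I_ref`, `I_cur`, `D_ref` and `D_cur` stand for assumed (bilinear) lookup functions and are not part of the paper.

```python
import numpy as np

def residuals(p_star, T, I_ref, I_cur, D_ref, D_cur, normal, lam):
    """Photometric and geometric residuals of eq. (2) for one pixel p*;
    T plays the role of the current estimate T_hat T(x)."""
    rho = D_ref(p_star)
    p = warp(p_star, rho, T)                    # correspondence via eq. (1)
    e_I = I_cur(p) - I_ref(p_star)              # photometric error
    q_star_h = np.append(g(p_star, rho), 1.0)   # [g*(p*); 1]
    e_D = normal @ (g(p, D_cur(p)) - (T @ q_star_h)[:3])  # point-to-plane error
    return e_I, lam * e_D
```

Stacked over the selected pixels, these weighted residuals are linearized and solved iteratively for the 6 DOF increment $x$.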
This simple registration procedure applied for each se-
quential pair of spheres allows to represent the scene struc-
ture, but subjected to cumulative VO errors and scalability
issues due to long-term frame accumulation. The idea to
retain just some spherical keyframes, based on predefined
criteria, proves very useful to overcome these issues and
generate a more compact graph scene representation. A
criteria based on differential entropy approach [9] has been
applied in this work for keyframe selection.
IV. SPHERICAL UNCERTAINTY PROPAGATION AND MODEL FUSION

Our approach to topometric map building is an egocentric representation operating locally on sensor data. The concept of proximity used to combine information is evaluated mainly with the entropy similarity criterion after the registration procedure. Instead of performing a complete bundle adjustment over all parameters, including poses and structure, for the full set of raw spheres $S_i$ close to the related keyframe model $S^*$, the procedure is done incrementally in two stages. The concept is as follows: primarily, given a reference sphere $S^*$ and a candidate sphere $S$, the cost function in (2) is employed to extract $T = \hat{T}T(x)$, and the entropy criterion is applied as a similarity measure between the tuple $\{S^*, S\}$. While this metric is below a predefined threshold, the keyframe model is refined in a second stage by warping $S$ and carrying out a geometric and photometric fusion procedure composed of three steps:
• warping of $S$ and propagation of the resulting model error
• data fusion with occlusion handling and outlier rejection
• wise 3D point selection based on stable salient points
which are detailed in the following subsections. The full system pipeline is depicted in figure 2; a code sketch of the resulting loop follows below.

Fig. 2. Full system pipeline: spherical VO and keyframe selection between the reference and current spheres, followed by warping and error modelling, spherical keyframe fusion, dynamic points filtering, saliency map update, and the 3D map model.
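As a rough sketch of this two-stage loop (all helper names are hypothetical, and the direction of the α test follows the experiments section, where a new keyframe is spawned once α drops below a tuned threshold):

```python
def process_sphere(sphere, keyframe, alpha_min):
    """One iteration of the Fig. 2 pipeline (illustrative helper names)."""
    T = register(keyframe, sphere)                 # spherical VO minimizing eq. (2)
    alpha = entropy_ratio(keyframe, sphere, T)     # keyframe criterion of [7][9]
    if alpha < alpha_min:                          # views too dissimilar:
        return spawn_keyframe(sphere)              # start a new graph node
    S_w, Sigma = warp_with_uncertainty(sphere, T)  # stage 1 (Sec. IV-A)
    return fuse_keyframe(keyframe, S_w, Sigma)     # stage 2 (Sec. IV-B)
```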
A. Warped Sphere Uncertainty
Warping the augmented sphere $S$ generates a synthetic view of the scene $S_w = \{I_w, D_w\}$, as it would appear from a new viewpoint. This section aims to represent the confidence of the elements in $S_w$, which clearly depends on the combination of the a priori pixel position, the depth and the pose errors over a set of geometric and projective operations: the warping function as in (1). Starting with $D_w$, the projected depth image is

$$D_w(p^*) = D_t(w(p^*, T)), \quad D_t(p) = \sqrt{q_w(p, T)^\top q_w(p, T)}, \quad \text{with } q_w(p, T) = [I \;\; 0]\, T \begin{bmatrix} g(p) \\ 1 \end{bmatrix} \qquad (3)$$

The uncertainty of the final warped depth $\sigma^2_{D_w}$ then depends on two terms, $\Sigma_w$ and $\sigma^2_{D_t}$: the former relates to the error due to the warping $w$ of pixel correspondences between two spheres, and the latter to the depth image representation in the reference coordinate system.
Before introducing these two terms, let us represent the uncertainty due to the combination of the pose $T$ and Cartesian 3D point $q$ errors. Taking a first-order approximation of $q = g(p) = \rho\, q_S$, the error can be decomposed as:

$$\Sigma_q(p) = \sigma^2_\rho\, q_S q_S^\top + \rho^2\, \Sigma_{q_S} = \frac{\sigma^2_\rho}{\rho^2}\, g(p) g(p)^\top + \rho^2\, \Sigma_{g(p)} \qquad (4)$$

The basic error model for the raw depth is proportional to its fourth power: $\sigma^2_\rho \propto \rho^4$, which can be applied to both stereopsis and active depth measurement systems (for instance, see [10] for details). The next step consists of combining the uncertain rigid transform $T$ with the errors in $q$. Given the mean of the 6 DOF pose $\bar{x} = \{t_x, t_y, t_z, \theta, \phi, \psi\}$ in 3D+YPR form and its covariance $\Sigma_x$, for $q_w(p, T) = Rq + t = Rg(p) + t$,

$$\Sigma_{q_w}(p, T) = J_q(q, \bar{x})\, \Sigma_q\, J_q(q, \bar{x})^\top + J_T(q, \bar{x})\, \Sigma_x\, J_T(q, \bar{x})^\top = R\, \Sigma_q\, R^\top + \Sigma^M_x, \qquad (5)$$

with $\Sigma_q$ as in (4); the general formula of $M$ is given in [2]. The first term $\Sigma_w$, using (5) and (1), is given by:

$$\Sigma_w(p^*, T) = J_{g^{-1}}(q_w(\cdot))\, \Sigma_{q_w}(\cdot)\, J_{g^{-1}}(q_w(\cdot))^\top \qquad (6)$$

where $J_{g^{-1}}$ is the Jacobian of the spherical projection (the inverse of $g$) and $(\cdot) = (p^*, T^{-1})$. The expression of the second term, the depth represented in the coordinate system of the reference sphere using the warped 3D point in (5) and (3), is straightforward:

$$\sigma^2_{D_t}(p^*, T) = J_{D_t}(q_w(p^*, T))\, \Sigma_{q_w}(p^*, T)\, J_{D_t}(q_w(p^*, T))^\top \qquad (7)$$

with $J_{D_t}(z) = z^\top / \sqrt{z^\top z}$.

The uncertainty index $\sigma^2_{D_w}$ is then the normalized covariance given by:

$$\sigma^2_{D_w}(p) = \sigma^2_{D_t}(p)\, /\, \big( q_w(p, T)^\top q_w(p, T) \big)^2 \qquad (8)$$

Finally, under the assumption of Lambertian surfaces, the photometric component is simply $I_w(p^*) = I(w(p^*, T))$ and its uncertainty $\sigma^2_I$ is set by a robust weighting function on the error using Huber's M-estimator, as in [14].
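For a single pixel, the chain (4)-(8) condenses to a few lines of numpy. The proportionality constant `k` in σ²_ρ = kρ⁴, the back-projection covariance `Sigma_g` and the pose term `Sigma_x_M` are assumed inputs here, not values given by the paper.

```python
import numpy as np

def warped_depth_variance(q, R, t, Sigma_g, Sigma_x_M, k=1e-6):
    """First-order propagation of eqs. (4)-(8) for one back-projected point q."""
    rho = np.linalg.norm(q)
    sigma_rho2 = k * rho ** 4                      # raw depth model: sigma_rho^2 ~ rho^4
    Sigma_q = (sigma_rho2 / rho ** 2) * np.outer(q, q) + rho ** 2 * Sigma_g  # eq. (4)
    q_w = R @ q + t                                # warped 3D point
    Sigma_qw = R @ Sigma_q @ R.T + Sigma_x_M       # eq. (5)
    J = q_w / np.linalg.norm(q_w)                  # J_Dt(z) = z^T / sqrt(z^T z)
    sigma_Dt2 = J @ Sigma_qw @ J                   # eq. (7)
    return sigma_Dt2 / (q_w @ q_w) ** 2            # normalized index of eq. (8)
```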
B. Spherical Keyframe Fusion
Before combining the keyframe reference model $S^*$ with the transformed observation $S_w$, a probabilistic test is performed to exclude outlier pixel measurements from $S_w$, allowing fusion to occur only if the raw observation agrees with its corresponding value in $S^*$.

Hence, the tuples $A = \{D^*, D_w\}$ and $B = \{I^*, I_w\}$ are defined as the sets of model-predicted and measured depth and intensity values respectively. Finally, let a class $c : D^*(p) = D_w(p)$ relate to the case where the measurement value agrees with its corresponding observation value. Inspired by the work of [16], the Bayesian framework for data association leads us to:

$$p(c \mid A, B) = \frac{p(A, B \mid c)\, p(c)}{p(A, B)} \qquad (9)$$

Applying the independence rule between depth and visual properties and assuming a uniform prior on the class $c$ (it can also be learned with supervised techniques), the above expression simplifies to:

$$p(c \mid A, B) \propto p(A \mid c)\, p(B \mid c) \qquad (10)$$

Treating each term independently, the first term of equation (10) is devised as $p(A \mid c) = p(D_w(p) \mid D^*(p), c)$, whereby marginalizing over the true depth value $\rho_i$ leads to:

$$p(D_w(p) \mid D^*(p), c) = \int p(D_w(p) \mid \rho_i, D^*(p), c)\; p(\rho_i \mid D^*(p), c)\; d\rho_i \qquad (11)$$

Approximating both probability density functions as Gaussians, the above integral reduces to $p(A \mid c) = N(D_w(p) \mid D^*(p), \sigma_{D_w}(p), \sigma_{D^*}(p))$. Following a similar treatment, $p(B \mid c) = N(I_w(p) \mid I^*(p), \sigma_{I_w}(p), \sigma_{I^*}(p))$.

Since equation (10) can be viewed as a likelihood function, it is easier to work analytically with its logarithm in order to extract a decision boundary. Plugging $p(A \mid c)$ and $p(B \mid c)$ into (10) and taking its negative log gives the following decision rule for an inlier observation value:

$$\frac{\big(D_w(p) - D^*(p)\big)^2}{\sigma^2_{D_w}(p) + \sigma^2_{D^*}(p)} + \frac{\big(I_w(p) - I^*(p)\big)^2}{\sigma^2_{I_w}(p) + \sigma^2_{I^*}(p)} < \lambda^2_M, \qquad (12)$$

relating to the square of the Mahalanobis distance. The threshold $\lambda^2_M$ is obtained by looking up the $\chi^2_2$ table.
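The gate (12) transcribes directly; the threshold comes from the χ²₂ table as the paper indicates (scipy is used here only for the quantile lookup, and all variable names are illustrative):

```python
import numpy as np
from scipy.stats import chi2

LAMBDA_M2 = chi2.ppf(0.95, df=2)   # ~5.991, the 95% quantile of chi^2 with 2 DOF

def is_inlier(D_w, D_ref, var_Dw, var_Dref, I_w, I_ref, var_Iw, var_Iref):
    """Squared Mahalanobis test of eq. (12); works element-wise on arrays."""
    d2 = (D_w - D_ref) ** 2 / (var_Dw + var_Dref) \
       + (I_w - I_ref) ** 2 / (var_Iw + var_Iref)
    return d2 < LAMBDA_M2
```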
Ultimately, we close up with a classic fusion stage, whereby depth- and appearance-based consistencies are coalesced to obtain an improved estimate of the keyframe sphere. Warped values that pass the test in (12) are fused by combining their respective uncertainties as follows:

$$I_{k+1}(p) = \frac{W_{I_k}(p)\, I_k(p) + \Pi_I(p)\, I_w(p)}{W_{I_k}(p) + \Pi_I(p)}, \qquad D_{k+1}(p) = \frac{W_{D_k}(p)\, D_k(p) + \Pi_D(p)\, D_w(p)}{W_{D_k}(p) + \Pi_D(p)} \qquad (13)$$

for the intensity and depth values respectively, with the weight update:

$$W_{I_{k+1}} = W_{I_k} + \Pi_I \quad \text{and} \quad W_{D_{k+1}} = W_{D_k} + \Pi_D \qquad (14)$$

where $\Pi_I(p) = 1/\sigma^2_{I_w}(p)$ and $\Pi_D(p) = 1/\sigma^2_{D_w}(p)$ from the uncertainty warp propagation in Sec. IV-A.
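Vectorised over a whole sphere, the running fusion of (13)-(14) is one line per channel. This is a sketch, not the authors' implementation: `inlier` is the mask from the eq. (12) test, and the weight maps are assumed to be initialised strictly positive so the denominators never vanish.

```python
import numpy as np

def fuse(I_k, W_I, D_k, W_D, I_w, var_Iw, D_w, var_Dw, inlier):
    """Weighted fusion of eqs. (13)-(14); rejected pixels keep the old estimate."""
    Pi_I = np.where(inlier, 1.0 / var_Iw, 0.0)          # Pi_I = 1 / sigma^2_Iw
    Pi_D = np.where(inlier, 1.0 / var_Dw, 0.0)          # Pi_D = 1 / sigma^2_Dw
    I_next = (W_I * I_k + Pi_I * I_w) / (W_I + Pi_I)    # eq. (13), intensity
    D_next = (W_D * D_k + Pi_D * D_w) / (W_D + Pi_D)    # eq. (13), depth
    return I_next, W_I + Pi_I, D_next, W_D + Pi_D       # eq. (14), weight update
```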
C. Dynamic 3D points filtering
So far, the problem of data fusion of consistent estimates in a local model has been addressed. But to improve the performance of any model, another important aspect of any mapping system is to limit, if not completely eliminate, the negative effects of dynamic points. These points exhibit erratic behaviours along the trajectory and are, as a matter of fact, highly unstable. There are, however, different levels of "dynamicity", as mentioned in [11]. Observed points/landmarks can exhibit a gradual degradation over time, while others may undergo a sudden brutal change, the case of an occlusion for example. The latter is considerably apparent in indoor environments, where small viewpoint changes can trigger a large part of a scene to be occluded. Other cases are observations undergoing cyclic dynamics (doors opening and closing). Whilst the above-mentioned behaviours are learned in clusters in [11], in this work points with periodic dynamics are simply treated as occlusion phenomena.

The probabilistic framework for data association developed in section IV-B is a perfect fit to filter out inconsistent data. 3D points giving a positive response to the test of equation (12) are given a vote of 1, or are otherwise attributed a 0. This gives rise to a confidence map $C_i(k)$ which is updated as follows:

$$C_i(k+1) = \begin{cases} C_i(k) + 1, & \text{if } \lambda^{(95\%)}_M < 5.991 \\ 0, & \text{otherwise} \end{cases} \qquad (15)$$
Algorithm 1: 3D Points Pruning using Saliency Map
Require: {S_sal, C_i(k), N, n}
Return: optimal set A_10% ⊆ S_sal
  Initialise new A
  for i = S_sal(p*) = 1 to S_sal(p*) = max do
    compute p(occur)(p_i)
    compute γ_{k+1}(p_i)
    compute p(stable)(p_i)
    if p(stable)(p_i) ≥ 0.8 then
      A[i] ← p_i
    end if
    if length(A[i]) ≥ A_10% then
      break
    end if
  end for
Hence, the probability of occurrence is given by:

$$p(\text{occur}) = \frac{C_i(k + N)}{N}, \qquad (16)$$

where $N$ is the total number of accumulated views between two consecutive keyframes. Though $p(\text{occur})$ gives an indication of how many times a point has been tracked along the trajectory, it cannot distinguish between noisy data and an occlusion. Treading on a technique similar to that adopted in [8], a Markov observation independence is imposed. In the event that a landmark/3D point has been detected at time instant $k$, it is most probable to appear again at $k+1$, irrespective of its past history. On the contrary, if it has not been re-observed, this may mean that the landmark is quite noisy/unstable or has been removed indeterminately from the environment, and has little chance to appear again. These hypotheses are formally translated as follows:

$$\gamma_{k+1}(p^*) = \begin{cases} 1, & \text{if } p_k = 1 \\ (1 - p(\text{occur}))^n, & \text{otherwise} \end{cases} \qquad (17)$$

Finally, the overall stability of the point is given as:

$$p(\text{stable}) = \gamma_{k+1}\, p(\text{occur}) \qquad (18)$$
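Eqs. (15)-(18), together with the pruning loop of Algorithm 1, amount to a few array operations. The sketch below assumes per-point arrays and hypothetical names (`C`, `seen_last`, `saliency_order`); the 0.8 stability threshold and 10% budget are the values quoted in the text.

```python
import numpy as np

def update_confidence(C, inlier):
    """Eq. (15): increment votes of points passing the eq. (12) test, reset others."""
    return np.where(inlier, C + 1, 0)

def stability(C, seen_last, N, n):
    """Eqs. (16)-(18): occurrence, Markov forgetting factor, overall stability."""
    p_occur = C / N                                          # eq. (16)
    gamma = np.where(seen_last, 1.0, (1.0 - p_occur) ** n)   # eq. (17)
    return gamma * p_occur                                   # eq. (18)

def prune(saliency_order, p_stable, budget_ratio=0.10, thresh=0.8):
    """Algorithm 1: walk the saliency-ordered points, keep stable ones up to the budget."""
    budget = int(budget_ratio * len(saliency_order))
    keep = [i for i in saliency_order if p_stable[i] >= thresh]
    return keep[:budget]
```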
D. Application to Saliency map
Instead of naively dropping out points below a certain threshold, e.g. $p(\text{stable}) < 0.8$, they are better pruned out of a saliency map [13]. A saliency map, $S_{sal}$, is the outcome of a careful selection of the most informative points, best representing a 6 degrees of freedom pose $x \in \mathbb{R}^6$, based on a Normal Flow Constraint spherical Jacobian. The underlying algorithm is outlined in Algorithm 1. The green and red sub-blocks in figure 3 represent the sets of inliers and outliers respectively, while the yellow one corresponds to the set of pixels which belong to the universal set $\{U : U = A \cup B_w\}$ but which have not been pruned out. This happens when the keyframe criterion based on an entropy ratio $\alpha$ [7][9] is reached. The latter is an abstraction of the uncertainty related to the pose $x$ along the trajectory, whose behaviour shall be discussed in the results section.

Fig. 3. Saliency map skimming.
The novelty of this approach compared to the initial work of [13] is two-fold. Firstly, the notion of uncertainty is incorporated in spherical pixel tracking. Secondly, as new incoming frames are acquired, rasterised and fused, the information content of the initial model is enriched, and hence the saliency map needs updating. This gives a newly ordered set of pixels, to which a corresponding stability factor is attributed. Based on this information, an enhanced pixel selection is performed, consisting of pixels with a greater chance of occurrence in the subsequent frame. This set of pixels shall then be used for the forthcoming frame-to-keyframe motion estimation task. Eventually, between an updated model at time $t_0$ and the following re-initialised one at $t_n$, an optimal mix of information sharing happens between the two.
V. EXPERIMENTS AND RESULTS
A new sensor for large field of view RGBD image acquisition has been used in this work. This device integrates 8 Asus Xtion Pro Live (Asus XPL) sensors, as shown in figure 4 (left), and allows a spherical representation to be built. The chosen configuration offers the advantage of creating full 360° RGBD images of the scene isometrically, i.e. the same solid angle is assigned to each pixel. This permits some operations, like point cloud reconstruction, photo-consistency alignment or image subsampling, to be applied directly. For more information about sensor calibration and spherical RGBD construction, the interested reader is referred to [5].
Our experimental test bench consists of the spherical sensor embarked on a mobile experimental platform and driven around an indoor office building environment for a first learning phase, whilst spherical RGBD data is acquired online and registered in a database. In this work, we do not address the real-time aspect of autonomous navigation and mapping, but rather investigate ways of building robust and compact environment representations by capturing the observability and dynamics of the latter. Along this streamline, reliable estimates coming from sensor data based on 3D geometrical structure are combined together to serve a useful purpose for later navigation and localisation tasks, which can thereafter be treated with better precision online.

Our algorithm has been extensively tested on a dataset¹ of around 2500 frames covering a total distance of around 60 m. Figures 5(a), (b) illustrate the trajectories obtained from the two experimented methods, namely RGBD registration without (method 1) and with (method 2) keyframe fusion, in order to identify the added value of the fusion stage. The visible discrepancy between the trajectories for the same pathway highlights the effects of erroneous pose estimation that occur along the trajectories. This is even more emphasized by inspecting the 3D structure of the reconstructed environment, as shown in figures 4 (centre) and 4 (right), where the two images correspond to the sequences without and with fusion (method 1 and method 2 respectively). In detail, reconstruction with method 1 demonstrates the effect of duplicated structures (especially surrounding wall structures), which is explained by the fact that point clouds are not perfectly stitched together on revisited areas due to the inherent presence of rotational drift, which is more pronounced than translational drift. However, these effects are well reduced, though not completely eliminated, by the fusion stage, as illustrated in figure 4 (right). The red dots on the reconstruction images correspond to the reference spheres initialised along the trajectories using the keyframe criterion described in section III-A. 270 key spheres were recorded for method 1, while only 67 were registered for method 2.

¹ Video URL: http://www-sop.inria.fr/members/Tawsif.Gokhool/icra15.html
Finally, comparing figures 5(a), (b), the gain in compactness for method 2 is clearly demonstrated by the sparse positioning of keyframes in the skeletal pose graph structure. Figure 5(c) depicts the behaviour of our entropy-based keyframe selection criterion α. The threshold for α is generally tuned heuristically. For method 1, a preset value of 0.96 was used, based on the number of iterations to convergence of the cost function outlined in section III-A. With the fusion stage, the value of α was allowed to drop to 0.78, with generally faster convergence achieved. Figure 5(d) confirms the motion estimation quality of method 2, as it exhibits a lower error norm across frames compared to method 1. Figure 1 depicts the quality of the depth map before and after applying the filtering technique; the colours in the figure represent normal orientations with respect to the underlying surfaces.
VI. CONCLUSION

In this work, a framework for hybrid metric and topological maps in a single compact skeletal pose graph representation has been proposed. Two methods have been experimented with, and the importance of data fusion has been highlighted, with the benefits of reducing data noise, redundancy and tracking drift, as well as maintaining a compact environment representation using keyframes.

Our activities are centered around building robust and stable environment representations for lifelong navigation and map building. Future investigations will be directed towards gathering more semantic information about the environment, to be able to further exploit the quality and stability of the encompassing 3D structure.

ACKNOWLEDGMENT

The authors extend their warmest gratitude to Maxime Meilland and Panagiotis Papadakis for their critical insights and suggestions in improving the material of this paper.

REFERENCES

[1] H. Badino, D. Huber, and T. Kanade. Visual Topometric Localization. In Intelligent Vehicles Symposium, Baden-Baden, Germany, June 2011.

Citations
Journal ArticleDOI
16 Jan 2019-Sensors
TL;DR: In this article, the relative poses between RGB-D cameras with minimal overlapping fields of view are estimated by using descriptor-based patterns to provide well-matched 2D keypoints in the case of a minimal overlapping field of view between cameras.
Abstract: This paper presents a novel method to estimate the relative poses between RGB-D cameras with minimal overlapping fields of view. This calibration problem is relevant to applications such as indoor 3D mapping and robot navigation that can benefit from a wider field of view using multiple RGB-D cameras. The proposed approach relies on descriptor-based patterns to provide well-matched 2D keypoints in the case of a minimal overlapping field of view between cameras. Integrating the matched 2D keypoints with corresponding depth values, a set of 3D matched keypoints are constructed to calibrate multiple RGB-D cameras. Experiments validated the accuracy and efficiency of the proposed calibration approach.

13 citations

Dissertation
05 Jun 2015
TL;DR: An approach to map-based representation has been proposed by considering the following issues: how to robustly apply visual odometry by making the most of both photometric and geometric information available from the augmented spherical database.
Abstract: Our aim is concentrated around building ego-centric topometric maps represented as a graph of keyframe nodes which can be efficiently used by autonomous agents. The keyframe nodes, which combine a spherical image and a depth map (augmented visual sphere), synthesise information collected in a local area of space by an embedded acquisition system. The representation of the global environment consists of a collection of augmented visual spheres that provide the necessary coverage of an operational area. A "pose" graph that links these spheres together in six degrees of freedom also defines the domain potentially exploitable for navigation tasks in real time. As part of this research, an approach to map-based representation has been proposed by considering the following issues: how to robustly apply visual odometry by making the most of both photometric and geometric information available from our augmented spherical database; how to determine the quantity and optimal placement of these augmented spheres to cover an environment completely; how to model sensor uncertainties and update the dense information of the augmented spheres; how to compactly represent the information contained in the augmented sphere to ensure robustness, accuracy and stability along an explored trajectory by making use of saliency maps.

7 citations


Cites methods from "A compact spherical RGBD keyframe-b..."

  • ...Part of the conceptual formulation and experimental evaluations of this chapter has been published in [Gokhool et al. 2015]....


  • ...The formulation and the corresponding experimental evaluation were presented in part at an international conference [Gokhool et al. 2015]....


Proceedings ArticleDOI
17 Dec 2015
TL;DR: This paper presents a methodology to combine information from a sequence of RGB-D spherical views acquired by a home-made multi-stereo device in order to improve the computed depth images both in terms of accuracy and completeness.
Abstract: This paper presents a methodology to combine information from a sequence of RGB-D spherical views acquired by a home-made multi-stereo device in order to improve the computed depth images both in terms of accuracy and completeness. This methodology is embedded in a larger visual mapping framework aiming to produce accurate and dense topometric urban maps. Our method is based on two main filtering stages. Firstly, we perform a segmentation process considering both geometric and photometric image constraints, followed by a regularization step (spatial-integration). We then proceed to a fusion stage where the geometric information is further refined by considering the depth images of nearby frames (temporal integration). This methodology can be applied to other projective models, such as perspective stereo images. Our approach is evaluated within the frameworks of image registration, localization and mapping, demonstrating higher accuracy and larger convergence domains over different datasets.

6 citations


Cites methods from "A compact spherical RGBD keyframe-b..."

  • ...For this, we follow the approach of [5] which is briefly summarized for the sake of completeness....


  • ...As stated previously, the presented methodology is directly related to [10] and [5]....


  • ...The fusion relies on our previous work [5], where coherent regularized frames are merged in a single keyframe, taking into account the related uncertainties and their co-visibility....


  • ...By last, we extend the method proposed in [5] by exploiting the rigidity of neighbourhoods through a joint depth – color segmentation, which has a clear improvement in the maps produced, specially when considering the more challenging data coming from stereo outdoor sequences....


Proceedings ArticleDOI
17 Dec 2015
TL;DR: This work proposes a new approach to update maps pertaining to large-scale dynamic environments with semantics, which is able to build a stable representation with only two observations of the environment.
Abstract: Mapping evolving environments requires an update mechanism to efficiently deal with dynamic objects. In this context, we propose a new approach to update maps pertaining to large-scale dynamic environments with semantics. While previous works mainly rely on large amount of observations, the proposed framework is able to build a stable representation with only two observations of the environment. To do this, scene understanding is used to detect dynamic objects and to recover the labels of the occluded parts of the scene through an inference process which takes into account both spatial context and a class occlusion model. Our method was evaluated on a database acquired at two different times with an interval of three years in a large dynamic outdoor environment. The results point out the ability to retrieve the hidden classes with a precision score of 0.98. The performances in term of localisation are also improved.

5 citations


Additional excerpts

  • ...At a larger scale, all submaps are positioned in the scene thanks to a dense visual odometry method presented in [11] and constitute a global graph of the environment....


Book ChapterDOI
20 Nov 2016
TL;DR: This work proposes an activation function based on the conditioning of the RGB and ICP point-to-plane error terms that strengthens the geometric error influence in the first coarse iterations, while the intensity data term dominates in the finer increments.
Abstract: Dense direct RGB-D registration methods are widely used in tasks ranging from localization and tracking to 3D scene reconstruction. This work addresses a peculiar aspect which drastically limits the applicability of direct registration, namely the weakness of the convergence domain. First, we propose an activation function based on the conditioning of the RGB and ICP point-to-plane error terms. This function strengthens the geometric error influence in the first coarse iterations, while the intensity data term dominates in the finer increments. The information gathered from the geometric and photometric cost functions is not only considered for improving the system observability, but for exploiting the different convergence properties and convexity of each data term. Next, we develop a set of strategies as a flexible regularization and a pixel saliency selection to further improve the quality and robustness of this approach. The methodology is formulated for a generic warping model and results are given using perspective and spherical sensor models. Finally, our method is validated in different RGB-D spherical datasets, including both indoor and outdoor real sequences and using the KITTI VO/SLAM benchmark dataset. We show that the different proposed techniques (weighted activation function, regularization, saliency pixel selection), lead to faster convergence and larger convergence domains, which are the main limitations to the use of direct methods.

4 citations


Cites methods from "A compact spherical RGBD keyframe-b..."

  • ...Finally, our regularization method, which is an extension of [24] [25], is directly related to [26] which perform a region growing using simultaneously intensity and geometric contours....


References
Journal ArticleDOI
TL;DR: A new framework for visual place recognition is proposed that incrementally learns models of each place and offers adaptability to dynamic elements in the scene and shows improvement in recognition performance when compared to state-of-the-art image retrieval approaches to place recognition.
Abstract: This paper proposes a new framework for visual place recognition that incrementally learns models of each place and offers adaptability to dynamic elements in the scene. Traditional Bag-Of-Words (BOW) image-retrieval approaches to place recognition typically treat images in a holistic manner and are not capable of dealing with sub-scene dynamics, such as structural changes to a building facade or seasonal effects on foliage. However, by treating local features as observations of real-world landmarks in a scene that is observed repeatedly over a period of time, such dynamics can be modelled at a local level, and the spatio-temporal properties of each landmark can be independently updated incrementally. The method proposed models each place as a set of such landmarks and their geometric relationships. A new BOW filtering stage and geometric verification scheme are introduced to compute a similarity score between a query image and each scene model. As further training images are acquired for each place, the landmark properties are updated over time and in the long term, the model can adapt to dynamic behaviour in the scene. Results on an outdoor dataset of images captured along a 7 km path, over a period of 5 months, show an improvement in recognition performance when compared to state-of-the-art image retrieval approaches to place recognition.

41 citations


"A compact spherical RGBD keyframe-b..." refers methods in this paper

  • ...Treading on a similar technique to that adopted in [8], a Markov observation independence is imposed....


Proceedings ArticleDOI
07 Jun 2006
TL;DR: A hybrid method using laser rangefinders and vision for building local 2D metrical maps that incorporate safety information (called local safety maps), and a method for removing noise from dense stereo data using motion are presented.
Abstract: To be useful as a mobility assistant for a human driver, an intelligent robotic wheelchair must be able to distinguish between safe and hazardous regions in its immediate environment. We present a hybrid method using laser rangefinders and vision for building local 2D metrical maps that incorporate safety information (called local safety maps). Laser range-finders are used for localization and mapping of obstacles in the 2D laser plane, and vision is used for detection of hazards and other obstacles in 3D space. The hazards and obstacles identified by vision are projected into the travel plane of the robot and combined with the laser map to construct the local 2D safety map. The main contributions of this work are (i) the definition of a local 2D safety map, (ii) a hybrid method for building the safety map, and (iii) a method for removing noise from dense stereo data using motion.

36 citations


"A compact spherical RGBD keyframe-b..." refers methods in this paper

  • ...Inspired by the work of [16], the Bayesian framework for data association leads us to:...


Proceedings Article
01 Jan 2014
TL;DR: This work proposes to tackle the pose estimation problem by using both photometric and geometric information in a direct RGBD image registration method and a pose graph representation, whereby, given a database of augmented visual spheres, a travelled trajectory with redundant information is pruned out to a skeletal pose graph.
Abstract: Visual mapping is a required capability for practical autonomous mobile robots where there exists a growing industry with applications ranging from the service to industrial sectors. Prior to map building, Visual Odometry(VO) is an essential step required in the process of pose graph construction. In this work, we first propose to tackle the pose estimation problem by using both photometric and geometric information in a direct RGBD image registration method. Secondly, the mapping problem is tackled with a pose graph representation, whereby, given a database of augmented visual spheres, a travelled trajectory with redundant information is pruned out to a skeletal pose graph. Both methods are evaluated with data acquired with a recently proposed omnidirectional RGBD sensor for indoor environments.

19 citations


"A compact spherical RGBD keyframe-b..." refers background or methods in this paper

  • ...This happens when the Keyframe criteria based on an entropy ratio α [7][9] is reached....


  • ...The relative location between raw spheres is obtained using a visual spherical registration procedure [12] [7]....


Frequently Asked Questions (1)
Q1. What contributions have the authors mentioned in the paper "A Compact Spherical RGBD Keyframe-based Representation"?

This paper proposes an environmental representation approach based on hybrid metric and topological maps as a key component for mobile robot navigation. With the aim of reducing data redundancy and suppressing sensor noise whilst maintaining a dense compact representation of the environment, neighbouring augmented spheres are fused into a single representation.