A Compact Spherical RGBD Keyframe-based Representation
Tawsif Gokhool, Renato Martins, Patrick Rives and Noëla Despré
Abstract: This paper proposes an environmental representation approach based on hybrid metric and topological maps as a key component for mobile robot navigation. Focus is placed on an ego-centric pose graph structure that uses keyframes to capture the local properties of the scene. With the aim of reducing data redundancy and suppressing sensor noise whilst maintaining a dense, compact representation of the environment, neighbouring augmented spheres are fused into a single representation. To this end, an uncertainty error propagation model is formulated for outlier rejection and data fusion, enhanced with the notion of landmark stability over time. Finally, our algorithm is tested thoroughly on a newly developed wide-angle 360° field of view (FOV) spherical sensor, where improvements in trajectory drift, compactness and tracking error are demonstrated.
I. INTRODUCTION
Visual mapping is a required capability for autonomous robots and a key component for long-term navigation and localisation. Inherent limitations of GPS, in terms of degraded accuracy in densely built-up urban environments or lack of observability in indoor settings, have propelled the adoption of visual mapping techniques. Moreover, the rich content provided by vision sensors maximises the description of the environment.
A metric map emerges from a locally defined reference frame, and all information acquired along the trajectory is then represented in relation to this reference frame. Hence, localisation in a metric map is represented by a pose obtained from frame-to-frame odometry. The benefit of this approach is its precision at the local level, allowing precise navigation. On the other hand, this representation is prone to drift, which becomes significant over extended trajectories. Furthermore, the amount of information that is accumulated introduces considerable memory management problems due to limited capacity.
By contrast, a topological map provides a good trade-off solution to the above-mentioned issues. Distinct places of the environment are defined by nodes and joined together by edges representing the accessibility between places, generating a graph. This representation not only brings about a good abstraction level of the environment, but common tasks such as homing, navigation, exploration and path planning become more efficient. However, the lack of metric information may leave some of these tasks deficient in precision, as only rough estimates are available. Eventually, to maximise the benefits of these two complementary approaches, the right balance must be struck between metric and topological maps, which leads us to the term topo-metric maps as in [1].

*This work is funded by EADS/Airbus under contract number N.7128 and by CNPq of Brazil under contract number 216026/2013-0.
T. Gokhool, R. Martins and P. Rives are with INRIA Sophia-Antipolis, France {tawsif.gokhool, renato-jose.martins, patrick.rives}@inria.fr
Noëla Despré is with Airbus Defence and Space, Toulouse, France noela.despre@astrium.eads.net

Fig. 1. Normal consistency between the raw and improved spherical keyframe models of one of the nodes of the skeletal pose graph. The colours in the figure encode surface normal orientations; better consistency is achieved when local planar regions depict the same colour.
In this context, accurate and compact 3D environment modelling and reconstruction has drawn increasing interest within the vision and robotics community over the years, as it is perceived as a vital tool for visual SLAM techniques in realising tasks such as localisation, navigation, exploration and path planning [3]. Planar representations of the environment achieve fast localisation [6], but they ignore other useful information in the scene. Space discretisation using volumetric methods such as signed distance functions [17], combined with sparse representations using octrees [18], has received widespread attention recently due to its reconstruction quality, but presents certain caveats: storage capacity and computational burden restrict its application to limited scales. As an alternative, a different approach consisting of multi-view rendering was proposed in [15]: depth maps captured in a window of n views are rendered in a central view taken as the reference frame and are fused based on a stability analysis of the projected measurements. On a similar note, recent work undertaken in [12] adopted a multi-keyframe fusion approach by forward warping and blending to create a novel synthesized depth map.
When exploring vast-scale environments, many frames sharing redundant information clutter the memory space considerably. Furthermore, performing frame-to-frame registration introduces drift in the trajectory due to uncertainty in the estimated pose, as pointed out in [9]. The idea of retaining keyframes based on predefined criteria proves very useful within a sparse skeletal pose graph, which is the chosen model for our compact environment representation.
II. METHODOLOGY AND CONTRIBUTIONS
Our aim is centred on building ego-centric topo-metric maps represented as a graph of keyframes composed of spherical RGBD nodes. A locally defined geo-referenced frame is taken as an initial model, which is then refined over the course of the trajectory by rendering neighbouring frames. This not only reduces data redundancy but also helps in suppressing sensor noise whilst contributing significantly to drift reduction. A generic uncertainty propagation model is devised and coupled with a data association framework for discrepancy detection between the measured and observed data. We build upon these two concepts to introduce the notion of landmark stability over time. This is an important aspect for intelligent 3D point selection, which yields better potential candidates for the subsequent frame-to-keyframe motion estimation task.
This work is directly related to the two previous works of [4] and [12]. The former differs from our approach in the sense that it is a sparse, feature-based technique and only considers depth information propagation. The latter adopts a dense method similar to ours, but without considering error models, and a simpler Mahalanobis test defines the hypothesis for outlier rejection. Additionally, we build on a previous work of [13], which consists of building a saliency map for each reference model, by adding the concept of stability of the underlying 3D structure.
Key contributions of this work are outlined as follows:
- development of a generic spherical uncertainty error propagation model, which can be easily adapted to other sensor models (e.g. perspective RGBD cameras);
- a coherent dense outlier rejection and data fusion framework relying on the proposed uncertainty model, yielding more precise spherical keyframes;
- dynamic 3D points are tracked along the trajectory and pruned out by skimming down a saliency map.
The rest of this paper is organised as follows: basic concepts are first introduced and serve as the backbone allowing a smooth transition between the respective sections. Then the uncertainty error model is presented, followed by the fusion stage. Subsequently, the problem of dynamic 3D points is tackled, leading to a direct application of the saliency map. An experiments and results section verifies all of the above-derived concepts before wrapping up with a final conclusion and some perspectives for future work.
III. PRELIMINARIES
An augmented spherical image S = {I, D} is composed of I ∈ [0, 1]^{m×n} as pixel intensities and D ∈ R^{m×n} as the depth information for each pixel in I. The basic environment representation consists of a set of spheres acquired over time, together with a set of rigid transforms T ∈ SE(3) connecting adjacent spheres (e.g. T_{ij} links S_j and S_i); this representation is well described in [14].
The spherical images are encoded in a 2D image, and the mapping from the image pixel coordinates p with depth to Cartesian coordinates is given by g : (u, v, 1) ↦ q, g(p) = ρ q_S(p), with q_S being the point representation on the unit sphere S² and ρ = D(p) the radial depth. The inverse transform g^{-1} corresponds to the spherical projection model.
Point coordinate correspondences between spheres are given by the warping function w, under observability conditions at different viewpoints. Given a pixel coordinate p*, its coordinate p in another sphere related by a rigid transform T is given by a screw transform of the 3D point q = g(p*) followed by a spherical projection:

$$p = w(p^*, T) = g^{-1}\left([\,\mathbf{I}\ \ \mathbf{0}\,]\, T^{-1} \begin{bmatrix} g(p^*) \\ 1 \end{bmatrix}\right), \quad (1)$$

where I is a (3×3) identity matrix and 0 is a (3×1) zero vector.
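For concreteness, the warp of (1) can be sketched in Python as follows, assuming an equirectangular pixel-to-ray mapping; the paper's actual mapping g is fixed by the sensor calibration [5], and all function names here are illustrative.

```python
import numpy as np

def pixel_to_ray(u, v, width, height):
    """Map pixel (u, v) to a unit ray q_S on the sphere (assumed
    equirectangular layout; the actual g depends on calibration)."""
    theta = 2.0 * np.pi * (u / width) - np.pi        # azimuth in [-pi, pi)
    phi = np.pi * (v / height) - np.pi / 2.0         # elevation in [-pi/2, pi/2)
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def ray_to_pixel(q, width, height):
    """Inverse mapping g^{-1}: spherical projection of a 3D point."""
    q = q / np.linalg.norm(q)
    theta = np.arctan2(q[1], q[0])
    phi = np.arcsin(q[2])
    return ((theta + np.pi) / (2.0 * np.pi) * width,
            (phi + np.pi / 2.0) / np.pi * height)

def warp(u, v, rho, T, width, height):
    """Eq. (1): back-project a pixel with depth rho, apply the rigid
    transform T^{-1} (4x4), and re-project onto the sphere."""
    q = rho * pixel_to_ray(u, v, width, height)      # g(p*) = rho * q_S
    q_w = (np.linalg.inv(T) @ np.append(q, 1.0))[:3] # [I 0] T^{-1} [g(p*); 1]
    return ray_to_pixel(q_w, width, height)
```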
In the following, spherical RGBD registration and keyframe-based environment representations are introduced.
A. Spherical Registration and Keyframe Selection
The relative location between raw spheres is obtained using a visual spherical registration procedure [12][7]. Given a first pose estimate T̂ of T = T̂T(x), the sphere-to-sphere registration problem consists of incrementally estimating a 6 degrees of freedom (DOF) pose x ∈ R⁶, through the minimization of a robust hybrid photometric and geometric error cost function $F_S = \tfrac{1}{2}\|e_I\|^2 + \tfrac{\lambda^2}{2}\|e_\rho\|^2$, which can be written explicitly as:

$$F_S = \frac{1}{2}\sum_{p^*} W_I(p^*)\Big[ I\big(w(p^*, \hat{T}T(x))\big) - I^*(p^*) \Big]^2 + \frac{\lambda^2}{2}\sum_{p^*} W_D(p^*)\Big[ \mathbf{n}^\top\Big( g\big(w(p^*, \hat{T}T(x))\big) - \hat{T}T(x)\, g^*(p^*) \Big) \Big]^2 \quad (2)$$

where w(·) is the warping function as in (1), λ is a tuning parameter to effectively balance the two cost terms, W_I and W_D are weights relating the confidence of each measure, and n is the normal computed from the cross product of adjacent points of q(p*). Linearization of the over-determined system of equations (2) leads to a classic Iterative Least Mean Squares (ILMS) solution. Furthermore, for computational efficiency, one can choose a subset of more informative pixels (salient points) that yield enough constraints over the 6DOF without compromising the accuracy of the pose estimate.
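The confidence weights W_I and W_D are set by a robust M-estimator inside the ILMS loop (Huber's, as also used in Sec. IV-A for the photometric term). A minimal reweighting sketch, assuming a median-based scale estimate (the paper does not state its exact scale rule):

```python
import numpy as np

def huber_weights(residuals, k=1.345):
    """IRLS weights from Huber's M-estimator: quadratic influence near
    zero, linear in the tails. k = 1.345 is the usual tuning constant
    for unit-variance residuals (an assumption)."""
    r = np.abs(residuals)
    sigma = 1.4826 * np.median(r) + 1e-12  # robust scale from the median absolute residual
    t = r / sigma
    w = np.ones_like(t)
    w[t > k] = k / t[t > k]
    return w
```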
This simple registration procedure, applied to each sequential pair of spheres, allows the scene structure to be represented, but it is subject to cumulative VO errors and to scalability issues due to long-term frame accumulation. The idea of retaining just some spherical keyframes, based on predefined criteria, proves very useful to overcome these issues and generate a more compact graph scene representation. A criterion based on a differential entropy approach [9] has been applied in this work for keyframe selection.
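As an illustration, one such entropy-based test, in the spirit of [9], compares the differential entropy of the current frame-to-keyframe pose estimate against that of the first frame registered after the keyframe; the sketch below is an assumption about the exact form, with alpha_min the heuristic threshold discussed in the results section.

```python
import numpy as np

def differential_entropy(cov):
    """Differential entropy of a Gaussian pose estimate:
    h(x) = 0.5 * ln((2*pi*e)^k * det(cov)) for a k-DOF pose."""
    k = cov.shape[0]
    return 0.5 * np.log(((2.0 * np.pi * np.e) ** k) * np.linalg.det(cov))

def needs_new_keyframe(cov_first, cov_current, alpha_min=0.96):
    """Entropy-ratio criterion: declare a new keyframe when the ratio
    alpha drops below the threshold."""
    alpha = differential_entropy(cov_current) / differential_entropy(cov_first)
    return alpha < alpha_min
```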
IV. SPHERICAL UNCERTAINTY PROPAGATION AND MODEL FUSION
Our approach to topometric map building is an egocentric representation operating locally on sensor data. The concept of proximity used to combine information is evaluated mainly with the entropy similarity criterion after the registration procedure. Instead of performing a complete bundle adjustment over all parameters, including poses and structure, for the full set of raw spheres S_i close to the related keyframe model S*, the procedure is carried out incrementally in two stages.

Fig. 2. Full system pipeline, comprising spherical VO, keyframe selection, warping and error model, spherical keyframe fusion, dynamic points filtering, the saliency map and the 3D map model.
The concept is as follows: first, given a reference sphere S* and a candidate sphere S, the cost function in (2) is employed to extract T = T̂T(x), and the entropy criterion is applied as a similarity measure between the tuple {S*, S}. While this metric is below a predefined threshold, the keyframe model is refined in a second stage by warping S and carrying out a geometric and photometric fusion procedure composed of three steps:
- warping S and propagating its resulting model error;
- data fusion with occlusion handling and outlier rejection;
- judicious 3D point selection based on stable salient points;
which are detailed in the following subsections. The full system pipeline is depicted in figure (2).
A. Warped Sphere Uncertainty
Warping the augmented sphere S generates a synthetic view of the scene S_w = {I_w, D_w}, as it would appear from a new viewpoint. This section aims to represent the confidence of the elements in S_w, which clearly depends on the combination of the a priori pixel position, depth and pose errors over the set of geometric and projective operations of the warping function in (1). Starting with D_w, the projected depth image is

$$D_w(p^*) = D_t(w(p^*, T)), \quad D_t(p) = \sqrt{\mathbf{q}_w(p, T)^\top \mathbf{q}_w(p, T)}, \quad \mathbf{q}_w(p, T) = [\,\mathbf{I}\ \ \mathbf{0}\,]\, T \begin{bmatrix} g(p) \\ 1 \end{bmatrix} \quad (3)$$
The uncertainty of the final warped depth σ²_{D_w} then depends on two terms, Σ_w and σ²_{D_t}: the former relates to the error due to the warping w of pixel correspondences between the two spheres, and the latter to the depth image representation in the reference coordinate system.
Before introducing these two terms, let us represent the uncertainty due to the combination of pose T and Cartesian 3D point q errors. Taking a first-order approximation of q = g(p) = ρ q_S, the error can be decomposed as:

$$\Sigma_q(p) = \sigma_\rho^2\, \mathbf{q}_S \mathbf{q}_S^\top + \rho^2\, \Sigma_{q_S} = \frac{\sigma_\rho^2}{\rho^2}\, g(p)\, g(p)^\top + \rho^2\, \Sigma_{q_S} \quad (4)$$
The basic error model for the raw depth is proportional to its fourth power, σ²_ρ ∝ ρ⁴, which applies to both stereopsis and active depth measurement systems (for instance see [10] for details). The next step consists of combining the uncertain rigid transform T with the errors in q. Given the mean of the 6DOF pose x̄ = {t_x, t_y, t_z, θ, φ, ψ} in 3D+YPR form and its covariance Σ_x, for q_w(p, T) = Rq + t = Rg(p) + t,

$$\Sigma_{q_w}(p, T) = J_q(q, \bar{x})\, \Sigma_q\, J_q(q, \bar{x})^\top + J_T(q, \bar{x})\, \Sigma_x\, J_T(q, \bar{x})^\top = R\, \Sigma_q\, R^\top + M_{\Sigma_x}, \quad (5)$$
with Σ_q as in (4); the general formula of the pose-error term M is given in [2]. The first term Σ_w, using (5) and (1), is given by:

$$\Sigma_w(p^*, T) = J_{g^{-1}}(\mathbf{q}_w(\cdot))\, \Sigma_{q_w}(\cdot)\, J_{g^{-1}}(\mathbf{q}_w(\cdot))^\top \quad (6)$$

where J_{g^{-1}} is the Jacobian of the spherical projection (the inverse of g) and (·) = (p*, T^{-1}). The second term's expression, for the depth represented in the coordinate system of the reference sphere, using the warped 3D point in (5) and (3), is straightforward:
$$\sigma^2_{D_t}(p^*, T) = J_{D_t}(\mathbf{q}_w(p^*, T))\, \Sigma_{q_w}(p^*, T)\, J_{D_t}(\mathbf{q}_w(p^*, T))^\top \quad (7)$$

with $J_{D_t}(\mathbf{z}) = \mathbf{z}^\top / \sqrt{\mathbf{z}^\top \mathbf{z}}$.
The uncertainty index σ²_{D_w} is then the normalized covariance given by:

$$\sigma^2_{D_w}(p) = \sigma^2_{D_t}(p)\, /\, \big(\mathbf{q}_w(p, T)^\top \mathbf{q}_w(p, T)\big)^2 \quad (8)$$
Finally, under the assumption of Lambertian surfaces, the photometric component is simply I_w(p*) = I(w(p*, T)) and its uncertainty σ²_{I_w} is set by a robust weighting function on the error, using Huber's M-estimator as in [14].
B. Spherical Keyframe Fusion
Before combining the keyframe reference model S* with the transformed observation S_w, a probabilistic test is performed to exclude outlier pixel measurements from S_w, allowing fusion to occur only if the raw observation agrees with its corresponding value in S*.
Hence, the tuples A = {D*, D_w} and B = {I*, I_w} are defined as the sets of model-predicted and measured depth and intensity values respectively. Finally, let a class c : D*(p) = D_w(p) relate to the case where the measurement value agrees with its corresponding observation value. Inspired by the work of [16], the Bayesian framework for data association leads us to:

$$p(c \mid A, B) = \frac{p(A, B \mid c)\, p(c)}{p(A, B)} \quad (9)$$
Applying the independence rule between depth and visual properties and assuming a uniform prior on the class c (which can also be learned with supervised techniques), the above expression simplifies to:

$$p(c \mid A, B) \propto p(A \mid c)\, p(B \mid c) \quad (10)$$

Treating each term independently, the first term of equation (10) is devised as p(A|c) = p(D_w(p) | D*(p), c), whereby marginalizing over the true depth value ρ_i leads to:

$$p(D_w(p) \mid D^*(p), c) = \int p(D_w(p) \mid \rho_i, D^*(p), c)\, p(\rho_i \mid D^*(p), c)\, d\rho_i \quad (11)$$

Approximating both probability density functions as Gaussians, the above integral reduces to p(A|c) = N(D_w(p) | D*(p), σ_{D_w}(p), σ_{D*}(p)). Following a similar treatment, p(B|c) = N(I_w(p) | I*(p), σ_{I_w}(p), σ_{I*}(p)).
Since equation (10) can be viewed as a likelihood function, it is easier to work analytically with its logarithm in order to extract a decision boundary. Plugging p(A|c) and p(B|c) into (10) and taking its negative log gives the following decision rule for an inlier observation value:

$$\frac{(D_w(p) - D^*(p))^2}{\sigma^2_{D_w}(p) + \sigma^2_{D^*}(p)} + \frac{(I_w(p) - I^*(p))^2}{\sigma^2_{I_w}(p) + \sigma^2_{I^*}(p)} < \lambda^2_M, \quad (12)$$

relating to the square of the Mahalanobis distance. The threshold λ²_M is obtained by looking up the χ²₂ table.
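A direct per-pixel transcription of the decision rule (12), using the 95% chi-square value for two degrees of freedom quoted later in (15):

```python
def is_inlier(D_w, D_ref, s2_Dw, s2_Dref, I_w, I_ref, s2_Iw, s2_Iref,
              lam2_M=5.991):
    """Eq. (12): squared Mahalanobis distance over the depth and
    intensity discrepancies, thresholded at chi-square(2 DOF, 95%)."""
    d2 = (D_w - D_ref) ** 2 / (s2_Dw + s2_Dref) \
       + (I_w - I_ref) ** 2 / (s2_Iw + s2_Iref)
    return d2 < lam2_M
```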
Ultimately, we close with a classic fusion stage, whereby depth- and appearance-based consistencies are coalesced to obtain an improved estimate of the keyframe sphere. Warped values that pass the test in (12) are fused by combining their respective uncertainties as follows:

$$I_{k+1}(p) = \frac{W_{I_k}(p)\, I_k(p) + \Pi_I(p)\, I_w(p)}{W_{I_k}(p) + \Pi_I(p)}, \qquad D_{k+1}(p) = \frac{W_{D_k}(p)\, D_k(p) + \Pi_D(p)\, D_w(p)}{W_{D_k}(p) + \Pi_D(p)} \quad (13)$$

for the intensity and depth values respectively, with the weight update:

$$W_{I_{k+1}} = W_{I_k} + \Pi_I \quad \text{and} \quad W_{D_{k+1}} = W_{D_k} + \Pi_D \quad (14)$$

where Π_I(p) = 1/σ²_{I_w}(p) and Π_D(p) = 1/σ²_{D_w}(p) from the uncertainty warp propagation in Sec. IV-A.
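Over a whole keyframe these running updates vectorize naturally; a sketch with illustrative array names (all arrays share the sphere's pixel grid shape):

```python
import numpy as np

def fuse_keyframe(I_k, D_k, W_I, W_D, I_w, D_w, s2_Iw, s2_Dw, inlier):
    """Eqs. (13)-(14), applied in place where the test (12) passed.
    Pi_I and Pi_D are the confidences of the warped measurements."""
    Pi_I, Pi_D = 1.0 / s2_Iw, 1.0 / s2_Dw
    m = inlier                                  # boolean mask from eq. (12)
    I_k[m] = (W_I[m] * I_k[m] + Pi_I[m] * I_w[m]) / (W_I[m] + Pi_I[m])
    D_k[m] = (W_D[m] * D_k[m] + Pi_D[m] * D_w[m]) / (W_D[m] + Pi_D[m])
    W_I[m] += Pi_I[m]                           # eq. (14)
    W_D[m] += Pi_D[m]
    return I_k, D_k, W_I, W_D
```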
C. Dynamic 3D points filtering
So far, the problem of fusing consistent estimates into a local model has been addressed. But to improve the performance of any model, another important aspect of any mapping system is to limit, if not completely eliminate, the negative effects of dynamic points. These points exhibit erratic behaviour along the trajectory and are, as a matter of fact, highly unstable. There are, however, different levels of "dynamicity" as mentioned in [11]. Observed points/landmarks can exhibit a gradual degradation over time, while others may undergo a sudden, brutal change (the case of an occlusion, for example). The latter is considerably apparent in indoor environments, where small viewpoint changes can cause a large part of a scene to be occluded. Other cases are observations undergoing cyclic dynamics (doors opening and closing). Whilst the above-mentioned behaviours are learned in clusters in [11], in this work points with periodic dynamics are simply treated as occlusion phenomena.
The probabilistic framework for data association developed in section IV-B is a perfect fit for filtering out inconsistent data. 3D points giving a positive response to test equation (12) are given a vote of 1, and are otherwise attributed a 0. This gives rise to a confidence map C_i(k), which is updated as follows:

$$C_i(k+1) = \begin{cases} C_i(k) + 1, & \text{if } d^2_M < \lambda^2_{M,95\%} = 5.991 \\ 0, & \text{otherwise} \end{cases} \quad (15)$$
Algorithm 1 3D Points Pruning using Saliency Map
Require: {S_sal, C_i(k), N, n}
Return: optimal set A_10% ⊂ S_sal
  Initialise new A
  for i = S_sal(p*) = 1 to S_sal(p*) = max do
    compute p(occur)(p_i)
    compute γ_{k+1}(p_i)
    compute p(stable)(p_i)
    if p(stable)(p_i) ≥ 0.8 then
      A[i] ← p_i
    end if
    if length(A[i]) ≥ A_10% then
      break
    end if
  end for
Hence, the probability of occurrence is given by:

$$p(\text{occur}) = \frac{C_i(k + N)}{N}, \quad (16)$$

where N is the total number of accumulated views between two consecutive keyframes. Though p(occur) gives an indication of how many times a point has been tracked along the trajectory, it cannot distinguish between noisy data and an occlusion. Treading on a technique similar to that adopted in [8], Markov observation independence is imposed. In the event that a landmark/3D point has been detected at time instant k, it is most probable to appear again at k + 1, irrespective of its past history. On the contrary, if it has not been re-observed, this may mean that the landmark is quite noisy/unstable or has been removed indeterminately from the environment, and it has little chance of appearing again. These hypotheses are formally translated as follows:

$$\gamma_{k+1}(p^*) = \begin{cases} 1, & \text{if } p^*_k = 1 \\ (1 - p(\text{occur}))^n, & \text{otherwise} \end{cases} \quad (17)$$

Finally, the overall stability of the point is given as:

$$p(\text{stable}) = \gamma_{k+1}\, p(\text{occur}) \quad (18)$$
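The vote, occurrence and stability updates of (15) to (18) act per tracked point and vectorize over the confidence map; a sketch with illustrative names:

```python
import numpy as np

def update_confidence(C, inlier):
    """Eq. (15): increment the vote on an inlier response to (12),
    reset to zero otherwise."""
    return np.where(inlier, C + 1, 0)

def stability(C, N, n, seen_now):
    """Eqs. (16)-(18): occurrence frequency over the N views between
    keyframes, Markov-style persistence gamma, and the final score."""
    p_occur = C / N                            # eq. (16)
    gamma = np.where(seen_now, 1.0,
                     (1.0 - p_occur) ** n)     # eq. (17)
    return gamma * p_occur                     # eq. (18)
```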
D. Application to Saliency map
Instead of naively dropping out points below a certain threshold, e.g. p(stable) < 0.8, they are better pruned out of a saliency map [13]. A saliency map S_sal is the outcome of a careful selection of the most informative points, best representing a 6 degree of freedom pose x ∈ R⁶, based on a Normal Flow Constraint spherical Jacobian. The underlying procedure is outlined in algorithm (1).

Fig. 3. Saliency map skimming (sets A, B_w and the retained subset A_10% under the transform T̂T(x)).

The green and red sub-blocks in figure (3) represent the sets of inliers and outliers respectively, while the yellow one corresponds to the set of pixels belonging to the universal set {U : U = A ∪ B_w} that have not been pruned out. This happens when the keyframe criterion based on an entropy ratio α [7][9] is reached. The latter is an abstraction of the uncertainty related to the pose x along the trajectory, whose behaviour is discussed in the results section.
The novelty of this approach compared to the initial work of [13] is two-fold. Firstly, the notion of uncertainty is incorporated in spherical pixel tracking. Secondly, as new incoming frames are acquired, rasterised and fused, the information content of the initial model is enriched, and hence the saliency map needs updating. This gives a newly ordered set of pixels, to which a corresponding stability factor is attributed. Based on this information, an enhanced pixel selection is performed, consisting of pixels with a greater chance of occurrence in the subsequent frame. This set of pixels is then used for the forthcoming frame-to-keyframe motion estimation task. Eventually, between an updated model at time t_0 and the following re-initialised one at t_n, an optimal mix of information sharing happens between the two.
V. EXPERIMENTS AND RESULTS
A new sensor for large field of view RGBD image acquisition has been used in this work. This device integrates 8 Asus Xtion Pro Live (Asus XPL) sensors, as shown in figure (4)(left), and allows a spherical representation to be built. The chosen configuration offers the advantage of creating full 360° RGBD images of the scene isometrically, i.e. the same solid angle is assigned to each pixel. This permits some operations, such as point cloud reconstruction, photo-consistency alignment or image subsampling, to be applied directly. For more information about sensor calibration and spherical RGBD construction, the interested reader is referred to [5].
Our experimental test bench consists of the spherical sensor mounted on a mobile experimental platform and driven around an indoor office building environment for a first learning phase, whilst spherical RGBD data is acquired online and registered in a database. In this work, we do not address the real-time aspect of autonomous navigation and mapping, but rather investigate ways of building robust and compact environment representations by capturing the observability and dynamics of the environment. Along this line, reliable estimates coming from sensor data based on 3D geometric structure are combined together to serve later navigation and localisation tasks, which can thereafter be treated with better precision online.
Our algorithm has been extensively tested on a dataset¹ of around 2500 frames covering a total distance of around 60 m. Figures 5(a), (b) illustrate the trajectories obtained from the two experimented methods, namely RGBD registration without (method 1) and with (method 2) keyframe fusion, in order to identify the added value of the fusion stage. The visible discrepancy between the trajectories for the same pathway highlights the erroneous pose estimation that occurs along the trajectories. This is even more emphasized by inspecting the 3D structure of the reconstructed environment, as shown in figures (4)(centre) and (4)(right), where the two images correspond to the sequences without and with fusion, i.e. method 1 and method 2 respectively. In detail, reconstruction with method 1 demonstrates the effects of duplicated structures (especially surrounding wall structures), which is explained by the fact that point clouds are not perfectly stitched together in revisited areas, due to the inherent presence of rotational drift, which is more pronounced than translational drift. However, these effects are well reduced, though not completely eliminated, by the fusion stage, as illustrated in figure 4(right). The red dots on the reconstruction images correspond to the reference spheres initialised along the trajectories using the keyframe criterion described in section III-A. 270 key spheres were recorded for method 1, while only 67 were registered for method 2.

¹ video url: http://www-sop.inria.fr/members/Tawsif.Gokhool/icra15.html
Finally, comparing figures 5(a) and (b), the gain in compactness for method 2 is clearly demonstrated by the sparse positioning of keyframes in the skeletal pose graph structure. Figure 5(c) depicts the behaviour of our entropy-based keyframe selection criterion α. The threshold for α is generally tuned heuristically. For method 1, a preset value of 0.96 was used, based on the number of iterations to convergence of the cost function outlined in section III-A. With the fusion stage, the value of α could be allowed to drop to 0.78, with generally faster convergence achieved. Figure 5(d) confirms the motion estimation quality of method 2, as it exhibits a lower error norm across frames compared to method 1. Figure (1) depicts the quality of the depth map before and after applying the filtering technique; the colours in the figure represent normal orientations with respect to the underlying surfaces.
VI. CONCLUSION
In this work, a framework for hybrid metric and topological maps in a single compact skeletal pose graph representation has been proposed. Two methods have been experimented with, and the importance of data fusion has been highlighted, with the benefits of reducing data noise, redundancy and tracking drift, as well as maintaining a compact environment representation using keyframes.
Our activities are centred on building robust and stable environment representations for lifelong navigation and map building. Future investigations will be directed towards gathering more semantic information about the environment, in order to further exploit the quality and stability of the encompassing 3D structure.
ACKNOWLEDGMENT
The authors extend their warmest gratitude to Maxime
Meilland and Panagiotis Papadakis for their critical insights
and suggestions in improving the material of this paper.
REFERENCES
[1] H. Badino, D. Huber, and T. Kanade. Visual Topometric Localization.
In Intelligent Vehicles Symposium, Baden-Baden, Germany, June 2011.

Citations
More filters
Journal ArticleDOI

A Novel Method for Extrinsic Calibration of Multiple RGB-D Cameras Using Descriptor-Based Patterns.

TL;DR: In this article, the relative poses between RGB-D cameras with minimal overlapping fields of view are estimated by using descriptor-based patterns to provide well-matched 2D keypoints in the case of a minimal overlapping field of view between cameras.
Dissertation

A compact RGB-D map representation dedicated to autonomous navigation

TL;DR: An approach to map-based representation has been proposed by considering the following issues : how to robustly apply visual odometry by making the most of both photometric and geometric information available from the augmented spherical database.
Proceedings ArticleDOI

Dense accurate urban mapping from spherical RGB-D images

TL;DR: This paper presents a methodology to combine information from a sequence of RGB-D spherical views acquired by a home-made multi-stereo device in order to improve the computed depth images both in terms of accuracy and completeness.
Proceedings ArticleDOI

Hybrid metric-topological-semantic mapping in dynamic environments

TL;DR: This work proposes a new approach to update maps pertaining to large-scale dynamic environments with semantics with semantics which is able to build a stable representation with only two observations of the environment.
Book ChapterDOI

Adaptive Direct RGB-D Registration and Mapping for Large Motions

TL;DR: This work proposes an activation function based on the conditioning of the RGB and ICP point-to-plane error terms that strengthens the geometric error influence in the first coarse iterations, while the intensity data term dominates in the finer increments.
References
More filters
Proceedings ArticleDOI

On unifying key-frame and voxel-based dense visual SLAM at large scales

TL;DR: An approach to real-time dense localisation and mapping that aims at unifying two different representations commonly used to define dense models, able to perform large scale reconstruction accurately at the scale of mapping a building.
Proceedings ArticleDOI

Fast place recognition with plane-based maps

TL;DR: A new method for recognizing places in indoor environments based on the extraction of planar regions from range data provided by a hand-held RGB-D sensor, working satisfactorily even when there are substantial changes in the scene.
Proceedings ArticleDOI

A spherical robot-centered representation for urban navigation

TL;DR: This paper describes a generic method for vision-based navigation in real urban environments based on spherical images augmented with depth information and a spherical saliency map, both constructed in a learning phase.
Proceedings ArticleDOI

Extrinsic calibration of a set of range cameras in 5 seconds without pattern

TL;DR: This paper proposes a new uncomplicated technique for extrinsic calibration of range cameras that relies on finding and matching planes, providing a versatile solution that is extremely fast and easy to apply.
Proceedings ArticleDOI

Dense visual mapping of large scale environments for real-time localisation

TL;DR: A spherical ego-centric representation of the environment is proposed that is able to reproduce photo-realistic omnidirectional views of captured environments and is used for real-time model-based localisation and navigation.
Related Papers (5)
Frequently Asked Questions (1)
Q1. What contributions have the authors mentioned in the paper "A compact spherical rgbd keyframe-based representation" ?

This paper proposes an environmental representation approach based on hybrid metric and topological maps as a key component for mobile robot navigation. With the aim of reducing data redundancy, suppress sensor noise whilst maintaining a dense compact representation of the environment, neighbouring augmented spheres are fused in a single representation.