Semi-Dense Visual Odometry for a Monocular Camera

Jakob Engel, Jürgen Sturm, Daniel Cremers
TU München, Germany
Abstract

We propose a fundamentally novel approach to real-time visual odometry for a monocular camera. It allows us to benefit from the simplicity and accuracy of dense tracking, which does not depend on visual features, while running in real-time on a CPU. The key idea is to continuously estimate a semi-dense inverse depth map for the current frame, which in turn is used to track the motion of the camera using dense image alignment. More specifically, we estimate the depth of all pixels which have a non-negligible image gradient. Each estimate is represented as a Gaussian probability distribution over the inverse depth. We propagate this information over time, and update it with new measurements as new images arrive. In terms of tracking accuracy and computational speed, the proposed method compares favorably to both state-of-the-art dense and feature-based visual odometry and SLAM algorithms. As our method runs in real-time on a CPU, it is of great practical value for robotics and augmented reality applications.
1. Towards Dense Monocular Visual Odometry

Tracking a hand-held camera and recovering the three-dimensional structure of the environment in real-time is among the most prominent challenges in computer vision. In recent years, dense approaches to these challenges have become increasingly popular: instead of operating solely on visual feature positions, they reconstruct and track on the whole image using a surface-based map, and thereby are fundamentally different from feature-based approaches. Yet, these methods are to date either not real-time capable on standard CPUs [11, 15, 17] or require direct depth measurements from the sensor [7], making them unsuitable for many practical applications.

In this paper, we propose a novel semi-dense visual odometry approach for a monocular camera, which combines the accuracy and robustness of dense approaches with the efficiency of feature-based methods. Further, it computes highly accurate semi-dense depth maps from the monocular images, providing rich information about the 3D structure of the environment. We use the term visual odometry as opposed to SLAM because, for simplicity, we deliberately maintain only information about the currently visible scene, instead of building a global world-model.

(This work was supported by the ERC Starting Grant ConvexVision and the DFG project Mapping on Demand.)

Figure 1. Semi-Dense Monocular Visual Odometry: Our approach works on a semi-dense inverse depth map and combines the accuracy and robustness of dense visual SLAM methods with the efficiency of feature-based techniques. Left: video frame. Right: color-coded semi-dense depth map (far to close), which consists of depth estimates in all image regions with sufficient structure.
1.1. Related Work

Feature-based monocular SLAM. In all feature-based methods (such as [4, 8]), tracking and mapping consist of two separate steps: First, discrete feature observations (i.e., their locations in the image) are extracted and matched to each other. Second, the camera and the full feature poses are calculated from a set of such observations, disregarding the images themselves. While this preliminary abstraction step greatly reduces the complexity of the overall problem and allows it to be tackled in real time, it inherently comes with two significant drawbacks: First, only image information conforming to the respective feature type and parametrization (typically image corners and blobs [6] or line segments [9]) is utilized. Second, features have to be matched to each other, which often requires the costly computation of scale- and rotation-invariant descriptors and robust outlier estimation methods like RANSAC.

Dense monocular SLAM. To overcome these limitations and to better exploit the available image information, dense monocular SLAM methods [11, 17] have recently been proposed. The fundamental difference to keypoint-based approaches is that these methods directly work on the images

instead of a set of extracted features, for both mapping and tracking: The world is modeled as a dense surface, while in turn new frames are tracked using whole-image alignment. This concept removes the need for discrete features and allows all information present in the image to be exploited, increasing tracking accuracy and robustness. To date, however, doing this in real-time is only possible using modern, powerful GPU processors.

Similar methods are broadly used in combination with RGB-D cameras [7], which directly measure the depth of each pixel, or stereo camera rigs [3], greatly reducing the complexity of the problem.

Dense multi-view stereo. Significant prior work exists on multi-view dense reconstruction, both in a real-time setting [13, 11, 15], as well as off-line [5, 14]. In particular for off-line reconstruction, there is a long history of using different baselines to steer the stereo-inherent trade-off between accuracy and precision [12]. Most similar to our approach is the early work of Matthies et al., who proposed probabilistic depth map fusion and propagation for image sequences [10], however only for structure from motion, i.e., not coupled with subsequent dense tracking.
1.2. Contributions

In this paper, we propose a novel semi-dense approach to monocular visual odometry, which does not require feature points. The key concepts are

• a probabilistic depth map representation,
• tracking based on whole-image alignment,
• the restriction to image regions which carry information (semi-dense), and
• the full incorporation of stereo measurement uncertainty.

To the best of our knowledge, this is the first featureless monocular visual odometry approach which runs in real-time on a CPU.
1.3. Method Outline

Our approach is partially motivated by the basic principle that for most real-time applications, video information is abundant and cheap to come by. Therefore, the computational budget should be spent such that the expected information gain is maximized. Instead of reducing the images to a sparse set of feature observations, however, our method continuously estimates a semi-dense inverse depth map for the current frame, i.e., a dense depth map covering all image regions with non-negligible gradient (see Fig. 2). It comprises one inverse depth hypothesis per pixel, modeled by a Gaussian probability distribution. This representation still allows us to use whole-image alignment [7] to track new frames, while at the same time greatly reducing computational complexity compared to volumetric methods. The estimated depth map is propagated from frame to frame, and updated with variable-baseline stereo comparisons. We explicitly use prior knowledge about a pixel's depth to select a suitable reference frame on a per-pixel basis, and to limit the disparity search range.

Figure 2. Semi-Dense Approach: Our approach reconstructs and tracks on a semi-dense inverse depth map, which is dense in all image regions carrying information (top-right). For comparison, the bottom row shows the respective result from a keypoint-based approach [8], a fully dense approach [11] and the ground truth from an RGB-D camera [16].
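To make this representation concrete, the following is a minimal Python/NumPy sketch of such a per-pixel Gaussian inverse-depth map. It is not the authors' implementation; the field names and the gradient threshold are illustrative assumptions.

```python
import numpy as np

class SemiDenseDepthMap:
    """Minimal sketch: one Gaussian inverse-depth hypothesis per pixel.

    Pixels without a hypothesis are marked invalid; only pixels with a
    non-negligible image gradient are ever given a hypothesis.
    """

    def __init__(self, height, width, grad_threshold=5.0):
        self.d = np.zeros((height, width))             # mean inverse depth
        self.var = np.full((height, width), np.inf)    # variance of the hypothesis
        self.valid = np.zeros((height, width), dtype=bool)
        self.grad_threshold = grad_threshold           # illustrative value

    def usable_pixels(self, image):
        """Mask of pixels with sufficient image gradient to be worth estimating."""
        gy, gx = np.gradient(image.astype(np.float64))
        grad_mag = np.hypot(gx, gy)
        return grad_mag > self.grad_threshold
```

Only pixels passing such a gradient test ever receive a hypothesis, which is what makes the map semi-dense.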
The remainder of this paper is organized as follows: Section 2 describes the semi-dense mapping part of the proposed method, including the derivation of the observation accuracy as well as the probabilistic data fusion, propagation and regularization steps. Section 3 describes how new frames are tracked using whole-image alignment, and Sec. 4 summarizes the complete visual odometry method. A qualitative as well as a quantitative evaluation is presented in Sec. 5. We then give a brief conclusion in Sec. 6.
2. Semi-Dense Depth Map Estimation

One of the key ideas proposed in this paper is to estimate a semi-dense inverse depth map for the current camera image, which in turn can be used for estimating the camera pose of the next frame. This depth map is continuously propagated from frame to frame, and refined with new stereo depth measurements, which are obtained by performing per-pixel, adaptive-baseline stereo comparisons. This allows us to accurately estimate the depth both of close-by and far-away image regions. In contrast to previous work that accumulates the photometric cost over a sequence of several frames [11, 15], we keep exactly one inverse depth hypothesis per pixel, which we represent as a Gaussian probability distribution.

Figure 3. Variable Baseline Stereo: Reference image (left), three stereo images at different baselines (right), and the respective matching cost functions (cost plotted over the inverse depth $d$). While a small baseline (black) gives a unique, but imprecise minimum, a large baseline (red) allows for a very precise estimate, but has many false minima.
This section comprises three main parts: Section 2.1 describes the stereo method used to extract new depth measurements from previous frames, and how they are incorporated into the prior depth map. In Sec. 2.2, we describe how the depth map is propagated from frame to frame. In Sec. 2.3, we detail how we partially regularize the obtained depth map in each iteration, and how outliers are handled. Throughout this section, $d$ denotes the inverse depth of a pixel.
2.1. Stereo-Based Depth Map Update

It is well known [12] that for stereo, there is a trade-off between precision and accuracy (see Fig. 3). While many multiple-baseline stereo approaches resolve this by accumulating the respective cost functions over many frames [5, 13], we propose a probabilistic approach which explicitly takes advantage of the fact that in a video, small-baseline frames are available before large-baseline frames.

The full depth map update (performed once for each new frame) consists of the following steps: First, a subset of pixels is selected for which the accuracy of a disparity search is sufficiently large. For this we use three intuitive and very efficiently computable criteria, which will be derived in Sec. 2.1.3. For each selected pixel, we then individually select a suitable reference frame, and perform a one-dimensional disparity search. Propagated prior knowledge is used to reduce the disparity search range when possible, decreasing computational cost and eliminating false minima. The obtained inverse depth estimate is then fused into the depth map.
2.1.1 Reference Frame Selection

Ideally, the reference frame is chosen such that it maximizes the stereo accuracy, while keeping the disparity search range as well as the observation angle sufficiently small. As the stereo accuracy depends on many factors and because this selection is done for each pixel independently, we employ the following heuristic: We use the oldest frame the pixel was observed in, where the disparity search range and the observation angle do not exceed a certain threshold (see Fig. 4). If a disparity search is unsuccessful (i.e., no good match is found), the pixel's "age" is increased, such that subsequent disparity searches use newer frames, where the pixel is more likely to be still visible.

Figure 4. Adaptive Baseline Selection: For each pixel in the new frame (top left), a different stereo-reference frame is selected, based on how long the pixel was visible (top right: the more yellow, the older the pixel). Some of the reference frames are displayed below; the red regions were used for stereo comparisons.
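A minimal sketch of this per-pixel reference-frame heuristic is given below; it assumes each candidate frame exposes a predicted disparity search range and observation angle for the pixel, and the attribute names and thresholds are illustrative rather than the authors' values.

```python
def select_reference_frame(candidate_frames, max_disparity_px=30.0, max_angle_deg=25.0):
    """Sketch of the per-pixel reference-frame heuristic (thresholds are illustrative).

    `candidate_frames` are the past frames in which the pixel was visible, ordered
    oldest first; each is assumed to expose the predicted disparity search range
    (in pixels) and the observation angle (in degrees) for this pixel under the
    current depth prior.
    """
    for frame in candidate_frames:  # prefer the oldest (largest-baseline) admissible frame
        if (frame.disparity_range_px <= max_disparity_px
                and frame.observation_angle_deg <= max_angle_deg):
            return frame
    return None  # no admissible frame; the caller falls back to a newer frame
```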
2.1.2 Stereo Matching Method

We perform an exhaustive search for the pixel's intensity along the epipolar line in the selected reference frame, and then perform a sub-pixel accurate localization of the matching disparity. If a prior inverse depth hypothesis is available, the search interval is limited by $d \pm 2\sigma_d$, where $d$ and $\sigma_d$ denote the mean and standard deviation of the prior hypothesis. Otherwise, the full disparity range is searched.

In our implementation, we use the SSD error over five equidistant points on the epipolar line: While this significantly increases robustness in high-frequency image regions, it does not change the purely one-dimensional nature of this search. Furthermore, it is computationally efficient, as 4 out of 5 interpolated image values can be re-used for each SSD evaluation.
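The search itself is a one-dimensional scan; the following sketch shows an exhaustive 5-point SSD search over positions sampled along the epipolar line, already restricted to the prior interval $d \pm 2\sigma_d$. For brevity it uses nearest-neighbor lookups instead of the interpolated values and sub-pixel refinement used in the paper.

```python
import numpy as np

def ssd_search_along_epipolar_line(ref_patch_vals, image, line_pts):
    """Exhaustive 1D search minimizing a 5-point SSD along a sampled epipolar line.

    ref_patch_vals : the 5 reference intensities (the pixel and 4 neighbors along the line).
    image          : reference image as a 2D array.
    line_pts       : (N, 2) array of (x, y) positions sampled along the epipolar line.
    Returns the index of the best match and its SSD cost.
    """
    xs = np.clip(np.round(line_pts[:, 0]).astype(int), 0, image.shape[1] - 1)
    ys = np.clip(np.round(line_pts[:, 1]).astype(int), 0, image.shape[0] - 1)
    vals = image[ys, xs].astype(np.float64)            # intensities along the line

    best_idx, best_cost = -1, np.inf
    for i in range(2, len(vals) - 2):
        # 5 equidistant points; 4 of the 5 values are re-used by the next evaluation
        window = vals[i - 2:i + 3]
        cost = np.sum((window - ref_patch_vals) ** 2)
        if cost < best_cost:
            best_idx, best_cost = i, cost
    return best_idx, best_cost
```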
2.1.3 Uncertainty Estimation

In this section, we use uncertainty propagation to derive an expression for the error variance $\sigma_d^2$ on the inverse depth $d$.

In general, this can be done by expressing the optimal inverse depth $d^*$ as a function of the noisy inputs: here we consider the images $I_0$, $I_1$ themselves, their relative orientation $\xi$ and the camera calibration in terms of a projection function¹ $\pi$:

$$d^* = d(I_0, I_1, \xi, \pi). \qquad (1)$$

The error-variance of $d^*$ is then given by

$$\sigma_d^2 = J_d \Sigma J_d^T, \qquad (2)$$

where $J_d$ is the Jacobian of $d$, and $\Sigma$ the covariance of the input-error. For more details on covariance propagation, including the derivation of this formula, we refer to [2]. For simplicity, the following analysis is performed for patch-free stereo, i.e., we consider only a point-wise search for a single intensity value along the epipolar line.
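Equation (2) is standard first-order covariance propagation; as a tiny numerical illustration, with made-up numbers for the Jacobian and input covariance:

```python
import numpy as np

# First-order uncertainty propagation, Eq. (2): sigma_d^2 = J_d * Sigma * J_d^T.
# The Jacobian and input covariance below are made-up numbers, for illustration only.
J_d = np.array([[0.8, -0.3, 1.5]])          # 1x3 Jacobian of d* w.r.t. the noisy inputs
Sigma = np.diag([0.01, 0.02, 0.005])        # covariance of the input errors
sigma_d_sq = (J_d @ Sigma @ J_d.T).item()   # scalar variance of the inverse depth estimate
print(sigma_d_sq)                           # 0.01945
```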
For this analysis, we split the computation into three steps: First, the epipolar line in the reference frame is computed. Second, the best matching position $\lambda^* \in \mathbb{R}$ along it (i.e., the disparity) is determined. Third, the inverse depth $d^*$ is computed from the disparity $\lambda^*$. The first two steps involve two independent error sources: the geometric error, which originates from noise on $\xi$ and $\pi$ and affects the first step, and the photometric error, which originates from noise in the images $I_0$, $I_1$ and affects the second step. The third step scales these errors by a factor, which depends on the baseline.
Geometric disparity error. The geometric error is the error $\epsilon_\lambda$ on the disparity $\lambda^*$ caused by noise on $\xi$ and $\pi$. While it would be possible to model, propagate, and estimate the complete covariance on $\xi$ and $\pi$, we found that the gain in accuracy does not justify the increase in computational complexity. We therefore use an intuitive approximation: Let the considered epipolar line segment $L \subset \mathbb{R}^2$ be defined by

$$L := \left\{\, l_0 + \lambda \begin{pmatrix} l_x \\ l_y \end{pmatrix} \;\middle|\; \lambda \in S \,\right\}, \qquad (3)$$

where $\lambda$ is the disparity with search interval $S$, $(l_x, l_y)^T$ the normalized epipolar line direction and $l_0$ the point corresponding to infinite depth. We now assume that only the absolute position of this line segment, i.e., $l_0$, is subject to isotropic Gaussian noise $\epsilon_l$. As in practice we keep the searched epipolar line segments short, the influence of rotational error is small, making this a good approximation.
Intuitively, a positioning error $\epsilon_l$ on the epipolar line causes a small disparity error $\epsilon_\lambda$ if the epipolar line is parallel to the image gradient, and a large one otherwise (see Fig. 5). This can be mathematically derived as follows: The image constrains the optimal disparity $\lambda^*$ to lie on a certain isocurve, i.e., a curve of equal intensity. We approximate this isocurve to be locally linear, i.e., the gradient direction to be locally constant. This gives

$$l_0 + \lambda \begin{pmatrix} l_x \\ l_y \end{pmatrix} \overset{!}{=} g_0 + \gamma \begin{pmatrix} -g_y \\ g_x \end{pmatrix}, \quad \gamma \in \mathbb{R}, \qquad (4)$$

where $g := (g_x, g_y)$ is the image gradient and $g_0$ a point on the isoline. The influence of noise on the image values will be derived in the next paragraph; hence at this point $g$ and $g_0$ are assumed noise-free. Solving for $\lambda$ gives the optimal disparity $\lambda^*$ in terms of the noisy input $l_0$:

$$\lambda^*(l_0) = \frac{\langle g,\, g_0 - l_0 \rangle}{\langle g,\, l \rangle}. \qquad (5)$$

¹ In the linear case, this is the camera matrix $K$; in practice, however, nonlinear distortion and other (unmodeled) effects also play a role.

Figure 5. Geometric Disparity Error: Influence of a small positioning error $\epsilon_l$ of the epipolar line on the disparity error $\epsilon_\lambda$. The dashed line represents the isocurve on which the matching point has to lie. The error $\epsilon_\lambda$ is small if the epipolar line is parallel to the image gradient (left), and large otherwise (right).
Analogously to (2), the variance of the geometric disparity error can then be expressed as

$$\sigma_{\lambda(\xi,\pi)}^2 = J_{\lambda^*(l_0)} \begin{pmatrix} \sigma_l^2 & 0 \\ 0 & \sigma_l^2 \end{pmatrix} J_{\lambda^*(l_0)}^T = \frac{\sigma_l^2}{\langle g,\, l \rangle^2}, \qquad (6)$$

where $g$ is the normalized image gradient, $l$ the normalized epipolar line direction and $\sigma_l^2$ the variance of $\epsilon_l$. Note that this error term solely originates from noise on the relative camera orientation $\xi$ and the camera calibration $\pi$, i.e., it is independent of image intensity noise.
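A direct transcription of Eq. (6) into a small helper; the normalization and the degenerate-case handling are the only additions.

```python
import numpy as np

def geometric_disparity_variance(grad, epi_dir, sigma_l_sq):
    """Eq. (6): variance of the geometric disparity error.

    grad      : image gradient at the pixel (normalized internally).
    epi_dir   : epipolar line direction (normalized internally).
    sigma_l_sq: variance of the positioning error of the epipolar line.
    The variance blows up when the gradient is (nearly) perpendicular to the
    epipolar line, which is exactly why such pixels are skipped.
    """
    g = np.asarray(grad, dtype=np.float64)
    l = np.asarray(epi_dir, dtype=np.float64)
    g = g / np.linalg.norm(g)
    l = l / np.linalg.norm(l)
    dot = float(np.dot(g, l))
    if abs(dot) < 1e-6:                      # gradient almost perpendicular to the line
        return np.inf
    return sigma_l_sq / dot**2
```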
Photometric disparity error. Intuitively, this error encodes that small image intensity errors have a large effect on the estimated disparity if the image gradient is small, and a small effect otherwise (see Fig. 6). Mathematically, this relation can be derived as follows. We seek the disparity $\lambda^*$ that minimizes the difference in intensities, i.e.,

$$\lambda^* = \min_\lambda \left( i_{\mathrm{ref}} - I_p(\lambda) \right)^2, \qquad (7)$$

where $i_{\mathrm{ref}}$ is the reference intensity, and $I_p(\lambda)$ the image intensity on the epipolar line at disparity $\lambda$. We assume a good initialization $\lambda_0$ to be available from the exhaustive search. Using a first-order Taylor approximation for $I_p$ gives

$$\lambda^*(I) = \lambda_0 + \left( i_{\mathrm{ref}} - I_p(\lambda_0) \right) g_p^{-1}, \qquad (8)$$

where $g_p$ is the gradient of $I_p$, that is, the image gradient along the epipolar line. For clarity we only consider noise on $i_{\mathrm{ref}}$ and $I_p(\lambda_0)$; equivalent results are obtained in the general case when taking into account noise on the image values involved in the computation of $g_p$.

Figure 6. Photometric Disparity Error: Noise $\epsilon_i$ on the image intensity values causes a small disparity error $\epsilon_\lambda$ if the image gradient along the epipolar line is large (left). If the gradient is small, the disparity error is magnified (right).
The variance of the photometric disparity error is given by

$$\sigma_{\lambda(I)}^2 = J_{\lambda^*(I)} \begin{pmatrix} \sigma_i^2 & 0 \\ 0 & \sigma_i^2 \end{pmatrix} J_{\lambda^*(I)}^T = \frac{2\sigma_i^2}{g_p^2}, \qquad (9)$$

where $\sigma_i^2$ is the variance of the image intensity noise. The respective error originates solely from noisy image intensity values, and hence is independent of the geometric disparity error.
Pixel to inverse depth conversion. Using the fact that, for small camera rotation, the inverse depth $d$ is approximately proportional to the disparity $\lambda$, the observation variance of the inverse depth $\sigma_{d,\mathrm{obs}}^2$ can be calculated using

$$\sigma_{d,\mathrm{obs}}^2 = \alpha^2 \left( \sigma_{\lambda(\xi,\pi)}^2 + \sigma_{\lambda(I)}^2 \right), \qquad (10)$$

where the proportionality constant $\alpha$, which in the general, non-rectified case is different for each pixel, can be calculated from

$$\alpha := \frac{\delta_d}{\delta_\lambda}, \qquad (11)$$

where $\delta_d$ is the length of the searched inverse depth interval, and $\delta_\lambda$ the length of the searched epipolar line segment. While $\alpha$ is inversely linear in the length of the camera translation, it also depends on the translation direction and the pixel's location in the image.

When using an SSD error over multiple points along the epipolar line, as our implementation does, a good upper bound for the matching uncertainty is then given by

$$\sigma_{d,\mathrm{obs\text{-}SSD}}^2 \le \alpha^2 \left( \min\{\sigma_{\lambda(\xi,\pi)}^2\} + \min\{\sigma_{\lambda(I)}^2\} \right), \qquad (12)$$

where the min goes over all points included in the SSD error.
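Combining Eqs. (6), (9) and (10), the predicted observation variance for a single-point match can be computed as in the following sketch; handling of zero gradients is omitted for brevity.

```python
def inverse_depth_observation_variance(sigma_l_sq, g_dot_l, sigma_i_sq, g_p, alpha):
    """Eqs. (6), (9), (10): observation variance of the inverse depth.

    sigma_l_sq : variance of the epipolar line positioning error (geometric term).
    g_dot_l    : <g, l>, dot product of normalized gradient and epipolar direction.
    sigma_i_sq : variance of the image intensity noise (photometric term).
    g_p        : image gradient along the epipolar line.
    alpha      : pixel-to-inverse-depth ratio, Eq. (11).
    """
    var_geometric = sigma_l_sq / g_dot_l**2               # Eq. (6)
    var_photometric = 2.0 * sigma_i_sq / g_p**2           # Eq. (9)
    return alpha**2 * (var_geometric + var_photometric)   # Eq. (10)
```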
2.1.4 Depth Observation Fusion

After a depth observation for a pixel in the current image has been obtained, we integrate it into the depth map as follows: If no prior hypothesis for a pixel exists, we initialize it directly with the observation. Otherwise, the new observation is incorporated into the prior, i.e., the two distributions are multiplied (corresponding to the update step in a Kalman filter): Given a prior distribution $\mathcal{N}(d_p, \sigma_p^2)$ and a noisy observation $\mathcal{N}(d_o, \sigma_o^2)$, the posterior is given by

$$\mathcal{N}\!\left( \frac{\sigma_p^2 d_o + \sigma_o^2 d_p}{\sigma_p^2 + \sigma_o^2},\; \frac{\sigma_p^2 \sigma_o^2}{\sigma_p^2 + \sigma_o^2} \right). \qquad (13)$$
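Equation (13) is the standard product of two Gaussians; a direct transcription:

```python
def fuse_hypotheses(d_prior, var_prior, d_obs, var_obs):
    """Eq. (13): product of two Gaussian inverse-depth hypotheses (Kalman-style update).

    Returns the posterior mean and variance. If no prior exists, the caller simply
    initializes the hypothesis with the observation instead.
    """
    denom = var_prior + var_obs
    d_post = (var_prior * d_obs + var_obs * d_prior) / denom
    var_post = (var_prior * var_obs) / denom
    return d_post, var_post
```

For example, fusing a prior $\mathcal{N}(0.5, 0.04)$ with an observation $\mathcal{N}(0.6, 0.01)$ yields a posterior mean of 0.58 and a variance of 0.008, i.e., the posterior is pulled toward the more certain estimate and is more confident than either input.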
2.1.5 Summary of Uncertainty-Aware Stereo

New stereo observations are obtained on a per-pixel basis, adaptively selecting for each pixel a suitable reference frame and performing a one-dimensional search along the epipolar line. We identified the three major factors which determine the accuracy of such a stereo observation, i.e.,

• the photometric disparity error $\sigma_{\lambda(I)}^2$, depending on the magnitude of the image gradient along the epipolar line,
• the geometric disparity error $\sigma_{\lambda(\xi,\pi)}^2$, depending on the angle between the image gradient and the epipolar line (independent of the gradient magnitude), and
• the pixel to inverse depth ratio $\alpha$, depending on the camera translation, the focal length and the pixel's position.

These three simple-to-compute and purely local criteria are used to determine for which pixel a stereo update is worth the computational cost (a minimal sketch of such a check is given below). Further, the computed observation variance is then used to integrate the new measurements into the existing depth map.
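The following sketch illustrates such a per-pixel check; the three thresholds are illustrative assumptions, not the values used by the authors.

```python
def worth_stereo_update(g_dot_l, g_p, alpha,
                        min_g_dot_l=0.1, min_g_p=3.0, max_alpha=0.5):
    """Sketch: per-pixel check of the three local criteria of Sec. 2.1.5.

    A stereo update is attempted only if the geometric term (angle between gradient
    and epipolar line), the photometric term (gradient magnitude along the line) and
    the pixel-to-inverse-depth ratio all predict a sufficiently accurate observation.
    """
    return (abs(g_dot_l) >= min_g_dot_l   # geometric error stays bounded, Eq. (6)
            and abs(g_p) >= min_g_p       # photometric error stays bounded, Eq. (9)
            and alpha <= max_alpha)       # disparity errors are not over-amplified, Eq. (10)
```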
2.2. Depth Map Propagation

We continuously propagate the estimated inverse depth map from frame to frame, once the camera position of the next frame has been estimated. Based on the inverse depth estimate $d_0$ for a pixel, the corresponding 3D point is calculated and projected into the new frame, providing an inverse depth estimate $d_1$ in the new frame. The hypothesis is then assigned to the closest integer pixel position; to eliminate discretization errors, the sub-pixel accurate image location of the projected point is kept, and re-used for the next propagation step.

For propagating the inverse depth variance, we assume the camera rotation to be small. The new inverse depth $d_1$ can then be approximated by

$$d_1(d_0) = \left( d_0^{-1} - t_z \right)^{-1}, \qquad (14)$$

where $t_z$ is the camera translation along the optical axis. The variance of $d_1$ is hence given by

$$\sigma_{d_1}^2 = J_{d_1} \sigma_{d_0}^2 J_{d_1}^T + \sigma_p^2 = \left( \frac{d_1}{d_0} \right)^4 \sigma_{d_0}^2 + \sigma_p^2, \qquad (15)$$

where $\sigma_p^2$ is the prediction uncertainty, which directly corresponds to the prediction step in an extended Kalman filter. It can also be interpreted as keeping the variance on
