Dense visual SLAM for RGB-D cameras

doi:10.1109/IROS.2013.6696650

Home
/
Papers
/
Dense visual SLAM for RGB-D cameras

Proceedings Article•DOI•

Dense visual SLAM for RGB-D cameras

Christian Kerl¹, Jürgen Sturm¹, Daniel Cremers¹•Institutions (1)

Ludwig Maximilian University of Munich¹

01 Nov 2013-pp 2100-2106

TL;DR: This paper proposes a dense visual SLAM method for RGB-D cameras that minimizes both the photometric and the depth error over all pixels, and proposes an entropy-based similarity measure for keyframe selection and loop closure detection.

read less

Abstract: In this paper, we propose a dense visual SLAM method for RGB-D cameras that minimizes both the photometric and the depth error over all pixels. In contrast to sparse, feature-based methods, this allows us to better exploit the available information in the image data which leads to higher pose accuracy. Furthermore, we propose an entropy-based similarity measure for keyframe selection and loop closure detection. From all successful matches, we build up a graph that we optimize using the g2o framework. We evaluated our approach extensively on publicly available benchmark datasets, and found that it performs well in scenes with low texture as well as low structure. In direct comparison to several state-of-the-art methods, our approach yields a significantly lower trajectory error. We release our software as open-source.

...read moreread less

Content maybe subject to copyright Report

Citations

PDF

Open Access

More filters

Journal Article•DOI•

ORB-SLAM2: An Open-Source SLAM System for Monocular, Stereo, and RGB-D Cameras

[...]

Raul Mur-Artal¹, Juan D. Tardós¹•Institutions (1)

University of Zaragoza¹

12 Jun 2017-IEEE Transactions on Robotics

TL;DR: ORB-SLAM2, a complete simultaneous localization and mapping (SLAM) system for monocular, stereo and RGB-D cameras, including map reuse, loop closing, and relocalization capabilities, is presented, being in most cases the most accurate SLAM solution.

...read moreread less

Abstract: We present ORB-SLAM2, a complete simultaneous localization and mapping (SLAM) system for monocular, stereo and RGB-D cameras, including map reuse, loop closing, and relocalization capabilities. The system works in real time on standard central processing units in a wide variety of environments from small hand-held indoors sequences, to drones flying in industrial environments and cars driving around a city. Our back-end, based on bundle adjustment with monocular and stereo observations, allows for accurate trajectory estimation with metric scale. Our system includes a lightweight localization mode that leverages visual odometry tracks for unmapped regions and matches with map points that allow for zero-drift localization. The evaluation on 29 popular public sequences shows that our method achieves state-of-the-art accuracy, being in most cases the most accurate SLAM solution. We publish the source code, not only for the benefit of the SLAM community, but with the aim of being an out-of-the-box SLAM solution for researchers in other fields.

...read moreread less

3,499 citations

Cites methods from "Dense visual SLAM for RGB-D cameras..."

...In Table III we compare our accuracy to the following state-of-the-art methods: ElasticFusion [15], Kintinuous [12], DVO-SLAM [14] and RGB-D SLAM [13]....
[...]
...DVO-SLAM also searches for loop candidates in a heuristic fashion over all previous frames, instead of relying on place recognition....
[...]
...In Table III, we compare our accuracy to the following state-of-the-art methods: ElasticFusion [15], Kintinuous [12], DVO-SLAM [14], and RGB-D SLAM [13]....
[...]
...Similarly the back-end of DVO-SLAM by Kerl et al. [14] optimizes a pose-graph where keyframe-to-keyframe constraints are computed from a visual odometry that minimizes both photometric and depth error....
[...]
...[14] optimizes a pose graph where keyframe-to-keyframe constraints are computed from a visual odometry that minimizes both photometric and depth error....
[...]

Book Chapter•DOI•

LSD-SLAM: Large-Scale Direct Monocular SLAM

[...]

Jakob Engel¹, Thomas Schops¹, Daniel Cremers¹•Institutions (1)

Technische Universität München¹

06 Sep 2014

TL;DR: A novel direct tracking method which operates on \(\mathfrak{sim}(3)\), thereby explicitly detecting scale-drift, and an elegant probabilistic solution to include the effect of noisy depth values into tracking are introduced.

...read moreread less

Abstract: We propose a direct (feature-less) monocular SLAM algorithm which, in contrast to current state-of-the-art regarding direct methods, allows to build large-scale, consistent maps of the environment Along with highly accurate pose estimation based on direct image alignment, the 3D environment is reconstructed in real-time as pose-graph of keyframes with associated semi-dense depth maps These are obtained by filtering over a large number of pixelwise small-baseline stereo comparisons The explicitly scale-drift aware formulation allows the approach to operate on challenging sequences including large variations in scene scale Major enablers are two key novelties: (1) a novel direct tracking method which operates on \(\mathfrak{sim}(3)\), thereby explicitly detecting scale-drift, and (2) an elegant probabilistic solution to include the effect of noisy depth values into tracking The resulting direct monocular SLAM system runs in real-time on a CPU

...read moreread less

3,273 citations

Cites background or methods or result from "Dense visual SLAM for RGB-D cameras..."

...For comparison we show respective results from semi-dense mono-VO [9], keypoint-based mono-SLAM [15], direct RGB-D SLAM [14] and keypoint-based RGB-D SLAM [7]....
[...]
...from occlusions or reflections, different weighting-schemes [14] have been proposed, resulting in an iteratively reweighted least-squares problem: In each iteration, a weight matrixW = W(ξ) is computed which down-weights large residuals....
[...]
...In [14], a pose graph based RGB-D SLAM method is proposed, which also incorporates geometric error to allow tracking through scenes with little texture....
[...]
...Note that [14] and [7] use depth information from the sensor, while the others do not....
[...]
...While direct image alignment is well-established for RGB-D or stereo sensors [14,4], only recently monocular direct VO algorithms have been proposed: In [24,20,21], accurate and fully dense depth maps are computed using a variational formulation, which however is computationally demanding and requires...
[...]

Journal Article•DOI•

ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras

[...]

Raul Mur-Artal¹, Juan D. Tardós¹•Institutions (1)

University of Zaragoza¹

20 Oct 2016-arXiv: Robotics

TL;DR: ORB-SLAM2 as mentioned in this paper is a complete SLAM system for monocular, stereo and RGB-D cameras, including map reuse, loop closing and relocalization capabilities.

...read moreread less

Abstract: We present ORB-SLAM2 a complete SLAM system for monocular, stereo and RGB-D cameras, including map reuse, loop closing and relocalization capabilities. The system works in real-time on standard CPUs in a wide variety of environments from small hand-held indoors sequences, to drones flying in industrial environments and cars driving around a city. Our back-end based on bundle adjustment with monocular and stereo observations allows for accurate trajectory estimation with metric scale. Our system includes a lightweight localization mode that leverages visual odometry tracks for unmapped regions and matches to map points that allow for zero-drift localization. The evaluation on 29 popular public sequences shows that our method achieves state-of-the-art accuracy, being in most cases the most accurate SLAM solution. We publish the source code, not only for the benefit of the SLAM community, but with the aim of being an out-of-the-box SLAM solution for researchers in other fields.

...read moreread less

2,857 citations

Proceedings Article•DOI•

ElasticFusion: Dense SLAM Without A Pose Graph

[...]

Thomas Whelan¹, Stefan Leutenegger¹, Renato F. Salas-Moreno¹, Ben Glocker¹, Andrew J. Davison¹ - Show less +1 more•Institutions (1)

Imperial College London¹

31 Dec 2015

TL;DR: This system is capable of capturing comprehensive dense globally consistent surfel-based maps of room scale environments explored using an RGB-D camera in an incremental online fashion, without pose graph optimisation or any postprocessing steps.

...read moreread less

Abstract: We present a novel approach to real-time dense visual SLAM. Our system is capable of capturing comprehensive dense globally consistent surfel-based maps of room scale environments explored using an RGB-D camera in an incremental online fashion, without pose graph optimisation or any postprocessing steps. This is accomplished by using dense frame-tomodel camera tracking and windowed surfel-based fusion coupled with frequent model refinement through non-rigid surface deformations. Our approach applies local model-to-model surface loop closure optimisations as often as possible to stay close to the mode of the map distribution, while utilising global loop closure to recover from arbitrary drift and maintain global consistency.

...read moreread less

754 citations

Cites background or methods from "Dense visual SLAM for RGB-D cameras..."

...applies keyframe-based pose graph optimisation principles to a dense tracking frontend but performs no explicit map reconstruction and functions off of raw keyframes alone [10]....
[...]
...In Table I we compare our system to four other state-of-the-art RGB-D based SLAM systems; DVO SLAM [10], RGB-D SLAM [5], MRSMap [21] and Kintinuous [25]....
[...]
...The DVO SLAM system of Kerl et al. applies keyframe-based pose graph optimisation principles to a dense tracking frontend but performs no explicit map reconstruction and functions off of raw keyframes alone [10]....
[...]

Journal Article•DOI•

BundleFusion: real-time globally consistent 3D reconstruction using on-the-fly surface re-integration

[...]

Angela Dai¹, Matthias Nießner¹, Michael Zollhöfer², Shahram Izadi³, Christian Theobalt² - Show less +1 more•Institutions (3)

Stanford University¹, Max Planck Society², Microsoft³

01 May 2017-ACM Transactions on Graphics

TL;DR: In this paper, a robust pose estimation strategy is proposed for real-time, high-quality, 3D scanning of large-scale scenes using RGB-D input with an efficient hierarchical approach, which removes heavy reliance on temporal tracking and continually localizes to the globally optimized frames instead.

...read moreread less

Abstract: Real-time, high-quality, 3D scanning of large-scale scenes is key to mixed reality and robotic applications. However, scalability brings challenges of drift in pose estimation, introducing significant errors in the accumulated model. Approaches often require hours of offline processing to globally correct model errors. Recent online methods demonstrate compelling results but suffer from (1) needing minutes to perform online correction, preventing true real-time use; (2) brittle frame-to-frame (or frame-to-model) pose estimation, resulting in many tracking failures; or (3) supporting only unstructured point-based representations, which limit scan quality and applicability. We systematically address these issues with a novel, real-time, end-to-end reconstruction framework. At its core is a robust pose estimation strategy, optimizing per frame for a global set of camera poses by considering the complete history of RGB-D input with an efficient hierarchical approach. We remove the heavy reliance on temporal tracking and continually localize to the globally optimized frames instead. We contribute a parallelizable optimization framework, which employs correspondences based on sparse features and dense geometric and photometric matching. Our approach estimates globally optimized (i.e., bundle adjusted) poses in real time, supports robust tracking with recovery from gross tracking failures (i.e., relocalization), and re-estimates the 3D model in real time to ensure global consistency, all within a single framework. Our approach outperforms state-of-the-art online systems with quality on par to offline methods, but with unprecedented speed and scan completeness. Our framework leads to a comprehensive online scanning solution for large indoor environments, enabling ease of use and high-quality results.1

...read moreread less

711 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180

Collapse

References

PDF

Open Access

More filters

Book•

Pattern Recognition and Machine Learning

[...]

Christopher M. Bishop¹•Institutions (1)

Microsoft¹

17 Aug 2006

TL;DR: Probability Distributions, linear models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, Sequential Data are studied.

...read moreread less

Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

...read moreread less

22,840 citations

Journal Article•DOI•

Pattern Recognition and Machine Learning

[...]

Radford M. Neal

01 Aug 2007-Technometrics

TL;DR: This book covers a broad range of topics for regular factorial designs and presents all of the material in very mathematical fashion and will surely become an invaluable resource for researchers and graduate students doing research in the design of factorial experiments.

...read moreread less

Abstract: (2007). Pattern Recognition and Machine Learning. Technometrics: Vol. 49, No. 3, pp. 366-366.

...read moreread less

18,802 citations

"Dense visual SLAM for RGB-D cameras..." refers background in this paper

...We find the t-distribution to be a suitable model, because it can be interpreted as an infinite mixture of Gaussians with different variances [33]....
[...]

Journal Article•DOI•

A method for registration of 3-D shapes

[...]

Paul J. Besl¹, H.D. McKay¹•Institutions (1)

General Motors¹

01 Feb 1992-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: In this paper, the authors describe a general-purpose representation-independent method for the accurate and computationally efficient registration of 3D shapes including free-form curves and surfaces, based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point.

...read moreread less

Abstract: The authors describe a general-purpose, representation-independent method for the accurate and computationally efficient registration of 3-D shapes including free-form curves and surfaces. The method handles the full six degrees of freedom and is based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point. The ICP algorithm always converges monotonically to the nearest local minimum of a mean-square distance metric, and the rate of convergence is rapid during the first few iterations. Therefore, given an adequate set of initial rotations and translations for a particular class of objects with a certain level of 'shape complexity', one can globally minimize the mean-square distance metric over all six degrees of freedom by testing each initial registration. One important application of this method is to register sensed data from unfixtured rigid objects with an ideal geometric model, prior to shape inspection. Experimental results show the capabilities of the registration algorithm on point sets, curves, and surfaces. >

...read moreread less

17,598 citations

Proceedings Article•DOI•

KinectFusion: Real-time dense surface mapping and tracking

[...]

Richard Newcombe¹, Shahram Izadi², Otmar Hilliges², David Molyneaux³, David Kim⁴, Andrew J. Davison¹, Pushmeet Kohi², Jamie Shotton², Steve Hodges⁴, Andrew Fitzgibbon² - Show less +6 more•Institutions (4)

Imperial College London¹, Microsoft², Lancaster University³, Newcastle University⁴

26 Oct 2011

TL;DR: A system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware, which fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real- time.

...read moreread less

Abstract: We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (ICP) algorithm, which uses all of the observed depth data available. We demonstrate the advantages of tracking against the growing full surface model compared with frame-to-frame tracking, obtaining tracking and mapping results in constant time within room sized scenes with limited drift and high accuracy. We also show both qualitative and quantitative results relating to various aspects of our tracking and mapping system. Modelling of natural scenes, in real-time with only commodity sensor and GPU hardware, promises an exciting step forward in augmented reality (AR), in particular, it allows dense surfaces to be reconstructed in real-time, with a level of detail and robustness beyond any solution yet presented using passive computer vision.

...read moreread less

4,184 citations

"Dense visual SLAM for RGB-D cameras..." refers background or methods in this paper

...As KinectFusion does not optimize previous camera poses, there is no possibility to correct accumulated errors in the model....
[...]
...The following columns show the RMSE of the absolute trajectory error for our system, RGB-D SLAM, MRSMap and KinectFusion....
[...]
...While frame-to-model approaches such as KinectFusion [5], [6] jointly estimate a persistent model of the world, they still accumulate drift (although slower than visual odometry methods), and therefore only work for the reconstruction of a small workspace (such as a desk, or part of a room)....
[...]
...Finally, we compare our approach to recent state-of-the-art visual SLAM approaches, namely the RGB-D SLAM system [2], [31], the multi-resolution surfel maps (MRSMap) [11], and the PCL implementation of KinectFusion (KinFu) [5]....
[...]
...proposed to incrementally build a dense model of the scene and registering every new measurement to this model [5], [23]....
[...]

Proceedings Article•DOI•

Parallel Tracking and Mapping for Small AR Workspaces

[...]

Georg Klein¹, David W. Murray¹•Institutions (1)

University of Oxford¹

13 Nov 2007

TL;DR: A system specifically designed to track a hand-held camera in a small AR workspace, processed in parallel threads on a dual-core computer, that produces detailed maps with thousands of landmarks which can be tracked at frame-rate with accuracy and robustness rivalling that of state-of-the-art model-based systems.

...read moreread less

Abstract: This paper presents a method of estimating camera pose in an unknown scene. While this has previously been attempted by adapting SLAM algorithms developed for robotic exploration, we propose a system specifically designed to track a hand-held camera in a small AR workspace. We propose to split tracking and mapping into two separate tasks, processed in parallel threads on a dual-core computer: one thread deals with the task of robustly tracking erratic hand-held motion, while the other produces a 3D map of point features from previously observed video frames. This allows the use of computationally expensive batch optimisation techniques not usually associated with real-time operation: The result is a system that produces detailed maps with thousands of landmarks which can be tracked at frame-rate, with an accuracy and robustness rivalling that of state-of-the-art model-based systems.

...read moreread less

4,091 citations

"Dense visual SLAM for RGB-D cameras..." refers background or methods in this paper

...The second group uses batch optimization to refine feature locations and camera poses [27], [28]....
[...]
...[27], when the number of features visible in both images is below a threshold [1]....
[...]