Proceedings ArticleDOI

Dense visual SLAM for RGB-D cameras

TL;DR: This paper proposes a dense visual SLAM method for RGB-D cameras that minimizes both the photometric and the depth error over all pixels, and proposes an entropy-based similarity measure for keyframe selection and loop closure detection.
Abstract: In this paper, we propose a dense visual SLAM method for RGB-D cameras that minimizes both the photometric and the depth error over all pixels. In contrast to sparse, feature-based methods, this allows us to better exploit the available information in the image data which leads to higher pose accuracy. Furthermore, we propose an entropy-based similarity measure for keyframe selection and loop closure detection. From all successful matches, we build up a graph that we optimize using the g2o framework. We evaluated our approach extensively on publicly available benchmark datasets, and found that it performs well in scenes with low texture as well as low structure. In direct comparison to several state-of-the-art methods, our approach yields a significantly lower trajectory error. We release our software as open-source.
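The entropy-based keyframe selection mentioned in the abstract can be sketched as follows. This is a minimal sketch, not the authors' code: the entropy-ratio test follows the paper's description, but the specific threshold value and function names are illustrative; in practice the pose covariance would come from the least-squares image alignment.

```python
import numpy as np

def pose_entropy(cov):
    # Differential entropy of a Gaussian pose estimate with covariance `cov`:
    # H = 0.5 * ln((2*pi*e)^n * det(cov)), here for a 6-DoF pose (n = 6).
    n = cov.shape[0]
    return 0.5 * (n * (1.0 + np.log(2.0 * np.pi)) + np.log(np.linalg.det(cov)))

def is_new_keyframe(cov_current, cov_reference, threshold=0.9):
    # Entropy-ratio test (threshold value is an assumption): when the entropy
    # of the current frame-to-keyframe estimate grows relative to the
    # reference estimate, tracking quality has degraded, so a new keyframe
    # should be created.
    alpha = pose_entropy(cov_reference) / pose_entropy(cov_current)
    return alpha < threshold
```

A larger covariance (less certain alignment) yields a lower entropy ratio and triggers keyframe creation.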


Citations
Journal ArticleDOI
TL;DR: ORB-SLAM2, a complete simultaneous localization and mapping (SLAM) system for monocular, stereo and RGB-D cameras, including map reuse, loop closing, and relocalization capabilities, is presented, being in most cases the most accurate SLAM solution.
Abstract: We present ORB-SLAM2, a complete simultaneous localization and mapping (SLAM) system for monocular, stereo and RGB-D cameras, including map reuse, loop closing, and relocalization capabilities. The system works in real time on standard central processing units in a wide variety of environments from small hand-held indoors sequences, to drones flying in industrial environments and cars driving around a city. Our back-end, based on bundle adjustment with monocular and stereo observations, allows for accurate trajectory estimation with metric scale. Our system includes a lightweight localization mode that leverages visual odometry tracks for unmapped regions and matches with map points that allow for zero-drift localization. The evaluation on 29 popular public sequences shows that our method achieves state-of-the-art accuracy, being in most cases the most accurate SLAM solution. We publish the source code, not only for the benefit of the SLAM community, but with the aim of being an out-of-the-box SLAM solution for researchers in other fields.

3,499 citations


Cites methods from "Dense visual SLAM for RGB-D cameras..."

  • ...In Table III we compare our accuracy to the following state-of-the-art methods: ElasticFusion [15], Kintinuous [12], DVO-SLAM [14] and RGB-D SLAM [13]....


  • ...DVO-SLAM also searches for loop candidates in a heuristic fashion over all previous frames, instead of relying on place recognition....



  • ...Similarly the back-end of DVO-SLAM by Kerl et al. [14] optimizes a pose-graph where keyframe-to-keyframe constraints are computed from a visual odometry that minimizes both photometric and depth error....



Book ChapterDOI
06 Sep 2014
TL;DR: A novel direct tracking method which operates on \(\mathfrak{sim}(3)\), thereby explicitly detecting scale-drift, and an elegant probabilistic solution to include the effect of noisy depth values into tracking are introduced.
Abstract: We propose a direct (feature-less) monocular SLAM algorithm which, in contrast to current state-of-the-art direct methods, allows building large-scale, consistent maps of the environment. Along with highly accurate pose estimation based on direct image alignment, the 3D environment is reconstructed in real-time as a pose-graph of keyframes with associated semi-dense depth maps. These are obtained by filtering over a large number of pixelwise small-baseline stereo comparisons. The explicitly scale-drift-aware formulation allows the approach to operate on challenging sequences including large variations in scene scale. Major enablers are two key novelties: (1) a novel direct tracking method which operates on \(\mathfrak{sim}(3)\), thereby explicitly detecting scale-drift, and (2) an elegant probabilistic solution to include the effect of noisy depth values into tracking. The resulting direct monocular SLAM system runs in real-time on a CPU.
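Direct image alignment, as used by this work and the RGB-D method it cites, rests on a per-pixel photometric residual: each reference pixel is warped into the current frame using its depth and a candidate pose, and the intensity difference is the residual. A minimal sketch (nearest-neighbour lookup, no pyramid, no interpolation, no Jacobians; all names are illustrative and depths are assumed positive):

```python
import numpy as np

def photometric_residuals(I_ref, I_cur, depth_ref, K, R, t):
    # Warp every reference pixel into the current frame with pose (R, t)
    # and intrinsics K, then compare intensities at the warped locations.
    h, w = I_ref.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth_ref
    # Back-project to 3D in the reference frame, transform to the current frame.
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)
    pts = pts @ R.T + t
    # Project into the current image (nearest-neighbour pixel lookup).
    u2 = np.round(fx * pts[..., 0] / pts[..., 2] + cx).astype(int)
    v2 = np.round(fy * pts[..., 1] / pts[..., 2] + cy).astype(int)
    ok = (u2 >= 0) & (u2 < w) & (v2 >= 0) & (v2 < h) & (pts[..., 2] > 0)
    r = np.zeros_like(I_ref)
    r[ok] = I_cur[v2[ok], u2[ok]] - I_ref[ok]
    return r, ok
```

With the identity pose and identical images, all residuals are zero; pose optimization then searches for the (R, t) minimizing a robust norm of these residuals.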

3,273 citations


Cites background or methods or result from "Dense visual SLAM for RGB-D cameras..."

  • ...For comparison we show respective results from semi-dense mono-VO [9], keypoint-based mono-SLAM [15], direct RGB-D SLAM [14] and keypoint-based RGB-D SLAM [7]....


  • ...from occlusions or reflections, different weighting-schemes [14] have been proposed, resulting in an iteratively reweighted least-squares problem: In each iteration, a weight matrixW = W(ξ) is computed which down-weights large residuals....


  • ...In [14], a pose graph based RGB-D SLAM method is proposed, which also incorporates geometric error to allow tracking through scenes with little texture....


  • ...Note that [14] and [7] use depth information from the sensor, while the others do not....


  • ...While direct image alignment is well-established for RGB-D or stereo sensors [14,4], only recently monocular direct VO algorithms have been proposed: In [24,20,21], accurate and fully dense depth maps are computed using a variational formulation, which however is computationally demanding and requires...



Proceedings ArticleDOI
31 Dec 2015
TL;DR: This system is capable of capturing comprehensive dense globally consistent surfel-based maps of room scale environments explored using an RGB-D camera in an incremental online fashion, without pose graph optimisation or any postprocessing steps.
Abstract: We present a novel approach to real-time dense visual SLAM. Our system is capable of capturing comprehensive dense globally consistent surfel-based maps of room scale environments explored using an RGB-D camera in an incremental online fashion, without pose graph optimisation or any postprocessing steps. This is accomplished by using dense frame-to-model camera tracking and windowed surfel-based fusion coupled with frequent model refinement through non-rigid surface deformations. Our approach applies local model-to-model surface loop closure optimisations as often as possible to stay close to the mode of the map distribution, while utilising global loop closure to recover from arbitrary drift and maintain global consistency.

754 citations


Cites background or methods from "Dense visual SLAM for RGB-D cameras..."


  • ...In Table I we compare our system to four other state-of-the-art RGB-D based SLAM systems; DVO SLAM [10], RGB-D SLAM [5], MRSMap [21] and Kintinuous [25]....


  • ...The DVO SLAM system of Kerl et al. applies keyframe-based pose graph optimisation principles to a dense tracking frontend but performs no explicit map reconstruction and functions off of raw keyframes alone [10]....


Journal ArticleDOI
TL;DR: In this paper, a robust pose estimation strategy is proposed for real-time, high-quality, 3D scanning of large-scale scenes using RGB-D input with an efficient hierarchical approach, which removes heavy reliance on temporal tracking and continually localizes to the globally optimized frames instead.
Abstract: Real-time, high-quality, 3D scanning of large-scale scenes is key to mixed reality and robotic applications. However, scalability brings challenges of drift in pose estimation, introducing significant errors in the accumulated model. Approaches often require hours of offline processing to globally correct model errors. Recent online methods demonstrate compelling results but suffer from (1) needing minutes to perform online correction, preventing true real-time use; (2) brittle frame-to-frame (or frame-to-model) pose estimation, resulting in many tracking failures; or (3) supporting only unstructured point-based representations, which limit scan quality and applicability. We systematically address these issues with a novel, real-time, end-to-end reconstruction framework. At its core is a robust pose estimation strategy, optimizing per frame for a global set of camera poses by considering the complete history of RGB-D input with an efficient hierarchical approach. We remove the heavy reliance on temporal tracking and continually localize to the globally optimized frames instead. We contribute a parallelizable optimization framework, which employs correspondences based on sparse features and dense geometric and photometric matching. Our approach estimates globally optimized (i.e., bundle adjusted) poses in real time, supports robust tracking with recovery from gross tracking failures (i.e., relocalization), and re-estimates the 3D model in real time to ensure global consistency, all within a single framework. Our approach outperforms state-of-the-art online systems with quality on par to offline methods, but with unprecedented speed and scan completeness. Our framework leads to a comprehensive online scanning solution for large indoor environments, enabling ease of use and high-quality results.
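Optimizing a global set of camera poses from pairwise constraints can be illustrated with a deliberately simplified, translations-only linear least-squares version. This is a sketch only: rotations, robust kernels, and the hierarchical structure of the actual system are omitted, and all names are illustrative.

```python
import numpy as np

def optimize_translations(n, edges):
    # edges: list of (i, j, t_ij) relative-translation constraints x_j - x_i = t_ij.
    # Stack all constraints into one linear least-squares problem over the
    # n node positions, fixing node 0 at the origin to remove the gauge freedom.
    m = len(edges)
    A = np.zeros((m + 1, n))
    b = np.zeros((m + 1, 3))
    for r, (i, j, t_ij) in enumerate(edges):
        A[r, i], A[r, j] = -1.0, 1.0
        b[r] = t_ij
    A[m, 0] = 1.0  # anchor node 0 at the origin
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x  # (n, 3) array of globally consistent positions
```

Because every constraint enters one joint solve, a loop-closure edge between distant frames redistributes its correction over the whole trajectory rather than only the latest pose.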

711 citations

References
Book
Christopher M. Bishop
17 Aug 2006
TL;DR: Probability Distributions, Linear Models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, and Sequential Data are studied.
Abstract: Probability Distributions.- Linear Models for Regression.- Linear Models for Classification.- Neural Networks.- Kernel Methods.- Sparse Kernel Machines.- Graphical Models.- Mixture Models and EM.- Approximate Inference.- Sampling Methods.- Continuous Latent Variables.- Sequential Data.- Combining Models.

22,840 citations

Journal ArticleDOI
TL;DR: A Technometrics book review of Bishop's Pattern Recognition and Machine Learning.
Abstract: (2007). Pattern Recognition and Machine Learning. Technometrics: Vol. 49, No. 3, pp. 366-366.

18,802 citations


"Dense visual SLAM for RGB-D cameras..." refers background in this paper

  • ...We find the t-distribution to be a suitable model, because it can be interpreted as an infinite mixture of Gaussians with different variances [33]....

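The t-distribution error model quoted above can be made concrete as an iteratively reweighted least-squares weight update. A sketch only: the weight formula follows the Student-t model, but the degrees of freedom (ν = 5) and the fixed-point scale update details are assumptions.

```python
import numpy as np

def t_distribution_weights(r, nu=5.0, iters=10):
    # Student-t weights: w_i = (nu + 1) / (nu + r_i^2 / sigma^2).
    # The scale sigma^2 is re-estimated from the weighted residuals in a
    # fixed-point iteration, so large residuals (outliers) are progressively
    # down-weighted instead of dominating the least-squares fit.
    sigma2 = np.mean(r ** 2) + 1e-12
    for _ in range(iters):
        w = (nu + 1.0) / (nu + r ** 2 / sigma2)
        sigma2 = np.sum(w * r ** 2) / len(r) + 1e-12
    return w
```

Each Gauss-Newton iteration then solves a weighted normal-equation system with these weights on the residuals.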

Journal ArticleDOI
Paul J. Besl, H. D. McKay
TL;DR: In this paper, the authors describe a general-purpose representation-independent method for the accurate and computationally efficient registration of 3D shapes including free-form curves and surfaces, based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point.
Abstract: The authors describe a general-purpose, representation-independent method for the accurate and computationally efficient registration of 3-D shapes including free-form curves and surfaces. The method handles the full six degrees of freedom and is based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point. The ICP algorithm always converges monotonically to the nearest local minimum of a mean-square distance metric, and the rate of convergence is rapid during the first few iterations. Therefore, given an adequate set of initial rotations and translations for a particular class of objects with a certain level of 'shape complexity', one can globally minimize the mean-square distance metric over all six degrees of freedom by testing each initial registration. One important application of this method is to register sensed data from unfixtured rigid objects with an ideal geometric model, prior to shape inspection. Experimental results show the capabilities of the registration algorithm on point sets, curves, and surfaces.
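The ICP loop described here alternates closest-point matching with a closed-form rigid alignment. A minimal sketch (brute-force matching and SVD-based Kabsch alignment; real implementations use k-d trees, outlier rejection, and convergence tests):

```python
import numpy as np

def best_rigid_transform(P, Q):
    # Closed-form least-squares rigid alignment (Kabsch): find R, t
    # minimizing sum_i ||R p_i + t - q_i||^2 for matched point sets.
    cp, cq = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cp).T @ (Q - cq)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cq - R @ cp

def icp(P, Q, iters=20):
    # Alternate closest-point matching (brute force here) with rigid
    # alignment; converges to a local minimum of the mean-square distance.
    X = P.copy()
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - Q[None, :, :], axis=2)
        matches = Q[np.argmin(d, axis=1)]
        R, t = best_rigid_transform(X, matches)
        X = X @ R.T + t
    return X
```

As the abstract notes, convergence is only to the nearest local minimum, so a reasonable initial alignment matters.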

17,598 citations

Proceedings ArticleDOI
26 Oct 2011
TL;DR: A system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware, which fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real- time.
Abstract: We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (ICP) algorithm, which uses all of the observed depth data available. We demonstrate the advantages of tracking against the growing full surface model compared with frame-to-frame tracking, obtaining tracking and mapping results in constant time within room sized scenes with limited drift and high accuracy. We also show both qualitative and quantitative results relating to various aspects of our tracking and mapping system. Modelling of natural scenes, in real-time with only commodity sensor and GPU hardware, promises an exciting step forward in augmented reality (AR), in particular, it allows dense surfaces to be reconstructed in real-time, with a level of detail and robustness beyond any solution yet presented using passive computer vision.
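Fusing streamed depth into a single implicit surface model is typically done as a per-voxel weighted running average of truncated signed distances. A minimal sketch of that update (the truncation distance, weight cap, and validity rule are illustrative values, not the paper's):

```python
import numpy as np

def fuse_tsdf(tsdf, weights, sdf_obs, trunc=0.05, max_weight=100.0):
    # Each voxel stores a truncated signed distance and a confidence weight.
    # New observations are averaged in, so noise cancels over many frames.
    d = np.clip(sdf_obs / trunc, -1.0, 1.0)       # truncate and normalize
    valid = sdf_obs > -trunc                      # ignore points far behind the surface
    w_new = np.where(valid, 1.0, 0.0)
    fused = (tsdf * weights + d * w_new) / np.maximum(weights + w_new, 1e-12)
    tsdf_out = np.where(valid, fused, tsdf)       # leave unobserved voxels unchanged
    weights_out = np.minimum(weights + w_new, max_weight)  # cap to stay adaptive
    return tsdf_out, weights_out
```

The surface is then extracted as the zero level set of the fused distance field (e.g., by ray casting, as in the tracking step the abstract describes).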

4,184 citations


"Dense visual SLAM for RGB-D cameras..." refers background or methods in this paper

  • ...As KinectFusion does not optimize previous camera poses, there is no possibility to correct accumulated errors in the model....


  • ...The following columns show the RMSE of the absolute trajectory error for our system, RGB-D SLAM, MRSMap and KinectFusion....


  • ...While frame-to-model approaches such as KinectFusion [5], [6] jointly estimate a persistent model of the world, they still accumulate drift (although slower than visual odometry methods), and therefore only work for the reconstruction of a small workspace (such as a desk, or part of a room)....


  • ...Finally, we compare our approach to recent state-of-the-art visual SLAM approaches, namely the RGB-D SLAM system [2], [31], the multi-resolution surfel maps (MRSMap) [11], and the PCL implementation of KinectFusion (KinFu) [5]....


  • ...proposed to incrementally build a dense model of the scene and registering every new measurement to this model [5], [23]....


Proceedings ArticleDOI
13 Nov 2007
TL;DR: A system specifically designed to track a hand-held camera in a small AR workspace, processed in parallel threads on a dual-core computer, that produces detailed maps with thousands of landmarks which can be tracked at frame-rate with accuracy and robustness rivalling that of state-of-the-art model-based systems.
Abstract: This paper presents a method of estimating camera pose in an unknown scene. While this has previously been attempted by adapting SLAM algorithms developed for robotic exploration, we propose a system specifically designed to track a hand-held camera in a small AR workspace. We propose to split tracking and mapping into two separate tasks, processed in parallel threads on a dual-core computer: one thread deals with the task of robustly tracking erratic hand-held motion, while the other produces a 3D map of point features from previously observed video frames. This allows the use of computationally expensive batch optimisation techniques not usually associated with real-time operation: The result is a system that produces detailed maps with thousands of landmarks which can be tracked at frame-rate, with an accuracy and robustness rivalling that of state-of-the-art model-based systems.
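The tracking/mapping split described here can be sketched as a producer-consumer pair of threads: tracking stays lightweight and pushes selected keyframes to a queue, while mapping consumes them and runs expensive batch optimisation asynchronously. All names are illustrative, and the placeholder below stands in for real bundle adjustment.

```python
import queue
import threading

def run_parallel(frames):
    # Tracking thread (here: the main thread) selects keyframes and enqueues
    # them; the mapping thread optimises them without blocking tracking.
    kf_queue = queue.Queue()
    map_points = []

    def mapping():
        while True:
            kf = kf_queue.get()
            if kf is None:          # sentinel: shut the mapping thread down
                break
            map_points.append(f"optimised({kf})")  # batch optimisation placeholder

    t = threading.Thread(target=mapping)
    t.start()
    for f in frames:
        kf_queue.put(f)             # tracking decides which frames become keyframes
    kf_queue.put(None)
    t.join()
    return map_points
```

The key design point is that map optimisation no longer has to finish within one frame interval, which is what makes batch techniques viable in a real-time system.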

4,091 citations


"Dense visual SLAM for RGB-D cameras..." refers background or methods in this paper

  • ...The second group uses batch optimization to refine feature locations and camera poses [27], [28]....


  • ...[27], when the number of features visible in both images is below a threshold [1]....
