Proceedings Article

KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera

TL;DR: Novel extensions to the core GPU pipeline demonstrate object segmentation and user interaction directly in front of the sensor, without degrading camera tracking or reconstruction, to enable real-time multi-touch interactions anywhere.
Abstract: KinectFusion enables a user holding and moving a standard Kinect camera to rapidly create detailed 3D reconstructions of an indoor scene. Only the depth data from Kinect is used to track the 3D pose of the sensor and to reconstruct geometrically precise 3D models of the physical scene in real time. The capabilities of KinectFusion, as well as the novel GPU-based pipeline, are described in full. Uses of the core system for low-cost handheld scanning, geometry-aware augmented reality, and physics-based interactions are shown. Novel extensions to the core GPU pipeline demonstrate object segmentation and user interaction directly in front of the sensor, without degrading camera tracking or reconstruction. These extensions are used to enable real-time multi-touch interactions anywhere, allowing any planar or non-planar reconstructed physical surface to be appropriated for touch.


Citations
Proceedings Article
26 Oct 2011
TL;DR: A system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware; it fuses all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real time.
Abstract: We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (ICP) algorithm, which uses all of the observed depth data available. We demonstrate the advantages of tracking against the growing full surface model compared with frame-to-frame tracking, obtaining tracking and mapping results in constant time within room-sized scenes with limited drift and high accuracy. We also show both qualitative and quantitative results relating to various aspects of our tracking and mapping system. Modelling of natural scenes in real time with only commodity sensor and GPU hardware promises an exciting step forward in augmented reality (AR); in particular, it allows dense surfaces to be reconstructed in real time, with a level of detail and robustness beyond any solution yet presented using passive computer vision.

4,184 citations


Cites background from "KinectFusion: real-time 3D reconstr..."

  • ...In [16] we discuss all these possibilities in detail....

    [...]
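The abstract above tracks each live depth frame against the global model with a coarse-to-fine ICP. One common way to write a single tracking step is as a point-to-plane least-squares problem, linearized by assuming a small rotation (R ≈ I + [ω]×). The pure-Python sketch below assumes the correspondences (p_i, q_i) and target normals n_i are already given; it omits the correspondence search and the coarse-to-fine pyramid, and every function name is hypothetical rather than taken from the paper.

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def solve(A, b):
    """Solve A x = b (small dense system) by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def rodrigues(w):
    """Exact rotation matrix for the axis-angle vector w."""
    theta = math.sqrt(dot(w, w))
    I = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    if theta < 1e-12:
        return I
    k = [c / theta for c in w]
    K = [[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)] for i in range(3)]
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[I[i][j] + s * K[i][j] + c * K2[i][j] for j in range(3)] for i in range(3)]


def apply(R, t, p):
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def point_to_plane_step(src, dst, normals):
    """Linearized point-to-plane solve.

    Minimizes sum_i ((p_i + omega x p_i + t - q_i) . n_i)^2 over x = (omega, t).
    Uses (omega x p) . n == omega . (p x n), so each Jacobian row is [p_i x n_i, n_i].
    """
    A = [[0.0] * 6 for _ in range(6)]
    rhs = [0.0] * 6
    for p, q, n in zip(src, dst, normals):
        J = list(cross(p, n)) + list(n)
        r = dot([p[k] - q[k] for k in range(3)], n)  # current signed point-to-plane distance
        for i in range(6):
            rhs[i] -= J[i] * r
            for j in range(6):
                A[i][j] += J[i] * J[j]
    x = solve(A, rhs)
    return x[:3], x[3:]


def icp_point_to_plane(src, dst, normals, iters=10):
    """Repeat the linearized step, applying each update as an exact rotation plus translation."""
    pts = [tuple(p) for p in src]
    for _ in range(iters):
        omega, t = point_to_plane_step(pts, dst, normals)
        R = rodrigues(omega)
        pts = [apply(R, t, p) for p in pts]
    return pts
```

Because the rotation is only linearized inside each solve, the loop re-solves from the updated points; with correct correspondences and a small initial misalignment the error shrinks quickly from one iteration to the next.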

Journal Article
01 Feb 2012, Sensors
TL;DR: The calibration of the Kinect sensor is discussed, and an analysis of the accuracy and resolution of its depth data is provided, based on a mathematical model of depth measurement from disparity.
Abstract: Consumer-grade range cameras such as the Kinect sensor have the potential to be used in mapping applications where accuracy requirements are less strict. To realize this potential, insight into the geometric quality of the data acquired by the sensor is essential. In this paper we discuss the calibration of the Kinect sensor, and provide an analysis of the accuracy and resolution of its depth data. Based on a mathematical model of depth measurement from disparity, a theoretical error analysis is presented, which provides insight into the factors influencing the accuracy of the data. Experimental results show that the random error of depth measurement increases with increasing distance to the sensor, and ranges from a few millimeters up to about 4 cm at the maximum range of the sensor. The quality of the data is also found to be influenced by the low resolution of the depth measurements.

1,671 citations


Cites background from "KinectFusion: real-time 3D reconstr..."

  • ...Kinect have attracted the attention of researchers from other fields [3–11] including mapping and 3D modeling [12–15]....

    [...]
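The error analysis in the abstract above starts from a model of depth measured from disparity. A simplified stereo-triangulation form of that idea is Z = f·b/d (focal length f in pixels, baseline b, disparity d); propagating a disparity error σ_d to first order gives σ_Z ≈ Z²·σ_d/(f·b), so random depth error grows with the square of distance. This sketch may differ from the paper's exact parameterization, and the focal length, baseline and σ_d below are illustrative assumptions, not calibrated values from the paper.

```python
def depth_from_disparity(d_px, f_px, baseline_m):
    """Simplified triangulation: depth Z = f * b / d (metres)."""
    return f_px * baseline_m / d_px


def depth_sigma(z_m, f_px, baseline_m, sigma_d_px):
    """First-order propagation: |dZ/dd| = f*b/d^2 = Z^2/(f*b), so sigma_Z = Z^2 * sigma_d / (f*b)."""
    return z_m * z_m * sigma_d_px / (f_px * baseline_m)


F_PX = 580.0        # assumed depth-camera focal length, pixels
BASELINE_M = 0.075  # assumed projector-to-camera baseline, metres
SIGMA_D_PX = 0.07   # assumed disparity noise, pixels (illustrative only)

if __name__ == "__main__":
    for z in (1.0, 2.0, 3.0, 4.0, 5.0):
        mm = 1000.0 * depth_sigma(z, F_PX, BASELINE_M, SIGMA_D_PX)
        print(f"Z = {z:.1f} m -> sigma_Z ~ {mm:.1f} mm")
```

Under these assumed values the printout runs from about 1.6 mm at 1 m to about 40 mm at 5 m; what carries over to a real sensor is the quadratic trend (doubling the distance quadruples the error), not the specific numbers.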

Proceedings Article
07 Jun 2015
TL;DR: This paper introduces an RGB-D benchmark suite aimed at advancing the state of the art in all major scene understanding tasks, and presents a dataset that makes it possible to train data-hungry algorithms for scene-understanding tasks, evaluate them using meaningful 3D metrics, avoid overfitting to a small testing set, and study cross-sensor bias.
Abstract: Although RGB-D sensors have enabled major breakthroughs for several vision tasks, such as 3D reconstruction, we have not attained the same level of success in high-level scene understanding. Perhaps one of the main reasons is the lack of a large-scale benchmark with 3D annotations and 3D evaluation metrics. In this paper, we introduce an RGB-D benchmark suite with the goal of advancing the state of the art in all major scene understanding tasks. Our dataset is captured by four different sensors and contains 10,335 RGB-D images, at a similar scale as PASCAL VOC. The whole dataset is densely annotated and includes 146,617 2D polygons and 64,595 3D bounding boxes with accurate object orientations, as well as a 3D room layout and scene category for each image. This dataset enables us to train data-hungry algorithms for scene-understanding tasks, evaluate them using meaningful 3D metrics, avoid overfitting to a small testing set, and study cross-sensor bias.

1,564 citations


Cites background from "KinectFusion: real-time 3D reconstr..."

  • ...…the recent arrival of affordable depth sensors in consumer markets enables us to acquire reliable depth maps at a very low cost, stimulating breakthroughs in several vision tasks, such as body pose recognition [56, 58], intrinsic image estimation [4], 3D modeling [27] and SfM reconstruction [72]....

    [...]

Journal Article
TL;DR: A comprehensive review of recent Kinect-based computer vision algorithms and applications covering topics including preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping.
Abstract: With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use. The complementary nature of the depth and visual information provided by the Kinect sensor opens up new opportunities to solve fundamental problems in computer vision. This paper presents a comprehensive review of recent Kinect-based computer vision algorithms and applications. The reviewed approaches are classified according to the type of vision problems that can be addressed or enhanced by means of the Kinect sensor. The covered topics include preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping. For each category of methods, we outline their main algorithmic contributions and summarize their advantages/differences compared to their RGB counterparts. Finally, we give an overview of the challenges in this field and future research trends. This paper is expected to serve as a tutorial and source of references for Kinect-based computer vision researchers.

1,513 citations


Cites background from "KinectFusion: real-time 3D reconstr..."

  • ...The pioneering work of dense point-tracking is termed KinectFusion [100], [101]....

    [...]

Book Chapter
02 Sep 2014
TL;DR: A structured lighting system for creating high-resolution stereo datasets of static indoor scenes with highly accurate ground-truth disparities using novel techniques for efficient 2D subpixel correspondence search and self-calibration of cameras and projectors with modeling of lens distortion is presented.
Abstract: We present a structured lighting system for creating high-resolution stereo datasets of static indoor scenes with highly accurate ground-truth disparities. The system includes novel techniques for efficient 2D subpixel correspondence search and self-calibration of cameras and projectors with modeling of lens distortion. Combining disparity estimates from multiple projector positions we are able to achieve a disparity accuracy of 0.2 pixels on most observed surfaces, including in half-occluded regions. We contribute 33 new 6-megapixel datasets obtained with our system and demonstrate that they present new challenges for the next generation of stereo algorithms.

1,071 citations


Cites background from "KinectFusion: real-time 3D reconstr..."

  • ...Applications range from cultural heritage [21] to interactive 3D modeling [19]....

    [...]

References
Journal Article
Paul J. Besl, H.D. McKay
TL;DR: In this paper, the authors describe a general-purpose representation-independent method for the accurate and computationally efficient registration of 3D shapes including free-form curves and surfaces, based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point.
Abstract: The authors describe a general-purpose, representation-independent method for the accurate and computationally efficient registration of 3-D shapes including free-form curves and surfaces. The method handles the full six degrees of freedom and is based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point. The ICP algorithm always converges monotonically to the nearest local minimum of a mean-square distance metric, and the rate of convergence is rapid during the first few iterations. Therefore, given an adequate set of initial rotations and translations for a particular class of objects with a certain level of 'shape complexity', one can globally minimize the mean-square distance metric over all six degrees of freedom by testing each initial registration. One important application of this method is to register sensed data from unfixtured rigid objects with an ideal geometric model, prior to shape inspection. Experimental results show the capabilities of the registration algorithm on point sets, curves, and surfaces.

17,598 citations
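The ICP loop described in this abstract alternates two steps: match every data point to its closest model point, then apply the rigid motion that minimizes the mean-square distance over those pairs. Neither step can increase the error, which is why convergence is monotonic. The sketch below is a simplification under stated assumptions: it works in 2D for brevity (the original method handles all six 3D degrees of freedom), uses a closed-form rotation angle and brute-force nearest-neighbour search, and its helper names are hypothetical.

```python
import math


def sq_dist(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest(p, model):
    """Brute-force closest model point (a spatial index would replace this at scale)."""
    return min(model, key=lambda q: sq_dist(p, q))


def best_rigid_2d(src, dst):
    """Closed-form least-squares rotation angle and translation mapping paired src -> dst."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    num = den = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        num += px * qy - py * qx  # sum of 2D cross products of centred pairs
        den += px * qx + py * qy  # sum of dot products of centred pairs
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return theta, (dx - (c * sx - s * sy), dy - (s * sx + c * sy))


def transform(p, theta, t):
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])


def mean_sq_error(pts, model):
    return sum(sq_dist(p, nearest(p, model)) for p in pts) / len(pts)


def icp_2d(src, model, iters=20):
    """Alternate closest-point matching and rigid fitting; return points and the MSE history."""
    pts = [tuple(p) for p in src]
    history = [mean_sq_error(pts, model)]
    for _ in range(iters):
        matches = [nearest(p, model) for p in pts]
        theta, t = best_rigid_2d(pts, matches)
        pts = [transform(p, theta, t) for p in pts]
        history.append(mean_sq_error(pts, model))
    return pts, history
```

The history list makes the monotonic-convergence claim easy to check: each entry should be no larger than the one before it, although a large initial misalignment can still settle into a local minimum, which is why the abstract suggests testing several initial registrations.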

Book
01 Jan 2000
TL;DR: In this article, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.
Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

01 Jan 2001
Multiple View Geometry in Computer Vision

14,282 citations


"KinectFusion: real-time 3D reconstr..." refers background in this paper

  • ...Reconstructing geometry using active sensors [16], passive cameras [11, 18], online images [7], or from unordered 3D points [14, 29] are well-studied areas of research in computer graphics and vision....

    [...]

Book
31 Oct 2002
TL;DR: A student or researcher working in mathematics, computer graphics, science, or engineering interested in any dynamic moving front, which might change its topology or develop singularities, will find this book interesting and useful.
Abstract: This book is an introduction to level set methods and dynamic implicit surfaces. These are powerful techniques for analyzing and computing moving fronts in a variety of different settings. While it gives many examples of the utility of the methods to a diverse set of applications, it also gives complete numerical analysis and recipes, which will enable users to quickly apply the techniques to real problems. The book begins with a description of implicit surfaces and their basic properties, then devises the level set geometry and calculus toolbox, including the construction of signed distance functions. Part II adds dynamics to this static calculus. Topics include the level set equation itself, Hamilton-Jacobi equations, motion of a surface normal to itself, re-initialization to a signed distance function, extrapolation in the normal direction, the particle level set method and the motion of co-dimension two (and higher) objects. Part III is concerned with topics taken from the fields of Image Processing and Computer Vision. These include the restoration of images degraded by noise and blur, image segmentation with active contours (snakes), and reconstruction of surfaces from unorganized data points. Part IV is dedicated to Computational Physics. It begins with one phase compressible fluid dynamics, then two-phase compressible flow involving possibly different equations of state, detonation and deflagration waves, and solid/fluid structure interaction. Next it discusses incompressible fluid dynamics, including a computer graphics simulation of smoke, free surface flows, including a computer graphics simulation of water, and fully two-phase incompressible flow. Additional related topics include incompressible flames with applications to computer graphics and coupling a compressible and incompressible fluid. Finally, heat flow and Stefan problems are discussed. 
A student or researcher working in mathematics, computer graphics, science, or engineering interested in any dynamic moving front, which might change its topology or develop singularities, will find this book interesting and useful.

5,526 citations


"KinectFusion: real-time 3D reconstr..." refers methods in this paper

  • ...Assuming the gradient is orthogonal to the surface interface, the surface normal is computed directly as the derivative of the TSDF at the zero-crossing [22]....

    [...]

  • ...Global 3D vertices are integrated into voxels using a variant of Signed Distance Functions (SDFs) [22], specifying a relative distance to the actual surface....

    [...]
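The two excerpts above describe integrating depth into voxels as a truncated signed distance function (TSDF) and taking the surface normal from its derivative at the zero-crossing. A minimal sketch under stated assumptions: a single camera ray (a 1D column of voxels), each new measurement weighted 1, distances normalized to [-1, 1] by a truncation distance mu, and a plain running weighted average. The GPU system updates a whole 3D volume in parallel and its exact weighting may differ. The normal helper estimates the gradient of any signed distance function by central differences. All names and parameters are hypothetical.

```python
import math


def fuse_depth_1d(tsdf, weights, depth, voxel_size, mu):
    """Integrate one depth sample along a single camera ray into a column of voxels.

    The camera sits at z = 0 looking down +z; voxel i is centred at z = (i + 0.5) * voxel_size.
    Signed distance is positive in front of the measured surface and negative behind it.
    """
    for i in range(len(tsdf)):
        z = (i + 0.5) * voxel_size
        sdf = depth - z
        if sdf < -mu:
            continue  # far behind the observed surface: occluded, leave untouched
        f = min(1.0, sdf / mu)  # truncate and normalize to [-1, 1]
        tsdf[i] = (weights[i] * tsdf[i] + f) / (weights[i] + 1.0)  # running weighted average
        weights[i] += 1.0


def zero_crossing_1d(tsdf, weights, voxel_size):
    """Depth of the first +/- sign change between observed voxels, refined by linear interpolation."""
    for i in range(len(tsdf) - 1):
        if weights[i] > 0 and weights[i + 1] > 0 and tsdf[i] > 0.0 >= tsdf[i + 1]:
            a, b = tsdf[i], tsdf[i + 1]
            return (i + 0.5 + a / (a - b)) * voxel_size
    return None


def sdf_normal(sdf, p, h=1e-3):
    """Unit surface normal at p: normalized central-difference gradient of a signed distance function."""
    g = []
    for axis in range(3):
        plus = [p[k] + (h if k == axis else 0.0) for k in range(3)]
        minus = [p[k] - (h if k == axis else 0.0) for k in range(3)]
        g.append((sdf(plus) - sdf(minus)) / (2.0 * h))
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]
```

Averaging two noisy depth samples of 1.00 m and 1.02 m with a 1 cm voxel size and mu = 5 cm places the fused zero-crossing at 1.01 m, since both samples lie inside the truncation band there and the average of two linear ramps is a linear ramp.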

Proceedings Article
26 Oct 2011
Same paper as the 26 Oct 2011 entry listed under Citations.

4,184 citations


"KinectFusion: real-time 3D reconstr..." refers methods in this paper

  • ...As shown in [21], this allows us to mitigate issues of drift and reduce ICP errors, by tracking directly from the raycasted model as opposed to frame-to-frame ICP tracking....

    [...]

  • ...A full formulation of our method is provided in [21], as well as quantitative evaluation of reconstruction performance....

    [...]
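The first excerpt refers to tracking against a raycast of the model rather than against the previous frame. Raycasting a TSDF means marching each pixel's ray until the field changes sign from positive (free space) to negative (behind the surface), then refining the hit by linear interpolation between the last two samples. The sketch below marches a single ray with a fixed step through an analytic truncated sphere that stands in for the fused volume; the step size, the sphere, and the function names are assumptions for illustration only.

```python
import math


def sphere_tsdf(p, center=(0.0, 0.0, 2.0), radius=0.5, mu=0.1):
    """Stand-in model: truncated signed distance to a sphere, scaled to [-1, 1]."""
    return max(-1.0, min(1.0, (math.dist(p, center) - radius) / mu))


def raycast(tsdf, origin, direction, t_min, t_max, step):
    """March origin + t * direction; return the first front-facing zero crossing, or None."""
    norm = math.sqrt(sum(c * c for c in direction))
    d = [c / norm for c in direction]

    def at(t):
        return [origin[k] + t * d[k] for k in range(3)]

    t_prev, f_prev = t_min, tsdf(at(t_min))
    t = t_min + step
    while t <= t_max:
        f = tsdf(at(t))
        if f_prev > 0.0 >= f:  # crossed from free space into the surface
            t_hit = t_prev + (t - t_prev) * f_prev / (f_prev - f)  # linear interpolation
            return at(t_hit)
        t_prev, f_prev = t, f
        t += step
    return None
```

A fixed step smaller than the truncation band keeps at least one sample inside the band on each side of the surface, so the interpolation works on the linear part of the field; a ray that never enters the band, such as one passing beside the sphere, returns None.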