KinectFusion: real-time 3D reconstruction and interaction using a moving depth camera

doi:10.1145/2047196.2047270

Proceedings Article•DOI•

Shahram Izadi¹, David Kim¹, Otmar Hilliges¹, David Molyneaux¹, Richard Newcombe², Pushmeet Kohli¹, Jamie Shotton¹, Steve Hodges¹, Dustin Freeman³, Andrew J. Davison², Andrew Fitzgibbon¹

Microsoft¹, Imperial College London², University of Toronto³

16 Oct 2011-pp 559-568

TL;DR: Novel extensions to the core GPU pipeline demonstrate object segmentation and user interaction directly in front of the sensor, without degrading camera tracking or reconstruction, to enable real-time multi-touch interactions anywhere.

Abstract: KinectFusion enables a user holding and moving a standard Kinect camera to rapidly create detailed 3D reconstructions of an indoor scene. Only the depth data from Kinect is used to track the 3D pose of the sensor and reconstruct geometrically precise 3D models of the physical scene in real-time. The capabilities of KinectFusion, as well as the novel GPU-based pipeline, are described in full. Uses of the core system for low-cost handheld scanning, geometry-aware augmented reality, and physics-based interactions are shown. Novel extensions to the core GPU pipeline demonstrate object segmentation and user interaction directly in front of the sensor, without degrading camera tracking or reconstruction. These extensions are used to enable real-time multi-touch interactions anywhere, allowing any planar or non-planar reconstructed physical surface to be appropriated for touch.

Citations

Proceedings Article•DOI•

KinectFusion: Real-time dense surface mapping and tracking

Richard Newcombe¹, Shahram Izadi², Otmar Hilliges², David Molyneaux³, David Kim⁴, Andrew J. Davison¹, Pushmeet Kohli², Jamie Shotton², Steve Hodges⁴, Andrew Fitzgibbon²

Imperial College London¹, Microsoft², Lancaster University³, Newcastle University⁴

26 Oct 2011

TL;DR: A system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware, which fuses all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real time.

Abstract: We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (ICP) algorithm, which uses all of the observed depth data available. We demonstrate the advantages of tracking against the growing full surface model compared with frame-to-frame tracking, obtaining tracking and mapping results in constant time within room sized scenes with limited drift and high accuracy. We also show both qualitative and quantitative results relating to various aspects of our tracking and mapping system. Modelling of natural scenes, in real-time with only commodity sensor and GPU hardware, promises an exciting step forward in augmented reality (AR). In particular, it allows dense surfaces to be reconstructed in real-time, with a level of detail and robustness beyond any solution yet presented using passive computer vision.
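
The fusion of many depth frames into one implicit surface can be illustrated with a small sketch along a single camera ray. It assumes a weighted running average of truncated signed distances, a common formulation for volumetric fusion; the function names, the truncation distance `mu`, the constant per-measurement weight, and the sample readings are illustrative assumptions, not values from the paper.

```python
def fuse_depth(tsdf, weight, voxel_depths, measured_depth, mu=0.1, max_weight=64.0):
    """Fold one depth reading along a single ray into the running TSDF average.

    tsdf and weight are lists updated in place; voxel_depths holds each voxel's
    distance from the camera along the ray (same units as measured_depth).
    """
    for i, z in enumerate(voxel_depths):
        sdf = measured_depth - z      # positive in front of the surface
        if sdf < -mu:
            continue                  # far behind the surface: not observed
        d = min(1.0, sdf / mu)        # truncate and scale to [-1, 1]
        w = 1.0                       # assumed constant weight per reading
        tsdf[i] = (weight[i] * tsdf[i] + w * d) / (weight[i] + w)
        weight[i] = min(weight[i] + w, max_weight)


def zero_crossing(tsdf, weight, voxel_depths):
    """Return the interpolated depth where the fused TSDF changes sign, or None."""
    for i in range(len(tsdf) - 1):
        if weight[i] > 0 and weight[i + 1] > 0 and tsdf[i] > 0 >= tsdf[i + 1]:
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return voxel_depths[i] + t * (voxel_depths[i + 1] - voxel_depths[i])
    return None


# Hypothetical data: voxels every 5 cm from 0 m to 2 m, three noisy readings.
voxel_depths = [i * 0.05 for i in range(41)]
tsdf = [0.0] * len(voxel_depths)
weight = [0.0] * len(voxel_depths)
for depth in (1.02, 0.98, 1.00):
    fuse_depth(tsdf, weight, voxel_depths, depth)
surface = zero_crossing(tsdf, weight, voxel_depths)
```

Averaging in the signed-distance domain is what lets individual noisy readings cancel: the surface is read back as the zero crossing of the fused values rather than from any single frame.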

4,184 citations

Cites background from "KinectFusion: real-time 3D reconstr..."

...In [16] we discuss all these possibilities in detail....
[...]

Journal Article•DOI•

Accuracy and Resolution of Kinect Depth Data for Indoor Mapping Applications

Kourosh Khoshelham¹, Sander Oude Elberink²

University of Twente¹, ITC Enschede²

01 Feb 2012-Sensors

TL;DR: The calibration of the Kinect sensor is discussed, and an analysis of the accuracy and resolution of its depth data is provided, based on a mathematical model of depth measurement from disparity.

Abstract: Consumer-grade range cameras such as the Kinect sensor have the potential to be used in mapping applications where accuracy requirements are less strict. To realize this potential, insight into the geometric quality of the data acquired by the sensor is essential. In this paper we discuss the calibration of the Kinect sensor, and provide an analysis of the accuracy and resolution of its depth data. Based on a mathematical model of depth measurement from disparity, a theoretical error analysis is presented, which provides an insight into the factors influencing the accuracy of the data. Experimental results show that the random error of depth measurement increases with increasing distance to the sensor, and ranges from a few millimeters up to about 4 cm at the maximum range of the sensor. The quality of the data is also found to be influenced by the low resolution of the depth measurements.
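
Why the random error grows with distance can be seen from a short sketch of first-order error propagation through plain triangulation, Z = f·b/d. This is the generic textbook relation, not the paper's calibrated model, and the focal length, baseline, and disparity noise below are hypothetical placeholders.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Triangulated depth Z = f * b / d."""
    return focal_px * baseline_m / disparity_px


def depth_std(depth_m, focal_px, baseline_m, disparity_std_px):
    """First-order depth noise: sigma_Z = Z^2 / (f * b) * sigma_d."""
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_std_px


# Hypothetical sensor parameters, chosen only to make the example concrete.
FOCAL_PX = 580.0
BASELINE_M = 0.075
DISPARITY_STD_PX = 0.1

sigmas = {
    z: depth_std(z, FOCAL_PX, BASELINE_M, DISPARITY_STD_PX)
    for z in (1.0, 2.0, 4.0)
}
```

Under this model the depth noise scales with the square of the distance, so doubling the range quadruples the standard deviation regardless of the parameter values chosen.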

1,671 citations

Cites background from "KinectFusion: real-time 3D reconstr..."

...Kinect have attracted the attention of researchers from other fields [3–11] including mapping and 3D modeling [12–15]....
[...]

Proceedings Article•DOI•

SUN RGB-D: A RGB-D scene understanding benchmark suite

Shuran Song¹, Samuel P. Lichtenberg¹, Jianxiong Xiao¹

Princeton University¹

07 Jun 2015

TL;DR: This paper introduces an RGB-D benchmark suite for advancing the state of the art in all major scene understanding tasks, and presents a dataset that makes it possible to train data-hungry algorithms for scene-understanding tasks, evaluate them using meaningful 3D metrics, avoid overfitting to a small testing set, and study cross-sensor bias.

Abstract: Although RGB-D sensors have enabled major breakthroughs for several vision tasks, such as 3D reconstruction, we have not attained the same level of success in high-level scene understanding. Perhaps one of the main reasons is the lack of a large-scale benchmark with 3D annotations and 3D evaluation metrics. In this paper, we introduce an RGB-D benchmark suite with the goal of advancing the state of the art in all major scene understanding tasks. Our dataset is captured by four different sensors and contains 10,335 RGB-D images, at a similar scale as PASCAL VOC. The whole dataset is densely annotated and includes 146,617 2D polygons and 64,595 3D bounding boxes with accurate object orientations, as well as a 3D room layout and scene category for each image. This dataset enables us to train data-hungry algorithms for scene-understanding tasks, evaluate them using meaningful 3D metrics, avoid overfitting to a small testing set, and study cross-sensor bias.

1,564 citations

Cites background from "KinectFusion: real-time 3D reconstr..."

...…the recent arrival of affordable depth sensors in consumer markets enables us to acquire reliable depth maps at a very low cost, stimulating breakthroughs in several vision tasks, such as body pose recognition [56, 58], intrinsic image estimation [4], 3D modeling [27] and SfM reconstruction [72]....
[...]

Journal Article•DOI•

Enhanced Computer Vision With Microsoft Kinect Sensor: A Review

Jungong Han, Ling Shao¹, Dong Xu², Jamie Shotton³

Nanjing University of Information Science and Technology¹, Nanyang Technological University², Microsoft³

25 Jun 2013-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: A comprehensive review of recent Kinect-based computer vision algorithms and applications covering topics including preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping.

Abstract: With the invention of the low-cost Microsoft Kinect sensor, high-resolution depth and visual (RGB) sensing has become available for widespread use. The complementary nature of the depth and visual information provided by the Kinect sensor opens up new opportunities to solve fundamental problems in computer vision. This paper presents a comprehensive review of recent Kinect-based computer vision algorithms and applications. The reviewed approaches are classified according to the type of vision problems that can be addressed or enhanced by means of the Kinect sensor. The covered topics include preprocessing, object tracking and recognition, human activity analysis, hand gesture analysis, and indoor 3-D mapping. For each category of methods, we outline their main algorithmic contributions and summarize their advantages/differences compared to their RGB counterparts. Finally, we give an overview of the challenges in this field and future research trends. This paper is expected to serve as a tutorial and source of references for Kinect-based computer vision researchers.

1,513 citations

Cites background from "KinectFusion: real-time 3D reconstr..."

...The pioneering work of dense point-tracking is termed KinectFusion [100], [101]....
[...]

Book Chapter•DOI•

High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth

Daniel Scharstein¹, Heiko Hirschmüller², York Kitajima¹, Greg Krathwohl¹, Nera Nešić³, Xi Wang¹, Porter Westling

Middlebury College¹, German Aerospace Center², Reykjavík University³

02 Sep 2014

TL;DR: A structured lighting system for creating high-resolution stereo datasets of static indoor scenes with highly accurate ground-truth disparities using novel techniques for efficient 2D subpixel correspondence search and self-calibration of cameras and projectors with modeling of lens distortion is presented.

Abstract: We present a structured lighting system for creating high-resolution stereo datasets of static indoor scenes with highly accurate ground-truth disparities. The system includes novel techniques for efficient 2D subpixel correspondence search and self-calibration of cameras and projectors with modeling of lens distortion. Combining disparity estimates from multiple projector positions we are able to achieve a disparity accuracy of 0.2 pixels on most observed surfaces, including in half-occluded regions. We contribute 33 new 6-megapixel datasets obtained with our system and demonstrate that they present new challenges for the next generation of stereo algorithms.

1,071 citations

Cites background from "KinectFusion: real-time 3D reconstr..."

...Applications range from cultural heritage [21] to interactive 3D modeling [19]....
[...]

References

Journal Article•DOI•

A method for registration of 3-D shapes

Paul J. Besl¹, H.D. McKay¹

General Motors¹

01 Feb 1992-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: In this paper, the authors describe a general-purpose representation-independent method for the accurate and computationally efficient registration of 3D shapes including free-form curves and surfaces, based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point.

Abstract: The authors describe a general-purpose, representation-independent method for the accurate and computationally efficient registration of 3-D shapes including free-form curves and surfaces. The method handles the full six degrees of freedom and is based on the iterative closest point (ICP) algorithm, which requires only a procedure to find the closest point on a geometric entity to a given point. The ICP algorithm always converges monotonically to the nearest local minimum of a mean-square distance metric, and the rate of convergence is rapid during the first few iterations. Therefore, given an adequate set of initial rotations and translations for a particular class of objects with a certain level of 'shape complexity', one can globally minimize the mean-square distance metric over all six degrees of freedom by testing each initial registration. One important application of this method is to register sensed data from unfixtured rigid objects with an ideal geometric model, prior to shape inspection. Experimental results show the capabilities of the registration algorithm on point sets, curves, and surfaces.
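
The match-then-align loop at the heart of ICP can be sketched in a few lines. For brevity this is a 2D analogue with brute-force closest-point search and a closed-form planar rigid fit, not the six-degree-of-freedom 3D solution the paper describes; all function names and the test shape are illustrative assumptions.

```python
import math


def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    cxs = sum(p[0] for p in src) / n
    cys = sum(p[1] for p in src) / n
    cxd = sum(p[0] for p in dst) / n
    cyd = sum(p[1] for p in dst) / n
    dot = cross = 0.0
    for (xs, ys), (xd, yd) in zip(src, dst):
        xs, ys, xd, yd = xs - cxs, ys - cys, xd - cxd, yd - cyd
        dot += xs * xd + ys * yd
        cross += xs * yd - ys * xd
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, cxd - (c * cxs - s * cys), cyd - (s * cxs + c * cys)


def transform(points, theta, tx, ty):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


def nearest(p, model):
    """Brute-force closest model point to p."""
    return min(model, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def icp_2d(data, model, iterations=20):
    """Alternate closest-point matching and rigid fitting; return points and errors."""
    current, errors = list(data), []
    for _ in range(iterations):
        matches = [nearest(p, model) for p in current]
        errors.append(sum((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
                          for p, q in zip(current, matches)) / len(current))
        current = transform(current, *fit_rigid_2d(current, matches))
    return current, errors


# Hypothetical L-shaped model and a slightly rotated, shifted copy of it.
model = [(0.5 * i, 0.0) for i in range(11)] + [(0.0, 0.5 * j) for j in range(1, 7)]
data = transform(model, 0.1, 0.2, -0.1)
aligned, errors = icp_2d(data, model)
```

The recorded mean-square error never increases from one iteration to the next, which mirrors the monotonic convergence property stated in the abstract; like any ICP variant, the sketch only finds a local minimum, so the starting pose matters.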

17,598 citations

Book•

Multiple view geometry in computer vision

Richard Hartley¹, Andrew Zisserman²

Australian National University¹, University of Oxford²

01 Jan 2000

TL;DR: In this article, the authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly in a unified framework, including geometric principles and how to represent objects algebraically so they can be computed and applied.

Abstract: From the Publisher: A basic problem in computer vision is to understand the structure of a real world scene given several images of it. Recent major developments in the theory and practice of scene reconstruction are described in detail in a unified framework. The book covers the geometric principles and how to represent objects algebraically so they can be computed and applied. The authors provide comprehensive background material and explain how to apply the methods and implement the algorithms directly.

15,558 citations

Multiple View Geometry in Computer Vision.

Bernhard P. Wrobel

01 Jan 2001

14,282 citations

"KinectFusion: real-time 3D reconstr..." refers background in this paper

...Reconstructing geometry using active sensors [16], passive cameras [11, 18], online images [7], or from unordered 3D points [14, 29] are well-studied areas of research in computer graphics and vision....
[...]

Book•

Level Set Methods and Dynamic Implicit Surfaces

Stanley Osher¹, Ronald Fedkiw²

University of California, Los Angeles¹, Stanford University²

31 Oct 2002

TL;DR: A student or researcher working in mathematics, computer graphics, science, or engineering interested in any dynamic moving front, which might change its topology or develop singularities, will find this book interesting and useful.

Abstract: This book is an introduction to level set methods and dynamic implicit surfaces. These are powerful techniques for analyzing and computing moving fronts in a variety of different settings. While it gives many examples of the utility of the methods to a diverse set of applications, it also gives complete numerical analysis and recipes, which will enable users to quickly apply the techniques to real problems. The book begins with a description of implicit surfaces and their basic properties, then devises the level set geometry and calculus toolbox, including the construction of signed distance functions. Part II adds dynamics to this static calculus. Topics include the level set equation itself, Hamilton-Jacobi equations, motion of a surface normal to itself, re-initialization to a signed distance function, extrapolation in the normal direction, the particle level set method and the motion of co-dimension two (and higher) objects. Part III is concerned with topics taken from the fields of Image Processing and Computer Vision. These include the restoration of images degraded by noise and blur, image segmentation with active contours (snakes), and reconstruction of surfaces from unorganized data points. Part IV is dedicated to Computational Physics. It begins with one phase compressible fluid dynamics, then two-phase compressible flow involving possibly different equations of state, detonation and deflagration waves, and solid/fluid structure interaction. Next it discusses incompressible fluid dynamics, including a computer graphics simulation of smoke, free surface flows, including a computer graphics simulation of water, and fully two-phase incompressible flow. Additional related topics include incompressible flames with applications to computer graphics and coupling a compressible and incompressible fluid. Finally, heat flow and Stefan problems are discussed. 
A student or researcher working in mathematics, computer graphics, science, or engineering interested in any dynamic moving front, which might change its topology or develop singularities, will find this book interesting and useful.

5,526 citations

"KinectFusion: real-time 3D reconstr..." refers methods in this paper

...Assuming the gradient is orthogonal to the surface interface, the surface normal is computed directly as the derivative of the TSDF at the zero-crossing [22]....
[...]
...Global 3D vertices are integrated into voxels using a variant of Signed Distance Functions (SDFs) [22], specifying a relative distance to the actual surface....
[...]
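
The normal computation mentioned in the first excerpt can be sketched as a normalized central-difference gradient of the distance field. The sketch below assumes the field is available as a callable and uses an analytic sphere as a stand-in for a voxel volume; the names `tsdf_normal` and `sphere_sdf` and the step size `h` are illustrative, not from the paper.

```python
import math


def tsdf_normal(field, point, h=1e-2):
    """Unit normal at point as the normalized central-difference gradient of field."""
    x, y, z = point
    gx = field(x + h, y, z) - field(x - h, y, z)
    gy = field(x, y + h, z) - field(x, y - h, z)
    gz = field(x, y, z + h) - field(x, y, z - h)
    norm = math.sqrt(gx * gx + gy * gy + gz * gz)
    if norm == 0.0:
        return None  # flat region: no well-defined normal
    return (gx / norm, gy / norm, gz / norm)


def sphere_sdf(x, y, z, radius=1.0):
    """Signed distance to a sphere at the origin: positive outside, negative inside."""
    return math.sqrt(x * x + y * y + z * z) - radius


# A point on the unit sphere; the normal should point radially outward.
normal = tsdf_normal(sphere_sdf, (0.6, 0.0, 0.8))
```

With a distance field that is positive in free space, the gradient points away from the surface, so the result is the outward-facing normal. In a voxel volume the callable would be replaced by an interpolated lookup into the stored values.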

Proceedings Article•DOI•

KinectFusion: Real-time dense surface mapping and tracking

Imperial College London¹, Microsoft², Lancaster University³, Newcastle University⁴

26 Oct 2011

4,184 citations

"KinectFusion: real-time 3D reconstr..." refers methods in this paper

...As shown in [21], this allows us to mitigate issues of drift and reduce ICP errors, by tracking directly from the raycasted model as opposed to frame-to-frame ICP tracking....
[...]
...A full formulation of our method is provided in [21], as well as quantitative evaluation of reconstruction performance....
[...]