scispace - formally typeset
Search or ask a question
Proceedings ArticleDOI

KillingFusion: Non-rigid 3D Reconstruction without Correspondences

TL;DR: A novel regularizer is proposed that imposes local rigidity by requiring the deformation to be a smooth and approximately Killing vector field, i.e. generating nearly isometric motions and enforcing that the level set property of unity gradient magnitude is preserved over iterations.
Abstract: We introduce a geometry-driven approach for real-time 3D reconstruction of deforming surfaces from a single RGB-D stream without any templates or shape priors. To this end, we tackle the problem of non-rigid registration by level set evolution without explicit correspondence search. Given a pair of signed distance fields (SDFs) representing the shapes of interest, we estimate a dense deformation field that aligns them. It is defined as a displacement vector field of the same resolution as the SDFs and is determined iteratively via variational minimization. To ensure it generates plausible shapes, we propose a novel regularizer that imposes local rigidity by requiring the deformation to be a smooth and approximately Killing vector field, i.e. generating nearly isometric motions. Moreover, we enforce that the level set property of unity gradient magnitude is preserved over iterations. As a result, KillingFusion reliably reconstructs objects that are undergoing topological changes and fast inter-frame motion. In addition to incrementally building a model from scratch, our system can also deform complete surfaces. We demonstrate these capabilities on several public datasets and introduce our own sequences that permit both qualitative and quantitative comparison to related approaches.

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI
14 Jun 2020
TL;DR: In this paper, the authors proposed Implicit Feature Networks (IF-Nets), which deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data retaining the nice properties of recent learned implicit functions.
Abstract: While many works focus on 3D reconstruction from images, in this paper, we focus on 3D shape reconstruction and completion from a variety of 3D inputs, which are deficient in some respect: low and high resolution voxels, sparse and dense point clouds, complete or incomplete. Processing of such 3D inputs is an increasingly important problem as they are the output of 3D scanners, which are becoming more accessible, and are the intermediate output of 3D computer vision algorithms. Recently, learned implicit functions have shown great promise as they produce continuous reconstructions. However, we identified two limitations in reconstruction from 3D inputs: 1) details present in the input data are not retained, and 2) poor reconstruction of articulated humans. To solve this, we propose Implicit Feature Networks (IF-Nets), which deliver continuous outputs, can handle multiple topologies, and complete shapes for missing or sparse input data retaining the nice properties of recent learned implicit functions, but critically they can also retain detail when it is present in the input data, and can reconstruct articulated humans. Our work differs from prior work in two crucial aspects. First, instead of using a single vector to encode a 3D shape, we extract a learnable 3-dimensional multi-scale tensor of deep features, which is aligned with the original Euclidean space embedding the shape. Second, instead of classifying x-y-z point coordinates directly, we classify deep features extracted from the tensor at a continuous query point. We show that this forces our model to make decisions based on global and local shape structure, as opposed to point coordinates, which are arbitrary under Euclidean transformations. Experiments demonstrate that IF-Nets outperform prior work in 3D object reconstruction in ShapeNet, and obtain significantly more accurate 3D human reconstructions. Code and project website is available at https://virtualhumans.mpi-inf.mpg.de/ifnets/.

390 citations

Proceedings ArticleDOI
15 Jun 2019
TL;DR: Octopus, a learning-based model to infer the personalized 3D shape of people from a few frames of a monocular video in which the person is moving with a reconstruction accuracy of 4 to 5mm, while being orders of magnitude faster than previous methods.
Abstract: We present Octopus, a learning-based model to infer the personalized 3D shape of people from a few frames (1-8) of a monocular video in which the person is moving with a reconstruction accuracy of 4 to 5mm, while being orders of magnitude faster than previous methods. From semantic segmentation images, our Octopus model reconstructs a 3D shape, including the parameters of SMPL plus clothing and hair in 10 seconds or less. The model achieves fast and accurate predictions based on two key design choices. First, by predicting shape in a canonical T-pose space, the network learns to encode the images of the person into pose-invariant latent codes, where the information is fused. Second, based on the observation that feed-forward predictions are fast but do not always align with the input images, we predict using both, bottom-up and top-down streams (one per view) allowing information to flow in both directions. Learning relies only on synthetic 3D data. Once learned, Octopus can take a variable number of frames as input, and is able to reconstruct shapes even from a single image with an accuracy of 5mm. Results on 3 different datasets demonstrate the efficacy and accuracy of our approach.

289 citations


Cites background from "KillingFusion: Non-rigid 3D Reconst..."

  • ...Although template-free approaches [52, 37, 65] are flexible, they can only handle very careful motions....

    [...]

Proceedings ArticleDOI
13 Mar 2018
TL;DR: In this paper, a parametric body model is used to estimate 3D shape, texture and implanted animation skeleton from a single RGB camera video in which a person is moving, and a robust processing pipeline is presented to infer 3D model shapes including clothed people with 4.5mm reconstruction accuracy.
Abstract: This paper describes a method to obtain accurate 3D body models and texture of arbitrary people from a single, monocular video in which a person is moving. Based on a parametric body model, we present a robust processing pipeline to infer 3D model shapes including clothed people with 4.5mm reconstruction accuracy. At the core of our approach is the transformation of dynamic body pose into a canonical frame of reference. Our main contribution is a method to transform the silhouette cones corresponding to dynamic human silhouettes to obtain a visual hull in a common reference frame. This enables efficient estimation of a consensus 3D shape, texture and implanted animation skeleton based on a large number of frames. Results on 4 different datasets demonstrate the effectiveness of our approach to produce accurate 3D models. Requiring only an RGB camera, our method enables everyone to create their own fully animatable digital double, e.g., for social VR applications or virtual try-on for online fashion shopping.

280 citations

Journal ArticleDOI
TL;DR: This report explains, compare, and critically analyze the common underlying algorithmic concepts that enabled recent developments in RGB‐D scene reconstruction in detail, and shows how algorithms are designed to best exploit the benefits ofRGB‐D data while suppressing their often non‐trivial data distortions.
Abstract: The advent of affordable consumer grade RGB‐D cameras has brought about a profound advancement of visual scene reconstruction methods. Both computer graphics and computer vision researchers spend significant effort to develop entirely new algorithms to capture comprehensive shape models of static and dynamic scenes with RGB‐D cameras. This led to significant advances of the state of the art along several dimensions. Some methods achieve very high reconstruction detail, despite limited sensor resolution. Others even achieve real‐time performance, yet possibly at lower quality. New concepts were developed to capture scenes at larger spatial and temporal extent. Other recent algorithms flank shape reconstruction with concurrent material and lighting estimation, even in general scenes and unconstrained conditions. In this state‐of‐the‐art report, we analyze these recent developments in RGB‐D scene reconstruction in detail and review essential related work. We explain, compare, and critically analyze the common underlying algorithmic concepts that enabled these recent advancements. Furthermore, we show how algorithms are designed to best exploit the benefits of RGB‐D data while suppressing their often non‐trivial data distortions. In addition, this report identifies and discusses important open research questions and suggests relevant directions for future work.

248 citations

Posted Content
TL;DR: This paper describes a method to obtain accurate 3D body models and texture of arbitrary people from a single, monocular video in which a person is moving and presents a robust processing pipeline to infer 3D model shapes including clothed people with 4.5mm reconstruction accuracy.
Abstract: This paper describes how to obtain accurate 3D body models and texture of arbitrary people from a single, monocular video in which a person is moving. Based on a parametric body model, we present a robust processing pipeline achieving 3D model fits with 5mm accuracy also for clothed people. Our main contribution is a method to nonrigidly deform the silhouette cones corresponding to the dynamic human silhouettes, resulting in a visual hull in a common reference frame that enables surface reconstruction. This enables efficient estimation of a consensus 3D shape, texture and implanted animation skeleton based on a large number of frames. We present evaluation results for a number of test subjects and analyze overall performance. Requiring only a smartphone or webcam, our method enables everyone to create their own fully animatable digital double, e.g., for social VR applications or virtual try-on for online fashion shopping.

240 citations


Cites background from "KillingFusion: Non-rigid 3D Reconst..."

  • ...While general, such template-free approaches [45, 31, 60] are limited to slow and careful motions....

    [...]

References
More filters
Proceedings ArticleDOI
01 Aug 1987
TL;DR: In this paper, a divide-and-conquer approach is used to generate inter-slice connectivity, and then a case table is created to define triangle topology using linear interpolation.
Abstract: We present a new algorithm, called marching cubes, that creates triangle models of constant density surfaces from 3D medical data. Using a divide-and-conquer approach to generate inter-slice connectivity, we create a case table that defines triangle topology. The algorithm processes the 3D medical data in scan-line order and calculates triangle vertices using linear interpolation. We find the gradient of the original data, normalize it, and use it as a basis for shading the models. The detail in images produced from the generated surface models is the result of maintaining the inter-slice connectivity, surface data, and gradient information present in the original 3D data. Results from computed tomography (CT), magnetic resonance (MR), and single-photon emission computed tomography (SPECT) illustrate the quality and functionality of marching cubes. We also discuss improvements that decrease processing time and add solid modeling capabilities.

13,231 citations

Journal ArticleDOI
TL;DR: The PSC algorithm as mentioned in this paper approximates the Hamilton-Jacobi equations with parabolic right-hand-sides by using techniques from the hyperbolic conservation laws, which can be used also for more general surface motion problems.

13,020 citations

Book
31 Oct 2002
TL;DR: A student or researcher working in mathematics, computer graphics, science, or engineering interested in any dynamic moving front, which might change its topology or develop singularities, will find this book interesting and useful.
Abstract: This book is an introduction to level set methods and dynamic implicit surfaces. These are powerful techniques for analyzing and computing moving fronts in a variety of different settings. While it gives many examples of the utility of the methods to a diverse set of applications, it also gives complete numerical analysis and recipes, which will enable users to quickly apply the techniques to real problems. The book begins with a description of implicit surfaces and their basic properties, then devises the level set geometry and calculus toolbox, including the construction of signed distance functions. Part II adds dynamics to this static calculus. Topics include the level set equation itself, Hamilton-Jacobi equations, motion of a surface normal to itself, re-initialization to a signed distance function, extrapolation in the normal direction, the particle level set method and the motion of co-dimension two (and higher) objects. Part III is concerned with topics taken from the fields of Image Processing and Computer Vision. These include the restoration of images degraded by noise and blur, image segmentation with active contours (snakes), and reconstruction of surfaces from unorganized data points. Part IV is dedicated to Computational Physics. It begins with one phase compressible fluid dynamics, then two-phase compressible flow involving possibly different equations of state, detonation and deflagration waves, and solid/fluid structure interaction. Next it discusses incompressible fluid dynamics, including a computer graphics simulation of smoke, free surface flows, including a computer graphics simulation of water, and fully two-phase incompressible flow. Additional related topics include incompressible flames with applications to computer graphics and coupling a compressible and incompressible fluid. Finally, heat flow and Stefan problems are discussed. A student or researcher working in mathematics, computer graphics, science, or engineering interested in any dynamic moving front, which might change its topology or develop singularities, will find this book interesting and useful.

5,526 citations


"KillingFusion: Non-rigid 3D Reconst..." refers background in this paper

  • ...On the contrary, level sets inherently handle such cases [35]....

    [...]

  • ...One of its characteristic geometric properties is that its gradient magnitude equals unity everywhere where it is differentiable [35]....

    [...]

  • ...Level set property To ensure geometric correctness during the evolution of φn, the property that the gradient magnitude in the non-truncated regions of an SDF is unity has to be conserved [35]:...

    [...]

  • ...Furthermore, we ensure that the SDF evolution is geometrically correct by conserving the level set property of unity gradient magnitude [26, 35]....

    [...]

Proceedings ArticleDOI
26 Oct 2011
TL;DR: A system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware, which fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real- time.
Abstract: We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (ICP) algorithm, which uses all of the observed depth data available. We demonstrate the advantages of tracking against the growing full surface model compared with frame-to-frame tracking, obtaining tracking and mapping results in constant time within room sized scenes with limited drift and high accuracy. We also show both qualitative and quantitative results relating to various aspects of our tracking and mapping system. Modelling of natural scenes, in real-time with only commodity sensor and GPU hardware, promises an exciting step forward in augmented reality (AR), in particular, it allows dense surfaces to be reconstructed in real-time, with a level of detail and robustness beyond any solution yet presented using passive computer vision.

4,184 citations


"KillingFusion: Non-rigid 3D Reconst..." refers background in this paper

  • ...While many excellent solutions for the reconstruction of static scenes exist [5, 12, 23, 31, 33, 34, 43, 54], the more common real-life scenario - where objects move and interact non-rigidly - is still posing a challenge....

    [...]

Journal ArticleDOI
TL;DR: A survey of recent publications concerning medical image registration techniques is presented, according to a model based on nine salient criteria, the main dichotomy of which is extrinsic versus intrinsic methods.

3,426 citations


"KillingFusion: Non-rigid 3D Reconst..." refers methods in this paper

  • ...In medical imaging, where high fidelity shape priors for various organs are available [13, 16], level set methods have been applied to segmentation [2, 17] and registration [25, 30], usually guided by analytically defined evolution equations [36]....

    [...]