
Showing papers on "3D reconstruction published in 2017"


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper addresses the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Driven by this unorthodox output form, the authors design an architecture, loss function and learning paradigm that are novel and effective, capable of predicting multiple plausible 3D point clouds from an input image.
Abstract: Generation of 3D data by deep neural network has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations, and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straightforward form of output – point cloud coordinates. Along with this problem arises a unique and interesting issue: the groundtruth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in groundtruth, we design an architecture, loss function and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments not only can our system outperform state-of-the-art methods on single image based 3D reconstruction benchmarks, but it also shows strong performance for 3D shape completion and promising ability in making multiple plausible predictions.
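
Because the output is an unordered point set, the training loss must be permutation-invariant. Below is a minimal NumPy sketch of the symmetric Chamfer distance, one of the set distances used in this line of work (names and sizes are illustrative):

```python
# Hedged sketch: symmetric Chamfer distance between two point sets, one common
# loss for comparing a predicted point cloud with ground truth.
import numpy as np

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred: (N, 3) and gt: (M, 3) arrays of 3D points."""
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((pred[:, None, :] - gt[None, :, :]) ** 2, axis=-1)
    # Average nearest-neighbour distance in both directions.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Example: two random clouds of 1024 points each.
rng = np.random.default_rng(0)
print(chamfer_distance(rng.normal(size=(1024, 3)), rng.normal(size=(1024, 3))))
```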

1,419 citations


Journal ArticleDOI
TL;DR: A benchmark for image-based 3D reconstruction with high-resolution video sequences provided as input, supporting the development of novel pipelines that take advantage of video input to increase reconstruction fidelity.
Abstract: We present a benchmark for image-based 3D reconstruction. The benchmark sequences were acquired outside the lab, in realistic conditions. Ground-truth data was captured using an industrial laser scanner. The benchmark includes both outdoor scenes and indoor environments. High-resolution video sequences are provided as input, supporting the development of novel pipelines that take advantage of video input to increase reconstruction fidelity. We report the performance of many image-based 3D reconstruction pipelines on the new benchmark. The results point to exciting challenges and opportunities for future work.

553 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: This work proposes a general framework, called hierarchical surface prediction (HSP), which facilitates prediction of high resolution voxel grids, and shows that high resolution predictions are more accurate than low resolution predictions.
Abstract: Recently, Convolutional Neural Networks have shown promising results for 3D geometry prediction. They can make predictions from very little input data such as a single color image. A major limitation of such approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface of the objects well. We propose a general framework, called hierarchical surface prediction (HSP), which facilitates prediction of high resolution voxel grids. The main insight is that it is sufficient to predict high resolution voxels around the predicted surfaces. The exterior and interior of the objects can be represented with coarse resolution voxels. Our approach is not dependent on a specific input type. We show results for geometry prediction from color images, depth images and shape completion from partial voxel grids. Our analysis shows that our high resolution predictions are more accurate than low resolution predictions.
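The core insight lends itself to a compact illustration: only coarse cells whose occupancy prediction is uncertain need to be subdivided, since confidently free or occupied cells cannot contain the surface. A hedged NumPy sketch, with illustrative thresholds:

```python
# Hedged sketch of the idea behind hierarchical surface prediction: refine only
# voxels likely to contain the surface; keep the rest at coarse resolution.
import numpy as np

def cells_to_refine(occupancy: np.ndarray, lo: float = 0.1, hi: float = 0.9):
    """occupancy: (D, D, D) predicted occupancy probabilities.
    Returns a boolean mask of uncertain coarse cells worth subdividing."""
    return (occupancy > lo) & (occupancy < hi)

# Toy example: a sphere-like probability field on a 16^3 grid.
d = 16
z, y, x = np.mgrid[:d, :d, :d]
r = np.sqrt((x - d / 2) ** 2 + (y - d / 2) ** 2 + (z - d / 2) ** 2)
occ = 1.0 / (1.0 + np.exp(r - d / 4))   # soft "inside the sphere" score
print(cells_to_refine(occ).sum(), "of", d ** 3, "cells selected for refinement")
```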

335 citations


Journal ArticleDOI
Eric Scott Penner, Li Zhang
TL;DR: A novel algorithm for view synthesis that utilizes a soft 3D reconstruction to improve quality, continuity and robustness and it is shown that this representation is beneficial throughout the view synthesis pipeline.
Abstract: We present a novel algorithm for view synthesis that utilizes a soft 3D reconstruction to improve quality, continuity and robustness. Our main contribution is the formulation of a soft 3D representation that preserves depth uncertainty through each stage of 3D reconstruction and rendering. We show that this representation is beneficial throughout the view synthesis pipeline. During view synthesis, it provides a soft model of scene geometry that provides continuity across synthesized views and robustness to depth uncertainty. During 3D reconstruction, the same robust estimates of scene visibility can be applied iteratively to improve depth estimation around object edges. Our algorithm is based entirely on O(1) filters, making it conducive to acceleration, and it works with structured or unstructured sets of input views. We compare with recent classical and learning-based algorithms on plenoptic lightfields, wide baseline captures, and lightfield videos produced from camera arrays.
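
A central ingredient is a soft visibility derived from per-pixel depth probabilities. A minimal NumPy sketch under illustrative assumptions (a discrete depth-plane volume per view; not the paper's exact formulation):

```python
# Hedged sketch: soft visibility from a depth-plane occupancy volume. A plane
# is visible along a ray with the probability that no nearer plane is occupied.
import numpy as np

def soft_visibility(occ: np.ndarray) -> np.ndarray:
    """occ: (D, H, W) per-pixel probabilities that each of D depth planes
    (ordered near to far) is occupied. Returns per-plane visibility."""
    clear = np.cumprod(1.0 - occ, axis=0)   # prob. that all planes <= d are empty
    return np.concatenate([np.ones_like(occ[:1]), clear[:-1]], axis=0)
```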

319 citations


Proceedings Article
01 Jan 2017
TL;DR: The authors leverage the underlying 3D geometry of the problem through feature projection and unprojection along viewing rays to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from much fewer images (even a single image) than required by classical approaches.
Abstract: We present a learnt system for multi-view stereopsis. In contrast to recent learning based methods for 3D reconstruction, we leverage the underlying 3D geometry of the problem through feature projection and unprojection along viewing rays. By formulating these operations in a differentiable manner, we are able to learn the system end-to-end for the task of metric 3D reconstruction. End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from much fewer images (even a single image) than required by classical approaches as well as completion of unseen surfaces. We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate the benefits over classical approaches and recent learning based methods.
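
The unprojection operation can be sketched compactly: project each voxel center into the image and gather features there. A hedged NumPy version (nearest-neighbour sampling for brevity; the paper formulates such operations differentiably, e.g. with bilinear sampling):

```python
# Hedged sketch of "unprojection": lifting 2D image features onto a 3D grid by
# projecting voxel centers with a known camera and sampling the feature map.
import numpy as np

def unproject_features(feat: np.ndarray, P: np.ndarray, voxels: np.ndarray):
    """feat: (H, W, C) feature map; P: (3, 4) camera projection matrix;
    voxels: (N, 3) voxel centers in world coordinates. Returns (N, C)."""
    homo = np.hstack([voxels, np.ones((len(voxels), 1))])   # (N, 4) homogeneous
    uvw = homo @ P.T                                        # (N, 3) image coords
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w, c = feat.shape
    out = np.zeros((len(voxels), c))
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (uvw[:, 2] > 0)
    out[valid] = feat[v[valid], u[valid]]                   # gather features
    return out
```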

305 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: Stereo Direct Sparse Odometry (Stereo DSO) as discussed by the authors integrates constraints from static stereo into the bundle adjustment pipeline of temporal multi-view stereo to improve tracking accuracy and robustness.
Abstract: We propose Stereo Direct Sparse Odometry (Stereo DSO) as a novel method for highly accurate real-time visual odometry estimation of large-scale environments from stereo cameras. It jointly optimizes for all the model parameters within the active window, including the intrinsic/extrinsic camera parameters of all keyframes and the depth values of all selected pixels. In particular, we propose a novel approach to integrate constraints from static stereo into the bundle adjustment pipeline of temporal multi-view stereo. Real-time optimization is realized by sampling pixels uniformly from image regions with sufficient intensity gradient. Fixed-baseline stereo resolves scale drift. It also reduces the sensitivities to large optical flow and to rolling shutter effect which are known shortcomings of direct image alignment methods. Quantitative evaluation demonstrates that the proposed Stereo DSO outperforms existing state-of-the-art visual odometry methods both in terms of tracking accuracy and robustness. Moreover, our method delivers a more precise metric 3D reconstruction than previous dense/semi-dense direct approaches while providing a higher reconstruction density than feature-based methods.
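
Direct methods of this kind minimize photometric rather than reprojection error. A hedged sketch of the basic residual for one pixel (affine brightness terms, interpolation, and bounds checks omitted; not the paper's exact formulation):

```python
# Hedged sketch of the photometric residual minimized by direct methods like
# DSO: intensity difference between a reference pixel and its reprojection.
import numpy as np

def photometric_residual(I_ref, I_tgt, u, v, depth, K, R, t):
    """Reproject reference pixel (u, v) at the given depth into the target
    image (pose R, t) and return the intensity difference."""
    K_inv = np.linalg.inv(K)
    p_ref = depth * (K_inv @ np.array([u, v, 1.0]))   # 3D point in ref frame
    p_tgt = R @ p_ref + t                             # transform to target frame
    uvw = K @ p_tgt
    u2, v2 = uvw[0] / uvw[2], uvw[1] / uvw[2]
    # Nearest-neighbour lookup for brevity; real systems interpolate.
    return float(I_ref[int(v), int(u)] - I_tgt[int(round(v2)), int(round(u2))])
```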

260 citations


Journal ArticleDOI
TL;DR: The essential algorithmic aspects of the structure from motion and image dense matching problems are discussed from the implementation and the user’s viewpoints.
Abstract: The publication familiarizes the reader with MicMac, a free, open-source photogrammetric software for 3D reconstruction. A brief history of the tool, its organisation, and its unique features vis-a-vis other software tools are highlighted. The essential algorithmic aspects of the structure from motion and image dense matching problems are discussed from the implementation and the user’s viewpoints.

233 citations


Proceedings Article
01 Jan 2017
TL;DR: This work proposes MarrNet, an end-to-end trainable model that sequentially estimates 2.5D sketches and 3D object shape, and derives differentiable projective functions from 3D shape to 2.5D sketches, making the framework trainable on real images without real-image annotations.
Abstract: 3D object reconstruction from a single image is a highly under-determined problem, requiring strong prior knowledge of plausible 3D shapes. This introduces challenges for learning-based approaches, as 3D object annotations in real images are scarce. Previous work chose to train on synthetic data with ground truth 3D information, but suffered from the domain adaptation issue when tested on real data. In this work, we propose an end-to-end trainable framework, sequentially estimating 2.5D sketches and 3D object shapes. Our disentangled, two-step formulation has three advantages. First, compared to full 3D shape, 2.5D sketches are much easier to recover from a 2D image and to transfer from synthetic to real data. Second, for 3D reconstruction from the 2.5D sketches, we can easily transfer the learned model on synthetic data to real images, as rendered 2.5D sketches are invariant to object appearance variations in real images, including lighting, texture, etc. This further relieves the domain adaptation problem. Third, we derive differentiable projective functions from 3D shape to 2.5D sketches, making the framework end-to-end trainable on real images, requiring no real-image annotations. Our framework achieves state-of-the-art performance on 3D shape reconstruction.

219 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: A comprehensive overview of the stereoscopic Intel RealSense RGBD imaging systems is presented, providing information about the systems' optical characteristics, their correlation algorithms, and how these properties can affect different applications, including 3D reconstruction and gesture recognition.
Abstract: We present a comprehensive overview of the stereoscopic Intel RealSense RGBD imaging systems. We discuss these systems' mode-of-operation, functional behavior and include models of their expected performance, shortcomings, and limitations. We provide information about the systems' optical characteristics, their correlation algorithms, and how these properties can affect different applications, including 3D reconstruction and gesture recognition. Our discussion covers the Intel RealSense R200 and RS400.

201 citations


Proceedings ArticleDOI
04 Apr 2017
TL;DR: In this article, a learning-based approach to depth fusion is proposed, which is able to reconstruct (partially) occluded surfaces and fill in gaps in the reconstruction by learning the structure of real world 3D objects and scenes.
Abstract: In this paper, we present a learning-based approach to depth fusion, i.e., dense 3D reconstruction from multiple depth images. The most common approach to depth fusion is based on averaging truncated signed distance functions, which was originally proposed by Curless and Levoy in 1996. While this method is simple and provides great results, it is not able to reconstruct (partially) occluded surfaces and requires a large number of frames to filter out sensor noise and outliers. Motivated by the availability of large 3D model repositories and recent advances in deep learning, we present a novel 3D CNN architecture that learns to predict an implicit surface representation from the input depth maps. Our learning-based method significantly outperforms the traditional volumetric fusion approach in terms of noise reduction and outlier suppression. By learning the structure of real world 3D objects and scenes, our approach is further able to reconstruct occluded regions and to fill in gaps in the reconstruction. We demonstrate that our learning-based approach outperforms both vanilla TSDF fusion as well as TV-L1 fusion on the task of volumetric fusion. Further, we demonstrate state-of-the-art 3D shape completion results.
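
For reference, the classical baseline the paper compares against can be written in a few lines: each voxel keeps a weighted running average of truncated signed distances. A hedged NumPy sketch (truncation and weighting choices are illustrative):

```python
# Hedged sketch of Curless & Levoy style TSDF fusion, the vanilla baseline the
# learned 3D CNN is compared against.
import numpy as np

def integrate(tsdf, weight, sdf_obs, trunc=0.05, max_weight=100.0):
    """tsdf, weight: (D, D, D) volume state, updated in place.
    sdf_obs: signed distance of each voxel to the surface in the current frame."""
    d = np.clip(sdf_obs, -trunc, trunc) / trunc        # truncate and normalize
    w_new = (sdf_obs > -trunc).astype(np.float64)      # skip voxels far behind surface
    w_tot = weight + w_new
    upd = w_tot > 0
    tsdf[upd] = (weight[upd] * tsdf[upd] + w_new[upd] * d[upd]) / w_tot[upd]
    weight[:] = np.minimum(w_tot, max_weight)          # cap to stay adaptive
```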

201 citations


Posted Content
TL;DR: End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from much fewer images than required by classical approaches as well as completion of unseen surfaces.
Abstract: We present a learnt system for multi-view stereopsis. In contrast to recent learning based methods for 3D reconstruction, we leverage the underlying 3D geometry of the problem through feature projection and unprojection along viewing rays. By formulating these operations in a differentiable manner, we are able to learn the system end-to-end for the task of metric 3D reconstruction. End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from much fewer images (even a single image) than required by classical approaches as well as completion of unseen surfaces. We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate the benefits over classical approaches as well as recent learning based methods.

Journal ArticleDOI
TL;DR: This work investigates the problem of estimating the 3D shape of an object defined by a set of 3D landmarks, given their 2D correspondences in a single image and proposes a convex approach to addressing this challenge and develops an efficient algorithm to solve the proposed convex program.
Abstract: We investigate the problem of estimating the 3D shape of an object defined by a set of 3D landmarks, given their 2D correspondences in a single image. A successful approach to alleviating the reconstruction ambiguity is the 3D deformable shape model and a sparse representation is often used to capture complex shape variability. But the model inference is still challenging due to the nonconvexity in the joint optimization of shape and viewpoint. In contrast to prior work that relies on an alternating scheme whose solution depends on initialization, we propose a convex approach to addressing this challenge and develop an efficient algorithm to solve the proposed convex program. We further propose a robust model to handle gross errors in the 2D correspondences. We demonstrate the exact recovery property of the proposed method, the advantage compared to several nonconvex baselines and the applicability to recover 3D human poses and car models from single images.
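
Schematically, this line of work builds on a linear shape basis under a weak-perspective camera; the sketch below uses illustrative notation, not the paper's exact formulation:

```latex
% Hedged sketch: 2D landmarks W are explained by a linear combination of shape
% bases B_i under a weak-perspective camera \bar{R} (two rows of a scaled rotation):
\[
W \;=\; \bar{R} \sum_{i=1}^{k} c_i B_i + t\mathbf{1}^{\top},
\qquad \bar{R}\bar{R}^{\top} = s^2 I_2 .
\]
% Jointly estimating (c, \bar{R}) is nonconvex. Substituting M_i = c_i \bar{R}
% makes the residual linear in the new unknowns, and the rotation structure can
% then be encouraged with a convex (spectral-norm) constraint on each M_i.
```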

Proceedings ArticleDOI
20 Jul 2017
TL;DR: In this article, a deep, encoder-decoder network is proposed to reconstruct 3D shapes from 2D sketches in the form of line drawings, where the encoder converts the sketch into a compact representation encoding shape information and the decoder converts this representation into depth and normal maps capturing the underlying surface from several output viewpoints.
Abstract: We propose a method for reconstructing 3D shapes from 2D sketches in the form of line drawings. Our method takes as input a single sketch, or multiple sketches, and outputs a dense point cloud representing a 3D reconstruction of the input sketch(es). The point cloud is then converted into a polygon mesh. At the heart of our method lies a deep, encoder-decoder network. The encoder converts the sketch into a compact representation encoding shape information. The decoder converts this representation into depth and normal maps capturing the underlying surface from several output viewpoints. The multi-view maps are then consolidated into a 3D point cloud by solving an optimization problem that fuses depth and normals across all viewpoints. Based on our experiments, compared to other methods, such as volumetric networks, our architecture offers several advantages, including more faithful reconstruction, higher output surface resolution, better preservation of topology and shape structure.
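
A schematic PyTorch version of such an encoder-decoder is sketched below; the layer sizes, the number of viewpoints V, and the output packing are illustrative assumptions, not the paper's architecture:

```python
# Hedged sketch: a sketch image is encoded to a compact representation, and a
# decoder emits a depth map plus a normal map for each of V output viewpoints.
import torch
import torch.nn as nn

V = 12  # number of output viewpoints (illustrative)

class SketchTo25D(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # 256 -> 128
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 128 -> 64
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            # 4 channels per view: 1 depth + 3 normal components.
            nn.ConvTranspose2d(32, V * 4, 4, stride=2, padding=1),
        )

    def forward(self, sketch):                   # sketch: (B, 1, 256, 256)
        maps = self.decoder(self.encoder(sketch))
        depth = maps[:, :V]                      # (B, V, 256, 256)
        normals = maps[:, V:].reshape(-1, V, 3, 256, 256)
        return depth, normals
```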

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper proposes polarimetric multi-view stereo, which combines per-pixel photometric information from polarization with epipolar constraints from multiple views for 3D reconstruction, and proves there are exactly two types of ambiguities on estimating surface azimuth angles from polarization.
Abstract: Multi-view stereo relies on feature correspondences for 3D reconstruction, and thus is fundamentally flawed in dealing with featureless scenes. In this paper, we propose polarimetric multi-view stereo, which combines per-pixel photometric information from polarization with epipolar constraints from multiple views for 3D reconstruction. Polarization reveals surface normal information, and is thus helpful to propagate depth to featureless regions. Polarimetric multi-view stereo is completely passive and can be applied outdoors in uncontrolled illumination, since the data capture can be done simply with either a polarizer or a polarization camera. Unlike previous work on shape-from-polarization which is limited to either diffuse polarization or specular polarization only, we propose a novel polarization imaging model that can handle real-world objects with mixed polarization. We prove there are exactly two types of ambiguities on estimating surface azimuth angles from polarization, and we resolve them with graph optimization and iso-depth contour tracing. This step significantly improves the initial depth map estimate, which are later fused together for complete 3D reconstruction. Extensive experimental results demonstrate high-quality 3D reconstruction and better performance than state-of-the-art multi-view stereo methods, especially on featureless 3D objects, such as ceramic tiles, office room with white walls, and highly reflective cars in the outdoors.
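
The per-pixel polarization cues can be computed from a few polarizer angles via Stokes parameters. A hedged NumPy sketch (the paper's mixed-polarization model and ambiguity resolution go well beyond this toy version):

```python
# Hedged sketch: recovering the polarization phase (related to surface azimuth
# up to the ambiguities the paper resolves) and the degree of polarization from
# intensities captured through a linear polarizer at 0, 45, and 90 degrees.
import numpy as np

def polarization_cues(i0, i45, i90):
    """i0, i45, i90: images taken at polarizer angles 0/45/90 degrees.
    Returns (phase, degree_of_polarization), per pixel."""
    s0 = i0 + i90                      # total intensity
    s1 = i0 - i90                      # Stokes parameter S1
    s2 = 2.0 * i45 - i0 - i90          # Stokes parameter S2
    phase = 0.5 * np.arctan2(s2, s1)   # polarization angle, defined mod pi
    dop = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-6)
    return phase, dop
```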

Journal ArticleDOI
TL;DR: An algorithm that enables casual 3D photography, including a novel parallax-tolerant stitching algorithm that warps depth maps into a central panorama and stitches two color-and-depth panoramas for the front and back scene surfaces.
Abstract: We present an algorithm that enables casual 3D photography. Given a set of input photos captured with a hand-held cell phone or DSLR camera, our algorithm reconstructs a 3D photo, a central panoramic, textured, normal mapped, multi-layered geometric mesh representation. 3D photos can be stored compactly and are optimized for being rendered from viewpoints that are near the capture viewpoints. They can be rendered using a standard rasterization pipeline to produce perspective views with motion parallax. When viewed in VR, 3D photos provide geometrically consistent views for both eyes. Our geometric representation also allows interacting with the scene using 3D geometry-aware effects, such as adding new objects to the scene and artistic lighting effects. Our 3D photo reconstruction algorithm starts with a standard structure from motion and multi-view stereo reconstruction of the scene. The dense stereo reconstruction is made robust to the imperfect capture conditions using a novel near envelope cost volume prior that discards erroneous near depth hypotheses. We propose a novel parallax-tolerant stitching algorithm that warps the depth maps into the central panorama and stitches two color-and-depth panoramas for the front and back scene surfaces. The two panoramas are fused into a single non-redundant, well-connected geometric mesh. We provide videos demonstrating users interactively viewing and manipulating our 3D photos.

Posted Content
TL;DR: MarrNet as discussed by the authors proposes an end-to-end trainable model that sequentially estimates 2.5D sketches and 3D object shapes from a single image, which is trained on synthetic data with ground truth 3D information.
Abstract: 3D object reconstruction from a single image is a highly under-determined problem, requiring strong prior knowledge of plausible 3D shapes. This introduces challenges for learning-based approaches, as 3D object annotations are scarce in real images. Previous work chose to train on synthetic data with ground truth 3D information, but suffered from domain adaptation issues when tested on real data. In this work, we propose MarrNet, an end-to-end trainable model that sequentially estimates 2.5D sketches and 3D object shape. Our disentangled, two-step formulation has three advantages. First, compared to full 3D shape, 2.5D sketches are much easier to recover from a 2D image; models that recover 2.5D sketches are also more likely to transfer from synthetic to real data. Second, for 3D reconstruction from 2.5D sketches, systems can learn purely from synthetic data. This is because we can easily render realistic 2.5D sketches without modeling object appearance variations in real images, including lighting, texture, etc. This further relieves the domain adaptation problem. Third, we derive differentiable projective functions from 3D shape to 2.5D sketches; the framework is therefore end-to-end trainable on real images, requiring no human annotations. Our model achieves state-of-the-art performance on 3D shape reconstruction.

Proceedings ArticleDOI
31 May 2017
TL;DR: In this paper, the authors use foreground masks as weak supervision through a ray trace pooling layer that enables perspective projection and backpropagation, and constrain the 3D reconstruction to the manifold of unlabeled realistic 3D shapes that match mask observations.
Abstract: Supervised 3D reconstruction has witnessed significant progress through the use of deep neural networks. However, this increase in performance requires large scale annotations of 2D/3D data. In this paper, we explore inexpensive 2D supervision as an alternative for expensive 3D CAD annotation. Specifically, we use foreground masks as weak supervision through a ray trace pooling layer that enables perspective projection and backpropagation. Additionally, since the 3D reconstruction from masks is an ill-posed problem, we propose to constrain the 3D reconstruction to the manifold of unlabeled realistic 3D shapes that match mask observations. We demonstrate that learning a log-barrier solution to this constrained optimization problem resembles the GAN objective, enabling the use of existing tools for training GANs. We evaluate and analyze the manifold constrained reconstruction on various datasets for single and multi-view reconstruction of both synthetic and real images.
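
The mask supervision can be sketched as projecting a predicted occupancy volume to a soft silhouette and scoring it against the observed mask. A hedged toy version using an orthographic projection (the paper's ray trace pooling handles true perspective rays):

```python
# Hedged sketch of silhouette supervision: soft union of occupancies along the
# viewing axis, compared to the foreground mask with binary cross-entropy.
import numpy as np

def silhouette_loss(occupancy: np.ndarray, mask: np.ndarray) -> float:
    """occupancy: (D, H, W) predicted probabilities; mask: (H, W) in {0, 1}."""
    sil = 1.0 - np.prod(1.0 - occupancy, axis=0)  # soft union along the ray
    eps = 1e-6
    bce = -(mask * np.log(sil + eps) + (1 - mask) * np.log(1 - sil + eps))
    return float(bce.mean())
```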

Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this article, a joint surface reconstruction approach based on Shape-from-Shading (SfS) techniques and spatially-varying spherical harmonics (SVSH) from subvolumes of the reconstructed scene is proposed.
Abstract: We introduce a novel method to obtain high-quality 3D reconstructions from consumer RGB-D sensors. Our core idea is to simultaneously optimize for geometry encoded in a signed distance field (SDF), textures from automatically-selected keyframes, and their camera poses along with material and scene lighting. To this end, we propose a joint surface reconstruction approach that is based on Shape-from-Shading (SfS) techniques and utilizes the estimation of spatially-varying spherical harmonics (SVSH) from subvolumes of the reconstructed scene. Through extensive examples and evaluations, we demonstrate that our method dramatically increases the level of detail in the reconstructed scene geometry and contributes highly to consistent surface texture recovery.

Posted Content
TL;DR: The idea is that steps like camera tracking, scene representation, and integration of new data can easily be replaced and adapted to the user's needs, providing a fast, flexible 3D reconstruction pipeline called InfiniTAM.
Abstract: Volumetric models have become a popular representation for 3D scenes in recent years. One breakthrough leading to their popularity was KinectFusion, which focuses on 3D reconstruction using RGB-D sensors. However, monocular SLAM has since also been tackled with very similar approaches. Representing the reconstruction volumetrically as a TSDF leads to most of the simplicity and efficiency that can be achieved with GPU implementations of these systems. However, this representation is memory-intensive and limits applicability to small-scale reconstructions. Several avenues have been explored to overcome this. With the aim of summarizing them and providing for a fast, flexible 3D reconstruction pipeline, we propose a new, unifying framework called InfiniTAM. The idea is that steps like camera tracking, scene representation and integration of new data can easily be replaced and adapted to the user's needs. This report describes the technical implementation details of InfiniTAM v3, the third version of our InfiniTAM system. We have added various new features, as well as making numerous enhancements to the low-level code that significantly improve our camera tracking performance. The new features that we expect to be of most interest are (i) a robust camera tracking module; (ii) an implementation of Glocker et al.'s keyframe-based random ferns camera relocaliser; (iii) a novel approach to globally-consistent TSDF-based reconstruction, based on dividing the scene into rigid submaps and optimising the relative poses between them; and (iv) an implementation of Keller et al.'s surfel-based reconstruction approach.
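
The memory limitation mentioned above is what voxel block hashing addresses: allocate small voxel blocks only near observed surfaces and index them with a spatial hash. A hedged Python sketch of the idea (not InfiniTAM's actual data structures):

```python
# Hedged sketch of voxel block hashing for a sparse TSDF volume.
import numpy as np

BLOCK = 8  # 8x8x8 voxels per block, a typical choice

def block_hash(bx, by, bz, n_buckets=1 << 20):
    """Prime-multiplication spatial hash used by GPU implementations to map
    integer block coordinates to buckets; Python's dict below stands in for it."""
    return ((bx * 73856093) ^ (by * 19349669) ^ (bz * 83492791)) % n_buckets

class SparseTSDF:
    def __init__(self):
        self.blocks = {}                    # block coords -> dense 8^3 sub-volume

    def voxel_block(self, x, y, z):
        key = (x // BLOCK, y // BLOCK, z // BLOCK)
        if key not in self.blocks:          # allocate lazily, near surfaces only
            self.blocks[key] = np.ones((BLOCK, BLOCK, BLOCK), dtype=np.float32)
        return self.blocks[key]
```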

Journal ArticleDOI
TL;DR: A 3D cascade regression approach in which facial landmarks remain invariant across pose over a range of approximately 60 degrees is developed; the results strongly support the validity of real-time 3D registration and reconstruction from 2D video.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: An algorithm for computing a 3D model from several satellite images of the same site; with a large number of input images, the resulting 3D models can be as accurate as those obtained from a single same-date stereo pair.
Abstract: We propose an algorithm for computing a 3D model from several satellite images of the same site. The method works even if the images were taken at different dates with important lighting and vegetation differences. We show that with a large number of input images the resulting 3D models can be as accurate as those obtained from a single same-date stereo pair. To deal with seasonal vegetation changes, we propose a strategy that accounts for the multi-modal nature of 3D models computed from multi-date images. Our method uses a local affine camera approximation and thus focuses on the 3D reconstruction of small areas. This is a common setup in urgent cartography for emergency management, for which abundant multi-date imagery can be immediately available to build a reference 3D model. A preliminary implementation of this method was used to win the IARPA Multi-View Stereo 3D Mapping Challenge 2016. Experiments on the challenge dataset are used to substantiate our claims.

Journal ArticleDOI
TL;DR: This work presents the mathematical implementation of a tomographic algorithm, termed GENeralized Fourier Iterative REconstruction (GENFIRE), for high-resolution 3D reconstruction from a limited number of 2D projections and demonstrates that it can produce superior results relative to several other popular tomographic reconstruction techniques.
Abstract: Tomography has made a radical impact on diverse fields ranging from the study of 3D atomic arrangements in matter to the study of human health in medicine. Despite its very diverse applications, the core of tomography remains the same, that is, a mathematical method must be implemented to reconstruct the 3D structure of an object from a number of 2D projections. Here, we present the mathematical implementation of a tomographic algorithm, termed GENeralized Fourier Iterative REconstruction (GENFIRE), for high-resolution 3D reconstruction from a limited number of 2D projections. GENFIRE first assembles a 3D Fourier grid with oversampling and then iterates between real and reciprocal space to search for a global solution that is concurrently consistent with the measured data and general physical constraints. The algorithm requires minimal human intervention and also incorporates angular refinement to reduce the tilt angle error. We demonstrate that GENFIRE can produce superior results relative to several other popular tomographic reconstruction techniques through numerical simulations and by experimentally reconstructing the 3D structure of a porous material and a frozen-hydrated marine cyanobacterium. Equipped with a graphical user interface, GENFIRE is freely available from our website and is expected to find broad applications across different disciplines.
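
The iterate-between-spaces idea can be sketched in a few lines: enforce the measured Fourier samples in reciprocal space and physical constraints in real space, repeatedly. A hedged NumPy toy (GENFIRE's oversampled grid assembly and angular refinement are omitted):

```python
# Hedged toy sketch of Fourier iterative reconstruction: alternate between
# data consistency in reciprocal space and positivity in real space.
import numpy as np

def fourier_iterative(measured_k, known_mask, n_iter=100):
    """measured_k: 3D Fourier grid assembled from the 2D projections.
    known_mask: boolean grid marking which Fourier samples were measured."""
    vol = np.zeros_like(measured_k, dtype=float)
    for _ in range(n_iter):
        k = np.fft.fftn(vol)
        k[known_mask] = measured_k[known_mask]   # stay consistent with the data
        vol = np.fft.ifftn(k).real
        vol[vol < 0] = 0.0                       # physical constraint: positivity
    return vol
```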

Journal ArticleDOI
TL;DR: In this article, a modification of the existing Hough transform for the automatic detection of cylinder parameters in point clouds is presented, where the relationship between cylinders is reconstructed to form a continuous axis network by tracking cylinder parameters obtained from earlier steps.

Posted Content
TL;DR: In this paper, the authors use foreground masks as weak supervision through a ray trace pooling layer that enables perspective projection and backpropagation, and constrain the 3D reconstruction to the manifold of unlabeled realistic 3D shapes that match mask observations.
Abstract: Supervised 3D reconstruction has witnessed significant progress through the use of deep neural networks. However, this increase in performance requires large scale annotations of 2D/3D data. In this paper, we explore inexpensive 2D supervision as an alternative for expensive 3D CAD annotation. Specifically, we use foreground masks as weak supervision through a ray trace pooling layer that enables perspective projection and backpropagation. Additionally, since the 3D reconstruction from masks is an ill-posed problem, we propose to constrain the 3D reconstruction to the manifold of unlabeled realistic 3D shapes that match mask observations. We demonstrate that learning a log-barrier solution to this constrained optimization problem resembles the GAN objective, enabling the use of existing tools for training GANs. We evaluate and analyze the manifold constrained reconstruction on various datasets for single and multi-view reconstruction of both synthetic and real images.

Journal ArticleDOI
TL;DR: It is argued that image segmentation and dense 3D reconstruction contribute valuable information to each other’s task, and a mathematical framework to formulate and solve a joint segmentation and dense reconstruction problem is proposed.
Abstract: Both image segmentation and dense 3D modeling from images represent an intrinsically ill-posed problem. Strong regularizers are therefore required to constrain the solutions from being ‘too noisy’. These priors generally yield overly smooth reconstructions and/or segmentations in certain regions while they fail to constrain the solution sufficiently in other areas. In this paper, we argue that image segmentation and dense 3D reconstruction contribute valuable information to each other’s task. As a consequence, we propose a mathematical framework to formulate and solve a joint segmentation and dense reconstruction problem. On the one hand knowing about the semantic class of the geometry provides information about the likelihood of the surface direction. On the other hand the surface direction provides information about the likelihood of the semantic class. Experimental results on several data sets highlight the advantages of our joint formulation. We show how weakly observed surfaces are reconstructed more faithfully compared to a geometry only reconstruction. Thanks to the volumetric nature of our formulation we also infer surfaces which cannot be directly observed for example the surface between the ground and a building. Finally, our method returns a semantic segmentation which is consistent across the whole dataset.

Journal ArticleDOI
TL;DR: The results showed that different specular targets with various shapes can be precisely reconstructed by the proposed method.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this paper, a new approach for dense 3D reconstruction of a complex dynamic scene from two perspective frames is proposed, which reduces the dynamic reconstruction problem to a 3D jigsaw puzzle problem which takes pieces from an unorganized "soup of superpixels".
Abstract: This paper proposes a new approach for monocular dense 3D reconstruction of a complex dynamic scene from two perspective frames. By applying superpixel over-segmentation to the image, we model a generically dynamic (hence non-rigid) scene with a piecewise planar and rigid approximation. In this way, we reduce the dynamic reconstruction problem to a “3D jigsaw puzzle” problem which takes pieces from an unorganized “soup of superpixels”. We show that our method provides an effective solution to the inherent relative scale ambiguity in structure-from-motion. Since our method does not assume a template prior, or per-object segmentation, or knowledge about the rigidity of the dynamic scene, it is applicable to a wide range of scenarios. Extensive experiments on both synthetic and real monocular sequences demonstrate the superiority of our method compared with the state-of-the-art methods.
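
Under the piecewise planar approximation, per-pixel depth follows in closed form once a superpixel's plane is known. A minimal sketch (intrinsics and plane values are illustrative):

```python
# Hedged sketch: depth of a pixel lying on a 3D plane n^T X + d = 0, with
# X = z * K^{-1} [u, v, 1], so z = -d / (n^T K^{-1} [u, v, 1]).
import numpy as np

def plane_depth(u, v, n, d, K):
    """Depth z of pixel (u, v) on the plane with unit normal n and offset d."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return -d / float(n @ ray)

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
# Fronto-parallel plane z = 2 seen through the principal point: depth 2.0.
print(plane_depth(320, 240, n=np.array([0.0, 0.0, 1.0]), d=-2.0, K=K))
```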

Journal ArticleDOI
TL;DR: This work presents a principled algorithm for dense depth estimation that combines defocus and correspondence metrics, and shows that combining all three sources of information: defocus, correspondence, and shading, outperforms state-of-the-art light-field depth estimation algorithms in multiple scenarios.
Abstract: Light-field cameras are quickly becoming commodity items, with consumer and industrial applications. They capture many nearby views simultaneously using a single image with a micro-lens array, thereby providing a wealth of cues for depth recovery: defocus, correspondence, and shading. In particular, apart from conventional image shading, one can refocus images after acquisition, and shift one's viewpoint within the sub-apertures of the main lens, effectively obtaining multiple views. We present a principled algorithm for dense depth estimation that combines defocus and correspondence metrics. We then extend our analysis to the additional cue of shading, using it to refine fine details in the shape. By exploiting an all-in-focus image, in which pixels are expected to exhibit angular coherence, we define an optimization framework that integrates photo consistency, depth consistency, and shading consistency. We show that combining all three sources of information: defocus, correspondence, and shading, outperforms state-of-the-art light-field depth estimation algorithms in multiple scenarios.
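
The defocus and correspondence cues can be sketched from a stack of angular samples sheared to a candidate depth. A hedged toy version (the shading cue and the refocusing warp itself are omitted):

```python
# Hedged sketch of two light-field depth cues: photo-consistent depths give
# low variance across views (correspondence) and a sharp refocused mean image
# (defocus, approximated here by the Laplacian magnitude).
import numpy as np

def depth_cues(angular_stack: np.ndarray):
    """angular_stack: (A, H, W) views already sheared to a candidate depth.
    Returns (defocus_response, correspondence_cost) per pixel."""
    mean = angular_stack.mean(axis=0)
    corr = angular_stack.var(axis=0)
    lap = (np.roll(mean, 1, 0) + np.roll(mean, -1, 0) +
           np.roll(mean, 1, 1) + np.roll(mean, -1, 1) - 4.0 * mean)
    return np.abs(lap), corr
```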

Journal ArticleDOI
TL;DR: The main elements of an integrated platform, which target tele-immersion and future 3D applications, are described in this paper, addressing the tasks of real-time capturing, robust 3D human shape/appearance reconstruction, and skeleton-based motion tracking.
Abstract: The latest developments in 3D capturing, processing, and rendering provide means to unlock novel 3D application pathways. The main elements of an integrated platform, which target tele-immersion and future 3D applications, are described in this paper, addressing the tasks of real-time capturing, robust 3D human shape/appearance reconstruction, and skeleton-based motion tracking. More specifically, initially, the details of a multiple RGB-depth (RGB-D) capturing system are given, along with a novel sensor calibration method. A robust, fast reconstruction method from multiple RGB-D streams is then proposed, based on an enhanced variation of the volumetric Fourier transform-based method, parallelized on the Graphics Processing Unit, and accompanied with an appropriate texture-mapping algorithm. On top of that, given the lack of relevant objective evaluation methods, a novel framework is proposed for the quantitative evaluation of real-time 3D reconstruction systems. Finally, a generic, multiple depth stream-based method for accurate real-time human skeleton tracking is proposed. Detailed experimental results with multi-Kinect2 data sets verify the validity of our arguments and the effectiveness of the proposed system and methodologies.

Journal ArticleDOI
TL;DR: This paper presents an approach for reconstructing large-scale outdoor scenes through monocular motion stereo at interactive frame rates on a modern mobile device, and is the first method to enable live reconstruction of such scenes on a mobile device.