
Showing papers on "3D reconstruction published in 2017"


Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper addresses the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Driven by this unorthodox output form, the authors design an architecture, loss function and learning paradigm that are novel and effective, capable of predicting multiple plausible 3D point clouds from an input image.
Abstract: Generation of 3D data by deep neural network has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations, and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straightforward form of output – point cloud coordinates. Along with this problem arises a unique and interesting issue: the groundtruth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in groundtruth, we design an architecture, loss function and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments not only can our system outperform state-of-the-art methods on single image based 3D reconstruction benchmarks, but it also shows strong performance for 3D shape completion and promising ability in making multiple plausible predictions.
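
Because the output is an unordered point set, the training loss must be permutation-invariant. Below is a minimal NumPy sketch of the symmetric Chamfer distance, one of the set distances used in this line of work (names and sizes are illustrative):

```python
# Hedged sketch: symmetric Chamfer distance between two point sets, one common
# loss for comparing a predicted point cloud with ground truth.
import numpy as np

def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """pred: (N, 3) and gt: (M, 3) arrays of 3D points."""
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((pred[:, None, :] - gt[None, :, :]) ** 2, axis=-1)
    # Average nearest-neighbour distance in both directions.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

# Example: two random clouds of 1024 points each.
rng = np.random.default_rng(0)
print(chamfer_distance(rng.normal(size=(1024, 3)), rng.normal(size=(1024, 3))))
```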

1,419 citations


Journal ArticleDOI
TL;DR: A benchmark for image-based 3D reconstruction with high-resolution video sequences provided as input, supporting the development of novel pipelines that take advantage of video input to increase reconstruction fidelity.
Abstract: We present a benchmark for image-based 3D reconstruction. The benchmark sequences were acquired outside the lab, in realistic conditions. Ground-truth data was captured using an industrial laser scanner. The benchmark includes both outdoor scenes and indoor environments. High-resolution video sequences are provided as input, supporting the development of novel pipelines that take advantage of video input to increase reconstruction fidelity. We report the performance of many image-based 3D reconstruction pipelines on the new benchmark. The results point to exciting challenges and opportunities for future work.

553 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: This work proposes a general framework, called hierarchical surface prediction (HSP), which facilitates prediction of high resolution voxel grids, and shows that high resolution predictions are more accurate than low resolution predictions.
Abstract: Recently, Convolutional Neural Networks have shown promising results for 3D geometry prediction. They can make predictions from very little input data such as a single color image. A major limitation of such approaches is that they only predict a coarse resolution voxel grid, which does not capture the surface of the objects well. We propose a general framework, called hierarchical surface prediction (HSP), which facilitates prediction of high resolution voxel grids. The main insight is that it is sufficient to predict high resolution voxels around the predicted surfaces. The exterior and interior of the objects can be represented with coarse resolution voxels. Our approach is not dependent on a specific input type. We show results for geometry prediction from color images, depth images and shape completion from partial voxel grids. Our analysis shows that our high resolution predictions are more accurate than low resolution predictions.
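The core insight lends itself to a compact illustration: only coarse cells whose occupancy prediction is uncertain need to be subdivided, since confidently free or occupied cells cannot contain the surface. A hedged NumPy sketch, with illustrative thresholds:

```python
# Hedged sketch of the idea behind hierarchical surface prediction: refine only
# voxels likely to contain the surface; keep the rest at coarse resolution.
import numpy as np

def cells_to_refine(occupancy: np.ndarray, lo: float = 0.1, hi: float = 0.9):
    """occupancy: (D, D, D) predicted occupancy probabilities.
    Returns a boolean mask of uncertain coarse cells worth subdividing."""
    return (occupancy > lo) & (occupancy < hi)

# Toy example: a sphere-like probability field on a 16^3 grid.
d = 16
z, y, x = np.mgrid[:d, :d, :d]
r = np.sqrt((x - d / 2) ** 2 + (y - d / 2) ** 2 + (z - d / 2) ** 2)
occ = 1.0 / (1.0 + np.exp(r - d / 4))   # soft "inside the sphere" score
print(cells_to_refine(occ).sum(), "of", d ** 3, "cells selected for refinement")
```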

335 citations


Journal ArticleDOI
Eric Scott Penner, Li Zhang
TL;DR: A novel algorithm for view synthesis that utilizes a soft 3D reconstruction to improve quality, continuity and robustness and it is shown that this representation is beneficial throughout the view synthesis pipeline.
Abstract: We present a novel algorithm for view synthesis that utilizes a soft 3D reconstruction to improve quality, continuity and robustness. Our main contribution is the formulation of a soft 3D representation that preserves depth uncertainty through each stage of 3D reconstruction and rendering. We show that this representation is beneficial throughout the view synthesis pipeline. During view synthesis, it provides a soft model of scene geometry that provides continuity across synthesized views and robustness to depth uncertainty. During 3D reconstruction, the same robust estimates of scene visibility can be applied iteratively to improve depth estimation around object edges. Our algorithm is based entirely on O(1) filters, making it conducive to acceleration, and it works with structured or unstructured sets of input views. We compare with recent classical and learning-based algorithms on plenoptic lightfields, wide baseline captures, and lightfield videos produced from camera arrays.
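
A central ingredient is a soft visibility derived from per-pixel depth probabilities. A minimal NumPy sketch under illustrative assumptions (a discrete depth-plane volume per view; not the paper's exact formulation):

```python
# Hedged sketch: soft visibility from a depth-plane occupancy volume. A plane
# is visible along a ray with the probability that no nearer plane is occupied.
import numpy as np

def soft_visibility(occ: np.ndarray) -> np.ndarray:
    """occ: (D, H, W) per-pixel probabilities that each of D depth planes
    (ordered near to far) is occupied. Returns per-plane visibility."""
    clear = np.cumprod(1.0 - occ, axis=0)   # prob. that all planes <= d are empty
    return np.concatenate([np.ones_like(occ[:1]), clear[:-1]], axis=0)
```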

319 citations


Proceedings Article
01 Jan 2017
TL;DR: The authors leverage the underlying 3D geometry of the problem through feature projection and unprojection along viewing rays to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from much fewer images (even a single image) than required by classical approaches.
Abstract: We present a learnt system for multi-view stereopsis. In contrast to recent learning based methods for 3D reconstruction, we leverage the underlying 3D geometry of the problem through feature projection and unprojection along viewing rays. By formulating these operations in a differentiable manner, we are able to learn the system end-to-end for the task of metric 3D reconstruction. End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from much fewer images (even a single image) than required by classical approaches as well as completion of unseen surfaces. We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate the benefits over classical approaches and recent learning based methods.
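
The unprojection operation can be sketched compactly: project each voxel center into the image and gather features there. A hedged NumPy version (nearest-neighbour sampling for brevity; the paper formulates such operations differentiably, e.g. with bilinear sampling):

```python
# Hedged sketch of "unprojection": lifting 2D image features onto a 3D grid by
# projecting voxel centers with a known camera and sampling the feature map.
import numpy as np

def unproject_features(feat: np.ndarray, P: np.ndarray, voxels: np.ndarray):
    """feat: (H, W, C) feature map; P: (3, 4) camera projection matrix;
    voxels: (N, 3) voxel centers in world coordinates. Returns (N, C)."""
    homo = np.hstack([voxels, np.ones((len(voxels), 1))])   # (N, 4) homogeneous
    uvw = homo @ P.T                                        # (N, 3) image coords
    u = np.round(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.round(uvw[:, 1] / uvw[:, 2]).astype(int)
    h, w, c = feat.shape
    out = np.zeros((len(voxels), c))
    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h) & (uvw[:, 2] > 0)
    out[valid] = feat[v[valid], u[valid]]                   # gather features
    return out
```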

305 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: Stereo Direct Sparse Odometry (Stereo DSO) as discussed by the authors integrates constraints from static stereo into the bundle adjustment pipeline of temporal multi-view stereo to improve tracking accuracy and robustness.
Abstract: We propose Stereo Direct Sparse Odometry (Stereo DSO) as a novel method for highly accurate real-time visual odometry estimation of large-scale environments from stereo cameras. It jointly optimizes for all the model parameters within the active window, including the intrinsic/extrinsic camera parameters of all keyframes and the depth values of all selected pixels. In particular, we propose a novel approach to integrate constraints from static stereo into the bundle adjustment pipeline of temporal multi-view stereo. Real-time optimization is realized by sampling pixels uniformly from image regions with sufficient intensity gradient. Fixed-baseline stereo resolves scale drift. It also reduces the sensitivities to large optical flow and to rolling shutter effect which are known shortcomings of direct image alignment methods. Quantitative evaluation demonstrates that the proposed Stereo DSO outperforms existing state-of-the-art visual odometry methods both in terms of tracking accuracy and robustness. Moreover, our method delivers a more precise metric 3D reconstruction than previous dense/semi-dense direct approaches while providing a higher reconstruction density than feature-based methods.
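
Direct methods of this kind minimize photometric rather than reprojection error. A hedged sketch of the basic residual for one pixel (affine brightness terms, interpolation, and bounds checks omitted; not the paper's exact formulation):

```python
# Hedged sketch of the photometric residual minimized by direct methods like
# DSO: intensity difference between a reference pixel and its reprojection.
import numpy as np

def photometric_residual(I_ref, I_tgt, u, v, depth, K, R, t):
    """Reproject reference pixel (u, v) at the given depth into the target
    image (pose R, t) and return the intensity difference."""
    K_inv = np.linalg.inv(K)
    p_ref = depth * (K_inv @ np.array([u, v, 1.0]))   # 3D point in ref frame
    p_tgt = R @ p_ref + t                             # transform to target frame
    uvw = K @ p_tgt
    u2, v2 = uvw[0] / uvw[2], uvw[1] / uvw[2]
    # Nearest-neighbour lookup for brevity; real systems interpolate.
    return float(I_ref[int(v), int(u)] - I_tgt[int(round(v2)), int(round(u2))])
```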

260 citations


Journal ArticleDOI
TL;DR: The essential algorithmic aspects of the structure from motion and image dense matching problems are discussed from the implementation and the user’s viewpoints.
Abstract: The publication familiarizes the reader with MicMac, a free, open-source photogrammetric software for 3D reconstruction. A brief history of the tool, its organisation, and its unique features vis-a-vis other software tools are highlighted. The essential algorithmic aspects of the structure from motion and image dense matching problems are discussed from the implementation and the user’s viewpoints.

233 citations


Proceedings Article
01 Jan 2017
TL;DR: This work proposes MarrNet, an end-to-end trainable model that sequentially estimates 2.5D sketches and 3D object shape, and derives differentiable projective functions from 3D shape to 2.5D sketches, making the framework trainable on real images without real-image annotations.
Abstract: 3D object reconstruction from a single image is a highly under-determined problem, requiring strong prior knowledge of plausible 3D shapes. This introduces challenges for learning-based approaches, as 3D object annotations in real images are scarce. Previous work chose to train on synthetic data with ground truth 3D information, but suffered from the domain adaptation issue when tested on real data. In this work, we propose an end-to-end trainable framework, sequentially estimating 2.5D sketches and 3D object shapes. Our disentangled, two-step formulation has three advantages. First, compared to full 3D shape, 2.5D sketches are much easier to recover from a 2D image and to transfer from synthetic to real data. Second, for 3D reconstruction from the 2.5D sketches, we can easily transfer the learned model on synthetic data to real images, as rendered 2.5D sketches are invariant to object appearance variations in real images, including lighting, texture, etc. This further relieves the domain adaptation problem. Third, we derive differentiable projective functions from 3D shape to 2.5D sketches, making the framework end-to-end trainable on real images, requiring no real-image annotations. Our framework achieves state-of-the-art performance on 3D shape reconstruction.

219 citations


Proceedings ArticleDOI
01 Jul 2017
TL;DR: A comprehensive overview of the stereoscopic Intel RealSense RGBD imaging systems is presented, providing information about the systems' optical characteristics, their correlation algorithms, and how these properties can affect different applications, including 3D reconstruction and gesture recognition.
Abstract: We present a comprehensive overview of the stereoscopic Intel RealSense RGBD imaging systems. We discuss these systems' mode-of-operation, functional behavior and include models of their expected performance, shortcomings, and limitations. We provide information about the systems' optical characteristics, their correlation algorithms, and how these properties can affect different applications, including 3D reconstruction and gesture recognition. Our discussion covers the Intel RealSense R200 and RS400.

201 citations


Proceedings ArticleDOI
04 Apr 2017
TL;DR: In this article, a learning-based approach to depth fusion is proposed, which is able to reconstruct (partially) occluded surfaces and fill in gaps in the reconstruction by learning the structure of real world 3D objects and scenes.
Abstract: In this paper, we present a learning-based approach to depth fusion, i.e., dense 3D reconstruction from multiple depth images. The most common approach to depth fusion is based on averaging truncated signed distance functions, which was originally proposed by Curless and Levoy in 1996. While this method is simple and provides great results, it is not able to reconstruct (partially) occluded surfaces and requires a large number of frames to filter out sensor noise and outliers. Motivated by the availability of large 3D model repositories and recent advances in deep learning, we present a novel 3D CNN architecture that learns to predict an implicit surface representation from the input depth maps. Our learning-based method significantly outperforms the traditional volumetric fusion approach in terms of noise reduction and outlier suppression. By learning the structure of real world 3D objects and scenes, our approach is further able to reconstruct occluded regions and to fill in gaps in the reconstruction. We demonstrate that our learning-based approach outperforms both vanilla TSDF fusion as well as TV-L1 fusion on the task of volumetric fusion. Further, we demonstrate state-of-the-art 3D shape completion results.
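
For reference, the classical baseline the paper compares against can be written in a few lines: each voxel keeps a weighted running average of truncated signed distances. A hedged NumPy sketch (truncation and weighting choices are illustrative):

```python
# Hedged sketch of Curless & Levoy style TSDF fusion, the vanilla baseline the
# learned 3D CNN is compared against.
import numpy as np

def integrate(tsdf, weight, sdf_obs, trunc=0.05, max_weight=100.0):
    """tsdf, weight: (D, D, D) volume state, updated in place.
    sdf_obs: signed distance of each voxel to the surface in the current frame."""
    d = np.clip(sdf_obs, -trunc, trunc) / trunc        # truncate and normalize
    w_new = (sdf_obs > -trunc).astype(np.float64)      # skip voxels far behind surface
    w_tot = weight + w_new
    upd = w_tot > 0
    tsdf[upd] = (weight[upd] * tsdf[upd] + w_new[upd] * d[upd]) / w_tot[upd]
    weight[:] = np.minimum(w_tot, max_weight)          # cap to stay adaptive
```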

201 citations


Posted Content
TL;DR: End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from much fewer images than required by classical approaches as well as completion of unseen surfaces.
Abstract: We present a learnt system for multi-view stereopsis. In contrast to recent learning based methods for 3D reconstruction, we leverage the underlying 3D geometry of the problem through feature projection and unprojection along viewing rays. By formulating these operations in a differentiable manner, we are able to learn the system end-to-end for the task of metric 3D reconstruction. End-to-end learning allows us to jointly reason about shape priors while conforming to geometric constraints, enabling reconstruction from much fewer images (even a single image) than required by classical approaches as well as completion of unseen surfaces. We thoroughly evaluate our approach on the ShapeNet dataset and demonstrate the benefits over classical approaches as well as recent learning based methods.

Journal ArticleDOI
TL;DR: This work investigates the problem of estimating the 3D shape of an object defined by a set of 3D landmarks, given their 2D correspondences in a single image and proposes a convex approach to addressing this challenge and develops an efficient algorithm to solve the proposed convex program.
Abstract: We investigate the problem of estimating the 3D shape of an object defined by a set of 3D landmarks, given their 2D correspondences in a single image. A successful approach to alleviating the reconstruction ambiguity is the 3D deformable shape model and a sparse representation is often used to capture complex shape variability. But the model inference is still challenging due to the nonconvexity in the joint optimization of shape and viewpoint. In contrast to prior work that relies on an alternating scheme whose solution depends on initialization, we propose a convex approach to addressing this challenge and develop an efficient algorithm to solve the proposed convex program. We further propose a robust model to handle gross errors in the 2D correspondences. We demonstrate the exact recovery property of the proposed method, the advantage compared to several nonconvex baselines and the applicability to recover 3D human poses and car models from single images.
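
Schematically, this line of work builds on a linear shape basis under a weak-perspective camera; the sketch below uses illustrative notation, not the paper's exact formulation:

```latex
% Hedged sketch: 2D landmarks W are explained by a linear combination of shape
% bases B_i under a weak-perspective camera \bar{R} (two rows of a scaled rotation):
\[
W \;=\; \bar{R} \sum_{i=1}^{k} c_i B_i + t\mathbf{1}^{\top},
\qquad \bar{R}\bar{R}^{\top} = s^2 I_2 .
\]
% Jointly estimating (c, \bar{R}) is nonconvex. Substituting M_i = c_i \bar{R}
% makes the residual linear in the new unknowns, and the rotation structure can
% then be encouraged with a convex (spectral-norm) constraint on each M_i.
```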

Proceedings ArticleDOI
20 Jul 2017
TL;DR: In this article, a deep, encoder-decoder network is proposed to reconstruct 3D shapes from 2D sketches in the form of line drawings, where the encoder converts the sketch into a compact representation encoding shape information and the decoder converts this representation into depth and normal maps capturing the underlying surface from several output viewpoints.
Abstract: We propose a method for reconstructing 3D shapes from 2D sketches in the form of line drawings. Our method takes as input a single sketch, or multiple sketches, and outputs a dense point cloud representing a 3D reconstruction of the input sketch(es). The point cloud is then converted into a polygon mesh. At the heart of our method lies a deep, encoder-decoder network. The encoder converts the sketch into a compact representation encoding shape information. The decoder converts this representation into depth and normal maps capturing the underlying surface from several output viewpoints. The multi-view maps are then consolidated into a 3D point cloud by solving an optimization problem that fuses depth and normals across all viewpoints. Based on our experiments, compared to other methods, such as volumetric networks, our architecture offers several advantages, including more faithful reconstruction, higher output surface resolution, better preservation of topology and shape structure.
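
A schematic PyTorch version of such an encoder-decoder is sketched below; the layer sizes, the number of viewpoints V, and the output packing are illustrative assumptions, not the paper's architecture:

```python
# Hedged sketch: a sketch image is encoded to a compact representation, and a
# decoder emits a depth map plus a normal map for each of V output viewpoints.
import torch
import torch.nn as nn

V = 12  # number of output viewpoints (illustrative)

class SketchTo25D(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),    # 256 -> 128
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 128 -> 64
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 64 -> 32
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            # 4 channels per view: 1 depth + 3 normal components.
            nn.ConvTranspose2d(32, V * 4, 4, stride=2, padding=1),
        )

    def forward(self, sketch):                   # sketch: (B, 1, 256, 256)
        maps = self.decoder(self.encoder(sketch))
        depth = maps[:, :V]                      # (B, V, 256, 256)
        normals = maps[:, V:].reshape(-1, V, 3, 256, 256)
        return depth, normals
```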

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This paper proposes polarimetric multi-view stereo, which combines per-pixel photometric information from polarization with epipolar constraints from multiple views for 3D reconstruction, and proves there are exactly two types of ambiguities on estimating surface azimuth angles from polarization.
Abstract: Multi-view stereo relies on feature correspondences for 3D reconstruction, and thus is fundamentally flawed in dealing with featureless scenes. In this paper, we propose polarimetric multi-view stereo, which combines per-pixel photometric information from polarization with epipolar constraints from multiple views for 3D reconstruction. Polarization reveals surface normal information, and is thus helpful to propagate depth to featureless regions. Polarimetric multi-view stereo is completely passive and can be applied outdoors in uncontrolled illumination, since the data capture can be done simply with either a polarizer or a polarization camera. Unlike previous work on shape-from-polarization which is limited to either diffuse polarization or specular polarization only, we propose a novel polarization imaging model that can handle real-world objects with mixed polarization. We prove there are exactly two types of ambiguities on estimating surface azimuth angles from polarization, and we resolve them with graph optimization and iso-depth contour tracing. This step significantly improves the initial depth map estimate, which are later fused together for complete 3D reconstruction. Extensive experimental results demonstrate high-quality 3D reconstruction and better performance than state-of-the-art multi-view stereo methods, especially on featureless 3D objects, such as ceramic tiles, office room with white walls, and highly reflective cars in the outdoors.
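
The per-pixel polarization cues can be computed from a few polarizer angles via Stokes parameters. A hedged NumPy sketch (the paper's mixed-polarization model and ambiguity resolution go well beyond this toy version):

```python
# Hedged sketch: recovering the polarization phase (related to surface azimuth
# up to the ambiguities the paper resolves) and the degree of polarization from
# intensities captured through a linear polarizer at 0, 45, and 90 degrees.
import numpy as np

def polarization_cues(i0, i45, i90):
    """i0, i45, i90: images taken at polarizer angles 0/45/90 degrees.
    Returns (phase, degree_of_polarization), per pixel."""
    s0 = i0 + i90                      # total intensity
    s1 = i0 - i90                      # Stokes parameter S1
    s2 = 2.0 * i45 - i0 - i90          # Stokes parameter S2
    phase = 0.5 * np.arctan2(s2, s1)   # polarization angle, defined mod pi
    dop = np.sqrt(s1**2 + s2**2) / np.maximum(s0, 1e-6)
    return phase, dop
```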

Journal ArticleDOI
TL;DR: An algorithm that enables casual 3D photography, including a novel parallax-tolerant stitching algorithm that warps depth maps into a central panorama and stitches two color-and-depth panoramas for the front and back scene surfaces.
Abstract: We present an algorithm that enables casual 3D photography. Given a set of input photos captured with a hand-held cell phone or DSLR camera, our algorithm reconstructs a 3D photo, a central panoramic, textured, normal mapped, multi-layered geometric mesh representation. 3D photos can be stored compactly and are optimized for being rendered from viewpoints that are near the capture viewpoints. They can be rendered using a standard rasterization pipeline to produce perspective views with motion parallax. When viewed in VR, 3D photos provide geometrically consistent views for both eyes. Our geometric representation also allows interacting with the scene using 3D geometry-aware effects, such as adding new objects to the scene and artistic lighting effects. Our 3D photo reconstruction algorithm starts with a standard structure from motion and multi-view stereo reconstruction of the scene. The dense stereo reconstruction is made robust to the imperfect capture conditions using a novel near envelope cost volume prior that discards erroneous near depth hypotheses. We propose a novel parallax-tolerant stitching algorithm that warps the depth maps into the central panorama and stitches two color-and-depth panoramas for the front and back scene surfaces. The two panoramas are fused into a single non-redundant, well-connected geometric mesh. We provide videos demonstrating users interactively viewing and manipulating our 3D photos.

Posted Content
TL;DR: MarrNet as discussed by the authors proposes an end-to-end trainable model that sequentially estimates 2.5D sketches and 3D object shapes from a single image, which is trained on synthetic data with ground truth 3D information.
Abstract: 3D object reconstruction from a single image is a highly under-determined problem, requiring strong prior knowledge of plausible 3D shapes. This introduces challenges for learning-based approaches, as 3D object annotations are scarce in real images. Previous work chose to train on synthetic data with ground truth 3D information, but suffered from domain adaptation issues when tested on real data. In this work, we propose MarrNet, an end-to-end trainable model that sequentially estimates 2.5D sketches and 3D object shape. Our disentangled, two-step formulation has three advantages. First, compared to full 3D shape, 2.5D sketches are much easier to recover from a 2D image; models that recover 2.5D sketches are also more likely to transfer from synthetic to real data. Second, for 3D reconstruction from 2.5D sketches, systems can learn purely from synthetic data. This is because we can easily render realistic 2.5D sketches without modeling object appearance variations in real images, including lighting, texture, etc. This further relieves the domain adaptation problem. Third, we derive differentiable projective functions from 3D shape to 2.5D sketches; the framework is therefore end-to-end trainable on real images, requiring no human annotations. Our model achieves state-of-the-art performance on 3D shape reconstruction.

Proceedings ArticleDOI
31 May 2017
TL;DR: In this paper, the authors use foreground masks as weak supervision through a ray trace pooling layer that enables perspective projection and backpropagation, and constrain the 3D reconstruction to the manifold of unlabeled realistic 3D shapes that match mask observations.
Abstract: Supervised 3D reconstruction has witnessed significant progress through the use of deep neural networks. However, this increase in performance requires large scale annotations of 2D/3D data. In this paper, we explore inexpensive 2D supervision as an alternative for expensive 3D CAD annotation. Specifically, we use foreground masks as weak supervision through a ray trace pooling layer that enables perspective projection and backpropagation. Additionally, since the 3D reconstruction from masks is an ill-posed problem, we propose to constrain the 3D reconstruction to the manifold of unlabeled realistic 3D shapes that match mask observations. We demonstrate that learning a log-barrier solution to this constrained optimization problem resembles the GAN objective, enabling the use of existing tools for training GANs. We evaluate and analyze the manifold constrained reconstruction on various datasets for single and multi-view reconstruction of both synthetic and real images.
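
The mask supervision can be sketched as projecting a predicted occupancy volume to a soft silhouette and scoring it against the observed mask. A hedged toy version using an orthographic projection (the paper's ray trace pooling handles true perspective rays):

```python
# Hedged sketch of silhouette supervision: soft union of occupancies along the
# viewing axis, compared to the foreground mask with binary cross-entropy.
import numpy as np

def silhouette_loss(occupancy: np.ndarray, mask: np.ndarray) -> float:
    """occupancy: (D, H, W) predicted probabilities; mask: (H, W) in {0, 1}."""
    sil = 1.0 - np.prod(1.0 - occupancy, axis=0)  # soft union along the ray
    eps = 1e-6
    bce = -(mask * np.log(sil + eps) + (1 - mask) * np.log(1 - sil + eps))
    return float(bce.mean())
```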

Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this article, a joint surface reconstruction approach based on Shape-from-Shading (SfS) techniques and spatially-varying spherical harmonics (SVSH) from subvolumes of the reconstructed scene is proposed.
Abstract: We introduce a novel method to obtain high-quality 3D reconstructions from consumer RGB-D sensors. Our core idea is to simultaneously optimize for geometry encoded in a signed distance field (SDF), textures from automatically-selected keyframes, and their camera poses along with material and scene lighting. To this end, we propose a joint surface reconstruction approach that is based on Shape-from-Shading (SfS) techniques and utilizes the estimation of spatially-varying spherical harmonics (SVSH) from subvolumes of the reconstructed scene. Through extensive examples and evaluations, we demonstrate that our method dramatically increases the level of detail in the reconstructed scene geometry and contributes highly to consistent surface texture recovery.

Posted Content
TL;DR: The idea is that steps like camera tracking, scene representation, and integration of new data can easily be replaced and adapted to the user's needs, providing a fast, flexible 3D reconstruction pipeline called InfiniTAM.
Abstract: Volumetric models have become a popular representation for 3D scenes in recent years. One breakthrough leading to their popularity was KinectFusion, which focuses on 3D reconstruction using RGB-D sensors. However, monocular SLAM has since also been tackled with very similar approaches. Representing the reconstruction volumetrically as a TSDF leads to most of the simplicity and efficiency that can be achieved with GPU implementations of these systems. However, this representation is memory-intensive and limits applicability to small-scale reconstructions. Several avenues have been explored to overcome this. With the aim of summarizing them and providing for a fast, flexible 3D reconstruction pipeline, we propose a new, unifying framework called InfiniTAM. The idea is that steps like camera tracking, scene representation and integration of new data can easily be replaced and adapted to the user's needs. This report describes the technical implementation details of InfiniTAM v3, the third version of our InfiniTAM system. We have added various new features, as well as making numerous enhancements to the low-level code that significantly improve our camera tracking performance. The new features that we expect to be of most interest are (i) a robust camera tracking module; (ii) an implementation of Glocker et al.'s keyframe-based random ferns camera relocaliser; (iii) a novel approach to globally-consistent TSDF-based reconstruction, based on dividing the scene into rigid submaps and optimising the relative poses between them; and (iv) an implementation of Keller et al.'s surfel-based reconstruction approach.
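
The memory limitation mentioned above is what voxel block hashing addresses: allocate small voxel blocks only near observed surfaces and index them with a spatial hash. A hedged Python sketch of the idea (not InfiniTAM's actual data structures):

```python
# Hedged sketch of voxel block hashing for a sparse TSDF volume.
import numpy as np

BLOCK = 8  # 8x8x8 voxels per block, a typical choice

def block_hash(bx, by, bz, n_buckets=1 << 20):
    """Prime-multiplication spatial hash used by GPU implementations to map
    integer block coordinates to buckets; Python's dict below stands in for it."""
    return ((bx * 73856093) ^ (by * 19349669) ^ (bz * 83492791)) % n_buckets

class SparseTSDF:
    def __init__(self):
        self.blocks = {}                    # block coords -> dense 8^3 sub-volume

    def voxel_block(self, x, y, z):
        key = (x // BLOCK, y // BLOCK, z // BLOCK)
        if key not in self.blocks:          # allocate lazily, near surfaces only
            self.blocks[key] = np.ones((BLOCK, BLOCK, BLOCK), dtype=np.float32)
        return self.blocks[key]
```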

Journal ArticleDOI
TL;DR: A 3D cascade regression approach in which facial landmarks remain invariant across pose over a range of approximately 60 degrees is developed; the results strongly support the validity of real-time 3D registration and reconstruction from 2D video.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: An algorithm for computing a 3D model from several satellite images of the same site; with a large number of input images, the resulting 3D models can be as accurate as those obtained from a single same-date stereo pair.
Abstract: We propose an algorithm for computing a 3D model from several satellite images of the same site. The method works even if the images were taken at different dates with important lighting and vegetation differences. We show that with a large number of input images the resulting 3D models can be as accurate as those obtained from a single same-date stereo pair. To deal with seasonal vegetation changes, we propose a strategy that accounts for the multi-modal nature of 3D models computed from multi-date images. Our method uses a local affine camera approximation and thus focuses on the 3D reconstruction of small areas. This is a common setup in urgent cartography for emergency management, for which abundant multi-date imagery can be immediately available to build a reference 3D model. A preliminary implementation of this method was used to win the IARPA Multi-View Stereo 3D Mapping Challenge 2016. Experiments on the challenge dataset are used to substantiate our claims.

Journal ArticleDOI
TL;DR: This work presents the mathematical implementation of a tomographic algorithm, termed GENeralized Fourier Iterative REconstruction (GENFIRE), for high-resolution 3D reconstruction from a limited number of 2D projections and demonstrates that it can produce superior results relative to several other popular tomographic reconstruction techniques.
Abstract: Tomography has made a radical impact on diverse fields ranging from the study of 3D atomic arrangements in matter to the study of human health in medicine. Despite its very diverse applications, the core of tomography remains the same, that is, a mathematical method must be implemented to reconstruct the 3D structure of an object from a number of 2D projections. Here, we present the mathematical implementation of a tomographic algorithm, termed GENeralized Fourier Iterative REconstruction (GENFIRE), for high-resolution 3D reconstruction from a limited number of 2D projections. GENFIRE first assembles a 3D Fourier grid with oversampling and then iterates between real and reciprocal space to search for a global solution that is concurrently consistent with the measured data and general physical constraints. The algorithm requires minimal human intervention and also incorporates angular refinement to reduce the tilt angle error. We demonstrate that GENFIRE can produce superior results relative to several other popular tomographic reconstruction techniques through numerical simulations and by experimentally reconstructing the 3D structure of a porous material and a frozen-hydrated marine cyanobacterium. Equipped with a graphical user interface, GENFIRE is freely available from our website and is expected to find broad applications across different disciplines.
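
The iterate-between-spaces idea can be sketched in a few lines: enforce the measured Fourier samples in reciprocal space and physical constraints in real space, repeatedly. A hedged NumPy toy (GENFIRE's oversampled grid assembly and angular refinement are omitted):

```python
# Hedged toy sketch of Fourier iterative reconstruction: alternate between
# data consistency in reciprocal space and positivity in real space.
import numpy as np

def fourier_iterative(measured_k, known_mask, n_iter=100):
    """measured_k: 3D Fourier grid assembled from the 2D projections.
    known_mask: boolean grid marking which Fourier samples were measured."""
    vol = np.zeros_like(measured_k, dtype=float)
    for _ in range(n_iter):
        k = np.fft.fftn(vol)
        k[known_mask] = measured_k[known_mask]   # stay consistent with the data
        vol = np.fft.ifftn(k).real
        vol[vol < 0] = 0.0                       # physical constraint: positivity
    return vol
```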

Journal ArticleDOI
TL;DR: In this article, a modification of the existing Hough transform for the automatic detection of cylinder parameters in point clouds is presented, where the relationship between cylinders is reconstructed to form a continuous axis network by tracking cylinder parameters obtained from earlier steps.

Posted Content
TL;DR: In this paper, the authors use foreground masks as weak supervision through a ray trace pooling layer that enables perspective projection and backpropagation, and constrain the 3D reconstruction to the manifold of unlabeled realistic 3D shapes that match mask observations.
Abstract: Supervised 3D reconstruction has witnessed significant progress through the use of deep neural networks. However, this increase in performance requires large scale annotations of 2D/3D data. In this paper, we explore inexpensive 2D supervision as an alternative for expensive 3D CAD annotation. Specifically, we use foreground masks as weak supervision through a ray trace pooling layer that enables perspective projection and backpropagation. Additionally, since the 3D reconstruction from masks is an ill-posed problem, we propose to constrain the 3D reconstruction to the manifold of unlabeled realistic 3D shapes that match mask observations. We demonstrate that learning a log-barrier solution to this constrained optimization problem resembles the GAN objective, enabling the use of existing tools for training GANs. We evaluate and analyze the manifold constrained reconstruction on various datasets for single and multi-view reconstruction of both synthetic and real images.

Journal ArticleDOI
TL;DR: It is argued that image segmentation and dense 3D reconstruction contribute valuable information to each other’s task, and a mathematical framework to formulate and solve a joint segmentation and dense reconstruction problem is proposed.
Abstract: Both image segmentation and dense 3D modeling from images represent an intrinsically ill-posed problem. Strong regularizers are therefore required to constrain the solutions from being ‘too noisy’. These priors generally yield overly smooth reconstructions and/or segmentations in certain regions while they fail to constrain the solution sufficiently in other areas. In this paper, we argue that image segmentation and dense 3D reconstruction contribute valuable information to each other’s task. As a consequence, we propose a mathematical framework to formulate and solve a joint segmentation and dense reconstruction problem. On the one hand knowing about the semantic class of the geometry provides information about the likelihood of the surface direction. On the other hand the surface direction provides information about the likelihood of the semantic class. Experimental results on several data sets highlight the advantages of our joint formulation. We show how weakly observed surfaces are reconstructed more faithfully compared to a geometry only reconstruction. Thanks to the volumetric nature of our formulation we also infer surfaces which cannot be directly observed for example the surface between the ground and a building. Finally, our method returns a semantic segmentation which is consistent across the whole dataset.

Journal ArticleDOI
TL;DR: The results showed that different specular targets with various shapes can be precisely reconstructed by the proposed method.

Proceedings ArticleDOI
01 Oct 2017
TL;DR: In this paper, a new approach for dense 3D reconstruction of a complex dynamic scene from two perspective frames is proposed, which reduces the dynamic reconstruction problem to a 3D jigsaw puzzle problem which takes pieces from an unorganized "soup of superpixels".
Abstract: This paper proposes a new approach for monocular dense 3D reconstruction of a complex dynamic scene from two perspective frames. By applying superpixel over-segmentation to the image, we model a generically dynamic (hence non-rigid) scene with a piecewise planar and rigid approximation. In this way, we reduce the dynamic reconstruction problem to a “3D jigsaw puzzle” problem which takes pieces from an unorganized “soup of superpixels”. We show that our method provides an effective solution to the inherent relative scale ambiguity in structure-from-motion. Since our method does not assume a template prior, or per-object segmentation, or knowledge about the rigidity of the dynamic scene, it is applicable to a wide range of scenarios. Extensive experiments on both synthetic and real monocular sequences demonstrate the superiority of our method compared with the state-of-the-art methods.
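
Under the piecewise planar approximation, per-pixel depth follows in closed form once a superpixel's plane is known. A minimal sketch (intrinsics and plane values are illustrative):

```python
# Hedged sketch: depth of a pixel lying on a 3D plane n^T X + d = 0, with
# X = z * K^{-1} [u, v, 1], so z = -d / (n^T K^{-1} [u, v, 1]).
import numpy as np

def plane_depth(u, v, n, d, K):
    """Depth z of pixel (u, v) on the plane with unit normal n and offset d."""
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    return -d / float(n @ ray)

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
# Fronto-parallel plane z = 2 seen through the principal point: depth 2.0.
print(plane_depth(320, 240, n=np.array([0.0, 0.0, 1.0]), d=-2.0, K=K))
```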

Journal ArticleDOI
TL;DR: This work presents a principled algorithm for dense depth estimation that combines defocus and correspondence metrics, and shows that combining all three sources of information: defocus, correspondence, and shading, outperforms state-of-the-art light-field depth estimation algorithms in multiple scenarios.
Abstract: Light-field cameras are quickly becoming commodity items, with consumer and industrial applications. They capture many nearby views simultaneously using a single image with a micro-lens array, thereby providing a wealth of cues for depth recovery: defocus, correspondence, and shading. In particular, apart from conventional image shading, one can refocus images after acquisition, and shift one's viewpoint within the sub-apertures of the main lens, effectively obtaining multiple views. We present a principled algorithm for dense depth estimation that combines defocus and correspondence metrics. We then extend our analysis to the additional cue of shading, using it to refine fine details in the shape. By exploiting an all-in-focus image, in which pixels are expected to exhibit angular coherence, we define an optimization framework that integrates photo consistency, depth consistency, and shading consistency. We show that combining all three sources of information: defocus, correspondence, and shading, outperforms state-of-the-art light-field depth estimation algorithms in multiple scenarios.
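
The defocus and correspondence cues can be sketched from a stack of angular samples sheared to a candidate depth. A hedged toy version (the shading cue and the refocusing warp itself are omitted):

```python
# Hedged sketch of two light-field depth cues: photo-consistent depths give
# low variance across views (correspondence) and a sharp refocused mean image
# (defocus, approximated here by the Laplacian magnitude).
import numpy as np

def depth_cues(angular_stack: np.ndarray):
    """angular_stack: (A, H, W) views already sheared to a candidate depth.
    Returns (defocus_response, correspondence_cost) per pixel."""
    mean = angular_stack.mean(axis=0)
    corr = angular_stack.var(axis=0)
    lap = (np.roll(mean, 1, 0) + np.roll(mean, -1, 0) +
           np.roll(mean, 1, 1) + np.roll(mean, -1, 1) - 4.0 * mean)
    return np.abs(lap), corr
```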

Journal ArticleDOI
TL;DR: The main elements of an integrated platform, which target tele-immersion and future 3D applications, are described in this paper, addressing the tasks of real-time capturing, robust 3D human shape/appearance reconstruction, and skeleton-based motion tracking.
Abstract: The latest developments in 3D capturing, processing, and rendering provide means to unlock novel 3D application pathways. The main elements of an integrated platform, which target tele-immersion and future 3D applications, are described in this paper, addressing the tasks of real-time capturing, robust 3D human shape/appearance reconstruction, and skeleton-based motion tracking. More specifically, initially, the details of a multiple RGB-depth (RGB-D) capturing system are given, along with a novel sensor calibration method. A robust, fast reconstruction method from multiple RGB-D streams is then proposed, based on an enhanced variation of the volumetric Fourier transform-based method, parallelized on the Graphics Processing Unit, and accompanied with an appropriate texture-mapping algorithm. On top of that, given the lack of relevant objective evaluation methods, a novel framework is proposed for the quantitative evaluation of real-time 3D reconstruction systems. Finally, a generic, multiple depth stream-based method for accurate real-time human skeleton tracking is proposed. Detailed experimental results with multi-Kinect2 data sets verify the validity of our arguments and the effectiveness of the proposed system and methodologies.

Journal ArticleDOI
TL;DR: This paper presents an approach for reconstructing large-scale outdoor scenes through monocular motion stereo at interactive frame rates on a modern mobile device, and is the first method to enable live reconstruction of such scenes on a mobile device.