
Showing papers on "3D reconstruction" published in 2016


Book ChapterDOI
Christopher Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, Silvio Savarese
08 Oct 2016
TL;DR: The authors propose 3D-R2N2, a 3D Recurrent Reconstruction Neural Network that learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data.
Abstract: Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2). The network learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data [13]. Our network takes in one or more images of an object instance from arbitrary viewpoints and outputs a reconstruction of the object in the form of a 3D occupancy grid. Unlike most of the previous works, our network does not require any image annotations or object class labels for training or testing. Our extensive experimental analysis shows that our reconstruction framework (i) outperforms the state-of-the-art methods for single view reconstruction, and (ii) enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).

1,336 citations
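
As a concrete illustration of the encode-recurse-decode flow described above, here is a minimal PyTorch sketch: a 2D CNN encodes each view, a recurrent unit fuses an arbitrary number of views, and a decoder emits a voxel occupancy grid. This is not the authors' architecture (3D-R2N2 uses a 3D convolutional LSTM and a 3D deconvolutional decoder); all layer sizes and names here are illustrative.

```python
import torch
import torch.nn as nn

class MiniR2N2(nn.Module):
    """Toy single/multi-view voxel predictor in the spirit of 3D-R2N2:
    a 2D CNN encodes each view, a GRU fuses views sequentially, and a
    decoder emits an occupancy grid. Sizes are illustrative."""
    def __init__(self, feat=256, grid=32):
        super().__init__()
        self.grid = grid
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat),
        )
        self.gru = nn.GRUCell(feat, feat)  # fuses an arbitrary number of views
        self.decoder = nn.Linear(feat, grid ** 3)  # logits over the voxel grid

    def forward(self, views):
        # views: (batch, n_views, 3, H, W) from arbitrary viewpoints
        b, n = views.shape[:2]
        h = views.new_zeros(b, self.gru.hidden_size)
        for i in range(n):
            h = self.gru(self.encoder(views[:, i]), h)
        logits = self.decoder(h).view(b, self.grid, self.grid, self.grid)
        return torch.sigmoid(logits)       # per-voxel occupancy probability

model = MiniR2N2()
occupancy = model(torch.rand(2, 3, 3, 64, 64))  # 2 objects, 3 views each
print(occupancy.shape)  # torch.Size([2, 32, 32, 32])
```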


Book
14 Dec 2016
TL;DR: Whether you want to build simple or sophisticated vision applications, Learning OpenCV is the book any developer or hobbyist needs to get started, with the help of hands-on exercises in each chapter.
Abstract: Learning OpenCV puts you in the middle of the rapidly expanding field of computer vision. Written by the creators of the free open source OpenCV library, this book introduces you to computer vision and demonstrates how you can quickly build applications that enable computers to "see" and make decisions based on that data. The second edition is updated to cover new features and changes in OpenCV 2.0, especially the C++ interface. Computer vision is everywhere: in security systems, manufacturing inspection systems, medical image analysis, unmanned aerial vehicles, and more. OpenCV provides an easy-to-use computer vision framework and a comprehensive library with more than 500 functions that can run vision code in real time. Whether you want to build simple or sophisticated vision applications, Learning OpenCV is the book any developer or hobbyist needs to get started, with the help of hands-on exercises in each chapter. This book includes:
- A thorough introduction to OpenCV
- Getting input from cameras
- Transforming images
- Segmenting images and shape matching
- Pattern recognition, including face detection
- Tracking and motion in 2 and 3 dimensions
- 3D reconstruction from stereo vision
- Machine learning algorithms

1,222 citations
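
To make the stereo-reconstruction topic above concrete, here is a small sketch using OpenCV's Python bindings (the book's examples are in C and C++, but the same API is exposed in Python). The file paths are placeholders, and the Q matrix entries (focal length, baseline) are stand-ins: a calibrated rig would take Q from cv2.stereoRectify.

```python
import cv2
import numpy as np

# Load a rectified stereo pair (paths are placeholders).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block-matching disparity; numDisparities must be divisible by 16.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# Reproject disparity to 3D with a perspective transform Q; these
# values are stand-ins so the snippet runs.
Q = np.float32([[1, 0, 0, -left.shape[1] / 2],
                [0, 1, 0, -left.shape[0] / 2],
                [0, 0, 0, 700],           # assumed focal length in pixels
                [0, 0, 1 / 0.1, 0]])      # assumed 10 cm baseline
points_3d = cv2.reprojectImageTo3D(disparity, Q)
print(points_3d.shape)  # (H, W, 3): one XYZ point per pixel
```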


Posted Content
TL;DR: The authors address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Because the ground-truth shape for an input image may be ambiguous, they design a novel and effective architecture, loss function and learning paradigm.
Abstract: Generation of 3D data by deep neural networks has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Along with this problem arises a unique and interesting issue: the ground-truth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in the ground truth, we design an architecture, loss function and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments, our system not only outperforms state-of-the-art methods on single-image 3D reconstruction benchmarks, but also shows strong performance on 3D shape completion and a promising ability to make multiple plausible predictions.

1,194 citations
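
Because a point cloud has no canonical ordering, this paper trains against set-level losses (Chamfer distance and Earth Mover's distance). A minimal NumPy version of the Chamfer term, with random stand-in clouds, looks like this:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N,3) and q (M,3):
    mean distance from each point to its nearest neighbor in the other set.
    A set-level loss like this makes point-cloud prediction trainable
    despite the absence of a canonical point ordering."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

pred = np.random.rand(1024, 3)  # predicted cloud (stand-in)
gt = np.random.rand(1024, 3)    # ground-truth cloud (stand-in)
print(chamfer_distance(pred, gt))
```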


Book ChapterDOI
08 Oct 2016
TL;DR: To the best of the authors' knowledge, this is the first algorithm provably able to track a general 6D motion along with reconstruction of arbitrary structure, including its intensity, and to reconstruct grayscale video, relying exclusively on event camera data.
Abstract: We propose a method which can perform real-time 3D reconstruction from a single hand-held event camera with no additional sensing, and works in unstructured scenes of which it has no prior knowledge. It is based on three decoupled probabilistic filters, each estimating 6-DoF camera motion, scene logarithmic (log) intensity gradient and scene inverse depth relative to a keyframe, and we build a real-time graph of these to track and model over an extended local workspace. We also upgrade the gradient estimate for each keyframe into an intensity image, allowing us to recover a real-time video-like intensity sequence with spatial and temporal super-resolution from the low bit-rate input event stream. To the best of our knowledge, this is the first algorithm provably able to track a general 6D motion along with reconstruction of arbitrary structure including its intensity and the reconstruction of grayscale video that exclusively relies on event camera data.

377 citations


Posted Content
Christopher Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, Silvio Savarese
TL;DR: The 3D-R2N2 reconstruction framework outperforms the state-of-the-art methods for single view reconstruction, and enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
Abstract: Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2). The network learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data. Our network takes in one or more images of an object instance from arbitrary viewpoints and outputs a reconstruction of the object in the form of a 3D occupancy grid. Unlike most of the previous works, our network does not require any image annotations or object class labels for training or testing. Our extensive experimental analysis shows that our reconstruction framework i) outperforms the state-of-the-art methods for single view reconstruction, and ii) enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).

370 citations


Book ChapterDOI
20 Nov 2016
TL;DR: A systematic comparison of the Kinect v1 and Kinect v2 is presented, investigating the accuracy and precision of the devices for their usage in the context of 3D reconstruction, SLAM or visual odometry.
Abstract: RGB-D cameras like the Microsoft Kinect have had a huge impact on recent research in computer vision as well as robotics. With the release of the Kinect v2, a new promising device is available, which will most probably be used in much future research. In this paper, we present a systematic comparison of the Kinect v1 and Kinect v2. We investigate the accuracy and precision of the devices for their usage in the context of 3D reconstruction, SLAM or visual odometry. For each device we rigorously identify and quantify factors influencing the depth images, such as temperature, the distance of the camera, or the scene color. Furthermore, we demonstrate errors like flying pixels and multipath interference. Our insights build the basis for incorporating or modeling the errors of the devices in follow-up algorithms for diverse applications.

198 citations


Proceedings ArticleDOI
16 May 2016
TL;DR: This work proposes and evaluates several formulations to quantify information gain for volumetric reconstruction of an object by a mobile robot equipped with a camera, including visibility likelihood and the likelihood of seeing new parts of the object.
Abstract: We consider the problem of next-best view selection for volumetric reconstruction of an object by a mobile robot equipped with a camera. Based on a probabilistic volumetric map that is built in real time, the robot can quantify the expected information gain from a set of discrete candidate views. We propose and evaluate several formulations to quantify this information gain for the volumetric reconstruction task, including visibility likelihood and the likelihood of seeing new parts of the object. These metrics are combined with the cost of robot movement in utility functions. The next best view is selected by optimizing these functions, aiming to maximize the likelihood of discovering new parts of the object. We evaluate the functions with simulated and real world experiments within a modular software system that is adaptable to other robotic platforms and reconstruction problems. We release our implementation open source.

139 citations
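
A hedged sketch of the core computation described above: scoring a candidate view by the Shannon entropy of the map voxels it would observe. The probabilities and visibility mask below are random stand-ins (in practice ray-casting through the volumetric map would produce the mask), and the paper's actual formulations additionally weight by visibility likelihood and movement cost.

```python
import numpy as np

def expected_information_gain(occ_probs, visible):
    """Shannon entropy summed over the voxels a candidate view would see.
    occ_probs: (N,) occupancy probabilities from the volumetric map;
    visible:   (N,) boolean mask of voxels visible from the candidate view.
    Unknown voxels (p ~ 0.5) carry the most entropy, so views that see
    many of them score highest."""
    p = np.clip(occ_probs[visible], 1e-6, 1 - 1e-6)
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return entropy.sum()

probs = np.random.rand(10000)        # stand-in probabilistic volumetric map
mask = np.random.rand(10000) < 0.2   # stand-in visibility for one view
print(expected_information_gain(probs, mask))
```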


Proceedings ArticleDOI
27 Jun 2016
TL;DR: An adaptive multi-resolution formulation of semantic 3D reconstruction which refines the reconstruction only in regions that are likely to contain a surface, exploiting the fact that both high spatial resolution and high numerical precision are only required in those regions.
Abstract: We propose an adaptive multi-resolution formulation of semantic 3D reconstruction. Given a set of images of a scene, semantic 3D reconstruction aims to densely reconstruct both the 3D shape of the scene and a segmentation into semantic object classes. Jointly reasoning about shape and class allows one to take into account class-specific shape priors (e.g., building walls should be smooth and vertical; conversely, smooth vertical surfaces are likely to be building walls), leading to improved reconstruction results. So far, semantic 3D reconstruction methods have been limited to small scenes and low resolution because of their large memory footprint and computational cost. To scale them up to large scenes, we propose a hierarchical scheme which refines the reconstruction only in regions that are likely to contain a surface, exploiting the fact that both high spatial resolution and high numerical precision are only required in those regions. Our scheme amounts to solving a sequence of convex optimizations while progressively removing constraints, in such a way that the energy, in each iteration, is the tightest possible approximation of the underlying energy at full resolution. In our experiments the method saves up to 98% memory and 95% computation time, without any loss of accuracy.

107 citations


Journal ArticleDOI
Fred J. Sigworth
TL;DR: The fundamental principles of this process and the steps in the overall workflow for single-particle image processing are considered, as well as the limits that image signal-to-noise ratio places on resolution and the distinguishing of heterogeneous particle populations.
Abstract: Single-particle reconstruction is the process by which 3D density maps are obtained from a set of low-dose cryo-EM images of individual macromolecules. This review considers the fundamental principles of this process and the steps in the overall workflow for single-particle image processing. Also considered are the limits that image signal-to-noise ratio places on resolution and the distinguishing of heterogeneous particle populations.

106 citations


Posted Content
TL;DR: This work introduces SceneNet RGB-D, expanding the previous work of SceneNet to enable large scale photorealistic rendering of indoor scene trajectories and provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentations, and object detection.
Abstract: We introduce SceneNet RGB-D, expanding the previous work of SceneNet to enable large scale photorealistic rendering of indoor scene trajectories. It provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection, and also for geometric computer vision problems such as optical flow, depth estimation, camera pose estimation, and 3D reconstruction. Random sampling permits virtually unlimited scene configurations, and here we provide a set of 5M rendered RGB-D images from over 15K trajectories in synthetic layouts with random but physically simulated object poses. Each layout also has random lighting, camera trajectories, and textures. The scale of this dataset is well suited for pre-training data-driven computer vision techniques from scratch with RGB-D inputs, which previously has been limited by relatively small labelled datasets in NYUv2 and SUN RGB-D. It also provides a basis for investigating 3D scene labelling tasks by providing perfect camera poses and depth data as proxy for a SLAM system. We host the dataset at this http URL

104 citations


Proceedings ArticleDOI
01 Oct 2016
TL;DR: This work presents a simple and effective method for removing noise and outliers from point sets generated by image-based 3D reconstruction techniques, which allows standard surface reconstruction methods to perform less smoothing and thus achieve higher quality surfaces with more features.
Abstract: Point sets generated by image-based 3D reconstruction techniques are often much noisier than those obtained using active techniques like laser scanning. Therefore, they pose greater challenges to the subsequent surface reconstruction (meshing) stage. We present a simple and effective method for removing noise and outliers from such point sets. Our algorithm uses the input images and corresponding depth maps to remove pixels which are geometrically or photometrically inconsistent with the colored surface implied by the input. This allows standard surface reconstruction methods (such as Poisson surface reconstruction) to perform less smoothing and thus achieve higher quality surfaces with more features. Our algorithm is efficient, easy to implement, and robust to varying amounts of noise. We demonstrate the benefits of our algorithm in combination with a variety of state-of-the-art depth and surface reconstruction methods.
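
A simplified sketch of the geometric-consistency idea just described, assuming known per-view intrinsics, extrinsics and depth maps: project each point into every view and keep it only if its depth agrees with a depth map somewhere. The paper's actual test also uses photometric consistency and removes inconsistent pixels rather than keeping consistent points; all names and the tolerance here are illustrative.

```python
import numpy as np

def consistent_points(points, depth_maps, cams, tol=0.01):
    """Keep points whose depth agrees with at least one depth map.
    points: (N,3) world coordinates; cams: list of (K, R, t) per view;
    depth_maps: list of (H,W) arrays."""
    keep = np.zeros(len(points), dtype=bool)
    for (K, R, t), depth in zip(cams, depth_maps):
        cam_pts = points @ R.T + t             # world -> camera
        z = cam_pts[:, 2]
        uv = cam_pts @ K.T
        uv = uv[:, :2] / uv[:, 2:3]            # perspective projection
        u, v = uv[:, 0].round().astype(int), uv[:, 1].round().astype(int)
        h, w = depth.shape
        valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        idx = np.where(valid)[0]
        agree = np.abs(depth[v[idx], u[idx]] - z[idx]) < tol * z[idx]
        keep[idx[agree]] = True                # consistent in >= 1 view
    return points[keep]

# Toy demo: a flat wall 2 m away; the second point floats off the wall.
K = np.array([[500.0, 0, 64], [0, 500.0, 64], [0, 0, 1]])
depth = np.full((128, 128), 2.0)
pts = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 3.0]])
print(consistent_points(pts, [depth], [(K, np.eye(3), np.zeros(3))]))
```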

Journal ArticleDOI
TL;DR: In this article, a combination of 2D image processing and 3D scene reconstruction is proposed to locate the 3D position of crack edges in concrete structures, where the precise crack information is obtained from the 2D images after noise elimination and crack detection using image processing techniques.
Abstract: Traditional crack assessment methods for concrete structures are time consuming and produce subjective results. The development of a means for automated assessment employing digital image processing offers high potential for practical implementation. However, two problems in two-dimensional (2D) image processing hinder direct application for crack assessment, as follows: (1) the image used for the digital image processing has to be taken perpendicular to the surface of the concrete structure, and (2) the working distance used in retrieving the imaging model has to be measured each time. To address these problems, this paper proposes a combination of 2D image processing and three-dimensional (3D) scene reconstruction to locate the 3D position of crack edges. In the proposed algorithm, first the precise crack information is obtained from the 2D images after noise elimination and crack detection using image processing techniques. Then, 3D reconstruction is conducted employing several crack images to ...

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A method that reconstructs individual 3D shapes from multiple single images of one person, judges their quality and then combines the best of all results, which is done separately for different regions of the face.
Abstract: Automated 3D reconstruction of faces from images is challenging if the image material is difficult in terms of pose, lighting, occlusions and facial expressions, and if the initial 2D feature positions are inaccurate or unreliable. We propose a method that reconstructs individual 3D shapes from multiple single images of one person, judges their quality and then combines the best of all results. This is done separately for different regions of the face. The core element of this algorithm and the focus of our paper is a quality measure that judges a reconstruction without information about the true shape. We evaluate different quality measures, develop a method for combining results, and present a complete processing pipeline for automated reconstruction.

Proceedings ArticleDOI
Hao Yang, Hui Zhang
27 Jun 2016
TL;DR: An algorithm that automatically infers a 3D room shape from a collection of partially oriented superpixel facets and line segments, and is efficient: the inference time for each panorama is less than 1 minute.
Abstract: We propose a method to recover the shape of a 3D room from a full-view indoor panorama. Our algorithm can automatically infer a 3D shape from a collection of partially oriented superpixel facets and line segments. The core part of the algorithm is a constraint graph, which includes lines and superpixels as vertices, and encodes their geometric relations as edges. A novel approach is proposed to perform 3D reconstruction based on the constraint graph by solving all the geometric constraints as constrained linear least-squares. The selected constraints used for reconstruction are identified using an occlusion detection method with a Markov random field. Experiments show that our method can recover room shapes that can not be addressed by previous approaches. Our method is also efficient, that is, the inference time for each panorama is less than 1 minute.
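
The reconstruction step above boils down to equality-constrained linear least squares. Here is a generic NumPy solver for that subproblem via the KKT system, with a toy constraint standing in for the paper's graph-derived ones:

```python
import numpy as np

def constrained_lstsq(A, b, C, d):
    """Minimize ||Ax - b||^2 subject to Cx = d by solving the KKT system
    [[2A^T A, C^T], [C, 0]] [x; lam] = [2A^T b; d]. This is the generic
    machinery behind solving geometric constraints as constrained linear
    least squares; the actual constraint rows in the paper come from the
    line/superpixel constraint graph."""
    n, m = A.shape[1], C.shape[0]
    kkt = np.block([[2 * A.T @ A, C.T],
                    [C, np.zeros((m, m))]])
    rhs = np.concatenate([2 * A.T @ b, d])
    return np.linalg.solve(kkt, rhs)[:n]

# Toy example: fit depths to noisy observations while forcing two of
# them (e.g., two superpixels sharing a wall) to be equal.
A, b = np.eye(3), np.array([1.0, 2.0, 2.5])
C, d = np.array([[0.0, 1.0, -1.0]]), np.array([0.0])
print(constrained_lstsq(A, b, C, d))  # second and third entries coincide
```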

Journal ArticleDOI
TL;DR: An efficient algorithm that automatically segments a static foreground object from highly cluttered background in light fields by exploiting high spatio-angular sampling on the order of thousands of input frames, such that new structures are revealed due to the increased coherence in the data.
Abstract: Precise object segmentation in image data is a fundamental problem with various applications, including 3D object reconstruction. We present an efficient algorithm to automatically segment a static foreground object from highly cluttered background in light fields. A key insight and contribution of our article is that a significant increase of the available input data can enable the design of novel, highly efficient approaches. In particular, the central idea of our method is to exploit high spatio-angular sampling on the order of thousands of input frames, for example, captured as a hand-held video, such that new structures are revealed due to the increased coherence in the data. We first show how purely local gradient information contained in slices of such a dense light field can be combined with information about the camera trajectory to make efficient estimates of the foreground and background. These estimates are then propagated to textureless regions using edge-aware filtering in the epipolar volume. Finally, we enforce global consistency in a gathering step to derive a precise object segmentation in both 2D and 3D space, which captures fine geometric details even in very cluttered scenes. The design of each of these steps is motivated by efficiency and scalability, allowing us to handle large, real-world video datasets on a standard desktop computer. We demonstrate how the results of our method can be used for considerably improving the speed and quality of image-based 3D reconstruction algorithms, and we compare our results to state-of-the-art segmentation and multiview stereo methods.

Journal ArticleDOI
TL;DR: A novel image stitching approach that can produce visually plausible panoramic images with input taken from different viewpoints by solving a global objective function consisting of alignment and a set of prior constraints is presented.
Abstract: We present a novel image stitching approach, which can produce visually plausible panoramic images with input taken from different viewpoints. Unlike previous methods, our approach allows wide baselines between images and non-planar scene structures. Instead of 3D reconstruction, we design a mesh-based framework to optimize alignment and regularity in 2D. By solving a global objective function consisting of alignment and a set of prior constraints, we construct panoramic images, which are locally as perspective as possible and yet nearly orthogonal in the global view. We improve composition and achieve good performance on misaligned areas. Experimental results on challenging data demonstrate the effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: A systematic survey of the state-of-the-art for tie-point generation in unordered image collections, including recent developments for very large image sets is attempted.
Abstract: Feature matching – i.e. finding corresponding point features in different images to serve as tie-points for camera orientation – is a fundamental step in photogrammetric 3D reconstruction. If the input image set is large and unordered, which is becoming increasingly common with the spread of photogrammetric recording to untrained user groups and even crowd-sourced geodata collection, the bottleneck of the reconstruction pipeline is the matching step, for two reasons. (i) Image acquisition without detailed viewpoint planning requires a denser set of viewpoints with larger overlaps, to ensure appropriate coverage of the object of interest and to guarantee sufficient redundancy for reliable reconstruction in spite of the unoptimised network geometry. As a consequence, there is a large number of images with overlapping viewfields, resulting in a more expensive matching step than for, say, a regular block geometry. (ii) In the absence of a carefully pre-planned recording sequence it is not even known which images overlap. One thus faces the even bigger challenge of determining which pairs of images can even have tie-points and should therefore be fed into the matching procedure. In this paper we attempt a systematic survey of the state of the art for tie-point generation in unordered image collections, including recent developments for very large image sets.
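
For a single image pair, the tie-point candidate generation the survey discusses can be sketched with stock OpenCV calls; the scale problem the paper analyses is deciding which of the O(n²) pairs to run this on. Paths and parameters below are placeholders:

```python
import cv2

# Tie-point candidates between one image pair (paths are placeholders).
img1 = cv2.imread("view_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_b.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming-distance matcher with Lowe's ratio test to discard ambiguous
# correspondences; survivors are tie-point candidates that would normally
# be verified geometrically (e.g., RANSAC on the fundamental matrix)
# before camera orientation.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
print(f"{len(good)} tie-point candidates")
```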

Posted Content
TL;DR: This work combines a state-of-the-art deep learning method and semi-dense Simultaneous Localisation and Mapping (SLAM) based on the video stream from a monocular camera to improve 2D semantic labelling over baseline single-frame predictions.
Abstract: The bundle of geometry and appearance in computer vision has proven to be a promising solution for robots across a wide variety of applications. Stereo cameras and RGB-D sensors are widely used to realise fast 3D reconstruction and trajectory tracking in a dense way. However, they lack the flexibility to switch seamlessly between differently scaled environments, i.e., indoor and outdoor scenes. In addition, semantic information is still hard to acquire in a 3D mapping. We address this challenge by combining a state-of-the-art deep learning method and semi-dense Simultaneous Localisation and Mapping (SLAM) based on the video stream from a monocular camera. In our approach, 2D semantic information is transferred to the 3D mapping via correspondence between connective keyframes with spatial consistency. There is no need to obtain a semantic segmentation for each frame in a sequence, so a reasonable computation time can be achieved. We evaluate our method on indoor/outdoor datasets and achieve an improvement in the 2D semantic labelling over baseline single-frame predictions.

Journal ArticleDOI
Yuchen Deng, Yu Chen, Yan Zhang, Shengliu Wang, Fa Zhang, Fei Sun
TL;DR: An algorithm called Iterative Compressed-sensing Optimized Non-uniform fast Fourier transform reconstruction (ICON), based on the theory of compressed sensing and the assumption of sparsity of biological specimens, which can significantly restore the missing information in comparison with other reconstruction algorithms.

Posted Content
TL;DR: 3DMatch is introduced, a data-driven local feature learner that jointly learns a geometric feature representation and an associated metric function from a large collection of real-world scanning data and concurrently supports deep learning with convolutional neural networks directly in 3D.
Abstract: Establishing correspondences between 3D geometries is essential to a large variety of graphics and vision applications, including 3D reconstruction, localization, and shape matching. Despite significant progress, geometric matching on real-world 3D data is still a challenging task due to the noisy, low-resolution, and incomplete nature of scanning data. These difficulties limit the performance of current state-of-the-art methods, which are typically based on histograms over geometric properties. In this paper, we introduce 3DMatch, a data-driven local feature learner that jointly learns a geometric feature representation and an associated metric function from a large collection of real-world scanning data. We represent 3D geometry using accumulated distance fields around key-point locations. This representation is suited to handle noisy and partial scanning data, and concurrently supports deep learning with convolutional neural networks directly in 3D. To train the networks, we propose a way to automatically generate correspondence labels for deep learning by leveraging existing RGB-D reconstruction algorithms. In our results, we demonstrate that we are able to outperform state-of-the-art approaches by a significant margin. In addition, we show the robustness of our descriptor in a purely geometric sparse bundle adjustment pipeline for 3D reconstruction.
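
The accumulated-distance representation can be sketched as a truncated distance field (TDF) sampled on a voxel patch around a keypoint, the kind of local volume a 3D ConvNet can consume. The snippet below is an illustrative NumPy/SciPy version; grid size, patch size and truncation are assumptions, not necessarily the paper's values.

```python
import numpy as np
from scipy.spatial import cKDTree

def local_tdf(points, keypoint, grid=30, size=0.3, trunc=0.05):
    """Truncated distance field on a (grid^3) voxel patch centered at
    a keypoint. points: (N,3) scan; size: patch edge length in meters."""
    half = size / 2.0
    axes = np.linspace(-half, half, grid)
    gx, gy, gz = np.meshgrid(axes, axes, axes, indexing="ij")
    centers = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3) + keypoint
    dist, _ = cKDTree(points).query(centers)     # nearest surface distance
    tdf = 1.0 - np.clip(dist / trunc, 0.0, 1.0)  # 1 at surface, 0 beyond trunc
    return tdf.reshape(grid, grid, grid)

scan = np.random.rand(5000, 3)         # stand-in point cloud
print(local_tdf(scan, scan[0]).shape)  # (30, 30, 30)
```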

Book ChapterDOI
17 Oct 2016
TL;DR: In this article, the authors track the endoscope location inside the surgical scene and provide 3D reconstruction, in real-time, from the sole input of the image sequence captured by the monocular endoscope.
Abstract: We aim to track the endoscope location inside the surgical scene and provide 3D reconstruction, in real time, from the sole input of the image sequence captured by the monocular endoscope. This information offers new possibilities for developing surgical navigation and augmented reality applications. The main benefit of this approach is the lack of extra tracking elements, which can disturb the surgeon's performance in the clinical routine. Our first contribution is to exploit ORBSLAM, one of the best performing monocular SLAM algorithms, to estimate both the endoscope location and the 3D structure of the surgical scene. However, the reconstructed 3D map poorly describes textureless soft organ surfaces such as the liver. Our second contribution is to extend ORBSLAM to reconstruct a semi-dense map of soft organs. Experimental results on in-vivo pigs show robust endoscope tracking even with organ deformations and partial instrument occlusions. They also show the reconstruction density and accuracy against a ground-truth surface obtained from CT.

Journal ArticleDOI
TL;DR: This article shows that strong periodic assumptions on the coefficients can be used to define an efficient and accurate algorithm for estimating periodic motion such as walking patterns and proposes a novel regularization term based on temporal bone length constancy for non-periodic motion.
Abstract: This article tackles the problem of estimating non-rigid human 3D shape and motion from image sequences taken by uncalibrated cameras. Similar to other state-of-the-art solutions, we factorize 2D observations into camera parameters, base poses and mixing coefficients. Existing methods require sufficient camera motion during the sequence to achieve a correct 3D reconstruction. To obtain convincing 3D reconstructions from arbitrary camera motion, our method is based on base poses trained a priori. We show that strong periodic assumptions on the coefficients can be used to define an efficient and accurate algorithm for estimating periodic motion such as walking patterns. For the extension to non-periodic motion we propose a novel regularization term based on temporal bone length constancy. In contrast to other works, the proposed method does not use a predefined skeleton or anthropometric constraints and can handle arbitrary camera motion. We achieve convincing 3D reconstructions, even under the influence of noise and occlusions. Multiple experiments based on a 3D error metric demonstrate the stability of the proposed method. Compared to other state-of-the-art methods our algorithm shows a significant improvement.
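
The bone-length constancy regularizer admits a compact sketch: penalize the temporal variation of each bone's length across frames. The exact functional form in the paper may differ; the joints, bones and sizes below are illustrative.

```python
import numpy as np

def bone_length_penalty(joints, bones):
    """Temporal bone-length constancy: penalize variation of each bone's
    length over frames. joints: (T, J, 3) 3D joint positions over T frames;
    bones: list of (parent, child) joint index pairs."""
    penalty = 0.0
    for a, b in bones:
        lengths = np.linalg.norm(joints[:, a] - joints[:, b], axis=1)  # (T,)
        penalty += np.var(lengths)  # zero iff the bone length is constant
    return penalty

T, J = 50, 4
joints = np.random.rand(T, J, 3)
bones = [(0, 1), (1, 2), (2, 3)]  # a toy kinematic chain
print(bone_length_penalty(joints, bones))
```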

Journal ArticleDOI
TL;DR: An efficient pipeline based on color enhancement, image denoising, color-to-gray conversion and image content enrichment is presented, which proves how an effective image pre-processing can improve the automated orientation procedure and dense 3D point cloud reconstruction, even in the case of poor texture scenarios.
Abstract: Automated image-based 3D reconstruction methods are increasingly pervading our 3D modeling applications. Fully automated solutions give the impression that from a sample of randomly acquired images we can derive quite impressive visual 3D models. Although the level of automation is reaching very high standards, image quality is a fundamental prerequisite for producing successful and photo-realistic 3D products, in particular when dealing with large datasets of images. This article presents an efficient pipeline based on color enhancement, image denoising, color-to-gray conversion and image content enrichment. The pipeline stems from an analysis of various state-of-the-art algorithms and aims to adjust the most promising methods, giving solutions to typical failure causes. The assessment proves how effective image pre-processing, which considers the entire image dataset, can improve the automated orientation procedure and dense 3D point cloud reconstruction, even in the case of poor-texture scenarios.
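
One plausible instantiation of such a pre-processing chain, using stock OpenCV operators, is sketched below. The paper evaluates several competing algorithms per stage and tunes them on the whole dataset, so treat the specific operators, parameters and path as assumptions.

```python
import cv2

def preprocess(path):
    """Illustrative enhancement -> denoising -> gray conversion chain
    in the spirit of the pipeline described above."""
    img = cv2.imread(path)
    # Denoise while preserving the edges that feature matching relies on.
    img = cv2.fastNlMeansDenoisingColored(img, None, 5, 5, 7, 21)
    # Contrast enhancement on the luminance channel only (CLAHE).
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
    img = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)
    # Color-to-gray conversion for the feature extraction stage.
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

gray = preprocess("dataset/IMG_0001.jpg")  # path is a placeholder
```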

Journal ArticleDOI
TL;DR: Effective range-computation and confidence-estimation methods are proposed to handle the problems of textureless regions, outliers and detail loss, yielding a robust model that outputs an accurate and dense reconstruction from an input of multiple images captured by a normal camera.
Abstract: Although the stereo matching problem has been extensively studied during the past decades, automatically computing a dense 3D reconstruction from several multiple views is still a difficult task owing to the problems of textureless regions, outliers, detail loss, and various other factors. In this paper, these difficult problems are handled effectively by a robust model that outputs an accurate and dense reconstruction as the final result from an input of multiple images captured by a normal camera. First, the positions of the camera and sparse 3D points are estimated by a structure-from-motion algorithm and we compute the range map with a confidence estimation for each image in our approach. Then all the range maps are integrated into a fine point cloud data set. In the final step we use a Poisson reconstruction algorithm to finish the reconstruction. The major contributions of the work lie in the following points: effective range-computation and confidence-estimation methods are proposed to handle the problems of textureless regions, outliers and detail loss. Then, the range maps are merged into the point cloud data in terms of a confidence-estimation. Finally, Poisson reconstruction algorithm completes the dense mesh. In addition, texture mapping is also implemented as a post-processing work for obtaining good visual effects. Experimental results are presented to demonstrate the effectiveness of the proposed approach.

Journal ArticleDOI
TL;DR: The data show that the mean reprojection error should not always be used to evaluate the performance of the calibration process and that a low quality of feature detection does not always lead to a high mean reconstruction error.
Abstract: For stereoscopic systems designed for metrology applications, the accuracy of camera calibration dictates the precision of the 3D reconstruction. In this paper, the impact of various calibration conditions on the reconstruction quality is studied using a virtual camera calibration technique and the design file of a commercially available lens. This technique enables the study of the statistical behavior of the reconstruction task in selected calibration conditions. The data show that the mean reprojection error should not always be used to evaluate the performance of the calibration process and that a low quality of feature detection does not always lead to a high mean reconstruction error.
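
For reference, the mean reprojection error in question is straightforward to compute once intrinsics and extrinsics are estimated. The sketch below uses a plain NumPy pinhole model with synthetic data; the camera parameters are illustrative.

```python
import numpy as np

def mean_reprojection_error(obj_pts, img_pts, K, R, t):
    """Mean reprojection error for one view of a calibration target:
    project the known 3D points with the estimated intrinsics/extrinsics
    and average the pixel distance to the detected features. This is the
    score the paper argues should not be trusted blindly, since a small
    value does not guarantee a small 3D reconstruction error."""
    cam = obj_pts @ R.T + t            # world -> camera
    proj = cam @ K.T
    proj = proj[:, :2] / proj[:, 2:3]  # perspective division
    return np.linalg.norm(proj - img_pts, axis=1).mean()

# Toy check with a perfect camera: the error should be ~0.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.array([0.0, 0.0, 2.0])
obj = np.random.rand(20, 3)
img = (obj @ R.T + t) @ K.T
img = img[:, :2] / img[:, 2:3]
print(mean_reprojection_error(obj, img, K, R, t))  # ~0.0
```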

Journal ArticleDOI
TL;DR: This work argues for the importance of the interaction between recognition, reconstruction and re-organization, proposing it as a unifying framework for computer vision, and presents pipelined versions of two systems, one for RGB-D images and another for RGB images, which produce rich 3D scene interpretations in this framework.

Book ChapterDOI
03 Jul 2016
TL;DR: A two-layer approach for visual odometry with stereo cameras, which runs in real-time and combines feature-based matching with semi-dense direct image alignment, which is faster than state-of-the-art methods without losing accuracy.
Abstract: Visual motion estimation is challenging, due to high data rates, fast camera motions, featureless or repetitive environments, uneven lighting, and many other issues. In this work, we propose a two-layer approach for visual odometry with stereo cameras, which runs in real-time and combines feature-based matching with semi-dense direct image alignment. Our method initializes semi-dense depth estimation, which is computationally expensive, from motion that is tracked by a fast but robust feature point-based method. By that, we are not only able to efficiently estimate the pose of the camera with a high frame rate, but also to reconstruct the 3D structure of the environment at image gradients, which is useful, e.g., for mapping and obstacle avoidance. Experiments on datasets captured by a micro aerial vehicle (MAV) show that our approach is faster than state-of-the-art methods without losing accuracy. Moreover, our combined approach achieves promising results on the KITTI dataset, which is very challenging for direct methods, because of the low frame rate in conjunction with fast motion.

Book ChapterDOI
31 Oct 2016
TL;DR: This paper proposes a number of testing scenarios using different lighting conditions, camera positions and image acquisition methods for the best in-depth analysis and discusses the results, the overall performance and the problems present in each software.
Abstract: Structure from Motion 3D reconstruction has become widely used in recent years in a number of fields such as industrial surface inspection, archeology, cultural heritage preservation and geomapping. A number of software solutions have been released using variations of this technique. In this paper we analyse the state of the art of these software applications, by comparing the resultant 3D meshes qualitatively and quantitatively. We propose a number of testing scenarios using different lighting conditions, camera positions and image acquisition methods for the best in-depth analysis and discuss the results, the overall performance and the problems present in each software. We employ distance and roughness metrics for evaluating the final reconstruction results.

Posted Content
TL;DR: In this article, a convex relaxation is proposed for dense semantic 3D reconstruction, which uses a data term that is defined as potentials over viewing rays, combined with continuous surface area penalization.
Abstract: We propose an approach for dense semantic 3D reconstruction which uses a data term that is defined as potentials over viewing rays, combined with continuous surface area penalization. Our formulation is a convex relaxation which we augment with a crucial non-convex constraint that ensures exact handling of visibility. To tackle the non-convex minimization problem, we propose a majorize-minimize type strategy which converges to a critical point. We demonstrate the benefits of using the non-convex constraint experimentally. For the geometry-only case, we set a new state of the art on two datasets of the commonly used Middlebury multi-view stereo benchmark. Moreover, our general-purpose formulation directly reconstructs thin objects, which are usually treated with specialized algorithms. A qualitative evaluation on the dense semantic 3D reconstruction task shows that we improve significantly over previous methods.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work presents the first approach to simultaneously reconstructing the 3D positions and normals of the object's surface at both refraction locations under the assumption that the rays refract only twice when traveling through the object.
Abstract: Estimating the shape of transparent and refractive objects is one of the few open problems in 3D reconstruction. Under the assumption that the rays refract only twice when traveling through the object, we present the first approach to simultaneously reconstructing the 3D positions and normals of the object's surface at both refraction locations. Our acquisition setup requires only two cameras and one monitor, which serves as the light source. After acquiring the ray-ray correspondences between each camera and the monitor, we solve an optimization function which enforces a new position-normal consistency constraint. That is, the 3D positions of surface points shall agree with the normals required to refract the rays under Snell's law. Experimental results using both synthetic and real data demonstrate the robustness and accuracy of the proposed approach.
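
The position-normal consistency constraint above rests on the vector form of Snell's law. A small NumPy sketch of the refraction operation follows (not the paper's optimization); the incidence direction and refractive indices are illustrative.

```python
import numpy as np

def refract(d, n, eta1, eta2):
    """Refract unit direction d at a surface with unit normal n (pointing
    toward the incoming ray), going from refractive index eta1 to eta2,
    per Snell's law in vector form. Returns None on total internal
    reflection. This is the physical relation behind the paper's
    position-normal consistency check."""
    r = eta1 / eta2
    cos_i = -np.dot(n, d)
    sin2_t = r * r * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # total internal reflection
    cos_t = np.sqrt(1.0 - sin2_t)
    return r * d + (r * cos_i - cos_t) * n

d = np.array([0.0, -np.sin(np.pi / 6), -np.cos(np.pi / 6)])  # 30 deg incidence
n = np.array([0.0, 0.0, 1.0])
print(refract(d, n, 1.0, 1.5))  # bends toward the normal entering glass
```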