
Showing papers on "3D reconstruction" published in 2016


Book ChapterDOI
Christopher Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, Silvio Savarese
08 Oct 2016
TL;DR: The authors propose 3D-R2N2, a 3D Recurrent Reconstruction Neural Network that learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data.
Abstract: Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2). The network learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data [13]. Our network takes in one or more images of an object instance from arbitrary viewpoints and outputs a reconstruction of the object in the form of a 3D occupancy grid. Unlike most of the previous works, our network does not require any image annotations or object class labels for training or testing. Our extensive experimental analysis shows that our reconstruction framework (i) outperforms the state-of-the-art methods for single view reconstruction, and (ii) enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).

1,336 citations
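
As a concrete illustration of the encode-recurse-decode flow described above, here is a minimal PyTorch sketch: a 2D CNN encodes each view, a recurrent unit fuses an arbitrary number of views, and a decoder emits a voxel occupancy grid. This is not the authors' architecture (3D-R2N2 uses a 3D convolutional LSTM and a 3D deconvolutional decoder); all layer sizes and names here are illustrative.

```python
import torch
import torch.nn as nn

class MiniR2N2(nn.Module):
    """Toy single/multi-view voxel predictor in the spirit of 3D-R2N2:
    a 2D CNN encodes each view, a GRU fuses views sequentially, and a
    decoder emits an occupancy grid. Sizes are illustrative."""
    def __init__(self, feat=256, grid=32):
        super().__init__()
        self.grid = grid
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(128, feat),
        )
        self.gru = nn.GRUCell(feat, feat)  # fuses an arbitrary number of views
        self.decoder = nn.Linear(feat, grid ** 3)  # logits over the voxel grid

    def forward(self, views):
        # views: (batch, n_views, 3, H, W) from arbitrary viewpoints
        b, n = views.shape[:2]
        h = views.new_zeros(b, self.gru.hidden_size)
        for i in range(n):
            h = self.gru(self.encoder(views[:, i]), h)
        logits = self.decoder(h).view(b, self.grid, self.grid, self.grid)
        return torch.sigmoid(logits)       # per-voxel occupancy probability

model = MiniR2N2()
occupancy = model(torch.rand(2, 3, 3, 64, 64))  # 2 objects, 3 views each
print(occupancy.shape)  # torch.Size([2, 32, 32, 32])
```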


Book
14 Dec 2016
TL;DR: Whether you want to build simple or sophisticated vision applications, Learning OpenCV is the book any developer or hobbyist needs to get started, with the help of hands-on exercises in each chapter.
Abstract: Learning OpenCV puts you in the middle of the rapidly expanding field of computer vision. Written by the creators of the free open source OpenCV library, this book introduces you to computer vision and demonstrates how you can quickly build applications that enable computers to "see" and make decisions based on that data. The second edition is updated to cover new features and changes in OpenCV 2.0, especially the C++ interface. Computer vision is everywhere: in security systems, manufacturing inspection systems, medical image analysis, unmanned aerial vehicles, and more. OpenCV provides an easy-to-use computer vision framework and a comprehensive library with more than 500 functions that can run vision code in real time. Whether you want to build simple or sophisticated vision applications, Learning OpenCV is the book any developer or hobbyist needs to get started, with the help of hands-on exercises in each chapter. This book includes:
- A thorough introduction to OpenCV
- Getting input from cameras
- Transforming images
- Segmenting images and shape matching
- Pattern recognition, including face detection
- Tracking and motion in 2 and 3 dimensions
- 3D reconstruction from stereo vision
- Machine learning algorithms

1,222 citations
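
To make the stereo-reconstruction topic above concrete, here is a small sketch using OpenCV's Python bindings (the book's examples are in C and C++, but the same API is exposed in Python). The file paths are placeholders, and the Q matrix entries (focal length, baseline) are stand-ins: a calibrated rig would take Q from cv2.stereoRectify.

```python
import cv2
import numpy as np

# Load a rectified stereo pair (paths are placeholders).
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block-matching disparity; numDisparities must be divisible by 16.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0

# Reproject disparity to 3D with a perspective transform Q; these
# values are stand-ins so the snippet runs.
Q = np.float32([[1, 0, 0, -left.shape[1] / 2],
                [0, 1, 0, -left.shape[0] / 2],
                [0, 0, 0, 700],           # assumed focal length in pixels
                [0, 0, 1 / 0.1, 0]])      # assumed 10 cm baseline
points_3d = cv2.reprojectImageTo3D(disparity, Q)
print(points_3d.shape)  # (H, W, 3): one XYZ point per pixel
```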


Posted Content
TL;DR: The authors address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Because the ground-truth shape for an input image may be ambiguous, they design a novel and effective architecture, loss function and learning paradigm.
Abstract: Generation of 3D data by deep neural networks has been attracting increasing attention in the research community. The majority of extant works resort to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. Along with this problem arises a unique and interesting issue: the ground-truth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in the ground truth, we design an architecture, loss function and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments, our system not only outperforms state-of-the-art methods on single-image 3D reconstruction benchmarks, but also shows strong performance on 3D shape completion and a promising ability to make multiple plausible predictions.

1,194 citations
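
Because a point cloud has no canonical ordering, this paper trains against set-level losses (Chamfer distance and Earth Mover's distance). A minimal NumPy version of the Chamfer term, with random stand-in clouds, looks like this:

```python
import numpy as np

def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point sets p (N,3) and q (M,3):
    mean distance from each point to its nearest neighbor in the other set.
    A set-level loss like this makes point-cloud prediction trainable
    despite the absence of a canonical point ordering."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

pred = np.random.rand(1024, 3)  # predicted cloud (stand-in)
gt = np.random.rand(1024, 3)    # ground-truth cloud (stand-in)
print(chamfer_distance(pred, gt))
```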


Book ChapterDOI
08 Oct 2016
TL;DR: To the best of the authors' knowledge, this is the first algorithm provably able to track a general 6D motion along with reconstruction of arbitrary structure, including its intensity, and to reconstruct grayscale video, relying exclusively on event camera data.
Abstract: We propose a method which can perform real-time 3D reconstruction from a single hand-held event camera with no additional sensing, and works in unstructured scenes of which it has no prior knowledge. It is based on three decoupled probabilistic filters, each estimating 6-DoF camera motion, scene logarithmic (log) intensity gradient and scene inverse depth relative to a keyframe, and we build a real-time graph of these to track and model over an extended local workspace. We also upgrade the gradient estimate for each keyframe into an intensity image, allowing us to recover a real-time video-like intensity sequence with spatial and temporal super-resolution from the low bit-rate input event stream. To the best of our knowledge, this is the first algorithm provably able to track a general 6D motion along with reconstruction of arbitrary structure including its intensity and the reconstruction of grayscale video that exclusively relies on event camera data.

377 citations


Posted Content
Christopher Choy, Danfei Xu, JunYoung Gwak, Kevin Chen, Silvio Savarese
TL;DR: The 3D-R2N2 reconstruction framework outperforms the state-of-the-art methods for single view reconstruction, and enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).
Abstract: Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2). The network learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data. Our network takes in one or more images of an object instance from arbitrary viewpoints and outputs a reconstruction of the object in the form of a 3D occupancy grid. Unlike most of the previous works, our network does not require any image annotations or object class labels for training or testing. Our extensive experimental analysis shows that our reconstruction framework i) outperforms the state-of-the-art methods for single view reconstruction, and ii) enables the 3D reconstruction of objects in situations when traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).

370 citations


Book ChapterDOI
20 Nov 2016
TL;DR: A systematic comparison of the Kinect v1 and Kinect v2 is presented, investigating the accuracy and precision of the devices for their usage in the context of 3D reconstruction, SLAM or visual odometry.
Abstract: RGB-D cameras like the Microsoft Kinect have had a huge impact on recent research in computer vision as well as robotics. With the release of the Kinect v2, a new promising device is available, which will most probably be used in much future research. In this paper, we present a systematic comparison of the Kinect v1 and Kinect v2. We investigate the accuracy and precision of the devices for their usage in the context of 3D reconstruction, SLAM or visual odometry. For each device we rigorously identify and quantify factors influencing the depth images, such as temperature, the distance of the camera, or the scene color. Furthermore, we demonstrate errors like flying pixels and multipath interference. Our insights build the basis for incorporating or modeling the errors of the devices in follow-up algorithms for diverse applications.

198 citations


Proceedings ArticleDOI
16 May 2016
TL;DR: This work proposes and evaluates several formulations to quantify information gain for volumetric reconstruction of an object by a mobile robot equipped with a camera, including visibility likelihood and the likelihood of seeing new parts of the object.
Abstract: We consider the problem of next-best view selection for volumetric reconstruction of an object by a mobile robot equipped with a camera. Based on a probabilistic volumetric map that is built in real time, the robot can quantify the expected information gain from a set of discrete candidate views. We propose and evaluate several formulations to quantify this information gain for the volumetric reconstruction task, including visibility likelihood and the likelihood of seeing new parts of the object. These metrics are combined with the cost of robot movement in utility functions. The next best view is selected by optimizing these functions, aiming to maximize the likelihood of discovering new parts of the object. We evaluate the functions with simulated and real world experiments within a modular software system that is adaptable to other robotic platforms and reconstruction problems. We release our implementation open source.

139 citations
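
A hedged sketch of the core computation described above: scoring a candidate view by the Shannon entropy of the map voxels it would observe. The probabilities and visibility mask below are random stand-ins (in practice ray-casting through the volumetric map would produce the mask), and the paper's actual formulations additionally weight by visibility likelihood and movement cost.

```python
import numpy as np

def expected_information_gain(occ_probs, visible):
    """Shannon entropy summed over the voxels a candidate view would see.
    occ_probs: (N,) occupancy probabilities from the volumetric map;
    visible:   (N,) boolean mask of voxels visible from the candidate view.
    Unknown voxels (p ~ 0.5) carry the most entropy, so views that see
    many of them score highest."""
    p = np.clip(occ_probs[visible], 1e-6, 1 - 1e-6)
    entropy = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return entropy.sum()

probs = np.random.rand(10000)        # stand-in probabilistic volumetric map
mask = np.random.rand(10000) < 0.2   # stand-in visibility for one view
print(expected_information_gain(probs, mask))
```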


Proceedings ArticleDOI
27 Jun 2016
TL;DR: An adaptive multi-resolution formulation of semantic 3D reconstruction which refines the reconstruction only in regions that are likely to contain a surface, exploiting the fact that both high spatial resolution and high numerical precision are only required in those regions.
Abstract: We propose an adaptive multi-resolution formulation of semantic 3D reconstruction. Given a set of images of a scene, semantic 3D reconstruction aims to densely reconstruct both the 3D shape of the scene and a segmentation into semantic object classes. Jointly reasoning about shape and class allows one to take into account class-specific shape priors (e.g., building walls should be smooth and vertical; conversely, smooth vertical surfaces are likely to be building walls), leading to improved reconstruction results. So far, semantic 3D reconstruction methods have been limited to small scenes and low resolution because of their large memory footprint and computational cost. To scale them up to large scenes, we propose a hierarchical scheme which refines the reconstruction only in regions that are likely to contain a surface, exploiting the fact that both high spatial resolution and high numerical precision are only required in those regions. Our scheme amounts to solving a sequence of convex optimizations while progressively removing constraints, in such a way that the energy, in each iteration, is the tightest possible approximation of the underlying energy at full resolution. In our experiments the method saves up to 98% memory and 95% computation time, without any loss of accuracy.

107 citations


Journal ArticleDOI
Fred J. Sigworth
TL;DR: The fundamental principles of this process and the steps in the overall workflow for single-particle image processing are considered, as well as the limits that image signal-to-noise ratio places on resolution and the distinguishing of heterogeneous particle populations.
Abstract: Single-particle reconstruction is the process by which 3D density maps are obtained from a set of low-dose cryo-EM images of individual macromolecules. This review considers the fundamental principles of this process and the steps in the overall workflow for single-particle image processing. Also considered are the limits that image signal-to-noise ratio places on resolution and the distinguishing of heterogeneous particle populations.

106 citations


Posted Content
TL;DR: This work introduces SceneNet RGB-D, expanding the previous work of SceneNet to enable large scale photorealistic rendering of indoor scene trajectories and provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentations, and object detection.
Abstract: We introduce SceneNet RGB-D, expanding the previous work of SceneNet to enable large scale photorealistic rendering of indoor scene trajectories. It provides pixel-perfect ground truth for scene understanding problems such as semantic segmentation, instance segmentation, and object detection, and also for geometric computer vision problems such as optical flow, depth estimation, camera pose estimation, and 3D reconstruction. Random sampling permits virtually unlimited scene configurations, and here we provide a set of 5M rendered RGB-D images from over 15K trajectories in synthetic layouts with random but physically simulated object poses. Each layout also has random lighting, camera trajectories, and textures. The scale of this dataset is well suited for pre-training data-driven computer vision techniques from scratch with RGB-D inputs, which previously has been limited by relatively small labelled datasets in NYUv2 and SUN RGB-D. It also provides a basis for investigating 3D scene labelling tasks by providing perfect camera poses and depth data as proxy for a SLAM system. We host the dataset at this http URL

104 citations


Proceedings ArticleDOI
01 Oct 2016
TL;DR: This work presents a simple and effective method for removing noise and outliers from point sets generated by image-based 3D reconstruction techniques, which allows standard surface reconstruction methods to perform less smoothing and thus achieve higher quality surfaces with more features.
Abstract: Point sets generated by image-based 3D reconstruction techniques are often much noisier than those obtained using active techniques like laser scanning. Therefore, they pose greater challenges to the subsequent surface reconstruction (meshing) stage. We present a simple and effective method for removing noise and outliers from such point sets. Our algorithm uses the input images and corresponding depth maps to remove pixels which are geometrically or photometrically inconsistent with the colored surface implied by the input. This allows standard surface reconstruction methods (such as Poisson surface reconstruction) to perform less smoothing and thus achieve higher quality surfaces with more features. Our algorithm is efficient, easy to implement, and robust to varying amounts of noise. We demonstrate the benefits of our algorithm in combination with a variety of state-of-the-art depth and surface reconstruction methods.
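
A simplified sketch of the geometric-consistency idea just described, assuming known per-view intrinsics, extrinsics and depth maps: project each point into every view and keep it only if its depth agrees with a depth map somewhere. The paper's actual test also uses photometric consistency and removes inconsistent pixels rather than keeping consistent points; all names and the tolerance here are illustrative.

```python
import numpy as np

def consistent_points(points, depth_maps, cams, tol=0.01):
    """Keep points whose depth agrees with at least one depth map.
    points: (N,3) world coordinates; cams: list of (K, R, t) per view;
    depth_maps: list of (H,W) arrays."""
    keep = np.zeros(len(points), dtype=bool)
    for (K, R, t), depth in zip(cams, depth_maps):
        cam_pts = points @ R.T + t             # world -> camera
        z = cam_pts[:, 2]
        uv = cam_pts @ K.T
        uv = uv[:, :2] / uv[:, 2:3]            # perspective projection
        u, v = uv[:, 0].round().astype(int), uv[:, 1].round().astype(int)
        h, w = depth.shape
        valid = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        idx = np.where(valid)[0]
        agree = np.abs(depth[v[idx], u[idx]] - z[idx]) < tol * z[idx]
        keep[idx[agree]] = True                # consistent in >= 1 view
    return points[keep]

# Toy demo: a flat wall 2 m away; the second point floats off the wall.
K = np.array([[500.0, 0, 64], [0, 500.0, 64], [0, 0, 1]])
depth = np.full((128, 128), 2.0)
pts = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 3.0]])
print(consistent_points(pts, [depth], [(K, np.eye(3), np.zeros(3))]))
```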

Journal ArticleDOI
TL;DR: In this article, a combination of 2D image processing and 3D scene reconstruction is proposed to locate the 3D position of crack edges in concrete structures, where the precise crack information is obtained from the 2D images after noise elimination and crack detection using image processing techniques.
Abstract: Traditional crack assessment methods for concrete structures are time consuming and produce subjective results. The development of a means for automated assessment employing digital image processing offers high potential for practical implementation. However, two problems in two-dimensional (2D) image processing hinder direct application for crack assessment, as follows: (1) the image used for the digital image processing has to be taken perpendicular to the surface of the concrete structure, and (2) the working distance used in retrieving the imaging model has to be measured each time. To address these problems, this paper proposes a combination of 2D image processing and three-dimensional (3D) scene reconstruction to locate the 3D position of crack edges. In the proposed algorithm, first the precise crack information is obtained from the 2D images after noise elimination and crack detection using image processing techniques. Then, 3D reconstruction is conducted employing several crack images to ...

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A method that reconstructs individual 3D shapes from multiple single images of one person, judges their quality and then combines the best of all results, which is done separately for different regions of the face.
Abstract: Automated 3D reconstruction of faces from images is challenging if the image material is difficult in terms of pose, lighting, occlusions and facial expressions, and if the initial 2D feature positions are inaccurate or unreliable. We propose a method that reconstructs individual 3D shapes from multiple single images of one person, judges their quality and then combines the best of all results. This is done separately for different regions of the face. The core element of this algorithm and the focus of our paper is a quality measure that judges a reconstruction without information about the true shape. We evaluate different quality measures, develop a method for combining results, and present a complete processing pipeline for automated reconstruction.

Proceedings ArticleDOI
Hao Yang, Hui Zhang
27 Jun 2016
TL;DR: An algorithm that automatically infers a 3D room shape from a collection of partially oriented superpixel facets and line segments, and is efficient: the inference time for each panorama is less than 1 minute.
Abstract: We propose a method to recover the shape of a 3D room from a full-view indoor panorama. Our algorithm can automatically infer a 3D shape from a collection of partially oriented superpixel facets and line segments. The core part of the algorithm is a constraint graph, which includes lines and superpixels as vertices, and encodes their geometric relations as edges. A novel approach is proposed to perform 3D reconstruction based on the constraint graph by solving all the geometric constraints as constrained linear least-squares. The selected constraints used for reconstruction are identified using an occlusion detection method with a Markov random field. Experiments show that our method can recover room shapes that can not be addressed by previous approaches. Our method is also efficient, that is, the inference time for each panorama is less than 1 minute.
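
The reconstruction step above boils down to equality-constrained linear least squares. Here is a generic NumPy solver for that subproblem via the KKT system, with a toy constraint standing in for the paper's graph-derived ones:

```python
import numpy as np

def constrained_lstsq(A, b, C, d):
    """Minimize ||Ax - b||^2 subject to Cx = d by solving the KKT system
    [[2A^T A, C^T], [C, 0]] [x; lam] = [2A^T b; d]. This is the generic
    machinery behind solving geometric constraints as constrained linear
    least squares; the actual constraint rows in the paper come from the
    line/superpixel constraint graph."""
    n, m = A.shape[1], C.shape[0]
    kkt = np.block([[2 * A.T @ A, C.T],
                    [C, np.zeros((m, m))]])
    rhs = np.concatenate([2 * A.T @ b, d])
    return np.linalg.solve(kkt, rhs)[:n]

# Toy example: fit depths to noisy observations while forcing two of
# them (e.g., two superpixels sharing a wall) to be equal.
A, b = np.eye(3), np.array([1.0, 2.0, 2.5])
C, d = np.array([[0.0, 1.0, -1.0]]), np.array([0.0])
print(constrained_lstsq(A, b, C, d))  # second and third entries coincide
```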

Journal ArticleDOI
TL;DR: An efficient algorithm that automatically segments a static foreground object from highly cluttered background in light fields by exploiting high spatio-angular sampling on the order of thousands of input frames, such that new structures are revealed due to the increased coherence in the data.
Abstract: Precise object segmentation in image data is a fundamental problem with various applications, including 3D object reconstruction. We present an efficient algorithm to automatically segment a static foreground object from highly cluttered background in light fields. A key insight and contribution of our article is that a significant increase of the available input data can enable the design of novel, highly efficient approaches. In particular, the central idea of our method is to exploit high spatio-angular sampling on the order of thousands of input frames, for example, captured as a hand-held video, such that new structures are revealed due to the increased coherence in the data. We first show how purely local gradient information contained in slices of such a dense light field can be combined with information about the camera trajectory to make efficient estimates of the foreground and background. These estimates are then propagated to textureless regions using edge-aware filtering in the epipolar volume. Finally, we enforce global consistency in a gathering step to derive a precise object segmentation in both 2D and 3D space, which captures fine geometric details even in very cluttered scenes. The design of each of these steps is motivated by efficiency and scalability, allowing us to handle large, real-world video datasets on a standard desktop computer. We demonstrate how the results of our method can be used for considerably improving the speed and quality of image-based 3D reconstruction algorithms, and we compare our results to state-of-the-art segmentation and multiview stereo methods.

Journal ArticleDOI
TL;DR: A novel image stitching approach that can produce visually plausible panoramic images with input taken from different viewpoints by solving a global objective function consisting of alignment and a set of prior constraints is presented.
Abstract: We present a novel image stitching approach, which can produce visually plausible panoramic images with input taken from different viewpoints. Unlike previous methods, our approach allows wide baselines between images and non-planar scene structures. Instead of 3D reconstruction, we design a mesh-based framework to optimize alignment and regularity in 2D. By solving a global objective function consisting of alignment and a set of prior constraints, we construct panoramic images, which are locally as perspective as possible and yet nearly orthogonal in the global view. We improve composition and achieve good performance on misaligned areas. Experimental results on challenging data demonstrate the effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: A systematic survey of the state-of-the-art for tie-point generation in unordered image collections, including recent developments for very large image sets is attempted.
Abstract: Feature matching – i.e. finding corresponding point features in different images to serve as tie-points for camera orientation – is a fundamental step in photogrammetric 3D reconstruction. If the input image set is large and unordered, which is becoming increasingly common with the spread of photogrammetric recording to untrained user groups and even crowd-sourced geodata collection, the bottleneck of the reconstruction pipeline is the matching step, for two reasons. (i) Image acquisition without detailed viewpoint planning requires a denser set of viewpoints with larger overlaps, to ensure appropriate coverage of the object of interest and to guarantee sufficient redundancy for reliable reconstruction in spite of the unoptimised network geometry. As a consequence, there is a large number of images with overlapping viewfields, resulting in a more expensive matching step than for, say, a regular block geometry. (ii) In the absence of a carefully pre-planned recording sequence it is not even known which images overlap. One thus faces the even bigger challenge of determining which pairs of images can even have tie-points and should therefore be fed into the matching procedure. In this paper we attempt a systematic survey of the state of the art for tie-point generation in unordered image collections, including recent developments for very large image sets.
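
For a single image pair, the tie-point candidate generation the survey discusses can be sketched with stock OpenCV calls; the scale problem the paper analyses is deciding which of the O(n²) pairs to run this on. Paths and parameters below are placeholders:

```python
import cv2

# Tie-point candidates between one image pair (paths are placeholders).
img1 = cv2.imread("view_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_b.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming-distance matcher with Lowe's ratio test to discard ambiguous
# correspondences; survivors are tie-point candidates that would normally
# be verified geometrically (e.g., RANSAC on the fundamental matrix)
# before camera orientation.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.8 * n.distance]
print(f"{len(good)} tie-point candidates")
```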

Posted Content
TL;DR: This work combines a state-of-the-art deep learning method and semi-dense Simultaneous Localisation and Mapping (SLAM) based on the video stream from a monocular camera to improve 2D semantic labelling over baseline single-frame predictions.
Abstract: The bundle of geometry and appearance in computer vision has proven to be a promising solution for robots across a wide variety of applications. Stereo cameras and RGB-D sensors are widely used to realise fast 3D reconstruction and trajectory tracking in a dense way. However, they lack the flexibility to switch seamlessly between differently scaled environments, i.e., indoor and outdoor scenes. In addition, semantic information is still hard to acquire in a 3D mapping. We address this challenge by combining a state-of-the-art deep learning method and semi-dense Simultaneous Localisation and Mapping (SLAM) based on the video stream from a monocular camera. In our approach, 2D semantic information is transferred to the 3D mapping via correspondence between connective keyframes with spatial consistency. There is no need to obtain a semantic segmentation for each frame in a sequence, so a reasonable computation time can be achieved. We evaluate our method on indoor/outdoor datasets and achieve an improvement in the 2D semantic labelling over baseline single-frame predictions.

Journal ArticleDOI
Yuchen Deng, Yu Chen, Yan Zhang, Shengliu Wang, Fa Zhang, Fei Sun
TL;DR: An algorithm called Iterative Compressed-sensing Optimized Non-uniform fast Fourier transform reconstruction (ICON), based on the theory of compressed sensing and the assumption of sparsity of biological specimens, which can significantly restore the missing information in comparison with other reconstruction algorithms.

Posted Content
TL;DR: 3DMatch is introduced, a data-driven local feature learner that jointly learns a geometric feature representation and an associated metric function from a large collection of real-world scanning data and concurrently supports deep learning with convolutional neural networks directly in 3D.
Abstract: Establishing correspondences between 3D geometries is essential to a large variety of graphics and vision applications, including 3D reconstruction, localization, and shape matching. Despite significant progress, geometric matching on real-world 3D data is still a challenging task due to the noisy, low-resolution, and incomplete nature of scanning data. These difficulties limit the performance of current state-of-the-art methods, which are typically based on histograms over geometric properties. In this paper, we introduce 3DMatch, a data-driven local feature learner that jointly learns a geometric feature representation and an associated metric function from a large collection of real-world scanning data. We represent 3D geometry using accumulated distance fields around key-point locations. This representation is suited to handle noisy and partial scanning data, and concurrently supports deep learning with convolutional neural networks directly in 3D. To train the networks, we propose a way to automatically generate correspondence labels for deep learning by leveraging existing RGB-D reconstruction algorithms. In our results, we demonstrate that we are able to outperform state-of-the-art approaches by a significant margin. In addition, we show the robustness of our descriptor in a purely geometric sparse bundle adjustment pipeline for 3D reconstruction.
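
The accumulated-distance representation can be sketched as a truncated distance field (TDF) sampled on a voxel patch around a keypoint, the kind of local volume a 3D ConvNet can consume. The snippet below is an illustrative NumPy/SciPy version; grid size, patch size and truncation are assumptions, not necessarily the paper's values.

```python
import numpy as np
from scipy.spatial import cKDTree

def local_tdf(points, keypoint, grid=30, size=0.3, trunc=0.05):
    """Truncated distance field on a (grid^3) voxel patch centered at
    a keypoint. points: (N,3) scan; size: patch edge length in meters."""
    half = size / 2.0
    axes = np.linspace(-half, half, grid)
    gx, gy, gz = np.meshgrid(axes, axes, axes, indexing="ij")
    centers = np.stack([gx, gy, gz], axis=-1).reshape(-1, 3) + keypoint
    dist, _ = cKDTree(points).query(centers)     # nearest surface distance
    tdf = 1.0 - np.clip(dist / trunc, 0.0, 1.0)  # 1 at surface, 0 beyond trunc
    return tdf.reshape(grid, grid, grid)

scan = np.random.rand(5000, 3)         # stand-in point cloud
print(local_tdf(scan, scan[0]).shape)  # (30, 30, 30)
```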

Book ChapterDOI
17 Oct 2016
TL;DR: In this article, the authors track the endoscope location inside the surgical scene and provide 3D reconstruction, in real-time, from the sole input of the image sequence captured by the monocular endoscope.
Abstract: We aim to track the endoscope location inside the surgical scene and provide 3D reconstruction, in real time, from the sole input of the image sequence captured by the monocular endoscope. This information offers new possibilities for developing surgical navigation and augmented reality applications. The main benefit of this approach is the lack of extra tracking elements, which can disturb the surgeon's performance in the clinical routine. Our first contribution is to exploit ORBSLAM, one of the best performing monocular SLAM algorithms, to estimate both the endoscope location and the 3D structure of the surgical scene. However, the reconstructed 3D map poorly describes textureless soft organ surfaces such as the liver. Our second contribution is to extend ORBSLAM to reconstruct a semi-dense map of soft organs. Experimental results on in-vivo pigs show robust endoscope tracking even with organ deformations and partial instrument occlusions. They also show the reconstruction density and accuracy against a ground-truth surface obtained from CT.

Journal ArticleDOI
TL;DR: This article shows that strong periodic assumptions on the coefficients can be used to define an efficient and accurate algorithm for estimating periodic motion such as walking patterns and proposes a novel regularization term based on temporal bone length constancy for non-periodic motion.
Abstract: This article tackles the problem of estimating non-rigid human 3D shape and motion from image sequences taken by uncalibrated cameras. Similar to other state-of-the-art solutions, we factorize 2D observations into camera parameters, base poses and mixing coefficients. Existing methods require sufficient camera motion during the sequence to achieve a correct 3D reconstruction. To obtain convincing 3D reconstructions from arbitrary camera motion, our method is based on base poses trained a priori. We show that strong periodic assumptions on the coefficients can be used to define an efficient and accurate algorithm for estimating periodic motion such as walking patterns. For the extension to non-periodic motion we propose a novel regularization term based on temporal bone length constancy. In contrast to other works, the proposed method does not use a predefined skeleton or anthropometric constraints and can handle arbitrary camera motion. We achieve convincing 3D reconstructions, even under the influence of noise and occlusions. Multiple experiments based on a 3D error metric demonstrate the stability of the proposed method. Compared to other state-of-the-art methods our algorithm shows a significant improvement.
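
The bone-length constancy regularizer admits a compact sketch: penalize the temporal variation of each bone's length across frames. The exact functional form in the paper may differ; the joints, bones and sizes below are illustrative.

```python
import numpy as np

def bone_length_penalty(joints, bones):
    """Temporal bone-length constancy: penalize variation of each bone's
    length over frames. joints: (T, J, 3) 3D joint positions over T frames;
    bones: list of (parent, child) joint index pairs."""
    penalty = 0.0
    for a, b in bones:
        lengths = np.linalg.norm(joints[:, a] - joints[:, b], axis=1)  # (T,)
        penalty += np.var(lengths)  # zero iff the bone length is constant
    return penalty

T, J = 50, 4
joints = np.random.rand(T, J, 3)
bones = [(0, 1), (1, 2), (2, 3)]  # a toy kinematic chain
print(bone_length_penalty(joints, bones))
```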

Journal ArticleDOI
TL;DR: An efficient pipeline based on color enhancement, image denoising, color-to-gray conversion and image content enrichment is presented, which proves how an effective image pre-processing can improve the automated orientation procedure and dense 3D point cloud reconstruction, even in the case of poor texture scenarios.
Abstract: Automated image-based 3D reconstruction methods are increasingly pervading our 3D modeling applications. Fully automated solutions give the impression that from a sample of randomly acquired images we can derive quite impressive visual 3D models. Although the level of automation is reaching very high standards, image quality is a fundamental prerequisite for producing successful and photo-realistic 3D products, in particular when dealing with large datasets of images. This article presents an efficient pipeline based on color enhancement, image denoising, color-to-gray conversion and image content enrichment. The pipeline stems from an analysis of various state-of-the-art algorithms and aims to adjust the most promising methods, giving solutions to typical failure causes. The assessment proves how effective image pre-processing, which considers the entire image dataset, can improve the automated orientation procedure and dense 3D point cloud reconstruction, even in the case of poor-texture scenarios.
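
One plausible instantiation of such a pre-processing chain, using stock OpenCV operators, is sketched below. The paper evaluates several competing algorithms per stage and tunes them on the whole dataset, so treat the specific operators, parameters and path as assumptions.

```python
import cv2

def preprocess(path):
    """Illustrative enhancement -> denoising -> gray conversion chain
    in the spirit of the pipeline described above."""
    img = cv2.imread(path)
    # Denoise while preserving the edges that feature matching relies on.
    img = cv2.fastNlMeansDenoisingColored(img, None, 5, 5, 7, 21)
    # Contrast enhancement on the luminance channel only (CLAHE).
    lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
    img = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)
    # Color-to-gray conversion for the feature extraction stage.
    return cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

gray = preprocess("dataset/IMG_0001.jpg")  # path is a placeholder
```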

Journal ArticleDOI
TL;DR: Effective range-computation and confidence-estimation methods are proposed to handle the problems of textureless regions, outliers and detail loss, yielding a robust model that outputs an accurate and dense reconstruction from an input of multiple images captured by a normal camera.
Abstract: Although the stereo matching problem has been extensively studied during the past decades, automatically computing a dense 3D reconstruction from several multiple views is still a difficult task owing to the problems of textureless regions, outliers, detail loss, and various other factors. In this paper, these difficult problems are handled effectively by a robust model that outputs an accurate and dense reconstruction as the final result from an input of multiple images captured by a normal camera. First, the positions of the camera and sparse 3D points are estimated by a structure-from-motion algorithm and we compute the range map with a confidence estimation for each image in our approach. Then all the range maps are integrated into a fine point cloud data set. In the final step we use a Poisson reconstruction algorithm to finish the reconstruction. The major contributions of the work lie in the following points: effective range-computation and confidence-estimation methods are proposed to handle the problems of textureless regions, outliers and detail loss. Then, the range maps are merged into the point cloud data in terms of a confidence-estimation. Finally, Poisson reconstruction algorithm completes the dense mesh. In addition, texture mapping is also implemented as a post-processing work for obtaining good visual effects. Experimental results are presented to demonstrate the effectiveness of the proposed approach.

Journal ArticleDOI
TL;DR: The data show that the mean reprojection error should not always be used to evaluate the performance of the calibration process and that a low quality of feature detection does not always lead to a high mean reconstruction error.
Abstract: For stereoscopic systems designed for metrology applications, the accuracy of camera calibration dictates the precision of the 3D reconstruction. In this paper, the impact of various calibration conditions on the reconstruction quality is studied using a virtual camera calibration technique and the design file of a commercially available lens. This technique enables the study of the statistical behavior of the reconstruction task in selected calibration conditions. The data show that the mean reprojection error should not always be used to evaluate the performance of the calibration process and that a low quality of feature detection does not always lead to a high mean reconstruction error.
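
For reference, the mean reprojection error in question is straightforward to compute once intrinsics and extrinsics are estimated. The sketch below uses a plain NumPy pinhole model with synthetic data; the camera parameters are illustrative.

```python
import numpy as np

def mean_reprojection_error(obj_pts, img_pts, K, R, t):
    """Mean reprojection error for one view of a calibration target:
    project the known 3D points with the estimated intrinsics/extrinsics
    and average the pixel distance to the detected features. This is the
    score the paper argues should not be trusted blindly, since a small
    value does not guarantee a small 3D reconstruction error."""
    cam = obj_pts @ R.T + t            # world -> camera
    proj = cam @ K.T
    proj = proj[:, :2] / proj[:, 2:3]  # perspective division
    return np.linalg.norm(proj - img_pts, axis=1).mean()

# Toy check with a perfect camera: the error should be ~0.
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])
R, t = np.eye(3), np.array([0.0, 0.0, 2.0])
obj = np.random.rand(20, 3)
img = (obj @ R.T + t) @ K.T
img = img[:, :2] / img[:, 2:3]
print(mean_reprojection_error(obj, img, K, R, t))  # ~0.0
```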

Journal ArticleDOI
TL;DR: This work argues for the importance of the interaction between recognition, reconstruction and re-organization, proposing it as a unifying framework for computer vision, and presents pipelined versions of two systems, one for RGB-D images and another for RGB images, which produce rich 3D scene interpretations in this framework.

Book ChapterDOI
03 Jul 2016
TL;DR: A two-layer approach for visual odometry with stereo cameras, which runs in real-time and combines feature-based matching with semi-dense direct image alignment, which is faster than state-of-the-art methods without losing accuracy.
Abstract: Visual motion estimation is challenging, due to high data rates, fast camera motions, featureless or repetitive environments, uneven lighting, and many other issues. In this work, we propose a two-layer approach for visual odometry with stereo cameras, which runs in real-time and combines feature-based matching with semi-dense direct image alignment. Our method initializes semi-dense depth estimation, which is computationally expensive, from motion that is tracked by a fast but robust feature point-based method. By that, we are not only able to efficiently estimate the pose of the camera with a high frame rate, but also to reconstruct the 3D structure of the environment at image gradients, which is useful, e.g., for mapping and obstacle avoidance. Experiments on datasets captured by a micro aerial vehicle (MAV) show that our approach is faster than state-of-the-art methods without losing accuracy. Moreover, our combined approach achieves promising results on the KITTI dataset, which is very challenging for direct methods, because of the low frame rate in conjunction with fast motion.

Book ChapterDOI
31 Oct 2016
TL;DR: This paper proposes a number of testing scenarios using different lighting conditions, camera positions and image acquisition methods for the best in-depth analysis and discusses the results, the overall performance and the problems present in each software.
Abstract: Structure from Motion 3D reconstruction has become widely used in recent years in a number of fields such as industrial surface inspection, archeology, cultural heritage preservation and geomapping. A number of software solutions have been released using variations of this technique. In this paper we analyse the state of the art of these software applications, by comparing the resultant 3D meshes qualitatively and quantitatively. We propose a number of testing scenarios using different lighting conditions, camera positions and image acquisition methods for the best in-depth analysis and discuss the results, the overall performance and the problems present in each software. We employ distance and roughness metrics for evaluating the final reconstruction results.

Posted Content
TL;DR: In this article, a convex relaxation is proposed for dense semantic 3D reconstruction, which uses a data term that is defined as potentials over viewing rays, combined with continuous surface area penalization.
Abstract: We propose an approach for dense semantic 3D reconstruction which uses a data term that is defined as potentials over viewing rays, combined with continuous surface area penalization. Our formulation is a convex relaxation which we augment with a crucial non-convex constraint that ensures exact handling of visibility. To tackle the non-convex minimization problem, we propose a majorize-minimize type strategy which converges to a critical point. We demonstrate the benefits of using the non-convex constraint experimentally. For the geometry-only case, we set a new state of the art on two datasets of the commonly used Middlebury multi-view stereo benchmark. Moreover, our general-purpose formulation directly reconstructs thin objects, which are usually treated with specialized algorithms. A qualitative evaluation on the dense semantic 3D reconstruction task shows that we improve significantly over previous methods.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work presents the first approach to simultaneously reconstructing the 3D positions and normals of the object's surface at both refraction locations under the assumption that the rays refract only twice when traveling through the object.
Abstract: Estimating the shape of transparent and refractive objects is one of the few open problems in 3D reconstruction. Under the assumption that the rays refract only twice when traveling through the object, we present the first approach to simultaneously reconstructing the 3D positions and normals of the object's surface at both refraction locations. Our acquisition setup requires only two cameras and one monitor, which serves as the light source. After acquiring the ray-ray correspondences between each camera and the monitor, we solve an optimization function which enforces a new position-normal consistency constraint. That is, the 3D positions of surface points shall agree with the normals required to refract the rays under Snell's law. Experimental results using both synthetic and real data demonstrate the robustness and accuracy of the proposed approach.
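
The position-normal consistency constraint above rests on the vector form of Snell's law. A small NumPy sketch of the refraction operation follows (not the paper's optimization); the incidence direction and refractive indices are illustrative.

```python
import numpy as np

def refract(d, n, eta1, eta2):
    """Refract unit direction d at a surface with unit normal n (pointing
    toward the incoming ray), going from refractive index eta1 to eta2,
    per Snell's law in vector form. Returns None on total internal
    reflection. This is the physical relation behind the paper's
    position-normal consistency check."""
    r = eta1 / eta2
    cos_i = -np.dot(n, d)
    sin2_t = r * r * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None  # total internal reflection
    cos_t = np.sqrt(1.0 - sin2_t)
    return r * d + (r * cos_i - cos_t) * n

d = np.array([0.0, -np.sin(np.pi / 6), -np.cos(np.pi / 6)])  # 30 deg incidence
n = np.array([0.0, 0.0, 1.0])
print(refract(d, n, 1.0, 1.5))  # bends toward the normal entering glass
```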