
Showing papers on "3D reconstruction" published in 2013


Journal ArticleDOI
01 Nov 2013
TL;DR: An online system for large and fine scale volumetric reconstruction based on a memory and speed efficient data structure that compresses space, and allows for real-time access and updates of implicit surface data, without the need for a regular or hierarchical grid data structure.
Abstract: Online 3D reconstruction is gaining newfound interest due to the availability of real-time consumer depth cameras. The basic problem takes live overlapping depth maps as input and incrementally fuses these into a single 3D model. This is challenging particularly when real-time performance is desired without trading quality or scale. We contribute an online system for large and fine scale volumetric reconstruction based on a memory and speed efficient data structure. Our system uses a simple spatial hashing scheme that compresses space, and allows for real-time access and updates of implicit surface data, without the need for a regular or hierarchical grid data structure. Surface data is only stored densely where measurements are observed. Additionally, data can be streamed efficiently in or out of the hash table, allowing for further scalability during sensor motion. We show interactive reconstructions of a variety of scenes, reconstructing both fine-grained details and large scale environments. We illustrate how all parts of our pipeline from depth map pre-processing, camera pose estimation, depth map fusion, and surface rendering are performed at real-time rates on commodity graphics hardware. We conclude with a comparison to current state-of-the-art online systems, illustrating improved performance and reconstruction quality.
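To make the hashing idea concrete, here is a minimal CPU-side sketch of lazily allocated voxel blocks indexed by a spatial hash; the three primes are the values commonly used for spatial hashing (e.g. Teschner et al.), and the block/voxel sizes and table size are illustrative assumptions, not the authors' exact implementation:

```python
import numpy as np

# Illustrative parameters (not the paper's exact values).
VOXEL_SIZE = 0.004      # 4 mm voxels
BLOCK_SIZE = 8          # 8 x 8 x 8 voxels per hashed block
P1, P2, P3 = 73856093, 19349669, 83492791   # common spatial-hash primes

def block_hash(bx, by, bz, table_size=2**21):
    """Hash integer block coordinates into a bucket index (GPU-style table)."""
    return ((bx * P1) ^ (by * P2) ^ (bz * P3)) % table_size

class SparseTSDF:
    """Truncated SDF stored only in voxel blocks where depth samples fall."""
    def __init__(self):
        # A Python dict stands in for the GPU hash table; a real implementation
        # would bucket each key with block_hash(*key).
        self.blocks = {}

    def integrate_sample(self, point, sdf_value, trunc=0.02):
        point = np.asarray(point, dtype=float)
        key = tuple(np.floor(point / (VOXEL_SIZE * BLOCK_SIZE)).astype(int))
        if key not in self.blocks:                 # allocate only observed space
            self.blocks[key] = (np.ones((BLOCK_SIZE,) * 3, np.float32),
                                np.zeros((BLOCK_SIZE,) * 3, np.float32))
        tsdf, weight = self.blocks[key]
        voxel = tuple(np.floor(point / VOXEL_SIZE).astype(int) % BLOCK_SIZE)
        d = np.clip(sdf_value / trunc, -1.0, 1.0)
        # standard running weighted average used in TSDF fusion
        tsdf[voxel] = (tsdf[voxel] * weight[voxel] + d) / (weight[voxel] + 1.0)
        weight[voxel] += 1.0
```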

940 citations


Proceedings ArticleDOI
29 Jun 2013
TL;DR: A new system for real-time dense reconstruction with equivalent quality to existing online methods, but with support for additional spatial scale and robustness in dynamic scenes, designed around a simple and flat point-based representation.
Abstract: Real-time or online 3D reconstruction has wide applicability and receives further interest due to the availability of consumer depth cameras. Typical approaches use a moving sensor to accumulate depth measurements into a single model which is continuously refined. Designing such systems is an intricate balance between reconstruction quality, speed, spatial scale, and scene assumptions. Existing online methods either trade scale to achieve higher-quality reconstructions of small objects/scenes, handle larger scenes by trading real-time performance and/or quality, or limit the bounds of the active reconstruction. Additionally, many systems assume a static scene, and cannot robustly handle scene motion or reconstructions that evolve to reflect scene changes. We address these limitations with a new system for real-time dense reconstruction with equivalent quality to existing online methods, but with support for additional spatial scale and robustness in dynamic scenes. Our system is designed around a simple and flat point-based representation, which directly works with the input acquired from range/depth sensors, without the overhead of converting between representations. The use of points enables speed and memory efficiency, directly leveraging the standard graphics pipeline for all central operations, i.e., camera pose estimation, data association, outlier removal, fusion of depth maps into a single denoised model, and detection and update of dynamic objects. We conclude with qualitative and quantitative results that highlight robust tracking and high quality reconstructions of a diverse set of scenes at varying scales.
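As a toy illustration of what a flat, point-based representation buys, the sketch below fuses each new depth measurement into its nearest model point with a running confidence-weighted average; the association radius, normal test and weight cap are simplified assumptions rather than the paper's pipeline:

```python
import numpy as np

class Surfel:
    """One fused point: position, normal, and accumulated confidence."""
    def __init__(self, position, normal, weight=1.0):
        self.position = np.asarray(position, dtype=float)
        self.normal = np.asarray(normal, dtype=float)
        self.weight = float(weight)

    def fuse(self, position, normal, weight=1.0):
        """Merge a new measurement with a running weighted average,
        denoising the point over time."""
        w = self.weight + weight
        self.position = (self.weight * self.position + weight * np.asarray(position)) / w
        n = self.weight * self.normal + weight * np.asarray(normal)
        self.normal = n / np.linalg.norm(n)
        self.weight = min(w, 100.0)   # cap so old points can still adapt

def fuse_or_insert(model, position, normal, radius=0.005):
    """Associate a measurement with the nearest model point; fuse if it is
    close enough and agrees in orientation, otherwise insert a new point."""
    if model:
        dists = [np.linalg.norm(s.position - position) for s in model]
        i = int(np.argmin(dists))
        if dists[i] < radius and np.dot(model[i].normal, normal) > 0.7:
            model[i].fuse(position, normal)
            return
    model.append(Surfel(position, normal))
```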

388 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: It is argued that image segmentation and dense 3D reconstruction contribute valuable information to each other's task, and a rigorous mathematical framework is proposed to formulate and solve a joint segmentation and dense reconstruction problem.
Abstract: Both image segmentation and dense 3D modeling from images represent an intrinsically ill-posed problem. Strong regularizers are therefore required to constrain the solutions from being 'too noisy'. Unfortunately, these priors generally yield overly smooth reconstructions and/or segmentations in certain regions whereas they fail in other areas to constrain the solution sufficiently. In this paper we argue that image segmentation and dense 3D reconstruction contribute valuable information to each other's task. As a consequence, we propose a rigorous mathematical framework to formulate and solve a joint segmentation and dense reconstruction problem. Image segmentations provide geometric cues about which surface orientations are more likely to appear at a certain location in space whereas a dense 3D reconstruction yields a suitable regularization for the segmentation problem by lifting the labeling from 2D images to 3D space. We show how appearance-based cues and 3D surface orientation priors can be learned from training data and subsequently used for class-specific regularization. Experimental results on several real data sets highlight the advantages of our joint formulation.

264 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: This paper proposes a complete on-device 3D reconstruction pipeline for mobile monocular hand-held devices, which generates dense 3D models with absolute scale on-site while simultaneously supplying the user with real-time interactive feedback.
Abstract: In this paper, we propose a complete on-device 3D reconstruction pipeline for mobile monocular hand-held devices, which generates dense 3D models with absolute scale on-site while simultaneously supplying the user with real-time interactive feedback. The method fills a gap in current cloud-based mobile reconstruction services as it ensures at capture time that the acquired image set fulfills desired quality and completeness criteria. In contrast to existing systems, the developed framework offers multiple innovative solutions. In particular, we investigate the usability of the available on-device inertial sensors to make the tracking and mapping process more resilient to rapid motions and to estimate the metric scale of the captured scene. Moreover, we propose an efficient and accurate scheme for dense stereo matching which reduces the processing time to interactive speed. We demonstrate the performance of the reconstruction pipeline on multiple challenging indoor and outdoor scenes of different size and depth variability.

238 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper presents a novel method for real-time camera tracking and 3D reconstruction of static indoor environments using an RGB-D sensor that is more accurate and robust than the iterated closest point algorithm (ICP) used by KinectFusion, and often yields accuracy comparable to feature-based bundle adjustment methods such as RGB-D SLAM at much higher speed.
Abstract: The ability to quickly acquire 3D models is an essential capability needed in many disciplines including robotics, computer vision, geodesy, and architecture. In this paper we present a novel method for real-time camera tracking and 3D reconstruction of static indoor environments using an RGB-D sensor. We show that by representing the geometry with a signed distance function (SDF), the camera pose can be efficiently estimated by directly minimizing the error of the depth images on the SDF. As the SDF contains the distances to the surface for each voxel, the pose optimization can be carried out extremely fast. By iteratively estimating the camera poses and integrating the RGB-D data in the voxel grid, a detailed reconstruction of an indoor environment can be achieved. We present reconstructions of several rooms using a hand-held sensor and from onboard an autonomous quadrocopter. Our extensive evaluation on publicly available benchmark data shows that our approach is more accurate and robust than the iterated closest point algorithm (ICP) used by KinectFusion and, for scenes up to medium size, often yields accuracy comparable to feature-based bundle adjustment methods such as RGB-D SLAM at much higher speed.
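A rough sketch of the core idea, estimating the pose by driving the signed distances of the transformed depth points toward zero; the axis-angle parameterization, the robust loss and the use of scipy's least_squares are illustrative assumptions, and `sdf_lookup` stands in for trilinear interpolation of the voxel grid:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def sdf_residuals(params, points_cam, sdf_lookup):
    """params = [rx, ry, rz, tx, ty, tz] (axis-angle + translation).
    Each residual is the signed distance of a transformed depth point;
    it is zero when the point lies exactly on the reconstructed surface."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    t = params[3:]
    points_world = points_cam @ R.T + t
    return np.array([sdf_lookup(p) for p in points_world])

def track_camera(points_cam, sdf_lookup, pose_init):
    """Refine the 6-DoF pose so the depth map 'falls onto' the SDF zero level set."""
    result = least_squares(sdf_residuals, pose_init,
                           args=(points_cam, sdf_lookup),
                           loss="huber", f_scale=0.02)   # robustness to outliers
    return result.x

# sdf_lookup would typically trilinearly interpolate a voxel grid, e.g.
# sdf_lookup = lambda p: trilinear(sdf_volume, (p - origin) / voxel_size)
```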

234 citations


Journal ArticleDOI
TL;DR: A depth-map merging based multiple view stereo method for large-scale scenes which takes both accuracy and efficiency into account and can reconstruct quite accurate and dense point clouds with high computational efficiency.
Abstract: In this paper, we propose a depth-map merging based multiple view stereo method for large-scale scenes which takes both accuracy and efficiency into account. In the proposed method, an efficient patch-based stereo matching process is used to generate a depth map for each image with acceptable errors, followed by a depth-map refinement process to enforce consistency over neighboring views. Compared to state-of-the-art methods, the proposed method can reconstruct quite accurate and dense point clouds with high computational efficiency. Moreover, the proposed method can easily be parallelized at the image level, i.e., each depth map is computed individually, which makes it suitable for large-scale scene reconstruction with high resolution images. The accuracy and efficiency of the proposed method are evaluated quantitatively on benchmark data and qualitatively on large data sets.
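The neighbor-view consistency step can be illustrated as follows; the pinhole helpers, relative-depth threshold and brute-force loops are simplifying assumptions, not the paper's refinement scheme:

```python
import numpy as np

def backproject(u, v, depth, K):
    """Pixel (u, v) with depth -> 3D point in that camera's frame."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    return np.array([(u - cx) * depth / fx, (v - cy) * depth / fy, depth])

def filter_depth_map(depth_ref, K, T_ref_to_nbr, depth_nbr, rel_tol=0.01):
    """Keep a reference depth only if the neighboring view observes a
    consistent depth at the reprojected location."""
    h, w = depth_ref.shape
    consistent = np.zeros_like(depth_ref, dtype=bool)
    R, t = T_ref_to_nbr[:3, :3], T_ref_to_nbr[:3, 3]
    for v in range(h):
        for u in range(w):
            d = depth_ref[v, u]
            if d <= 0:
                continue
            p_nbr = R @ backproject(u, v, d, K) + t     # into neighbor frame
            if p_nbr[2] <= 0:
                continue
            uv = K @ (p_nbr / p_nbr[2])                 # project into neighbor
            un, vn = int(round(uv[0])), int(round(uv[1]))
            if 0 <= un < w and 0 <= vn < h:
                d_nbr = depth_nbr[vn, un]
                if d_nbr > 0 and abs(d_nbr - p_nbr[2]) < rel_tol * p_nbr[2]:
                    consistent[v, u] = True
    return np.where(consistent, depth_ref, 0.0)
```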

225 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: This paper offers the first variational approach to the problem of dense 3D reconstruction of non-rigid surfaces from a monocular video sequence and reconstructs highly deforming smooth surfaces densely and accurately directly from video, without the need for any prior models or shape templates.
Abstract: This paper offers the first variational approach to the problem of dense 3D reconstruction of non-rigid surfaces from a monocular video sequence. We formulate non-rigid structure from motion (nrsfm) as a global variational energy minimization problem to estimate dense low-rank smooth 3D shapes for every frame along with the camera motion matrices, given dense 2D correspondences. Unlike traditional factorization based approaches to nrsfm, which model the low-rank non-rigid shape using a fixed number of basis shapes and corresponding coefficients, we minimize the rank of the matrix of time-varying shapes directly via trace norm minimization. In conjunction with this low-rank constraint, we use an edge preserving total-variation regularization term to obtain spatially smooth shapes for every frame. Thanks to proximal splitting techniques the optimization problem can be decomposed into many point-wise sub-problems and simple linear systems which can be easily solved on GPU hardware. We show results on real sequences of different objects (face, torso, beating heart) where, despite challenges in tracking, illumination changes and occlusions, our method reconstructs highly deforming smooth surfaces densely and accurately directly from video, without the need for any prior models or shape templates.
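The key proximal-splitting ingredient behind trace-norm (nuclear-norm) minimization is singular value thresholding; a minimal, generic version (independent of the paper's full energy) is:

```python
import numpy as np

def singular_value_thresholding(X, tau):
    """Proximal operator of tau * nuclear norm: shrink each singular value of X
    toward zero by tau. Applied to the (frames x 3N) matrix of per-frame shapes,
    this is the step that drives the time-varying shape matrix toward low rank."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

# Example: a rank-2 matrix plus noise is pushed back toward (near) rank 2.
rng = np.random.default_rng(0)
low_rank = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 90))
noisy = low_rank + 0.05 * rng.standard_normal((30, 90))
denoised = singular_value_thresholding(noisy, tau=1.0)
print(np.linalg.matrix_rank(denoised, tol=1e-6))
```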

225 citations


Proceedings ArticleDOI
23 Jun 2013
TL;DR: A formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction, and automatically augments the SLAM system with object specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery.
Abstract: We propose a formulation of monocular SLAM which combines live dense reconstruction with shape priors-based 3D tracking and reconstruction. Current live dense SLAM approaches are limited to the reconstruction of visible surfaces. Moreover, most of them are based on the minimisation of a photo-consistency error, which usually makes them sensitive to specularities. In the 3D pose recovery literature, problems caused by imperfect and ambiguous image information have been dealt with by using prior shape knowledge. At the same time, the success of depth sensors has shown that combining joint image and depth information drastically increases the robustness of the classical monocular 3D tracking and 3D reconstruction approaches. In this work we link dense SLAM to 3D object pose and shape recovery. More specifically, we automatically augment our SLAM system with object specific identity, together with 6D pose and additional shape degrees of freedom for the object(s) of known class in the scene, combining image data and depth information for the pose and shape recovery. This leads to a system that allows for fully scaled 3D reconstruction with the known object(s) segmented from the scene. The segmentation enhances the clarity, accuracy and completeness of the maps built by the dense SLAM system, while the dense 3D data aids the segmentation process, yielding faster and more reliable convergence than when using 2D image data alone.

173 citations


Proceedings ArticleDOI
06 May 2013
TL;DR: A robust algorithm is proposed that generates an efficient and accurate dense 3D reconstruction with associated semantic labellings for intelligent autonomous systems requiring accurate 3D reconstructions for applications such as navigation and localisation.
Abstract: In this paper we propose a robust algorithm that generates an efficient and accurate dense 3D reconstruction with associated semantic labellings. Intelligent autonomous systems require accurate 3D reconstructions for applications such as navigation and localisation. Such systems also need to recognise their surroundings in order to identify and interact with objects of interest. Considerable emphasis has been given to generating a good reconstruction but less effort has gone into generating a 3D semantic model. The inputs to our algorithm are street level stereo image pairs acquired from a camera mounted on a moving vehicle. The depth-maps, generated from the stereo pairs across time, are fused into a global 3D volume online in order to accommodate arbitrary long image sequences. The street level images are automatically labelled using a Conditional Random Field (CRF) framework exploiting stereo images, and label estimates are aggregated to annotate the 3D volume. We evaluate our approach on the KITTI odometry dataset and have manually generated ground truth for object class segmentation. Our qualitative evaluation is performed on various sequences of the dataset and we also quantify our results on a representative subset.
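A toy sketch of aggregating per-pixel label estimates into the fused volume by per-voxel voting; the voxel size, label set and voting rule are assumptions for illustration, not the paper's CRF-based aggregation:

```python
import numpy as np
from collections import defaultdict

LABELS = ["road", "building", "vegetation", "vehicle", "sky"]  # illustrative set

def aggregate_labels(voxel_votes, points_world, pixel_labels, voxel_size=0.2):
    """Accumulate 2D label estimates onto the voxels their 3D points fall in."""
    for point, label in zip(points_world, pixel_labels):
        key = tuple(np.floor(np.asarray(point) / voxel_size).astype(int))
        voxel_votes[key][label] += 1

def annotate_volume(voxel_votes):
    """Assign each voxel the label with the most accumulated votes."""
    return {key: max(votes, key=votes.get) for key, votes in voxel_votes.items()}

# Usage over a sequence of stereo frames:
votes = defaultdict(lambda: defaultdict(int))
# for each frame: aggregate_labels(votes, triangulated_points, crf_labels)
# semantic_volume = annotate_volume(votes)
```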

169 citations


Proceedings ArticleDOI
01 Dec 2013
TL;DR: A new method is proposed that allows us to jointly refine the 3D reconstruction of the scene (raw depth values) while accurately segmenting out the objects or scene elements from the3D reconstruction by introducing a new model which is called Voxel-CRF.
Abstract: Scene understanding is an important yet very challenging problem in computer vision. In the past few years, researchers have taken advantage of the recent diffusion of depth-RGB (RGB-D) cameras to help simplify the problem of inferring scene semantics. However, while the added 3D geometry is certainly useful to segment out objects with different depth values, it also adds complications in that the 3D geometry is often incorrect because of noisy depth measurements and the actual 3D extent of the objects is usually unknown because of occlusions. In this paper we propose a new method that allows us to jointly refine the 3D reconstruction of the scene (raw depth values) while accurately segmenting out the objects or scene elements from the 3D reconstruction. This is achieved by introducing a new model which we called Voxel-CRF. The Voxel-CRF model is based on the idea of constructing a conditional random field over a 3D volume of interest which captures the semantic and 3D geometric relationships among different elements (voxels) of the scene. Such model allows to jointly estimate (1) a dense voxel-based 3D reconstruction and (2) the semantic labels associated with each voxel even in presence of partial occlusions using an approximate yet efficient inference strategy. We evaluated our method on the challenging NYU Depth dataset (Version 1 and 2). Experimental results show that our method achieves competitive accuracy in inferring scene semantics and visually appealing results in improving the quality of the 3D reconstruction. We also demonstrate an interesting application of object removal and scene completion from RGB-D images.

148 citations


Journal ArticleDOI
01 Feb 2013-Sensors
TL;DR: A novel way to address the extrinsic calibration problem for a system composed of a 3D LIDAR and a camera via a nonlinear least squares problem, formulated in terms of the geometric constraints associated with a trihedral object.
Abstract: This paper presents a novel way to address the extrinsic calibration problem for a system composed of a 3D LIDAR and a camera. The relative transformation between the two sensors is calibrated via a nonlinear least squares (NLS) problem, which is formulated in terms of the geometric constraints associated with a trihedral object. Precise initial estimates of the NLS are obtained by dividing it into two sub-problems that are solved individually. With the precise initializations, the calibration parameters are further refined by iteratively optimizing the NLS problem. The algorithm is validated on both simulated and real data, as well as a 3D reconstruction application. Moreover, since the trihedral target used for calibration can be orthogonal or not, such targets are very often present in structured environments, making the calibration convenient.
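A skeleton of the final NLS refinement stage, assuming point-to-plane residuals between transformed LIDAR points and the three target planes estimated in the camera frame; the residual definition and axis-angle parameterization are illustrative, not the paper's exact formulation:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def residuals(params, lidar_points_per_plane, plane_params_cam):
    """params = [rx, ry, rz, tx, ty, tz]: LIDAR -> camera transform.
    For each of the three trihedral faces, every transformed LIDAR point should
    satisfy the plane equation n.x + d = 0 estimated in the camera frame."""
    R = Rotation.from_rotvec(params[:3]).as_matrix()
    t = params[3:]
    res = []
    for pts, (n, d) in zip(lidar_points_per_plane, plane_params_cam):
        pts_cam = pts @ R.T + t
        res.append(pts_cam @ n + d)      # signed point-to-plane distances
    return np.concatenate(res)

def refine_extrinsics(initial_guess, lidar_points_per_plane, plane_params_cam):
    """Iteratively optimize the 6-DoF extrinsics starting from a good initialization."""
    sol = least_squares(residuals, initial_guess,
                        args=(lidar_points_per_plane, plane_params_cam))
    return sol.x
```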

BookDOI
01 Jan 2013
TL;DR: The refereed proceedings of the 4th International Conference on Scale Space Methods and Variational Methods in Computer Vision, SSVM 2013, comprising 42 revised full papers selected from 69 submissions.
Abstract: This book constitutes the refereed proceedings of the 4th International Conference on Scale Space Methods and Variational Methods in Computer Vision, SSVM 2013, held in Schloss Seggau near Graz, Austria, in June 2013. The 42 revised full papers presented were carefully reviewed and selected from 69 submissions. The papers are organized in topical sections on image denoising and restoration, image enhancement and texture synthesis, optical flow and 3D reconstruction, scale space and partial differential equations, image and shape analysis, and segmentation.

Journal ArticleDOI
TL;DR: The proposed octree-based surface representation for KinectFusion, a real-time reconstruction technique for indoor scenes using a low-cost moving depth camera and commodity graphics hardware, can reconstruct scenes more than 10 times larger than the original KinectFusion on the same hardware setup and achieves faster performance.
Abstract: This paper proposes an octree-based surface representation for KinectFusion, a real-time reconstruction technique for indoor scenes using a low-cost moving depth camera and commodity graphics hardware. In KinectFusion, the scene is represented as a signed distance function (SDF) and stored as a uniform grid of voxels. Though the grid-based SDF is suitable for parallel computation on graphics hardware, most of the storage is wasted, because the geometry is very sparse in the scene volume. In order to reduce the memory cost and save computation time, we represent the SDF in an octree and develop several octree-based algorithms for reconstruction update and surface prediction that are suitable for parallel computation on graphics hardware. In the reconstruction update step, the octree nodes are adaptively split in breadth-first order. To handle scenes with moving objects, the corresponding nodes are automatically detected and removed to avoid storage overflow. In the surface prediction step, an octree-based ray tracing method is adopted and parallelized for graphics hardware. To further reduce computation time, the octree is organized into four layers, called the top layer, branch layer, middle layer and data layer. The experiments showed that the proposed method consumes less than 10% of the memory of the original KinectFusion method and achieves faster performance. Consequently, it can reconstruct scenes more than 10 times larger than the original KinectFusion on the same hardware setup.
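A compact CPU-side sketch of the kind of adaptively refined octree such a representation relies on; the split criterion, minimum node size and per-sample recursion below are simplifications (the paper splits nodes level by level in breadth-first order and organizes them into four layers on the GPU):

```python
import numpy as np

class OctreeNode:
    """A cube of space that either stores a TSDF sample (leaf) or eight children."""
    def __init__(self, center, half_size):
        self.center = np.asarray(center, dtype=float)
        self.half_size = float(half_size)
        self.children = None          # None -> leaf
        self.tsdf, self.weight = 1.0, 0.0

    def split(self):
        """Replace this leaf with eight child octants."""
        offsets = np.array([[dx, dy, dz] for dx in (-1, 1)
                            for dy in (-1, 1) for dz in (-1, 1)]) * self.half_size / 2
        self.children = [OctreeNode(self.center + o, self.half_size / 2)
                         for o in offsets]

    def insert(self, point, sdf_value, min_size=0.01):
        """Refine the tree only around observed surface, then fuse the sample."""
        if self.half_size <= min_size:
            self.tsdf = (self.tsdf * self.weight + sdf_value) / (self.weight + 1.0)
            self.weight += 1.0
            return
        if self.children is None:
            self.split()
        index = int(np.argmin([np.linalg.norm(c.center - point) for c in self.children]))
        self.children[index].insert(point, sdf_value, min_size)
```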

Proceedings ArticleDOI
23 Dec 2013
TL;DR: Qualitative results demonstrate high quality reconstructions even visually comparable to active depth sensor-based systems such as KinectFusion, making such systems even more accessible.
Abstract: MonoFusion allows a user to build dense 3D reconstructions of their environment in real-time, utilizing only a single, off-the-shelf web camera as the input sensor. The camera could be one already available in a tablet, phone, or a standalone device. No additional input hardware is required. This removes the need for power intensive active sensors that do not work robustly in natural outdoor lighting. Using the input stream of the camera we first estimate the 6DoF camera pose using a sparse tracking method. These poses are then used for efficient dense stereo matching between the input frame and a key frame (extracted previously). The resulting dense depth maps are directly fused into a voxel-based implicit model (using a computationally inexpensive method) and surfaces are extracted per frame. The system is able to recover from tracking failures as well as filter out geometrically inconsistent noise from the 3D reconstruction. Our method is both simple to implement and efficient, making such systems even more accessible. This paper details the algorithmic components that make up our system and a GPU implementation of our approach. Qualitative results demonstrate high quality reconstructions even visually comparable to active depth sensor-based systems such as KinectFusion.
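As a stand-in for the paper's efficient GPU stereo, the sketch below shows the simplest possible dense matching between a rectified current frame and key frame, a brute-force SAD block matcher; the disparity range and patch size are assumptions:

```python
import numpy as np

def block_match(frame, keyframe, max_disp=64, patch=5):
    """Per-pixel disparity by minimizing the sum of absolute differences (SAD)
    between patches of the rectified current frame and key frame."""
    h, w = frame.shape
    half = patch // 2
    disparity = np.zeros((h, w), dtype=np.float32)
    for v in range(half, h - half):
        for u in range(half + max_disp, w - half):
            ref = frame[v - half:v + half + 1, u - half:u + half + 1].astype(np.int32)
            best, best_d = np.inf, 0
            for d in range(max_disp):
                cand = keyframe[v - half:v + half + 1,
                                u - d - half:u - d + half + 1].astype(np.int32)
                cost = np.abs(ref - cand).sum()
                if cost < best:
                    best, best_d = cost, d
            disparity[v, u] = best_d
    return disparity

# For valid (non-zero) disparities: depth = focal_length * baseline / disparity,
# where the baseline comes from the estimated motion between frame and key frame.
```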

Journal ArticleDOI
TL;DR: Among the strategies for dense 3D reconstruction, using the presented method for solving the scale problem and PMVS on the images captured with two DSLR cameras resulted in a dense point cloud as accurate as the Nikon laser scanner dataset.
Abstract: Photogrammetric methods for dense 3D surface reconstruction are increasingly available to both professional and amateur users who have requirements that span a wide variety of applications. One of the key concerns in choosing an appropriate method is to understand the achievable accuracy and how choices made within the workflow can alter that outcome. In this paper we consider accuracy in two components: the ability to generate a correctly scaled 3D model; and the ability to automatically deliver a high quality data set that provides good agreement to a reference surface. The determination of scale information is particularly important, since a network of images usually only provides angle measurements and thus leads to unscaled geometry. A solution is the introduction of known distances in object space, such as base lines between camera stations or distances between control points. In order to avoid using known object distances, the method presented in this paper exploits a calibrated stereo camera, utilizing the calibrated base line information from the camera pair as an observational based geometric constraint. The method provides distance information throughout the object volume by orbiting the object. In order to test the performance of this approach, four topical surface matching methods have been investigated to determine their ability to produce accurate, dense point clouds. The methods include two versions of Semi-Global Matching as well as MicMac and Patch-based Multi-View Stereo (PMVS). These methods are implemented on a set of stereo images captured from four carefully selected objects by using (1) an off-the-shelf low-cost 3D camera and (2) a pair of Nikon D700 DSLR cameras rigidly mounted in close proximity to each other. Inter-comparisons demonstrate the subtle differences between each of these permutations. The point clouds are also compared to a dataset obtained with a Nikon MMD laser scanner. Finally, the established process of achieving accurate point clouds from images and known object space distances is compared with the presented strategies. Results from the matching demonstrate that if a good imaging network is provided, using a stereo camera and bundle adjustment with geometric constraints can effectively resolve the scale. Among the strategies for dense 3D reconstruction, using the presented method for solving the scale problem and PMVS on the images captured with two DSLR cameras resulted in a dense point cloud as accurate as the Nikon laser scanner dataset.
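The scale idea can be shown in a few lines: compare the reconstructed (unscaled) distance between the two cameras of the rigid stereo pair with the calibrated base line and rescale the model accordingly; this is a post-hoc simplification of using the base line as a constraint inside the bundle adjustment, and the function names are illustrative:

```python
import numpy as np

def apply_stereo_scale(points, camera_centers_left, camera_centers_right,
                       calibrated_baseline):
    """Scale an up-to-scale reconstruction to metric units using the known
    base line of a rigid stereo pair observed at several stations."""
    reconstructed = [np.linalg.norm(np.asarray(l) - np.asarray(r))
                     for l, r in zip(camera_centers_left, camera_centers_right)]
    scale = calibrated_baseline / np.mean(reconstructed)
    return np.asarray(points) * scale, scale

# Example: base line calibrated at 0.24 m
# scaled_points, s = apply_stereo_scale(points, centers_L, centers_R, 0.24)
```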

Journal ArticleDOI
TL;DR: This Letter presents a multiview phase shifting (MPS) framework for full-resolution and high-speed reconstruction of dynamic objects of arbitrary shape that achieves full spatial resolution and highly accurate 3D reconstruction.
Abstract: This Letter presents a multiview phase shifting (MPS) framework for full-resolution and high-speed reconstruction of dynamic objects of arbitrary shape. Unlike conventional methods, this framework can directly find corresponding points from the wrapped phase maps. Therefore, only a minimum number of images is required for phase shifting to measure objects of arbitrary shape, including discontinuous surfaces. Benefiting from phase shifting, MPS can achieve full spatial resolution and highly accurate 3D reconstruction. Benefiting from the multiview constraint, MPS is also robust to discontinuities. Experimental results are presented to verify the performance of the proposed technique.
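For reference, the wrapped phase that such a method matches between views comes from the standard N-step phase-shifting formula; the sketch below assumes fringe images of the form I_n = A + B*cos(phi - 2*pi*n/N) and is not specific to this Letter's correspondence scheme:

```python
import numpy as np

def wrapped_phase(images):
    """Standard N-step phase shifting: images[n] = A + B*cos(phi - 2*pi*n/N).
    Returns the wrapped phase phi in (-pi, pi] per pixel."""
    images = np.asarray(images, dtype=np.float64)
    shifts = 2.0 * np.pi * np.arange(len(images)) / len(images)
    numerator = np.tensordot(np.sin(shifts), images, axes=1)
    denominator = np.tensordot(np.cos(shifts), images, axes=1)
    return np.arctan2(numerator, denominator)

# With multiview constraints, corresponding pixels in different cameras can be
# matched directly on these wrapped phase maps, without unwrapping first.
```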

Journal ArticleDOI
TL;DR: This article attempts to survey the state-of-the-art 3D building modeling methods in the areas of photogrammetry, computer vision, and computer graphics.
Abstract: 3D modeling from images and LiDAR (Light Detection And Ranging) has been an active research area in the photogrammetry, computer vision, and computer graphics communities. In terms of literature review, a comprehensive survey on 3D building modeling that contains methods from all these fields will be beneficial. This article attempts to survey the state-of-the-art 3D building modeling methods in the areas of photogrammetry, computer vision, and computer graphics. The existing methods are grouped into three categories: 3D reconstruction from images, 3D modeling using range data, and 3D modeling using images and range data. The use of both data for 3D modeling is a sensor fusion approach, in which methods of image-to-LiDAR registration, upsampling, and image-guided segmentation are reviewed. For each category, the key problems are identified and solutions are addressed.

Proceedings ArticleDOI
23 Jun 2013
TL;DR: A novel approach to model 3D human body with variations on both human shape and pose, by exploring a tensor decomposition technique, which outperforms the SCAPE model quite significantly.
Abstract: In this paper, we present a novel approach to model 3D human body with variations on both human shape and pose, by exploring a tensor decomposition technique. 3D human body modeling is important for 3D reconstruction and animation of realistic human body, which can be widely used in Tele-presence and video game applications. It is challenging due to a wide range of shape variations over different people and poses. The existing SCAPE model is popular in computer vision for modeling 3D human body. However, it considers shape and pose deformations separately, which is not accurate since pose deformation is person-dependent. Our tensor-based model addresses this issue by jointly modeling shape and pose deformations. Experimental results demonstrate that our tensor-based model outperforms the SCAPE model quite significantly. We also apply our model to capture human body using Microsoft Kinect sensors with excellent results.
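A brief sketch of the kind of tensor (Tucker/HOSVD) decomposition such joint shape-and-pose modeling builds on, applied to an illustrative people x poses x vertex-coordinates data tensor; this is generic HOSVD with made-up dimensions, not the paper's specific model:

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def hosvd(tensor, ranks):
    """Truncated higher-order SVD: one factor matrix per mode plus a core tensor."""
    factors = []
    for mode, r in enumerate(ranks):
        U, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(U[:, :r])
    core = tensor
    for mode, U in enumerate(factors):
        # multiply the core by U^T along the given mode
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, mode, 0), axes=1), 0, mode)
    return core, factors

# Illustrative data tensor: n_people x n_poses x (3 * n_vertices) mesh coordinates.
data = np.random.default_rng(0).standard_normal((20, 15, 300))
core, (people_basis, pose_basis, vertex_basis) = hosvd(data, ranks=(5, 4, 40))
# A new body is synthesized by combining person and pose coefficients through the core.
```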

Journal ArticleDOI
TL;DR: A new computer vision-based method for automated 3D energy performance modeling of existing buildings using thermal and digital imagery captured by a single thermal camera that expedites the modeling process and has the potential to be used as a rapid and robust building diagnostic tool.

Journal ArticleDOI
TL;DR: A Digital Image Correlation-based single camera pseudo-stereo system that uses a biprism in front of the camera objective to split the scene into two equivalent lateral stereo views in the two halves of the sensor is presented and tested.

Proceedings ArticleDOI
01 Nov 2013
TL;DR: A novel algorithm integrating dense reconstructions from monocular views, Monte Carlo localization, and an iterative pose refinement is presented, which achieves high accuracy whereas appearance-based, state-of-the-art approaches fail.
Abstract: We propose a new method for the localization of a Micro Aerial Vehicle (MAV) with respect to a ground robot. We solve the problem of registering the 3D maps computed by the robots using different sensors: a dense 3D reconstruction from the MAV monocular camera is aligned with the map computed from the depth sensor on the ground robot. Once aligned, the dense reconstruction from the MAV is used to augment the map computed by the ground robot, by extending it with the information conveyed by the aerial views. The overall approach is novel, as it builds on recent developments in live dense reconstruction from moving cameras to address the problem of air-ground localization. The core of our contribution is constituted by a novel algorithm integrating dense reconstructions from monocular views, Monte Carlo localization, and an iterative pose refinement. In spite of the radically different vantage points from which the maps are acquired, the proposed method achieves high accuracy whereas appearance-based, state-of-the-art approaches fail. Experimental validation in indoor and outdoor scenarios reported an accuracy in position estimation of 0.08 meters and real-time performance. This demonstrates that our new approach effectively overcomes the limitations imposed by the difference in sensors and vantage points that negatively affect previous techniques relying on matching visual features.

Journal ArticleDOI
TL;DR: The approach is to compare 3D models produced by Autodesk 123D Catch with 3D models from terrestrial LIDAR, considering different object sizes, from details (capitals, moldings, bases) to large-scale buildings, for practitioner purposes.
Abstract: 3D reconstruction from images has undergone a revolution in the last few years. Computer vision techniques use photographs from data set collections to rapidly build detailed 3D models. The simultaneous application of different algorithms (MVS), the different techniques of image matching, feature extraction and mesh optimization are an active field of research in computer vision. The results are promising: the obtained models are beginning to challenge the precision of laser-based reconstructions. Among all the possibilities we can mainly distinguish desktop and web-based packages. The latter offer the opportunity to exploit the power of cloud computing in order to carry out semi-automatic data processing, thus allowing the user to fulfill other tasks on their computer, whereas desktop systems demand too much processing time and heavier approaches. Computer vision researchers have explored many applications to verify the visual accuracy of 3D models, but approaches to verify metric accuracy are few, and none addresses Autodesk 123D Catch applied to Architectural Heritage Documentation. Our approach to this challenging problem is to compare the 3D models produced by Autodesk 123D Catch with 3D models from terrestrial LIDAR, considering different object sizes, from details (capitals, moldings, bases) to large-scale buildings, for practitioner purposes.

Journal ArticleDOI
TL;DR: Through tracking and registration, the reconstructed 3D models were loaded in an AR environment to facilitate displaying, interacting, and rendering that provides AR applications in construction design and management for better and qualitative communication in economical and handy ways.

Journal ArticleDOI
TL;DR: A PDE-based disparity estimation method which produces continuous depth fields with sharp depth discontinuities even in occluded and highly textured regions is proposed and evaluated against ground-truth from the Middlebury stereo test bed and LIDAR scans.
Abstract: We propose a 3D environment modelling method using multiple pairs of high-resolution spherical images. Spherical images of a scene are captured using a rotating line scan camera. Reconstruction is based on stereo image pairs with a vertical displacement between camera views. A 3D mesh model for each pair of spherical images is reconstructed by stereo matching. For accurate surface reconstruction, we propose a PDE-based disparity estimation method which produces continuous depth fields with sharp depth discontinuities even in occluded and highly textured regions. A full environment model is constructed by fusion of partial reconstruction from spherical stereo pairs at multiple widely spaced locations. To avoid camera calibration steps for all camera locations, we calculate 3D rigid transforms between capture points using feature matching and register all meshes into a unified coordinate system. Finally a complete 3D model of the environment is generated by selecting the most reliable observations among overlapped surface measurements considering surface visibility, orientation and distance from the camera. We analyse the characteristics and behaviour of errors for spherical stereo imaging. Performance of the proposed algorithm is evaluated against ground-truth from the Middlebury stereo test bed and LIDAR scans. Results are also compared with conventional structure-from-motion algorithms. The final composite model is rendered from a wide range of viewpoints with high quality textures.

Patent
24 Jan 2013
TL;DR: In this paper, the position and orientation of a mobile depth camera moving in an environment are tracked for robotics, gaming, and other applications by aligning its depth observations with surfaces of a 3D model of the environment.
Abstract: Camera pose estimation for 3D reconstruction is described, for example, to enable position and orientation of a depth camera moving in an environment to be tracked for robotics, gaming and other applications. In various embodiments, depth observations from the mobile depth camera are aligned with surfaces of a 3D model of the environment in order to find an updated position and orientation of the mobile depth camera which facilitates the alignment. For example, the mobile depth camera is moved through the environment in order to build a 3D reconstruction of surfaces in the environment which may be stored as the 3D model. In examples, an initial estimate of the pose of the mobile depth camera is obtained and then updated by using a parallelized optimization process in real time.

Journal ArticleDOI
TL;DR: This work evaluates the performance of a low-cost commercial SFM–DMVR software by digitising a Cycladic woman figurine and questions the applicability and efficiency of two digitisation pipelines in relation to hardware requirements, background knowledge and man-hours.

Journal ArticleDOI
TL;DR: An automatic pipeline for identifying and extracting the silhouette of signs in every individual image and a multi-view constrained 3D reconstruction algorithm provides an optimum 3D silhouette for the detected signs.
Abstract: 3D reconstruction of traffic signs is of great interest in many applications such as image-based localization and navigation. In order to reflect reality, the reconstruction process should meet both accuracy and precision. In order to reach such a valid reconstruction from calibrated multi-view images, accurate and precise extraction of signs in every individual view is a must. This paper first presents an automatic pipeline for identifying and extracting the silhouette of signs in every individual image. Then, a multi-view constrained 3D reconstruction algorithm provides an optimum 3D silhouette for the detected signs. The first step, called detection, relies on a color-based segmentation to generate ROIs (Regions of Interest) in the image. The shape of every ROI is estimated by fitting an ellipse, a quadrilateral or a triangle to edge points. A ROI is rejected if none of the three shapes can be fitted sufficiently precisely. Thanks to the estimated shape, the remaining candidate ROIs are rectified to remove the perspective distortion and then matched with a set of reference signs using textural information. Poor matches are rejected and the types of the remaining ones are identified. The output of the detection algorithm is a set of identified road signs whose silhouette in the image plane is represented by an ellipse, a quadrilateral or a triangle. The 3D reconstruction process is based on hypothesis generation and verification. Hypotheses are generated by a stereo matching approach taking into account epipolar geometry and also the similarity of the categories. The hypotheses that plausibly correspond to the same 3D road sign are identified and grouped during this process. Finally, all the hypotheses of the same group are merged to generate a unique 3D road sign by a multi-view algorithm integrating a priori knowledge about the 3D shape of road signs as constraints. The algorithm is assessed on real and synthetic images and reached an average accuracy of 35 cm for position and 45° for orientation.
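The shape-estimation step of the detection stage can be illustrated with OpenCV's least-squares ellipse fit on a candidate ROI's edge points; the function below is a simplified, assumed version of that step (quadrilaterals and triangles would be handled analogously):

```python
import cv2
import numpy as np

def fit_roi_shape(edge_points):
    """Estimate the shape of a candidate ROI by fitting an ellipse to its edge
    points (circular/elliptical signs). Returns centre, axes, angle, or None
    when too few points are available."""
    pts = np.asarray(edge_points, dtype=np.float32).reshape(-1, 1, 2)
    if len(pts) < 5:                     # cv2.fitEllipse needs at least 5 points
        return None
    (cx, cy), (major, minor), angle = cv2.fitEllipse(pts)
    return (cx, cy), (major, minor), angle

# For quadrilateral or triangular signs one would instead fit a polygon, e.g.
# cv2.approxPolyDP(contour, epsilon, closed=True), and keep the ROI only when
# one of the fitted shapes explains the edge points sufficiently precisely.
```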

Proceedings ArticleDOI
23 Jun 2013
TL;DR: This work proposes a new approach for template-based extensible surface reconstruction from a single view that relies on the minimization of a proposed stretching energy formalized with respect to the Poisson ratio parameter of the surface.
Abstract: We propose a new approach for template-based extensible surface reconstruction from a single view. We extend the method of isometric surface reconstruction and more recent work on conformal surface reconstruction. Our approach relies on the minimization of a proposed stretching energy formalized with respect to the Poisson ratio parameter of the surface. We derive a patch-based formulation of this stretching energy by assuming local linear elasticity. This formulation unifies geometrical and mechanical constraints in a single energy term. We prevent local scale ambiguities by imposing a set of fixed boundary 3D points. We experimentally prove the sufficiency of this set of boundary points and demonstrate the effectiveness of our approach on different developable and non-developable surfaces with a wide range of extensibility.

Journal ArticleDOI
TL;DR: An investigation is reported about extraction of 3D building models from high resolution DSMs and orthorectified images produced from Worldview-2 stereo satellite imagery and a model driven approach based on the analysis of the 3D points of DSMs in a 2D projection plane is proposed.
Abstract: High resolution Digital Surface Models (DSMs) produced from airborne laser-scanning or stereo satellite images provide a very useful source of information for automated 3D building reconstruction. In this paper an investigation is reported about extraction of 3D building models from high resolution DSMs and orthorectified images produced from Worldview-2 stereo satellite imagery. The focus is on the generation of 3D models of parametric building roofs, which is the basis for creating Level Of Detail 2 (LOD2) according to the CityGML standard. In particular the building blocks containing several connected buildings with tilted roofs are investigated and the potentials and limitations of the modeling approach are discussed. The edge information extracted from the orthorectified image has been employed as an additional source of information in the 3D reconstruction algorithm. A model driven approach based on the analysis of the 3D points of DSMs in a 2D projection plane is proposed. Accordingly, a building block is divided into smaller parts according to the direction and number of existing ridge lines for parametric building reconstruction. The 3D model is derived for each building part, and finally, a complete parametric model is formed by merging the 3D models of the individual building parts and adjusting the nodes after the merging step. For the remaining building parts that do not contain ridge lines, a prismatic model using polygon approximation of the corresponding boundary pixels is derived and merged to the parametric models to shape the final model of the building. A qualitative and quantitative assessment of the proposed method for the automatic reconstruction of buildings with parametric roofs is then provided by comparing the final model with the existing surface model as well as some field measurements.

Proceedings ArticleDOI
01 Dec 2013
TL;DR: A probabilistic framework for simultaneous tracking and reconstruction of 3D rigid objects using an RGB-D camera and surface and background appearance models are learned online, leading to robust tracking in the presence of heavy occlusion and outliers.
Abstract: We introduce a probabilistic framework for simultaneous tracking and reconstruction of 3D rigid objects using an RGB-D camera. The tracking problem is handled using a bag-of-pixels representation and a back-projection scheme. Surface and background appearance models are learned online, leading to robust tracking in the presence of heavy occlusion and outliers. In both our tracking and reconstruction modules, the 3D object is implicitly embedded using a 3D level-set function. The framework is initialized with a simple shape primitive model (e.g. a sphere or a cube), and the real 3D object shape is tracked and reconstructed online. Unlike existing depth-based 3D reconstruction works, which either rely on calibrated/fixed camera set up or use the observed world map to track the depth camera, our framework can simultaneously track and reconstruct small moving objects. We use both qualitative and quantitative results to demonstrate the superior performance of both tracking and reconstruction of our method.