Journal ArticleDOI

Accurate, Dense, and Robust Multiview Stereopsis

01 Aug 2010-IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE Computer Society)-Vol. 32, Iss: 8, pp 1362-1376
TL;DR: A novel algorithm for multiview stereopsis that outputs a dense set of small rectangular patches covering the surfaces visible in the images, which outperforms all others submitted so far for four out of the six data sets.
Abstract: This paper proposes a novel algorithm for multiview stereopsis that outputs a dense set of small rectangular patches covering the surfaces visible in the images. Stereopsis is implemented as a match, expand, and filter procedure, starting from a sparse set of matched keypoints, and repeatedly expanding these before using visibility constraints to filter away false matches. The keys to the performance of the proposed algorithm are effective techniques for enforcing local photometric consistency and global visibility constraints. Simple but effective methods are also proposed to turn the resulting patch model into a mesh which can be further refined by an algorithm that enforces both photometric consistency and regularization constraints. The proposed approach automatically detects and discards outliers and obstacles and does not require any initialization in the form of a visual hull, a bounding box, or valid depth ranges. We have tested our algorithm on various data sets including objects with fine surface details, deep concavities, and thin structures, outdoor scenes observed from a restricted set of viewpoints, and "crowded" scenes where moving obstacles appear in front of a static structure of interest. A quantitative evaluation on the Middlebury benchmark [1] shows that the proposed method outperforms all others submitted so far for four out of the six data sets.
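The match, expand, and filter procedure described in the abstract can be sketched in a few lines. This is a toy illustration only: the `neighbors` and `photo_score` callables are hypothetical stand-ins for the paper's patch geometry, photometric consistency test, and visibility reasoning.

```python
def reconstruct(seed_patches, neighbors, photo_score, min_score=0.7, iterations=3):
    """Grow a patch set from sparse seeds, then prune weak patches.

    seed_patches: iterable of hashable patch ids (e.g. grid cells)
    neighbors:    patch id -> iterable of adjacent candidate patch ids
    photo_score:  patch id -> photometric consistency in [0, 1]
    """
    # Match: keep only seeds that are photometrically consistent.
    patches = {p for p in seed_patches if photo_score(p) >= min_score}
    for _ in range(iterations):
        # Expand: try to instantiate new patches next to existing ones.
        for p in list(patches):
            for q in neighbors(p):
                if q not in patches and photo_score(q) >= min_score:
                    patches.add(q)
        # Filter: discard patches that fail the consistency test
        # (the real algorithm uses visibility constraints here).
        patches = {p for p in patches if photo_score(p) >= min_score}
    return patches
```

In the actual algorithm the filter step rejects patches whose visibility across images contradicts the current model, not merely those below a score threshold.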
Citations
Book ChapterDOI
08 Oct 2016
TL;DR: The core contributions are the joint estimation of depth and normal information, pixelwise view selection using photometric and geometric priors, and a multi-view geometric consistency term for the simultaneous refinement and image-based depth and normal fusion.
Abstract: This work presents a Multi-View Stereo system for robust and efficient dense modeling from unstructured image collections. Our core contributions are the joint estimation of depth and normal information, pixelwise view selection using photometric and geometric priors, and a multi-view geometric consistency term for the simultaneous refinement and image-based depth and normal fusion. Experiments on benchmarks and large-scale Internet photo collections demonstrate state-of-the-art performance in terms of accuracy, completeness, and efficiency.
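The photometric priors mentioned in the abstract typically reduce to a windowed normalized cross-correlation (NCC) between intensity patches. A minimal sketch, assuming grayscale NumPy patches of equal shape (the full system's view selection and fusion machinery is far beyond a snippet):

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Normalized cross-correlation of two intensity patches, in [-1, 1]."""
    a = (a - a.mean()) / (a.std() + eps)  # zero-mean, unit-variance
    b = (b - b.mean()) / (b.std() + eps)
    return float((a * b).mean())
```

Identical patches score near +1, contrast-inverted patches near -1, which is why NCC is robust to affine brightness changes between views.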

1,372 citations


Cites background from "Accurate, Dense, and Robust Multiview Stereopsis"

  • ...6 and 5(c) show depth/normal maps, and the supplementary material provides more results and comparisons against [9,10,47]....


Journal ArticleDOI
TL;DR: The paper reports the state of the art of UAVs for geomatics applications, giving an overview of different UAV platforms, applications, and case studies, and showing the latest developments in UAV image processing.
Abstract: Unmanned aerial vehicle (UAV) platforms are nowadays a valuable source of data for inspection, surveillance, mapping, and 3D modeling issues. As UAVs can be considered as a low-cost alternative to the classical manned aerial photogrammetry, new applications in the short- and close-range domain are introduced. Rotary or fixed-wing UAVs, capable of performing the photogrammetric data acquisition with amateur or SLR digital cameras, can fly in manual, semiautomated, and autonomous modes. Following a typical photogrammetric workflow, 3D results like digital surface or terrain models, contours, textured 3D models, vector information, etc. can be produced, even on large areas. The paper reports the state of the art of UAV for geomatics applications, giving an overview of different UAV platforms, applications, and case studies, showing also the latest developments of UAV image processing. New perspectives are also addressed.

1,358 citations

Journal ArticleDOI
TL;DR: This paper presents RGB-D Mapping, a full 3D mapping system that utilizes a novel joint optimization algorithm combining visual features and shape-based alignment to achieve globally consistent maps.
Abstract: RGB-D cameras (such as the Microsoft Kinect) are novel sensing systems that capture RGB images along with per-pixel depth information. In this paper we investigate how such cameras can be used for building dense 3D maps of indoor environments. Such maps have applications in robot navigation, manipulation, semantic mapping, and telepresence. We present RGB-D Mapping, a full 3D mapping system that utilizes a novel joint optimization algorithm combining visual features and shape-based alignment. Visual and depth information are also combined for view-based loop-closure detection, followed by pose optimization to achieve globally consistent maps. We evaluate RGB-D Mapping on two large indoor environments, and show that it effectively combines the visual and shape information available from RGB-D cameras.

1,223 citations


Cites background or methods from "Accurate, Dense, and Robust Multiview Stereopsis"

  • ...In the vision and graphics communities, there has been a large amount of work on dense reconstruction from videos (e.g. Pollefeys et al. 2008) and photos (e.g. Debevec et al. 1996; Furukawa and Ponce 2010), mostly on objects or outdoor scenes....


  • ...For example, patch-based multi-view stereo (PMVS; Furukawa and Ponce, 2010) can generate quite accurate reconstructions using a visual consistency measure, and it would be exciting to apply these techniques to our maps....


  • ...In the conference version of this paper (Henry et al. 2010), we used SIFT features computed with SIFTGPU (Wu 2007)....


Proceedings ArticleDOI
13 May 2019
TL;DR: Pixel-aligned Implicit Function (PIFu) locally aligns pixels of 2D images with the global context of their corresponding 3D object to produce high-resolution surfaces, including largely unseen regions such as the back of a person.
Abstract: We introduce Pixel-aligned Implicit Function (PIFu), an implicit representation that locally aligns pixels of 2D images with the global context of their corresponding 3D object. Using PIFu, we propose an end-to-end deep learning method for digitizing highly detailed clothed humans that can infer both 3D surface and texture from a single image, and optionally, multiple input images. Highly intricate shapes, such as hairstyles, clothing, as well as their variations and deformations can be digitized in a unified way. Compared to existing representations used for 3D deep learning, PIFu produces high-resolution surfaces including largely unseen regions such as the back of a person. In particular, it is memory efficient unlike the voxel representation, can handle arbitrary topology, and the resulting surface is spatially aligned with the input image. Furthermore, while previous techniques are designed to process either a single image or multiple views, PIFu extends naturally to arbitrary number of views. We demonstrate high-resolution and robust reconstructions on real world images from the DeepFashion dataset, which contains a variety of challenging clothing types. Our method achieves state-of-the-art performance on a public benchmark and outperforms the prior work for clothed human digitization from a single image.
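Conceptually, a pixel-aligned implicit function evaluates a network on an image feature sampled at each 3D point's projection, together with that point's depth. A schematic sketch; the `cam` and `mlp` callables here are hypothetical stand-ins (the real method uses bilinear feature sampling and a trained MLP):

```python
import numpy as np

def pifu_query(feature_map, mlp, points, cam):
    """Evaluate an implicit occupancy function at 3D points.

    feature_map: H x W x C array of image features
    mlp:         (feature_vector, depth) -> occupancy value
    cam:         3D point -> (u, v, z) image coordinates and depth
    """
    out = []
    for p in points:
        u, v, z = cam(p)              # project the point into the image
        feat = feature_map[v, u]      # pixel-aligned feature (nearest pixel)
        out.append(mlp(feat, z))      # query occupancy at this point
    return np.array(out)
```

The surface is then extracted as a level set of this function, which is why the representation is memory-efficient compared to dense voxel grids.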

907 citations

Proceedings ArticleDOI
23 Jun 2008
TL;DR: Opens the discussion on whether image-based 3D modelling techniques can replace LIDAR systems for outdoor 3D data acquisition; two main issues have to be addressed: camera calibration and dense multi-view stereo.
Abstract: In this paper we want to start the discussion on whether image based 3D modelling techniques can possibly be used to replace LIDAR systems for outdoor 3D data acquisition. Two main issues have to be addressed in this context: (i) camera calibration (internal and external) and (ii) dense multi-view stereo. To investigate both, we have acquired test data from outdoor scenes both with LIDAR and cameras. Using the LIDAR data as reference we estimated the ground-truth for several scenes. Evaluation sets are prepared to evaluate different aspects of 3D model building. These are: (i) pose estimation and multi-view stereo with known internal camera parameters; (ii) camera calibration and multi-view stereo with the raw images as the only input and (iii) multi-view stereo.

890 citations

References
Journal ArticleDOI
TL;DR: A taxonomy of dense, two-frame stereo methods and a comparative evaluation of existing algorithms, backed by a stand-alone, flexible C++ implementation that enables the evaluation of individual components and can easily be extended to include new algorithms.
Abstract: Stereo matching is one of the most active research areas in computer vision. While a large number of algorithms for stereo correspondence have been developed, relatively little work has been done on characterizing their performance. In this paper, we present a taxonomy of dense, two-frame stereo methods designed to assess the different components and design decisions made in individual stereo algorithms. Using this taxonomy, we compare existing stereo methods and present experiments evaluating the performance of many different variants. In order to establish a common software platform and a collection of data sets for easy evaluation, we have designed a stand-alone, flexible C++ implementation that enables the evaluation of individual components and that can be easily extended to include new algorithms. We have also produced several new multiframe stereo data sets with ground truth, and are making both the code and data sets available on the Web.
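The simplest baseline in such a taxonomy is winner-take-all matching with a per-pixel dissimilarity cost. A minimal sketch using a squared-difference cost, with none of the aggregation or refinement components the taxonomy evaluates:

```python
import numpy as np

def disparity_map(left, right, max_disp):
    """Winner-take-all stereo: per-pixel squared-difference cost."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=int)
    for y in range(h):
        for x in range(w):
            # candidate disparities that keep the match inside the right image
            costs = [(left[y, x] - right[y, x - d]) ** 2
                     for d in range(min(max_disp, x) + 1)]
            disp[y, x] = int(np.argmin(costs))  # pick the cheapest disparity
    return disp
```

Real methods add cost aggregation over windows, global optimization, and sub-pixel refinement; the taxonomy's point is that these components can be swapped independently.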

7,458 citations

Proceedings ArticleDOI
26 Jun 2006
TL;DR: Surface reconstruction from oriented points is cast as a spatial Poisson problem and solved by a spatially adaptive multiscale algorithm whose time and space complexities are proportional to the size of the reconstructed model and whose solution reduces to a well-conditioned sparse linear system.
Abstract: We show that surface reconstruction from oriented points can be cast as a spatial Poisson problem. This Poisson formulation considers all the points at once, without resorting to heuristic spatial partitioning or blending, and is therefore highly resilient to data noise. Unlike radial basis function schemes, our Poisson approach allows a hierarchy of locally supported basis functions, and therefore the solution reduces to a well conditioned sparse linear system. We describe a spatially adaptive multiscale algorithm whose time and space complexities are proportional to the size of the reconstructed model. Experimenting with publicly available scan data, we demonstrate reconstruction of surfaces with greater detail than previously achievable.
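The Poisson formulation seeks an indicator function whose gradient matches the vector field defined by the oriented point normals; minimizing the least-squares misfit leads to a Poisson equation (notation here is schematic, not copied from the paper):

```latex
\min_{\chi} \int \left\| \nabla \chi - \vec{V} \right\|^2 \, dx
\quad \Longrightarrow \quad
\Delta \chi = \nabla \cdot \vec{V}
```

Discretizing the Laplacian over a hierarchy of locally supported basis functions is what yields the well-conditioned sparse linear system mentioned in the abstract.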

2,712 citations


"Accurate, Dense, and Robust Multiview Stereopsis" refers methods in this paper

  • ...Table 1 lists the number of input images, their approximate size, the corresponding choice of parameters, the algorithm used to initialize a mesh model (either PSR software [29] or iterative snapping after visual hull construction, denoted as VH), and whether images contain obstacles (crowded scenes) or not....


  • ...Our first approach to mesh initialization is to simply use Poisson Surface Reconstruction (PSR) software [29] that directly converts a set of oriented points into a triangulated mesh model....


Proceedings ArticleDOI
17 Jun 2006
TL;DR: This paper first survey multi-view stereo algorithms and compare them qualitatively using a taxonomy that differentiates their key properties, then describes the process for acquiring and calibrating multiview image datasets with high-accuracy ground truth and introduces the evaluation methodology.
Abstract: This paper presents a quantitative comparison of several multi-view stereo reconstruction algorithms. Until now, the lack of suitable calibrated multi-view image datasets with known ground truth (3D shape models) has prevented such direct comparisons. In this paper, we first survey multi-view stereo algorithms and compare them qualitatively using a taxonomy that differentiates their key properties. We then describe our process for acquiring and calibrating multiview image datasets with high-accuracy ground truth and introduce our evaluation methodology. Finally, we present the results of our quantitative comparison of state-of-the-art multi-view stereo reconstruction algorithms on six benchmark datasets. The datasets, evaluation details, and instructions for submitting new models are available online at http://vision.middlebury.edu/mview.

2,556 citations


"Accurate, Dense, and Robust Multiview Stereopsis" refers background or methods in this paper

  • ...Quantitative evaluations provided at [2]....


  • ...Quantitative evaluations of state-of-the-art MVS algorithms are presented at [2] in terms of accuracy (distance d such that a given percentage of the reconstruction is within d from the ground truth model) and completeness (percentage of the ground truth model that is within a given distance from the reconstruction)....


  • ...10Rendered views of the reconstructions and all the quantitative evaluations can be found at [2]....


  • ...The patch generation algorithm is very efficient; in particular, it takes only a few minutes for temple and dino, in comparison to most other state-of-the-art techniques evaluated at [2], which take more than half an hour....


  • ...[2], state-of-the-art MVS algorithms achieve relative accuracy better than 1/200 (1mm for a 20cm wide object) from a set of low-resolution (640×480) images....

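The accuracy and completeness measures defined in the citation contexts above can be computed directly for point sets. A brute-force sketch, assuming N x 3 NumPy arrays (the actual benchmark samples mesh surfaces and uses spatial indices for the nearest-neighbour queries):

```python
import numpy as np

def accuracy_completeness(recon, gt, pct=90.0, tol=1.0):
    """Middlebury-style scores for two point sets.

    accuracy:     distance d such that `pct` percent of `recon`
                  is within d of the ground truth `gt`
    completeness: fraction of `gt` within `tol` of `recon`
    """
    def nn_dists(a, b):
        # nearest-neighbour distance from each point of a to the set b
        return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)).min(axis=1)

    acc = float(np.percentile(nn_dists(recon, gt), pct))
    comp = float((nn_dists(gt, recon) <= tol).mean())
    return acc, comp
```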

Journal ArticleDOI
TL;DR: A provably correct algorithm, Space Carving, computes the 3D shape of an unknown, arbitrarily-shaped scene from multiple photographs taken at known but arbitrarily-distributed viewpoints, capturing photorealistic shapes that accurately model scene appearance from a wide range of viewpoints.
Abstract: In this paper we consider the problem of computing the 3D shape of an unknown, arbitrarily-shaped scene from multiple photographs taken at known but arbitrarily-distributed viewpoints. By studying the equivalence class of all 3D shapes that reproduce the input photographs, we prove the existence of a special member of this class, the photo hull, that (1) can be computed directly from photographs of the scene, and (2) subsumes all other members of this class. We then give a provably-correct algorithm, called Space Carving, for computing this shape and present experimental results on complex real-world scenes. The approach is designed to (1) capture photorealistic shapes that accurately model scene appearance from a wide range of viewpoints, and (2) account for the complex interactions between occlusion, parallax, shading, and their view-dependent effects on scene-appearance.
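The carving idea can be caricatured as iterated removal of photo-inconsistent voxels until a fixed point is reached (the photo hull, in this toy setting). The `consistent` callable is a hypothetical stand-in for the paper's photo-consistency test; it takes the current shape because, in the real algorithm, consistency depends on visibility through the remaining voxels:

```python
def space_carve(voxels, consistent, max_sweeps=10):
    """Remove photo-inconsistent voxels until nothing more can be carved.

    voxels:     iterable of hashable voxel ids
    consistent: (voxel, current_shape) -> bool photo-consistency check
    """
    shape = set(voxels)
    for _ in range(max_sweeps):
        carve = {v for v in shape if not consistent(v, shape)}
        if not carve:
            break          # fixed point reached: no voxel fails the test
        shape -= carve     # carving may expose new voxels to the cameras
    return shape
```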

1,487 citations

Proceedings ArticleDOI
01 Sep 1993
TL;DR: In this article, the authors present a method for solving the following problem: given a set of data points scattered in three dimensions and an initial triangular mesh M0, produce a mesh M of the same topological type as M0 that fits the data well and has a small number of vertices.
Abstract: We present a method for solving the following problem: Given a set of data points scattered in three dimensions and an initial triangular mesh M0, produce a mesh M, of the same topological type as M0, that fits the data well and has a small number of vertices. Our approach is to minimize an energy function that explicitly models the competing desires of conciseness of representation and fidelity to the data. We show that mesh optimization can be effectively used in at least two applications: surface reconstruction from unorganized points, and mesh simplification (the reduction of the number of vertices in an initially dense mesh of triangles).
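The energy described above is commonly written as the sum of a data-fidelity term over the sample points, a representation-cost term penalizing the vertex count m, and a spring term over mesh edges (notation here is schematic, not copied from the paper):

```latex
E(K, V) \;=\; \sum_{i} \operatorname{dist}^2\!\big(x_i,\, M(K, V)\big)
\;+\; c_{\mathrm{rep}}\, m
\;+\; \kappa \sum_{\{j,k\} \in K} \| v_j - v_k \|^2
```

The first term pulls the mesh toward the data, the second rewards conciseness, and the spring term regularizes the optimization; trading them off is what enables both surface reconstruction and mesh simplification.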

1,424 citations