Journal ArticleDOI

Joint image and depth completion in shape-from-focus: Taking a cue from parallax

TL;DR: Demonstrates the possibility of exploiting the motion parallax cue in images captured in shape-from-focus with a practical camera to jointly inpaint the focused image and the depth map.
Abstract: Shape-from-focus (SFF) uses a sequence of space-variantly defocused observations captured with relative motion between camera and scene. It assumes that there is no motion parallax in the frames, an assumption that is restrictive and constrains the working environment. Moreover, SFF cannot recover structure information when there are missing data in the frames due to CCD sensor damage or unavoidable occlusions. The capability of filling in plausible information in regions devoid of data is of critical importance in many applications. Images of 3D scenes captured by off-the-shelf cameras with relative motion commonly exhibit parallax-induced pixel motion. We demonstrate the interesting possibility of exploiting the motion parallax cue in images captured in SFF with a practical camera to jointly inpaint the focused image and depth map.
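For readers unfamiliar with the classical pipeline the paper builds on, the sketch below shows conventional SFF: a per-pixel focus measure (here the sum-modified Laplacian, a common choice) computed over an already-aligned focal stack, followed by an argmax over frames. This is a minimal illustration of standard SFF, not the paper's parallax-aware method; the window size and the focus measure are assumptions.

```python
# Minimal depth-from-focus sketch: focus measure + argmax over the stack.
import numpy as np
from scipy.ndimage import convolve, uniform_filter

def sum_modified_laplacian(img, window=9):
    """Sum-modified Laplacian (SML) focus measure, aggregated locally."""
    kx = np.array([[0, 0, 0], [-1, 2, -1], [0, 0, 0]], dtype=float)
    ky = kx.T
    ml = np.abs(convolve(img, kx)) + np.abs(convolve(img, ky))
    return uniform_filter(ml, size=window)  # local aggregation window

def depth_from_focus(stack, depths):
    """stack: (N, H, W) grayscale focal stack; depths: (N,) focus settings."""
    scores = np.stack([sum_modified_laplacian(f) for f in stack])
    best = np.argmax(scores, axis=0)  # frame index of peak focus per pixel
    return depths[best]               # (H, W) depth map
```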


Citations
Journal ArticleDOI
Chi Chen, Bisheng Yang
TL;DR: The proposed method detects and inpaints dynamic occlusions in sequential TLS point clouds captured from one scan position, achieving high precision and recall for occlusion detection and producing clean inpainted point clouds for further processing.
Abstract: Laser point clouds captured using terrestrial laser scanning (TLS) in an uncontrollable urban outdoor or indoor scene suffer from irregularly shaped data blanks caused by dynamic occlusions that exist temporarily, i.e., moving objects such as pedestrians or cars, resulting in integrity and quality losses of the scene data. This paper proposes a novel automatic dynamic occlusion detection and inpainting method for sequential TLS point clouds captured from one scan position. In situ collected laser point cloud sequences are indexed by establishing a novel panoramic space partition that assigns a three-dimensional voxel to each laser point according to the scanning setup. Then two stationary background models are constructed at the ray-voxel level using the laser reflectance intensity and geometrical attributes of the point set inside each voxel across the TLS sequence. Finally, the background models are combined to detect the points on dynamic objects, and the ray voxels of the detected dynamic points are tracked for further inpainting by replacing them with the corresponding background voxels from another scan. The resulting scene is free of dynamic occlusions. Experiments validated the effectiveness of the proposed method for indoor and outdoor TLS point clouds captured by a commercial terrestrial scanner. The proposed method achieves high precision and recall for dynamic occlusion detection and produces clean inpainted point clouds for further processing.
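A hedged sketch of the indexing and background-modeling idea described above. The (azimuth, elevation, range) grid and the median/MAD intensity statistic below are illustrative stand-ins: the paper's panoramic partition follows the scanner's actual setup, and its background models combine intensity with geometric attributes.

```python
# Sketch: spherical ray-voxel indexing + a robust per-voxel background test.
import numpy as np

def ray_voxel_ids(points, az_bins=3600, el_bins=900, r_step=0.5):
    """points: (N, 3) xyz in the scanner frame; bin sizes are assumptions."""
    x, y, z = points.T
    r = np.linalg.norm(points, axis=1)
    az = np.digitize(np.arctan2(y, x), np.linspace(-np.pi, np.pi, az_bins))
    el = np.digitize(np.arcsin(z / np.maximum(r, 1e-9)),
                     np.linspace(-np.pi / 2, np.pi / 2, el_bins))
    return az, el, (r / r_step).astype(int)  # (azimuth, elevation, range) ids

def flag_dynamic(intensity_seq, k=3.0):
    """Flag scans whose voxel intensity deviates from a robust background."""
    med = np.median(intensity_seq)
    mad = np.median(np.abs(intensity_seq - med)) + 1e-9
    return np.abs(intensity_seq - med) > k * 1.4826 * mad
```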

33 citations

Proceedings ArticleDOI
01 Jan 2015
TL;DR: This work proposes a multi-modal approach for removing fences/occlusions from images captured using a Kinect camera, modeling the unoccluded image and the completed depth map as two distinct Markov random fields and obtaining their maximum a posteriori estimates using loopy belief propagation.
Abstract: Low cost RGB-D sensors such as the Microsoft Kinect have enabled the use of depth data along with color images. In this work, we propose a multi-modal approach to address the problem of removal of fences/occlusions from images captured using a Kinect camera. We also perform depth completion by fusing data from multiple recorded depth maps affected by occlusions. The availability of aligned image and depth data from Kinect aids us in the detection of the fence locations. However, accurate estimation of the relative shifts between the captured color frames is necessary. Initially, for the case of static scene elements with simple relative motion between the camera and the objects, we propose the use of the affine scale-invariant feature transform descriptor (ASIFT) to compute the relative global displacements. We also address the scenario wherein the relative motion between the frames may not be global, using the depth map obtained by Kinect. For such a scenario involving complex motion of scene pixels, we use a recently proposed robust optical flow technique. We show results for challenging real-world data wherein the scene is dynamic. The inverse ill-posed problems of estimation of the de-fenced image and the inpainted depth map are solved using an optimization-based framework. Specifically, we model the unoccluded image and the completed depth map as two distinct Markov random fields, and obtain their maximum a posteriori estimates using loopy belief propagation.
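As a rough illustration of the depth-fusion idea, the sketch below warps several hole-affected depth maps onto the reference frame using global integer shifts (assumed already known) and takes a per-pixel median of the valid samples. The paper instead estimates the shifts with ASIFT or robust optical flow and solves the full inverse problem via MRF models and loopy belief propagation; `fuse_depths` is a hypothetical helper covering the fusion step only.

```python
# Sketch: warp occlusion-affected depth maps by global shifts, then fuse.
import numpy as np

def fuse_depths(depth_maps, shifts, invalid=0):
    """depth_maps: list of (H, W) arrays with `invalid` marking holes;
    shifts: list of (dy, dx) integer displacements w.r.t. the reference."""
    h, w = depth_maps[0].shape
    stack = np.full((len(depth_maps), h, w), np.nan)
    for i, (d, (dy, dx)) in enumerate(zip(depth_maps, shifts)):
        warped = np.roll(d.astype(float), (-dy, -dx), axis=(0, 1))
        warped[warped == invalid] = np.nan   # drop hole pixels
        stack[i] = warped
    return np.nanmedian(stack, axis=0)       # per-pixel consensus depth
```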

27 citations


Cites methods from "Joint image and depth completion in..."

  • ...Our work is also related to recent works in the area of depth inpainting [4], [5], [6], [7] but unlike these techniques our work uses the Kinect sensor to capture both RGB video and depth data....


Journal ArticleDOI
TL;DR: This paper proposes a fast and reliable algorithm for depth map inpainting using the tensor voting (TV) framework: the depth maps of a training set are aligned with the target (defective) depth map, and the goodness of candidate depth estimates is evaluated using 3D TV.
Abstract: Depth maps captured by range scanning devices or by using optical cameras often suffer from missing regions due to occlusions, reflectivity, limited scanning area, sensor imperfections, etc. In this paper, we propose a fast and reliable algorithm for depth map inpainting using the tensor voting (TV) framework. For less complex missing regions, local edge and depth information is utilized for synthesizing missing values. The depth variations are modeled by local planes using 3D TV, and missing values are estimated using plane equations. For large and complex missing regions, we collect and evaluate depth estimates from self-similar (training) datasets. We align the depth maps of the training set with the target (defective) depth map and evaluate the goodness of depth estimates among candidate values using 3D TV. We demonstrate the effectiveness of the proposed approaches on real as well as synthetic data.
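The "local plane" fill-in for less complex holes can be illustrated with an ordinary least-squares plane fit, as in the sketch below. The tensor-voting machinery the paper uses to select and validate estimates is omitted; `fill_hole_with_plane` and its border parameter are hypothetical simplifications.

```python
# Sketch: fit z = a*x + b*y + c to known depths around a hole, fill inside.
import numpy as np

def fill_hole_with_plane(depth, hole_mask, border=5):
    """depth: (H, W) array; hole_mask: True where depth is missing."""
    ys, xs = np.where(hole_mask)
    y0 = max(ys.min() - border, 0)
    y1 = min(ys.max() + border + 1, depth.shape[0])
    x0 = max(xs.min() - border, 0)
    x1 = min(xs.max() + border + 1, depth.shape[1])
    yy, xx = np.mgrid[y0:y1, x0:x1]
    known = ~hole_mask[y0:y1, x0:x1]
    A = np.c_[xx[known], yy[known], np.ones(int(known.sum()))]
    coeff, *_ = np.linalg.lstsq(A, depth[y0:y1, x0:x1][known], rcond=None)
    filled = depth.astype(float).copy()
    filled[ys, xs] = coeff[0] * xs + coeff[1] * ys + coeff[2]  # plane equation
    return filled
```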

22 citations

Journal ArticleDOI
TL;DR: This work proposes a depth estimation framework using calibrated images captured under general camera motion and lens parameter variations that seeks to generalize the constrained areas of stereo and shape from defocus (SFD)/focus (SFF) by handling, in tandem, various effects such as focus variation, zoom, parallax and stereo occlusions, all under one roof.
Abstract: Traditional depth estimation methods typically exploit the effect of either the variations in internal parameters such as aperture and focus (as in depth from defocus), or variations in extrinsic parameters such as position and orientation of the camera (as in stereo). When operating off-the-shelf (OTS) cameras in a general setting, these parameters influence the depth of field (DOF) and field of view (FOV). While DOF mandates one to deal with defocus blur, a larger FOV necessitates camera motion during image acquisition. As a result, for unfettered operation of an OTS camera, it becomes inevitable to account for pixel motion as well as optical defocus blur in the captured images. We propose a depth estimation framework using calibrated images captured under general camera motion and lens parameter variations. Our formulation seeks to generalize the constrained areas of stereo and shape from defocus (SFD)/focus (SFF) by handling, in tandem, various effects such as focus variation, zoom, parallax and stereo occlusions, all under one roof. One of the associated challenges in such an unrestrained scenario is the problem of removing user-defined foreground occluders in the reference depth map and image (termed inpainting of depth and image). Inpainting is achieved by exploiting the cue from motion parallax to discover (in other images) the correspondence/color information missing in the reference image. Moreover, considering the fact that the observations could be differently blurred, it is important to ensure that the degree of defocus in the missing regions (in the reference image) is coherent with the local neighbours (defocus inpainting).
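The defocus-depth coupling this abstract leans on can be made concrete with the thin-lens model: a scene point at distance u images in focus at v = 1/(1/f - 1/u), so with the sensor placed for a focused plane at u_f, the blur-circle diameter grows with the focus error. The function and the numbers below are illustrative, not taken from the paper.

```python
# Thin-lens blur circle: how defocus encodes depth for a real-aperture camera.
def blur_circle_diameter(u, u_f, f, aperture):
    """All distances in metres; aperture is the lens diameter A = f / N."""
    v = 1.0 / (1.0 / f - 1.0 / u)       # image distance of the scene point
    v_f = 1.0 / (1.0 / f - 1.0 / u_f)   # sensor position for the focused plane
    return aperture * abs(v - v_f) / v  # geometric blur diameter on the sensor

# e.g. a 50 mm f/2 lens focused at 2 m, scene point at 4 m (~0.3 mm blur):
print(blur_circle_diameter(u=4.0, u_f=2.0, f=0.05, aperture=0.025))
```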

13 citations


Cites background from "Joint image and depth completion in..."

  • ...The works in Sahay and Rajagopalan (2009a), Sahay and Rajagopalan (2010) consider both motion and blur in the restrictive setting of axial motion and no camera parameter variations....


  • ...In the restrictive case of an axial translation (Sahay and Rajagopalan 2010), the motion-cue for inpainting is effectively absent near the image center due to small pixel motion....


  • ...However, if telecentricity assumption is relaxed, the camera motion will play a role by inducing depth dependent pixel-shift in which case the SFF scenario turns out to be a special case of real-aperture axial stereo (Sahay and Rajagopalan 2010)....


Journal ArticleDOI
TL;DR: This paper exploits spatial dependencies by modeling the shape of the object with a discontinuity-adaptive Markov random field wherein the focus measure profile is used to judiciously control the degree of smoothness.
Abstract: Shape from focus is an elegant method that estimates the structure of a 3D object from a video of captured frames using the degree of focus as the principal cue. However, the quality of the estimated structure is vulnerable to scene texture. The effect is particularly pronounced for objects that are smooth relative to the magnification of the optical system. In this paper, the shape estimation process is cast as an inverse problem. We exploit spatial dependencies by modeling the shape of the object with a discontinuity-adaptive Markov random field wherein the focus measure profile is used to judiciously control the degree of smoothness. The 3D information is obtained by minimizing a suitably derived energy function that preserves fine details of the underlying structure. We show by experimentation on several real-world specimens that our method yields state-of-the-art performance.
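One way to read "the focus measure profile is used to judiciously control the degree of smoothness" is sketched below: pixels whose focus profile has a sharp, unambiguous peak get a weak smoothness prior, while flat, ambiguous profiles get a strong one. The peak-sharpness statistic and the linear mapping are illustrative assumptions, not the paper's exact adaptive scheme.

```python
# Sketch: focus-profile confidence modulating an MRF smoothness weight.
import numpy as np

def smoothness_weights(focus_scores, lam=1.0):
    """focus_scores: (N, H, W) focus measure across the stack."""
    peak = focus_scores.max(axis=0)
    mean = focus_scores.mean(axis=0)
    sharpness = (peak - mean) / (peak + 1e-9)  # ~1: distinct peak, ~0: flat
    return lam * (1.0 - sharpness)             # strong prior where ambiguous
```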

10 citations


Cites methods from "Joint image and depth completion in..."

  • ...The assumption of no parallax in the stack of defocused images has been relaxed in Sahay & Rajagopalan (2011) and later this framework is used for inpainting and disocclusion in Sahay & Rajagopalan (2010)....


  • ...The works in Sahay & Rajagopalan (2011, 2010) used a stack affected by parallax captured by an off-the-shelf digital camera....


References
Journal ArticleDOI
TL;DR: This work presents two algorithms based on graph cuts that efficiently find a local minimum with respect to two types of large moves, namely expansion moves and swap moves, which allow important cases of discontinuity-preserving energies.
Abstract: Many tasks in computer vision involve assigning a label (such as disparity) to every pixel. A common constraint is that the labels should vary smoothly almost everywhere while preserving sharp discontinuities that may exist, e.g., at object boundaries. These tasks are naturally stated in terms of energy minimization. The authors consider a wide class of energies with various smoothness constraints. Global minimization of these energy functions is NP-hard even in the simplest discontinuity-preserving case. Therefore, our focus is on efficient approximation algorithms. We present two algorithms based on graph cuts that efficiently find a local minimum with respect to two types of large moves, namely expansion moves and swap moves. These moves can simultaneously change the labels of arbitrarily large sets of pixels. In contrast, many standard algorithms (including simulated annealing) use small moves where only one pixel changes its label at a time. Our expansion algorithm finds a labeling within a known factor of the global minimum, while our swap algorithm handles more general energy functions. Both of these algorithms allow important cases of discontinuity preserving energies. We experimentally demonstrate the effectiveness of our approach for image restoration, stereo and motion. On real data with ground truth, we achieve 98 percent accuracy.
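For intuition about why large moves matter, the sketch below minimizes a small Potts-style labeling energy with iterated conditional modes (ICM), a single-pixel-move baseline of the kind this paper contrasts against; it gets trapped in exactly the weak local minima that expansion and swap moves are designed to escape. This is a reference baseline, not the graph-cut algorithm itself.

```python
# Sketch: ICM on a Potts energy (data term + lam * count of unlike neighbours).
import numpy as np

def icm_potts(data_cost, lam=1.0, iters=10):
    """data_cost: (L, H, W) per-label costs; Potts smoothness with weight lam."""
    labels = np.argmin(data_cost, axis=0)  # start from the data-only optimum
    L, h, w = data_cost.shape
    for _ in range(iters):
        for y in range(h):
            for x in range(w):
                nbrs = [labels[yy, xx]
                        for yy, xx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                        if 0 <= yy < h and 0 <= xx < w]
                costs = data_cost[:, y, x] + lam * np.array(
                    [sum(l != n for n in nbrs) for l in range(L)])
                labels[y, x] = int(np.argmin(costs))  # one-pixel move only
    return labels
```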

7,413 citations

Proceedings ArticleDOI
01 Jul 2000
TL;DR: A novel algorithm for digital inpainting of still images that attempts to replicate the basic techniques used by professional restorators, and does not require the user to specify where the novel information comes from.
Abstract: Inpainting, the technique of modifying an image in an undetectable form, is as ancient as art itself. The goals and applications of inpainting are numerous, from the restoration of damaged paintings and photographs to the removal/replacement of selected objects. In this paper, we introduce a novel algorithm for digital inpainting of still images that attempts to replicate the basic techniques used by professional restorers. After the user selects the regions to be restored, the algorithm automatically fills in these regions with information surrounding them. The fill-in is done in such a way that isophote lines arriving at the regions' boundaries are completed inside. In contrast with previous approaches, the technique introduced here does not require the user to specify where the novel information comes from. This is done automatically (and in a fast way), thereby allowing numerous regions containing completely different structures and surrounding backgrounds to be filled in simultaneously. In addition, no limitations are imposed on the topology of the region to be inpainted. Applications of this technique include the restoration of old photographs and damaged film; removal of superimposed text like dates, subtitles, or publicity; and the removal of entire objects from the image like microphones or wires in special effects.
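A much-simplified stand-in for the fill-in idea is harmonic (diffusion) inpainting, sketched below: hole pixels are repeatedly replaced by the average of their neighbours. Bertalmio et al. instead transport image information along isophotes, which preserves the edges that plain diffusion blurs; the iteration count here is an arbitrary choice.

```python
# Sketch: diffusion inpainting -- iterate the 4-neighbour average on the hole.
import numpy as np

def diffusion_inpaint(img, mask, iters=500):
    """img: (H, W) float image; mask: True where pixels are missing."""
    out = img.astype(float).copy()
    out[mask] = out[~mask].mean()  # crude initialisation of the hole
    for _ in range(iters):
        avg = 0.25 * (np.roll(out, 1, 0) + np.roll(out, -1, 0) +
                      np.roll(out, 1, 1) + np.roll(out, -1, 1))
        out[mask] = avg[mask]      # update only the missing pixels
    return out
```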

3,830 citations

Proceedings ArticleDOI
01 Jan 1999
TL;DR: This paper proposes two algorithms that use graph cuts to compute a local minimum even when very large moves are allowed, and generates a labeling such that there is no expansion move that decreases the energy.
Abstract: In this paper we address the problem of minimizing a large class of energy functions that occur in early vision. The major restriction is that the energy function's smoothness term must only involve pairs of pixels. We propose two algorithms that use graph cuts to compute a local minimum even when very large moves are allowed. The first move we consider is an α-β-swap: for a pair of labels α, β, this move exchanges the labels between an arbitrary set of pixels labeled α and another arbitrary set labeled β. Our first algorithm generates a labeling such that there is no swap move that decreases the energy. The second move we consider is an α-expansion: for a label α, this move assigns an arbitrary set of pixels the label α. Our second algorithm, which requires the smoothness term to be a metric, generates a labeling such that there is no expansion move that decreases the energy. Moreover, this solution is within a known factor of the global minimum. We experimentally demonstrate the effectiveness of our approach on image restoration, stereo and motion.

3,199 citations

Journal ArticleDOI
01 Jan 2004
TL;DR: This work gives a precise characterization of what energy functions can be minimized using graph cuts, among the energy functions that can be written as a sum of terms containing three or fewer binary variables.
Abstract: In the last few years, several new algorithms based on graph cuts have been developed to solve energy minimization problems in computer vision. Each of these techniques constructs a graph such that the minimum cut on the graph also minimizes the energy. Yet, because these graph constructions are complex and highly specific to a particular energy function, graph cuts have seen limited application to date. In this paper, we give a characterization of the energy functions that can be minimized by graph cuts. Our results are restricted to functions of binary variables. However, our work generalizes many previous constructions and is easily applicable to vision problems that involve large numbers of labels, such as stereo, motion, image restoration, and scene reconstruction. We give a precise characterization of what energy functions can be minimized using graph cuts, among the energy functions that can be written as a sum of terms containing three or fewer binary variables. We also provide a general-purpose construction to minimize such an energy function. Finally, we give a necessary condition for any energy function of binary variables to be minimized by graph cuts. Researchers who are considering the use of graph cuts to optimize a particular energy function can use our results to determine if this is possible and then follow our construction to create the appropriate graph. A software implementation is freely available.
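The paper's regularity condition for pairwise terms of binary variables is directly checkable: every pairwise term must satisfy E(0,0) + E(1,1) ≤ E(0,1) + E(1,0) (submodularity) for the energy to be exactly minimizable by a single graph cut. A tiny checker, with a hypothetical function name:

```python
# Submodularity check for pairwise terms over binary variables.
def is_graph_representable(pairwise_terms):
    """pairwise_terms: iterable of 2x2 tables E[i][j], i, j in {0, 1}."""
    return all(E[0][0] + E[1][1] <= E[0][1] + E[1][0] for E in pairwise_terms)

# e.g. the Potts term [[0, 1], [1, 0]] is submodular:
print(is_graph_representable([[[0, 1], [1, 0]]]))  # True
```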

3,079 citations