A depth estimation algorithm that treats occlusions explicitly; the method also enables identification of occlusion edges, which may be useful in other applications, and outperforms current state-of-the-art light-field depth estimation algorithms, especially near occlusion boundaries.
Abstract:
Consumer-level and high-end light-field cameras are now widely available. Recent work has demonstrated practical methods for passive depth estimation from light-field images. However, most previous approaches do not explicitly model occlusions, and therefore cannot capture sharp transitions around object boundaries. A common assumption is that a pixel exhibits photo-consistency when focused to its correct depth, i.e., all viewpoints converge to a single (Lambertian) point in the scene. This assumption does not hold in the presence of occlusions, making most current approaches unreliable precisely where accurate depth information is most important: at depth discontinuities. In this paper, we develop a depth estimation algorithm that treats occlusion explicitly; the method also enables identification of occlusion edges, which may be useful in other applications. We show that, although pixels at occlusions do not preserve photo-consistency in general, they are still consistent in approximately half the viewpoints. Moreover, the line separating the two view regions (correct depth vs. occluder) has the same orientation as the occlusion edge in the spatial domain. By treating these two regions separately, depth estimation can be improved. Occlusion predictions can also be computed and used for regularization. Experimental results show that our method outperforms current state-of-the-art light-field depth estimation algorithms, especially near occlusion boundaries.
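To make the core idea concrete, below is a minimal sketch (not the authors' implementation) that splits a refocused angular patch along a line with the occlusion edge's orientation and scores photo-consistency in each half; the array layout, the `edge_theta` parameter, and the variance-based score are illustrative assumptions.

```python
import numpy as np

def split_consistency(patch, edge_theta):
    """Split an angular patch by a line through its centre with the same
    orientation as the spatial occlusion edge, then score each half.

    patch      : (U, V, 3) array of colour samples, one per viewpoint, taken
                 from the light field refocused to the candidate depth
                 (illustrative layout, not the paper's data structure).
    edge_theta : orientation of the occlusion edge in radians (assumed known,
                 e.g. from an edge detector on the central view).
    """
    U, V, _ = patch.shape
    u, v = np.meshgrid(np.arange(U) - (U - 1) / 2.0,
                       np.arange(V) - (V - 1) / 2.0, indexing="ij")
    # Signed distance of each viewpoint from the dividing line.
    side = u * np.sin(edge_theta) - v * np.cos(edge_theta)
    scores = []
    for mask in (side >= 0, side < 0):
        half = patch[mask].reshape(-1, 3)
        # Lower colour variance = better photo-consistency (Lambertian).
        scores.append(half.var(axis=0).mean())
    return scores
```

At the correct depth of an occluded background pixel, one half should remain photo-consistent (low variance) while the other, coming from the occluder, should not; the gap between the two scores can also serve as an occlusion cue.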
TL;DR: In this paper, a learning-based approach is proposed to synthesize new views from a sparse set of input views, using two sequential convolutional neural networks to model the disparity and color estimation components; both networks are trained simultaneously by minimizing the error between the synthesized and ground-truth images.
TL;DR: This paper proposes a novel learning-based approach to synthesize new views from a sparse set of input views that could potentially decrease the required angular resolution of consumer light field cameras, which allows their spatial resolution to increase.
TL;DR: In computer vision communities such as stereo, optical flow, or visual tracking, commonly accepted and widely used benchmarks have enabled objective comparison and boosted scientific progress.
TL;DR: A comprehensive overview and discussion of research in light field image processing, including basic light field representation and theory, acquisition, super-resolution, depth estimation, compression, editing, processing algorithms for light field display, and computer vision applications of light field data are presented.
TL;DR: A novel algorithm for view synthesis that utilizes a soft 3D reconstruction to improve quality, continuity and robustness, and shows that this representation is beneficial throughout the view synthesis pipeline.
TL;DR: This work presents two algorithms based on graph cuts that efficiently find a local minimum with respect to two types of large moves, namely expansion moves and swap moves, which allow important cases of discontinuity-preserving energies.
TL;DR: This paper compares the running times of several standard algorithms, as well as a recently developed algorithm that works several times faster than any of the other methods, making near real-time performance possible.
TL;DR: This paper describes a sampled representation for light fields that allows efficient creation and display of both inward- and outward-looking views, and a compression system able to compress the generated light fields by a factor of more than 100:1 with very little loss of fidelity.
TL;DR: This paper presents a volumetric method for integrating range images that is able to integrate a large number of range images yielding seamless, high-detail models of up to 2.6 million triangles.
TL;DR: This paper proposes two algorithms that use graph cuts to compute a local minimum even when very large moves are allowed, generating a labeling such that there is no expansion move that decreases the energy.
Q1. What contributions have the authors mentioned in the paper "Occlusion-aware depth estimation using light-field cameras" ?
In this paper, the authors develop a depth estimation algorithm that treats occlusion explicitly; the method also enables identification of occlusion edges, which may be useful in other applications. The authors show that, although pixels at occlusions do not preserve photo-consistency in general, they are still consistent in approximately half the viewpoints.
Q2. How do the authors increase the robustness of the depth cues?
The authors divide the gradient by d_ini (the initial depth estimate) to increase robustness, since for the same surface normal the depth change across neighboring pixels becomes larger as the depth gets larger.
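As a hedged illustration of that normalization (the function and variable names below are mine, not the paper's), dividing the spatial depth gradient by the initial depth keeps distant slanted surfaces from being penalized more than nearby ones:

```python
import numpy as np

def normalized_depth_gradient(d_ini):
    """Illustrative sketch: spatial gradient of the initial depth map,
    divided by d_ini itself, so that for the same surface normal a distant
    slanted surface is not penalized more than a nearby one."""
    gy, gx = np.gradient(d_ini)                 # per-pixel depth changes
    safe = np.maximum(np.abs(d_ini), 1e-6)      # avoid division by zero
    return gx / safe, gy / safe
```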
Q3. What are the main problems of the two methods?
Both methods are vulnerable to heavy occlusion: the tensor field becomes too random to estimate, and 3D lines are partitioned into small, incoherent segments.
Q4. What is the effect of refocusing on the angular patch?
The authors show that if they refocus to the occluded plane, the angular patch still exhibits photo-consistency in a subset of the pixels (the unoccluded ones).
Q5. What is the effect of the bilateral metric on the angular patch?
The authors show that at occlusions, part of the angular patch remains photo-consistent, while the other part comes from occluders and exhibits no photo-consistency.
Q6. What is the main contribution of this paper?
In this paper, the authors explicitly model occlusions, by developing a modified version of the photo-consistency condition on angular pixels.
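A small, self-contained sketch of how such a modified condition could be used as a depth cost (an assumption-laden simplification, not the paper's exact energy): evaluate each half of the angular patch against the central view and keep the half that agrees better, since at the correct depth the unoccluded half should be photo-consistent.

```python
import numpy as np

def occlusion_aware_cost(patch, edge_theta, center_color):
    """patch: (U, V, 3) angular samples at a candidate depth; edge_theta:
    assumed occlusion-edge orientation; center_color: colour of the central
    view at this pixel. Returns the photo-consistency cost of the better half."""
    U, V, _ = patch.shape
    u, v = np.meshgrid(np.arange(U) - (U - 1) / 2.0,
                       np.arange(V) - (V - 1) / 2.0, indexing="ij")
    side = u * np.sin(edge_theta) - v * np.cos(edge_theta)
    costs = [np.abs(patch[m].mean(axis=0) - center_color).mean()
             for m in (side >= 0, side < 0)]
    return min(costs)  # the unoccluded half should match the central view
```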
Q7. How did Wanner and Goldluecke propose a globally consistent framework?
Wanner and Goldluecke [22] proposed a globally consistent framework by applying structure tensors to estimate the directions of feature pixels in the 2D EPI.
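For context, here is a hedged sketch of the structure-tensor idea on a 2D EPI (a slice of the light field with rows indexed by the angular coordinate u and columns by the spatial coordinate x); the smoothing scale and the sign convention for the recovered slope depend on the parameterization and are assumptions here, not the exact method of [22].

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def epi_line_orientation(epi, sigma=1.0):
    """epi: 2D array (u along rows, x along columns). Returns the local
    orientation of the dominant EPI line at each pixel; its slope dx/du
    encodes disparity (sign convention depends on the parameterization)."""
    gx = sobel(epi, axis=1, mode="nearest")     # derivative along x
    gu = sobel(epi, axis=0, mode="nearest")     # derivative along u
    # Smoothed structure tensor entries.
    Jxx = gaussian_filter(gx * gx, sigma)
    Jxu = gaussian_filter(gx * gu, sigma)
    Juu = gaussian_filter(gu * gu, sigma)
    # Angle of the dominant gradient direction; the EPI line is orthogonal.
    theta_grad = 0.5 * np.arctan2(2.0 * Jxu, Jxx - Juu)
    return theta_grad + np.pi / 2.0
```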
Q8. What are the angular coordinates of a pixel?
For each pixel, the authors refocus to various depths using a 4D shearing of the light-field data [17]:

L_α(x, y, u, v) = L(x + u(1 − 1/α), y + v(1 − 1/α), u, v),   (6)

where L is the input light-field image, α is the ratio of the refocused depth to the currently focused depth, L_α is the refocused light-field image, (x, y) are the spatial coordinates, and (u, v) are the angular coordinates.
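Eq. (6) maps directly to a resampling of each sub-aperture view. The sketch below (grayscale light field, array layout [u, v, x, y], angular coordinates taken relative to the central view, bilinear interpolation — all assumptions on my part) illustrates the shear.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def refocus_lightfield(L, alpha):
    """Shear-based refocusing per Eq. (6):
    L_alpha(x, y, u, v) = L(x + u(1 - 1/alpha), y + v(1 - 1/alpha), u, v).
    L is assumed to be a grayscale 4D array indexed as [u, v, x, y]."""
    U, V, X, Y = L.shape
    shift = 1.0 - 1.0 / alpha
    u0, v0 = (U - 1) / 2.0, (V - 1) / 2.0       # centre (reference) view
    x, y = np.meshgrid(np.arange(X), np.arange(Y), indexing="ij")
    L_alpha = np.empty(L.shape, dtype=float)
    for u in range(U):
        for v in range(V):
            coords = np.stack([x + (u - u0) * shift,
                               y + (v - v0) * shift])
            # Bilinear resampling of this sub-aperture view.
            L_alpha[u, v] = map_coordinates(L[u, v], coords,
                                            order=1, mode="nearest")
    return L_alpha
```

With this layout, the angular patch for a spatial pixel (x, y) at the refocused depth is L_alpha[:, :, x, y].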