Occlusion-aware Depth Estimation Using Light-field Cameras
Ting-Chun Wang
UC Berkeley
tcwang0509@berkeley.edu
Alexei A. Efros
UC Berkeley
efros@eecs.berkeley.edu
Ravi Ramamoorthi
UC San Diego
ravir@cs.ucsd.edu
Abstract
Consumer-level and high-end light-field cameras are now widely available. Recent work has demonstrated practical methods for passive depth estimation from light-field images. However, most previous approaches do not explicitly model occlusions, and therefore cannot capture sharp transitions around object boundaries. A common assumption is that a pixel exhibits photo-consistency when focused to its correct depth, i.e., all viewpoints converge to a single (Lambertian) point in the scene. This assumption does not hold in the presence of occlusions, making most current approaches unreliable precisely where accurate depth information is most important: at depth discontinuities. In this paper, we develop a depth estimation algorithm that treats occlusion explicitly; the method also enables identification of occlusion edges, which may be useful in other applications. We show that, although pixels at occlusions do not preserve photo-consistency in general, they are still consistent in approximately half the viewpoints. Moreover, the line separating the two view regions (correct depth vs. occluder) has the same orientation as the occlusion edge has in the spatial domain. By treating these two regions separately, depth estimation can be improved. Occlusion predictions can also be computed and used for regularization. Experimental results show that our method outperforms current state-of-the-art light-field depth estimation algorithms, especially near occlusion boundaries.
1. Introduction
Light-field cameras from Lytro [3] and Raytrix [18] are now available for consumer and industrial use respectively, bringing to fruition early work on light-field rendering [10, 15]. An important benefit of light-field cameras for computer vision is that multiple viewpoints or sub-apertures are available in a single light-field image, enabling passive depth estimation [4]. Indeed, Lytro Illum and Raytrix software produces depth maps used for tasks like refocusing after capture, and recent work [20] shows how multiple cues like defocus and correspondence can be combined.
However, very little work has explicitly considered occlusion. A common assumption is that, when refocused to the correct depth (the depth of the center view), angular pixels corresponding to a single spatial pixel represent viewpoints that converge to one point in the scene. If we collect these pixels into an angular patch (Eq. 6), they exhibit photo-consistency for Lambertian surfaces, which means they all share the same color (Fig. 2a). However, this assumption is not true when occlusions occur at a pixel; photo-consistency no longer holds (Fig. 2b). Enforcing photo-consistency on these pixels will often lead to incorrect depth results, causing smooth transitions around sharp occlusion boundaries.

Figure 1: Comparison of depth estimation results of different algorithms (Wanner et al. CVPR12, Tao et al. ICCV13, Yu et al. ICCV13, Chen et al. CVPR14, and our result) from a light-field input image. Darker represents closer and lighter represents farther. It can be seen that only our occlusion-aware algorithm successfully captures most of the holes in the basket, while other methods either smooth over them, or have artifacts as a result.

Figure 2: Non-occluded vs. occluded pixels. (a) At non-occluded pixels, all view rays converge to the same point in the scene if refocused to the correct depth. (b) However, photo-consistency fails to hold at occluded pixels, where some view rays will hit the occluder.
In this paper, we explicitly model occlusions by developing a modified version of the photo-consistency condition on angular pixels. Our main contributions are:

1. An occlusion prediction framework on light-field images that uses a modified angular photo-consistency.

2. A robust depth estimation algorithm which explicitly takes occlusions into account.

We show (Sec. 3) that around occlusion edges, the angular patch can be divided into two regions, where only one of them obeys photo-consistency. A key insight (Fig. 3) is that the line separating the two regions in the angular domain (correct depth vs. occluder) has the same orientation as the occlusion edge does in the spatial domain. This observation is specific to light-fields, which have a dense set of views from a planar camera array or set of sub-apertures. Standard stereo image pairs (and general multi-view stereo configurations) do not directly satisfy the model.

We use the modified photo-consistency condition, and the means/variances in the two regions, to estimate initial occlusion-aware depth (Sec. 4). We also compute a predictor for the occlusion boundaries that can be used as an input to determine the final regularized depth (Sec. 5). These occlusion boundaries could also be used for other applications like segmentation or recognition. As seen in Fig. 1, our depth estimates are more accurate in scenes with complex occlusions (previous results smooth over object boundaries like the holes in the basket). In Sec. 6, we present extensive results on both synthetic data (Figs. 9, 10) and on real scenes captured with the consumer Lytro Illum camera (Fig. 11), demonstrating higher-quality depth recovery than previous work [8, 20, 22, 26].
2. Related Work
(Multi-View) Stereo with Occlusions: Multi-view stereo matching has a long history, with some efforts to handle occlusions. For example, the graph-cut framework [12] used an occlusion term to ensure visibility constraints while assigning depth labels. Woodford et al. [25] imposed an additional second-order smoothness term in the optimization, and solved it using Quadratic Pseudo-Boolean Optimization [19]. Building on this, Bleyer et al. [5] assumed a scene is composed of a number of smooth surfaces and proposed a soft segmentation method to apply the asymmetric occlusion model [24]. However, significant occlusions remain difficult to handle even with a large number of views.

Depth from Light-Field Cameras: Perwass and Wietzke [18] proposed using correspondence techniques to estimate depth from light-field cameras. Tao et al. [20] combined correspondence and defocus cues in the 4D Epipolar Image (EPI) so that each cue compensates for the weaknesses of the other. Neither method explicitly models occlusions. McCloskey [16] proposed a method to remove partial occlusion in color images, which does not estimate depth. Wanner and Goldluecke [22] proposed a globally consistent framework by applying structure tensors to estimate the directions of feature pixels in the 2D EPI. Yu et al. [26] explored geometric structures of 3D lines in ray space and encoded the line constraints to further improve reconstruction quality. However, both methods are vulnerable to heavy occlusion: the tensor field becomes too random to estimate, and 3D lines are partitioned into small, incoherent segments. Kim et al. [11] adopted a fine-to-coarse framework to ensure smooth reconstructions in homogeneous areas using dense light-fields. We build on the method by Tao et al. [20], which works with consumer light-field cameras, to improve depth estimation by taking occlusions into account.

Chen et al. [8] proposed a bilateral metric on angular pixel patches that measures the probability of occlusion by each pixel's similarity to the central pixel. However, as noted in their discussion, their method is biased towards the central view because it uses the color of the central pixel as the mean of the bilateral filter; the metric therefore becomes unreliable once the input images get noisy. In contrast, our method uses the mean of about half the pixels as the reference, and is thus more robust to noisy input, as shown in our results section.
3. Light-Field Occlusion Theory
We first develop our new light-field occlusion model, based on the physical image formation. We show that at occlusions, part of the angular patch remains photo-consistent, while the other part comes from occluders and exhibits no photo-consistency. By treating these two regions separately, occlusions can be better handled.

For each pixel on an occlusion edge, we assume it is occluded by only one occluder among all views. We also assume that the spatial patch we are looking at is small enough that the occlusion edge around the pixel can be approximated by a line. We show that if we refocus to the occluded plane, the angular patch will still exhibit photo-consistency in a subset of the pixels (the unoccluded ones). Moreover, the edge separating the unoccluded and occluded pixels in the angular patch has the same orientation as the occlusion edge in the spatial domain (Fig. 3). In Secs. 4 and 5, we use this idea to develop a depth estimation and regularization algorithm.
Consider a pixel at (x_0, y_0, f) on the imaging focal plane (the plane in focus), as shown in Fig. 3a. An edge in the central pinhole image with 2D slope γ corresponds to a plane P in 3D space (the green plane in Fig. 3a). The normal n to this plane can be obtained by taking the cross-product,

n = (x_0, y_0, f) × (x_0 + 1, y_0 + γ, f) = (−γf, f, γx_0 − y_0).    (1)

Note that we do not need to normalize the vector. The plane equation is P(x, y, z) ≡ n · (x_0 − x, y_0 − y, f − z) = 0, which expands to

P(x, y, z) ≡ γf(x − x_0) − f(y − y_0) + (y_0 − γx_0)(z − f) = 0.    (2)

Figure 3: Light-field occlusion model. (a) Pinhole model for central camera image formation: an occlusion edge with slope γ through s_1 = (x_0, y_0, f) on the imaging (focal) plane corresponds to an occluding plane with normal n in 3D space, between the camera plane (containing s_0 = (u, v, 0)) and the focal plane. (b) The "reversed" pinhole model for light-field formation. It can be seen that when we refocus to the occluded plane, we get a projection of the occluder onto the camera plane, forming a reversed pinhole camera model.
In our case, one can verify that n · (x_0, y_0, f) = 0, so a further simplification to n · (x, y, z) = 0 is possible:

P(x, y, z) ≡ γfx − fy + (y_0 − γx_0)z = 0.    (3)
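As a quick sanity check (our addition, not part of the paper), the algebra in Eqs. (1)-(3) can be verified numerically; the pixel location, focal distance, and edge slope below are arbitrary example values.

```python
import numpy as np

# Arbitrary example values: pixel location (x0, y0), focal distance f, spatial edge slope gamma.
x0, y0, f, gamma = 12.0, 7.0, 50.0, 0.8

# Eq. (1): normal of the occluding plane as a cross product of two rays through the edge.
n = np.cross([x0, y0, f], [x0 + 1.0, y0 + gamma, f])
assert np.allclose(n, [-gamma * f, f, gamma * x0 - y0])

# Eq. (2): the expanded plane equation agrees with n . (x0 - x, y0 - y, f - z) at a test point.
p = np.array([3.0, -2.0, 9.0])
P2 = gamma * f * (p[0] - x0) - f * (p[1] - y0) + (y0 - gamma * x0) * (p[2] - f)
assert np.isclose(P2, np.dot(n, [x0 - p[0], y0 - p[1], f - p[2]]))

# Since n . (x0, y0, f) = 0 (the plane passes through the camera center at the origin),
# Eq. (2) simplifies to the homogeneous form of Eq. (3).
assert np.isclose(np.dot(n, [x0, y0, f]), 0.0)
P3 = lambda x, y, z: gamma * f * x - f * y + (y0 - gamma * x0) * z   # Eq. (3)
assert np.isclose(P3(*p), -np.dot(n, p))        # same zero set as n . (x, y, z) = 0
assert np.isclose(P3(x0, y0, f), 0.0)           # the edge point lies on the plane
print("Eqs. (1)-(3) check out for this example")
```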
Now consider the occluder (the yellow triangle in Fig. 3a). The occluder intersects P(x, y, z) with z ∈ (0, f) and lies on one side of that plane. Without loss of generality, we can assume it lies in the half-space P(x, y, z) ≥ 0. Now consider a point (u, v, 0) on the camera plane (the plane on which the camera array lies). To avoid being shadowed by the occluder, the line segment connecting this point and the pixel (x_0, y_0, f) on the image must not hit the occluder,

P(s_0 + (s_1 − s_0)t) ≤ 0   ∀t ∈ [0, 1],    (4)

where s_0 = (u, v, 0) and s_1 = (x_0, y_0, f). When t = 1, P(s_1) = 0. When t = 0,

P(s_0) ≡ γfu − fv ≤ 0.    (5)

The last inequality is satisfied if v ≥ γu, i.e., the critical slope on the angular patch, v/u = γ, is the same as the edge orientation in the spatial domain. If the inequality above is satisfied, both endpoints of the line segment lie on the non-occluder side of the plane, and since P is linear, the entire segment lies on that side as well. Thus, the light ray will not be occluded.

Figure 4: Occlusions in different views: (a) occlusion in the central view, (b) occlusion in other views. The insets are the angular patches of the red pixels when refocused to the correct depth. At the occlusion edge in the central view, the angular patch can be divided evenly into two regions, one with photo-consistency and one without. However, for pixels around the occlusion edge, although the central view is not occluded, some other views will still get occluded. Hence, the angular patch will not be photo-consistent, and will be unevenly divided into occluded and visible regions.
We also give an intuitive explanation of the above proof. Consider a plane being occluded by an occluder, as shown in Fig. 3b, and a simple 3 × 3 camera array. When we refocus to the occluded plane, we can see that some views are occluded by the occluder. Moreover, the occluded cameras on the camera plane are exactly the projection of the occluder onto the camera plane. Thus, we obtain a "reversed" pinhole camera model, where the original imaging plane is replaced by the camera plane, and the original pinhole becomes the pixel we are looking at. When we collect pixels from different cameras to form an angular patch, the edge separating the two regions will have the same orientation as the occluder's edge in the spatial domain.

Therefore, we can predict the edge orientation in the angular domain using the edge in the spatial image. Once we divide the patch into two regions accordingly, we know photo-consistency holds in one of them, since all of its pixels come from the same (assumed Lambertian) spatial point.
4. Occlusion-Aware Initial Depth Estimation
In this section, we show how to modify the initial depth estimation from Tao et al. [20], based on the theory above. First, we apply edge detection on the central view image. Then for each edge pixel, we compute initial depths using a modified photo-consistency constraint. The next section will discuss computation of refined occlusion predictors and regularization to generate the final depth map.

Edge detection: We first apply Canny edge detection on the central view (pinhole) image. Then an edge orientation predictor is applied on the obtained edges to get the orientation angles at each edge pixel. These pixels are candidate occlusion pixels in the central view. However, some pixels are not occluded in the central view, but are occluded in other views, as shown in Fig. 4, and we want to mark these as candidate occlusions as well. We identify these pixels by dilating the edges detected in the center view.
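As an illustration of this step, the sketch below shows one way to obtain candidate occlusion pixels and per-pixel edge orientations with standard OpenCV calls. This is our own sketch rather than the authors' code; the Canny thresholds, the Sobel-based orientation estimate, and the dilation radius are assumptions.

```python
import cv2
import numpy as np

def occlusion_candidates(center_view_rgb, low=50, high=150, dilate_radius=2):
    """Detect edges in the central view and dilate them so that pixels occluded
    only in non-central views (cf. Fig. 4) are also marked as candidates."""
    gray = cv2.cvtColor(center_view_rgb, cv2.COLOR_RGB2GRAY)
    edges = cv2.Canny(gray, low, high)                      # candidate occlusion edges

    # Edge orientation from image gradients (one simple choice of "orientation predictor").
    gx = cv2.Sobel(gray.astype(np.float32), cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray.astype(np.float32), cv2.CV_32F, 0, 1, ksize=3)
    orientation = np.arctan2(gy, gx)                        # radians, per pixel

    kernel = np.ones((2 * dilate_radius + 1,) * 2, np.uint8)
    candidates = cv2.dilate(edges, kernel) > 0              # boolean candidate mask
    return candidates, orientation
```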
Depth Estimation: For each pixel, we refocus to various depths using a 4D shearing of the light-field data [17],

L_α(x, y, u, v) = L(x + u(1 − 1/α), y + v(1 − 1/α), u, v),    (6)

where L is the input light-field image, α is the ratio of the refocused depth to the currently focused depth, L_α is the refocused light-field image, (x, y) are the spatial coordinates, and (u, v) are the angular coordinates. The central viewpoint is located at (u, v) = (0, 0). This gives us an angular patch for each depth, which can be averaged to give a refocused pixel.
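A minimal numpy/scipy sketch of the shear in Eq. (6), assuming a grayscale light field stored as an array LF[v, u, y, x] with the central view at (u, v) = (0, 0). This is our illustration, not the paper's implementation, and it trades efficiency for clarity by shifting whole views.

```python
import numpy as np
from scipy.ndimage import shift

def refocus_angular_patch(LF, x, y, alpha, u_coords, v_coords):
    """Return the angular patch of spatial pixel (x, y) after shearing by alpha (Eq. 6).

    LF       : float array of shape (n_v, n_u, height, width), grayscale for simplicity
    u_coords : 1D array of angular u coordinates (0 at the central view)
    v_coords : 1D array of angular v coordinates (0 at the central view)
    """
    n_v, n_u = LF.shape[:2]
    patch = np.zeros((n_v, n_u), dtype=LF.dtype)
    s = 1.0 - 1.0 / alpha
    for iv, v in enumerate(v_coords):
        for iu, u in enumerate(u_coords):
            # L_alpha(x, y, u, v) = L(x + u*s, y + v*s, u, v): shifting view (u, v) by
            # (-v*s, -u*s) brings that sample back onto the integer grid position (y, x).
            view = shift(LF[iv, iu], (-v * s, -u * s), order=1, mode='nearest')
            patch[iv, iu] = view[y, x]
    return patch  # averaging this patch gives the refocused pixel at (x, y)
```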
When an occlusion is not present at the pixel, the obtained angular patch exhibits photo-consistency, and hence has small variance and high similarity to the central view. For pixels that are not occlusion candidates, we can simply compute the variance and the mean of this patch to obtain the correspondence and defocus cues, similar to the method by Tao et al. [20].

However, if an occlusion occurs, photo-consistency no longer holds. Instead of dealing with the entire angular patch, we divide the patch into two regions. The angular edge orientation separating the two regions is the same as in the spatial domain, as proven in Sec. 3. Since at least half the angular pixels come from the occluded plane (otherwise it would not be seen in the central view), we place the edge so that it passes through the central pixel, dividing the patch evenly. Note that only one region, corresponding to the partially occluded plane focused to the correct depth, exhibits photo-consistency. The other region contains angular pixels that come from the occluder, which is not focused at the proper depth, and might also contain some pixels from the occluded plane. We therefore replace the original patch with the region that has the minimum variance when computing the correspondence and defocus cues.
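The split itself is straightforward to express in code. The sketch below (ours, under the same assumptions as the previous sketches) divides the angular patch by the line through the central view whose slope equals the spatial edge slope γ, following the critical-slope condition of Eq. (5).

```python
import numpy as np

def split_angular_patch(patch, gamma, u_coords, v_coords):
    """Split an angular patch into two halves by the line v = gamma * u through the center.

    patch : (n_v, n_u) angular patch from refocus_angular_patch()
    gamma : 2D slope of the spatial edge at this pixel (e.g., tan of the orientation angle)
    """
    uu, vv = np.meshgrid(u_coords, v_coords)      # angular coordinate grids
    side = (vv - gamma * uu) >= 0                 # sign of v - gamma*u decides the region
    return patch[side], patch[~side]

# The region with smaller variance is then taken as the photo-consistent half (Eq. 9):
# region1, region2 = split_angular_patch(patch, gamma, u_coords, v_coords)
# consistent = region1 if region1.var() <= region2.var() else region2
```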
To be specific, let (u_1, v_1) and (u_2, v_2) be the angular coordinates in the two regions, respectively. We first compute the means and the variances of the two regions,

L̄_{α,j}(x, y) = (1/N_j) Σ_{(u_j, v_j)} L_α(x, y, u_j, v_j),   j = 1, 2,    (7)

V_{α,j}(x, y) = (1/(N_j − 1)) Σ_{(u_j, v_j)} [ L_α(x, y, u_j, v_j) − L̄_{α,j}(x, y) ]²,    (8)

where N_j is the number of pixels in region j. Let

i = argmin_{j=1,2} V_{α,j}(x, y)    (9)

be the index of the region that exhibits the smaller variance. Then the correspondence response is given by

C_α(x, y) = V_{α,i}(x, y).    (10)

Similarly, the defocus response is given by

D_α(x, y) = | L̄_{α,i}(x, y) − L(x, y, 0, 0) |².    (11)

Finally, the optimal depth is determined as

α*(x, y) = argmin_α ( C_α(x, y) + D_α(x, y) ).    (12)

Figure 5: Color consistency constraint. Panels: (a) spatial image, (b) angular patch at the correct depth, (c) angular patch at an incorrect depth, (d) color consistency (spatial patch with pixels p_1, p_2 and regions R_1, R_2), (e) focusing to the correct depth, (f) focusing to an incorrect depth. In (b)(e), when we refocus to the correct depth, we get low variance in half the angular patch. However, in (c)(f), although we refocused to an incorrect depth, it still gives a low variance response since the occluded plane is very textureless, so we get a "reversed" angular patch. To address this, we add another constraint: p_1 and p_2 should be similar to the averages of R_1 and R_2 in (d), respectively.
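Putting Eqs. (7)-(12) together, a compact sketch of the per-pixel depth search might look as follows. It reuses the hypothetical helpers refocus_angular_patch and split_angular_patch from the earlier sketches, assumes the central view sits at the middle of the angular grid, and leaves the set of candidate α values as a free choice.

```python
import numpy as np
# refocus_angular_patch and split_angular_patch are defined in the earlier sketches.

def best_depth(LF, x, y, gamma, u_coords, v_coords, alphas):
    """Occlusion-aware depth for one candidate occlusion pixel (Eqs. 7-12)."""
    center = LF[len(v_coords) // 2, len(u_coords) // 2, y, x]   # L(x, y, 0, 0), central view
    responses = []
    for alpha in alphas:
        patch = refocus_angular_patch(LF, x, y, alpha, u_coords, v_coords)
        region1, region2 = split_angular_patch(patch, gamma, u_coords, v_coords)
        # Eqs. (7)-(9): keep the region with the smaller (unbiased) variance.
        regions = [region1, region2]
        i = int(np.argmin([r.var(ddof=1) for r in regions]))
        consistent = regions[i]
        C_alpha = consistent.var(ddof=1)                    # Eq. (10): correspondence cue
        D_alpha = (consistent.mean() - center) ** 2         # Eq. (11): defocus cue
        responses.append(C_alpha + D_alpha)
    return alphas[int(np.argmin(responses))]                # Eq. (12): optimal depth ratio
```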
Color Consistency Constraint: When we divide the angular patch into two regions, it is sometimes possible to obtain a "reversed" patch when we refocus to an incorrect depth, as shown in Fig. 5. If the occluded plane is very textureless, this depth might also give a very low variance response, even though it is obviously incorrect. To address this, we add a color consistency constraint: the averages of the two regions should have a similar relationship with respect to the current pixel as they have in the spatial domain. Mathematically,

| L̄_{α,1} − p_1 | + | L̄_{α,2} − p_2 | < | L̄_{α,2} − p_1 | + | L̄_{α,1} − p_2 | + δ,    (13)

where p_1 and p_2 are the values of the pixels shown in Fig. 5d, and δ is a small threshold to increase robustness. If refocusing to a depth violates this constraint, that depth is considered invalid and is automatically excluded from the depth estimation process.
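The test in Eq. (13) is a one-liner. In the sketch below (our illustration), p1 and p2 are the spatial pixel values from Fig. 5d and delta is a small tuning threshold whose value is an assumption.

```python
def color_consistent(mean1, mean2, p1, p2, delta=0.02):
    """Eq. (13): the region means should pair up with p1/p2 the same way they do spatially.

    mean1, mean2 : averages of the two angular regions (L-bar_{alpha,1}, L-bar_{alpha,2})
    p1, p2       : spatial pixel values on either side of the edge (Fig. 5d)
    """
    return (abs(mean1 - p1) + abs(mean2 - p2)
            < abs(mean2 - p1) + abs(mean1 - p2) + delta)

# Depths (alphas) whose angular patches violate this test are discarded before Eq. (12).
```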

Figure 6: Occlusion predictor (synthetic scene). Panels: (a) central input image, (b) depth cue (F = 0.58), (c) correspondence cue (F = 0.53), (d) refocus cue (F = 0.57), (e) combined cue (F = 0.65), (f) occlusion ground truth. The intensities are adjusted for better contrast. The F-measure is the harmonic mean of precision and recall compared to the ground truth. By combining the three cues from depth, correspondence and refocus, we can obtain a better prediction of occlusions.
5. Occlusion-Aware Depth Regularization
After the initial local depth estimation phase, we refine the results with global regularization using a smoothness term. We improve on previous methods by reducing the effect of the smoothness/regularization term in occlusion regions. Our occlusion predictor, discussed below, may also be useful independently for other vision applications.

Occlusion Predictor Computation: We compute a predictor P_occ for whether a particular pixel is occluded, by combining cues from depth, correspondence and refocus.

1. Depth Cues: First, by taking the gradient of the initial depth, we can obtain an initial occlusion boundary,

P^d_occ = f( ∇d_ini / d_ini ),    (14)

where d_ini is the initial depth, and f(·) is a robust clipping function that saturates the response above some threshold. We divide the gradient by d_ini to increase robustness, since for the same surface normal the depth change across pixels becomes larger as the depth gets larger.
2. Correspondence Cues: In occlusion regions, we have already seen that photo-consistency will only be valid in approximately half the angular patch, with a small variance in that region. On the other hand, the pixels in the other region come from different points on the occluding object, and thus exhibit much higher variance. By computing the ratio between the two variances, we can obtain an estimate of how likely the current pixel is to be at an occlusion,

P^var_occ = f( max( V_{α*,1} / V_{α*,2} , V_{α*,2} / V_{α*,1} ) ),    (15)

where α* is the initial depth we obtained.
3. Refocus Cues: Finally, note that the variances in both regions will be small if the occluder is textureless. To address this issue, we also compute the means of both regions. Since the two regions come from different objects, their colors should be different, so a large difference between the two means also indicates a possible occlusion occurrence. In other words,

P^avg_occ = f( | L̄_{α*,1} − L̄_{α*,2} | ).    (16)

Finally, we compute the combined occlusion response, or prediction, as the product of these three cues,

P_occ = N(P^d_occ) · N(P^var_occ) · N(P^avg_occ),    (17)

where N(·) is a normalization function that subtracts the mean and divides by the standard deviation.
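A numpy sketch of Eqs. (14)-(17) is given below (our illustration). The paper does not specify the clipping threshold or the exact gradient operator, so those choices are assumptions.

```python
import numpy as np

def clip_response(x, threshold=1.0):
    """Robust clipping function f(.): saturate responses above a threshold (Eq. 14)."""
    return np.minimum(np.abs(x), threshold)

def normalize(x):
    """N(.): subtract the mean and divide by the standard deviation (Eq. 17)."""
    return (x - x.mean()) / (x.std() + 1e-8)

def occlusion_predictor(d_ini, V1, V2, mean1, mean2):
    """Combine depth, correspondence and refocus cues into P_occ (Eq. 17).

    d_ini        : initial depth map
    V1, V2       : per-pixel variances of the two angular regions at the initial depth
    mean1, mean2 : per-pixel means of the two angular regions at the initial depth
    """
    gy, gx = np.gradient(d_ini)
    grad_mag = np.hypot(gx, gy)
    P_d = clip_response(grad_mag / (d_ini + 1e-8))              # Eq. (14): depth cue
    P_var = clip_response(np.maximum(V1 / (V2 + 1e-8),
                                     V2 / (V1 + 1e-8)))         # Eq. (15): correspondence cue
    P_avg = clip_response(np.abs(mean1 - mean2))                # Eq. (16): refocus cue
    return normalize(P_d) * normalize(P_var) * normalize(P_avg)  # Eq. (17): combined response
```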
Depth Regularization: Finally, given the initial depth and occlusion cues, we regularize with a Markov Random Field (MRF) to obtain the final depth map. We minimize the energy

E = Σ_p E_unary(p, d(p)) + Σ_{p,q} E_binary(p, q, d(p), d(q)),    (18)

where d is the final depth and p, q are neighboring pixels. We adopt a unary term similar to Tao et al. [20]. The binary energy term is defined as

E_binary(p, q, d(p), d(q)) = exp( −(d(p) − d(q))² / (2σ²) ) / ( |∇I(p) − ∇I(q)| + k |P_occ(p) − P_occ(q)| ),    (19)

where ∇I is the gradient of the central pinhole image, and k is a weighting factor. The numerator encodes the smoothness constraint, while the denominator reduces the strength of the constraint if two pixels are very different or an occlusion is likely to lie between them. The minimization is solved using a standard graph cut algorithm [6, 7, 13]. We can then apply the occlusion prediction procedure again on this regularized depth map. A sample result is shown in Fig. 6. In this example, the F-measure (harmonic mean of precision and recall compared to ground truth) increased from 0.58 (depth cue), 0.53 (correspondence cue), and 0.57 (refocus cue), to 0.65 (combined cue).
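For illustration, the pairwise term of Eq. (19), as reconstructed above, can be computed as follows (our sketch; sigma and k are placeholder values). In the full pipeline these pairwise costs, together with the unary data term, would be handed to an off-the-shelf multi-label graph-cut (alpha-expansion) solver, as the paper states.

```python
import numpy as np

def binary_energy(d_p, d_q, grad_I_p, grad_I_q, P_occ_p, P_occ_q,
                  sigma=1.0, k=2.0, eps=1e-8):
    """Pairwise term of Eq. (19) for two neighboring pixels p and q.

    The denominator weakens the smoothness constraint across strong image-gradient
    differences or likely occlusion boundaries; sigma and k are tuning parameters.
    """
    numerator = np.exp(-(d_p - d_q) ** 2 / (2.0 * sigma ** 2))
    denominator = abs(grad_I_p - grad_I_q) + k * abs(P_occ_p - P_occ_q) + eps
    return numerator / denominator
```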

References

Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001.
Y. Boykov and V. Kolmogorov. An experimental comparison of min-cut/max-flow algorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.
M. Levoy and P. Hanrahan. Light field rendering. In Proc. SIGGRAPH, 1996.
B. Curless and M. Levoy. A volumetric method for building complex models from range images. In Proc. SIGGRAPH, 1996.
Y. Boykov, O. Veksler, and R. Zabih. Fast approximate energy minimization via graph cuts. In Proc. IEEE International Conference on Computer Vision (ICCV), 1999.