Accurate Depth Map Estimation from a Lenslet Light Field Camera
Hae-Gon Jeon Jaesik Park Gyeongmin Choe Jinsun Park
Yunsu Bok Yu-Wing Tai In So Kweon
Korea Advanced Institute of Science and Technology (KAIST), Republic of Korea
[hgjeon,jspark,gmchoe,ysbok]@rcv.kaist.ac.kr
[zzangjinsun,yuwing]@gmail.com, iskweon@kaist.ac.kr
Abstract
This paper introduces an algorithm that accurately estimates depth maps using a lenslet light field camera. The proposed algorithm estimates the multi-view stereo correspondences with sub-pixel accuracy using the cost volume. The foundation for constructing accurate costs is threefold. First, the sub-aperture images are displaced using the phase shift theorem. Second, the gradient costs are adaptively aggregated using the angular coordinates of the light field. Third, the feature correspondences between the sub-aperture images are used as additional constraints. With the cost volume, the multi-label optimization propagates and corrects the depth map in the weak texture regions. Finally, the local depth map is iteratively refined by fitting a local quadratic function to estimate a non-discrete depth map. Because micro-lens images contain unexpected distortions, a method that corrects this error is also proposed. The effectiveness of the proposed algorithm is demonstrated through challenging real-world examples, including comparisons with advanced depth estimation algorithms.
1. Introduction
The problem of estimating an accurate depth map from a lenslet light field camera, e.g., Lytro™ [1] and Raytrix™ [19], is investigated. In contrast to conventional cameras, a light field camera captures not only a 2D image but also the directions of the incoming light rays. The additional light directions allow the image to be re-focused and the depth map of a scene to be estimated, as demonstrated in [12, 17, 19, 23, 26, 29].
Because the baseline between sub-aperture images from a lenslet light field camera is very narrow, directly applying existing stereo matching algorithms such as [20] cannot produce satisfying results, even if the applied algorithm is a top-ranked method in the Middlebury stereo matching benchmark. As reported in Yu et al. [29], the disparity range of adjacent sub-aperture images in Lytro is between −1 and 1 pixels. Consequently, it is very challenging to estimate an accurate depth map because a one-pixel disparity error is already significant in this problem.

Figure 1. Synthesized views of the two depth maps acquired from the Lytro software [1] and our approach.
In this paper, an algorithm for stereo matching between sub-aperture images with an extremely narrow baseline is presented. Central to the proposed algorithm is the use of the phase shift theorem in the Fourier domain to estimate the sub-pixel shifts of sub-aperture images. This enables the estimation of the stereo correspondences at sub-pixel accuracy, even with a very narrow baseline. The cost volume is computed to evaluate the matching cost of different disparity labels, which is defined using the similarity measurement between the sub-aperture images and the center-view sub-aperture image shifted at different sub-pixel locations. Here, the gradient matching costs are adaptively aggregated based on the angular coordinates of the light field camera.
In order to reduce the effects of image noise, a weighted median filter is adopted to remove the noise in the cost volume, followed by multi-label optimization to propagate reliable disparity labels to the weak texture regions. In the multi-label optimization, confident matching correspondences between the center view and the other views are used as additional constraints, which assist in preventing oversmoothing at the edges and texture regions. Finally, the estimated depth map is iteratively refined using quadratic polynomial interpolation to enhance the estimated depth map with sub-label precision.

In the experiments, it was found that the micro-lens images of lenslet light field cameras contain depth distortions. Therefore, a method for correcting this error is also presented. The effectiveness of the proposed algorithm is demonstrated using challenging real-world examples that were captured by a Lytro camera, a Raytrix camera, and a lab-made lenslet light field camera. A performance comparison with advanced methods is also presented. An example of the results of the proposed method is presented in Fig. 1.
2. Related Work
Previous work related to depth map (or disparity map¹) estimation from a light field image is reviewed. Compared with conventional approaches in stereo matching, lenslet light field images have very narrow baselines. Consequently, approaches based on correspondence matching do not typically work well, because the sub-pixel shift in the spatial domain usually involves interpolation with blurriness, and the matching costs of stereo correspondence are highly ambiguous. Therefore, instead of using correspondence matching, other cues and constraints have been used to estimate the depth maps from a lenslet light field image.
Georgiev and Lumsdaine [7] computed a normalized cross-correlation between micro-lens images in order to estimate the disparity map. Bishop and Favaro [4] introduced an iterative multi-view stereo method for a light field. Wanner and Goldluecke [26] used a structure tensor to compute the vertical and horizontal slopes in the epipolar plane of a light field image, and they formulated the depth map estimation problem as a global optimization subject to the epipolar constraint. Yu et al. [29] analyzed the 3D geometry of lines in a light field image and computed the disparity maps through line matching between the sub-aperture images. Tao et al. [23] introduced a fusion method that uses the correspondence and defocus cues of a light field image to estimate the disparity maps. After the initial estimation, a multi-label optimization is applied in order to refine the estimated disparity map. Heber and Pock [8] estimated disparity maps using low-rank structure regularization to align the sub-aperture images.
In addition to the aforementioned approaches, there have been recent studies that estimate depth maps from light field images. For example, Kim et al. [10] estimated depth maps from a moving DSLR camera, which simulated the multiple viewpoints of a light field image. Chen et al. [6] introduced a bilateral consistency metric on the surface camera in order to estimate the stereo correspondence in a light field image in the presence of occlusion. However, it should be noted that the baselines of the light field images presented in Kim et al. [10] and Chen et al. [6] are significantly larger than the baseline of the light field images captured using a lenslet light field camera.

¹We sometimes use disparity map to represent depth map.
Compared with previous studies, the proposed algorithm computes a cost volume that is based on sub-pixel multi-view stereo matching. Unique to the proposed algorithm is the use of the phase shift theorem when performing the sub-pixel shifts of the sub-aperture images. The phase shift theorem allows the reconstruction of the sub-pixel shifted sub-aperture images without introducing blurriness, in contrast to spatial domain interpolation. As is demonstrated in the experiments, the proposed algorithm is highly effective and outperforms the advanced algorithms in depth map estimation using a lenslet light field image.
3. Sub-aperture Image Analysis
First, the characteristics of sub-aperture images obtained from a lenslet-based light field camera are analyzed, and then the proposed distortion correction method is described.
3.1. Narrow Baseline Sub-aperture Image
Narrow baseline. According to the lenslet light field camera projection model proposed by Bok et al. [5], the viewpoint $(S, T)$ of a sub-aperture image with an angular direction $\mathbf{s} = (s, t)$² is as follows:

\[
\begin{bmatrix} S \\ T \end{bmatrix} = \frac{D}{d}(D + d) \begin{bmatrix} s/f_x \\ t/f_y \end{bmatrix}, \tag{1}
\]

where $D$ is the distance between the lenslet array and the center of the main lens, $d$ is the distance between the lenslet array and the imaging sensor, and $f$ is the focal length of the main lens. With the assumption of a uniform focal length (i.e., $f_x = f_y = f$), the baseline between two adjacent sub-aperture images is defined as $\text{baseline} := \frac{(D+d)D}{df}$.
Based on this, we need to shorten $f$, shorten $d$, or lengthen $D$ for a wider baseline. However, $f$ cannot be too short because it is proportional to the angular resolution of the micro-lenses in a lenslet array; therefore, the maximum baseline, which is the product of the baseline and the angular resolution of the sub-aperture images, remains unchanged even if the value of $f$ varies. If the physical size of the micro-lenses is too large, the spatial resolution of the sub-aperture images is reduced. Shortening $d$ enlarges the angular difference between the corresponding rays of adjacent pixels and might cause radial distortion of the micro-lenses. Finally, lengthening $D$ increases the baseline, but the field of view is reduced. Due to these challenges, the disparity range of sub-aperture images is quite narrow. For example, the disparity range between adjacent sub-aperture views of the Lytro camera is smaller than ±1 pixel [29].
²The 4D parameterization [7, 17, 26] is followed, where the pixel coordinates of a light field image I are defined using the 4D parameters $(s, t, x, y)$. Here, $\mathbf{s} = (s, t)$ denotes the discrete index of the angular directions and $\mathbf{x} = (x, y)$ denotes the Cartesian image coordinates of each sub-aperture image.
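To make this trade-off concrete, the short sketch below evaluates the baseline expression derived from Eq. (1). It is an illustration only; the numeric values are hypothetical assumptions, not values taken from the paper.

```python
# Hypothetical optics parameters (illustrative assumptions, not from the paper).
D = 45.0   # distance from the lenslet array to the main lens center (mm)
d = 0.025  # distance from the lenslet array to the imaging sensor (mm)
f = 50.0   # focal length of the main lens, with f_x = f_y = f (mm)

# Baseline between two adjacent sub-aperture images (Sec. 3.1):
# shortening f or d, or lengthening D, widens it, at the costs noted above.
baseline = (D + d) * D / (d * f)
print(f"baseline = {baseline:.2f}")
```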

Figure 2. (a) and (b) EPI before and after distortion correction. (c) shows our compensation process for a pixel, with the reference-view pixel as the pivot of the rotation. (d) shows the slope difference between the two EPIs.
Figure 3. Disparity map without and with distortion correction (Sec. 3.2). A real-world planar scene is captured, and the depth map is computed using our approach (Sec. 4).
Sub-aperture image distortion. From the analyses conducted in this study, it is observed that lenslet light field images contain optical distortions caused by both the main lens (thin lens model) and the micro-lenses (pinhole model). Although the radial distortion of the main lens can be calibrated using conventional methods, the calibration is imperfect, particularly for light rays that have large angular differences from the optical axis. The distortion caused by these rays is called astigmatism [22]. Moreover, because the conventional distortion model is based on a pinhole camera model, the rays that do not pass through the center of the main lens cannot fit the model well. The distortion caused by those rays is called field curvature [22]. Because they are the primary causes of the depth distortion, these two distortions are compensated in the following subsection.
3.2. Distortion Estimation and Correction
During the capture of a light field image of a planar object, spatially variant epipolar plane image (EPI) slopes (i.e., non-uniform depths) are observed, which result from the distortions mentioned in Sec. 3.1 (see Fig. 3). In addition, the degree of distortion also varies for each sub-aperture image. To solve this problem, an energy minimization problem is formulated under a constant depth assumption as follows:
\[
\hat{G} = \operatorname*{argmin}_{G} \sum_{\mathbf{x}} \left| \theta(I(\mathbf{x})) - \theta_o - G(\mathbf{x}) \right|, \tag{2}
\]

where $|\cdot|$ denotes the absolute value. $\theta_o$, $\theta(\cdot)$, and $G(\cdot)$ denote the slope without distortion, the slope of the EPI, and the amount of distortion at point $\mathbf{x}$, respectively.
Figure 4. An original sub-aperture image shifted with bilinear interpolation, bicubic interpolation, and the phase shift theorem.

The amount of field curvature distortion is estimated for each pixel. An image of a planar checkerboard is captured, and the observed EPI slopes are compared with $\theta_o$³. Points with strong gradients in the EPI are selected, and the difference $(\theta(\cdot) - \theta_o)$ in Eq. (2) is calculated. Then, the entire field curvature $G$ is fitted to a second-order polynomial surface model.

³A tilt error might exist if the sensor and the calibration plane are not parallel. In order to avoid this, an optical table is used.
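To illustrate the surface-fitting step, the sketch below fits sampled slope differences to a second-order polynomial surface with NumPy. It is a stand-in under stated assumptions: Eq. (2) minimizes an absolute (L1) cost, while this sketch uses an ordinary least-squares (L2) fit, and all names are hypothetical.

```python
import numpy as np

def fit_field_curvature(xs, ys, slope_diff):
    """Fit the field-curvature map G to a second-order polynomial surface
    from the slope differences theta(.) - theta_o sampled at strong-gradient
    EPI points (xs, ys). An L2 stand-in for the L1 objective of Eq. (2)."""
    A = np.stack([xs**2, ys**2, xs * ys, xs, ys, np.ones_like(xs)], axis=1)
    coeffs, *_ = np.linalg.lstsq(A, slope_diff, rcond=None)

    def G(x, y):
        # Evaluate the fitted quadratic surface at pixel position (x, y).
        return (coeffs[0] * x**2 + coeffs[1] * y**2 + coeffs[2] * x * y
                + coeffs[3] * x + coeffs[4] * y + coeffs[5])

    return G
```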
After solving Eq. (2), each point's EPI slope is rotated using $\hat{G}$. The pixel of the reference view (i.e., the center view) is set as the pivot of the rotation (see Fig. 2 (c)). However, due to the astigmatism, the field curvature varies according to the slice direction. In order to address this problem, Eq. (2) is solved twice: once each for the horizontal and vertical directions. The correction order does not affect the compensation result. In order to avoid chromatic aberrations, the distortion parameters are estimated for each color channel. Figure 2 and Fig. 3 present the EPI image and the estimated depth map before and after the proposed distortion correction, respectively⁴.

⁴It is observed that altering the focal length and zooming parameters affects the correction. This is a limitation of the proposed method. However, it is also observed that the distortion parameters are not scene dependent.
The proposed method is classified as a low-order approach that targets the astigmatism and field curvature. A more generalized technique for correcting the aberration has been proposed by Ng and Hanrahan [16], and it is currently used in real products [2].
4. Depth Map Estimation
Given the distortion-corrected sub-aperture images, the goal is to estimate accurate dense depth maps. The proposed depth map estimation algorithm is developed using cost-volume-based stereo [20]. In order to manage the narrow baseline between the sub-aperture images, the pipeline is tailored with three significant differences. First, instead of traversing local patches to compute the cost volume, the sub-aperture images are directly shifted using the phase shift theorem and the per-pixel cost volume is computed. Second, in order to effectively aggregate the gradient costs computed from dozens of sub-aperture image pairs, a weight term that considers the horizontal/vertical deviation in the st coordinates between the sub-aperture image pairs is defined. Third, because the small viewpoint changes between sub-aperture images make feature matching more reliable, guidance from confident matching correspondences is also included in the discrete label optimization [11]. The details are described in the following sub-sections.
4.1. Phase Shift based Sub-pixel Displacement
A key contribution of the proposed depth estimation algorithm is matching the narrow-baseline sub-aperture images using sub-pixel displacements. According to the phase shift theorem, if an image $I$ is shifted by $\Delta\mathbf{x} \in \mathbb{R}^2$, the corresponding phase shift in the 2D Fourier transform is:

\[
\mathcal{F}\{I(\mathbf{x} + \Delta\mathbf{x})\} = \mathcal{F}\{I(\mathbf{x})\} \exp^{2\pi i \Delta\mathbf{x}}, \tag{3}
\]

where $\mathcal{F}\{\cdot\}$ denotes the discrete 2D Fourier transform. In Eq. (3), multiplying by the exponential term in the frequency domain is the same as convolving with a Dirichlet kernel (or periodic sinc) in the spatial domain. According to the Nyquist-Shannon sampling theorem [21], a continuous band-limited signal can be perfectly reconstructed through convolving it with a sinc function. If the centroid of the sinc function deviates from the origin, precisely shifted signals can be obtained. In the same manner, Eq. (3) generates a precisely shifted image in the spatial domain if the sub-aperture image is band-limited. Therefore, the sub-pixel shifted image $I'(\mathbf{x})$ is obtained using:

\[
I'(\mathbf{x}) = I(\mathbf{x} + \Delta\mathbf{x}) = \mathcal{F}^{-1}\{\mathcal{F}\{I(\mathbf{x})\} \exp^{2\pi i \Delta\mathbf{x}}\}. \tag{4}
\]
In practice, the light field image is not always a band-limited signal. This results from the weak pre-filtering that fits the light field into the sub-aperture image resolution [13, 24]. However, the artifact is not obvious for regions where the texture is obtained from the source surface in the scene. For example, a sub-aperture image of a resolution chart captured by a Lytro camera is presented in Fig. 4. This image is shifted by $\Delta\mathbf{x} = [2.2345, 1.5938]$ pixels. Compared with the displacements that result from the bilinear and bicubic interpolations, the sub-pixel shifted image produced using the phase shift theorem is sharper and does not contain blurriness. Note that an accurate reconstruction of sub-pixel shifted images is significant for accurate depth map estimation, particularly when the baseline is narrow. The effect of the interpolation method on depth accuracy is analyzed in Sec. 5.
In this implementation, the fast Fourier transform with a circular boundary condition is used to manage the non-infinite signals. Because the proposed algorithm shifts the entire sub-aperture image instead of local patches, the artifacts that result from the periodicity problem only appear at the boundary of the image within a width of a few pixels (less than two pixels), which is negligible.
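The shift of Eq. (4) takes only a few lines of NumPy. The following is a minimal sketch under the circular boundary condition described above, not the authors' implementation; it assumes a single-channel image (apply it per channel for color).

```python
import numpy as np

def phase_shift(img, dx, dy):
    """Shift a 2D image by the sub-pixel offset (dx, dy) using the phase
    shift theorem (Eqs. (3) and (4)); the result approximates I(x + dx, y + dy)."""
    H, W = img.shape
    fy = np.fft.fftfreq(H)[:, None]  # vertical frequencies (cycles/pixel)
    fx = np.fft.fftfreq(W)[None, :]  # horizontal frequencies (cycles/pixel)
    F = np.fft.fft2(img)
    # Multiplying by exp(2*pi*i*(fx*dx + fy*dy)) is the phase shift of Eq. (3).
    shifted = np.fft.ifft2(F * np.exp(2j * np.pi * (fx * dx + fy * dy)))
    return np.real(shifted)  # real up to numerical error for real inputs
```

For instance, the Fig. 4 experiment corresponds to `phase_shift(img, 2.2345, 1.5938)`; unlike bilinear or bicubic interpolation, the shift introduces no low-pass blur away from the image boundary.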
4.2. Building the Cost Volume
In order to match the sub-aperture images, two complementary costs are used: the sum of absolute differences (SAD) and the sum of gradient differences (GRAD). The cost volume $C$ is defined as a function of $\mathbf{x}$ and cost label $l$:

\[
C(\mathbf{x}, l) = \alpha C_A(\mathbf{x}, l) + (1 - \alpha)\, C_G(\mathbf{x}, l), \tag{5}
\]

where $\alpha \in [0, 1]$ adjusts the relative importance between the SAD cost $C_A$ and the GRAD cost $C_G$. $C_A$ is defined as:

\[
C_A(\mathbf{x}, l) = \sum_{\mathbf{s} \in V} \sum_{\mathbf{x} \in R_{\mathbf{x}}} \min\bigl( \left| I(\mathbf{s}_c, \mathbf{x}) - I(\mathbf{s}, \mathbf{x} + \Delta\mathbf{x}(\mathbf{s}, l)) \right|, \tau_1 \bigr), \tag{6}
\]

where $R_{\mathbf{x}}$ is a small rectangular region centered at $\mathbf{x}$; $\tau_1$ is a truncation value of a robust function; and $V$ contains the st-coordinate pixels $\mathbf{s}$, except for the center view $\mathbf{s}_c$. Equation (3) is used for the precise sub-pixel shifting of the images.
Equation (6) builds a matching cost through comparing the center sub-aperture image $I(\mathbf{s}_c, \mathbf{x})$ with the other sub-aperture images $I(\mathbf{s}, \mathbf{x})$ to generate a disparity map from a canonical viewpoint. The 2D shift vector $\Delta\mathbf{x}$ in Eq. (6) is defined as follows:

\[
\Delta\mathbf{x}(\mathbf{s}, l) = l k (\mathbf{s} - \mathbf{s}_c), \tag{7}
\]

where $k$ is the unit of the label in pixels. $\Delta\mathbf{x}$ linearly increases as the angular deviation from the center viewpoint increases. The other cost volume $C_G$ is defined as follows:
\[
C_G(\mathbf{x}, l) = \sum_{\mathbf{s} \in V} \sum_{\mathbf{x} \in R_{\mathbf{x}}} \beta(\mathbf{s}) \min\bigl( \mathrm{Diff}_x(\mathbf{s}_c, \mathbf{s}, \mathbf{x}, l), \tau_2 \bigr) + \bigl(1 - \beta(\mathbf{s})\bigr) \min\bigl( \mathrm{Diff}_y(\mathbf{s}_c, \mathbf{s}, \mathbf{x}, l), \tau_2 \bigr), \tag{8}
\]

where $\mathrm{Diff}_x(\mathbf{s}_c, \mathbf{s}, \mathbf{x}, l) = |I_x(\mathbf{s}_c, \mathbf{x}) - I_x(\mathbf{s}, \mathbf{x} + \Delta\mathbf{x}(\mathbf{s}, l))|$ denotes the difference between the x-directional gradients of the sub-aperture images, and $\mathrm{Diff}_y$ is defined similarly on the y-directional gradients. $\tau_2$ is a truncation constant that suppresses outliers. $\beta(\mathbf{s})$ in Eq. (8) controls the relative importance of the two directional gradient differences based on the relative st coordinates. $\beta(\mathbf{s})$ is defined as follows:

\[
\beta(\mathbf{s}) = \frac{|s - s_c|}{|s - s_c| + |t - t_c|}. \tag{9}
\]

According to Eq. (9), $\beta$ increases if the target view $\mathbf{s}$ is located along the horizontal extent of the center view $\mathbf{s}_c$; in that case, only the gradient costs in the x direction are aggregated into $C_G$. Note that $\beta$ is independent of the scene because it is determined purely by the relative position between $\mathbf{s}$ and $\mathbf{s}_c$.
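Putting Eqs. (5) through (9) together, the sketch below assembles the cost volume for grayscale sub-aperture images, reusing `phase_shift` from Sec. 4.1. It is a simplified reading of the equations rather than the authors' code: the aggregation window $R_{\mathbf{x}}$ is reduced to a single pixel, the (S, T, H, W) light field layout follows the 4D parameterization of footnote 2, and `k`, `alpha`, `tau1`, and `tau2` are assumed constants.

```python
import numpy as np

def build_cost_volume(LF, sc, labels, k, alpha=0.5, tau1=0.5, tau2=0.5):
    """Sketch of Eqs. (5)-(9). LF is an (S, T, H, W) grayscale light field,
    sc = (s_c, t_c) indexes the center view, and labels are the disparity
    labels l. R_x is taken as a single pixel for brevity."""
    S, T, H, W = LF.shape
    center = LF[sc]
    cIy, cIx = np.gradient(center)  # y- then x-gradient of the center view
    C = np.zeros((len(labels), H, W))
    for li, l in enumerate(labels):
        for s in range(S):
            for t in range(T):
                if (s, t) == sc:
                    continue  # V excludes the center view s_c
                # Eq. (7): the shift grows linearly with the angular deviation;
                # s pairs with x and t pairs with y (footnote 2).
                dx, dy = l * k * (s - sc[0]), l * k * (t - sc[1])
                shifted = phase_shift(LF[s, t], dx, dy)  # Eq. (4)
                sIy, sIx = np.gradient(shifted)
                # Eq. (9): weight the x- vs. y-gradient costs by angular position.
                beta = abs(s - sc[0]) / (abs(s - sc[0]) + abs(t - sc[1]))
                C_A = np.minimum(np.abs(center - shifted), tau1)  # Eq. (6)
                C_G = (beta * np.minimum(np.abs(cIx - sIx), tau2)
                       + (1 - beta) * np.minimum(np.abs(cIy - sIy), tau2))  # Eq. (8)
                C[li] += alpha * C_A + (1 - alpha) * C_G  # Eq. (5)
    return C
```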
Figure 5. Estimated disparity maps at different steps of our algorithm. (a) The center-view sub-aperture image. (b)-(e) Disparity maps (b) based on the initial cost volume (winner-takes-all strategy), (c) after the weighted median filter refinement (the red pixels indicate detected outlier pixels), (d) after the multi-label optimization, and (e) after the iterative refinement. The processes in (b) and (c) are described in Sec. 4.2, and the processes in (d) and (e) are described in Sec. 4.3.

Figure 6. The effectiveness of the iterative refinement step described in Sec. 4.3: the central view; the graph-cuts depth map and a view synthesized using it; and the refined depth map and a view synthesized using it.

As a sequential step, every cost slice is refined using an edge-preserving filter [15] to alleviate the coarsely scattered unreliable matches. Here, the central sub-aperture image is used to determine the weights of the filter. They are determined using the Euclidean distances between the RGB values of two pixels in the filter, which preserves the discontinuities in the cost slices. From the refined cost volume $C'$, a disparity map $l_a$ is determined using the winner-takes-all strategy. As depicted in Figs. 5 (b) and (c), the noisy background disparities are substituted with the majority value (almost zero in this example) of the background disparity. For each pixel, if the variance over the cost slices is smaller than a threshold $\tau_{reject}$, the pixel is regarded as an outlier because it does not have a distinctive minimum. The red pixels in Fig. 5 (c) indicate these outlier pixels.
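The winner-takes-all selection and the variance-based outlier test described above reduce to two array operations; the sketch below assumes an (L, H, W) cost volume and treats `tau_reject` as a hypothetical threshold value.

```python
import numpy as np

def wta_disparity(C, tau_reject):
    """Winner-takes-all labeling l_a from the refined cost volume C', plus
    the outlier mask: a pixel whose cost profile has low variance lacks a
    distinctive minimum (the red pixels of Fig. 5 (c))."""
    l_a = np.argmin(C, axis=0)                # per-pixel best label
    outlier = np.var(C, axis=0) < tau_reject  # flat profile -> unreliable
    return l_a, outlier
```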
4.3. Disparity Optimization and Enhancement
The disparity map from the previous step is enhanced through discrete optimization and iterative quadratic fitting.
Confident matching correspondences. Besides the cost volume, correspondences are also matched at salient feature points as strong guides for the multi-label optimization. In particular, local feature matching is conducted between the center sub-aperture image and the other sub-aperture images. Here, the SIFT algorithm [14] is used for the feature extraction and matching. From a pair of matched feature positions, the positional deviation $\Delta\mathbf{f} \in \mathbb{R}^2$ in the xy coordinates is computed. If the amount of deviation $\|\Delta\mathbf{f}\|$ exceeds the maximum disparity range of the light field camera, the pair is rejected as an outlier. For each pair of matched positions, given $\mathbf{s}$, $\mathbf{s}_c$, $\Delta\mathbf{f}$, and $k$, the over-determined linear equation $\Delta\mathbf{f} = l k (\mathbf{s} - \mathbf{s}_c)$ is solved for $l$. This is based on the linear relationship depicted in Eq. (7). Because a feature point in the center view is matched with those of multiple images, it has several disparity candidates. Therefore, their median value is obtained and used to compute the sparse and confident disparities $l_c$.
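The per-feature disparity recovery can be sketched as follows. The data layout (a mapping from each center-view keypoint to its per-view positional deviations) is a hypothetical structure chosen for illustration; the least-squares solution of $\Delta\mathbf{f} = l k (\mathbf{s} - \mathbf{s}_c)$ and the median over views follow the text.

```python
import numpy as np

def confident_disparities(matches, sc, k):
    """Sketch of the sparse confident disparities l_c. `matches` maps a
    center-view keypoint to a list of (s, delta_f) pairs, where s is the
    matched view's angular index and delta_f its xy positional deviation."""
    l_c = {}
    for keypoint, observations in matches.items():
        candidates = []
        for s, delta_f in observations:
            ds = k * np.array([s[0] - sc[0], s[1] - sc[1]], dtype=float)
            # Least-squares solution of the over-determined delta_f = l*k*(s - s_c).
            candidates.append(ds @ np.asarray(delta_f, dtype=float) / (ds @ ds))
        l_c[keypoint] = np.median(candidates)  # median over the matched views
    return l_c
```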
Multi-label optimization. Multi-label optimization is performed using graph cuts [11] to propagate and correct the disparities using the neighboring estimates. The optimal disparity map is obtained through minimizing:

\[
l_r = \operatorname*{argmin}_{l} \sum_{\mathbf{x}} C'\bigl(\mathbf{x}, l(\mathbf{x})\bigr) + \lambda_1 \sum_{\mathbf{x} \in \mathcal{I}} \|l(\mathbf{x}) - l_a(\mathbf{x})\| + \lambda_2 \sum_{\mathbf{x} \in \mathcal{M}} \|l(\mathbf{x}) - l_c(\mathbf{x})\| + \lambda_3 \sum_{\mathbf{x}' \in N_{\mathbf{x}}} \|l(\mathbf{x}) - l(\mathbf{x}')\|, \tag{10}
\]

where $\mathcal{I}$ contains the inlier pixels determined in the previous step (Sec. 4.2), and $\mathcal{M}$ denotes the pixels that have confident matching correspondences. Equation (10) has four terms: matching cost reliability ($C'(\mathbf{x}, l(\mathbf{x}))$), data fidelity ($\|l(\mathbf{x}) - l_a(\mathbf{x})\|$), confident matching cost ($\|l(\mathbf{x}) - l_c(\mathbf{x})\|$), and local smoothness ($\|l(\mathbf{x}) - l(\mathbf{x}')\|$). Figure 5 (d) presents a corrected depth map after the discrete optimization. Note that even without the confident matching cost, the proposed approach estimates a reliable disparity map; the confident matching cost further enhances the estimated disparity at regions with salient matches.
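For reference, the objective of Eq. (10) can be scored for any candidate labeling as in the sketch below. The minimization itself is performed with graph cuts [11], which this sketch does not reproduce, and the 4-neighborhood form of $N_{\mathbf{x}}$ is an assumption.

```python
import numpy as np

def energy_eq10(l, C_ref, l_a, l_c, inlier, matched, lam1, lam2, lam3):
    """Evaluate Eq. (10) for an (H, W) integer labeling l. C_ref is the
    refined cost volume C' of shape (L, H, W); l_c is a dense map holding
    the sparse confident disparities; inlier and matched are boolean masks
    for the sets I and M."""
    iy, ix = np.indices(l.shape)
    data = C_ref[l, iy, ix].sum()                        # matching cost reliability
    fidelity = lam1 * np.abs(l - l_a)[inlier].sum()      # data fidelity to l_a
    confident = lam2 * np.abs(l - l_c)[matched].sum()    # confident matching cost
    smooth = lam3 * (np.abs(np.diff(l, axis=0)).sum()    # local smoothness over
                     + np.abs(np.diff(l, axis=1)).sum()) # 4-neighborhoods
    return data + fidelity + confident + smooth
```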
Iterative refinement. The last step refines the discrete disparity map produced by the multi-label optimization into a continuous disparity map with sharp gradients at depth discontinuities. The method presented by Yang et al. [28] is adopted. A new cost volume $\hat{C}$ filled with ones is computed. Then, for every $\mathbf{x}$, $\hat{C}(\mathbf{x}, l_r(\mathbf{x}))$ is set to 0, followed by weighted median filtering [15] of the cost slices.
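The parabola fit below is one common form of quadratic polynomial interpolation for sub-label precision, given as a sketch consistent with the description here; the full iterative scheme of Yang et al. [28], which rebuilds and filters $\hat{C}$ between iterations, is omitted.

```python
import numpy as np

def sublabel_refine(C_hat, l_r):
    """One quadratic-interpolation pass: fit a parabola through the filtered
    costs at labels l_r - 1, l_r, l_r + 1 and move each pixel to the
    parabola's vertex, yielding a continuous (sub-label) disparity map."""
    L = C_hat.shape[0]
    l = np.clip(l_r, 1, L - 2)  # keep a valid three-label neighborhood
    iy, ix = np.indices(l.shape)
    c0, c1, c2 = C_hat[l - 1, iy, ix], C_hat[l, iy, ix], C_hat[l + 1, iy, ix]
    denom = c0 - 2.0 * c1 + c2
    safe = np.where(np.abs(denom) > 1e-12, denom, 1.0)
    # Vertex of the parabola through (-1, c0), (0, c1), (+1, c2).
    offset = np.where(np.abs(denom) > 1e-12, 0.5 * (c0 - c2) / safe, 0.0)
    return l + np.clip(offset, -0.5, 0.5)
```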

References (partial)
[14] D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 2004.
[21] C. E. Shannon. Communication in the Presence of Noise. Proceedings of the IRE, 1949.
R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan. Light Field Photography with a Hand-held Plenoptic Camera. Stanford Tech Report CTSR 2005-02, 2005.
[20] C. Rhemann, A. Hosni, M. Bleyer, C. Rother, and M. Gelautz. Fast Cost-Volume Filtering for Visual Correspondence and Beyond. CVPR, 2011.
[11] V. Kolmogorov and R. Zabih. Multi-camera Scene Reconstruction via Graph Cuts. ECCV, 2002.