Proceedings ArticleDOI

Image Guided Depth Upsampling Using Anisotropic Total Generalized Variation

01 Dec 2013, pp. 993-1000
TL;DR: This work formulates a convex optimization problem using higher order regularization for depth image upsampling, and derives a numerical algorithm based on a primal-dual formulation that is efficiently parallelized and runs at multiple frames per second.
Abstract: In this work we present a novel method for the challenging problem of depth image upsampling. Modern depth cameras such as Kinect or Time-of-Flight cameras deliver dense, high quality depth measurements but are limited in their lateral resolution. To overcome this limitation we formulate a convex optimization problem using higher order regularization for depth image upsampling. In this optimization an anisotropic diffusion tensor, calculated from a high resolution intensity image, is used to guide the upsampling. We derive a numerical algorithm based on a primal-dual formulation that is efficiently parallelized and runs at multiple frames per second. We show that this novel upsampling clearly outperforms state of the art approaches in terms of speed and accuracy on the widely used Middlebury 2007 datasets. Furthermore, we introduce novel datasets with highly accurate ground truth, which, for the first time, enable benchmarking of depth upsampling methods using real sensor data.

Summary (2 min read)

1. Introduction

  • Accurate, high resolution depth sensing is a fundamental challenge in computer vision.
  • Upsampling of a low resolution depth image (a) using an additional high resolution intensity image (b) through image guided anisotropic Total Generalized Variation (c).
  • A small packet size and a low energy consumption make ToF sensors applicable in mobile devices.

3.1. Depth Mapping

  • Since the low resolution depth map DL and the high resolution intensity image IH stem from different cameras, a mapping can only be established when intrinsic and extrinsic parameters are known (see Section 4.2).
  • In their setup the authors define the intensity camera as the world coordinate center.
  • Each depth measurement d_{i,j} at pixel position x_{i,j} = [i, j, 1]^T is projected into the high resolution intensity image space Ω_H.
  • Therewith, the authors minimize the error which can occur due to this averaging in the high resolution space.
  • Through the regularization term, introduced in Section 3.2, the area between the projected depth pixels is implicitly interpolated.

3.2. Depth Image Upsampling

  • The authors' upsampling method increases the resolution of measured depth data from a low resolution depth sensor by adding edge cues from a high resolution intensity image.
  • Their formulation (Eq. 2) is composed of the data term G(u, DS) that measures the fidelity of the argument u to the input depth measurements DS and the regularization term F(u) that reflects prior knowledge of the smoothness of the solution.
  • F and G are convex lower semi-continuous functions.
  • Because the TGV regularizer is convex, it allows computing a globally optimal solution.
  • Including an anisotropic diffusion tensor in their TGV model, the authors can penalize high depth discontinuities in homogeneous regions and allow sharp depth edges at corresponding texture differences.

3.3. Primal-Dual Optimization

  • The proposed optimization problem (6) is convex but non-smooth due to the TGV regularization term and the zeros in the weighting operator w.
  • To find a fast, globally optimal solution for their problem the authors use the primal-dual energy minimization scheme, as proposed in [2, 6].
  • The authors reformulate the non-smooth problem as a convex-concave saddle-point problem by applying the Legendre-Fenchel transform (LF).
  • First, the dual variables are updated using gradient ascent; second, the primal variables are updated using gradient descent; third, the primal variables are refined in an over-relaxation step.

4. Evaluation

  • The authors show a quantitative and qualitative evaluation of their upsampling method.
  • In the visual comparison (Figure 3), their upsampling method using image guided anisotropic TGV (f) removes noise while preserving sharp object edges.
  • The non-local means result (e) removes noise but suffers from edge bleeding, especially at small structure boundaries.

4.1. Middlebury Benchmark Evaluation

  • An exhaustive evaluation of their method in terms of quantitative and qualitative comparison is made using input images from the Middlebury datasets [10, 20].
  • The authors use the disparity image as groundtruth and the original RGB intensity image as input for their anisotropic diffusion tensor.
  • This experiment gives an objective comparison of the robustness, accuracy and speed of a variety of different algorithms.
  • While the Middlebury datasets are popular to evaluate depth upsampling methods, they neglect some important properties of real acquisition setups.
  • Typically, depth and intensity data do not originate from the same sensor and are therefore not aligned.

4.2. Benchmarking based on Real Sensor Data

  • The evaluation on real acquisitions is made using different scenes acquired with a Time of Flight (ToF) and an intensity camera simultaneously.
  • The rotation and translation between intensity and ToF camera is estimated by establishing a geometric correspondence through the feature points on the planar target.
  • Through a comparison of the very accurate 3D measurements of the calibration points and the measured ToF depth points a dependence between the acquired IR amplitude image and the measurement error can be established, as shown in Figure 4.
  • Using their depth calibration, the authors can compensate for that error (see green/dashed box).
  • In the visual and numerical results it can be seen that their method delivers high quality upsampling results at multiple frames per second for an approximate upsampling factor of ×6.25.

5. Conclusion

  • The authors presented a depth upsampling method that combines a low cost 3D sensor with an additional high resolution 2D sensor.
  • The upsampling is formulated as a global energy optimization problem using Total Generalized Variation (TGV) regularization.
  • For fast numerical optimization the authors use a first order primal-dual algorithm, which is efficiently parallelized resulting in high frame rates.
  • In a quantitative evaluation using widespread datasets the authors show that their method clearly outperforms existing state of the art methods in terms of speed and quality.
  • The authors further provide benchmarking datasets of real world scenes providing a highly accurate groundtruth that, for the first time, enable a real quality comparison of depth image upsampling methods.


Image Guided Depth Upsampling using Anisotropic Total Generalized Variation
David Ferstl, Christian Reinbacher, Rene Ranftl, Matthias Rüther and Horst Bischof
Graz University of Technology
Institute for Computer Graphics and Vision
Inffeldgasse 16, 8010 Graz, AUSTRIA
{ferstl,reinbacher,ranftl,ruether,bischof}@icg.tugraz.at
Abstract
In this work we present a novel method for the challenging problem of depth image upsampling. Modern depth cameras such as Kinect or Time of Flight cameras deliver dense, high quality depth measurements but are limited in their lateral resolution. To overcome this limitation we formulate a convex optimization problem using higher order regularization for depth image upsampling. In this optimization an anisotropic diffusion tensor, calculated from a high resolution intensity image, is used to guide the upsampling. We derive a numerical algorithm based on a primal-dual formulation that is efficiently parallelized and runs at multiple frames per second. We show that this novel upsampling clearly outperforms state of the art approaches in terms of speed and accuracy on the widely used Middlebury 2007 datasets. Furthermore, we introduce novel datasets with highly accurate groundtruth, which, for the first time, enable benchmarking of depth upsampling methods using real sensor data.
1. Introduction
Accurate, high resolution depth sensing is a fundamental challenge in computer vision. It is used in a variety of different applications including object reconstruction, robotic navigation and automotive driver assistance. Traditional computer vision approaches calculate the scene depth through computationally exhaustive stereo calculations or expensive laser range measurements.

Recently, Time of Flight (ToF) range sensors became a popular alternative for dense depth sensing. A per-pixel depth is measured actively through the runtime of light. The measurement is independent from scene texture and largely independent from environmental lighting conditions. It delivers a dense depth map even at very close ranges [12, 21]. No additional calculations are necessary, which results in depth measurements at high frame rates. Recently, ToF sensors have become affordable in the mass market, and a small packet size and a low energy consumption make them applicable in mobile devices. However, their main disadvantages are a low resolution caused by chip size limitations and acquisition noise due to limited active illumination energy.

Figure 1. Upsampling of a low resolution depth image (a) using an additional high resolution intensity image (b) through image guided anisotropic Total Generalized Variation (c). Depth maps are color coded for better visualization.
In this work, we propose a method to drastically increase the lateral measurement resolution by a novel depth map upsampling approach, as shown in Figure 1. To increase both quality and resolution, we add information from a high resolution intensity camera in a variational optimization framework. We build on the observation that textural edges are more likely to appear at high depth discontinuities, whereas homogeneously textured regions correspond to homogeneous surface parts [23]. Fusing both low resolution but very robust depth and high resolution intensity in a spatial sense results in a dense depth map with increased lateral resolution and visual quality.

We formulate the upsampling as a convex optimization problem [2, 6]. The energy is composed of two terms. First, the data term forces the solution to be similar to the input depth measurements. Second, the higher order regularization term enforces a piecewise affine solution, preserving sharp edges according to the texture, while compensating acquisition noise. This term is modeled as a second order Total Generalized Variation (TGV) regularization and is weighted according to the intensity image texture by an anisotropic diffusion tensor.

The main contributions of this work are two-fold: (1) We propose a novel method for fast depth image upsampling by combining a low resolution depth image with high resolution texture information in a variational energy optimization framework. The employed higher order regularization is well suited to model the image acquisition process of modern depth cameras and leads to an improved quality of the upsampled depth maps, compared to state of the art methods. (2) We propose benchmarking datasets that enable a quantitative comparison of depth image upsampling methods, providing real ToF and intensity camera acquisitions together with a highly accurate groundtruth measurement. To encourage further comparison and future work, these novel datasets and MATLAB code of our method are available at our website (http://rvlab.icg.tugraz.at/tofmark).

In our experiments we demonstrate the upsampling quality by a numerical and visual comparison on synthetic and real benchmarking datasets. Compared to state of the art methods, our method is superior in terms of speed and accuracy on all test sets.
2. Related Work
There are many ways to increase the resolution and the accuracy of depth measurements. In general, they can be separated into three main classes: (1) fusion of multiple depth sensors, (2) temporal and spatial fusion and (3) upsampling by combining depth and intensity sensors.

Multiple Depth Sensor Fusion. Recent works addressed the fusion of different depth sensing techniques to increase resolution and quality. Gudmundsson et al. [8] presented a method for stereo and Time of Flight (ToF) depth map fusion in a dynamic programming approach. Similar work has been proposed by Zhu et al. [26] using an accurate depth calibration and fusing the measurements in a Markov Random Field (MRF) framework. In addition to this spatial fusion, a temporal fusion was also performed by measuring the frame-to-frame displacement acquired with high speed intensity cameras.
Temporal and Spatial Upsampling. A common way to improve the resolution and quality of depth information is to fuse multiple depth measurements into one depth map. Schuon et al. [22] proposed a method to fuse ToF acquisitions of slightly moved viewpoints. It uses a bilateral regularization in an MRF optimization framework, also incorporating the ToF sensor characteristics. Based on this work, Cui et al. [4] used a set of fused depth maps with larger displacements. To create whole volumes of depth data, Newcombe et al. [14] proposed a method for simultaneous camera localization and depth fusion in real time.

Depth Upsampling through Intensity Information. This class of approaches uses additional intensity information as a depth cue for image upsampling. Yang et al. [24] used bilateral filtering of a depth cost volume and an RGB image in an iterative refinement process. Chan et al. [3] used a noise aware joint bilateral filter to increase the resolution and to reduce depth map errors at multiple frames per second. Diebel and Thrun [5] performed an upsampling using an MRF formulation, where the smoothness term is weighted according to texture derivatives. A more complex approach was proposed by Park et al. [15]. They used a combination of different weighting terms in a least squares optimization including segmentation, image gradients, edge saliency and non-local means for depth upsampling. The combination of intensity and depth data in a Bayesian framework was proposed by Li et al. [13].
Discussion. While the methods for multiple sensor fusion deliver accurate depth results, their quality relies on high calibration effort. Further, most sensor fusion techniques have to calculate a depth map from passive stereo in a preprocessing step before the actual fusion is able to start. In contrast, temporal and spatial fusion approaches rely on multiple acquisitions from a single depth sensor. The major drawback of these methods is that changing environments during these acquisitions will harm the fusion result.

To overcome these limitations, we chose the combination of a low resolution depth and a high resolution intensity sensor to increase the natural depth sensor resolution. The upsampling is calculated on a per image basis without the need for complex preprocessing. Existing approaches, such as [3, 24], calculate this depth upsampling by bilateral filtering. While bilateral filtering techniques can operate at high frame rates, they have a drawback in oversmoothing fine details. In contrast, our method builds on the success of recently introduced upsampling methods using MRF and least squares optimization [5, 15]. Unlike them, our approach incorporates a higher order regularization, which avoids surface flattening. Furthermore, we use an anisotropic diffusion tensor based on the intensity image. This tensor not only weights the depth gradient but also orients the gradient direction during the optimization process.

3. Method
Our upsampling approach generates a high quality and high resolution depth map $D_H$ out of a high resolution intensity image $I_H$ and a low resolution and noisy depth map $D_L$, where $I_H, D_H$ are defined on $\Omega_H \subset \mathbb{R}^2$ and $D_L$ on $\Omega_L \subset \mathbb{R}^2$. The methodology of this approach can be divided into three main areas: (1) registering the low resolution depth measurements and the high resolution intensity information in one common coordinate system (Section 3.1), (2) formulating the depth upsampling problem as a convex energy functional (Section 3.2), and (3) solving the optimization problem with a first-order primal-dual optimization scheme (Section 3.3).
3.1. Depth Mapping
Since the low resolution depth map $D_L$ and the high resolution intensity image $I_H$ stem from different cameras, a mapping can only be established when intrinsic and extrinsic parameters are known (see Section 4.2). In our setup we define the intensity camera as the world coordinate center. Each depth measurement $d_{i,j}$ at pixel position $x_{i,j} = [i, j, 1]^T$ is projected into the high resolution intensity image space $\Omega_H$. This projection is calculated as

$$X_{i,j} = C_L + d_{i,j} \frac{P_L^\dagger x_{i,j}}{\|P_L^\dagger x_{i,j}\|}, \qquad \tilde{x}_{i,j} = P_H X_{i,j} \quad \forall\, i,j \in \Omega_L, \tag{1}$$

where $P_L^\dagger$ is the pseudoinverse of the depth camera projection matrix, $C_L$ the camera center and $X_{i,j}$ the 3D point. Each 3D point is back projected by multiplication with the projection matrix of the intensity camera $P_H$. Hence, we get a projected depth image $D_S$ consisting of a sparse set of base depth points at positions $\tilde{x}_{i,j}$ in the intensity image space $\Omega_H$, where the depth value is given by the distance to the 3D point $X_{i,j}$ (see Figure 2).
Figure 2. Projection from a low resolution depth map $D_L$ to a high resolution sparse depth map $D_S$ in the intensity camera coordinate system.
Although one low resolution sensor pixel $D_L(i,j)$ measures the average depth of multiple pixels in the high resolution space, we only project it to one central pixel $D_S(i,j)$ at position $\tilde{x}_{i,j}$. Therewith, we minimize the error which can occur due to this averaging in the high resolution space. Through the regularization term, introduced in Section 3.2, the area between the projected depth pixels is implicitly interpolated.
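The mapping in Eq. (1) is straightforward to prototype. Below is a minimal NumPy sketch of this projection step (our own illustration, not the authors' MATLAB code); the calibration inputs (`P_L_pinv`, a Euclidean camera center `C_L`, `P_H`) and all function and variable names are assumptions, and for clarity the ray direction is formed by subtracting the camera center from the back-projected point rather than normalizing $P_L^\dagger x$ directly:

```python
import numpy as np

def project_depth_to_intensity(D_L, P_L_pinv, C_L, P_H, H, W):
    """Sketch of Eq. (1): map low resolution depth measurements into the
    high resolution intensity image space, yielding a sparse depth map D_S
    and a binary confidence map w (1 on base points, 0 elsewhere)."""
    D_S = np.zeros((H, W))
    w = np.zeros((H, W))
    h_lo, w_lo = D_L.shape
    for i in range(h_lo):
        for j in range(w_lo):
            d = D_L[i, j]
            if d <= 0:                       # skip invalid measurements
                continue
            x = np.array([j, i, 1.0])        # homogeneous pixel coordinate
            X_ray = P_L_pinv @ x             # point on the viewing ray (4-vector)
            ray = X_ray[:3] / X_ray[3] - C_L           # ray direction
            X = C_L + d * ray / np.linalg.norm(ray)    # 3D point at depth d
            x_h = P_H @ np.append(X, 1.0)    # project into intensity camera
            c, r = x_h[0] / x_h[2], x_h[1] / x_h[2]
            r, c = int(round(r)), int(round(c))
            if 0 <= r < H and 0 <= c < W:
                # depth value = distance to the 3D point (the intensity
                # camera is the world coordinate center in this setup)
                D_S[r, c] = np.linalg.norm(X)
                w[r, c] = 1.0
    return D_S, w
```

Each low resolution measurement lands on a single central high resolution pixel, matching the sparse projection described above; the gaps between base points are left to the regularizer.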
3.2. Depth Image Upsampling
Our upsampling method increases the resolution of measured depth data from a low resolution depth sensor by adding edge cues from a high resolution intensity image. To be able to use both sources of information, we map the depth measurements to the intensity camera coordinate system as described in Section 3.1. With this mapping we get a depth map $D_S$ of a sparse set of base depth measurements from the low resolution depth sensor.

The high resolution depth map $D_H$ is given by

$$D_H = \arg\min_u \left\{ G(u, D_S) + \alpha F(u) \right\}. \tag{2}$$
This formulation is composed of the data term $G(u, D_S)$ that measures the fidelity of the argument $u$ to the input depth measurements $D_S$ and the regularization term $F(u)$ that reflects prior knowledge of the smoothness of our solution. $F$ and $G$ are convex lower semi-continuous functions. The scalar $\alpha$ is used to balance the relative weight between the data term and the regularization.

The data term in our energy model is designed to ensure data consistency to the base depth points $D_S$ from the depth camera. Additionally, we allow to weight the depth measurements with a weighting operator $w \in [0, 1]^{\Omega_H}$, which is zero at unmapped image points and between zero and one on the base points according to some application specific confidence. Hence, the data term results in

$$G(u, D_S) = \int_{\Omega_H} w\,|u - D_S|^2 \, dx, \tag{3}$$

which penalizes deviations of the resulting depth from the measured depth.
The regularization term has to meet the challenges of producing a high resolution depth map out of a sparse set of depth points. Most currently utilized regularization terms are based on the first order smoothness assumption [19], e.g. the Total Variation semi-norm, which results in $F(u) = \|\nabla u\|_1$. While this simple model with an L1 norm is well suited for intensity image denoising, it has a disadvantage when used for range data regularization. Through its gradient penalization it favors constant solutions. This prevents the depth map from becoming a piecewise smooth surface, resulting in piecewise fronto-parallel depth reconstructions. Hence, we use a more generalized regularization model, namely the Total Generalized Variation (TGV) introduced by Bredies et al. [1]. The TGV is composed of polynomials of arbitrary order, which allows to reconstruct piecewise polynomial functions. An order of $k$ favors solutions composed of polynomials of order $k-1$. For depth upsampling, it turns out that the second order TGV is sufficient, since most objects can be well approximated by piecewise affine surfaces. The primal definition of the second order TGV is formulated as

$$\mathrm{TGV}_\alpha^2(u) = \min_v \left\{ \alpha_1 \int_\Omega |\nabla u - v| \, dx + \alpha_0 \int_\Omega |\nabla v| \, dx \right\}, \tag{4}$$
where the scalars $\alpha_0$ and $\alpha_1$ are used to weight each order. Because the TGV regularizer is convex, it allows computing a globally optimal solution.
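To make the piecewise affine preference in (4) concrete, here is a short sanity check (our own illustration, not from the paper): for an affine depth ramp $u(x) = a^\top x + b$ we have $\nabla u \equiv a$, so choosing the auxiliary field $v = a$ gives

$$\alpha_1 \int_\Omega |\nabla u - v| \, dx = 0 \quad \text{and} \quad \alpha_0 \int_\Omega |\nabla v| \, dx = 0, \qquad \text{hence } \mathrm{TGV}_\alpha^2(u) = 0,$$

whereas plain TV charges $\int_\Omega |\nabla u| \, dx = |a|\,|\Omega| > 0$ for any non-zero slope $a$. Sloped surfaces are therefore free under the second order TGV prior, which is exactly what avoids the fronto-parallel bias described above.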
Assuming that texture edges most likely correspond to depth discontinuities, we use the high resolution intensity data to produce a more accurate upsampling result. Henceforth, we include an anisotropic diffusion tensor $T^{1/2}$. This tensor is calculated as

$$T^{1/2} = \exp\left(-\beta\, |\nabla I_H|^{\gamma}\right) n n^T + n^{\perp} {n^{\perp}}^T, \tag{5}$$

where $n$ is the normalized direction of the image gradient, $n = \frac{\nabla I_H}{|\nabla I_H|}$, $n^{\perp}$ is the normal vector to the gradient, and the scalars $\beta, \gamma$ adjust the magnitude and the sharpness of the tensor. The anisotropic diffusion tensor not only weights the first order depth gradient but also orients the gradient direction during the optimization process.
Including this tensor in our TGV model, we can penalize high depth discontinuities in homogeneous regions and allow sharp depth edges at corresponding texture differences. A similar combination of TGV and weighting was used by Ranftl et al. [18] for passive stereo reconstruction. With the additional edge tensor information the optimization result leads to sharper and more defined edges in our solution. Further, the regions where the depth data is interpolated are filled out more reasonably.
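As a concrete reference, here is a compact NumPy sketch of the tensor in Eq. (5), assuming a grayscale guidance image and simple finite-difference gradients; the function name and the default values for $\beta$ and $\gamma$ are our own placeholders, not values from the paper:

```python
import numpy as np

def anisotropic_diffusion_tensor(I_H, beta=9.0, gamma=0.85, eps=1e-8):
    """Per-pixel 2x2 tensor T^(1/2) from Eq. (5); returns shape (H, W, 2, 2)."""
    gy, gx = np.gradient(I_H.astype(np.float64))
    mag = np.sqrt(gx ** 2 + gy ** 2)
    # normalized gradient direction n and its orthogonal n_perp
    n = np.stack([gx, gy], axis=-1) / (mag[..., None] + eps)
    n_perp = np.stack([-n[..., 1], n[..., 0]], axis=-1)
    # edge weight: small across strong intensity edges, ~1 in flat regions
    w_edge = np.exp(-beta * mag ** gamma)
    # T^(1/2) = w_edge * n n^T + n_perp n_perp^T  (outer products per pixel)
    T = (w_edge[..., None, None] * n[..., :, None] * n[..., None, :]
         + n_perp[..., :, None] * n_perp[..., None, :])
    return T
```

Across an intensity edge the tensor shrinks the penalty on depth changes along $n$ (allowing a depth discontinuity there) while keeping the full penalty along $n^{\perp}$, which is exactly the anisotropic guidance described above.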
The final energy is defined as a combination of the data term (3) and the TGV term (4) with anisotropic diffusion (5):

$$\min_{u,v} \left\{ \alpha_1 \int_{\Omega_H} |T^{1/2}(\nabla u - v)| \, dx + \alpha_0 \int_{\Omega_H} |\nabla v| \, dx + \int_{\Omega_H} w\,|u - D_S|^2 \, dx \right\}. \tag{6}$$
3.3. Primal-Dual Optimization
The proposed optimization problem (6) is convex but non-smooth due to the TGV regularization term and the zeros in the weighting operator $w$. To find a fast, globally optimal solution for our problem we use the primal-dual energy minimization scheme, as proposed in [2, 6]. We reformulate the non-smooth problem as a convex-concave saddle-point problem by applying the Legendre-Fenchel transform (LF). The optimization problem can then be efficiently minimized through gradient descent. The transformed saddle-point problem of our energy functional (6) is given by

$$\min_{u \in \mathbb{R}^{MN},\, v \in \mathbb{R}^{2MN}} \; \max_{p \in P,\, q \in Q} \left\{ \alpha_1 \langle T^{1/2}(\nabla u - v), p \rangle + \alpha_0 \langle \nabla v, q \rangle + \sum_{i,j \in \Omega_H} w_{i,j} (u_{i,j} - D_{S\,i,j})^2 \right\}, \tag{7}$$

introducing the dual variables $p$ and $q$. The feasible sets of these variables are defined by

$$P = \left\{ p : \Omega_H \to \mathbb{R}^2 \;\middle|\; \|p_{i,j}\| \le 1 \;\forall\, i,j \in \Omega_H \right\}, \tag{8}$$

$$Q = \left\{ q : \Omega_H \to \mathbb{R}^4 \;\middle|\; \|q_{i,j}\| \le 1 \;\forall\, i,j \in \Omega_H \right\}. \tag{9}$$
This formulation is used in the primal-dual algorithm, where the primal and dual variables are iteratively optimized for the individual pixels in three steps. First, the dual variables $p$ and $q$ are updated using gradient ascent. Second, the primal variables are updated using gradient descent. Third, the primal variables are refined in an over-relaxation step. The step sizes are chosen such that $u^0 = D_S$, $v^0, p^0, q^0 = 0$, $\sigma_p > 0$, $\sigma_q > 0$, $\tau_u > 0$ and $\tau_v > 0$. For any iteration $n \ge 0$ the steps are calculated according to
$$\begin{aligned}
p^{n+1} &= \mathcal{P}_p \left\{ p^n + \sigma_p \alpha_1 \left( T^{1/2} (\nabla \bar{u}^n - \bar{v}^n) \right) \right\} \\
q^{n+1} &= \mathcal{P}_q \left\{ q^n + \sigma_q \alpha_0 \nabla \bar{v}^n \right\} \\
u^{n+1} &= \frac{u^n + \tau_u \left( \alpha_1 \nabla^T T^{1/2} p^{n+1} + w D_S \right)}{1 + \tau_u w} \\
v^{n+1} &= v^n + \tau_v \left( \alpha_0 \nabla^T q^{n+1} + \alpha_1 T^{1/2} p^{n+1} \right) \\
\bar{u}^{n+1} &= u^{n+1} + \theta (u^{n+1} - \bar{u}^n) \\
\bar{v}^{n+1} &= v^{n+1} + \theta (v^{n+1} - \bar{v}^n)
\end{aligned} \tag{10}$$
until a stopping criterion is reached. To fulfill the convex optimality condition in the dual update step, the projection operators $\mathcal{P}_p$ and $\mathcal{P}_q$ for $p$ and $q$ are calculated through

$$\mathcal{P}_p\{\tilde{p}_{i,j}\} = \frac{\tilde{p}_{i,j}}{\max(1, |\tilde{p}_{i,j}|)}, \qquad \mathcal{P}_q\{\tilde{q}_{i,j}\} = \frac{\tilde{q}_{i,j}}{\max(1, |\tilde{q}_{i,j}|)}. \tag{11}$$
In practice the relaxation parameter $\theta$ is updated in every iteration, according to [2], and the optimal step sizes are calculated using preconditioning, as proposed in [17]. Therewith, we achieve a fast and guaranteed convergence to the globally optimal solution for different tensor conditions. The gradient and divergence operators are approximated using forward/backward differences with Neumann and Dirichlet boundary conditions, respectively.
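For concreteness, the following is a minimal, unoptimized NumPy sketch of the iteration (10) with the reprojections (11). It uses fixed scalar step sizes rather than the preconditioned steps of [17], the standard Chambolle-Pock extrapolation against $u^n$ (Eq. (10) as printed extrapolates against $\bar{u}^n$), and writes $\nabla^T$ as the negative divergence; all names are our own:

```python
import numpy as np

def grad(u):
    """Forward differences, Neumann boundary; (H, W) -> (H, W, 2)."""
    g = np.zeros(u.shape + (2,))
    g[:, :-1, 0] = u[:, 1:] - u[:, :-1]
    g[:-1, :, 1] = u[1:, :] - u[:-1, :]
    return g

def div(p):
    """Negative adjoint of grad (backward differences, Dirichlet boundary)."""
    d = np.zeros(p.shape[:2])
    d[:, 0] += p[:, 0, 0]
    d[:, 1:-1] += p[:, 1:-1, 0] - p[:, :-2, 0]
    d[:, -1] -= p[:, -2, 0]
    d[0, :] += p[0, :, 1]
    d[1:-1, :] += p[1:-1, :, 1] - p[:-2, :, 1]
    d[-1, :] -= p[-2, :, 1]
    return d

def proj_unit(x):
    """Pointwise reprojection x / max(1, |x|), Eq. (11)."""
    n = np.sqrt((x ** 2).sum(axis=tuple(range(2, x.ndim)), keepdims=True))
    return x / np.maximum(1.0, n)

def tgv2_upsample(D_S, w, T, alpha0, alpha1, sigma=0.125, tau=0.125,
                  theta=1.0, iters=1000):
    """Primal-dual minimization of the energy (6), following Eq. (10)."""
    u = D_S.copy(); u_bar = u.copy()
    v = np.zeros(D_S.shape + (2,)); v_bar = v.copy()
    p = np.zeros_like(v)
    q = np.zeros(D_S.shape + (2, 2))
    for _ in range(iters):
        # dual ascent with reprojection
        p = proj_unit(p + sigma * alpha1 *
                      np.einsum('...ij,...j->...i', T, grad(u_bar) - v_bar))
        grad_v = np.stack([grad(v_bar[..., 0]), grad(v_bar[..., 1])], axis=2)
        q = proj_unit(q + sigma * alpha0 * grad_v)
        # primal descent (div = -grad^T)
        Tp = np.einsum('...ij,...j->...i', T, p)
        u_new = (u + tau * (alpha1 * div(Tp) + w * D_S)) / (1.0 + tau * w)
        div_q = np.stack([div(q[..., 0, :]), div(q[..., 1, :])], axis=-1)
        v_new = v + tau * (alpha0 * div_q + alpha1 * Tp)
        # over-relaxation
        u_bar = u_new + theta * (u_new - u)
        v_bar = v_new + theta * (v_new - v)
        u, v = u_new, v_new
    return u
```

Here `T` is the per-pixel tensor from the previous sketch and `w`, `D_S` come from the projection step. Every update is an independent per-pixel operation, which is what makes the GPU parallelization and the frame rates reported in the paper possible.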
4. Evaluation
In this section, we show a quantitative and qualitative evaluation of our upsampling method. For an extensive evaluation we investigate the performance compared to state of the art approaches on the simulated Middlebury 2007 datasets [10, 20] in terms of speed and accuracy. Beyond these simulations, we evaluate our method on real data with highly accurate groundtruth measurements.

Method                 Art                      Books                    Moebius                 Avg. Time [s]
                       x2    x4    x8    x16   x2    x4    x8    x16   x2    x4    x8    x16
Nearest                4.65  5.01  5.71  7.10  4.30  4.68  4.85  5.23  5.08  5.20  5.31  5.65      -
Bilinear               3.09  3.59  4.39  5.91  2.91  3.12  3.34  3.71  3.21  3.45  3.62  4.00      -
Yang et al. [24]       1.36  1.93  2.45  4.52  1.12  1.47  1.81  2.92  1.25  1.63  2.06  3.21      -
He et al. [9]          1.92  2.40  3.32  5.08  1.60  1.82  2.31  3.06  1.77  2.03  2.60  3.34   23.89
Diebel and Thrun [5]   1.62  2.24  3.85  5.70  1.34  2.08  2.85  3.54  1.47  2.29  3.09  3.81      -
Chan et al. [3]        1.83  2.90  4.75  7.70  1.04  1.36  1.94  3.07  1.17  1.55  2.28  3.55    3.02*
Park et al. [15]       1.24  1.82  2.78  4.17  0.99  1.43  1.98  3.04  1.03  1.49  2.13  3.09   24.05
OURS                   0.84  1.29  2.06  3.56  0.51  0.75  1.16  1.89  0.57  0.90  1.38  2.15    1.94

* Extrapolated from the runtime the authors report on images of size 800 × 600.

Table 1. Quantitative comparison on the Middlebury 2007 datasets with added noise. The error is measured as RMSE of the pixel disparity for four different magnification factors (×2, ×4, ×8, ×16). In the original paper the best result for each dataset and upscaling factor is highlighted and the second best is underlined; in this text version, note that OURS has the lowest value in every column.
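For reference, the error metric in Table 1 is the standard root mean squared error over all pixels of the disparity map; a minimal version (our own helper, not from the paper's code) looks like:

```python
import numpy as np

def rmse(pred, gt):
    """Root mean squared error between predicted and groundtruth disparity."""
    return float(np.sqrt(np.mean((pred.astype(np.float64) - gt) ** 2)))
```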
Figure 3. Visual comparison of ×8 upsampling on a snippet of the Middlebury Art dataset including fine structures. (a) RGB intensity image, (b) low resolution input image (enlarged using nearest neighbor upsampling). (c) Upsampling using the MRF proposed by Diebel and Thrun [5]. (d) Adaptive bilateral upsampling proposed by Chan et al. [3]. (e) Non-local means upsampling proposed by Park et al. [15]. (f) Our upsampling method using image guided anisotropic TGV. (g) Groundtruth. The results in (c) and (d) still suffer from noise. (e) removes noise but suffers from edge bleeding especially at small structure boundaries. Our method removes noise and preserves sharp object edges.
In our experiments we use a 2 × 2 gradient operator to calculate the intensity image gradients. The tensor parameters $\beta$ and $\gamma$ as well as the TGV parameters $\alpha_0$ and $\alpha_1$ are manually set once for each upsampling factor and are kept constant in the synthetic and the real world evaluations.
4.1. Middlebury Benchmark Evaluation
An exhaustive evaluation of our method in terms of quantitative and qualitative comparison is made using input images from the Middlebury datasets [10, 20]. We use the disparity image as groundtruth and the original RGB intensity image as input for our anisotropic diffusion tensor. Park et al. [15] provide low resolution input depth images with different downsampling factors (×2, ×4, ×8, ×16). To simulate the acquisition process, these input images contain additional Gaussian noise with a standard deviation that increases with the disparity. Using these datasets we are able to compare our results with the Markov Random Field (MRF) based approach of Diebel and Thrun [5], the bilateral filtering with cost volume refinement of Yang et al. [24], the guided image filtering approach of He et al. [9], the noise-aware bilateral filter approach by Chan et al. [3] and the non-local means filtering by Park et al. [15]. Further, we compare the results to common interpolation methods. The confidence measure $w$ in our functional is set to 1 for all depth points. The parameters $\alpha_0$ and $\alpha_1$ have been kept fixed for all datasets and have been empirically chosen for ×2 / ×4 / ×8 / ×16 as 0.154, 0.023 / 0.05, 0.0056 / 0.267, 0.03 / 0.267, 0.03.
This experiment gives an objective comparison of the robustness, accuracy and speed of a variety of different algorithms. The numerical results for this experiment in terms of the root mean squared error (RMSE) and computation time are shown in Table 1. A visual comparison of the different methods is given in Figure 3. Further quantitative comparisons to other depth upsampling methods on the Middlebury 2003 and 2007 datasets can be found in the supplemental material.
Discussion. What can be clearly seen is that our method delivers an upsampling quality that is superior to state of the art methods at a lower computation time.

Citations
Proceedings ArticleDOI
01 Oct 2017
TL;DR: This paper proposes a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation, and demonstrates the benefits of the proposed network architecture in synthetic and real experiments with respect to various baseline approaches.
Abstract: In this paper, we consider convolutional neural networks operating on sparse inputs with an application to depth upsampling from sparse laser scan data. First, we show that traditional convolutional networks perform poorly when applied to sparse data even when the location of missing data is provided to the network. To overcome this problem, we propose a simple yet effective sparse convolution layer which explicitly considers the location of missing data during the convolution operation. We demonstrate the benefits of the proposed network architecture in synthetic and real experiments with respect to various baseline approaches. Compared to dense baselines, the proposed sparse convolution network generalizes well to novel datasets and is invariant to the level of sparsity in the data. For our evaluation, we derive a novel dataset from the KITTI benchmark, comprising 93k depth annotated RGB images. Our dataset allows for training and evaluating depth upsampling and depth prediction techniques in challenging real-world settings and will be made available upon publication.

518 citations


Cites background or methods or result from "Image Guided Depth Upsampling Using..."

  • ...[12] perform slightly better than our method on very sparse data but require a dense high-resolution RGB image for guidance....


  • ...Note that in contrast to other techniques [12, 52] which artificially upsample the input (e....


  • ...We compare our unguided approach to several baselines [1, 12, 30, 58] which leverage RGB guidance for upsampling and two standard convolutional neural networks with and without valid mask concatenated to the input....


  • ...More advanced approaches are based on global energy minimization [1, 6, 12, 49, 51], compressive sensing [22], or incorporate semantics for improved performance [58]....


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, a deep network is trained to predict surface normals and occlusion boundaries, which are then combined with raw depth observations provided by the RGB-D camera to solve for all pixels, including those missing in the original observation.
Abstract: The goal of our work is to complete the depth channel of an RGB-D image. Commodity-grade depth cameras often fail to sense depth for shiny, bright, transparent, and distant surfaces. To address this problem, we train a deep network that takes an RGB image as input and predicts dense surface normals and occlusion boundaries. Those predictions are then combined with raw depth observations provided by the RGB-D camera to solve for depths for all pixels, including those missing in the original observation. This method was chosen over others (e.g., inpainting depths directly) as the result of extensive experiments with a new depth completion benchmark dataset, where holes are filled in training data through the rendering of surface reconstructions created from multiview RGB-D scans. Experiments with different network inputs, depth representations, loss functions, optimization methods, inpainting methods, and deep depth estimation networks show that our proposed approach provides better depth completions than these alternatives.

353 citations

Book ChapterDOI
08 Oct 2016
TL;DR: A novel algorithm for edge-aware smoothing that combines the flexibility and speed of simple filtering approaches with the accuracy of domain-specific optimization algorithms, fast, robust, straightforward to generalize to new domains, and simple to integrate into deep learning pipelines.
Abstract: We present the bilateral solver, a novel algorithm for edge-aware smoothing that combines the flexibility and speed of simple filtering approaches with the accuracy of domain-specific optimization algorithms. Our technique is capable of matching or improving upon state-of-the-art results on several different computer vision tasks (stereo, depth superresolution, colorization, and semantic segmentation) while being 10–1000\(\times \) faster than baseline techniques with comparable accuracy, and producing lower-error output than techniques with comparable runtimes. The bilateral solver is fast, robust, straightforward to generalize to new domains, and simple to integrate into deep learning pipelines.

336 citations


Cites background or methods from "Image Guided Depth Upsampling Using..."

  • ...Optimization algorithms of this nature have been used in global stereo [30], semantic segmentation [7, 20, 25, 38], depth superresolution [8, 17, 22, 24, 26, 27], and colorization [23]....


  • ...With the advent of consumer depth sensors, techniques have been proposed for the task of upsampling the noisy depth maps produced by these sensors using a highresolution RGB reference image [8, 17, 22, 24, 26, 27]....


  • ...The runtimes we report in Table 3 were either produced by ourselves (on a 2012 HP Z420 workstation) or taken from past work [8, 22]....


  • ...To evaluate our model, we use a depth superresolution benchmark [8] which is based on the Middlebury stereo dataset [30]....


  • ...Table 3: Performance on the depth superresolution task [8]....


Book ChapterDOI
08 Oct 2016
TL;DR: A new method to address the problem of depth map super resolution in which a high-resolution (HR) depth map is inferred from a LR depth map and an additional HR intensity image of the same scene is presented.
Abstract: Depth boundaries often lose sharpness when upsampling from low-resolution (LR) depth maps especially at large upscaling factors. We present a new method to address the problem of depth map super resolution in which a high-resolution (HR) depth map is inferred from a LR depth map and an additional HR intensity image of the same scene. We propose a Multi-Scale Guided convolutional network (MSG-Net) for depth map super resolution. MSG-Net complements LR depth features with HR intensity features using a multi-scale fusion strategy. Such a multi-scale guidance allows the network to better adapt for upsampling of both fine- and large-scale structures. Specifically, the rich hierarchical HR intensity features at different levels progressively resolve ambiguity in depth map upsampling. Moreover, we employ a high-frequency domain training method to not only reduce training time but also facilitate the fusion of depth and intensity features. With the multi-scale guidance, MSG-Net achieves state-of-art performance for depth map upsampling.

317 citations


Cites methods from "Image Guided Depth Upsampling Using..."

  • ...Figure S2 shows the convergence curves using f2 ∈ (3, 9, 11)....


Journal ArticleDOI
Jingyu Yang, Xinchen Ye, Kun Li, Chunping Hou, Yao Wang
TL;DR: Being able to handle various types of depth degradations, the proposed method is versatile for mainstream depth sensors, time-of-flight camera, and Kinect, as demonstrated by experiments on real systems.
Abstract: This paper proposes an adaptive color-guided autoregressive (AR) model for high quality depth recovery from low quality measurements captured by depth cameras. We observe and verify that the AR model tightly fits depth maps of generic scenes. The depth recovery task is formulated into a minimization of AR prediction errors subject to measurement consistency. The AR predictor for each pixel is constructed according to both the local correlation in the initial depth map and the nonlocal similarity in the accompanied high quality color image. We analyze the stability of our method from a linear system point of view, and design a parameter adaptation scheme to achieve stable and accurate depth recovery. Quantitative and qualitative evaluation compared with ten state-of-the-art schemes show the effectiveness and superiority of our method. Being able to handle various types of depth degradations, the proposed method is versatile for mainstream depth sensors, time-of-flight camera, and Kinect, as demonstrated by experiments on real systems.

300 citations


Cites methods from "Image Guided Depth Upsampling Using..."

  • ...In applications using depth cameras, these complex systematic errors are usually calibrated and compensated as a preprocessing step before subsequent processing [44], [45]....


References
Journal ArticleDOI
TL;DR: In this article, a constrained optimization type of numerical algorithm for removing noise from images is presented, where the total variation of the image is minimized subject to constraints involving the statistics of the noise.

15,225 citations


Additional excerpts

  • ...the steps are calculated according to the primal-dual update scheme in Eq. (10) until a stopping criterion is reached....


Journal ArticleDOI
Zhengyou Zhang
TL;DR: A flexible technique to easily calibrate a camera that only requires the camera to observe a planar pattern shown at a few (at least two) different orientations is proposed, advancing 3D computer vision one more step from laboratory environments to real world use.
Abstract: We propose a flexible technique to easily calibrate a camera. It only requires the camera to observe a planar pattern shown at a few (at least two) different orientations. Either the camera or the planar pattern can be freely moved. The motion need not be known. Radial lens distortion is modeled. The proposed procedure consists of a closed-form solution, followed by a nonlinear refinement based on the maximum likelihood criterion. Both computer simulation and real data have been used to test the proposed technique and very good results have been obtained. Compared with classical techniques which use expensive equipment such as two or three orthogonal planes, the proposed technique is easy to use and flexible. It advances 3D computer vision one more step from laboratory environments to real world use.

13,200 citations


"Image Guided Depth Upsampling Using..." refers background in this paper

  • ...As a future perspective, it will be extended to incorporate a temporal coherence in a consistent way, eventually leading to depth reconstructions with even higher accuracy....


Journal ArticleDOI
TL;DR: The guided filter is a novel explicit image filter derived from a local linear model that can be used as an edge-preserving smoothing operator like the popular bilateral filter, but it has better behaviors near edges.
Abstract: In this paper, we propose a novel explicit image filter called guided filter. Derived from a local linear model, the guided filter computes the filtering output by considering the content of a guidance image, which can be the input image itself or another different image. The guided filter can be used as an edge-preserving smoothing operator like the popular bilateral filter [1], but it has better behaviors near edges. The guided filter is also a more generic concept beyond smoothing: It can transfer the structures of the guidance image to the filtering output, enabling new filtering applications like dehazing and guided feathering. Moreover, the guided filter naturally has a fast and nonapproximate linear time algorithm, regardless of the kernel size and the intensity range. Currently, it is one of the fastest edge-preserving filters. Experiments show that the guided filter is both effective and efficient in a great variety of computer vision and computer graphics applications, including edge-aware smoothing, detail enhancement, HDR compression, image matting/feathering, dehazing, joint upsampling, etc.

4,730 citations


"Image Guided Depth Upsampling Using..." refers methods in this paper

  • ...The proposed method is not limited to single image upsampling....


Journal ArticleDOI
TL;DR: A first-order primal-dual algorithm for non-smooth convex optimization problems with known saddle-point structure can achieve O(1/N²) convergence on problems where the primal or the dual objective is uniformly convex, and it can show linear convergence, i.e. O(ω^N) for some ω ∈ (0,1), on smooth problems.
Abstract: In this paper we study a first-order primal-dual algorithm for non-smooth convex optimization problems with known saddle-point structure. We prove convergence to a saddle-point with rate O(1/N) in finite dimensions for the complete class of problems. We further show accelerations of the proposed algorithm to yield improved rates on problems with some degree of smoothness. In particular we show that we can achieve O(1/N²) convergence on problems where the primal or the dual objective is uniformly convex, and we can show linear convergence, i.e. O(ω^N) for some ω ∈ (0,1), on smooth problems. The wide applicability of the proposed algorithm is demonstrated on several imaging problems such as image denoising, image deconvolution, image inpainting, motion estimation and multi-label image segmentation.

4,487 citations


"Image Guided Depth Upsampling Using..." refers background or methods in this paper

  • ...We formulate the upsampling as a convex optimization problem [2, 6]....


  • ...The gradient and divergence operators are approximated using forward/backward differences with Neumann and Dirichlet boundary conditions, respectively....


Proceedings ArticleDOI
26 Oct 2011
TL;DR: A system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware, which fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real- time.
Abstract: We present a system for accurate real-time mapping of complex and arbitrary indoor scenes in variable lighting conditions, using only a moving low-cost depth camera and commodity graphics hardware. We fuse all of the depth data streamed from a Kinect sensor into a single global implicit surface model of the observed scene in real-time. The current sensor pose is simultaneously obtained by tracking the live depth frame relative to the global model using a coarse-to-fine iterative closest point (ICP) algorithm, which uses all of the observed depth data available. We demonstrate the advantages of tracking against the growing full surface model compared with frame-to-frame tracking, obtaining tracking and mapping results in constant time within room sized scenes with limited drift and high accuracy. We also show both qualitative and quantitative results relating to various aspects of our tracking and mapping system. Modelling of natural scenes, in real-time with only commodity sensor and GPU hardware, promises an exciting step forward in augmented reality (AR), in particular, it allows dense surfaces to be reconstructed in real-time, with a level of detail and robustness beyond any solution yet presented using passive computer vision.

4,184 citations


"Image Guided Depth Upsampling Using..." refers methods in this paper

  • ...To create whole volumes of depth data Newcombe et al. ...

    [...]