DehazeNet: An End-to-End System for Single Image Haze Removal

© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses,
in any current or future media, including reprinting/republishing this material for advertising or promotional
purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted
component of this work in other works.

DehazeNet: An End-to-End System for Single Image
Haze Removal
Bolun Cai, Xiangmin Xu, Member, IEEE, Kui Jia, Member, IEEE, Chunmei Qing, Member, IEEE,
and Dacheng Tao, Fellow, IEEE
Abstract—Single image haze removal is a challenging ill-posed
problem. Existing methods use various constraints/priors to get
plausible dehazing solutions. The key to achieve haze removal is
to estimate a medium transmission map for an input hazy image.
In this paper, we propose a trainable end-to-end system called
DehazeNet, for medium transmission estimation. DehazeNet takes
a hazy image as input, and outputs its medium transmission map
that is subsequently used to recover a haze-free image via atmo-
spheric scattering model. DehazeNet adopts Convolutional Neural
Networks (CNN) based deep architecture, whose layers are
specially designed to embody the established assumptions/priors
in image dehazing. Specifically, layers of Maxout units are used
for feature extraction, which can generate almost all haze-relevant
features. We also propose a novel nonlinear activation function
in DehazeNet, called Bilateral Rectified Linear Unit (BReLU),
which is able to improve the quality of recovered haze-free image.
We establish connections between components of the proposed
DehazeNet and those used in existing methods. Experiments
on benchmark images show that DehazeNet achieves superior
performance over existing methods, yet keeps efficient and easy
to use.
Keywords: Dehaze, image restoration, deep CNN, BReLU.
I. INTRODUCTION
HAZE is a traditional atmospheric phenomenon where dust, smoke and other dry particles obscure the clarity of the atmosphere.
of the atmosphere. Haze causes issues in the area of terrestrial
photography, where the light penetration of dense atmosphere
may be necessary to image distant subjects. This results in the
visual effect of a loss of contrast in the subject, due to the
effect of light scattering through the haze particles. For these
reasons, haze removal is desired in both consumer photography
and computer vision applications.
Haze removal is a challenging problem because the haze
transmission depends on the unknown depth which varies at
different positions. Various techniques of image enhancement
have been applied to the problem of removing haze from a
single image, including histogram-based [1], contrast-based [2]
B. Cai, X. Xu (corresponding author) and C. Qing are with the School of Electronic and Information Engineering, South China University of Technology, Wushan RD., Tianhe District, Guangzhou, P.R. China. E-mail: {caibolun@gmail.com, xmxu@scut.edu.cn, qchm@scut.edu.cn}.
K. Jia is with the Department of Electrical and Computer Engineering,
Faculty of Science and Technology, University of Macau, Macau 999078,
China. E-mail: {kuijia@gmail.com}.
D. Tao is with Centre for Quantum Computation & Intelligent Sys-
tems, Faculty of Engineering & Information Technology, University of Tech-
nology Sydney, 235 Jones Street, Ultimo, NSW 2007, Australia. E-mail:
{dacheng.tao@uts.edu.au}.
and saturation-based [3]. In addition, methods using multiple
images or depth information have also been proposed. For
example, polarization based methods [4] remove the haze
effect through multiple images taken with different degrees of
polarization. In [5], multi-constraint based methods are applied
to multiple images capturing the same scene under different
weather conditions. Depth-based methods [6] require some
depth information from user inputs or known 3D models. In
practice, depth information or multiple hazy images are not
always available.
Single image haze removal has made significant progress
recently, due to the use of better assumptions and priors.
Specifically, under the assumption that the local contrast of a
haze-free image is much higher than that of a hazy image, a local
contrast maximizing method [7] based on Markov Random
Fields (MRF) is proposed for haze removal. Although the contrast
maximizing approach is able to achieve impressive results, it
tends to produce over-saturated images. In [8], Independent
Component Analysis (ICA) based on minimal input is pro-
posed to remove the haze from color images, but the approach
is time-consuming and cannot be used to deal with dense-haze
images. Inspired by the dark-object subtraction technique, the Dark
Channel Prior (DCP) [9] is discovered based on empirical
statistics from experiments on haze-free images, which show
that in most non-haze patches at least one color channel has
some pixels with very low intensities. With the dark channel
prior, the thickness of haze is estimated and removed by the
atmospheric scattering model. However, DCP loses dehazing
quality in sky regions and is computationally intensive.
Some improved algorithms are proposed to overcome these
limitations. To improve dehazing quality, Kratz and Nishino
[10] model the image with a factorial MRF to estimate
the scene radiance more accurately; Meng et al. [11] propose
an effective regularization dehazing method to restore the haze-
free image by exploring the inherent boundary constraint. To
improve computational efficiency, standard median filtering
[12], median of median filter [13], guided joint bilateral
filtering [14] and guided image filter [15] are used to replace
the time-consuming soft matting [16]. In recent years, haze-
relevant priors are investigated in machine learning framework.
Tang et al. [17] combine four types of haze-relevant features
with Random Forests to estimate the transmission. Zhu et al.
[18] create a linear model for estimating the scene depth of
the hazy image under the color attenuation prior, and learn the
parameters of the model with a supervised method. Despite the
remarkable progress, these state-of-the-art methods are limited
by the very same haze-relevant priors or heuristic cues, and they
are often less effective for some images.

arXiv:1601.07661v2 [cs.CV] 17 May 2016

Haze removal from a single image is a difficult vision task.
In contrast, the human brain can quickly identify the hazy area
from the natural scenery without any additional information.
One might be tempted to propose biologically inspired models
for image dehazing, by following the success of bio-inspired
CNNs for high-level vision tasks such as image classification
[19], face recognition [20] and object detection [21]. In fact,
there have been a few (convolutional) neural network based
deep learning methods that are recently proposed for low-level
vision tasks of image restoration/reconstruction [22], [23],
[24]. However, these methods cannot be directly applied to
single image haze removal.
Note that apart from estimation of a global atmospheric light
magnitude, the key to achieve haze removal is to recover an
accurate medium transmission map. To this end, we propose
DehazeNet, a trainable CNN based end-to-end system for
medium transmission estimation. DehazeNet takes a hazy
image as input, and outputs its medium transmission map that
is subsequently used to recover the haze-free image by a simple
pixel-wise operation. Design of DehazeNet borrows ideas from
established assumptions/principles in image dehazing, while
parameters of all its layers can be automatically learned from
training hazy images. Experiments on benchmark images show
that DehazeNet gives superior performance over existing meth-
ods, yet keeps efficient and easy to use. Our main contributions
are summarized as follows.
1) DehazeNet is an end-to-end system. It directly learns
and estimates the mapping relations between hazy im-
age patches and their medium transmissions. This is
achieved by special design of its deep architecture to
embody established image dehazing principles.
2) We propose a novel nonlinear activation function in
DehazeNet, called Bilateral Rectified Linear Unit
(BReLU)^1. BReLU extends the Rectified Linear Unit
(ReLU) and demonstrates its significance in obtaining
accurate image restoration. Technically, BReLU uses
a bilateral restraint to reduce the search space and
improve convergence.
3) We establish connections between components of De-
hazeNet and those assumptions/priors used in existing
dehazing methods, and explain that DehazeNet im-
proves over these methods by automatically learning
all these components from end to end.
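As a concrete illustration of contribution 2), BReLU can be sketched as a two-sided clipping nonlinearity. The following is a minimal NumPy sketch, not the paper's implementation; the bounds t_min = 0 and t_max = 1 are illustrative choices matching the range of a medium transmission map:

```python
import numpy as np

def brelu(x, t_min=0.0, t_max=1.0):
    """Bilateral Rectified Linear Unit: identity on [t_min, t_max],
    saturated to the nearer bound outside it."""
    return np.minimum(t_max, np.maximum(t_min, x))
```

Unlike ReLU, which is unbounded above, the two-sided saturation restricts the regression output to the valid transmission range.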
The remainder of this paper is organized as follows. In
Section II, we review the atmospheric scattering model and
haze-relevant features, which provides background knowledge
to understand the design of DehazeNet. In Section III, we
present details of the proposed DehazeNet, and discuss how
it relates to existing methods. Experiments are presented in
Section IV, before the conclusion is drawn in Section V.

^1 During the preparation of this manuscript (in December 2015), we found that a nonlinear activation function called the adjustable bounded rectifier was proposed in [25] (arXived in November 2015), which is almost identical to BReLU. The adjustable bounded rectifier is motivated by the objective of image recognition, whereas BReLU is proposed here to improve image restoration accuracy. It is interesting that we arrived at the same activation function from completely different initial objectives; this may also suggest the general usefulness of the proposed BReLU.
II. RELATED WORKS
Many image dehazing methods have been proposed in the
literature. In this section, we briefly review some important
ones, paying attention to those proposing the atmospheric
scattering model, which is the basic underlying model of
image dehazing, and those proposing useful assumptions for
computing haze-relevant features.
A. Atmospheric Scattering Model
To describe the formation of a hazy image, the atmospheric
scattering model was first proposed by McCartney [26] and
further developed by Narasimhan and Nayar [27], [28]. The
atmospheric scattering model can be formally written as
I(x) = J(x) t(x) + α (1 − t(x)),    (1)
where I (x) is the observed hazy image, J (x) is the real scene
to be recovered, t (x) is the medium transmission, α is the
global atmospheric light, and x indexes pixels in the observed
hazy image I. Fig. 1 gives an illustration. There are three
unknowns in equation (1), and the real scene J (x) can be
recovered after α and t (x) are estimated.
The medium transmission map t(x) describes the light
portion that is not scattered and reaches the camera. t(x) is
defined as

t(x) = e^(−β d(x)),    (2)

where d(x) is the distance from the scene point to the camera,
and β is the scattering coefficient of the atmosphere. Equation
(2) suggests that when d(x) goes to infinity, t(x) approaches
zero. Together with equation (1) we have

α = I(x),  d(x) → ∞.    (3)
In practical imaging of a distant view, d(x) cannot be
infinity, but is rather a long distance that gives a very low
transmission t_0. Instead of relying on equation (3) to get the
global atmospheric light α, it is more stably estimated based
on the following rule:

α = max_{y ∈ {x | t(x) ≤ t_0}} I(y).    (4)
The discussion above suggests that to recover a clean scene
(i.e., to achieve haze removal), the key is to estimate an
accurate medium transmission map.
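Equations (1)-(4) imply that, once t(x) and α are known, haze removal reduces to a pixel-wise inversion of equation (1). A minimal sketch of that inversion, assuming a NumPy image layout; the transmission floor t0 = 0.1 is an illustrative choice to keep the division stable, not a value prescribed by the paper:

```python
import numpy as np

def recover_scene(I, t, alpha, t0=0.1):
    """Invert eq. (1), I(x) = J(x) t(x) + alpha (1 - t(x)), for J(x).
    I: H x W x 3 hazy image;  t: H x W transmission map;  alpha: scalar.
    t is clamped below by t0 so the division stays stable."""
    t = np.maximum(t, t0)[..., None]   # broadcast over the color axis
    return (I - alpha * (1.0 - t)) / t
```

This is exactly the "simple pixel-wise operation" referred to later in Section III.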
B. Haze-relevant features
Image dehazing is an inherently ill-posed problem. Based
on empirical observations, existing methods propose various
assumptions or prior knowledge that are utilized to compute
intermediate haze-relevant features. Final haze removal can be
achieved based on these haze-relevant features.

(a) The process of imaging in hazy weather. The transmission attenuation
J(x) t(x), caused by the reduction in reflected energy, leads to low brightness
intensity. The airlight α (1 − t(x)), formed by the scattering of the environ-
mental illumination, enhances the brightness and reduces the saturation.
(b) Atmospheric scattering model. The observed hazy image I(x) is gener-
ated by the real scene J(x), the medium transmission t(x) and the global
atmospheric light α.
Fig. 1. Imaging in hazy weather and the atmospheric scattering model.
1) Dark Channel: The dark channel prior is based on a
wide observation on outdoor haze-free images: in most of the
haze-free patches, at least one color channel has some pixels
whose intensity values are very low, even close to zero.
The dark channel [9] is defined as the minimum of all pixel
colors in a local patch:

D(x) = min_{y ∈ Ω_r(x)} ( min_{c ∈ {r,g,b}} I^c(y) ),    (5)

where I^c is an RGB color channel of I and Ω_r(x) is a
local patch centered at x with the size of r × r. The dark
channel feature has a high correlation to the amount of haze
in the image, and is used to estimate the medium transmission
t(x) ∝ 1 − D(x) directly.
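Densely extracting the dark channel in (5) is a per-pixel channel minimum followed by a spatial minimum filter. A sketch using SciPy's minimum_filter (an implementation choice of this sketch, not the authors'); the patch size is illustrative:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def dark_channel(I, r=15):
    """Eq. (5): per-pixel minimum over {r, g, b}, then an r x r
    spatial minimum filter. I: H x W x 3 with values in [0, 1]."""
    per_pixel_min = I.min(axis=2)           # min over color channels
    return minimum_filter(per_pixel_min, size=r)  # min over the patch
```

The rough transmission estimate noted above is then t(x) ≈ 1 − D(x).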
2) Maximum Contrast: According to the atmospheric scat-
tering model, the contrast of the image is reduced by the haze
transmission as Σ_x ‖∇I(x)‖ = t Σ_x ‖∇J(x)‖ ≤ Σ_x ‖∇J(x)‖.
Based on this observation, the local contrast [7] is defined as the
variance of pixel intensities in an s × s local patch Ω_s with
respect to the center pixel, and the local maximum of local
contrast values in an r × r region Ω_r is defined as:

C(x) = max_{y ∈ Ω_r(x)} √( (1 / |Ω_s(y)|) Σ_{z ∈ Ω_s(y)} ‖I(z) − I(y)‖² ),    (6)

where |Ω_s(y)| is the cardinality of the local neighborhood.
The correlation between the contrast feature and the medium
transmission t is visually obvious, so the visibility of the image
can be enhanced by maximizing the local contrast as in (6).
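Equation (6) can be computed directly, if inefficiently, in two passes: the per-pixel contrast over Ω_s, then a local maximum over Ω_r. A grayscale NumPy sketch with illustrative patch sizes (the color norm in (6) reduces here to an absolute difference):

```python
import numpy as np

def local_contrast(I, s=5, r=7):
    """Eq. (6) on a grayscale image: RMS deviation from the patch
    center over an s x s patch, then the max over an r x r region."""
    hs, hr = s // 2, r // 2
    pad = np.pad(I, hs, mode='edge')
    contrast = np.zeros_like(I, dtype=float)
    for y in range(I.shape[0]):
        for x in range(I.shape[1]):
            patch = pad[y:y + s, x:x + s]      # s x s patch around (y, x)
            contrast[y, x] = np.sqrt(np.mean((patch - I[y, x]) ** 2))
    padc = np.pad(contrast, hr, mode='edge')
    C = np.zeros_like(contrast)
    for y in range(I.shape[0]):
        for x in range(I.shape[1]):
            C[y, x] = padc[y:y + r, x:x + r].max()  # max over r x r region
    return C
```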
3) Color Attenuation: The saturation I^s(x) of a patch
decreases sharply while the color of the scene fades under
the influence of the haze, and the brightness value I^v(x)
increases at the same time, producing a high value for their
difference. According to the above color attenuation prior [18],
the difference between the brightness and the saturation is
utilized to estimate the concentration of the haze:

A(x) = I^v(x) − I^s(x),    (7)

where I^v(x) and I^s(x) can be expressed in the HSV
color space as I^v(x) = max_{c ∈ {r,g,b}} I^c(x) and
I^s(x) = ( max_{c ∈ {r,g,b}} I^c(x) − min_{c ∈ {r,g,b}} I^c(x) ) / max_{c ∈ {r,g,b}} I^c(x).
The color attenuation feature is proportional to the scene
depth, d(x) ∝ A(x), and is easily used for transmission
estimation.
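The color attenuation feature in (7) needs only the per-pixel maximum and minimum over the RGB channels. A minimal sketch; the small epsilon guarding the division by zero for black pixels is an addition of this sketch:

```python
import numpy as np

def color_attenuation(I):
    """Eq. (7): A(x) = brightness - saturation, computed from RGB.
    I: H x W x 3 with values in [0, 1]."""
    v = I.max(axis=2)                                # HSV value (brightness)
    s = (v - I.min(axis=2)) / np.maximum(v, 1e-12)   # HSV saturation
    return v - s
```

A bright, desaturated (hazy-looking) pixel yields a large A(x); a dark or saturated pixel yields a small one.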
4) Hue Disparity: Hue disparity between the origi-
nal image I(x) and its semi-inverse image, I_si(x) =
max[I^c(x), 1 − I^c(x)] with c ∈ {r, g, b}, has been used to
detect haze. For haze-free images, pixel values in the three
channels of their semi-inverse images will not all flip, resulting
in large hue changes between I_si(x) and I(x). In [29], the
hue disparity feature is defined as:

H(x) = |I^h_si(x) − I^h(x)|,    (8)

where the superscript h denotes the hue channel of the
image in the HSV color space. According to (8), the medium
transmission t(x) is inversely proportional to H(x).
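The hue disparity feature in (8) can be sketched with the standard-library colorsys conversion; the per-pixel loop is for clarity, not speed:

```python
import numpy as np
import colorsys

def hue_disparity(I):
    """Eq. (8): H(x) = |hue of semi-inverse - hue of original| per pixel,
    where the semi-inverse is max(I_c, 1 - I_c) in each channel."""
    I_si = np.maximum(I, 1.0 - I)          # semi-inverse image
    H = np.zeros(I.shape[:2])
    for y in range(I.shape[0]):
        for x in range(I.shape[1]):
            h_orig = colorsys.rgb_to_hsv(*I[y, x])[0]
            h_si = colorsys.rgb_to_hsv(*I_si[y, x])[0]
            H[y, x] = abs(h_si - h_orig)
    return H
```

For a saturated (haze-free) pixel the semi-inverse flips only some channels, so the hue changes markedly; for a bright hazy pixel nothing flips and H(x) stays near zero.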
III. THE PROPOSED DEHAZENET
The atmospheric scattering model in Section II-A suggests
that estimation of the medium transmission map is the most
important step to recover a haze-free image. To this end,
we propose DehazeNet, a trainable end-to-end system that
explicitly learns the mapping relations between raw hazy
images and their associated medium transmission maps. In
this section, we present layer designs of DehazeNet, and
discuss how these designs are related to ideas in existing
image dehazing methods. The final pixel-wise operation to
get a recovered haze-free image from the estimated medium
transmission map will be presented in Section IV.
A. Layer Designs of DehazeNet
The proposed DehazeNet consists of cascaded convolutional
and pooling layers, with appropriate nonlinear activation func-
tions employed after some of these layers. Fig. 2 shows the
architecture of DehazeNet. Layers and nonlinear activations

Fig. 2. The architecture of DehazeNet. DehazeNet conceptually consists of four sequential operations (feature extraction, multi-scale mapping, local extremum
and non-linear regression), constructed from three convolution layers, a max-pooling operation, a Maxout unit and a BReLU activation function.
of DehazeNet are designed to implement four sequential op-
erations for medium transmission estimation, namely, feature
extraction, multi-scale mapping, local extremum, and nonlinear
regression. We detail these designs as follows.
1) Feature Extraction: To address the ill-posed nature of
the image dehazing problem, existing methods propose various
assumptions and based on these assumptions, they are able
to extract haze-relevant features (e.g., dark channel, hue dis-
parity, and color attenuation) densely over the image domain.
Note that densely extracting these haze-relevant features is
equivalent to convolving an input hazy image with appropriate
filters, followed by nonlinear mappings. Inspired by extremum
processing in color channels of those haze-relevant features,
an unusual activation function called Maxout unit [30] is
selected as the non-linear mapping for dimension reduction.
Maxout unit is a simple feed-forward nonlinear activation
function used in multi-layer perceptron or CNNs. When used
in CNNs, it generates a new feature map by taking a pixel-wise
maximization operation over k affine feature maps. Based on
the Maxout unit, we design the first layer of DehazeNet as follows:

F_1^i(x) = max_{j ∈ [1,k]} f_1^{i,j}(x),   f_1^{i,j} = W_1^{i,j} ∗ I + B_1^{i,j},    (9)

where W_1 = {W_1^{i,j}}_{(i,j)=(1,1)}^{(n_1,k)} and B_1 = {B_1^{i,j}}_{(i,j)=(1,1)}^{(n_1,k)}
represent the filters and biases respectively, and ∗ denotes the
convolution operation. Here, there are n_1 output feature maps
in the first layer. W_1^{i,j} ∈ R^{3 × f_1 × f_1} is one of the k × n_1
convolution filters in total, where 3 is the number of channels in the
input image I(x), and f_1 is the spatial size of a filter (detailed
in Table I). The Maxout unit maps each kn_1-dimensional vector
into an n_1-dimensional one, and extracts the haze-relevant
features by automatic learning rather than the heuristic ways of
existing methods.
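The first layer in (9) amounts to k · n1 convolutions followed by a pixel-wise maximum over groups of k feature maps. A sketch with SciPy correlations standing in for learned convolutions (cross-correlation replaces flipped convolution, as is common in CNN implementations; all filter and bias values are illustrative):

```python
import numpy as np
from scipy.signal import correlate2d

def maxout_conv_layer(I, W, B, k):
    """First DehazeNet layer (eq. 9): k * n1 convolutions over the
    3-channel input, then a pixel-wise max over each group of k maps.
    I: H x W x 3;  W: (n1*k) x 3 x f x f;  B: (n1*k,)."""
    maps = []
    for w, b in zip(W, B):
        # sum the per-channel correlations to get one affine feature map
        m = sum(correlate2d(I[..., c], w[c], mode='valid')
                for c in range(3)) + b
        maps.append(m)
    maps = np.stack(maps)                       # (n1*k, H', W')
    n1 = maps.shape[0] // k
    # Maxout: max over each consecutive group of k maps
    return maps.reshape(n1, k, *maps.shape[1:]).max(axis=1)
```

With zero filters and distinct biases, each output map equals the larger bias of its group, which makes the group-wise maximum easy to verify.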
2) Multi-scale Mapping: In [17], multi-scale features have
been proven effective for haze removal, which densely com-
pute features of an input image at multiple spatial scales.
Multi-scale feature extraction is also effective to achieve
scale invariance. For example, the inception architecture in
GoogLeNet [31] uses parallel convolutions with varying filter
sizes, and better addresses the issue of aligning objects in input
images, resulting in state-of-the-art performance in ILSVRC14
[32]. Motivated by these successes of multi-scale feature ex-
traction, we choose to use parallel convolutional operations in
the second layer of DehazeNet, where size of any convolution
filter is among 3 × 3, 5 × 5 and 7 × 7, and we use the same
number of filters for these three scales. Formally, the output
of the second layer is written as
F_2^i = W_2^{⌈i/3⌉,(i∖3)} ∗ F_1 + B_2^{⌈i/3⌉,(i∖3)},    (10)

where W_2 = {W_2^{p,q}}_{(p,q)=(1,1)}^{(3,n_2/3)} and B_2 = {B_2^{p,q}}_{(p,q)=(1,1)}^{(3,n_2/3)}
contain n_2 pairs of parameters that are broken up into 3 groups.
n_2 is the output dimension of the second layer, and i ∈ [1, n_2]
indexes the output feature maps. ⌈·⌉ takes the integer upwardly
and ∖ denotes the remainder operation.
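The multi-scale layer in (10) can be read as three groups of n2/3 parallel convolutions, one group per filter size. A shape-level sketch; the 'same' zero padding that keeps the three scales spatially aligned is an implementation assumption of this sketch:

```python
import numpy as np
from scipy.signal import correlate2d

def multi_scale_map(F1, filters, biases):
    """Second DehazeNet layer (eq. 10): parallel convolutions with
    3x3, 5x5 and 7x7 filters; 'same' padding keeps all scales aligned.
    F1: n1 x H x W;  filters[s]: list of (n1, s, s) filters."""
    out = []
    for s in (3, 5, 7):
        for w, b in zip(filters[s], biases[s]):
            # each filter spans all n1 input maps; sum their responses
            m = sum(correlate2d(F1[c], w[c], mode='same')
                    for c in range(F1.shape[0])) + b
            out.append(m)
    return np.stack(out)  # n2 maps: one group of n2/3 per scale
```

On a constant input, an all-ones 3x3 filter simply counts the in-bounds window entries, so interior pixels of that map equal 9.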
3) Local Extremum: To achieve spatial invariance, the cor-
tical complex cells in the visual cortex receive responses
from the simple cells for linear feature integration. Ilan et al.
[33] proposed that spatial integration properties of complex
cells can be described by a series of pooling operations.
According to the classical architecture of CNNs [34], the
neighborhood maximum is taken at each pixel to
overcome local sensitivity. In addition, the local extremum is in
accordance with the assumption that the medium transmission
is locally constant, and it is commonly used to overcome noise
in transmission estimation. Therefore, we use a local extremum
operation in the third layer of DehazeNet:

F_3^i(x) = max_{y ∈ Ω(x)} F_2^i(y),    (11)

where Ω(x) is an f_3 × f_3 neighborhood centered at x, and
the output dimension of the third layer is n_3 = n_2. In contrast
to max-pooling in CNNs, which usually reduces the resolution of
feature maps, the local extremum operation here is densely

With this lightweight architecture, DehazeNet achieves dramatically high efficiency and outstanding dehazing effects than the state-of-the-art methods.