Proceedings Article•DOI•

AOD-Net: All-in-One Dehazing Network

01 Oct 2017 - pp 4780-4788
TL;DR: An image dehazing model built with a convolutional neural network (CNN) and designed on a re-formulated atmospheric scattering model, called All-in-One Dehazing Network (AOD-Net), which demonstrates performance superior to the state of the art in terms of PSNR, SSIM, and subjective visual quality.
Abstract: This paper proposes an image dehazing model built with a convolutional neural network (CNN), called All-in-One Dehazing Network (AOD-Net). It is designed based on a re-formulated atmospheric scattering model. Instead of estimating the transmission matrix and the atmospheric light separately, as most previous models did, AOD-Net directly generates the clean image through a light-weight CNN. Such a novel end-to-end design makes it easy to embed AOD-Net into other deep models, e.g., Faster R-CNN, for improving high-level tasks on hazy images. Experimental results on both synthesized and natural hazy image datasets demonstrate performance superior to the state of the art in terms of PSNR, SSIM and subjective visual quality. Furthermore, when concatenating AOD-Net with Faster R-CNN, we witness a large improvement in object detection performance on hazy images.
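
For reference, the re-formulation can be written out from the standard atmospheric scattering model. The K(x) expression below follows the paper's derivation (b is a constant bias; the notation is reconstructed here, so read it as a sketch of the idea rather than a verbatim transcription):

    I(x) = J(x)\,t(x) + A\,(1 - t(x)), \qquad t(x) = e^{-\beta d(x)}

    J(x) = K(x)\,I(x) - K(x) + b, \qquad
    K(x) = \frac{\frac{1}{t(x)}\,(I(x) - A) + (A - b)}{I(x) - 1}

Rather than estimating t(x) and A separately, the CNN estimates the single aggregate map K(x), so estimation errors cannot compound across two stages.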
Citations

Journal Article•DOI•
TL;DR: In this article, the authors present a comprehensive study and evaluation of existing single-image dehazing algorithms, using a new large-scale benchmark consisting of both synthetic and real-world hazy images, called REalistic Single-Image DEhazing (RESIDE).
Abstract: We present a comprehensive study and evaluation of existing single-image dehazing algorithms, using a new large-scale benchmark consisting of both synthetic and real-world hazy images, called REalistic Single-Image DEhazing (RESIDE). RESIDE highlights diverse data sources and image contents, and is divided into five subsets, each serving different training or evaluation purposes. We further provide a rich variety of criteria for dehazing algorithm evaluation, ranging from full-reference metrics to no-reference metrics and to subjective evaluation, and the novel task-driven evaluation. Experiments on RESIDE shed light on the comparisons and limitations of the state-of-the-art dehazing algorithms, and suggest promising future directions.
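
As a concrete example of the full-reference criteria RESIDE employs, PSNR and SSIM between a dehazed result and its ground truth can be computed with scikit-image. A minimal sketch, assuming 8-bit RGB images; the file names are placeholders:

    # Full-reference evaluation (PSNR/SSIM) of a dehazed image against ground truth.
    from skimage import io
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    gt = io.imread("clean.png")       # ground-truth haze-free image (H x W x 3, uint8)
    out = io.imread("dehazed.png")    # output of a dehazing algorithm

    psnr = peak_signal_noise_ratio(gt, out, data_range=255)
    ssim = structural_similarity(gt, out, channel_axis=-1, data_range=255)
    print(f"PSNR: {psnr:.2f} dB  SSIM: {ssim:.4f}")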

922 citations

Journal Article•DOI•
TL;DR: EnlightenGAN is a highly effective unsupervised generative adversarial network that can be trained without low/normal-light image pairs, yet generalizes very well to various real-world test images.
Abstract: Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data? As one such example, this paper explores the low-light image enhancement problem, where in practice it is extremely challenging to simultaneously take a low-light and a normal-light photo of the same visual scene. We propose a highly effective unsupervised generative adversarial network, dubbed EnlightenGAN , that can be trained without low/normal-light image pairs, yet proves to generalize very well on various real-world test images. Instead of supervising the learning using ground truth data, we propose to regularize the unpaired training using the information extracted from the input itself, and benchmark a series of innovations for the low-light image enhancement problem, including a global-local discriminator structure, a self-regularized perceptual loss fusion, and the attention mechanism. Through extensive experiments, our proposed approach outperforms recent methods under a variety of metrics in terms of visual quality and subjective user study. Thanks to the great flexibility brought by unpaired training, EnlightenGAN is demonstrated to be easily adaptable to enhancing real-world images from various domains. Our codes and pre-trained models are available at: https://github.com/VITA-Group/EnlightenGAN .
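
A minimal sketch of the self-regularized perceptual-loss idea: with no paired ground truth, the enhanced output is instead constrained to preserve the VGG feature content of its own input. The layer choice (relu3_3) and plain MSE weighting are illustrative assumptions, not the paper's exact configuration:

    # Self-regularized perceptual loss: compare VGG features of the output
    # against those of its own input (no paired ground truth needed).
    import torch
    import torch.nn.functional as F
    from torchvision.models import vgg16

    # Frozen VGG-16 feature extractor up to relu3_3 (inputs assumed ImageNet-normalized).
    features = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
    for p in features.parameters():
        p.requires_grad = False

    def self_perceptual_loss(low_light: torch.Tensor, enhanced: torch.Tensor) -> torch.Tensor:
        # The enhanced image must keep the content of the input it was derived from.
        return F.mse_loss(features(enhanced), features(low_light))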

537 citations

Posted Content•
TL;DR: This paper proposes a highly effective unsupervised generative adversarial network, dubbed EnlightenGAN, that can be trained without low/normal-light image pairs, yet proves to generalize very well on various real-world test images.
Abstract: Deep learning-based methods have achieved remarkable success in image restoration and enhancement, but are they still competitive when there is a lack of paired training data? As one such example, this paper explores the low-light image enhancement problem, where in practice it is extremely challenging to simultaneously take a low-light and a normal-light photo of the same visual scene. We propose a highly effective unsupervised generative adversarial network, dubbed EnlightenGAN, that can be trained without low/normal-light image pairs, yet proves to generalize very well on various real-world test images. Instead of supervising the learning using ground truth data, we propose to regularize the unpaired training using the information extracted from the input itself, and benchmark a series of innovations for the low-light image enhancement problem, including a global-local discriminator structure, a self-regularized perceptual loss fusion, and attention mechanism. Through extensive experiments, our proposed approach outperforms recent methods under a variety of metrics in terms of visual quality and subjective user study. Thanks to the great flexibility brought by unpaired training, EnlightenGAN is demonstrated to be easily adaptable to enhancing real-world images from various domains. The code is available at \url{this https URL}

520 citations


Cites background from "AOD-Net: All-in-One Dehazing Network"

  • ...Image enhancement as pre-processing for improving subsequent high-level vision tasks has recently received increasing attention [28, 49, 50], with a number of benchmarking efforts [47, 51, 52]....

    [...]

Proceedings Article•DOI•
08 Aug 2019
TL;DR: An end-to-end trainable Convolutional Neural Network for single image dehazing, named GridDehazeNet, which implements a novel attention-based multi-scale estimation on a grid network, together with an explanation of why it is not necessarily beneficial to take advantage of the dimension reduction offered by the atmosphere scattering model.
Abstract: We propose an end-to-end trainable Convolutional Neural Network (CNN), named GridDehazeNet, for single image dehazing. The GridDehazeNet consists of three modules: pre-processing, backbone, and post-processing. The trainable pre-processing module can generate learned inputs with better diversity and more pertinent features as compared to those derived inputs produced by hand-selected pre-processing methods. The backbone module implements a novel attention-based multi-scale estimation on a grid network, which can effectively alleviate the bottleneck issue often encountered in the conventional multi-scale approach. The post-processing module helps to reduce the artifacts in the final output. Experimental results indicate that GridDehazeNet outperforms the state of the art on both synthetic and real-world images. The proposed dehazing method does not rely on the atmosphere scattering model, and we provide an explanation as to why it is not necessarily beneficial to take advantage of the dimension reduction offered by the atmosphere scattering model for image dehazing, even if only the dehazing results on synthetic images are concerned.
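
One way to picture the attention-based fusion on the grid is a trainable per-channel weighting of the feature streams being merged at each node. The sketch below is a loose illustration under that assumption, not a reconstruction of the paper's architecture:

    # Fusing two feature streams with trainable channel-attention weights.
    import torch
    import torch.nn as nn

    class AttentionFusion(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            # One trainable weight per incoming stream and per channel.
            self.logits = nn.Parameter(torch.zeros(2, channels))

        def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
            w = torch.softmax(self.logits, dim=0)  # normalize across the two streams
            w = w.view(2, 1, -1, 1, 1)             # broadcast over batch, height, width
            return w[0] * a + w[1] * b             # a, b: (N, C, H, W) feature maps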

464 citations


Cites background or methods from "AOD-Net: All-in-One Dehazing Network"

  • ...For DehazeNet, MSCNN and AOD-Net, haze removal is clearly incomplete....

    [...]

  • ...A close inspection reveals that this reformulation in fact renders the atmosphere scattering model completely superfluous (though this point is not recognized in [13])....

    [...]

  • ...Moreover, except for AOD-Net and GFN, these methods all follow the same strategy of first estimating the transmission map and the atmosphere light then leveraging the atmosphere scattering model to compute the dehazed image....

    [...]

  • ...The AOD-Net [13] represents a departure from the conventional strategy....

    [...]

  • ...The proposed network is tested on the synthetic dataset for qualitative and quantitative comparisons with the state-of-the-arts that include DCP [9], DehazeNet [1], MSCNN [26], AOD-Net [13] and GFN [27]....

    [...]

References
Journal Article•DOI•
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and is validated against both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
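
The index itself has a compact closed form. For image windows x and y with local means \mu_x, \mu_y, variances \sigma_x^2, \sigma_y^2, covariance \sigma_{xy}, and small stabilizing constants C_1, C_2:

    \mathrm{SSIM}(x, y) =
      \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}
           {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}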

40,609 citations

Posted Content•
TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features---using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
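
For readers who want to reproduce the detection side of such experiments today, a pre-trained Faster R-CNN ships with torchvision (with a ResNet-50-FPN backbone rather than the VGG-16 of the original paper). A minimal sketch; the image path is a placeholder:

    # Running a pre-trained Faster R-CNN from torchvision on a single image.
    import torch
    from torchvision.io import read_image
    from torchvision.models.detection import fasterrcnn_resnet50_fpn
    from torchvision.transforms.functional import convert_image_dtype

    model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()
    img = convert_image_dtype(read_image("hazy.jpg"), torch.float)  # CHW, values in [0, 1]

    with torch.no_grad():
        pred = model([img])[0]  # dict with 'boxes', 'labels', 'scores'
    print(pred["boxes"].shape, pred["scores"][:5])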

23,183 citations

Proceedings Article•
07 Dec 2015
TL;DR: Ren et al. propose a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals.
Abstract: State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet [7] and Fast R-CNN [5] have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully-convolutional network that simultaneously predicts object bounds and objectness scores at each position. RPNs are trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. With a simple alternating optimization, RPN and Fast R-CNN can be trained to share convolutional features. For the very deep VGG-16 model [19], our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007 (73.2% mAP) and 2012 (70.4% mAP) using 300 proposals per image. Code is available at https://github.com/ShaoqingRen/faster_rcnn.

13,674 citations

Journal Article•DOI•
TL;DR: A review of the Pascal Visual Object Classes (VOC) challenge from 2008 to 2012, with an appraisal of the aspects of the challenge that worked well and those that could be improved in future challenges.
Abstract: The Pascal Visual Object Classes (VOC) challenge consists of two components: (i) a publicly available dataset of images together with ground truth annotation and standardised evaluation software; and (ii) an annual competition and workshop. There are five challenges: classification, detection, segmentation, action classification, and person layout. In this paper we provide a review of the challenge from 2008---2012. The paper is intended for two audiences: algorithm designers, researchers who want to see what the state of the art is, as measured by performance on the VOC datasets, along with the limitations and weak points of the current generation of algorithms; and, challenge designers, who want to see what we as organisers have learnt from the process and our recommendations for the organisation of future challenges. To analyse the performance of submitted algorithms on the VOC datasets we introduce a number of novel evaluation methods: a bootstrapping method for determining whether differences in the performance of two algorithms are significant or not; a normalised average precision so that performance can be compared across classes with different proportions of positive instances; a clustering method for visualising the performance across multiple algorithms so that the hard and easy images can be identified; and the use of a joint classifier over the submitted algorithms in order to measure their complementarity and combined performance. We also analyse the community's progress through time using the methods of Hoiem et al. (Proceedings of European Conference on Computer Vision, 2012) to identify the types of occurring errors. We conclude the paper with an appraisal of the aspects of the challenge that worked well, and those that could be improved in future challenges.
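
The normalised average precision mentioned above builds on the basic VOC AP computation; the widely used all-points variant integrates the precision envelope over recall, as in this sketch (the class-frequency normalisation itself is not shown):

    # VOC-style average precision from precision/recall arrays (recall sorted ascending).
    import numpy as np

    def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
        r = np.concatenate(([0.0], recall, [1.0]))
        p = np.concatenate(([0.0], precision, [0.0]))
        # Make precision monotonically non-increasing (the precision "envelope").
        for i in range(len(p) - 2, -1, -1):
            p[i] = max(p[i], p[i + 1])
        # Sum envelope areas over the points where recall increases.
        idx = np.where(r[1:] != r[:-1])[0]
        return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))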

6,061 citations

Book Chapter•DOI•
07 Oct 2012
TL;DR: The goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships, to better understand how 3D cues can best inform a structured 3D interpretation.
Abstract: We present an approach to interpret the major surfaces, objects, and support relations of an indoor scene from an RGBD image. Most existing work ignores physical interactions or is applied only to tidy rooms and hallways. Our goal is to parse typical, often messy, indoor scenes into floor, walls, supporting surfaces, and object regions, and to recover support relationships. One of our main interests is to better understand how 3D cues can best inform a structured 3D interpretation. We also contribute a novel integer programming formulation to infer physical support relations. We offer a new dataset of 1449 RGBD images, capturing 464 diverse indoor scenes, with detailed annotations. Our experiments demonstrate our ability to infer support relations in complex scenes and verify that our 3D scene cues and inferred support lead to better object segmentation.
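
The depth maps in this dataset are what make the haze synthesis in the AOD-Net citation context below possible: given a clean image J, scene depth d, scattering coefficient beta, and atmospheric light A, the standard scattering model yields the hazy image. A minimal sketch; the parameter values are illustrative, not the ones sampled in the paper:

    # Synthesizing a hazy image from a clean image and its depth map.
    import numpy as np

    def synthesize_haze(J: np.ndarray, depth: np.ndarray,
                        beta: float = 1.0, A: float = 0.8) -> np.ndarray:
        """J: H x W x 3 clean image in [0, 1]; depth: H x W scene depth."""
        t = np.exp(-beta * depth)                           # transmission map t(x)
        return J * t[..., None] + A * (1.0 - t[..., None])  # I = J*t + A*(1 - t)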

4,827 citations


"AOD-Net: All-in-One Dehazing Networ..." refers methods in this paper

  • ...We create synthesized hazy images by (1), using the ground-truth images with depth meta-data from the indoor NYU2 Depth Database [21]....

    [...]