ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

Home
/
Papers
/
ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

Posted Content•

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang - Show less +5 more

01 Sep 2018-arXiv: Computer Vision and Pattern Recognition-

TL;DR: This work thoroughly study three key components of SRGAN – network architecture, adversarial loss and perceptual loss, and improves each of them to derive an Enhanced SRGAN (ESRGAN), which achieves consistently better visual quality with more realistic and natural textures than SRGAN.

read less

Abstract: The Super-Resolution Generative Adversarial Network (SRGAN) is a seminal work that is capable of generating realistic textures during single image super-resolution. However, the hallucinated details are often accompanied with unpleasant artifacts. To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN). In particular, we introduce the Residual-in-Residual Dense Block (RRDB) without batch normalization as the basic network building unit. Moreover, we borrow the idea from relativistic GAN to let the discriminator predict relative realness instead of the absolute value. Finally, we improve the perceptual loss by using the features before activation, which could provide stronger supervision for brightness consistency and texture recovery. Benefiting from these improvements, the proposed ESRGAN achieves consistently better visual quality with more realistic and natural textures than SRGAN and won the first place in the PIRM2018-SR Challenge. The code is available at this https URL .

...read moreread less

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Deep Learning for Image Super-Resolution: A Survey

[...]

Zhihao Wang¹, Jian Chen¹, Steven C. H. Hoi²•Institutions (2)

South China University of Technology¹, Salesforce.com²

01 Oct 2021-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A survey on recent advances of image super-resolution techniques using deep learning approaches in a systematic way, which can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR.

...read moreread less

Abstract: Image Super-Resolution (SR) is an important class of image processing techniqueso enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress of image super-resolution using deep learning techniques. This article aims to provide a comprehensive survey on recent advances of image super-resolution using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues which should be further addressed by the community in the future.

...read moreread less

837 citations

Cites background or methods from "ESRGAN: Enhanced Super-Resolution G..."

...Some other models [32], [103], [147] also adopt this experience and achieve performance improvements....
[...]
...In practice, researchers often combine multiple loss functions by weighted average [8], [25], [27], [46], [141] for constraining different aspects of the generation process, especially for distortion-perception tradeoff [25], [103], [142], [143], [144]....
[...]
...[103], [155] propose a network interpolation strategy....
[...]
...And the ESRGAN [103] employs relativistic GAN [134] to predict the probability that real images are relatively more realistic than fake ones, instead of the probability that input images are real or fake, and thus guide recovering more detailed textures....
[...]
...Thus it produces visually more perceptible results and is also widely used in this field [8], [25], [29], [30], [46], [103], where the VGG [128] and ResNet [96] are the most commonly used pre-trained CNNs....
[...]

Posted Content•

StarGAN v2: Diverse Image Synthesis for Multiple Domains

[...]

Yunjey Choi¹, Youngjung Uh¹, Jaejun Yoo¹, Jung-Woo Ha¹•Institutions (1)

Naver Corporation¹

04 Dec 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: StarGAN v2, a single framework that tackles image-to-image translation models with limited diversity and multiple models for all domains, is proposed and shows significantly improved results over the baselines.

...read moreread less

Abstract: A good image-to-image translation model should learn a mapping between different visual domains while satisfying the following properties: 1) diversity of generated images and 2) scalability over multiple domains. Existing methods address either of the issues, having limited diversity or multiple models for all domains. We propose StarGAN v2, a single framework that tackles both and shows significantly improved results over the baselines. Experiments on CelebA-HQ and a new animal faces dataset (AFHQ) validate our superiority in terms of visual quality, diversity, and scalability. To better assess image-to-image translation models, we release AFHQ, high-quality animal faces with large inter- and intra-domain differences. The code, pretrained models, and dataset can be found at this https URL.

...read moreread less

697 citations

Cites background from "ESRGAN: Enhanced Super-Resolution G..."

...Generative adversarial networks (GANs) [9] have shown impressive results in many computer vision tasks such as image synthesis [3, 25, 7], colorization [15, 36] and superresolution [21, 34]....
[...]

Proceedings Article•DOI•

Learning Texture Transformer Network for Image Super-Resolution

[...]

Fuzhi Yang¹, Huan Yang², Jianlong Fu², Hongtao Lu¹, Baining Guo² - Show less +1 more•Institutions (2)

Shanghai Jiao Tong University¹, Microsoft²

14 Jun 2020

TL;DR: A novel Texture Transformer Network for Image Super-Resolution (TTSR), in which the LR and Ref images are formulated as queries and keys in a transformer, respectively, which achieves significant improvements over state-of-the-art approaches on both quantitative and qualitative evaluations.

...read moreread less

Abstract: We study on image super-resolution (SR), which aims to recover realistic textures from a low-resolution (LR) image. Recent progress has been made by taking high-resolution images as references (Ref), so that relevant textures can be transferred to LR images. However, existing SR approaches neglect to use attention mechanisms to transfer high-resolution (HR) textures from Ref images, which limits these approaches in challenging cases. In this paper, we propose a novel Texture Transformer Network for Image Super-Resolution (TTSR), in which the LR and Ref images are formulated as queries and keys in a transformer, respectively. TTSR consists of four closely-related modules optimized for image generation tasks, including a learnable texture extractor by DNN, a relevance embedding module, a hard-attention module for texture transfer, and a soft-attention module for texture synthesis. Such a design encourages joint feature learning across LR and Ref images, in which deep feature correspondences can be discovered by attention, and thus accurate texture features can be transferred. The proposed texture transformer can be further stacked in a cross-scale way, which enables texture recovery from different levels (e.g., from 1x to 4x magnification). Extensive experiments show that TTSR achieves significant improvements over state-of-the-art approaches on both quantitative and qualitative evaluations.

...read moreread less

581 citations

Cites methods from "ESRGAN: Enhanced Super-Resolution G..."

...[22] used Gram matrix based texture matching loss to enforce local similar textures, while ESRGAN [32] enhanced SRGAN by introducing RRDB with relativistic adversarial loss....
[...]
...The SISR methods include SRCNN [3], MDSR [17], RDN [40], RCAN [39], SRGAN [16], ENet [22], ESRGAN [32], RSRGAN [38], among which RCAN has achieved state-of-the-art performance on both PSNR and SSIM in recent years....
[...]
...Sajjadi et al. [22] used Gram matrix based texture matching loss to enforce local similar textures, while ESRGAN [32] enhanced SRGAN by introducing RRDB with relativistic adversarial loss....
[...]

Proceedings Article•DOI•

The Perception-Distortion Tradeoff

[...]

Yochai Blau¹, Tomer Michaeli¹•Institutions (1)

Technion – Israel Institute of Technology¹

18 Jun 2018

TL;DR: In this paper, the optimal probability for correctly discriminating the outputs of an image restoration algorithm from real images was studied and it was shown that as the mean distortion decreases, this probability must increase (indicating worse perceptual quality).

...read moreread less

Abstract: Image restoration algorithms are typically evaluated by some distortion measure (e.g. PSNR, SSIM, IFC, VIF) or by human opinion scores that quantify perceived perceptual quality. In this paper, we prove mathematically that distortion and perceptual quality are at odds with each other. Specifically, we study the optimal probability for correctly discriminating the outputs of an image restoration algorithm from real images. We show that as the mean distortion decreases, this probability must increase (indicating worse perceptual quality). As opposed to the common belief, this result holds true for any distortion measure, and is not only a problem of the PSNR or SSIM criteria. However, as we show experimentally, for some measures it is less severe (e.g. distance between VGG features). We also show that generative-adversarial-nets (GANs) provide a principled way to approach the perception-distortion bound. This constitutes theoretical support to their observed success in low-level vision tasks. Based on our analysis, we propose a new methodology for evaluating image restoration methods, and use it to perform an extensive comparison between recent super-resolution algorithms.

...read moreread less

550 citations

Book Chapter•DOI•

The 2018 PIRM Challenge on Perceptual Image Super-Resolution

[...]

Yochai Blau¹, Roey Mechrez¹, Radu Timofte², Tomer Michaeli¹, Lihi Zelnik-Manor¹ - Show less +1 more•Institutions (2)

Technion – Israel Institute of Technology¹, ETH Zurich²

08 Sep 2018

TL;DR: This paper reports on the 2018 PIRM challenge on perceptual super-resolution (SR), held in conjunction with the Perceptual Image Restoration and Manipulation (PIRM) workshop at ECCV 2018, and concludes with an analysis of the current trends in perceptual SR, as reflected from the leading submissions.

...read moreread less

Abstract: This paper reports on the 2018 PIRM challenge on perceptual super-resolution (SR), held in conjunction with the Perceptual Image Restoration and Manipulation (PIRM) workshop at ECCV 2018. In contrast to previous SR challenges, our evaluation methodology jointly quantifies accuracy and perceptual quality, therefore enabling perceptual-driven methods to compete alongside algorithms that target PSNR maximization. Twenty-one participating teams introduced algorithms which well-improved upon the existing state-of-the-art methods in perceptual SR, as confirmed by a human opinion study. We also analyze popular image quality measures and draw conclusions regarding which of them correlates best with human opinion scores. We conclude with an analysis of the current trends in perceptual SR, as reflected from the leading submissions.

...read moreread less

428 citations

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183

Collapse

References

PDF

Open Access

More filters

Proceedings Article•DOI•

Deep Residual Learning for Image Recognition

[...]

Kaiming He¹, Xiangyu Zhang¹, Shaoqing Ren¹, Jian Sun¹•Institutions (1)

Microsoft¹

27 Jun 2016

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

...read moreread less

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

...read moreread less

123,388 citations

Proceedings Article•

Adam: A Method for Stochastic Optimization

[...]

Diederik P. Kingma¹, Jimmy Ba²•Institutions (2)

University of Amsterdam¹, University of Toronto²

01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

...read moreread less

Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

...read moreread less

111,197 citations

"ESRGAN: Enhanced Super-Resolution G..." refers methods in this paper

...For optimization, we use Adam [39] with β1 = 0....
[...]
...For optimization, we use Adam [39] with β1 = 0.9, β2 = 0.999....
[...]

Proceedings Article•

Very Deep Convolutional Networks for Large-Scale Image Recognition

[...]

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

04 Sep 2014

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

...read moreread less

Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

...read moreread less

55,235 citations

"ESRGAN: Enhanced Super-Resolution G..." refers methods in this paper

...3 We use pre-trained 19-layer VGG network[37], where 54 indicates features obtained by the 4 convolution before the 5 maxpooling layer, representing high-level features and similarly, 22 represents low-level features....
[...]

Proceedings Article•

Very Deep Convolutional Networks for Large-Scale Image Recognition

[...]

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

01 Jan 2015

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

...read moreread less

49,914 citations

Journal Article•DOI•

Generative Adversarial Nets

[...]

Ian Goodfellow¹, Jean Pouget-Abadie¹, Mehdi Mirza¹, Bing Xu¹, David Warde-Farley¹, Sherjil Ozair², Aaron Courville¹, Yoshua Bengio¹ - Show less +4 more•Institutions (2)

Université de Montréal¹, Indian Institute of Technology Delhi²

08 Dec 2014

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously train: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

...read moreread less

Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

...read moreread less

38,211 citations

"ESRGAN: Enhanced Super-Resolution G..." refers background in this paper

...eature transform to eectively incorporate semantic prior in an image and improve the recovered textures. Throughout the literature, photo-realism is usually attained by adversarial training with GAN [15]. Recently there are a bunch of works that focus on developing more eective GAN frameworks. WGAN [31] proposes to minimize a reasonable and ecient approximation of Wasserstein distance and regulariz...
[...]
...mprove the visual quality of SR results. For instance, perceptual loss [13,14] is proposed to optimize super-resolution model in a feature space instead of pixel space. Generative adversarial network [15] is introduced to SR by [1,16] to encourage the network to favor solutions that look more like natural images. The semantic image prior is further incorporated to improve recovered texture details [17...
[...]