NTIRE 2018 Challenge on Single Image Super-Resolution: Methods and Results
Radu Timofte Shuhang Gu Jiqing Wu Luc Van Gool Lei Zhang
Ming-Hsuan Yang Muhammad Haris Greg Shakhnarovich Norimichi Ukita
Shijia Hu Yijie Bei Zheng Hui Xiao Jiang Yanan Gu Jie Liu Yifan Wang
Federico Perazzi Brian McWilliams Alexander Sorkine-Hornung
Olga Sorkine-Hornung Christopher Schroers Jiahui Yu Yuchen Fan Jianchao Yang
Ning Xu Zhaowen Wang Xinchao Wang Thomas S. Huang Xintao Wang
Ke Yu Tak-Wai Hui Chao Dong Liang Lin Chen Change Loy Dongwon Park
Kwanyoung Kim Se Young Chun Kai Zhang Pengju Liu Wangmeng Zuo
Shi Guo Jiye Liu Jinchang Xu Yijiao Liu Fengye Xiong Yuan Dong
Hongliang Bai Alexandru Damian Nikhil Ravi Sachit Menon Cynthia Rudin
Junghoon Seo Taegyun Jeon Jamyoung Koo Seunghyun Jeon Soo Ye Kim
Jae-Seok Choi Sehwan Ki Soomin Seo Hyeonjun Sim Saehun Kim
Munchurl Kim Rong Chen Kun Zeng Jinkang Guo Yanyun Qu Cuihua Li
Namhyuk Ahn Byungkon Kang Kyung-Ah Sohn Yuan Yuan Jiawei Zhang
Jiahao Pang Xiangyu Xu Yan Zhao Wei Deng Sibt Ul Hussain Muneeb Aadil
Rafia Rahim Xiaowang Cai Fang Huang Yueshu Xu Pablo Navarrete Michelini
Dan Zhu Hanwen Liu Jun-Hyuk Kim Jong-Seok Lee Yiwen Huang Ming Qiu
Liting Jing Jiehang Zeng Ying Wang Manoj Sharma Rudrabha Mukhopadhyay
Avinash Upadhyay Sriharsha Koundinya Ankit Shukla Santanu Chaudhury
Zhe Zhang Yu Hen Hu Lingzhi Fu
Abstract
This paper reviews the 2nd NTIRE challenge on single image super-resolution (restoration of rich details in a low resolution image) with focus on proposed solutions and results. The challenge had 4 tracks. Track 1 employed the standard bicubic downscaling setup, while Tracks 2, 3 and 4 had realistic unknown downgrading operators simulating the camera image acquisition pipeline. The operators were learnable through provided pairs of low and high resolution train images. The tracks had 145, 114, 101, and 113 registered participants, respectively, and 31 teams competed in the final testing phase. The results gauge the state-of-the-art in single image super-resolution.
1. Introduction
Example-based single image super-resolution (SR) targets the reconstruction of the lost high frequencies (rich details) in an image with the help of a set of prior examples of paired low resolution (LR) and high resolution (HR) images. The problem is ill-posed: for each LR image, the space of plausible corresponding HR images is huge and scales up quadratically with the magnification factor.

(R. Timofte (timofter@vision.ee.ethz.ch, ETH Zurich), S. Gu, L. Van Gool, L. Zhang and M.-H. Yang are the NTIRE 2018 organizers, while the other authors participated in the challenge. Appendix A contains the authors' teams and affiliations. NTIRE webpage: http://www.vision.ee.ethz.ch/ntire18/)
In recent years the research literature has largely focused on example-based single image super-resolution, and the performance achieved by the top methods [38, 32, 7, 16, 20, 31, 21] has continuously improved.
The NTIRE 2017 challenge [31, 1] was a step forward in benchmarking SR. It was the first challenge of its kind, with tracks employing standard bicubic degradation and 'unknown' operators (blur and decimation) on the 1000 DIVerse 2K resolution images of the DIV2K [1] dataset.
The NTIRE 2018 challenge builds upon NTIRE 2017 and goes further. In comparison with the previous edition, NTIRE 2018: (1) uses the same DIV2K [1] dataset; (2) has only one bicubic downscaling track, with magnification factor ×8; (3) promotes realistic settings emulating the camera acquisition pipeline through three tracks with gradually increasing difficulty.

2. NTIRE 2018 Challenge
The objectives of the NTIRE 2018 challenge on example-based single-image super-resolution are: (i) to gauge and push the state-of-the-art in SR; (ii) to compare different solutions; and (iii) to promote realistic SR settings.
DIV2K Dataset [1], employed by the NTIRE 2017 SR challenge [31], is also used in our challenge. DIV2K has 1000 DIVerse 2K resolution RGB images: 800 for training, 100 for validation and 100 for testing purposes. The manually collected high quality images are diverse in content.
2.1. Tracks
Access to data and submission of HR image results required registration on the corresponding Codalab competition track (https://competitions.codalab.org).
Track 1: Classic Bicubic ×8 uses bicubic downscaling (Matlab imresize, default settings), the most common setting in the recent SR literature, with factor ×8. It is meant for easy deployment of recently proposed SR solutions.
Track 2: Realistic Mild ×4 adverse conditions assumes that the degradation operators emulating the image acquisition process of a digital camera can be estimated through training pairs of LR and HR images. The degradation operators are the same (use the same controlling parameters) within each image space and for all the images in the train, validation, and test sets. As in reality, the motion blur and the Poisson noise are image dependent and can introduce pixel shifts and scaling. Each ground truth (GT) image from DIV2K is downgraded (×4) to an LR image.
Track 3: Realistic Difficult ×4 adverse conditions is similar to Track 2, only the degradation is stronger.
Track 4: Realistic Wild ×4 adverse conditions is similar to Tracks 2 and 3, but the degradation operators, while the same within an image space, differ from one image to another; some images are less degraded than others. This setting is the closest to real 'wild' conditions. Due to the increased complexity of the task, four degraded LR images were generated for each HR train image.
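The actual degradation operators of Tracks 2-4 were kept undisclosed and could only be learned from the provided image pairs; the sketch below is therefore purely illustrative of the kind of pipeline described (blur, ×4 decimation, signal-dependent Poisson noise, sub-pixel shift). The Gaussian blur stand-in and every parameter (blur_sigma, peak, max_shift) are hypothetical choices of ours, not the challenge's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift

def degrade(hr, scale=4, blur_sigma=1.6, peak=255.0, max_shift=2.0):
    """Illustrative 'realistic' degradation: hr is a float32 grayscale
    image in [0, 1], shape (H, W) with H, W divisible by `scale`."""
    blurred = gaussian_filter(hr, sigma=blur_sigma)        # blur stand-in
    dy, dx = np.random.uniform(-max_shift, max_shift, size=2)
    moved = shift(blurred, (dy, dx), mode="reflect")       # sub-pixel shift
    lr = moved[::scale, ::scale]                           # x4 decimation
    noisy = np.random.poisson(np.clip(lr, 0, 1) * peak) / peak  # Poisson noise
    return np.clip(noisy, 0, 1).astype(np.float32)
```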
Challenge phases (1) Development phase: the participants got pairs of LR and HR train images and the LR validation images of the DIV2K dataset; an online validation server with a leaderboard provided immediate feedback for the HR results uploaded for the LR validation images. (2) Testing phase: the participants got the test LR images and were required to submit super-resolved HR image results, code, and a factsheet for their method. After the end of the challenge the final results were released to the participants.
Evaluation protocol The quantitative measures are the Peak Signal-to-Noise Ratio (PSNR), measured in decibels [dB], and the Structural Similarity index (SSIM) [37], both full-reference measures computed between the HR result and the GT image. We report averages over sets of images. As in [31], we ignore a boundary of 6 + s image pixels (s is the zoom factor). Because of the pixel shifts and scalings, for Tracks 2, 3, and 4 we consider all the translations in [-40, 40] on both axes, compute PSNR and SSIM, and report the most favorable scores. Due to time complexity, for Tracks 2, 3, and 4 we computed PSNR and SSIM using a 60 × 60 px centered image crop during the validation phase and an 800 × 800 px centered image crop for the final results.
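A minimal sketch of this shift-tolerant scoring, assuming grayscale images in [0, 1] and integer-pixel translations; the centered crops mentioned above are what keep the 81 × 81 search affordable in practice, and all helper names here are ours.

```python
import numpy as np

def psnr(a, b):
    """PSNR in dB for images in [0, 1] (MAX = 1)."""
    mse = np.mean((a - b) ** 2)
    return 10 * np.log10(1.0 / mse)

def challenge_psnr(sr, gt, scale=4, max_shift=40):
    """Best PSNR over integer translations in [-max_shift, max_shift],
    ignoring a (6 + scale)-pixel boundary as in the protocol above."""
    border = 6 + scale
    h, w = gt.shape[:2]
    margin = border + max_shift
    crop = (slice(margin, h - margin), slice(margin, w - margin))
    best = -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps at the borders; the wrapped pixels fall
            # outside the evaluated crop, so this is safe here.
            shifted = np.roll(np.roll(sr, dy, axis=0), dx, axis=1)
            best = max(best, psnr(shifted[crop], gt[crop]))
    return best
```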
Figure 1. Sample LR input images for Tracks 1, 2, 3, and 4, respectively.
3. Challenge Results
From 110 registered participants on average per track, 31 teams entered the final phase and submitted results, codes/executables, and factsheets. Table 1 reports the final test results and rankings of the challenge, while Table 2 provides the self-reported runtimes and major details. The methods are briefly described in Section 4 and the team members are listed in Appendix A.
Architectures and main ideas All the proposed methods, except TSSR of UW18, are deep learning based. The deep residual network (ResNet) architecture [10] and the dense network (DenseNet) architecture [11] are the basis for most of the proposed methods. For fast inference, and thus train and test time benefits, most of the teams conduct the major SR operations in the LR space. Several teams, such as UIUC-IFP, BMIPL-UNIST, and Pixel Overflow, build their methods on EDSR [21], the state-of-the-art approach and winner of the previous NTIRE 2017 SR challenge [31, 1], while other teams, such as Toyota-TI, HIT-VPC, DRZ, and PDN, proposed new architectures for SR.
Restoration fidelity The top 4 methods on 'Classic Bicubic' achieved similar PSNR scores (within 0.04dB). The DeepSR entry, ranked 12th, is only 0.17dB behind the best PSNR score, of Toyota-TI. On the realistic settings, Tracks 2, 3, and 4, due to the presence of noise and motion blur, the training strategy and the network architecture play equally important roles. Although UIUC-IFP ranked 7th on 'Classic Bicubic', below DRZ and Duke Data Science, it adopted a pre-alignment step in the training phase and achieved the best performance on the realistic Tracks 2 and 3, significantly better than DRZ and Duke Data Science. PDN ranked 1st on Track 4; however, without submitted results for the other tracks we cannot tell if their solution/architecture is better than that of UIUC-IFP.
Ensembles and fusion Most teams employ pseudo-ensembles [33]: the inputs are flipped/rotated, and the HR results are aligned and averaged for enhanced prediction.
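For concreteness, a sketch of such a pseudo-ensemble (the ×8 geometric self-ensemble of [33]): the eight flip/rotation variants of the LR input are super-resolved, each output is mapped back to the original orientation, and the results are averaged. The `model` callable is a placeholder for any SR network.

```python
import numpy as np

def self_ensemble(model, lr):
    """Geometric self-ensemble [33] over the 8 flip/rotation variants.
    `model` maps an (H, W, C) LR array to an (sH, sW, C) HR array."""
    outputs = []
    for k in range(4):                      # 4 rotations ...
        for flip in (False, True):          # ... times 2 horizontal flips
            x = np.rot90(lr, k)
            x = x[:, ::-1] if flip else x
            y = model(np.ascontiguousarray(x))
            y = y[:, ::-1] if flip else y   # undo the flip first,
            outputs.append(np.rot90(y, -k)) # then undo the rotation
    return np.mean(outputs, axis=0)
```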

Table 1. NTIRE 2018 SR Challenge results and final rankings. Note that the ‘lpj008’ results are not ranked.
(a) Track 1 Classic Bicubic ×8

Team               Author            PSNR    SSIM
Toyota-TI          iim lab           25.455  0.7088
Pixel Overflow     McCourt Hu        25.433  0.7067
rainbow            zheng222          25.428  0.7055
DRZ                yifita            25.415  0.7068
Faceall Xlabs      xjc faceall       25.360  0.7031
Duke Data Science  adamian98         25.356  0.7037
UIUC-IFP           jhyume            25.347  0.7023
Haiyun XMU         cr2018            25.338  0.7037
BMIPL UNIST        BMIPL UNIST       25.331  0.7026
Ajou-LAMDA-Lab     nmhkahn           25.318  0.7023
SIA                mikigom           25.290  0.7014
DeepSR             enoch             25.288  0.7015
-                  Mrobot0           25.175  0.6960
reveal.ai          muneebaadil       25.137  0.6942
HIT-VPC            cskzh             25.088  0.6943
MCML               ghgh3269          24.875  0.7025
BOE-SBG            boe sbg           24.822  0.6817
SRFun              ccook             24.819  0.6829
KAIST-VICLAB       JSChoi            24.817  0.6810
-                  zeweihe           24.773  0.6813
-                  jingliting        24.714  0.6913
CEERI              harshakoundinya   24.687  0.6719
APSARA             MingQiu           24.618  0.6817
UW18               zzsmg             24.192  0.6531
Baseline           Bicubic           23.703  0.6387
(b) Realistic Tracks 2, 3, & 4 ×4

                                        Track 2 Mild          Track 3 Difficult     Track 4 Wild
Team               Author               PSNR (rank)  SSIM     PSNR (rank)  SSIM     PSNR (rank)  SSIM
UIUC-IFP           jhyume               23.631 (1)   0.6316   22.329 (1)   0.5721   23.080 (2)   0.6038
PDN                xixihaha             -                     -                     23.374 (1)   0.6122
BMIPL UNIST        BMIPL UNIST          23.579 (2)   0.6269   22.074 (2)   0.5590   -
HIT-VPC            lpj008               -                     22.249       0.5637   22.879       0.5936
HIT-VPC            cskzh                23.493 (3)   0.6174   21.450 (9)   0.5339   22.795 (3)   0.5829
SIA                mikigom              23.406 (5)   0.6275   21.899 (3)   0.5623   22.766 (4)   0.6023
KAIST-VICLAB       jschoi               23.455 (4)   0.6175   21.689 (6)   0.5434   22.732 (6)   0.5844
DRZ                yifita               23.397 (6)   0.6160   21.592 (8)   0.5438   22.745 (5)   0.5881
srFans             yyuan13              23.218 (9)   0.6222   21.825 (4)   0.5573   22.707 (7)   0.5932
Duke Data Science  adamian98            23.374 (7)   0.6252   21.658 (7)   0.5400   -
-                  bighead              23.247 (8)   0.6165   -                     -
ISP Team           hot milk             23.098 (11)  0.6167   21.779 (5)   0.5550   22.496 (8)   0.5867
BOE-SBG            boe sbg              23.123 (10)  0.6008   21.443 (10)  0.5275   22.352 (10)  0.5612
MCML               ghgh3269             22.953 (12)  0.6115   21.337 (11)  0.5354   22.472 (9)   0.5842
DeepSR             enoch                21.742 (15)  0.5572   20.674 (16)  0.5168   21.589 (12)  0.5444
-                  jingliting           21.710 (16)  0.5384   20.973 (12)  0.5187   20.956 (14)  0.5214
Haiyun XMU         cr2018               21.519 (17)  0.5313   20.866 (13)  0.5072   21.367 (13)  0.5321
Ajou-LAMDA-Lab     nmhkahn              21.240 (18)  0.5376   -                     -
Juanluisgonzales   juanluisgonzales     22.625 (13)  0.5868   -                     -
APSARA             mingqiu              -                     20.718 (15)  0.4977   -
NMH                nmh                  -                     20.645 (17)  0.4890   -
-                  join16               20.453 (19)  0.4928   -                     -
Baseline           Bicubic              22.391 (14)  0.5336   20.830 (14)  0.4631   21.761 (11)  0.4989
Table 2. Reported runtimes [s] per test image and details from the factsheets.

                   runtime [s]
Team               Track 1  Tracks 2,3,4  Platform                                       CPU/GPU (at runtime)                      Ensemble
Ajou-LAMDA-Lab     13.84    13.84         PyTorch                                        GTX 1080Ti                                flip/rotation (×8)
APSARA             30       30            TensorFlow                                     GTX 1080Ti                                flip/rotation (×8)
BOE-SBG            0.15     1.11          PyTorch                                        Nvidia P100                               -
bighead            -        1.5           ?                                              ?                                         ?
BMIPL UNIST        2.52     4.68          PyTorch                                        ?                                         flip/rotation (×8)
CEERI              12.23    -             TensorFlow, Keras                              GTX 1080                                  -
DeepSR             9.89     1.83          TensorFlow                                     Titan X                                   flip/rotation (×8)
DRZ                11.65    2.91          PyTorch                                        Titan Xp                                  Track 1: flip/rotation (×8)
Duke Data Science  6.99     18            ?                                              Nvidia P100                               flip/rotation (×8)
Faceall Xlabs      7.31     -             PyTorch                                        GTX 1080                                  flip/rotation (×4)
Haiyun XMU         14.52    2.14          PyTorch                                        Track 1: Titan X; Tracks 2,3,4: GTX 1080  Track 1: flip/rotation (×8)
HIT-VPC            0.26     0.2           MatConvNet                                     GTX 1080Ti                                -
ISP Team           -        2.1           TensorFlow                                     Titan X                                   -
jingliting         1.27     0.72          ?                                              ?                                         -
join16             -        4.12          ?                                              GTX 1080                                  -
juanluisgonzales   -        0.02          ?                                              ?                                         -
KAIST-VICLAB       0.44     1.60          Track 1: MatConvNet; Tracks 2,3,4: TensorFlow  Titan Xp                                  Track 1: -; Tracks 2,3,4: flip/rotation (×8)
MCML               5.95     1.08          TensorFlow                                     GTX 1080                                  Track 1: flip/rotation (×8)
Mrobot0            10       -             ?                                              ?                                         -
NMH                -        3.31          ?                                              ?                                         -
PDN                -        13.07         PyTorch                                        4× Titan Xp                               ensemble of two variants of the proposed method
Pixel Overflow     20       -             TensorFlow                                     Nvidia P100                               -
rainbow            6.75     -             PyTorch                                        GTX 1080Ti                                flip/rotation (×8)
reveal.ai          92.95    -             PyTorch                                        Tesla K80                                 flip/rotation (×8)
SIA                396.0    396.0         TensorFlow                                     CPU                                       flip/rotation (×8)
srFans             -        0.10          PyTorch                                        Tesla K80                                 -
SRFun              1        -             TensorFlow                                     GTX 1080Ti                                -
Toyota-TI          35       -             PyTorch                                        Titan X                                   flip/rotation (×8)
UIUC-IFP           5.03     7.28          PyTorch                                        P100                                      flip/rotation (×8)
UW18               300      -             Matlab                                         Intel Core i7-6700K CPU @ 4.00GHz         -
zeweihe            1.02     -             ?                                              ?                                         -
Runtime / efficiency BOE-SBG reported the lowest runtime, 0.15s to super-resolve (×8) one LR image on GPU, but ranked 17th on 'Classic Bicubic', 0.63dB below the best ranked method, of Toyota-TI. Among the top 4 methods on the 'Classic Bicubic' track, rainbow achieved the best trade-off between efficiency and performance: on a GTX 1080Ti GPU, generating one HR image takes 6.75s for rainbow, while 35s are necessary for Toyota-TI, including self-ensemble for both methods.
Train data Data augmentation by scaling (only Track 1), flipping, and rotation [33] is another commonly used technique. Only a couple of teams, including Pixel Overflow, used extra data for training. Pixel Overflow used images from www.pexels.com, which is also the source of many DIV2K images. HIT-VPC used Track 1 images to estimate the downgrading operators of Tracks 3 and 4, thus their 'lpj008' entry in Table 1 is just for reference and is not ranked in the challenge.
Conclusions By analyzing the settings, the proposed methods and their results we can conclude: (i) The proposed methods improve the state-of-the-art in SR. (ii) The top solutions are consistent across the realistic tracks, yet the top methods on 'Classic Bicubic' are not the top methods on the realistic tracks, where domain-specific knowledge (pre-alignment of train images) was critical. (iii) As expected, the realistic tracks are more challenging than the bicubic one, reflected by the relatively lower PSNR (up to 2dB for the winners) of the results, even when comparing ×8 with ×4. (iv) SSIM is more correlated with PSNR on 'Classic Bicubic' than on the realistic tracks. (v) High magnification factors and realistic settings pose the extra problem of (subpixel) alignment between HR results and ground truth. (vi) Other ranking measures are necessary (such as perceptual ones). (vii) Further realistic challenges could introduce non-uniform degradations.

Figure 2. Toyota-TI's DBPN network structure: (a) the DBPN architecture; (b) the up- and down-projection units in DBPN.
4. Challenge Methods and Teams
4.1. Toyota-TI team proposed the Deep Back-Projection Networks (DBPN) [9] (see Fig. 2), which use error feedback from the up- and down-scaling steps to guide the network towards an optimal result. Unlike previous methods, which predict the SR image in a feed-forward manner, DBPN adopts mutually connected up- and down-sampling stages to generate LR as well as HR features, and accumulates both the up- and down-projection errors to predict the final SR result. A group of LR features is first extracted from the input LR image. Then, back-projection stages are used to alternately generate LR and HR feature maps L_t and H_t, which are further improved by dense connections, where the input of each projection unit is the concatenation of the outputs of all previous units. Finally, all the HR feature maps are used to reconstruct the final SR estimate I_sr = f_Rec([H_1, H_2, ..., H_t]).

The structure of the newly introduced up-projection and down-projection units is shown in Fig. 2(b). To deal with the classic bicubic ×8 downsampling SR problem, DBPN uses 12 × 12 convolutional layers with stride 8 and padding 2 in the projection units, and 19 projection units (10 up- and 9 down-projection units) are adopted to generate the SR result.
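A PyTorch sketch of one up-projection unit consistent with the description above (12 × 12 kernels, stride 8, padding 2 for the ×8 setting); the error-feedback structure follows the description of [9], but the activation choice and other details are simplified assumptions of ours.

```python
import torch.nn as nn

class UpProjection(nn.Module):
    """Sketch of a DBPN-style up-projection unit for x8 SR."""
    def __init__(self, channels=64, k=12, stride=8, pad=2):
        super().__init__()
        self.up1 = nn.ConvTranspose2d(channels, channels, k, stride, pad)
        self.down = nn.Conv2d(channels, channels, k, stride, pad)
        self.up2 = nn.ConvTranspose2d(channels, channels, k, stride, pad)
        self.act = nn.PReLU()

    def forward(self, l):
        h0 = self.act(self.up1(l))    # tentative HR features
        l0 = self.act(self.down(h0))  # back-project them to LR
        e = l0 - l                    # LR-space error (feedback signal)
        h1 = self.act(self.up2(e))    # map the error back to HR
        return h0 + h1                # corrected HR features
```

With k = 12, stride = 8, pad = 2, the transposed convolution maps an n-pixel input to exactly 8n pixels and the convolution maps it back to n, so the LR-space residual is well defined.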
The network is trained on images from DIV2K with augmentation [33]. In the training phase, the input patch size is set to 40 × 40 and the mini-batch size to 18. The model is trained with L1 loss using the ADAM optimizer [18], with learning rate 1 × 10^-4 decreased by a factor of 10 every 5 × 10^5 iterations, for a total of 10^6 iterations. In the testing phase, the authors adopt the self-ensemble strategy [33] to further improve the SR results.

Figure 3. rainbow's network architecture.
Figure 4. DRZ's asymmetric pyramidal architecture with DCUs.
4.2. Pixel Overflow team [4] utilized the same network structure as EDSR [21]. To get better SR performance, external training data was adopted in the training phase. Pixel Overflow uses a Sobel filter to extract the edges of the output and target images, emphasizing the loss on the edges and details.
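A sketch of such an edge-emphasizing objective: Sobel-filter the SR output and the target and penalize the difference of the gradient magnitudes. Combining this with a plain L1 term and the weight `w_edge` are our assumptions; the team's exact formulation is not reported above.

```python
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)  # transposed kernel = vertical Sobel

def edges(img):
    """Sobel gradient magnitude; img is (N, C, H, W), per-channel filtering."""
    c = img.shape[1]
    gx = F.conv2d(img, SOBEL_X.expand(c, 1, 3, 3).to(img), padding=1, groups=c)
    gy = F.conv2d(img, SOBEL_Y.expand(c, 1, 3, 3).to(img), padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def edge_loss(sr, hr, w_edge=0.1):
    """L1 on pixels plus L1 on Sobel edge maps (weighting is hypothetical)."""
    return F.l1_loss(sr, hr) + w_edge * F.l1_loss(edges(sr), edges(hr))
```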
4.3. rainbow team proposed a method based on EDSR [21] and SRDenseNet [11, 34] (Fig. 3). They employed a pyramid architecture to gradually generate the HR image. In order to trade off performance against inference time, they adopted a two-step enlargement strategy. They trained the network with L1 loss and fine-tuned it with L2 loss.
4.4. DRZ team proposed an asymmetric pyramidal structure for image SR [36] (see Fig. 4). Each level of the pyramid consists of a cascade of dense compression units (DCUs), and a sub-pixel convolution layer is used to generate the residual map that reconstructs the HR image. A DCU consists of a smaller, modified densely connected block [11] followed by a 1 × 1 convolution. Compared with the original densely connected block proposed for classification, the batch normalization (BN) layers are removed in the DCU.

In the training phase, a curriculum learning [5] strategy is adopted to achieve better SR performance and shorter training time. Specifically, DRZ first trains the 2× portion of the network and then gradually blends in each new pyramid level to reduce the impact on the previously trained layers. Curriculum learning adds an average of 0.07dB PSNR on the validation set of DIV2K for the 2×/4×/8× scales, compared to 0.03dB for normal multiscale training.
4.5. UIUC-IFP team proposed a wide activation SR network (WDSR, see Fig. 5), a deep residual SR network (with two-layer residual blocks) similar to the baseline EDSR [21].

Figure 5. UIUC-IFP's WDSR unit architecture: (a) the residual blocks in EDSR [21] and WDSR; (b) the overall network structure of EDSR [21] and WDSR.

To improve the SR performance, WDSR modifies the original EDSR in three aspects. Firstly, in comparison with EDSR, WDSR reduces the width of the identity mapping pathway and increases the width of the feature maps before the ReLU function in each residual block (see Fig. 5(a)); their experiments showed that this is extremely effective for improving accuracy. Secondly, UIUC-IFP follows recent works [8, 21, 31] that remove the BN layers from the residual blocks, and adopts weight normalization in WDSR; although introducing weight normalization when training SR networks may not help much by itself, it enables the authors to use a higher learning rate. Thirdly, WDSR removes some convolution layers used in EDSR and directly generates the shuffled SR estimation (see Fig. 5(b)); this strategy improves the processing speed without affecting the accuracy of the SR network.
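A PyTorch sketch of the wide-activation residual block from the first modification, with weight normalization applied to the convolutions as in the second; the base width and expansion factor below are assumed values, not ones reported above.

```python
import torch.nn as nn
from torch.nn.utils import weight_norm

class WideActivationBlock(nn.Module):
    """Sketch of a WDSR-style residual block: the feature width is
    expanded before the ReLU and contracted after it, so the identity
    pathway stays narrow while the activation sees wide features."""
    def __init__(self, channels=32, expansion=4):
        super().__init__()
        wide = channels * expansion
        self.body = nn.Sequential(
            weight_norm(nn.Conv2d(channels, wide, 3, padding=1)),  # widen
            nn.ReLU(inplace=True),
            weight_norm(nn.Conv2d(wide, channels, 3, padding=1)),  # contract
        )

    def forward(self, x):
        return x + self.body(x)
```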
For Track 1, UIUC-IFP utilized training parameters similar to EDSR; the only difference is that weight normalization enables UIUC-IFP to increase the learning rate 10×, to 0.001. After training with L1 loss, the model is directly finetuned with a PSNR loss; the finetuning step yields around 0.03dB PSNR improvement on the DIV2K validation set. For Tracks 2, 3 and 4, UIUC-IFP utilized a pre-alignment step to alleviate the random shift effects between the LR and HR images. Specifically, the HR images are shifted by up to 40 pixels, and the bicubic downscaled HR images are then compared with the given realistic LR images to find a coarsely aligned HR image for each LR image, along the lines of the sketch below.
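A brute-force sketch of this pre-alignment idea, assuming integer HR-pixel shifts and an OpenCV bicubic downscaler as a stand-in (the challenge itself used Matlab imresize); UIUC-IFP's actual search procedure and matching criterion are not detailed beyond the description above.

```python
import numpy as np
import cv2

def bicubic_down(img, scale):
    """Assumed bicubic x`scale` downscaler (stand-in for Matlab imresize)."""
    h, w = img.shape[:2]
    return cv2.resize(img, (w // scale, h // scale),
                      interpolation=cv2.INTER_CUBIC)

def prealign(hr, lr, scale=4, max_shift=40):
    """Return the shifted HR image whose bicubic downscale best matches lr.
    Exhaustive search; a practical version would work coarse-to-fine or on
    crops, and would crop instead of letting np.roll wrap at the borders."""
    best_err, best_hr = np.inf, hr
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cand = np.roll(np.roll(hr, dy, axis=0), dx, axis=1)
            err = np.mean((bicubic_down(cand, scale) - lr) ** 2)
            if err < best_err:
                best_err, best_hr = err, cand
    return best_hr
```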
In the testing phase, a self-ensemble inference strategy was adopted to improve the SR performance [33].
4.6. PDN team proposed the PolyDenseNet (PDN) (see Fig. 6) for image SR. The basic building block of PDN is the PolyDense Block (PDB), which is motivated by PolyNet [42] and DenseNet [11]. Each PDB contains three 5-layer dense blocks and uses three parameters α_1, α_2 and α_3 to combine the dense block outputs D_1, D_2 and D_3 to get the output.

Figure 6. PDN's PolyDenseNet: (a) the PolyDenseNet schema (conv, three PDBs, two SubPixel ×2 layers, conv); (b) the variant with skip connections between two PDBs.
Figure 7. BMIPL UNIST’s network structures.
The PDN team also investigated a PDN variant that builds skip connections between adjacent PDBs (see Fig. 6(b)); the results of the two variants are ensembled at test time. In the training phase, the authors upsample the LR images and calculate the best shifting parameters w.r.t. the ground truth based on PSNR. For brightness scaling, the authors adjust the pixel mean of each LR image by the mean of its corresponding ground-truth image.
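A loose PyTorch sketch of a PDB under our reading of Fig. 6: three dense blocks applied in sequence, with their outputs D_1, D_2, D_3 combined through the learnable scalars α_1, α_2, α_3. The plain convolutional stand-in below is not a true densely connected block ([11] defines the real thing), and the exact wiring inside a PDB is an assumption.

```python
import torch
import torch.nn as nn

def conv_stack(channels, depth=5):
    """Stand-in for a 5-layer dense block (no dense concatenations here)."""
    layers = []
    for _ in range(depth):
        layers += [nn.Conv2d(channels, channels, 3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class PolyDenseBlock(nn.Module):
    """Sketch of a PDB: chained blocks, alpha-weighted output combination."""
    def __init__(self, channels=64):
        super().__init__()
        self.d1 = conv_stack(channels)
        self.d2 = conv_stack(channels)
        self.d3 = conv_stack(channels)
        self.alpha = nn.Parameter(torch.ones(3) / 3)  # alpha_1..alpha_3

    def forward(self, x):
        d1 = self.d1(x)           # D_1
        d2 = self.d2(d1)          # D_2
        d3 = self.d3(d2)          # D_3
        a = self.alpha
        return a[0] * d1 + a[1] * d2 + a[2] * d3
```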
4.7. BMIPL UNIST team decomposed the original problems of the NTIRE 2018 challenge into subproblems (SR at various scales and denoising / deblurring) and proposed an efficient module-based single image SR network [27] (EMBSR, see Fig. 7). As the individual SR module network, they proposed EDSR-PP, which integrates pyramid pooling into the upsampling layer of EDSR [21] to better utilize both global and local context information. As the denoising / deblurring module network, they proposed a residual convolution network (DnResNet), which replaces the convolution blocks of DnCNN [40] by residual blocks with BN and

References
[10] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[11] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In CVPR, 2017.
[18] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[37] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 2004.
S. G. Mallat. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1989.