NTIRE 2018 Challenge on Single Image Super-Resolution: Methods and Results
Radu Timofte Shuhang Gu Jiqing Wu Luc Van Gool Lei Zhang
Ming-Hsuan Yang Muhammad Haris Greg Shakhnarovich Norimichi Ukita
Shijia Hu Yijie Bei Zheng Hui Xiao Jiang Yanan Gu Jie Liu Yifan Wang
Federico Perazzi Brian McWilliams Alexander Sorkine-Hornung
Olga Sorkine-Hornung Christopher Schroers Jiahui Yu Yuchen Fan Jianchao Yang
Ning Xu Zhaowen Wang Xinchao Wang Thomas S. Huang Xintao Wang
Ke Yu Tak-Wai Hui Chao Dong Liang Lin Chen Change Loy Dongwon Park
Kwanyoung Kim Se Young Chun Kai Zhang Pengju Liu Wangmeng Zuo
Shi Guo Jiye Liu Jinchang Xu Yijiao Liu Fengye Xiong Yuan Dong
Hongliang Bai Alexandru Damian Nikhil Ravi Sachit Menon Cynthia Rudin
Junghoon Seo Taegyun Jeon Jamyoung Koo Seunghyun Jeon Soo Ye Kim
Jae-Seok Choi Sehwan Ki Soomin Seo Hyeonjun Sim Saehun Kim
Munchurl Kim Rong Chen Kun Zeng Jinkang Guo Yanyun Qu Cuihua Li
Namhyuk Ahn Byungkon Kang Kyung-Ah Sohn Yuan Yuan Jiawei Zhang
Jiahao Pang Xiangyu Xu Yan Zhao Wei Deng Sibt Ul Hussain Muneeb Aadil
Rafia Rahim Xiaowang Cai Fang Huang Yueshu Xu Pablo Navarrete Michelini
Dan Zhu Hanwen Liu Jun-Hyuk Kim Jong-Seok Lee Yiwen Huang Ming Qiu
Liting Jing Jiehang Zeng Ying Wang Manoj Sharma Rudrabha Mukhopadhyay
Avinash Upadhyay Sriharsha Koundinya Ankit Shukla Santanu Chaudhury
Zhe Zhang Yu Hen Hu Lingzhi Fu
Abstract
This paper reviews the 2nd NTIRE challenge on single image super-resolution (restoration of rich details in a low resolution image) with focus on proposed solutions and results. The challenge had 4 tracks. Track 1 employed the standard bicubic downscaling setup, while Tracks 2, 3 and 4 had realistic unknown downgrading operators simulating the camera image acquisition pipeline. The operators were learnable through provided pairs of low and high resolution train images. The tracks had 145, 114, 101, and 113 registered participants, respectively, and 31 teams competed in the final testing phase. The results gauge the state-of-the-art in single image super-resolution.
1. Introduction
Example-based single image super-resolution (SR) targets the reconstruction of the lost high frequencies (rich details) in an image with the help of a set of prior examples of paired low resolution (LR) and high resolution (HR) images. The problem is ill-posed: for each LR image, the space of plausible corresponding HR images is huge and scales up quadratically with the magnification factor.

(R. Timofte (timofter@vision.ee.ethz.ch, ETH Zurich), S. Gu, L. Van Gool, L. Zhang and M.-H. Yang are the NTIRE 2018 organizers, while the other authors participated in the challenge. Appendix A contains the authors' teams and affiliations. NTIRE webpage: http://www.vision.ee.ethz.ch/ntire18/)
In recent years the research literature has largely focused on example-based single image super-resolution, and the performance achieved by the top methods [38, 32, 7, 16, 20, 31, 21] has continuously improved.
The NTIRE 2017 challenge [31, 1] was a step forward in benchmarking SR. It was the first challenge of its kind, with tracks employing standard bicubic degradation and 'unknown' operators (blur and decimation) on the 1000 DIVerse 2K resolution images of the DIV2K [1] dataset.
The NTIRE 2018 challenge builds upon NTIRE 2017 and goes further. In comparison with the previous edition, NTIRE 2018: (1) uses the same DIV2K [1] dataset; (2) has only one bicubic downscaling track, with magnification factor ×8; (3) promotes realistic settings emulating the camera acquisition pipeline through three tracks with gradually increasing difficulty.

2. NTIRE 2018 Challenge
The objectives of the NTIRE 2018 challenge on example-based single-image super-resolution are: (i) to gauge and push the state-of-the-art in SR; (ii) to compare different solutions; and (iii) to promote realistic SR settings.
DIV2K Dataset [1], employed by the NTIRE 2017 SR challenge [31], is also used in our challenge. DIV2K has 1000 DIVerse 2K resolution RGB images: 800 for training, 100 for validation and 100 for testing purposes. The manually collected high quality images are diverse in content.
2.1. Tracks
Access to data and submission of HR image results required registration on the corresponding Codalab competition track (https://competitions.codalab.org).
Track 1: Classic Bicubic ×8 uses bicubic downscaling (Matlab imresize, default settings), the most common setting in the recent SR literature, with factor ×8. It is meant for easy deployment of recently proposed SR solutions.
Track 2: Realistic Mild ×4 adverse conditions assumes that the degradation operators emulating the image acquisition process of a digital camera can be estimated through training pairs of LR and HR images. The degradation operators are the same (use the same controlling parameters) within each image space and for all the images in the train, validation, and test sets. As in reality, the motion blur and the Poisson noise are image dependent and can introduce pixel shifts and scaling. Each ground truth (GT) image from DIV2K is downgraded (×4) to an LR image.
Track 3: Realistic Difficult ×4 adverse conditions is similar to Track 2, only the degradation is stronger.
Track 4: Realistic Wild ×4 adverse conditions is similar to Tracks 2 and 3, but the degradation operators, while the same within an image space, differ from one image to another; some images are less degraded than others. This setting is the closest to real 'wild' conditions. Due to the increased complexity of the task, four degraded LR images were generated for each HR train image.
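The actual degradation operators of Tracks 2-4 were kept undisclosed and could only be learned from the provided image pairs; the sketch below is therefore purely illustrative of the kind of pipeline described (blur, ×4 decimation, signal-dependent Poisson noise, sub-pixel shift). The Gaussian blur stand-in and every parameter (blur_sigma, peak, max_shift) are hypothetical choices of ours, not the challenge's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, shift

def degrade(hr, scale=4, blur_sigma=1.6, peak=255.0, max_shift=2.0):
    """Illustrative 'realistic' degradation: hr is a float32 grayscale
    image in [0, 1], shape (H, W) with H, W divisible by `scale`."""
    blurred = gaussian_filter(hr, sigma=blur_sigma)        # blur stand-in
    dy, dx = np.random.uniform(-max_shift, max_shift, size=2)
    moved = shift(blurred, (dy, dx), mode="reflect")       # sub-pixel shift
    lr = moved[::scale, ::scale]                           # x4 decimation
    noisy = np.random.poisson(np.clip(lr, 0, 1) * peak) / peak  # Poisson noise
    return np.clip(noisy, 0, 1).astype(np.float32)
```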
Challenge phases (1) Development phase: the participants got pairs of LR and HR train images and the LR validation images of the DIV2K dataset; an online validation server with a leaderboard provided immediate feedback for the HR results uploaded for the LR validation images. (2) Testing phase: the participants got the test LR images and were required to submit super-resolved HR image results, code, and a factsheet for their method. After the end of the challenge the final results were released to the participants.
Evaluation protocol The quantitative measures are the Peak Signal-to-Noise Ratio (PSNR), measured in decibels [dB], and the Structural Similarity index (SSIM) [37], both full-reference measures computed between the HR result and the GT image. We report averages over sets of images. As in [31], we ignore a boundary of 6 + s image pixels (s is the zoom factor). Because of the pixel shifts and scalings, for Tracks 2, 3, and 4 we consider all the translations in [-40, 40] on both axes, compute PSNR and SSIM, and report the most favorable scores. Due to time complexity, for Tracks 2, 3, and 4 we computed PSNR and SSIM using a 60 × 60 px centered image crop during the validation phase and an 800 × 800 px centered image crop for the final results.
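A minimal sketch of this shift-tolerant scoring, assuming grayscale images in [0, 1] and integer-pixel translations; the centered crops mentioned above are what keep the 81 × 81 search affordable in practice, and all helper names here are ours.

```python
import numpy as np

def psnr(a, b):
    """PSNR in dB for images in [0, 1] (MAX = 1)."""
    mse = np.mean((a - b) ** 2)
    return 10 * np.log10(1.0 / mse)

def challenge_psnr(sr, gt, scale=4, max_shift=40):
    """Best PSNR over integer translations in [-max_shift, max_shift],
    ignoring a (6 + scale)-pixel boundary as in the protocol above."""
    border = 6 + scale
    h, w = gt.shape[:2]
    margin = border + max_shift
    crop = (slice(margin, h - margin), slice(margin, w - margin))
    best = -np.inf
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            # np.roll wraps at the borders; the wrapped pixels fall
            # outside the evaluated crop, so this is safe here.
            shifted = np.roll(np.roll(sr, dy, axis=0), dx, axis=1)
            best = max(best, psnr(shifted[crop], gt[crop]))
    return best
```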
Figure 1. Sample LR input images for Tracks 1, 2, 3, and 4, respectively.
3. Challenge Results
From 110 registered participants on average per track, 31 teams entered the final phase and submitted results, codes/executables, and factsheets. Table 1 reports the final test results and rankings of the challenge, while Table 2 provides the self-reported runtimes and major details. The methods are briefly described in Section 4 and the team members are listed in Appendix A.
Architectures and main ideas All the proposed methods, except TSSR of UW18, are deep learning based. The deep residual network (ResNet) architecture [10] and the dense network (DenseNet) architecture [11] are the basis for most of the proposed methods. For fast inference, and thus train and test time benefits, most of the teams conduct the major SR operations in the LR space. Several teams, such as UIUC-IFP, BMIPL-UNIST, and Pixel Overflow, build their methods on EDSR [21], the state-of-the-art approach and winner of the previous NTIRE 2017 SR challenge [31, 1], while other teams, such as Toyota-TI, HIT-VPC, DRZ, and PDN, proposed new architectures for SR.
Restoration fidelity The top 4 methods on 'Classic Bicubic' achieved similar PSNR scores (within 0.04dB). The DeepSR entry, ranked 12th, is only 0.17dB behind the best PSNR score, of Toyota-TI. On the realistic settings, Tracks 2, 3, and 4, due to the presence of noise and motion blur, the training strategy and the network architecture play equally important roles. Although UIUC-IFP ranked 7th on 'Classic Bicubic', below DRZ and Duke Data Science, it adopted a pre-alignment step in the training phase and achieved the best performance on the realistic Tracks 2 and 3, significantly better than DRZ and Duke Data Science. PDN ranked 1st on Track 4; however, without submitted results for the other tracks we cannot tell if their solution/architecture is better than that of UIUC-IFP.
Ensembles and fusion Most teams employ pseudo-ensembles [33]: the inputs are flipped/rotated, and the HR results are aligned and averaged for enhanced prediction.
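For concreteness, a sketch of such a pseudo-ensemble (the ×8 geometric self-ensemble of [33]): the eight flip/rotation variants of the LR input are super-resolved, each output is mapped back to the original orientation, and the results are averaged. The `model` callable is a placeholder for any SR network.

```python
import numpy as np

def self_ensemble(model, lr):
    """Geometric self-ensemble [33] over the 8 flip/rotation variants.
    `model` maps an (H, W, C) LR array to an (sH, sW, C) HR array."""
    outputs = []
    for k in range(4):                      # 4 rotations ...
        for flip in (False, True):          # ... times 2 horizontal flips
            x = np.rot90(lr, k)
            x = x[:, ::-1] if flip else x
            y = model(np.ascontiguousarray(x))
            y = y[:, ::-1] if flip else y   # undo the flip first,
            outputs.append(np.rot90(y, -k)) # then undo the rotation
    return np.mean(outputs, axis=0)
```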

Table 1. NTIRE 2018 SR Challenge results and final rankings. Note that the ‘lpj008’ results are not ranked.
(a) Track 1 Classic Bicubic ×8

Team               Author            PSNR    SSIM
Toyota-TI          iim lab           25.455  0.7088
Pixel Overflow     McCourt Hu        25.433  0.7067
rainbow            zheng222          25.428  0.7055
DRZ                yifita            25.415  0.7068
Faceall Xlabs      xjc faceall       25.360  0.7031
Duke Data Science  adamian98         25.356  0.7037
UIUC-IFP           jhyume            25.347  0.7023
Haiyun XMU         cr2018            25.338  0.7037
BMIPL UNIST        BMIPL UNIST       25.331  0.7026
Ajou-LAMDA-Lab     nmhkahn           25.318  0.7023
SIA                mikigom           25.290  0.7014
DeepSR             enoch             25.288  0.7015
-                  Mrobot0           25.175  0.6960
reveal.ai          muneebaadil       25.137  0.6942
HIT-VPC            cskzh             25.088  0.6943
MCML               ghgh3269          24.875  0.7025
BOE-SBG            boe sbg           24.822  0.6817
SRFun              ccook             24.819  0.6829
KAIST-VICLAB       JSChoi            24.817  0.6810
-                  zeweihe           24.773  0.6813
-                  jingliting        24.714  0.6913
CEERI              harshakoundinya   24.687  0.6719
APSARA             MingQiu           24.618  0.6817
UW18               zzsmg             24.192  0.6531
Baseline           Bicubic           23.703  0.6387
(b) Realistic Tracks 2, 3, & 4 ×4

                                        Track 2 Mild          Track 3 Difficult     Track 4 Wild
Team               Author               PSNR (rank)  SSIM     PSNR (rank)  SSIM     PSNR (rank)  SSIM
UIUC-IFP           jhyume               23.631 (1)   0.6316   22.329 (1)   0.5721   23.080 (2)   0.6038
PDN                xixihaha             -                     -                     23.374 (1)   0.6122
BMIPL UNIST        BMIPL UNIST          23.579 (2)   0.6269   22.074 (2)   0.5590   -
HIT-VPC            lpj008               -                     22.249       0.5637   22.879       0.5936
HIT-VPC            cskzh                23.493 (3)   0.6174   21.450 (9)   0.5339   22.795 (3)   0.5829
SIA                mikigom              23.406 (5)   0.6275   21.899 (3)   0.5623   22.766 (4)   0.6023
KAIST-VICLAB       jschoi               23.455 (4)   0.6175   21.689 (6)   0.5434   22.732 (6)   0.5844
DRZ                yifita               23.397 (6)   0.6160   21.592 (8)   0.5438   22.745 (5)   0.5881
srFans             yyuan13              23.218 (9)   0.6222   21.825 (4)   0.5573   22.707 (7)   0.5932
Duke Data Science  adamian98            23.374 (7)   0.6252   21.658 (7)   0.5400   -
-                  bighead              23.247 (8)   0.6165   -                     -
ISP Team           hot milk             23.098 (11)  0.6167   21.779 (5)   0.5550   22.496 (8)   0.5867
BOE-SBG            boe sbg              23.123 (10)  0.6008   21.443 (10)  0.5275   22.352 (10)  0.5612
MCML               ghgh3269             22.953 (12)  0.6115   21.337 (11)  0.5354   22.472 (9)   0.5842
DeepSR             enoch                21.742 (15)  0.5572   20.674 (16)  0.5168   21.589 (12)  0.5444
-                  jingliting           21.710 (16)  0.5384   20.973 (12)  0.5187   20.956 (14)  0.5214
Haiyun XMU         cr2018               21.519 (17)  0.5313   20.866 (13)  0.5072   21.367 (13)  0.5321
Ajou-LAMDA-Lab     nmhkahn              21.240 (18)  0.5376   -                     -
Juanluisgonzales   juanluisgonzales     22.625 (13)  0.5868   -                     -
APSARA             mingqiu              -                     20.718 (15)  0.4977   -
NMH                nmh                  -                     20.645 (17)  0.4890   -
-                  join16               20.453 (19)  0.4928   -                     -
Baseline           Bicubic              22.391 (14)  0.5336   20.830 (14)  0.4631   21.761 (11)  0.4989
Table 2. Reported runtimes [s] per test image and details from the factsheets.

                   runtime [s]
Team               Track 1  Tracks 2,3,4  Platform                                       CPU/GPU (at runtime)                      Ensemble
Ajou-LAMDA-Lab     13.84    13.84         PyTorch                                        GTX 1080Ti                                flip/rotation (×8)
APSARA             30       30            TensorFlow                                     GTX 1080Ti                                flip/rotation (×8)
BOE-SBG            0.15     1.11          PyTorch                                        Nvidia P100                               -
bighead            -        1.5           ?                                              ?                                         ?
BMIPL UNIST        2.52     4.68          PyTorch                                        ?                                         flip/rotation (×8)
CEERI              12.23    -             TensorFlow, Keras                              GTX 1080                                  -
DeepSR             9.89     1.83          TensorFlow                                     Titan X                                   flip/rotation (×8)
DRZ                11.65    2.91          PyTorch                                        Titan Xp                                  Track 1: flip/rotation (×8)
Duke Data Science  6.99     18            ?                                              Nvidia P100                               flip/rotation (×8)
Faceall Xlabs      7.31     -             PyTorch                                        GTX 1080                                  flip/rotation (×4)
Haiyun XMU         14.52    2.14          PyTorch                                        Track 1: Titan X; Tracks 2,3,4: GTX 1080  Track 1: flip/rotation (×8)
HIT-VPC            0.26     0.2           MatConvNet                                     GTX 1080Ti                                -
ISP Team           -        2.1           TensorFlow                                     Titan X                                   -
jingliting         1.27     0.72          ?                                              ?                                         -
join16             -        4.12          ?                                              GTX 1080                                  -
juanluisgonzales   -        0.02          ?                                              ?                                         -
KAIST-VICLAB       0.44     1.60          Track 1: MatConvNet; Tracks 2,3,4: TensorFlow  Titan Xp                                  Track 1: -; Tracks 2,3,4: flip/rotation (×8)
MCML               5.95     1.08          TensorFlow                                     GTX 1080                                  Track 1: flip/rotation (×8)
Mrobot0            10       -             ?                                              ?                                         -
NMH                -        3.31          ?                                              ?                                         -
PDN                -        13.07         PyTorch                                        4× Titan Xp                               ensemble of two variants of the proposed method
Pixel Overflow     20       -             TensorFlow                                     Nvidia P100                               -
rainbow            6.75     -             PyTorch                                        GTX 1080Ti                                flip/rotation (×8)
reveal.ai          92.95    -             PyTorch                                        Tesla K80                                 flip/rotation (×8)
SIA                396.0    396.0         TensorFlow                                     CPU                                       flip/rotation (×8)
srFans             -        0.10          PyTorch                                        Tesla K80                                 -
SRFun              1        -             TensorFlow                                     GTX 1080Ti                                -
Toyota-TI          35       -             PyTorch                                        Titan X                                   flip/rotation (×8)
UIUC-IFP           5.03     7.28          PyTorch                                        P100                                      flip/rotation (×8)
UW18               300      -             Matlab                                         Intel Core i7-6700K CPU @ 4.00GHz         -
zeweihe            1.02     -             ?                                              ?                                         -
Runtime / efficiency BOE-SBG reported the lowest runtime, 0.15s to super-resolve (×8) one LR image on GPU, but ranked 17th on 'Classic Bicubic', 0.63dB below the best ranked method, of Toyota-TI. Among the top 4 methods on the 'Classic Bicubic' track, rainbow achieved the best trade-off between efficiency and performance: on a GTX 1080Ti GPU, generating one HR image takes 6.75s for rainbow, while 35s are necessary for Toyota-TI, including self-ensemble for both methods.
Train data Data augmentation by scaling (only Track 1), flipping, and rotation [33] is another commonly used technique. Only a couple of teams, including Pixel Overflow, used extra data for training. Pixel Overflow used images from www.pexels.com, which is also the source of many DIV2K images. HIT-VPC used Track 1 images to estimate the downgrading operators of Tracks 3 and 4, thus their 'lpj008' entry in Table 1 is just for reference and is not ranked in the challenge.
Conclusions By analyzing the settings, the proposed methods and their results we can conclude: (i) The proposed methods improve the state-of-the-art in SR. (ii) The top solutions are consistent across the realistic tracks, yet the top methods on 'Classic Bicubic' are not the top methods on the realistic tracks, where domain-specific knowledge (pre-alignment of train images) was critical. (iii) As expected, the realistic tracks are more challenging than the bicubic one, reflected by the relatively lower PSNR (up to 2dB for the winners) of the results, even when comparing ×8 with ×4. (iv) SSIM is more correlated with PSNR on 'Classic Bicubic' than on the realistic tracks. (v) High magnification factors and realistic settings pose the extra problem of (subpixel) alignment between HR results and ground truth. (vi) Other ranking measures are necessary (such as perceptual ones). (vii) Further realistic challenges could introduce non-uniform degradations.

Figure 2. Toyota-TI's DBPN network structure: (a) the DBPN architecture; (b) the up- and down-projection units in DBPN.
4. Challenge Methods and Teams
4.1. Toyota-TI team proposed the Deep Back-Projection Networks (DBPN) [9] (see Fig. 2), which use error feedback from the up- and down-scaling steps to guide the network towards an optimal result. Unlike previous methods, which predict the SR image in a feed-forward manner, DBPN adopts mutually connected up- and down-sampling stages to generate LR as well as HR features, and accumulates both the up- and down-projection errors to predict the final SR result. A group of LR features is first extracted from the input LR image. Then, back-projection stages are used to alternately generate LR and HR feature maps L_t and H_t, which are further improved by dense connections, where the input of each projection unit is the concatenation of the outputs of all previous units. Finally, all the HR feature maps are used to reconstruct the final SR estimate I_sr = f_Rec([H_1, H_2, ..., H_t]).

The structure of the newly introduced up-projection and down-projection units is shown in Fig. 2(b). To deal with the classic bicubic ×8 downsampling SR problem, DBPN uses 12 × 12 convolutional layers with stride 8 and padding 2 in the projection units, and 19 projection units (10 up- and 9 down-projection units) are adopted to generate the SR result.
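A PyTorch sketch of one up-projection unit consistent with the description above (12 × 12 kernels, stride 8, padding 2 for the ×8 setting); the error-feedback structure follows the description of [9], but the activation choice and other details are simplified assumptions of ours.

```python
import torch.nn as nn

class UpProjection(nn.Module):
    """Sketch of a DBPN-style up-projection unit for x8 SR."""
    def __init__(self, channels=64, k=12, stride=8, pad=2):
        super().__init__()
        self.up1 = nn.ConvTranspose2d(channels, channels, k, stride, pad)
        self.down = nn.Conv2d(channels, channels, k, stride, pad)
        self.up2 = nn.ConvTranspose2d(channels, channels, k, stride, pad)
        self.act = nn.PReLU()

    def forward(self, l):
        h0 = self.act(self.up1(l))    # tentative HR features
        l0 = self.act(self.down(h0))  # back-project them to LR
        e = l0 - l                    # LR-space error (feedback signal)
        h1 = self.act(self.up2(e))    # map the error back to HR
        return h0 + h1                # corrected HR features
```

With k = 12, stride = 8, pad = 2, the transposed convolution maps an n-pixel input to exactly 8n pixels and the convolution maps it back to n, so the LR-space residual is well defined.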
The network is trained on images from DIV2K with augmentation [33]. In the training phase, the input patch size is set to 40 × 40 and the mini-batch size to 18. The model is trained with L1 loss using the ADAM optimizer [18], with learning rate 1 × 10^-4 decreased by a factor of 10 every 5 × 10^5 iterations, for a total of 10^6 iterations. In the testing phase, the authors adopt the self-ensemble strategy [33] to further improve the SR results.

Figure 3. rainbow's network architecture.
Figure 4. DRZ's asymmetric pyramidal architecture with DCUs.
4.2. Pixel Overflow team [4] utilized the same network structure as EDSR [21]. To get better SR performance, external training data was adopted in the training phase. Pixel Overflow uses a Sobel filter to extract the edges of the output and target images, emphasizing the loss on the edges and details.
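A sketch of such an edge-emphasizing objective: Sobel-filter the SR output and the target and penalize the difference of the gradient magnitudes. Combining this with a plain L1 term and the weight `w_edge` are our assumptions; the team's exact formulation is not reported above.

```python
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)  # transposed kernel = vertical Sobel

def edges(img):
    """Sobel gradient magnitude; img is (N, C, H, W), per-channel filtering."""
    c = img.shape[1]
    gx = F.conv2d(img, SOBEL_X.expand(c, 1, 3, 3).to(img), padding=1, groups=c)
    gy = F.conv2d(img, SOBEL_Y.expand(c, 1, 3, 3).to(img), padding=1, groups=c)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def edge_loss(sr, hr, w_edge=0.1):
    """L1 on pixels plus L1 on Sobel edge maps (weighting is hypothetical)."""
    return F.l1_loss(sr, hr) + w_edge * F.l1_loss(edges(sr), edges(hr))
```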
4.3. rainbow team proposed a method based on EDSR [21] and SRDenseNet [11, 34] (Fig. 3). They employed a pyramid architecture to gradually generate the HR image. In order to trade off performance against inference time, they adopted a two-step enlargement strategy. They trained the network with L1 loss and fine-tuned it with L2 loss.
4.4. DRZ team proposed an asymmetric pyramidal structure for image SR [36] (see Fig. 4). Each level of the pyramid consists of a cascade of dense compression units (DCUs), and a sub-pixel convolution layer is used to generate the residual map that reconstructs the HR image. A DCU consists of a smaller, modified densely connected block [11] followed by a 1 × 1 convolution. Compared with the original densely connected block proposed for classification, the batch normalization (BN) layers are removed in the DCU.

In the training phase, a curriculum learning [5] strategy is adopted to achieve better SR performance and shorter training time. Specifically, DRZ first trains the 2× portion of the network and then gradually blends in each new pyramid level to reduce the impact on the previously trained layers. Curriculum learning adds an average of 0.07dB PSNR on the validation set of DIV2K for the 2×/4×/8× scales, compared to 0.03dB for normal multiscale training.
4.5. UIUC-IFP team proposed a wide activation SR network (WDSR, see Fig. 5), a deep residual SR network (with two-layer residual blocks) similar to the baseline EDSR [21].

Figure 5. UIUC-IFP's WDSR unit architecture: (a) the residual blocks in EDSR [21] and WDSR; (b) the overall network structure of EDSR [21] and WDSR.

To improve the SR performance, WDSR modifies the original EDSR in three aspects. Firstly, in comparison with EDSR, WDSR reduces the width of the identity mapping pathway and increases the width of the feature maps before the ReLU function in each residual block (see Fig. 5(a)); their experiments showed that this is extremely effective for improving accuracy. Secondly, UIUC-IFP follows recent works [8, 21, 31] that remove the BN layers from the residual blocks, and adopts weight normalization in WDSR; although introducing weight normalization when training SR networks may not help much by itself, it enables the authors to use a higher learning rate. Thirdly, WDSR removes some convolution layers used in EDSR and directly generates the shuffled SR estimation (see Fig. 5(b)); this strategy improves the processing speed without affecting the accuracy of the SR network.
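A PyTorch sketch of the wide-activation residual block from the first modification, with weight normalization applied to the convolutions as in the second; the base width and expansion factor below are assumed values, not ones reported above.

```python
import torch.nn as nn
from torch.nn.utils import weight_norm

class WideActivationBlock(nn.Module):
    """Sketch of a WDSR-style residual block: the feature width is
    expanded before the ReLU and contracted after it, so the identity
    pathway stays narrow while the activation sees wide features."""
    def __init__(self, channels=32, expansion=4):
        super().__init__()
        wide = channels * expansion
        self.body = nn.Sequential(
            weight_norm(nn.Conv2d(channels, wide, 3, padding=1)),  # widen
            nn.ReLU(inplace=True),
            weight_norm(nn.Conv2d(wide, channels, 3, padding=1)),  # contract
        )

    def forward(self, x):
        return x + self.body(x)
```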
For Track 1, UIUC-IFP utilized training parameters similar to EDSR; the only difference is that weight normalization enables UIUC-IFP to increase the learning rate 10×, to 0.001. After training with L1 loss, the model is directly finetuned with a PSNR loss; the finetuning step yields around 0.03dB PSNR improvement on the DIV2K validation set. For Tracks 2, 3 and 4, UIUC-IFP utilized a pre-alignment step to alleviate the random shift effects between the LR and HR images. Specifically, the HR images are shifted by up to 40 pixels, and the bicubic downscaled HR images are then compared with the given realistic LR images to find a coarsely aligned HR image for each LR image, along the lines of the sketch below.
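A brute-force sketch of this pre-alignment idea, assuming integer HR-pixel shifts and an OpenCV bicubic downscaler as a stand-in (the challenge itself used Matlab imresize); UIUC-IFP's actual search procedure and matching criterion are not detailed beyond the description above.

```python
import numpy as np
import cv2

def bicubic_down(img, scale):
    """Assumed bicubic x`scale` downscaler (stand-in for Matlab imresize)."""
    h, w = img.shape[:2]
    return cv2.resize(img, (w // scale, h // scale),
                      interpolation=cv2.INTER_CUBIC)

def prealign(hr, lr, scale=4, max_shift=40):
    """Return the shifted HR image whose bicubic downscale best matches lr.
    Exhaustive search; a practical version would work coarse-to-fine or on
    crops, and would crop instead of letting np.roll wrap at the borders."""
    best_err, best_hr = np.inf, hr
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            cand = np.roll(np.roll(hr, dy, axis=0), dx, axis=1)
            err = np.mean((bicubic_down(cand, scale) - lr) ** 2)
            if err < best_err:
                best_err, best_hr = err, cand
    return best_hr
```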
In the testing phase, a self-ensemble inference strategy was adopted to improve the SR performance [33].
4.6. PDN team proposed the PolyDenseNet (PDN) (see Fig. 6) for image SR. The basic building block of PDN is the PolyDense Block (PDB), which is motivated by PolyNet [42] and DenseNet [11]. Each PDB contains three 5-layer dense blocks and uses three parameters α_1, α_2 and α_3 to combine the dense block outputs D_1, D_2 and D_3 to get the output.

Figure 6. PDN's PolyDenseNet: (a) the PolyDenseNet schema (conv, three PDBs, two SubPixel ×2 layers, conv); (b) the variant with skip connections between two PDBs.
Figure 7. BMIPL UNIST’s network structures.
The PDN team also investigated a PDN variant that builds skip connections between adjacent PDBs (see Fig. 6(b)); the results of the two variants are ensembled at test time. In the training phase, the authors upsample the LR images and calculate the best shifting parameters w.r.t. the ground truth based on PSNR. For brightness scaling, the authors adjust the pixel mean of each LR image by the mean of its corresponding ground-truth image.
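A loose PyTorch sketch of a PDB under our reading of Fig. 6: three dense blocks applied in sequence, with their outputs D_1, D_2, D_3 combined through the learnable scalars α_1, α_2, α_3. The plain convolutional stand-in below is not a true densely connected block ([11] defines the real thing), and the exact wiring inside a PDB is an assumption.

```python
import torch
import torch.nn as nn

def conv_stack(channels, depth=5):
    """Stand-in for a 5-layer dense block (no dense concatenations here)."""
    layers = []
    for _ in range(depth):
        layers += [nn.Conv2d(channels, channels, 3, padding=1),
                   nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

class PolyDenseBlock(nn.Module):
    """Sketch of a PDB: chained blocks, alpha-weighted output combination."""
    def __init__(self, channels=64):
        super().__init__()
        self.d1 = conv_stack(channels)
        self.d2 = conv_stack(channels)
        self.d3 = conv_stack(channels)
        self.alpha = nn.Parameter(torch.ones(3) / 3)  # alpha_1..alpha_3

    def forward(self, x):
        d1 = self.d1(x)           # D_1
        d2 = self.d2(d1)          # D_2
        d3 = self.d3(d2)          # D_3
        a = self.alpha
        return a[0] * d1 + a[1] * d2 + a[2] * d3
```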
4.7. BMIPL UNIST team decomposed the original problems of the NTIRE 2018 challenge into subproblems (SR at various scales and denoising / deblurring) and proposed an efficient module-based single image SR network [27] (EMBSR, see Fig. 7). As the individual SR module network, they proposed EDSR-PP, which integrates pyramid pooling into the upsampling layer of EDSR [21] to better utilize both global and local context information. As the denoising / deblurring module network, they proposed a residual convolution network (DnResNet), which replaces the convolution blocks of DnCNN [40] by residual blocks with BN and

References
[10] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[11] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger. Densely connected convolutional networks. In CVPR, 2017.
[18] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[37] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE Transactions on Image Processing, 2004.
S. G. Mallat. A theory for multiresolution signal decomposition: the wavelet representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1989.