Learning to Super-Resolve Blurry Face and Text Images

doi:10.1109/ICCV.2017.36

Home
/
Papers
/
Learning to Super-Resolve Blurry Face and Text Images

Proceedings Article•DOI•

Learning to Super-Resolve Blurry Face and Text Images

Xiangyu Xu¹, Xiangyu Xu², Xiangyu Xu³, Deqing Sun³, Deqing Sun⁴, Jinshan Pan⁵, Yu-Jin Zhang¹, Hanspeter Pfister³, Ming-Hsuan Yang² - Show less +5 more•Institutions (5)

Tsinghua University¹, University of California, Merced², Harvard University³, Nvidia⁴, Nanjing University of Science and Technology⁵

01 Oct 2017-pp 251-260

TL;DR: This work presents an algorithm to directly restore a clear highresolution image from a blurry low-resolution input and introduces novel training losses that help recover fine details.

read less

Abstract: We present an algorithm to directly restore a clear highresolution image from a blurry low-resolution input. This problem is highly ill-posed and the basic assumptions for existing super-resolution methods (requiring clear input) and deblurring methods (requiring high-resolution input) no longer hold. We focus on face and text images and adopt a generative adversarial network (GAN) to learn a category-specific prior to solve this problem. However, the basic GAN formulation does not generate realistic highresolution images. In this work, we introduce novel training losses that help recover fine details. We also present a multi-class GAN that can process multi-class image restoration tasks, i.e., face and text images, using a single generator network. Extensive experiments demonstrate that our method performs favorably against the state-of-the-art methods on both synthetic and real-world images at a lower computational cost.

...read moreread less

Citations

PDF

Open Access

More filters

Journal Article•DOI•

Deep Learning for Image Super-Resolution: A Survey

[...]

Zhihao Wang¹, Jian Chen¹, Steven C. H. Hoi²•Institutions (2)

South China University of Technology¹, Salesforce.com²

01 Oct 2021-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A survey on recent advances of image super-resolution techniques using deep learning approaches in a systematic way, which can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR.

...read moreread less

Abstract: Image Super-Resolution (SR) is an important class of image processing techniqueso enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress of image super-resolution using deep learning techniques. This article aims to provide a comprehensive survey on recent advances of image super-resolution using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues which should be further addressed by the community in the future.

...read moreread less

837 citations

Cites background or methods from "Learning to Super-Resolve Blurry Fa..."

...[63] propose a multi-class GAN-based FH model composed of a generic generator and class-specific discriminators....
[...]
..., PSNR) but far exceeding others in terms of perceptual quality, in which case the MOS testing is the most reliable IQA method for accurately measuring the perceptual quality [8], [25], [46], [62], [63], [64], [65]....
[...]
...[63] incorporate a multi-class GAN consisting of a generator and multiple class-specific discriminators....
[...]

Proceedings Article•DOI•

Recovering Realistic Texture in Image Super-Resolution by Deep Spatial Feature Transform

[...]

Xintao Wang¹, Ke Yu¹, Chao Dong², Chen Change Loy¹•Institutions (2)

The Chinese University of Hong Kong¹, SenseTime²

18 Jun 2018

TL;DR: In this paper, a spatial feature transform (SFT) layer was proposed to generate affine transformation parameters for spatial-wise feature modulation in a single-image super-resolution network.

...read moreread less

Abstract: Despite that convolutional neural networks (CNN) have recently demonstrated high-quality reconstruction for single-image super-resolution (SR), recovering natural and realistic texture remains a challenging problem. In this paper, we show that it is possible to recover textures faithful to semantic classes. In particular, we only need to modulate features of a few intermediate layers in a single network conditioned on semantic segmentation probability maps. This is made possible through a novel Spatial Feature Transform (SFT) layer that generates affine transformation parameters for spatial-wise feature modulation. SFT layers can be trained end-to-end together with the SR network using the same loss function. During testing, it accepts an input image of arbitrary size and generates a high-resolution image with just a single forward pass conditioned on the categorical priors. Our final results show that an SR network equipped with SFT can generate more realistic and visually pleasing textures in comparison to state-of-the-art SRGAN [27] and EnhanceNet [38].

...read moreread less

597 citations

Proceedings Article•DOI•

Gated Fusion Network for Single Image Dehazing

[...]

Wenqi Ren, Lin Ma¹, Jiawei Zhang², Jinshan Pan³, Xiaochun Cao⁴, Wei Liu¹, Ming-Hsuan Yang⁵ - Show less +3 more•Institutions (5)

Tencent¹, City University of Hong Kong², Nanjing University of Science and Technology³, Chinese Academy of Sciences⁴, University of California, Merced⁵

18 Jun 2018

TL;DR: An efficient algorithm to directly restore a clear image from a hazy input using an end-to-end trainable neural network that consists of an encoder and a decoder is proposed.

...read moreread less

Abstract: In this paper, we propose an efficient algorithm to directly restore a clear image from a hazy input. The proposed algorithm hinges on an end-to-end trainable neural network that consists of an encoder and a decoder. The encoder is exploited to capture the context of the derived input images, while the decoder is employed to estimate the contribution of each input to the final dehazed result using the learned representations attributed to the encoder. The constructed network adopts a novel fusion-based strategy which derives three inputs from an original hazy image by applying White Balance (WB), Contrast Enhancing (CE), and Gamma Correction (GC). We compute pixel-wise confidence maps based on the appearance differences between these different inputs to blend the information of the derived inputs and preserve the regions with pleasant visibility. The final dehazed image is yielded by gating the important features of the derived inputs. To train the network, we introduce a multi-scale approach such that the halo artifacts can be avoided. Extensive experimental results on both synthetic and real-world images demonstrate that the proposed algorithm performs favorably against the state-of-the-art algorithms.

...read moreread less

567 citations

Cites methods from "Learning to Super-Resolve Blurry Fa..."

...Recently, CNNs have also been used for image recovering problems [4, 14, 39, 42, 43, 44]....
[...]

Proceedings Article•DOI•

Single Image Dehazing via Conditional Generative Adversarial Network

[...]

Runde Li¹, Jinshan Pan¹, Zechao Li¹, Jinhui Tang¹•Institutions (1)

Nanjing University of Science and Technology¹

01 Jun 2018

TL;DR: This paper proposes an algorithm to directly restore a clear image from a hazy image based on a conditional generative adversarial network (cGAN), where the clear image is estimated by an end-to-end trainable neural network.

...read moreread less

Abstract: In this paper, we present an algorithm to directly restore a clear image from a hazy image. This problem is highly ill-posed and most existing algorithms often use hand-crafted features, e.g., dark channel, color disparity, maximum contrast, to estimate transmission maps and then atmospheric lights. In contrast, we solve this problem based on a conditional generative adversarial network (cGAN), where the clear image is estimated by an end-to-end trainable neural network. Different from the generative network in basic cGAN, we propose an encoder and decoder architecture so that it can generate better results. To generate realistic clear images, we further modify the basic cGAN formulation by introducing the VGG features and an L1-regularized gradient prior. We also synthesize a hazy dataset including indoor and outdoor scenes to train and evaluate the proposed algorithm. Extensive experimental results demonstrate that the proposed method performs favorably against the state-of-the-art methods on both synthetic dataset and real world hazy images.

...read moreread less

350 citations

Cites methods from "Learning to Super-Resolve Blurry Fa..."

...We also note that several methods develop GANs [39] to solve image deraining and image super-resolution [36]....
[...]

Proceedings Article•DOI•

Dynamic Scene Deblurring Using Spatially Variant Recurrent Neural Networks

[...]

Jiawei Zhang¹, Jinshan Pan², Jimmy Ren², Yibing Song³, Linchao Bao³, Rynson W. H. Lau¹, Ming-Hsuan Yang⁴ - Show less +3 more•Institutions (4)

City University of Hong Kong¹, SenseTime², Tencent³, University of California, Merced⁴

18 Jun 2018

TL;DR: Quantitative and qualitative evaluations on public datasets demonstrate that the proposed method performs favorably against state-of-the-art algorithms in terms of accuracy, speed, and model size.

...read moreread less

Abstract: Due to the spatially variant blur caused by camera shake and object motions under different scene depths, deblurring images captured from dynamic scenes is challenging. Although recent works based on deep neural networks have shown great progress on this problem, their models are usually large and computationally expensive. In this paper, we propose a novel spatially variant neural network to address the problem. The proposed network is composed of three deep convolutional neural networks (CNNs) and a recurrent neural network (RNN). RNN is used as a deconvolution operator performed on feature maps extracted from the input image by one of the CNNs. Another CNN is used to learn the weights for the RNN at every location. As a result, the RNN is spatially variant and could implicitly model the deblurring process with spatially variant kernels. The third CNN is used to reconstruct the final deblurred feature maps into restored image. The whole network is end-to-end trainable. Our analysis shows that the proposed network has a large receptive field even with a small model size. Quantitative and qualitative evaluations on public datasets demonstrate that the proposed method performs favorably against state-of-the-art algorithms in terms of accuracy, speed, and model size.

...read moreread less

335 citations

Cites background from "Learning to Super-Resolve Blurry Fa..."

...Recently, deep learning has been widely used in many low-level vision problems, such as denoising [1, 20, 45], super-resolution[4, 35, 13, 14, 15, 43, 31], dehazing [27], derain/dedirt [6, 5], edge-preserving filtering [39, 18], and image deblurring (non-blind [28, 38, 44] and blind [29, 2, 42])....
[...]

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43

Collapse

References

PDF

Open Access

More filters

Proceedings Article•DOI•

Deep Residual Learning for Image Recognition

[...]

Kaiming He¹, Xiangyu Zhang¹, Shaoqing Ren¹, Jian Sun¹•Institutions (1)

Microsoft¹

27 Jun 2016

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

...read moreread less

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

...read moreread less

123,388 citations

Proceedings Article•

Adam: A Method for Stochastic Optimization

[...]

Diederik P. Kingma¹, Jimmy Ba²•Institutions (2)

University of Amsterdam¹, University of Toronto²

01 Jan 2015

TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

...read moreread less

Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.

...read moreread less

111,197 citations

"Learning to Super-Resolve Blurry Fa..." refers methods in this paper

...[34] we train the models using the Adam optimizer [19] with momentum terms β1 = 0....
[...]

Proceedings Article•

ImageNet Classification with Deep Convolutional Neural Networks

[...]

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

03 Dec 2012

TL;DR: The state-of-the-art performance of CNNs was achieved by Deep Convolutional Neural Networks (DCNNs) as discussed by the authors, which consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.

...read moreread less

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overriding in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

...read moreread less

73,978 citations

Journal Article•DOI•

Generative Adversarial Nets

[...]

Ian Goodfellow¹, Jean Pouget-Abadie¹, Mehdi Mirza¹, Bing Xu¹, David Warde-Farley¹, Sherjil Ozair², Aaron Courville¹, Yoshua Bengio¹ - Show less +4 more•Institutions (2)

Université de Montréal¹, Indian Institute of Technology Delhi²

08 Dec 2014

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are simultaneously train: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

...read moreread less

Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to ½ everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

...read moreread less

38,211 citations

"Learning to Super-Resolve Blurry Fa..." refers background or methods in this paper

...[13] introduce the GAN framework for training generative models that can generate realistic-looking images from random noise....
[...]
...Moreover, different from most methods based on GAN [13, 34, 6] that generate images from random noise, the input of our network is degraded images which contain substantial information for reconstruction....
[...]
...Specifically, we adopt a generative adversarial network (GAN) [13], which consists of generator and discriminator sub-networks that compete with each other....
[...]
...[13], in practice we train G to maximize log(Dθ(Gω(y))) which provides more sufficient gradients and leads to more stable solution than minimizing log(1 − Dθ(Gω(y))) in (2)....
[...]

Proceedings Article•

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

[...]

Sergey Ioffe¹, Christian Szegedy¹•Institutions (1)

Google¹

06 Jul 2015

TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.

...read moreread less

Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.

...read moreread less

30,843 citations