Book Chapter (DOI)

Analyzing Perception-Distortion Tradeoff Using Enhanced Perceptual Super-Resolution Network

TL;DR: The proposed network, called enhanced perceptual super-resolution network (EPSR), is trained with a combination of mean squared error loss, perceptual loss, and adversarial loss, and achieves a state-of-the-art trade-off between distortion and perceptual quality, whereas existing methods perform well in only one of these measures.
Abstract: Convolutional neural network (CNN) based methods have recently achieved great success for image super-resolution (SR). However, most deep CNN based SR models attempt to improve distortion measures (e.g. PSNR, SSIM, IFC, VIF) while resulting in poor quantified perceptual quality (e.g. human opinion score, no-reference quality measures such as NIQE). Few works have attempted to improve the perceptual quality at the cost of performance reduction in distortion measures. A very recent study has revealed that distortion and perceptual quality are at odds with each other, and there is always a trade-off between the two. Often the restoration algorithms that are superior in terms of perceptual quality are inferior in terms of distortion measures. Our work attempts to analyze the trade-off between distortion and perceptual quality for the problem of single image SR. To this end, we use the well-known SR architecture of the enhanced deep super-resolution (EDSR) network and show that it can be adapted to achieve better perceptual quality for a specific range of the distortion measure. While the original EDSR network was trained to minimize an error defined by per-pixel accuracy alone, we train our network using a generative adversarial network framework with EDSR as the generator module. Our proposed network, called enhanced perceptual super-resolution network (EPSR), is trained with a combination of mean squared error loss, perceptual loss, and adversarial loss. Our experiments reveal that EPSR achieves the state-of-the-art trade-off between distortion and perceptual quality, whereas existing methods perform well in only one of these measures.
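The abstract describes training EDSR as the generator of a GAN with a weighted combination of mean squared error, perceptual, and adversarial losses. The sketch below illustrates one way such a generator update could look; the module names, the VGG-style feature extractor, and the weights w_mse, w_percep, and w_adv are illustrative assumptions, not the authors' exact configuration.

```python
# Illustrative sketch (PyTorch) of a combined MSE + perceptual + adversarial
# generator loss; module names and weights are assumptions, not EPSR's exact setup.
import torch
import torch.nn.functional as F

def generator_loss(generator, discriminator, vgg_features, lr, hr,
                   w_mse=1.0, w_percep=0.1, w_adv=1e-3):
    sr = generator(lr)                          # EDSR-style network as the GAN generator
    loss_mse = F.mse_loss(sr, hr)               # distortion term (per-pixel accuracy)
    loss_percep = F.mse_loss(vgg_features(sr),  # perceptual term: distance in a deep
                             vgg_features(hr))  # feature space (e.g. a VGG layer)
    logits = discriminator(sr)                  # adversarial term: fool the discriminator
    loss_adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    return w_mse * loss_mse + w_percep * loss_percep + w_adv * loss_adv
```

Increasing the adversarial and perceptual weights pushes the result toward better perceptual quality at the cost of distortion measures such as PSNR, which is the trade-off the paper analyzes.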


Citations
Journal Article (DOI)
TL;DR: A systematic survey on recent advances of image super-resolution techniques using deep learning approaches, which roughly groups the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR.
Abstract: Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress of image super-resolution using deep learning techniques. This article aims to provide a comprehensive survey on recent advances of image super-resolution using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues that should be further addressed by the community.

837 citations


Cites background from "Analyzing Perception-Distortion Tra..."

  • ...In practice, researchers often combine multiple loss functions by weighted average [8], [25], [27], [46], [141] for constraining different aspects of the generation process, especially for distortion-perception tradeoff [25], [103], [142], [143], [144]....


Book Chapter (DOI)
08 Sep 2018
TL;DR: This paper reports on the 2018 PIRM challenge on perceptual super-resolution (SR), held in conjunction with the Perceptual Image Restoration and Manipulation (PIRM) workshop at ECCV 2018, and concludes with an analysis of the current trends in perceptual SR, as reflected from the leading submissions.
Abstract: This paper reports on the 2018 PIRM challenge on perceptual super-resolution (SR), held in conjunction with the Perceptual Image Restoration and Manipulation (PIRM) workshop at ECCV 2018. In contrast to previous SR challenges, our evaluation methodology jointly quantifies accuracy and perceptual quality, thereby enabling perceptually driven methods to compete alongside algorithms that target PSNR maximization. Twenty-one participating teams introduced algorithms that substantially improved upon the existing state-of-the-art methods in perceptual SR, as confirmed by a human opinion study. We also analyze popular image quality measures and draw conclusions regarding which of them correlates best with human opinion scores. We conclude with an analysis of the current trends in perceptual SR, as reflected in the leading submissions.
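For context, the challenge evaluated methods on the perception-distortion plane, ranking by a no-reference perceptual index within regions bounded by RMSE. Assuming the commonly cited definition (not quoted from this abstract), that index is:

```latex
% Perceptual index used in PIRM-SR 2018 (lower is better), assuming the commonly
% cited definition: Ma is the no-reference score of Ma et al., NIQE is the
% Naturalness Image Quality Evaluator.
\[
  \mathrm{PI} = \tfrac{1}{2}\bigl((10 - \mathrm{Ma}) + \mathrm{NIQE}\bigr)
\]
```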

428 citations

Journal Article (DOI)
TL;DR: Both explainable and black-box models are suitable for solving practical problems, but experts in machine learning need to understand the input data, the problem to solve, and the best way for showing the output data before applying a machine learning model.
Abstract: Nowadays, in the international scientific community of machine learning, there is an enormous discussion about the use of black-box models versus explainable models, especially in practical problems. On the one hand, part of the community holds that black-box models are more accurate than explainable models in some contexts, such as image preprocessing. On the other hand, another part of the community argues that explainable models are better than black-box models because they can obtain comparable results while also explaining them in a language close to that of a human expert by using patterns. In this paper, advantages and weaknesses of each approach are shown, taking into account a state-of-the-art review of both approaches, their practical applications, trends, and future challenges. This paper shows that both approaches are suitable for solving practical problems, but experts in machine learning need to understand the input data, the problem to solve, and the best way to present the output data before applying a machine learning model. We also propose some ideas for fusing the explainable and black-box approaches to provide better solutions to experts in real-world domains. Additionally, we show one way to measure the effectiveness of an applied machine learning model by using expert opinions jointly with statistical methods. Throughout this paper, we show the impact of using explainable and black-box models in security and medical applications.

205 citations


Cites background from "Analyzing Perception-Distortion Tra..."

  • ...face frontal view generation [50], generating new human poses [51], photos to emojis [52], photograph editing [53]–[55], face aging [56], [57], photo blending [58], super resolution [59]–[61], photo inpainting [62]–[64], video prediction [65], and 3D object generation [66], [67]; among...


Proceedings Article (DOI)
01 Oct 2019
TL;DR: In this paper, the authors optimize a deep network-based decoder with a targeted objective function that penalizes images at different semantic levels using the corresponding terms, which results in more realistic textures and sharper edges.
Abstract: By benefiting from perceptual losses, recent studies have significantly improved the performance of the super-resolution task, where a high-resolution image is resolved from its low-resolution counterpart. Although such objective functions generate near-photorealistic results, their capability is limited, since they estimate the reconstruction error for an entire image in the same way, without considering any semantic information. In this paper, we propose a novel method to benefit from perceptual loss in a more objective way. We optimize a deep network-based decoder with a targeted objective function that penalizes images at different semantic levels using the corresponding terms. In particular, the method leverages our proposed OBB (Object, Background and Boundary) labels, generated from segmentation labels, to estimate a suitable perceptual loss for boundaries while considering texture similarity for backgrounds. We show that our proposed approach results in more realistic textures and sharper edges, and outperforms other state-of-the-art algorithms in terms of both qualitative results on standard benchmarks and results of extensive user studies.
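The abstract describes penalizing object, background, and boundary regions with different terms of the objective. A minimal sketch of one way to realize such a region-weighted loss is given below; the mask encoding, the choice of per-region terms, and the weights are assumptions for illustration, not the authors' exact formulation.

```python
# Illustrative region-weighted loss driven by OBB-style labels (assumed encoding:
# 0 = object, 1 = background, 2 = boundary). Not the paper's exact objective.
import torch
import torch.nn.functional as F

def obb_weighted_loss(sr, hr, obb_mask, feat_extractor,
                      w_obj=1.0, w_bg=1.0, w_bnd=1.0):
    # obb_mask: (N, 1, H, W) integer map derived from segmentation labels.
    obj = (obb_mask == 0).float()
    bg = (obb_mask == 1).float()
    bnd = (obb_mask == 2).float()

    per_pixel = F.l1_loss(sr, hr, reduction='none')
    loss_obj = (per_pixel * obj).mean()          # pixel fidelity on object regions
    loss_bnd = (per_pixel * bnd).mean()          # keep boundaries sharp
    # Backgrounds: feature-space distance to favor texture similarity over
    # exact pixel agreement (illustrative choice).
    loss_bg = F.mse_loss(feat_extractor(sr * bg), feat_extractor(hr * bg))
    return w_obj * loss_obj + w_bg * loss_bg + w_bnd * loss_bnd
```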

113 citations

Posted Content
TL;DR: This study performs a comprehensive survey of the advancements in GANs design and optimization solutions and proposes a new taxonomy to structure solutions by key research issues and presents the promising research directions in this rapidly growing field.
Abstract: Generative Adversarial Networks (GANs) are a class of deep generative models that has recently gained significant attention. GANs learn complex, high-dimensional distributions implicitly over images, audio, and other data. However, there exist major challenges in training GANs, i.e., mode collapse, non-convergence, and instability, due to inappropriate network architecture design, choice of objective function, and selection of optimization algorithm. Recently, to address these challenges, several solutions for better design and optimization of GANs have been investigated based on techniques of re-engineered network architectures, new objective functions, and alternative optimization algorithms. To the best of our knowledge, there is no existing survey that has particularly focused on the broad and systematic development of these solutions. In this study, we perform a comprehensive survey of the advances in GAN design and optimization solutions proposed to handle these challenges. We first identify key research issues within each design and optimization technique and then propose a new taxonomy to structure solutions by key research issues. In accordance with the taxonomy, we provide a detailed discussion on the different GAN variants proposed within each solution and their relationships. Finally, based on the insights gained, we present promising research directions in this rapidly growing field.
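For reference, the baseline minimax objective that these architectural and optimization variants modify (the standard GAN formulation of Goodfellow et al.) is:

```latex
% Standard GAN value function: the discriminator D maximizes it while the
% generator G minimizes it; most surveyed variants alter this loss, the
% architectures of G and D, or the optimization procedure.
\[
  \min_G \max_D \;
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z(z)}\!\left[\log\bigl(1 - D(G(z))\bigr)\right]
\]
```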

88 citations


Cites background from "Analyzing Perception-Distortion Tra..."

  • ...Generation of realistic images has wide range of practical applications, such as anime character generation [159]–[164], image synthesis [165]–[168], super resolution [10], [124], [169]–[177], image editing and blending [178], [179], inpainting [125], [180]–[182], interactive image generation [183], [184], human pose estimation [185], [186], face aging [187], [188], 3D object detection [189]–[192], etc....


References
Proceedings Article (DOI)
01 Sep 2009
TL;DR: This paper proposes a unified framework for combining the classical multi-image super-resolution and the example-based super-resolution, and shows how this combined approach can be applied to obtain super-resolution from as little as a single image (with no database or prior examples).
Abstract: Methods for super-resolution can be broadly classified into two families of methods: (i) The classical multi-image super-resolution (combining images obtained at subpixel misalignments), and (ii) Example-Based super-resolution (learning correspondence between low and high resolution image patches from a database). In this paper we propose a unified framework for combining these two families of methods. We further show how this combined approach can be applied to obtain super resolution from as little as a single image (with no database or prior examples). Our approach is based on the observation that patches in a natural image tend to redundantly recur many times inside the image, both within the same scale, as well as across different scales. Recurrence of patches within the same image scale (at subpixel misalignments) gives rise to the classical super-resolution, whereas recurrence of patches across different scales of the same image gives rise to example-based super-resolution. Our approach attempts to recover at each pixel its best possible resolution increase based on its patch redundancy within and across scales.
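The core observation is that patches of a natural image recur within and across scales, and it can be sketched concretely: search a downscaled copy of the image for a patch similar to each query patch, so that the corresponding region of the original image acts as a found higher-resolution example. The patch size, scale factor, and brute-force search below are illustrative assumptions, not the authors' algorithm.

```python
# Illustrative sketch of cross-scale patch recurrence on a 2D grayscale image
# (numpy/scipy). Patch size, scale, and brute-force matching are assumptions.
import numpy as np
from scipy.ndimage import zoom

def extract_patches(img, size, stride):
    patches, coords = [], []
    for y in range(0, img.shape[0] - size + 1, stride):
        for x in range(0, img.shape[1] - size + 1, stride):
            patches.append(img[y:y + size, x:x + size].ravel())
            coords.append((y, x))
    return np.asarray(patches), coords

def cross_scale_matches(img, size=5, scale=0.5):
    small = zoom(img, scale)                        # coarser rendition of the same image
    q_patches, q_coords = extract_patches(img, size, stride=size)
    db_patches, db_coords = extract_patches(small, size, stride=1)
    matches = []
    for q, (qy, qx) in zip(q_patches, q_coords):
        d = np.sum((db_patches - q) ** 2, axis=1)   # brute-force nearest neighbor
        by, bx = db_coords[int(np.argmin(d))]
        # The matched patch in the downscaled image corresponds to a larger
        # region around (by/scale, bx/scale) in the original image, which can
        # serve as a high-resolution example for the query patch.
        matches.append(((qy, qx), (int(by / scale), int(bx / scale))))
    return matches
```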

1,923 citations


"Analyzing Perception-Distortion Tra..." refers background in this paper

  • ...establish a complex mapping between LR and HR image pairs. The works in [16,15] were some of the early approaches to learn such a complex mapping using example-pairs of LR and HR training patches. In [18], the presence of patch redundancies across scales within an image was exploited to generate more realistic textures. This idea was further extended by [21] wherein self-dictionaries were constructed ...


Proceedings Article (DOI)
27 Jun 2016
TL;DR: In this paper, a deeply-recursive convolutional network (DRCN) was proposed for image super-resolution using a very deep recursive layer (up to 16 recursions).
Abstract: We propose an image super-resolution (SR) method using a deeply-recursive convolutional network (DRCN). Our network has a very deep recursive layer (up to 16 recursions). Increasing recursion depth can improve performance without introducing new parameters for additional convolutions. Despite these advantages, learning a DRCN is very hard with standard gradient descent due to exploding/vanishing gradients. To ease the difficulty of training, we propose two extensions: recursive supervision and skip-connection. Our method outperforms previous methods by a large margin.
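The key idea, reapplying one convolution recursively so that depth grows without new parameters and predicting a residual via a skip-connection, can be sketched as follows; the channel width, kernel size, and single skip path are illustrative simplifications rather than DRCN's exact architecture.

```python
# Minimal recursive-convolution sketch in the spirit of DRCN (PyTorch).
# Widths, depths, and the single residual skip are illustrative simplifications.
import torch
import torch.nn as nn

class RecursiveSR(nn.Module):
    def __init__(self, channels=64, recursions=16):
        super().__init__()
        self.embed = nn.Conv2d(1, channels, 3, padding=1)
        self.recursive = nn.Conv2d(channels, channels, 3, padding=1)  # weights shared across recursions
        self.reconstruct = nn.Conv2d(channels, 1, 3, padding=1)
        self.recursions = recursions

    def forward(self, x):                 # x: bicubic-upsampled LR image, shape (N, 1, H, W)
        h = torch.relu(self.embed(x))
        for _ in range(self.recursions):  # same convolution applied repeatedly: no new parameters
            h = torch.relu(self.recursive(h))
        return self.reconstruct(h) + x    # skip-connection: predict a residual over the input
```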

1,882 citations

Journal Article (DOI)
TL;DR: A learning-based method for low-level vision problems—estimating scenes from images with Bayesian belief propagation, applied to the “super-resolution” problem (estimating high frequency details from a low-resolution image), showing good results.
Abstract: We describe a learning-based method for low-level vision problems—estimating scenes from images. We generate a synthetic world of scenes and their corresponding rendered images, modeling their relationships with a Markov network. Bayesian belief propagation allows us to efficiently find a local maximum of the posterior probability for the scene, given an image. We call this approach VISTA—Vision by Image/Scene TrAining. We apply VISTA to the “super-resolution” problem (estimating high frequency details from a low-resolution image), showing good results. To illustrate the potential breadth of the technique, we also apply it in two other problem domains, both simplified. We learn to distinguish shading from reflectance variations in a single image under particular lighting conditions. For the motion estimation problem in a “blobs world”, we show figure/ground discrimination, solution of the aperture problem, and filling-in arising from application of the same probabilistic machinery.

1,647 citations


"Analyzing Perception-Distortion Tra..." refers background in this paper

  • ...The works in [16,15] were some of the early approaches to learn such a complex mapping using example-pairs of LR and HR training patches....


Posted Content
TL;DR: This work proposes an image super-resolution (SR) method using a deeply-recursive convolutional network (DRCN) with two extensions, recursive supervision and skip-connection, which outperforms previous methods by a large margin.
Abstract: We propose an image super-resolution (SR) method using a deeply-recursive convolutional network (DRCN). Our network has a very deep recursive layer (up to 16 recursions). Increasing recursion depth can improve performance without introducing new parameters for additional convolutions. Despite these advantages, learning a DRCN is very hard with standard gradient descent due to exploding/vanishing gradients. To ease the difficulty of training, we propose two extensions: recursive supervision and skip-connection. Our method outperforms previous methods by a large margin.

1,565 citations


"Analyzing Perception-Distortion Tra..." refers methods in this paper

  • ... SISR with deep networks gained momentum with the primal work of Chao et al. [12]. While [12] used a 3 layer convolutional neural network (CNN), the subsequent works used deeper network architectures [23,24] and new techniques to improve the restoration accuracy [31,20] and computational complexity [40,13]. Despite significant progress in both reconstruction accuracy and speed, a majority of the existing ...


  • ...ared in [11,12] (SRCNN) wherein a 3 layer network was employed to learn the mapping between the desired HR image and its bicubic up-sampled LR image. This was followed by deeper network architectures [23,24] promising performance improvement over SRCNN. [23] proposed to use residual-learning and gradient clipping with a high learning rate, whereas [24] reli...


Book Chapter (DOI)
01 Nov 2014
TL;DR: This work proposes A+, an improved variant of Anchored Neighborhood Regression (ANR), which combines the best qualities of ANR and SF: it builds on the features and anchored regressors from ANR, but instead of learning the regressors on the dictionary, it uses the full training material, similar to SF.
Abstract: We address the problem of image upscaling in the form of single image super-resolution based on a dictionary of low- and high-resolution exemplars. Two recently proposed methods, Anchored Neighborhood Regression (ANR) and Simple Functions (SF), provide state-of-the-art quality performance. Moreover, ANR is among the fastest known super-resolution methods. ANR learns sparse dictionaries and regressors anchored to the dictionary atoms. SF relies on clusters and corresponding learned functions. We propose A+, an improved variant of ANR, which combines the best qualities of ANR and SF. A+ builds on the features and anchored regressors from ANR but instead of learning the regressors on the dictionary it uses the full training material, similar to SF. We validate our method on standard images and compare with state-of-the-art methods. We obtain improved quality (i.e. 0.2–0.7 dB PSNR better than ANR) and excellent time complexity, rendering A+ the most efficient dictionary-based super-resolution method to date.
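At test time, anchored regression amounts to assigning each LR patch feature to its nearest dictionary atom and applying that atom's precomputed linear regressor. A minimal sketch of this inference step is below; the dictionary, the regressors, and the feature representation are assumed to have been learned offline and are placeholders here.

```python
# Illustrative inference step for anchored regression (ANR/A+-style), numpy.
# dictionary, regressors, and lr_features are placeholders learned offline.
import numpy as np

def anchored_regression(lr_features, dictionary, regressors):
    """lr_features: (N, d) LR patch features.
    dictionary:  (K, d) anchor atoms (assumed L2-normalized).
    regressors:  (K, p, d) one linear map per anchor, producing HR patches of length p."""
    sims = lr_features @ dictionary.T    # correlation of each feature with every atom
    nearest = np.argmax(sims, axis=1)    # index of the anchoring atom per feature
    # Apply each feature's anchored regressor: (p, d) @ (d,) -> (p,)
    hr_patches = np.einsum('npd,nd->np', regressors[nearest], lr_features)
    return hr_patches
```

Because the regressors are precomputed, inference reduces to a nearest-atom lookup and one matrix-vector product per patch, which is what makes this family of methods fast.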

1,418 citations


"Analyzing Perception-Distortion Tra..." refers methods in this paper

  • ...le dictionary. HR images from the web with similar contents were used within a structure-aware matching criterion to super-resolve landmark images in [48]. The class of neighbor embedding approaches [8,3,17,44,45] aim to find similar looking LR training patches from a low dimensional manifold and then combine their corresponding HR patches for resolution enhancement. The overfitting tendency of neighborhood appro...
