Book ChapterDOI

Analyzing Perception-Distortion Tradeoff Using Enhanced Perceptual Super-Resolution Network

TL;DR: The proposed network, called enhanced perceptual super-resolution network (EPSR), is trained with a combination of mean squared error loss, perceptual loss, and adversarial loss, and achieves a state-of-the-art trade-off between distortion and perceptual quality, whereas existing methods perform well in only one of these measures.
Abstract: Convolutional neural network (CNN) based methods have recently achieved great success for image super-resolution (SR). However, most deep CNN based SR models attempt to improve distortion measures (e.g. PSNR, SSIM, IFC, VIF) while resulting in poor quantified perceptual quality (e.g. human opinion score, no-reference quality measures such as NIQE). Few works have attempted to improve the perceptual quality at the cost of performance reduction in distortion measures. A very recent study has revealed that distortion and perceptual quality are at odds with each other and there is always a trade-off between the two. Often the restoration algorithms that are superior in terms of perceptual quality are inferior in terms of distortion measures. Our work attempts to analyze the trade-off between distortion and perceptual quality for the problem of single image SR. To this end, we use the well-known SR architecture, the enhanced deep super-resolution (EDSR) network, and show that it can be adapted to achieve better perceptual quality for a specific range of the distortion measure. While the original EDSR network was trained to minimize an error defined on per-pixel accuracy alone, we train our network using a generative adversarial network framework with EDSR as the generator module. Our proposed network, called enhanced perceptual super-resolution network (EPSR), is trained with a combination of mean squared error loss, perceptual loss, and adversarial loss. Our experiments reveal that EPSR achieves a state-of-the-art trade-off between distortion and perceptual quality, while existing methods perform well in only one of these measures.
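
The following PyTorch snippet is a minimal sketch of the kind of composite generator loss the abstract describes (MSE + VGG-based perceptual + adversarial). The loss weights, the VGG feature cut-off, and the helper names are illustrative assumptions, not EPSR's exact configuration.

```python
# Minimal sketch of the composite generator loss described in the abstract
# (MSE + perceptual + adversarial). Loss weights, the VGG feature cut-off,
# and input normalization are illustrative assumptions, not EPSR's settings.
import torch
import torch.nn.functional as F
from torchvision.models import vgg19

# Frozen VGG-19 feature extractor for the perceptual loss
# (newer torchvision versions use the weights= argument instead of pretrained=).
_vgg = vgg19(pretrained=True).features[:36].eval()
for p in _vgg.parameters():
    p.requires_grad = False

def generator_loss(sr, hr, disc_fake_logits, w_mse=1.0, w_vgg=0.05, w_adv=1e-3):
    """sr: super-resolved batch, hr: ground truth, disc_fake_logits: D(sr)."""
    mse = F.mse_loss(sr, hr)                           # per-pixel fidelity
    perceptual = F.mse_loss(_vgg(sr), _vgg(hr))        # feature-space similarity
    adversarial = F.binary_cross_entropy_with_logits(  # reward fooling the discriminator
        disc_fake_logits, torch.ones_like(disc_fake_logits))
    return w_mse * mse + w_vgg * perceptual + w_adv * adversarial
```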


Citations
Journal ArticleDOI
TL;DR: A systematic survey of recent advances in image super-resolution using deep learning, which roughly groups existing SR studies into three major categories: supervised SR, unsupervised SR, and domain-specific SR.
Abstract: Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress in image super-resolution using deep learning techniques. This article aims to provide a comprehensive survey of recent advances in image super-resolution using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover some other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues that should be further addressed by the community.

837 citations


Cites background from "Analyzing Perception-Distortion Tra..."

  • ...In practice, researchers often combine multiple loss functions by weighted average [8], [25], [27], [46], [141] for constraining different aspects of the generation process, especially for distortion-perception tradeoff [25], [103], [142], [143], [144]....


Book ChapterDOI
08 Sep 2018
TL;DR: This paper reports on the 2018 PIRM challenge on perceptual super-resolution (SR), held in conjunction with the Perceptual Image Restoration and Manipulation (PIRM) workshop at ECCV 2018, and concludes with an analysis of the current trends in perceptual SR, as reflected from the leading submissions.
Abstract: This paper reports on the 2018 PIRM challenge on perceptual super-resolution (SR), held in conjunction with the Perceptual Image Restoration and Manipulation (PIRM) workshop at ECCV 2018. In contrast to previous SR challenges, our evaluation methodology jointly quantifies accuracy and perceptual quality, therefore enabling perceptual-driven methods to compete alongside algorithms that target PSNR maximization. Twenty-one participating teams introduced algorithms which improved upon the existing state-of-the-art methods in perceptual SR, as confirmed by a human opinion study. We also analyze popular image quality measures and draw conclusions regarding which of them correlates best with human opinion scores. We conclude with an analysis of the current trends in perceptual SR, as reflected in the leading submissions.

428 citations

Journal ArticleDOI
TL;DR: Both explainable and black-box models are suitable for solving practical problems, but machine learning experts need to understand the input data, the problem to solve, and the best way to present the output data before applying a machine learning model.
Abstract: Nowadays, in the international scientific community of machine learning, there exists an enormous discussion about the use of black-box models or explainable models, especially in practical problems. On the one hand, part of the community defends that black-box models are more accurate than explainable models in some contexts, like image preprocessing. On the other hand, another part of the community argues that explainable models are better than black-box models because they can obtain comparable results and can also explain these results, using patterns, in a language close to that of a human expert. In this paper, the advantages and weaknesses of each approach are shown, taking into account a state-of-the-art review of both approaches, their practical applications, trends, and future challenges. This paper shows that both approaches are suitable for solving practical problems, but experts in machine learning need to understand the input data, the problem to solve, and the best way to present the output data before applying a machine learning model. Also, we propose some ideas for fusing both explainable and black-box approaches to provide better solutions to experts in real-world domains. Additionally, we show one way to measure the effectiveness of the applied machine learning model by using expert opinions jointly with statistical methods. Throughout this paper, we show the impact of using explainable and black-box models in security and medical applications.

205 citations


Cites background from "Analyzing Perception-Distortion Tra..."

  • ...face frontal view generation [50], generating new human poses [51], photos to emojis [52], photograph editing [53]–[55], face aging [56], [57], photo blending [58], super resolution [59]–[61], photo inpainting [62]–[64], video prediction [65], and 3D object generation [66], [67]; among...


Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, the authors optimize a deep network-based decoder with a targeted objective function that penalizes images at different semantic levels using the corresponding terms, which results in more realistic textures and sharper edges.
Abstract: By benefiting from perceptual losses, recent studies have significantly improved the performance of the super-resolution task, where a high-resolution image is resolved from its low-resolution counterpart. Although such objective functions generate near-photorealistic results, their capability is limited, since they estimate the reconstruction error for an entire image in the same way, without considering any semantic information. In this paper, we propose a novel method to benefit from perceptual loss in a more objective way. We optimize a deep network-based decoder with a targeted objective function that penalizes images at different semantic levels using the corresponding terms. In particular, the proposed method leverages our proposed OBB (Object, Background and Boundary) labels, generated from segmentation labels, to estimate a suitable perceptual loss for boundaries, while considering texture similarity for backgrounds. We show that our proposed approach results in more realistic textures and sharper edges, and outperforms other state-of-the-art algorithms in terms of both qualitative results on standard benchmarks and results of extensive user studies.
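
As a rough illustration of the region-targeted objective described above (not the authors' exact OBB formulation), a loss can weight object, background, and boundary regions differently via binary masks; the mask source, the perceptual network, and the weights below are assumptions.

```python
# Rough sketch of a region-weighted objective in the spirit of the OBB idea:
# separate terms for object, background, and boundary regions selected by
# binary masks. Masks, perceptual_fn, and weights are illustrative assumptions.
import torch.nn.functional as F

def region_weighted_loss(sr, hr, masks, perceptual_fn,
                         w_obj=1.0, w_bg=1.0, w_bnd=2.0):
    """masks: dict of binary masks 'object', 'background', 'boundary' (N,1,H,W)."""
    obj = F.l1_loss(sr * masks['object'], hr * masks['object'])         # object fidelity
    bg = F.l1_loss(sr * masks['background'], hr * masks['background'])  # background texture
    bnd = F.mse_loss(perceptual_fn(sr * masks['boundary']),             # perceptual term
                     perceptual_fn(hr * masks['boundary']))             # emphasised at edges
    return w_obj * obj + w_bg * bg + w_bnd * bnd
```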

113 citations

Posted Content
TL;DR: This study performs a comprehensive survey of the advancements in GAN design and optimization solutions, proposes a new taxonomy to structure solutions by key research issues, and presents promising research directions in this rapidly growing field.
Abstract: Generative Adversarial Networks (GANs) are a novel class of deep generative models that has recently gained significant attention. GANs learn complex and high-dimensional distributions implicitly over images, audio, and other data. However, there exist major challenges in the training of GANs, i.e., mode collapse, non-convergence, and instability, due to inappropriate design of the network architecture, use of the objective function, and selection of the optimization algorithm. Recently, to address these challenges, several solutions for better design and optimization of GANs have been investigated based on techniques of re-engineered network architectures, new objective functions, and alternative optimization algorithms. To the best of our knowledge, there is no existing survey that has particularly focused on the broad and systematic development of these solutions. In this study, we perform a comprehensive survey of the advancements in GAN design and optimization solutions proposed to handle GAN challenges. We first identify key research issues within each design and optimization technique and then propose a new taxonomy to structure solutions by key research issues. In accordance with the taxonomy, we provide a detailed discussion of the different GAN variants proposed within each solution and their relationships. Finally, based on the insights gained, we present promising research directions in this rapidly growing field.

88 citations


Cites background from "Analyzing Perception-Distortion Tra..."

  • ...Generation of realistic images has wide range of practical applications, such as anime character generation [159]–[164], image synthesis [165]–[168], super resolution [10], [124], [169]–[177], image editing and blending [178], [179], inpainting [125], [180]–[182], interactive image generation [183], [184], human pose estimation [185], [186], face aging [187], [188], 3D object detection [189]–[192], etc....


References
Posted Content
TL;DR: SRGAN, a generative adversarial network (GAN) for image super-resolution (SR), is presented; to the authors' knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors, using a perceptual loss function which consists of an adversarial loss and a content loss.
Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.
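
The adversarial component described above alternates between a discriminator update (separating real HR images from super-resolved ones) and a generator term that rewards fooling the discriminator. The sketch below assumes a generic discriminator D, optimizer, and data tensors; it is not SRGAN's exact implementation.

```python
# Sketch of the adversarial objective described above: the discriminator D learns
# to separate real HR images from super-resolved ones, and the generator gets an
# adversarial term for fooling D. D, the optimizer, and tensors are placeholders.
import torch
import torch.nn.functional as F

def discriminator_step(D, hr, sr, opt_d):
    real_logits = D(hr)
    fake_logits = D(sr.detach())        # detach: only D is updated in this step
    loss_d = (
        F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
        + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    )
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    return loss_d.item()

def adversarial_generator_term(D, sr):
    # Non-saturating form: push D toward classifying the SR output as real.
    fake_logits = D(sr)
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))
```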

4,404 citations


"Analyzing Perception-Distortion Tra..." refers background or methods or result in this paper

  • ...Similar to the work in [28], we used VGG54 as the feature extraction layer (i....


  • ...The work in [28] came up with a deeper architecture made of residual blocks for LR feature learning, called SRResNet....


  • ...Previous studies on perceptual SISR [28,39] have shown that the use of perceptual loss LVGG can provide further boost in the detail enhancement if used along with adversarial loss....


  • ...This is the main motivation behind the recent works on SISR [22,28,39,34] that came up with new ways to improve the perceptual quality of reconstructed images....


  • ...In the context of SISR, the optimal MSE estimator returns the mean of many possible solutions [28,39] which often leads to blurry, overly smooth, and unnatural appearance in the output, especially at the information-rich regions....


Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, a very deep convolutional network inspired by VGG-net was used for image superresolution, which achieved state-of-the-art performance in accuracy.
Abstract: We present a highly accurate single-image super-resolution (SR) method. Our method uses a very deep convolutional network inspired by VGG-net used for ImageNet classification [19]. We find increasing our network depth shows a significant improvement in accuracy. Our final model uses 20 weight layers. By cascading small filters many times in a deep network structure, contextual information over large image regions is exploited in an efficient way. With very deep networks, however, convergence speed becomes a critical issue during training. We propose a simple yet effective training procedure. We learn residuals only and use extremely high learning rates (10^4 times higher than SRCNN [6]) enabled by adjustable gradient clipping. Our proposed method performs better than existing methods in accuracy and visual improvements in our results are easily noticeable.
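
The two training ideas mentioned in the abstract, residual learning and adjustable gradient clipping (clipping bounds scaled by the current learning rate), can be sketched as follows; the network, theta, and the MSE objective wiring are illustrative placeholders rather than the paper's exact values.

```python
# Sketch of residual learning plus adjustable gradient clipping as described
# above: the network predicts only the residual, and gradients are clipped to
# [-theta/lr, theta/lr] so very high learning rates stay stable. The network,
# theta, and learning-rate schedule are illustrative placeholders.
import torch
import torch.nn.functional as F

def train_step(net, lr_upscaled, hr, optimizer, lr, theta=0.01):
    residual = net(lr_upscaled)          # predict missing high-frequency detail
    sr = lr_upscaled + residual          # add it back to the upscaled input
    loss = F.mse_loss(sr, hr)

    optimizer.zero_grad()
    loss.backward()
    bound = theta / lr                   # adjustable clipping range
    for p in net.parameters():
        if p.grad is not None:
            p.grad.clamp_(-bound, bound)
    optimizer.step()
    return loss.item()
```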

4,136 citations

Journal ArticleDOI
TL;DR: This work has recently derived a blind IQA model that only makes use of measurable deviations from statistical regularities observed in natural images, without training on human-rated distorted images, and, indeed, without any exposure to distorted images.
Abstract: An important aim of research on the blind image quality assessment (IQA) problem is to devise perceptual models that can predict the quality of distorted images with as little prior knowledge of the images or their distortions as possible. Current state-of-the-art “general purpose” no reference (NR) IQA algorithms require knowledge about anticipated distortions in the form of training examples and corresponding human opinion scores. However we have recently derived a blind IQA model that only makes use of measurable deviations from statistical regularities observed in natural images, without training on human-rated distorted images, and, indeed without any exposure to distorted images. Thus, it is “completely blind.” The new IQA model, which we call the Natural Image Quality Evaluator (NIQE) is based on the construction of a “quality aware” collection of statistical features based on a simple and successful space domain natural scene statistic (NSS) model. These features are derived from a corpus of natural, undistorted images. Experimental results show that the new index delivers performance comparable to top performing NR IQA models that require training on large databases of human opinions of distorted images. A software release is available at http://live.ece.utexas.edu/research/quality/niqe_release.zip.
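
The "quality aware" statistical features mentioned above are built on mean-subtracted, contrast-normalized (MSCN) coefficients of the luminance image; a minimal sketch of that core step follows (the window width and stabilizing constant are conventional choices, not necessarily those of the released implementation). NIQE then fits generalized Gaussian models to these coefficients and their pairwise products and compares the fitted parameters with a model learned from pristine images.

```python
# Sketch of the MSCN (mean-subtracted, contrast-normalized) coefficients that
# underlie NIQE's natural scene statistics features. The Gaussian window width
# and the stabilizing constant C are conventional choices, stated as assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def mscn_coefficients(gray, sigma=7.0 / 6.0, C=1.0):
    """gray: 2-D float array of luminance values."""
    gray = gray.astype(np.float64)
    mu = gaussian_filter(gray, sigma)                    # local mean
    var = gaussian_filter(gray * gray, sigma) - mu * mu  # local variance
    sigma_map = np.sqrt(np.clip(var, 0.0, None))         # local std
    return (gray - mu) / (sigma_map + C)                 # normalized coefficients
```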

3,722 citations


"Analyzing Perception-Distortion Tra..." refers methods in this paper

  • ...PI is computed by combining the quality measures of Ma-score [32] and NIQE [36] as follows: PI = 1/2 ((10 − Ma-score) + NIQE) (7). Note that a lower PI indicates better perceptual quality....
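
Taken directly from the formula in the snippet above, the perceptual index can be computed as below; the Ma-score and NIQE values are assumed to come from their respective reference implementations.

```python
# Perceptual index (PI) exactly as given in the snippet above; lower is better.
# Ma-score and NIQE values are assumed to be produced by their own toolboxes.
def perceptual_index(ma_score: float, niqe: float) -> float:
    return 0.5 * ((10.0 - ma_score) + niqe)
```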


Posted Content
TL;DR: This work presents a highly accurate single-image superresolution (SR) method using a very deep convolutional network inspired by VGG-net used for ImageNet classification and uses extremely high learning rates enabled by adjustable gradient clipping.
Abstract: We present a highly accurate single-image super-resolution (SR) method. Our method uses a very deep convolutional network inspired by VGG-net used for ImageNet classification [Simonyan and Zisserman, 2015]. We find increasing our network depth shows a significant improvement in accuracy. Our final model uses 20 weight layers. By cascading small filters many times in a deep network structure, contextual information over large image regions is exploited in an efficient way. With very deep networks, however, convergence speed becomes a critical issue during training. We propose a simple yet effective training procedure. We learn residuals only and use extremely high learning rates (10^4 times higher than SRCNN [Dong et al., 2015]) enabled by adjustable gradient clipping. Our proposed method performs better than existing methods in accuracy and visual improvements in our results are easily noticeable.

3,628 citations


"Analyzing Perception-Distortion Tra..." refers background or methods in this paper

  • ...This was followed by deeper network architectures [23,24] promising...


  • ...[23] proposed to use residual-learning and gradient clipping with a high-learning rate, whereas [24] relied on a deep recursive layer architecture....


  • ...While [12] used a 3 layer convolutional neural network (CNN), the subsequent works used deeper network architectures [23,24] and new techniques to improve the restoration accuracy [31,20] and computational complexity [40,13]....


Journal ArticleDOI
TL;DR: The goal of this article is to introduce the concept of SR algorithms to readers who are unfamiliar with this area and to provide a review for experts, presenting a technical review of various existing SR methodologies that are often employed.
Abstract: A new approach toward increasing spatial resolution is required to overcome the limitations of the sensors and optics manufacturing technology. One promising approach is to use signal processing techniques to obtain a high-resolution (HR) image (or sequence) from observed multiple low-resolution (LR) images. Such a resolution enhancement approach has been one of the most active research areas, and it is called super resolution (SR) (or HR) image reconstruction or simply resolution enhancement. In this article, we use the term "SR image reconstruction" to refer to a signal processing approach toward resolution enhancement because the term "super" in "super resolution" represents very well the characteristics of the technique overcoming the inherent resolution limitation of LR imaging systems. The major advantage of the signal processing approach is that it may cost less and the existing LR imaging systems can still be utilized. SR image reconstruction has proved to be useful in many practical cases where multiple frames of the same scene can be obtained, including medical imaging, satellite imaging, and video applications. The goal of this article is to introduce the concept of SR algorithms to readers who are unfamiliar with this area and to provide a review for experts. To this purpose, we present a technical review of various existing SR methodologies which are often employed. Before presenting the review of existing SR algorithms, we first model the LR image acquisition process.
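
The LR acquisition model referred to in the last sentence (an observed frame as a blurred, downsampled, noisy version of the HR scene) can be sketched as follows; the blur width, decimation factor, and noise level are illustrative assumptions, not values from the article.

```python
# Sketch of the standard LR observation model referenced above:
# LR = downsample(blur(HR)) + noise. Blur width, scale, and noise level are
# illustrative assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def simulate_lr(hr, scale=4, blur_sigma=1.2, noise_std=0.01, rng=None):
    """hr: 2-D float array in [0, 1]; returns one simulated LR observation."""
    rng = np.random.default_rng() if rng is None else rng
    blurred = gaussian_filter(hr, blur_sigma)         # optical / sensor blur
    lr = blurred[::scale, ::scale]                    # decimation to the sensor grid
    lr = lr + rng.normal(0.0, noise_std, lr.shape)    # additive observation noise
    return np.clip(lr, 0.0, 1.0)
```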

3,491 citations


"Analyzing Perception-Distortion Tra..." refers background in this paper

  • ...Though there exist extensive literature studies on multi-image SR [6,38,14], here we limit our discussions to SISR works alone....
