Book ChapterDOI

Analyzing Perception-Distortion Tradeoff Using Enhanced Perceptual Super-Resolution Network

TL;DR: The proposed network, called enhanced perceptual super-resolution network (EPSR), is trained with a combination of mean squared error loss, perceptual loss, and adversarial loss, and achieves a state-of-the-art trade-off between distortion and perceptual quality, whereas existing methods perform well on only one of these measures.
Abstract: Convolutional neural network (CNN) based methods have recently achieved great success for image super-resolution (SR). However, most deep CNN based SR models attempt to improve distortion measures (e.g. PSNR, SSIM, IFC, VIF) while resulting in poor quantified perceptual quality (e.g. human opinion score, no-reference quality measures such as NIQE). Few works have attempted to improve the perceptual quality at the cost of reduced performance in distortion measures. A very recent study has revealed that distortion and perceptual quality are at odds with each other, and there is always a trade-off between the two. Often, restoration algorithms that are superior in terms of perceptual quality are inferior in terms of distortion measures. Our work attempts to analyze the trade-off between distortion and perceptual quality for the problem of single image SR. To this end, we use the well-known SR architecture, the enhanced deep super-resolution (EDSR) network, and show that it can be adapted to achieve better perceptual quality for a specific range of the distortion measure. While the original EDSR network was trained to minimize an error defined on per-pixel accuracy alone, we train our network using a generative adversarial network framework with EDSR as the generator module. Our proposed network, called enhanced perceptual super-resolution network (EPSR), is trained with a combination of mean squared error loss, perceptual loss, and adversarial loss. Our experiments reveal that EPSR achieves a state-of-the-art trade-off between distortion and perceptual quality, whereas existing methods perform well on only one of these measures.
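To make the training objective concrete, here is a minimal sketch of how the three loss terms named in the abstract could be combined for the generator, assuming a PyTorch setup; the weightings, the vgg_features extractor, and the discriminator interface are illustrative placeholders, not the paper's exact configuration.

    import torch
    import torch.nn.functional as F

    def epsr_style_generator_loss(sr, hr, disc_logits_fake, vgg_features,
                                  w_mse=1.0, w_perc=1.0, w_adv=1e-3):
        # Per-pixel fidelity term (drives distortion measures such as PSNR).
        mse_loss = F.mse_loss(sr, hr)
        # Perceptual term: feature-space distance, e.g. on VGG activations.
        perc_loss = F.mse_loss(vgg_features(sr), vgg_features(hr))
        # Adversarial term: push the discriminator to rate SR outputs as real.
        adv_loss = F.binary_cross_entropy_with_logits(
            disc_logits_fake, torch.ones_like(disc_logits_fake))
        return w_mse * mse_loss + w_perc * perc_loss + w_adv * adv_loss

Sweeping the relative weights (in particular w_adv) is one way such a network can be moved along the perception-distortion curve that the paper analyzes.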


Citations
Journal ArticleDOI
TL;DR: A systematic survey of recent advances in image super-resolution using deep learning, grouping existing SR studies into three major categories: supervised SR, unsupervised SR, and domain-specific SR.
Abstract: Image Super-Resolution (SR) is an important class of image processing techniques to enhance the resolution of images and videos in computer vision. Recent years have witnessed remarkable progress in image super-resolution using deep learning techniques. This article aims to provide a comprehensive survey on recent advances in image super-resolution using deep learning approaches. In general, we can roughly group the existing studies of SR techniques into three major categories: supervised SR, unsupervised SR, and domain-specific SR. In addition, we also cover other important issues, such as publicly available benchmark datasets and performance evaluation metrics. Finally, we conclude this survey by highlighting several future directions and open issues that should be further addressed by the community.

837 citations


Cites background from "Analyzing Perception-Distortion Tra..."

  • ...In practice, researchers often combine multiple loss functions by weighted average [8], [25], [27], [46], [141] for constraining different aspects of the generation process, especially for distortion-perception tradeoff [25], [103], [142], [143], [144]....


Book ChapterDOI
08 Sep 2018
TL;DR: This paper reports on the 2018 PIRM challenge on perceptual super-resolution (SR), held in conjunction with the Perceptual Image Restoration and Manipulation (PIRM) workshop at ECCV 2018, and concludes with an analysis of the current trends in perceptual SR, as reflected in the leading submissions.
Abstract: This paper reports on the 2018 PIRM challenge on perceptual super-resolution (SR), held in conjunction with the Perceptual Image Restoration and Manipulation (PIRM) workshop at ECCV 2018. In contrast to previous SR challenges, our evaluation methodology jointly quantifies accuracy and perceptual quality, therefore enabling perceptually driven methods to compete alongside algorithms that target PSNR maximization. Twenty-one participating teams introduced algorithms which substantially improved upon the existing state-of-the-art methods in perceptual SR, as confirmed by a human opinion study. We also analyze popular image quality measures and draw conclusions regarding which of them correlates best with human opinion scores. We conclude with an analysis of the current trends in perceptual SR, as reflected in the leading submissions.

428 citations

Journal ArticleDOI
TL;DR: Both explainable and black-box models are suitable for solving practical problems, but machine learning experts need to understand the input data, the problem to solve, and the best way to present the output data before applying a machine learning model.
Abstract: Nowadays, in the international machine learning community, there is an extensive discussion about the use of black-box models versus explainable models, especially in practical problems. On the one hand, part of the community argues that black-box models are more accurate than explainable models in some contexts, such as image preprocessing. On the other hand, another part of the community argues that explainable models are better than black-box models because they can obtain comparable results while also explaining those results in a language close to that of a human expert by using patterns. In this paper, the advantages and weaknesses of each approach are shown, taking into account a state-of-the-art review of both approaches, their practical applications, trends, and future challenges. This paper shows that both approaches are suitable for solving practical problems, but experts in machine learning need to understand the input data, the problem to solve, and the best way to present the output data before applying a machine learning model. We also propose some ideas for fusing the explainable and black-box approaches to provide better solutions to experts in real-world domains. Additionally, we show one way to measure the effectiveness of the applied machine learning model by using expert opinions together with statistical methods. Throughout the paper, we show the impact of using explainable and black-box models in security and medical applications.

205 citations


Cites background from "Analyzing Perception-Distortion Tra..."

  • ...face frontal view generation [50], generating new human poses [51], photos to emojis [52], photograph editing [53]–[55], face aging [56], [57], photo blending [58], super resolution [59]–[61], photo inpainting [62]–[64], video prediction [65], and 3D object generation [66], [67]; among...


Proceedings ArticleDOI
01 Oct 2019
TL;DR: In this paper, the authors optimize a deep network-based decoder with a targeted objective function that penalizes images at different semantic levels using the corresponding terms, which results in more realistic textures and sharper edges.
Abstract: By benefiting from perceptual losses, recent studies have significantly improved the performance of the super-resolution task, in which a high-resolution image is resolved from its low-resolution counterpart. Although such objective functions generate near-photorealistic results, their capability is limited, since they estimate the reconstruction error for an entire image in the same way, without considering any semantic information. In this paper, we propose a novel method to benefit from perceptual loss in a more objective way. We optimize a deep network-based decoder with a targeted objective function that penalizes images at different semantic levels using the corresponding terms. In particular, the proposed method leverages our proposed OBB (Object, Background and Boundary) labels, generated from segmentation labels, to estimate a suitable perceptual loss for boundaries, while considering texture similarity for backgrounds. We show that our proposed approach results in more realistic textures and sharper edges, and outperforms other state-of-the-art algorithms in terms of both qualitative results on standard benchmarks and results of extensive user studies.
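As a rough illustration of the region-targeted objective this abstract describes, the sketch below applies different loss terms under object/background/boundary (OBB) masks; the mask format, the Gram-matrix texture term, and the pairing of losses to regions are assumptions for illustration, not the authors' implementation.

    import torch
    import torch.nn.functional as F

    def gram(feat):
        # Gram matrix of a (B, C, H, W) feature map, a common texture statistic.
        b, c, h, w = feat.shape
        f = feat.view(b, c, h * w)
        return f @ f.transpose(1, 2) / (c * h * w)

    def obb_loss(sr, hr, masks, features):
        # masks: dict of binary maps {"object", "background", "boundary"},
        # broadcastable to the image shape; features: a feature extractor.
        # Perceptual (feature-space) loss on boundaries for sharper edges.
        l_bnd = F.mse_loss(features(sr * masks["boundary"]),
                           features(hr * masks["boundary"]))
        # Texture similarity (here via Gram matrices) on backgrounds.
        l_bg = F.mse_loss(gram(features(sr * masks["background"])),
                          gram(features(hr * masks["background"])))
        # Plain per-pixel loss on object regions.
        l_obj = F.mse_loss(sr * masks["object"], hr * masks["object"])
        return l_bnd + l_bg + l_obj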

113 citations

Posted Content
TL;DR: This study performs a comprehensive survey of advancements in GAN design and optimization, proposes a new taxonomy that structures solutions by key research issue, and presents promising research directions in this rapidly growing field.
Abstract: Generative Adversarial Networks (GANs) are a class of deep generative models that has recently gained significant attention. GANs implicitly learn complex, high-dimensional distributions over images, audio, and other data. However, there exist major challenges in the training of GANs, namely mode collapse, non-convergence, and instability, caused by inappropriate network architecture design, choice of objective function, and selection of optimization algorithm. Recently, to address these challenges, several solutions for better design and optimization of GANs have been investigated, based on re-engineered network architectures, new objective functions, and alternative optimization algorithms. To the best of our knowledge, no existing survey has focused specifically on the broad and systematic development of these solutions. In this study, we perform a comprehensive survey of the advancements in GAN design and optimization proposed to handle these challenges. We first identify key research issues within each design and optimization technique and then propose a new taxonomy that structures solutions by key research issue. In accordance with the taxonomy, we provide a detailed discussion of the different GAN variants proposed within each solution and their relationships. Finally, based on the insights gained, we present promising research directions in this rapidly growing field.

88 citations


Cites background from "Analyzing Perception-Distortion Tra..."

  • ...Generation of realistic images has wide range of practical applications, such as anime character generation [159]–[164], image synthesis [165]–[168], super resolution [10], [124], [169]–[177], image editing and blending [178], [179], inpainting [125], [180]–[182], interactive image generation [183], [184], human pose estimation [185], [186], face aging [187], [188], 3D object detection [189]–[192], etc....


References
Journal ArticleDOI
TL;DR: The authors design three types of low-level statistical features in both spatial and frequency domains to quantify super-resolved artifacts, and learn a two-stage regression model to predict the quality scores of super-resolution images without referring to ground-truth images.

338 citations

Journal ArticleDOI
TL;DR: A sparse neighbor selection scheme for SR reconstruction is proposed, along with an extended Robust-SL0 algorithm that simultaneously finds the neighbors and solves for the reconstruction weights; it achieves competitive SR quality compared with other state-of-the-art baselines.
Abstract: Until now, neighbor-embedding-based (NE) algorithms for super-resolution (SR) have carried out two independent processes to synthesize high-resolution (HR) image patches. In the first process, neighbor search is performed using the Euclidean distance metric, and in the second process, the optimal weights are determined by solving a constrained least squares problem. However, the separate processes are not optimal. In this paper, we propose a sparse neighbor selection scheme for SR reconstruction. We first predetermine a larger number of neighbors as potential candidates and develop an extended Robust-SL0 algorithm to simultaneously find the neighbors and to solve the reconstruction weights. Recognizing that the k-nearest neighbor (k-NN) for reconstruction should have similar local geometric structures based on clustering, we employ a local statistical feature, namely histograms of oriented gradients (HoG) of low-resolution (LR) image patches, to perform such clustering. By conveying local structural information of HoG in the synthesis stage, the k-NN of each LR input patch is adaptively chosen from their associated subset, which significantly improves the speed of synthesizing the HR image while preserving the quality of reconstruction. Experimental results suggest that the proposed method can achieve competitive SR quality compared with other state-of-the-art baselines.
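For context, the classic two-step neighbor-embedding baseline that this abstract contrasts with can be sketched in a few lines; the closed-form, sum-to-one constrained least squares below is the standard LLE-style weight solve, with the patch vectors and the regularizer chosen for illustration.

    import numpy as np

    def ne_weights(lr_patch, lr_neighbors):
        # lr_patch: (d,) LR patch vector; lr_neighbors: (k, d) matrix of its
        # k nearest LR training patches (found beforehand, e.g. by Euclidean
        # distance). Solves min ||lr_patch - w @ lr_neighbors||^2 s.t. sum(w) = 1.
        diffs = lr_neighbors - lr_patch           # local differences
        G = diffs @ diffs.T                       # local Gram matrix
        G += 1e-8 * np.trace(G) * np.eye(len(G))  # regularize for stability
        w = np.linalg.solve(G, np.ones(len(G)))
        return w / w.sum()                        # enforce the sum-to-one constraint

    # The HR patch is then synthesized with the same weights:
    # hr_patch = ne_weights(x_lr, N_lr) @ N_hr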

310 citations


"Analyzing Perception-Distortion Tra..." refers methods in this paper

  • ...le dictionary. HR images from the web with similar contents were used within a structure-aware matching criterion to super-resolve landmark images in [48]. The class of neighbor embedding approaches [8,3,17,44,45] aim to find similar-looking LR training patches from a low-dimensional manifold and then combine their corresponding HR patches for resolution enhancement. The overfitting tendency of neighborhood appro...


Posted Content
TL;DR: Three types of low-level statistical features in both spatial and frequency domains are designed to quantify super-resolved artifacts and a two-stage regression model is learned to predict the quality scores of super-resolution images without referring to ground-truth images.
Abstract: Numerous single-image super-resolution algorithms have been proposed in the literature, but few studies address the problem of performance evaluation based on visual perception. While most super-resolution images are evaluated by full-reference metrics, the effectiveness is not clear and the required ground-truth images are not always available in practice. To address these problems, we conduct human subject studies using a large set of super-resolution images and propose a no-reference metric learned from visual perceptual scores. Specifically, we design three types of low-level statistical features in both spatial and frequency domains to quantify super-resolved artifacts, and learn a two-stage regression model to predict the quality scores of super-resolution images without referring to ground-truth images. Extensive experimental results show that the proposed metric is effective and efficient in assessing the quality of super-resolution images based on human perception.
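The two-stage regression idea can be sketched generically as below; the choice of random forests per feature group followed by a linear combination is an assumption for illustration, not necessarily the authors' exact regressors.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression

    def fit_two_stage(feature_groups, scores):
        # feature_groups: list of (n_images, d_i) arrays, one per feature type
        # (e.g. spatial and frequency statistics); scores: (n_images,) human
        # perceptual scores collected in the subject studies.
        stage1, preds = [], []
        for X in feature_groups:
            reg = RandomForestRegressor(n_estimators=100).fit(X, scores)
            stage1.append(reg)
            preds.append(reg.predict(X))
        # Stage 2 combines the per-group predictions into one quality score.
        stage2 = LinearRegression().fit(np.stack(preds, axis=1), scores)
        return stage1, stage2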

267 citations


"Analyzing Perception-Distortion Tra..." refers methods in this paper

  • ...PI is computed by combining the quality measures of Ma-score [32] and NIQE [36] as follows: PI = 1/2 ((10 − Ma-score) + NIQE) (7). Note that a lower PI indicates better perceptual quality.... (See the sketch below.)

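The quoted Eq. (7) is straightforward to compute once the two no-reference scores are available; in this minimal sketch, ma_score and niqe are assumed to come from the reference implementations of Ma et al. [32] and NIQE [36].

    def perceptual_index(ma_score, niqe):
        # Eq. (7): lower PI means better perceptual quality. Ma-score is
        # roughly in [0, 10] with higher = better; NIQE is lower = better.
        return 0.5 * ((10.0 - ma_score) + niqe)

    # e.g. perceptual_index(8.2, 3.1) -> 2.45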

Book ChapterDOI
08 Sep 2018
TL;DR: The Contextual Loss, as presented in this paper, is based on both context and semantics: it compares regions with similar semantic meaning while considering the context of the entire image, so that, for example, style transfer between faces maps eyes to eyes and mouth to mouth.
Abstract: Feed-forward CNNs trained for image transformation problems rely on loss functions that measure the similarity between the generated image and a target image. Most of the common loss functions assume that these images are spatially aligned and compare pixels at corresponding locations. However, for many tasks, aligned training pairs of images will not be available. We present an alternative loss function that does not require alignment, thus providing an effective and simple solution for a new space of problems. Our loss is based on both context and semantics – it compares regions with similar semantic meaning, while considering the context of the entire image. Hence, for example, when transferring the style of one face to another, it will translate eyes-to-eyes and mouth-to-mouth. Our code can be found at https://www.github.com/roimehrez/contextualLoss.
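A compact sketch of the contextual (CX) matching idea the abstract describes is given below, loosely following the released formulation; fx and fy stand for flattened feature sets (e.g. VGG activations) and the bandwidth h is illustrative.

    import torch

    def contextual_loss(fx, fy, h=0.5, eps=1e-5):
        # fx: (Nx, C) source features, fy: (Ny, C) target features.
        mu = fy.mean(dim=0, keepdim=True)            # center by the target mean
        x = fx - mu
        y = fy - mu
        x = x / (x.norm(dim=1, keepdim=True) + eps)
        y = y / (y.norm(dim=1, keepdim=True) + eps)
        d = 1.0 - x @ y.t()                          # cosine distances d_ij
        d = d / (d.min(dim=1, keepdim=True).values + eps)  # relative distances
        w = torch.exp((1.0 - d) / h)                 # similarity -> affinity
        a = w / w.sum(dim=1, keepdim=True)           # row-normalized affinities
        cx = a.max(dim=0).values.mean()              # best match per target feature
        return -torch.log(cx + eps)

Because each target feature is matched to its most contextually similar source feature rather than the pixel at the same location, the loss tolerates spatially misaligned training pairs.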

244 citations

Posted Content
TL;DR: This work presents an alternative loss function that does not require alignment, thus providing an effective and simple solution for a new space of problems.
Abstract: Feed-forward CNNs trained for image transformation problems rely on loss functions that measure the similarity between the generated image and a target image. Most of the common loss functions assume that these images are spatially aligned and compare pixels at corresponding locations. However, for many tasks, aligned training pairs of images will not be available. We present an alternative loss function that does not require alignment, thus providing an effective and simple solution for a new space of problems. Our loss is based on both context and semantics -- it compares regions with similar semantic meaning, while considering the context of the entire image. Hence, for example, when transferring the style of one face to another, it will translate eyes-to-eyes and mouth-to-mouth. Our code can be found at https://www.github.com/roimehrez/contextualLoss.

237 citations


"Analyzing Perception-Distortion Tra..." refers background in this paper

  • ...In comparison to ENet[39], the presence of noise and unrealistic texture is less for the case of CX[35] while maintaining a comparable level of detail enhancement....


  • ...2 corresponds to failure case of ENet[39], CX[35], and BNet3 wherein all of them resulted in texture patterns that are very different from the GT, whereas EPSR3 has succeeded in generating outputs that are more faithful to the GT image....
