Book Chapter

Analyzing Perception-Distortion Tradeoff Using Enhanced Perceptual Super-Resolution Network

TL;DR: The proposed network, called enhanced perceptual super-resolution network (EPSR), is trained with a combination of mean squared error loss, perceptual loss, and adversarial loss and achieves the state-of-the-art trade-off between distortion and perceptual quality while the existing methods perform well in either of these measures alone.
Abstract: Convolutional neural network (CNN) based methods have recently achieved great success for image super-resolution (SR). However, most deep CNN based SR models attempt to improve distortion measures (e.g. PSNR, SSIM, IFC, VIF) while resulting in poor quantified perceptual quality (e.g. human opinion score, no-reference quality measures such as NIQE). Few works have attempted to improve the perceptual quality at the cost of performance reduction in distortion measures. A very recent study has revealed that distortion and perceptual quality are at odds with each other and there is always a trade-off between the two. Often the restoration algorithms that are superior in terms of perceptual quality are inferior in terms of distortion measures. Our work attempts to analyze the trade-off between distortion and perceptual quality for the problem of single image SR. To this end, we use the well-known SR architecture, the enhanced deep super-resolution (EDSR) network, and show that it can be adapted to achieve better perceptual quality for a specific range of the distortion measure. While the original network of EDSR was trained to minimize the error defined based on per-pixel accuracy alone, we train our network using a generative adversarial network framework with EDSR as the generator module. Our proposed network, called enhanced perceptual super-resolution network (EPSR), is trained with a combination of mean squared error loss, perceptual loss, and adversarial loss. Our experiments reveal that EPSR achieves the state-of-the-art trade-off between distortion and perceptual quality, whereas existing methods perform well in only one of these measures.
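As a rough illustration of the training objective described above, the following PyTorch-style sketch combines the three loss terms. The loss weights and the `vgg_features` and `discriminator` modules are hypothetical placeholders, not the paper's exact configuration:

```python
import torch
import torch.nn.functional as F

def epsr_style_loss(sr, hr, vgg_features, discriminator,
                    w_mse=1.0, w_perc=0.1, w_adv=1e-3):
    """Composite SR objective: MSE + perceptual + adversarial.

    `vgg_features` maps an image to a pretrained feature map;
    `discriminator` scores realism. All weights are illustrative.
    """
    # Per-pixel fidelity term (drives distortion measures such as PSNR).
    l_mse = F.mse_loss(sr, hr)

    # Perceptual term: match deep features of a fixed pretrained network.
    l_perc = F.mse_loss(vgg_features(sr), vgg_features(hr))

    # Adversarial term: generator loss against the discriminator.
    logits = discriminator(sr)
    l_adv = F.binary_cross_entropy_with_logits(
        logits, torch.ones_like(logits))

    return w_mse * l_mse + w_perc * l_perc + w_adv * l_adv
```

Varying the relative weights trades distortion against perceptual quality, which is the knob the paper's analysis turns.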

Citations
Posted Content
TL;DR: The model is an end-to-end deep residual neural network that is trained on a simulated data set to be free of common SIM artefacts, and is robust to noise and irregularities in the illumination patterns of the raw SIM input frames, making the technique compatible with real-time imaging.
Abstract: Structured illumination microscopy (SIM) has become an important technique for optical super-resolution imaging because it allows a doubling of image resolution at speeds compatible with live-cell imaging. However, the reconstruction of SIM images is often slow and prone to artefacts. Here we propose a versatile reconstruction method, ML-SIM, which makes use of machine learning. The model is an end-to-end deep residual neural network that is trained on a simulated data set to be free of common SIM artefacts. ML-SIM is thus robust to noise and irregularities in the illumination patterns of the raw SIM input frames. The reconstruction method is widely applicable and does not require the acquisition of experimental training data. Since the training data are generated from simulations of the SIM process on images from generic libraries, the method can be efficiently adapted to specific experimental SIM implementations. The reconstruction quality enabled by our method is compared with traditional SIM reconstruction methods, and we demonstrate advantages in terms of noise, reconstruction fidelity and contrast for both simulated and experimental inputs. In addition, reconstruction of one SIM frame typically takes only ~100 ms on PCs with modern Nvidia graphics cards, making the technique compatible with real-time imaging. The full implementation and the trained networks are available at this http URL.

11 citations


Cites background from "Analyzing Perception-Distortion Tra..."

  • ...root-mean-square deviation) rather than optimising the perceptual quality [26, 27]....

Posted Content
TL;DR: It is demonstrated that a light-weight SR network with a novel texture loss, trained specifically for JDSR, outperforms any combination of state-of-the-art deep denoising and SR networks.
Abstract: Denoising and super-resolution (SR) are fundamental tasks in imaging. These two restoration tasks are well covered in the literature, but only separately. Given a noisy low-resolution (LR) input image, it is yet unclear what the best approach would be in order to obtain a noise-free high-resolution (HR) image. In order to study joint denoising and super-resolution (JDSR), a dataset containing pairs of noisy LR images and the corresponding HR images is fundamental. We propose such a novel JDSR dataset, Widefield2SIM (W2S), acquired using microscopy equipment and techniques. W2S is comprised of 144,000 real fluorescence microscopy images, used to form a total of 360 sets of images. A set is comprised of noisy LR images with different noise levels, a noise-free LR image, and a corresponding high-quality HR image. W2S allows us to benchmark the combinations of 6 denoising methods and 6 SR methods. We show that state-of-the-art SR networks perform very poorly on noisy inputs, with a loss reaching 14 dB relative to noise-free inputs. Our evaluation also shows that applying the best denoiser in terms of reconstruction error followed by the best SR method does not yield the best result. The best denoising PSNR can, for instance, come at the expense of a loss in high frequencies, which is detrimental for SR methods. We lastly demonstrate that a light-weight SR network with a novel texture loss, trained specifically for JDSR, outperforms any combination of state-of-the-art deep denoising and SR networks.
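The abstract does not define the texture loss itself; a standard way such losses are built is by matching Gram matrices of deep features, sketched below as a generic illustration (not the paper's exact formulation):

```python
import torch

def gram_matrix(feat):
    """Channel-wise Gram matrix of a (N, C, H, W) feature map."""
    n, c, h, w = feat.shape
    f = feat.view(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def texture_loss(feat_sr, feat_hr):
    """Match second-order feature statistics (texture) between
    SR-output features and HR-target features."""
    return torch.mean((gram_matrix(feat_sr) - gram_matrix(feat_hr)) ** 2)
```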

11 citations


Cites methods from "Analyzing Perception-Distortion Tra..."

  • ...We use six state-of-the-art SR networks for the benchmark: four pixel-wise distortion-based SR networks, RCAN [52], RDN [53], SAN [11], SRFBN [24], and two perceptually-optimized SR networks, EPSR [42] and ESRGAN [44]....

  • ...Deep Learning for Super-resolution Since the first convolutional neural network for SR [12] outperformed conventional methods on synthetic datasets, many new architectures [21,25,38,42,44,52,53] and loss functions [20,23,36,49,54] have been proposed to improve the effectiveness and the efficiency of the networks....

  • ...Although the perception-based methods (EPSR and ESRGAN) are able to produce sharp results, they fail to reproduce faithful texture....

  • ...(a) RIDNet [2]+RCAN [52], (b) RIDNet [2]+RDN [53], (c) RIDNet [2]+SAN [11], (d) RIDNet [2]+SRFBN [24], (e) RIDNet [2]+EPSR [42], (f) DnCNN [47]+ESRGAN [44]....

Book Chapter
23 Aug 2020
TL;DR: In this article, a trade-off between the signal-to-noise ratio and spatial resolution on one side, and the integrity of the biological sample on the other side is discussed.
Abstract: In fluorescence microscopy live-cell imaging, there is a critical trade-off between the signal-to-noise ratio and spatial resolution on one side, and the integrity of the biological sample on the other side. To obtain clean high-resolution (HR) images, one can either use microscopy techniques, such as structured-illumination microscopy (SIM), or apply denoising and super-resolution (SR) algorithms. However, the former option requires multiple shots that can damage the samples, and although efficient deep learning based algorithms exist for the latter option, no benchmark exists to evaluate these algorithms on the joint denoising and SR (JDSR) tasks.

10 citations

Journal Article
Yifan Yang, Qi Li, Chenwei Yang, Yannian Fu, Huajun Feng, Zhihai Xu, Yueting Chen
TL;DR: A new convolutional neural network (CNN) is presented to improve the spatial resolution of infrared (IR) images; it is able to restore fine details by decomposing the input image into low-frequency and high-frequency domains.
Abstract: Due to hardware limitations, infrared (IR) images have low resolution (LR) and poor visual quality. Image super-resolution (SR) is a good solution to this problem. In this paper, we present a new convolutional neural network (CNN) to improve the spatial resolution of IR images. Our network is able to restore fine details by decomposing the input image into low-frequency and high-frequency domains. In the low-frequency domain, we reconstruct the image structure with deep networks; in the high-frequency domain, we reconstruct IR image details. Furthermore, we propose another network to remove artifacts. Additionally, we propose a new loss function that uses visible (VIS) images to enhance the details of IR images. In the training phase, we use VIS images to guide IR image restoration, and in the testing phase we obtain SR IR images from LR IR inputs alone. We optimize our deep network with a targeted function that penalizes images at different semantic levels using the corresponding terms. Besides, we build a dataset in which paired LR-VIS images of the same scene are captured by a camera with both infrared and visible light sensors sharing the same optical axis. Extensive experiments demonstrate that the proposed algorithm achieves superior performance and visual improvements over the state of the art.
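The abstract leaves the decomposition unspecified; a common way to obtain such low/high-frequency inputs is a Gaussian low-pass split, sketched below (the kernel size and sigma are arbitrary illustrative choices, not the paper's):

```python
import torch
import torch.nn.functional as F

def split_frequencies(img, ksize=5, sigma=1.5):
    """Split a (N, C, H, W) image into low- and high-frequency parts
    using a depthwise Gaussian low-pass filter; high = img - low."""
    ax = torch.arange(ksize, dtype=torch.float32) - (ksize - 1) / 2
    g1d = torch.exp(-ax ** 2 / (2 * sigma ** 2))
    g1d = g1d / g1d.sum()
    kernel = torch.outer(g1d, g1d)              # separable 2-D Gaussian
    c = img.shape[1]
    kernel = kernel.expand(c, 1, ksize, ksize)  # one filter per channel
    low = F.conv2d(img, kernel, padding=ksize // 2, groups=c)
    high = img - low
    return low, high
```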

9 citations

Proceedings Article
01 Mar 2020
TL;DR: This work proposes a divide and conquer approach based wide and deep network (WDN) that divides the 4× up-sampling problem into 32 disjoint subproblems that can be solved simultaneously and independently of each other.
Abstract: Divide and Conquer is a well-established approach in the literature that has efficiently solved a variety of problems. However, it is yet to be explored in full in solving image super-resolution. To predict a sharp up-sampled image, this work proposes a divide and conquer approach based wide and deep network (WDN) that divides the 4× up-sampling problem into 32 disjoint subproblems that can be solved simultaneously and independently of each other. Half of these subproblems deal with predicting the overall features of the high-resolution image, while the remaining are exclusively for predicting the finer details. Additionally, a technique found to be more effective in calibrating pixel intensities is proposed. Results obtained on multiple datasets demonstrate the improved performance of the proposed wide and deep network over state-of-the-art methods.
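The abstract does not spell out the decomposition, but one consistent reading for 4× SR is that each HR pixel falls on one of 4 × 4 = 16 sub-pixel positions, giving 16 position-wise subproblems per branch (and 32 across a structure branch and a detail branch). A hypothetical sketch of such position-wise heads, recombined with PixelShuffle:

```python
import torch
import torch.nn as nn

class SubpixelHeads(nn.Module):
    """For 4x SR, predict each of the 16 sub-pixel planes with its own
    head, then interleave them into the HR image (illustrative split,
    not the paper's exact architecture)."""

    def __init__(self, feat_ch=64, scale=4):
        super().__init__()
        self.heads = nn.ModuleList(
            [nn.Conv2d(feat_ch, 1, 3, padding=1)
             for _ in range(scale ** 2)])
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, feats):
        # (N, 16, H, W) planes -> (N, 1, 4H, 4W) upsampled image.
        planes = torch.cat([h(feats) for h in self.heads], dim=1)
        return self.shuffle(planes)
```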

9 citations


Cites background from "Analyzing Perception-Distortion Tra..."

  • ...There are a few deep networks that do realise the importance of high-frequency prediction such as [54, 39, 67, 83, 48, 86, 6, 7, 88, 80, 11, 44, 75, 76, 94], these techniques use the concepts of generative adversarial networks, perceptual loss, or both....

References
Proceedings Article
01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
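For reference, Adam's update rule for parameters θ with stochastic gradient g_t, step size α, and the paper's default hyper-parameters (β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸) is:

```latex
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, &
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2, \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^t}, &
\hat{v}_t &= \frac{v_t}{1 - \beta_2^t}, \\
\theta_t &= \theta_{t-1} - \alpha\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}. &&
\end{aligned}
```

The bias-corrected moments m̂_t and v̂_t counteract the zero-initialization of the moving averages in early steps.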

111,197 citations


"Analyzing Perception-Distortion Tra..." refers methods in this paper

  • ...We used ADAM [26] optimizer with a momentum of 0.9 and a batch size of 4....

  • ...We used ADAM [26] optimizer with a momentum of 0....

Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
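A minimal sketch of the paper's core design principle: stacking small 3×3 convolutions (two 3×3 layers cover a 5×5 receptive field with fewer parameters and an extra non-linearity than a single 5×5 layer). The stage widths below are illustrative, loosely following the VGG-16 configuration:

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, n_convs):
    """A VGG-style stage: n_convs 3x3 convs + ReLU, then 2x2 max-pool."""
    layers = []
    for _ in range(n_convs):
        layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1),
                   nn.ReLU(inplace=True)]
        in_ch = out_ch
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

# e.g. the opening stages of a VGG-16-like feature extractor:
features = nn.Sequential(vgg_block(3, 64, 2), vgg_block(64, 128, 2))
```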

49,914 citations

Journal Article
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and is validated through comparison with both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
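The resulting index for image patches x and y, with local means μ, variances σ², covariance σ_xy, and stabilizing constants C₁ and C₂, is:

```latex
\mathrm{SSIM}(x, y) =
\frac{(2\mu_x \mu_y + C_1)\,(2\sigma_{xy} + C_2)}
     {(\mu_x^2 + \mu_y^2 + C_1)\,(\sigma_x^2 + \sigma_y^2 + C_2)}
```

The index equals 1 only for identical patches and, unlike MSE, jointly compares luminance, contrast, and structure.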

40,609 citations

Proceedings Article
21 Jul 2017
TL;DR: SRGAN, presented in this paper, uses a perceptual loss function consisting of an adversarial loss and a content loss; the adversarial loss pushes the solution to the natural image manifold using a discriminator network trained to differentiate between super-resolved images and original photo-realistic images.
Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.
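In the paper's notation, the perceptual loss combines the content term with a 10⁻³-weighted adversarial term, where the VGG content loss compares feature maps φ_{i,j} of a pretrained VGG network between the HR image and the generator output G(I^LR):

```latex
l^{SR} = l^{SR}_{X} + 10^{-3}\, l^{SR}_{Gen},
\qquad
l^{SR}_{VGG/i,j} = \frac{1}{W_{i,j} H_{i,j}}
\sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}}
\Big( \phi_{i,j}\big(I^{HR}\big)_{x,y}
    - \phi_{i,j}\big(G(I^{LR})\big)_{x,y} \Big)^{2}
```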

6,884 citations

Book ChapterDOI
08 Oct 2016
TL;DR: In this paper, the authors combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image style transfer, where a feedforward network is trained to solve the optimization problem proposed by Gatys et al. in real-time.
Abstract: We consider image transformation problems, where an input image is transformed into an output image. Recent methods for such problems typically train feed-forward convolutional neural networks using a per-pixel loss between the output and ground-truth images. Parallel work has shown that high-quality images can be generated by defining and optimizing perceptual loss functions based on high-level features extracted from pretrained networks. We combine the benefits of both approaches, and propose the use of perceptual loss functions for training feed-forward networks for image transformation tasks. We show results on image style transfer, where a feed-forward network is trained to solve the optimization problem proposed by Gatys et al. in real-time. Compared to the optimization-based method, our network gives similar qualitative results but is three orders of magnitude faster. We also experiment with single-image super-resolution, where replacing a per-pixel loss with a perceptual loss gives visually pleasing results.
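A minimal sketch of such a feature-reconstruction (perceptual) loss using torchvision's pretrained VGG-16; the relu2_2 cutoff (index 8 of `vgg16().features`) is a common choice for super-resolution, and inputs are assumed to be ImageNet-normalized RGB tensors:

```python
import torch.nn.functional as F
from torchvision.models import vgg16

# Fixed pretrained feature extractor up to relu2_2 (no gradients needed).
_vgg = vgg16(weights="IMAGENET1K_V1").features[:9].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(output, target):
    """Compare deep-feature activations instead of raw pixels."""
    return F.mse_loss(_vgg(output), _vgg(target))
```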

6,639 citations