Proceedings ArticleDOI

IRGUN : Improved Residue Based Gradual Up-Scaling Network for Single Image Super Resolution

TL;DR: A novel Improved Residual based Gradual Up-Scaling Network (IRGUN) is proposed to improve the quality of the super-resolved image at large magnification factors, recovering fine details effectively at large (8X) magnification.
Abstract: Convolutional neural network based architectures have achieved decent perceptual-quality super-resolution on natural images for small scaling factors (2X and 4X). However, image super-resolution for large magnification factors (8X) is an extremely challenging problem for the computer vision community. In this paper, we propose a novel Improved Residual based Gradual Up-Scaling Network (IRGUN) to improve the quality of the super-resolved image for a large magnification factor. IRGUN has a Gradual Upsampling and Residue-based Enhancement Network (GUREN), which comprises a series of Up-scaling and Enhancement Blocks (UEB) connected end-to-end and fine-tuned together to give gradual magnification and enhancement. Owing to the perceptual importance of luminance in super-resolution, the model is trained on the luminance (Y) channel of the YCbCr image, whereas the chrominance channels (Cb and Cr) are up-scaled using bicubic interpolation and combined with the super-resolved Y channel, after which the image is converted to RGB. A cascaded 3D-RED architecture trained on RGB images is utilized to incorporate inter-channel correlation. In addition, the training methodology is also presented in the paper. In the training procedure, the weights of the previous UEB are used to initialize the next immediate UEB for faster and better convergence. Each UEB is trained at its respective scale, taking the output image of the previous UEB as input and the HR image of the same scale as ground truth. All the UEBs are then connected end-to-end and fine-tuned. IRGUN recovers fine details effectively at large (8X) magnification factors. The efficiency of IRGUN is demonstrated on various benchmark datasets and at different magnification scales.
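The gradual up-scaling idea is easy to picture in code. Below is a minimal PyTorch sketch, assuming three cascaded 2X up-scale-and-enhance blocks to reach 8X; the class names, layer counts, and channel widths are illustrative assumptions, not the paper's released design.

```python
# Hedged sketch of gradual up-scaling with residue-based enhancement.
# UEBlock / GradualSR are hypothetical names; depths and widths are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UEBlock(nn.Module):
    """One Up-scaling and Enhancement Block: 2x upsample plus residual refinement."""
    def __init__(self, channels=64):
        super().__init__()
        self.head = nn.Conv2d(1, channels, 3, padding=1)   # luminance (Y) channel in
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.tail = nn.Conv2d(channels, 1, 3, padding=1)   # predicted residue

    def forward(self, y):
        up = F.interpolate(y, scale_factor=2, mode='bicubic', align_corners=False)
        return up + self.tail(self.body(self.head(up)))    # enhance the upsampled image

class GradualSR(nn.Module):
    """8x SR as three cascaded 2x UEBs, trained stage-wise, then fine-tuned end-to-end."""
    def __init__(self):
        super().__init__()
        self.blocks = nn.ModuleList([UEBlock() for _ in range(3)])  # 2x * 2x * 2x = 8x

    def forward(self, y):
        for block in self.blocks:
            y = block(y)
        return y

model = GradualSR()
# Stage-wise warm start, as in the paper's training procedure: each new UEB
# starts from the previous UEB's weights before being trained at its own scale.
model.blocks[1].load_state_dict(model.blocks[0].state_dict())
model.blocks[2].load_state_dict(model.blocks[1].state_dict())
```

In the full method, the Cb and Cr channels would be bicubically up-scaled and merged with the super-resolved Y channel before the RGB-domain 3D-RED refinement.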


Citations
Proceedings ArticleDOI
18 Jun 2018
TL;DR: This paper reviews the 2nd NTIRE challenge on single image super-resolution (restoration of rich details in a low-resolution image) with a focus on proposed solutions and results, and gauges the state-of-the-art in single image super-resolution.
Abstract: This paper reviews the 2nd NTIRE challenge on single image super-resolution (restoration of rich details in a low-resolution image) with a focus on proposed solutions and results. The challenge had 4 tracks. Track 1 employed the standard bicubic downscaling setup, while Tracks 2, 3 and 4 had realistic unknown downgrading operators simulating a camera image acquisition pipeline. The operators were learnable through provided pairs of low- and high-resolution training images. The tracks had 145, 114, 101, and 113 registered participants, respectively, and 31 teams competed in the final testing phase. Together, the challenge results gauge the state-of-the-art in single image super-resolution.

298 citations


Cites background from "IRGUN : Improved Residue Based Grad..."

  • ...CEERI team proposed an improved residual based gradual upscaling network (IRGUN) [29]....


  • ...The IRGUN has a series of up-scaling and enhancement blocks (UEB) connected end-to-end and fine-tuned together to give a gradual magnification and enhancement....


  • ...Title: Improved residual based gradual upscaling network (IRGUN). Members: Manoj Sharma, Rudrabha Mukhopadhyay, Avinash Upadhyay, Sriharsha Koundinya, Ankit Shukla, Santanu Chaudhury. Affiliation: CSIR-CEERI, India...


Journal ArticleDOI
TL;DR: A fast image upsampling method designed specifically for industrial applications at low magnification that obtains performance comparable to some state-of-the-art methods for 720P-to-1080P magnification at much lower computational cost.
Abstract: In recent years, many deep-network-based super-resolution techniques have been proposed and have achieved impressive results for 2× and higher magnification factors. However, lower magnification factors encountered in some industrial applications have not received special attention, such as 720P-to-1080P (1.5× magnification). Compared to traditional 2× or higher magnification factors, these lower magnifications are much simpler, but reconstructions of high-definition images are time-consuming and computationally complex. Hence, in this paper, a fast image upsampling method is designed specifically for industrial applications at low magnification. In the proposed method, edge and nonedge areas are first distinguished and then reconstructed via different fast approaches. For the edge area, a local edge pattern encoding-based method is presented to recover sharp edges. For the nonedge area, a global iterative reconstruction with texture constraint is utilized. Moreover, some acceleration strategies are also presented to further reduce the complexity. The experimental results demonstrate that the proposed method can obtain performance comparable to that of some state-of-the-art methods for 720P-to-1080P magnification, but the computational cost is much lower.
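The edge/nonedge split is the core of the speed argument above. A hedged Python sketch follows; the gradient threshold and the two per-region upsamplers are placeholders (the paper uses local edge pattern encoding for edges and a texture-constrained global iterative reconstruction for the rest), and the 2× mask replication sidesteps the fractional 1.5× grid of the 720P-to-1080P case.

```python
# Illustrative edge / non-edge routing for fast upsampling (placeholder methods).
import numpy as np

def edge_mask(img, thresh=0.1):
    """Binary edge map from gradient magnitude (finite differences)."""
    gy, gx = np.gradient(img.astype(np.float64))
    mag = np.hypot(gx, gy)
    return mag > thresh * mag.max()

def upsample_split(img, upsample_edge, upsample_flat, thresh=0.1):
    """Reconstruct edge and non-edge regions with different methods, then merge."""
    mask = edge_mask(img, thresh)
    big_mask = np.repeat(np.repeat(mask, 2, axis=0), 2, axis=1)  # 2x nearest mask
    out_e = upsample_edge(img)   # sharp-edge reconstruction (expensive, local)
    out_f = upsample_flat(img)   # cheap reconstruction for smooth/texture areas
    return np.where(big_mask, out_e, out_f)
```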

9 citations

Proceedings ArticleDOI
01 Mar 2020
TL;DR: This work proposes a divide-and-conquer based wide and deep network (WDN) that divides the 4× up-sampling problem into 32 disjoint subproblems that can be solved simultaneously and independently of each other.
Abstract: Divide and Conquer is a well-established approach in the literature that has efficiently solved a variety of problems. However, it is yet to be explored in full in solving image super-resolution. To predict a sharp up-sampled image, this work proposes a divide-and-conquer based wide and deep network (WDN) that divides the 4× up-sampling problem into 32 disjoint subproblems that can be solved simultaneously and independently of each other. Half of these subproblems deal with predicting the overall features of the high-resolution image, while the remaining are exclusively for predicting the finer details. Additionally, a technique that is found to be more effective in calibrating the pixel intensities has been proposed. Results obtained on multiple datasets demonstrate the improved performance of the proposed wide and deep network over state-of-the-art methods.
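One plausible reading of the 32-way decomposition, sketched below: a 4× output grid has 16 pixel-shuffle phases, and each phase gets two independent predictors (overall content plus fine detail) whose outputs are summed. This mapping and all names are my assumptions, not the paper's published layout.

```python
# Hedged sketch of divide-and-conquer SR via independent per-phase predictors.
import torch
import torch.nn as nn

class SubNet(nn.Module):
    """One independent predictor for a single H x W output phase."""
    def __init__(self, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, 1, 3, padding=1),
        )
    def forward(self, x):
        return self.net(x)

class WDNSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.coarse = nn.ModuleList([SubNet() for _ in range(16)])  # overall features
        self.fine   = nn.ModuleList([SubNet() for _ in range(16)])  # finer details
        self.shuffle = nn.PixelShuffle(4)   # 16 phase planes -> one 4x-upsampled plane

    def forward(self, x):                   # x: (N, 1, H, W) low-resolution input
        planes = [c(x) + f(x) for c, f in zip(self.coarse, self.fine)]
        return self.shuffle(torch.cat(planes, dim=1))   # (N, 1, 4H, 4W)
```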

9 citations


Cites background from "IRGUN : Improved Residue Based Grad..."

  • ...For instance, [58, 32, 87, 33, 36, 58, 15, 1, 64, 43, 26, 16, 12, 70, 92, 40, 53, 79, 22, 57, 59, 4, 60, 78, 66] are some deep networks for super-resolution....

    [...]

Book ChapterDOI
17 Dec 2019
TL;DR: A novel light-weight architecture, the Gradually growing Residual and self-Attention based Dense Deep Back Projection Network (GRAD-DBPN), for large-scale image super-resolution (SR) that overcomes the issue of vanishing gradients.
Abstract: Due to the strong capacity of deep learning in handling unstructured data, it has been utilized for the task of single image super-resolution (SISR). These algorithms have shown promising results for small-scale super-resolution but are not robust to large-scale super-resolution. In addition, these algorithms are computationally complex and require high-end computational devices. Developing a large-scale super-resolution framework finds its application in smart-phones, as these devices have limited computational power. In this context, we present a novel light-weight architecture, the Gradually growing Residual and self-Attention based Dense Deep Back Projection Network (GRAD-DBPN), for large-scale image super-resolution (SR). The network is made of cascaded self-Attention based Residual Dense Deep Back Projection Network (ARD-DBPN) blocks that perform super-resolution gradually, where each block performs 2X super-resolution and is fine-tuned in an end-to-end manner. The residual architecture facilitates faster convergence of the network and overcomes the issue of vanishing gradients. Experimental results on different benchmark datasets are presented to demonstrate the efficacy and effectiveness of the architecture.
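The "back projection" inside each 2X block follows the DBPN family. Below is a hedged sketch of a single up-projection unit; kernel sizes and widths are illustrative, and the dense connections and self-attention of ARD-DBPN are omitted.

```python
# Illustrative DBPN-style 2x up-projection unit (not the exact GRAD-DBPN block).
import torch.nn as nn

class UpProjection(nn.Module):
    """Upsample, project back down, and correct with the back-projected error."""
    def __init__(self, ch=64):
        super().__init__()
        self.up1  = nn.ConvTranspose2d(ch, ch, 6, stride=2, padding=2)
        self.down = nn.Conv2d(ch, ch, 6, stride=2, padding=2)
        self.up2  = nn.ConvTranspose2d(ch, ch, 6, stride=2, padding=2)
        self.act  = nn.PReLU()

    def forward(self, l):
        h0 = self.act(self.up1(l))      # first HR estimate
        l0 = self.act(self.down(h0))    # project the estimate back to LR
        e  = l0 - l                     # LR-space residual the estimate missed
        h1 = self.act(self.up2(e))      # map that error back up
        return h0 + h1                  # corrected HR features
```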
TL;DR: This work proposes a new paradigm for speech enhancement: a "pseudo-visual" approach, where the visual stream is synthetically generated from the noisy speech input, and demonstrates that the robustness and the accuracy boost obtained from the model lead to various real-world applications which were previously not possible.
Abstract: We interact with the world around us through multiple sensory streams of information such as audio, vision, and text (language). Each of these streams complements the others, but also contains redundant information, albeit in different forms. For example, the content of a person speaking can be captured by listening to the sounds in the speech, or partially understood by looking at the speaker's lip movements, or by reading the text transcribed from the vocal speech. This redundancy across modalities is utilized in human perceptual understanding and helps us to solve various practical problems. However, in the real world, more often than not, information in individual streams is corrupted by various types of degradation, such as electronic transmission, background noise, and blurring, which lead to deterioration in content quality. In this work, we aim to recover the distorted signal in a given stream by exploiting the redundant information in another stream. Specifically, we deal with talking-face videos involving vision and speech signals. We propose two core ideas to explore cross-modal redundancy: (i) denoising speech using visual assistance, and (ii) upsampling very low-resolution talking-face videos using audio assistance. The first part focuses on the task of speech denoising. We show that the visual stream helps in distilling the clean speech from the corrupted signal by suppressing the background noise. We identify the key issues prevailing in existing state-of-the-art speech enhancement works: (i) most current works use only the audio stream and are limited in their performance across a wide range of real-world noises, and (ii) a few recent works use lip movements as additional cues with an aim to improve the quality of the generated speech over "audio-only" methods. However, these cannot be applied in several applications where the visual stream is unreliable or completely absent. Thus, in this work, we propose a new paradigm for speech enhancement: a "pseudo-visual" approach, where the visual stream is synthetically generated from the noisy speech input. We demonstrate that the robustness and the accuracy boost obtained from our model lead to various real-world applications which were previously not possible. In the second part, we explore an interesting question of what can be obtained from an 8 × 8 pixel video sequence by utilizing the corresponding speech of the person talking. Surprisingly, it turns out to be quite a lot. We show that when processed with the right set of audio and image priors, we can obtain a full-length talking video sequence with a 32× scale-factor. When the semantic information about the identity, including basic attributes like age and gender, is almost entirely lost in the input low-resolution video, we show that utilizing the speech that accompanies the low-resolution video aids
References
Proceedings Article
05 Dec 2016
TL;DR: This paper proposes to symmetrically link convolutional and deconvolutional layers with skip-layer connections, with which the training converges much faster and attains a higher-quality local optimum, making training of deep networks easier and consequently achieving restoration performance gains.
Abstract: In this paper, we propose a very deep fully convolutional encoding-decoding framework for image restoration such as denoising and super-resolution. The network is composed of multiple layers of convolution and deconvolution operators, learning end-to-end mappings from corrupted images to the original ones. The convolutional layers act as the feature extractor, which capture the abstraction of image contents while eliminating noises/corruptions. Deconvolutional layers are then used to recover the image details. We propose to symmetrically link convolutional and deconvolutional layers with skip-layer connections, with which the training converges much faster and attains a higher-quality local optimum. First, the skip connections allow the signal to be back-propagated to bottom layers directly, and thus tackles the problem of gradient vanishing, making training deep networks easier and achieving restoration performance gains consequently. Second, these skip connections pass image details from convolutional layers to deconvolutional layers, which is beneficial in recovering the original image. Significantly, with the large capacity, we can handle different levels of noises using a single model. Experimental results show that our network achieves better performance than recent state-of-the-art methods.
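A minimal sketch of the symmetric conv/deconv layout with skip connections described above; depth, widths, and exactly where the skips attach are simplified relative to the paper, which connects every few layers in a much deeper network.

```python
# Hedged RED-Net-style encoder-decoder with symmetric skip connections.
import torch.nn as nn

class REDSketch(nn.Module):
    def __init__(self, ch=64, pairs=3):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv2d(1 if i == 0 else ch, ch, 3, padding=1) for i in range(pairs)])
        self.deconvs = nn.ModuleList(
            [nn.ConvTranspose2d(ch, ch if i < pairs - 1 else 1, 3, padding=1)
             for i in range(pairs)])
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        skips = []
        for conv in self.convs:                     # encoder: extract features
            x = self.relu(conv(x))
            skips.append(x)
        for i, deconv in enumerate(self.deconvs):   # decoder: recover details
            if i > 0:
                x = x + skips[len(skips) - 1 - i]   # symmetric skip connection
            x = deconv(x)
            if i < len(self.deconvs) - 1:
                x = self.relu(x)
        return x
```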

926 citations


Additional excerpts

  • ...Methods such as sparse convolutional networks [33], recursive convolutional networks [12], combined deep and shallow CNNs [32], deep residual networks [5, 21], and bi-directional recurrent convolutional networks [35] have also been proposed....


Proceedings ArticleDOI
20 Sep 1999
TL;DR: This work shows a learning-based method for low-level vision problems (estimating scenes from images) with a Markov network, and applies VISTA to the "super-resolution" problem (estimating high-frequency details from a low-resolution image), showing good results.
Abstract: We show a learning-based method for low-level vision problems-estimating scenes from images. We generate a synthetic world of scenes and their corresponding rendered images. We model that world with a Markov network, learning the network parameters from the examples. Bayesian belief propagation allows us to efficiently find a local maximum of the posterior probability for the scene, given the image. We call this approach VISTA-Vision by Image/Scene TrAining. We apply VISTA to the "super-resolution" problem (estimating high frequency details from a low-resolution image), showing good results. For the motion estimation problem, we show figure/ground discrimination, solution of the aperture problem, and filling-in arising from application of the same probabilistic machinery.
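To make the inference machinery concrete, here is a toy max-product belief propagation routine on a chain-structured Markov network. The paper runs propagation on a 2D grid of image/scene patches with learned potentials; this one-dimensional version with made-up potentials only illustrates the message-passing pattern.

```python
# Toy max-product belief propagation (Viterbi-style) on a chain MRF.
import numpy as np

def chain_map(unary, pairwise):
    """MAP states for a chain of T nodes with K candidate states each.
    unary: (T, K) local evidence; pairwise: (K, K) neighbour compatibility."""
    T, K = unary.shape
    msg = np.ones(K)
    msgs = [msg]                               # msgs[t] = message flowing into node t
    for t in range(T - 1):                     # forward sweep of max-product messages
        msg = (pairwise * (unary[t] * msg)[:, None]).max(axis=0)
        msg /= msg.sum()                       # normalise for numerical stability
        msgs.append(msg)
    states = np.zeros(T, dtype=int)            # backward sweep: decode best states
    states[-1] = int(np.argmax(unary[-1] * msgs[-1]))
    for t in range(T - 2, -1, -1):
        states[t] = int(np.argmax(unary[t] * msgs[t] * pairwise[:, states[t + 1]]))
    return states
```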

630 citations


"IRGUN : Improved Residue Based Grad..." refers methods in this paper

  • ...Another approach to SISR is reconstruction-based algorithms such as gradient-based constraints [26, 40], total variation regularizers [19, 20], local texture constraints [37, 39], deblurring-based models [6, 8], etc....


Posted Content
TL;DR: In this paper, a novel application of automated texture synthesis is proposed in combination with a perceptual loss focusing on creating realistic textures rather than optimizing for a pixel-accurate reproduction of ground truth images during training.
Abstract: Single image super-resolution is the task of inferring a high-resolution image from a single low-resolution input. Traditionally, the performance of algorithms for this task is measured using pixel-wise reconstruction measures such as peak signal-to-noise ratio (PSNR) which have been shown to correlate poorly with the human perception of image quality. As a result, algorithms minimizing these metrics tend to produce over-smoothed images that lack high-frequency textures and do not look natural despite yielding high PSNR values. We propose a novel application of automated texture synthesis in combination with a perceptual loss focusing on creating realistic textures rather than optimizing for a pixel-accurate reproduction of ground truth images during training. By using feed-forward fully convolutional neural networks in an adversarial training setting, we achieve a significant boost in image quality at high magnification ratios. Extensive experiments on a number of datasets show the effectiveness of our approach, yielding state-of-the-art results in both quantitative and qualitative benchmarks.
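The texture term can be made concrete with Gram matrices over CNN feature maps, the standard texture-synthesis statistic; a minimal PyTorch sketch follows. In the paper this is combined with perceptual and adversarial losses, and the feature maps would come from a pretrained network (not shown here).

```python
# Gram-matrix texture loss on feature maps (feature extractor not shown).
import torch

def gram_matrix(feat):
    """feat: (N, C, H, W) feature maps -> (N, C, C) channel co-activation statistics."""
    n, c, h, w = feat.shape
    f = feat.reshape(n, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def texture_loss(feat_sr, feat_hr):
    """Match second-order texture statistics of the SR output to the ground truth."""
    return torch.mean((gram_matrix(feat_sr) - gram_matrix(feat_hr)) ** 2)
```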

608 citations

Journal ArticleDOI
TL;DR: A maximum a posteriori probability framework for SR recovery is proposed; thorough experimental results suggest that the proposed SR method can reconstruct higher-quality results both quantitatively and perceptually.
Abstract: Image super-resolution (SR) reconstruction is essentially an ill-posed problem, so it is important to design an effective prior. For this purpose, we propose a novel image SR method by learning both non-local and local regularization priors from a given low-resolution image. The non-local prior takes advantage of the redundancy of similar patches in natural images, while the local prior assumes that a target pixel can be estimated by a weighted average of its neighbors. Based on the above considerations, we utilize the non-local means filter to learn a non-local prior and the steering kernel regression to learn a local prior. By assembling the two complementary regularization terms, we propose a maximum a posteriori probability framework for SR recovery. Thorough experimental results suggest that the proposed SR method can reconstruct higher quality results both quantitatively and perceptually.
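In equation form, the MAP objective described above has the following general shape (notation mine; D and H denote downsampling and blurring, and the weights w are learned from the input image itself):

```latex
\hat{\mathbf{x}} = \arg\min_{\mathbf{x}}
  \|\mathbf{y} - \mathbf{D}\mathbf{H}\mathbf{x}\|_2^2
  + \lambda_1 \sum_i \Big( x_i - \sum_{j \in \mathcal{N}(i)} w^{\mathrm{NL}}_{ij}\, x_j \Big)^2
  + \lambda_2 \sum_i \Big( x_i - \sum_{j \in \Omega(i)} w^{\mathrm{SKR}}_{ij}\, x_j \Big)^2
```

Here the first regularizer encodes the non-local means prior (each pixel approximated by a weighted average over a non-local search window N(i) of similar patches), and the second the steering kernel regression prior over the local neighbourhood Ω(i).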

527 citations


"IRGUN : Improved Residue Based Grad..." refers methods in this paper

  • ...Another approach to SISR is reconstruction-based algorithms such as gradient-based constraints [26, 40], total variation regularizers [19, 20], local texture constraints [37, 39], deblurring-based models [6, 8], etc....


Proceedings ArticleDOI
29 Jul 2007
TL;DR: A new method for upsampling images which is capable of generating sharp edges with reduced input-resolution grid-related artifacts, based on a statistical edge dependency relating certain edge features of two different resolutions, which is generically exhibited by real-world images.
Abstract: In this paper we propose a new method for upsampling images which is capable of generating sharp edges with reduced input-resolution grid-related artifacts. The method is based on a statistical edge dependency relating certain edge features of two different resolutions, which is generically exhibited by real-world images. While other solutions assume some form of smoothness, we rely on this distinctive edge dependency as our prior knowledge in order to increase image resolution. In addition to this relation we require that intensities are conserved; the output image must be identical to the input image when downsampled to the original resolution. Altogether the method consists of solving a constrained optimization problem, attempting to impose the correct edge relation and conserve local intensities with respect to the low-resolution input image. Results demonstrate the visual importance of having such edge features properly matched, and the method's capability to produce images in which sharp edges are successfully reconstructed.
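Schematically, the constrained optimization described above can be written as (notation mine):

```latex
\min_{\mathbf{x}} \; \big\| \phi(\mathbf{x}) - \hat{\phi} \big\|^2
\quad \text{s.t.} \quad \mathbf{x}\downarrow_s \;=\; \mathbf{y}
```

where φ(x) collects the edge features of the candidate high-resolution image, φ̂ denotes the features predicted from the low-resolution input via the learned cross-resolution edge dependency, and the constraint enforces intensity conservation: downsampling the output by the scale factor s must reproduce the input y.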

480 citations