scispace - formally typeset
Search or ask a question
Book ChapterDOI

An End-to-End Deep Learning Framework for Super-Resolution Based Inpainting

TL;DR: This proposed framework for image inpainting provides more visually plausible and better resultant image in comparison of other conventional and state-of-the-art noise-resilient super-resolution algorithms.
Abstract: Image inpainting is an extremely challenging and open problem for the computer vision community. Motivated by the recent advancement in deep learning algorithms for computer vision applications, we propose a new end-to-end deep learning based framework for image inpainting. Firstly, the images are down-sampled as it reduces the targeted area of inpainting therefore enabling better filling of the target region. A down-sampled image is inpainted using a trained deep convolutional auto-encoder (CAE). A coupled deep convolutional auto-encoder (CDCA) is also trained for natural image super resolution. The pre-trained weights from both of these networks serve as initial weights to an end-to-end framework during the fine tuning phase. Hence, the network is jointly optimized for both the aforementioned tasks while maintaining the local structure/information. We tested this proposed framework with various existing image inpainting datasets and it outperforms existing natural image blind inpainting algorithms. Our proposed framework also works well to get noise resilient super-resolution after fine-tuning on noise-free super-resolution dataset. It provides more visually plausible and better resultant image in comparison of other conventional and state-of-the-art noise-resilient super-resolution algorithms.
Citations
More filters
Journal ArticleDOI
TL;DR: The authors describe a method based on the use of recurrent algorithms for the optimal conditional linear filtering of a sequence of LR images in combination with superpixel segmentation and Expectation-Maximization-Clustering (EM-clustering) to identify areas affected by AN.
Abstract: The problem of constructing multiframe superresolution (SR) based on processing a sequence of low-resolution (LR) images in conditions of applicative noise (AN) is considered. The latter appear in the form of distributed areas of false or anomalous observations in LR images and are considered as an additional factor in reducing the quality of the original images, characterized by an irregular arrangement of LR or zero-resolution areas. The existing methods for solving this problem are analyzed using models of spin glasses and their varieties, as well as models of random Markov fields. The authors describe a method based on the use of recurrent algorithms for the optimal conditional linear filtering of a sequence of LR images in combination with superpixel segmentation and Expectation-Maximization-clustering (EM-clustering) to identify areas affected by AN. The synthesis of conditionally linear filtering algorithms is considered both in the usual and in the adaptive setting, taking into account the possible uncertainty regarding the processing parameters and registration means. An experimental study is carried out to compare algorithms on sets of test images. The analysis of the experimental results shows certain advantages of the developed approach for the synthesis of algorithms for constructing SR in an adaptive setting, which consists in increasing the accuracy and structural similarity of high-resolution (HR) image restoration in comparison with analogs.

1 citations

References
More filters
Journal ArticleDOI
TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) as mentioned in this paper is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the 5 years of the challenge, and propose future directions and improvements.

30,811 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: SRGAN as mentioned in this paper proposes a perceptual loss function which consists of an adversarial loss and a content loss, which pushes the solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images.
Abstract: Despite the breakthroughs in accuracy and speed of single image super-resolution using faster and deeper convolutional neural networks, one central problem remains largely unsolved: how do we recover the finer texture details when we super-resolve at large upscaling factors? The behavior of optimization-based super-resolution methods is principally driven by the choice of the objective function. Recent work has largely focused on minimizing the mean squared reconstruction error. The resulting estimates have high peak signal-to-noise ratios, but they are often lacking high-frequency details and are perceptually unsatisfying in the sense that they fail to match the fidelity expected at the higher resolution. In this paper, we present SRGAN, a generative adversarial network (GAN) for image super-resolution (SR). To our knowledge, it is the first framework capable of inferring photo-realistic natural images for 4x upscaling factors. To achieve this, we propose a perceptual loss function which consists of an adversarial loss and a content loss. The adversarial loss pushes our solution to the natural image manifold using a discriminator network that is trained to differentiate between the super-resolved images and original photo-realistic images. In addition, we use a content loss motivated by perceptual similarity instead of similarity in pixel space. Our deep residual network is able to recover photo-realistic textures from heavily downsampled images on public benchmarks. An extensive mean-opinion-score (MOS) test shows hugely significant gains in perceptual quality using SRGAN. The MOS scores obtained with SRGAN are closer to those of the original high-resolution images than to those obtained with any state-of-the-art method.

6,884 citations

Proceedings ArticleDOI
07 Jul 2001
TL;DR: In this paper, the authors present a database containing ground truth segmentations produced by humans for images of a wide variety of natural scenes, and define an error measure which quantifies the consistency between segmentations of differing granularities.
Abstract: This paper presents a database containing 'ground truth' segmentations produced by humans for images of a wide variety of natural scenes. We define an error measure which quantifies the consistency between segmentations of differing granularities and find that different human segmentations of the same image are highly consistent. Use of this dataset is demonstrated in two applications: (1) evaluating the performance of segmentation algorithms and (2) measuring probability distributions associated with Gestalt grouping factors as well as statistics of image region properties.

6,505 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper presents the first convolutional neural network capable of real-time SR of 1080p videos on a single K2 GPU and introduces an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output.
Abstract: Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.

4,770 citations

Book ChapterDOI
06 Sep 2014
TL;DR: This work proposes a deep learning method for single image super-resolution (SR) that directly learns an end-to-end mapping between the low/high-resolution images and shows that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network.
Abstract: We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) [15] that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage.

4,445 citations