Proceedings ArticleDOI

Dual Reconstruction with Densely Connected Residual Network for Single Image Super-Resolution

TL;DR: Zhang et al. as discussed by the authors proposed to add one more shortcut between two dense-blocks, as well as shortcuts between convolution layers inside a dense-block, which enables a faster learning process as the gradient information can be back-propagated more easily.
Abstract: Deep learning-based single image super-resolution enables very fast and high-visual-quality reconstruction. Recently, an enhanced super-resolution based on generative adversarial network (ESRGAN) has achieved excellent performance in terms of both qualitative and quantitative quality of the reconstructed high-resolution image. In this paper, we propose to add one more shortcut between two dense-blocks, as well as add shortcut between two convolution layers inside a dense-block. With this simple strategy of adding more shortcuts in the proposed network, it enables a faster learning process as the gradient information can be back-propagated more easily. Based on the improved ESRGAN, the dual reconstruction is proposed to learn different aspects of the super-resolved image for judiciously enhancing the quality of the reconstructed image. In practice, the super-resolution model is pre-trained solely based on pixel distance, followed by fine-tuning the parameters in the model based on adversarial loss and perceptual loss. Finally, we fuse two different models by weighted-summing their parameters to obtain the final super-resolution model. Experimental results demonstrated that the proposed method achieves excellent performance in the real-world image super-resolution challenge. We have also verified that the proposed dual reconstruction does further improve the quality of the reconstructed image in terms of both PSNR and SSIM.
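To make the shortcut strategy concrete, here is a minimal NumPy sketch (hypothetical shapes and a linear map standing in for convolution, not the authors' implementation) of a dense block with a shortcut after every inner layer plus one extra shortcut across the whole block:

```python
import numpy as np

def conv(x, w):
    # Stand-in for a convolution layer: a simple nonlinear map keeps the sketch runnable.
    return np.tanh(w @ x)

def dense_block(x, weights):
    out = x
    for w in weights:
        out = conv(out, w) + out  # shortcut between two convolution layers
    return out + x                # one more shortcut across the whole dense block

rng = np.random.default_rng(0)
x = rng.standard_normal(8)
weights = [rng.standard_normal((8, 8)) * 0.1 for _ in range(3)]
y = dense_block(x, weights)
print(y.shape)  # (8,)
```

Each added identity path gives the gradient a direct route back to earlier layers, which is the mechanism behind the claimed faster training.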
Citations
Journal ArticleDOI
TL;DR: Comprehensive experiments demonstrate the superiority of the proposed hyperspectral compressed sensing method, as well as its one-shot transfer learning (OTL)-based extension, both quantitatively and qualitatively.
Abstract: Requirements of compressed sensing techniques targeted at miniaturized hyperspectral satellite applications include lightweight onboard hardware, high-speed sensing, a low sampling rate for compressing the massive volume of typical hyperspectral data, and noise robustness for reliable data transmission to the ground station. We achieve all these aims via deep learning, and the resulting neural networks can be implemented on-chip, thereby allowing light hardware implementation. Our neural networks were trained on small-scale data, yet the resulting encoder still achieves a very low sampling rate and very high speed. Unlike typical network training, the input–output pairs are not square but stripe-like images, partly because compressed acquisition does not allow performing compression after obtaining the complete data cube, and partly because stripe-like acquisition matches the popular pushbroom hyperspectral sensing schemes well. Even under the hard restrictions imposed by this nontraditional training, the resulting decoder still reconstructs the image with high accuracy. To match the requirement of pushbroom sensing, a lightweight encoder is proposed to compress the stripe-like images immediately. Meanwhile, multiscale feature fusion block (MFB) and aggregation (MFA) modules are proposed to form our decoder, enhancing the feature representation of the compressed acquisitions. Furthermore, we achieve joint spatial/spectral super-resolution (SR) progressively, ensuring accurate hyperspectral reconstruction via a low-rank-driven decoder. The encoder and decoder are trained in an end-to-end manner, with noise robustness enforced during the training stage. Comprehensive experiments demonstrate the superiority of the proposed hyperspectral compressed sensing method, as well as its one-shot transfer learning (OTL)-based extension, both quantitatively and qualitatively.

18 citations

Journal ArticleDOI
TL;DR: In this paper, an ADMM-ADAM framework, combining the alternating direction method of multipliers (ADMM) and adaptive moment estimation (ADAM), is proposed for solving inverse problems.
Abstract: Alternating direction method of multipliers (ADMM) and adaptive moment estimation (ADAM) are two optimizers of paramount importance in convex optimization (CO) and deep learning (DL), respectively. Numerous state-of-the-art algorithms for solving inverse problems are achieved by carefully designing a convex criterion, typically composed of a data-fitting term and a regularizer. Even when the regularizer is convex, its mathematical form is often sophisticated, hence inducing a math-heavy optimization procedure and making the algorithm design a daunting task for software engineers. Probably for this reason, people turn to solve the inverse problems via DL, but this requires big data collection, quite time-consuming if not impossible. Motivated by these facts, we propose a new framework, termed as ADMM-ADAM, for solving inverse problems. As the key contribution, even just with small/single data, the proposed ADMM-ADAM is able to exploit DL to obtain a convex regularizer of very simple math form, followed by solving the regularized criterion using simple CO algorithm. As a side contribution, a state-of-the-art hyperspectral inpainting algorithm is designed under ADMM-ADAM, demonstrating its superiority even without the aid of big data or sophisticated mathematical regularization.
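The flavor of the "convex regularizer of very simple math form" can be illustrated with a toy criterion (my reading of the abstract, not the paper's exact formulation): fit the data while staying close to a deep-learning estimate `x_dl`, which admits a closed-form solution:

```python
import numpy as np

def regularized_fit(A, b, x_dl, lam=1.0):
    # Minimize ||A x - b||^2 + lam * ||x - x_dl||^2: a data-fitting term plus a
    # simple convex regularizer pulling toward a DL-produced estimate x_dl.
    # Closed form: x = (A^T A + lam I)^{-1} (A^T b + lam x_dl).
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b + lam * x_dl)

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))
x_true = rng.standard_normal(5)
b = A @ x_true
x = regularized_fit(A, b, x_dl=x_true, lam=0.1)
print(np.allclose(x, x_true, atol=1e-6))  # True
```

The point of the sketch is that once the regularizer is this simple, the optimization step is elementary linear algebra rather than a math-heavy custom solver.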

11 citations

Journal ArticleDOI
01 Jul 2022-Optik
TL;DR: In this article, the authors present an overview of GAN-based SISR techniques for further research, as there are few surveys in this area, and discuss some possible solutions for the existing methods.

11 citations

Journal ArticleDOI
TL;DR: In this paper, an enhanced dense space attention network (EDSAN) model is proposed to overcome the vanishing gradient problem and reduce the running time of the model, which can be used in different applications such as biometric identification and real-time video applications.
Abstract: In some applications, such as surveillance and biometrics, image enlargement is required to inspect small details on the image. One of the image enlargement approaches is convolutional neural network (CNN)-based super-resolution construction from a single image. The first CNN-based image super-resolution algorithm was the super-resolution CNN (SRCNN), developed in 2014. Since then, many researchers have proposed several versions of CNN-based algorithms for image super-resolution to improve the accuracy or reduce the model's running time. Currently, some algorithms still suffer from the vanishing-gradient problem and rely on a large number of layers. Thus, the motivation of this work is to reduce the vanishing-gradient problem, which can improve accuracy and, at the same time, reduce the model's running time. In this paper, an enhanced dense space attention network (EDSAN) model is proposed to overcome these problems. The EDSAN model adopts a dense connection and residual network to utilize all the features, correlating low-level and high-level features as much as possible. Besides, implementing the convolutional block attention module (CBAM) layer and multiscale block (MSB) helps reduce the number of layers required to achieve comparable results. The model is evaluated through peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) metrics. EDSAN achieved the most significant improvement, about 1.42%, when compared to the CRN model using the Set5 dataset at a scale factor of 3. Compared to the ERN model, EDSAN performed best, with a 1.22% improvement when using the Set5 dataset at a scale factor of 4. In terms of overall performance, EDSAN performed very well on all datasets at scale factors of 2 and 3. In conclusion, EDSAN successfully solves the problems above, and it can be used in different applications such as biometric identification and real-time video.

3 citations

Proceedings ArticleDOI
08 Jun 2020
TL;DR: A new compressed sensing strategy for hyperspectral imagery on spaceborne sensors systems, termed spatial/spectral compressed encoder (SPACE), is experimentally evaluated, showing superior efficacy in terms of both sampling rate and reconstruction accuracy.
Abstract: Directly transmitting the huge amount of typical hyperspectral data acquired on satellite to the ground station is inefficient. This paper proposes a new compressed sensing strategy for hyperspectral imagery on spaceborne sensors systems. As the onboard computing/storage resources are limited, e.g., on CubeSat, the measurement strategy should be computationally very light. Furthermore, considering the limited communication bandwidth, a very low sampling rate is desired. Our encoder accounts for these requirements by separately recording the spatial details and the spectral information, both of which essentially require only simple averaging operators. Our measurement strategy naturally induces a reconstruction criterion that can be elegantly interpreted as a well-known fusion problem in satellite remote sensing, allowing the adoption of a convex optimization method for simple and fast decoding. Our method, termed spatial/spectral compressed encoder (SPACE), is experimentally evaluated on real hyperspectral data, showing superior efficacy in terms of both sampling rate and reconstruction accuracy.
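A rough sketch of the "simple averaging" encoder idea (my reading of the summary; the variable names and the s=4 block size are assumptions, not the paper's specification): one measurement branch averages over bands to keep spatial detail, while the other block-averages each band spatially to keep spectral information:

```python
import numpy as np

def space_encoder(cube, s=4):
    # Hypothetical sketch: two complementary, computationally light measurements
    # of an (h, w, b) hyperspectral cube using only averaging operators.
    h, w, b = cube.shape
    spatial_detail = cube.mean(axis=2)  # band average keeps spatial detail: (h, w)
    spectral_info = cube.reshape(h // s, s, w // s, s, b).mean(axis=(1, 3))  # (h//s, w//s, b)
    return spatial_detail, spectral_info

cube = np.random.default_rng(0).random((16, 16, 32))
sp, se = space_encoder(cube)
print(sp.shape, se.shape)  # (16, 16) (4, 4, 32)
```

For this toy size the sampling rate is (16·16 + 4·4·32) / (16·16·32) ≈ 9.4%, and recombining the two measurements is structurally the same task as pansharpening-style fusion, which is the connection to convex-optimization decoding mentioned in the abstract.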

3 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
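The core reformulation — learn a residual F(x) and add it to the identity — can be sketched in a few lines of NumPy (a toy fully-connected block, not the paper's convolutional architecture):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    # y = F(x) + x: the layers fit a residual function with reference to the
    # input, while the identity shortcut carries x through unchanged.
    return w2 @ relu(w1 @ x) + x

rng = np.random.default_rng(0)
x = rng.standard_normal(16)

# With all-zero weights the residual is zero, so the block is exactly the
# identity — one intuition for why very deep residual nets remain trainable.
zeros = np.zeros((16, 16))
print(np.allclose(residual_block(x, zeros, zeros), x))  # True
```

Because the default behavior of a block is the identity, adding more blocks cannot easily make the network worse, which is what allows depth to be pushed to hundreds of layers.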

123,388 citations

Proceedings Article
01 Jan 2015
TL;DR: This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.
Abstract: We introduce Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. The method is straightforward to implement, is computationally efficient, has little memory requirements, is invariant to diagonal rescaling of the gradients, and is well suited for problems that are large in terms of data and/or parameters. The method is also appropriate for non-stationary objectives and problems with very noisy and/or sparse gradients. The hyper-parameters have intuitive interpretations and typically require little tuning. Some connections to related algorithms, on which Adam was inspired, are discussed. We also analyze the theoretical convergence properties of the algorithm and provide a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework. Empirical results demonstrate that Adam works well in practice and compares favorably to other stochastic optimization methods. Finally, we discuss AdaMax, a variant of Adam based on the infinity norm.
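The update rule itself is compact; below is a self-contained NumPy sketch of one Adam step with bias correction, applied to the toy objective f(x) = x² (the moment decay rates and epsilon are the paper's defaults; the learning rate is an arbitrary choice for the demo):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient (first moment) and the
    # squared gradient (second moment), followed by bias correction and a
    # step scaled by the corrected moments.
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2 starting from x = 3; the gradient is 2x.
x = np.array(3.0)
m = v = np.zeros_like(x)
for t in range(1, 2001):
    x, m, v = adam_step(x, 2 * x, m, v, t, lr=0.05)
print(float(x))  # close to 0
```

Note how the effective step size is roughly bounded by the learning rate regardless of the gradient's scale — the "invariant to diagonal rescaling" property mentioned in the abstract.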

111,197 citations

Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
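A quick way to see why stacking very small filters works: with stride 1, each extra k×k convolution grows the receptive field by (k − 1), so two 3x3 layers see a 5x5 region and three see 7x7, with fewer parameters than a single large filter:

```python
def receptive_field(num_layers, kernel=3):
    # Receptive field of a stack of stride-1 convolutions:
    # each extra k x k layer adds (k - 1) pixels on top of the single-pixel start.
    rf = 1
    for _ in range(num_layers):
        rf += kernel - 1
    return rf

print(receptive_field(2))  # 5: two 3x3 layers cover a 5x5 region
print(receptive_field(3))  # 7: three 3x3 layers cover a 7x7 region
```

For C channels, three 3x3 layers cost 3·(3²C²) = 27C² weights versus 49C² for one 7x7 layer, while also interleaving more non-linearities.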

49,914 citations

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a deep learning method for single image super-resolution (SR), which directly learns an end-to-end mapping between the low/high-resolution images.
Abstract: We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage. We explore different network structures and parameter settings to achieve trade-offs between performance and speed. Moreover, we extend our network to cope with three color channels simultaneously, and show better overall reconstruction quality.
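The three-stage mapping can be sketched end-to-end in NumPy (a single-channel toy with tiny filters for readability; SRCNN's actual layers use 9x9, 1x1, and 5x5 filters over many feature channels):

```python
import numpy as np

def conv2d(x, k):
    # Naive valid-mode 2D convolution; slow, but sufficient for a sketch.
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

def srcnn(y, k1, k2, k3):
    # SRCNN's three stages: patch extraction, non-linear mapping, reconstruction.
    f1 = np.maximum(conv2d(y, k1), 0)
    f2 = np.maximum(conv2d(f1, k2), 0)
    return conv2d(f2, k3)

rng = np.random.default_rng(0)
y = rng.standard_normal((16, 16))  # stands in for the bicubic-upscaled LR input
out = srcnn(y,
            rng.standard_normal((3, 3)) * 0.1,
            rng.standard_normal((1, 1)),
            rng.standard_normal((3, 3)) * 0.1)
print(out.shape)  # (12, 12): valid convolutions trim the borders
```

The key point of the paper is that all three stages are ordinary convolutions, so the whole pipeline can be optimized jointly, unlike sparse-coding methods that tune each component separately.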

6,122 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: This paper presents the first convolutional neural network capable of real-time SR of 1080p videos on a single K2 GPU and introduces an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output.
Abstract: Recently, several models based on deep neural networks have achieved great success in terms of both reconstruction accuracy and computational performance for single image super-resolution. In these methods, the low resolution (LR) input image is upscaled to the high resolution (HR) space using a single filter, commonly bicubic interpolation, before reconstruction. This means that the super-resolution (SR) operation is performed in HR space. We demonstrate that this is sub-optimal and adds computational complexity. In this paper, we present the first convolutional neural network (CNN) capable of real-time SR of 1080p videos on a single K2 GPU. To achieve this, we propose a novel CNN architecture where the feature maps are extracted in the LR space. In addition, we introduce an efficient sub-pixel convolution layer which learns an array of upscaling filters to upscale the final LR feature maps into the HR output. By doing so, we effectively replace the handcrafted bicubic filter in the SR pipeline with more complex upscaling filters specifically trained for each feature map, whilst also reducing the computational complexity of the overall SR operation. We evaluate the proposed approach using images and videos from publicly available datasets and show that it performs significantly better (+0.15dB on Images and +0.39dB on Videos) and is an order of magnitude faster than previous CNN-based methods.
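The sub-pixel layer's periodic shuffling rearranges C·r² low-resolution feature maps into a high-resolution output. A NumPy equivalent of that rearrangement (using the same channel ordering as common framework implementations of pixel shuffle) is:

```python
import numpy as np

def pixel_shuffle(x, r):
    # Rearrange (C * r^2, H, W) feature maps into (C, H * r, W * r):
    # output[c, h*r + i, w*r + j] = input[c*r*r + i*r + j, h, w].
    c2, h, w = x.shape
    c = c2 // (r * r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)  # (c, h, r, w, r)
    return x.reshape(c, h * r, w * r)

x = np.arange(4 * 3 * 3).reshape(4, 3, 3).astype(float)  # C=1, r=2, 3x3 maps
y = pixel_shuffle(x, 2)
print(y.shape)  # (1, 6, 6)
```

Because this step is just a memory rearrangement, all the learned upscaling work happens in LR space, which is where the paper's speed advantage comes from.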

4,770 citations