Journal ArticleDOI

UMAG-Net: A New Unsupervised Multiattention-Guided Network for Hyperspectral and Multispectral Image Fusion

TL;DR: In this article, an unsupervised multiattention-guided network named UMAG-Net was proposed to fuse a low-resolution hyperspectral image (HSI) with a high-resolution (HR) multispectral image (MSI) of the same scene.
Abstract: To reconstruct images with high spatial resolution and high spectral resolution, one of the most common methods is to fuse a low-resolution hyperspectral image (HSI) with a high-resolution (HR) multispectral image (MSI) of the same scene. Deep learning has been widely applied to HSI-MSI fusion, but such methods are limited by hardware and the need for training data. To break these limits, we construct an unsupervised multiattention-guided network named UMAG-Net, which requires no training data, to better accomplish HSI-MSI fusion. UMAG-Net first extracts deep multiscale features of the MSI by using a multiattention encoding network. Then, a loss function defined on the observed HSI-MSI pair is used to iteratively update the parameters of UMAG-Net and learn prior knowledge of the fused image. Finally, a multiscale feature-guided network is constructed to generate an HR-HSI. The experimental results show the visual and quantitative superiority of the proposed method compared to other methods.
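The unsupervised objective described in the abstract can be sketched as a two-term reconstruction loss on the observed pair: the estimate, spatially degraded, should match the LR-HSI, and, spectrally degraded, should match the HR-MSI. The function names, the average-pooling degradation, and the spectral response matrix `srf` below are illustrative assumptions, not the paper's exact operators:

```python
import numpy as np

def spatial_downsample(x, ratio):
    """Average-pool an HxWxC cube by `ratio` (a stand-in for blur + decimation)."""
    H, W, C = x.shape
    return x.reshape(H // ratio, ratio, W // ratio, ratio, C).mean(axis=(1, 3))

def fusion_loss(est_hrhsi, lr_hsi, hr_msi, srf, ratio):
    """Unsupervised loss on the observed (HSI, MSI) pair:
    - spatial term: the estimate, downsampled, should reproduce the LR-HSI;
    - spectral term: the estimate, projected through the (assumed) spectral
      response `srf` (bands x MSI-channels), should reproduce the HR-MSI."""
    spatial_term = np.mean((spatial_downsample(est_hrhsi, ratio) - lr_hsi) ** 2)
    spectral_term = np.mean((est_hrhsi @ srf - hr_msi) ** 2)
    return spatial_term + spectral_term
```

Minimizing such a loss over the network parameters requires only the single observed image pair, which is what makes the approach training-data-free.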


Citations
Journal ArticleDOI
TL;DR: In this paper, a literature survey is conducted to analyze the trends of multimodal remote sensing data fusion, and some prevalent sub-fields in multimodal RS data fusion are reviewed in terms of the to-be-fused data modalities.
Abstract: With the extremely rapid advances in remote sensing (RS) technology, a great quantity of Earth observation (EO) data featuring considerable and complicated heterogeneity is readily available nowadays, which affords researchers an opportunity to tackle current geoscience applications in a fresh way. With the joint utilization of EO data, much research on multimodal RS data fusion has made tremendous progress in recent years, yet these traditional algorithms inevitably meet a performance bottleneck due to their limited ability to comprehensively analyze and interpret such strongly heterogeneous data. Hence, this limitation arouses an intense demand for an alternative tool with powerful processing competence. Deep learning (DL), as a cutting-edge technology, has achieved remarkable breakthroughs in numerous computer vision tasks owing to its impressive ability in data representation and reconstruction. Naturally, it has been successfully applied to the field of multimodal RS data fusion, yielding great improvement compared with traditional methods. This survey aims to present a systematic overview of DL-based multimodal RS data fusion. More specifically, some essential knowledge about this topic is first given. Subsequently, a literature survey is conducted to analyze the trends of this field. Some prevalent sub-fields in multimodal RS data fusion are then reviewed in terms of the to-be-fused data modalities, i.e., spatiospectral, spatiotemporal, light detection and ranging-optical, synthetic aperture radar-optical, and RS-Geospatial Big Data fusion. Furthermore, we collect and summarize some valuable resources to support the development of multimodal RS data fusion. Finally, the remaining challenges and potential future directions are highlighted.

39 citations

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a multi-scale low-rank deep back projection fusion network (MLR-DBPFN) to fuse LR hyperspectral (HS) data and HR multispectral data.
Abstract: Fusing low spatial resolution (LR) hyperspectral (HS) data and high spatial resolution (HR) multispectral (MS) data aims to obtain HR HS data. However, due to bad weather and the aging of sensor equipment, HS images usually contain a lot of noise, e.g., Gaussian noise, strip noise, and mixed noise, which would make the fused image have low quality. To solve this problem, we propose the multi-scale low-rank deep back projection fusion network (MLR-DBPFN). First, HS and MS are superimposed, and multi-scale spectral features of the stacked image are extracted through multi-scale low-rank decomposition and convolution operations, which effectively removes noisy spectral features. Second, up-sampling and down-sampling network mechanisms are used to extract the multi-scale spatial features from each layer of spectral features. Finally, the multi-scale spectral features and multi-scale spatial features are combined for network training, and the weight of the noisy spectral features is reduced through the network feedback mechanism, which suppresses the noisy spectrum and improves fusion performance on noisy HS data. Experimental results on datasets with different noise demonstrate that MLR-DBPFN has superior spatial and spectral fidelity, competitive fusion quality, and robust anti-noise performance compared with state-of-the-art methods.
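The back-projection idea behind the up-/down-sampling mechanisms in this abstract can be illustrated in a minimal single-band form: the residual between the LR observation and the re-degraded HR estimate is repeatedly pushed back to the HR scale. The average-pool/nearest-neighbour degradation model below is an assumption, not the network's learned operators:

```python
import numpy as np

def downsample(x, r):
    """Average-pool a 2-D array by factor r (a stand-in for the sensor model)."""
    H, W = x.shape
    return x.reshape(H // r, r, W // r, r).mean(axis=(1, 3))

def upsample(x, r):
    """Nearest-neighbour upsample by factor r."""
    return np.repeat(np.repeat(x, r, axis=0), r, axis=1)

def back_project(lr, r, n_iter=10):
    """Iterative back-projection: refine the HR estimate so that its
    re-degraded version agrees with the LR observation."""
    hr = upsample(lr, r)
    for _ in range(n_iter):
        residual = lr - downsample(hr, r)   # how far off the estimate is at LR scale
        hr = hr + upsample(residual, r)     # push the residual back to HR scale
    return hr
```

In MLR-DBPFN this projection loop is realized with learned convolutional up-/down-sampling layers rather than the fixed operators used here.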

18 citations

Journal ArticleDOI
TL;DR: In this article , an attention mechanism-based wavelet convolution neural network (AWNN) was proposed for epilepsy EEG classification and achieved state-of-the-art performance.
Abstract: As a kind of non-invasive, low-cost, and readily available brain examination, EEG is of great significance as a means of clinical diagnosis of epilepsy. However, the reading of long-term EEG records places a heavy burden on neurologists and experts. Therefore, automatic EEG classification for epileptic patients plays an essential role in epilepsy diagnosis and treatment. This paper proposes an Attention Mechanism-based Wavelet Convolution Neural Network (AWNN) for epilepsy EEG classification. AWNN first uses multi-scale wavelet analysis to decompose the input EEGs to obtain their components in different frequency bands. Then, these decomposed multi-scale EEGs are input into a Convolution Neural Network with an attention mechanism for further feature extraction and classification. The proposed algorithm achieves 98.89% triple classification accuracy on the Bonn EEG database and 99.70% binary classification accuracy on the Bern-Barcelona EEG database. Our experiments show that the proposed algorithm achieves state-of-the-art classification performance on epilepsy EEG.
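The first stage described above, multi-scale wavelet decomposition of the EEG into frequency bands, can be sketched with a plain Haar transform. This is a simplified stand-in, not the authors' implementation, which may use a different wavelet:

```python
import numpy as np

def haar_dwt(signal):
    """One level of the orthonormal Haar wavelet transform."""
    s = np.asarray(signal, dtype=float)
    approx = (s[0::2] + s[1::2]) / np.sqrt(2)   # low-frequency band
    detail = (s[0::2] - s[1::2]) / np.sqrt(2)   # high-frequency band
    return approx, detail

def multiscale_bands(signal, levels):
    """Decompose a signal (length divisible by 2**levels) into
    `levels` detail bands plus the final approximation band."""
    bands = []
    approx = np.asarray(signal, dtype=float)
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        bands.append(detail)
    bands.append(approx)
    return bands
```

Each returned band would then be fed to the attention-equipped CNN as one input scale.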

8 citations

References
Book ChapterDOI
05 Oct 2015
TL;DR: Ronneberger et al. as discussed by the authors proposed a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently, which can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Abstract: There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .
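The contracting/expanding structure described in this abstract can be summarized by tracking feature-map sizes through the network. The 64-to-1024 channel progression matches the paper, but the 'same'-convolution assumption below is a simplification: the original uses unpadded convolutions, so actual sizes shrink slightly at each level:

```python
def unet_shapes(size=512, base_ch=64, depth=4):
    """Feature-map (resolution, channels) pairs through a U-Net-style
    encoder/decoder, assuming padded ('same') convolutions."""
    down, s, ch = [], size, base_ch
    for _ in range(depth):
        down.append((s, ch))      # two convs at this resolution
        s, ch = s // 2, ch * 2    # 2x2 max-pool halves size, channels double
    bottleneck = (s, ch)
    up = []
    for _ in range(depth):
        s, ch = s * 2, ch // 2    # up-conv doubles size, channels halve
        up.append((s, ch))        # then concat with the skip connection
    return down, bottleneck, up
```

The symmetric (resolution, channels) lists on the way down and up are exactly what gives the architecture its U shape, with skip connections joining levels of equal resolution.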

49,590 citations

Journal ArticleDOI
TL;DR: In this article, a structural similarity index is proposed for image quality assessment based on the degradation of structural information, and its promise is demonstrated through comparison with both subjective ratings and objective methods on a database of images compressed with JPEG and JPEG2000.
Abstract: Objective methods for assessing perceptual image quality traditionally attempted to quantify the visibility of errors (differences) between a distorted image and a reference image using a variety of known properties of the human visual system. Under the assumption that human visual perception is highly adapted for extracting structural information from a scene, we introduce an alternative complementary framework for quality assessment based on the degradation of structural information. As a specific example of this concept, we develop a structural similarity index and demonstrate its promise through a set of intuitive examples, as well as comparison to both subjective ratings and state-of-the-art objective methods on a database of images compressed with JPEG and JPEG2000. A MATLAB implementation of the proposed algorithm is available online at http://www.cns.nyu.edu/~lcv/ssim/.
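The structural similarity index combines luminance, contrast, and structure comparisons between two images. A minimal global (single-window) version is sketched below; the paper computes the index in local windows and averages the resulting map, so this simplification differs from the reference implementation:

```python
import numpy as np

def ssim_global(x, y, L=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM over whole-image statistics.
    L is the dynamic range; c1, c2 stabilize the divisions."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

The index equals 1 only for identical images and penalizes luminance shifts, contrast changes, and structural (correlation) loss jointly, unlike mean squared error.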

40,609 citations

Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, the non-local operation computes the response at a position as a weighted sum of the features at all positions, which can be used to capture long-range dependencies.
Abstract: Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method [4] in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available.
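The non-local operation described above, a response computed as a weighted sum of features at all positions, can be sketched in its embedded-Gaussian form, which amounts to softmax attention over positions. The plain 2-D weight matrices below stand in for the paper's 1x1 convolutions, and the residual connection is omitted:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def non_local(x, w_theta, w_phi, w_g):
    """Embedded-Gaussian non-local block on flattened features.
    x: (N, C) features at N positions; weights: (C, C') embeddings.
    The response at position i is a softmax-weighted sum of g(x_j)
    over ALL positions j, capturing long-range dependencies."""
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    attn = softmax(theta @ phi.T)   # (N, N) pairwise affinity weights
    return attn @ g                 # weighted sum over all positions
```

Because every position attends to every other, the receptive field is global in a single layer, in contrast to stacked convolutions or recurrences.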

8,059 citations

Journal ArticleDOI
TL;DR: Although the new index is mathematically defined and no human visual system model is explicitly employed, experiments on various image distortion types indicate that it performs significantly better than the widely used distortion metric mean squared error.
Abstract: We propose a new universal objective image quality index, which is easy to calculate and applicable to various image processing applications. Instead of using traditional error summation methods, the proposed index is designed by modeling any image distortion as a combination of three factors: loss of correlation, luminance distortion, and contrast distortion. Although the new index is mathematically defined and no human visual system model is explicitly employed, our experiments on various image distortion types indicate that it performs significantly better than the widely used distortion metric mean squared error. Demonstrative images and an efficient MATLAB implementation of the algorithm are available online at http://anchovy.ece.utexas.edu/~zwang/research/quality_index/demo.html.
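The three factors named in the abstract (correlation, luminance, contrast) multiply out to a single closed form, which a whole-image sketch makes concrete (the paper applies the index in a sliding window and averages):

```python
import numpy as np

def universal_quality_index(x, y):
    """Wang-Bovik universal image quality index Q, the product of a
    correlation term, a luminance term, and a contrast term."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(ddof=1), y.var(ddof=1)
    cov = ((x - mx) * (y - my)).sum() / (x.size - 1)
    return 4 * cov * mx * my / ((vx + vy) * (mx ** 2 + my ** 2))
```

Q lies in [-1, 1] and reaches 1 only when the two images are identical, which is what lets it separate structural loss from simple intensity errors in a way mean squared error cannot.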

5,285 citations

Journal ArticleDOI
TL;DR: The Center for the Study of Earth from Space (CSES) at the University of Colorado, Boulder, has developed a prototype interactive software system, the Spectral Image Processing System (SIPS), using IDL (the Interactive Data Language) on UNIX-based workstations, in order to develop operational techniques for quantitative analysis of imaging spectrometer data.

2,686 citations