Journal ArticleDOI

3-D Channel and Spatial Attention Based Multiscale Spatial–Spectral Residual Network for Hyperspectral Image Classification

TL;DR: The proposed CSMS-SSRN framework can achieve better classification performance on different HSI datasets and enhance the expressiveness of the image features from the two aspects of channel and spatial domains, thereby improving the accuracy of classification.
Abstract: With the rapid development of aerospace and various remote sensing platforms, the amount of data related to remote sensing is increasing rapidly. To meet the application requirements of remote sensing big data, an increasing number of scholars are combining deep learning with remote sensing data. In recent years, based on the rapid development of deep learning methods, research in the field of hyperspectral image (HSI) classification has seen continuous breakthroughs. In order to fully extract the characteristics of HSIs and improve the accuracy of image classification, this article proposes a novel three-dimensional (3-D) channel and spatial attention-based multiscale spatial–spectral residual network (termed CSMS-SSRN). The CSMS-SSRN framework uses a three-layer parallel residual network structure by using different 3-D convolutional kernels to continuously learn spectral and spatial features from their respective residual blocks. Then, the extracted depth multiscale features are stacked and input into the 3-D attention module to enhance the expressiveness of the image features from the two aspects of channel and spatial domains, thereby improving the accuracy of classification. The CSMS-SSRN framework proposed in this article can achieve better classification performance on different HSI datasets.
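The data flow described in the abstract (parallel multiscale branches, stacking, then channel and spatial attention) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' module: the parameter-free sigmoid gating, the concatenation along the channel axis, the tensor shapes, and the names `channel_attention`, `spatial_attention`, and `multiscale_attention` are all assumptions made for the example. Random arrays stand in for the outputs of the three residual branches.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(features):
    """Reweight each channel of a (C, D, H, W) tensor by a gate in (0, 1)."""
    # Global average pooling over the spectral depth and both spatial axes.
    pooled = features.mean(axis=(1, 2, 3))            # shape (C,)
    gates = sigmoid(pooled)                           # assumption: no learned MLP
    return features * gates[:, None, None, None]


def spatial_attention(features):
    """Reweight each (D, H, W) position by a gate shared across channels."""
    pooled = features.mean(axis=0, keepdims=True)     # shape (1, D, H, W)
    gates = sigmoid(pooled)                           # assumption: no learned conv
    return features * gates


def multiscale_attention(branches):
    """Stack branch outputs on the channel axis, then apply both attentions."""
    stacked = np.concatenate(branches, axis=0)
    return spatial_attention(channel_attention(stacked))


# Hypothetical outputs of three parallel branches: 8 channels each,
# 5 spectral slices, 7 x 7 spatial patch.
rng = np.random.default_rng(0)
branches = [rng.standard_normal((8, 5, 7, 7)) for _ in range(3)]
refined = multiscale_attention(branches)
print(refined.shape)
```

Because both gates lie strictly between 0 and 1, the refined tensor keeps the stacked shape and only rescales feature magnitudes; a trained module would learn the gating functions instead of using plain pooled averages.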


Citations
Journal ArticleDOI
TL;DR: An evolving block-based CNN (EB-CNN) to search the optimal architecture based on the genetic algorithm automatically for HSI classification, leading to its better usability than handcrafted CNNs.
Abstract: Deep Convolutional Neural Network (CNN) shows excellent effectiveness on hyperspectral image (HSI) classification. However, the architecture design of CNN requires abundant expert knowledge and experience, which poses great prohibition to its wide application in real-world engineering. To alleviate the issue, this paper proposes an evolving block-based CNN (EB-CNN) to search the optimal architecture based on the genetic algorithm automatically. Specifically, two kinds of basic blocks with totally six different configurations are first designed to construct the search space. Then, a flexible encoding strategy is devised for the genetic algorithm to allow different chromosomes to evolve with different lengths. In this manner, the width of each layer and the depth of the architecture can be simultaneously optimized. Furthermore, a novel swapping mutation operator is proposed for the genetic algorithm to speed up the search efficiency and save computing resources. With the above techniques, the proposed algorithm automatically seeks the optimal CNN architecture for HSI classification, leading to its better usability than handcrafted CNNs. At last, extensive experiments conducted on 5 commonly used HSI datasets demonstrate that the proposed EB-CNN achieves highly competitive or even better performance, as compared to state-of-the-art peer algorithms.

26 citations
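The EB-CNN abstract above mentions variable-length chromosomes and a swapping mutation operator without giving their details. The sketch below shows one plausible reading, in which a chromosome is a list of block labels and the mutation exchanges two positions. The block labels, the length bounds, and the function names are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical labels for six block configurations; the abstract does not name them.
BLOCK_TYPES = ["A1", "A2", "A3", "B1", "B2", "B3"]


def random_chromosome(rng, min_len=2, max_len=6):
    """Build a variable-length chromosome, one block label per layer."""
    length = rng.randint(min_len, max_len)
    return [rng.choice(BLOCK_TYPES) for _ in range(length)]


def swap_mutation(chromosome, rng):
    """Return a copy of the chromosome with two distinct positions exchanged."""
    child = list(chromosome)
    if len(child) < 2:
        return child  # nothing to swap
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


rng = random.Random(0)
parent = random_chromosome(rng)
child = swap_mutation(parent, rng)
print(parent, child)
```

A swap of this kind changes the order of blocks while preserving both the depth and the multiset of block types, so it explores reorderings without generating new layers.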

Journal ArticleDOI
TL;DR: Wang et al. applied the residual block to 3D-CNN to construct a 3D-Res CNN model, whose performance was then compared with that of 3D-CNN, 2D-CNN, and 2D-Res CNN in identifying PWD-infected pine trees from hyperspectral images.
Abstract: As one of the most devastating disasters to pine forests, pine wilt disease (PWD) has caused tremendous ecological and economic losses in China. An effective way to prevent large-scale PWD outbreaks is to detect and remove the damaged pine trees at the early stage of PWD infection. However, early infected pine trees do not show obvious changes in morphology or color in the visible wavelength range, making early detection of PWD tricky. Unmanned aerial vehicle (UAV)-based hyperspectral imagery (HI) has great potential for early detection of PWD. However, the commonly used methods, such as the two-dimensional convolutional neural network (2D-CNN), fail to simultaneously extract and fully utilize the spatial and spectral information, whereas the three-dimensional convolutional neural network (3D-CNN) is able to collect this information from raw hyperspectral data. In this paper, we applied the residual block to 3D-CNN and constructed a 3D-Res CNN model, the performance of which was then compared with that of 3D-CNN, 2D-CNN, and 2D-Res CNN in identifying PWD-infected pine trees from the hyperspectral images. The 3D-Res CNN model outperformed the other models, achieving an overall accuracy (OA) of 88.11% and an accuracy of 72.86% for detecting early infected pine trees (EIPs). Using only 20% of the training samples, the OA and EIP accuracy of 3D-Res CNN can still achieve 81.06% and 51.97%, which is superior to the state-of-the-art method in the early detection of PWD based on hyperspectral images. Collectively, 3D-Res CNN was more accurate and effective in early detection of PWD. In conclusion, 3D-Res CNN is proposed for early detection of PWD in this paper, making the prediction and control of PWD more accurate and effective. This model can also be applied to detect pine trees damaged by other diseases or insect pests in the forest.

21 citations
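The abstract above reports an overall accuracy (OA) alongside a class-specific accuracy for early infected pines. As a reminder of how such figures are commonly derived from a confusion matrix, here is a small sketch. The matrix values are invented for illustration, rows are assumed to be true classes, and the per-class figure is assumed to be a recall-style accuracy; the abstract does not state which definition the authors used.

```python
def overall_accuracy(confusion):
    """Fraction of all samples that lie on the diagonal."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_accuracy(confusion):
    """Correctly labelled samples of each true class divided by its row total."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
confusion = [
    [50, 5, 5],
    [10, 30, 10],
    [2, 3, 85],
]

print(overall_accuracy(confusion))
print(per_class_accuracy(confusion))
```

The example shows why the two numbers can diverge: a class with few or hard samples (the middle row here) can have a much lower accuracy than the overall figure suggests.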

Journal ArticleDOI
TL;DR: The results on three classic hyperspectral datasets illustrate that 3DCNN-AM-DSC can improve the classification performance and reduce the time required for model training, and may be a new way to tackle hyperspectral datasets in HRSI classification tasks without dimensionality reduction.
Abstract: Hyperspectral Remote Sensing Image (HRSI) classification based on Convolution Neural Network (CNN) has become one of the hot topics in the field of remote sensing. However, the high dimensional information and limited training samples are prone to the Hughes phenomenon for hyperspectral remote sensing images. Meanwhile, high-dimensional information processing also consumes significant time and computing power, or the extracted features may not be representative, resulting in unsatisfactory classification efficiency and accuracy. To solve these problems, an attention mechanism and depthwise separable convolution are introduced to the three-dimensional convolutional neural network (3DCNN). Thus, 3DCNN-AM and 3DCNN-AM-DSC are proposed for HRSI classification. Firstly, three hyperspectral datasets (Indian Pines, University of Pavia and University of Houston) are used to analyze the effect of patch size and dataset allocation ratio (Training set: Validation set: Test Set) on the performance of 3DCNN and 3DCNN-AM. Secondly, in order to improve work efficiency, principal component analysis (PCA) and autoencoder (AE) dimension reduction methods are applied to reduce data dimensionality, and maximize the classification accuracy of the 3DCNN, but it will still take time. Furthermore, the HRSI classification models 3DCNN-AM and 3DCNN-AM-DSC are applied to classify the three classic HRSI datasets. Lastly, the classification accuracy index and time consumption are evaluated. The results indicate that 3DCNN-AM could improve classification accuracy and reduce computing time with the dimension reduction dataset, and the 3DCNN-AM-DSC model can reduce the training time by a maximum of 91.77% without greatly reducing the classification accuracy. The results of the three classic hyperspectral datasets illustrate that 3DCNN-AM-DSC can improve the classification performance and reduce the time required for model training. It may be a new way to tackle hyperspectral datasets in HRSI classification tasks without dimensionality reduction.

19 citations
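To make the appeal of depthwise separable convolution in the abstract above concrete, the following sketch compares weight counts for a standard 3-D convolution and a depthwise separable one. The layer sizes are hypothetical, biases are ignored, and the counts concern parameters only; they are not the training-time savings reported in the abstract.

```python
def conv3d_params(c_in, c_out, k):
    """Weights in a standard 3-D convolution with a k x k x k kernel."""
    return k ** 3 * c_in * c_out


def depthwise_separable_conv3d_params(c_in, c_out, k):
    """Weights in a depthwise k x k x k step followed by a 1 x 1 x 1 pointwise step."""
    depthwise = k ** 3 * c_in      # one spatial-spectral filter per input channel
    pointwise = c_in * c_out       # channel mixing
    return depthwise + pointwise


# Hypothetical layer: 32 input channels, 64 output channels, 3 x 3 x 3 kernel.
standard = conv3d_params(32, 64, 3)
separable = depthwise_separable_conv3d_params(32, 64, 3)
print(standard, separable, separable / standard)
```

For this layer the separable form needs 2,912 weights against 55,296 for the standard convolution, roughly 5% of the original count.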

Journal ArticleDOI
TL;DR: In this paper, a deep spatial-spectral transformer (DSS-TRM) is proposed to improve performance on hyperspectral image (HSI) classification tasks.
Abstract: In recent years, the wide use of deep learning based methods has greatly improved the classification performance of hyperspectral image (HSI). As an effective method to improve the performance of deep convolution networks, attention mechanism is also widely used for HSI classification tasks. However, the majority of the existing attention mechanisms for HSI classification are based on the convolution layer, and the classification accuracy still has margins for improvement. Motivated by the latest self attention mechanism in natural language processing, a deep transformer is proposed for HSI classification in this paper. Specifically, deep transformers along the spectral dimension and the spatial dimension are explored respectively. Then, a deep spatial-spectral transformer (DSS-TRM) is proposed to improve the classification performance of HSI. The contribution of this paper is to make full use of self attention mechanism, that is to use transformer layer instead of convolution layer. More importantly, a DSS-TRM is proposed to realize end-to-end HSI classification. Extensive experiments are conducted on three HSI data sets. The experimental results demonstrate that the proposed DSS-TRM could outperform the traditional convolutional neural networks and attention based methods.

19 citations
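The self-attention operation that the DSS-TRM abstract above builds on can be illustrated with a single-head, scaled dot-product sketch over a sequence of spectral tokens. This is the generic transformer operation, not the DSS-TRM architecture itself; the token count, embedding size, random projection matrices, and function names are assumptions for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # pairwise token similarity
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights


# Hypothetical input: 10 spectral-band tokens, each embedded in 16 dimensions.
rng = np.random.default_rng(0)
n_bands, d_model = 10, 16
tokens = rng.standard_normal((n_bands, d_model))
w_q, w_k, w_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
output, weights = self_attention(tokens, w_q, w_k, w_v)
print(output.shape, weights.shape)
```

Unlike a convolution, every output token here is a weighted mix of all input tokens, which is what lets attention relate distant spectral bands in one layer.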

Journal ArticleDOI
TL;DR: In this paper, a convolutional capsule layer based on the extension of dynamic routing using 3-D convolution is used to reduce the number of parameters and enhance the robustness of the learned spectral-spatial features.
Abstract: Deep learning models have shown excellent performance in the hyperspectral remote sensing image (HSI) classification. In particular, convolutional neural networks (CNNs) have received widespread attention because of their powerful feature-extraction ability. Recently, a capsule network (CapsNet) was introduced to boost the performance of CNNs, marking a remarkable progress in the field of HSI classification. In this article, we propose a novel deep convolutional capsule neural network (DC-CapsNet) based on spectral–spatial features to improve the performance of CapsNet in the HSI classification while significantly reducing the computation cost of the model. Specifically, a convolutional capsule layer based on the extension of dynamic routing using 3-D convolution is used to reduce the number of parameters and enhance the robustness of the learned spectral–spatial features. Furthermore, a lighter and stronger decoder network composed of deconvolutional layers as a better regularization term and capable of acquiring more spatial relationships is used to further improve the HSI classification accuracy with low computation cost. In this study, we tested the performance of the proposed model on four widely used HSI datasets: the Kennedy Space Center, Indian Pines, Pavia University, and Salinas datasets. We found that the DC-CapsNet achieved high classification accuracy with limited training samples and effectively reduced the computation cost.

14 citations

References
Book ChapterDOI

[...]

01 Jan 2012

139,059 citations


"3-D Channel and Spatial Attention B..." refers background in this paper

  • ...and remote sensing technology to achieve tremendous advancement in target recognition [5], image segmentation [6], and parameter inversion [7]....


Journal ArticleDOI
TL;DR: This paper addresses the problem of the classification of hyperspectral remote sensing images by support vector machines by understanding and assessing the potentialities of SVM classifiers in hyperdimensional feature spaces and concludes that SVMs are a valid and effective alternative to conventional pattern recognition approaches.
Abstract: This paper addresses the problem of the classification of hyperspectral remote sensing images by support vector machines (SVMs). First, we propose a theoretical discussion and experimental analysis aimed at understanding and assessing the potentialities of SVM classifiers in hyperdimensional feature spaces. Then, we assess the effectiveness of SVMs with respect to conventional feature-reduction-based approaches and their performances in hypersubspaces of various dimensionalities. To sustain such an analysis, the performances of SVMs are compared with those of two other nonparametric classifiers (i.e., radial basis function neural networks and the K-nearest neighbor classifier). Finally, we study the potentially critical issue of applying binary SVMs to multiclass problems in hyperspectral data. In particular, four different multiclass strategies are analyzed and compared: the one-against-all, the one-against-one, and two hierarchical tree-based strategies. Different performance indicators have been used to support our experimental studies in a detailed and accurate way, i.e., the classification accuracy, the computational time, the stability to parameter setting, and the complexity of the multiclass architecture. The results obtained on a real Airborne Visible/Infrared Imaging Spectroradiometer hyperspectral dataset allow to conclude that, whatever the multiclass strategy adopted, SVMs are a valid and effective alternative to conventional pattern recognition approaches (feature-reduction procedures combined with a classification method) for the classification of hyperspectral remote sensing data.

3,607 citations
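The one-against-all and one-against-one strategies named in the abstract above differ in how many binary SVMs they require, which is part of the "complexity of the multiclass architecture" the authors evaluate. The sketch below only enumerates the binary subproblems; it trains nothing, and the class count of 9 is a hypothetical value rather than one taken from the paper.

```python
from itertools import combinations


def one_against_all_tasks(classes):
    """One binary problem per class: that class versus all remaining classes."""
    return [(c, "rest") for c in classes]


def one_against_one_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


classes = list(range(9))  # hypothetical number of land-cover classes
oaa = one_against_all_tasks(classes)
oao = one_against_one_tasks(classes)
print(len(oaa), len(oao))
```

With K classes, one-against-all needs K classifiers and one-against-one needs K(K-1)/2, so 9 classes give 9 and 36 respectively; each one-against-one classifier, however, is trained on only two classes' samples.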

Journal ArticleDOI
TL;DR: The concept of deep learning is introduced into hyperspectral data classification for the first time, and a new way of classifying with spatial-dominated information is proposed, which is a hybrid of principal component analysis (PCA), deep learning architecture, and logistic regression.
Abstract: Classification is one of the most popular topics in hyperspectral remote sensing. In the last two decades, a huge number of methods were proposed to deal with the hyperspectral data classification problem. However, most of them do not hierarchically extract deep features. In this paper, the concept of deep learning is introduced into hyperspectral data classification for the first time. First, we verify the eligibility of stacked autoencoders by following classical spectral information-based classification. Second, a new way of classifying with spatial-dominated information is proposed. We then propose a novel deep learning framework to merge the two features, from which we can get the highest classification accuracy. The framework is a hybrid of principal component analysis (PCA), deep learning architecture, and logistic regression. Specifically, as a deep learning architecture, stacked autoencoders are aimed to get useful high-level features. Experimental results with widely-used hyperspectral data indicate that classifiers built in this deep learning-based framework provide competitive performance. In addition, the proposed joint spectral-spatial deep neural network opens a new window for future research, showcasing the deep learning-based methods' huge potential for accurate hyperspectral data classification.

2,071 citations


"3-D Channel and Spatial Attention B..." refers methods in this paper

  • ...[39] introduced a new framework that combines PCA and logistic regression in a deep learning model....
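The PCA step mentioned in this entry is typically used to compress the spectral dimension before further feature learning. The following sketch shows only that step, computed with a singular value decomposition on synthetic data. The pixel count, band count, number of retained components, and the name `pca_reduce` are assumptions; the stacked autoencoder and logistic regression stages of the framework are not reproduced here.

```python
import numpy as np


def pca_reduce(pixels, n_components):
    """Project (n_pixels, n_bands) spectra onto the top principal components."""
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


# Hypothetical data: 500 pixels with 100 spectral bands each.
rng = np.random.default_rng(0)
pixels = rng.standard_normal((500, 100))
reduced = pca_reduce(pixels, n_components=4)
print(reduced.shape)
```

The reduced features are mutually uncorrelated and ordered by explained variance, which is why a handful of components is often enough as input to a downstream classifier.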


Journal ArticleDOI
TL;DR: This paper proposes a 3-D CNN-based FE model with combined regularization to extract effective spectral-spatial features of hyperspectral imagery and reveals that the proposed models with sparse constraints provide competitive results to state-of-the-art methods.
Abstract: Due to the advantages of deep learning, in this paper, a regularized deep feature extraction (FE) method is presented for hyperspectral image (HSI) classification using a convolutional neural network (CNN). The proposed approach employs several convolutional and pooling layers to extract deep features from HSIs, which are nonlinear, discriminant, and invariant. These features are useful for image classification and target detection. Furthermore, in order to address the common issue of imbalance between high dimensionality and limited availability of training samples for the classification of HSI, a few strategies such as L2 regularization and dropout are investigated to avoid overfitting in class data modeling. More importantly, we propose a 3-D CNN-based FE model with combined regularization to extract effective spectral-spatial features of hyperspectral imagery. Finally, in order to further improve the performance, a virtual sample enhanced method is proposed. The proposed approaches are carried out on three widely used hyperspectral data sets: Indian Pines, University of Pavia, and Kennedy Space Center. The obtained results reveal that the proposed models with sparse constraints provide competitive results to state-of-the-art methods. In addition, the proposed deep FE opens a new window for further research.

2,059 citations


"3-D Channel and Spatial Attention B..." refers methods in this paper

  • ...[44] combined virtual sample enhancement technology with a CNN to effectively extract the spectral and spatial information....


Journal ArticleDOI
TL;DR: A general framework of DL for RS data is provided, and the state-of-the-art DL methods in RS are regarded as special cases of input-output data combined with various deep networks and tuning tricks.
Abstract: Deep-learning (DL) algorithms, which learn the representative and discriminative features in a hierarchical manner from the data, have recently become a hotspot in the machine-learning area and have been introduced into the geoscience and remote sensing (RS) community for RS big data analysis. Considering the low-level features (e.g., spectral and texture) as the bottom level, the output feature representation from the top level of the network can be directly fed into a subsequent classifier for pixel-based classification. As a matter of fact, by carefully addressing the practical demands in RS applications and designing the input-output levels of the whole network, we have found that DL is actually everywhere in RS data analysis: from the traditional topics of image preprocessing, pixel-based classification, and target recognition, to the recent challenging tasks of high-level semantic feature extraction and RS scene understanding.

1,625 citations


"3-D Channel and Spatial Attention B..." refers background in this paper

  • ...Therefore, deep learning is widely used to learn the deep features of images and improve their classification accuracy [38]....
