Journal ArticleDOI

A Multiscale Dual-Branch Feature Fusion and Attention Network for Hyperspectral Images Classification

12 Aug 2021-IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Institute of Electrical and Electronics Engineers (IEEE))-Vol. 14, pp 8180-8192
TL;DR: Wang et al. design a multiscale feature extraction (MSFE) module that extracts spatial-spectral features at a granular level and expands the range of receptive fields, thereby enhancing multiscale feature extraction.
Abstract: Recently, hyperspectral image classification based on deep learning has attracted considerable attention. Many convolutional neural network classification methods have emerged and exhibited superior classification performance. However, most methods focus on extracting features using fixed convolution kernels and layer-wise representation, resulting in limited feature diversity. Additionally, the feature fusion process is rough and simple. Numerous methods fuse different levels of features simply by stacking modules hierarchically, which ignores the combination of shallow and deep spectral-spatial features. To overcome these issues, a novel multiscale dual-branch feature fusion and attention network is proposed. Specifically, we design a multiscale feature extraction (MSFE) module to extract spatial-spectral features at a granular level and expand the range of receptive fields, thereby enhancing multiscale feature extraction. Subsequently, we develop a dual-branch feature fusion interactive module that integrates the residual connection's feature reuse property and the dense connection's feature exploration capability, obtaining more discriminative features in both spatial and spectral branches. Additionally, we introduce a novel shuffle attention mechanism that allows for adaptive weighting of spatial and spectral features, further improving classification performance. Experimental results on three benchmark datasets demonstrate that our model outperforms other state-of-the-art methods while incurring a lower computational cost.
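To make the "granular level" idea concrete, below is a minimal PyTorch sketch of a multiscale block in the spirit of the MSFE module: the input channels are split into subsets, and each successive subset passes through a 3x3 convolution that also receives the previous subset's output, so the effective receptive field grows within a single block. The split-and-cascade scheme (Res2Net-style) and all names and hyperparameters here are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of a granular multiscale feature-extraction block (assumed design).
import torch
import torch.nn as nn

class MSFEBlock(nn.Module):
    def __init__(self, channels: int, splits: int = 4):
        super().__init__()
        assert channels % splits == 0
        self.splits = splits
        width = channels // splits
        # One 3x3 conv per subset; later subsets also see earlier outputs,
        # so the effective receptive field grows within one block.
        self.convs = nn.ModuleList(
            [nn.Conv2d(width, width, kernel_size=3, padding=1) for _ in range(splits - 1)]
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        chunks = torch.chunk(x, self.splits, dim=1)
        out = [chunks[0]]          # first subset passes through untouched
        prev = chunks[0]
        for conv, chunk in zip(self.convs, chunks[1:]):
            prev = self.relu(conv(chunk + prev))  # cascade widens the receptive field
            out.append(prev)
        return torch.cat(out, dim=1)

if __name__ == "__main__":
    block = MSFEBlock(channels=64, splits=4)
    print(block(torch.randn(2, 64, 9, 9)).shape)  # torch.Size([2, 64, 9, 9])
```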


Citations
Journal ArticleDOI
TL;DR: Zhang et al. propose MDBRSSN, a multiscale dual-branch residual spectral–spatial network with attention for hyperspectral image classification, which can learn and fuse deeper hierarchical spectral–spatial features with fewer training samples.
Abstract: Recent developments in remote sensing imaging have made it possible to identify materials in inaccessible environments and to study natural materials on a large scale, and hyperspectral images (HSIs), with their unique features, are a rich source of information for various applications. However, several problems reduce the accuracy of HSI classification: ineffective extracted features, noise, correlation between bands, and, most importantly, the limited number of labeled samples. To improve accuracy when training samples are limited, we propose a multiscale dual-branch residual spectral–spatial network with attention for HSI classification, named MDBRSSN, in this article. First, due to the correlation and redundancy between HSI bands, a principal component analysis operation is applied to preprocess the raw HSI data. Then, in MDBRSSN, a dual-branch structure is designed to extract useful spectral–spatial features of HSI. Multiscale abstract information extracted by the convolutional neural network improves classification accuracy for complex hyperspectral data. In addition, attention mechanisms applied separately to each branch enable MDBRSSN to optimize and refine the extracted feature maps. Such an MDBRSSN framework can learn and fuse deeper hierarchical spectral–spatial features with fewer training samples. MDBRSSN is designed to achieve high classification accuracy compared with state-of-the-art methods when training samples are limited, as the experiments on four datasets in this article demonstrate. On Salinas, Pavia University, Indian Pines, and Houston 2013, the proposed model obtained 99.64%, 98.93%, 98.17%, and 96.57% overall accuracy using only 1%, 1%, 5%, and 5% of labeled data for training, respectively, which is much better than state-of-the-art methods.
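The PCA preprocessing step is straightforward to sketch: treat each pixel's spectrum as a sample, fit PCA across all pixels, and keep the leading components. The shapes and the number of retained components below are illustrative assumptions, not MDBRSSN's exact settings.

```python
# Sketch of PCA band reduction for a hyperspectral cube (assumed parameters).
import numpy as np
from sklearn.decomposition import PCA

def pca_reduce(cube: np.ndarray, n_components: int = 30) -> np.ndarray:
    """Reduce an (H, W, bands) hyperspectral cube to (H, W, n_components)."""
    h, w, bands = cube.shape
    flat = cube.reshape(-1, bands)                      # one spectrum per pixel
    reduced = PCA(n_components=n_components).fit_transform(flat)
    return reduced.reshape(h, w, n_components)

if __name__ == "__main__":
    cube = np.random.rand(145, 145, 200)                # Indian Pines-like dimensions
    print(pca_reduce(cube).shape)                       # (145, 145, 30)
```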

15 citations

Journal ArticleDOI
TL;DR: Wang et al. develop a multilevel LC contextual (MLCC) framework that can adaptively integrate effective global context with local context for LC classification; the proposed MLCC has superior capability in capturing contextual features and thus outperforms existing methods.

7 citations

Journal ArticleDOI
TL;DR: A wavelet-attention convolutional neural network (WA-CNN), random forest, and support vector machine (SVM) algorithms were utilized to automatically map crops over agricultural lands.
Abstract: Developments in space-based hyperspectral sensors, advanced remote sensing, and machine learning can help crop yield measurement, modelling, prediction, and crop monitoring for loss prevention and global food security. However, the precise and continuous spectral signatures needed for large-area crop growth monitoring and early yield prediction with cutting-edge algorithms can only be provided by hyperspectral imaging. Therefore, this article used new-generation Deutsches Zentrum für Luft- und Raumfahrt Earth Sensing Imaging Spectrometer (DESIS) images to classify the main crop types (hybrid corn, soybean, sunflower, and winter wheat) in Mezőhegyes (southeastern Hungary). A wavelet-attention convolutional neural network (WA-CNN), random forest, and support vector machine (SVM) algorithms were utilized to automatically map the crops over the agricultural lands. The best accuracy was achieved with the WA-CNN, a feature-based deep learning algorithm, on a combination of two images, with an overall accuracy (OA) of 97.89% and user's and producer's accuracies from 97% to 99%. To obtain this, factor analysis was first introduced to decrease the size of the hyperspectral image data cube, and a wavelet transform was applied to extract important features, combined with a spectral-attention CNN to gain higher accuracy in mapping crop types. The SVM algorithm followed with an OA of 87.79%, with the producer's and user's accuracies of its classes ranging from 79.62% to 96.48% and from 79.63% to 95.73%, respectively. These results demonstrate the potential of DESIS data to observe the growth of different crop types and predict harvest volume, which is crucial for farmers, smallholders, and decision-makers.
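As a rough illustration of the preprocessing described above, the sketch below applies factor analysis to shrink the spectral dimension and then a single-level 2-D discrete wavelet transform (via PyWavelets) to each retained component image. The factor count, wavelet family, and exact pipeline ordering are assumptions for illustration, not the authors' configuration.

```python
# Sketch: factor analysis for dimensionality reduction, then 2-D wavelet features.
import numpy as np
import pywt
from sklearn.decomposition import FactorAnalysis

def fa_then_wavelet(cube: np.ndarray, n_factors: int = 10, wavelet: str = "haar"):
    h, w, bands = cube.shape
    flat = cube.reshape(-1, bands)
    factors = FactorAnalysis(n_components=n_factors).fit_transform(flat)
    factors = factors.reshape(h, w, n_factors)
    # Decompose each factor image into approximation + detail subbands.
    return [pywt.dwt2(factors[:, :, i], wavelet) for i in range(n_factors)]

if __name__ == "__main__":
    cube = np.random.rand(64, 64, 235)        # DESIS delivers ~235 spectral bands
    cA, (cH, cV, cD) = fa_then_wavelet(cube)[0]
    print(cA.shape)                           # (32, 32) approximation subband
```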

4 citations

Journal ArticleDOI
TL;DR: A dual-branch network (DBN) embedding attention modules was designed to extract more discriminative deep transferable features, thereby improving the performance of subdomain adaptation.
Abstract: This study aims at improving fine-grained ship classification performance under the condition that no labeled samples are available in the SAR domain (target domain) by transferring knowledge from the optical remote sensing (ORS) domain (source domain), which has rich labeled samples. The proposed method improves the original deep subdomain adaptation network (DSAN) by designing a dual-branch network (DBN) embedding attention modules to extract more discriminative deep transferable features, thereby improving the performance of the subdomain adaptation. Specifically, we utilized a deep base network (ResNet-50) and a shallow base network (ResNet-18) to build the DBN, and embedded the convolutional block attention module (CBAM) after the first and the last convolutional layer of each branch. Extensive experiments demonstrate that the proposed method, termed DSAN++, is feasible and achieves remarkable improvement over state-of-the-art methods on the task of fine-grained ship classification.
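The dual-branch backbone pairing is easy to sketch with torchvision: a deep ResNet-50 branch and a shallow ResNet-18 branch whose pooled features are concatenated before a classifier head. The CBAM modules the abstract embeds after the first and last convolutional layers are omitted here for brevity, and the classifier head is an assumption.

```python
# Sketch of a deep + shallow dual-branch feature extractor (CBAM omitted).
import torch
import torch.nn as nn
from torchvision import models

class DualBranchNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        deep = models.resnet50(weights=None)
        shallow = models.resnet18(weights=None)
        # Drop each backbone's final fc layer, keep everything up to global pooling.
        self.deep = nn.Sequential(*list(deep.children())[:-1])       # -> (N, 2048, 1, 1)
        self.shallow = nn.Sequential(*list(shallow.children())[:-1])  # -> (N, 512, 1, 1)
        self.classifier = nn.Linear(2048 + 512, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = torch.cat([self.deep(x).flatten(1), self.shallow(x).flatten(1)], dim=1)
        return self.classifier(f)

if __name__ == "__main__":
    net = DualBranchNet(num_classes=5)
    print(net(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 5])
```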

2 citations

Journal ArticleDOI
TL;DR: A multiscale spatial–spectral feature extraction algorithm based on a ladder structure is proposed to effectively integrate spatial–spectral features at different scales, achieving higher accuracy than representative HSI classifiers and existing PN-based algorithms.
Abstract: Due to the complex environment of hyperspectral image (HSI) gathering areas, it is difficult to obtain a large number of labeled samples for HSI. Therefore, how to effectively achieve HSI few-shot classification is a hot spot of current research. The prototypical network (PN) is one of the most classical few-shot learning algorithms and has been widely employed for few-shot image classification and few-shot object detection. However, existing PN-based algorithms for HSI only utilize the single-scale spatial–spectral feature extracted from the last layer, ignoring the semantic information at different scales contained in the other layers. To solve this problem, a novel multiscale spatial–spectral PN (MSSPN) is proposed in this letter. The contribution of this letter is threefold. First, a multiscale spatial–spectral feature extraction algorithm based on a ladder structure is proposed to effectively integrate spatial–spectral features at different scales. Second, building on the ladder-structure-based extraction algorithm, we design a multiscale spatial–spectral prototype representation, which is more robust and effective in the multiscale spatial–spectral metric space. Finally, our proposed MSSPN is easily extensible and can be readily applied to other PN-based few-shot learning methods. The experimental results on HSI few-shot classification indicate that our proposed MSSPN algorithm achieves higher accuracy than representative HSI classifiers and existing PN-based algorithms.
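For readers unfamiliar with prototypical networks, the core computation is compact: class prototypes are the mean embedding of each class's support samples, and queries are assigned to the nearest prototype. The sketch below shows the single-scale case; the letter's multiscale extension would repeat this per feature scale. Embedding dimensions and episode sizes are illustrative.

```python
# Sketch of prototype computation and nearest-prototype classification.
import torch

def prototypes(support: torch.Tensor, labels: torch.Tensor, n_classes: int) -> torch.Tensor:
    """support: (N, D) embeddings; labels: (N,) ints in [0, n_classes)."""
    return torch.stack([support[labels == c].mean(dim=0) for c in range(n_classes)])

def classify(queries: torch.Tensor, protos: torch.Tensor) -> torch.Tensor:
    # Assign each query the label of its nearest prototype (Euclidean metric).
    return torch.cdist(queries, protos).argmin(dim=1)

if __name__ == "__main__":
    emb = torch.randn(20, 64)                  # 4-way, 5-shot support embeddings
    lab = torch.arange(4).repeat(5)            # five samples per class
    protos = prototypes(emb, lab, n_classes=4)
    print(classify(torch.randn(5, 64), protos))  # predictions for 5 queries
```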

2 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: A residual learning framework is proposed to ease the training of networks that are substantially deeper than those used previously; it won 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
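The residual reformulation fits in a few lines of code: the stacked layers learn a residual F(x) and the block outputs F(x) + x, so identity mappings are trivial to represent and very deep stacks remain optimizable. A minimal sketch with illustrative layer sizes:

```python
# Sketch of a basic residual block: output = ReLU(F(x) + x).
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.body(x) + x)  # F(x) + x: the identity shortcut

if __name__ == "__main__":
    print(ResidualBlock(64)(torch.randn(1, 64, 32, 32)).shape)
```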

123,388 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: DenseNet connects each layer to every other layer in a feed-forward fashion, which alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and substantially reduces the number of parameters.
Abstract: Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
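The L(L+1)/2 connection count follows directly from each layer consuming the concatenation of all preceding feature maps. A small sketch of one dense block, with an illustrative growth rate and depth:

```python
# Sketch of a dense block: every layer sees all earlier feature maps.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_channels: int, growth_rate: int = 12, n_layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(n_layers):
            c_in = in_channels + i * growth_rate  # channels grow with each layer
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(c_in),
                nn.ReLU(inplace=True),
                nn.Conv2d(c_in, growth_rate, 3, padding=1),
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            features.append(layer(torch.cat(features, dim=1)))  # reuse all earlier maps
        return torch.cat(features, dim=1)

if __name__ == "__main__":
    block = DenseBlock(in_channels=16)
    print(block(torch.randn(1, 16, 32, 32)).shape)  # 16 + 4*12 = 64 channels
```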

27,821 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
Abstract: Feature pyramids are a basic component in recognition systems for detecting objects at different scales. But pyramid representations have been avoided in recent object detectors that are based on deep convolutional networks, partially because they are slow to compute and memory intensive. In this paper, we exploit the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost. A top-down architecture with lateral connections is developed for building high-level semantic feature maps at all scales. This architecture, called a Feature Pyramid Network (FPN), shows significant improvement as a generic feature extractor in several applications. Using a basic Faster R-CNN system, our method achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles, surpassing all existing single-model entries including those from the COCO 2016 challenge winners. In addition, our method can run at 5 FPS on a GPU and thus is a practical and accurate solution to multi-scale object detection. Code will be made publicly available.
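The top-down pathway with lateral connections can be sketched compactly: coarse, semantically strong maps are upsampled and summed with 1x1-projected finer maps, then smoothed with 3x3 convolutions. Channel counts below are illustrative, not the paper's exact configuration.

```python
# Sketch of an FPN-style top-down pathway with lateral connections.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MiniFPN(nn.Module):
    def __init__(self, in_channels, out_channels: int = 256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in in_channels]
        )

    def forward(self, feats):
        # feats ordered fine -> coarse, e.g. [C3, C4, C5]
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):  # top-down: coarse to fine
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest"
            )
        return [s(l) for s, l in zip(self.smooth, laterals)]

if __name__ == "__main__":
    fpn = MiniFPN([256, 512, 1024])
    c3, c4, c5 = (torch.randn(1, c, s, s) for c, s in [(256, 64), (512, 32), (1024, 16)])
    print([p.shape for p in fpn([c3, c4, c5])])
```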

16,727 citations

Journal ArticleDOI
18 Jun 2018
TL;DR: This work proposes a novel architectural unit, termed the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and finds that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost.
Abstract: The central building block of convolutional neural networks (CNNs) is the convolution operator, which enables networks to construct informative features by fusing both spatial and channel-wise information within local receptive fields at each layer. A broad range of prior research has investigated the spatial component of this relationship, seeking to strengthen the representational power of a CNN by enhancing the quality of spatial encodings throughout its feature hierarchy. In this work, we focus instead on the channel relationship and propose a novel architectural unit, which we term the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels. We show that these blocks can be stacked together to form SENet architectures that generalise extremely effectively across different datasets. We further demonstrate that SE blocks bring significant improvements in performance for existing state-of-the-art CNNs at slight additional computational cost. Squeeze-and-Excitation Networks formed the foundation of our ILSVRC 2017 classification submission which won first place and reduced the top-5 error to 2.251 percent, surpassing the winning entry of 2016 by a relative improvement of ∼25 percent. Models and code are available at https://github.com/hujie-frank/SENet .
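The SE block itself is only a few operations: global average pooling squeezes each channel to a scalar, a bottleneck MLP models channel interdependencies, and a sigmoid produces per-channel weights that recalibrate the input. A minimal sketch using the paper's default reduction ratio of 16:

```python
# Sketch of a Squeeze-and-Excitation block: squeeze, excite, recalibrate.
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))  # squeeze to (N, C), then excite
        return x * w.view(n, c, 1, 1)    # channel-wise recalibration

if __name__ == "__main__":
    print(SEBlock(64)(torch.randn(2, 64, 8, 8)).shape)
```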

14,807 citations

Book ChapterDOI
08 Sep 2018
TL;DR: The Convolutional Block Attention Module (CBAM) is a simple yet effective attention module for feed-forward convolutional neural networks: given an intermediate feature map, the module sequentially infers attention maps along two separate dimensions, channel and spatial, and the attention maps are multiplied with the input feature map for adaptive feature refinement.
Abstract: We propose Convolutional Block Attention Module (CBAM), a simple yet effective attention module for feed-forward convolutional neural networks. Given an intermediate feature map, our module sequentially infers attention maps along two separate dimensions, channel and spatial, then the attention maps are multiplied to the input feature map for adaptive feature refinement. Because CBAM is a lightweight and general module, it can be integrated into any CNN architectures seamlessly with negligible overheads and is end-to-end trainable along with base CNNs. We validate our CBAM through extensive experiments on ImageNet-1K, MS COCO detection, and VOC 2007 detection datasets. Our experiments show consistent improvements in classification and detection performances with various models, demonstrating the wide applicability of CBAM. The code and models will be publicly available.
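Following the abstract's description, a compact CBAM sketch: channel attention first (a shared MLP over average- and max-pooled descriptors), then spatial attention (a 7x7 convolution over channel-wise average and max maps), each multiplied into the feature map. The reduction ratio and kernel size below follow the paper's common defaults.

```python
# Sketch of CBAM: sequential channel attention then spatial attention.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16, kernel_size: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(  # shared MLP for both pooled descriptors
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, _, _ = x.shape
        # Channel attention: shared MLP over avg- and max-pooled descriptors.
        ca = torch.sigmoid(
            self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3)))
        ).view(n, c, 1, 1)
        x = x * ca
        # Spatial attention: conv over per-pixel channel average and max maps.
        sa = torch.sigmoid(self.spatial(
            torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        ))
        return x * sa

if __name__ == "__main__":
    print(CBAM(64)(torch.randn(2, 64, 16, 16)).shape)
```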

5,335 citations