Journal ArticleDOI

Hierarchical Shrinkage Multiscale Network for Hyperspectral Image Classification With Hierarchical Feature Fusion

25 May 2021-IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing (Institute of Electrical and Electronics Engineers (IEEE))-Vol. 14, pp 5760-5772
TL;DR: Wang et al. as mentioned in this paper proposed a hierarchical shrinkage multiscale feature extraction network built by pruning the MDMSRB to reduce structural redundancy; the proposed network hierarchically and effectively integrates low-level edge features and high-level semantic features.
Abstract: Recently, deep learning (DL)-based hyperspectral image classification (HSIC) has attracted substantial attention. Many works based on the convolutional neural network (CNN) model have proved significantly successful at boosting the performance of HSIC. However, most of these methods extract features with a fixed convolutional kernel and ignore the multiscale features of the ground objects in hyperspectral images (HSIs). Although some recent methods have proposed multiscale feature extraction schemes, they consume more computing and storage resources. Moreover, when using CNNs for HSI classification, many methods use only the high-level semantic information extracted at the end of the network, ignoring the edge information extracted by its shallow layers. To address these two issues, a novel HSIC method based on a hierarchical shrinkage multiscale network and hierarchical feature fusion is proposed, with which the new classification framework can fuse features generated by both multiscale receptive fields and multiple levels. Specifically, a multidepth and multiscale residual block (MDMSRB) is constructed by superimposing dilated convolutions to realize multiscale feature extraction. Furthermore, according to the change of feature size in different stages of the network, we design a hierarchical shrinkage multiscale feature extraction network by pruning the MDMSRB to reduce the redundancy of the network structure. In addition, to make full use of the features extracted at each stage of the network, the proposed network hierarchically and effectively integrates low-level edge features and high-level semantic features. Experimental results demonstrate that the proposed method achieves more competitive performance at a limited computational cost than other state-of-the-art methods.
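The abstract does not spell out the exact MDMSRB layout, so the following is only a minimal PyTorch sketch of a multiscale residual block built from parallel dilated convolutions, in the spirit described above; the branch dilations (1, 2, 3), channel counts, and 1x1 fusion are our own assumptions, not the authors' architecture.

```python
# A hedged sketch: multiscale feature extraction via parallel dilated
# convolutions inside a residual block (assumed layout, not the paper's).
import torch
import torch.nn as nn

class MultiscaleResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # Dilations 1, 2, 3 give effective receptive fields of 3, 5, and 7
        # with the same 3x3 kernel, approximating multiscale extraction.
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=d, dilation=d)
            for d in (1, 2, 3)
        ])
        self.fuse = nn.Conv2d(3 * channels, channels, kernel_size=1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = torch.cat([self.act(b(x)) for b in self.branches], dim=1)
        return self.act(self.fuse(feats) + x)  # residual connection

x = torch.randn(2, 32, 11, 11)  # e.g., a batch of HSI patches after a stem conv
print(MultiscaleResidualBlock(32)(x).shape)  # torch.Size([2, 32, 11, 11])
```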

Citations
Journal ArticleDOI
TL;DR: A Contextual Axial-Preserving Attention Network-based MRI image-assisted segmentation method for osteosarcoma detection that achieves better segmentation results than alternative models and shows significant advantages with respect to small target segmentation.
Abstract: Osteosarcoma is a malignant bone tumor that is extremely dangerous to human health. Outlining the lesion area in an image manually with traditional methods is both labor-intensive and complicated. With the development of computer-aided diagnostic techniques, more and more researchers are focusing on automatic segmentation techniques for osteosarcoma analysis. However, existing methods ignore the size of osteosarcomas, making it difficult to identify and segment smaller tumors, which is very detrimental to early diagnosis. Therefore, this paper proposes a Contextual Axial-Preserving Attention Network (CaPaN)-based MRI image-assisted segmentation method for osteosarcoma detection. On top of Res2Net, a parallel decoder is added to aggregate high-level features, effectively combining the local and global features of osteosarcoma. In addition, channel feature pyramid (CFP) and axial attention (A-RA) mechanisms are used. The lightweight CFP can extract feature mappings and contextual information of different sizes, while A-RA uses axial attention to distinguish tumor tissue, reducing computational costs and thus improving the generalization performance of the model. We conducted experiments on a real dataset provided by the Second Xiangya Affiliated Hospital, and the results show that our proposed method achieves better segmentation results than alternative models. In particular, our method shows significant advantages for small-target segmentation: its precision is about 2% higher than the average of the other models, and for small objects the DSC value of CaPaN is 0.021 higher than that of the commonly used U-Net.
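To illustrate why axial attention cuts the computational cost the abstract mentions, here is a minimal sketch of generic 2-D axial attention (attend along width, then along height), so each pixel attends to H+W positions instead of H*W. The head count, dimensions, and use of nn.MultiheadAttention are assumptions; this is not the paper's A-RA module.

```python
# A hedged sketch of axial attention: full 2-D self-attention factored into
# a row pass and a column pass (assumed generic form, not CaPaN's exact A-RA).
import torch
import torch.nn as nn

class AxialAttention2d(nn.Module):
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                     # x: (B, C, H, W)
        b, c, h, w = x.shape
        # Attend along width: each row is an independent sequence of length W.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(b, h, w, c).permute(0, 3, 1, 2)
        # Attend along height: each column is a sequence of length H.
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, w, h, c).permute(0, 3, 2, 1)

x = torch.randn(1, 64, 16, 16)
print(AxialAttention2d(64)(x).shape)  # torch.Size([1, 64, 16, 16])
```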

23 citations

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a densely connected pyramidal dilated convolutional network (PDCNet) to solve the problem of blind spots in the receptive field, in which the dilation factor of each sub-dilated convolution increases exponentially, achieving multiscale receptive fields.
Abstract: Recently, with the extensive application of deep learning techniques, particularly the convolutional neural network (CNN), in the hyperspectral image (HSI) field, research on HSI classification has stepped into a new stage. To overcome the small receptive field of naive convolution, the dilated convolution has been introduced into HSI classification. However, dilated convolution usually generates blind spots in the receptive field, so the spatial information obtained is discontinuous. To solve this problem, a densely connected pyramidal dilated convolutional network (PDCNet) is proposed in this paper. Firstly, a pyramidal dilated convolutional (PDC) layer that integrates several sub-dilated convolutional layers is proposed, in which the dilation factor of the sub-dilated convolutions increases exponentially, achieving multiscale receptive fields. Secondly, the number of sub-dilated convolutional layers increases in a pyramidal pattern with the depth of the network, thereby capturing more comprehensive hyperspectral information in the receptive field. Furthermore, a feature fusion mechanism combining pixel-by-pixel addition and channel stacking is adopted to extract more abstract spectral-spatial features. Finally, to reuse the features of previous layers more effectively, dense connections are applied in the densely pyramidal dilated convolutional (DPDC) blocks. Experiments on three well-known HSI datasets indicate that the proposed PDCNet has good classification performance compared with other popular models.
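The key idea of the PDC layer, as the abstract describes it, is a set of sub-dilated convolutions whose dilation factors grow exponentially (1, 2, 4, ...), so their stacked receptive fields cover multiple scales without blind spots. Below is a minimal sketch under that reading; the channel sizes, batch-norm placement, and 1x1 projection are our own assumptions.

```python
# A hedged sketch of a pyramidal dilated convolutional (PDC) layer:
# sub-convolutions with exponentially growing dilation, fused by concat + 1x1.
import torch
import torch.nn as nn

class PDCLayer(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, num_sub: int = 3):
        super().__init__()
        self.subs = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=2 ** i, dilation=2 ** i),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
            for i in range(num_sub)          # dilation = 1, 2, 4, ...
        ])
        self.project = nn.Conv2d(num_sub * out_ch, out_ch, 1)

    def forward(self, x):
        return self.project(torch.cat([s(x) for s in self.subs], dim=1))

x = torch.randn(2, 16, 25, 25)
print(PDCLayer(16, 32)(x).shape)  # torch.Size([2, 32, 25, 25])
```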

7 citations

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper proposed a DL-based BS-HTD (DLBSTD) algorithm, which incorporates DL-based BS with DL-based HTD for the first time.
Abstract: Deep learning (DL) has recently risen to prominence in hyperspectral target detection (HTD). Nevertheless, tackling the extreme training-sample imbalance while achieving target highlighting and background suppression is challenging. In addition, because of the spectral redundancy of hyperspectral imagery (HSI), band selection (BS), which retains only the crucial bands, offers a new route to improving the subsequent detection performance of HTD. Accordingly, we propose a DL-based BS-HTD (DLBSTD) algorithm, incorporating DL-based BS with DL-based HTD for the first time. Most significantly, a multidepth and multibranch network (MDBN) for HTD based on a novel BS method is proposed. First, the BS method, comprising an alternating local-global reconstruction network (ALGRN) and a correlation measurement strategy, provides representative bands containing key target information for the MDBN. To counter the training-sample imbalance of the MDBN, we develop a BS-based method to select diverse representative background training samples and propose a target band random substitution (TBRS) strategy to augment an ample target training set. Finally, the MDBN, composed of a multidepth feature extraction (MDFE) module, three fusion strategies, and a parallel local convolution and gated recurrent unit (Conv-GRU), fully taps the spectral feature relationships to highlight targets and suppress backgrounds. Extensive experiments on four classical datasets, against nine competitive HTD algorithms, show that the proposed DLBSTD has strong generalization and salient detection performance in target highlighting and background suppression.
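The DLBSTD pipeline learns its band ranking with the ALGRN, which the abstract does not detail. As a stand-in to show what correlation-driven band selection does in general, here is a minimal greedy sketch that picks bands with low mutual correlation; the seeding rule, greedy criterion, and synthetic data are entirely our own assumptions.

```python
# A hedged, generic sketch of correlation-based band selection (NOT the
# paper's ALGRN): greedily pick bands least correlated with those chosen.
import numpy as np

def select_bands(hsi: np.ndarray, k: int) -> list:
    """hsi: (n_pixels, n_bands). Returns k band indices with low redundancy."""
    corr = np.abs(np.corrcoef(hsi.T))            # band-to-band correlation
    chosen = [int(np.argmax(hsi.var(axis=0)))]   # seed: highest-variance band
    while len(chosen) < k:
        redundancy = corr[:, chosen].max(axis=1)  # worst-case similarity
        redundancy[chosen] = np.inf               # never re-pick a band
        chosen.append(int(np.argmin(redundancy)))
    return chosen

hsi = np.random.rand(1000, 100)                   # synthetic flattened cube
print(select_bands(hsi, 5))
```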

3 citations

Journal ArticleDOI
TL;DR: Experimental results on three popular benchmark datasets demonstrate that the DSD-HAFF achieves better performance and has a much smaller number of network parameters than the other state-of-the-art methods.
Abstract: In recent years, the convolutional neural network (CNN) has played a vital role in hyperspectral image classification and performs more competitively than many other methods. However, in pursuit of better performance, most existing CNN-based methods simply stack rather deep convolutional layers. Although this improves classification accuracy to a certain extent, it results in a large number of network parameters. In this paper, a lightweight directionally separable dilated CNN with hierarchical attention feature fusion (DSD-HAFF) is proposed to solve these problems. First, two global dense dilated CNN branches, each focusing on one spatial direction, are constructed to extract and reuse spatial information as fully as possible. Second, a hierarchical attention feature fusion branch consisting of several coordinate attention blocks (CABs) is constructed, with hierarchical features from the two directionally separable dilated CNN branches adopted as inputs to the CABs. In this way, the structure not only fully incorporates hierarchical features but also significantly reduces the number of network parameters. Meanwhile, the hierarchical attention feature fusion branch incorporates features from high level to low level in a kernel-number pyramid strategy. Experimental results on three popular benchmark datasets demonstrate that DSD-HAFF achieves better performance and has far fewer network parameters than other state-of-the-art methods.
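A plausible reading of "directionally separable" is one branch of 1xk dilated kernels (horizontal context) and one of kx1 kernels (vertical context), which is why the parameter count drops versus full kxk kernels. The sketch below follows that reading; the kernel sizes, dilation, and concatenation fusion are assumptions, and the coordinate-attention fusion is omitted.

```python
# A hedged sketch of directionally separable dilated convolution branches
# (assumed 1x3 / 3x1 form; DSD-HAFF's exact branches may differ).
import torch
import torch.nn as nn

def directional_branch(ch: int, horizontal: bool, dilation: int = 2) -> nn.Module:
    k, pad = ((1, 3), (0, dilation)) if horizontal else ((3, 1), (dilation, 0))
    d = (1, dilation) if horizontal else (dilation, 1)
    return nn.Sequential(
        nn.Conv2d(ch, ch, kernel_size=k, padding=pad, dilation=d),
        nn.ReLU(inplace=True),
    )

x = torch.randn(2, 24, 15, 15)
h_feat = directional_branch(24, horizontal=True)(x)   # horizontal context
v_feat = directional_branch(24, horizontal=False)(x)  # vertical context
fused = torch.cat([h_feat, v_feat], dim=1)            # simple fusion stand-in
print(fused.shape)  # torch.Size([2, 48, 15, 15])
```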

3 citations

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers—8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions1, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.
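The residual reformulation this reference introduces is compact enough to show directly: a block computes y = F(x) + x, so the layers only have to learn the residual F and identity mappings stay easy to represent. Below is a minimal sketch of the basic two-convolution block; exact channel counts and the demo input are illustrative.

```python
# A minimal sketch of a residual block: output is F(x) + x.
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.f(x) + x)  # y = F(x) + x, then nonlinearity

x = torch.randn(1, 64, 32, 32)
print(BasicBlock(64)(x).shape)  # torch.Size([1, 64, 32, 32])
```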

123,388 citations

Proceedings ArticleDOI
21 Jul 2017
TL;DR: DenseNet as mentioned in this paper proposes to connect each layer to every other layer in a feed-forward fashion, which can alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
Abstract: Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
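The L(L+1)/2 connection count follows directly from each layer consuming the concatenation of all earlier feature maps. A minimal sketch of one dense block is below; the growth rate, depth, and BN-ReLU-Conv ordering follow the paper's general recipe, but the specific numbers are illustrative.

```python
# A minimal sketch of DenseNet-style connectivity: each layer receives the
# concatenation of all preceding feature maps.
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, in_ch: int, growth: int = 12, layers: int = 4):
        super().__init__()
        self.layers = nn.ModuleList([
            nn.Sequential(
                nn.BatchNorm2d(in_ch + i * growth),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_ch + i * growth, growth, 3, padding=1, bias=False),
            )
            for i in range(layers)
        ])

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            feats.append(layer(torch.cat(feats, dim=1)))  # reuse all earlier maps
        return torch.cat(feats, dim=1)

x = torch.randn(1, 24, 16, 16)
print(DenseBlock(24)(x).shape)  # torch.Size([1, 72, 16, 16]) = 24 + 4 * 12
```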

27,821 citations

Journal ArticleDOI
TL;DR: This paper addresses the classification of hyperspectral remote sensing images with support vector machines, assessing the potential of SVM classifiers in hyperdimensional feature spaces, and concludes that SVMs are a valid and effective alternative to conventional pattern recognition approaches.
Abstract: This paper addresses the problem of the classification of hyperspectral remote sensing images by support vector machines (SVMs). First, we propose a theoretical discussion and experimental analysis aimed at understanding and assessing the potentialities of SVM classifiers in hyperdimensional feature spaces. Then, we assess the effectiveness of SVMs with respect to conventional feature-reduction-based approaches and their performances in hypersubspaces of various dimensionalities. To sustain such an analysis, the performances of SVMs are compared with those of two other nonparametric classifiers (i.e., radial basis function neural networks and the K-nearest neighbor classifier). Finally, we study the potentially critical issue of applying binary SVMs to multiclass problems in hyperspectral data. In particular, four different multiclass strategies are analyzed and compared: the one-against-all, the one-against-one, and two hierarchical tree-based strategies. Different performance indicators have been used to support our experimental studies in a detailed and accurate way, i.e., the classification accuracy, the computational time, the stability to parameter setting, and the complexity of the multiclass architecture. The results obtained on a real Airborne Visible/Infrared Imaging Spectroradiometer hyperspectral dataset allow us to conclude that, whatever the multiclass strategy adopted, SVMs are a valid and effective alternative to conventional pattern recognition approaches (feature-reduction procedures combined with a classification method) for the classification of hyperspectral remote sensing data.
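The two non-hierarchical multiclass strategies the paper compares can be sketched with scikit-learn: SVC decomposes multiclass problems one-against-one natively, and OneVsRestClassifier wraps it as one-against-all. The synthetic "spectra" and hyperparameters below are placeholders, not the paper's experimental setup.

```python
# A hedged sketch of one-against-one vs. one-against-all multiclass SVMs.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))        # 300 pixels, 50 spectral features (demo)
y = rng.integers(0, 4, size=300)      # 4 land-cover classes (random demo labels)

ovo = SVC(kernel="rbf", decision_function_shape="ovo").fit(X, y)   # one-vs-one
ova = OneVsRestClassifier(SVC(kernel="rbf")).fit(X, y)             # one-vs-all
print(ovo.predict(X[:5]), ova.predict(X[:5]))
```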

3,607 citations

Journal ArticleDOI
TL;DR: The overall mean recognition probability (mean accuracy) of a pattern classifier is calculated and numerically plotted as a function of the pattern measurement complexity n and design data set size m, using the well-known probabilistic model of a two-class, discrete-measurement pattern environment.
Abstract: The overall mean recognition probability (mean accuracy) of a pattern classifier is calculated and numerically plotted as a function of the pattern measurement complexity n and design data set size m. Utilized is the well-known probabilistic model of a two-class, discrete-measurement pattern environment (no Gaussian or statistical independence assumptions are made). The minimum-error recognition rule (Bayes) is used, with the unknown pattern environment probabilities estimated from the data relative frequencies. In calculating the mean accuracy over all such environments, only three parameters remain in the final equation: n, m, and the prior probability p_c of either of the pattern classes. With a fixed design pattern sample, recognition accuracy can first increase as the number of measurements made on a pattern increases, but decay with measurement complexity higher than some optimum value. Graphs of the mean accuracy exhibit both an optimal and a maximum acceptable value of n for fixed m and p_c. A four-place tabulation of the optimum n and maximum mean accuracy values is given for equally likely classes and m ranging from 2 to 1000. The penalty exacted for the generality of the analysis is the use of the mean accuracy itself as a recognizer optimality criterion; namely, one necessarily always has some particular recognition problem at hand whose Bayes accuracy will be higher or lower than the mean over all recognition problems having fixed n, m, and p_c.
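The peaking effect the paper derives analytically can be illustrated empirically: with a fixed design-set size m, test accuracy typically rises and then falls as the measurement complexity n grows. The classifier, the Gaussian data model with per-dimension signal decaying as 1/j, and all constants below are our own assumptions, not the paper's discrete Bayes setup.

```python
# A hedged empirical illustration of the Hughes (peaking) phenomenon.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
m, n_test = 20, 2000                      # tiny design set, many test samples

def make_data(size, n):
    y = rng.integers(0, 2, size)
    X = rng.normal(size=(size, n))
    shift = 1.5 / (1.0 + np.arange(n))    # per-dimension signal decays as 1/j
    return X + shift * y[:, None], y

for n in (2, 5, 20, 50, 200):
    Xtr, ytr = make_data(m, n)
    Xte, yte = make_data(n_test, n)
    acc = GaussianNB().fit(Xtr, ytr).score(Xte, yte)
    # Accuracy typically peaks at a moderate n, then decays as estimation
    # noise from the small design set swamps the diminishing signal.
    print(f"n={n:4d}  mean accuracy={acc:.3f}")
```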

2,705 citations

Proceedings Article
07 May 2015
TL;DR: DeepLab as mentioned in this paper combines the responses at the final layer with a fully connected CRF to localize segment boundaries at a level of accuracy beyond previous methods, achieving 71.6% IOU accuracy in the test set.
Abstract: Deep Convolutional Neural Networks (DCNNs) have recently shown state-of-the-art performance in high-level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification (also called "semantic image segmentation"). We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high-level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Qualitatively, our "DeepLab" system is able to localize segment boundaries at a level of accuracy beyond previous methods. Quantitatively, our method sets the new state of the art on the PASCAL VOC-2012 semantic image segmentation task, reaching 71.6% IOU accuracy on the test set. We show how these results can be obtained efficiently: careful network re-purposing and a novel application of the 'hole' algorithm from the wavelet community allow dense computation of neural net responses at 8 frames per second on a modern GPU.
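The 'hole' (atrous, i.e., dilated) convolution that DeepLab repurposes is the same primitive the dilated-convolution HSI methods above build on: it enlarges the receptive field without extra weights or downsampling. A minimal sketch follows; the channel counts and dilation rate are illustrative.

```python
# A minimal sketch of atrous ('hole') convolution for dense feature extraction.
import torch
import torch.nn as nn

# A 3x3 kernel with dilation 4 covers a 9x9 window but still has only 9 weights.
atrous = nn.Conv2d(in_channels=256, out_channels=256,
                   kernel_size=3, padding=4, dilation=4)

x = torch.randn(1, 256, 64, 64)   # feature map kept at high spatial resolution
print(atrous(x).shape)            # torch.Size([1, 256, 64, 64])
```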

2,469 citations