Journal ArticleDOI

Learning Deep Hierarchical Spatial-Spectral Features for Hyperspectral Image Classification Based on Residual 3D-2D CNN.

29 Nov 2019-Sensors (Basel)-Vol. 19, Iss: 23, pp 5276
TL;DR: Focusing on hyperspectral classification with limited samples, an 11-layer CNN model called R-HybridSN (Residual-HybridSN) is designed with an organic combination of 3D-2D-CNN, residual learning, and depth-separable convolutions to better learn deep hierarchical spatial–spectral features from very few training data.
Abstract: Every pixel in a hyperspectral image contains detailed spectral information in hundreds of narrow bands captured by hyperspectral sensors. Pixel-wise classification of a hyperspectral image is the cornerstone of various hyperspectral applications. Nowadays, deep learning models, represented by the convolutional neural network (CNN), provide an ideal solution for feature extraction and have made remarkable achievements in supervised hyperspectral classification. However, hyperspectral image annotation is time-consuming and laborious, and the available training data are usually limited. Due to this "small-sample problem", CNN-based hyperspectral classification remains challenging. Focusing on hyperspectral classification with limited samples, we designed an 11-layer CNN model called R-HybridSN (Residual-HybridSN) from the perspective of network optimization. Through an organic combination of 3D-2D-CNN, residual learning, and depth-separable convolutions, R-HybridSN can better learn deep hierarchical spatial-spectral features from very few training data. The performance of R-HybridSN is evaluated on three publicly available hyperspectral datasets with different amounts of training samples. Using only 5%, 1%, and 1% of the labeled data for training on Indian Pines, Salinas, and University of Pavia, respectively, R-HybridSN achieves classification accuracies of 96.46%, 98.25%, and 96.59%, respectively, far better than the contrast models.
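As a rough illustration of the ingredients named in the abstract (3D-2D convolutions, residual learning, depth-separable convolutions), a minimal PyTorch sketch follows. It is not the authors' 11-layer R-HybridSN: the layer sizes are hypothetical, and it uses a plain identity shortcut where the paper describes non-identity residual connections.

```python
import torch
import torch.nn as nn

class Hybrid3D2DBlock(nn.Module):
    def __init__(self, bands=30, mid_channels=8):
        super().__init__()
        # 3D convolution over (spectral, height, width)
        self.conv3d = nn.Conv3d(1, mid_channels, kernel_size=(7, 3, 3),
                                padding=(3, 1, 1))
        folded = mid_channels * bands          # spectral axis folded into channels
        # depth-separable 2D convolution: per-channel 3x3 + pointwise 1x1
        self.depthwise = nn.Conv2d(folded, folded, 3, padding=1, groups=folded)
        self.pointwise = nn.Conv2d(folded, folded, 1)
        self.relu = nn.ReLU()

    def forward(self, x):                      # x: (batch, 1, bands, H, W)
        f = self.relu(self.conv3d(x))
        b, c, d, h, w = f.shape
        f = f.reshape(b, c * d, h, w)          # 3D -> 2D feature maps
        out = self.pointwise(self.depthwise(f))
        return self.relu(out + f)              # residual (identity) shortcut

patch = torch.randn(2, 1, 30, 11, 11)          # hypothetical 11x11 patch, 30 bands
print(Hybrid3D2DBlock()(patch).shape)          # torch.Size([2, 240, 11, 11])
```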
Citations
Journal Article
TL;DR: This result is proved here for a class of nodes termed "semi-algebraic gates", which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees.
Abstract: For any positive integer $k$, there exist neural networks with $\Theta(k^3)$ layers, $\Theta(1)$ nodes per layer, and $\Theta(1)$ distinct parameters which cannot be approximated by networks with $\mathcal{O}(k)$ layers unless they are exponentially large: they must possess $\Omega(2^k)$ nodes. This result is proved here for a class of nodes termed "semi-algebraic gates", which includes the common choices of ReLU, maximum, indicator, and piecewise polynomial functions, therefore establishing benefits of depth against not just standard networks with ReLU gates, but also convolutional networks with ReLU and maximization gates, sum-product networks, and boosted decision trees (in this last case with a stronger separation: $\Omega(2^{k^3})$ total tree nodes are required).

288 citations
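A hedged numerical illustration of why depth helps, in the spirit of the result above (not the paper's exact construction): composing the tent map k times, which takes only O(k) ReLU layers, yields a sawtooth with 2^k linear pieces, which a shallow ReLU network can only match with exponentially many units.

```python
import numpy as np

def tent(x):
    # t(x) = 2*min(x, 1-x); expressible with O(1) ReLU units
    return 2 * np.minimum(x, 1 - x)

def tent_k(x, k):
    for _ in range(k):                 # k compositions -> 2**k linear pieces
        x = tent(x)
    return x

x = np.linspace(0.0, 1.0, 1_000_001)
for k in (1, 3, 5):
    slopes = np.sign(np.diff(tent_k(x, k)))
    pieces = 1 + np.count_nonzero(np.diff(slopes))   # count slope sign changes
    print(f"k={k}: {pieces} linear pieces (expected {2**k})")
```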

Journal ArticleDOI
TL;DR: A novel CD framework based on the convolutional neural network (CNN) is proposed to not only address the aforementioned problems but also to considerably improve the level of accuracy.
Abstract: The diversity of change detection (CD) methods and the limitations in generalizing these techniques across different types of remote sensing datasets and study areas have been a challenge for CD applications. Additionally, most CD methods have been implemented in two intensive and time-consuming steps: (a) predicting change areas, and (b) deciding on the predicted areas. In this study, a novel CD framework based on the convolutional neural network (CNN) is proposed to not only address the aforementioned problems but also to considerably improve the level of accuracy. The proposed CNN-based CD network contains three parallel channels: the first and second channels extract deep features from the original first- and second-time imagery, respectively, and the third channel focuses on the extraction of change deep features based on differencing and stacking deep features. Additionally, each channel includes three types of convolution kernels: 1D-, 2D-, and 3D-dilated convolutions. The effectiveness and reliability of the proposed CD method are evaluated using three different types of remote sensing benchmark datasets (i.e., multispectral, hyperspectral, and Polarimetric Synthetic Aperture Radar (PolSAR)). The resulting CD maps are also evaluated both visually and statistically by calculating nine different accuracy indices. Moreover, the results of the CD using the proposed method are compared to those of several state-of-the-art CD algorithms. All the results show that the proposed method outperforms the other remote sensing CD techniques. For instance, considering different scenarios, the Overall Accuracies (OAs) and Kappa Coefficients (KCs) of the proposed CD method are better than 95.89% and 0.805, respectively, and the Miss Detection (MD) and False Alarm (FA) rates are lower than 12% and 3%, respectively.

58 citations
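A minimal sketch of one element described above: a channel stage built from dilated convolutions, which widen the receptive field without adding parameters. Only the 3D-dilated case is shown; the kernel size and dilation rate are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

# one hypothetical 3D-dilated stage: dilation=2 doubles the effective
# kernel reach over (bands, height, width)
stage = nn.Sequential(
    nn.Conv3d(1, 4, kernel_size=3, padding=2, dilation=2),
    nn.ReLU(),
)
x = torch.randn(1, 1, 20, 32, 32)   # hypothetical patch: 20 bands, 32x32 pixels
print(stage(x).shape)               # torch.Size([1, 4, 20, 32, 32])
```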


Cites background from "Learning Deep Hierarchical Spatial-..."

  • ...The main purpose of the 3D convolution layer is to investigate the relationship between spectral bands so that all of the content of the spectral information is fully used [46,47]....


  • ...The 2D convolution layer is concentrated on spatial dimensions and is unable to handle spectral information [47,50]....


Journal ArticleDOI
TL;DR: In this article, the authors provided the first narrative deep learning review considering all facets of image classification using AI, employing a PRISMA search strategy across Google Scholar, PubMed, IEEE, and Elsevier Science Direct, through which 127 relevant HDL studies were identified.

50 citations

Journal ArticleDOI
11 Sep 2020-Sensors
TL;DR: This work proposes a novel 3D-2D convolutional neural network (CNN) model named AD-HybridSN (Attention-Dense-HybridSN), which can learn more discriminative spatial–spectral features from very few training data and far outperforms all the contrast models.
Abstract: Convolutional neural networks provide an ideal solution for hyperspectral image (HSI) classification. However, the classification effect is not satisfactory when only limited training samples are available. Focusing on "small-sample" hyperspectral classification, we proposed a novel 3D-2D convolutional neural network (CNN) model named AD-HybridSN (Attention-Dense-HybridSN). In our proposed model, a dense block was used to reuse shallow features, aimed at better exploiting hierarchical spatial-spectral features. Subsequent depth-separable convolutional layers were used to discriminate the spatial information. Further refinement of spatial-spectral features was realized by the channel attention method and the spatial attention method, performed after every 3D convolutional layer and every 2D convolutional layer, respectively. Experimental results indicate that our proposed model can learn more discriminative spatial-spectral features from very few training data. On Indian Pines, Salinas, and University of Pavia, AD-HybridSN obtains 97.02%, 99.59%, and 98.32% overall accuracy using only 5%, 1%, and 1% of the labeled data for training, respectively, which is far better than all the contrast models.

25 citations
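A hedged sketch of the channel-attention step described above, in squeeze-and-excitation style: global-average-pool the feature map, pass it through a small bottleneck, and rescale the channels. The reduction ratio and placement are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (batch, C, H, W)
        w = x.mean(dim=(2, 3))                   # squeeze: global average pool
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                             # excite: rescale each channel

feat = torch.randn(2, 64, 11, 11)
print(ChannelAttention(64)(feat).shape)          # torch.Size([2, 64, 11, 11])
```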


Cites background or methods from "Learning Deep Hierarchical Spatial-..."

  • ...Compared with traditional 2D convolutional layers, depth-separable convolutional layers have fewer parameters and a lighter computational burden, which makes them more suitable for hyperspectral data processing [34] (a parameter-count sketch follows this list)....


  • ...[34] conducted extensive experiments using different amounts of training samples and found that degradation of the CNN model is very common when the sample size decreases....


  • ...[34] proposed R-HybridSN (Residual-HybridSN), making rational use of non-identity residual connections to enrich the feature-learning paths and enhance the flow of spectral information in the network....


  • ...Compared with the simple pipelined network, the well-designed model, which is more like a directed acyclic graph of layers, usually has a better classification effect [34]....


  • ...The main strategies for small sample hyperspectral classification include generative adversarial networks [39,40], semi-supervised learning [41,42] and network optimization [33,34]....

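A quick parameter-count check for the depth-separable point in the first bullet above: a depthwise 3x3 convolution followed by a pointwise 1x1 convolution uses far fewer parameters than a standard 3x3 convolution of the same shape (channel sizes here are hypothetical).

```python
import torch.nn as nn

c_in, c_out, k = 64, 64, 3
standard = nn.Conv2d(c_in, c_out, k, padding=1)
separable = nn.Sequential(
    nn.Conv2d(c_in, c_in, k, padding=1, groups=c_in),  # depthwise 3x3
    nn.Conv2d(c_in, c_out, 1),                         # pointwise 1x1
)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(standard), count(separable))  # 36928 vs 4800
```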

Journal ArticleDOI
TL;DR: In this paper, a lightweight multiscale squeeze-and-excitation pyramid pooling network (MSPN) was proposed to tackle the small-sample problem, applying semantic segmentation to pixel-level hyperspectral classification given the comparability of the two tasks.
Abstract: Hyperspectral images are widely used for classification due to their rich spectral information along with spatial information. To process the high dimensionality and high nonlinearity of hyperspectral images, deep learning methods based on the convolutional neural network (CNN) are widely used in hyperspectral classification applications. However, most CNN structures are stacked vertically and use convolutional kernels or pooling layers of a single size, which cannot fully mine the multiscale information in hyperspectral images. When such networks meet the practical challenge of a limited labeled hyperspectral image dataset (i.e., the "small sample problem"), the classification accuracy and generalization ability are limited. In this paper, to tackle the small sample problem, we apply semantic segmentation to pixel-level hyperspectral classification, given the comparability of the two tasks. A lightweight, multiscale squeeze-and-excitation pyramid pooling network (MSPN) is proposed. It consists of a multiscale 3D CNN module, a squeeze-and-excitation module, and a pyramid pooling module with 2D CNN. Such a hybrid 2D-3D-CNN MSPN framework can learn and fuse deeper hierarchical spatial–spectral features with fewer training samples. The proposed MSPN was tested on three publicly available hyperspectral classification datasets: Indian Pines, Salinas, and Pavia University. Using 5%, 0.5%, and 0.5% training samples of the three datasets, the classification accuracies of the MSPN were 96.09%, 97%, and 96.56%, respectively. In addition, we also selected the latest dataset with higher spatial resolution, named WHU-Hi-LongKou, as the challenge object. Using only 0.1% of the training samples, we could achieve a 97.31% classification accuracy, which is far superior to the state-of-the-art hyperspectral classification methods.

19 citations
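A minimal sketch of the pyramid pooling idea in the MSPN abstract above: pool the feature map at several grid sizes, upsample back, and concatenate with the input to fuse multiscale context. The bin sizes and channel counts are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pyramid_pool(x, bins=(1, 2, 4)):          # x: (batch, C, H, W)
    h, w = x.shape[2:]
    pooled = [F.interpolate(F.adaptive_avg_pool2d(x, b), size=(h, w),
                            mode="bilinear", align_corners=False)
              for b in bins]
    return torch.cat([x] + pooled, dim=1)     # fuse multiscale context

feat = torch.randn(1, 32, 16, 16)
print(pyramid_pool(feat).shape)               # torch.Size([1, 128, 16, 16])
```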

References
Proceedings ArticleDOI
27 Jun 2016
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers, 8× deeper than VGG nets [40] but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to the ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations
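The core reformulation in the abstract above, as a minimal sketch: the block learns a residual function F(x) and adds the input back, so y = F(x) + x. The batch normalization and projection shortcuts of the full architecture are omitted, and the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.f = nn.Sequential(                # the residual function F
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.f(x) + x)        # y = F(x) + x, identity shortcut

x = torch.randn(1, 64, 8, 8)
print(ResidualBlock()(x).shape)                # torch.Size([1, 64, 8, 8])
```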

Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network (DCNN) achieved state-of-the-art performance on ImageNet, as discussed by the authors; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations
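The "dropout" regularizer mentioned above, in a short modern-PyTorch sketch: during training each activation is zeroed with probability p (0.5 in the paper's fully-connected layers) and the survivors are rescaled by 1/(1-p); at inference the layer is the identity.

```python
import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
drop.train()
x = torch.ones(8)
print(drop(x))   # about half the entries zeroed, survivors scaled to 2.0
drop.eval()
print(drop(x))   # identity at inference time
```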

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations
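A hedged check of the design point above: two stacked 3x3 convolutions cover the same 5x5 receptive field as one 5x5 convolution while using fewer parameters (the channel count here is hypothetical).

```python
import torch.nn as nn

c = 64
one_5x5 = nn.Conv2d(c, c, 5, padding=2)
two_3x3 = nn.Sequential(nn.Conv2d(c, c, 3, padding=1), nn.ReLU(),
                        nn.Conv2d(c, c, 3, padding=1))
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(one_5x5), count(two_3x3))  # 102464 vs 73856
```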


"Learning Deep Hierarchical Spatial-..." refers background in this paper

  • ...Since then, VGG [32], GoogLeNet [33], ResNet [34], and other networks with excellent performance in the ILSVRC competition have overcome one milestone after another in CNN model research....



Journal ArticleDOI
28 May 2015-Nature
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

46,982 citations