
Showing papers in "IEEE Transactions on Geoscience and Remote Sensing in 2020"


Journal ArticleDOI
TL;DR: The proposed multiscale dynamic GCN (MDGCN) updates the graph dynamically along with the graph convolution process, so that the two steps benefit from each other and gradually produce discriminative embedded features as well as a refined graph.
Abstract: Convolutional neural network (CNN) has demonstrated impressive ability to represent hyperspectral images and to achieve promising results in hyperspectral image classification. However, traditional CNN models can only operate convolution on regular square image regions with fixed size and weights, and thus, they cannot universally adapt to the distinct local regions with various object distributions and geometric appearances. Therefore, their classification performance still leaves room for improvement, especially at class boundaries. To alleviate this shortcoming, we consider employing the recently proposed graph convolutional network (GCN) for hyperspectral image classification, as it can conduct the convolution on arbitrarily structured non-Euclidean data and is applicable to the irregular image regions represented by graph topological information. Different from the commonly used GCN models that work on a fixed graph, we enable the graph to be dynamically updated along with the graph convolution process so that the two steps can benefit from each other and gradually produce the discriminative embedded features as well as a refined graph. Moreover, to comprehensively deploy the multiscale information inherent in hyperspectral images, we establish multiple input graphs with different neighborhood scales to extensively exploit the diversified spectral–spatial correlations at multiple scales. Therefore, our method is termed multiscale dynamic GCN (MDGCN). The experimental results on three typical benchmark data sets firmly demonstrate the superiority of the proposed MDGCN to other state-of-the-art methods in both qualitative and quantitative aspects.
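The graph convolution step the abstract refers to can be sketched as a single normalized propagation layer. This is a minimal sketch of a standard GCN layer (self-loops plus symmetric degree normalization), not MDGCN itself, which additionally rebuilds the adjacency from the embeddings after each step and fuses several neighborhood scales; all names here are illustrative.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph convolution step: H' = ReLU(D^-1/2 (A+I) D^-1/2 H W).

    A: (n, n) adjacency, H: (n, f_in) node features, W: (f_in, f_out) weights.
    A simplified sketch of the generic GCN update used as MDGCN's building block.
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric degree normalization
    H_next = D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W
    return np.maximum(H_next, 0.0)            # ReLU

# toy graph: 3 nodes forming a chain 0-1-2
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
H = np.random.default_rng(0).normal(size=(3, 4))
W = np.random.default_rng(1).normal(size=(4, 2))
H1 = gcn_layer(A, H, W)
```

A "dynamic" variant would recompute `A` from pairwise similarities of `H1` before the next layer, which is the mutual-refinement loop the abstract describes.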

270 citations


Journal ArticleDOI
TL;DR: This article presents an active deep learning approach for HSI classification, which integrates both active learning and deep learning into a unified framework and achieves better performance on three benchmark HSI data sets with significantly fewer labeled samples.
Abstract: Deep neural network has been extensively applied to hyperspectral image (HSI) classification recently. However, its success is greatly attributed to numerous labeled samples, whose acquisition costs a large amount of time and money. In order to improve the classification performance while reducing the labeling cost, this article presents an active deep learning approach for HSI classification, which integrates both active learning and deep learning into a unified framework. First, we train a convolutional neural network (CNN) with a limited number of labeled pixels. Next, we actively select the most informative pixels from the candidate pool for labeling. Then, the CNN is fine-tuned with the new training set constructed by incorporating the newly labeled pixels. These two steps are conducted iteratively. Finally, Markov random field (MRF) is utilized to enforce class label smoothness to further boost the classification performance. Compared with the other state-of-the-art traditional and deep learning-based HSI classification methods, our proposed approach achieves better performance on three benchmark HSI data sets with significantly fewer labeled samples.
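The "actively select the most informative pixels" step is typically an uncertainty-sampling query. The sketch below picks the candidates whose predicted class distribution has the highest entropy; the article's actual query criterion may differ, and the function name is an assumption.

```python
import numpy as np

def select_most_informative(probs, k):
    """Pick the k candidate pixels whose predicted class distribution has
    the highest entropy (i.e., the most uncertain predictions).
    probs: (n_candidates, n_classes) softmax outputs of the current CNN.
    A generic uncertainty-sampling sketch, not the paper's exact criterion."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return np.argsort(entropy)[-k:][::-1]     # indices, most uncertain first

probs = np.array([[0.98, 0.01, 0.01],         # confident prediction
                  [0.34, 0.33, 0.33],         # very uncertain
                  [0.70, 0.20, 0.10]])        # somewhat uncertain
picked = select_most_informative(probs, 2)
```

The selected indices would then be sent to an oracle for labeling, added to the training set, and the CNN fine-tuned, closing the active-learning loop described above.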

203 citations


Journal ArticleDOI
TL;DR: A spectral–spatial attention network (SSAN) is proposed to capture discriminative spectral–spatial features from attention areas of HSI cubes; it outperforms several state-of-the-art methods.
Abstract: Hyperspectral image (HSI) classification aims to assign each hyperspectral pixel with a proper land-cover label. Recently, convolutional neural networks (CNNs) have shown superior performance. To identify the land-cover label, CNN-based methods exploit the adjacent pixels as an input HSI cube, which simultaneously contains spectral signatures and spatial information. However, at the edge of each land-cover area, an HSI cube often contains several pixels whose land-cover labels are different from that of the center pixel. These pixels, named interfering pixels, will weaken the discrimination of spectral–spatial features and reduce classification accuracy. In this article, a spectral–spatial attention network (SSAN) is proposed to capture discriminative spectral–spatial features from attention areas of HSI cubes. First, a simple spectral–spatial network (SSN) is built to extract spectral–spatial features from HSI cubes. The SSN is composed of a spectral module and a spatial module. Each module consists of only a few 3-D convolution and activation operations, which make the proposed method easy to converge with a small number of training samples. Second, an attention module is introduced to suppress the effects of interfering pixels. The attention module is embedded into the SSN to obtain the SSAN. The experiments on several public HSI databases demonstrate that the proposed SSAN outperforms several state-of-the-art methods.

193 citations


Journal ArticleDOI
TL;DR: A novel and general deep siamese convolutional multiple-layers recurrent neural network (SiamCRNN) is proposed for CD in multitemporal VHR images; experiments demonstrate that it outperforms several state-of-the-art approaches.
Abstract: With the rapid development of Earth observation technology, very-high-resolution (VHR) images from various satellite sensors are increasingly available, which greatly enriches the data source of change detection (CD). Multisource multitemporal images can provide abundant information on observed landscapes with various physical and material views, and there is a pressing need to develop efficient techniques to utilize these multisource data for CD. In this article, we propose a novel and general deep siamese convolutional multiple-layers recurrent neural network (RNN) (SiamCRNN) for CD in multitemporal VHR images. Superior to most VHR image CD methods, SiamCRNN can be used for both homogeneous and heterogeneous images. Integrating the merits of both convolutional neural network (CNN) and RNN, SiamCRNN consists of three subnetworks: deep siamese convolutional neural network (DSCNN), multiple-layers RNN (MRNN), and fully connected (FC) layers. The DSCNN has a flexible structure for multisource images and is able to extract spatial–spectral features from homogeneous or heterogeneous VHR image patches. The MRNN, stacked by long short-term memory (LSTM) units, is responsible for mapping the spatial–spectral features extracted by DSCNN into a new latent feature space and mining the change information between them. In addition, the FC layers, the last part of SiamCRNN, are adopted to predict the change probability. The experimental results on two homogeneous data sets and one challenging heterogeneous VHR image data set demonstrate that the proposed network outperforms several state-of-the-art approaches.

191 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a hybrid-graph learning method to reveal the complex high-order relationships of the hyperspectral image (HSI), termed enhanced hybrid graph discriminant learning (EHGDL).
Abstract: Dimensionality reduction (DR) is an important way of improving the classification accuracy of a hyperspectral image (HSI). Graph learning, which can effectively reveal the intrinsic relationships of data, has been widely used in the case of HSIs. However, most methods are based on a simple graph that represents only the binary relationships of data. An HSI contains complex high-order relationships among different samples. Therefore, in this article, we propose a hybrid-graph learning method to reveal the complex high-order relationships of the HSI, termed enhanced hybrid-graph discriminant learning (EHGDL). In EHGDL, an intraclass hypergraph and an interclass hypergraph are constructed to analyze the complex multiple relationships of an HSI. Then, a supervised locality graph is applied to reveal the binary relationships of an HSI, which complements the hypergraph representation. Simultaneously, we also construct a weighted neighborhood margin model to enlarge the difference between samples from different classes. Finally, we design a DR model based on the intraclass hypergraph, the interclass hypergraph, the supervised locality graph, and the weighted neighborhood margin to improve the compactness of the intraclass samples and the separability of the interclass samples, and an optimal projection matrix can be achieved to extract the low-dimensional embedding features of the HSI. To demonstrate the effectiveness of the proposed method, experiments have been conducted on the Indian Pines, PaviaU, and HoustonU data sets. The experimental results show that EHGDL can generate better classification performance compared with some related DR methods. As a result, EHGDL can better reveal the complex intrinsic relationships of an HSI by the complementarity of different characteristics and enhance the discriminant performance of land-cover types.

188 citations


Journal ArticleDOI
TL;DR: A network unit is designed that uses a gating mechanism to adaptively recalibrate spectral bands, selectively emphasizing informative bands and suppressing less useful ones; extensive experiments demonstrate that convolutional networks equipped with this spectral attention module offer results competitive with state-of-the-art approaches.
Abstract: Over the past few years, hyperspectral image classification using convolutional neural networks (CNNs) has progressed significantly. In spite of their effectiveness, given that hyperspectral images are of high dimensionality, CNNs can be hindered by their modeling of all spectral bands with the same weight, as probably not all bands are equally informative and predictive. Moreover, the usage of useless spectral bands in CNNs may even introduce noise and weaken the performance of networks. For the sake of boosting the representational capacity of CNNs for spectral-spatial hyperspectral data classification, in this work, we improve networks by discriminating the significance of different spectral bands. We design a network unit, termed the spectral attention module, which uses a gating mechanism to adaptively recalibrate spectral bands by selectively emphasizing informative bands and suppressing less useful ones. We theoretically analyze and discuss why such a spectral attention module helps in a CNN for hyperspectral image classification. We demonstrate using extensive experiments that in comparison with state-of-the-art approaches, the spectral attention module-based convolutional networks are able to offer competitive results. Furthermore, this work sheds light on how a CNN interacts with spectral bands for the purpose of classification.
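The band-recalibration gate described above can be sketched in the style of a squeeze-and-excitation block: pool each band to a scalar, pass the vector through a small bottleneck, and rescale the bands with sigmoid weights. This is a hedged sketch of the gating idea only; the weight shapes and function names are assumptions, not the paper's exact module.

```python
import numpy as np

def spectral_attention(cube, W1, W2):
    """Recalibrate spectral bands with a gating mechanism.
    cube: (h, w, bands) HSI patch; W1: (bands, r) and W2: (r, bands) form a
    bottleneck. Informative bands get gate values near 1, useless ones near 0.
    Illustrative sketch, not the article's exact architecture."""
    squeezed = cube.mean(axis=(0, 1))               # (bands,) per-band pooling
    hidden = np.maximum(squeezed @ W1, 0.0)         # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ W2)))     # sigmoid weights in (0, 1)
    return cube * gate, gate                        # reweight each band

rng = np.random.default_rng(0)
cube = rng.random((5, 5, 8))                        # toy 8-band patch
W1 = rng.normal(size=(8, 2))
W2 = rng.normal(size=(2, 8))
recalibrated, gate = spectral_attention(cube, W1, W2)
```

In a trained network `W1` and `W2` are learned end to end, so the gate values come to reflect which bands are predictive for the classification task.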

175 citations


Journal ArticleDOI
TL;DR: Quantitative and qualitative results demonstrate that HSI-BERT outperforms any other CNN-based model in terms of both classification accuracy and computational time and achieves state-of-the-art performance on three widely used hyperspectral image data sets.
Abstract: Deep learning methods have been widely used in hyperspectral image classification and have achieved state-of-the-art performance. Nonetheless, the existing deep learning methods are restricted by a limited receptive field, inflexibility, and difficult generalization problems in hyperspectral image classification. To solve these problems, we propose HSI-BERT, where BERT stands for bidirectional encoder representations from transformers and HSI stands for hyperspectral imagery. The proposed HSI-BERT has a global receptive field that captures the global dependence among pixels regardless of their spatial distance. HSI-BERT is very flexible and supports dynamically shaped input regions. Furthermore, HSI-BERT has good generalization ability because the jointly trained HSI-BERT can be generalized to regions with different shapes without retraining. HSI-BERT is primarily built on a multihead self-attention (MHSA) mechanism in an MHSA layer. Moreover, several attention maps are learned by different heads, and each head of the MHSA layer encodes the semantic context-aware representation to obtain discriminative features. Because all head-encoded features are merged, the resulting features exhibit spatial–spectral information that is essential for accurate pixel-level classification. Quantitative and qualitative results demonstrate that HSI-BERT outperforms the other CNN-based models in terms of both classification accuracy and computational time and achieves state-of-the-art performance on three widely used hyperspectral image data sets.
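The MHSA mechanism that gives HSI-BERT its global receptive field can be sketched as scaled dot-product attention over a set of pixel embeddings, with one score matrix per head and the head outputs concatenated. A minimal sketch under assumed shapes; HSI-BERT itself adds layer normalization, feed-forward blocks, and positional information.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multihead_self_attention(X, Wq, Wk, Wv, n_heads):
    """Scaled dot-product self-attention over pixel embeddings X: (n, d).
    Every pixel attends to every other pixel, so the effective receptive
    field is global regardless of spatial distance."""
    n, d = X.shape
    dh = d // n_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    outs = []
    for h in range(n_heads):
        s = slice(h * dh, (h + 1) * dh)
        scores = softmax(Q[:, s] @ K[:, s].T / np.sqrt(dh))   # (n, n) weights
        outs.append(scores @ V[:, s])                         # (n, dh) per head
    return np.concatenate(outs, axis=1)                       # merge heads

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                # 6 pixels, 8-dim embeddings
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = multihead_self_attention(X, Wq, Wk, Wv, n_heads=2)
```

Because the input is just a set of pixel vectors, regions of arbitrary shape can be fed in without architectural changes, which is the flexibility the abstract emphasizes.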

173 citations


Journal ArticleDOI
TL;DR: A unified framework called the feature-merged single-shot detection (FMSSD) network is proposed, which aggregates context information both across multiple scales and within same-scale feature maps; a novel area-weighted loss function is also proposed to pay more attention to small objects.
Abstract: Recently, the deep convolutional neural network has brought great improvements in object detection. However, the balance between high accuracy and high speed has always been a challenging task in multiclass object detection for large-scale remote sensing imagery. One-stage methods are more widely used because of their high efficiency but are limited by their performance on small object detection. In this article, we propose a unified framework called the feature-merged single-shot detection (FMSSD) network, which aggregates context information both across multiple scales and within same-scale feature maps. First, our network leverages the atrous spatial feature pyramid (ASFP) module to fuse the context information in multiscale features by using a feature pyramid and multiple atrous rates. Second, we propose a novel area-weighted loss function to pay more attention to small objects, whereas the original loss it replaces treats all objects equally. We believe that small objects should be given more weight than large objects because they lose more information during training. Specifically, a monotonically decreasing function of the area is designed to add weights to the loss function. Extensive experiments on the DOTA data set and NWPU VHR-10 data set demonstrate that our method achieves state-of-the-art detection accuracy with high efficiency. We also build a new large-scale data set called AIR-OBJ from Google Earth and show the detection results of small objects, which validates the effectiveness on large-scale remote sensing imagery.

167 citations


Journal ArticleDOI
TL;DR: This article proposes a solution to address local semantic change by locally extracting invariant features from hyperspectral imagery (HSI) in both spatial and frequency domains, using a method called invariant attribute profiles (IAPs).
Abstract: So far, a large number of advanced techniques have been developed to enhance and extract the spatially semantic information in hyperspectral image processing and analysis. However, locally semantic change, such as scene composition, relative position between objects, spectral variability caused by illumination, atmospheric effects, and material mixture, has been less frequently investigated in modeling spatial information. Consequently, identifying the same materials from spatially different scenes or positions can be difficult. In this article, we propose a solution to address this issue by locally extracting invariant features from hyperspectral imagery (HSI) in both spatial and frequency domains, using a method called invariant attribute profiles (IAPs). IAPs extract the spatial invariant features by exploiting isotropic filter banks or convolutional kernels on HSI and spatial aggregation techniques (e.g., superpixel segmentation) in the Cartesian coordinate system. Furthermore, they model invariant behaviors (e.g., shift, rotation) by means of a continuous histogram of oriented gradients constructed in a Fourier polar coordinate system. This yields a combinatorial representation of spatial-frequency invariant features with application to HSI classification. Extensive experiments are conducted on three hyperspectral data sets (including Houston2013 and Houston2018) to demonstrate the superiority and effectiveness of the proposed IAP method in comparison with several state-of-the-art profile-related techniques. The code will be available from the website: https://sites.google.com/view/danfeng-hong/data-code .

166 citations


Journal ArticleDOI
TL;DR: A high-resolution RS image change detection approach based on a deep feature difference convolutional neural network (CNN) is proposed; it achieves better performance than other classic approaches, with fewer missed detections and false alarms, demonstrating strong robustness and generalization ability.
Abstract: Change detection based on remote sensing (RS) images has a wide range of applications in many fields. However, many existing approaches for detecting changes in RS images with complex land covers still have room for improvement. In this article, a high-resolution RS image change detection approach based on a deep feature difference convolutional neural network (CNN) is proposed. This approach uses a CNN to learn the deep features from RS images and then uses transfer learning to compose a two-channel network with shared weights to generate a multiscale and multidepth feature difference map for change detection. The network is trained by a change magnitude guided loss function proposed in this article and needs only a few pixel-level samples to generate change magnitude maps, which can help to remove some of the pseudochanges. Finally, the binary change map can be obtained by a threshold. The approach is tested on several data sets from different sensors, including WorldView-3, QuickBird, and Ziyuan-3. The experimental results show that the proposed approach achieves better performance compared with other classic approaches and has fewer missed detections and false alarms, which proves that the proposed approach has strong robustness and generalization ability.
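The final stage described above, turning a feature difference map into a binary change map by thresholding, can be sketched directly. This shows only the difference-and-threshold step under assumed shapes; the article's contribution is in learning the features with a change magnitude guided loss, which is not reproduced here.

```python
import numpy as np

def change_map(feat_t1, feat_t2, threshold):
    """Per-pixel change magnitude as the Euclidean distance between deep
    features of the two dates, binarized by a threshold.
    feat_t1/feat_t2: (h, w, c) feature maps for time 1 and time 2."""
    magnitude = np.sqrt(((feat_t1 - feat_t2) ** 2).sum(axis=-1))
    binary = (magnitude > threshold).astype(np.uint8)
    return magnitude, binary

f1 = np.zeros((4, 4, 3))
f2 = np.zeros((4, 4, 3))
f2[1, 2] = [3.0, 4.0, 0.0]            # one changed pixel, magnitude 5
mag, binary = change_map(f1, f2, threshold=1.0)
```

In the multiscale, multidepth variant, magnitudes from several layers would be fused before thresholding, which is what suppresses pseudochanges.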

163 citations


Journal ArticleDOI
TL;DR: A novel graph-based semisupervised network called the nonlocal graph convolutional network (nonlocal GCN) is proposed; compared with state-of-the-art spectral classifiers and spectral–spatial classification networks, it offers competitive results and high-quality classification maps.
Abstract: Over the past few years, making use of deep networks, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs), classifying hyperspectral images has progressed significantly and gained increasing attention. In spite of being successful, these networks need an adequate supply of labeled training instances for supervised learning, which, however, is quite costly to collect. On the other hand, unlabeled data can be accessed in almost arbitrary amounts. Hence, it is of great interest to explore networks that are able to exploit labeled and unlabeled data simultaneously for hyperspectral image classification. In this article, we propose a novel graph-based semisupervised network called nonlocal graph convolutional network (nonlocal GCN). Unlike existing CNNs and RNNs that receive pixels or patches of a hyperspectral image as inputs, this network takes in the whole image (including both labeled and unlabeled data). More specifically, a nonlocal graph is first calculated. Given this graph representation, a couple of graph convolutional layers are used to extract features. Finally, the semisupervised learning of the network is done by using a cross-entropy error over all labeled instances. Note that the nonlocal GCN is end-to-end trainable. We demonstrate in extensive experiments that compared with state-of-the-art spectral classifiers and spectral–spatial classification networks, the nonlocal GCN is able to offer competitive results and high-quality classification maps (with fine boundaries and without noisy scattered points of misclassification).

Journal ArticleDOI
TL;DR: A fibered-rank minimization model for HSI mixed noise removal is proposed, in which the underlying HSI is modeled as a low-fibered-rank component; each subproblem within ADMM is proven to have a closed-form solution, even though 3DLogTNN is nonconvex.
Abstract: The tensor tubal rank, defined based on the tensor singular value decomposition (t-SVD), has obtained promising results in hyperspectral image (HSI) denoising. However, the framework of the t-SVD lacks flexibility for handling different correlations along different modes of HSIs, leading to suboptimal denoising performance. This article mainly makes three contributions. First, we introduce a new tensor rank named tensor fibered rank by generalizing the t-SVD to the mode-$k$ t-SVD, to achieve a more flexible and accurate HSI characterization. Since directly minimizing the fibered rank is NP-hard, we suggest a three-directional tensor nuclear norm (3DTNN) and a three-directional log-based tensor nuclear norm (3DLogTNN) as its convex and nonconvex relaxation, respectively, to provide an efficient numerical solution. Second, we propose a fibered rank minimization model for HSI mixed noise removal, in which the underlying HSI is modeled as a low-fibered-rank component. Third, we develop an efficient alternating direction method of multipliers (ADMM)-based algorithm to solve the proposed model; in particular, each subproblem within ADMM is proven to have a closed-form solution, although 3DLogTNN is nonconvex. Extensive experimental results demonstrate that the proposed method has superior denoising performance, as compared with the state-of-the-art competing methods on low-rank matrix/tensor approximation and noise modeling.
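The closed-form ADMM subproblems mentioned above ultimately reduce to singular value thresholding, the proximal operator of the nuclear norm, applied slice-wise in the Fourier domain for t-SVD-based norms. The sketch below shows the matrix-level building block only, not the full 3DTNN/3DLogTNN solver; the toy data and parameter values are illustrative.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: soft-threshold the singular values of M
    by tau. This is the proximal operator of the matrix nuclear norm and
    the basic closed-form step behind tensor-nuclear-norm subproblems."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)          # shrink small singular values
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(0)
low_rank = rng.normal(size=(6, 1)) @ rng.normal(size=(1, 6))   # rank-1 signal
noisy = low_rank + 0.01 * rng.normal(size=(6, 6))              # add small noise
denoised = svt(noisy, tau=0.05)
```

Shrinking the singular values strictly reduces the nuclear norm, pulling the estimate toward a low-rank (here, low-fibered-rank in the tensor case) reconstruction.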

Journal ArticleDOI
TL;DR: A gated bidirectional network is proposed to integrate the hierarchical feature aggregation and the interference information elimination into an end-to-end network; it competes with state-of-the-art methods on four RS scene classification data sets.
Abstract: Remote sensing (RS) scene classification is a challenging task due to various land covers contained in RS scenes. Recent RS classification methods demonstrate that aggregating the multilayer convolutional features, which are extracted from different hierarchical layers of a convolutional neural network, can effectively improve classification accuracy. However, these methods treat the multilayer convolutional features as equally important and ignore the hierarchical structure of multilayer convolutional features. Multilayer convolutional features not only provide complementary information for classification but also bring some interference information (e.g., redundancy and mutual exclusion). In this paper, a gated bidirectional network is proposed to integrate the hierarchical feature aggregation and the interference information elimination into an end-to-end network. First, the performance of each convolutional feature is quantitatively analyzed and a superior combination of convolutional features is selected. Then, a bidirectional connection is proposed to hierarchically aggregate multilayer convolutional features. Both the top–down direction and the bottom–up direction are considered to aggregate multilayer convolutional features into the semantic-assist feature and appearance-assist feature, respectively, and a gated function is utilized to eliminate interference information in the bidirectional connection. Finally, the semantic-assist feature and appearance-assist feature are merged for classification. The proposed method can compete with the state-of-the-art methods on four RS scene classification data sets (AID, UC-Merced, WHU-RS19, and OPTIMAL-31).

Journal ArticleDOI
TL;DR: Joint classification of hyperspectral imagery and LiDAR data is investigated using an effective hierarchical random walk network (HRWN); experiments demonstrate that the proposed HRWN significantly outperforms other state-of-the-art methods.
Abstract: Earth observation using multisensor data is drawing increasing attention. Fusing remotely sensed hyperspectral imagery and light detection and ranging (LiDAR) data helps to increase application performance. In this article, joint classification of hyperspectral imagery and LiDAR data is investigated using an effective hierarchical random walk network (HRWN). In the proposed HRWN, a dual-tunnel convolutional neural network (CNN) architecture is first developed to capture spectral and spatial features. A pixelwise affinity branch is proposed to capture the relationships between classes with different elevation information from LiDAR data and confirm the spatial contrast of classification. Then, in the designed hierarchical random walk layer, the predicted distribution of the dual-tunnel CNN serves as a global prior while the pixelwise affinity reflects the local similarity of pixel pairs, which together enforce spatial consistency in the deeper layers of the network. Finally, a classification map is obtained by calculating the probability distribution. Experimental results validated with three real multisensor remote sensing data sets demonstrate that the proposed HRWN significantly outperforms other state-of-the-art methods. For example, the dual-branch CNN classifier achieves an accuracy of 88.91% on the University of Houston campus data set, while the proposed HRWN classifier obtains an accuracy of 93.61%, an improvement of approximately 5%.

Journal ArticleDOI
TL;DR: This study proposes an automatic building footprint extraction framework that consists of a convolutional neural network (CNN)-based segmentation and an empirical polygon regularization that transforms segmentation maps into structured individual building polygons.
Abstract: This study proposes an automatic building footprint extraction framework that consists of a convolutional neural network (CNN)-based segmentation and an empirical polygon regularization that transforms segmentation maps into structured individual building polygons. The framework attempts to replace part of the manual delineation of building footprints involved in the surveying and mapping field with algorithms. First, we develop a scale-robust fully convolutional network (FCN) by introducing multiple scale aggregation of feature pyramids from convolutional layers. Two postprocessing strategies are introduced to refine the segmentation maps from the FCN. The refined segmentation maps are vectorized and polygonized. Then, we propose a polygon regularization algorithm, consisting of a coarse and a fine adjustment, to translate the initial polygons into structured footprints. Experiments on a large open building data set including 181 000 buildings showed that our algorithm reached a high automation level where at least 50% of individual buildings in the test area could be delineated to replace manual work. Experiments on different data sets demonstrated that our FCN-based segmentation method outperformed several of the most recent segmentation methods, and our polygon regularization algorithm is robust in challenging situations with different building styles, image resolutions, and even low-quality segmentation.

Journal ArticleDOI
TL;DR: Two simple yet effective network units, the spatial relation module and the channel relation module, are introduced to learn and reason about global relationships between any two spatial positions or feature maps, producing relation-augmented (RA) feature representations.
Abstract: Most current semantic segmentation approaches fall back on deep convolutional neural networks (CNNs). However, their use of convolution operations with local receptive fields causes failures in modeling contextual spatial relations. Prior works have sought to address this issue by using graphical models or spatial propagation modules in networks. But such models often fail to capture long-range spatial relationships between entities, which leads to spatially fragmented predictions. Moreover, recent works have demonstrated that channel-wise information also plays a pivotal part in CNNs. In this article, we introduce two simple yet effective network units, the spatial relation module and the channel relation module, to learn and reason about global relationships between any two spatial positions or feature maps, and then produce relation-augmented (RA) feature representations. The spatial and channel relation modules are general and extensible, and can be used in a plug-and-play fashion with the existing fully convolutional network (FCN) framework. We evaluate relation module-equipped networks on semantic segmentation tasks using two aerial image data sets, namely the International Society for Photogrammetry and Remote Sensing (ISPRS) Vaihingen and Potsdam data sets, which fundamentally depend on long-range spatial relational reasoning. The networks achieve very competitive results, a mean $F_{1}$ score of 88.54% on the Vaihingen data set and a mean $F_{1}$ score of 88.01% on the Potsdam data set, bringing significant improvements over baselines.

Journal ArticleDOI
TL;DR: A novel denoising method combining nonlocal low-rank tensor decomposition and total variation regularization, referred to as TV-NLRTD, is proposed; experiments confirm its validity and superiority compared with current state-of-the-art HSI denoising algorithms.
Abstract: Hyperspectral images (HSIs) are normally corrupted by a mixture of various noise types, which degrades the quality of the acquired image and limits the subsequent application. In this article, we propose a novel denoising method for the HSI restoration task by combining nonlocal low-rank tensor decomposition and total variation regularization, which we refer to as TV-NLRTD. To simultaneously capture the nonlocal similarity and high spectral correlation, the HSI is first segmented into overlapping 3-D cubes that are grouped into several clusters by the $k$-means++ algorithm and exploited by low-rank tensor approximation. Spatial–spectral total variation (SSTV) regularization is then investigated to restore the clean HSI from the denoised overlapping cubes. Meanwhile, the $\ell_{1}$-norm facilitates the separation of the clean nonlocal low-rank tensor groups and the sparse noise. The proposed TV-NLRTD method is optimized by employing the efficient alternating direction method of multipliers (ADMM) algorithm. The experimental results obtained with both simulated and real hyperspectral data sets confirm the validity and superiority of the proposed method compared with the current state-of-the-art HSI denoising algorithms.
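The total variation term used for regularization penalizes first differences between neighboring pixels, so piecewise-constant regions cost nothing while isolated noise spikes cost a lot. The sketch below computes anisotropic TV for a single 2-D band; SSTV in the article extends this with differences along the spectral dimension as well.

```python
import numpy as np

def anisotropic_tv(img):
    """Anisotropic total variation of a 2-D band: sum of absolute
    horizontal and vertical first differences. Smooth or piecewise-constant
    images have low TV; noisy images have high TV."""
    dh = np.abs(np.diff(img, axis=1)).sum()   # horizontal differences
    dv = np.abs(np.diff(img, axis=0)).sum()   # vertical differences
    return float(dh + dv)

flat = np.ones((4, 4))                        # piecewise constant: TV = 0
noisy = flat.copy()
noisy[2, 2] = 2.0                             # a single spike raises TV
```

Minimizing a data-fidelity term plus this TV penalty (as one ADMM subproblem does) therefore smooths noise while preserving large, consistent structures.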

Journal ArticleDOI
TL;DR: Experimental results on four popular hyperspectral data sets with two training sample selection strategies show that the transferred CNN obtains better classification accuracy than that of state-of-the-art methods.
Abstract: Deep convolutional neural networks (CNNs) have shown outstanding performance in hyperspectral image (HSI) classification. The success of CNN-based HSI classification relies on the availability of sufficient training samples. However, the collection of training samples is expensive and time consuming. Besides, there are many pretrained models on large-scale data sets, which extract general and discriminative features. Proper reuse of low-level and midlevel representations will significantly improve the HSI classification accuracy. The large-scale ImageNet data set has three channels, but an HSI contains hundreds of channels. Therefore, it is difficult to directly adapt the pretrained models for the classification of HSIs. In this article, heterogeneous transfer learning for HSI classification is proposed. First, a mapping layer is used to handle the issue of having different numbers of channels. Then, the model architectures and weights of the CNN trained on the ImageNet data set are used to initialize the model and weights of the HSI classification network. Finally, a well-designed neural network is used to perform the HSI classification task. Furthermore, an attention mechanism is used to adjust the feature maps due to the difference between the heterogeneous data sets. Moreover, controlled random sampling is used as another training sample selection method to test the effectiveness of the proposed methods. Experimental results on four popular hyperspectral data sets with two training sample selection strategies show that the transferred CNN obtains better classification accuracy than that of state-of-the-art methods. In addition, the idea of heterogeneous transfer learning may open a new window for further research.
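The channel-mismatch fix, a mapping layer, amounts to a per-pixel linear projection (equivalent to a 1x1 convolution) from the B hyperspectral bands down to the 3 channels an ImageNet-pretrained backbone expects. This is a sketch of that idea only; in the article the layer is learned end to end, and the names and shapes here are assumptions.

```python
import numpy as np

def mapping_layer(hsi, W, b):
    """Per-pixel linear projection from B bands to W.shape[1] channels,
    equivalent to a 1x1 convolution. hsi: (h, w, B); W: (B, c); b: (c,)."""
    h, w, bands = hsi.shape
    out = hsi.reshape(-1, bands) @ W + b      # project every pixel's spectrum
    return out.reshape(h, w, W.shape[1])

rng = np.random.default_rng(0)
hsi = rng.random((8, 8, 100))                 # toy 100-band cube
W = rng.normal(size=(100, 3))                 # learnable in the real network
b = np.zeros(3)
rgb_like = mapping_layer(hsi, W, b)           # now shaped for a 3-channel backbone
```

The resulting 3-channel tensor can be fed into the pretrained backbone unchanged, which is what makes the heterogeneous transfer possible.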

Journal ArticleDOI
TL;DR: Two novel deep models are proposed to extract more discriminative spatial–spectral features by exploiting the convolutional LSTM (ConvLSTM) and can provide better classification performance than the other state-of-the-art approaches.
Abstract: In recent years, deep learning has presented a great advance in the hyperspectral image (HSI) classification. Particularly, long short-term memory (LSTM), as a special deep learning structure, has shown great ability in modeling long-term dependencies in the time dimension of video or the spectral dimension of HSIs. However, the loss of spatial information makes it quite difficult to obtain better performance. In order to address this problem, two novel deep models are proposed to extract more discriminative spatial–spectral features by exploiting the convolutional LSTM (ConvLSTM). By taking the data patch in a local sliding window as the input of each memory cell band by band, the 2-D extended architecture of LSTM is considered for building the spatial–spectral ConvLSTM 2-D neural network (SSCL2DNN) to model long-range dependencies in the spectral domain. To better preserve the intrinsic structure information of the hyperspectral data, the spatial–spectral ConvLSTM 3-D neural network (SSCL3DNN) is proposed by extending LSTM to the 3-D version for further improving the classification performance. The experiments, conducted on three commonly used HSI data sets, demonstrate that the proposed deep models have certain competitive advantages and can provide better classification performance than the other state-of-the-art approaches.
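The band-by-band recurrence that SSCL2DNN builds on can be illustrated with a drastically simplified scan. The sketch below replaces the spatial convolutions of a true ConvLSTM cell with per-pixel scalar-weighted gates, purely to show how a hidden state and cell state are carried across the spectral dimension; all names and weight shapes are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spectral_lstm_scan(cube, wf, wi, wo, wg):
    # cube: (B, H, W). Scan an LSTM-style cell over the B bands,
    # carrying per-pixel hidden (h) and cell (c) states. A real
    # ConvLSTM would compute each gate with a spatial convolution.
    _, hgt, wid = cube.shape
    h = np.zeros((hgt, wid))
    c = np.zeros((hgt, wid))
    for band in cube:                   # one band per "time" step
        f = sigmoid(wf * band + h)      # forget gate
        i = sigmoid(wi * band + h)      # input gate
        o = sigmoid(wo * band + h)      # output gate
        g = np.tanh(wg * band + h)      # candidate state
        c = f * c + i * g
        h = o * np.tanh(c)
    return h                            # spectral feature map, (H, W)
```

The 3-D variant (SSCL3DNN) would additionally convolve across the spectral neighborhood inside each cell rather than consuming one band at a time.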

Journal ArticleDOI
TL;DR: An unsupervised discriminative reconstruction constrained generative adversarial network for HAD (HADGAN) is proposed, mainly based on the assumption that the number of normal samples is much larger than the number of abnormal ones.
Abstract: The rich and distinguishable spectral information in hyperspectral images (HSIs) makes it possible to capture anomalous samples [i.e., anomaly detection (AD)] that deviate from background samples. However, hyperspectral anomaly detection (HAD) faces various challenges due to high dimensionality, redundant information, and unlabeled and limited samples. To address these problems, this article proposes an unsupervised discriminative reconstruction constrained generative adversarial network for HAD (HADGAN). Our solution is mainly based on the assumption that the number of normal samples is much larger than the number of abnormal ones. The key contribution of this article is to learn a discriminative background reconstruction with anomaly targets being suppressed, which produces the initial detection image (i.e., the residual image between the original image and reconstructed image) with anomaly targets being highlighted and background samples being suppressed. To accomplish this goal, first, by using an autoencoder (AE) network and an adversarial latent discriminator, the latent feature layer learns normal background distribution and AE learns a background reconstruction as much as possible. Second, consistency enhanced representation and shrink constraints are added to the latent feature layer to ensure that anomaly samples are projected to similar positions as normal samples in the latent feature layer. Third, using an adversarial image feature corrector in the input space can guarantee the reliability of the generated samples. Finally, an energy-based spatial and distance-based spectral joint anomaly detector is applied in the residual map to generate the final detection map. Experiments conducted on several data sets over different scenes demonstrate its state-of-the-art performance.
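The core detection signal described above, the residual between the original image and a background-only reconstruction, is easy to sketch. The toy below stands in a flat constant image for the autoencoder output; the function name and shapes are illustrative, and the real method adds the energy/distance-based joint detector on top of this residual map.

```python
import numpy as np

def residual_detector(image, reconstruction):
    # image, reconstruction: (H, W, B). A background-trained model
    # reconstructs background pixels well and anomalies poorly, so
    # per-pixel residual energy highlights the anomalies.
    return np.linalg.norm(image - reconstruction, axis=-1)

# toy scene: flat background plus one spectrally deviating pixel
scene = np.ones((4, 4, 10))
scene[2, 2] += 5.0                      # the anomaly
background = np.ones((4, 4, 10))        # stand-in for the AE output
score = residual_detector(scene, background)
```

The anomalous pixel at (2, 2) receives the only nonzero score, while every background pixel is perfectly suppressed.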

Journal ArticleDOI
TL;DR: Experimental results suggest that the proposed nonlocal tensor decomposition model for hyperspectral and multispectral image fusion (HSI-MSI fusion) substantially outperforms the existing state-of-the-art HSI- MSI fusion methods.
Abstract: Hyperspectral (HS) super-resolution, which aims at enhancing the spatial resolution of hyperspectral images (HSIs), has recently attracted considerable attention. A common way of HS super-resolution is to fuse the HSI with a higher spatial-resolution multispectral image (MSI). Various approaches have been proposed to solve this problem by establishing the degradation model of low spatial-resolution HSIs and MSIs based on matrix factorization methods, e.g., unmixing and sparse representation. However, this category of approaches cannot well construct the relationship between the high-spatial-resolution (HR) HSI and MSI. In fact, since the HSI and the MSI capture the same scene, these two image sources must have common factors. In this paper, a nonlocal tensor decomposition model for hyperspectral and multispectral image fusion (HSI-MSI fusion) is proposed. First, the nonlocal similar patch tensors of the HSI are constructed according to the MSI for the purpose of calculating the smooth order of all the patches for clustering. Then, the relationship between the HR HSI and the MSI is explored through coupled tensor canonical polyadic (CP) decomposition. The fundamental idea of the proposed model is that the factor matrices in the CP decomposition of the HR HSI’s nonlocal tensor can be shared with the matrices factorized by the MSI’s nonlocal tensor. Alternating direction method of multipliers is used to solve the proposed model. Through this method, the spatial structure of the MSI can be successfully transferred to the HSI. Experimental results on three synthetic data sets and one real data set suggest that the proposed method substantially outperforms the existing state-of-the-art HSI-MSI fusion methods.
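The factor-sharing idea at the heart of the coupled CP model can be demonstrated numerically: if the HR HSI's nonlocal tensor and the MSI's tensor share the two spatial factor matrices, and the MSI's spectral factor is the spectrally degraded version of the HSI's, then the MSI is exactly the spectral degradation of the HR HSI. Shapes, rank, and the random spectral response matrix below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
R = 5                                   # CP rank (illustrative)
A = rng.random((8, R))                  # spatial mode-1 factor
B = rng.random((8, R))                  # spatial mode-2 factor
C = rng.random((100, R))                # spectral factor, 100 bands
P = rng.random((4, 100)) / 100.0        # spectral response: 100 -> 4 bands

# The HR HSI patch tensor and the MSI patch tensor share A and B;
# the MSI's spectral factor is the degraded P @ C.
hr_hsi = np.einsum('ir,jr,kr->ijk', A, B, C)     # (8, 8, 100)
msi = np.einsum('ir,jr,kr->ijk', A, B, P @ C)    # (8, 8, 4)
```

Because the two CP models share their spatial factors, recovering A and B from the MSI transfers its spatial structure to the fused HSI, which is exactly the coupling the ADMM solver enforces.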

Journal ArticleDOI
TL;DR: An efficient and effective framework to fuse hyperspectral and light detection and ranging (LiDAR) data using two coupled convolutional neural networks (CNNs) designed to learn spectral–spatial features from hyperspectrals and elevation information from LiDAR data is proposed.
Abstract: In this article, we propose an efficient and effective framework to fuse hyperspectral and light detection and ranging (LiDAR) data using two coupled convolutional neural networks (CNNs). One CNN is designed to learn spectral–spatial features from hyperspectral data, and the other one is used to capture the elevation information from LiDAR data. Both of them consist of three convolutional layers, and the last two convolutional layers are coupled together via a parameter-sharing strategy. In the fusion phase, feature-level and decision-level fusion methods are simultaneously used to integrate these heterogeneous features sufficiently. For the feature-level fusion, three different fusion strategies are evaluated, including the concatenation strategy, the maximization strategy, and the summation strategy. For the decision-level fusion, a weighted summation strategy is adopted, where the weights are determined by the classification accuracy of each output. The proposed model is evaluated on an urban data set acquired over Houston, USA, and a rural one captured over Trento, Italy. On the Houston data, our model can achieve a new record overall accuracy (OA) of 96.03%. On the Trento data, it achieves an OA of 99.12%. These results sufficiently certify the effectiveness of our proposed model.
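The three feature-level fusion strategies and the accuracy-weighted decision fusion can be sketched directly. The two small functions below are minimal NumPy illustrations under assumed shapes; the function names are not from the paper.

```python
import numpy as np

def fuse_features(f_hsi, f_lidar, strategy="concatenation"):
    # Feature-level fusion of two equally shaped feature vectors/maps.
    if strategy == "concatenation":
        return np.concatenate([f_hsi, f_lidar], axis=-1)
    if strategy == "maximization":
        return np.maximum(f_hsi, f_lidar)
    if strategy == "summation":
        return f_hsi + f_lidar
    raise ValueError(f"unknown strategy: {strategy}")

def fuse_decisions(p_hsi, p_lidar, acc_hsi, acc_lidar):
    # Decision-level fusion: weight each branch's class probabilities
    # by that branch's (validation) classification accuracy.
    w_h = acc_hsi / (acc_hsi + acc_lidar)
    return w_h * p_hsi + (1.0 - w_h) * p_lidar
```

Note that concatenation doubles the feature dimension, whereas maximization and summation preserve it, which affects the size of the subsequent classification layers.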

Journal ArticleDOI
TL;DR: This article incorporates the graph regularization and total variation (TV) regularization into the LRR formulation and proposes a novel anomaly detection method based on graph and TV regularized LRR (GTVLRR) model, to preserve the local geometrical structure and spatial relationships in hyperspectral images.
Abstract: Anomaly detection is of great importance among hyperspectral applications, which aims at locating targets that are spectrally different from their surrounding background. A variety of anomaly detection methods have been proposed in the past. However, most of them fail to take the high spectral correlations of all the pixels into consideration. Low-rank representation (LRR) has drawn a great deal of interest in recent years, as a promising model to exploit the intrinsic low-rank property of hyperspectral images. Nevertheless, the original LRR model only analyzes the spectral signatures without taking advantage of the valuable spatial information in hyperspectral images. Furthermore, it has been shown that the local geometrical information of the hyperspectral data is also important for discrimination between the anomalies and background pixels. In this article, we incorporate the graph regularization and total variation (TV) regularization into the LRR formulation and propose a novel anomaly detection method based on graph and TV regularized LRR (GTVLRR) model, to preserve the local geometrical structure and spatial relationships in hyperspectral images. Extensive experiments have been conducted on both simulated and real hyperspectral data sets. The experimental results demonstrate the superiority of the proposed method over conventional and state-of-the-art anomaly detection methods.

Journal ArticleDOI
TL;DR: Zhang et al. propose building the mapping from apparent resistivity data (input) to the resistivity model (output) directly with convolutional neural networks (CNNs).
Abstract: The inverse problem of electrical resistivity surveys (ERSs) is difficult because of its nonlinear and ill-posed nature. For this task, traditional linear inversion methods still face challenges such as suboptimal approximation and initial model selection. Inspired by the remarkable nonlinear mapping ability of deep learning approaches, in this article, we propose to build the mapping from apparent resistivity data (input) to resistivity model (output) directly by convolutional neural networks (CNNs). However, the vertically varying characteristic of patterns in the apparent resistivity data may cause ambiguity when using CNNs with the weight sharing and effective receptive field properties. To address the potential issue, we supply an additional tier feature map to CNNs to help those aware of the relationship between input and output. Based on the prevalent U-Net architecture, we design our network (ERSInvNet) that can be trained end-to-end and can reach a very fast inference speed during testing. We further introduce a depth weighting function and a smooth constraint into loss function to improve inversion accuracy for the deep region and suppress false anomalies. Six groups of experiments are considered to demonstrate the feasibility and efficiency of the proposed methods. According to the comprehensive qualitative analysis and quantitative comparison, ERSInvNet with tier feature map, smooth constraints, and depth weighting function together achieve the best performance.
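A loss combining a depth weighting function with a smoothness constraint, as described above, could take the following form. This is a hedged sketch: the exact weighting function and penalty used by ERSInvNet may differ, and the exponent `beta` and balance `lam` are assumed hyperparameters.

```python
import numpy as np

def inversion_loss(pred, target, beta=1.0, lam=0.1):
    # pred, target: (depth, width) resistivity sections. Deeper rows
    # receive a larger weight so the poorly constrained deep region
    # is not neglected; a first-difference penalty on the prediction
    # suppresses false (high-frequency) anomalies.
    depth = np.arange(1, pred.shape[0] + 1, dtype=float)[:, None]
    data_term = np.mean(depth ** beta * (pred - target) ** 2)
    smooth = (np.mean(np.diff(pred, axis=0) ** 2)
              + np.mean(np.diff(pred, axis=1) ** 2))
    return data_term + lam * smooth
```

A perfectly fitted smooth section incurs zero loss, while an oscillatory prediction is penalized even when it matches the target, which is how the smooth constraint trades data fit against plausibility.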

Journal ArticleDOI
TL;DR: A deep architecture with a two-stage multiscale training strategy tailored to the semantic segmentation of large-size VHR RSIs is proposed, and experimental results show that it outperforms local-patch-based training models in terms of both accuracy and stability.
Abstract: Very-high-resolution (VHR) remote sensing images (RSIs) have a significantly larger spatial size than the typical natural images used in computer vision applications. Therefore, it is computationally unaffordable to train and test classifiers on these images at full scale. Commonly used methodologies for semantic segmentation of RSIs perform training and prediction on cropped image patches and thus fail to incorporate enough context information. In order to better exploit the correlations between ground objects, we propose a deep architecture with a two-stage multiscale training strategy that is tailored to the semantic segmentation of large-size VHR RSIs. In the first stage of the training strategy, a semantic embedding network is designed to learn high-level features from downscaled images covering a large area. In the second training stage, a local feature extraction network is designed to introduce low-level information from cropped image patches. The resulting training strategy is able to fuse complementary information learned at multiple levels to make predictions. Experimental results on two data sets show that it outperforms local-patch-based training models in terms of both accuracy and stability.

Journal ArticleDOI
TL;DR: This article considers deep learning models—such as convolutional neural networks (CNNs)—to perform spectral–spatial HSI denoising, and proposes a model that efficiently takes into consideration both the spatial and spectral information contained in HSIs.
Abstract: Denoising is a common preprocessing step prior to the analysis and interpretation of hyperspectral images (HSIs). However, the vast majority of methods typically adopted for HSI denoising exploit architectures originally developed for grayscale or RGB images, exhibiting limitations when processing high-dimensional HSI data cubes. In particular, traditional methods do not take into account the high spectral correlation between adjacent bands in HSIs, which leads to unsatisfactory denoising performance as the rich spectral information present in HSIs is not fully exploited. To overcome this limitation, this article considers deep learning models—such as convolutional neural networks (CNNs)—to perform spectral–spatial HSI denoising. The proposed model, called HSI single denoising CNN (HSI-SDeCNN), efficiently takes into consideration both the spatial and spectral information contained in HSIs. Experimental results on both synthetic and real data demonstrate that the proposed HSI-SDeCNN outperforms other state-of-the-art HSI denoising methods. Source code: https://github.com/mhaut/HSI-SDeCNN

Journal ArticleDOI
TL;DR: This article addresses fast object tracking in satellite videos by developing a novel tracking algorithm that embeds motion estimation into the kernelized correlation filter (KCF).
Abstract: As a new method of Earth observation, the video satellite is capable of continuously monitoring specific events on the Earth’s surface by providing high-temporal-resolution remote sensing images. These video observations enable a variety of new satellite applications such as object tracking and road traffic monitoring. In this article, we address the problem of fast object tracking in satellite videos by developing a novel tracking algorithm based on correlation filters embedded with motion estimation. Building on the kernelized correlation filter (KCF), the proposed algorithm provides the following improvements: 1) a novel motion estimation (ME) algorithm that combines the Kalman filter with motion trajectory averaging, which also mitigates the boundary effects of the KCF, and 2) a solution to tracking failure when a moving object is partially or completely occluded. The experimental results demonstrate that our algorithm can track moving objects in satellite videos with 95% accuracy.
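The Kalman component of such a motion estimator can be sketched with a standard constant-velocity model. This is a generic textbook filter, not the paper's exact formulation, and it omits the trajectory-averaging step; noise levels `q` and `r` are assumed values.

```python
import numpy as np

def kalman_cv(measurements, q=1e-3, r=1e-1):
    # Constant-velocity Kalman filter over (x, y) object centers.
    # State: [x, y, vx, vy]. During occlusion, the predict step alone
    # can propagate the position instead of the correlation peak.
    F = np.array([[1., 0., 1., 0.],
                  [0., 1., 0., 1.],
                  [0., 0., 1., 0.],
                  [0., 0., 0., 1.]])        # state transition
    H = np.array([[1., 0., 0., 0.],
                  [0., 1., 0., 0.]])        # observe position only
    Q, R = q * np.eye(4), r * np.eye(2)
    x, P = np.zeros(4), np.eye(4)
    estimates = []
    for z in measurements:
        x, P = F @ x, F @ P @ F.T + Q                 # predict
        K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain
        x = x + K @ (np.asarray(z, float) - H @ x)    # update
        P = (np.eye(4) - K @ H) @ P
        estimates.append(x[:2].copy())
    return np.array(estimates)
```

On a noiseless linear trajectory the estimates converge to the measurements within a few frames, which is the behavior that lets the tracker coast through short occlusions.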

Journal ArticleDOI
TL;DR: This article proposes a unified BS framework, BS Network (BS-Net), which consists of a band attention module (BAM), which aims to explicitly model the nonlinear interdependences between spectral bands, and a reconstruction network (RecNet) which is used to restore the original HSI from the learned informative bands, resulting in a flexible architecture.
Abstract: A hyperspectral image (HSI) consists of hundreds of continuous narrowbands with high spectral correlation, which leads to the so-called Hughes phenomenon and a high computational cost in processing. Band selection (BS) has been proven effective in avoiding such problems by removing redundant bands. However, many existing BS methods estimate the significance of each band separately and cannot fully consider the nonlinear and global interactions between spectral bands. In this article, by assuming that a complete HSI band set can be reconstructed from a few of its informative bands, we propose a unified BS framework, the BS Network (BS-Net). The framework consists of a band attention module (BAM), which explicitly models the nonlinear interdependences between spectral bands, and a reconstruction network (RecNet), which restores the original HSI from the learned informative bands, resulting in a flexible architecture. The framework is end-to-end trainable, making it easy to train from scratch and to combine with many existing networks. We implement two versions of BS-Net, using fully connected networks (BS-Net-FC) and convolutional neural networks (BS-Net-Conv), respectively, and extensively compare their results with popular existing BS approaches on three real hyperspectral data sets, showing that the proposed BS-Nets can accurately select informative band subsets with less redundancy and outperform the competitors in classification accuracy at a competitive time cost.
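A band attention module of this flavor can be sketched as a squeeze-score-reweight operation. The sketch below is a rough NumPy analogue under assumed layer sizes, not the BAM architecture from the paper; in BS-Net the weights are learned by minimizing the RecNet reconstruction error.

```python
import numpy as np

def band_attention(cube, w1, w2):
    # cube: (H, W, B). Squeeze the spatial dimensions into one
    # descriptor per band, score the bands with a small two-layer
    # network, and reweight the cube; the top-scoring bands form
    # the selected subset.
    z = cube.mean(axis=(0, 1))                      # (B,) descriptor
    hidden = np.maximum(w1 @ z, 0.0)                # ReLU layer
    scores = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))   # (B,) in (0, 1)
    return cube * scores, np.argsort(scores)[::-1]

rng = np.random.default_rng(0)
cube = rng.random((16, 16, 50))         # toy 50-band HSI
w1 = rng.standard_normal((8, 50))       # randomly initialized here
w2 = rng.standard_normal((50, 8))
weighted, ranking = band_attention(cube, w1, w2)
selected = ranking[:10]                 # 10 highest-scoring bands
```

Because the scores multiply every band globally, the module captures interactions among all bands at once rather than scoring each band in isolation.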

Journal ArticleDOI
TL;DR: Experimental results on several real hyperspectral data sets demonstrate that the proposed method outperforms other state-of-the-art methods.
Abstract: In this article, a novel hyperspectral anomaly detection method with kernel Isolation Forest (iForest) is proposed. The method is based on an assumption that anomalies rather than background can be more susceptible to isolation in the kernel space. Based on this idea, the proposed method detects anomalies as follows. First, the hyperspectral data are mapped into the kernel space, and the first $K$ principal components are used. Then, the isolation samples in the image are detected with the iForest constructed using randomly selected samples in the principal components. Finally, the initial anomaly detection map is iteratively refined with locally constructed iForest in connected regions with large areas. Experimental results on several real hyperspectral data sets demonstrate that the proposed method outperforms other state-of-the-art methods.
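The first stage, mapping the data into the kernel space and keeping the leading principal components, can be sketched with a standard RBF kernel PCA on the training samples. This is a generic construction rather than the paper's exact preprocessing, and `gamma` and `k` are assumed parameters; the iForest would then be grown on the returned projections.

```python
import numpy as np

def rbf_kernel_pca(X, k, gamma=1.0):
    # X: (n, d) samples. Build the RBF Gram matrix, center it in
    # feature space, and keep the first k principal components --
    # the space in which the isolation forest is constructed.
    sq = np.sum(X ** 2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    Kc = J @ K @ J
    vals, vecs = np.linalg.eigh(Kc)         # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:k]
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0.0, None))
```

Each returned column is a kernel principal component of the samples; anomalies tend to land far from the bulk in this space, which is what makes them easy to isolate with short trees.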

Journal ArticleDOI
Lan Lan1, Guisheng Liao1, Jingwei Xu1, Yuhong Zhang1, Bin Liao2 
TL;DR: A series of novel beamforming methods is proposed based on accurately nulling the transceive beampattern in the frequency diverse array (FDA) multiple-input multiple-output (MIMO) scheme, by which interferences in synthetic aperture radar systems can be suppressed effectively.
Abstract: Beamforming plays a crucial role in synthetic aperture radar (SAR) for interference mitigation and ambiguity unaliasing. In this article, a series of novel beamforming methods for SAR systems is proposed based on accurately nulling the transceive beampattern in the frequency diverse array (FDA) multiple-input multiple-output (MIMO) scheme. In general, these methods are implemented by assigning artificial interferences with prescribed powers within given rectangular regions in the joint transmit–receive spatial frequency domain. Specifically, closed-form expressions for the artificial interference powers are first formulated according to the predefined null depths. Then, iterative algorithms are developed to update the interference-plus-noise covariance matrix and the designed weight vector. In this way, a trough-like transceive beampattern with arbitrarily distributed broadened nulls is formed in the joint transmit–receive spatial frequency domain. As a result, interferences mixed into the signals received by the SAR can be suppressed effectively. Numerical simulations and experimental results are provided to corroborate the effectiveness of the proposed methods.
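The artificial-interference idea can be illustrated in a drastically reduced setting: a single receive-only uniform linear array, one point null instead of a broadened rectangular region, and a single injection step instead of the paper's iterative covariance updates. All parameters below (array size, directions, injected power) are illustrative assumptions.

```python
import numpy as np

N = 16                                   # array elements (illustrative)

def steering(theta):
    # Narrowband steering vector for a half-wavelength-spaced ULA.
    return np.exp(1j * np.pi * np.arange(N) * np.sin(theta))

def null_beamformer(look, null_dir, power=1e4):
    # Inject an artificial interference with prescribed power toward
    # null_dir into the noise covariance, then form the MVDR-style
    # weight: the beampattern keeps unit gain at `look` while placing
    # a deep null at `null_dir`.
    a, b = steering(look), steering(null_dir)
    R = np.eye(N) + power * np.outer(b, b.conj())
    Ri = np.linalg.inv(R)
    return Ri @ a / (a.conj() @ Ri @ a)

w = null_beamformer(0.0, 0.5)
response = lambda th: np.abs(w.conj() @ steering(th))
```

Raising the injected power deepens the null, which is the mechanism behind the closed-form power expressions for prescribed null depths; the full method repeats this over many points of a rectangular region in the joint transmit–receive spatial frequency domain to broaden the trough.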