
Showing papers in "IEEE Transactions on Geoscience and Remote Sensing in 2019"


Journal ArticleDOI
TL;DR: The proposed Siamese U-Net outperforms current building extraction methods and could provide a valuable reference; the designed experiments indicate that the data set is accurate and can serve multiple purposes, including building instance segmentation and change detection.
Abstract: The application of convolutional neural networks has been shown to greatly improve the accuracy of building extraction from remote sensing imagery. In this paper, we created and made openly available a high-quality multisource data set for building detection, evaluated the accuracy obtained in most recent studies on the data set, demonstrated the use of our data set, and proposed a Siamese fully convolutional network model that obtained better segmentation accuracy. The building data set that we created contains not only aerial images but also satellite images covering 1000 km², with both raster labels and vector maps. Applying the same methodology to our aerial data set yielded higher accuracy than on several other open building data sets. On the aerial data set, we gave a thorough evaluation and comparison of most recent deep learning-based methods and proposed a Siamese U-Net with shared weights in two branches, taking the original images and their down-sampled counterparts as inputs, which significantly improves the segmentation accuracy, especially for large buildings. For multisource building extraction, the generalization ability is further evaluated and extended by applying a radiometric augmentation strategy to transfer models pretrained on the aerial data set to the satellite data set. The designed experiments indicate that our data set is accurate and can serve multiple purposes, including building instance segmentation and change detection; our results show that the Siamese U-Net outperforms current building extraction methods and could provide a valuable reference.

721 citations
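The key architectural idea of the Siamese U-Net, two branches sharing one set of weights and fed the original image plus a down-sampled copy, can be illustrated with a minimal NumPy sketch. The single shared "convolution" below stands in for the full U-Net encoder, and the 2x2 average pooling is an assumption; this is not the authors' implementation:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2-D cross-correlation; a stand-in for the shared encoder."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def downsample2x(img):
    """2x2 average pooling (the abstract does not specify the down-sampling)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.random((16, 16))
kernel = rng.random((3, 3))                       # ONE set of weights ...

feat_full = conv2d(image, kernel)                 # ... used by branch 1
feat_down = conv2d(downsample2x(image), kernel)   # ... and by branch 2 (shared)
```

Because both branches call the same function with the same kernel, any gradient update would affect both scales at once, which is the point of weight sharing.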


Journal ArticleDOI
TL;DR: This paper proposes a novel end-to-end attention recurrent convolutional network (ARCNet) for scene classification that can learn to focus selectively on some key regions or locations and just process them at high-level features, thereby discarding the noncritical information and promoting the classification performance.
Abstract: Scene classification of remote sensing images has drawn great attention because of its wide applications. In this paper, with the guidance of the human visual system (HVS), we explore the attention mechanism and propose a novel end-to-end attention recurrent convolutional network (ARCNet) for scene classification. It can learn to focus selectively on some key regions or locations and process only them at the high-level feature stage, thereby discarding the noncritical information and promoting the classification performance. The contributions of this paper are threefold. First, we design a novel recurrent attention structure to squeeze high-level semantic and spatial features into several simplex vectors for the reduction of learning parameters. Second, an end-to-end network named ARCNet is proposed to adaptively select a series of attention regions and then to generate powerful predictions by learning to process them sequentially. Third, we construct a new data set named OPTIMAL-31, which contains more categories than popular data sets and gives researchers an extra platform to validate their algorithms. The experimental results demonstrate that our model achieves a substantial improvement over state-of-the-art approaches.

457 citations


Journal ArticleDOI
TL;DR: It is demonstrated that the proposed DCS outperforms the other two schemes in terms of accuracy and is able to solve typical ISPs within 1 s, and that the proposed deep-learning inversion scheme is promising for providing quantitative images in real time.
Abstract: This paper is devoted to solving a full-wave inverse scattering problem (ISP), which is aimed at retrieving the permittivities of dielectric scatterers from the knowledge of measured scattering data. ISPs are highly nonlinear due to multiple scattering, and iterative algorithms with regularizations are often used to solve such problems. However, they are associated with heavy computational cost, and consequently, they are often time-consuming. This paper proposes the convolutional neural network (CNN) technique to solve full-wave ISPs. We introduce and compare three training schemes based on the U-Net CNN: direct inversion, backpropagation, and the dominant current scheme (DCS). Several representative tests are carried out, including both synthetic and experimental data, to evaluate the performances of the proposed methods. It is demonstrated that the proposed DCS outperforms the other two schemes in terms of accuracy and is able to solve typical ISPs within 1 s. The proposed deep-learning inversion scheme is promising for providing quantitative images in real time.

320 citations


Journal ArticleDOI
TL;DR: A general end-to-end 2-D convolutional neural network (CNN) framework is presented for hyperspectral image change detection (HSI-CD).
Abstract: Change detection (CD) is an important application of remote sensing, which provides timely change information about the large-scale Earth surface. With the emergence of hyperspectral imagery, CD technology has been greatly advanced, as hyperspectral data with high spectral resolution are capable of detecting finer changes than traditional multispectral imagery. Nevertheless, the high dimensionality of the hyperspectral data makes it difficult to implement traditional CD algorithms. Besides, endmember abundance information at the subpixel level is often not fully utilized. In order to better handle the high-dimensionality problem and explore abundance information, this paper presents a general end-to-end 2-D convolutional neural network (CNN) framework for hyperspectral image CD (HSI-CD). The main contributions of this paper are threefold: 1) a mixed-affinity matrix that integrates subpixel representation is introduced to mine more cross-channel gradient features and fuse multisource information; 2) a 2-D CNN is designed to learn the discriminative features effectively from the multisource data at a higher level and enhance the generalization ability of the proposed CD algorithm; and 3) a new HSI-CD data set is designed for the objective comparison of different methods. Experimental results on real hyperspectral data sets demonstrate that the proposed method outperforms most state-of-the-art methods.

319 citations


Journal ArticleDOI
TL;DR: A novel unsupervised context-sensitive framework—deep change vector analysis (DCVA)—for CD in multitemporal VHR images that exploits convolutional neural network (CNN) features is proposed, and experimental results on multitemporal data sets of Worldview-2, Pleiades, and Quickbird images confirm the effectiveness of the proposed method.
Abstract: Change detection (CD) in multitemporal images is an important application of remote sensing. Recent technological evolution has provided very high spatial resolution (VHR) multitemporal optical satellite images showing high spatial correlation among pixels and requiring an effective modeling of spatial context to accurately capture change information. Here, we propose a novel unsupervised context-sensitive framework—deep change vector analysis (DCVA)—for CD in multitemporal VHR images that exploits convolutional neural network (CNN) features. To have an unsupervised system, DCVA starts from a suboptimal pretrained multilayered CNN for obtaining deep features that can model spatial relationships among neighboring pixels and thus complex objects. An automatic feature selection strategy is employed layerwise to select features emphasizing both high and low prior probability change information. Selected features from multiple layers are combined into a deep feature hypervector providing a multiscale scene representation. The use of the same pretrained CNN for semantic segmentation of single images enables us to obtain coherent multitemporal deep feature hypervectors that can be compared pixelwise to obtain deep change vectors that also model spatial context information. Deep change vectors are analyzed based on their magnitude to identify changed pixels. Then, the deep change vectors corresponding to identified changed pixels are binarized to obtain compressed binary deep change vectors that preserve information about the direction (kind) of change. Changed pixels are analyzed for multiple-change detection based on the binary features, thus implicitly using the spatial information. Experimental results on multitemporal data sets of Worldview-2, Pleiades, and Quickbird images confirm the effectiveness of the proposed method.

310 citations
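The magnitude-based analysis of deep change vectors can be sketched in NumPy. The mean-plus-k-sigma threshold below is a simplification standing in for the paper's automatic threshold selection:

```python
import numpy as np

def deep_change_magnitude(feat_t1, feat_t2):
    """Pixelwise magnitude of the deep change vector: the difference of the
    two multitemporal deep-feature hypervectors, shape (H, W, D) -> (H, W)."""
    return np.linalg.norm(feat_t2 - feat_t1, axis=-1)

def threshold_changes(magnitude, k=1.0):
    """Label pixels with large change-vector magnitude as changed.
    mean + k*std is a placeholder for the paper's threshold selection."""
    t = magnitude.mean() + k * magnitude.std()
    return magnitude > t
```

The binarized change vectors of the changed pixels would then be compared to separate different kinds of change; that step is omitted here.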


Journal ArticleDOI
TL;DR: A generative adversarial network (GAN)-based edge-enhancement network (EEGAN) for robust satellite image SR reconstruction along with the adversarial learning strategy that is insensitive to noise is proposed.
Abstract: The current superresolution (SR) methods based on deep learning have shown remarkable comparative advantages but remain unsatisfactory in recovering the high-frequency edge details of images under noise-contaminated imaging conditions, e.g., remote sensing satellite imaging. In this paper, we propose a generative adversarial network (GAN)-based edge-enhancement network (EEGAN) for robust satellite image SR reconstruction, along with an adversarial learning strategy that is insensitive to noise. In particular, EEGAN consists of two main subnetworks: an ultradense subnetwork (UDSN) and an edge-enhancement subnetwork (EESN). In the UDSN, a group of 2-D dense blocks is assembled for feature extraction and to obtain an intermediate high-resolution result that looks sharp but is eroded with artifacts and noise, as in previous GAN-based methods. Then, the EESN is constructed to extract and enhance the image contours by purifying the noise-contaminated components with mask processing. The recovered intermediate image and enhanced edges can be combined to generate a result with high credibility and clear contents. Extensive experiments on the Kaggle open-source data set, Jilin-1 video satellite images, and DigitalGlobe imagery show superior reconstruction performance compared to the state-of-the-art SR approaches.

305 citations


Journal ArticleDOI
TL;DR: This survey paper presents a systematic review of the deep learning-based HSI classification literature and compares several strategies to improve classification performance, which can provide guidelines for future studies on this topic.
Abstract: Hyperspectral image (HSI) classification has become a hot topic in the field of remote sensing. In general, the complex characteristics of hyperspectral data make the accurate classification of such data challenging for traditional machine learning methods. In addition, hyperspectral imaging often deals with an inherently nonlinear relation between the captured spectral information and the corresponding materials. In recent years, deep learning has been recognized as a powerful feature-extraction tool to effectively address nonlinear problems and has been widely used in a number of image processing tasks. Motivated by those successful applications, deep learning has also been introduced to classify HSIs and has demonstrated good performance. This survey paper presents a systematic review of the deep learning-based HSI classification literature and compares several strategies for this topic. Specifically, we first summarize the main challenges of HSI classification that cannot be effectively overcome by traditional machine learning methods, and we also introduce the advantages of deep learning in handling these problems. Then, we build a framework that divides the corresponding works into spectral-feature networks, spatial-feature networks, and spectral–spatial-feature networks to systematically review the recent achievements in deep learning-based HSI classification. In addition, considering that available training samples in the remote sensing field are usually very limited and training deep networks requires a large number of samples, we include some strategies to improve classification performance, which can provide guidelines for future studies on this topic. Finally, several representative deep learning-based classification methods are evaluated on real HSIs in our experiments.

301 citations


Journal ArticleDOI
TL;DR: Experiments show that the proposed DAPN method can detect multi-scale ships in different scenes of SAR images with extremely high accuracy and outperforms other ship detection methods implemented on SSDD.
Abstract: Synthetic aperture radar (SAR) is an active microwave imaging sensor with the capability of working in all weather, day and night, to provide high-resolution SAR images. Recently, SAR images have been widely used in civilian and military fields, such as ship detection. The scales of different ships vary in SAR images; small-scale ships in particular occupy only a few pixels and have lower contrast. Current ship detection methods are less sensitive to small-scale ships than to large-scale ships and therefore face difficulties with multi-scale ship detection in SAR images. A novel multi-scale ship detection method based on a dense attention pyramid network (DAPN) in SAR images is proposed in this paper. The DAPN adopts a pyramid structure, which densely connects a convolutional block attention module (CBAM) to each concatenated feature map from the top to the bottom of the pyramid network. In this way, abundant features containing resolution and semantic information are extracted for multi-scale ship detection, while the CBAM refines the concatenated feature maps to highlight salient features for specific scales. Then, the salient features are integrated with global unblurred features to improve accuracy effectively in SAR images. Finally, the fused feature maps are fed to the detection network to obtain the final detection results. Experiments on the SAR ship detection data set (SSDD), which includes multi-scale ships in various SAR images, show that the proposed method can detect multi-scale ships in different scenes of SAR images with extremely high accuracy and outperforms other ship detection methods implemented on SSDD.

279 citations
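The convolutional block attention module (CBAM) referenced here is a published attention design; its channel-attention half (spatial average- and max-pooling followed by a shared MLP and a sigmoid gate) can be sketched in NumPy as follows. This is a simplified illustration, not the DAPN code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat, w1, w2):
    """CBAM-style channel attention on a (C, H, W) feature map: spatially
    average- and max-pooled descriptors pass through a SHARED two-layer
    MLP (weights w1, w2), are summed, and gated with a sigmoid."""
    avg = feat.mean(axis=(1, 2))                  # (C,) average-pooled descriptor
    mx = feat.max(axis=(1, 2))                    # (C,) max-pooled descriptor
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # shared weights, ReLU hidden
    gate = sigmoid(mlp(avg) + mlp(mx))            # (C,) per-channel gate in (0, 1)
    return feat * gate[:, None, None]             # reweight each channel
```

In the full module, a spatial-attention step follows the channel gating; it is omitted here for brevity.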


Journal ArticleDOI
TL;DR: A CNN model extension is developed that redefines the concept of capsule units to become spectral–spatial units specialized in classifying remotely sensed HSI data and is able to provide competitive advantages in terms of both classification accuracy and computational time.
Abstract: Convolutional neural networks (CNNs) have recently exhibited an excellent performance in hyperspectral image classification tasks. However, the straightforward CNN-based network architecture still finds obstacles when effectively exploiting the relationships between hyperspectral imaging (HSI) features in the spectral–spatial domain, which is a key factor to deal with the high level of complexity present in remotely sensed HSI data. Despite the fact that deeper architectures try to mitigate these limitations, they also find challenges with the convergence of the network parameters, which eventually limit the classification performance under highly demanding scenarios. In this paper, we propose a new CNN architecture based on spectral–spatial capsule networks in order to achieve a highly accurate classification of HSIs while significantly reducing the network design complexity. Specifically, based on Hinton’s capsule networks, we develop a CNN model extension that redefines the concept of capsule units to become spectral–spatial units specialized in classifying remotely sensed HSI data. The proposed model is composed of several building blocks, called spectral–spatial capsules, which are able to learn HSI spectral–spatial features considering their corresponding spatial positions in the scene, their associated spectral signatures, and also their possible transformations. Our experiments, conducted using five well-known HSI data sets and several state-of-the-art classification methods, reveal that our HSI classification approach based on spectral–spatial capsules is able to provide competitive advantages in terms of both classification accuracy and computational time.

274 citations


Journal ArticleDOI
TL;DR: A new deep CNN architecture specially designed for the HSI data is presented to improve the spectral–spatial features uncovered by the convolutional filters of the network and is able to provide competitive advantages over the state-of-the-art HSI classification methods.
Abstract: Convolutional neural networks (CNNs) exhibit good performance in image processing tasks, pointing themselves as the current state-of-the-art of deep learning methods. However, the intrinsic complexity of remotely sensed hyperspectral images still limits the performance of many CNN models. The high dimensionality of the HSI data, together with the underlying redundancy and noise, often makes the standard CNN approaches unable to generalize discriminative spectral–spatial features. Moreover, deeper CNN architectures also find challenges when additional layers are added, which hampers the network convergence and produces low classification accuracies. In order to mitigate these issues, this paper presents a new deep CNN architecture specially designed for the HSI data. Our new model seeks to improve the spectral–spatial features uncovered by the convolutional filters of the network. Specifically, the proposed residual-based approach gradually increases the feature map dimension at all convolutional layers, grouped in pyramidal bottleneck residual blocks, in order to involve more locations as the network depth increases while balancing the workload among all units, preserving the time complexity per layer. It can be seen as a pyramid, where the deeper the blocks, the more feature maps can be extracted. Therefore, the diversity of high-level spectral–spatial attributes can be gradually increased across layers to enhance the performance of the proposed network with the HSI data. Our experiments, conducted using four well-known HSI data sets and 10 different classification techniques, reveal that our newly developed HSI pyramidal residual model is able to provide competitive advantages (in terms of both classification accuracy and computational time) over the state-of-the-art HSI classification methods.

254 citations


Journal ArticleDOI
TL;DR: A deep few-shot learning method is proposed to address the small sample size problem of HSI classification and can achieve better classification accuracy than the conventional semisupervised methods with only a few labeled samples.
Abstract: Deep learning methods have recently been successfully explored for hyperspectral image (HSI) classification. However, training a deep-learning classifier notoriously requires hundreds or thousands of labeled samples. In this paper, a deep few-shot learning method is proposed to address the small sample size problem of HSI classification. There are three novel strategies in the proposed algorithm. First, spectral–spatial features are extracted to reduce the labeling uncertainty via a deep residual 3-D convolutional neural network. Second, the network is trained by episodes to learn a metric space where samples from the same class are close and those from different classes are far. Finally, the testing samples are classified by a nearest neighbor classifier in the learned metric space. The key idea is that the designed network learns a metric space from the training data set. Furthermore, such metric space could generalize to the classes of the testing data set. Note that the classes of the testing data set are not seen in the training data set. Four widely used HSI data sets were used to assess the performance of the proposed algorithm. The experimental results indicate that the proposed method can achieve better classification accuracy than the conventional semisupervised methods with only a few labeled samples.
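The classification step described above, a nearest-neighbor decision in the learned metric space, reduces to a few lines once embeddings are available. The sketch below assumes the embedding network has already mapped support and query samples to feature vectors; it is not the authors' code:

```python
import numpy as np

def metric_nn_classify(support_feats, support_labels, query_feats):
    """Nearest-neighbor classification in an (already learned) metric space:
    each query is assigned the label of its closest support sample under
    Euclidean distance. The embedding network itself is omitted."""
    # Pairwise distances, shape (n_query, n_support), via broadcasting.
    d = np.linalg.norm(query_feats[:, None, :] - support_feats[None, :, :], axis=-1)
    return support_labels[np.argmin(d, axis=1)]
```

The episodic training described in the abstract shapes this space so that same-class samples cluster together, which is what makes the simple nearest-neighbor rule effective even for classes unseen during training.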

Journal ArticleDOI
TL;DR: This paper presents a method to retrieve surface soil moisture (SSM) from the Sentinel-1 (S-1) satellites, which carry C-band Synthetic Aperture Radar (CSAR) sensors that provide the richest freely available SAR data source so far, unprecedented in accuracy and coverage.
Abstract: Soil moisture is a key environmental variable, important to, e.g., farmers, meteorologists, and disaster management units. Here, we present a method to retrieve surface soil moisture (SSM) from the Sentinel-1 (S-1) satellites, which carry C-band Synthetic Aperture Radar (CSAR) sensors that provide the richest freely available SAR data source so far, unprecedented in accuracy and coverage. Our SSM retrieval method, adapting well-established change detection algorithms, builds the first globally deployable soil moisture observation data set with 1-km resolution. This paper provides an algorithm formulation to be operated in data cube architectures and high-performance computing environments. It includes the novel dynamic Gaussian upscaling method for spatial upscaling of SAR imagery, harnessing its field-scale information and successfully mitigating effects from the SAR’s high signal complexity. Also, a new regression-based approach for estimating the radar slope is defined, coping with Sentinel-1’s inhomogeneity in spatial coverage. We employ the S-1 SSM algorithm on a 3-year S-1 data cube over Italy, obtaining a consistent set of model parameters and product masks, unperturbed by coverage discontinuities. An evaluation of the S-1 SSM data generated therefrom, involving a 1-km soil water balance model over Umbria, yields high agreement over plains and agricultural areas, with low agreement over forests and strong topography. While positive biases during the growing season are detected, the excellent capability to capture small-scale soil moisture changes as from rainfall or irrigation is evident. The S-1 SSM is currently in preparation toward operational product dissemination in the Copernicus Global Land Service.
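The change detection principle that the retrieval adapts (long established for scatterometer soil moisture) scales the observed backscatter between historically driest and wettest reference values. A minimal sketch of that principle, not the operational S-1 algorithm:

```python
import numpy as np

def relative_ssm(sigma0_db, dry_ref_db, wet_ref_db):
    """Change-detection soil moisture principle: relative surface soil
    moisture (%) as the position of the observed backscatter (dB) between
    the historically driest and wettest reference backscatter values."""
    ssm = 100.0 * (sigma0_db - dry_ref_db) / (wet_ref_db - dry_ref_db)
    return np.clip(ssm, 0.0, 100.0)  # clamp observations outside the references
```

The operational algorithm adds the slope correction, upscaling, and masking steps described in the abstract on top of this relative scaling.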

Journal ArticleDOI
TL;DR: A deep convolutional neural network (CNN) is introduced for HSI denoising (HSI-DeNet); it can be regarded as a tensor-based method that directly learns the filters in each layer without damaging the spectral-spatial structures.
Abstract: The spectral and the spatial information in hyperspectral images (HSIs) are the two sides of the same coin. How to jointly model them is the key issue for HSIs’ noise removal, including random noise, structural stripe noise, and dead pixels/lines. In this paper, we introduce the deep convolutional neural network (CNN) to achieve this goal. The learned filters can well extract the spatial information within their local receptive field. Meanwhile, the spectral correlation can be depicted by the multiple channels of the learned 2-D filters, namely, the number of filters in each layer. The consequent advantages of our CNN-based HSI denoising method (HSI-DeNet) over previous methods are threefold. First, the proposed HSI-DeNet can be regarded as a tensor-based method by directly learning the filters in each layer without damaging the spectral-spatial structures. Second, the HSI-DeNet can simultaneously accommodate various kinds of noise in HSIs. Moreover, our method is flexible for both single and multiple images by slightly modifying the channels of the filters in the first and last layers. Last but not least, our method is extremely fast in the testing phase, which makes it more practical for real application. The proposed HSI-DeNet is extensively evaluated on several HSIs and outperforms the state-of-the-art HSI denoising methods in terms of both speed and performance.

Journal ArticleDOI
TL;DR: The proposed CDSAE framework comprises two stages with different optimization objectives, which can learn discriminative low-dimensional feature mappings and train an effective classifier progressively; it imposes a local Fisher discriminant regularization on each hidden layer of a stacked autoencoder (SAE) to train a discriminative SAE (DSAE).
Abstract: As one of the fundamental research topics in remote sensing image analysis, hyperspectral image (HSI) classification has been extensively studied so far. However, how to discriminatively learn a low-dimensional feature space, in which the mapped features have small within-class scatter and big between-class separation, is still a challenging problem. To address this issue, this paper proposes an effective framework, named compact and discriminative stacked autoencoder (CDSAE), for HSI classification. The proposed CDSAE framework comprises two stages with different optimization objectives, which can learn discriminative low-dimensional feature mappings and train an effective classifier progressively. First, we impose a local Fisher discriminant regularization on each hidden layer of a stacked autoencoder (SAE) to train a discriminative SAE (DSAE) by minimizing reconstruction error. This stage can learn feature mappings in which the pixels from the same land-cover class are mapped as closely as possible and the pixels from different land-cover categories are separated by a large margin. Second, we learn an effective classifier and meanwhile update the DSAE with a local Fisher discriminant regularization embedded on top of the feature representations. Moreover, to learn a compact DSAE with as small a number of hidden neurons as possible, we impose a diversity regularization on the hidden neurons of the DSAE to balance the feature dimensionality and the feature representation capability. The experimental results on three widely-used HSI data sets and comprehensive comparisons with existing methods demonstrate that our proposed method is effective.
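The quantity the Fisher discriminant regularizer controls, namely small within-class scatter with large between-class separation, can be made concrete with a NumPy sketch of a global, unweighted scatter trace ratio; the paper's actual regularizer is local and embedded in SAE training, so this is only an illustration of the objective:

```python
import numpy as np

def fisher_ratio(feats, labels):
    """Trace of within-class scatter divided by trace of between-class
    scatter. A discriminative feature mapping drives this ratio DOWN:
    tight clusters (small numerator), well-separated means (large denominator)."""
    mu = feats.mean(axis=0)               # global mean
    sw = 0.0                              # within-class scatter (trace)
    sb = 0.0                              # between-class scatter (trace)
    for c in np.unique(labels):
        fc = feats[labels == c]
        mc = fc.mean(axis=0)              # class mean
        sw += ((fc - mc) ** 2).sum()
        sb += len(fc) * ((mc - mu) ** 2).sum()
    return sw / sb
```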

Journal ArticleDOI
TL;DR: Experimental results on several benchmark hyperspectral data sets have demonstrated that the proposed 3D-CAE is very effective in extracting spatial–spectral features and outperforms not only traditional unsupervised feature extraction algorithms but also many supervised feature extraction algorithms in classification applications.
Abstract: Feature learning technologies using convolutional neural networks (CNNs) have shown superior performance over traditional hand-crafted feature extraction algorithms. However, a large number of labeled samples are generally required for a CNN to learn effective features for classification tasks, and these are hard to obtain for hyperspectral remote sensing images. Therefore, in this paper, an unsupervised spatial–spectral feature learning strategy is proposed for hyperspectral images using a 3-dimensional (3D) convolutional autoencoder (3D-CAE). The proposed 3D-CAE consists of 3D or elementwise operations only, such as 3D convolution, 3D pooling, and 3D batch normalization, to maximally exploit spatial–spectral structure information for feature extraction. A companion 3D convolutional decoder network is also designed to reconstruct the input patterns to the proposed 3D-CAE, by which all the parameters involved in the network can be trained without labeled training samples. As a result, effective features are learned in an unsupervised manner, in which label information for pixels is not required. Experimental results on several benchmark hyperspectral data sets have demonstrated that our proposed 3D-CAE is very effective in extracting spatial–spectral features and outperforms not only traditional unsupervised feature extraction algorithms but also many supervised feature extraction algorithms in classification applications.

Journal ArticleDOI
TL;DR: A new technique for unsupervised unmixing which is based on a deep autoencoder network (DAEN), which can unmix data sets with outliers and low signal-to-noise ratio and demonstrates very competitive performance.
Abstract: Spectral unmixing is a technique for remotely sensed image interpretation that expresses each (possibly mixed) pixel as a combination of pure spectral signatures (endmembers) and their fractional abundances. In this paper, we develop a new technique for unsupervised unmixing which is based on a deep autoencoder network (DAEN). Our newly developed DAEN consists of two parts. The first part of the network adopts stacked autoencoders (SAEs) to learn spectral signatures, so as to generate a good initialization for the unmixing process. In the second part of the network, a variational autoencoder (VAE) is employed to perform blind source separation, aimed at obtaining the endmember signatures and abundance fractions simultaneously. By taking advantage of the SAEs, the robustness of the proposed approach is remarkable, as it can unmix data sets with outliers and low signal-to-noise ratio. Moreover, the multiple hidden layers of the VAE ensure the required constraints (nonnegativity and sum-to-one) when estimating the abundances. The effectiveness of the proposed method is evaluated using both synthetic and real hyperspectral data. When compared with other unmixing methods, the proposed approach demonstrates very competitive performance.
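The abundance constraints mentioned above (nonnegativity and sum-to-one) under the linear mixing model can be illustrated with a softmax parameterization; this is one common way to enforce the constraints, not necessarily the VAE's exact mechanism:

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: output is nonnegative and sums to one."""
    e = np.exp(z - z.max())
    return e / e.sum()

def reconstruct_pixel(endmembers, logits):
    """Linear mixing model y = E @ a, with abundances a parameterized by a
    softmax so that a >= 0 and sum(a) == 1 hold by construction.
    endmembers: (bands, n_endmembers); logits: (n_endmembers,)."""
    a = softmax(logits)        # abundance fractions satisfying both constraints
    return endmembers @ a, a   # reconstructed spectrum and abundances
```

Unconstrained network outputs (logits) can then be optimized freely while the abundance estimates always remain physically valid.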

Journal ArticleDOI
TL;DR: A new hand-crafted feature extraction method, based on multiscale covariance maps (MCMs), is specifically aimed at improving the classification of HSIs using CNNs; experiments demonstrate that the proposed method can indeed increase the robustness of the CNN model.
Abstract: The classification of hyperspectral images (HSIs) using convolutional neural networks (CNNs) has recently drawn significant attention. However, it is important to address the potential overfitting problems that CNN-based methods suffer when dealing with HSIs. Unlike common natural images, HSIs are essentially third-order tensors which contain two spatial dimensions and one spectral dimension. As a result, exploiting both spatial and spectral information is very important for HSI classification. This paper proposes a new hand-crafted feature extraction method, based on multiscale covariance maps (MCMs), that is specifically aimed at improving the classification of HSIs using CNNs. The proposed method has the following distinctive advantages. First, with the use of covariance maps, the spatial and spectral information of the HSI can be jointly exploited. Each entry in the covariance map stands for the covariance between two different spectral bands within a local spatial window, which can absorb and integrate the two kinds of information (spatial and spectral) in a natural way. Second, by means of our multiscale strategy, each sample can be enhanced with spatial information from different scales, increasing the information conveyed by training samples significantly. To verify the effectiveness of our proposed method, we conduct comprehensive experiments on three widely used hyperspectral data sets, using a classical 2-D CNN (2DCNN) model. Our experimental results demonstrate that the proposed method can indeed increase the robustness of the CNN model. Moreover, the proposed MCMs+2DCNN method exhibits better classification performance than other CNN-based classification strategies and several standard techniques for spectral-spatial classification of HSIs.
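The covariance map computation itself is concrete enough to sketch directly: each entry is the covariance between two spectral bands over a local spatial window, and the multiscale strategy stacks maps from windows of several sizes. Boundary handling and the downstream 2DCNN are omitted, and the scale choices below are illustrative:

```python
import numpy as np

def covariance_map(patch):
    """Covariance map of an HSI patch of shape (win, win, bands): entry (i, j)
    is the covariance between bands i and j over the spatial window."""
    pixels = patch.reshape(-1, patch.shape[-1])       # (win*win, bands)
    centered = pixels - pixels.mean(axis=0)
    return centered.T @ centered / (pixels.shape[0] - 1)

def multiscale_covariance_maps(cube, row, col, scales=(3, 5, 7)):
    """Stack covariance maps from windows of several scales around one pixel
    of an (H, W, bands) cube. Assumes the windows fit inside the cube."""
    maps = []
    for s in scales:
        h = s // 2
        patch = cube[row - h:row + h + 1, col - h:col + h + 1, :]
        maps.append(covariance_map(patch))
    return np.stack(maps)                             # (len(scales), bands, bands)
```

Each per-pixel stack of bands x bands maps then serves as the multichannel input sample for the 2-D CNN.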

Journal ArticleDOI
TL;DR: A novel object detection network (context-aware detection network (CAD-Net)) that exploits attention-modulated features as well as global and local contexts to address the new challenges in detecting objects from remote sensing images is presented.
Abstract: Accurate and robust detection of multi-class objects in optical remote sensing images is essential to many real-world applications, such as urban planning, traffic control, search, and rescue. However, the state-of-the-art object detection techniques designed for images captured using ground-level sensors usually experience a sharp performance drop when directly applied to remote sensing images, largely due to the object appearance differences in remote sensing images in terms of sparse texture, low contrast, arbitrary orientations, and large-scale variations. This paper presents a novel object detection network, the context-aware detection network (CAD-Net), that exploits attention-modulated features as well as global and local contexts to address the new challenges in detecting objects from remote sensing images. The proposed CAD-Net learns global and local contexts of objects by capturing their correlations with the global scene (at scene level) and the local neighboring objects or features (at object level), respectively. In addition, it designs a spatial-and-scale-aware attention module that guides the network to focus on more informative regions and features as well as more appropriate feature scales. Experiments over two publicly available object detection data sets for remote sensing images demonstrate that the proposed CAD-Net achieves superior detection performance. The implementation codes will be made publicly available for facilitating future works.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed convolutional neural network with multiscale convolution and diversified metric outperforms the original deep models and produces comparable or even better classification performance on different hyperspectral image data sets with respect to spectral and spectral–spatial features.
Abstract: Recently, researchers have shown the powerful ability of deep methods with multiple layers to extract high-level features and to obtain better performance for hyperspectral image classification. However, a common problem of traditional deep models is that the learned models might be suboptimal because of the limited number of training samples, especially for images with large intraclass variance and low interclass variance. In this paper, novel convolutional neural networks (CNNs) with multiscale convolution (MS-CNNs) are proposed to address this problem by extracting deep multiscale features from the hyperspectral image. Moreover, deep metrics are usually combined with MS-CNNs to improve the representational ability for the hyperspectral image. However, usual metric learning tends to make the metric parameters in the learned model behave similarly. This similarity leads to obvious model redundancy and thus negatively affects the descriptive ability of the deep metrics. Traditionally, determinantal point process (DPP) priors, which encourage the learned factors to repel one another, can be imposed over these factors to diversify them. Taking advantage of both the MS-CNNs and the DPP-based diversity-promoting deep metrics, this paper develops a CNN with multiscale convolution and diversified metric to obtain discriminative features for hyperspectral image classification. Experiments are conducted over four real-world hyperspectral image data sets to show the effectiveness and applicability of the proposed method. Experimental results show that our method outperforms the original deep models and can produce comparable or even better classification performance on different hyperspectral image data sets with respect to spectral and spectral–spatial features.
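The DPP-style diversity prior can be illustrated with a small sketch. A common surrogate (assumed here, not taken from the paper's exact formulation) scores a set of factors by the log-determinant of their Gram matrix: near-duplicate factors drive the determinant toward zero, so maximizing the score repels them from one another.

```python
import numpy as np

def dpp_diversity(factors, eps=1e-6):
    """DPP-style diversity score for a set of metric factors.
    factors: (K, D), one row per factor. Rows are L2-normalized, the
    Gram matrix L = F F^T is formed, and log det(L) is returned:
    nearly collinear rows push det(L) toward 0, i.e. a low score."""
    f = factors / np.linalg.norm(factors, axis=1, keepdims=True)
    gram = f @ f.T + eps * np.eye(len(f))   # similarity kernel over factors
    sign, logdet = np.linalg.slogdet(gram)
    return logdet                           # maximized to promote diversity
```

In training, a term like this would be added (with a sign flip) to the loss, penalizing metric factors that behave similarly.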

Journal ArticleDOI
TL;DR: A new change detection algorithm for multi-temporal remote sensing images called deep SFA (DSFA), based on deep networks and slow feature analysis (SFA) theory, which outperforms other state-of-the-art algorithms, including other SFA-based and deep learning methods.
Abstract: Change detection has long been a hotspot in remote sensing. With the increasing availability of multi-temporal remote sensing images, numerous change detection algorithms have been proposed. Among these methods, image transformation methods with feature extraction and mapping can effectively highlight changed information and thus achieve better change detection performance. However, the changes in multi-temporal images are usually complex, and the existing methods are not effective enough. In recent years, deep networks have shown brilliant performance in many fields, including feature extraction and projection. Therefore, in this paper, based on deep networks and slow feature analysis (SFA) theory, we propose a new change detection algorithm for multi-temporal remote sensing images called deep SFA (DSFA). In the DSFA model, two symmetric deep networks are utilized to project the input data of bi-temporal imagery. Then, the SFA module is deployed to suppress the unchanged components and highlight the changed components of the transformed features. Change vector analysis pre-detection is employed to find unchanged pixels with high confidence as training samples. Finally, the change intensity is calculated with the chi-square distance, and the changes are determined by threshold algorithms. The experiments are performed on two real-world data sets and a public hyperspectral data set. The visual comparison and quantitative evaluation show that DSFA outperforms other state-of-the-art algorithms, including other SFA-based and deep learning methods.
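The SFA plus chi-square pipeline described above can be sketched in a few lines for the linear case. This is a simplified stand-in for DSFA (the paper applies deep networks before the SFA step): the generalized eigenproblem and the chi-square scoring follow standard SFA-based change detection.

```python
import numpy as np

def sfa_change_intensity(x, y):
    """Linear SFA change detection sketch (simplified DSFA core).
    x, y: (N, D) co-registered pixel features from the two dates.
    Solves the generalized eigenproblem  A w = lam B w  with
    A = cov(x - y) (difference covariance) and B = (cov(x) + cov(y)) / 2,
    then scores each pixel by a chi-square distance over the features."""
    x = x - x.mean(0)
    y = y - y.mean(0)
    a = np.cov((x - y).T)
    b = 0.5 * (np.cov(x.T) + np.cov(y.T)) + 1e-6 * np.eye(x.shape[1])
    # whiten B, then reduce to an ordinary symmetric eigendecomposition
    vals, vecs = np.linalg.eigh(b)
    b_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T
    lam, w = np.linalg.eigh(b_inv_half @ a @ b_inv_half)  # slowest first
    w = b_inv_half @ w
    diff = (x - y) @ w                                    # transformed difference
    # chi-square intensity: squared components standardized by variance
    return np.sum(diff ** 2 / np.maximum(lam, 1e-6), axis=1)
```

Thresholding the returned intensities (e.g., with Otsu's method) would yield the final change map.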

Journal ArticleDOI
TL;DR: A novel recurrent convolutional neural network (ReCNN) architecture is proposed, trained to learn a joint spectral–spatial–temporal feature representation in a unified framework for change detection in multispectral images.
Abstract: Change detection is one of the central problems in earth observation and was extensively investigated over recent decades. In this paper, we propose a novel recurrent convolutional neural network (ReCNN) architecture, which is trained to learn a joint spectral–spatial–temporal feature representation in a unified framework for change detection in multispectral images. To this end, we bring together a convolutional neural network and a recurrent neural network into one end-to-end network. The former is able to generate rich spectral–spatial feature representations, while the latter effectively analyzes temporal dependence in bitemporal images. In comparison with previous approaches to change detection, the proposed network architecture possesses three distinctive properties: 1) it is end-to-end trainable, in contrast to most existing methods whose components are separately trained or computed; 2) it naturally harnesses spatial information that has been proven to be beneficial to the change detection task; and 3) it is capable of adaptively learning the temporal dependence between multitemporal images, unlike most of the algorithms that use fairly simple operations like image differencing or stacking. As far as we know, this is the first time that a recurrent convolutional network architecture has been proposed for multitemporal remote sensing image analysis. The proposed network is validated on real multispectral data sets. Both visual and quantitative analyses of the experimental results demonstrate competitive performance by the proposed model.
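The temporal stage of such a network can be illustrated with a toy recurrent cell. The sketch below (random stand-in weights, and a vanilla RNN rather than the paper's trained recurrent unit) shows the key idea: the two dates are consumed in sequence, so the comparison between them is learned rather than a fixed differencing or stacking operation.

```python
import numpy as np

def recurrent_temporal_fusion(f_t1, f_t2, w_in, w_h):
    """Toy recurrent fusion of bitemporal CNN features.
    f_t1, f_t2: (D,) feature vectors from dates 1 and 2.
    w_in: (H, D) input weights; w_h: (H, H) recurrent weights.
    The hidden state after both steps is a learned joint temporal feature,
    in contrast to a hand-crafted difference f_t2 - f_t1."""
    h = np.zeros(w_h.shape[0])
    for f in (f_t1, f_t2):              # process the temporal sequence
        h = np.tanh(w_in @ f + w_h @ h)  # hidden-state update
    return h                             # joint spectral-spatial-temporal feature
```

In the full ReCNN, the weights are trained end-to-end together with the convolutional feature extractor, and the fused feature feeds a change/no-change classifier.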

Journal ArticleDOI
TL;DR: A nonlocal low-rank regularized CANDECOMP/PARAFAC (CP) tensor decomposition (NLR-CPTD) is proposed to fully utilize these two intrinsic priors and can greatly improve the denoising performance of an HSI under various quality assessments.
Abstract: Hyperspectral image (HSI) enjoys great advantages over more traditional image types for various applications due to the extra spectral knowledge available. Owing to nonideal optical and electronic devices, HSIs are often corrupted by various types of noise, such as Gaussian noise, deadlines, and stripes. The global correlation across spectrum (GCS) and nonlocal self-similarity (NSS) over space are two important characteristics of HSIs. In this paper, a nonlocal low-rank regularized CANDECOMP/PARAFAC (CP) tensor decomposition (NLR-CPTD) is proposed to fully utilize these two intrinsic priors. To make the rank estimation more accurate, a new manner of rank determination for the NLR-CPTD model is proposed. The intrinsic GCS and NSS priors can be efficiently exploited under the low-rank regularized CPTD to avoid tensor rank estimation bias in denoising. The proposed HSI denoising model is then applied to tensors formed by nonlocal similar patches within an HSI. An alternating direction method of multipliers-based optimization technique is designed to solve the resulting minimization problem. Compared with state-of-the-art methods, the proposed algorithm greatly improves the denoising performance of an HSI under various quality assessments.
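For readers unfamiliar with CP decomposition, the model being regularized is just a sum of rank-1 outer products. A minimal NumPy sketch of the reconstruction step follows (the decomposition itself, which NLR-CPTD solves with ADMM, is omitted):

```python
import numpy as np

def cp_reconstruct(factors, weights=None):
    """Reconstruct a third-order tensor from rank-R CP factors.
    factors: [A (I, R), B (J, R), C (K, R)]. The tensor is the weighted
    sum of R rank-1 outer products a_r o b_r o c_r; this is the low-rank
    model fitted to groups of nonlocal similar patches in NLR-CPTD."""
    a, b, c = factors
    r = a.shape[1]
    if weights is None:
        weights = np.ones(r)
    t = np.zeros((a.shape[0], b.shape[0], c.shape[0]))
    for i in range(r):
        # outer product of the i-th columns of A, B, C
        t += weights[i] * np.einsum('i,j,k->ijk', a[:, i], b[:, i], c[:, i])
    return t
```

Denoising then amounts to fitting such a low-rank tensor to each group of noisy similar patches, so the noise (which is not low-rank) is suppressed.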

Journal ArticleDOI
Qiangqiang Yuan1, Qiang Zhang1, Jie Li1, Huanfeng Shen1, Liangpei Zhang1 
TL;DR: A novel deep learning-based method by learning a nonlinear end-to-end mapping between the noisy and clean HSIs with a combined spatial–spectral deep convolutional neural network (HSID-CNN) that outperforms many of the mainstream methods in quantitative evaluation indexes, visual effects, and HSI classification accuracy.
Abstract: Hyperspectral image (HSI) denoising is a crucial preprocessing procedure to improve the performance of the subsequent HSI interpretation and applications. In this paper, a novel deep learning-based method for this task is proposed, by learning a nonlinear end-to-end mapping between the noisy and clean HSIs with a combined spatial–spectral deep convolutional neural network (HSID-CNN). Both the spatial and spectral information are simultaneously assigned to the proposed network. In addition, multiscale feature extraction and multilevel feature representation are, respectively, employed to capture the multiscale spatial–spectral features and to fuse different feature representations for the final restoration. The simulated and real-data experiments demonstrate that the proposed HSID-CNN outperforms many of the mainstream methods in quantitative evaluation indexes, visual effects, and HSI classification accuracy.

Journal ArticleDOI
TL;DR: This paper presents a hierarchical robust CNN, where multiscale convolutional features are extracted to represent the hierarchical spatial semantic information and multiple fully connected layer features are stacked together so as to improve the rotation and scaling robustness.
Abstract: Object detection, which automatically labels objects, is a fundamental task for very high-resolution remote sensing images (RSIs). At present, deep learning has gradually gained a competitive advantage for remote sensing object detection, especially methods based on convolutional neural networks (CNNs). Most of the existing methods use the global information in the fully connected feature vector and ignore the local information in the convolutional feature cubes. However, the local information can provide spatial information, which is helpful for accurate localization. In addition, variable factors, such as rotation and scaling, affect the object detection accuracy in RSIs. To solve these problems, this paper presents a hierarchical robust CNN. First, multiscale convolutional features are extracted to represent the hierarchical spatial semantic information. Second, multiple fully connected layer features are stacked together to improve the rotation and scaling robustness. Experiments on two data sets have shown the effectiveness of our method. In addition, a large-scale high-resolution remote sensing object detection data set is established to address the fact that existing data sets are insufficient or too small. The data set is available at https://github.com/CrazyStoneonRoad/TGRS-HRRSD-Dataset .

Journal ArticleDOI
TL;DR: This paper introduces a new visual attention-driven technique for HSI classification that incorporates attention mechanisms into a ResNet in order to better characterize the spectral–spatial information contained in the data.
Abstract: Deep neural networks (DNNs), including convolutional neural network (CNN) and residual network (ResNet) models, are able to learn abstract representations from the input data by considering a deep hierarchy of layers that perform advanced feature extraction. The combination of these models with visual attention techniques can assist with the identification of the most representative parts of the data from a visual standpoint, obtained through more detailed filtering of the features extracted by the operational layers of the network. This is of significant interest for analyzing remotely sensed hyperspectral images (HSIs), characterized by their very high spectral dimensionality. However, few efforts have been made in the literature to adapt visual attention methods to remotely sensed HSI data analysis. In this paper, we introduce a new visual attention-driven technique for HSI classification. Specifically, we incorporate attention mechanisms into a ResNet in order to better characterize the spectral–spatial information contained in the data. Our newly proposed method calculates a mask that is applied to the features obtained by the network in order to identify the most desirable ones for classification purposes. Our experiments, conducted using four widely used HSI data sets, reveal that the proposed deep attention model provides competitive advantages in terms of classification accuracy when compared to other state-of-the-art methods.
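The mask mechanism described above can be sketched simply. Below is a hypothetical single-head spatial attention in NumPy (not the paper's exact module): a learned projection of the feature cube is squashed into [0, 1] and multiplied back onto every channel, so uninformative locations are suppressed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def apply_attention_mask(features, w):
    """Toy spatial attention: a 1x1 projection of the feature cube is
    squashed to (0, 1) and multiplied back onto every channel.
    features: (C, H, W) feature maps; w: (C,) projection weights
    (learned in practice, random stand-ins here)."""
    score = np.tensordot(w, features, axes=1)   # (H, W) attention scores
    mask = sigmoid(score)                       # per-location weights in (0, 1)
    return features * mask[None, :, :]          # reweighted feature maps
```

Because the mask values lie strictly between 0 and 1, the operation can only attenuate features, steering the subsequent layers toward the most informative spatial locations.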

Journal ArticleDOI
TL;DR: A scale-free CNN (SF-CNN) is introduced for remote sensing scene classification that not only allows the input images to be of arbitrary sizes but also retains the ability to extract discriminative features using a traditional sliding-window-based strategy.
Abstract: Fine-tuning of pretrained convolutional neural networks (CNNs) has been proven to be an effective strategy for remote sensing image scene classification, particularly when a limited number of labeled data sets are available for training purposes. However, such a fine-tuning process often requires the input images to be resized to a fixed size to generate input vectors of the size required by the fully connected layers (FCLs) in the pretrained CNN model. Such a resizing process often discards key information in the scenes and thus deteriorates the classification performance. To address this issue, in this paper, we introduce a scale-free CNN (SF-CNN) for remote sensing scene classification. Specifically, the FCLs in the CNN model are first converted into convolutional layers, which not only allow the input images to be of arbitrary sizes but also retain the ability to extract discriminative features using a traditional sliding-window-based strategy. Then, a global average pooling (GAP) layer is added after the final convolutional layer so that input images of arbitrary size can be mapped to feature maps of uniform size. Finally, we utilize the resulting feature maps to create a new FCL that is fed to a softmax layer for final classification. Our experimental results conducted using several real data sets demonstrate the superiority of the proposed SF-CNN method over several well-known classification methods, including pretrained CNN-based ones.
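The two key operations, converting FCLs into convolutions and pooling arbitrary-size maps into a fixed-length vector, can be sketched in NumPy (illustrative only, not the authors' code):

```python
import numpy as np

def global_average_pool(feature_maps):
    """GAP: map a (C, H, W) feature cube of ANY spatial size to a fixed
    length-C vector; this is what lets SF-CNN accept arbitrary inputs."""
    return feature_maps.mean(axis=(1, 2))

def fc_as_conv(fc_weights, h, w):
    """Reshape FC weights (out, in) that were trained on flattened
    (C, h, w) inputs into an equivalent conv kernel (out, C, h, w).
    Applied at the training resolution, the conv gives the same outputs
    as the FC layer; on larger inputs, it slides like a window."""
    out_dim, in_dim = fc_weights.shape
    c = in_dim // (h * w)
    return fc_weights.reshape(out_dim, c, h, w)
```

Note that the GAP output length depends only on the channel count, so inputs of different spatial sizes all map to vectors of the same dimension.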

Journal ArticleDOI
TL;DR: In this paper, the authors propose a novel object detection framework, called Optical Remote Sensing Imagery detector (ORSIm detector), integrating diverse channel feature extraction, feature learning, fast image pyramid matching, and a boosting strategy.
Abstract: With the rapid development of spaceborne imaging techniques, object detection in optical remote sensing imagery has drawn much attention in recent decades. While many advanced works have been developed with powerful learning algorithms, the incomplete feature representation still cannot meet the demand for effectively and efficiently handling image deformations, particularly object scaling and rotation. To this end, we propose a novel object detection framework, called Optical Remote Sensing Imagery detector (ORSIm detector), integrating diverse channel feature extraction, feature learning, fast image pyramid matching, and a boosting strategy. The ORSIm detector adopts a novel spatial-frequency channel feature (SFCF) by jointly considering the rotation-invariant channel features constructed in the frequency domain and the original spatial channel features (e.g., color channel and gradient magnitude). Subsequently, we refine the SFCF using a learning-based strategy in order to obtain high-level, semantically meaningful features. In the test phase, we achieve fast, coarsely scaled channel computation by mathematically estimating a scaling factor in the image domain. Extensive experiments on two different airborne data sets demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.
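The fast, coarsely scaled channel computation echoes the fast-feature-pyramids trick: compute a channel once, then approximate it at other scales by resampling plus a power-law magnitude correction. A hypothetical sketch follows, in which the exponent `lam` is a channel-specific constant assumed to be estimated offline:

```python
import numpy as np

def approx_scaled_channel(channel, scale, lam):
    """Approximate a channel feature at a new scale without recomputing it.
    The computed channel is spatially resampled and its magnitude is
    corrected by a power law  f(s) ~ f(1) * s**(-lam), the standard
    fast-pyramid approximation. Nearest-neighbor resampling is used here
    purely as a sketch; real code would interpolate properly."""
    h, w = channel.shape
    nh, nw = max(int(h * scale), 1), max(int(w * scale), 1)
    rows = (np.arange(nh) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(nw) / scale).astype(int).clip(0, w - 1)
    resized = channel[np.ix_(rows, cols)]
    return resized * scale ** (-lam)
```

This is what makes building a dense image pyramid cheap: only one scale per octave needs an exact channel computation, and intermediate scales are approximated.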

Journal ArticleDOI
TL;DR: An end-to-end 3-D lightweight convolutional neural network (CNN) (abbreviated as 3-D-LWNet) is proposed for limited-sample HSI classification; it has a deeper network structure, fewer parameters, and lower computational cost, resulting in better classification performance.
Abstract: Recently, hyperspectral image (HSI) classification approaches based on deep learning (DL) models have been proposed and shown promising performance. However, because of very limited available training samples and massive model parameters, DL methods may suffer from overfitting. In this paper, we propose an end-to-end 3-D lightweight convolutional neural network (CNN) (abbreviated as 3-D-LWNet) for limited-sample HSI classification. Compared with conventional 3-D-CNN models, the proposed 3-D-LWNet has a deeper network structure, fewer parameters, and lower computational cost, resulting in better classification performance. To further alleviate the small-sample problem, we also propose two transfer learning strategies: 1) a cross-sensor strategy, in which we pretrain a 3-D model on source HSI data sets containing a greater number of labeled samples and then transfer it to the target HSI data sets and 2) a cross-modal strategy, in which we pretrain a 3-D model on 2-D RGB image data sets containing a large number of samples and then transfer it to the target HSI data sets. In contrast to previous approaches, we do not impose restrictions on the source data sets: they do not have to be collected by the same sensors as the target data sets. Experiments on three public HSI data sets captured by different sensors demonstrate that our model achieves competitive performance for HSI classification compared to several state-of-the-art methods.

Journal ArticleDOI
TL;DR: In this paper, a cross-modality feature learning framework, called common subspace learning (CoSpace), is proposed to achieve accurate land cover classification over a large coverage, by jointly considering sub-space learning and supervised classification.
Abstract: With a large amount of open satellite multispectral (MS) imagery (e.g., Sentinel-2 and Landsat-8), considerable attention has been paid to global MS land cover classification. However, its limited spectral information hinders further improvement of the classification performance. Hyperspectral imaging enables discrimination between spectrally similar classes, but its swath width from space is narrow compared to that of MS sensors. To achieve accurate land cover classification over a large coverage, we propose a cross-modality feature learning framework, called common subspace learning (CoSpace), by jointly considering subspace learning and supervised classification. By locally aligning the manifold structure of the two modalities, CoSpace linearly learns a shared latent subspace from hyperspectral-MS (HS-MS) correspondences. The MS out-of-samples can then be projected into the subspace, where they are expected to take advantage of the rich spectral information of the corresponding hyperspectral data used for learning, and thus lead to better classification. Extensive experiments on two simulated HS-MS data sets (University of Houston and Chikusei), where the HS-MS data sets have tradeoffs between coverage and spectral resolution, are performed to demonstrate the superiority and effectiveness of the proposed method in comparison with previous state-of-the-art methods.
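The shared-subspace idea can be illustrated with a bare-bones alternating least-squares sketch. This is a simplification of CoSpace (the actual objective also includes manifold alignment and supervised label terms): paired HS and MS samples are factorized through one common latent matrix.

```python
import numpy as np

def shared_subspace_sketch(x_hs, x_ms, dim, iters=50):
    """Toy common-subspace learning from HS-MS correspondences.
    x_hs: (N, Dh) hyperspectral samples; x_ms: (N, Dm) paired MS samples.
    Alternates least-squares updates of a shared latent matrix Z and the
    two modality projections, so that Z @ p_hs ~ x_hs and Z @ p_ms ~ x_ms."""
    n = x_hs.shape[0]
    rng = np.random.default_rng(0)
    z = rng.standard_normal((n, dim))
    stacked_x = np.hstack([x_hs, x_ms])
    for _ in range(iters):
        p_hs = np.linalg.lstsq(z, x_hs, rcond=None)[0]   # (dim, Dh)
        p_ms = np.linalg.lstsq(z, x_ms, rcond=None)[0]   # (dim, Dm)
        stacked_p = np.hstack([p_hs, p_ms])
        # latent codes that jointly reconstruct both modalities
        z = np.linalg.lstsq(stacked_p.T, stacked_x.T, rcond=None)[0].T
    p_hs = np.linalg.lstsq(z, x_hs, rcond=None)[0]
    p_ms = np.linalg.lstsq(z, x_ms, rcond=None)[0]
    return z, p_hs, p_ms
```

Out-of-sample MS pixels would then be projected through `p_ms` into the shared subspace, where a classifier trained with the spectrally richer HS data can be applied.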

Journal ArticleDOI
TL;DR: The experimental results show that the proposed MF-CNN method performs better than other cloud detection methods, not only on thick and thin clouds but also on entire cloud regions.
Abstract: Cloud detection in remote sensing images is a challenging but significant task. Due to the variety and complexity of underlying surfaces, most of the current cloud detection methods have difficulty in detecting thin cloud regions. In fact, it is quite meaningful to distinguish thin clouds from thick clouds, especially in cloud removal and target detection tasks. Therefore, we propose a method based on a multiscale features convolutional neural network (MF-CNN) to detect thin cloud, thick cloud, and noncloud pixels of remote sensing images simultaneously. Landsat 8 satellite imagery with various levels of cloud coverage is used to demonstrate the effectiveness of our proposed MF-CNN model. We first stack the visible, near-infrared, short-wave, cirrus, and thermal infrared bands of Landsat 8 imagery to obtain the combined spectral information. The MF-CNN model is then used to learn the multiscale global features of input images. The high-level semantic information obtained in the process of feature learning is integrated with low-level spatial information to classify the imagery into thick, thin, and noncloud regions. The performance of our proposed model is compared to that of various commonly used cloud detection methods in both qualitative and quantitative aspects. The experimental results show that our proposed method performs better than other cloud detection methods, not only on thick and thin cloud regions but also on entire cloud regions.