
Showing papers in "IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing in 2021"


Journal ArticleDOI
TL;DR: A weighted double-margin contrastive loss is proposed to address the serious sample imbalance in change detection, i.e., unchanged samples are much more abundant than changed samples, which is one of the main reasons for pseudochanges.
Abstract: Change detection is a basic task of remote sensing image processing. The research objective is to identify the change information of interest and filter out the irrelevant change information as interference factors. Recently, the rise in deep learning has provided new tools for change detection, which have yielded impressive results. However, the available methods focus mainly on the difference information between multitemporal remote sensing images and lack robustness to pseudochange information. To overcome the lack of resistance in current methods to pseudochanges, in this article, we propose a new method, namely, dual attentive fully convolutional Siamese networks, for change detection in high-resolution images. Through the dual attention mechanism, long-range dependencies are captured to obtain more discriminative feature representations to enhance the recognition performance of the model. Moreover, sample imbalance is a serious problem in change detection, i.e., unchanged samples are much more abundant than changed samples, which is one of the main reasons for pseudochanges. We propose the weighted double-margin contrastive loss to address this problem by punishing attention to unchanged feature pairs and increasing attention to changed feature pairs. The experimental results of our method on the change detection dataset and the building change detection dataset demonstrate that compared with other baseline methods, the proposed method realizes maximum improvements of 2.9% and 4.2%, respectively, in the F1 score. Our PyTorch implementation is available at https://github.com/lehaifeng/DASNet .
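As a rough illustration of the idea behind such a loss, the sketch below shows how a weighted double-margin contrastive loss can be written in PyTorch; the margins m1/m2 and the class weights w_u/w_c are illustrative assumptions, not the settings used in DASNet.

```python
import torch

def weighted_double_margin_loss(dist, label, m1=0.3, m2=2.2, w_u=1.0, w_c=2.0):
    """dist: (N,) distances between paired features; label: (N,) 1 = changed, 0 = unchanged.
    Unchanged pairs are penalized only beyond margin m1, changed pairs only inside margin m2;
    the weights counteract the dominance of unchanged samples."""
    unchanged_term = (1.0 - label) * w_u * torch.clamp(dist - m1, min=0.0) ** 2
    changed_term = label * w_c * torch.clamp(m2 - dist, min=0.0) ** 2
    return (unchanged_term + changed_term).mean()
```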

324 citations


Journal ArticleDOI
TL;DR: A bibliometric analysis of existing works on few-shot learning for remote sensing interpretation can be found in this article, which provides a reference for scholars working on few-shot learning research in the remote sensing field.
Abstract: The rapid development of deep learning brings effective solutions for remote sensing image interpretation. Training deep neural network models usually requires a large number of manually labeled samples. However, it is difficult to obtain sufficient labeled samples in the remote sensing field to satisfy this data requirement. Therefore, it is of great significance to conduct research on few-shot learning for remote sensing image interpretation. First, this article provides a bibliometric analysis of the existing works on few-shot learning for remote sensing interpretation. Second, two categories of few-shot learning methods, i.e., the data-augmentation-based and the prior-knowledge-based, are introduced for the interpretation of remote sensing images. Then, three typical remote sensing interpretation applications are listed, including scene classification, semantic segmentation, and object detection, together with the corresponding public datasets and evaluation criteria. Finally, the research status is summarized, and some possible research directions are provided. This article gives a reference for scholars working on few-shot learning research in the remote sensing field.

109 citations


Journal ArticleDOI
TL;DR: In this paper, a feature extraction method integrating principal component analysis (PCA) and local binary pattern (LBP) is developed for hyperspectral images in order to improve the accuracy and generalization ability.
Abstract: To improve the accuracy and generalization ability of hyperspectral image classification, a feature extraction method integrating principal component analysis (PCA) and local binary pattern (LBP) is developed for hyperspectral images in this article. PCA is employed to reduce the dimension of the spectral features of hyperspectral images. LBP, with its low computational complexity, is used to extract local spatial texture features of hyperspectral images to construct multifeature vectors. Then, the gray wolf optimization algorithm, with its global search capability, is employed to optimize the parameters of a kernel extreme learning machine (KELM) to construct an optimized KELM model, yielding an effective hyperspectral image classification method (PLG-KELM). Finally, the Indian Pines, Houston, and Pavia University datasets, together with an application to the WHU-Hi-LongKou dataset, are selected to verify the effectiveness of PLG-KELM. The comparative experimental results show that PLG-KELM obtains higher classification accuracy and exhibits better generalization ability with small samples. It provides a new idea for processing hyperspectral images.
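A minimal sketch of the PCA + LBP feature-construction step is given below, using scikit-learn and scikit-image; the number of components, the LBP radius, and the per-pixel concatenation (rather than local LBP histograms) are simplifying assumptions for illustration, and the gray wolf optimization of the KELM is not shown.

```python
import numpy as np
from sklearn.decomposition import PCA
from skimage.feature import local_binary_pattern

def pca_lbp_features(hsi, n_components=10, radius=1, n_points=8):
    """hsi: (H, W, B) hyperspectral cube -> (H*W, 2*n_components) spectral-spatial features."""
    h, w, b = hsi.shape
    pcs = PCA(n_components=n_components).fit_transform(hsi.reshape(-1, b))
    pcs = pcs.reshape(h, w, n_components)
    # LBP texture codes computed on each principal-component band
    lbp = np.stack([local_binary_pattern(pcs[:, :, i], n_points, radius, method="uniform")
                    for i in range(n_components)], axis=-1)
    return np.concatenate([pcs, lbp], axis=-1).reshape(-1, 2 * n_components)
```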

99 citations


Journal ArticleDOI
TL;DR: Tang et al. proposed an attention consistent network (ACNet) based on the Siamese network for remote sensing image scene classification, which unifies the salient regions and pulls together/separates the RS images from the same/different semantic categories.
Abstract: Remote sensing (RS) image scene classification is an important research topic in the RS community, which aims to assign semantics to the land covers. Recently, due to the strong behavior of convolutional neural networks (CNNs) in feature representation, a growing number of CNN-based classification methods have been proposed for RS images. Although they achieve strong performance, there is still some room for improvement. First, apart from the global information, the local features are crucial to distinguish the RS images. The existing networks are good at capturing the global features due to the CNNs’ hierarchical structure and nonlinear fitting capacity. However, the local features are not always emphasized. Second, to obtain satisfactory classification results, the distances of RS images from the same/different classes should be minimized/maximized. Nevertheless, these key points in pattern classification do not get the attention they deserve. To overcome the limitations mentioned above, we propose a new CNN named attention consistent network (ACNet) based on the Siamese network in this article. First, due to the dual-branch structure of ACNet, the input data are image pairs obtained by spatial rotation. This helps our model to fully explore the global features from RS images. Second, we introduce different attention techniques to comprehensively mine the objects’ information from RS images. Third, considering the influence of the spatial rotation and the similarities between RS images, we develop an attention consistent model to unify the salient regions and pull together/separate the RS images from the same/different semantic categories. Finally, the classification results can be obtained using the learned features. Three popular RS scene datasets are selected to validate our ACNet. Compared with some existing networks, the proposed method can achieve better performance. The encouraging results illustrate that ACNet is effective for RS image scene classification. The source codes of this method can be found at https://github.com/TangXu-Group/Remote-Sensing-Images-Classification/tree/main/GLCnet .
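To make the attention-consistency idea concrete, here is a minimal sketch (an assumption about the general mechanism, not ACNet's exact formulation): the attention map of a rotated copy is rotated back and constrained to agree with the attention map of the original image.

```python
import torch
import torch.nn.functional as F

def attention_consistency_loss(attn_orig, attn_rot, k=1):
    """attn_orig, attn_rot: (N, 1, H, W) attention maps of an image and of its copy
    rotated by k * 90 degrees. Undo the rotation and penalize the disagreement."""
    attn_aligned = torch.rot90(attn_rot, k=-k, dims=(2, 3))
    return F.mse_loss(attn_orig, attn_aligned)
```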

84 citations


Journal ArticleDOI
TL;DR: Wang et al. proposed an anchor-free method for ship target detection in HR SAR images, which obtains encouraging detection performance compared with Faster-RCNN, RetinaNet, and FCOS.
Abstract: With the rapid development of earth observation technology, high-resolution synthetic aperture radar (HR SAR) imaging satellites can provide more observational information for maritime surveillance. However, there are still challenges in detecting ship targets in HR SAR images due to complex surroundings, target defocusing, and scale diversity. In this article, an anchor-free method is proposed for ship target detection in HR SAR images. First, fully convolutional one-stage object detection (FCOS) is applied as the base network to detect ship targets, achieving better detection performance through pixel-by-pixel prediction over the image. Second, the category-position (CP) module is proposed to optimize the position regression branch features in the FCOS network. This module can improve target positioning performance in complex scenes by generating a guidance vector from the classification branch features. At the same time, the target classification and bounding box regression methods are redesigned to shield the network training from the adverse effects of fuzzy areas. Finally, to evaluate the effectiveness of CP-FCOS, extensive experiments are conducted on the High-Resolution SAR Images Dataset, the SAR Ship Detection Dataset, the IEEE 2020 Gaofen Challenge SAR dataset, and two complex large-scene HR SAR images. The experimental results show that our method can obtain encouraging detection performance compared with Faster-RCNN, RetinaNet, and FCOS. Remarkably, the proposed method was applied to SAR ship detection in the 2020 Gaofen Challenge. Our team ranked first among 292 teams in the preliminary contest and won seventh place in the final match.

82 citations


Journal ArticleDOI
TL;DR: The current challenges of developing intelligent algorithms for RS image interpretation are analyzed with bibliometric investigations, and general guidance on creating benchmark datasets efficiently is presented.
Abstract: The past years have witnessed great progress on remote sensing (RS) image interpretation and its wide applications. With RS images becoming more accessible than ever before, there is an increasing demand for the automatic interpretation of these images. In this context, benchmark datasets serve as an essential prerequisite for developing and testing intelligent interpretation algorithms. After reviewing existing benchmark datasets in the research community of RS image interpretation, this article discusses the problem of how to efficiently prepare a suitable benchmark dataset for RS image interpretation. Specifically, we first analyze the current challenges of developing intelligent algorithms for RS image interpretation with bibliometric investigations. We then present general guidance on creating benchmark datasets efficiently. Following the presented guidance, we also provide an example of building an RS image dataset, i.e., the Million Aerial Image Dataset (Online. Available: https://captain-whu.github.io/DiRS/0 ), a new large-scale benchmark dataset containing a million instances for RS image scene classification. Several challenges and perspectives in RS image annotation are finally discussed to facilitate the research in benchmark dataset construction. We hope this article will provide the RS community with an overall perspective on constructing large-scale and practical image datasets for further research, especially data-driven research.

80 citations


Journal ArticleDOI
TL;DR: In this article, a 3D fast learning block (depthwise separable convolution and a fast convolution block) followed by a 2D convolutional neural network was introduced to extract spectral-spatial features.
Abstract: Owing to its unique characteristics, the three-dimensional convolutional neural network is used in hyperspectral image classification. However, the task is challenged by problems such as noise, a lack of labeled samples, a tendency to overfit, and insufficient extraction of spectral and spatial features. Among these, the scarcity of training samples is the main problem that recent methods have tried to address. Convolutional neural network-based algorithms have become a popular option for hyperspectral image analysis due to their ability to extract useful features and their high performance. Traditional convolutional neural network (CNN) based methods mainly use the two-dimensional CNN for feature extraction, which leaves the interband correlations of HSIs underutilized. The 3-D-CNN extracts a joint spectral-spatial information representation, but it depends on a more complex model. To address these issues, this article introduces a 3-D fast learning block (a depthwise separable convolution block and a fast convolution block) followed by a 2-D convolutional neural network to extract spectral-spatial features. Using a hybrid CNN reduces the complexity of the model compared to using a 3-D-CNN alone and also performs well against noise and a limited number of training samples. In addition, a series of optimization methods, including batch normalization, dropout, an exponentially decaying learning rate, and L2 regularization, are adopted to alleviate overfitting and improve the classification results. To test the performance of this hybrid method, experiments are conducted on the Salinas, University of Pavia, and Indian Pines datasets, and the results are compared with 2-D-CNN and 3-D-CNN deep learning models with the same number of layers.
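The sketch below illustrates the general hybrid pattern described here (a 3-D depthwise-separable stage followed by a 2-D stage) in PyTorch; the layer widths, kernel sizes, and patch handling are illustrative assumptions rather than the paper's exact architecture.

```python
import torch
import torch.nn as nn

class Hybrid3D2DNet(nn.Module):
    """Sketch: 3-D depthwise-separable block over (bands, H, W) patches, then a 2-D head."""
    def __init__(self, bands, n_classes):
        super().__init__()
        self.block3d = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=(7, 3, 3), padding=(3, 1, 1)),           # spectral-spatial conv
            nn.BatchNorm3d(8), nn.ReLU(),
            nn.Conv3d(8, 8, kernel_size=(7, 1, 1), padding=(3, 0, 0), groups=8),  # depthwise along spectrum
            nn.Conv3d(8, 16, kernel_size=1),                                      # pointwise mixing
            nn.BatchNorm3d(16), nn.ReLU(),
        )
        self.block2d = nn.Sequential(
            nn.Conv2d(16 * bands, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, n_classes)

    def forward(self, x):            # x: (N, 1, bands, H, W) spectral-spatial patches
        f = self.block3d(x)
        f = f.flatten(1, 2)          # fold the spectral dim into channels for the 2-D stage
        f = self.block2d(f).flatten(1)
        return self.fc(f)

# Example: model = Hybrid3D2DNet(bands=103, n_classes=9); out = model(torch.randn(2, 1, 103, 9, 9))
```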

77 citations


Journal ArticleDOI
TL;DR: A novel multiscale graph sample and aggregate network with a context-aware learning method for HSI classification that improves the diversity of network input information and effectively solves the impact of original input graph errors on classification.
Abstract: Recently, the graph convolutional network (GCN) has achieved promising results in hyperspectral image (HSI) classification. However, the GCN is a transductive learning method, which makes it difficult to aggregate new nodes. Besides, existing GCN-based methods divide graph construction and graph classification into two stages, ignoring the influence of errors in the constructed graph on the classification results. Moreover, the available GCN-based methods fail to capture the global and contextual information of the graph. In this article, we propose a novel multiscale graph sample and aggregate network with a context-aware learning method for HSI classification. The proposed network adopts a multiscale graph sample and aggregate network (graphSAGE) to learn multiscale features from the local region graph, which improves the diversity of network input information and effectively reduces the impact of errors in the original input graph on classification. By employing a context-aware mechanism to characterize the importance among spatially neighboring regions, deep contextual and global information of the graph can be learned automatically by focusing on important spatial targets. Meanwhile, the graph structure is reconstructed automatically based on the classified objects as training proceeds, which effectively reduces the influence of the initial graph error on the classification result. Extensive experiments are conducted on three real HSI datasets and demonstrate that the proposed method outperforms the compared state-of-the-art methods.
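For reference, a single GraphSAGE-style mean-aggregation layer looks roughly like the sketch below (a generic formulation, not the paper's multiscale, context-aware variant); because the aggregation is defined per node, the layer is inductive and can handle nodes unseen during training.

```python
import torch
import torch.nn as nn

class SAGELayer(nn.Module):
    """One GraphSAGE mean-aggregation layer."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(2 * in_dim, out_dim)

    def forward(self, x, adj):
        # x: (N, in_dim) node features; adj: (N, N) row-normalized adjacency (neighbor weights)
        neigh = adj @ x                      # mean of neighbor features
        h = torch.cat([x, neigh], dim=1)     # concatenate self and aggregated neighborhood
        return torch.relu(self.lin(h))
```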

71 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a new improvement on the YOLOv3 framework for ship detection in marine surveillance, based on synthetic aperture radar (SAR) and optical imagery.
Abstract: The use of deep learning detection methods for ship detection remains a challenge, owing to the small scale of the objects and interference from complex sea surfaces. In addition, existing ship detection methods rarely verify the robustness of their algorithms on multisensor images. Thus, we propose a new improvement on the “you only look once” version 3 (YOLOv3) framework for ship detection in marine surveillance, based on synthetic aperture radar (SAR) and optical imagery. First, improved choices are obtained for the anchor boxes by using linear scaling based on the k-means++ algorithm. This addresses the difficulty of exploiting the advantages of YOLOv3's multiscale detection, as the anchor boxes of a single target type differ only slightly between detection scales. Second, we add uncertainty estimators for the positioning of the bounding boxes by introducing a Gaussian parameter for ship detection into the YOLOv3 framework. Finally, four anchor boxes are allocated to each detection scale in the Gaussian-YOLO layer instead of three as in the default YOLOv3 settings, as there are wide disparities in an object's size and direction in remote sensing images with different resolutions. Applying the proposed strategy to “YOLOv3-spp” and “YOLOv3-tiny,” the results are enhanced by 2%–3%. Compared with other models, the improved-YOLOv3 has the highest average precision on both the optical (93.56%) and SAR (95.52%) datasets. The improved-YOLOv3 is robust, even in the context of a mixed dataset of SAR and optical images comprising images from different satellites and with different scales.
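A minimal sketch of the anchor-selection step is shown below: ground-truth box sizes are clustered with k-means (k-means++ initialization) and then linearly scaled; n_anchors=12 corresponds to four anchors for each of YOLOv3's three detection scales. YOLO implementations often use 1 - IoU as the clustering distance, so plain Euclidean k-means and the scale factor here are simplifying assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def anchor_boxes(wh, n_anchors=12, scale=1.1):
    """wh: (N, 2) array of ground-truth (width, height) pairs in pixels.
    Returns n_anchors anchor sizes sorted by area (small to large)."""
    km = KMeans(n_clusters=n_anchors, init="k-means++", n_init=10, random_state=0).fit(wh)
    anchors = km.cluster_centers_ * scale          # linear scaling of the clustered sizes
    return anchors[np.argsort(anchors.prod(axis=1))]
```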

65 citations


Journal ArticleDOI
Rong Yang1, Zhenru Pan1, Xiaoxue Jia1, Lei Zhang1, Yunkai Deng1 
TL;DR: In this paper, an improved one-stage object detection framework based on RetinaNet and the rotatable bounding box (RBox), referred to as R-RetinaNet, is proposed to solve the mismatch of feature scales, contradictions between different learning tasks, and the unbalanced distribution of positive samples.
Abstract: Thanks to the excellent feature representation capabilities of neural networks, deep learning-based methods perform far better than traditional methods on target detection tasks such as ship detection. Although various network models have been proposed for SAR ship detection, such as DRBox-v1, DRBox-v2, and MSR2N, there are still some problems, such as the mismatch of feature scales, contradictions between different learning tasks, and the unbalanced distribution of positive samples, which have not been addressed in these studies. In this article, an improved one-stage object detection framework based on RetinaNet and the rotatable bounding box (RBox), which is referred to as R-RetinaNet, is proposed to solve the above problems. The main improvements of R-RetinaNet, as well as the contributions of this article, are threefold. First, a scale calibration method is proposed to align the scale distribution of the output backbone feature map with the scale distribution of the targets. Second, a feature fusion network based on a task-wise attention feature pyramid network is designed to decouple the feature optimization process of different tasks, which alleviates the conflict between different learning goals. Finally, an adaptive intersection over union (IoU) threshold training method is proposed for the RBox-based model to correct the unbalanced distribution of positive samples caused by a fixed IoU threshold on RBox. Experimental results show that our method obtains 13.26%, 9.49%, 8.92%, and 4.55% gains in average precision under an IoU threshold of 0.5 on the public SAR ship detection dataset compared with four state-of-the-art RBox-based methods, respectively.

60 citations


Journal ArticleDOI
TL;DR: A novel cross-modal image-text retrieval network is introduced to establish the direct relationship between remote sensing images and their paired text data, and a semantic alignment module is designed to fully explore the latent correspondence between images and text.
Abstract: Because of the rapid growth of multimodal data from the internet and social media, cross-modal retrieval has become an important and valuable task in recent years. The purpose of cross-modal retrieval is to obtain the result data in one modality (e.g., image), which is semantically similar to the query data in another modality (e.g., text). In the field of remote sensing, despite a great number of existing works on image retrieval, there has only been a small amount of research on cross-modal image-text retrieval, due to the scarcity of datasets and the complicated characteristics of remote sensing image data. In this article, we introduce a novel cross-modal image-text retrieval network to establish the direct relationship between remote sensing images and their paired text data. Specifically, in our framework, we designed a semantic alignment module to fully explore the latent correspondence between images and text, in which we used attention and gate mechanisms to filter and optimize data features so that more discriminative feature representations can be obtained. Experimental results on four benchmark remote sensing datasets, including UCMerced-LandUse-Captions, Sydney-Captions, RSICD, and NWPU-RESISC45-Captions, showed that our proposed method outperformed other baselines and achieved state-of-the-art performance in remote sensing image-text retrieval tasks.
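As a rough illustration of combining attention and gating to filter features before cross-modal matching, here is a generic sketch (an assumption about the general mechanism, not the paper's exact semantic alignment module):

```python
import torch
import torch.nn as nn

class GatedFilter(nn.Module):
    """Attention pools a set of region/word features; a gate suppresses uninformative dimensions."""
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.Linear(dim, 1)
        self.gate = nn.Linear(dim, dim)

    def forward(self, feats):                           # feats: (N, L, dim) region or word features
        w = torch.softmax(self.attn(feats), dim=1)      # attention weights over the L items
        pooled = (w * feats).sum(dim=1)                 # (N, dim) attended representation
        return pooled * torch.sigmoid(self.gate(pooled))
```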

Journal ArticleDOI
TL;DR: A deep learning algorithm with semi-supervision, SAR2SAR, is proposed in this article, in which multitemporal time series are leveraged and the neural network learns to restore SAR images by only looking at noisy acquisitions.
Abstract: Speckle reduction is a key step in many remote sensing applications. By strongly affecting synthetic aperture radar (SAR) images, speckle makes them difficult to analyze. Due to the difficulty of modeling the spatial correlation of speckle, a deep learning algorithm with semi-supervision is proposed in this article: SAR2SAR. Multitemporal time series are leveraged, and the neural network learns to restore SAR images by only looking at noisy acquisitions. To this purpose, the recently proposed noise2noise framework [1] has been employed. The strategy to adapt it to SAR despeckling is presented, based on a compensation of temporal changes and a loss function adapted to the statistics of speckle. A study with synthetic speckle noise is presented to compare the performance of the proposed method with other state-of-the-art filters. Then, results on real images are discussed to show the potential of the proposed algorithm. The code is made available to allow testing and reproducible research in this field.
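The core self-supervised idea can be sketched as below: one noisy acquisition of a scene is used as the training target for another co-registered noisy acquisition, so no clean reference is needed. The L1 loss and log-intensity inputs here are simplifying assumptions; the paper adapts the loss to speckle statistics and compensates for temporal changes.

```python
import torch

def noise2noise_step(model, y1, y2, optimizer):
    """y1, y2: (N, 1, H, W) co-registered noisy SAR acquisitions of the same scene
    (e.g., log-intensity). The network is trained to map one onto the other."""
    optimizer.zero_grad()
    pred = model(y1)
    loss = torch.mean(torch.abs(pred - y2))   # simple L1 surrogate for the speckle-adapted loss
    loss.backward()
    optimizer.step()
    return loss.item()
```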

Journal ArticleDOI
TL;DR: A novel learning scheme for training a lightweight ship detector called Tiny YOLO-Lite, which simultaneously reduces the model storage size, decreases the floating point operations (FLOPs) calculation, and guarantees the high accuracy with faster speed.
Abstract: The deployment of deep convolutional neural networks (CNNs) for real-time synthetic aperture radar (SAR) ship detection is largely hindered by their huge computational cost. In this article, we propose a novel learning scheme for training a lightweight ship detector called Tiny YOLO-Lite, which simultaneously 1) reduces the model storage size; 2) decreases the floating point operations (FLOPs); and 3) guarantees high accuracy with faster speed. This is achieved by a self-designed backbone structure and network pruning, which enforces channel-level sparsity in the backbone network and yields a compact model. In addition, knowledge distillation is applied to make up for the performance decline caused by network pruning. Specifically, we propose to let the small student model mimic the cumbersome teacher's output to achieve improved generalization. Rather than applying vanilla full feature imitation, we redefine the distilled knowledge as the inter-relationship between different levels of feature maps and then transfer it from the large network to a smaller one. Because the detector should focus more on salient regions containing ships while background interference is overwhelming, a novel attention mechanism is designed and attached to the distilled features for enhanced representation. Finally, extensive experiments are conducted on SSDD, HRSID, and two large-scene SAR images to verify the effectiveness of the thinner SAR ship object detector in comparison with other CNN-based algorithms. The detection results demonstrate that the proposed detector achieves a lighter architecture with a 2.8-M model size, more efficient inference (>200 fps) with low computation cost, and more accurate prediction with the knowledge transfer strategy.
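One common way to enforce channel-level sparsity for pruning (network-slimming style; the paper's exact criterion may differ) is to add an L1 penalty on the BatchNorm scale factors during training and then prune channels whose scale factors shrink toward zero:

```python
import torch.nn as nn

def bn_sparsity_penalty(model, lam=1e-4):
    """L1 regularization on BatchNorm scale factors to drive unimportant channels toward zero."""
    penalty = 0.0
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()
    return lam * penalty

# During training: loss = detection_loss + bn_sparsity_penalty(model)
```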

Journal ArticleDOI
TL;DR: In this paper, the stable encoder-decoder architecture was combined with a grid-based attention gate and atrous spatial pyramid pooling module, to capture and restore features progressively and effectively.
Abstract: Rapidly developing remote sensing technology provides massive data for urban planning, mapping, and disaster management. As a carrier of human productive activities, buildings are essential to both urban dynamic monitoring and suburban construction inspection. Fully-convolutional-network-based methods have provided a paradigm for automatically extracting buildings from high-resolution imagery. However, high intraclass variance and complexity are two problems in building extraction. It is hard to identify buildings of different scales using a single receptive field. For this purpose, in this article, we use a stable encoder-decoder architecture, combined with a grid-based attention gate and an atrous spatial pyramid pooling module, to capture and restore features progressively and effectively. A modified ResNet50 encoder is also applied to extract features. The proposed method can learn gated features and distinguish buildings from complex surroundings such as trees. We evaluate our model on two building datasets, the WHU aerial building dataset and our DB UAV rural building dataset. Experiments show that our model outperforms five other recent models. The results also exhibit great potential for extracting buildings of different scales and validate the effectiveness of deep learning in practical scenarios.
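A grid-based attention gate of the kind used in attention U-Net variants can be sketched as follows (a generic formulation offered for illustration; channel counts and the exact gating signal are assumptions): the decoder's gating signal re-weights the encoder skip features before they are passed back into the decoder.

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Gating signal from the decoder re-weights encoder skip features."""
    def __init__(self, enc_ch, dec_ch, mid_ch):
        super().__init__()
        self.theta = nn.Conv2d(enc_ch, mid_ch, 1)
        self.phi = nn.Conv2d(dec_ch, mid_ch, 1)
        self.psi = nn.Conv2d(mid_ch, 1, 1)

    def forward(self, enc, dec):
        # enc: skip features (N, enc_ch, H, W); dec: gating signal at the same spatial size
        a = torch.sigmoid(self.psi(torch.relu(self.theta(enc) + self.phi(dec))))
        return enc * a        # attended skip connection
```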

Journal ArticleDOI
TL;DR: Wang et al. proposed a hierarchical residual network with an attention mechanism (HResNetAM) for hyperspectral image (HSI) spectral-spatial classification to improve the performance of conventional deep learning networks.
Abstract: This article proposes a novel hierarchical residual network with an attention mechanism (HResNetAM) for hyperspectral image (HSI) spectral-spatial classification to improve the performance of conventional deep learning networks. Straightforward convolutional neural network-based models have limitations in exploiting multiscale spatial and spectral features, and this is the key factor in dealing with the high-dimensional nonlinear characteristics present in HSIs. The proposed hierarchical residual network can extract multiscale spatial and spectral features at a granular level, so the receptive field range of the network is increased, which enhances the feature representation ability of the model. Besides, we utilize the attention mechanism to set adaptive weights for spatial and spectral features of different scales, and this can further improve the discriminative ability of the extracted features. Furthermore, a double branch structure is also exploited to extract spectral and spatial features with corresponding convolution kernels in parallel, and the extracted spatial and spectral features of multiple scales are fused for hyperspectral image classification. Four benchmark hyperspectral datasets collected by different sensors and at different acquisition times are employed for classification experiments, and comparative results reveal that the proposed method has competitive advantages in terms of classification performance when compared with other state-of-the-art deep learning models.

Journal ArticleDOI
TL;DR: In this paper, the authors define a benchmarking framework for evaluating pansharpening algorithms, including a reference dataset of panchromatic and multispectral image pairs and a reference implementation for reproducible algorithm evaluation.
Abstract: Comparative evaluation is a requirement for reproducible science and objective assessment of new algorithms. Reproducible research in the field of pansharpening of very high resolution images is a difficult task due to the lack of openly available reference datasets and protocols. The contribution of this article is threefold, and it defines a benchmarking framework to evaluate pansharpening algorithms. First, it establishes a reference dataset, named PAirMax , composed of 14 panchromatic and multispectral image pairs collected over heterogeneous landscapes by different satellites. Second, it standardizes various image preprocessing steps, such as filtering, upsampling, and band coregistration, by providing a reference implementation. Third, it details the quality assessment protocols for reproducible algorithm evaluation.

Journal ArticleDOI
TL;DR: A novel self-supervised pretraining scheme initializes a Transformer-based network using large-scale unlabeled data, leveraging the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics.
Abstract: Satellite image time series (SITS) classification is a major research topic in remote sensing and is relevant for a wide range of applications. Deep learning approaches have been commonly employed for the SITS classification and have provided state-of-the-art performance. However, deep learning methods suffer from overfitting when labeled data are scarce. To address this problem, we propose a novel self-supervised pretraining scheme to initialize a transformer-based network by utilizing large-scale unlabeled data. In detail, the model is asked to predict randomly contaminated observations given an entire time series of a pixel. The main idea of our proposal is to leverage the inherent temporal structure of satellite time series to learn general-purpose spectral-temporal representations related to land cover semantics. Once pretraining is completed, the pretrained network can be further adapted to various SITS classification tasks by fine-tuning all the model parameters on small-scale task-related labeled data. In this way, the general knowledge and representations about SITS can be transferred to a label-scarce task, thereby improving the generalization performance of the model as well as reducing the risk of overfitting. Comprehensive experiments have been carried out on three benchmark datasets over large study areas. Experimental results demonstrate the effectiveness of the proposed pretraining scheme, leading to substantial improvements in classification accuracy using transformer, 1-D convolutional neural network, and bidirectional long short-term memory network. The code and the pretrained model will be available at https://github.com/linlei1214/SITS-BERT upon publication .
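The pretext task can be sketched as below: a fraction of the observations in each pixel's time series is randomly perturbed, and the network is trained to regress the original values at exactly those positions. The contamination ratio, noise model, and loss are illustrative assumptions, not the paper's exact settings.

```python
import torch

def contaminate(ts, ratio=0.15, noise_std=0.5):
    """ts: (N, T, C) pixel time series. Randomly perturb a fraction of time steps;
    the network is trained to regress the original values at those positions."""
    mask = torch.rand(ts.shape[:2]) < ratio            # (N, T) positions to corrupt
    noisy = ts.clone()
    noisy[mask] = noisy[mask] + noise_std * torch.randn_like(noisy[mask])
    return noisy, mask

def pretrain_loss(pred, target, mask):
    # regression loss computed only on the contaminated observations
    return ((pred - target) ** 2)[mask].mean()
```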

Journal ArticleDOI
TL;DR: Zhang et al. proposed an attention-guided end-to-end change detection network (AGCDetNet) based on the fully convolutional network and the attention mechanism.
Abstract: While deep learning-based methods have brought considerable improvements in remote sensing (RS) image change detection (CD), scale variations and pseudochanges hinder most supervised methods’ performance. CD networks derived from other fields can be confronted with false alarms and missed detections in high-resolution RS images due to weak feature representation ability. In this article, an attention-guided end-to-end change detection network (AGCDetNet) is proposed based on the fully convolutional network and the attention mechanism. AGCDetNet learns to enhance the feature representation of change information and achieves accuracy improvements using spatial attention and channel attention. A spatial attention module (SPAM) promotes the discrimination between changed objects and the background by adding the learned spatial attention to the deep features. The channelwise attention-guided interference filtering unit (CIFU) and atrous spatial pyramid pooling (CG-ASPP) modules enhance the representation of multilevel features and multiscale context, respectively. Extensive experiments have been conducted on several public datasets for performance evaluation, including LEVIR-CD, WHU, Season-Varying, WV2, and ZY3. Experimental results demonstrate that AGCDetNet outperforms several state-of-the-art methods in terms of accuracy and robustness. Specifically, AGCDetNet achieves the best F1-score on two datasets, i.e., LEVIR-CD (0.9076) and Season-Varying (0.9654).
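A generic spatial attention module (in the spirit of SPAM, though not the paper's exact design) can be written as follows: per-pixel weights are computed from pooled channel statistics and multiplied back onto the deep features so that changed objects are emphasized and background is suppressed.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Per-pixel attention weights derived from channel-wise mean and max statistics."""
    def __init__(self, ksize=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, ksize, padding=ksize // 2)

    def forward(self, x):                                # x: (N, C, H, W) deep features
        stats = torch.cat([x.mean(1, keepdim=True), x.max(1, keepdim=True).values], dim=1)
        return x * torch.sigmoid(self.conv(stats))       # re-weighted features
```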

Journal ArticleDOI
TL;DR: In this article, the authors proposed data augmentation, model construction, loss function choices, and ensembling techniques to enhance the representation learned from the synthetic data, and ultimately achieved over 95% accuracy on the SAMPLE dataset.
Abstract: Obtaining measured synthetic aperture radar (SAR) data for training automatic target recognition (ATR) models can be too expensive (in terms of time and money) and too complex a process in many situations. In response, researchers have developed methods for creating synthetic SAR data for targets using electromagnetic prediction software, which is then used to enrich an existing measured training dataset. However, this approach relies on the availability of some amount of measured data. In this work, we focus on the case of having 100% synthetic training data, while testing on only measured data. We use the SAMPLE dataset publicly released by AFRL, and find significant challenges to learning generalizable representations from the synthetic data due to distributional differences between the two modalities and extremely limited training sample quantities. Using deep learning-based ATR models, we propose data augmentation, model construction, loss function choices, and ensembling techniques to enhance the representation learned from the synthetic data, and ultimately achieve over 95% accuracy on the SAMPLE dataset. We then analyze the functionality of our ATR models using saliency and feature-space investigations and find them to learn a more cohesive representation of the measured and synthetic data. Finally, we evaluate the out-of-library detection performance of our synthetic-only models and find that they are nearly 10% more effective than baseline methods at identifying measured test samples that do not belong to the training class set. Overall, our techniques and their compositions significantly enhance the feasibility of using ATR models trained exclusively on synthetic data.

Journal ArticleDOI
Guoqing Zhou1, Chenyang Li1, Dianjun Zhang1, Dequan Liu1, Xiang Zhou1, Jie Zhan1 
TL;DR: This article provides an overview of the principle of LiDAR echo signal formation and comprehensively summarizes LiDAR echo signal simulation and modeling methods, together with the corresponding factors that affect modeling accuracy, by focusing on the characteristics of the different methods.
Abstract: Oceanic LiDAR (hereafter referred to as O-LiDAR) is an important remote sensing device for measuring near-coastal water depth and for studying the optical properties of water bodies. With the commercialization of LiDAR, theoretical research on the underwater transmission characteristics of LiDAR has been intensified worldwide. Primary research interests include the simulation and modeling of LiDAR underwater echo signals and the inversion of optical parameters using LiDAR water echo signals. This article provides an overview of the principle of LiDAR echo signal formation, and comprehensively summarizes LiDAR echo signal simulation and modeling methods and the corresponding factors that affect modeling accuracy by focusing on the characteristics of different methods. We found that the current simulation methods for LiDAR underwater transmission echo signals primarily include an analytical method based on the radiation transfer equation and a statistical method based on the Monte Carlo (MC) model. For the analytical method, the radiation transfer equation needs to be appropriately simplified, usually using the quasi-single small-angle approximation principle. The analytical method has high calculation efficiency, but its accuracy depends on the quasi-single small-angle approximation. The statistical method can analyze the influence of various factors on echo signals by controlling the variables, but it has poor calculation efficiency. Finally, the semianalytical MC model is used to quantitatively analyze the three main factors (LiDAR system parameters, water body optical parameters, and environmental parameters) affecting underwater LiDAR transmission characteristics, and the mechanisms and results of the different factors are summarized.

Journal ArticleDOI
TL;DR: Wang et al. designed a bidirectional 3D quasi-recurrent neural network for hyperspectral image (HSI) spatial super-resolution with an arbitrary number of bands.
Abstract: Hyperspectral imaging is not yet able to acquire images with high resolution in both the spatial and spectral dimensions, due to physical hardware limitations. It can only produce low spatial resolution images in most cases, and thus hyperspectral image (HSI) spatial super-resolution is important. Recently, deep learning-based methods for HSI spatial super-resolution have been actively exploited. However, existing methods do not focus on structural spatial-spectral correlation and global correlation along spectra, which prevents them from fully exploiting useful information for super-resolution. Also, some of the methods are straightforward extensions of RGB super-resolution methods, which assume a fixed number of spectral channels and cannot be generally applied to hyperspectral images whose number of channels varies. Furthermore, unlike RGB images, existing HSI datasets are small and limit the performance of learning-based methods. In this article, we design a bidirectional 3D quasi-recurrent neural network for HSI super-resolution with an arbitrary number of bands. Specifically, we introduce a core unit that contains a 3D convolutional module and a bidirectional quasi-recurrent pooling module to effectively extract structural spatial-spectral correlation and global correlation along spectra, respectively. By combining domain knowledge of HSI with a novel pretraining strategy, our method can be well generalized to remote sensing HSI datasets with a limited amount of training data. Extensive evaluations and comparisons on HSI super-resolution demonstrate improvements over state-of-the-art methods, in terms of both restoration accuracy and visual quality.
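Quasi-recurrent pooling replaces heavy recurrent matrix multiplications with a simple gated running average along the sequence dimension (here, the spectral bands). The sketch below shows the standard "f-pooling" recurrence in one direction; a bidirectional variant runs it both ways and merges the results. Shapes and gate conventions are illustrative assumptions.

```python
import torch

def qrnn_f_pool(z, f, h0=None):
    """Quasi-recurrent f-pooling along the band dimension.
    z, f: (N, T, C) candidate features and forget gates (values in [0, 1]),
    typically produced by convolutions over the input; returns (N, T, C) hidden states."""
    h = torch.zeros_like(z[:, 0]) if h0 is None else h0
    outs = []
    for t in range(z.shape[1]):
        h = f[:, t] * h + (1.0 - f[:, t]) * z[:, t]   # gated running average
        outs.append(h)
    return torch.stack(outs, dim=1)
```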

Journal ArticleDOI
TL;DR: In this article, the performance of five waveform decomposition algorithms (Gaussian, adaptive Gaussian, Weibull, Richardson-Lucy (RL), and Gold) under different topographic conditions such as forests, glaciers, lakes, and residential areas was evaluated.
Abstract: The information from the components obtained by waveform decomposition is usually used to invert topography, classify tree species, etc. Many waveform decomposition algorithms have been presented, but they lack comparative analysis and evaluation. Therefore, this article compares and analyzes the performance of five waveform decomposition algorithms, namely Gaussian, adaptive Gaussian, Weibull, Richardson–Lucy (RL), and Gold, under different topographic conditions such as forests, glaciers, lakes, and residential areas. The experimental results reveal that: first, the Gaussian algorithm causes the biggest fitting error, at 9.96 mV, in the forested area; it easily identifies multiple dense peaks as single peaks. Second, there are many misjudged, superimposed, and overlapped waveform components separated by the Weibull algorithm. The adaptive Gaussian algorithm is more capable of fitting complex waveforms but has 122 more outliers than the Weibull algorithm. Third, the Gold and RL algorithms decompose the largest number of waveform components (272.2k and 265.9k) in the forested area; both can effectively improve the separability of peaks. Fourth, the RL algorithm is more effective than the Gold algorithm only in areas with sparse vegetation, i.e., the Gold algorithm is capable of processing data from dense vegetation areas, with the lowest false component detection rates of 1.3%, 0.9%, 1.1%, and 0.1% in the four areas. Finally, the Gaussian and Gold algorithms have much faster decomposition speeds, at 1000/s and 2000/s, than the other three algorithms. These results are useful for selecting different algorithms under different environments.
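To illustrate what Gaussian waveform decomposition involves, here is a minimal SciPy sketch that fits a recorded full-waveform return with a sum of Gaussian components; the initial (amplitude, position, width) guesses, typically taken from detected local maxima, are left to the caller.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian_mixture(t, *params):
    """Sum of Gaussians: params = [A1, mu1, sigma1, A2, mu2, sigma2, ...]."""
    y = np.zeros_like(t, dtype=float)
    for a, mu, sigma in zip(params[0::3], params[1::3], params[2::3]):
        y += a * np.exp(-((t - mu) ** 2) / (2 * sigma ** 2))
    return y

def decompose(t, waveform, init):
    """Fit a full-waveform return with a Gaussian mixture; `init` holds initial
    (amplitude, position, width) triplets, e.g. taken from detected local maxima."""
    popt, _ = curve_fit(gaussian_mixture, t, waveform, p0=init, maxfev=5000)
    return np.array(popt).reshape(-1, 3)   # one row per decomposed component
```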

Journal ArticleDOI
TL;DR: It is found that ASIs generated by nontarget attack algorithms feature attack selectivity, which is related to the feature space distribution of the original SAR images and the decision boundary of the classification model, and a sample-boundary-based AE selectivity distance is proposed to successfully explain the attack selectivity of ASIs.
Abstract: Synthetic aperture radar (SAR) has all-day and all-weather characteristics and plays an extremely important role in the military field. Breakthroughs in deep learning methods represented by convolutional neural network (CNN) models have greatly improved SAR image recognition accuracy. Classification models based on CNNs can perform high-precision classification, but they have security problems when facing adversarial examples (AEs). However, research on AEs is mostly limited to natural images, and remote sensing images (SAR, multispectral, etc.) have not been extensively studied. To explore the basic characteristics of AEs of SAR images (ASIs), we use two classic white-box attack methods to generate ASIs from two SAR image classification datasets and then evaluate the vulnerability of six commonly used CNNs. The results show that ASIs are quite effective in fooling CNNs trained on SAR images, as indicated by the obtained high attack success rate. Due to the structural differences among CNNs, different CNNs present different vulnerabilities in the face of ASIs. We found that ASIs generated by nontarget attack algorithms feature attack selectivity, which is related to the feature space distribution of the original SAR images and the decision boundary of the classification model. We propose a sample-boundary-based AE selectivity distance to explain the attack selectivity of ASIs. We also analyze the effects of image parameters, such as image size and number of channels, on the attack success rate of ASIs through a parameter sensitivity analysis. The experimental results of this study provide data support and an effective reference for attacks on and the defense capabilities of various CNNs with regard to AEs in SAR image classification models.
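The abstract does not name the two white-box attacks used; as a representative example of this family, the fast gradient sign method (FGSM) can be sketched in a few lines of PyTorch (the epsilon value and the [0, 1] input range are assumptions):

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=0.02):
    """One-step FGSM: perturb the input along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()   # assumes inputs normalized to [0, 1]
```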

Journal ArticleDOI
TL;DR: Wang et al. proposed a reconstruction bias U-Net for road extraction from remote sensing images, which adds decoding branches to obtain multiple semantic representations from different upsampling paths.
Abstract: Automatic road extraction from remote sensing images plays an important role in navigation, intelligent transportation, road network updating, etc. Convolutional neural network (CNN)-based methods have produced many achievements in road extraction from remote sensing images. CNN-based methods require a large dataset with high-quality labels for model training. However, there are still few standard, large datasets specially designed for road extraction from optical remote sensing images. Besides, the existing end-to-end CNN models for road extraction from remote sensing images usually have a symmetric structure; studies on asymmetric structures between encoding and decoding are rare. To address the above problems, this article first provides LRSNY, a publicly available dataset for road extraction from optical remote sensing images with manually annotated labels. Second, we propose a reconstruction bias U-Net for road extraction from remote sensing images. In our model, we add decoding branches to obtain multiple semantic representations from different upsampling paths. Experimental results show that our method achieves better performance compared with six other state-of-the-art segmentation models when tested on our LRSNY dataset. We also test on the Massachusetts and Shaoshan datasets. The good performance on these two datasets further proves the effectiveness of our method.

Journal ArticleDOI
TL;DR: Wang et al. proposed a multichannel water body detection network (MC-WBDN) that incorporates three innovative components, i.e., a multichannel fusion module, an enhanced atrous spatial pyramid pooling module, and Space-to-Depth/Depth-to-Space operations.
Abstract: Automated water body detection from satellite imagery is a fundamental stage for urban hydrological studies. In recent years, various deep convolutional neural network (DCNN)-based methods have been proposed to segment remote sensing data collected by conventional RGB or multispectral imagery for such studies. However, how to effectively explore the wider spectrum bands of multispectral sensors to achieve significantly better performance compared to the use of only RGB bands has been left underexplored. In this article, we propose a novel DCNN model—multichannel water body detection network (MC-WBDN)—that incorporates three innovative components, i.e., a multichannel fusion module, an Enhanced Atrous Spatial Pyramid Pooling module, and Space-to-Depth/Depth-to-Space operations, to outperform state-of-the-art DCNN-based water body detection methods. Experimental results convincingly show that our MC-WBDN model achieves remarkable water body detection performance, is more robust to light and weather variations, and can better distinguish tiny water bodies compared to other DCNN models.
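Space-to-Depth and Depth-to-Space are lossless reshaping operations: the former trades spatial resolution for channels (so downsampling discards no pixels), and the latter reverses it. In PyTorch they correspond to PixelUnshuffle and PixelShuffle, as the short check below illustrates.

```python
import torch
import torch.nn as nn

space_to_depth = nn.PixelUnshuffle(2)   # (N, C, H, W)      -> (N, 4C, H/2, W/2)
depth_to_space = nn.PixelShuffle(2)     # (N, 4C, H/2, W/2) -> (N, C, H, W)

x = torch.randn(1, 8, 64, 64)
assert torch.allclose(depth_to_space(space_to_depth(x)), x)  # round trip loses nothing
```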

Journal ArticleDOI
TL;DR: TimeSen2Crop, a pixel-based dataset made up of more than 1 million samples of Sentinel 2 time series associated with 16 crop types, is presented along with a benchmark comparison of deep learning models for crop type mapping.
Abstract: This article presents TimeSen2Crop, a pixel-based dataset made up of more than 1 million samples of Sentinel 2 time series (TSs) associated with 16 crop types. This dataset, publicly available, aims to contribute to worldwide research related to the supervised classification of TSs of Sentinel 2 data for crop type mapping. TimeSen2Crop includes atmospherically corrected images and reports the snow, shadow, and cloud information per labeled unit. The provided TSs represent an agronomic year (i.e., the period from one year's harvest to the next for an agricultural commodity) ranging from September 2017 to August 2018. To generate the dataset, the publicly available Austrian crop type map based on farmers' declarations has been considered. To ensure the selection of reliable labeled units from the map (i.e., pure pixels correctly associated with their labels), an automatic procedure for the extraction of the training set based on a multitemporal deep learning model has been defined. TimeSen2Crop also includes a TS of Sentinel 2 images acquired in the following agronomic year (i.e., from September 2018 to August 2019). These data are provided with the aim of attracting more research activity toward a typical challenge of the crop type mapping task: adapting multitemporal deep learning models to a different year (domain adaptation). The design of the dataset is described along with a benchmark comparison of deep learning models for crop type mapping.

Journal ArticleDOI
TL;DR: In this article, a workflow for the detection of fire-affected areas from satellite imagery acquired in the visible, infrared, and microwave domains is described, using this workflow, the fire detection potentials of four sources of freely available satellite imagery were investigated.
Abstract: Deriving the extent of areas affected by wildfires is critical to fire management, protection of the population, damage assessment, and better understanding of the consequences of fires. In the past two decades, several algorithms utilizing data from Earth observation satellites have been developed to detect fire-affected areas. However, most of these methods require the establishment of complex functional relationships between numerous remote sensing data parameters. In contrast, more recently, deep learning has found its way into the application, having the advantage of being able to detect patterns in complex data by learning from examples automatically. In this article, a workflow for the detection of fire-affected areas from satellite imagery acquired in the visible, infrared, and microwave domains is described. Using this workflow, the fire detection potentials of four sources of freely available satellite imagery were investigated: the C-SAR instrument on board Sentinel-1, the multispectral instrument on board Sentinel-2, the sea and land surface temperature instrument on board Sentinel-3, and the MODIS instrument on board Terra and Aqua. For each of them, a single-input convolutional neural network based on the well-known U-Net architecture was trained on a newly created dataset. The performance of the resulting four single-instrument models was evaluated in presence of clouds and in clear conditions. In addition, the potential of combining predictions from pairs of single-instrument models was investigated. The results show that fusion of Sentinel-2 and Sentinel-3 data provides the best detection rate in clear conditions, whereas the fusion of Sentinel-1 and Sentinel-2 data shows a significant benefit in cloudy weather.

Journal ArticleDOI
TL;DR: A novel pseudo flow spatiotemporal LSTM unit (PFST-LSTM) is proposed, in which a spatial memory cell and a position alignment module are developed and embedded in the LSTM structure, and a new sequence-to-sequence architecture is built upon it for precipitation nowcasting.
Abstract: Precipitation nowcasting is an important task, which can serve numerous applications such as urban alert and transportation. Previous studies leverage convolutional recurrent neural networks (RNNs) to address the problem. However, they all suffer from two inherent drawbacks of the convolutional RNN, namely, the lack of a memory cell to preserve the fine-grained spatial appearances and the position misalignment issue when combining current observations with previous hidden states. In this article, we aim to overcome the defects. Specifically, we propose a novel pseudo flow spatiotemporal LSTM unit (PFST-LSTM), where a spatial memory cell and a position alignment module are developed and embedded in the structure of LSTM. Upon the PFST-LSTM units, we develop a new sequence-to-sequence architecture for precipitation nowcasting, which can effectively combine the spatial appearances and motion information. Extensive empirical evaluations are conducted on synthetic MovingMNIST++ and CIKM AnalytiCup 2017 datasets. Our experimental results demonstrate the superiority of the proposed PFST-LSTM over the state-of-the-art competitors. To reproduce the results, we release the source code at: https://github.com/luochuyao/PFST-LSTM .

Posted ContentDOI
Liang Gao1, Hui Liu1, Minhang Yang1, Long Chen1, Yaling Wan1, Zhengqing Xiao1, Yurong Qian1 
TL;DR: STransFuse combines the benefits of the Transformer with the CNN to improve the segmentation quality of various remote sensing images by employing a staged model to extract coarse-grained and fine-grained feature representations at various semantic scales, unlike earlier techniques based on Transformer model fusion.
Abstract: Applied research on remote sensing images has been pushed forward by the convolutional neural network (CNN). Because of its fixed receptive field size, the CNN is unable to model global semantic relevance. Modeling global semantic information is possible with the self-attentive Transformer-based model. However, the patch-based computation used by the Transformer for self-attention ignores the spatial information inside each patch. To address these issues, we offer the STransFuse model as a new semantic segmentation method for remote sensing images. It combines the benefits of the Transformer with the CNN to improve the segmentation quality of various remote sensing images. We employ a staged model to extract coarse-grained and fine-grained feature representations at various semantic scales, unlike earlier techniques based on Transformer model fusion. In order to take full advantage of the features acquired at different stages, we designed an adaptive fusion module. This module adaptively fuses the semantic information between features at different scales by employing a self-attention mechanism. The overall accuracy (OA) of our proposed model on the Vaihingen dataset is 1.36% higher than the baseline, with a 1.27% improvement in OA over the baseline on the Potsdam dataset. When compared to other advanced models, the STransFuse model performs admirably.

Journal ArticleDOI
TL;DR: Experimental results reveal that the newly proposed MorphConvHyperNet offers comparable (and even superior) performance when compared to traditional 2D and 3D CNNs for HSI classification.
Abstract: Convolutional neural networks (CNNs) have become quite popular for solving many different tasks in remote sensing data processing. The convolution is a linear operation, which extracts features from the input data. However, nonlinear operations are able to better characterize the internal relationships and hidden patterns within complex remote sensing data, such as hyperspectral images (HSIs). Morphological operations are powerful nonlinear transformations for feature extraction that preserve the essential characteristics of the image, such as borders, shape, and structural information. In this article, a new end-to-end morphological deep learning framework (called MorphConvHyperNet ) is introduced. The proposed approach efficiently models nonlinear information during the training process of HSI classification. Specifically, our method includes spectral and spatial morphological blocks to extract relevant features from the HSI input data. These morphological blocks consist of two basic 2-D morphological operators (erosion and dilation) in the respective layers, followed by a weighted combination of the feature maps. Both layers can successfully encode the nonlinear information related to shape and size, playing an important role in classification performance. Our experimental results, obtained on five widely used HSIs, reveal that our newly proposed MorphConvHyperNet offers comparable (and even superior) performance when compared to traditional 2-D and 3-D CNNs for HSI classification.
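To give a sense of how a trainable morphological layer can be embedded in a deep network, the sketch below implements grayscale dilation with a learnable structuring element using unfold + max in PyTorch (erosion is the dual: a min after subtracting the structuring element). This is a generic construction, not necessarily the exact blocks of MorphConvHyperNet.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Dilation2d(nn.Module):
    """Learnable grayscale dilation: max over a window after adding a trainable structuring element."""
    def __init__(self, channels, ksize=3):
        super().__init__()
        self.ksize = ksize
        self.se = nn.Parameter(torch.zeros(channels, ksize * ksize))  # flat structuring element init

    def forward(self, x):                                   # x: (N, C, H, W)
        n, c, h, w = x.shape
        pad = self.ksize // 2
        patches = F.unfold(x, self.ksize, padding=pad)      # (N, C*k*k, H*W)
        patches = patches.view(n, c, self.ksize * self.ksize, h * w)
        out = (patches + self.se[None, :, :, None]).max(dim=2).values
        return out.view(n, c, h, w)
```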