
Showing papers in "IEEE Transactions on Geoscience and Remote Sensing in 2017"


Journal ArticleDOI
TL;DR: The Aerial Image Data Set (AID) as mentioned in this paper is a large-scale benchmark for aerial scene classification, containing more than 10,000 annotated aerial scene images collected from remote sensing imagery.
Abstract: Aerial scene classification, which aims to automatically label an aerial image with a specific semantic category, is a fundamental problem for understanding high-resolution remote sensing imagery. In recent years, it has become an active task in the remote sensing area, and numerous algorithms have been proposed for it, including many machine learning and data-driven approaches. However, the existing data sets for aerial scene classification, such as the UC-Merced data set and WHU-RS19, are relatively small, and the results on them are already saturated. This largely limits the development of scene classification algorithms. This paper describes the Aerial Image data set (AID): a large-scale data set for aerial scene classification. The goal of AID is to advance the state of the art in scene classification of remote sensing images. For creating AID, we collected and annotated more than 10,000 aerial scene images. In addition, a comprehensive review of the existing aerial scene classification techniques as well as recent widely used deep learning methods is given. Finally, we provide a performance analysis of typical aerial scene classification and deep learning approaches on AID, which can serve as baseline results on this benchmark.

1,081 citations


Journal ArticleDOI
TL;DR: An end-to-end framework for the dense, pixelwise classification of satellite imagery with convolutional neural networks (CNNs) is proposed, together with a multiscale neuron module that alleviates the common tradeoff between recognition and precise localization.
Abstract: We propose an end-to-end framework for the dense, pixelwise classification of satellite imagery with convolutional neural networks (CNNs). In our framework, CNNs are directly trained to produce classification maps out of the input images. We first devise a fully convolutional architecture and demonstrate its relevance to the dense classification problem. We then address the issue of imperfect training data through a two-step training approach: CNNs are first initialized by using a large amount of possibly inaccurate reference data, and then refined on a small amount of accurately labeled data. To complete our framework, we design a multiscale neuron module that alleviates the common tradeoff between recognition and precise localization. A series of experiments show that our networks consider a large amount of context to provide fine-grained classification maps.

859 citations
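A minimal sketch of the general idea (a fully convolutional network producing per-pixel logits, with parallel dilated convolutions standing in for a multiscale module) might look as follows in PyTorch; the layer sizes and the module design are illustrative assumptions, not the authors' exact architecture:

```python
import torch
import torch.nn as nn

class MultiScaleModule(nn.Module):
    """Parallel 3x3 convolutions at several dilation rates, fused by a
    1x1 convolution -- one plausible reading of a 'multiscale' module."""
    def __init__(self, ch):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in (1, 2, 4))
        self.fuse = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class DenseFCN(nn.Module):
    """Fully convolutional: every layer preserves spatial size, so the
    output is a dense classification map, not a single label."""
    def __init__(self, in_ch=3, n_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            MultiScaleModule(64))
        self.classifier = nn.Conv2d(64, n_classes, 1)  # per-pixel logits

    def forward(self, x):
        return self.classifier(self.features(x))

net = DenseFCN()
logits = net(torch.randn(1, 3, 128, 128))  # -> (1, 6, 128, 128)
```

The two-step training scheme then amounts to fitting such a network first on the large, possibly inaccurate reference maps and afterward continuing training on the small, accurately labeled set.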


Journal ArticleDOI
TL;DR: Experimental results based on several hyperspectral image data sets demonstrate that the proposed pixel-pair method can achieve better classification performance than the conventional deep learning-based method.
Abstract: The deep convolutional neural network (CNN) has attracted great interest recently. It can provide excellent performance in hyperspectral image classification when the number of training samples is sufficiently large. In this paper, a novel pixel-pair method is proposed to significantly increase that number, ensuring that the advantages of CNN can actually be realized. For a testing pixel, pixel pairs, constructed by combining the center pixel with each of the surrounding pixels, are classified by the trained CNN, and the final label is then determined by a voting strategy. The proposed method, which uses a deep CNN to learn pixel-pair features, is expected to have more discriminative power. Experimental results based on several hyperspectral image data sets demonstrate that the proposed method can achieve better classification performance than the conventional deep learning-based method.

676 citations
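The pair-construction and voting steps are simple to express; the sketch below assumes a generic trained classifier `clf` (standing in for the paper's CNN) whose inputs are concatenated spectra of two pixels:

```python
import numpy as np
from collections import Counter

def make_pairs(center, neighbors):
    """Concatenate the center pixel's spectrum with each neighbor's."""
    return np.stack([np.concatenate([center, nb]) for nb in neighbors])

def classify_pixel(clf, center, neighbors):
    pairs = make_pairs(center, neighbors)       # shape (k, 2 * n_bands)
    votes = clf.predict(pairs)                  # one label per pixel pair
    return Counter(votes).most_common(1)[0][0]  # majority vote
```

Pairing each labeled training pixel with the others grows the training set roughly quadratically, which is the mechanism the paper relies on to make CNN training viable with few samples.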


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed a sequence-based recurrent neural network (RNN) for hyperspectral image classification, which makes use of a newly proposed activation function, parametric rectified tanh (PRetanh), instead of the popular tanh or rectified linear unit.
Abstract: In recent years, vector-based machine learning algorithms, such as random forests, support vector machines, and 1-D convolutional neural networks, have shown promising results in hyperspectral image classification. Such methodologies, nevertheless, can lead to information loss in representing hyperspectral pixels, which intrinsically have a sequence-based data structure. A recurrent neural network (RNN), an important branch of the deep learning family, is mainly designed to handle sequential data. Can a sequence-based RNN be an effective method for hyperspectral image classification? In this paper, we propose a novel RNN model that can effectively analyze hyperspectral pixels as sequential data and then determine information categories via network reasoning. As far as we know, this is the first time that an RNN framework has been proposed for hyperspectral image classification. Specifically, our RNN makes use of a newly proposed activation function, parametric rectified tanh (PRetanh), for hyperspectral sequential data analysis instead of the popular tanh or rectified linear unit. The proposed activation function makes it possible to use fairly high learning rates without the risk of divergence during the training procedure. Moreover, a modified gated recurrent unit, which uses PRetanh for its hidden representation, is adopted to construct the recurrent layer in our network to efficiently process hyperspectral data and reduce the total number of parameters. Experimental results on three airborne hyperspectral images suggest competitive performance of the proposed model. In addition, the proposed network architecture opens a new window for future research, showcasing the huge potential of deep recurrent networks for hyperspectral data analysis.

560 citations
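The exact PRetanh definition is given in the paper; by analogy with PReLU, one plausible form applies a learnable slope to the negative part of the tanh response, keeping the output bounded. A sketch under that assumption:

```python
import torch
import torch.nn as nn

class PRetanh(nn.Module):
    """Assumed form: tanh(x) where tanh(x) >= 0, alpha * tanh(x) otherwise,
    with alpha learned jointly with the network weights (as in PReLU)."""
    def __init__(self, init_alpha=0.25):
        super().__init__()
        self.alpha = nn.Parameter(torch.tensor(init_alpha))

    def forward(self, x):
        t = torch.tanh(x)
        return torch.where(t >= 0, t, self.alpha * t)
```

Because the output stays bounded for any fixed alpha, activations cannot blow up, which is consistent with the paper's claim that fairly high learning rates can be used without divergence.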


Journal ArticleDOI
TL;DR: This paper proposes a new object localization framework consisting of three processes: region proposal, classification, and accurate object localization. A dimension-reduction model performs better than the retrained and fine-tuned models, and the detection precision of the combined CNN model is much higher than that of any single model.
Abstract: In this paper, we focus on tackling the problem of automatic accurate localization of detected objects in high-resolution remote sensing images. Object localization in such images faces two major problems caused by the complex context information they contain: achieving generalizability of the features used to describe objects and achieving accurate object locations. To address these challenges, we propose a new object localization framework, which can be divided into three processes: region proposal, classification, and accurate object localization. First, a region proposal method is used to generate candidate regions with the aim of detecting all objects of interest within these images. Then, generic image features from a local image corresponding to each region proposal are extracted by a combination model of 2-D reduction convolutional neural networks (CNNs). Finally, to improve the location accuracy, we propose an unsupervised score-based bounding box regression (USB-BBR) algorithm, combined with a nonmaximum suppression algorithm, to optimize the bounding boxes of regions detected as objects. Experiments show that the dimension-reduction model performs better than the retrained and fine-tuned models, and that the detection precision of the combined CNN model is much higher than that of any single model. Also, our proposed USB-BBR algorithm can more accurately locate objects within an image. Compared with traditional feature extraction methods, such as the elliptic Fourier transform-based histogram of oriented gradients and the local binary pattern histogram Fourier, our proposed localization framework shows robustness when dealing with different complex backgrounds.

539 citations
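The USB-BBR step is specific to the paper, but the nonmaximum suppression it is combined with is the standard greedy procedure; a minimal version is sketched here for reference:

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy nonmaximum suppression; boxes are (x1, y1, x2, y2) rows."""
    order = np.argsort(scores)[::-1]          # best-scoring boxes first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[rest, 2] - boxes[rest, 0]) *
                 (boxes[rest, 3] - boxes[rest, 1]))
        iou = inter / (area_i + areas - inter)
        order = rest[iou <= iou_thr]          # drop overlapping boxes
    return keep
```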


Journal ArticleDOI
TL;DR: The C6 algorithm changes can collectively result in significant changes relative to C5, though the magnitude depends on the data set and the pixel's retrieval location in the cloud parameter space.
Abstract: The Moderate-Resolution Imaging Spectroradiometer (MODIS) level-2 (L2) cloud product (earth science data set names MOD06 and MYD06 for Terra and Aqua MODIS, respectively) provides pixel-level retrievals of cloud top properties (day and night pressure, temperature, and height) and cloud optical properties (optical thickness, effective particle radius, and water path for both liquid water and ice cloud thermodynamic phases—daytime only). Collection 6 (C6) reprocessing of the product was completed in May 2014 and March 2015 for MODIS Aqua and Terra, respectively. Here we provide an overview of major C6 optical property algorithm changes relative to the previous Collection 5 (C5) product. Notable C6 optical and microphysical algorithm changes include: 1) new ice cloud optical property models and a more extensive cloud radiative transfer code lookup table (LUT) approach; 2) improvement in the skill of the shortwave-derived cloud thermodynamic phase; 3) separate cloud effective radius retrieval data sets for each spectral combination used in previous collections; 4) separate retrievals for partly cloudy pixels and those associated with cloud edges; 5) failure metrics that provide diagnostic information for pixels having observations that fall outside the LUT solution space; and 6) enhanced pixel-level retrieval uncertainty calculations. The C6 algorithm changes can collectively result in significant changes relative to C5, though the magnitude depends on the data set and the pixel’s retrieval location in the cloud parameter space. Example L2 granule and level-3 gridded data set differences between the two collections are shown. While the emphasis is on the suite of cloud optical property data sets, other MODIS cloud data sets are discussed when relevant.

496 citations


Journal ArticleDOI
TL;DR: The performance of the proposed CV-CNN is comparable to that of existing state-of-the-art methods in terms of overall classification accuracy, and experiments show that the classification error can be further reduced by employing CV-CNN instead of a conventional real-valued CNN with the same degrees of freedom.
Abstract: Following the great success of deep convolutional neural networks (CNNs) in computer vision, this paper proposes a complex-valued CNN (CV-CNN) specifically for synthetic aperture radar (SAR) image interpretation. It utilizes both amplitude and phase information of complex SAR imagery. All elements of CNN including input-output layer, convolution layer, activation function, and pooling layer are extended to the complex domain. Moreover, a complex backpropagation algorithm based on stochastic gradient descent is derived for CV-CNN training. The proposed CV-CNN is then tested on the typical polarimetric SAR image classification task which classifies each pixel into known terrain types via supervised training. Experiments with the benchmark data sets of Flevoland and Oberpfaffenhofen show that the classification error can be further reduced if employing CV-CNN instead of conventional real-valued CNN with the same degrees of freedom. The performance of CV-CNN is comparable to that of existing state-of-the-art methods in terms of overall classification accuracy.

460 citations
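Extending convolution to the complex domain reduces to real convolutions via (a + ib)(c + id) = (ac − bd) + i(ad + bc); a minimal PyTorch sketch (layer widths are illustrative, not the paper's configuration):

```python
import torch
import torch.nn as nn

class ComplexConv2d(nn.Module):
    """Complex convolution built from two real convolutions sharing the
    complex weight W = W_re + i * W_im, applied to x = x_re + i * x_im."""
    def __init__(self, in_ch, out_ch, kernel_size):
        super().__init__()
        self.w_re = nn.Conv2d(in_ch, out_ch, kernel_size, bias=False)
        self.w_im = nn.Conv2d(in_ch, out_ch, kernel_size, bias=False)

    def forward(self, x_re, x_im):
        y_re = self.w_re(x_re) - self.w_im(x_im)
        y_im = self.w_re(x_im) + self.w_im(x_re)
        return y_re, y_im

conv = ComplexConv2d(2, 8, 3)
y_re, y_im = conv(torch.randn(1, 2, 32, 32), torch.randn(1, 2, 32, 32))
```

The same split-into-real-parts construction lets the complex backpropagation reuse ordinary real-valued gradient flow through the two weight tensors.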


Journal ArticleDOI
TL;DR: This paper presents a CNN-based system relying on a downsample-then-upsample architecture, which learns a rough spatial map of high-level representations by means of convolutions and then learns to upsample them back to the original resolution by deconvolutions, and compares two standard CNN architectures with the proposed one.
Abstract: Semantic labeling (or pixel-level land-cover classification) in ultrahigh-resolution imagery (<10 cm) requires statistical models able to learn high-level concepts from spatial data with large appearance variations. Convolutional neural networks (CNNs) achieve this goal by discriminatively learning a hierarchy of representations of increasing abstraction. In this paper, we present a CNN-based system relying on a downsample-then-upsample architecture. Specifically, it first learns a rough spatial map of high-level representations by means of convolutions and then learns to upsample them back to the original resolution by deconvolutions. By doing so, the CNN learns to densely label every pixel at the original resolution of the image. This results in many advantages, including: 1) state-of-the-art numerical accuracy; 2) improved geometric accuracy of predictions; and 3) high efficiency at inference time. We test the proposed system on the Vaihingen and Potsdam subdecimeter resolution data sets, involving the semantic labeling of aerial images of 9- and 5-cm resolution, respectively. These data sets are composed of many large and fully annotated tiles, allowing an unbiased evaluation of models making use of spatial information. We do so by comparing two standard CNN architectures with the proposed one: standard patch classification, prediction of local label patches by employing only convolutions, and full patch labeling by employing deconvolutions. All the systems compare favorably with or outperform a state-of-the-art baseline relying on superpixels and powerful appearance descriptors. The proposed full patch labeling CNN outperforms these models by a large margin, while also showing a very appealing inference time.

442 citations


Journal ArticleDOI
TL;DR: The pretrained visual geometry group network (VGG-Net) model is proposed as a deep feature extractor for the original VHR images; combined with discriminant correlation analysis fusion, it produces informative features that describe the image scenes with much lower dimension.
Abstract: The rapid development of remote sensing technology allows us to get images with high and very high resolution (VHR). VHR imagery scene classification has become an important and challenging problem. In this paper, we introduce a framework for VHR scene understanding. First, the pretrained visual geometry group network (VGG-Net) model is used as a deep feature extractor to extract informative features from the original VHR images. Second, we select the fully connected layers constructed by VGG-Net, in which each layer is regarded as a separate feature descriptor, and we then combine them to construct the final representation of the VHR image scenes. Third, discriminant correlation analysis (DCA) is adopted as the feature fusion strategy to further refine the original features extracted from VGG-Net, which allows a more efficient fusion approach at a smaller cost than traditional feature fusion strategies. We apply our approach to three challenging data sets: 1) the UC MERCED data set, which contains 21 different aerial scene categories with submeter resolution; 2) the WHU-RS data set, which contains 19 challenging scene categories with various resolutions; and 3) the Aerial Image data set, which has 10,000 images within 30 challenging scene categories with various resolutions. The experimental results demonstrate that our proposed method outperforms the state-of-the-art approaches. Using the feature fusion technique achieves a higher accuracy than solely using the raw deep features. Moreover, the proposed method based on DCA fusion produces informative features that describe the image scenes with much lower dimension.

383 citations
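Using a pretrained VGG-Net purely as a fixed feature extractor is straightforward with torchvision (the weights name assumes a recent torchvision release; the DCA fusion step is described in the paper and omitted here):

```python
import torch
from torchvision import models

# Truncate VGG-16 after the first fully connected layer (fc6) plus ReLU,
# so the network emits a 4096-D descriptor instead of class scores.
vgg = models.vgg16(weights="IMAGENET1K_V1").eval()
extractor = torch.nn.Sequential(
    vgg.features, vgg.avgpool, torch.nn.Flatten(),
    *list(vgg.classifier.children())[:2])

with torch.no_grad():
    descriptor = extractor(torch.randn(1, 3, 224, 224))  # -> (1, 4096)
```

Descriptors taken from different fully connected layers can then be fused (by DCA in the paper) before training a conventional classifier.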


Journal ArticleDOI
TL;DR: A novel deep model, i.e., a cascaded end-to-end convolutional neural network (CasNet), to simultaneously cope with the road detection and centerline extraction tasks and outperforms the state-of-the-art methods greatly in learning quality and learning speed.
Abstract: Accurate road detection and centerline extraction from very high resolution (VHR) remote sensing imagery are of central importance in a wide range of applications. Due to complex backgrounds and occlusions by trees and cars, most road detection methods introduce heterogeneous segments; moreover, for the centerline extraction task, most current approaches fail to extract a centerline network that is smooth, complete, and single-pixel wide. To address these complex issues, we propose a novel deep model, i.e., a cascaded end-to-end convolutional neural network (CasNet), to simultaneously cope with the road detection and centerline extraction tasks. Specifically, CasNet consists of two networks. One aims at the road detection task, whose strong representation ability is well able to tackle the complex backgrounds and occlusions by trees and cars. The other is cascaded to the former, making full use of the feature maps produced earlier, to obtain good centerline extraction. Finally, a thinning algorithm is proposed to obtain a smooth, complete, and single-pixel-wide road centerline network. Extensive experiments demonstrate that CasNet greatly outperforms the state-of-the-art methods in learning quality and learning speed. That is, CasNet exceeds the comparison methods by a large margin in quantitative performance, and it is nearly 25 times faster than them. Moreover, as another contribution, a large and challenging road centerline data set for VHR remote sensing images will be made publicly available for further studies.

346 citations


Journal ArticleDOI
TL;DR: Experiments demonstrate that the learned deep joint spectral–spatial features are discriminative, and competitive classification results can be achieved when compared with state-of-the-art methods.
Abstract: Feature extraction is of significance for hyperspectral image (HSI) classification. Compared with conventional hand-crafted feature extraction, deep learning can automatically learn features with discriminative information. However, two issues exist in applying deep learning to HSIs. One issue is how to jointly extract spectral features and spatial features, and the other one is how to train the deep model when training samples are scarce. In this paper, a deep convolutional neural network with two-branch architecture is proposed to extract the joint spectral–spatial features from HSIs. The two branches of the proposed network are devoted to features from the spectral domain as well as the spatial domain. The learned spectral features and spatial features are then concatenated and fed to fully connected layers to extract the joint spectral–spatial features for classification. When the training samples are limited, we investigate the transfer learning to improve the performance. Low and mid-layers of the network are pretrained and transferred from other data sources; only top layers are trained with limited training samples extracted from the target scene. Experiments on Airborne Visible/Infrared Imaging Spectrometer and Reflective Optics System Imaging Spectrometer data demonstrate that the learned deep joint spectral–spatial features are discriminative, and competitive classification results can be achieved when compared with state-of-the-art methods. The experiments also reveal that the transferred features boost the classification performance.
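The two-branch layout can be sketched compactly; the branch depths and widths below are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

class TwoBranchNet(nn.Module):
    """A 1-D spectral branch (center-pixel spectrum) and a 2-D spatial
    branch (image patch), concatenated before the classifier."""
    def __init__(self, bands, n_classes=9):
        super().__init__()
        self.spectral = nn.Sequential(
            nn.Conv1d(1, 16, 7), nn.ReLU(), nn.AdaptiveAvgPool1d(1))
        self.spatial = nn.Sequential(
            nn.Conv2d(bands, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        self.head = nn.Linear(32, n_classes)

    def forward(self, patch):                  # patch: (B, bands, h, w)
        c = patch.shape[-1] // 2
        spec = patch[:, None, :, c, c]         # center spectrum (B, 1, bands)
        f = torch.cat([self.spectral(spec).flatten(1),
                       self.spatial(patch).flatten(1)], dim=1)
        return self.head(f)

net = TwoBranchNet(bands=103)
out = net(torch.randn(4, 103, 7, 7))           # -> (4, 9) class logits
```

In the transfer-learning setting, the convolutional layers would be pretrained on a source scene, with only `head` (and possibly the top convolutions) trained on the scarce target samples.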

Journal ArticleDOI
TL;DR: The experimental results obtained on real hyperspectral data sets including airport, beach, and urban scenes demonstrate that the performance of the proposed method is quite competitive in terms of computing time and detection accuracy.
Abstract: A novel method for anomaly detection in hyperspectral images is proposed. The method is based on two ideas. First, compared with the surrounding background, objects with anomalies usually appear with small areas and distinct spectral signatures. Second, for both the background and the objects with anomalies, pixels in the same class are usually highly correlated in the spatial domain. In this paper, the pixels with specific area property and distinct spectral signatures are first detected with attribute filtering and a Boolean map-based fusion approach in order to obtain an initial pixel-wise detection result. Then, the initial detection result is refined with edge-preserving filtering to make full use of the spatial correlations among adjacent pixels. Compared with other widely used anomaly detection methods, the experimental results obtained on real hyperspectral data sets including airport, beach, and urban scenes demonstrate that the performance of the proposed method is quite competitive in terms of computing time and detection accuracy.

Journal ArticleDOI
TL;DR: A new diversified DBN is developed by regularizing the pretraining and fine-tuning procedures with a diversity-promoting prior over latent factors; it obtains much better results than original DBNs and comparable or even better performance than other recent hyperspectral image classification methods.
Abstract: In the literature of remote sensing, deep models with multiple layers have demonstrated their potential in learning abstract and invariant features for better representation and classification of hyperspectral images. The usual supervised deep models, such as convolutional neural networks, need a large number of labeled training samples to learn their model parameters. However, the real-world hyperspectral image classification task provides only a limited number of training samples. This paper adopts another popular deep model, i.e., deep belief networks (DBNs), to deal with this problem. DBNs allow unsupervised pretraining over unlabeled samples first, followed by supervised fine-tuning over labeled samples. However, the usual pretraining and fine-tuning method makes many hidden units in the learned DBNs tend to behave very similarly or perform as “dead” (never responding) or “potentially over-tolerant” (always responding) latent factors. These results can negatively affect the description ability and thus the classification performance of DBNs. To further improve DBN performance, this paper develops a new diversified DBN through regularizing the pretraining and fine-tuning procedures with a diversity-promoting prior over latent factors. Moreover, the regularized pretraining and fine-tuning can be efficiently implemented through the usual recursive greedy and back-propagation learning frameworks. The experiments on real-world hyperspectral images demonstrate that the diversity-promoting prior in both the pretraining and fine-tuning procedures leads to learned DBNs with more diverse latent factors, which directly enables the diversified DBNs to obtain much better results than original DBNs and comparable or even better performance than other recent hyperspectral image classification methods.

Journal ArticleDOI
TL;DR: The proposed PCA-EPFs method for HSI classification sharply improves the accuracy of the SVM classifier with respect to the standard edge-preserving filtering-based feature extraction method and other widely used spectral-spatial classifiers.
Abstract: Edge-preserving features (EPFs) obtained by the application of edge-preserving filters to hyperspectral images (HSIs) have been found very effective in characterizing significant spectral and spatial structures of objects in a scene. However, a direct use of the EPFs can be insufficient to provide a complete characterization of spatial information when objects of different scales are present in the considered images. Furthermore, the edge-preserving smoothing operation unavoidably decreases the spectral differences among objects of different classes, which may affect the following classification. To overcome these problems, in this paper, a novel principal component analysis (PCA)-based EPFs (PCA-EPFs) method for HSI classification is proposed, which consists of the following steps. First, the standard EPFs are constructed by applying edge-preserving filters with different parameter settings to the considered image, and the resulting EPFs are stacked together. Next, the spectral dimension of the stacked EPFs is reduced with the PCA, which not only can represent the EPFs in the mean square sense but also highlight the separability of pixels in the EPFs. Finally, the resulting PCA-EPFs are classified by a support vector machine (SVM) classifier. Experiments performed on several real hyperspectral data sets show the effectiveness of the proposed PCA-EPFs, which sharply improves the accuracy of the SVM classifier with respect to the standard edge-preserving filtering-based feature extraction method, and other widely used spectral-spatial classifiers.
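The pipeline shape is easy to reproduce; in the sketch below a Gaussian filter merely stands in for the paper's edge-preserving filters, and the component count is an arbitrary choice:

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from sklearn.decomposition import PCA
from sklearn.svm import SVC

def pca_epf_features(cube, sigmas=(1, 2, 4), n_components=20):
    """cube: (H, W, bands) image. Filter at several scales, stack the
    results band-wise, then reduce the stacked features with PCA."""
    stacked = np.concatenate(
        [gaussian_filter(cube, sigma=(s, s, 0)) for s in sigmas], axis=-1)
    flat = stacked.reshape(-1, stacked.shape[-1])   # (H*W, bands*scales)
    return PCA(n_components=n_components).fit_transform(flat)

# feats = pca_epf_features(cube)
# svm = SVC().fit(feats[train_idx], y_train)
```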

Journal ArticleDOI
TL;DR: An unsupervised representation learning method based on deconvolution networks is proposed for remote sensing scene classification; it outperforms most state-of-the-art results, which demonstrates its effectiveness.
Abstract: With the rapid development of satellite sensor technology, high spatial resolution remote sensing (HSR) data have attracted extensive attention in military and civilian applications. In order to make full use of these data, remote sensing scene classification becomes an important and necessary precedent task. In this paper, an unsupervised representation learning method is proposed to investigate deconvolution networks for remote sensing scene classification. First, a shallow weighted deconvolution network is utilized to learn a set of feature maps and filters for each image by minimizing the reconstruction error between the input image and the convolution result. The learned feature maps can capture the abundant edge and texture information of high spatial resolution images, which is definitely important for remote sensing images. After that, the spatial pyramid model (SPM) is used to aggregate features at different scales to maintain the spatial layout of the HSR image scene. A discriminative representation for the HSR image is obtained by combining the proposed weighted deconvolution model and SPM. Finally, the representation vector is input into a support vector machine to finish classification. We apply our method on two challenging HSR image data sets: the UCMerced data set with 21 scene categories and the Sydney data set with seven land-use categories. All the experimental results achieved by the proposed method outperform most of the state of the art, which demonstrates the effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: A fusion strategy for integrating multilayer features of a pretrained CNN model for scene classification is presented, and a multiscale improved Fisher kernel coding method is proposed to build a mid-level feature representation of convolutional deep features.
Abstract: Scene classification from remote sensing images provides new possibilities for potential applications of high spatial resolution imagery. How to efficiently implement scene recognition from high spatial resolution imagery remains a significant challenge in the remote sensing domain. Recently, convolutional neural networks (CNNs) have attracted tremendous attention because of their excellent performance in different fields. However, most works focus on fully training a new deep CNN model for the target problems without considering the limited data and the time-consuming training involved. To alleviate these drawbacks, some works have attempted to use pretrained CNN models as feature extractors to build a feature representation of scene images for classification, with successful applications including remote sensing scene classification. However, existing works pay little attention to exploring the benefits of multilayer features for improving scene classification in different aspects. As a matter of fact, the information hidden in different layers has great potential for improving feature discrimination capacity. Therefore, this paper presents a fusion strategy for integrating multilayer features of a pretrained CNN model for scene classification. Specifically, the pretrained CNN model is used as a feature extractor to extract deep features of different convolutional and fully connected layers; then, a multiscale improved Fisher kernel coding method is proposed to build a mid-level feature representation of convolutional deep features. Finally, the mid-level features extracted from convolutional layers and the features of fully connected layers are fused by a principal component analysis/spectral regression kernel discriminant analysis method for classification. For validation and comparison purposes, the proposed approach is evaluated via experiments with two challenging high-resolution remote sensing data sets, and shows competitive performance compared with fully trained CNN models, fine-tuned CNN models, and other related works.

Journal ArticleDOI
TL;DR: Results show that the $p$-value classification provides a robust basis for decisions regarding using either active or passive data alone, or an unweighted average in cases where relative weights cannot be estimated reliably, and that the weights estimated from TCA in almost all cases outperform the ternary decision upon which the ESA CCI SM v02.3 is based.
Abstract: We propose a method for merging soil moisture retrievals from spaceborne active and passive microwave instruments based on weighted averaging, taking into account the error characteristics of the individual data sets. The merging scheme is parameterized using error variance estimates obtained from triple collocation analysis (TCA). In regions where TCA is deemed unreliable, we use correlation significance levels ($p$-values) as an indicator of retrieval quality to decide whether to use active data only, passive data only, or an unweighted average. We apply the proposed merging scheme to active retrievals from the advanced scatterometer and passive retrievals from the Advanced Microwave Scanning Radiometer—Earth Observing System, using Global Land Data Assimilation System-Noah to complement the triplet required for TCA. The merged time series is evaluated against soil moisture estimates from ERA-Interim/Land and in situ measurements from the International Soil Moisture Network, using the European Space Agency’s (ESA’s) current Climate Change Initiative—Soil Moisture (ESA CCI SM) product version v02.3 as the benchmark merging scheme. Results show that the $p$-value classification provides a robust basis for decisions regarding using either active or passive data alone, or an unweighted average in cases where relative weights cannot be estimated reliably, and that the weights estimated from TCA in almost all cases outperform the ternary decision upon which the ESA CCI SM v02.3 is based. The proposed method forms the basis for the new ESA CCI SM product versions v03.x and higher.
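The two quantitative ingredients, covariance-based triple collocation error variances and inverse-error-variance weighting of the two retrievals, are standard and can be sketched directly (variable names are illustrative):

```python
import numpy as np

def tc_error_variances(x, y, z):
    """Classical triple collocation estimates from three collocated time
    series with mutually independent errors."""
    c = np.cov(np.vstack([x, y, z]))
    var_x = c[0, 0] - c[0, 1] * c[0, 2] / c[1, 2]
    var_y = c[1, 1] - c[0, 1] * c[1, 2] / c[0, 2]
    var_z = c[2, 2] - c[0, 2] * c[1, 2] / c[0, 1]
    return var_x, var_y, var_z

def merge(active, passive, var_act, var_pas):
    w = var_pas / (var_act + var_pas)   # weight on the active retrieval
    return w * active + (1 - w) * passive
```

Where the estimated variances are unreliable, the scheme falls back on the $p$-value rule described above: active only, passive only, or the unweighted average (w = 0.5).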

Journal ArticleDOI
TL;DR: The proposed C-CNN outperforms the state-of-the-art CNN-based classification methods, and its corresponding FL-CNN is very effective at extracting sensor-specific spatial-spectral features for hyperspectral applications under both supervised and unsupervised modes.
Abstract: Convolutional neural network (CNN) is well known for its capability of feature learning and has made revolutionary achievements in many applications, such as scene recognition and target detection. In this paper, its capability of feature learning in hyperspectral images is explored by constructing a five-layer CNN for classification (C-CNN). The proposed C-CNN incorporates recent advances in the deep learning area, such as batch normalization, dropout, and the parametric rectified linear unit (PReLU) activation function. In addition, both spatial context and spectral information are elegantly integrated into the C-CNN such that spatial-spectral features are learned for hyperspectral images. A companion feature-learning CNN (FL-CNN) is constructed by extracting fully connected feature layers in this C-CNN. Both supervised and unsupervised modes are designed for the proposed FL-CNN to learn sensor-specific spatial-spectral features. Extensive experimental results on four benchmark data sets from two well-known hyperspectral sensors, namely the airborne visible/infrared imaging spectrometer (AVIRIS) and reflective optics system imaging spectrometer (ROSIS) sensors, demonstrate that our proposed C-CNN outperforms the state-of-the-art CNN-based classification methods, and that its corresponding FL-CNN is very effective at extracting sensor-specific spatial-spectral features for hyperspectral applications under both supervised and unsupervised modes.

Journal ArticleDOI
TL;DR: The results show that HOPCncc is robust against complex nonlinear radiometric differences and outperforms the state-of-the-art similarity metrics (i.e., NCC and mutual information) in matching performance.
Abstract: Automatic registration of multimodal remote sensing data [e.g., optical, light detection and ranging (LiDAR), and synthetic aperture radar (SAR)] is a challenging task due to the significant nonlinear radiometric differences between these data. To address this problem, this paper proposes a novel feature descriptor named the histogram of orientated phase congruency (HOPC), which is based on the structural properties of images. Furthermore, a similarity metric named HOPCncc is defined, which uses the normalized correlation coefficient (NCC) of the HOPC descriptors for multimodal registration. In the definition of the proposed similarity metric, we first extend the phase congruency model to generate its orientation representation and use the extended model to build HOPCncc. Then, a fast template matching scheme for this metric is designed to detect the control points between images. The proposed HOPCncc aims to capture the structural similarity between images and has been tested with a variety of optical, LiDAR, SAR, and map data. The results show that HOPCncc is robust against complex nonlinear radiometric differences and outperforms the state-of-the-art similarity metrics (i.e., NCC and mutual information) in matching performance. Moreover, a robust registration method based on HOPCncc is also proposed in this paper, which is evaluated using six pairs of multimodal remote sensing images. The experimental results demonstrate the effectiveness of the proposed method for multimodal image registration.
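The similarity itself is the ordinary normalized correlation coefficient, computed between HOPC descriptors rather than raw intensities; for reference:

```python
import numpy as np

def ncc(a, b):
    """Normalized correlation coefficient of two descriptor arrays;
    in HOPCncc, a and b would be HOPC descriptors of two image windows."""
    a = a.ravel() - a.mean()
    b = b.ravel() - b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
```

Template matching then slides the sensed-image window over a search region and keeps the offset that maximizes this score.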

Journal ArticleDOI
TL;DR: This paper analyzes and evaluates different MKL algorithms and their respective characteristics in different HSI classification cases, and discusses future directions and trends of research in this area.
Abstract: With the rapid development of spectral imaging techniques, classification of hyperspectral images (HSIs) has attracted great attention in various applications such as land survey and resource monitoring in the field of remote sensing. A key challenge in HSI classification is how to explore effective approaches to fully use the spatial–spectral information provided by the data cube. Multiple kernel learning (MKL) has been successfully applied to HSI classification due to its capacity to handle heterogeneous fusion of both spectral and spatial features. This approach can generate an adaptive kernel as an optimally weighted sum of a few fixed kernels to model a nonlinear data structure. In this way, the difficulty of kernel selection and the limitation of a fixed kernel can be alleviated. Various MKL algorithms have been developed in recent years, such as the general MKL, the subspace MKL, the nonlinear MKL, the sparse MKL, and the ensemble MKL. The goal of this paper is to provide a systematic review of MKL methods that have been applied to HSI classification. We also analyze and evaluate different MKL algorithms and their respective characteristics in different HSI classification cases. Finally, we discuss future directions and trends of research in this area.
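The central object in all these variants is the combined kernel, a weighted sum of fixed base kernels; a minimal precomputed-kernel example with scikit-learn (base kernels and weights chosen arbitrarily for illustration):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel, linear_kernel
from sklearn.svm import SVC

def combined_kernel(X, Y, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of base kernels, e.g., spectral and spatial kernels."""
    bases = [rbf_kernel(X, Y, gamma=0.1),
             rbf_kernel(X, Y, gamma=1.0),
             linear_kernel(X, Y)]
    return sum(w * k for w, k in zip(weights, bases))

# svm = SVC(kernel="precomputed").fit(combined_kernel(X_tr, X_tr), y_tr)
# preds = svm.predict(combined_kernel(X_te, X_tr))
```

MKL algorithms differ mainly in how the weights are learned (jointly with the SVM, in a subspace, sparsely, etc.) rather than in this basic composition.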

Journal ArticleDOI
TL;DR: A hierarchical split-based approach searches for tiles of variable size that allow the parameterization of the distributions of two classes; integrated into a flood-mapping algorithm, it is evaluated for its capacity to parameterize distribution functions attributed to floodwater and flood-induced changes.
Abstract: Parametric thresholding algorithms applied to synthetic aperture radar (SAR) imagery typically require the estimation of two distribution functions, i.e., one representing the target class and one its background. They are eventually used for selecting the threshold that allows binarizing the image in an optimal way. In this context, one of the main difficulties in parameterizing these functions originates from the fact that the target class often represents only a small fraction of the image. Under such circumstances, the histogram of the image values is often not obviously bimodal and it becomes difficult, if not impossible, to accurately parameterize distribution functions. Here we introduce a hierarchical split-based approach that searches for tiles of variable size allowing the parameterization of the distributions of two classes. The method is integrated into a flood-mapping algorithm in order to evaluate its capacity for parameterizing distribution functions attributed to floodwater and changes caused by floods. We analyzed a data set acquired during a flood event along the Severn River (U.K.) in 2007. It is composed of moderate (ENVISAT-WS) and high (TerraSAR-X)-resolution SAR images. The obtained classification accuracies as well as the similarity of performance levels to a benchmark obtained with an established method based on the manual selection of tiles indicate the validity of the new method.

Journal ArticleDOI
TL;DR: This paper reformulates the approximation problem using a nonconvex regularizer instead of the traditional nuclear norm, resulting in a tighter approximation of the original sparsity-regularized rank function, and develops an iterative algorithm based on the augmented Lagrangian multipliers method that can preserve large-scale image structures and small-scale details well.
Abstract: Hyperspectral image (HSI) denoising is challenging not only because of the difficulty in preserving both spectral and spatial structures simultaneously, but also due to the requirement of removing various noises, which are often mixed together. In this paper, we present a nonconvex low rank matrix approximation (NonLRMA) model and the corresponding HSI denoising method by reformulating the approximation problem using a nonconvex regularizer instead of the traditional nuclear norm, resulting in a tighter approximation of the original sparsity-regularized rank function. NonLRMA aims to decompose the degraded HSI, represented in the form of a matrix, into a low rank component and a sparse term with a more robust and less biased formulation. In addition, we develop an iterative algorithm based on the augmented Lagrangian multipliers method and derive the closed-form solution of the resulting subproblems, benefiting from the special property of the nonconvex surrogate function. We prove that our iterative optimization converges easily. Extensive experiments on both simulated and real HSIs indicate that our approach can not only suppress noise in both severely and slightly noisy bands but also preserve large-scale image structures and small-scale details well. Comparisons against state-of-the-art LRMA-based HSI denoising approaches show our superior performance.
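The key departure from nuclear-norm minimization is the shrinkage rule applied to singular values; the reweighted (log-penalty-style) step below is one assumed illustration, not the paper's exact surrogate, and it shrinks small singular values more than large ones:

```python
import numpy as np

def nonconvex_svt(M, lam, eps=1e-3):
    """Low-rank step with a nonconvex penalty: the threshold lam/(s+eps)
    decreases as the singular value s grows, so dominant structure is
    penalized less than with the constant nuclear-norm threshold."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - lam / (s + eps), 0.0)
    return (U * s_shrunk) @ Vt
```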

Journal ArticleDOI
TL;DR: In the multiplicative iterative solution to the proposed TV-RSNMF model, the TV regularizer can be regarded as an abundance map denoising procedure, which improves the robustness of TV- RSNMF to noise.
Abstract: Blind hyperspectral unmixing (HU), which includes the estimation of endmembers and their corresponding fractional abundances, is an important task for hyperspectral analysis. Recently, nonnegative matrix factorization (NMF) and its extensions have been widely used in HU. Unfortunately, most of the NMF-based methods can easily lead to an unsuitable solution, due to the nonconvexity of the NMF model and the influence of noise. To overcome this limitation, we make the best use of the structure of the abundance maps, and propose a new blind HU method named total variation regularized reweighted sparse NMF (TV-RSNMF). First, the abundance matrix is assumed to be sparse, and a weighted sparse regularizer is incorporated into the NMF model. The weights of the weighted sparse regularizer are adaptively updated related to the abundance matrix. Second, the abundance map corresponding to a single fixed endmember should be piecewise smooth. Therefore, the TV regularizer is adopted to capture the piecewise smooth structure of each abundance map. In our multiplicative iterative solution to the proposed TV-RSNMF model, the TV regularizer can be regarded as an abundance map denoising procedure, which improves the robustness of TV-RSNMF to noise. A number of experiments were conducted in both simulated and real-data conditions to illustrate the advantage of the proposed TV-RSNMF method for blind HU.
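For orientation, the baseline multiplicative NMF updates (Lee and Seung) that TV-RSNMF builds on are sketched below; the paper's method augments the abundance update with the reweighted sparsity and TV terms:

```python
import numpy as np

def nmf_multiplicative(V, r, iters=200, eps=1e-9):
    """V (bands x pixels) ~ W (endmembers) @ H (abundances)."""
    rng = np.random.default_rng(0)
    W = rng.random((V.shape[0], r))
    H = rng.random((r, V.shape[1]))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # abundance update
        W *= (V @ H.T) / (W @ (H @ H.T) + eps) # endmember update
    return W, H
```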

Journal ArticleDOI
TL;DR: A novel HSIC framework, named the deep multiscale spatial-spectral feature extraction algorithm, focuses on learning effective discriminant features for HSIC; it provides state-of-the-art performance and is much more effective, especially for images with highly nonlinear distributions and spatial diversity.
Abstract: Most of the existing spatial-spectral-based hyperspectral image classification (HSIC) methods mainly extract the spatial-spectral information by combining the pixels in a small neighborhood or aggregating the statistical and morphological characteristics. However, those strategies can only generate shallow appearance features with limited representative ability for classes with high interclass similarity and spatial diversity, and therefore reduce the classification accuracy. To this end, we present a novel HSIC framework, named the deep multiscale spatial-spectral feature extraction algorithm, which focuses on learning effective discriminant features for HSIC. First, the well-pretrained deep fully convolutional network based on VGG-verydeep-16 is introduced to excavate the potential deep multiscale spatial structural information in the proposed hyperspectral imaging framework. Then, the spectral feature and the deep multiscale spatial feature are fused by adopting the weighted fusion method. Finally, the fused feature is put into a generic classifier to obtain the pixelwise classification. Compared with the existing spectral-spatial-based classification techniques, the proposed method provides state-of-the-art performance and is much more effective, especially for images with highly nonlinear distributions and spatial diversity.

Journal ArticleDOI
TL;DR: The group-structured prior information of hyperspectral images is incorporated into the nonnegative matrix factorization optimization, where the data are organized into spatial groups to exploit the shared sparse pattern and to avoid the loss of spatial details within a spatial group.
Abstract: In recent years, blind source separation (BSS) has received much attention in the hyperspectral unmixing field due to the fact that it allows the simultaneous estimation of both endmembers and fractional abundances. Although great performances can be obtained by the BSS-based unmixing methods, the decomposition results are still unstable and sensitive to noise. Motivated by the first law of geography, some recent studies have revealed that spatial information can lead to an improvement in the decomposition stability. In this paper, the group-structured prior information of hyperspectral images is incorporated into the nonnegative matrix factorization optimization, where the data are organized into spatial groups. Pixels within a local spatial group are expected to share the same sparse structure in the low-rank matrix (abundance). To fully exploit the group structure, image segmentation is introduced to generate the spatial groups. Instead of a predefined group with a regular shape (e.g., a cross or a square window), the spatial groups are adaptively represented by superpixels. Moreover, the spatial group structure and sparsity of the abundance are integrated as a modified mixed-norm regularization to exploit the shared sparse pattern, and to avoid the loss of spatial details within a spatial group. The experimental results obtained with both simulated and real hyperspectral data confirm the high efficiency and precision of the proposed algorithm.

Journal ArticleDOI
TL;DR: A novel change detection framework for high-resolution remote sensing images, which incorporates superpixel-based change feature extraction and hierarchical difference representation learning by neural networks is presented.
Abstract: With the rapid technological development of various satellite sensors, high-resolution remotely sensed imagery has become an important source of data for change detection in land cover transition. However, it is still a challenging problem to effectively exploit the available spectral information to highlight changes. In this paper, we present a novel change detection framework for high-resolution remote sensing images, which incorporates superpixel-based change feature extraction and hierarchical difference representation learning by neural networks. First, highly homogeneous and compact image superpixels are generated using superpixel segmentation, which makes these image blocks adhere well to image boundaries. Second, change features are extracted to represent the difference information using spectral, texture, and spatial features between corresponding superpixels. Third, motivated by the fact that a deep neural network has the ability to learn from data sets that have few labeled data, we use it to learn the semantic difference between the changed and unchanged pixels. The labeled data can be selected from the bitemporal multispectral images via a preclassification map generated in advance. Then, a neural network is built to learn the difference and classify the uncertain samples into changed or unchanged ones. Finally, a robust and high-contrast change detection result can be obtained from the network. The experimental results on real data sets demonstrate the effectiveness, feasibility, and superiority of the proposed technique.

Journal ArticleDOI
TL;DR: This paper analyzes imagery data from remote sensing satellites to detect forest cover changes over a period of 29 years, and automatically learns region representations using a deep neural network in a data-driven fashion.
Abstract: Land cover change monitoring is an important task from the perspective of regional resource monitoring, disaster management, land development, and environmental planning. In this paper, we analyze imagery data from remote sensing satellites to detect forest cover changes over a period of 29 years (1987–2015). Since the original data are severely incomplete and contaminated with artifacts, we first devise a spatiotemporal inpainting mechanism to recover the missing surface reflectance information. The spatial filling process makes use of the available data of the nearby temporal instances followed by a sparse encoding-based reconstruction. We formulate the change detection task as a region classification problem. We build a multiresolution profile (MRP) of the target area and generate a candidate set of bounding-box proposals that enclose potential change regions. In contrast to existing methods that use handcrafted features, we automatically learn region representations using a deep neural network in a data-driven fashion. Based on these highly discriminative representations, we determine forest changes and predict their onset and offset timings by labeling the candidate set of proposals. Our approach achieves the state-of-the-art average patch classification rate of 91.6% (an improvement of ~16%) and the mean onset/offset prediction error of 4.9 months (an error reduction of five months) compared with a strong baseline. We also qualitatively analyze the detected changes in the unlabeled image regions, which demonstrate that the proposed forest change detection approach is scalable to new regions.

Journal ArticleDOI
TL;DR: The problem of automatically recognizing and fitting hyperbolae from ground-penetrating radar (GPR) images is addressed, and a novel technique computationally suitable for real-time on-site application is proposed, which is more robust and accurate than algebraic hyperbola fitting algorithms.
Abstract: The problem of automatically recognizing and fitting hyperbolae from ground-penetrating radar (GPR) images is addressed, and a novel technique computationally suitable for real-time on-site application is proposed. After preprocessing of the input GPR images, a novel thresholding method is applied to separate the regions of interest from the background. A novel column-connection clustering (C3) algorithm is then applied to separate the regions of interest from each other. Subsequently, a machine-learnt model is applied to identify hyperbolic signatures from the outputs of the C3 algorithm, and a hyperbola is fitted to each such signature with an orthogonal-distance hyperbola fitting algorithm. The novel clustering algorithm C3 is a central component of the proposed system, which enables the identification of hyperbolic signatures and hyperbola fitting. Only two features are used in the machine learning algorithm, which is easy to train using a small set of training data. An orthogonal-distance hyperbola fitting algorithm for “south-opening” hyperbolae is introduced in this work, which is more robust and accurate than algebraic hyperbola fitting algorithms. The proposed method can successfully recognize and fit hyperbolic signatures that intersect with others, hyperbolic signatures with distortions, and incomplete hyperbolic signatures with one leg fully or largely missing. As an additional novel contribution, formulas to compute an initial “south-opening” hyperbola directly from a set of given points are derived, which make the system more efficient. The parameters obtained by fitting hyperbolae to hyperbolic signatures are very important features; they can be used to estimate the location and size of the related target objects and the average propagation velocity of the electromagnetic wave in the medium. The effectiveness of the proposed system is tested on both synthetic and real GPR data.
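A GPR reflection hyperbola is commonly modeled as a two-way travel time t(x) = (2/v) * sqrt(d^2 + (x - x0)^2) for a target at horizontal position x0 and depth d in a medium with wave velocity v. An ordinary least-squares fit (not the paper's orthogonal-distance fit) illustrates how the target parameters fall out:

```python
import numpy as np
from scipy.optimize import curve_fit

def hyperbola(x, x0, d, v):
    """Two-way travel time of a point reflector (units are arbitrary)."""
    return (2.0 / v) * np.sqrt(d**2 + (x - x0)**2)

x = np.linspace(-1.0, 1.0, 41)
t_noisy = (hyperbola(x, 0.1, 0.5, 0.1)
           + 0.02 * np.random.default_rng(1).normal(size=x.size))
(x0, d, v), _ = curve_fit(hyperbola, x, t_noisy, p0=(0.0, 0.3, 0.08))
```

As the abstract notes, the fitted parameters directly estimate the target location and size and the wave's average propagation velocity in the medium.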

Journal ArticleDOI
TL;DR: A CNN framework specifically adapted to the semantic labeling problem, which integrates local and global information in an efficient and flexible manner and outperforms previous techniques.
Abstract: The problem of dense semantic labeling consists in assigning semantic labels to every pixel in an image. In the context of aerial image analysis, it is particularly important to yield high-resolution outputs. In order to use convolutional neural networks (CNNs) for this task, it is required to design new specific architectures to provide fine-grained classification maps. Many dense semantic labeling CNNs have been recently proposed. Our first contribution is an in-depth analysis of these architectures. We establish the desired properties of an ideal semantic labeling CNN, and assess how those methods stand with regard to these properties. We observe that even though they provide competitive results, these CNNs often underexploit properties of semantic labeling that could lead to more effective and efficient architectures. Out of these observations, we then derive a CNN framework specifically adapted to the semantic labeling problem. In addition to learning features at different resolutions, it learns how to combine these features. By integrating local and global information in an efficient and flexible manner, it outperforms previous techniques. We evaluate the proposed framework and compare it with state-of-the-art architectures on public benchmarks of high-resolution aerial image labeling.

Journal ArticleDOI
TL;DR: A matrix-vector nonnegative tensor factorization (NTF) model is proposed in this paper for spectral unmixing; derived from block term decomposition, which is a combination of CPD and Tucker decomposition, it leads to a more flexible framework for modeling various application-dependent problems.
Abstract: Many spectral unmixing approaches ranging from geometry and algebra to statistics have been proposed, among which nonnegative matrix factorization (NMF)-based ones form an important family. The original NMF-based unmixing algorithm loses the spectral and spatial information between mixed pixels when stacking the spectral responses of the pixels into an observed matrix. Therefore, various constrained NMF methods have been developed to impose spectral structure, spatial structure, and spectral-spatial joint structure into NMF so that the estimated endmembers and abundances preserve these structures. Compared with the matrix format, the third-order tensor is more natural for representing a hyperspectral data cube as a whole, by which the intrinsic structure of hyperspectral imagery can be losslessly retained. Extended from NMF-based methods, a matrix-vector nonnegative tensor factorization (NTF) model is proposed in this paper for spectral unmixing. Different from widely used tensor factorization models, such as canonical polyadic decomposition (CPD) and Tucker decomposition, the proposed method is derived from block term decomposition, which is a combination of CPD and Tucker decomposition. This leads to a more flexible framework for modeling various application-dependent problems. The matrix-vector NTF decomposes a third-order tensor into the sum of several component tensors, with each component tensor being the outer product of a vector (endmember) and a matrix (corresponding abundances). From a formal perspective, this tensor decomposition is consistent with the linear spectral mixture model. From an informative perspective, the structures within the spatial domain, within the spectral domain, and across the spectral-spatial domain are treated interdependently. Experiments demonstrate that the proposed method outperforms several state-of-the-art NMF-based unmixing methods.
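The matrix-vector model itself is compact: the data cube is the sum, over endmembers, of an abundance map (matrix) outer-multiplied with a spectral signature (vector). A synthetic sketch of the forward model it factorizes:

```python
import numpy as np

H, W, B, R = 50, 40, 100, 4          # height, width, bands, endmembers
A = np.random.rand(R, H, W)          # abundance maps (matrices)
S = np.random.rand(R, B)             # endmember spectra (vectors)

# Y[h, w, b] = sum_r A[r, h, w] * S[r, b]: the linear mixture model kept
# as a third-order tensor rather than flattened into a matrix.
Y = np.einsum('rhw,rb->hwb', A, S)
```

Matrix-vector NTF runs this in reverse, recovering nonnegative A and S from an observed Y.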