
Showing papers in "Isprs Journal of Photogrammetry and Remote Sensing in 2019"


Journal ArticleDOI
TL;DR: This review covers nearly every application and technology in the field of remote sensing, ranging from preprocessing to mapping, and presents conclusions on current state-of-the-art methods, a critical discussion of open challenges, and directions for future research.
Abstract: Deep learning (DL) algorithms have seen a massive rise in popularity for remote-sensing image analysis over the past few years. In this study, the major DL concepts pertinent to remote-sensing are introduced, and more than 200 publications in this field, most of which were published during the last two years, are reviewed and analyzed. Initially, a meta-analysis was conducted to analyze the status of remote sensing DL studies in terms of the study targets, DL model(s) used, image spatial resolution(s), type of study area, and level of classification accuracy achieved. Subsequently, a detailed review is conducted to describe/discuss how DL has been applied for remote sensing image analysis tasks including image fusion, image registration, scene classification, object detection, land use and land cover (LULC) classification, segmentation, and object-based image analysis (OBIA). This review covers nearly every application and technology in the field of remote sensing, ranging from preprocessing to mapping. Finally, conclusions regarding the current state-of-the-art methods, a critical discussion of open challenges, and directions for future research are presented.

1,181 citations


Journal ArticleDOI
TL;DR: A comprehensive review of the current state of the art in DL for HSI classification is provided, analyzing the strengths and weaknesses of the most widely used classifiers in the literature and offering an exhaustive comparison of the discussed techniques.
Abstract: Advances in computing technology have fostered the development of new and powerful deep learning (DL) techniques, which have demonstrated promising results in a wide range of applications. Particularly, DL methods have been successfully used to classify remotely sensed data collected by Earth Observation (EO) instruments. Hyperspectral imaging (HSI) is a hot topic in remote sensing data analysis due to the vast amount of information comprised by this kind of imagery, which allows for a better characterization and exploitation of the Earth surface by combining rich spectral and spatial information. However, HSI poses major challenges for supervised classification methods due to the high dimensionality of the data and the limited availability of training samples. These issues, together with the high intraclass variability (and interclass similarity) often present in HSI data, may hamper the effectiveness of classifiers. To overcome these limitations, several DL-based architectures have been recently developed, exhibiting great potential in HSI data interpretation. This paper provides a comprehensive review of the current state of the art in DL for HSI classification, analyzing the strengths and weaknesses of the most widely used classifiers in the literature. For each discussed method, we provide quantitative results using several well-known and widely used HSI scenes, thus providing an exhaustive comparison of the discussed techniques. The paper concludes with some remarks and hints about future challenges in the application of DL techniques to HSI classification. The source codes of the methods discussed in this paper are available from: https://github.com/mhaut/hyperspectral_deeplearning_review .

534 citations


Journal ArticleDOI
TL;DR: An extensive state-of-the-art survey on OBIA techniques is conducted, discussing different segmentation techniques and their applicability to OBIA, and the selection of optimal parameters and algorithms that can generate image objects matching meaningful geographic objects.
Abstract: Image segmentation is a critical and important step in (GEographic) Object-Based Image Analysis (GEOBIA or OBIA). The final feature extraction and classification in OBIA is highly dependent on the quality of image segmentation. Segmentation has been used in remote sensing image processing since the advent of the Landsat-1 satellite. However, after the launch of the high-resolution IKONOS satellite in 1999, the paradigm of image analysis moved from pixel-based to object-based. As a result, the purpose of segmentation has changed from aiding pixel labeling to object identification. Although several articles have reviewed segmentation algorithms, it is unclear if some segmentation algorithms are generally more suited for (GE)OBIA than others. This article conducts an extensive state-of-the-art survey of OBIA techniques and discusses different segmentation techniques and their applicability to OBIA. Conceptual details of those techniques are explained along with their strengths and weaknesses. The available tools and software packages for segmentation are also summarized. The key challenge in image segmentation is to select optimal parameters and algorithms that can generate image objects matching meaningful geographic objects. Recent research indicates an apparent movement towards the improvement of segmentation algorithms, aiming at more accurate, automated, and computationally efficient techniques.

325 citations


Journal ArticleDOI
Xin Huang1, Ying Wang1
TL;DR: In this paper, Huang and Wang investigated the relationship between 2D/3D urban morphology and summer daytime LST in Wuhan, a representative megacity in Central China known for its extremely hot summer weather, by adopting high-resolution remote sensing data and geographical information data.
Abstract: The urban heat island (UHI) effect is an increasingly serious problem in urban areas. Information on the driving forces of intra-urban temperature variation is crucial for ameliorating the urban thermal environment. Although prior studies have suggested that urban morphology (e.g., landscape pattern, land-use type) can significantly affect land surface temperature (LST), few studies have explored the comprehensive effect of 2D and 3D urban morphology on LST in different urban functional zones (UFZs), especially at a fine scale. Therefore, in this research, we investigated the relationship between 2D/3D urban morphology and summer daytime LST in Wuhan, a representative megacity in Central China known for its extremely hot summer weather, by adopting high-resolution remote sensing data and geographical information data. The “urban morphology” in this study consists of 2D urban morphological parameters, 3D urban morphological parameters, and UFZs. Our results show that: (1) The LST is significantly related to 2D and 3D urban morphological parameters, and a scattered distribution of high-rise buildings can facilitate the mitigation of LST. Although sky view factor (SVF) is an important measure of 3D urban geometry, its influence on LST is complicated and context-dependent. (2) Trees are the most influential factor in reducing LST, and their cooling efficiency mainly depends on their proportions. The fragmented and irregular distribution of grass/shrubs also plays a significant role in alleviating LST. (3) With respect to UFZs, the residential zone is the largest heat source, whereas the highest LST appears in commercial and industrial zones. (4) Results of the multivariate regression and variation partitioning indicate that the relative importance of 2D and 3D urban morphological parameters on LST varies among different UFZs, and 2D morphology outperforms 3D morphology in LST modulation. The results are generally consistent across spring, summer and autumn. These findings can provide insights for urban planners and designers on how to mitigate the surface UHI (SUHI) effect via rational landscape design and urban management during summer daytime.

241 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel semi-supervised cross-modality learning framework, called learnable manifold alignment (LeMA), which learns a joint graph structure directly from the data instead of using a given fixed graph defined by a Gaussian kernel function.
Abstract: In this paper, we aim at tackling a general but interesting cross-modality feature learning question in the remote sensing community: can a limited amount of highly-discriminative (e.g., hyperspectral) training data improve the performance of a classification task that uses a large amount of poorly-discriminative (e.g., multispectral) data? Traditional semi-supervised manifold alignment methods do not perform sufficiently well for such problems, since hyperspectral data are far more expensive to collect at a large scale than multispectral data, given the trade-off between time and efficiency. To this end, we propose a novel semi-supervised cross-modality learning framework, called learnable manifold alignment (LeMA). LeMA learns a joint graph structure directly from the data instead of using a given fixed graph defined by a Gaussian kernel function. With the learned graph, we can further capture the data distribution by graph-based label propagation, which enables finding a more accurate decision boundary. Additionally, an optimization strategy based on the alternating direction method of multipliers (ADMM) is designed to solve the proposed model. Extensive experiments on two hyperspectral-multispectral datasets demonstrate the superiority and effectiveness of the proposed method in comparison with several state-of-the-art methods.
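As a concrete illustration of the graph-based label propagation step mentioned above, the following minimal NumPy sketch spreads labels from a few labeled samples to unlabeled ones over an affinity graph. It is not the LeMA model: LeMA learns the graph jointly with the subspace, whereas here a fixed affinity matrix W is assumed.

```python
# Minimal sketch of graph-based label propagation (not LeMA itself);
# W is assumed to be a precomputed symmetric affinity matrix.
import numpy as np

def label_propagation(W, Y, alpha=0.9, n_iter=100):
    """W: (n, n) symmetric affinities; Y: (n, c) one-hot labels,
    with all-zero rows for unlabeled samples."""
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d) + 1e-12)  # normalized graph: D^-1/2 W D^-1/2
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        F = alpha * S @ F + (1 - alpha) * Y  # diffuse labels, re-anchor known ones
    return F.argmax(axis=1)                  # predicted class per sample
```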

223 citations


Journal ArticleDOI
TL;DR: Transfer learning with FCNs proves extremely valuable for slum mapping in various satellite images, retrieving information on small-scale urban structures such as slum patches even in imagery of decametric resolution.
Abstract: Unprecedented urbanization, particularly in countries of the global south, results in informal urban development processes, especially in megacities. With an estimated 1 billion slum dwellers globally, the United Nations have made the fight against poverty the number one sustainable development goal. To provide better infrastructure and thus a better life to slum dwellers, detailed information on the spatial location and size of slums is of crucial importance. In the past, remote sensing has proven to be an extremely valuable and effective tool for mapping slums. The machine learning mapping approaches used so far, however, made it necessary to invest a lot of effort in training the models. Recent advances in deep learning allow for transferring trained fully convolutional networks (FCN) from one data set to another. Thus, in our study we aim at analyzing the transfer learning capabilities of FCNs for slum mapping in various satellite images. A model trained on very high resolution optical satellite imagery from QuickBird is transferred to Sentinel-2 and TerraSAR-X data. While free-of-charge Sentinel-2 data is widely available, its comparably lower resolution makes slum mapping a challenging task. TerraSAR-X data, on the other hand, has a higher resolution and is considered a powerful data source for intra-urban structure analysis. Due to the different image characteristics of SAR compared to optical data, however, transferring the model could not improve the performance of semantic segmentation, but we observe very high accuracies for mapped slums in the optical data: the QuickBird image obtains 86–88% (positive predictive value and sensitivity), and a significant increase for Sentinel-2 when applying transfer learning can be observed (from 38% to 55% and from 79% to 85% for PPV and sensitivity, respectively). Using transfer learning proves extremely valuable in retrieving information on small-scale urban structures such as slum patches, even in satellite images of decametric resolution.
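As a rough sketch of the transfer-learning workflow described above (and not the authors' implementation), the PyTorch snippet below fine-tunes a generic FCN on patches from a second sensor while freezing the pretrained encoder. The two-class slum/background setup, the learning rate, and the commented-out weight file are illustrative assumptions.

```python
# Hedged sketch: fine-tune an FCN trained on one sensor using another sensor's data.
import torch
import torch.nn as nn
from torchvision.models.segmentation import fcn_resnet50

model = fcn_resnet50(num_classes=2)  # slum vs. background (illustrative)
# model.load_state_dict(torch.load("quickbird_fcn.pt"))  # hypothetical source-sensor weights
for p in model.backbone.parameters():
    p.requires_grad = False          # keep generic features, adapt only the head
optimizer = torch.optim.Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def finetune_step(images, masks):
    """images: (B, 3, H, W) target-sensor patches; masks: (B, H, W) labels."""
    optimizer.zero_grad()
    logits = model(images)["out"]    # (B, 2, H, W)
    loss = criterion(logits, masks)
    loss.backward()
    optimizer.step()
    return loss.item()
```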

221 citations


Journal ArticleDOI
Zhiwei Li1, Huanfeng Shen1, Qing Cheng1, Yuhao Liu1, Shucheng You, Zongyi He1 
TL;DR: The experimental results show that MSCFF achieves a higher accuracy than the traditional rule-based cloud detection methods and the state-of-the-art deep learning models, especially in areas covered by bright surfaces.
Abstract: Cloud detection is an important preprocessing step for the precise application of optical satellite imagery. In this paper, we propose a deep learning based cloud detection method named multi-scale convolutional feature fusion (MSCFF) for remote sensing images of different sensors. In the network architecture of MSCFF, the symmetric encoder-decoder module, which provides both local and global context by densifying feature maps with trainable convolutional filter banks, is utilized to extract multi-scale and high-level spatial features. The feature maps of multiple scales are then up-sampled and concatenated, and a novel multi-scale feature fusion module is designed to fuse the features of different scales for the output. The two output feature maps of the network are cloud and cloud shadow maps, which are in turn fed to binary classifiers outside the model to obtain the final cloud and cloud shadow mask. The MSCFF method was validated on hundreds of globally distributed optical satellite images, with spatial resolutions ranging from 0.5 to 50 m, including Landsat-5/7/8, Gaofen-1/2/4, Sentinel-2, Ziyuan-3, CBERS-04, Huanjing-1, and collected high-resolution images exported from Google Earth. The experimental results show that MSCFF achieves a higher accuracy than the traditional rule-based cloud detection methods and the state-of-the-art deep learning models, especially in areas covered by bright surfaces. The effectiveness of MSCFF means that it holds great promise for practical cloud detection in multiple types of medium- and high-resolution remote sensing images. Our established global high-resolution cloud detection validation dataset has been made available online (http://sendimage.whu.edu.cn/en/mscff/).
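The fusion step described above (upsample multi-scale feature maps, concatenate, fuse into the two output maps) can be sketched as follows. This is a schematic PyTorch module under the simplifying assumption that every scale carries the same number of channels; it is not the published MSCFF code.

```python
# Schematic multi-scale feature fusion (not the published MSCFF implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleFusion(nn.Module):
    def __init__(self, channels_per_scale=64, n_scales=3, out_maps=2):
        super().__init__()
        # 1x1 conv fuses the concatenated scales into cloud / cloud-shadow maps
        self.fuse = nn.Conv2d(channels_per_scale * n_scales, out_maps, kernel_size=1)

    def forward(self, feats):
        """feats: list of (B, C, Hi, Wi) maps from coarse to fine scales."""
        size = feats[-1].shape[-2:]  # upsample everything to the finest resolution
        up = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
              for f in feats]
        return self.fuse(torch.cat(up, dim=1))
```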

194 citations


Journal ArticleDOI
TL;DR: A deep learning architecture to combine information coming from S1 and S2 time series, namely TWINNS (TWIn Neural Networks for Sentinel data), able to discover spatial and temporal dependencies in both types of SITS is proposed.
Abstract: The huge amount of data currently produced by modern Earth Observation (EO) missions has allowed for the design of advanced machine learning techniques able to support complex Land Use/Land Cover (LULC) mapping tasks. The Copernicus programme developed by the European Space Agency provides, with missions such as Sentinel-1 (S1) and Sentinel-2 (S2), radar and optical (multi-spectral) imagery, respectively, at 10 m spatial resolution with a revisit time of around 5 days. Such high temporal resolution allows the collection of Satellite Image Time Series (SITS) that support a plethora of Earth surface monitoring tasks. How to effectively combine the complementary information provided by such sensors remains an open problem in the remote sensing field. In this work, we propose a deep learning architecture to combine information coming from S1 and S2 time series, namely TWINNS (TWIn Neural Networks for Sentinel data), able to discover spatial and temporal dependencies in both types of SITS. The proposed architecture is devised to boost the land cover classification task by leveraging two levels of complementarity, i.e., the interplay between radar and optical SITS as well as the synergy between spatial and temporal dependencies. Experiments carried out on two study sites characterized by different land cover characteristics (i.e., the Koumbia site in Burkina Faso and Reunion Island, an overseas department of France in the Indian Ocean) demonstrate the significance of our proposal.

172 citations


Journal ArticleDOI
TL;DR: In this paper, the authors evaluated the use of image textures, VIs, and combinations thereof to make multiple temporal estimates and maps of AGB covering three winter-wheat growth stages.
Abstract: When dealing with multiple growth stages, estimates of above-ground biomass (AGB) based on optical vegetation indices (VIs) are difficult for two reasons: (i) optical VIs saturate at medium-to-high canopy cover, and (ii) organs that grow vertically (e.g., biomass of reproductive organs and stems) are difficult to detect by canopy spectral VIs. Although several significant improvements have been made for estimating AGB by using narrow-band hyperspectral VIs, synthetic aperture radar, light detection and ranging, the crop surface model technique, and combinations thereof, applications of these new techniques have been limited by cost, availability, data-processing difficulties, and high dimensionality. The present study thus evaluates the use of ultrahigh-ground-resolution image textures, VIs, and combinations thereof to make multiple temporal estimates and maps of AGB covering three winter-wheat growth stages. The selected gray-tone spatial-dependence matrix-based image textures (e.g., variance, entropy, data range, homogeneity, second moment, dissimilarity, contrast, correlation) are calculated from 1-, 2-, 5-, 10-, 15-, 20-, 25-, and 30-cm-ground-resolution images acquired by using an inexpensive RGB sensor mounted on an unmanned aerial vehicle (UAV). Optical-VI data were obtained by using a ground spectrometer, complementing the UAV-acquired RGB images. The accuracy of AGB estimates based on optical VIs varies, with validation R2: 0.59–0.78, root mean square error (RMSE): 1.22–1.59 t/ha, and mean absolute error (MAE): 1.03–1.27 t/ha. The most accurate AGB estimate was obtained by combining image textures and VIs, which gave R2: 0.89, MAE: 0.67 t/ha, and RMSE: 0.82 t/ha. The results show that (i) the eight selected textures from ultrahigh-ground-resolution images were significantly related to AGB; (ii) the combined use of image textures from 1- to 30-cm-ground-resolution images and VIs can improve the accuracy of AGB estimates as compared with using only optical VIs or image textures alone; (iii) high AGB values from winter-wheat reproductive growth stages can be accurately estimated by using this method; and (iv) high estimates of winter-wheat AGB (8–14 t/ha) using the proposed combined method (DIS1, SE30, B460, B560, B670, EVI2 using MSR) show a 22.63% (nRMSE) improvement compared with using only spectral VIs (LCI, NDVI using MSR), and a 21.24% (nRMSE) improvement compared with using only image textures (COR1, DIS1, SE30, EN30 using MSR). Thus, the combined use of image textures and VIs can help improve estimates of AGB under conditions of high canopy coverage.
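The gray-tone spatial-dependence (co-occurrence) textures listed above are straightforward to compute with scikit-image. The sketch below derives several of the named features for a single image window; window size, distances, and angles are illustrative choices, not the study's settings.

```python
# Sketch: GLCM texture features for one grayscale window (illustrative settings).
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_textures(window, distances=(1,), angles=(0.0, np.pi / 2)):
    """window: 2-D uint8 array, e.g., a patch of one band of a UAV RGB image."""
    glcm = graycomatrix(window, distances=distances, angles=angles,
                        levels=256, symmetric=True, normed=True)
    feats = {p: float(graycoprops(glcm, p).mean())
             for p in ("contrast", "dissimilarity", "homogeneity",
                       "ASM", "correlation")}       # ASM = (angular) second moment
    ent = -(glcm * np.log2(glcm + 1e-12)).sum(axis=(0, 1))
    feats["entropy"] = float(ent.mean())            # averaged over distance/angle pairs
    return feats

window = (np.random.rand(64, 64) * 255).astype(np.uint8)  # stand-in for real data
print(glcm_textures(window))
```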

165 citations


Journal ArticleDOI
TL;DR: In this paper, the reliability and robustness of tree height observations obtained via a conventional field inventory, airborne laser scanning (ALS) and terrestrial laser scanning (TLS) were investigated.
Abstract: Quantitative comparisons of tree height observations from different sources are scarce due to the difficulties in effective sampling. In this study, the reliability and robustness of tree height observations obtained via a conventional field inventory, airborne laser scanning (ALS) and terrestrial laser scanning (TLS) were investigated. A carefully designed non-destructive experiment was conducted that included 1174 individual trees in 18 sample plots (32 m × 32 m) in a Scandinavian boreal forest. The point density of the ALS data was approximately 450 points/m2. The TLS data were acquired with multi-scans from the center and the four quadrant directions of the sample plots. Both the ALS and TLS data represented cutting-edge point cloud products. Tree heights were manually measured from the ALS and TLS point clouds with the aid of existing tree maps. Therefore, the evaluation results reveal the capacities of the applied laser scanning (LS) data while excluding the influence of data-processing approaches such as individual tree detection. The reliability and robustness of different tree height sources were evaluated through a cross-comparison of the ALS-, TLS-, and field-based tree heights. Compared to ALS and TLS, field measurements were more sensitive to stand complexity, crown classes, and species. Overall, field measurements tend to overestimate the height of tall trees, especially tall trees in the codominant crown class. In dense stands, high uncertainties also exist in the field-measured heights for small trees in the intermediate and suppressed crown classes. The ALS-based tree height estimates were robust across all stand conditions. The taller the tree, the more reliable was the ALS-based tree height. The highest uncertainty in ALS-based tree heights came from trees in the intermediate crown class, due to the difficulty of identifying treetops. When using TLS, reliable tree heights can be expected for trees lower than 15–20 m in height, depending on the complexity of forest stands. The advantage of LS systems was the robustness of the geometric accuracy of the data. The greatest challenges of the LS techniques in measuring individual tree heights lie in the occlusion effects, which lead to omissions of trees in intermediate and suppressed crown classes in ALS data and incomplete crowns of tall trees in TLS data.

154 citations


Journal ArticleDOI
TL;DR: In this article, the authors examined the potential of integrating synthetic aperture radar (SAR, Sentinel-1) and optical remote sensing (Landsat-8 and Sentinel-2) data to monitor the conditions of a native pasture and an introduced pasture in Oklahoma, USA.
Abstract: Grassland degradation has accelerated in recent decades in response to increased climate variability and human activity. Rangeland and grassland conditions directly affect forage quality, livestock production, and regional grassland resources. In this study, we examined the potential of integrating synthetic aperture radar (SAR, Sentinel-1) and optical remote sensing (Landsat-8 and Sentinel-2) data to monitor the conditions of a native pasture and an introduced pasture in Oklahoma, USA. Leaf area index (LAI) and aboveground biomass (AGB) were used as indicators of pasture conditions under varying climate and human activities. We estimated the seasonal dynamics of LAI and AGB using Sentinel-1 (S1), Landsat-8 (LC8), and Sentinel-2 (S2) data, both individually and integrally, applying three widely used algorithms: Multiple Linear Regression (MLR), Support Vector Machine (SVM), and Random Forest (RF). Results indicated that integration of LC8 and S2 data provided sufficient data to capture the seasonal dynamics of grasslands at a 10–30-m spatial resolution and improved assessments of critical phenology stages in both pluvial and dry years. The satellite-based LAI and AGB models developed from ground measurements in 2015 reasonably predicted the seasonal dynamics and spatial heterogeneity of LAI and AGB in 2016. By comparison, the integration of S1, LC8, and S2 has the potential to improve the estimation of LAI and AGB by more than 30% relative to the performance of S1 at low vegetation cover (LAI < 2 m2/m2, AGB < 500 g/m2) and at high vegetation cover (LAI > 2 m2/m2, AGB > 500 g/m2). These results demonstrate the potential of combining S1, LC8, and S2 to monitor grazed tallgrass prairie and provide timely and accurate data for grassland management.

Journal ArticleDOI
TL;DR: A new Fully Convolutional Network (FCN) architecture that can be trained in an end-to-end scheme and is specifically designed for the classification of wetland complexes using polarimetric SAR (PolSAR) imagery is proposed, and experiments demonstrate that the proposed network outperforms the conventional random forest classifier and the state-of-the-art FCNs, both visually and numerically, for wetland mapping.
Abstract: Despite the application of state-of-the-art fully Convolutional Neural Networks (CNNs) for semantic segmentation of very high-resolution optical imagery, their capacity has not yet been thoroughly examined for the classification of Synthetic Aperture Radar (SAR) images. The presence of speckle noise, the absence of efficient feature expression, and the limited availability of labelled SAR samples have hindered the application of the state-of-the-art CNNs for the classification of SAR imagery. This is of great concern for mapping complex land cover ecosystems, such as wetlands, where backscattering/spectrally similar signatures of land cover units further complicate the matter. Accordingly, we propose a new Fully Convolutional Network (FCN) architecture that can be trained in an end-to-end scheme and is specifically designed for the classification of wetland complexes using polarimetric SAR (PolSAR) imagery. The proposed architecture follows an encoder-decoder paradigm, wherein the input data are fed into a stack of convolutional filters (encoder) to extract high-level abstract features and a stack of transposed convolutional filters (decoder) to gradually up-sample the low resolution output to the spatial resolution of the original input image. The proposed network also benefits from recent advances in CNN designs, namely the addition of inception modules and skip connections with residual units. The former component improves multi-scale inference and enriches contextual information, while the latter contributes to the recovery of more detailed information and simplifies optimization. Moreover, an in-depth investigation of the learned features via opening the black box demonstrates that convolutional filters extract discriminative polarimetric features, thus mitigating the limitation of the feature engineering design in PolSAR image processing. Experimental results from full polarimetric RADARSAT-2 imagery illustrate that the proposed network outperforms the conventional random forest classifier and the state-of-the-art FCNs, such as FCN-32s, FCN-16s, FCN-8s, and SegNet, both visually and numerically for wetland mapping.

Journal ArticleDOI
TL;DR: Compared with other cloud removal methods, the results demonstrate that S-NMF-EC is visually and quantitatively effective for the removal of thick clouds, thin clouds, and shadows.
Abstract: In the imaging process of optical remote sensing platforms, clouds are an inevitable barrier to the effective observation of sensors. To recover the original information covered by the clouds and the accompanying shadows, a nonnegative matrix factorization (NMF) and error correction method (S-NMF-EC) is proposed in this paper. Firstly, a cloud-free fused reference image is obtained by a reference image and two or more low-resolution images using the spatial and temporal nonlocal filter-based data fusion model (STNLFFM). Secondly, the cloud-free fused reference image is used to remove the cloud cover of the cloud-contaminated image based on NMF. Finally, the cloud removal result is further improved by error correction. It is worth noting that cloud detection is not required by S-NMF-EC, and the cloud-free information of the cloud-contaminated image is maximally retained. Both simulated and real-data experiments were conducted to validate the proposed S-NMF-EC method. Compared with other cloud removal methods, the results demonstrate that S-NMF-EC is visually and quantitatively effective (correlation coefficients ≥ 0.99) for the removal of thick clouds, thin clouds, and shadows.
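At the heart of the method is the idea that a cloud-free reference constrains a low-rank nonnegative factorization of the scene. The sketch below shows only that core NMF reconstruction (via scikit-learn), without the STNLFFM fusion or the error-correction step; the rank is an illustrative choice.

```python
# Core idea only: low-rank NMF reconstruction from a cloud-free reference
# (no fusion or error correction, unlike the full S-NMF-EC method).
import numpy as np
from sklearn.decomposition import NMF

def nmf_reconstruct(reference, contaminated, rank=8):
    """reference, contaminated: (pixels, bands) nonnegative reflectance matrices."""
    model = NMF(n_components=rank, init="nndsvda", max_iter=500)
    model.fit(reference)                 # learn basis spectra from the cloud-free image
    W = model.transform(contaminated)    # project the contaminated scene onto the basis
    return W @ model.components_         # low-rank, cloud-suppressed estimate
```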

Journal ArticleDOI
TL;DR: In this article, the authors compared the capability of multispectral S2 and airborne hyperspectral remote sensing data for soil organic carbon (SOC) prediction, and investigated the importance of spectral and spatial resolution through the signal-to-noise ratio (SNR), the variable importance in the prediction (VIP) models and the spatial variability of the SOC maps at field and regional scales.
Abstract: The short revisit time of the Sentinel-2 (S2) constellation entails a large availability of remote sensing data, but S2 data have been rarely used to predict soil organic carbon (SOC) content. Thus, this study aims at comparing the capability of multispectral S2 and airborne hyperspectral remote sensing data for SOC prediction, and at the same time we investigated the importance of spectral and spatial resolution through the signal-to-noise ratio (SNR), the variable importance in the prediction (VIP) models, and the spatial variability of the SOC maps at field and regional scales. We tested the capability of the S2 data to predict SOC in croplands with quite different soil types and parent materials in Germany, Luxembourg and Belgium, using multivariate statistics and local ground calibration with soil samples. We split the calibration dataset into sub-regions according to soil maps and built a multivariate regression model within each sub-region. The prediction accuracy obtained by S2 data is generally slightly lower than that retrieved by airborne hyperspectral data. The ratio of performance to deviation (RPD) is higher than 2 at the Luxembourg (2.6) and German (2.2) sites, while it is 1.1 in the Belgian area. After the spectral resampling of the airborne data according to the S2 bands, the prediction accuracy did not change for four out of five of the sub-regions. The variable importance values obtained by S2 data showed the same trend as the airborne VIP values, while the importance of SWIR bands decreased when using airborne data resampled according to the S2 bands. These differences in VIP values can be explained by the loss of spectral resolution as compared to APEX data and the strong difference in terms of SNR between the SWIR region and other spectral regions. The investigation of the spatial variability of the SOC maps derived from S2 data has shown that the spatial resolution of S2 is adequate to describe SOC variability both within fields and at the regional scale.

Journal ArticleDOI
TL;DR: An end-to-end trainable gated residual refinement network (GRRNet) that fuses high-resolution aerial images and LiDAR point clouds for building extraction is developed, showing competitive building extraction performance in comparison with other approaches.
Abstract: Automated extraction of buildings from remotely sensed data is important for a wide range of applications but challenging due to difficulties in extracting semantic features from complex scenes like urban areas. The recently developed fully convolutional neural networks (FCNs) have been shown to perform well on urban object extraction because of their outstanding feature learning and end-to-end pixel labeling abilities. The commonly used feature fusion or skip-connection refine modules of FCNs often overlook the problem of feature selection and could reduce the learning efficiency of the networks. In this paper, we develop an end-to-end trainable gated residual refinement network (GRRNet) that fuses high-resolution aerial images and LiDAR point clouds for building extraction. The modified residual learning network is applied as the encoder part of GRRNet to learn multi-level features from the fusion data, and a gated feature labeling (GFL) unit is introduced to reduce unnecessary feature transmission and refine classification results. The proposed model, GRRNet, is tested on a publicly available dataset with urban and suburban scenes. Comparison results illustrate that GRRNet has competitive building extraction performance in comparison with other approaches. The source code of the developed GRRNet is made publicly available for studies.

Journal ArticleDOI
TL;DR: Experiments conducted on three widely-used hyperspectral image datasets demonstrate that the dimension-reduced features learned by the proposed IMR framework are superior to those of related state-of-the-art HDR approaches in terms of classification or recognition accuracy.
Abstract: Hyperspectral dimensionality reduction (HDR), an important preprocessing step prior to high-level data analysis, has been garnering growing attention in the remote sensing community. Although a variety of methods, both unsupervised and supervised, have been proposed for this task, the discriminative ability in feature representation still remains limited due to the lack of a powerful tool that effectively exploits the labeled and unlabeled data in the HDR process. A semi-supervised HDR approach, called iterative multitask regression (IMR), is proposed in this paper to address this need. IMR aims at learning a low-dimensional subspace by jointly considering the labeled and unlabeled data, and also bridging the learned subspace with two regression tasks: labels and pseudo-labels initialized by a given classifier. More significantly, IMR dynamically propagates the labels on a learnable graph and progressively refines pseudo-labels, yielding a well-conditioned feedback system. Experiments conducted on three widely-used hyperspectral image datasets demonstrate that the dimension-reduced features learned by the proposed IMR framework are superior to those of related state-of-the-art HDR approaches in terms of classification or recognition accuracy.

Journal ArticleDOI
TL;DR: A novel end-to-end network, namely class-wise attention-based convolutional and bidirectional LSTM network (CA-Conv-BiLSTM), for aerial image multi-label classification is proposed, which models the underlying class dependency in both directions and produces structured multiple object labels.
Abstract: Aerial image classification is of great significance in the remote sensing community, and much research has been conducted over the past few years. Among these studies, most focus on categorizing an image into one semantic label, while in the real world an aerial image is often associated with multiple labels, e.g., multiple object-level labels in our case. Besides, a comprehensive picture of the objects present in a given high-resolution aerial image can provide a more in-depth understanding of the studied region. For these reasons, aerial image multi-label classification has been attracting increasing attention. However, one common limitation shared by existing methods in the community is that the co-occurrence relationship of various classes, so-called class dependency, is underexplored and leads to ill-considered decisions. In this paper, we propose a novel end-to-end network, namely the class-wise attention-based convolutional and bidirectional LSTM network (CA-Conv-BiLSTM), for this task. The proposed network consists of three indispensable components: (1) a feature extraction module, (2) a class attention learning layer, and (3) a bidirectional LSTM-based sub-network. Particularly, the feature extraction module is designed for extracting fine-grained semantic feature maps, while the class attention learning layer aims at capturing discriminative class-specific features. As the most important part, the bidirectional LSTM-based sub-network models the underlying class dependency in both directions and produces structured multiple object labels. Experimental results on the UCM multi-label dataset and the DFC15 multi-label dataset validate the effectiveness of our model quantitatively and qualitatively.
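To make the class-dependency idea concrete, the sketch below runs a bidirectional LSTM over per-class feature vectors so that each class decision can see the others. It is a schematic PyTorch head with made-up dimensions, not the CA-Conv-BiLSTM architecture itself.

```python
# Schematic class-dependency head (illustrative dimensions, not the paper's model).
import torch
import torch.nn as nn

class ClasswiseBiLSTMHead(nn.Module):
    def __init__(self, feat_dim=256, hidden=128, n_classes=17):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True, bidirectional=True)
        self.score = nn.Linear(2 * hidden, 1)

    def forward(self, class_feats):
        """class_feats: (B, n_classes, feat_dim) class-specific features,
        e.g., from a class attention layer. Returns (B, n_classes) logits."""
        h, _ = self.rnn(class_feats)      # treat the class axis as a sequence
        return self.score(h).squeeze(-1)  # one presence logit per class
```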

Journal ArticleDOI
Kai Yue1, Lei Yang1, Ruirui Li1, Wei Hu1, Fan Zhang1, Wei Li1 
TL;DR: This paper proposes TreeUNet, a tool that uses an adaptive network to increase the classification rate at the pixel level and shows that the improvement brought by the adaptive Tree-CNN block is significant.
Abstract: Fine-grained semantic segmentation results are typically difficult to obtain for subdecimeter aerial imagery segmentation as a result of complex remote sensing content and optical conditions. Recently, convolutional neural networks (CNNs) have shown outstanding performance on this task. Although many deep neural network structures and techniques have been applied to improve accuracy, few have attended to improving the differentiation of easily confused classes. In this paper, we propose TreeUNet, a tool that uses an adaptive network to increase the classification rate at the pixel level. Specifically, based on a deep semantic model infrastructure, a Tree-CNN block in which each node represents a ResNeXt unit is constructed adaptively in accordance with the confusion matrix and the proposed TreeCutting algorithm. By transmitting feature maps through concatenating connections, the Tree-CNN block fuses multiscale features and learns the best weights for the model. In experiments on the ISPRS two-dimensional Vaihingen and Potsdam semantic labelling datasets, the results obtained by TreeUNet are competitive among published state-of-the-art methods. Detailed comparison and analysis show that the improvement brought by the adaptive Tree-CNN block is significant.

Journal ArticleDOI
TL;DR: It is shown that multi-temporal intensity (pre- and co-event) plays the most important role in urban flood detection and an active self-learning convolution neural network (A-SL CNN) framework is introduced to alleviate the effect of a limited annotated training dataset.
Abstract: Synthetic Aperture Radar (SAR) remote sensing has been widely used for flood mapping and monitoring. Nevertheless, flood detection in urban areas still proves to be particularly challenging by using SAR. In this paper, we assess the roles of SAR intensity and interferometric coherence in urban flood detection using multi-temporal TerraSAR-X data. We further introduce an active self-learning convolution neural network (A-SL CNN) framework to alleviate the effect of a limited annotated training dataset. The proposed framework selects informative unlabeled samples based on a temporal-ensembling CNN model. These samples are subsequently pseudo-labeled by a multi-scale spatial filter. Consistency regularization is introduced to penalize incorrect labels caused by pseudo-labeling. We show results for a case study that is centered on flooded areas in Houston, USA, during Hurricane Harvey in August 2017. Our experiments show that multi-temporal intensity (pre- and co-event) plays the most important role in urban flood detection. Adding multi-temporal coherence can increase the reliability of the inundation map considerably. Meanwhile, encouraging results are achieved by the proposed A-SL CNN framework: the κ statistic is improved from 0.614 to 0.686 in comparison to its supervised counterpart.
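The sample-selection step of such a self-learning scheme can be sketched as follows: an exponential moving average over past softmax outputs (the temporal-ensembling idea) is thresholded to pick confident samples for pseudo-labeling. The threshold and smoothing factor are illustrative, not the paper's values.

```python
# Sketch of confident-sample selection via temporal ensembling (illustrative values).
import numpy as np

def select_pseudo_labels(prob_history, alpha=0.6, conf_thresh=0.95):
    """prob_history: (epochs, n_samples, n_classes) softmax outputs over training."""
    ema = np.zeros_like(prob_history[0])
    for p in prob_history:                     # ensemble predictions over epochs
        ema = alpha * ema + (1 - alpha) * p
    ema /= 1 - alpha ** len(prob_history)      # startup-bias correction
    confidence = ema.max(axis=1)
    keep = confidence > conf_thresh            # only confident samples get pseudo-labels
    return np.nonzero(keep)[0], ema.argmax(axis=1)[keep]
```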

Journal ArticleDOI
TL;DR: A novel method is presented for reconstructing parametric, volumetric, multi-story building models from unstructured, unfiltered indoor point clouds with oriented normals by means of solving an integer linear optimization problem.
Abstract: We present a novel method for reconstructing parametric, volumetric, multi-story building models from unstructured, unfiltered indoor point clouds with oriented normals by means of solving an integer linear optimization problem. Our approach overcomes limitations of previous methods in several ways: First, we drop assumptions about the input data such as the availability of separate scans as an initial room segmentation. Instead, a fully automatic room segmentation and outlier removal is performed on the unstructured point clouds. Second, restricting the solution space of our optimization approach to arrangements of volumetric wall entities representing the structure of a building enforces a consistent model of volumetric, interconnected walls fitted to the observed data instead of unconnected, paper-thin surfaces. Third, we formulate the optimization as an integer linear programming problem which allows for an exact solution instead of the approximations achieved with most previous techniques. Lastly, our optimization approach is designed to incorporate hard constraints which were difficult or even impossible to integrate before. We evaluate and demonstrate the capabilities of our proposed approach on a variety of complex real-world point clouds.
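To give a feel for the optimization, the toy program below selects among candidate wall entities with binary variables, trading point support against model complexity, and adds one mutual-exclusion constraint. All scores and constraints are made up; the paper's actual formulation over wall arrangements is far richer.

```python
# Toy integer linear program in the spirit of the wall-selection formulation
# (made-up scores/constraints; requires the PuLP package).
import pulp

support = [120, 80, 95, 10, 60]   # points explained by each candidate wall (invented)
penalty = 30                      # complexity cost per selected wall (invented)

prob = pulp.LpProblem("wall_selection", pulp.LpMaximize)
x = [pulp.LpVariable(f"wall_{i}", cat="Binary") for i in range(len(support))]
prob += pulp.lpSum((support[i] - penalty) * x[i] for i in range(len(support)))
prob += x[0] + x[1] <= 1          # e.g., two overlapping wall hypotheses are exclusive
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([int(v.value()) for v in x])  # exact 0/1 selection, here [1, 0, 1, 0, 1]
```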

Journal ArticleDOI
TL;DR: A novel graph convolution is introduced by converting convolution from the vertex domain into a point-wise product in the Fourier domain using the graph Fourier transform and the convolution theorem, achieving a significant improvement over existing methods.
Abstract: Machine learning methods, specifically convolutional neural networks (CNNs), have emerged as an integral part of scientific research in many disciplines. However, these powerful methods often fail to perform pattern analysis and knowledge mining with spatial vector data because, in most cases, such data do not have underlying grid-like or array structures and can only be modeled as graph structures. The present study introduces a novel graph convolution by converting it from the vertex domain into a point-wise product in the Fourier domain using the graph Fourier transform and the convolution theorem. In addition, the graph convolutional neural network (GCNN) architecture is proposed to analyze graph-structured spatial vector data. The focus of this study is the classical task of building pattern classification, which remains limited by the use of design rules and manually extracted features for specific patterns. The spatial vector data representing grouped buildings are modeled as graphs, and indices for the characteristics of individual buildings are investigated to collect the input variables. The pattern features of these graphs are directly extracted by training labeled data. Experiments confirmed that the GCNN produces satisfactory results in terms of identifying regular and irregular patterns, and thus achieves a significant improvement over existing methods. In summary, the GCNN has considerable potential for the analysis of graph-structured spatial vector data as well as scope for further improvement.
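The spectral graph convolution described above can be written in a few lines: project vertex features onto the Laplacian eigenbasis, multiply point-wise by a spectral filter, and project back. The NumPy sketch below uses a dense eigendecomposition for clarity (fine for small building graphs) and an illustrative low-pass filter.

```python
# Sketch: graph convolution as a point-wise product in the graph Fourier domain.
import numpy as np

def spectral_graph_conv(W, X, filter_fn):
    """W: (n, n) adjacency of the building graph; X: (n, f) vertex features;
    filter_fn: function of the Laplacian eigenvalues (the spectral filter)."""
    L = np.diag(W.sum(axis=1)) - W          # combinatorial graph Laplacian
    lam, U = np.linalg.eigh(L)              # graph Fourier basis
    X_hat = U.T @ X                         # graph Fourier transform of the signal
    return U @ (filter_fn(lam)[:, None] * X_hat)  # filter point-wise, invert GFT

# e.g., a low-pass filter that smooths features across connected buildings:
# out = spectral_graph_conv(W, X, lambda lam: np.exp(-0.5 * lam))
```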

Journal ArticleDOI
TL;DR: This study revealed that the CNN classifier performed particularly well for the specific LCZ classes in which buildings are mixed with trees, or in which buildings or plants are sparsely distributed, providing a basis for guiding future LCZ classification using deep learning.
Abstract: The Local Climate Zone (LCZ) scheme is a classification system providing a standardization framework to present the characteristics of urban forms and functions, especially for urban heat island (UHI) research. Landsat-based 100 m resolution LCZ maps have been classified by the World Urban Database and Portal Tool (WUDAPT) method using a random forest (RF) machine learning classifier. Some studies have proposed modified RF and convolutional neural network (CNN) approaches. This study aims to compare CNN with an RF classifier for LCZ mapping in great detail. We designed five schemes (three RF-based schemes (S1–S3) and two CNN-based ones (S4–S5)), which consist of various combinations of input features from bitemporal Landsat 8 data over four global megacities: Rome, Hong Kong, Madrid, and Chicago. Among the five schemes, the CNN-based one incorporating larger neighborhood information showed the best classification performance. When compared to the WUDAPT workflow, the overall accuracies for entire land cover classes (OA) and for urban LCZ types (i.e., LCZ1-10; OAurb) increased by about 6–8% and 10–13%, respectively, for the four cities. The transferability of LCZ models for the four cities was evaluated, showing that CNN consistently resulted in higher accuracy (increased by about 7–18% and 18–29% for OA and OAurb, respectively) than RF. This study revealed that the CNN classifier performed particularly well for the specific LCZ classes in which buildings were mixed with trees, or in which buildings or plants were sparsely distributed. The research findings can provide a basis for guidance of future LCZ classification using deep learning.

Journal ArticleDOI
TL;DR: In this paper, the authors performed a comprehensive assessment of WorldView-3 images acquired in the dry and wet seasons for tree species discrimination in tropical semi-deciduous forests, and applied an individual tree crown (ITC)-based approach that employed pan-sharpened VNIR bands and gray level co-occurrence matrix texture features.
Abstract: Tropical forest conservation and management can significantly benefit from information about the spatial distribution of tree species. Very-high resolution (VHR) spaceborne platforms have been hailed as a promising technology for mapping tree species over broad spatial extents. WorldView-3, the most advanced VHR sensor, provides spectral data in 16 bands covering the visible to near-infrared (VNIR, 400–1040 nm) and shortwave-infrared (SWIR, 1210–2365 nm) wavelength ranges. It also collects images at unprecedented levels of detail using a panchromatic band with 0.3-m spatial resolution. However, the potential of WorldView-3 at its full spectral and spatial resolution for tropical tree species classification remains unknown. In this study, we performed a comprehensive assessment of WorldView-3 images acquired in the dry and wet seasons for tree species discrimination in tropical semi-deciduous forests. Classification experiments were performed using VNIR individually and combined with SWIR channels. To take advantage of the sub-metric resolution of the panchromatic band for classification, we applied an individual tree crown (ITC)-based approach that employed pan-sharpened VNIR bands and gray level co-occurrence matrix texture features. We determined whether the combination of images from the two annual seasons improves the classification accuracy. Finally, we investigated which plant traits influenced species detection. The new SWIR sensing capabilities of WorldView-3 increased the average producer’s accuracy by up to 7.8% by enabling the detection of non-photosynthetic vegetation within ITCs. The combination of VNIR bands from the two annual seasons did not improve the classification results when compared to the results obtained using images from each season individually. The use of VNIR bands at their original 1.2-m spatial resolution yielded average producer’s accuracies of 43.1 ± 3.1% and 38.8 ± 3% in the wet and dry seasons, respectively. The ITC-based approach improved the accuracy to 70 ± 8% in the wet and 68.4 ± 7.4% in the dry season. Texture analysis of the panchromatic band enabled the detection of species-specific differences in crown structure, which improved species detection. The use of texture analysis, pan-sharpening, and ITC delineation is a potential approach to perform tree species classification in tropical forests with WorldView-3 satellite images.


Journal ArticleDOI
TL;DR: In this article, a residuals corrected geographically weighted regression model (GWRc) was proposed to generate DMSP-like VIIRS data, which can be used to extend NTL time series, and in conjunction with the upcoming yearly VIIRS and Black Marble daily NTL data, it is possible to support long-term NTL-based studies such as monitoring light pollution in ecosystems, and mapping human activities.
Abstract: Night-time light (NTL) data provides a great opportunity to monitor human activities and settlements. Currently, global-scale NTL data are acquired by two satellite sensors, i.e., DMSP-OLS and VIIRS, but the data collected by the satellites are not compatible. To address this issue, we proposed a method for generating long-term and consistent NTL data. First, a logistic model was employed to estimate and smooth the missing DMSP-OLS data. Second, the Lomb-Scargle Periodogram technique was used to statistically examine the presence of seasonality of monthly VIIRS time series. The seasonal effect, noisy and unstable observations in VIIRS were eliminated by the BFAST time-series decomposition algorithm. Then, we proposed a residuals corrected geographically weighted regression model (GWRc) to generate DMSP-like VIIRS data. A consistent NTL time series from 1996 to 2017 was formed by combining the DMSP-OLS and synthetic DMSP-like VIIRS data. Our assessment shows that the proposed GWRc model outperformed existing methods (e.g., power function model), yielding a lower regression RMSE (6.36), a significantly improved pixel-level NTL intensity consistency (SNDI = 82.73, R2 = 0.986) and provided more coherent results when used for urban area extraction. The proposed method can be used to extend NTL time series, and in conjunction with the upcoming yearly VIIRS data and Black Marble daily VIIRS data, it is possible to support long-term NTL-based studies such as monitoring light pollution in ecosystems, and mapping human activities.
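For readers unfamiliar with GWR, the core of the model is ordinary least squares reweighted by a distance kernel around each prediction location; the proposed GWRc then corrects the residuals, a step omitted here. The NumPy sketch below uses a Gaussian kernel with an illustrative bandwidth.

```python
# Bare-bones geographically weighted regression (no residual-correction step).
import numpy as np

def gwr_predict(coords, X, y, coords0, x0, bandwidth=25.0):
    """Fit a locally weighted linear model at coords0 and predict for features x0."""
    d = np.linalg.norm(coords - coords0, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)        # Gaussian distance weights
    Xd = np.column_stack([np.ones(len(X)), X])     # add intercept column
    WX = Xd * w[:, None]
    beta = np.linalg.solve(Xd.T @ WX, WX.T @ y)    # weighted least squares
    return np.concatenate(([1.0], np.atleast_1d(x0))) @ beta
```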

Journal ArticleDOI
TL;DR: A deep learning-based framework for road marking extraction, classification and completion from three-dimensional (3D) mobile laser scanning (MLS) point clouds is presented, which is less sensitive to data quality.
Abstract: Road markings play a critical role in road traffic safety and are one of the most important elements for guiding autonomous vehicles (AVs). High-Definition (HD) maps with accurate road marking information are very useful for many applications ranging from road maintenance, improving navigation, and prediction of upcoming road situations within AVs. This paper presents a deep learning-based framework for road marking extraction, classification and completion from three-dimensional (3D) mobile laser scanning (MLS) point clouds. Compared with existing road marking extraction methods, which are mostly based on intensity thresholds, our method is less sensitive to data quality. We added the step of road marking completion to further optimize the results. At the extraction stage, a modified U-net model was used to segment road marking pixels to overcome the intensity variation, low contrast and other issues. At the classification stage, a hierarchical classification method by integrating multi-scale clustering with Convolutional Neural Networks (CNN) was developed to classify different types of road markings with considerable differences. At the completion stage, a method based on a Generative Adversarial Network (GAN) was developed to complete small-size road markings first, then followed by completing broken lane lines and adding missing markings using a context-based method. In addition, we built a point cloud road marking dataset to train the deep network model and evaluate our method. The dataset contains urban road and highway MLS data and underground parking lot data acquired by our own assembled backpacked laser scanning system. Our experimental results obtained using the point clouds of different scenes demonstrated that our method is very promising for road marking extraction, classification and completion.

Journal ArticleDOI
TL;DR: In this paper, the authors explored the potential of UAS RGB imagery-derived spectral, structural, and volumetric information, as well as a proposed vegetation index weighted canopy volume model (CVMVI) for soybean aboveground biomass (AGB) estimation.
Abstract: Crop biomass estimation with high accuracy at low-cost is valuable for precision agriculture and high-throughput phenotyping. Recent technological advances in Unmanned Aerial Systems (UAS) significantly facilitate data acquisition at low-cost along with high spatial, spectral, and temporal resolution. The objective of this study was to explore the potential of UAS RGB imagery-derived spectral, structural, and volumetric information, as well as a proposed vegetation index weighted canopy volume model (CVMVI) for soybean [Glycine max (L.) Merr.] aboveground biomass (AGB) estimation. RGB images were collected from low-cost UAS throughout the growing season at a field site near Columbia, Missouri, USA. High-density point clouds were produced using the structure from motion (SfM) technique through a photogrammetric workflow based on UAS stereo images. Two-dimensional (2D) canopy structure metrics such as canopy height (CH) and canopy projected basal area (BA), as well as three-dimensional (3D) volumetric metrics such as canopy volume model (CVM) were derived from photogrammetric point clouds. A variety of vegetation indices (VIs) were also extracted from RGB orthomosaics. Then, CVMVI, which combines canopy spectral and volumetric information, was proposed. Commonly used regression models were established based on the UAS-derived information and field-measured AGB with a leave-one-out cross-validation. The results show that: (1) In general, canopy 2D structural metrics CH and BA yielded higher correlation with AGB than VIs. (2) Three-dimensional metrics, such as CVM, that encompass both horizontal and vertical properties of canopy provided better estimates for AGB compared to 2D structural metrics (R2 = 0.849; RRMSE = 18.7%; MPSE = 20.8%). (3) Optimized CVMVI, which incorporates both canopy spectral and 3D volumetric information outperformed the other indices and metrics, and was a better predictor for AGB estimation (R2 = 0.893; RRMSE = 16.3%; MPSE = 19.5%). In addition, CVMVI showed equal prediction power for different genotypes, which indicates its potential for high-throughput soybean biomass estimation. Moreover, a CVMVI based univariate regression model yielded AGB predicting capability comparable to multivariate complex regression models such as stepwise multilinear regression (SMR) and partial least squares regression (PLSR) that incorporate multiple canopy spectral indices and structural metrics. Overall, this study reveals the potential of canopy spectral, structural and volumetric information, and their combination (i.e., CVMVI) for estimations of soybean AGB. CVMVI was shown to be simple but effective in estimating AGB, and could be applied for high-throughput phenotyping and precision agro-ecological applications and management.
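The volumetric metrics above amount to integrating a canopy height model over the plot, optionally weighting each pixel by a vegetation index; the sketch below shows both, with the VI-weighted variant labeled as being only in the spirit of CVMVI (the paper's exact formula is not reproduced here).

```python
# Sketch: canopy volume from a CHM, plus a VI-weighted variant in the spirit of CVMVI.
import numpy as np

def canopy_volume(chm, cell_area=0.01, ground_thresh=0.1):
    """chm: (H, W) canopy heights in meters; cell_area: m^2 per pixel."""
    canopy = np.where(chm > ground_thresh, chm, 0.0)  # suppress ground/noise pixels
    return canopy.sum() * cell_area                   # canopy volume in m^3

def vi_weighted_volume(chm, vi, cell_area=0.01, ground_thresh=0.1):
    """vi: (H, W) vegetation index used as a per-pixel spectral weight."""
    canopy = np.where(chm > ground_thresh, chm, 0.0)
    return (canopy * vi).sum() * cell_area
```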

Journal ArticleDOI
Fan Zhang1, Lun Wu1, Di Zhu1, Di Zhu2, Yu Liu1 
TL;DR: The study shows that street-level imagery, as the counterpart of remote sensing imagery, provides an opportunity to infer fine-scale human activity information of an urban region and bridge gaps between the physical space and human space.
Abstract: Street-level imagery has covered the comprehensive landscape of urban areas. Compared to satellite imagery, this new source of image data has the advantage of fine-grained observations of not only the physical environment but also social sensing. Prior studies using street-level imagery focus primarily on urban physical environment auditing. In this study, we demonstrate the potential usage of street-level imagery in uncovering spatio-temporal urban mobility patterns. Our method assumes that the streetscape depicted in street-level imagery reflects urban functions and that urban streets of similar functions exhibit similar temporal mobility patterns. We present how a deep convolutional neural network (DCNN) can be trained to identify high-level scene features from street view images that can explain up to 66.5% of the hourly variation of taxi trips along the urban road network. The study shows that street-level imagery, as the counterpart of remote sensing imagery, provides an opportunity to infer fine-scale human activity information of an urban region and bridge gaps between the physical space and human space. This approach can therefore facilitate urban environment observation and smart urban planning.

Journal ArticleDOI
TL;DR: This work proposes a recurrent residual network (Re-ResNet) architecture that is capable of learning a joint spectral-spatial-temporal feature representation within a unitized framework, and has the potential to produce consistent-quality urban land cover and LCZ maps on a large scale.
Abstract: The local climate zone (LCZ) scheme was originally proposed to provide an interdisciplinary taxonomy for urban heat island (UHI) studies. In recent years, the scheme has also become a starting point for the development of higher-level products, as the LCZ classes can help provide a generalized understanding of urban structures and land uses. LCZ mapping can therefore theoretically aid in fostering a better understanding of spatio-temporal dynamics of cities on a global scale. However, reliable LCZ maps are not yet available globally. As a first step toward automatic LCZ mapping, this work focuses on LCZ-derived land cover classification, using multi-seasonal Sentinel-2 images. We propose a recurrent residual network (Re-ResNet) architecture that is capable of learning a joint spectral-spatial-temporal feature representation within a unitized framework. To this end, a residual convolutional neural network (ResNet) and a recurrent neural network (RNN) are combined into one end-to-end architecture. The ResNet is able to learn rich spectral-spatial feature representations from single-seasonal imagery, while the RNN can effectively analyze temporal dependencies of multi-seasonal imagery. Cross validations were carried out on a diverse dataset covering seven distinct European cities, and a quantitative analysis of the experimental results revealed that the combined use of the multi-temporal information and Re-ResNet results in an improvement of approximately 7 percentage points in overall accuracy. The proposed framework has the potential to produce consistent-quality urban land cover and LCZ maps on a large scale, to support scientific progress in fields such as urban geography and urban climatology.
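The combination described above (a CNN embeds each seasonal image, an RNN integrates across seasons) can be sketched compactly. The PyTorch module below is a schematic stand-in, with a small plain CNN and a GRU instead of the paper's residual and recurrent blocks; all dimensions are illustrative.

```python
# Schematic CNN + RNN classifier for multi-seasonal patches (not Re-ResNet itself).
import torch
import torch.nn as nn

class CnnRnnClassifier(nn.Module):
    def __init__(self, n_bands=10, emb=64, n_classes=17):
        super().__init__()
        self.cnn = nn.Sequential(                      # per-season spatial encoder
            nn.Conv2d(n_bands, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, emb, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.rnn = nn.GRU(emb, emb, batch_first=True)  # temporal integration
        self.head = nn.Linear(emb, n_classes)

    def forward(self, x):
        """x: (B, T, bands, H, W) multi-seasonal Sentinel-2 patches."""
        B, T = x.shape[:2]
        e = self.cnn(x.flatten(0, 1)).view(B, T, -1)   # embed each season
        _, h = self.rnn(e)
        return self.head(h[-1])                        # logits from the last state
```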

Journal ArticleDOI
TL;DR: A Time-Series Classification approach based on Change Detection (TSCCD) for rapid LULC mapping that uses the Prophet algorithm to detect the ground-cover change-points and perform time-series segmentation in the time dimension, and the DTW algorithm to classify the sub-time series.
Abstract: Land-Use/Land-Cover Time-Series Classification (LULC-TSC) is an important and challenging problem in terrestrial remote sensing. Detecting change-points, dividing the entire time series into multiple invariant subsequences, and classifying the subsequences can improve LULC classification efficiency. Therefore, we have proposed a Time-Series Classification approach based on Change Detection (TSCCD) for rapid LULC mapping that uses the Prophet algorithm to detect the ground-cover change-points and perform time-series segmentation in the time dimension, and the DTW (Dynamic Time Warping) algorithm to classify the sub-time series. Since we can assume that the ground cover remains unchanged in each subsequence, only a one-time training-sample selection and a single LULC classification are needed, which greatly improves work efficiency. Prophet can accurately detect large and subtle changes, capture change direction and change rate, and is highly robust to noise and missing data. DTW is mainly used to improve the accuracy of time-series classification and to resolve the time misalignment problems of ground-cover series data caused by irregular observations or missing values. The results of comparative experiments with BFAST, LandTrendR, and CCDC using simulated time-series showed that TSCCD can detect large and subtle changes and capture change direction and change rate, performing substantially better than the other three contrasting algorithms overall in time-series change detection. Finally, the MODIS (Moderate Resolution Imaging Spectroradiometer) time-series images of Wuhan City from 2000 to 2018 were selected for TSCCD, and the results of China’s national land-use surveys in 2000, 2005, 2008, 2010, 2013, and 2015 were used for cross-validation. The results showed that the classification accuracy of each tested subsequence was higher than 90% and that most Kappa coefficients were greater than 0.9. This means that the proposed TSCCD approach can effectively solve real LULC-TSC problems and has high application value. It can be used for large-area, long time-series LULC classification, which is of great guiding significance for studying global environmental changes, forest-cover changes, and conducting land-use surveys.
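Dynamic time warping, which TSCCD uses to compare sub-time-series despite irregular observations, can be written out in a dozen lines; a nearest-reference classifier over DTW distances follows directly. This is a textbook DTW, not the authors' implementation.

```python
# Textbook DTW distance plus a 1-nearest-reference classifier for segments.
import numpy as np

def dtw(a, b):
    """DTW distance between two 1-D series, tolerant of temporal misalignment."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def classify_segment(segment, references):
    """references: {class_name: reference series}; label by nearest DTW distance."""
    return min(references, key=lambda c: dtw(segment, references[c]))
```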