Showing papers by "Peter Reinartz published in 2017"


Journal ArticleDOI
TL;DR: With its 10-m spatial resolution and 5-day revisit frequency, Sentinel-2 imagery can mitigate the Mediterranean seagrass distribution data gap, enabling retrospective, time- and cost-effective mapping that supports better management and conservation in the future.

114 citations


Journal ArticleDOI
TL;DR: A methodology is proposed that automatically generates a full-resolution binary building mask from a Digital Surface Model (DSM) using a Fully Convolutional Network (FCN) architecture, and is shown to extract accurate building footprints that closely follow the buildings' original shapes.
Abstract: Building detection and footprint extraction are highly demanded for many remote sensing applications. Though most previous works have shown promising results, the automatic extraction of building footprints still remains a nontrivial topic, especially in complex urban areas. Recently developed extensions of the CNN framework made it possible to perform dense pixel-wise classification of input images. Based on these abilities, we propose a methodology which automatically generates a full-resolution binary building mask out of a Digital Surface Model (DSM) using a Fully Convolutional Network (FCN) architecture. The advantage of using the depth information is that it provides geometrical silhouettes and allows a better separation of buildings from background, as well as invariance to illumination and color variations. The proposed framework has mainly two steps. Firstly, the FCN is trained on a large set of patches consisting of normalized DSMs (nDSMs) as inputs and available ground truth building masks as target outputs. Secondly, the predictions generated by the FCN are used as unary terms for a fully connected Conditional Random Field (FCRF), which enables us to create a final binary building mask. A series of experiments demonstrates that our methodology is able to extract accurate building footprints which closely follow the buildings' original shapes. The quantitative and qualitative analysis shows significant improvements of the results in contrast to the multi-layer fully connected network from our previous work.
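A minimal sketch of the two-step idea, assuming PyTorch and not the authors' actual architecture: a tiny fully convolutional network maps a single-channel nDSM patch to per-pixel building scores, and the dense-CRF refinement of the second step is stood in for here by a plain argmax.

# Minimal fully convolutional sketch for nDSM -> building scores (illustrative only).
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
        )
        # 1x1 convolution produces per-pixel class scores at half resolution
        self.classifier = nn.Conv2d(64, n_classes, 1)
        # transposed convolution restores the input resolution
        self.upsample = nn.ConvTranspose2d(n_classes, n_classes, 2, stride=2)

    def forward(self, x):
        return self.upsample(self.classifier(self.encoder(x)))

ndsm_patch = torch.randn(1, 1, 128, 128)   # normalized DSM patch (batch, channel, H, W)
scores = TinyFCN()(ndsm_patch)              # (1, 2, 128, 128) unary scores
building_mask = scores.argmax(dim=1)        # a CRF refinement would replace this argmax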

68 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel feature matching method for nadir or slightly tilted UAV and aerial images that achieves not only decimeter-level registration accuracy, but also global accuracy comparable to that of the reference images.
Abstract: Recent years have witnessed the fast development of UAVs (unmanned aerial vehicles). As an alternative to traditional image acquisition methods, UAVs bridge the gap between terrestrial and airborne photogrammetry and enable flexible acquisition of high resolution images. However, the georeferencing accuracy of UAVs is still limited by the low-performance on-board GNSS and INS. This paper investigates automatic geo-registration of an individual UAV image or UAV image blocks by matching the UAV image(s) with a previously taken georeferenced image, such as an individual aerial or satellite image with a height map attached or an aerial orthophoto with a DSM (digital surface model) attached. As the biggest challenge for matching UAV and aerial images is in the large differences in scale and rotation, we propose a novel feature matching method for nadir or slightly tilted images. The method comprises a dense feature detection scheme, a one-to-many matching strategy and a global geometric verification scheme. The proposed method is able to find thousands of valid matches in cases where SIFT and ASIFT fail. Those matches can be used to geo-register the whole UAV image block towards the reference image data. When the reference images offer high georeferencing accuracy, the UAV images can also be geolocalized in a global coordinate system. A series of experiments involving different scenarios was conducted to validate the proposed method. The results demonstrate that our approach achieves not only decimeter-level registration accuracy, but also global accuracy comparable to that of the reference images.
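A hedged sketch of the matching strategy, with plain SIFT standing in for the paper's dense feature detection and with hypothetical file names: every UAV feature keeps several reference candidates (one-to-many), and a RANSAC homography acts as a simple stand-in for the global geometric verification.

# One-to-many matching with RANSAC verification (SIFT as a stand-in detector).
import cv2
import numpy as np

uav = cv2.imread("uav.png", cv2.IMREAD_GRAYSCALE)               # hypothetical file names
ref = cv2.imread("reference_ortho.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(uav, None)
kp2, des2 = sift.detectAndCompute(ref, None)

# one-to-many: keep the k best reference candidates per UAV feature
matcher = cv2.BFMatcher(cv2.NORM_L2)
candidates = [m for knn in matcher.knnMatch(des1, des2, k=5) for m in knn]

src = np.float32([kp1[m.queryIdx].pt for m in candidates])
dst = np.float32([kp2[m.trainIdx].pt for m in candidates])

# global geometric verification: a RANSAC homography rejects inconsistent candidates
H, inliers = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
print("valid matches:", int(inliers.sum()))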

47 citations


Journal ArticleDOI
TL;DR: A deep learning architecture is proposed to classify hyperspectral remote sensing imagery by joint utilization of spectral-spatial information, and the obtained results indicate the superiority of the proposed spectral-spatial deep learning architecture over conventional classification methods.
Abstract: Classification of hyperspectral remote sensing imagery is one of the most popular topics because of its intrinsic potential to gather spectral signatures of materials and its distinct abilities for object detection and recognition. In the last decade, an enormous number of methods was suggested to classify hyperspectral remote sensing data using spectral features, though some do not use all of the available information and lead to poor classification accuracy; on the other hand, the exploration of deep features has recently received much consideration and has turned into a research hot spot in the geoscience and remote sensing community as a means to enhance classification accuracy. A deep learning architecture is proposed to classify hyperspectral remote sensing imagery by joint utilization of spectral-spatial information. A stacked sparse autoencoder provides unsupervised feature learning to extract high-level feature representations of joint spectral-spatial information; then, a soft classifier is employed to train on the high-level features and to fine-tune the deep learning architecture. Comparative experiments are performed on two widely used hyperspectral remote sensing datasets (Salinas and PaviaU) and a coarse-resolution hyperspectral dataset in the long-wave infrared range. The obtained results indicate the superiority of the proposed spectral-spatial deep learning architecture over conventional classification methods.
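A minimal sketch of the unsupervised pre-training stage under assumed dimensions (200-dimensional spectral-spatial vectors, 60 hidden units), written in PyTorch; sparsity is imposed here by a simple L1 activity penalty, and the stacking plus softmax fine-tuning are only indicated in a comment.

# Sparse autoencoder layer for joint spectral-spatial vectors (illustrative sketch).
import torch
import torch.nn as nn

class SparseAE(nn.Module):
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        h = self.encoder(x)
        return self.decoder(h), h

x = torch.rand(256, 200)                 # 256 pixels, 200-dim spectral-spatial vectors
model = SparseAE(200, 60)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(100):                     # unsupervised pre-training of one layer
    recon, hidden = model(x)
    loss = nn.functional.mse_loss(recon, x) + 1e-3 * hidden.abs().mean()  # L1 sparsity penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
# stacking: train a second AE on 'hidden', then fine-tune with a softmax classifier on top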

30 citations


Journal ArticleDOI
TL;DR: In this paper, the authors evaluated the quantitative application of Planet high resolution imagery for the detection of seagrasses in the Thermaikos Gulf, NW Aegean Sea, Greece.
Abstract: Seagrasses are one of the most productive and widespread yet threatened coastal ecosystems on Earth. Despite their importance, they are declining due to various threats, which are mainly anthropogenic. Lack of data on their distribution hinders any effort to rectify this decline through effective detection, mapping and monitoring. Remote sensing can mitigate this data gap by allowing retrospective quantitative assessment of seagrass beds over large and remote areas. In this paper, we evaluate the quantitative application of Planet high resolution imagery for the detection of seagrasses in the Thermaikos Gulf, NW Aegean Sea, Greece. The low signal-to-noise ratio (SNR), which characterizes spectral bands at shorter wavelengths, prompts the application of Unmixing-based Denoising (UBD) as a pre-processing step for seagrass detection. A total of 15 spectral-temporal patterns is extracted from a Planet image time series to restore the corrupted blue and green bands in the processed Planet image. Subsequently, we implement Lyzenga's empirical water column correction and Support Vector Machines (SVM) to evaluate the quantitative benefits of denoising. Denoising aids detection of the Posidonia oceanica seagrass species by increasing its producer and user accuracy by 31.7% and 10.4%, respectively, with a corresponding increase in its Kappa value from 0.3 to 0.48. In the near future, our objective is to improve accuracies in seagrass detection by applying more sophisticated, analytical water column correction algorithms to Planet imagery, developing time- and cost-effective monitoring of seagrass distribution that will in turn enable the effective management and conservation of these highly valuable and productive ecosystems.
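A hedged numpy sketch of Lyzenga's empirical water column correction as typically applied before an SVM step: the blue/green attenuation ratio is estimated over a uniform sandy bottom at variable depth, and a depth-invariant index is formed; the deep-water values and the sand mask are assumed inputs, not values from the paper.

# Lyzenga depth-invariant index from blue/green bands (illustrative sketch).
import numpy as np

def depth_invariant_index(blue, green, deep_blue, deep_green, sand_mask):
    """blue/green: water-leaving reflectance bands; deep_*: deep-water means."""
    x = np.log(np.clip(blue - deep_blue, 1e-6, None))
    y = np.log(np.clip(green - deep_green, 1e-6, None))
    # attenuation coefficient ratio k_blue/k_green estimated over a uniform
    # bottom type (e.g. sand) at variable depth, following Lyzenga
    xs, ys = x[sand_mask], y[sand_mask]
    a = (np.var(xs, ddof=1) - np.var(ys, ddof=1)) / (2 * np.cov(xs, ys)[0, 1])
    ratio = a + np.sqrt(a * a + 1)
    return x - ratio * y   # depth-invariant feature, fed to the SVM with the other bands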

27 citations


Journal ArticleDOI
TL;DR: The quality of the DSM is found to be robust to variations in image resolution, especially when the forest density is high, and the forest decrease results confirm that, besides aerial photogrammetry data, very high resolution satellite data can deliver results of comparable quality to those derived from LiDAR, followed by TanDEM-X and Cartosat DSMs.
Abstract: Digital surface models (DSMs) derived from spaceborne and airborne sensors enable the monitoring of the vertical structure of forests over large areas. Nevertheless, due to the lack of an objective performance assessment for this task, it is difficult to select the most appropriate data source for DSM generation. In order to fill this gap, this paper performs change detection analysis including forest decrease and tree growth. The accuracy of the DSMs is evaluated by comparison with measured tree heights from inventory plots (field data). In addition, the DSMs are compared with LiDAR data to perform a pixel-wise quality assessment. DSMs from four different satellite stereo sensors (ALOS/PRISM, Cartosat-1, RapidEye and WorldView-2), one satellite InSAR sensor (TanDEM-X), two aerial stereo camera systems (HRSC and UltraCam) and two airborne laser scanning datasets with different point densities are adopted for the comparison. The case study is a complex central European temperate forest close to Traunstein in Bavaria, Germany. As a major experimental result, the quality of the DSM is found to be robust to variations in image resolution, especially when the forest density is high. The forest decrease results confirm that, besides aerial photogrammetry data, very high resolution satellite data, such as WorldView-2, can deliver results of comparable quality to those derived from LiDAR, followed by TanDEM-X and Cartosat DSMs. The quality of the DSMs derived from ALOS and RapidEye data is lower, but the main changes are still correctly highlighted. Moreover, the vertical tree growth and its relationship with tree height are analyzed. The majority of tree heights in the study site are between 15 and 30 m and the periodic annual increments (PAIs) are in the range of 0.30–0.50 m.
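An illustrative sketch of the forest-decrease part of such a change detection, assuming two co-registered DSMs; the 5 m height-loss threshold and the minimum-area filter are assumptions, not the paper's settings.

# Forest-decrease map from two co-registered DSMs (thresholds are assumed).
import numpy as np
from scipy import ndimage

def forest_decrease(dsm_t1, dsm_t2, min_drop=5.0, min_area_px=25):
    """Flag pixels whose height dropped by more than min_drop metres between epochs."""
    drop = (dsm_t1 - dsm_t2) > min_drop
    # remove small spurious patches caused by matching noise
    labels, n = ndimage.label(drop)
    sizes = ndimage.sum(drop, labels, index=np.arange(1, n + 1))
    return np.isin(labels, np.nonzero(sizes >= min_area_px)[0] + 1)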

25 citations


Journal Article
TL;DR: This work proposes a Generative Adversarial Network (GAN) for SAR image generation and tests whether the synthetic data can actually improve classification accuracy; it also investigates the accuracy of large CNN classification models and pre-trained networks for SAR imaging systems.
Abstract: Very High Spatial Resolution (VHSR) large-scale SAR image databases are still an unresolved issue in the Remote Sensing field. In this work, we propose such a dataset and use it to explore patch-based classification in urban and periurban areas, considering 7 distinct semantic classes. In this context, we investigate the accuracy of large CNN classification models and pre-trained networks for SAR imaging systems. Furthermore, we propose a Generative Adversarial Network (GAN) for SAR image generation and test whether the synthetic data can actually improve classification accuracy.
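A minimal DCGAN-style generator/discriminator pair for single-channel 64x64 SAR patches, sketched in PyTorch; the patch size, channel counts and layer layout are assumptions and not the paper's architecture, and the adversarial training loop is omitted.

# Minimal DCGAN-style generator/discriminator for 64x64 SAR patches (assumed sizes).
import torch.nn as nn

def generator(z_dim=100):
    # input: latent vector of shape (N, z_dim, 1, 1)
    return nn.Sequential(
        nn.ConvTranspose2d(z_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(True),  # 4x4
        nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(True),    # 8x8
        nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(True),      # 16x16
        nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.BatchNorm2d(32), nn.ReLU(True),       # 32x32
        nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Tanh())                                # 64x64

def discriminator():
    return nn.Sequential(
        nn.Conv2d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),                       # 32x32
        nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2, inplace=True),  # 16x16
        nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2, inplace=True), # 8x8
        nn.Conv2d(256, 1, 8), nn.Sigmoid())                                               # real/fake score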

24 citations


Journal ArticleDOI
TL;DR: A new approach that detects rough building boundaries (building mask) from Digital Surface Model data and then refines the resulting mask by classifying the geometrical features of the high spatial resolution panchromatic satellite image is proposed.
Abstract: Efficient and fully automatic building outline extraction and simplification methods are highly demanded for three-dimensional model reconstruction tasks. In spite of the efforts put into developing such methods, the results of the recently proposed methods are still not satisfactory, especially for satellite images, due to object complexities and the presence of noise. Dealing with this problem, in this article, we propose a new approach that detects rough building boundaries (building mask) from Digital Surface Model data and then refines the resulting mask by classifying the geometrical features of the high spatial resolution panchromatic satellite image. The refined mask represents finer details of the building outlines, which are close to the original building edges. These outlines are then simplified through a parameterization phase wherein a tracing algorithm detects the building boundary points from the refined masks and a set of line segments is fitted to them. After that, for each building, the existing main orientations are determined based on the length and arc lengths of the building's line segments. Our method is able to determine the multiple main orientations of complex buildings. Through a regularization process, the line segments are then aligned and adjusted according to the building's main orientations. Finally, the adjusted line segments are intersected and connected to each other in order to form a polygon representing the building's outlines. Experimental results demonstrate that the computed building outlines are highly accurate and simple, even for large and complex buildings with inner yards.
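A hedged numpy sketch of the regularization step only: a length-weighted histogram of segment directions yields a dominant orientation, and each fitted segment is snapped to that orientation or its perpendicular; the paper additionally handles multiple main orientations per building, which is not shown here.

# Snap boundary line segments to a building's main orientation (illustrative step).
import numpy as np

def regularize_segments(segments):
    """segments: (N, 2, 2) array of [[x1, y1], [x2, y2]] endpoints."""
    d = segments[:, 1] - segments[:, 0]
    lengths = np.hypot(d[:, 0], d[:, 1])
    angles = np.arctan2(d[:, 1], d[:, 0]) % (np.pi / 2)      # fold directions to [0, pi/2)
    # length-weighted dominant orientation (a single main orientation assumed here)
    hist, edges = np.histogram(angles, bins=90, range=(0, np.pi / 2), weights=lengths)
    main = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])
    # snap every segment direction to the nearest multiple of 90 degrees offset from 'main'
    snapped = np.round((np.arctan2(d[:, 1], d[:, 0]) - main) / (np.pi / 2)) * (np.pi / 2) + main
    centers = segments.mean(axis=1)
    half = 0.5 * lengths[:, None] * np.stack([np.cos(snapped), np.sin(snapped)], axis=1)
    return np.stack([centers - half, centers + half], axis=1)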

20 citations


Journal ArticleDOI
TL;DR: SimGeoI, a simulation framework for the object-related interpretation of optical and SAR images, is introduced as a solution to the problem of sensor-specific geometric distortion in high-resolution satellite data.
Abstract: The successful alignment of optical and synthetic aperture radar (SAR) satellite data requires that we account for the effects of sensor-specific geometric distortion, which is a consequence of the different imaging concepts of the sensors. This paper introduces SimGeoI, a simulation framework for the object-related interpretation of optical and SAR images, as a solution to this problem. Using metainformation from the images and a digital surface model as input, the processor follows the steps of scene definition, ray tracing, image generation, geocoding, interpretation layer generation, and image part extraction. Thereby, for the first time, object-related sections of optical and SAR images are automatically identified and extracted in world coordinates under consideration of three-dimensional object shapes. A case study for urban scenes in Munich and London, based on WorldView-2 images and high-resolution TerraSAR-X data, confirms the potential of SimGeoI in the context of a perspective-independent and object-focused analysis of high-resolution satellite data.

17 citations


Journal ArticleDOI
TL;DR: A new semi-automatic method is proposed to generate training and test patches for each roof type in the library, and using the pre-trained CNN model not only decreases the computation time for training significantly but also increases the classification accuracy.
Abstract: 3D building reconstruction from satellite remote sensing image data is still an active research topic and very valuable for 3D city modelling. The roof model is the most important component for reconstructing the Level of Details 2 (LoD2) of a building in 3D modelling. While the general solution for roof modelling relies on detailed cues (such as lines, corners and planes) extracted from a Digital Surface Model (DSM), the correct detection of the roof type and its modelling can fail due to the low quality of a DSM generated by dense stereo matching. To reduce the dependency of roof modelling on DSMs, pansharpened satellite images are used in addition as a rich source of information. In this paper, two strategies are employed for roof type classification. In the first one, building roof types are classified with a state-of-the-art supervised pre-trained convolutional neural network (CNN) framework. In the second strategy, deep features are extracted from deep layers of different pre-trained CNN models, and an SVM with an RBF kernel is then employed to classify the building roof type. Based on the roof complexity of the scene, a roof library including seven types of roofs is defined. A new semi-automatic method is proposed to generate training and test patches for each roof type in the library. Using the pre-trained CNN model not only decreases the computation time for training significantly but also increases the classification accuracy.
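A hedged sketch of the second strategy, with ResNet-18 from torchvision as a stand-in for the pre-trained CNN and assumed preprocessing: deep features are pooled from the last convolutional stage of the network and fed to an RBF-kernel SVM.

# Deep features from a pre-trained CNN + RBF-kernel SVM (second strategy, sketched).
import torch
import torchvision.models as models
import torchvision.transforms as T
from sklearn.svm import SVC

cnn = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
feature_extractor = torch.nn.Sequential(*list(cnn.children())[:-1]).eval()  # drop the FC head

preprocess = T.Compose([T.Resize((224, 224)), T.ToTensor(),
                        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

def deep_features(pil_patches):
    """pil_patches: list of PIL roof patches cut from the pansharpened image."""
    batch = torch.stack([preprocess(p) for p in pil_patches])
    with torch.no_grad():
        return feature_extractor(batch).flatten(1).numpy()   # (N, 512) descriptors

# train_patches / train_labels (seven roof types) come from the semi-automatic patch generation:
# svm = SVC(kernel="rbf", C=10.0, gamma="scale").fit(deep_features(train_patches), train_labels)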

17 citations


Journal ArticleDOI
TL;DR: In this article, a feature-level fusion strategy is applied based on the extraction of several recently proposed spectral and structural features from hyperspectral and LiDAR data, respectively, in order to optimize classification performance.
Abstract: One of the most sophisticated recent data fusions in remote sensing has involved the use of LiDAR and hyperspectral data. A feature-level fusion strategy is applied, based on the extraction of several recently proposed spectral and structural features from hyperspectral and LiDAR data, respectively. In order to optimize classification performance, feature selection and the determination of classifier parameters are carried out simultaneously. Owing to the complexity of the search space, cuckoo search, a powerful metaheuristic optimization algorithm, is applied. Experiments show that the proposed method can improve the overall classification accuracy by up to 6% with respect to using hyperspectral imagery alone. The obtained results show that the classification improvement for the tree, residential and commercial classes is about 4%, 21% and 35%, respectively.
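The joint optimization can be sketched as a search over a mixed binary-continuous encoding (a feature-subset mask plus the SVM's C and gamma), scored by cross-validation. The snippet below uses plain random search as a much simpler stand-in for the paper's cuckoo search; all parameter ranges are assumptions.

# Joint feature-subset / SVM-parameter search (random search standing in for cuckoo search).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def joint_search(X, y, n_trials=200, rng=np.random.default_rng(0)):
    """X: stacked hyperspectral + LiDAR features, y: class labels."""
    best = (-np.inf, None, None)
    for _ in range(n_trials):
        mask = rng.random(X.shape[1]) < 0.5                              # binary part: feature subset
        if not mask.any():
            continue
        C, gamma = 10 ** rng.uniform(-1, 3), 10 ** rng.uniform(-4, 0)    # continuous part: SVM parameters
        score = cross_val_score(SVC(kernel="rbf", C=C, gamma=gamma),
                                X[:, mask], y, cv=3).mean()
        if score > best[0]:
            best = (score, mask, (C, gamma))
    return best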

Journal ArticleDOI
TL;DR: A validation of Sentinel-2 retrieved water vapor over land is performed by comparing the scene-derived water vapor column with the Aerosol Robotic Network (AERONET) measurement as an independent source.
Abstract: Water vapor is one of the main parameters for atmospheric correction of Sentinel-2 imagery. Together with the aerosol retrieval it determines the accuracy of the surface reflectance product. Since Sentinel-2A and soon Sentinel-2B are operational satellites with a free data policy there is great interest in processing this data and using it for environmental studies. This contribution performs a validation of Sentinel-2 retrieved water vapor over land by comparing the scene-derived water vapor column with the AERONET measurement as an independent source. The validation is performed for a large range of AERONET site elevations and solar zenith angles using the atmospheric precorrected differential absorption (APDA) technique. Results show a high correlation with a low RMSE of about 0.1 cm for water vapor values from 0.2 to 5 cm.

Journal ArticleDOI
TL;DR: An algorithm for robustly fusing digital surface models (DSMs) with different ground sampling distances and confidences, using explicit surface priors to obtain locally smooth surface models is proposed.
Abstract: In this paper, we propose an algorithm for robustly fusing digital surface models (DSMs) with different ground sampling distances and confidences, using explicit surface priors to obtain locally smooth surface models. Robust fusion of the DSMs is achieved by minimizing the L1-distance of each pixel of the solution to each input DSM. This approach is similar to a pixel-wise median, and most outliers are discarded. We further incorporate a local planarity assumption as an additional constraint to the optimization problem, thus reducing the noise compared with pixel-wise approaches. The optimization is also inherently able to include weights for the input data, which makes it easy to mask invalid areas, fuse multiresolution DSMs, and weight the individual inputs. The complete optimization problem is constructed as a variational optimization problem with a convex energy functional, such that the solution is guaranteed to converge toward the global energy minimum. An efficient solver is presented to solve the optimization in reasonable time, e.g., running in real time on standard computer vision camera images. The accuracy of the algorithms and the quality of the resulting fused surface models are evaluated using synthetic data sets and spaceborne data sets from different optical satellite sensors.
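A hedged numpy sketch of the data term only: per pixel, the weighted median of the input heights minimizes the L1 distance to the inputs; the local planarity prior and the variational solver of the paper are omitted here.

# Pixel-wise L1 fusion of DSMs = weighted median per pixel (smoothness prior omitted).
import numpy as np

def fuse_dsms(dsms, weights):
    """dsms: (K, H, W) stack of co-registered DSMs; weights: (K, H, W) confidences (NaN-aware)."""
    stack = np.where(np.isfinite(dsms), dsms, np.nan)
    order = np.argsort(stack, axis=0)                     # NaNs (invalid heights) sort last
    sorted_h = np.take_along_axis(stack, order, axis=0)
    sorted_w = np.take_along_axis(np.where(np.isfinite(dsms), weights, 0.0), order, axis=0)
    cum = np.cumsum(sorted_w, axis=0)
    # weighted median: first index where the cumulative weight reaches half the total
    idx = (cum >= 0.5 * cum[-1]).argmax(axis=0)
    return np.take_along_axis(sorted_h, idx[None], axis=0)[0]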

Journal ArticleDOI
TL;DR: In this paper, a decision-based multi-sensor classification system is proposed to completely use the advantages of both sensors to attain enhanced land-cover classification results, where spectral, textural and spatial features are extracted for the proposed multilevel classification.
Abstract: Multi-sensor data fusion has become more and more popular for classification applications. The fusion of multisource remote-sensing data can provide more information about the same observed site, resulting in a superior comprehension of the scene. In this field of study, the combination of very high-resolution data collected by a digital color camera with new coarse-resolution hyperspectral data in the long-wave infrared range for urban land-cover classification has attracted much consideration and turned into a research hot spot in the image analysis and data fusion research community. In this paper, a decision-based multi-sensor classification system is proposed to make full use of the advantages of both sensors to attain enhanced land-cover classification results. In this context, spectral, textural and spatial features are extracted for the proposed multilevel classification. Then, a land-cover separability preprocessing is employed to identify how the proposed method can fully utilize th...

Journal ArticleDOI
TL;DR: This method, implemented on a standard computer (CPU), exploits a full perspective projection model, provides near real-time 3D pose detection of a satellite for close-range approach and manipulation, and is able to initialize a local tracking method.
Abstract: In this paper we present a learning-based 3D detection of a highly challenging specular object exposed to direct sunlight at very close range. Object detection is one of the most important areas of image processing, and can also be used for the initialization of local visual tracking methods. While object detection in 3D space is generally a difficult problem, it poses even more difficulties when the object is specular and exposed to direct sunlight, as in a space environment. Our solution to such a problem relies on appearance learning of a real satellite mock-up based on vector quantization and a vocabulary tree. Our method, implemented on a standard computer (CPU), exploits a full perspective projection model and provides near real-time 3D pose detection of a satellite for close-range approach and manipulation. The time-consuming parts of the training (feature description, building the vocabulary tree and indexing, depth buffering and back-projection) are performed offline, while fast image retrieval and 3D-2D registration are performed online. In contrast, state-of-the-art image-based 3D pose detection methods are slower on a CPU or assume a weak perspective camera projection model. In our case the dimension of the satellite is larger than the distance to the camera, hence the assumption of the weak perspective model does not hold. To evaluate the proposed method, the appearance of a full-scale mock-up of the rear part of the TerraSAR-X satellite is trained under various illumination conditions and camera views. The training images are captured with a camera mounted on a six-degrees-of-freedom robot, which enables positioning the camera in a desired view, sampled over a sphere. The views that are not within the workspace of the robot are interpolated using image-based rendering. Moreover, we generate ground truth poses to verify the accuracy of the detection algorithm. The achieved results are robust and accurate even under noise due to specular reflection, and are able to initialize a local tracking method.

Journal ArticleDOI
TL;DR: Results show that using TIR besides the VIS image improves the classification accuracy of roads and buildings in urban areas, and the optimum results are obtained with the proposed method, which reaches 94 percent and 92 percent.
Abstract: Recently, the classification of urban areas based on multi-sensor fusion has been widely investigated. In this paper, the potential of fusing visible (VIS) and thermal infrared (TIR) hyperspectral images for the classification of urban areas is evaluated. For this purpose, a comprehensive spatial-spectral feature space is generated which includes a vegetation index, differential morphological profiles (DMP), attribute profiles (AP), texture, geostatistical features, the structural feature set (SFS) and local statistical descriptors from both datasets, in addition to the original datasets. Although the Support Vector Machine (SVM) is an appropriate tool for the classification of high-dimensional feature spaces, its performance is significantly affected by its parameters and the feature space. A cuckoo search (CS) optimization algorithm with mixed binary-continuous coding is proposed for feature selection and SVM parameter determination simultaneously. Moreover, the significance of each selected feature category in the classification of a specific object is verified. Accuracy assessment on two subsets shows that stacking of VIS and TIR bands can improve the classification performance to 87 percent and 82 percent for the two subsets, compared to the VIS image (72 percent and 80 percent) and the TIR image (50 percent and 56 percent). However, the optimum results are obtained with the proposed method, which reaches 94 percent and 92 percent. Furthermore, the results show that using TIR besides the VIS image improves the classification accuracy of roads and buildings in urban areas.

Proceedings ArticleDOI
01 Mar 2017
TL;DR: A Bayesian linear regression method for person density estimation in extremely crowded areas in aerial images is proposed and the effectiveness of the proposed method for crowd density estimation is demonstrated.
Abstract: In this paper, we propose a Bayesian linear regression method for person density estimation in extremely crowded areas in aerial images. The fundamental idea is to learn a mapping function from local features to crowd density. In order to describe the appearances of persons within a crowd in aerial images, local texture features are computed for each small local neighborhood. Then we cast the problem as a linear regression. In order to model the nonlinearity between local features and crowd density, Gaussian basis functions are used and their locations are determined by a k-means clustering. Crowd density can be estimated by Bayesian inference. However, due to the presence of a hyper-prior distribution, variational inference is applied to compute the predictive distribution. Through experiments, the effectiveness of the proposed method for crowd density estimation is demonstrated.
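A hedged sketch of the regression backbone under assumed settings (30 basis functions, a fixed length scale): k-means picks the Gaussian basis centers, an RBF design matrix is built from the local texture features, and scikit-learn's BayesianRidge, which uses evidence maximization rather than the paper's variational inference, yields a predictive mean and standard deviation.

# Crowd-density regression: k-means RBF centers + Bayesian linear regression (sketch).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import BayesianRidge

def rbf_design(features, centers, length_scale=1.0):
    # Gaussian basis function activations for each feature vector
    d2 = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def fit_density_regressor(local_features, densities, n_basis=30):
    """local_features: (N, D) texture descriptors; densities: (N,) persons per cell."""
    centers = KMeans(n_clusters=n_basis, n_init=10, random_state=0).fit(local_features).cluster_centers_
    model = BayesianRidge().fit(rbf_design(local_features, centers), densities)
    return model, centers

# prediction with uncertainty:
# mean, std = model.predict(rbf_design(new_features, centers), return_std=True)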

Proceedings ArticleDOI
01 Jul 2017
TL;DR: The basics and functionality of SimGeoI, a simulation-based framework for the automated interpretation and alignment of optical and SAR remote sensing data, and possible applications of the framework are indicated with results of a case study for Istanbul.
Abstract: This paper presents the basics and functionality of SimGeoI, a simulation-based framework for the automated interpretation and alignment of optical and SAR remote sensing data. SimGeoI has been developed in order to align optical and SAR data based on given geometric information about objects represented by digital surface models. Thereby, the analysis of urban scenes is possible with independence of sensor type and perspective. After a brief introduction of the processor environment, possible applications of the framework are indicated with results of a case study for Istanbul (WorldView-2 and TerraSAR-X data). In this context, opportunities in the context of a joint analysis of high resolution optical and SAR data are addressed, i.e. concerning data fusion, change detection, and machine learning tasks.

Journal ArticleDOI
TL;DR: In this article, UBD is used to enable analysis of a hyperspectral image acquired over a coral reef system in the Red Sea based on derivative features, by forcing each spectrum to a linear combination of other reference spectra.
Abstract: Coral reefs, among the world's most biodiverse and productive submerged habitats, have faced several mass bleaching events due to climate change during the past 35 years. In the course of this century, global warming and ocean acidification are expected to cause corals to become increasingly rare on reef systems. This will result in a sharp decrease in the biodiversity of reef communities and carbonate reef structures. Coral reefs may be mapped, characterized and monitored through remote sensing. Hyperspectral images in particular excel in coral monitoring, as they are characterized by very rich spectral information, which results in a strong discrimination power to characterize a target of interest and separate healthy corals from bleached ones. Being submerged habitats, coral reef systems are difficult to analyse in airborne or satellite images, as the relevant information is conveyed in bands in the blue range, which exhibit a lower signal-to-noise ratio (SNR) with respect to other spectral ranges; furthermore, water absorbs most of the incident solar radiation, further decreasing the SNR. Derivative features, which are important in coral analysis, are greatly affected by the resulting noise in the relevant spectral bands, justifying the need for new denoising techniques able to keep local spatial and spectral features. In this paper, Unmixing-based Denoising (UBD) is used to enable the analysis of a hyperspectral image acquired over a coral reef system in the Red Sea based on derivative features. UBD reconstructs the dataset pixelwise with reduced noise effects, by forcing each spectrum to be a linear combination of other reference spectra, exploiting the high dimensionality of hyperspectral datasets. Results show clear enhancements with respect to traditional denoising methods based on spatial and spectral smoothing, facilitating the coral detection task.
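A minimal sketch of the core UBD idea under the assumption that a small set of reference spectra is already available: each pixel spectrum is re-expressed as a non-negative linear combination of the references, and the reconstruction replaces the noisy spectrum.

# Unmixing-based denoising: reconstruct each spectrum from reference spectra (sketch).
import numpy as np
from scipy.optimize import nnls

def ubd_denoise(cube, references):
    """cube: (H, W, B) hyperspectral image; references: (R, B) reference spectra."""
    H, W, B = cube.shape
    flat = cube.reshape(-1, B)
    out = np.empty(flat.shape, dtype=float)
    for i, spectrum in enumerate(flat):
        abundances, _ = nnls(references.T, spectrum)   # non-negative mixing coefficients
        out[i] = references.T @ abundances             # noise-reduced reconstruction
    return out.reshape(H, W, B)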

Journal ArticleDOI
TL;DR: In this article, a post-classification approach is proposed for building change detection using satellite stereo imagery, where a digital surface model (DSM) is generated from satellite stereo images and further refined by using a segmentation result obtained from the Sobel gradients of the panchromatic image.
Abstract: Automatic extraction of building changes is important for many applications like disaster monitoring and city planning. Although a lot of research work is available based on 2D as well as 3D data, an improvement in accuracy and efficiency is still needed. The introduction of digital surface models (DSMs) to building change detection has strongly improved the resulting accuracy. In this paper, a post-classification approach is proposed for building change detection using satellite stereo imagery. Firstly, DSMs are generated from satellite stereo imagery and further refined by using a segmentation result obtained from the Sobel gradients of the panchromatic image. Besides the refined DSMs, the panchromatic image and the pansharpened multispectral image are used as input features for mean-shift segmentation. The DSM is used to calculate the nDSM, out of which the initial building candidate regions are extracted. The candidate mask is further refined by morphological filtering and by excluding shadow regions. Following this, all segments that overlap with a building candidate region are determined. A building-oriented segment merging procedure is introduced to generate a final building rooftop mask. As the last step, object-based change detection is performed by directly comparing the building rooftops extracted from the pre- and post-event imagery and by fusing the change indicators with the rooftop region map. A quantitative and qualitative assessment of the proposed approach is provided by using WorldView-2 satellite data from Istanbul, Turkey.
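A hedged sketch of the candidate extraction step, with a grey-scale opening standing in for a proper terrain model and assumed thresholds: the nDSM is the DSM minus the terrain estimate, elevated pixels are thresholded, and morphological filtering suppresses small clutter; the shadow exclusion and segment merging of the paper are not shown.

# Building candidate regions from an nDSM (thresholds are assumptions).
import numpy as np
from scipy import ndimage

def building_candidates(dsm, min_height=3.0):
    dtm = ndimage.grey_opening(dsm, size=(75, 75))          # coarse terrain approximation
    ndsm = dsm - dtm
    mask = ndsm > min_height                                # keep elevated objects only
    # morphological filtering removes small clutter; shadow/NDVI exclusion omitted here
    return ndimage.binary_opening(mask, structure=np.ones((5, 5), bool))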

Journal ArticleDOI
TL;DR: This paper adopts a fully Bayesian treatment for learning a logistic regression classifier, which has a number of obvious advantages over other learning methods.
Abstract: In this paper a method for building detection in aerial images based on variational inference of logistic regression is proposed. It consists of three steps. In order to characterize the appearance of buildings in aerial images, an effective bag-of-words (BoW) method is applied for feature extraction in the first step. In the second step, a logistic regression classifier is learned using these local features. The logistic regression can be trained using different methods. In this paper we adopt a fully Bayesian treatment for learning the classifier, which has a number of obvious advantages over other learning methods. Due to the presence of a hyper-prior in the probabilistic model of logistic regression, approximate inference methods have to be applied for prediction. In order to speed up the inference, a variational inference method based on mean field, rather than a stochastic approximation such as Markov Chain Monte Carlo, is applied. After the prediction, a probabilistic map is obtained. In the third step, a fully connected conditional random field model is formulated and the probabilistic map is used as the data term in the model. A mean field inference is utilized in order to obtain a binary building mask. A benchmark dataset consisting of aerial images and a digital surface model (DSM), released by ISPRS for 2D semantic labeling, is used for performance evaluation. The results demonstrate the effectiveness of the proposed method.
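A hedged numpy sketch of the classifier stage only, using the classic Jaakkola-Jordan variational bound (a common choice for variational Bayesian logistic regression; the paper may differ in detail): closed-form updates give a Gaussian posterior over the weights, and a probit-style approximation yields the probabilistic map. The BoW extraction and the CRF step are not shown.

# Variational Bayesian logistic regression (Jaakkola-Jordan bound), sketched in numpy.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def fit_vb_logreg(Phi, t, alpha=1.0, n_iter=20):
    """Phi: (N, D) BoW features (include a bias column); t: (N,) labels in {0, 1}."""
    N, D = Phi.shape
    xi = np.ones(N)                                                # local variational parameters
    for _ in range(n_iter):
        lam = np.tanh(xi / 2.0) / (4.0 * xi)                       # lambda(xi)
        S_inv = alpha * np.eye(D) + 2.0 * (Phi.T * lam) @ Phi      # posterior precision
        S = np.linalg.inv(S_inv)
        m = S @ (Phi.T @ (t - 0.5))                                # posterior mean
        xi = np.sqrt(np.einsum("nd,de,ne->n", Phi, S + np.outer(m, m), Phi))
    return m, S

def predict_proba(Phi, m, S):
    mu = Phi @ m
    var = np.einsum("nd,de,ne->n", Phi, S, Phi)
    return sigmoid(mu / np.sqrt(1.0 + np.pi * var / 8.0))          # probit approximation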