Showing papers in "ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences in 2019"


Journal ArticleDOI
TL;DR: In this article, the authors exploit the freely available data acquired by the Sentinel satellites of the Copernicus program implemented by the European Space Agency, as well as the cloud computing facilities of Google Earth Engine, to provide a dataset consisting of 180,662 triplets of dual-pol synthetic aperture radar (SAR) image patches, multi-spectral Sentinel-2 image patches and MODIS land cover maps.
Abstract: The availability of curated large-scale training data is a crucial factor for the development of well-generalizing deep learning methods for the extraction of geoinformation from multi-sensor remote sensing imagery. While a number of datasets have already been published by the community, most of them suffer from rather strong limitations, e.g. regarding spatial coverage, diversity or simply the number of available samples. Exploiting the freely available data acquired by the Sentinel satellites of the Copernicus program implemented by the European Space Agency, as well as the cloud computing facilities of Google Earth Engine, we provide a dataset consisting of 180,662 triplets of dual-pol synthetic aperture radar (SAR) image patches, multi-spectral Sentinel-2 image patches, and MODIS land cover maps. With all patches being fully georeferenced at a 10 m ground sampling distance and covering all inhabited continents during all meteorological seasons, we expect the dataset to support the community in developing sophisticated deep learning-based approaches for common tasks such as scene classification or semantic segmentation for land cover mapping.

138 citations


Journal ArticleDOI
TL;DR: A method that first locates selected classes of objects whose sizes are approximately known and then leverages this property to estimate the water level; the trained model is shown to recognize objects while correctly predicting the flood-water level.
Abstract: In the event of a flood, being able to build accurate flood level maps is essential for supporting emergency plan operations. In order to build such maps, it is important to collect observations from the disaster area. Social media platforms can be useful sources of information in this case, as people located in the flood area tend to share text and pictures depicting the current situation. Developing an effective and fully automated method able to retrieve data from social media and extract useful information in real time is crucial for a quick and proper response to these catastrophic events. In this paper, we propose a method to quantify flood-water from images gathered from social media. If no prior information about the zone where the picture was taken is available, one possible way to estimate the flood level consists of assessing how much the objects appearing in the image are submerged in water. There are various factors that make this task difficult: i) the precise size of the objects appearing in the image might not be known; ii) flood-water appearing in different zones of the image scene might have different heights; iii) objects may be only partially visible as they can be submerged in water. In order to solve these problems, we propose a method that first locates selected classes of objects whose sizes are approximately known and then leverages this property to estimate the water level. To prove the validity of this approach, we first build a flood-water image dataset, then we use it to train a deep learning model. We finally show the ability of our trained model to recognize objects and at the same time correctly predict the flood-water level.
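The core geometric idea can be sketched in a few lines. The following is a hedged illustration, not the authors' implementation: it assumes the detector already reports, per recognized object, the fraction of the object still visible above the water line, and the class heights are rough priors chosen here purely for illustration.

```python
# Minimal sketch (not the paper's implementation): estimate flood depth from
# objects whose real-world heights are approximately known. The detector is
# assumed to provide, per object, the fraction still visible above the water;
# the class heights below are illustrative averages, not values from the paper.
from statistics import median

APPROX_HEIGHT_M = {"person": 1.7, "car": 1.5, "bicycle": 1.1}  # assumed priors

def submerged_depth(obj_class: str, visible_fraction: float) -> float:
    """Water depth at one object: known full height minus the visible part."""
    full_h = APPROX_HEIGHT_M[obj_class]
    return max(0.0, full_h * (1.0 - visible_fraction))

def flood_level(detections: list) -> float:
    """Aggregate per-object (class, visible_fraction) estimates with a median
    to stay robust against single bad detections."""
    return median(submerged_depth(c, f) for c, f in detections)

# Example: a car visible down to half its height, a person visible to the knees.
print(flood_level([("car", 0.5), ("person", 0.75)]))  # roughly 0.6 m
```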

44 citations


Journal ArticleDOI
TL;DR: A fully automatic workflow to aggregate cloud-free Sentinel-2 images for user-defined areas of interest and time periods, which can be significantly shorter than the one-year time frames that are commonly used in other multi-temporal image aggregation approaches.
Abstract: Cloud coverage is one of the biggest concerns in spaceborne optical remote sensing, because it hampers a continuous monitoring of the Earth’s surface. Based on Google Earth Engine, a web- and cloud-based platform for the analysis and visualization of large-scale geospatial data, we present a fully automatic workflow to aggregate cloud-free Sentinel-2 images for user-defined areas of interest and time periods, which can be significantly shorter than the one-year time frames that are commonly used in other multi-temporal image aggregation approaches. We demonstrate the feasibility of our workflow for several cities spread around the globe and affected by different amounts of average cloud cover. The experimental results confirm that our workflow outperforms standard approaches for cloud-free image aggregation.
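For readers who want to try a simple variant of such an aggregation, the sketch below uses the Google Earth Engine Python API with a basic QA60 cloud mask and a median composite; the paper's actual workflow is more elaborate, and the area of interest and date range are placeholders.

```python
# Hedged sketch, not the authors' workflow: a QA60-masked median composite of
# Sentinel-2 over a short, user-defined period in the Earth Engine Python API.
import ee

ee.Initialize()

aoi = ee.Geometry.Rectangle([6.9, 50.9, 7.1, 51.0])  # hypothetical area of interest

def mask_clouds(img):
    qa = img.select('QA60')
    # Bits 10 and 11 of the Sentinel-2 QA60 band flag clouds and cirrus.
    mask = qa.bitwiseAnd(1 << 10).eq(0).And(qa.bitwiseAnd(1 << 11).eq(0))
    return img.updateMask(mask)

composite = (ee.ImageCollection('COPERNICUS/S2')
             .filterBounds(aoi)
             .filterDate('2019-04-01', '2019-07-01')   # much shorter than one year
             .map(mask_clouds)
             .median()
             .clip(aoi))
```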

41 citations


Journal ArticleDOI
TL;DR: This work proposes the application of SSCNs for efficient semantic segmentation of voxelized ALS point clouds in an end-to-end encoder-decoder architecture and demonstrates its capabilities regarding large-scale ALS data.
Abstract: Semantic segmentation of point clouds is one of the main steps in automated processing of data from Airborne Laser Scanning (ALS). Established methods usually require expensive calculation of handcrafted, point-wise features. In contrast, Convolutional Neural Networks (CNNs) have been established as powerful classifiers, which at the same time also learn a set of features by themselves. However, their application to ALS data is not trivial. Pure 3D CNNs require a lot of memory and computing time; therefore, most related approaches project ALS point clouds into two-dimensional images. Sparse Submanifold Convolutional Networks (SSCNs) address this issue by exploiting the sparsity often inherent in 3D data. In this work, we propose the application of SSCNs for efficient semantic segmentation of voxelized ALS point clouds in an end-to-end encoder-decoder architecture. We evaluate this method on the ISPRS Vaihingen 3D Semantic Labeling benchmark and achieve a state-of-the-art overall accuracy of 85.0%. Furthermore, we demonstrate its capabilities regarding large-scale ALS data by classifying a 2.5 km² subset containing 41 M points from the Actueel Hoogtebestand Nederland (AHN3) with 95% overall accuracy in just 48 s inference time, or with 96% in 108 s.
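A minimal sketch of the voxelization step that such sparse networks rely on is given below; it is not the authors' code and only shows how ALS points can be snapped to a sparse set of occupied voxels that would then be handed to an SSCN together with per-voxel features.

```python
# Minimal voxelization sketch (assumed pre-processing, not the paper's code):
# snap ALS points to integer voxel indices and keep only the occupied voxels.
import numpy as np

def voxelize(points: np.ndarray, voxel_size: float = 1.0):
    """points: (N, 3) array of x, y, z in metres.
    Returns unique occupied voxel indices and the point count per voxel."""
    origin = points.min(axis=0)
    idx = np.floor((points - origin) / voxel_size).astype(np.int64)
    voxels, counts = np.unique(idx, axis=0, return_counts=True)
    return voxels, counts

pts = np.random.rand(10000, 3) * [100.0, 100.0, 30.0]   # dummy ALS tile
voxels, counts = voxelize(pts, voxel_size=1.0)
print(voxels.shape, counts.max())  # sparse set of occupied 1 m voxels
```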

40 citations


Journal ArticleDOI
TL;DR: In this article, a comparison of Single Photon LiDAR (SPL) and full-waveform LiDAR data acquired in July and September 2018 in the City of Vienna is presented.
Abstract: Single photon sensitive LiDAR sensors are currently competing with conventional multi-photon laser scanning systems. The advantage of the former is the potentially higher area coverage performance, which comes at the price of an increased outlier rate and a lower ranging accuracy. In this contribution, the principles of both technologies are reviewed with special emphasis on their respective properties. In addition, a comparison of Single Photon LiDAR (SPL) and full-waveform LiDAR data acquired in July and September 2018 in the City of Vienna is presented. From the data analysis we concluded that (i) fewer flight strips are needed to cover the same area with comparable point density with SPL, (ii) the sharpness of the resulting 3D point cloud is higher for the waveform LiDAR dataset, (iii) SPL exhibits moderate vegetation penetration under leaf-on conditions, and (iv) the dispersion of the SPL point cloud assessed in smooth horizontal surface parts competes with waveform LiDAR but is higher by a factor of 2–3 for inclined and grassy surfaces, respectively. Still, SPL yielded satisfactory precision measures mostly below 10 cm.

30 citations


Journal ArticleDOI
TL;DR: A first rigorous integration of these two tasks, the hybrid orientation of lidar point clouds and aerial images, is presented in this work.
Abstract: Airborne LiDAR (Light Detection And Ranging) and airborne photogrammetry are both proven and widely used techniques for the 3D topographic mapping of extended areas. Although both techniques are based on different reconstruction principles (polar measurement vs. ray triangulation), they ultimately serve the same purpose, the 3D reconstruction of the Earth’s surface, natural objects or infrastructure. For many applications it is therefore natural to integrate the data from both techniques to generate more accurate and complete results. Many works have been published on this topic of data fusion. However, no rigorous integrated solution exists for the first two steps that need to be carried out after data acquisition, namely (a) the lidar strip adjustment and (b) the aerial triangulation. A consequence of solving these two optimization problems independently can be large discrepancies (of up to several decimeters) between the lidar block and the image block. This is especially the case in challenging situations, e.g. corridor mapping with only one strip or with few or no ground control data. To avoid this problem and thereby profit from many other advantages, a first rigorous integration of these two tasks, the hybrid orientation of lidar point clouds and aerial images, is presented in this work.

30 citations


Journal ArticleDOI
TL;DR: A LoD1 reconstruction service that generates several heights per building (both for the ground surface and the extrusion height) and reports on the spatial analysis that is performed on the generated height values.
Abstract: The 3D representation of buildings with roof shapes (also called LoD2) is popular in the 3D city modelling domain since it provides a realistic view of 3D city models. However, for many applications block models of buildings are sufficient or even more suitable. These so-called LoD1 models can be reconstructed relatively easily from building footprints and point clouds. But LoD1 representations for the same building can be rather different because of differences in the height references used to reconstruct the block models and differences in the underlying statistical calculation methods. Users are often not aware of these differences, while these differences may have an impact on the outcome of spatial analyses. To standardise possible variances of LoD1 models and let the users choose the best one for their application, we have developed a LoD1 reconstruction service that generates several heights per building (both for the ground surface and the extrusion height). The building models are generated for all ~10 million buildings in The Netherlands based on footprints of buildings and LiDAR point clouds. The 3D dataset is updated automatically every month. In addition, quality parameters are calculated and made available for each building. This article describes the development of the LoD1 building service and reports on the spatial analysis that we performed on the generated height values.
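The idea of offering several heights per building can be illustrated with a small sketch; the percentiles below are assumptions chosen for illustration and do not reproduce the production service.

```python
# Illustrative sketch (assumed percentiles, not the service's definitions):
# for the points associated with one footprint, derive several candidate
# ground and extrusion heights so users can pick the reference they need.
import numpy as np

def lod1_heights(ground_z: np.ndarray, roof_z: np.ndarray) -> dict:
    """ground_z: terrain points around the footprint; roof_z: points on the roof."""
    return {
        "ground_low":  float(np.percentile(ground_z, 5)),
        "ground_med":  float(np.percentile(ground_z, 50)),
        "roof_median": float(np.percentile(roof_z, 50)),
        "roof_p70":    float(np.percentile(roof_z, 70)),
        "roof_high":   float(np.percentile(roof_z, 95)),
    }

heights = lod1_heights(np.random.normal(2.0, 0.1, 500),
                       np.random.normal(12.0, 0.8, 2000))
extrusion = heights["roof_p70"] - heights["ground_med"]  # one possible block height
```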

25 citations


Journal ArticleDOI
TL;DR: In this paper, a novel approach for detecting broad-leaved dock from orthomosaics obtained using a commercially available quadrotor (DJI Phantom 3 PRO) is proposed.
Abstract: Broad-leaved dock (Rumex obtusifolius) is a fast growing and spreading weed and is one of the most common weeds in production grasslands in the Netherlands. The heavy occurrence, fast growth and negative environmental-agricultural impact make Rumex an important species to control. Current control is done directly in the field by mechanical or chemical actuation methods as soon as the plants are found in situ by the farmer. In nature conservation areas control is much more difficult because spraying is not allowed, while the weed reduces the amount of grass and its quality. Rapid detection of Rumex using high-resolution RGB images obtained from a UAV could optimize plant control practices in wide nature conservation areas. In this paper, a novel approach for Rumex detection from orthomosaics obtained using a commercially available quadrotor (DJI Phantom 3 PRO) is proposed. The results obtained show that Rumex can be detected with up to 90% accuracy from a 6 mm/pixel orthomosaic generated from an aerial survey and using deep learning.

25 citations


Journal ArticleDOI
TL;DR: This paper presents an approach for a consistent 3D reconstruction of LOD1 models on the basis of 3D point clouds, DTM, and 2D footprints of buildings, and can be easily extended for higher LODs or BIM models.
Abstract: 3D modelling of precincts and cities has significantly advanced in the last decades, as we move towards the concept of the Digital Twin. Many 3D city models have been created, but a large portion of them neglects representing terrain and buildings accurately. Very often the surface is either considered planar or is not represented at all. On the other hand, many Digital Terrain Models (DTM) have been created as 2.5D triangular irregular networks (TIN) or grids for different applications such as water management, line-of-sight or shadow computation, tourism, land planning, telecommunication, military operations and communications. 3D city models need to represent both the 3D objects and the terrain in one consistent model, but still many challenges remain. A critical issue when integrating 3D objects and terrain is the identification of the valid intersection between 2.5D terrain and 3D objects. Commonly, 3D objects may partially float over or sink into the terrain; the depth of the underground parts might not be known; or the accuracy of the data sets might differ. This paper discusses some of these issues and presents an approach for a consistent 3D reconstruction of LOD1 models on the basis of 3D point clouds, DTM, and 2D footprints of buildings. Such models are largely used for urban planning, city analytics or environmental analysis. The proposed method can be easily extended for higher LODs or BIM models.

24 citations


Journal ArticleDOI
TL;DR: This paper develops and implements three 1-dimensional convolutional neural networks (CNNs): the LucasCNN, the LucasResNet, which contains an identity block as residual network, and the LucasCoordConv with an additional coordinates layer, and modifies two existing 1D CNN approaches for the presented classification task.
Abstract: Soil texture is important for many environmental processes. In this paper, we study the classification of soil texture based on hyperspectral data. We develop and implement three 1-dimensional (1D) convolutional neural networks (CNN): the LucasCNN, the LucasResNet which contains an identity block as residual network, and the LucasCoordConv with an additional coordinates layer. Furthermore, we modify two existing 1D CNN approaches for the presented classification task. The code of all five CNN approaches is available on GitHub (Riese, 2019). We evaluate the performance of the CNN approaches and compare them to a random forest classifier, relying on the freely available LUCAS topsoil dataset. The CNN approach with the least depth turns out to be the best performing classifier, and the LucasCoordConv achieves the best performance regarding the average accuracy. In future work, the introduced LucasCNN, LucasResNet and LucasCoordConv can be further enhanced and additional variables of the rich LUCAS dataset can be included.
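As an illustration of the kind of shallow 1D architecture compared in the paper, the following PyTorch sketch defines a generic spectral 1D CNN; it is not the published LucasCNN (which is available on GitHub), and the band count and layer sizes are assumptions.

```python
# Generic 1D CNN for reflectance spectra, sketched in PyTorch. Purely
# illustrative: band count, channel widths and kernel sizes are assumed,
# not taken from the paper. Input: (batch, 1, n_bands), output: class logits.
import torch
import torch.nn as nn

class Spectral1DCNN(nn.Module):
    def __init__(self, n_bands: int = 256, n_classes: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * (n_bands // 4), 128), nn.ReLU(),
            nn.Linear(128, n_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = Spectral1DCNN()
logits = model(torch.randn(8, 1, 256))   # 8 dummy spectra with 256 bands
```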

23 citations


Journal ArticleDOI
TL;DR: This research demonstrates that the accuracy of image classification can be improved by combining OBIA and CNN methods, an approach that can be used where manual preparation of training samples for the CNN is not preferred.
Abstract: Urban trees offer significant benefits for improving the sustainability and liveability of cities, but their monitoring is a major challenge for urban planners. Remote-sensing based technologies can effectively detect, monitor and quantify urban tree coverage as an alternative to field-based measurements. Automatic extraction of urban land cover features with high accuracy is a challenging task, and it demands artificial intelligence workflows for efficiency and thematic quality. In this context, the objective of this research is to map urban tree coverage per cadastral parcel of Sandy Bay, Hobart from very high-resolution aerial orthophoto and LiDAR data using an object-based Convolutional Neural Network (CNN) approach. Instead of manual preparation of a large number of required training samples, automatically classified object-based image analysis (OBIA) output is used as input samples to train the CNN. Also, the CNN output is further refined and segmented using OBIA to assess the accuracy. The result shows 93.2% overall accuracy for the refined CNN classification. Similarly, the overlay of the improved CNN output with the cadastral parcel layer shows that 21.5% of the study area is covered by trees. This research demonstrates that the accuracy of image classification can be improved by using a combination of OBIA and CNN methods. Such a combined method can be used where manual preparation of training samples for the CNN is not preferred. Also, our results indicate that the technique can be implemented to calculate parcel-level statistics for urban tree coverage that provide meaningful metrics to guide urban planning and land management practices.

Journal ArticleDOI
TL;DR: A workflow for generating LoD3 CityGML models from textured LoD2 CityGML models by adding window and door objects, detected with the deep neural network “Faster R-CNN” and post-processed to obtain a more realistic appearance of facades.
Abstract: The paper describes a workflow for generating LoD3 CityGML models (i.e. semantic building models with structured facades) based on textured LoD2 CityGML models by adding window and door objects. For each wall texture, bounding boxes of windows and doors are detected using “Faster R-CNN”, a deep neural network. We evaluate results for textures with different resolutions on the ICG Graz50 facade dataset. In general, detected bounding boxes match very well with the rectangular shape of most wall openings. Thus, no further classification of shapes is required. Windows are typically aligned to rows and columns, and only a few different types of windows exist for each facade. However, the neural network proposes rectangles of varying sizes, which are not always aligned perfectly. Thus, we use post-processing to obtain a more realistic appearance of facades. Window and door rectangles get aligned by solving a mixed integer linear optimization problem, which automatically leads to a clustering of these openings into a few different classes of window and door types. Furthermore, a priori knowledge about the number of clusters is not required.
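A hedged sketch of the detection step is shown below using torchvision's Faster R-CNN implementation; the pretrained COCO weights do not know facade openings, so in practice the model would have to be fine-tuned on facade annotations such as ICG Graz50 (assumed here), and the alignment post-processing is not shown.

```python
# Illustrative Faster R-CNN inference with torchvision; a model fine-tuned on
# facade openings is assumed, and the image path is a placeholder.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

facade = to_tensor(Image.open("facade_texture.jpg").convert("RGB"))  # placeholder
with torch.no_grad():
    pred = model([facade])[0]        # dict with 'boxes', 'labels', 'scores'

keep = pred["scores"] > 0.7          # keep confident rectangles only
boxes = pred["boxes"][keep]          # candidate window/door bounding boxes
```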

Journal ArticleDOI
TL;DR: In this paper, multi-temporal 3D point clouds acquired with a laser scanner are used for an area-wide assessment of landslide-induced surface changes, and the resulting mean annual displacements are compared to the results of a geodetic monitoring based on an automatic tracking total station (ATTS) measuring 53 retroreflective prisms across the study area every hour since May 2016.
Abstract: Multi-temporal 3D point clouds acquired with a laser scanner can be efficiently used for an area-wide assessment of landslide-induced surface changes. In the present study, displacements of the Vogelsberg landslide (Tyrol, Austria) are assessed based on available data acquired with airborne laser scanning (ALS) in 2013 and data acquired with an unmanned aerial vehicle (UAV) equipped with a laser scanner (ULS) in 2018. Following the data pre-processing steps, including registration and ground filtering, buildings are segmented and extracted from the datasets. The roofs, represented as multi-temporal 3D point clouds, are then used to derive displacement vectors with a novel matching tool based on the iterative closest point (ICP) algorithm. The resulting mean annual displacements are compared to the results of a geodetic monitoring based on an automatic tracking total station (ATTS) measuring 53 retroreflective prisms across the study area every hour since May 2016. In general, the results are in agreement concerning the mean annual magnitude (ATTS: 6.4 cm within 2.2 years, 2.9 cm a−1; laser scanning data: 13.2 cm within 5.4 years, 2.4 cm a−1) and direction of the derived displacements. The analysis of the laser scanning data proved suitable for deriving long-term landslide displacements and can provide additional information about the deformation of single roofs.
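The displacement derivation can be sketched with a generic point-to-point ICP, for example in Open3D; this is an assumed workflow for illustration, not the authors' matching tool.

```python
# Assumed workflow sketch: estimate the displacement of one segmented roof by
# registering its 2013 points onto the 2018 points with point-to-point ICP and
# reading the translation out of the resulting rigid transform.
import numpy as np
import open3d as o3d

def roof_displacement(pts_2013: np.ndarray, pts_2018: np.ndarray) -> np.ndarray:
    """Return an approximate 2013 -> 2018 displacement vector in metres."""
    src = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts_2013))
    dst = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(pts_2018))
    result = o3d.pipelines.registration.registration_icp(
        src, dst, max_correspondence_distance=0.5, init=np.eye(4),
        estimation_method=o3d.pipelines.registration.TransformationEstimationPointToPoint())
    return result.transformation[:3, 3]   # translation component of the transform
```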

Journal ArticleDOI
TL;DR: A pipeline that reconstructs buildings of urban environments as concise polygonal meshes from airborne LiDAR scans is introduced, demonstrating its robustness, flexibility and scalability by producing accurate and compact 3D models over large and varied urban areas in a few minutes only.
Abstract: We introduce a pipeline that reconstructs buildings of urban environments as concise polygonal meshes from airborne LiDAR scans. It consists of three main steps: classification, building contouring, and building reconstruction, the last two steps being achieved using computational geometry tools. Our algorithm demonstrates its robustness, flexibility and scalability by producing accurate and compact 3D models over large and varied urban areas in a few minutes only.

Journal ArticleDOI
TL;DR: This work proposes to use image-to-image translation to transform images from a rendered domain to a captured domain and shows that translated images in the captured domain are of higher quality than the rendered images.
Abstract: The performance of machine learning and deep learning algorithms for image analysis depends significantly on the quantity and quality of the training data. The generation of annotated training data is often costly, time-consuming and laborious. Data augmentation is a powerful option to overcome these drawbacks. Therefore, we augment training data by rendering images with arbitrary poses from 3D models to increase the quantity of training images. These training images usually show artifacts and are of limited use for advanced image analysis. Therefore, we propose to use image-to-image translation to transform images from a rendered domain to a captured domain. We show that translated images in the captured domain are of higher quality than the rendered images. Moreover, we demonstrate that image-to-image translation based on rendered 3D models enhances the performance of common computer vision tasks, namely feature matching, image retrieval and visual localization. The experimental results clearly show the enhancement of translated images over rendered images for all investigated tasks. In addition to this, we present the advantages of utilizing translated images over exclusively captured images for visual localization.

Journal ArticleDOI
TL;DR: In this paper, the authors investigate how the temporal interval influences the volume change observed on a sandy beach, regarding both the temporal detail of the change process and the total volume budget, in which accretion and erosion counteract each other.
Abstract: Geomorphic processes are spatially variable and occur at varying magnitudes, frequencies and velocities, which poses a great challenge to current methods of topographic change analysis. For the quantification of surface change, permanent terrestrial laser scanning (TLS) can generate time series of 3D point clouds at high temporal and spatial resolution. We investigate how the temporal interval influences the volume change observed on a sandy beach, regarding the temporal detail of the change process and the total volume budget, in which accretion and erosion counteract each other. We use an hourly time series of TLS point clouds acquired over six weeks in Kijkduin, the Netherlands. A raster-based approach of elevation differencing provides the volume change over time per square meter. We compare the hourly analysis to the results of a three- and six-week observation period. For the longer period, a volume increase of 0.3 m³/m² on a forming sand bar is missed before it disappears, which corresponds to half its volume. Generally, a strong relationship is shown between the observation interval and the observed volume change. An increase from weekly to daily observations leads to a five times larger volume change quantified in total. Another important finding is a temporally variable measurement uncertainty in the 3D time series, which follows the daily course of air temperature. Further experiments are required to fully understand the effect of atmospheric conditions on high-frequency TLS acquisition in beach environments. Continued research on 4D geospatial analysis methods will enable automatic identification of dynamic change and improve the understanding of geomorphic processes.
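The raster-based differencing can be reduced to a few lines; the sketch below (with assumed cell size and change threshold) only illustrates how per-cell elevation differences are turned into accretion, erosion and net volume budgets.

```python
# Simple DEM-differencing sketch (assumed parameters, not the study's tooling):
# subtract two gridded surface models of the same extent and cell size and sum
# the per-cell differences into accretion, erosion and net volumes.
import numpy as np

def volume_change(dem_t0: np.ndarray, dem_t1: np.ndarray,
                  cell_size: float = 1.0, min_change: float = 0.05):
    dz = dem_t1 - dem_t0
    dz[np.abs(dz) < min_change] = 0.0          # ignore sub-noise differences
    cell_area = cell_size ** 2
    accretion = dz[dz > 0].sum() * cell_area   # m^3 gained
    erosion = dz[dz < 0].sum() * cell_area     # m^3 lost (negative)
    return accretion, erosion, accretion + erosion   # net volume budget
```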

Journal ArticleDOI
TL;DR: A feature-based approach for semantic mesh segmentation in an urban scenario using real-world training data, which achieves close to 80% Overall Accuracy (OA) on dedicated test meshes and is compared with a default Random Forest classifier that performs slightly worse.
Abstract: We propose a feature-based approach for semantic mesh segmentation in an urban scenario using real-world training data. There are only a few works that deal with semantic interpretation of urban triangle meshes so far. Most 3D classifications operate on point clouds. However, we claim that point clouds are an intermediate product in the photogrammetric pipeline. For this reason, we explore the capabilities of a Convolutional Neural Network (CNN) based approach to semantically enrich textured urban triangle meshes as generated from LiDAR or Multi-View Stereo (MVS). For each face within a mesh, a feature vector is computed and fed into a multi-branch 1D CNN. Ordinarily, CNNs are an end-to-end learning approach operating on regularly structured input data. Meshes, however, are not regularly structured. By calculating feature vectors, we enable the CNN to process mesh data. By these means, we combine explicit feature calculation and feature learning (hybrid model). Our model achieves close to 80% Overall Accuracy (OA) on dedicated test meshes. Additionally, we compare our results with a default Random Forest (RF) classifier that performs slightly worse. Besides its slightly better performance, the 1D CNN trains faster and is faster at inference.

Journal ArticleDOI
TL;DR: The proposed classification of future AHN iterations is feasible but needs more experimentation, and two different models based on PointNet are defined to classify the most relevant elements in the case study data: ground, vegetation and buildings.
Abstract: During the last couple of years, there has been an increased interest to develop new deep learning networks specifically for processing 3D point cloud data. In that context, this work intends to expand the applicability of one of these networks, PointNet, from the semantic segmentation of indoor scenes to outdoor point clouds acquired with Airborne Laser Scanning (ALS) systems. Our goal is to assist the classification of future iterations of a nationwide dataset such as the Actueel Hoogtebestand Nederland (AHN), using a classification model trained with a previous iteration. First, a simple application such as ground classification is proposed in order to prove the capabilities of the proposed deep learning architecture to perform an efficient point-wise classification with aerial point clouds. Then, two different models based on PointNet are defined to classify the most relevant elements in the case study data: ground, vegetation and buildings. While the model for ground classification performs with an F-score above 96%, motivating the second part of the work, the overall accuracy of the remaining models is around 87%, showing consistency across different versions of AHN but with improvable false positive and false negative rates. Therefore, this work concludes that the proposed classification of future AHN iterations is feasible but needs more experimentation.

Journal ArticleDOI
TL;DR: This study addresses pedestrian tracking from stereo images with a tracking-by-detection approach and a tracking-to-confirm-detection method, in which detections are treated differently depending on their confidence metrics in order to obtain a high recall value while keeping the number of false positives low.
Abstract: Pedestrian tracking is a significant problem in autonomous driving. The majority of studies carry out tracking in the image domain, which is not sufficient for many realistic applications like path planning, collision avoidance, and autonomous navigation. In this study, we address pedestrian tracking using stereo images and tracking-by-detection. Our framework comes in three primary phases: (1) people are detected in image space by the Mask R-CNN detector and their positions in 3D space are computed using stereo information; (2) corresponding detections are assigned to each other across consecutive frames based on visual characteristics and 3D geometry; and (3) the current positions of pedestrians are corrected using their previous states with an extended Kalman filter. We use our tracking-to-confirm-detection method, in which detections are treated differently depending on their confidence metrics, in order to obtain a high recall value while keeping the number of false positives low. While existing methods consider all target trajectories to have equal accuracy, we estimate a confidence value for each trajectory at every epoch. Thus, depending on their confidence values, the targets can have different contributions to the whole tracking system. The performance of our approach is evaluated using the KITTI benchmark dataset. It shows promising results comparable to those of other state-of-the-art methods.
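The filtering step can be illustrated with a simplified sketch: the paper uses an extended Kalman filter, whereas the linear constant-velocity filter below only shows the predict/correct cycle used to smooth a pedestrian's 3D position between frames; the time step and noise levels are assumed values.

```python
# Simplified linear Kalman filter for a 3D constant-velocity pedestrian model;
# dt and the noise covariances are assumptions, not values from the paper.
import numpy as np

dt = 0.1                                        # frame interval in seconds
F = np.block([[np.eye(3), dt * np.eye(3)],      # state: [x y z vx vy vz]
              [np.zeros((3, 3)), np.eye(3)]])
H = np.hstack([np.eye(3), np.zeros((3, 3))])    # only the 3D position is observed
Q = 0.01 * np.eye(6)                            # process noise (assumed)
R = 0.25 * np.eye(3)                            # stereo measurement noise (assumed)

def kf_step(x, P, z):
    """One predict/correct step; x: (6,) state, P: (6,6) covariance, z: (3,) measurement."""
    x, P = F @ x, F @ P @ F.T + Q                       # predict
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)        # Kalman gain
    x = x + K @ (z - H @ x)                             # correct with the detection
    P = (np.eye(6) - K @ H) @ P
    return x, P
```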

Journal ArticleDOI
TL;DR: It is shown how to determine the navigable space in a voxel model for a pedestrian actor, and how to compute paths from arbitrary sources to specified destinations, on the basis of different input types.
Abstract: The paper proposes to use voxel models of building interiors to perform indoor navigation. The algorithms can be purely geometrical, not relying on semantic information about different building elements, such as floors, walls, stairways etc. Therefore, it is possible to use voxel models from different data sources, in addition to vector-to-raster conversions. The paper demonstrates this on the basis of three different input types: hand measurements, point clouds and images of floorplans. On the basis of these models, the paper shows how to determine the navigable space in a voxel model for a pedestrian actor, and how to compute paths from arbitrary sources to specified destinations.
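As a minimal illustration of path computation on such a voxel model, the sketch below runs a breadth-first search over a boolean 3D occupancy grid of navigable voxels; a real pedestrian model would additionally enforce clearance and walkable-surface rules.

```python
# Minimal sketch: shortest 6-connected path between two voxels of a boolean
# "navigable" grid via breadth-first search. Not the paper's algorithm, only
# an illustration of path computation on a voxel model.
from collections import deque
import numpy as np

def bfs_path(navigable: np.ndarray, start: tuple, goal: tuple):
    steps = [(1,0,0), (-1,0,0), (0,1,0), (0,-1,0), (0,0,1), (0,0,-1)]
    prev, queue = {start: None}, deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:                       # reconstruct the path backwards
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        for dx, dy, dz in steps:
            nxt = (cur[0] + dx, cur[1] + dy, cur[2] + dz)
            if all(0 <= nxt[i] < navigable.shape[i] for i in range(3)) \
                    and navigable[nxt] and nxt not in prev:
                prev[nxt] = cur
                queue.append(nxt)
    return None                               # no navigable connection found
```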

Journal ArticleDOI
TL;DR: A super-resolution convolutional neural network (SRCNN) was adopted as a state-of-the-art deep learning model to test the proposed fusion-based augmentation method, which can further boost the performance of satellite image super resolution tasks.
Abstract: Data augmentation is a well known technique that is frequently used in machine learning tasks to increase the number of training instances and hence decrease model over-fitting. In this paper we propose a data augmentation technique that can further boost the performance of satellite image super resolution tasks. A super-resolution convolutional neural network (SRCNN) was adopted as a state-of-the-art deep learning model to test the proposed data augmentation technique. Different augmentation techniques were studied to investigate their relative importance and accuracy gains. We categorized the augmentation methods into instance-based and channel-based augmentation methods. The former refers to the standard approach of creating new data instances through applying image transformations to the original images, such as adding artificial noise, rotations and translations to training samples, while in the latter we fuse auxiliary channels (or custom bands) with each training instance, which helps the model learn useful representations. Fusing auxiliary derived channels to a satellite image RGB combination can be seen as a spectral-spatial fusion process, as we explain later. Several experiments were carried out to evaluate the efficacy of the proposed fusion-based augmentation method compared with traditional data augmentation techniques such as rotation, flip and noisy training inputs. The reconstruction quality of the high-resolution output was quantitatively evaluated using the peak signal-to-noise ratio (PSNR) and qualitatively through visualisation of test samples before and after super-resolving.
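Two of the ingredients mentioned above can be sketched compactly under simple assumptions: fusing an auxiliary derived channel (here an NDVI band computed from red and near-infrared, chosen only as an example) onto an RGB training instance, and the PSNR used to score the super-resolved output.

```python
# Illustrative sketch: (1) channel-based augmentation by stacking a derived
# band onto an RGB patch, (2) PSNR for evaluating the super-resolved output.
# The choice of NDVI as the auxiliary channel is an assumption for this sketch.
import numpy as np

def fuse_auxiliary_channel(rgb: np.ndarray, red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Stack an NDVI band onto an (H, W, 3) RGB patch -> (H, W, 4) training input."""
    ndvi = (nir - red) / (nir + red + 1e-6)
    return np.dstack([rgb, ndvi])

def psnr(reference: np.ndarray, estimate: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_val]."""
    mse = np.mean((reference - estimate) ** 2)
    return float(20 * np.log10(max_val) - 10 * np.log10(mse))
```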

Journal ArticleDOI
TL;DR: A framework that starts from point-clouds of complex indoor environments, performs advanced processes to identify the 3D structures critical to navigation and path planning, and provides fine-grained navigation networks that account for obstacles and spatial accessibility of the navigating agents is presented.
Abstract: Indoor navigation can be a tedious process in a complex and unknown environment. It gets more critical when first responders try to intervene in a big building after a disaster has occurred. For such cases, an accurate map of the building is among the best supports possible. Unfortunately, such a map is not always available, or is generally outdated and imprecise, leading to error-prone decisions. Thanks to advances in laser scanning, accurate 3D maps can be built in a relatively short amount of time using all sorts of laser scanners (stationary, mobile, drone), although the information they provide is generally an unstructured point cloud. While most of the existing approaches try to extensively process the point cloud in order to produce an accurate architectural model of the scanned building, similar to a Building Information Model (BIM), we have adopted a space-focused approach. This paper presents our framework that starts from point clouds of complex indoor environments, performs advanced processes to identify the 3D structures critical to navigation and path planning, and provides fine-grained navigation networks that account for obstacles and the spatial accessibility of the navigating agents. The method involves generating a volumetric-wall vector model from the point cloud, identifying the obstacles and extracting the navigable 3D spaces. Our work contributes a new approach for space subdivision without the need of using laser scanner positions or viewpoints. Unlike 2D cell decomposition or binary space partitioning, this work introduces a space enclosure method to deal with 3D space extraction and non-Manhattan World architecture. The results show that more than 90% of spaces are correctly extracted. The approach is tested on several real buildings and relies on the latest advances in indoor navigation.

Journal ArticleDOI
TL;DR: A novel machine-learning method that characterizes the muck pile directly from UAV images to generate a globally consistent segmentation, and results clearly indicate that the method generalizes to previously unseen data.
Abstract: In open pit mining it is essential for processing and production scheduling to receive fast and accurate information about the fragmentation of a muck pile after a blast. In this work, we propose a novel machine-learning method that characterizes the muck pile directly from UAV images. In contrast to state-of-the-art approaches that require heavy user interaction, expert knowledge and careful threshold settings, our method works fully automatically. We compute segmentation masks, bounding boxes and confidence values for each individual fragment in the muck pile on multiple scales to generate a globally consistent segmentation. Additionally, we recorded lab and real-world images to generate our own dataset for training the network. Our method shows very promising quantitative and qualitative results in all our experiments. Further, the results clearly indicate that our method generalizes to previously unseen data.

Journal ArticleDOI
TL;DR: This article investigates the system development and testing challenges of automated driving; requirements of road space models for developing automated driving are derived and gaps to current standards are indicated.
Abstract: Automated driving has received a high degree of public attention in recent years, as it will lead to profound changes in mobility, society and urban development. Despite several product announcements from automobile manufacturers and mobility providers, many questions have not yet been answered completely. The need for lane-level HD maps has been widely discussed and has been the reason for company acquisitions. HD maps are tailored towards supporting the operation of an automated vehicle. However, the development of this technology also requires road space models, but with a completely different focus and level of detail. Therefore, this article investigates the system development and testing challenges of automated driving. Based on this, requirements of road space models for developing automated driving are derived and gaps to current standards are indicated.

Journal ArticleDOI
TL;DR: This paper proposes an approach to support versioning of 3D city models based on CityJSON and the concepts behind the Git version control system, including distributed and non-linear workflows.
Abstract: A 3D city model should be constantly updated with new versions, either to reflect the changes in its real-world counterpart, or to improve and correct parts of the model. However, the current standards for 3D city models do not support versioning, and existing version control systems do not work well with 3D city models. In this paper, we propose an approach to support versioning of 3D city models based on CityJSON and the concepts behind the Git version control system, including distributed and non-linear workflows. We demonstrate the benefits of our approach in two examples and in our software prototype, which is able to extract a given version of a 3D city model and to display its history.
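The Git analogy can be illustrated with a content-addressing sketch: like Git blobs, each CityJSON city object is identified by a hash of its canonicalised content, so a version becomes a set of object hashes and history a chain of such sets. This mirrors the idea described above but is not the authors' prototype.

```python
# Illustrative sketch only: content-addressed versioning of CityJSON city
# objects, analogous to Git blobs. Hash choice and structure are assumptions.
import hashlib
import json

def object_hash(city_object: dict) -> str:
    """Content-addressed id: SHA-1 of the canonical JSON encoding of one object."""
    canonical = json.dumps(city_object, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()

def version_of(cityjson: dict) -> dict:
    """Map each CityObject id to the hash of its content for one version."""
    return {oid: object_hash(obj) for oid, obj in cityjson["CityObjects"].items()}

# Two versions of a model differ exactly in the objects whose hashes changed.
```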

Journal ArticleDOI
TL;DR: A way to detect doors using 3D Medial Axis Transform (MAT) combined with the intelligence stored in the path of a mobile laser scanner is described, showing good first results.
Abstract: Indoor environments tend to be more complex and more populated when buildings are accessible to the public. The need for knowing where people are, how they can get somewhere or how to reach them in these buildings is thus equally increasing. In this research point clouds are used, obtained by dynamic laser scanning of a building, since we cannot rely on architectural drawings for maps and paths, which can be outdated. The presented method focuses on the creation of an indoor navigation graph, based on the IndoorGML structure, in a fast and automated way, while retaining the type of walkable surface. In this paper the focus has been on door detection, because doors are essential elements in an indoor environment, seeing that they connect spaces and are a logical step in a route. This paper describes a way to detect doors using the 3D Medial Axis Transform (MAT) combined with the intelligence stored in the path of a mobile laser scanner, showing good first results. Additionally, different spaces (e.g. rooms and corridors) in the building are identified and slopes and stairs in walkable spaces are detected. This results in a navigation graph which can be stored in an IndoorGML structure.

Journal ArticleDOI
TL;DR: The Deep Learning-based approaches clearly outperformed the SVM baseline, both in terms of F1-score and Overall Accuracy, with a superiority of S-CNN over EF.
Abstract: Deforestation is one of the main causes of biodiversity reduction and climate change, among other destructive phenomena. Thus, early detection of deforestation processes is of paramount importance. Motivated by this scenario, this work presents an evaluation of methods for automatic deforestation detection, specifically the Early Fusion (EF) Convolutional Network, the Siamese Convolutional Network (S-CNN) and the well-known Support Vector Machine (SVM), taken as the baseline. These methods were evaluated in a region of the Brazilian Legal Amazon (BLA). Two Landsat 8 images acquired in 2016 and 2017 were used in our experiments. The impact of training set size was also investigated. The Deep Learning-based approaches clearly outperformed the SVM baseline in our experiments, both in terms of F1-score and Overall Accuracy, with a superiority of S-CNN over EF.

Journal ArticleDOI
TL;DR: This paper reports on different types of transformation rules to populate the attributes on the CityGML side using information extracted from the IFC data, and documents the various ways in which attribute values can be stored in IFC and CityGML, respectively.
Abstract: In model transformation, the population of attributes on the target side constitutes the last step of the conversion process, carrying over that part of the input which is often perceived as the most valuable actual information. We are employing a graph-based model transformation approach to convert building information models into geospatial city models. In this paper, we report on different types of transformation rules to populate the attributes on the CityGML side using information extracted from the IFC data. We document the various ways in which attribute values can be stored in IFC and CityGML, respectively, and identify patterns that bridge these endpoints in the conversion process. These patterns lead to a set of prototypical graph transformation rules which have been applied to a range of building projects. The novel graph-based approach to IFC-to-CityGML conversion allows an intuitive visual representation of these rules. This work can also serve as a starting point to convert IFC data to other formats or to populate CityGML from other data sources.

Journal ArticleDOI
TL;DR: This paper investigates how certain errors in the camera calibration impact the accuracy of 3D measurement without the influence of other errors, and finds that the presence of oblique images limits the drift on camera height and hence gives a better camera pose estimation.
Abstract: Unmanned aerial vehicles (UAV) are increasingly used for topographic mapping. The camera calibration for UAV image blocks can be performed a priori or during the bundle block adjustment (self-calibration). For an area of interest with a flat, corridor configuration, the focal length of the camera is highly correlated with the camera height. Furthermore, systematic errors of camera calibration accumulate along the longer dimension and cause deformation. Therefore, special precautions must be taken when estimating camera calibration parameters. In this paper, a simulated, error-free aerial image block is generated. An error is then added to the camera calibration and given as the initial solution to the bundle block adjustment. Depending on the nature of the error and the investigation purpose, camera calibration parameters are either fixed or re-estimated during the bundle block adjustment. The objective is to investigate how certain errors in the camera calibration impact the accuracy of 3D measurement without the influence of other errors. All experiments are carried out with the Fraser camera calibration model. When adopting a proper flight configuration, an error on the focal length in the initial camera calibration can be corrected almost entirely during bundle block adjustment. For the case where an erroneous focal length is given in the pre-calibration and not re-estimated, the presence of oblique images limits the drift on camera height and hence gives a better camera pose estimation. Apart from that, the error on the focal length when neglecting its variation during the acquisition (e.g., due to a camera temperature increase) is also investigated; a bowl effect is observed when one focal length is given in the camera pre-calibration for the whole image block. Finally, a local error is added in image space to simulate camera flaws; this type of error is more difficult to correct with the Fraser camera model and the accuracy of 3D measurement degrades substantially.
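The correlation between focal length and flying height mentioned above follows from the image scale of near-vertical photographs, scale = f/H: with tie points alone, an uncorrected relative error in f tends to be absorbed as the same relative error in the camera height. The numbers in the sketch below are assumed, purely for illustration.

```python
# Back-of-the-envelope sketch of the focal length / flying height correlation
# for near-vertical images over flat ground (scale = f / H). The values below
# are assumptions for illustration, not figures from the paper.
f_true_mm = 15.0           # true focal length
f_used_mm = 15.1           # erroneous pre-calibrated value kept fixed
flying_height_m = 120.0    # nominal UAV height above ground

height_bias_m = flying_height_m * (f_used_mm - f_true_mm) / f_true_mm
print(f"approx. height bias: {height_bias_m:.2f} m")   # ~0.8 m in this example
```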

Journal ArticleDOI
TL;DR: This study focuses on a methodology aiming to combine photogrammetry and spectral imagery acquired from a modified DSLR camera and shows that the quality of spectral imaging reconstruction is highly dependent on the wavelengths used.
Abstract: 3D photogrammetric reconstruction and spectral imaging have already proven useful and are being used on a daily basis for studying cultural heritage. Dense Image Matching allows the creation of a virtual replica of the object that can be used for morphometric studies, for monitoring and conservation purposes, virtual access, reduced handling of fragile objects and sharing objects with a broad audience. 2D spectral imaging is used in the field of cultural heritage conservation to analyse the condition of an object, map a previous restoration, detect a change in composition, reveal sub-drawings, improve details, etc. A 2D image representation of a three-dimensional object offers only a limited field of view and frequently leads to a lack of information, especially for artifacts with complex geometries. The combination of both techniques is the next step toward a more complete and more objective record of an object, but it can also be a tool to improve the identification of details present on artifacts. This study focuses on a methodology aiming to combine photogrammetry and spectral imagery acquired from a modified DSLR camera. Two case studies acquired with multispectral reconstruction techniques are analysed. They are used to demonstrate the advantages and disadvantages of the developed methodology. The obtained results show that the quality of spectral imaging reconstruction is highly dependent on the wavelengths used. Infrared and ultraviolet fluorescence can enhance the identification of features of the objects that are not visible or less visible in classic white-light photogrammetry. Combining 3D reconstruction and multispectral imagery can facilitate the reading and understanding of the object. It can help conservators and researchers to better understand the objects and how to preserve them.