
Showing papers in "ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences in 2022"


Journal ArticleDOI
TL;DR: The results show that contrastive data fusion is a powerful self-supervised technique to train image encoders capable of producing meaningful representations: simple linear probing performs on par with fully supervised approaches, and fine-tuning with as little as 10% of the labelled data results in higher accuracy than supervised training on the entire dataset.
Abstract: Self-supervised learning has great potential for the remote sensing domain, where unlabelled observations are abundant but labels are hard to obtain. This work leverages unlabelled multi-modal remote sensing data for augmentation-free contrastive self-supervised learning. Deep neural network models are trained to maximize the similarity of latent representations obtained with different sensing techniques from the same location, while distinguishing them from representations of other locations. We showcase this idea with two self-supervised data fusion methods and compare them against standard supervised and self-supervised learning approaches on a land-cover classification task. Our results show that contrastive data fusion is a powerful self-supervised technique to train image encoders capable of producing meaningful representations: simple linear probing performs on par with fully supervised approaches, and fine-tuning with as little as 10% of the labelled data results in higher accuracy than supervised training on the entire dataset.
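As a concrete illustration, here is a minimal PyTorch sketch of an augmentation-free contrastive fusion objective of the kind described: paired embeddings of the same location from two sensors are pulled together, while other locations in the batch serve as negatives. The encoder names, embedding shape, and temperature are illustrative assumptions, not the authors' exact setup.

```python
import torch
import torch.nn.functional as F

def contrastive_fusion_loss(z_s1, z_s2, temperature=0.07):
    """InfoNCE-style loss: embeddings of the two modalities from the
    same location (diagonal pairs) are pulled together; all other
    locations in the batch act as negatives. z_s1, z_s2: (batch, dim)."""
    z_s1 = F.normalize(z_s1, dim=1)
    z_s2 = F.normalize(z_s2, dim=1)
    logits = z_s1 @ z_s2.t() / temperature       # pairwise similarities
    targets = torch.arange(z_s1.size(0))         # positives on the diagonal
    # symmetrize over both modalities
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# usage (hypothetical encoders):
# loss = contrastive_fusion_loss(sar_encoder(s1_patch), optical_encoder(s2_patch))
```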

8 citations


Journal ArticleDOI
TL;DR: In this paper, the authors analyzed the impact of geolocation errors on the accuracy and precision of GEDI height metrics in the context of an Alpine forest environment in steep terrain scenarios.
Abstract: Active remote sensing systems orbiting the Earth are only a small portion of the current constellation of satellites and will increase in number and advance in technology in the future. The launch of the GEDI sensor in December 2018, with an expected life-span of about 2 years, is a fundamental step in this revolution, as it is the first spaceborne full-waveform lidar specifically designed for measuring the structure of ecosystems, providing information on the vertical profile of forests. In this study, we assess the accuracy of GEDI height metrics in an Alpine forest environment with steep terrain. We used discrete-return lidar from a recent aerial laser scanner survey as reference to analyse differences in terrain elevation and maximum canopy height of the vegetation detected in each GEDI footprint. The height differences between the discrete lidar and the GEDI data were then analysed to verify any correlation with the following factors: morphology (terrain slope), land cover (land cover type, fraction of canopy cover, vegetation density), and GEDI laser beam characteristics (day/night-time acquisition, full-power vs coverage laser beam, beam ID, laser sensitivity). Further analysis involved shifting the footprints’ locations in 8 different directions and at 4 distances to assess the impact of geolocation errors on accuracy and precision. Results show that the factor that most influences accuracy in this study is the terrain slope, very likely linked to the geolocation uncertainty of the GEDI footprints, suggesting caution in using single GEDI footprints located in steep environments. Other than slope, terrain height accuracy varies mostly with forest type (conifer vs broadleaves), but not significantly with other factors. Canopy height, instead, is affected by most factors: high vegetation canopy is overestimated by ∼3 m by GEDI and underestimated by 3 m over heath and bushes (median difference). Higher-sensitivity pulses and night-time pulses provide better accuracy, as do full-power beams; beams with IDs 1000 and 1011 provide the most accurate canopy heights. Shifting the footprint position decreased accuracy except at 15 m and 270° with respect to the orbit direction (left-looking).
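A minimal sketch of the footprint-shifting test (8 directions × 4 distances); only the 15 m step is taken from the abstract, the other distance values are placeholders.

```python
import numpy as np

def shifted_footprints(easting, northing, distances=(5, 10, 15, 20)):
    """Generate candidate footprint centres shifted in 8 compass
    directions at several distances; reference lidar heights can then
    be re-extracted at each candidate to re-compute height differences."""
    azimuths = np.deg2rad(np.arange(0, 360, 45))     # 8 directions
    return np.array([(easting + d * np.sin(a), northing + d * np.cos(a))
                     for d in distances for a in azimuths])   # (32, 2)
```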

7 citations


Journal ArticleDOI
TL;DR: In this article , the potential of street view imagery (SVI) to better estimate certain LCZrelated properties, such as sky view factor (SVF), has been reviewed, and potential ways to incorporate SVI and identifies challenges such as coarse temporal resolution and spatial coverage constrained to drivable roads.
Abstract: Urban heat island (UHI) is considered a serious environmental issue in highly urbanized cities such as Singapore. To better quantify UHI intensity, the local climate zones (LCZ) classification scheme was adopted to characterize land covers and to describe and compare their thermal performance. There are three commonly used LCZ classification approaches: manual sampling, the World Urban Database and Access Portal Tools (WUDAPT) processing method using remote sensing, and the geographical information system (GIS)-based method. Based on the current implementation of the WUDAPT Level 0 method in the classification work in Singapore, the principal limitations are expounded. To overcome these deficiencies, street view imagery (SVI), which carries substantial urban spatial information, is regarded as a promising data source. This paper reviews the potential of SVI to better estimate certain LCZ-related properties, such as sky view factor (SVF). As it allows a detailed view of ground objects, SVI opens up the possibility of identifying surface properties such as albedo, as well as anthropogenic heat sources. Although it is not a novel idea, there has been a lack of comprehensive use of SVI in assisting LCZ classification from the ground up, especially in a high-density city such as Singapore. This paper overviews potential ways to incorporate SVI and identifies challenges such as coarse temporal resolution and spatial coverage constrained to drivable roads.
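To make the SVF use case concrete, here is a simplified sketch of estimating sky view factor from a segmented, upward-looking fisheye view derived from SVI; proper SVF estimation weights pixels by annulus solid angle, so the plain pixel fraction below is only a first approximation.

```python
import numpy as np

def sky_view_factor(sky_mask):
    """Approximate SVF as the sky-pixel fraction inside the circular
    image area of an upward-looking fisheye (sky_mask: bool array,
    True = sky). Annulus weighting is omitted for brevity."""
    h, w = sky_mask.shape
    yy, xx = np.mgrid[:h, :w]
    r = min(h, w) / 2
    inside = (xx - w / 2) ** 2 + (yy - h / 2) ** 2 <= r ** 2
    return float(sky_mask[inside].mean())
```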

5 citations


Journal ArticleDOI
TL;DR: Two deep learning approaches are presented, the first using a UNet and the second a Feature Pyramid Network (FPN), both with an EfficientNet-B7 backbone, leveraging a publicly available Sentinel-1 dataset provided jointly by the NASA Interagency Implementation and Advanced Concepts Team and the IEEE GRSS Earth Science Informatics Technical Committee.
Abstract: Floods are among the most frequent and costliest natural disasters, with devastating consequences for people, infrastructure, and the ecosystem. During flood events, near-real-time satellite imagery has proven to be an efficient management tool for disaster management authorities. However, one of the challenges is the accurate classification and segmentation of flooded water. The generalization ability of binary segmentation using threshold split-based methods is limited due to the effects of backscatter, geographical area, and time of image collection. Recent advancements in deep learning algorithms for image segmentation have demonstrated excellent potential for improving flood detection. However, there have been limited studies in this domain due to the lack of large-scale labeled flood event datasets. In this paper, we present two deep learning approaches, the first using a UNet and the second a Feature Pyramid Network (FPN), both based on an EfficientNet-B7 backbone, leveraging a publicly available Sentinel-1 dataset provided jointly by the NASA Interagency Implementation and Advanced Concepts Team and the IEEE GRSS Earth Science Informatics Technical Committee. The dataset covers flood events from Nebraska, North Alabama, Bangladesh, Red River North, and Florence. The performance of both networks was evaluated with multiple training, testing, and validation runs. During testing, the UNet model achieved a mean IoU score of 75.06% and the FPN model a mean IoU score of 75.76%.
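Both architectures are available off the shelf; a minimal sketch with the segmentation_models_pytorch library follows (the two input channels assume VV+VH Sentinel-1 input and the single output class a binary water mask; these details are assumptions, not the paper's exact configuration).

```python
import segmentation_models_pytorch as smp

# UNet and FPN, both with an EfficientNet-B7 encoder as in the paper
unet = smp.Unet(encoder_name="efficientnet-b7",
                encoder_weights="imagenet", in_channels=2, classes=1)
fpn = smp.FPN(encoder_name="efficientnet-b7",
              encoder_weights="imagenet", in_channels=2, classes=1)
```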

4 citations


Journal ArticleDOI
TL;DR: In this paper, UAV-based LiDAR and hyperspectral images were collected during the growing season of 2020 over a cornfield near Urbana-Champaign, Illinois, USA.
Abstract: The increased availability of remote sensing data, combined with the wide-ranging applicability of artificial intelligence, has enabled agriculture stakeholders to monitor changes in crops and their environment frequently and accurately. Applying cutting-edge technology in precision agriculture has also enabled the prediction of pre-harvest yield from standing crop signals. Forecasting grain yield from standing crops benefits high-throughput plant phenotyping and agricultural policymaking with information on where crop production is likely to decline. Advanced developments in Unmanned Aerial Vehicle (UAV) platforms and sensor technologies have aided high-resolution spatial, spectral, and structural data collection at a relatively lower cost and in a shorter time. In this study, UAV-based LiDAR and hyperspectral images were collected during the growing season of 2020 over a cornfield near Urbana-Champaign, Illinois, USA. Hyperspectral imagery-based canopy spectral and texture features and LiDAR point cloud-based canopy structure features were extracted and, along with their combination, used as inputs for maize yield prediction under the H2O Automated Machine Learning framework (H2O-AutoML). The research results show that (1) UAV hyperspectral imagery can successfully predict maize yield with relatively decent accuracy; additionally, LiDAR point cloud-based canopy structure features are significant indicators for maize yield prediction, producing slightly poorer yet comparable results to hyperspectral data; (2) regardless of the machine learning method, integrating hyperspectral imagery-based canopy spectral and texture information with LiDAR-based canopy structure features outperformed predictions using a single sensor alone; (3) the H2O-AutoML framework proved to be an efficient strategy for machine learning-based, data-driven model building.
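A minimal sketch of the H2O-AutoML workflow described above; the file name, column names, and model budget are hypothetical.

```python
import h2o
from h2o.automl import H2OAutoML

h2o.init()
# hypothetical table: canopy spectral, texture, and LiDAR structure
# features per plot, plus an observed 'yield' column
frame = h2o.import_file("maize_features.csv")
train, test = frame.split_frame(ratios=[0.8], seed=42)

aml = H2OAutoML(max_models=20, seed=42)
aml.train(y="yield", training_frame=train)  # remaining columns are predictors
print(aml.leaderboard.head())
preds = aml.leader.predict(test)
```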

4 citations


Journal ArticleDOI
TL;DR: Using a digital twin to facilitate visuospatial communication in an expert-guided repair and maintenance scenario, supported by visual annotations, can potentially improve remote collaboration tasks.
Abstract: Various forms of extended reality might empower remote collaboration in ways that the current de facto standards cannot facilitate. Especially when combined with a digital twin of the remote physical object, mixed reality (MR) opens up interesting new ways to support spatial communication. In this study, we explore the use of a digital twin to facilitate visuospatial communication in an expert-guided repair and maintenance operation scenario, supported by visual annotations. We developed two MR prototypes, one with a digital twin of the object of interest, and another where a first-person camera view was additionally shown. We tested these prototypes in a study with 19 participants (9 pairs) against a state-of-the-art solution as a baseline, measured their usability, and obtained qualitative user feedback. Our findings suggest that digital-twin-supported mixed reality enriched with real-time visual annotations can potentially improve remote collaboration tasks.

4 citations


Journal ArticleDOI
TL;DR: A RANSAC-based (RANdom SAmple Consensus) method is proposed to correct the relative localization errors between two CAVs in order to ease information fusion among the CAVs.
Abstract: Highly accurate localization is crucial for the safety and reliability of autonomous driving, especially for the information fusion of collective perception, which aims to further improve road safety by sharing information in a communication network of Connected Autonomous Vehicles (CAVs). In this scenario, small localization errors can impose additional difficulty on fusing the information from different CAVs. In this paper, we propose a RANSAC-based (RANdom SAmple Consensus) method to correct the relative localization errors between two CAVs in order to ease information fusion among the CAVs. Different from previous LiDAR-based localization algorithms that only take static environmental information into consideration, this method also leverages dynamic objects for localization, thanks to the real-time data sharing between CAVs. Specifically, in addition to static objects like poles, fences, and facades, the object centers of the detected dynamic vehicles are also used as keypoints for the matching of the two point sets. Experiments on the synthetic dataset COMAP show that the proposed method can greatly decrease the relative localization error between two CAVs to less than 20 cm, as long as enough vehicles and poles are correctly detected by both CAVs. Moreover, the proposed method is highly efficient at runtime and can be used in real-time autonomous driving scenarios.
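As a rough illustration of the matching step, the sketch below estimates a rigid 2D transform between the two keypoint sets with RANSAC using scikit-image; the residual threshold and the 2D simplification are assumptions, not the paper's implementation.

```python
from skimage.measure import ransac
from skimage.transform import EuclideanTransform

def relative_pose_correction(kp_ego, kp_coop):
    """Estimate the rigid transform between matched (N, 2) keypoint
    arrays (static poles/facades plus centres of commonly detected
    vehicles), rejecting outliers with RANSAC."""
    model, inliers = ransac((kp_ego, kp_coop), EuclideanTransform,
                            min_samples=2, residual_threshold=0.2,
                            max_trials=1000)
    return model, inliers   # model(kp_ego) maps into the other CAV's frame
```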

4 citations


Journal ArticleDOI
TL;DR: In this paper, the U-Net architecture was employed for multi-class segmentation of flooded areas and flooded vegetation, using satellite synthetic aperture radar (SAR) data and altitude information as input.
Abstract: The adverse effects of flood events have been increasing worldwide due to their growing frequency and severity, driven by urbanization and population growth. All-weather sensors, such as satellite synthetic aperture radar (SAR), enable the detection of the extent and the analysis of the magnitude of such events under cloudy atmospheric conditions. The Sentinel-1 satellites of the European Space Agency (ESA) facilitate such studies thanks to free data distribution, a regular data acquisition scheme, and the availability of open-source software. However, various difficulties in visual interpretation and processing exist due to the size and nature of SAR data. Supervised machine learning algorithms have increasingly been used for automatic flood extent mapping. However, the use of Convolutional Neural Networks (CNNs) for this purpose is relatively new and requires further investigation. In this study, the U-Net architecture was employed for multi-class segmentation of flooded areas and flooded vegetation, using Sentinel-1 SAR data and altitude information as input. The training data were produced by an automatic thresholding approach using the Otsu method in Sardoba, Uzbekistan and Sagaing, Myanmar. The results were validated in Ordu, Turkey and on the Ca River, Vietnam by visual comparison with previously produced flood maps. The results show that CNNs have great potential for classifying flooded areas and flooded vegetation even when trained in areas with different geographical settings. The F1 scores obtained in the study for the flood and flooded vegetation classes were 0.91 and 0.85, respectively.
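A minimal sketch of the Otsu-based label generation step (the dark-water assumption for Sentinel-1 VV backscatter is standard, but the exact pre-processing here is assumed):

```python
import numpy as np
from skimage.filters import threshold_otsu

def water_labels_from_sar(vv_db):
    """Pseudo-labels for training: open water appears dark in VV
    backscatter, so pixels below the Otsu threshold of the dB image
    are labelled water (1), the rest background (0)."""
    t = threshold_otsu(vv_db)
    return (vv_db < t).astype(np.uint8)
```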

4 citations


Journal ArticleDOI
TL;DR: In this paper, a machine learning approach that trains a neural network using samples distributed in space and time is proposed to examine and enhance the temporal transferability of water quality retrieval models.
Abstract: Empirical (regression-based) models have long been used for retrieving water quality parameters from optical imagery by training a model between image spectra and collocated in-situ data. However, a need clearly exists to examine and enhance the temporal transferability of such models. The performance of a model trained in a specific period can deteriorate when applied at another time due to variations in the composition of constituents, atmospheric conditions, and sun glint. In this study, we propose a machine learning approach that trains a neural network using samples distributed in space and time, improving the temporal robustness of the model. We explore the temporal transferability of the proposed neural network and of standard band-ratio models in retrieving total suspended matter (TSM) from Sentinel-2 imagery in San Francisco Bay. Multitemporal Sentinel-2 imagery and in-situ data are used to train the models. The transferability of the models is then examined by estimating TSM for imagery acquired after the training period. In addition, we assess the robustness of the models with respect to sun glint correction. The results imply that the neural network-based model is temporally transferable (R2 ≈ 0.75; RMSE ≈ 7 g/m3 for retrievals up to 70 g/m3) and is minimally impacted by the sun glint correction. Conversely, the ratio model showed relatively poor temporal robustness, with high sensitivity to the glint correction.
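A minimal sketch contrasting the two model families; the network size, band choice, and ratio coefficients are placeholders, not the paper's calibration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_tsm_network(X, y):
    """X: Sentinel-2 reflectances sampled across space *and* time
    (rows = samples); y: collocated in-situ TSM (g/m3)."""
    return MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000,
                        random_state=0).fit(X, y)

def band_ratio_tsm(red, green, a=1.0, b=0.0):
    """Standard band-ratio baseline: TSM = a * (red / green) + b,
    with a and b fitted by regression (placeholder values here)."""
    return a * (red / green) + b
```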

3 citations


Journal ArticleDOI
TL;DR: In this article, a multi-frequency polarimetric radar vegetation index (MPRVI) was proposed for developing a biomass estimation model along a climatic gradient in Israel, using L-band PALSAR and C-band Sentinel-1 data.
Abstract: Monitoring biomass changes in Mediterranean ecosystems is important for better understanding their responses to climatic and anthropogenic changes. Multi-frequency SAR data (L-band PALSAR and C-band Sentinel-1) are investigated for developing a biomass estimation model along a climatic gradient in Israel. First, the relationships between biomass and transmissivity at various frequencies are discussed. A multi-frequency polarimetric radar vegetation index (MPRVI) is then proposed, utilizing the ensemble average of the degree of polarization and cross-polarized backscattering coefficients. After randomly partitioning the data into training and testing sets, the new MPRVI-based biomass model is evaluated. It shows good agreement with reference biomass data, with an r-square of 0.899, a root-mean-square error (RMSE) of 0.381 kg/m2, and a relative RMSE (RRMSE) of 10.8%.

3 citations


Journal ArticleDOI
TL;DR: MultiSenGE, as discussed by the authors, is a large-scale multimodal and multitemporal benchmark dataset covering one of the biggest administrative regions in the eastern part of France; it contains 8,157 patches of 256 × 256 pixels from Sentinel-2 L2A and Sentinel-1 GRD images in VV-VH polarization, together with a regional large-scale Land Use/Land Cover (LULC) topographic reference database.
Abstract: This paper presents MultiSenGE, a new large-scale multimodal and multitemporal benchmark dataset covering one of the biggest administrative regions in the eastern part of France. MultiSenGE contains 8,157 patches of 256 × 256 pixels from Sentinel-2 L2A and Sentinel-1 GRD images in VV-VH polarization, together with a regional large-scale Land Use/Land Cover (LULC) topographic reference database. With MultiSenGE, we contribute to recent developments towards shared data use and machine learning methods in the field of environmental science. The purpose of this dataset is to provide a relevant and easily accessible dataset for exploring deep learning methods. We use MultiSenGE to evaluate the performance of well-known deep learning techniques for urban areas. These results serve as a baseline for future research on remote sensing applications using the multitemporal and multimodal aspects of MultiSenGE. With all patches georeferenced at 10-meter spatial resolution and covering the whole Grand-Est region, MultiSenGE provides an environmental benchmark dataset that will help to advance data-driven techniques for land use/land cover remote sensing applications.

Journal ArticleDOI
TL;DR: This paper investigates the current practices of state DOTs in digitizing data collection for roadside asset systems by developing and distributing a web-based survey, and presents a case study from a leading DOT in digitizing the management of the built environment to further understand the requirements of implementing Digital Twins to support transportation asset data management.
Abstract: Transportation Asset Management (TAM) is a data-driven decision-making process to maintain and extend the serviceability of transportation assets throughout their lifecycle. TAM is an extensive data process that requires accurate and high-quality information for better decision-making. A significant challenge faced by state Departments of Transportation (DOTs) is the need to allocate their limited funds to optimize their assets’ performance. The criticality of this challenge increases when state DOTs need to manage a wide variety of assets distributed across a vast network. To address this challenge, a new paradigm of digitizing the management of the built environment is emerging, perceived to depend highly on the integration of several technologies, namely Digital Twins. Digital Twins, by definition, are the connection between the physical and digital aspects of an asset, thus aligning with the overarching objective of asset management: leveraging the use of asset information (i.e., the digital aspect of the asset) to improve the asset’s performance throughout its lifecycle (i.e., the physical aspect of the asset). At the core of implementing Digital Twins is having the right data collected for use throughout the asset’s lifecycle. Thus, recognizing the potential of Digital Twins in supporting state DOTs to manage their transportation assets, and the anticipated benefits, this paper investigates the current practices of state DOTs in digitizing data collection for roadside asset systems by developing and distributing a web-based survey. Five major data collection variables and seven roadside asset systems were considered. Furthermore, this paper presents a case study from a leading DOT in digitizing the management of the built environment to further understand the requirements of implementing Digital Twins to support transportation asset data management.

Journal ArticleDOI
TL;DR: Li et al., as mentioned in this paper, employed ground-based LiDAR, which has higher spatial resolution, to detect coastal change at Fire Island, New York, an area severely affected by Hurricane Sandy.
Abstract: Coastal erosion, occurring all over the world, is a disastrous coastal geological phenomenon. It can be caused by many natural factors such as wind and waves. Hurricane Sandy made landfall on the east coast of the United States at the end of October 2012 and caused severe damage to the economy and the ecological environment. In this paper, we employ ground-based LiDAR, which has higher spatial resolution, to detect coastal change at Fire Island, New York, an area that was severely affected. The research showed that sediment accumulation of up to 2.9 meters occurred away from the coast, while erosion of up to 9 meters occurred close to the coast. The total volume of the study area decreased by 78160.96 cubic meters. The coastline retreated by 32.86 meters on average. In addition, a website has been designed to record coastal erosion anytime and anywhere. We hope this study will help people better understand the impact of hurricanes on coastal erosion, enhance awareness of environmental protection, and provide scientific information for further study.
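A minimal sketch of the underlying change computation, assuming both surveys have been gridded to common DEMs; the cell size is illustrative.

```python
import numpy as np

def dem_change(dem_before, dem_after, cell_size):
    """DEM of difference between two gridded LiDAR surveys: positive
    cells are accumulation, negative cells erosion. Returns the
    difference raster and net volume change (cubic metres)."""
    dod = dem_after - dem_before
    volume = np.nansum(dod) * cell_size ** 2
    return dod, volume
```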

Journal ArticleDOI
TL;DR: The proposed trained network can accurately distinguish living from dead corals, which could reflect the health of the corals in the area of interest, and achieves state-of-the-art performance compared to other methods tested on the underwater coral image dataset provided in this paper.
Abstract: Regular monitoring activities are important for assessing the influence of unfavourable factors on corals and tracking subsequent recovery or decline. Deep learning-based underwater photogrammetry provides a comprehensive solution for automatic large-scale and precise monitoring. It can quickly acquire a large range of underwater coral reef images and extract information from them through advanced image processing technology and deep learning methods. This procedure has three major components: (a) generation of 3D models, (b) understanding of relevant corals in the images, and (c) tracking of those models over time and spatial change analysis. This paper focuses on issue (b): it applies five state-of-the-art neural networks to the semantic segmentation of coral images, compares their performance, and proposes a new coral semantic segmentation method. To quantitatively evaluate the performance of the neural networks in these experiments, this paper uses mean class-wise Intersection over Union (mIoU), the most commonly used accuracy measure in semantic segmentation, as the standard metric. Meanwhile, considering that coral boundaries are very irregular and the IoU index alone is not accurate enough, a new segmentation evaluation index based on boundary quality, Boundary IoU, is also used to evaluate the segmentation results. The proposed trained network can accurately distinguish living from dead corals, which could reflect the health of the corals in the area of interest. The classification results show that we achieve state-of-the-art performance compared to other methods tested on the underwater coral image dataset provided in this paper.
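For reference, a minimal implementation of the standard mIoU metric used above (Boundary IoU additionally restricts the computation to a band around mask boundaries and is omitted here):

```python
import numpy as np

def mean_iou(pred, gt, n_classes):
    """Mean class-wise Intersection over Union for label maps of
    integer class indices."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:               # ignore classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```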

Journal ArticleDOI
TL;DR: In this paper, a dynamic graph convolutional HSI classification method, called Dynamic Graph Convolutional Network (DGCNet), is proposed, which first obtains two classification features by applying flattening and global average pooling operations to the output of the convolutional layer.
Abstract: Deep learning has achieved impressive results on hyperspectral image (HSI) classification. Supervised-learning convolutional neural networks (CNNs) and semi-supervised-learning graph neural networks (GNNs) are the two main network frameworks. However, (1) supervised-learning CNNs face the problem of high model time complexity as the number of network layers deepens, and (2) semi-supervised-learning GNNs face the problem of high spatial complexity due to the computation of adjacency relations. In this paper, a novel dynamic graph convolutional HSI classification method is proposed, called Dynamic Graph Convolutional Network (DGCNet). We first obtain two classification features by applying flattening and global average pooling operations to the output of the convolutional layer, which fully exploits the spatial-spectral information contained in the hyperspectral data. Then a dynamic graph convolution module is applied to extract the intrinsic structural information of each patch. Finally, the HSI is classified based on spatial, spectral, and structural features. DGCNet uses three branches to process multiple features of the HSI in parallel and is trained in a supervised manner. In addition, DropBlock and label smoothing regularization techniques are applied to further improve the generalization capability of the model. Comparative experiments show that the proposed algorithm is comparable with state-of-the-art supervised learning models in terms of accuracy while significantly outperforming them in terms of time.
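The two-feature extraction step reads directly in PyTorch; the shapes below are illustrative, not the paper's dimensions.

```python
import torch
import torch.nn as nn

feat = torch.randn(8, 64, 9, 9)                   # conv output (B, C, H, W)
flat = feat.flatten(start_dim=1)                  # (8, 5184): spatial detail kept
gap = nn.AdaptiveAvgPool2d(1)(feat).flatten(1)    # (8, 64): channel-wise summary
```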

Journal ArticleDOI
TL;DR: The overall method of fine-tuning the modified U-Net reduces the number of training parameters by 300 times and training time by 2.5 times while preserving the precision of segmentation.
Abstract: Earth observation data, including very high-resolution (VHR) imagery from satellites and unmanned aerial vehicles (UAVs), are the primary sources for highly accurate building footprint segmentation and extraction. However, with the increase in spatial resolution, smaller objects become prominently visible in the images, and intelligent approaches like deep learning (DL) suffer from several problems. In this paper, we outline four prominent problems with DL-based methods: (P1) lack of contextual features, (P2) requirement of a large training dataset, (P3) the domain-shift problem, and (P4) computational expense. In tackling P1, we modify a commonly used DL architecture called U-Net to increase the contextual feature information. Likewise, for P2 and P3, we use transfer learning to fine-tune the DL model on a smaller dataset, utilising the knowledge previously gained from a larger dataset. For P4, we study the trade-off between the network’s performance and computational expense with reduced training parameters and optimum learning rates. Our experiments on a case study from the City of Melbourne show that the modified U-Net is more robust than the original U-Net and SegNet, and the dataset we develop is significantly more robust than an existing benchmark dataset. Furthermore, the overall method of fine-tuning the modified U-Net reduces the number of training parameters by 300 times and the training time by 2.5 times while preserving the precision of segmentation.
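A minimal sketch of the fine-tuning setup for P2/P3, assuming an encoder-decoder model with 'encoder' and 'decoder' attributes (the attribute names, optimizer, and learning rate are assumptions):

```python
import torch

def finetune_optimizer(model, lr=1e-4):
    """Freeze the pretrained encoder and train only the decoder on the
    small target dataset, drastically cutting trainable parameters."""
    for p in model.encoder.parameters():
        p.requires_grad = False
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(trainable, lr=lr)
```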

Journal ArticleDOI
TL;DR: In this article, the authors compared the point clouds obtained by a multi-ranger deck, a multi-layer LiDAR scanner, and a stereo camera, assessing each against ground truth obtained with a terrestrial laser scanner. They found that the LiDAR scanner-based system can handle a relatively large office environment with an accumulated drift of less than 0.02% (1 cm) on the Z-axis and 0.77% on the X and Y axes over a trajectory length of about 65 m.
Abstract: The use of drones to explore indoor spaces has gained attention and popularity for disaster management and indoor navigation applications. In this paper we present the operations and mapping techniques of two drones that differ in size, the sensors deployed, and the positioning and mapping techniques used. The first drone is a low-cost commercial quadcopter microdrone, a Crazyflie, while the second is a relatively expensive research quadcopter macrodrone called MAX. We investigated their feasibility for mapping areas where satellite positioning is not available, such as indoor spaces. We compared the point clouds obtained by a multi-ranger deck, a multi-layer LiDAR scanner, and a stereo camera, and assessed each against ground truth obtained with a terrestrial laser scanner. Results showed that both drones are capable of mapping relatively cluttered indoor environments and can provide point clouds that are sufficient for a quick exploration. Furthermore, the LiDAR scanner-based system can handle a relatively large office environment with an accumulated drift of less than 0.02% (1 cm) on the Z-axis and 0.77% (50 cm) on the X and Y axes over a trajectory length of about 65 m. Despite the limited features of the Crazyflie’s sensor configuration, its performance is promising for mapping indoor spaces, given the relatively low deviation from the ground truth: cloud-to-cloud distances measured were generally less than 20 cm.
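The cloud-to-cloud evaluation reduces to nearest-neighbour distances; a minimal sketch:

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_to_cloud(eval_pts, reference_pts):
    """Distances from each drone-derived point to its nearest neighbour
    in the terrestrial-laser-scanner ground truth; both (N, 3) arrays."""
    d, _ = cKDTree(reference_pts).query(eval_pts)
    return d          # summarise e.g. with np.median(d)
```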

Journal ArticleDOI
TL;DR: Fusion at different levels is outlined, in particular, fusion of data sources and modalities, fusion over different scales and fusion of differing representations, leading to a digital twin that allows better testing, prediction and understanding of complex effects.
Abstract: In engineering, machines are typically built after a careful conception and design process: all components of a system, their roles, and the interactions between them are well understood, and often digital models of the system exist even before the actual hardware is built. This enables simulations and even feedback loops between the real-world system and a digital model, leading to a digital twin that allows better testing, prediction, and understanding of complex effects. In Earth sciences, by contrast, and particularly in ocean sciences, models exist only for certain aspects of the real world, for certain processes, and for some interactions and dependencies between different “components” of the ocean. These individual models cover large temporal (seconds to millions of years) and spatial (millimetres to thousands of kilometres) scales, a variety of field data underpin them, and their results are represented in many different ways. A key to enabling digital twins of the oceans is fusion at different levels, in particular fusion of data sources and modalities, fusion over different scales, and fusion of differing representations. We outline these challenges and exemplify different envisioned digital twins employed in the oceans involving remote sensing, underwater photogrammetry, and computer vision, focusing on optical aspects of the digital twinning process. In particular, we look at holistic sensing scenarios of optical properties in coastal waters as well as seafloor dynamics at volcanic slopes, and discuss roadblocks for digital twins as well as potential solutions to increase and widen their use.

Journal ArticleDOI
TL;DR: In this article, the authors examine alternative gaze interaction and visualization design prototypes in a digital collaboration scenario, in which the assumed collaboration environment is a co-located mixed reality environment, implementing representations of gaze as a line, a cursor, and an "automated line" where the line and cursor are automatically alternated based on occlusion detection.
Abstract: There is evidence in the literature that collaborative work using digital tools could benefit from visualizing the real-time eye movements of a selected participant or, possibly, several participants. In this study, we examine alternative gaze interaction and visualization design prototypes in a digital collaboration scenario, in which the assumed collaboration environment is a co-located mixed reality environment. Specifically, we implemented a virtual pointer as a baseline, and representations of gaze as a line, a cursor, and an ‘automated line’ where the line and cursor are automatically alternated based on occlusion detection. These prototypes are then evaluated in a series of usability studies with additional exploratory observations for a spatial communication scenario, in which participants either describe routes to someone else or learn them from someone else for navigational planning. In this paper we describe the alternative interaction design prototypes, as well as various visualization designs for the gaze itself (continuous line and dashed line) and the point of regard (donut, dashed donut, sphere, rectangle) to guide collaboration, and report our findings from several usability studies (n=6). We also interviewed our participants, which allows us to make some qualitative observations on the potential function and usefulness of these visualization and interaction prototypes. Overall, the outcomes suggest that gaze visualization solutions in general are promising approaches to assist communication in collaborative XR, although, not surprisingly, how they are designed is important.

Journal ArticleDOI
TL;DR: A method to generate high-quality DTMs based on a synthesis of deep learning and Shape from Shading with a Lunar Reconnaissance Orbiter Narrow Angle Camera image as well as a coarse-resolution DTM as input is proposed.
Abstract: High-resolution Digital Terrain Models (DTMs) of the lunar surface can provide crucial spatial information for lunar exploration missions. In this paper, we propose a method to generate high-quality DTMs based on a synthesis of deep learning and Shape from Shading (SFS), with a Lunar Reconnaissance Orbiter Narrow Angle Camera (LROC NAC) image as well as a coarse-resolution DTM as input. Specifically, we use a Convolutional Neural Network (CNN)-based deep learning architecture to predict initial pixel-resolution DTMs; we then use SFS to improve their details. The CNN model is trained on a dataset of 30,000 samples, formed from stereo-photogrammetry-derived DTMs and orthoimages using LROC NAC images as well as the Selenological and Engineering Explorer and LRO Elevation Model (SLDEM). We take the Chang’E-3 landing site as an example, using a 1.6 m resolution LROC NAC image and a 5 m resolution stereo-photogrammetry-derived DTM as input to test the proposed method. We evaluate our DTMs against those from stereo-photogrammetry and from deep learning alone. The results show that the proposed method can generate 1.6 m resolution high-quality DTMs that clearly improve the visibility of details relative to the initial DTM generated by the deep learning method.

Journal ArticleDOI
TL;DR: In this article, the authors provide an overview outlining and discussing the role of SVI in GIS and urban studies spanning six use cases, supported by a systematic literature review of more than 100 papers and their own experiments, which reveal the added value and challenges of extracting information on buildings and other urban features.
Abstract: Street view imagery (SVI) has gained prominence in the past decade, offering a new perspective to map and understand cities. It supports numerous studies in the built environment by replacing or supplementing aerial and satellite imagery; some studies that were not possible with traditional platforms have now been enabled for the first time thanks to the increasing volume of SVI data. However, the two perspectives are often disconnected, and there has not been an overarching paper discussing the pros and cons of each. We provide an overview outlining and discussing the role of SVI in GIS and urban studies spanning six use cases. Our discourse is supported by a systematic literature review of more than 100 papers and by our own experiments, which reveal the added value and challenges of SVI in extracting information on buildings and other urban features, an increasingly important use case. We find that the key advantages of SVI over aerial imagery are that it represents more closely how streetscapes are perceived by people and that it enables extracting certain information that otherwise cannot be gathered from top-down perspectives. However, the spatial coverage of SVI tends to be limited to the vicinity of drivable roads, and its temporal coverage is comparatively sparse.

Journal ArticleDOI
TL;DR: A method combining visibility analysis and neural networks for enriching 3D models with window and door features is proposed; it improves the accuracy of point cloud semantic segmentation and upgrades building models with façade elements.
Abstract: Semantic 3D building models are widely available and used in numerous applications. Such 3D building models display rich semantics but no façade openings, chiefly owing to their aerial acquisition techniques. Hence, refining models’ façades using dense, street-level, terrestrial point clouds seems a promising strategy. In this paper, we propose a method combining visibility analysis and neural networks for enriching 3D models with window and door features. In the method, occupancy voxels are fused with classified point clouds, which provides semantics to the voxels. The voxels are also used to identify conflicts between laser observations and the 3D models. The semantic voxels and conflicts are combined in a Bayesian network to classify and delineate façade openings, which are reconstructed using a 3D model library. Unaffected building semantics are preserved while updated semantics are added, thereby upgrading the building model to LoD3. Moreover, the Bayesian network results are back-projected onto the point clouds to improve the points’ classification accuracy. We tested our method on a municipal CityGML LoD2 repository and the open point cloud datasets TUM-MLS-2016 and TUM-FAÇADE. Validation results revealed that the method improves the accuracy of point cloud semantic segmentation and upgrades buildings with façade elements. The method can be applied to enhance the accuracy of urban simulations and facilitate the development of semantic segmentation algorithms.

Journal ArticleDOI
TL;DR: In this article, a neural representation of the 3D scene is presented as an implicit, continuous occupancy field, driven by learned embeddings of the point cloud and a stereo pair of ortho-photos.
Abstract: High-resolution optical satellite sensors, combined with dense stereo algorithms, have made it possible to reconstruct 3D city models from space. In practice, however, these models are rather noisy and tend to miss small geometric features that are clearly visible in the images. We argue that one reason for the limited quality may be a too early, heuristic reduction of the triangulated 3D point cloud to an explicit height field or surface mesh. To make full use of the point cloud and the underlying images, we introduce IMPLICITY, a neural representation of the 3D scene as an implicit, continuous occupancy field, driven by learned embeddings of the point cloud and a stereo pair of ortho-photos. We show that this representation enables the extraction of high-quality DSMs: with an image resolution of 0.5 m, IMPLICITY reaches a median height error of ≈ 0.7 m and outperforms competing methods, especially w.r.t. building reconstruction, featuring intricate roof details, smooth surfaces, and straight, regular outlines.

Journal ArticleDOI
TL;DR: This paper provides an overview of the development of the photogrammetry- and remote sensing-specific Benchmark Metadata Database (BeMeDa), which is based on MongoDB, a NoSQL database system, together with a user-oriented metadata schema for data structuring.
Abstract: Data are a key component of many applications and methods in the domain of photogrammetry and remote sensing. Data-driven approaches such as deep learning in particular rely heavily on available annotated data. The amount of data is increasing significantly every day. However, reference data is not increasing at the same rate, and finding relevant data for a specific domain is still difficult. It is therefore necessary to make existing reference data as accessible as possible to the scientific community in order to make optimal use of it. In this paper we provide an overview of the development of our photogrammetry- and remote sensing-specific Benchmark Metadata Database (BeMeDa). BeMeDa is based on MongoDB, a NoSQL database system; in addition, a user-oriented metadata schema was developed for data structuring. BeMeDa enables easy searching of benchmark datasets in the field of photogrammetry and remote sensing.
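A hypothetical query against a BeMeDa-like MongoDB instance; the connection string, collection, and field names are assumptions based on the description, not the actual BeMeDa schema.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
datasets = client["bemeda"]["datasets"]

# find annotated Sentinel-2 benchmarks for semantic segmentation
for d in datasets.find({"task": "semantic segmentation",
                        "sensor": "Sentinel-2",
                        "annotated": True}):
    print(d["name"], d.get("url"))
```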

Journal ArticleDOI
TL;DR: In this paper, the impact of 3D convolutions in the spatial-temporal and spatial-spectral dimensions was investigated in comparison to 2D convolutions in the spatial dimensions only, and a new method was introduced to generate multitemporal input patches by using time intervals instead of fixed acquisition dates.
Abstract: With the availability of large amounts of satellite image time series (SITS), the identification of different materials of the Earth’s surface is possible at a high temporal resolution. One of the basic tasks is the pixel-wise classification of land cover, i.e. the task of identifying the physical material of the Earth’s surface in an image. Fully convolutional neural networks (FCNs) are successfully used for this task. In this paper, we investigate different FCN variants, using different methods for the computation of spatial, spectral, and temporal features. We investigate the impact of 3D convolutions in the spatial-temporal as well as in the spatial-spectral dimensions in comparison to 2D convolutions in the spatial dimensions only. Additionally, we introduce a new method to generate multitemporal input patches by using time intervals instead of fixed acquisition dates: we choose the image that is closest in time to the middle of the corresponding time interval, which makes our approach more flexible with respect to the requirements for the acquisition of new data. Using these multitemporal input patches, generated from Sentinel-2 images, we improve the classification of land cover by 4% in the mean F1-score and by 1.3% in the overall accuracy compared to a classification using mono-temporal input patches. Furthermore, using 3D convolutions instead of 2D convolutions improves the classification performance by a small amount: 0.4% in the mean F1-score and 1.2% in the overall accuracy.
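The 2D-vs-3D distinction in PyTorch terms, with illustrative tensor sizes (10 bands, 6 time steps, 32 × 32 patches):

```python
import torch
import torch.nn as nn

x = torch.randn(4, 10, 6, 32, 32)     # (batch, bands, time, height, width)

# 3D convolution: filters span the temporal and spatial dimensions
y3d = nn.Conv3d(10, 16, kernel_size=3, padding=1)(x)        # (4, 16, 6, 32, 32)

# 2D baseline: fold time steps into the channel axis, spatial filters only
x2d = x.reshape(4, 10 * 6, 32, 32)
y2d = nn.Conv2d(60, 16, kernel_size=3, padding=1)(x2d)      # (4, 16, 32, 32)
```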

Journal ArticleDOI
TL;DR: Zhang et al., as mentioned in this paper, improved the accuracy of forest canopy height estimation from multiple perspectives by improving the detection capability for weak and overlapping waves and constructing a canopy height model considering slope correction and environmental features.
Abstract: Tree canopy height is an important parameter for estimating forest carbon stock, and mountainous areas with dense vegetation cover are the main distribution areas of trees, so it is important to accurately measure forest canopy height in mountainous areas with high vegetation cover. This paper addresses the poor inversion accuracy of canopy height estimation in large-scale, densely forested mountainous areas. Using the complex echoes of the GEDI full-waveform spaceborne laser in mountainous forests as the data source, it improves the accuracy of forest canopy height estimation from multiple perspectives: by improving the detection capability for weak and overlapping waves, and by constructing a canopy height model that considers slope correction and environmental features. The results show that the modified RGD algorithm proposed in this paper can effectively detect weak and overlapping waves in the echoes and significantly improve the DTM/DSM inversion accuracy (FVC > 90%, R2 = 0.8663/R2 = 0.8073). In addition, a forest canopy height model is constructed on the basis of a physical geometric model of mountain slope and spatial environment characteristics, yielding higher canopy height inversion accuracy (FVC > 90%, R2 = 0.6729). The experiments prove that the model constructed in this paper is not only applicable to densely forested mountainous areas but also improves the accuracy of forest canopy height inversion in other environments. This study can provide technical and decision support for forest resource surveys and the global carbon balance.

Journal ArticleDOI
TL;DR: In this paper, a multitemporal segmentation method based on the coefficient of variation of spectral bands and vegetation indices obtained from Sentinel-2 images was proposed, considering two agricultural years (2018-2019 and 2019-2020) in an area with agricultural intensification.
Abstract: With the recent evolution in sensors’ spatial resolution, such as the MultiSpectral Imager (MSI) of the Sentinel-2 mission, the need for segmentation techniques in satellite images has increased. Although the advantages of image segmentation for delineating agricultural fields in images are already known, the literature shows that it is rarely used to account for temporal changes in highly managed regions undergoing agricultural intensification. Therefore, this work aimed to evaluate a multitemporal segmentation method based on the coefficient of variation of spectral bands and vegetation indices obtained from Sentinel-2 images, considering two agricultural years (2018–2019 and 2019–2020) in an area with agricultural intensification. Images of the coefficient of variation represented the spectro-temporal dynamics within the study area; these images were also used with an edge detection filter (Sobel) to verify its performance. The region-based Watershed Segmentation (WS) algorithm was used in the segmentation process. Subsequently, to assess the quality of the segmentation results, the metrics Potential Segmentation Error (PSE), Number-of-Segments Ratio (NSR), and Euclidean Distance 2 (ED2) were calculated from manually delineated reference objects. The segmentation achieved its best performance when applied to the unfiltered coefficient-of-variation images of the spectral bands, with an ED2 of 7.289 and 2.529 for 2018–2019 and 2019–2020, respectively. The WS algorithm tended to produce over-segmentation in the study area; however, it proved effective in identifying objects in a dynamic area with agricultural intensification.
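A condensed sketch of the processing chain (coefficient of variation → optional Sobel → watershed); the marker-generation heuristic is a simplification, not the paper's exact procedure.

```python
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import sobel
from skimage.segmentation import watershed

def segment_fields(stack):
    """stack: (t, h, w) time series of a band or vegetation index.
    The per-pixel coefficient of variation summarises spectro-temporal
    dynamics; watershed then delineates field objects."""
    cv = np.std(stack, axis=0) / (np.mean(stack, axis=0) + 1e-9)
    edges = sobel(cv)                          # optional edge filtering step
    markers, _ = ndi.label(edges < np.percentile(edges, 10))
    return watershed(edges, markers)
```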

Journal ArticleDOI
TL;DR: The experimental results prove that the presented approach achieves accurate integration of objects extracted from single images with an input 3D model, allowing for an effective increase of its semantic coverage.
Abstract: 3D building modeling is a diverse field of research with a multitude of challenges, in which data integration is an inherent component. The intensively growing market of BIM-related consumer applications requires methods and algorithms that enable efficient updates of existing 3D models without the need for cost-intensive data capturing and repetitive reconstruction processes. We propose a novel approach for the semantic enrichment of existing indoor models with window objects, based on amateur-camera RGB images with unknown exterior orientation parameters. The core idea of the approach is the parallel estimation of image camera poses with semantic recognition of target objects and their automatic mapping onto a 3D vector model. The presented solution goes beyond pure texture matching and links deep learning detection techniques with camera pose estimation and 3D reconstruction. To evaluate the performance of our procedure, we compare the estimated camera parameters with reference data, obtaining median values of 13.8 cm for the camera position and 1.1° for its orientation. Furthermore, the quality of the 3D mapping is assessed by comparison to a reference 3D point cloud. All the windows present in the data source were detected successfully, with a mean distance between both point sets of 3.6 cm. The experimental results prove that the presented approach achieves accurate integration of objects extracted from single images with an input 3D model, allowing for an effective increase of its semantic coverage.

Journal ArticleDOI
C Zhang, H. Xue, Guangfeng Dong, Hao Jing, S He 
TL;DR: Zhang et al., as discussed in this paper, proposed a hybrid physical-data (HPD) model combining a physical model and a deep learning model for runoff estimation, which uses the output of a physical hydrological model, together with the driving factors, as input to a neural network to estimate the monthly runoff of the upper Heihe River Basin in China.
Abstract: Runoff estimation plays an important role in water resource planning and management. Existing hydrological models can be divided into physical models and data-driven models. Although physical models contain certain physical knowledge and can generalize well to new scenarios, their application is limited by high professional knowledge requirements, difficulty in obtaining data, and high computational costs. Data-driven models can fit the observed data well, but their estimates may not be physically consistent. In this letter, we propose a hybrid physical-data (HPD) model combining a physical model and a deep learning model for runoff estimation. The model uses the output of a physical hydrological model, together with the driving factors, as input to a neural network to estimate the monthly runoff of the upper Heihe River Basin in China. We show that the use of the HPD model improves the quality of runoff estimation, resulting in high R2 and NSE values of 0.969 and a low RMSE value of 9.645. This indicates that the new model has an excellent learning capability to simulate runoff and a flexible ability to extract complex relevant information; at the same time, the estimation of peak runoff is optimized.
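A minimal sketch of the HPD idea, with the physical model's simulated runoff appended to the driving factors as an extra network input; the network size and library choice are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_hpd(drivers, simulated_runoff, observed_runoff):
    """drivers: (n, k) meteorological forcings; simulated_runoff: (n,)
    output of the physical hydrological model; the network maps both
    to observed monthly runoff."""
    X = np.column_stack([drivers, simulated_runoff])
    return MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=5000,
                        random_state=0).fit(X, observed_runoff)
```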

Journal ArticleDOI
TL;DR: In this article , a new deep learning-based method for 3D Point Cloud Semantic Segmentation specifically designed for processing real-world LIDAR railway scenes is presented. But this method relies on the use of spatial local point cloud transformations for convolutional learning.
Abstract: This paper presents a new deep-learning-based method for 3D point cloud semantic segmentation specifically designed for processing real-world LiDAR railway scenes. The new approach relies on the use of spatial local point cloud transformations for convolutional learning. These transformations provide increased robustness to varying point cloud densities while preserving metric information and sufficient descriptive ability. The resulting performance is illustrated with results on railway data from two distinct LiDAR point cloud datasets acquired in industrial settings. The quality of the extraction of useful information for maintenance operations and topological analysis is highlighted, together with a noticeable robustness to variations in point cloud distribution and point redundancy.