
Showing papers on "Aerial image published in 2022"


Journal ArticleDOI
TL;DR: Wang et al. proposed a deeply supervised (DS) attention metric-based network (DSAMNet) for change detection, which employs a metric module to learn change maps by means of deep metric learning and integrates convolutional block attention modules (CBAM) to provide more discriminative features.
Abstract: Change detection (CD) aims to identify surface changes from bitemporal images. In recent years, deep learning (DL)-based methods have made substantial breakthroughs in the field of CD. However, CD results can be easily affected by external factors, including illumination, noise, and scale, which leads to pseudo-changes and noise in the detection map. To deal with these problems and achieve more accurate results, a deeply supervised (DS) attention metric-based network (DSAMNet) is proposed in this article. A metric module is employed in DSAMNet to learn change maps by means of deep metric learning, in which convolutional block attention modules (CBAM) are integrated to provide more discriminative features. As an auxiliary, a DS module is introduced to enhance the feature extractor's learning ability and generate more useful features. Moreover, another challenge encountered by data-driven DL algorithms is posed by the limitations in change detection datasets (CDDs). Therefore, we create a CD dataset, Sun Yat-Sen University (SYSU)-CD, for bitemporal image CD, which contains a total of 20,000 aerial image pairs of size 256 × 256. Experiments are conducted on both the CDD and the SYSU-CD dataset. Compared to other state-of-the-art methods, our network achieves the highest accuracy on both datasets, with an F1 of 93.69% on the CDD dataset and 78.18% on the SYSU-CD dataset.

71 citations
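
As an illustration of the attention mechanism named above, here is a minimal PyTorch sketch of a CBAM-style block (channel attention followed by spatial attention). It is a generic rendition of CBAM, not DSAMNet's exact module, and all names are illustrative.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """CBAM-style block: channel attention followed by spatial attention."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # Channel attention: shared MLP over global avg- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over channel-wise avg and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)   # channel reweighting
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))          # spatial reweighting
```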


Journal ArticleDOI
TL;DR: A large-scale dataset of object detection in aerial images (DOTA) is presented in this article, which contains 1,793,658 object instances of 18 categories of oriented-bounding-box annotations collected from 11,268 aerial images.
Abstract: In the past decade, object detection has achieved significant progress in natural images but not in aerial images, due to the massive variations in the scale and orientation of objects caused by the bird's-eye view of aerial images. More importantly, the lack of large-scale benchmarks has become a major obstacle to the development of object detection in aerial images (ODAI). In this paper, we present a large-scale Dataset of Object deTection in Aerial images (DOTA) and comprehensive baselines for ODAI. The proposed DOTA dataset contains 1,793,658 object instances of 18 categories of oriented-bounding-box annotations collected from 11,268 aerial images. Based on this large-scale and well-annotated dataset, we build baselines covering 10 state-of-the-art algorithms with over 70 configurations, where the speed and accuracy performance of each model has been evaluated. Furthermore, we provide a code library for ODAI and build a website for evaluating different algorithms. Previous challenges run on DOTA have attracted more than 1300 teams worldwide. We believe that the expanded large-scale DOTA dataset, the extensive baselines, the code library and the challenges can facilitate the design of robust algorithms and reproducible research on the problem of object detection in aerial images.

61 citations
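
Since DOTA's annotations are oriented bounding boxes, a common preprocessing step is converting a (center, width, height, angle) box into its four corner points. A minimal NumPy sketch, assuming a counter-clockwise angle in radians (angle conventions vary between toolkits):

```python
import numpy as np

def obb_to_corners(cx, cy, w, h, angle_rad):
    """Convert an oriented box (center, size, rotation) to its 4 corner points."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    R = np.array([[c, -s], [s, c]])                      # 2D rotation matrix
    half = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return half @ R.T + np.array([cx, cy])               # (4, 2) corner coordinates

corners = obb_to_corners(100.0, 50.0, 40.0, 20.0, np.deg2rad(30))
```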


Journal ArticleDOI
TL;DR: Wang et al. proposed a novel network for multi-scale object detection in aerial images using hierarchical dilated convolutions, called mSODANet, which can learn the contextual information of different types of objects at multiple scales and multiple fields of view.

27 citations
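
The hierarchical dilated convolution idea can be illustrated with a block of parallel 3×3 convolutions at increasing dilation rates, whose outputs are concatenated and fused. This is a generic sketch of the technique, not mSODANet's exact architecture:

```python
import torch
import torch.nn as nn

class DilatedContextBlock(nn.Module):
    """Parallel dilated convolutions capture context at several fields of view."""
    def __init__(self, in_ch, out_ch, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, 3, padding=d, dilation=d) for d in dilations
        )
        self.fuse = nn.Conv2d(out_ch * len(dilations), out_ch, 1)  # 1x1 fusion

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))
```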


Journal ArticleDOI
TL;DR: Guo et al. proposed a coarse-to-fine boundary refinement network (CBR-Net) that progressively refines the building prediction, maintaining both the continuous entities and accurate boundaries of buildings.
Abstract: Extracting building footprints from remotely sensed imagery has long been a challenging task and is not yet fully solved. Obstructions from nearby shadows or trees, varying shapes of rooftops, omission of small buildings, and varying scale of buildings hinder existing automated models for extracting sharp building boundaries. Different reasons account for these challenges. In convolutional neural network-based methods, the down-sampling operation loses spatial details of the input images, and small buildings are omitted from the high-level features. Shadowing from sheltering trees and adjacent objects may cause errors, since semantic information cannot be effectively preserved. Moreover, the insufficient use of multi-scale building features causes blurry edges in the predictions for buildings with complex shapes. To address these challenges, we propose a novel coarse-to-fine boundary refinement network (CBR-Net) that accurately extracts building footprints from remote sensing imagery. Unlike the existing semantic segmentation methods that directly generate building predictions at the highest level, we designed a module that progressively refines the building prediction in a coarse-to-fine manner. In this way, the advantages of both the high-level and low-level features can be retained. We also present a novel boundary refinement (BR) module that enhances the ability of the CBR-Net model to perceive and refine building edges. The BR module refines the building prediction by perceiving the direction from each pixel in a remotely sensed optical image to the center of the nearest object to which it might belong. The refined results are used as pseudo labels in a self-supervision process that increases model robustness to noisy labels or obstructions. Experimental results on three public building datasets, including the WHU building dataset, the Massachusetts building dataset, and the Inria aerial image dataset, demonstrate the effectiveness of the proposed method. In evaluation tests, CBR-Net outperformed other state-of-the-art algorithms on the three datasets by maintaining both the continuous entities and accurate boundaries of buildings. The source code of the proposed CBR-Net is available at https://github.com/HaonanGuo/CBRNet.

25 citations
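
A minimal PyTorch sketch of the coarse-to-fine principle: a coarse building prediction is upsampled and corrected with higher-resolution low-level features. This illustrates progressive refinement only; CBR-Net's actual modules (including the BR direction field) are in the linked repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseToFineHead(nn.Module):
    """Refine a coarse building mask using higher-resolution low-level features."""
    def __init__(self, low_ch, num_classes=1):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(low_ch + num_classes, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, num_classes, 1),
        )

    def forward(self, coarse_logits, low_feat):
        # Upsample the coarse prediction to the low-level feature resolution,
        # then predict a residual correction from their concatenation.
        up = F.interpolate(coarse_logits, size=low_feat.shape[-2:],
                           mode='bilinear', align_corners=False)
        return up + self.refine(torch.cat([up, low_feat], dim=1))
```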


Journal ArticleDOI
TL;DR: A novel optimal SqueezeNet with a deep neural network (OSQN-DNN) model is proposed for aerial image classification in UAV networks; using COA for hyperparameter tuning of the SqueezeNet model helps to considerably boost the overall classification performance.
Abstract: In present times, unmanned aerial vehicles (UAVs) are widely employed in several real-time applications due to their autonomous, inexpensive, and compact nature. Aerial image classification in UAVs has gained significant interest in surveillance systems that assist object detection and tracking processes. The advent of deep learning (DL) models paves the way to design effective aerial image classification techniques in UAV networks. In this view, this paper presents a novel optimal SqueezeNet with a deep neural network (OSQN-DNN) model for aerial image classification in UAV networks. The proposed OSQN-DNN model initially enables the UAVs to capture images using the inbuilt imaging sensors. Besides, the OSQN model is applied as a feature extractor to derive a useful set of feature vectors, where the coyote optimization algorithm (COA) is employed to optimally choose the hyperparameters involved in the classical SqueezeNet model. Moreover, the DNN model is utilized as a classifier that aims to allocate proper class labels to the applied input aerial images. Furthermore, the usage of COA for hyperparameter tuning of the SqueezeNet model helps to considerably boost the overall classification performance. For examining the enhanced aerial image classification performance of the OSQN-DNN model, a series of experiments were performed on the benchmark UCM dataset. The experimental results pointed out that the OSQN-DNN model resulted in a maximum accuracy of 98.97% and a minimum running time of 1.26 minutes.

20 citations
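
A hedged sketch of the SqueezeNet-as-feature-extractor idea using torchvision, with a small DNN classifier head for the 21 UCM scene classes. The coyote optimization (COA) hyperparameter search itself is not shown, and the head dimensions are illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

# Pretrained SqueezeNet as a frozen feature extractor; a hyperparameter search
# (e.g., over learning rate or layer widths) would sit outside this sketch.
backbone = models.squeezenet1_1(weights='DEFAULT').features
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Sequential(                  # simple DNN classifier on pooled features
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(512, 256), nn.ReLU(inplace=True),
    nn.Linear(256, 21),                # 21 scene classes in the UCM benchmark
)

x = torch.randn(2, 3, 224, 224)        # a dummy batch of aerial images
logits = head(backbone(x))
```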


Journal ArticleDOI
TL;DR: Zhang et al. proposed a two-stage object detection framework called "Focus-and-Detect", whose first stage consists of an object detector network supervised by a Gaussian Mixture Model that generates clusters of objects constituting the focused regions.
Abstract: Despite recent advances, object detection in aerial images is still a challenging task. Specific problems in aerial images, such as small objects, densely packed objects, and objects of different sizes and orientations, make detection harder. To address the small object detection problem, we propose a two-stage object detection framework called "Focus-and-Detect". The first stage, which consists of an object detector network supervised by a Gaussian Mixture Model, generates clusters of objects constituting the focused regions. The second stage, which is also an object detector network, predicts objects within the focal regions. An Incomplete Box Suppression (IBS) method is also proposed to overcome the truncation effect of the region search approach. Results indicate that the proposed two-stage framework achieves an AP score of 42.06 on the VisDrone validation dataset, surpassing all other state-of-the-art small object detection methods reported in the literature, to the best of the authors' knowledge.

16 citations
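
The first-stage idea, clustering object locations to obtain focused regions, can be sketched with scikit-learn's GaussianMixture over ground-truth box centers. The padding and region construction below are illustrative, not the paper's exact procedure:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def focus_regions(box_centers, n_regions=4, pad=2.0):
    """Cluster object centers with a GMM and return one focus box per cluster."""
    gmm = GaussianMixture(n_components=n_regions).fit(box_centers)
    labels = gmm.predict(box_centers)
    regions = []
    for k in range(n_regions):
        pts = box_centers[labels == k]
        if len(pts) == 0:
            continue
        lo, hi = pts.min(axis=0), pts.max(axis=0)
        # Pad each cluster's bounding box so member objects are fully enclosed.
        regions.append(np.concatenate([lo - pad, hi + pad]))
    return regions

centers = np.random.rand(200, 2) * 1000   # dummy (x, y) object centers
print(focus_regions(centers))
```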


Journal ArticleDOI
TL;DR: The experimental results show that the improved YOLOv5 network can identify and locate objects on aerial roads more accurately and effectively.
Abstract: In this paper, target detection technology based on deep learning is applied to object detection in highway aerial photography. By detecting road objects such as vehicles or crosswalks, it lays the foundation for the digitalization and informatization of roads. Firstly, an unmanned aerial vehicle is used to collect road images. Then, based on the YOLOv5 network and targeting the problem of small detection targets, an attention mechanism is introduced to weight the different channels of the feature map, and SoftPool is introduced into the SPP module to improve the pooling operation and retain more detailed feature information. The experimental results show that the improved YOLOv5 network can identify and locate objects on aerial roads more accurately and effectively.

15 citations
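
SoftPool replaces max pooling with an exponentially weighted (softmax) sum over each window, retaining more detail. A minimal PyTorch sketch (for brevity, without the input scaling one might add for numerical stability with large activations):

```python
import torch
import torch.nn.functional as F

def soft_pool2d(x, kernel_size=2, stride=2):
    """SoftPool: softmax-weighted pooling that keeps more detail than max pooling."""
    e = torch.exp(x)
    # avg_pool of e*x divided by avg_pool of e equals the softmax-weighted sum.
    return F.avg_pool2d(e * x, kernel_size, stride) / F.avg_pool2d(e, kernel_size, stride)

x = torch.randn(1, 3, 8, 8)
print(soft_pool2d(x).shape)   # torch.Size([1, 3, 4, 4])
```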


Journal ArticleDOI
01 Apr 2022-Sensors
TL;DR: Experimental results show that the proposed AGs-Unet model can improve the quality of building extraction from high-resolution remote sensing images effectively in terms of prediction performance and result accuracy.
Abstract: Building contour extraction from high-resolution remote sensing images is a basic task for the reasonable planning of regional construction. Recently, building segmentation methods based on the U-Net network have become popular, as they largely improve segmentation accuracy by applying 'skip connections' to combine high-level and low-level feature information more effectively. Meanwhile, researchers have demonstrated that introducing an attention mechanism into U-Net can enhance local feature expression and improve the performance of building extraction in remote sensing images. In this paper, we explore the effectiveness of the original attention gate module and propose a novel Attention Gate Module (AG), based on adjusting the position of the 'Resampler' in the attention gate relative to the Sigmoid function, for the building extraction task; a novel Attention Gates U network (AGs-Unet) is further proposed based on AG, which can automatically learn different forms of building structures in high-resolution remote sensing images and realize efficient extraction of building contours. AGs-Unet integrates attention gates with a single U-Net network, in which a series of attention gate modules are added into the 'skip connection' to suppress the irrelevant and noisy feature responses in the input image and highlight the dominant features of the buildings. AGs-Unet improves the feature selection of the attention map to enhance the ability of feature learning, as well as paying attention to the feature information of small-scale buildings. We conducted experiments on the WHU building dataset and the INRIA Aerial Image Labeling dataset, in which the proposed AGs-Unet model is compared with several classic models (FCN8s, SegNet, U-Net, and DANet) and two state-of-the-art models (PISANet and ARC-Net). The extraction accuracy of each model is evaluated using three evaluation indexes, namely, overall accuracy, precision, and intersection over union. Experimental results show that the proposed AGs-Unet model can effectively improve the quality of building extraction from high-resolution remote sensing images in terms of prediction performance and result accuracy.

15 citations
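
A minimal PyTorch sketch of a classic additive attention gate on a U-Net skip connection, of the kind AGs-Unet builds on. The paper's specific 'Resampler'/Sigmoid adjustment is not reproduced here, and `g` is assumed to be already upsampled to the skip feature's resolution:

```python
import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    """Attention gate on a skip connection: gate signal g reweights skip features x."""
    def __init__(self, x_ch, g_ch, inter_ch):
        super().__init__()
        self.wx = nn.Conv2d(x_ch, inter_ch, 1)
        self.wg = nn.Conv2d(g_ch, inter_ch, 1)
        self.psi = nn.Conv2d(inter_ch, 1, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, g):
        # g comes from the coarser decoder level, resized to x's resolution.
        a = torch.sigmoid(self.psi(self.act(self.wx(x) + self.wg(g))))
        return x * a   # suppress irrelevant responses in the skip connection
```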


Journal ArticleDOI
TL;DR: The evaluation results demonstrate that the Faster R-CNN model trained with the ResNet50 network architecture outperformed the other configurations in terms of detection accuracy, with a mean average precision of 100% and 55.7% for the test data of the OSU thermal and AAU PD T datasets, respectively.
Abstract: The automatic detection of humans in aerial thermal imagery plays a significant role in various real-time applications, such as surveillance, search and rescue, and border monitoring. Small target size, low resolution, occlusion, pose, and scale variations are significant challenges in aerial thermal images that cause poor performance for various state-of-the-art object detection algorithms. Though many deep-learning-based object detection algorithms have shown impressive performance for generic object detection tasks, their ability to detect smaller objects in aerial thermal images is analyzed in this study. This work carried out a performance evaluation of Faster R-CNN and single-shot multi-box detector (SSD) algorithms with different backbone networks to detect human targets in aerial-view thermal images. For this purpose, two standard aerial thermal datasets having human objects of varying scale are considered with different backbone networks, such as ResNet50, Inception-v2, and MobileNet-v1. The evaluation results demonstrate that the Faster R-CNN model trained with the ResNet50 network architecture outperformed the other configurations in terms of detection accuracy, with a mean average precision (mAP at 0.5 IoU) of 100% and 55.7% for the test data of the OSU thermal and AAU PD T datasets, respectively. SSD with MobileNet-v1 achieved the highest detection speed of 44 frames per second (FPS) on the NVIDIA GeForce GTX 1080 GPU. Fine-tuning the anchor parameters of the Faster R-CNN ResNet50 and SSD Inception-v2 algorithms caused remarkable improvements in mAP of 10% and 3.5%, respectively, on the challenging AAU PD T dataset. The experimental results demonstrate the application of Faster R-CNN and SSD algorithms for human detection in aerial-view thermal images, and the impact of varying the backbone network and anchor parameters on the performance of these algorithms.

13 citations
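
For reference, running an off-the-shelf torchvision Faster R-CNN with a ResNet50-FPN backbone looks like the following. This is the generic torchvision API, not the paper's fine-tuned thermal models or anchor settings:

```python
import torch
from torchvision import models

# Generic pretrained detector; the paper's thermal datasets and anchor tuning
# are not reproduced here.
model = models.detection.fasterrcnn_resnet50_fpn(weights='DEFAULT').eval()

img = torch.rand(3, 512, 640)            # dummy frame, values scaled to [0, 1]
with torch.no_grad():
    out = model([img])[0]                # dict with 'boxes', 'labels', 'scores'
keep = out['scores'] > 0.5               # simple confidence threshold
print(out['boxes'][keep])
```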


Journal ArticleDOI
TL;DR: This study proposes an attention-guided multitask convolutional neural network (AGMNet) to improve detection accuracy, introducing a multitask framework that considers the identification of rust levels and abnormal conditions of power line components, which had not been considered in previous works.
Abstract: Power line parts detection refers to the inspection of key parts on transmission lines against the complex background in aerial images and the identification of anomalies that could cause transmission failure. This process plays a pivotal role in ensuring the safety of power transmission. Most of the existing methods are based on deep convolutional neural networks. However, the complexity and variability of the aerial image background and the varying shooting perspectives and distances of unmanned aerial vehicles (UAVs) pose a challenge for previous works. This study aims to improve detection accuracy and proposes an attention-guided multitask convolutional neural network (AGMNet). First, to enhance the feature representation of objects in aerial images, we construct spatial region attention blocks that are suitable for object detection and can be inserted into any existing convolutional backbone network. Due to its efficient feature tensor computation method, the network can obtain competitive results with less computational memory. Second, we introduce a multitask framework that creatively considers the identification of rust levels and abnormal conditions of power line components, which has not been considered in previous works. Finally, we incorporate the refinable region proposal network (RPN) structure and a multiscale training strategy to improve the robustness of the network. The experimental results on the testing datasets show that the proposed AGMNet can recognize the power parts (dampers and suspension clamps) with a mean average precision (mAP) of 95.3% and simultaneously identify their rust levels with an mAP of 75.4% and abnormal conditions with an mAP of 92.7%.

12 citations


Journal ArticleDOI
TL;DR: TransEffiDet is an aircraft detection method based on EfficientDet and a deformable Transformer module, which can efficiently fuse feature maps of different scales for global feature extraction.
Abstract: In recent years, analysis and optimization algorithms based on image data have become a research hotspot. Aircraft detection based on aerial images can provide data support for accurately attacking military targets. Although much effort has been devoted to the task, it is still challenging due to poor environments, the vastness of the sky background, and so on. This paper proposes an aircraft detection method for aerial images, named TransEffiDet, based on the EfficientDet method and a Transformer module. We improved the EfficientDet algorithm by combining it with a Transformer, which models the long-range dependencies of the feature maps. Specifically, we first employ EfficientDet as the backbone network, which can efficiently fuse feature maps of different scales. Then, a deformable Transformer is used to analyze the long-range correlations for global feature extraction. Furthermore, we designed a fusion module to fuse the short-range and long-range features extracted by EfficientDet and the deformable Transformer, respectively. Finally, the object class is produced by feeding the feature map to the class prediction net, and the bounding box predictions are generated by feeding the fused features to the box prediction net. The mean Average Precision (mAP) is 86.6%, which outperforms EfficientDet by 5.8%. The experiments show that TransEffiDet is more robust than other methods. Additionally, we have established a public aerial dataset for aircraft detection, which will be released along with this paper.
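
One plausible reading of such a fusion module, combining short-range CNN features with long-range Transformer features, is a learned per-pixel gate. This is a hypothetical sketch under that assumption, not the paper's published formulation:

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse short-range CNN features with long-range transformer features via a
    learned per-pixel gate (an assumed, simplified form of a fusion module)."""
    def __init__(self, ch):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * ch, ch, 1), nn.Sigmoid())

    def forward(self, f_cnn, f_transformer):
        g = self.gate(torch.cat([f_cnn, f_transformer], dim=1))
        return g * f_cnn + (1 - g) * f_transformer   # convex per-pixel blend
```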

Journal ArticleDOI
TL;DR: Wang et al. adjusted YOLOv4 to the characteristics of UAV aerial images, making the improved method more suitable for target detection in such images; according to the characteristics of the activation function, different activation functions were used in the shallow and deep parts of the network.
Abstract: Target detection based on unmanned aerial vehicle (UAV) images has increasingly become a hot topic with the rapid development of UAVs and related technologies. UAV aerial images often feature a large number of small targets and complex backgrounds due to the UAV’s flying height and shooting angle of view. These characteristics make the advanced YOLOv4 detection method lack outstanding performance in UAV aerial images. In light of the aforementioned problems, this study adjusted YOLOv4 to the image’s characteristics, making the improved method more suitable for target detection in UAV aerial images. Specifically, according to the characteristics of the activation function, different activation functions were used in the shallow network and the deep network, respectively. The loss for the bounding box regression was computed using the EIOU loss function. Improved Efficient Channel Attention (IECA) modules were added to the backbone. At the neck, the Spatial Pyramid Pooling (SPP) module was replaced with a pyramid pooling module. At the end of the model, Adaptive Spatial Feature Fusion (ASFF) modules were added. In addition, a dataset of forklifts based on UAV aerial imagery was also established. On the PASCAL VOC, VEDAI, and forklift datasets, we ran a series of experiments. The experimental results reveal that the proposed method (YOLO-DRONE, YOLOD) has better detection performance than YOLOv4 for the aforementioned three datasets, with the mean average precision (mAP) being improved by 3.06%, 3.75%, and 1.42%, respectively.
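
The EIOU loss mentioned above augments 1 − IoU with penalties on center distance and on width/height differences, each normalized by the smallest enclosing box. A PyTorch sketch for (x1, y1, x2, y2) boxes:

```python
import torch

def eiou_loss(pred, target, eps=1e-7):
    """EIoU loss: IoU term plus center-distance and width/height penalties."""
    # Intersection and IoU.
    x1 = torch.max(pred[:, 0], target[:, 0]); y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2]); y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box.
    ex1 = torch.min(pred[:, 0], target[:, 0]); ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2]); ey2 = torch.max(pred[:, 3], target[:, 3])
    cw, ch = ex2 - ex1, ey2 - ey1

    # Center-distance, width and height penalty terms.
    pcx = (pred[:, 0] + pred[:, 2]) / 2; pcy = (pred[:, 1] + pred[:, 3]) / 2
    tcx = (target[:, 0] + target[:, 2]) / 2; tcy = (target[:, 1] + target[:, 3]) / 2
    dist = ((pcx - tcx) ** 2 + (pcy - tcy) ** 2) / (cw ** 2 + ch ** 2 + eps)
    dw = (pred[:, 2] - pred[:, 0] - (target[:, 2] - target[:, 0])) ** 2 / (cw ** 2 + eps)
    dh = (pred[:, 3] - pred[:, 1] - (target[:, 3] - target[:, 1])) ** 2 / (ch ** 2 + eps)
    return (1 - iou + dist + dw + dh).mean()
```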

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a workflow of UAV-based 3D mapping consisting of four major steps, including data acquisition by using an optimal trajectory configuration, image matching to obtain reliable correspondences, aerial triangulation (AT) to recover accurate camera poses, and dense image matching to generate point clouds with high density.
Abstract: Three-dimensional mapping is an increasingly important feature of recent photogrammetry and remote sensing (RS) systems. Currently, unmanned aerial vehicles (UAVs) have become one of the most extensively used RS platforms due to their high timeliness and flexibility in data acquisition, as well as the high spatial resolution of their recorded images. UAV-based 3D mapping has overwhelming advantages over traditional data sources from satellite and aerial platforms. Generally, the workflow of UAV-based 3D mapping consists of four major steps: 1) data acquisition using an optimal trajectory configuration, 2) image matching to obtain reliable correspondences, 3) aerial triangulation (AT) to recover accurate camera poses, and 4) dense image matching to generate point clouds with high density. The performance of the algorithms used in each step determines the reliability and precision of the final 3D mapping products.
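
Steps 2 and 3 of this workflow (image matching and relative pose recovery for aerial triangulation) can be sketched with OpenCV. The file names and camera intrinsics K below are placeholders, not values from the paper:

```python
import cv2
import numpy as np

# Image matching between two overlapping views, then relative pose recovery.
img1 = cv2.imread('view1.jpg', cv2.IMREAD_GRAYSCALE)   # placeholder filenames
img2 = cv2.imread('view2.jpg', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(4000)
k1, d1 = orb.detectAndCompute(img1, None)
k2, d2 = orb.detectAndCompute(img2, None)

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)

pts1 = np.float32([k1[m.queryIdx].pt for m in matches])
pts2 = np.float32([k2[m.trainIdx].pt for m in matches])

K = np.array([[1000.0, 0, 960], [0, 1000.0, 540], [0, 0, 1]])  # assumed intrinsics
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=mask)      # camera pose step
```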

Journal ArticleDOI
TL;DR: A novel dense generative adversarial network for real aerial imagery super-resolution reconstruction (NDSRGAN) is proposed, and image datasets with paired high- and low-resolution real aerial remote sensing images are produced.
Abstract: In recent years, more and more researchers have used deep learning methods for super-resolution reconstruction and have made good progress. However, most of the existing super-resolution reconstruction models generate low-resolution images for training by downsampling high-resolution images through bicubic interpolation, and the models trained from these data have poor reconstruction results on real-world low-resolution images. In the field of unmanned aerial vehicle (UAV) aerial photography, the use of existing super-resolution reconstruction models in reconstructing real-world low-resolution aerial images captured by UAVs is prone to producing some artifacts, texture detail distortion and other problems, due to compression and fusion processing of the aerial images, thereby resulting in serious loss of texture detail in the obtained low-resolution aerial images. To address this problem, this paper proposes a novel dense generative adversarial network for real aerial imagery super-resolution reconstruction (NDSRGAN), and we produce image datasets with paired high- and low-resolution real aerial remote sensing images. In the generative network, we use a multilevel dense network to connect the dense connections in a residual dense block. In the discriminative network, we use a matrix mean discriminator that can discriminate the generated images locally, no longer discriminating the whole input image using a single value but instead in chunks of regions. We also use smoothL1 loss instead of the L1 loss used in most existing super-resolution models, to accelerate the model convergence and reach the global optimum faster. Compared with traditional models, our model can better utilise the feature information in the original image and discriminate the image in patches. A series of experiments is conducted with real aerial imagery datasets, and the results show that our model achieves good performance on quantitative metrics and visual perception.
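
The "matrix mean" discriminator judges realism per local region rather than with a single scalar, similar in spirit to a PatchGAN. A minimal sketch, with layer sizes chosen for illustration rather than taken from the paper:

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Outputs a matrix of per-region scores instead of a single value; the
    adversarial loss then averages over this score matrix."""
    def __init__(self, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(ch * 2, 1, 3, 1, 1),   # one score per local region
        )

    def forward(self, x):
        return self.net(x)                   # (b, 1, H/4, W/4) score matrix

d = PatchDiscriminator()
scores = d(torch.rand(1, 3, 128, 128))
loss_real = torch.nn.functional.binary_cross_entropy_with_logits(
    scores, torch.ones_like(scores))         # mean over the score matrix
```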

Journal ArticleDOI
TL;DR: In this paper, the authors proposed an end-to-end convolutional neural network (CNN) architecture consisting of contracting and symmetric expanding paths that precisely extract the global features to segment the vegetation class from the aerial image.

Journal ArticleDOI
TL;DR: This paper presents a self-built aerial infrared dataset and ablation experiments for thorough model evaluation, introduces Smooth-L1 regularization on channel scale factors, and prunes the channels and layers with less feature information to obtain a pruned YOLOv3 model.
Abstract: Aerial object detection plays a pivotal role in searching and tracking applications. However, large models and the limited memory and computing power of embedded devices restrict the deployment of aerial pedestrian detection algorithms on UAV (unmanned aerial vehicle) platforms. In this paper, an innovative method, aerial infrared YOLO (AIR-YOLOv3), is proposed, which combines network pruning and the YOLOv3 method. Firstly, to achieve a more appropriate number and size of the prior boxes, the prior boxes are reclustered. Then, to accelerate the inference speed while preserving detection accuracy, we introduced Smooth-L1 regularization on channel scale factors and pruned the channels and layers with less feature information to obtain a pruned YOLOv3 model. Meanwhile, we built an aerial infrared dataset and designed ablation experiments for thorough model evaluation. Experimental results show that the AP (average precision) of AIR-YOLOv3 is 91.5% and the model size is 10.7 MB (megabytes). Compared to the original YOLOv3, the model volume is compressed by 228.7 MB, nearly 95.5%, while the AP decreased by only 1.7%. The calculation amount is reduced by about two thirds, and the inference speed on the airborne TX2 increased from 3.7 FPS (frames per second) to 8 FPS.
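
The sparsity-training step, a Smooth-L1 penalty pulling BatchNorm scale factors toward zero so unimportant channels can later be pruned, can be sketched as follows. The helper names, global-quantile threshold, and keep ratio are illustrative, and rebuilding the pruned convolutions is not shown:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def scale_factor_penalty(model, lam=1e-4):
    """Smooth-L1 sparsity penalty on BatchNorm scale factors (channel importance),
    added to the detection loss during sparsity training."""
    penalty = sum(F.smooth_l1_loss(m.weight, torch.zeros_like(m.weight),
                                   reduction='sum')
                  for m in model.modules() if isinstance(m, nn.BatchNorm2d))
    return lam * penalty

def channels_to_keep(model, keep_ratio=0.3):
    """Rank all BN scale factors globally; keep only the largest keep_ratio."""
    gammas = torch.cat([m.weight.abs().detach().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, 1 - keep_ratio)
    return {id(m): (m.weight.abs() >= threshold)   # boolean keep-mask per BN layer
            for m in model.modules() if isinstance(m, nn.BatchNorm2d)}
```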

Journal ArticleDOI
TL;DR: A new benchmark is proposed, dubbed the Aerial Image Few-Shot Dataset (AIFS-DATASET), which is composed of diverse datasets and provides more realistic heterogeneous task distributions, together with a Dual Constrained Distance Metric Learning (DC-DML) framework that outperforms the current prevailing FSL approaches by a large margin.
Abstract: Few-shot learning (FSL), which aims to rapidly recognize unseen categories with limited samples, has attracted wide attention in aerial image scene classification. However, existing methods generally train and evaluate the model within a single dataset, and changing the dataset requires retraining and re-evaluation, which only realizes intra-dataset generalization. Considering meta-learning, this brings in a natural assumption: FSL should learn meta-knowledge from cross-domain heterogeneous tasks, and can then generalize to new data distributions (e.g., datasets) with few samples. To this end, we propose a new benchmark, dubbed the Aerial Image Few-Shot Dataset (AIFS-DATASET), which is composed of diverse datasets and can provide more realistic heterogeneous task distributions. On AIFS-DATASET, we use many heterogeneous tasks, which span multiple domains without including any aerial image category, to train the model, achieving "see more". Then we transfer the learned knowledge to new tasks in aerial images to evaluate the generalization performance of the model, thus acquiring a "well-informed" few-shot aerial image scene classification model. Moreover, the challenges of inter-class similarity and intra-class discrepancy in aerial images still exist. We therefore also develop a Dual Constrained Distance Metric Learning (DC-DML) framework to deal with the variable learning tasks adaptively, as well as to achieve compact data distributions within a class and clear distribution gaps between classes from the perspective of metric learning. DC-DML mainly employs a task-adapted feature extractor while devising a novel distance metric with a cross-class bias penalty. By conducting experiments on AIFS-DATASET, we observed that DC-DML outperforms the current prevailing FSL approaches by a large margin.

Journal ArticleDOI
TL;DR: In this article, an improved You Only Look Once (YOLO) algorithm is proposed to solve the problems of low accuracy in locating multi-scale targets, slow detection, missed targets, and mispredicted targets.
Abstract: Aerial image-based target detection suffers from several problems, such as low accuracy in locating multi-scale targets, slow detection, missed targets, and mispredicted targets. To solve these problems, this paper proposes an improved You Only Look Once (YOLO) algorithm from the viewpoint of model efficiency, using target box dimension clustering, classification of the pre-trained network, multi-scale detection training, and changed screening rules for the candidate boxes. This modified approach has the potential to be better adapted to the positioning task. The aerial image from the unmanned aerial vehicle (UAV) can be positioned to the target area in real time, and the projection relation can be used to convert the position to the latitude and longitude of the UAV. The results proved to be more effective; notably, the average accuracy of the detection network on target-area detection tasks in aerial images increased to 79.5%. Aerial images containing the target area were used in flight simulation experiments to verify the network's positioning accuracy, which was found to be greater than 84%. The proposed model can be effectively used for real-time detection of multi-scale targets with a reduced misprediction rate due to its superior accuracy.
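
The target box dimension clustering mentioned above is typically YOLO-style k-means with distance d = 1 − IoU between (width, height) pairs. A minimal NumPy sketch of that standard recipe:

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, as if boxes and anchors shared one corner."""
    inter = np.minimum(boxes[:, None, 0], anchors[None, :, 0]) * \
            np.minimum(boxes[:, None, 1], anchors[None, :, 1])
    union = boxes[:, 0] * boxes[:, 1]
    union = union[:, None] + anchors[:, 0] * anchors[:, 1] - inter
    return inter / union

def anchor_kmeans(boxes_wh, k=9, iters=100):
    """Cluster box dimensions with distance d = 1 - IoU (YOLO-style anchors)."""
    anchors = boxes_wh[np.random.choice(len(boxes_wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(boxes_wh, anchors), axis=1)  # min(1 - IoU)
        for j in range(k):
            if np.any(assign == j):
                anchors[j] = np.median(boxes_wh[assign == j], axis=0)
    return anchors

wh = np.random.rand(500, 2) * 100 + 5   # dummy ground-truth box sizes in pixels
print(anchor_kmeans(wh, k=9))
```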

Journal ArticleDOI
TL;DR: Wang et al. proposed a class-constraint coarse-to-fine attentional (CCA) deep network, which enables the formation of class information constraints to obtain explicit long-range context information, and achieved state-of-the-art performance on the IEEE GRSS Data Fusion Contest Zeebrugge data set.
Abstract: Semantic segmentation is important for the understanding of subdecimeter aerial images. In recent years, deep convolutional neural networks (DCNNs) have been used widely for semantic segmentation in the field of remote sensing. However, because of the highly complex subdecimeter resolution of aerial images, inseparability often occurs among some geographic entities of interest in the spectral domain. In addition, the semantic segmentation methods based on DCNNs mostly obtain context information using extra information within the added receptive field. However, the context information obtained this way is not explicit. We propose a novel class-constraint coarse-to-fine attentional (CCA) deep network, which enables the formation of class information constraints to obtain explicit long-range context information. Further, the performance of subdecimeter aerial image semantic segmentation can be improved, particularly for fine-structured geographic entities. Based on coarse-to-fine technology, we obtained a coarse segmentation result and constructed an image class feature library. We propose the use of the attention mechanism to obtain strong class-constrained features. Consequently, pixels of different geographic entities can adaptively match the corresponding categories in the class feature library. Additionally, we employed a novel loss function, CCA-loss to realize end-to-end training. The experimental results obtained using two popular open benchmarks, International Society for Photogrammetry and Remote Sensing (ISPRS) 2-D semantic labeling Vaihingen data set and Institute of Electrical and Electronics Engineers (IEEE) Geoscience and Remote Sensing Society (GRSS) Data Fusion Contest Zeebrugge data set, validated the effectiveness and superiority of our proposed model. The proposed method achieved state-of-the-art performance on the IEEE GRSS Data Fusion Contest Zeebrugge data set.

Journal ArticleDOI
TL;DR: Zong et al. proposed a simple yet effective calibrated-guidance (CG) scheme to enhance channel communications in a feature transformer fashion, which can adaptively determine the calibration weights for each channel based on the global feature affinity correlations.
Abstract: Object detection is one of the most fundamental yet challenging research topics in the domain of computer vision. Recently, the study of this topic in aerial images has made tremendous progress. However, complex backgrounds and poor imaging quality are obvious problems in aerial object detection. Most state-of-the-art approaches tend to develop elaborate attention mechanisms for space-time feature calibration with arduous computational complexity, while surprisingly ignoring the importance of channel-wise feature calibration. In this work, we propose a simple yet effective calibrated-guidance (CG) scheme to enhance channel communications in a feature transformer fashion, which can adaptively determine the calibration weights for each channel based on the global feature affinity correlations. Specifically, for a given set of feature maps, CG first computes the feature similarity between each channel and the remaining channels as the intermediary calibration guidance. Then, each channel is re-represented by aggregating all the channels, weighted together via the guidance operation. CG is a general module that can be plugged into any deep neural network; the resulting network is named CG-Net. To demonstrate its effectiveness and efficiency, extensive experiments are carried out on both the oriented object detection task and the horizontal object detection task in aerial images. Experimental results on two challenging benchmarks (i.e., DOTA and HRSC2016) demonstrate that our CG-Net can achieve new state-of-the-art performance in accuracy with a fair computational overhead. The source code has been open sourced at https://github.com/WeiZongqi/CG-Net.
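
A simplified sketch of the channel-affinity idea: compute the similarity between every pair of channel maps, turn it into calibration weights, and re-represent each channel as a weighted mixture of all channels. This condenses the CG scheme for illustration; the authors' full module is in the linked repository.

```python
import torch
import torch.nn as nn

class ChannelCalibration(nn.Module):
    """Re-represent each channel as an affinity-weighted mixture of all channels
    (a simplified reading of calibrated guidance)."""
    def forward(self, x):
        b, c, h, w = x.shape
        f = nn.functional.normalize(x.flatten(2), dim=2)   # (b, c, h*w)
        affinity = torch.bmm(f, f.transpose(1, 2))         # (b, c, c) similarity
        weights = torch.softmax(affinity, dim=-1)          # calibration guidance
        return torch.bmm(weights, x.flatten(2)).view(b, c, h, w)
```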

Journal ArticleDOI
TL;DR: In this paper , the authors collected and processed drone imagery of a moderate-density, structurally complex mixed-conifer stand and compared these maps to a 3.23-ha ground reference map of 1,775 trees >5 m tall that they created using traditional field survey methods.
Abstract: Recent advances in remotely piloted aerial systems (‘drones’) and imagery processing enable individual tree mapping in forests across broad areas with low-cost equipment and minimal ground-based data collection. One such method involves collecting many partially overlapping aerial photos, processing them using ‘structure from motion’ (SfM) photogrammetry to create a digital 3D representation and using the 3D model to detect individual trees. SfM-based forest mapping involves myriad decisions surrounding methods and parameters for imagery acquisition and processing, but it is unclear how these individual decisions or their combinations impact the quality of the resulting forest inventories. We collected and processed drone imagery of a moderate-density, structurally complex mixed-conifer stand. We tested 22 imagery collection methods (altering flight altitude, camera pitch and image overlap), 12 imagery processing parameterizations (image resolutions and depth map filtering intensities) and 286 tree detection methods (algorithms and their parameterizations) to create 7,568 tree maps. We compared these maps to a 3.23-ha ground reference map of 1,775 trees >5 m tall that we created using traditional field survey methods. The accuracy of individual tree detection (ITD) and the resulting tree maps was generally maximized by collecting imagery at high altitude (120 m) with at least 90% image-to-image overlap, photogrammetrically processing images into a canopy height model (CHM) with a twofold upscaling (coarsening) step and detecting trees from the CHM using a variable window filter after applying a moving window mean smooth to the CHM. Using this combination of methods, we mapped trees with an accuracy exceeding expectations for structurally complex forests (for canopy-dominant trees >10 m tall, sensitivity = 0.69 and precision = 0.90). Remotely measured tree heights corresponded to ground-measured heights with R2 = 0.95. Accuracy was higher for taller trees and lower for understorey trees and would likely be higher in less dense and less structurally complex stands. Our results may guide others wishing to efficiently produce broad-extent individual tree maps of conifer forests without investing substantial time tailoring imagery acquisition and processing parameters. The resulting tree maps create opportunities for addressing previously intractable ecological questions and informing forest management.
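
The winning detection recipe, local maxima on a mean-smoothed canopy height model (CHM) with a window that grows with tree height, can be sketched with SciPy. The height bins and window radii below are illustrative stand-ins for the paper's fitted variable-window function:

```python
import numpy as np
from scipy.ndimage import uniform_filter, maximum_filter

def detect_trees(chm, px=0.25, bins=((1.0, 5, 15), (2.0, 15, 60))):
    """Local-maxima treetop detection on a smoothed CHM: a pixel is a treetop
    if it is the maximum within a window whose size grows with tree height.
    px is the CHM resolution in meters; bins are (radius_m, min_h, max_h)."""
    chm = uniform_filter(chm, size=3)                # moving-window mean smooth
    tops = []
    for radius_m, lo, hi in bins:                    # assumed height bins
        win = max(3, int(2 * radius_m / px) | 1)     # odd window size in pixels
        is_max = chm == maximum_filter(chm, size=win)
        rows, cols = np.where(is_max & (chm >= lo) & (chm < hi))
        tops += [(r, c, chm[r, c]) for r, c in zip(rows, cols)]
    return tops                                       # (row, col, height) triples
```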

Journal ArticleDOI
TL;DR: In this article, an overview of the most frequently applied mosaicing techniques for UAV imagery is given as an introduction for those interested in developing in this area, showing the trend of the research field and the contribution of different countries over time.
Abstract: The use of UAV (unmanned aerial vehicle) technology has allowed for advances in the area of robotics in control processes and application development. Such is the case of image processing, in which, by the use of aerial photographs taken by these aircraft, it is possible to perform surveillance and monitoring tasks. As an example, we can mention the use of aerial photographs for the generation of panoramic images through the process of stitching images without losing image resolution. Some applications are photogrammetry and mapping, where the main problems to be solved are image alignment and ghosting, for which different stitching techniques can be applied. These methodologies can be categorized into direct methods or feature-based methods. This paper aims to give an overview of the most frequently applied mosaicing techniques in UAVs as an introduction for those interested in developing in this area. For this purpose, a summary of the most applied techniques and their applications is given, showing the trend of the research field and the contribution of different countries over time.

Journal ArticleDOI
TL;DR: In this paper, a novel software package, called RoboPV, is introduced for autonomous aerial monitoring of PV plants, from optimal trajectory planning to image processing and pattern recognition for real-time fault detection and analysis.

Journal ArticleDOI
TL;DR: Zhang et al. proposed a new Progressive Attention Generative Adversarial Network (PAGAN) with two novel components: a multistage progressive generation framework and a cross-stage attention module.

Journal ArticleDOI
TL;DR: In this article, a UAV aerial image stitching algorithm is proposed based on semantic segmentation and ORB, which can solve the problems of splicing misalignment and tearing during background stitching caused by dynamic foreground, and improves the stitching quality of UAV low-altitude aerial images.
Abstract: UAVs are flexible in operation, with variable shooting angles and complex, changing shooting environments. Most existing stitching algorithms are suited to images collected by UAVs in static environments, but the images are in fact captured dynamically, especially in low-altitude flights. Because large changes in object position can cause low-altitude aerial images to be affected by the moving foreground during stitching, resulting in quality problems such as splicing misalignment and tearing, a UAV aerial image stitching algorithm based on semantic segmentation and ORB is proposed. In image registration, the algorithm introduces a semantic segmentation network to separate the foreground and background of the image and obtains foreground semantic information. At the same time, it uses the quadtree decomposition idea and the classical ORB algorithm to extract feature points. By comparing the feature point information with the foreground semantic information, the foreground feature points can be deleted to achieve reliable feature point matching. Based on the accurate image registration, image stitching and fusion are achieved via the homography matrix and a weighted fusion algorithm. The proposed algorithm not only preserves the details of the original image, but also improves four objective metrics: information entropy, average gradient, peak signal-to-noise ratio, and root mean square error. It solves the problem of splicing misalignment and tearing during background stitching caused by dynamic foreground and improves the stitching quality of UAV low-altitude aerial images.
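
The core registration idea, discarding feature points that fall on the segmented moving foreground so that only background matches drive the homography, can be sketched with OpenCV. The quadtree-balanced keypoint distribution step is omitted here, and the mask convention is an assumption:

```python
import cv2
import numpy as np

def background_keypoints(img, foreground_mask, n_features=2000):
    """Detect ORB keypoints and drop those falling on the segmented foreground,
    so moving objects do not contribute matches to the homography."""
    orb = cv2.ORB_create(n_features)
    kps, desc = orb.detectAndCompute(img, None)
    keep = [i for i, kp in enumerate(kps)
            if foreground_mask[int(kp.pt[1]), int(kp.pt[0])] == 0]
    return [kps[i] for i in keep], desc[keep]

# foreground_mask would come from a semantic segmentation network (nonzero =
# moving foreground); homography estimation then uses only background matches.
```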

Journal ArticleDOI
TL;DR: A novel adaptive screening feature network (ASF-Net) is proposed, which can independently screen and enhance effective feature information from two aspects; an Adaptive Information Utilization Block (AIUB) is proposed to enlarge the receptive field of feature maps and refine incomplete building footprints.
Abstract: Building footprint extraction plays an important role in many remote-sensing (RS) applications such as urban planning and disaster monitoring. Previous research has mainly focused on exploiting contextual information within a fixed receptive field, which makes it difficult to generically extract buildings that vary greatly in size and shape, especially when isolated large buildings are surrounded by dense small buildings. To address this problem, we attempt to teach the network to adjust the receptive field and enhance useful feature information adaptively. In this article, we propose a novel adaptive screening feature network (ASF-Net), which can independently screen and enhance effective feature information from two aspects. On the one hand, we propose a deepened space up-sampling block to screen useful information and help establish boundaries. On the other hand, we propose an Adaptive Information Utilization Block (AIUB) to enlarge the receptive field of feature maps and refine incomplete building footprints. As a result, a more accurate multiscale building footprint is inferred from the enhanced features. Experimental results on popular aerial image segmentation datasets show that ASF-Net obtains competitive results [80.2% intersection over union (IoU) on the Inria aerial image labeling dataset and 74.2% IoU on the Massachusetts buildings dataset] in comparison with several state-of-the-art models. The TensorFlow implementation is available at https://github.com/jyx0516/ASF-Net.

Journal ArticleDOI
TL;DR: Wang et al. adopted the SIFT (scale-invariant feature transform) and SURF (speeded-up robust features) feature detection algorithms to conduct mosaic experiments based on drone images of a rice field in Dehui City, Jilin Province.
Abstract: Unmanned aerial vehicle (UAV) low-altitude remote sensing image stitching is a new technique for promptly assessing rice lodging. The effect of image stitching depends on the application scenario, so it is necessary to explore low-altitude remote sensing image stitching algorithms suitable for rice lodging monitoring. This research adopts the SIFT (scale-invariant feature transform) and SURF (speeded-up robust features) feature detection algorithms to conduct mosaic experiments based on drone images of a rice field in Dehui City, Jilin Province. The results demonstrate that the image stitching technique based on the SURF algorithm possesses better real-time performance, and the panorama obtained can well reflect the lodging condition of the rice field. This research can provide a technical reference for practical lodging monitoring of rice fields.

Journal ArticleDOI
14 Apr 2022-Data
TL;DR: In this paper, the HAGDAVS dataset is presented, fusing the RGB spectral channels and a Digital Surface Model (DSM) for detection and segmentation of vehicles from aerial drone images, including three vehicle classes: cars, motorcycles, and ghosts (motorcycle or car).
Abstract: Detection and semantic segmentation of vehicles in drone aerial orthomosaics has applications in a variety of fields such as security, traffic and parking management, urban planning, logistics, and transportation, among many others. This paper presents the HAGDAVS dataset, fusing the RGB spectral channels and a Digital Surface Model (DSM) for the detection and segmentation of vehicles from aerial drone images, including three vehicle classes: cars, motorcycles, and ghosts (motorcycle or car). We supply the DSM as an additional variable to be included in deep learning and computer vision models to increase their accuracy. The RGB orthomosaic, RG-DSM fusion, and multi-label mask are provided in Tag Image File Format. Geo-located vehicle bounding boxes are provided in GeoJSON vector format. We also describe the acquisition of the drone data, the derived products, and the workflow used to produce the dataset. Researchers would benefit from using the proposed dataset to improve results in cases of vehicle occlusion, for geo-location, and for cleaning ghost vehicles. As far as we know, this is the first openly available dataset for vehicle detection and segmentation comprising RG-DSM drone data fusion and different color masks for motorcycles, cars, and ghosts.

Journal ArticleDOI
TL;DR: In this paper, a combination of semantic image segmentation and photogrammetry is proposed to monitor changes in built heritage sites, where the authors focus on segmenting potentially damaging plants from the surrounding stone masonry and other image elements.
Abstract: Crowdsourced images hold information that could potentially be used to remotely monitor heritage sites, and reduce the human and capital resources devoted to on-site inspections. This article proposes a combination of semantic image segmentation and photogrammetry to monitor changes in built heritage sites. In particular, it focuses on segmenting potentially damaging plants from the surrounding stone masonry and other image elements. The method compares different backend models and two model architectures: (i) a one-stage model that segments seven classes within the image, and (ii) a two-stage model that uses the results from the first stage to refine a binary segmentation for the plant class. The final selected model achieves an overall IoU of 66.9% for seven classes (54.6% for the one-stage plant class, 56.2% for the two-stage plant class). Further, the segmentation output is combined with photogrammetry to build a segmented 3D model used to measure the area of biological growth. Lastly, the main findings are: (i) with the help of transfer learning and a proper choice of model architecture, image segmentation can be easily applied to analyze crowdsourced data; (ii) photogrammetry can be combined with image segmentation to alleviate image distortions for monitoring purposes; and (iii) beyond the measurement of plant area, this method has the potential to be easily transferred to other tasks, such as monitoring cracks and erosion, or as a masking tool in the photogrammetry workflow.

Journal ArticleDOI
TL;DR: A complementary information-learning model (CILM) is presented to perform multi-view scene classification of aerial and ground-level images; it achieves remarkable performance, indicating that it is an effective model for learning complementary information and thus improving urban scene classification.
Abstract: Traditional urban scene-classification approaches focus on images taken either by satellite or in aerial view. Although single-view images are able to achieve satisfactory results for scene classification in most situations, the complementary information provided by other image views is needed to further improve performance. Therefore, we present a complementary information-learning model (CILM) to perform multi-view scene classification of aerial and ground-level images. Specifically, the proposed CILM takes aerial and ground-level image pairs as input to learn view-specific features for later fusion to integrate the complementary information. To train CILM, a unified loss consisting of cross entropy and contrastive losses is exploited to force the network to be more robust. Once CILM is trained, the features of each view are extracted via the two proposed feature-extraction scenarios and then fused to train the support vector machine classifier for classification. The experimental results on two publicly available benchmark data sets demonstrate that CILM achieves remarkable performance, indicating that it is an effective model for learning complementary information and thus improving urban scene classification.
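
A hedged sketch of the unified loss described above: cross entropy on the fused prediction plus a contrastive term over aerial/ground feature pairs. The exact pairing and margin scheme are assumptions; here every pair in the batch is treated as a matching view pair of the same scene:

```python
import torch
import torch.nn.functional as F

def cilm_style_loss(feat_aerial, feat_ground, logits, labels, alpha=0.5):
    """Cross entropy on fused predictions plus a contrastive term that pulls
    matching aerial/ground feature pairs together (a simplified reading of CILM)."""
    ce = F.cross_entropy(logits, labels)
    d = F.pairwise_distance(feat_aerial, feat_ground)
    # All pairs here are matching views of the same scene, so the contrastive
    # term reduces to minimizing their distance; non-matching pairs would use
    # a hinge term max(0, margin - d) instead.
    contrastive = (d ** 2).mean()
    return ce + alpha * contrastive
```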