
Showing papers on "Aerial image published in 2018"


Proceedings ArticleDOI
01 Jun 2018
TL;DR: The Dataset for Object Detection in Aerial Images (DOTA) as discussed by the authors is a large-scale dataset of aerial images collected from different sensors and platforms and contains objects exhibiting a wide variety of scales, orientations, and shapes.
Abstract: Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to transfer to aerial imagery, not only because of the huge variation in the scale, orientation and shape of the object instances on the earth's surface, but also due to the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect 2806 aerial images from different sensors and platforms. Each image is about 4000 × 4000 pixels in size and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using 15 common object categories. The fully annotated DOTA images contain 188,282 instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral. To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA well represents real Earth Vision applications and is quite challenging.

1,502 citations
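
Each DOTA instance is an arbitrary 8-d.o.f. quadrilateral. As a hedged illustration of consuming such labels, the sketch below parses one annotation line in the commonly documented "x1 y1 x2 y2 x3 y3 x4 y4 category difficulty" layout (an assumption here, not quoted from the paper) and fits a rotated rectangle with OpenCV:

```python
import numpy as np
import cv2  # OpenCV, for minAreaRect

def parse_dota_line(line):
    """Parse one DOTA-style annotation line.

    Assumed layout: "x1 y1 x2 y2 x3 y3 x4 y4 category difficulty".
    """
    parts = line.strip().split()
    quad = np.array(parts[:8], dtype=np.float32).reshape(4, 2)
    category = parts[8]
    return quad, category

def quad_to_rotated_box(quad):
    """Fit the minimum-area rotated rectangle to a 4-point quadrilateral."""
    (cx, cy), (w, h), angle = cv2.minAreaRect(quad)
    return cx, cy, w, h, angle

# Example values are made up for the demo.
line = "939 950 1061 950 1061 1010 939 1010 small-vehicle 0"
quad, cat = parse_dota_line(line)
print(cat, quad_to_rotated_box(quad))
```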


Journal ArticleDOI
TL;DR: A large-scale aerial image data set is constructed for remote sensing image captioning, and extensive experiments demonstrate that the content of a remote sensing image can be completely described by generating language descriptions.
Abstract: Inspired by the recent development of artificial satellites, remote sensing images have attracted extensive attention. Recently, notable progress has been made in scene classification and target detection. However, it is still not clear how to describe the remote sensing image content with accurate and concise sentences. In this paper, we investigate how to describe remote sensing images with accurate and flexible sentences. First, some annotation instructions are presented to better describe remote sensing images, considering their special characteristics. Second, in order to exhaustively exploit the contents of remote sensing images, a large-scale aerial image data set is constructed for remote sensing image captioning. Finally, a comprehensive review is presented on the proposed data set to fully advance the task of remote sensing captioning. Extensive experiments on the proposed data set demonstrate that the content of the remote sensing image can be completely described by generating language descriptions. The data set is available at https://github.com/201528014227051/RSICD_optimal .

212 citations


Journal ArticleDOI
TL;DR: A multi-constraint fully convolutional network (MC–FCN) model is proposed to perform end-to-end building segmentation and significantly outperforms the classic FCN method and the adaptive boosting method using features extracted by the histogram of oriented gradients.
Abstract: Automatic building segmentation from aerial imagery is an important and challenging task because of the variety of backgrounds, building textures, and imaging conditions. Currently, research using variant types of fully convolutional networks (FCNs) has largely improved the performance of this task. However, pursuing more accurate segmentation results is still critical for further applications such as automatic mapping. In this study, a multi-constraint fully convolutional network (MC–FCN) model is proposed to perform end-to-end building segmentation. Our MC–FCN model consists of a bottom-up/top-down fully convolutional architecture and multi-constraints that are computed between the binary cross entropy of prediction and the corresponding ground truth. Since more constraints are applied to optimize the parameters of the intermediate layers, the multi-scale feature representation of the model is further enhanced, and hence higher performance can be achieved. Experiments on a very-high-resolution aerial image dataset covering 18 km² and more than 17,000 buildings indicate that our method performs well in the building segmentation task. The proposed MC–FCN method significantly outperforms the classic FCN method and the adaptive boosting method using features extracted by the histogram of oriented gradients. Compared with the state-of-the-art U–Net model, MC–FCN gains relative improvements of 3.2% (0.833 vs. 0.807) in Jaccard index and 2.2% (0.893 vs. 0.874) in kappa coefficient, at the cost of only a 1.8% increase in model-training time. In addition, a sensitivity analysis demonstrates that constraints at different positions have inconsistent impacts on the performance of the MC–FCN.

157 citations
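
The multi-constraint idea, binary cross-entropy applied to intermediate side outputs as well as the final prediction, can be sketched as a loss function. Assumptions: the side outputs are logits at coarser resolutions that get bilinearly upsampled to the label size, and the per-constraint weights are free hyperparameters; the paper's exact placement of constraints may differ.

```python
import torch
import torch.nn.functional as F

def multi_constraint_loss(side_outputs, final_output, target, weights=None):
    """BCE on the final map plus BCE constraints on intermediate side outputs.

    side_outputs: list of logit maps at various scales; each is resized to
    the target resolution before its constraint is computed.
    """
    weights = weights or [1.0] * len(side_outputs)
    loss = F.binary_cross_entropy_with_logits(final_output, target)
    for w, side in zip(weights, side_outputs):
        side = F.interpolate(side, size=target.shape[-2:], mode="bilinear",
                             align_corners=False)
        loss = loss + w * F.binary_cross_entropy_with_logits(side, target)
    return loss
```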


Journal ArticleDOI
TL;DR: The experimental results show that the proposed automated vehicle detection and counting system is efficient and effective, and produces higher precision and recall rate than the comparative methods.
Abstract: Vehicle detection and counting in aerial images have become an interesting research focus over the last decade. It is important for a wide range of applications, such as urban planning and traffic management. However, this task is challenging due to the small size of the vehicles, their different types and orientations, and the similarity of their visual appearance to other objects, such as air conditioning units on buildings, trash bins, and road marks. Many methods have been introduced in the literature for solving this problem, based on either shallow or deep learning approaches. However, these methods suffer from relatively low precision and recall rates. This paper introduces an automated vehicle detection and counting system for aerial images. The proposed system utilizes a convolutional neural network to regress a vehicle spatial density map across the aerial image. It has been evaluated on two publicly available data sets, namely, Munich and the Overhead Imagery Research Data Set. The experimental results show that our proposed system is efficient and effective, and produces higher precision and recall rates than the comparative methods.

133 citations
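
Counting by density-map regression works because the count is the integral of the predicted map. A toy sketch of that idea (the paper's actual network is not specified here; this tiny FCN is illustrative only):

```python
import torch
import torch.nn as nn

class DensityRegressor(nn.Module):
    """Toy fully convolutional density-map regressor (illustrative only)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1), nn.ReLU(),  # ReLU keeps densities non-negative
        )
    def forward(self, x):
        return self.net(x)

model = DensityRegressor()
image = torch.rand(1, 3, 256, 256)          # stand-in for an aerial tile
density = model(image)
count = density.sum().item()                 # count = integral of the density map
print(f"estimated count: {count:.1f}")
```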


Journal ArticleDOI
TL;DR: In this article, a multiscale CNN (MCNN) framework is proposed to solve the problem of scale variation of the objects in remote sensing imagery, and a similarity measure layer is added to MCNN, which forces the two feature vectors extracted from the image and its corresponding rescaled image to be as close as possible in the training phase.
Abstract: With the large amount of high-spatial resolution images now available, scene classification aimed at obtaining high-level semantic concepts has drawn great attention. The convolutional neural networks (CNNs), which are typical deep learning methods, have widely been studied to automatically learn features for the images for scene classification. However, scene classification based on CNNs is still difficult due to the scale variation of the objects in remote sensing imagery. In this paper, a multiscale CNN (MCNN) framework is proposed to solve the problem. In MCNN, a network structure containing dual branches of a fixed-scale net (F-net) and a varied-scale net (V-net) is constructed and the parameters are shared by the F-net and V-net. The images and their rescaled images are fed into the F-net and V-net, respectively, allowing us to simultaneously train the shared network weights on multiscale images. Furthermore, to ensure that the features extracted from MCNN are scale invariant, a similarity measure layer is added to MCNN, which forces the two feature vectors extracted from the image and its corresponding rescaled image to be as close as possible in the training phase. To demonstrate the effectiveness of the proposed method, we compared the results obtained using three widely used remote sensing data sets: the UC Merced data set, the aerial image data set, and the Google data set of SIRI-WHU. The results confirm that the proposed method performs significantly better than the other state-of-the-art scene classification methods.

105 citations
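
The F-net/V-net construction shares one set of weights across two input scales and penalizes the distance between the resulting feature vectors. A sketch of such a loss, assuming a hypothetical shared_cnn that global-pools to a fixed-length feature vector and also returns class logits (so both scales yield same-size vectors); the scale factor and alpha weighting are assumptions:

```python
import torch
import torch.nn.functional as F

def mcnn_loss(shared_cnn, images, labels, scale=0.5, alpha=0.1):
    """Shared-weight two-branch loss: classification + feature similarity.

    The same CNN (weights shared between the F-net and V-net roles) sees the
    image and a rescaled copy; an MSE term pulls the two feature vectors
    together to encourage scale invariance.
    """
    rescaled = F.interpolate(images, scale_factor=scale, mode="bilinear",
                             align_corners=False)
    feat_f, logits = shared_cnn(images)      # F-net role
    feat_v, _ = shared_cnn(rescaled)         # V-net role, same weights
    cls_loss = F.cross_entropy(logits, labels)
    sim_loss = F.mse_loss(feat_f, feat_v)    # the "similarity measure layer"
    return cls_loss + alpha * sim_loss
```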


Journal ArticleDOI
TL;DR: The proposed architecture is based on Convolutional Neural Networks, and it combines neural codes extracted from a CNN with a k-Nearest Neighbor method so as to improve performance, significantly outperforming previous state-of-the-art ship classification methods.
Abstract: The automatic classification of ships from aerial images is a considerable challenge. Previous works have usually applied image processing and computer vision techniques to extract meaningful features from visible-spectrum images in order to use them as the input for traditional supervised classifiers. We present a method for determining whether a visible-spectrum aerial image contains a ship or not. The proposed architecture is based on Convolutional Neural Networks (CNN), and it combines neural codes extracted from a CNN with a k-Nearest Neighbor (kNN) method so as to improve performance. The kNN results are compared to those obtained with the CNN Softmax output. Several CNN models have been configured and evaluated in order to seek the best hyperparameters, and the most suitable setting for this task was found by using transfer learning at different levels. A new dataset (named MASATI) composed of aerial imagery with more than 6000 samples has also been created to train and evaluate our architecture. The experiments show a success rate of over 99% for our approach, in contrast with the 79% obtained with traditional methods for ship image classification, also outperforming other CNN-based methods. A dataset of images (NWPU VHR-10) used in previous works was additionally used to evaluate the proposed approach. Our best setup achieves a success ratio of 86% with these data, significantly outperforming previous state-of-the-art ship classification methods.

101 citations
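
Replacing the softmax with a k-Nearest-Neighbor classifier over "neural codes" amounts to chopping the head off a CNN and fitting kNN on the penultimate features. A sketch with torchvision's resnet18 standing in for the paper's CNNs (toy tensors here instead of real, normalized images; the weights string follows recent torchvision):

```python
import torch
import torchvision.models as models
from sklearn.neighbors import KNeighborsClassifier

# Pretrained CNN as a neural-code extractor: drop the classification head.
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()   # expose the 512-d penultimate features
backbone.eval()

@torch.no_grad()
def neural_codes(batch):            # batch: (N, 3, 224, 224)
    return backbone(batch).numpy()

# Fit kNN on training codes, then classify test images.
train_x = torch.rand(32, 3, 224, 224)
train_y = [i % 2 for i in range(32)]          # ship / no-ship toy labels
knn = KNeighborsClassifier(n_neighbors=5).fit(neural_codes(train_x), train_y)
pred = knn.predict(neural_codes(torch.rand(4, 3, 224, 224)))
```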


Journal ArticleDOI
TL;DR: A new two-stream deep architecture for aerial scene classification that uses two pretrained convolutional neural networks as feature extractors and the extreme learning machine (ELM) classifier for final classification with the fused features.
Abstract: One of the challenging problems in understanding high-resolution remote sensing images is aerial scene classification. A well-designed feature representation method and classifier can improve classification accuracy. In this paper, we construct a new two-stream deep architecture for aerial scene classification. First, we use two pretrained convolutional neural networks (CNNs) as feature extractors to learn deep features from the original aerial image and the processed aerial image obtained through saliency detection, respectively. Second, two feature fusion strategies are adopted to fuse the two different types of deep convolutional features extracted by the original RGB stream and the saliency stream. Finally, we use the extreme learning machine (ELM) classifier for final classification with the fused features. The effectiveness of the proposed architecture is tested on four challenging datasets: the UC-Merced dataset with 21 scene categories, the WHU-RS dataset with 19 scene categories, the AID dataset with 30 scene categories, and the NWPU-RESISC45 dataset with 45 challenging scene categories. The experimental results demonstrate that our architecture achieves a significant classification accuracy improvement over all state-of-the-art references.

89 citations
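
An extreme learning machine fixes a random hidden layer and solves the output weights in closed form by least squares. The paper fuses two CNN streams before this stage; the sketch below shows only a plain ELM classifier on precomputed feature vectors, and the paper's exact ELM variant may differ.

```python
import numpy as np

class SimpleELM:
    """Minimal extreme learning machine: random hidden layer, least-squares
    output weights. A sketch of the classifier stage only."""
    def __init__(self, n_hidden=512, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y, n_classes):
        d = X.shape[1]
        self.W = self.rng.normal(size=(d, self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)      # random feature map
        T = np.eye(n_classes)[y]              # one-hot targets (y: int labels)
        self.beta = np.linalg.pinv(H) @ T     # closed-form least-squares solve
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return np.argmax(H @ self.beta, axis=1)
```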


Proceedings ArticleDOI
22 Jul 2018
TL;DR: The outcomes of the first year of the Inria Aerial Image Labeling Benchmark, which consisted of dense labeling of aerial images into building / not building classes, are discussed; the four methods with the highest numerical accuracies are all convolutional neural network approaches.
Abstract: Over recent years, there has been an increasing interest in large-scale classification of remote sensing images. In this context, the Inria Aerial Image Labeling Benchmark was released online in December 2016. In this paper, we discuss the outcomes of the first year of the benchmark contest, which consisted of dense labeling of aerial images into building / not building classes, covering areas of five cities not present in the training set. We present the four methods with the highest numerical accuracies, all four being convolutional neural network approaches. It is remarkable that three of these methods use the U-net architecture, which has thus proven to be a new standard in dense image labeling.

87 citations
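
Three of the four winning methods used the U-Net architecture. As a general illustration (not any contestant's actual network), here is a minimal two-level U-Net showing the encoder/decoder-with-skip pattern:

```python
import torch
import torch.nn as nn

def block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(cout, cout, 3, padding=1), nn.ReLU())

class TinyUNet(nn.Module):
    """Two-level U-Net: encoder, bottleneck, decoder with a skip connection."""
    def __init__(self, n_classes=1):
        super().__init__()
        self.enc = block(3, 32)
        self.down = nn.MaxPool2d(2)
        self.mid = block(32, 64)
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = block(64, 32)          # 64 = 32 (skip) + 32 (upsampled)
        self.head = nn.Conv2d(32, n_classes, 1)

    def forward(self, x):
        e = self.enc(x)                   # full-resolution features
        m = self.mid(self.down(e))        # bottleneck at half resolution
        u = self.up(m)                    # back to full resolution
        return self.head(self.dec(torch.cat([u, e], dim=1)))

logits = TinyUNet()(torch.rand(1, 3, 128, 128))  # building / not-building logits
```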


Posted Content
TL;DR: This paper proposes a fully convolutional-deconvolutional network architecture being trained end-to-end, encompassing residual learning, to model the ambiguous mapping between monocular remote sensing images and height maps, and introduces a skip connection to the network to preserve fine edge details of estimated height maps.
Abstract: In this paper we tackle a novel problem, namely height estimation from a single monocular remote sensing image, which is inherently ambiguous and technically ill-posed, with a large source of uncertainty coming from the overall scale. We propose a fully convolutional-deconvolutional network architecture, trained end-to-end and encompassing residual learning, to model the ambiguous mapping between monocular remote sensing images and height maps. Specifically, it is composed of two parts, i.e., a convolutional sub-network and a deconvolutional sub-network. The former corresponds to a feature extractor that transforms the input remote sensing image into a high-level multidimensional feature representation, whereas the latter plays the role of a height generator that produces the height map from the features extracted by the convolutional sub-network. Moreover, to preserve fine edge details of estimated height maps, we introduce a skip connection to the network, which is able to shuttle low-level visual information, e.g., object boundaries and edges, directly across the network. To demonstrate the usefulness of single-view height prediction, we show a practical example of instance segmentation of buildings using the estimated height map. This paper, for the first time in the remote sensing community, attempts to estimate height from monocular vision. The proposed network is validated using a large-scale high-resolution aerial image data set covering an area of Berlin. Both visual and quantitative analyses of the experimental results demonstrate the effectiveness of our approach.

87 citations


Journal ArticleDOI
TL;DR: This paper developed a novel deep adversarial network, named Building-A-Nets, that jointly trains a deep convolutional neural network (generator) and an adversarial discriminator network for the robust segmentation of building rooftops in remote sensing images.
Abstract: With the proliferation of high-resolution remote sensing sensors and platforms, vast amounts of aerial image data are becoming easily accessible. High-resolution aerial images provide sufficient structural and texture information for image recognition but also raise new challenges for existing segmentation methods. In recent years, deep neural networks have gained much attention in the remote sensing field and have achieved remarkable performance for high-resolution remote sensing image segmentation. However, there still exist spatial inconsistency problems caused by independent pixelwise classification that ignores high-order regularities. In this paper, we develop a novel deep adversarial network, named Building-A-Nets, that jointly trains a deep convolutional neural network (generator) and an adversarial discriminator network for the robust segmentation of building rooftops in remote sensing images. More specifically, the generator produces a pixelwise image classification map using a fully convolutional DenseNet model, whereas the discriminator tends to enforce forms of high-order structural features learned from the ground-truth label map. The generator and discriminator compete with each other in an adversarial learning process until the equivalence point is reached to produce the optimal segmentation map of building objects. Meanwhile, a soft weight coefficient is adopted to balance the operation of the pixelwise classification and high-order structural feature learning. Experimental results show that our Building-A-Net can successfully detect and rectify spatial inconsistency on aerial images while achieving superior performance compared to other state-of-the-art building extraction methods. Code is available at https://github.com/lixiang-ucas/Building-A-Nets .

82 citations
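
The generator/discriminator competition with a soft weight coefficient balancing pixelwise and structural terms can be written down as a pair of losses. A sketch assuming both networks output logits; the DenseNet generator, the discriminator architecture, and the value of lam are not specified by the abstract and are assumptions here:

```python
import torch
import torch.nn.functional as F

def generator_loss(seg_logits, target, disc_on_fake, lam=0.1):
    """Pixelwise BCE plus an adversarial term that rewards fooling the
    discriminator; `lam` plays the role of the soft weight coefficient."""
    pixel = F.binary_cross_entropy_with_logits(seg_logits, target)
    adv = F.binary_cross_entropy_with_logits(
        disc_on_fake, torch.ones_like(disc_on_fake))  # want D to say "real"
    return pixel + lam * adv

def discriminator_loss(disc_on_real, disc_on_fake):
    """Discriminator learns to separate ground-truth maps from predictions."""
    real = F.binary_cross_entropy_with_logits(
        disc_on_real, torch.ones_like(disc_on_real))
    fake = F.binary_cross_entropy_with_logits(
        disc_on_fake, torch.zeros_like(disc_on_fake))
    return real + fake
```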


Journal ArticleDOI
TL;DR: A CNN framework using residual connections and dilated convolutions is used considering both manned and unmanned aerial image samples to perform the satellite image classification of building damages.
Abstract: The localization and detailed assessment of damaged buildings after a disastrous event is of utmost importance to guide response operations, recovery tasks or insurance purposes. Several remote sensing platforms and sensors are currently used for the manual detection of building damages. However, there is an overall interest in the use of automated methods to perform this task, regardless of the platform used. Owing to its synoptic coverage and predictable availability, satellite imagery is currently used as input for the identification of building damages by the International Charter, as well as the Copernicus Emergency Management Service for the production of damage grading and reference maps. Recently proposed methods to perform image classification of building damages rely on convolutional neural networks (CNN). These are usually trained with only satellite image samples in a binary classification problem; however, the number of samples derived from these images is often limited, affecting the quality of the classification results. The use of up/down-sampled image samples during the training of a CNN has been demonstrated to improve several image recognition tasks in remote sensing. However, it is currently unclear if this multi-resolution information can also be captured from images with different spatial resolutions, such as satellite and airborne imagery (from both manned and unmanned platforms). In this paper, a CNN framework using residual connections and dilated convolutions is used, considering both manned and unmanned aerial image samples, to perform the satellite image classification of building damages. Three network configurations trained with multi-resolution image samples are compared against two benchmark networks where only satellite image samples are used. Combining feature maps generated from airborne and satellite image samples, and refining these using only the satellite image samples, improved the overall satellite image classification of building damages by nearly 4%.

Book ChapterDOI
08 Sep 2018
TL;DR: This paper presents a weakly-supervised approach to object instance segmentation, which exceeds the performance of existing weakly supervised methods, without requiring hand-tuned segment proposals, and reaches 90% of supervised performance.
Abstract: This paper presents a weakly-supervised approach to object instance segmentation. Starting with known or predicted object bounding boxes, we learn object masks by playing a game of cut-and-paste in an adversarial learning setup. A mask generator takes a detection box and Faster R-CNN features, and constructs a segmentation mask that is used to cut-and-paste the object into a new image location. The discriminator tries to distinguish between real objects, and those cut and pasted via the generator, giving a learning signal that leads to improved object masks. We verify our method experimentally using Cityscapes, COCO, and aerial image datasets, learning to segment objects without ever having seen a mask in training. Our method exceeds the performance of existing weakly supervised methods, without requiring hand-tuned segment proposals, and reaches 90% of supervised performance.
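
The core game relies on a compositing step: the generated mask cuts the object out of its detection box and pastes it elsewhere. A NumPy sketch of that compositing alone (the mask generator, Faster R-CNN features, and discriminator are omitted; the paste location is assumed to lie within the image bounds):

```python
import numpy as np

def cut_and_paste(image, mask, box, dest_xy):
    """Composite the masked object from `box` into a new location.

    image: (H, W, 3) float array; mask: (h, w) soft mask over the box region;
    box: (x, y, w, h) in pixels; dest_xy: top-left corner of the paste site.
    """
    x, y, w, h = box
    obj = image[y:y + h, x:x + w]
    dx, dy = dest_xy
    out = image.copy()
    region = out[dy:dy + h, dx:dx + w]
    m = mask[..., None]                        # broadcast mask over channels
    out[dy:dy + h, dx:dx + w] = m * obj + (1 - m) * region
    return out
```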

Journal ArticleDOI
TL;DR: A multilevel fusion method that makes judgments by incorporating information from different levels is proposed, achieving a significant classification accuracy improvement over all state-of-the-art references.
Abstract: One of the challenging problems in understanding high-resolution remote sensing images is aerial scene classification. A well-designed feature extractor and classifier can improve classification accuracy. In this letter, we construct three different convolutional neural networks with different sizes of receptive field, respectively. More importantly, we further propose a multilevel fusion method, which can make judgments by incorporating information from different levels. The aerial image and two patches extracted from the image are fed to these three different networks, and then a probability fusion model is established for final classification. The effectiveness of the proposed method is tested on a more challenging data set, AID, which has 10,000 high-resolution remote sensing images with 30 categories. Experimental results show that our multilevel fusion model achieves a significant classification accuracy improvement over all state-of-the-art references.
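
A probability fusion model over three networks can be as simple as a weighted average of softmax outputs; the letter's exact fusion rule may differ. A minimal sketch:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def fuse_predictions(logits_list, weights=None):
    """Weighted average of class probabilities over several networks,
    one simple instance of a probability fusion model."""
    weights = weights or [1.0 / len(logits_list)] * len(logits_list)
    probs = sum(w * softmax(z) for w, z in zip(weights, logits_list))
    return np.argmax(probs, axis=-1)

# Three networks' logits for a batch of 2 samples over 30 classes.
fused = fuse_predictions([np.random.randn(2, 30) for _ in range(3)])
```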

Journal ArticleDOI
TL;DR: This work proposes two effective architectures for aerial scene classification based on the idea of feature-level fusion, employing texture coded and saliency coded two-stream architectures and fusing the streams using a novel deep feature fusion model.
Abstract: Aerial scene classification is an active and challenging problem in high-resolution remote sensing imagery understanding. Deep learning models, especially convolutional neural networks (CNNs), have achieved prominent performance in this field. The extraction of deep features from the layers of a CNN model is widely used in these CNN-based methods. Although the CNN-based approaches have obtained great success, there is still plenty of room to further increase the classification accuracy. As a matter of fact, fusion with other features has great potential for leading to better performance of aerial scene classification. Therefore, we propose two effective architectures based on the idea of feature-level fusion. The first architecture, i.e., the texture coded two-stream deep architecture, uses the raw RGB network stream and the mapped local binary patterns (LBP) coded network stream to extract two different sets of features and fuses them using a novel deep feature fusion model. In the second architecture, i.e., the saliency coded two-stream deep architecture, we employ the saliency coded network stream as the second stream and fuse it with the raw RGB network stream using the same feature fusion model. For the sake of validation and comparison, our proposed architectures are evaluated via comprehensive experiments with three publicly available remote sensing scene datasets. The classification accuracies of the saliency coded two-stream architecture with our feature fusion model achieve 97.79%, 98.90%, 94.09%, 95.99%, 85.02%, and 87.01% on the UC-Merced dataset (50% and 80% training samples), the Aerial Image Dataset (AID) (20% and 50% training samples), and the NWPU-RESISC45 dataset (10% and 20% training samples), respectively, surpassing state-of-the-art methods.
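
The texture coded stream feeds a mapped local binary patterns image into the second network. One plausible preprocessing with scikit-image (the paper's exact mapped-LBP encoding may differ; P, R, and the normalization are assumptions):

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_stream_input(gray):
    """Map a grayscale aerial image to an LBP code image for the second
    network stream: P=8 neighbors at radius R=1, 'uniform' patterns."""
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp = lbp / lbp.max()                        # normalize codes to [0, 1]
    return np.repeat(lbp[None, ...], 3, axis=0)  # replicate to 3 channels (CHW)
```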

Journal ArticleDOI
TL;DR: A vehicle detection method for aerial image based on YOLO deep learning algorithm is presented that has a good performance on unknown aerial images, especially for small objects, rotating objects, as well as compact and dense objects, while meeting the real-time requirements.
Abstract: With the application of UAVs in intelligent transportation systems, vehicle detection for aerial images has become a key engineering technology with academic research significance. In this paper, a vehicle detection method for aerial images based on the YOLO deep learning algorithm is presented. The method builds an aerial image dataset suitable for YOLO training by processing three public aerial image datasets. Experiments show that the trained model performs well on unknown aerial images, especially for small objects, rotating objects, and compact and dense objects, while meeting real-time requirements.
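
Building a YOLO-ready dataset mostly means rewriting annotations into YOLO's plain-text label convention: one line per object, a class index followed by a normalized center-size box. The helper below converts a pixel-space box into that layout; the box values and image size are made up for the demo.

```python
def to_yolo(box, img_w, img_h, class_id):
    """Convert a pixel-space box (xmin, ymin, xmax, ymax) to a YOLO label
    line: "class x_center y_center width height", all normalized to [0, 1]."""
    xmin, ymin, xmax, ymax = box
    xc = (xmin + xmax) / 2 / img_w
    yc = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

print(to_yolo((120, 60, 180, 110), img_w=1024, img_h=768, class_id=0))
```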

Journal ArticleDOI
TL;DR: State-of-the-art results for four benchmark data sets using a variety of deep convolutional neural networks (DCNN) and multiple network fusion techniques are presented and the relative reduction in classification errors achieved is 25%–45% compared with the single best DCNN results.
Abstract: Accurate land cover classification and detection of objects in high-resolution electro-optical remote sensing imagery (RSI) have long been a challenging task. Recently, important new benchmark data sets have been released which are suitable for land cover classification and object detection research. Here, we present state-of-the-art results for four benchmark data sets using a variety of deep convolutional neural networks (DCNN) and multiple network fusion techniques. We achieve 99.70%, 99.66%, 97.74%, and 97.30% classification accuracies on the PatternNet, RSI-CB256, aerial image, and RESISC-45 data sets, respectively, using the Choquet integral with a novel data-driven optimization method presented in this letter. The relative reduction in classification errors achieved by this data driven optimization is 25%–45% compared with the single best DCNN results.
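
The Choquet integral fuses the confidences of several classifiers with respect to a fuzzy measure defined on coalitions of classifiers; the letter learns this measure with a data-driven optimization that is not reproduced here. A minimal discrete Choquet integral with a hand-set measure (illustrative values only):

```python
import numpy as np

def choquet_integral(h, measure):
    """Discrete Choquet integral of confidences h (one value per classifier)
    w.r.t. a fuzzy measure given as {frozenset_of_indices: value}, with the
    full set mapped to 1."""
    order = np.argsort(h)                  # indices, ascending confidence
    h_sorted = np.concatenate(([0.0], np.asarray(h)[order]))
    total = 0.0
    for i in range(1, len(h_sorted)):
        A = frozenset(order[i - 1:])       # classifiers with h >= h_(i)
        total += (h_sorted[i] - h_sorted[i - 1]) * measure[A]
    return total

# Three classifiers' confidences for one class, plus a hand-set measure.
h = [0.7, 0.9, 0.6]
g = {frozenset({0, 1, 2}): 1.0, frozenset({0, 1}): 0.8,
     frozenset({1, 2}): 0.7, frozenset({0, 2}): 0.6,
     frozenset({0}): 0.4, frozenset({1}): 0.5, frozenset({2}): 0.3}
print(choquet_integral(h, g))  # 0.6*1.0 + 0.1*0.8 + 0.2*0.5 = 0.78
```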

Journal ArticleDOI
TL;DR: This study proposes a boundary regulated network called BR-Net, which utilizes both local and global information, to perform roof segmentation and outline extraction and significantly outperforms the classic FCN8s model.
Abstract: The automatic extraction of building outlines from aerial imagery for the purposes of navigation and urban planning is a long-standing problem in the field of remote sensing. Currently, most methods utilize variants of fully convolutional networks (FCNs), which have significantly improved model performance for this task. However, pursuing more accurate segmentation results is still critical for additional applications, such as automatic mapping and building change detection. In this study, we propose a boundary regulated network called BR-Net, which utilizes both local and global information, to perform roof segmentation and outline extraction. The BR-Net method consists of a shared backend utilizing a modified U-Net and a multitask framework to generate predictions for segmentation maps and building outlines based on a consistent feature representation from the shared backend. Because of the restriction and regulation of additional boundary information, the proposed model can achieve superior performance compared to existing methods. Experiments on an aerial image dataset covering 32 km² and containing more than 58,000 buildings indicate that our method performs well at both roof segmentation and outline extraction. The proposed BR-Net method significantly outperforms the classic FCN8s model. Compared to the state-of-the-art U-Net model, our BR-Net achieves 6.2% (0.869 vs. 0.818), 10.6% (0.772 vs. 0.698), and 8.7% (0.840 vs. 0.773) improvements in F1 score, Jaccard index, and kappa coefficient, respectively.

Posted Content
TL;DR: A pipeline to tackle the problem of automated objects labeling in aerial imagery using a stack of convolutional neural networks (U-Net architecture) arranged end-to-end, which outperforms current state-of-the-art on two different datasets.
Abstract: Automation of objects labeling in aerial imagery is a computer vision task with numerous practical applications. Fields like energy exploration require an automated method to process a continuous stream of imagery on a daily basis. In this paper we propose a pipeline to tackle this problem using a stack of convolutional neural networks (U-Net architecture) arranged end-to-end. Each network works as post-processor to the previous one. Our model outperforms current state-of-the-art on two different datasets: Inria Aerial Image Labeling dataset and Massachusetts Buildings dataset each with different characteristics such as spatial resolution, object shapes and scales. Moreover, we experimentally validate computation time savings by processing sub-sampled images and later upsampling pixelwise labeling. These savings come at a negligible degradation in segmentation quality. Though the conducted experiments in this paper cover only aerial imagery, the technique presented is general and can handle other types of images.

Posted Content
TL;DR: This paper addresses the broken insulator location problem with a low signal-to-noise-ratio image location framework comprising two modules: 1) object detection based on Fast R-CNN, and 2) classification of pixels based on U-net.
Abstract: The location of broken insulators in aerial images is a challenging task. This paper, focusing on the self-blast glass insulator, proposes a deep learning solution. We address the broken insulator location problem as a low signal-to-noise-ratio image location framework with two modules: 1) object detection based on Fast R-CNN, and 2) classification of pixels based on U-net. A diverse aerial image set from a power grid in China is tested to validate the proposed approach. Furthermore, a comparison is made among different methods, and the results show that our approach is accurate and real-time.

Posted Content
17 Nov 2018
TL;DR: A novel multi-category rotation detector is proposed, which can efficiently detect small objects, arbitrary direction objects, and dense objects in complex remote sensing images.
Abstract: Object detection plays a vital role in natural and aerial scenes and is full of challenges. Although many advanced algorithms have succeeded in natural scenes, progress in aerial scenes has been slow due to the complexity of aerial images and the large degrees of freedom of remote sensing objects in scale, orientation, and density. In this paper, a novel multi-category rotation detector is proposed, which can efficiently detect small objects, arbitrarily oriented objects, and dense objects in complex remote sensing images. Specifically, the proposed model adopts a targeted feature fusion strategy called the inception fusion network, which fully considers factors such as feature fusion, anchor sampling, and receptive field to improve the ability to handle small objects. Then we combine a pixel attention network and a channel attention network to weaken noise information and highlight object features. Finally, rotational object detection is realized by redefining the rotated bounding box. Experiments on public datasets including DOTA and NWPU VHR-10 demonstrate that the proposed algorithm significantly outperforms state-of-the-art methods. The code and models will be available at https://github.com/DetectionTeamUCAS/R2CNN-Plus-Plus_Tensorflow.
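
Rotation detectors of this kind typically regress a 5-parameter box (center, width, height, angle) instead of an axis-aligned one. The paper's exact parameterization and angle convention are not given here, so the sketch below assumes a plain counter-clockwise degree convention and shows the conversion back to corner points:

```python
import numpy as np

def rotated_box_to_corners(cx, cy, w, h, angle_deg):
    """Corners of a rotated bounding box (cx, cy, w, h, angle), the kind of
    5-parameter representation rotation detectors regress."""
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)],
                  [np.sin(a),  np.cos(a)]])
    # Axis-aligned half-extent corners, then rotate and translate.
    corners = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return corners @ R.T + np.array([cx, cy])

print(rotated_box_to_corners(100, 50, 40, 20, 30))
```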

Journal ArticleDOI
TL;DR: Experimental results show that the CNN-CELM model improves the generalization ability and reduces the training time compared to state-of-the-art methods.
Abstract: Classifying land-use scenes from high-resolution remote-sensing imagery with high quality and accuracy is of paramount interest for science and land management applications. In this article, we propose a new model for land-use scene classification by integrating the recent success of convolutional neural networks (CNN) and constrained extreme learning machines (CELM). In the model, the fully connected layers of a pretrained CNN are removed, and the CNN works as a deep and robust convolutional feature extractor. After normalization, the deep convolutional features are fed to the CELM classifier. To analyse its performance, the proposed method has been evaluated on two challenging high-resolution data sets: (1) the aerial image data set, consisting of 30 different aerial scene categories with sub-metre resolution, and (2) a Sydney data set, a large high-spatial-resolution satellite image. Experimental results show that the CNN-CELM model improves the generalization ability and reduces the training time compared to state-of-the-art methods.

Journal ArticleDOI
TL;DR: An aerial image super-resolution method that trains convolutional neural networks with respect to wavelet analysis, enabling image restoration subject to sophisticated culture variability; experiments validate its effectiveness for restoring complicated aerial images.
Abstract: We develop an aerial image super-resolution method by training convolutional neural networks (CNNs) with respect to wavelet analysis. To this end, we commence by performing wavelet decomposition on aerial images for multiscale representations. We then train multiple CNNs to approximate the wavelet multiscale representations separately. The multiple CNNs thus trained characterize aerial images in multiple directions and multiscale frequency bands, and thus enable image restoration subject to sophisticated culture variability. For inference, the trained CNNs regress wavelet multiscale representations from a low-resolution aerial image, followed by wavelet synthesis that forms a restored high-resolution aerial image. Experimental results validate the effectiveness of our method for restoring complicated aerial images.
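
The pipeline decomposes an image into wavelet subbands, refines each subband with its own regressor, and recovers the restored image by wavelet synthesis. The sketch below wires that flow with PyWavelets, with identity functions standing in for the trained per-subband CNNs; the Haar wavelet is an assumption:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_sr_sketch(low_res, subband_models, wavelet="haar"):
    """Wavelet-domain restoration sketch: decompose into approximation (A)
    and detail (H, V, D) subbands, let one model per subband refine its
    band, then synthesize the restored image."""
    cA, (cH, cV, cD) = pywt.dwt2(low_res, wavelet)
    bands = [subband_models[k](b) for k, b in zip("AHVD", (cA, cH, cV, cD))]
    return pywt.idwt2((bands[0], tuple(bands[1:])), wavelet)

# Identity stand-ins for the trained per-subband CNNs.
identity = {k: (lambda b: b) for k in "AHVD"}
restored = wavelet_sr_sketch(np.random.rand(64, 64), identity)
```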

Posted Content
TL;DR: In this article, a mask generator takes a detection box and Faster R-CNN features, and constructs a segmentation mask that is used to cut-and-paste the object into a new image location.
Abstract: This paper presents a weakly-supervised approach to object instance segmentation. Starting with known or predicted object bounding boxes, we learn object masks by playing a game of cut-and-paste in an adversarial learning setup. A mask generator takes a detection box and Faster R-CNN features, and constructs a segmentation mask that is used to cut-and-paste the object into a new image location. The discriminator tries to distinguish between real objects, and those cut and pasted via the generator, giving a learning signal that leads to improved object masks. We verify our method experimentally using Cityscapes, COCO, and aerial image datasets, learning to segment objects without ever having seen a mask in training. Our method exceeds the performance of existing weakly supervised methods, without requiring hand-tuned segment proposals, and reaches 90% of supervised performance.

Journal ArticleDOI
TL;DR: An approach for learning to segment aerial building footprints in the absence of fully annotated label masks is presented, based on an adversarial architecture that jointly trains two networks to produce building footprint segmentations that resemble synthetic label masks.
Abstract: Aerial image segmentation usually requires a large number of pixel-level masks in order to achieve quality performance. Obtaining these annotations can be both costly and time-consuming, limiting the amount of data available for training. In this paper, we present an approach for learning to segment aerial building footprints in the absence of fully annotated label masks. Instead, we exploit cheap and efficient scribble annotations to supervise deep convolutional neural networks for segmentation. Our proposed model is based on an adversarial architecture that jointly trains two networks to produce building footprint segmentations that resemble synthetic label masks. We present competitive segmentation results on the Massachusetts Buildings data set using only scribble supervision signals. Further experiments show that our method effectively alleviates the building instance separation issue and displays strong robustness towards different scribble instance levels. We believe our cost-effective approach has the potential to be adapted for other aerial image interpretation tasks.

Journal ArticleDOI
TL;DR: This research improves the extraction of road centerlines by accounting for road connectivity, using both very-high-resolution aerial images and light detection and ranging (LiDAR) data: the fractal net evolution approach segments remote sensing images into image objects, which are then classified with a random forest machine learning classifier.
Abstract: Road networks provide key information for a broad range of applications such as urban planning, urban management, and navigation. The fast-developing technology of remote sensing, which acquires high-resolution observational data of the land surface, offers opportunities for the automatic extraction of road networks. However, road networks extracted from remote sensing images are likely affected by shadows and trees, making the road map irregular and inaccurate. This research aims to improve the extraction of road centerlines using both very-high-resolution (VHR) aerial images and light detection and ranging (LiDAR) data by accounting for road connectivity. The proposed method first applies the fractal net evolution approach (FNEA) to segment remote sensing images into image objects and then classifies the image objects using a machine learning classifier, random forest. A post-processing approach based on the minimum area bounding rectangle (MABR) is proposed, and a structure feature index is adopted to obtain the complete road networks. Finally, a multistep approach, i.e., morphology thinning, Harris corner detection, and least-squares fitting (MHL), is designed to accurately extract the road centerlines from the complex road networks. The proposed method is applied to three datasets: the New York dataset obtained from the object identification dataset, the Vaihingen dataset obtained from the International Society for Photogrammetry and Remote Sensing (ISPRS) 2D semantic labelling benchmark, and the Guangzhou dataset. Compared with two state-of-the-art methods, the proposed method obtains the highest completeness, correctness, and quality for the three datasets. The experimental results show that the proposed method is an efficient solution for extracting road centerlines in complex scenes from VHR aerial images and LiDAR data.
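
Of the three MHL stages, the morphology-thinning and Harris-corner steps map directly onto standard library calls. The sketch below covers those two steps only, using scikit-image and OpenCV; the FNEA segmentation, random forest classification, MABR post-processing, and least-squares fitting are omitted, and the corner threshold is an assumption:

```python
import cv2
import numpy as np
from skimage.morphology import skeletonize

def centerline_sketch(road_mask):
    """Thin a binary road mask (2-D numpy array) to one-pixel centerlines,
    then mark candidate junction/corner points with a Harris response."""
    skeleton = skeletonize(road_mask.astype(bool))
    response = cv2.cornerHarris(skeleton.astype(np.float32),
                                blockSize=2, ksize=3, k=0.04)
    corners = np.argwhere(response > 0.01 * response.max())
    return skeleton, corners
```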

Journal ArticleDOI
01 Dec 2018
TL;DR: The capability of Convolutional Neural Networks is investigated for building detection as well as recognition of roof shapes from a single image, achieving approximately 97% and 92% quality rates in the detection and recognition steps, respectively.
Abstract: Automatic detection and reconstruction of buildings have become essential in many remote sensing and computer vision applications. In this paper, the capability of Convolutional Neural Networks (CNNs) is investigated for building detection as well as recognition of roof shapes using a single image. The major steps include training dataset generation, model training, image segmentation, building detection, and roof shape recognition. First, a CNN is trained for extracting urban objects such as trees, roads and buildings. Next, classification of different roof types into flat, gable and hip shapes is performed using a second trained CNN. The assessment results prove the effectiveness of the proposed method, with approximately 97% and 92% quality rates in the detection and recognition steps, respectively.

Journal ArticleDOI
TL;DR: This paper presents the FGI Aerial Image Reference System (FGI AIRS) and a novel method for optical and mathematical tilt correction of the irradiance measurements and recommends the implementation of the proposed tilt correction method in all future UAV irradiance sensors if they are not to be installed on a stabilizing gimbal.
Abstract: In unstable atmospheric conditions, using on-board irradiance sensors is one of the only robust methods to convert unmanned aerial vehicle (UAV)-based optical remote sensing data to reflectance factors. Normally, such sensors experience significant errors due to tilting of the UAV, if not installed on a stabilizing gimbal. Unfortunately, such gimbals of sufficient accuracy are heavy, cumbersome, and cannot be installed on all UAV platforms. In this paper, we present the FGI Aerial Image Reference System (FGI AIRS) developed at the Finnish Geospatial Research Institute (FGI) and a novel method for optical and mathematical tilt correction of the irradiance measurements. The FGI AIRS is a sensor unit for UAVs that provides the irradiance spectrum, Real Time Kinematic (RTK)/Post Processed Kinematic (PPK) GNSS position, and orientation for the attached cameras. The FGI AIRS processes the reference data in real time for each acquired image and can send it to an on-board or on-cloud processing unit. The novel correction method is based on three RGB photodiodes that are tilted 10° in opposite directions. These photodiodes sample the irradiance readings at different sensor tilts, from which reading of a virtual horizontal irradiance sensor is calculated. The FGI AIRS was tested, and the method was shown to allow on-board measurement of irradiance at an accuracy better than ±0.8% at UAV tilts up to 10° and ±1.2% at tilts up to 15°. In addition, the accuracy of FGI AIRS to produce reflectance-factor-calibrated aerial images was compared against the traditional methods. In the unstable weather conditions of the experiment, both the FGI AIRS and the on-ground spectrometer were able to produce radiometrically accurate and visually pleasing orthomosaics, while the reflectance reference panels and the on-board irradiance sensor without stabilization or tilt correction both failed to do so. The authors recommend the implementation of the proposed tilt correction method in all future UAV irradiance sensors if they are not to be installed on a gimbal.
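
The three photodiodes, tilted 10° in opposite directions, sample the irradiance at different sensor orientations, from which the reading of a virtual horizontal sensor is computed. FGI's actual optical and mathematical correction is not reproduced here; the sketch below solves a much simpler ideal-cosine, direct-sun-only model by least squares, just to show why three differently tilted readings suffice:

```python
import numpy as np

def virtual_horizontal_irradiance(readings, normals):
    """Solve r_i = v . n_i for v = E * sun_direction (ideal cosine response,
    direct component only; a simplified model, not FGI's algorithm), then
    read out what a horizontal sensor (normal = z) would measure."""
    v, *_ = np.linalg.lstsq(np.asarray(normals, dtype=float),
                            np.asarray(readings, dtype=float), rcond=None)
    return v @ np.array([0.0, 0.0, 1.0])

# Three photodiodes tilted 10 degrees toward different azimuths (assumed layout).
t = np.deg2rad(10)
normals = [[np.sin(t), 0.0, np.cos(t)],
           [-np.sin(t) / 2,  np.sin(t) * np.sqrt(3) / 2, np.cos(t)],
           [-np.sin(t) / 2, -np.sin(t) * np.sqrt(3) / 2, np.cos(t)]]
sun = np.array([0.3, 0.1, 0.95]); sun /= np.linalg.norm(sun)
readings = [800.0 * (np.array(n) @ sun) for n in normals]   # synthetic data
print(virtual_horizontal_irradiance(readings, normals))      # ~ 800 * sun_z
```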

Posted Content
TL;DR: A novel multi-task multi-stage neural network that is able to handle the two problems at the same time, in a single forward pass, and achieves commercial GPS-level localization accuracy from satellite images with spatial resolution of 1 square meter per pixel in a city-wide area of interest.
Abstract: Semantic segmentation and vision-based geolocalization in aerial images are challenging tasks in computer vision. Due to the advent of deep convolutional nets and the availability of relatively low cost UAVs, they are currently generating a growing attention in the field. We propose a novel multi-task multi-stage neural network that is able to handle the two problems at the same time, in a single forward pass. The first stage of our network predicts pixelwise class labels, while the second stage provides a precise location using two branches. One branch uses a regression network, while the other is used to predict a location map trained as a segmentation task. From a structural point of view, our architecture uses encoder-decoder modules at each stage, having the same encoder structure re-used. Furthermore, its size is limited to be tractable on an embedded GPU. We achieve commercial GPS-level localization accuracy from satellite images with spatial resolution of 1 square meter per pixel in a city-wide area of interest. On the task of semantic segmentation, we obtain state-of-the-art results on two challenging datasets, the Inria Aerial Image Labeling dataset and Massachusetts Buildings.

Patent
08 May 2018
TL;DR: In this paper, the authors proposed a real-time aerial image insulator detection method based on deep learning, which hands the task of feature extraction to a deep convolutional neural network, extracts deep feature information that is more comprehensive and better describes the insulator, and inputs this information into a detector for prediction and inference to obtain a detection result.
Abstract: The invention relates to a real-time aerial image insulator detection method based on deep learning. The method hands the task of feature extraction to a deep convolutional neural network, extracts deep feature information that is more comprehensive and better describes the insulator, and inputs this information into a detector for prediction and inference to obtain a detection result. The whole process is an end-to-end fast detection pipeline: a target frame is obtained once the image is input, which improves the efficiency of subsequent automatic fault diagnosis and helps reduce the retrieval pressure and workload of line patrol staff who currently search through massive line patrol data. At the same time, the method uses the idea of transfer learning to transfer knowledge obtained from past tasks to the current target task, so that the trained model is inheritable: whenever new data is added to the image library, the target model can continue training on the basis of the source model, quickly achieving the expected effect and preventing the old version of the model from becoming useless when the data is updated. The detection model thus becomes more and more powerful as the data grows over time.
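
The patent's inheritable-model idea is standard fine-tuning: resume training of a source model whenever new images enter the library. A generic sketch under stated assumptions (the patent names no backbone; resnet18, the two-class head, the optimizer settings, and the loader are all placeholders):

```python
import torch
import torchvision.models as models

# Start from a source model: pretrained backbone, new task-specific head.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = torch.nn.Linear(model.fc.in_features, 2)  # insulator / background
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

def continue_training(model, loader, epochs=1):
    """Resume training on newly added data; `loader` is a hypothetical
    DataLoader yielding (images, labels) batches."""
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimizer.step()
```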

Proceedings ArticleDOI
25 Mar 2018
TL;DR: This work proposes a new resist modeling framework for contact layers that utilizes existing data from old technology nodes to reduce the amount of data required from a target lithography configuration, and that is effective within a competitive range of accuracy.
Abstract: Lithography simulation is one of the key steps in physical verification, enabled by the substantial optical and resist models. A resist model bridges the aerial image simulation to printed patterns. While the effectiveness of learning-based solutions for resist modeling has been demonstrated, they are considerably data-demanding. Meanwhile, a set of manufactured data for a specific lithography configuration is only valid for the training of one single model, indicating low data efficiency. Due to the complexity of the manufacturing process, obtaining enough data for acceptable accuracy becomes very expensive in terms of both time and cost, especially during the evolution of technology generations when the design space is intensively explored. In this work, we propose a new resist modeling framework for contact layers that utilizes existing data from old technology nodes to reduce the amount of data required from a target lithography configuration. Our framework based on residual neural networks and transfer learning techniques is effective within a competitive range of accuracy, i.e., 2-10X reduction on the amount of training data with comparable accuracy to the state-of-the-art learning approach.