Journal ArticleDOI

Incorporating DeepLabv3+ and object-based image analysis for semantic segmentation of very high resolution remote sensing images

04 Mar 2021-International Journal of Digital Earth (Taylor & Francis)-Vol. 14, Iss: 3, pp 357-378
TL;DR: This study proposes a semantic segmentation method for VHR images by incorporating a deep learning semantic segmentation model (DeepLabv3+) and object-based image analysis (OBIA), wherein the DSM is employed to provide geometric information to enhance the interpretation of VHR images.
Abstract: Semantic segmentation of remote sensing images is an important but unsolved problem in the remote sensing community. Advanced image semantic segmentation models, such as DeepLabv3+, have achieved ast...
Citations
Journal ArticleDOI
TL;DR: In this paper, a review and meta-analysis of deep learning-based semantic segmentation of urban remote sensing images is presented.
Abstract: Availability of very high-resolution remote sensing images and advancement of deep learning methods have shifted the paradigm of image classification from pixel-based and object-based methods to deep learning-based semantic segmentation. This shift demands a structured analysis and revision of the current status on the research domain of deep learning-based semantic segmentation. The focus of this paper is on urban remote sensing images. We review and perform a meta-analysis to juxtapose recent papers in terms of research problems, data source, data preparation methods including pre-processing and augmentation techniques, training details on architectures, backbones, frameworks, optimizers, loss functions and other hyper-parameters and performance comparison. Our detailed review and meta-analysis show that deep learning not only outperforms traditional methods in terms of accuracy, but also addresses several challenges previously faced. Further, we provide future directions of research in this domain.

56 citations

Journal ArticleDOI
TL;DR: In this article, an improved deep learning model named RAANet (Residual ASPP with Attention Net) is proposed, which constructs a new residual ASPP by embedding an attention module and a residual structure into the ASPP.
Abstract: Classification of land use and land cover from remote sensing images has been widely used in natural resources and urban information management. The variability and complex background of land use in high-resolution imagery pose greater challenges for remote sensing semantic segmentation. To obtain multi-scale semantic information and improve the classification accuracy of land-use types in remote sensing images, deep learning models have received wide attention. Inspired by the atrous spatial pyramid pooling (ASPP) framework, an improved deep learning model named RAANet (Residual ASPP with Attention Net) is constructed in this paper, which builds a new residual ASPP by embedding an attention module and a residual structure into the ASPP. Its encoder contains 5 dilated attention convolution units and a residual unit: the former obtain important semantic information at more scales, while the residual unit reduces the complexity of the network to prevent vanishing gradients. In practical applications, the attention unit can select different attention modules, such as the convolutional block attention module (CBAM), according to the characteristics of the dataset. Experimental results on the land-cover domain adaptive semantic segmentation (LoveDA) and ISPRS Vaihingen datasets showed that this model can enhance the classification accuracy of semantic segmentation compared to current deep learning models.

31 citations
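A minimal sketch (not the RAANet code, which is not given here) of why the ASPP idea described above captures multi-scale context: a k×k convolution with dilation rate d covers an effective extent of k + (k − 1)(d − 1), so parallel branches at different rates see the image at different scales with the same parameter count.

```python
# Hypothetical illustration of atrous (dilated) convolution coverage; the
# rates below are typical ASPP-style choices, not values from the paper.

def effective_kernel_size(k: int, d: int) -> int:
    """Effective spatial extent of a k x k convolution with dilation rate d."""
    return k + (k - 1) * (d - 1)

# 3x3 kernels at increasing dilation rates cover increasingly large regions:
for rate in (1, 6, 12, 18):
    print(rate, effective_kernel_size(3, rate))  # 3, 13, 25, 37
```

This is why stacking or parallelizing dilated branches enlarges the receptive field without pooling away spatial resolution.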

Journal ArticleDOI
TL;DR: In this article, a geospatial artificial intelligence framework is presented to obtain rooftop data from high-resolution open-access remote sensing imagery, which can provide data support and decision-making to facilitate sustainable urban development effectively.
Abstract: Reliable information on building rooftops is crucial for utilizing limited urban space effectively. In recent decades, the demand for accurate and up-to-date data on the areas of rooftops on a large-scale is increasing. However, obtaining these data is challenging due to the limited capability of conventional computer vision methods and the high cost of 3D modeling involving aerial photogrammetry. In this study, a geospatial artificial intelligence framework is presented to obtain data for rooftops using high-resolution open-access remote sensing imagery. This framework is used to generate vectorized data for rooftops in 90 cities in China. The data was validated on test samples of 180 km2 across different regions with spatial resolution, overall accuracy, and F1 score of 1 m, 97.95%, and 83.11%, respectively. In addition, the generated rooftop area conforms to the urban morphological characteristics and reflects urbanization level. These results demonstrate that the generated dataset can be used for data support and decision-making that can facilitate sustainable urban development effectively.

28 citations

Journal ArticleDOI
TL;DR: In this article, a new classifier based on Dempster-Shafer (DS) theory and a convolutional neural network (CNN) architecture for set-valued classification is proposed.

27 citations

Journal ArticleDOI
TL;DR: In this article, the authors provide a thorough review of recent achievements in land-use mapping (LUM) using deep learning (DL) algorithms, which offer novel opportunities for the development of LUM for high-spatial-resolution remote sensing images (HSR-RSIs).
Abstract: Land-use mapping (LUM) using high-spatial-resolution remote sensing images (HSR-RSIs) is a challenging and crucial technology. However, due to the characteristics of HSR-RSIs, such as varying image acquisition conditions and massive, detailed information, performing LUM faces unique scientific challenges. With the emergence of new deep learning (DL) algorithms in recent years, DL-based LUM methods have achieved huge breakthroughs, which offer novel opportunities for the development of LUM for HSR-RSIs. This article aims to provide a thorough review of recent achievements in this field. Existing high-spatial-resolution datasets for semantic segmentation and single-object segmentation research are presented first. Next, we introduce several basic DL approaches that are frequently adopted for LUM. After comprehensively reviewing DL-based LUM methods, highlighting the contributions of researchers in the field of LUM for HSR-RSIs, we summarize these approaches along two criteria: the first distinguishes supervised, semisupervised, and unsupervised learning, while the second distinguishes pixel-based from object-based methods. We then briefly review the fundamentals and development of semantic segmentation and single-object segmentation. Finally, quantitative results on the ISPRS Vaihingen and ISPRS Potsdam datasets are given for several representative models, such as the fully convolutional network (FCN) and U-Net, followed by a comparison and discussion of the results.

20 citations

References
Book ChapterDOI
05 Oct 2015
TL;DR: Ronneberger et al. as discussed by the authors proposed a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently, which can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks.
Abstract: There is large consent that successful training of deep networks requires many thousand annotated training samples. In this paper, we present a network and training strategy that relies on the strong use of data augmentation to use the available annotated samples more efficiently. The architecture consists of a contracting path to capture context and a symmetric expanding path that enables precise localization. We show that such a network can be trained end-to-end from very few images and outperforms the prior best method (a sliding-window convolutional network) on the ISBI challenge for segmentation of neuronal structures in electron microscopic stacks. Using the same network trained on transmitted light microscopy images (phase contrast and DIC) we won the ISBI cell tracking challenge 2015 in these categories by a large margin. Moreover, the network is fast. Segmentation of a 512x512 image takes less than a second on a recent GPU. The full implementation (based on Caffe) and the trained networks are available at http://lmb.informatik.uni-freiburg.de/people/ronneber/u-net .

49,590 citations
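The contracting/expanding structure described in the U-Net abstract above can be sketched in a few lines. This is an illustrative resolution bookkeeping exercise, not the authors' code, and it assumes "same"-padded convolutions so that each 2×2 max-pool exactly halves the feature-map size (the original paper uses unpadded convolutions, which shrink maps slightly at every step).

```python
# Hypothetical sketch: spatial resolution along a U-Net-style contracting path.

def unet_resolutions(input_size: int, depth: int) -> list:
    """Feature-map size at each encoder level, assuming each level halves it."""
    sizes = [input_size]
    for _ in range(depth):
        sizes.append(sizes[-1] // 2)  # 2x2 max-pool halves the resolution
    return sizes

# A 512x512 tile (the size quoted in the abstract) with 4 pooling steps:
print(unet_resolutions(512, 4))  # [512, 256, 128, 64, 32]
```

The expanding path mirrors this list in reverse, with skip connections concatenating the encoder map of matching size at each level; that symmetry is what gives the architecture its "U" shape.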

Journal ArticleDOI
28 May 2015-Nature
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Abstract: Deep learning allows computational models that are composed of multiple processing layers to learn representations of data with multiple levels of abstraction. These methods have dramatically improved the state-of-the-art in speech recognition, visual object recognition, object detection and many other domains such as drug discovery and genomics. Deep learning discovers intricate structure in large data sets by using the backpropagation algorithm to indicate how a machine should change its internal parameters that are used to compute the representation in each layer from the representation in the previous layer. Deep convolutional nets have brought about breakthroughs in processing images, video, speech and audio, whereas recurrent nets have shone light on sequential data such as text and speech.

46,982 citations

Book
18 Nov 2016
TL;DR: Deep learning as mentioned in this paper is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts, and it is used in many applications such as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames.
Abstract: Deep learning is a form of machine learning that enables computers to learn from experience and understand the world in terms of a hierarchy of concepts. Because the computer gathers knowledge from experience, there is no need for a human computer operator to formally specify all the knowledge that the computer needs. The hierarchy of concepts allows the computer to learn complicated concepts by building them out of simpler ones; a graph of these hierarchies would be many layers deep. This book introduces a broad range of topics in deep learning. The text offers mathematical and conceptual background, covering relevant concepts in linear algebra, probability theory and information theory, numerical computation, and machine learning. It describes deep learning techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology; and it surveys such applications as natural language processing, speech recognition, computer vision, online recommendation systems, bioinformatics, and videogames. Finally, the book offers research perspectives, covering such theoretical topics as linear factor models, autoencoders, representation learning, structured probabilistic models, Monte Carlo methods, the partition function, approximate inference, and deep generative models. Deep Learning can be used by undergraduate or graduate students planning careers in either industry or research, and by software engineers who want to begin using deep learning in their products or platforms. A website offers supplementary material for both readers and instructors.

38,208 citations

Proceedings ArticleDOI
07 Jun 2015
TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Abstract: Convolutional networks are powerful visual models that yield hierarchies of features. We show that convolutional networks by themselves, trained end-to-end, pixels-to-pixels, exceed the state-of-the-art in semantic segmentation. Our key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning. We define and detail the space of fully convolutional networks, explain their application to spatially dense prediction tasks, and draw connections to prior models. We adapt contemporary classification networks (AlexNet [20], the VGG net [31], and GoogLeNet [32]) into fully convolutional networks and transfer their learned representations by fine-tuning [3] to the segmentation task. We then define a skip architecture that combines semantic information from a deep, coarse layer with appearance information from a shallow, fine layer to produce accurate and detailed segmentations. Our fully convolutional network achieves state-of-the-art segmentation of PASCAL VOC (20% relative improvement to 62.2% mean IU on 2012), NYUDv2, and SIFT Flow, while inference takes less than one fifth of a second for a typical image.

28,225 citations
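The "input of arbitrary size, correspondingly-sized output" property from the FCN abstract above can be illustrated with simple size arithmetic. This is a hedged sketch under assumed conventions (a total encoder stride of 32, as in the VGG-based FCN-32s variant, with inputs padded up to a multiple of the stride), not the paper's implementation.

```python
# Hypothetical illustration: output grid size of a fully convolutional network.
import math

def fcn_output_size(h: int, w: int, total_stride: int = 32) -> tuple:
    """Prediction-map size after downsampling by `total_stride` and learned
    upsampling back by the same factor (input padded to a stride multiple)."""
    coarse_h = math.ceil(h / total_stride)   # coarse score-map height
    coarse_w = math.ceil(w / total_stride)   # coarse score-map width
    return coarse_h * total_stride, coarse_w * total_stride

# Any input size yields a correspondingly-sized dense prediction:
print(fcn_output_size(500, 375))  # (512, 384)
print(fcn_output_size(512, 512))  # (512, 512)
```

Because no fixed-size fully connected layer is involved, the same weights produce dense per-pixel scores for any input resolution; the skip architecture then refines the coarse grid with shallower, finer-stride features.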

Proceedings ArticleDOI
21 Jul 2017
TL;DR: DenseNet as mentioned in this paper proposes to connect each layer to every other layer in a feed-forward fashion, which can alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.
Abstract: Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.

27,821 citations
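The two quantities named in the DenseNet abstract above are easy to check directly: a dense block with L layers has L(L+1)/2 direct connections, and because every layer concatenates all preceding feature maps, the number of input channels grows linearly with a fixed growth rate k. The sketch below is illustrative; the initial-channel and growth-rate values are hypothetical, not taken from the paper.

```python
# Illustrative arithmetic for DenseNet-style dense connectivity.

def dense_connections(num_layers: int) -> int:
    """Direct connections in a dense block: L*(L+1)/2, vs. L in a plain chain."""
    return num_layers * (num_layers + 1) // 2

def channels_at_layer(k0: int, growth_rate: int, layer: int) -> int:
    """Input channels seen by layer `layer` (0-indexed): k0 initial feature
    maps plus `growth_rate` new maps contributed by each preceding layer."""
    return k0 + layer * growth_rate

print(dense_connections(12))        # 78 connections in a 12-layer block
print(channels_at_layer(16, 12, 6)) # 88 input channels at layer 6
```

The linear channel growth is why a small per-layer growth rate suffices: each layer adds only a few feature maps, yet sees everything computed before it, which encourages feature reuse and keeps the parameter count low.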