Author

Fan Zhang

Bio: Fan Zhang is an academic researcher from Wuhan University. The author has contributed to research in the topics of feature extraction and feature (computer vision), has an h-index of 9, and has co-authored 12 publications receiving 1232 citations.

Papers
Journal ArticleDOI
TL;DR: The proposed unsupervised-feature-learning-based scene classification method provides more accurate classification results than the other latent-Dirichlet-allocation-based methods and the sparse coding method.
Abstract: Due to the rapid technological development of various satellite sensors, a huge volume of high-resolution image data sets can now be acquired. How to efficiently represent and recognize the scenes from such high-resolution image data has become a critical task. In this paper, we propose an unsupervised feature learning framework for scene classification. By using a saliency detection algorithm, we extract a representative set of patches from the salient regions in the image data set. These unlabeled data patches are exploited by an unsupervised feature learning method to learn a set of feature extractors which are robust and efficient and do not need elaborately designed descriptors such as the scale-invariant feature transform (SIFT). We show that the statistics generated from the learned feature extractors can characterize a complex scene very well and can produce excellent classification accuracy. In order to reduce overfitting in the feature learning step, we further employ a recently developed regularization method called “dropout,” which has proved to be very effective in image classification. In the experiments, the proposed method was applied to two challenging high-resolution data sets: the UC Merced data set, containing 21 different aerial scene categories with a submeter resolution, and the Sydney data set, containing seven land-use categories with a 60-cm spatial resolution. The proposed method obtained results that were equal to or even better than the previous best results with the UC Merced data set, and it also obtained the highest accuracy with the Sydney data set, demonstrating that the proposed unsupervised-feature-learning-based scene classification method provides more accurate classification results than the other latent-Dirichlet-allocation-based methods and the sparse coding method.

477 citations
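
To make the pipeline above concrete, here is a minimal sketch of patch-based unsupervised feature learning. It is not the authors' implementation: k-means stands in for the feature learner, patches are sampled uniformly rather than from salient regions, and the patch size, filter count, and toy image are assumptions.

```python
# Sketch: learn feature extractors from unlabeled patches, then encode with them.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.image import extract_patches_2d

rng = np.random.default_rng(0)
image = rng.random((128, 128))                   # toy grayscale scene

# 1) Sample unlabeled patches (the paper samples from salient regions instead).
patches = extract_patches_2d(image, (8, 8), max_patches=500, random_state=0)
patches = patches.reshape(len(patches), -1)
patches -= patches.mean(axis=1, keepdims=True)   # per-patch normalization

# 2) Learn a dictionary of feature extractors from the unlabeled patches.
kmeans = KMeans(n_clusters=32, n_init=10, random_state=0).fit(patches)
filters = kmeans.cluster_centers_                # 32 learned 8x8 filters

# 3) Encode new patches against the learned filters (soft-threshold encoding).
test = extract_patches_2d(image, (8, 8), max_patches=100, random_state=1)
test = test.reshape(len(test), -1)
acts = test @ filters.T
features = np.maximum(acts - acts.mean(axis=1, keepdims=True), 0)
print(features.shape)                            # (100, 32) patch descriptors
```

Pooling these per-patch descriptors over an image would give the scene-level statistics the paper feeds to a classifier.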

Journal ArticleDOI
TL;DR: A gradient boosting random convolutional network (GBRCN) framework for scene classification, which can effectively combine many deep neural networks and can provide more accurate classification results than the state-of-the-art methods.
Abstract: Due to the recent advances in satellite sensors, large numbers of high-resolution remote sensing images are now being obtained each day. How to automatically recognize and analyze scenes from these satellite images effectively and efficiently has become a big challenge in the remote sensing field. Recently, much work on scene classification has focused on deep neural networks, which learn hierarchical internal feature representations from image data sets and produce state-of-the-art performance. However, most methods, including the traditional shallow methods and deep neural networks, concentrate on training only a single model. Meanwhile, neural network ensembles have proved to be a powerful and practical tool for a number of different predictive tasks. Can we find a way to combine different deep neural networks effectively and efficiently for scene classification? In this paper, we propose a gradient boosting random convolutional network (GBRCN) framework for scene classification, which can effectively combine many deep neural networks. As far as we know, this is the first time that a deep ensemble framework has been proposed for scene classification. In the experiments, the proposed method was applied to two challenging high-resolution data sets: 1) the UC Merced data set, containing 21 different aerial scene categories with a submeter resolution, and 2) a Sydney data set, containing eight land-use categories with a 1.0-m spatial resolution. The proposed GBRCN framework outperformed the state-of-the-art methods with the UC Merced data set, including the traditional single convolutional network approach. For the Sydney data set, the proposed method again obtained the best accuracy, demonstrating that the proposed framework can provide more accurate classification results than the state-of-the-art methods.

384 citations
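
The boosting idea can be sketched as follows: each new network is fit to the negative gradient of the classification loss with respect to the current ensemble's logits. The tiny CNN, shrinkage factor, and toy data below are illustrative assumptions, not the GBRCN implementation.

```python
# Sketch: boosting-style CNN ensembling in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
X = torch.randn(64, 1, 16, 16)           # toy "scenes"
y = torch.randint(0, 4, (64,))           # 4 toy classes

def make_cnn(n_classes=4):
    return nn.Sequential(
        nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, n_classes))

ensemble, shrinkage = [], 0.5
logits = torch.zeros(64, 4)              # current additive prediction F_m(x)

for m in range(3):                       # add 3 boosted members
    # Negative gradient of multiclass cross-entropy w.r.t. the logits.
    residual = F.one_hot(y, 4).float() - F.softmax(logits, dim=1)
    net = make_cnn()
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(100):                 # fit this member to the residual
        opt.zero_grad()
        loss = F.mse_loss(net(X), residual)
        loss.backward()
        opt.step()
    with torch.no_grad():
        logits = logits + shrinkage * net(X)
    ensemble.append(net)

print((logits.argmax(1) == y).float().mean())   # ensemble training accuracy
```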

Journal ArticleDOI
TL;DR: A band grouping-based long short-term memory model and a multiscale convolutional neural network are proposed as the spectral and spatial feature extractors, respectively, for the hyperspectral image (HSI) classification.
Abstract: In this paper, we propose a spectral–spatial unified network (SSUN) with an end-to-end architecture for the hyperspectral image (HSI) classification. Different from traditional spectral–spatial classification frameworks where the spectral feature extraction (FE), spatial FE, and classifier training are separated, these processes are integrated into a unified network in our model. In this way, both FE and classifier training will share a uniform objective function and all the parameters in the network can be optimized at the same time. In the implementation of the SSUN, we propose a band grouping-based long short-term memory model and a multiscale convolutional neural network as the spectral and spatial feature extractors, respectively. In the experiments, three benchmark HSIs are utilized to evaluate the performance of the proposed method. The experimental results demonstrate that the SSUN can yield a competitive performance compared with existing methods.

259 citations
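
A minimal PyTorch sketch of the unified spectral-spatial idea follows: an LSTM consumes the center pixel's spectrum in band groups while a stacked CNN extracts spatial features at two depths, and one shared head joins them so all parameters are optimized under a single objective. Layer widths, the band-group length, and the patch size are assumptions for illustration.

```python
# Sketch: one network, two branches, one objective.
import torch
import torch.nn as nn

class SpectralSpatialNet(nn.Module):
    def __init__(self, bands=100, group=10, n_classes=9):
        super().__init__()
        self.group = group
        # Spectral branch: each band group is one LSTM time step.
        self.lstm = nn.LSTM(input_size=group, hidden_size=64, batch_first=True)
        # Spatial branch: two depths of convolution over the image patch.
        self.conv1 = nn.Sequential(nn.Conv2d(bands, 16, 3, padding=1), nn.ReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Unified head: both branches share one loss, so FE and the
        # classifier are trained jointly end to end.
        self.head = nn.Linear(64 + 16 + 32, n_classes)

    def forward(self, patch):                     # patch: (B, bands, H, W)
        b, c, h, w = patch.shape
        center = patch[:, :, h // 2, w // 2]      # center-pixel spectrum
        seq = center.view(b, c // self.group, self.group)
        _, (h_n, _) = self.lstm(seq)
        spectral = h_n[-1]                        # (B, 64)
        f1 = self.conv1(patch)
        f2 = self.conv2(f1)
        spatial = torch.cat([self.pool(f1).flatten(1),
                             self.pool(f2).flatten(1)], dim=1)
        return self.head(torch.cat([spectral, spatial], dim=1))

net = SpectralSpatialNet()
out = net(torch.randn(4, 100, 9, 9))              # 4 patches, 100 bands
print(out.shape)                                  # torch.Size([4, 9])
```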

Journal ArticleDOI
TL;DR: A coupled CNN method that combines a candidate region proposal network and a localization network to extract proposals and simultaneously locate aircraft, which is more efficient and accurate, even in large-scale VHR images.
Abstract: Aircraft detection from very high resolution (VHR) remote sensing images has been drawing increasing interest in recent years due to its successful civil and military applications. However, several challenges still exist: 1) extracting the high-level features and hierarchical feature representations of the objects is difficult; 2) manual annotation of the objects in large image sets is generally expensive and sometimes unreliable; and 3) locating objects within such large images is difficult and time consuming. In this paper, we propose a weakly supervised learning framework based on coupled convolutional neural networks (CNNs) for aircraft detection, which can simultaneously solve these problems. We first develop a CNN-based method to extract the high-level features and hierarchical feature representations of the objects. We then employ an iterative weakly supervised learning framework to automatically mine and augment the training data set from the original image. Finally, we propose a coupled CNN method that combines a candidate region proposal network and a localization network to extract proposals and simultaneously locate the aircraft; this approach is more efficient and accurate, even in large-scale VHR images. In the experiments, the proposed method was applied to three challenging high-resolution data sets: the Sydney International Airport data set, the Tokyo Haneda Airport data set, and the Berlin Tegel Airport data set. The extensive experimental results confirm that the proposed method can achieve higher detection accuracy than the other methods.

229 citations
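
The iterative mining loop might be skeletonized as below; the sliding-window proposal generator, stand-in scoring network, and confidence threshold are simplifying assumptions rather than the authors' coupled-CNN implementation.

```python
# Sketch: mine confident windows, grow the training set, repeat.
import torch
import torch.nn as nn

torch.manual_seed(0)
scene = torch.randn(1, 3, 256, 256)               # one toy VHR image
scorer = nn.Sequential(                            # stand-in "localization" CNN
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 1))

def proposals(img, size=64, stride=64):
    """Crude sliding-window stand-in for the candidate region proposal network."""
    _, _, h, w = img.shape
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            yield (y, x), img[:, :, y:y + size, x:x + size]

train_set = []                                     # mined pseudo-labeled windows
for iteration in range(2):                         # a couple of mining rounds
    mined = []
    with torch.no_grad():
        for (y, x), window in proposals(scene):
            score = torch.sigmoid(scorer(window)).item()
            if score > 0.6:                        # keep confident detections
                mined.append(((y, x), window))
    train_set.extend(mined)
    # ... the real method retrains the coupled CNNs on `train_set` and
    # re-scores the image with the updated model before the next round.
print(len(train_set), "windows mined")
```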

Journal ArticleDOI
TL;DR: This study proposes an efficient deep learning based method, namely, Random Patches Network (RPNet) for HSI classification, which directly regards the random patches taken from the image as the convolution kernels without any training.
Abstract: Due to the remarkable achievements obtained by deep learning methods in the field of computer vision, an increasing number of studies have sought to apply these powerful tools to hyperspectral image (HSI) classification. So far, most of these methods utilize a pre-training stage followed by a fine-tuning stage to extract deep features, which is not only tremendously time-consuming but also depends largely on a great deal of training data. In this study, we propose an efficient deep learning based method, namely, the Random Patches Network (RPNet), for HSI classification, which directly regards random patches taken from the image as the convolution kernels, without any training. By combining both shallow and deep convolutional features, RPNet has the advantage of being multi-scale, which makes it better suited to HSI classification, where different objects tend to have different scales. In the experiments, the proposed method and its two variants, RandomNet and RPNet-single, are tested on three benchmark hyperspectral data sets. The experimental results demonstrate that RPNet can yield a competitive performance compared with existing methods.

160 citations
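
The core trick is easy to sketch: the convolution kernels are patches cropped from the image itself, so no training is needed, and concatenating shallow and deep responses gives the multi-scale features. Patch size, kernel counts, and the omission of the paper's whitening step are simplifications here.

```python
# Sketch: random patches from the image serve as untrained convolution kernels.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
hsi = torch.randn(1, 30, 64, 64)          # toy hyperspectral cube (30 bands)

def random_patch_layer(x, n_kernels=8, size=5):
    """Convolve x with `n_kernels` patches sampled from x itself (no training)."""
    _, c, h, w = x.shape
    kernels = []
    for _ in range(n_kernels):
        y0 = torch.randint(0, h - size + 1, (1,)).item()
        x0 = torch.randint(0, w - size + 1, (1,)).item()
        kernels.append(x[0, :, y0:y0 + size, x0:x0 + size])
    weight = torch.stack(kernels)          # (n_kernels, c, size, size)
    return F.relu(F.conv2d(x, weight, padding=size // 2))

# Stack layers and keep both shallow and deep responses (multi-scale features).
f1 = random_patch_layer(hsi)
f2 = random_patch_layer(f1)
features = torch.cat([f1, f2], dim=1)     # feed these to any pixel classifier
print(features.shape)                      # torch.Size([1, 16, 64, 64])
```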


Cited by
Journal ArticleDOI
TL;DR: The challenges of using deep learning for remote-sensing data analysis are analyzed, recent advances are reviewed, and resources are provided that the authors hope will make deep learning in remote sensing seem ridiculously simple.
Abstract: Central to the looming paradigm shift toward data-intensive science, machine-learning techniques are becoming increasingly important. In particular, deep learning has proven to be both a major breakthrough and an extremely powerful tool in many fields. Shall we embrace deep learning as the key to everything? Or should we resist a black-box solution? These are controversial issues within the remote-sensing community. In this article, we analyze the challenges of using deep learning for remote-sensing data analysis, review recent advances, and provide resources we hope will make deep learning in remote sensing seem ridiculously simple. More importantly, we encourage remote-sensing scientists to bring their expertise into deep learning and use it as an implicit general model to tackle unprecedented, large-scale, influential challenges, such as climate change and urbanization.

2,095 citations

Journal ArticleDOI
TL;DR: A general framework of DL for RS data is provided, and the state-of-the-art DL methods in RS are regarded as special cases of input-output data combined with various deep networks and tuning tricks.
Abstract: Deep-learning (DL) algorithms, which learn the representative and discriminative features in a hierarchical manner from the data, have recently become a hotspot in the machine-learning area and have been introduced into the geoscience and remote sensing (RS) community for RS big data analysis. Considering the low-level features (e.g., spectral and texture) as the bottom level, the output feature representation from the top level of the network can be directly fed into a subsequent classifier for pixel-based classification. As a matter of fact, by carefully addressing the practical demands in RS applications and designing the input–output levels of the whole network, we have found that DL is actually everywhere in RS data analysis: from the traditional topics of image preprocessing, pixel-based classification, and target recognition, to the recent challenging tasks of high-level semantic feature extraction and RS scene understanding.

1,625 citations
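
The input-output pattern described, top-level deep features feeding a conventional pixel-based classifier, can be illustrated as follows. The untrained extractor and random data are placeholders for illustration; a real pipeline would use a trained or pretrained network.

```python
# Sketch: hierarchical features at the top of a network -> subsequent classifier.
import torch
import torch.nn as nn
from sklearn.svm import LinearSVC

torch.manual_seed(0)
patches = torch.randn(200, 4, 9, 9)        # 200 pixels as 4-band 9x9 patches
labels = torch.randint(0, 3, (200,)).numpy()

extractor = nn.Sequential(                  # bottom: spectral/texture -> top: deep
    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten())

with torch.no_grad():
    feats = extractor(patches).numpy()      # top-level feature representation

clf = LinearSVC().fit(feats, labels)        # subsequent pixel-based classifier
print(clf.score(feats, labels))
```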

Proceedings ArticleDOI
01 Jun 2018
TL;DR: The Dataset for Object Detection in Aerial Images (DOTA) as discussed by the authors is a large-scale dataset of aerial images collected from different sensors and platforms and contains objects exhibiting a wide variety of scales, orientations, and shapes.
Abstract: Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to transfer to aerial imagery, not only because of the huge variation in the scale, orientation, and shape of the object instances on the earth's surface, but also due to the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect 2806 aerial images from different sensors and platforms. Each image is about 4000 × 4000 pixels in size and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using 15 common object categories. The fully annotated DOTA images contain 188,282 instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral. To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA well represents real Earth Vision applications and is quite challenging.

1,502 citations
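
The 8-d.o.f. quadrilateral label format lends itself to a simple data structure. The helper below is illustrative and not part of the DOTA toolkit, including the conversion to the axis-aligned boxes many detectors expect.

```python
# Sketch: a quadrilateral annotation (8 degrees of freedom) and its
# enclosing axis-aligned bounding box.
from dataclasses import dataclass

@dataclass
class QuadAnnotation:
    points: list       # [(x1, y1), (x2, y2), (x3, y3), (x4, y4)], 8 d.o.f.
    category: str      # one of DOTA's 15 categories, e.g. "plane"

    def to_axis_aligned(self):
        """Enclosing (xmin, ymin, xmax, ymax) box for the quadrilateral."""
        xs = [p[0] for p in self.points]
        ys = [p[1] for p in self.points]
        return min(xs), min(ys), max(xs), max(ys)

ann = QuadAnnotation([(10, 40), (90, 20), (110, 70), (30, 95)], "plane")
print(ann.to_axis_aligned())    # (10, 20, 110, 95)
```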

Journal ArticleDOI
TL;DR: A large-scale data set, termed “NWPU-RESISC45,” is proposed, which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU).
Abstract: Remote sensing image scene classification plays an important role in a wide range of applications and hence has been receiving remarkable attention. During the past years, significant efforts have been made to develop various datasets and approaches for scene classification from remote sensing images. However, a systematic review of the literature concerning datasets and methods for scene classification is still lacking. In addition, almost all existing datasets have a number of limitations, including the small number of scene classes and images, the lack of image variation and diversity, and the saturation of accuracy. These limitations severely hamper the development of new approaches, especially deep learning-based methods. This paper first provides a comprehensive review of the recent progress. Then, we propose a large-scale dataset, termed "NWPU-RESISC45", which is a publicly available benchmark for REmote Sensing Image Scene Classification (RESISC), created by Northwestern Polytechnical University (NWPU). This dataset contains 31,500 images, covering 45 scene classes with 700 images in each class. The proposed NWPU-RESISC45 (i) is large-scale in the number of scene classes and the total image number, (ii) holds big variations in translation, spatial resolution, viewpoint, object pose, illumination, background, and occlusion, and (iii) has high within-class diversity and between-class similarity. The creation of this dataset will enable the community to develop and evaluate various data-driven algorithms. Finally, several representative methods are evaluated using the proposed dataset and the results are reported as a useful baseline for future research.

1,424 citations
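
Assuming a class-per-folder local copy of the benchmark (45 folders of 700 images each), a split might look like the sketch below; the directory layout, file extension, and the 20% training ratio are assumptions about one common evaluation protocol, not an official loader.

```python
# Sketch: deterministic per-class train/test split over a local copy.
import random
from pathlib import Path

root = Path("NWPU-RESISC45")               # hypothetical local path
train, test = [], []
for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
    images = sorted(class_dir.glob("*.jpg"))   # 700 images per class
    random.Random(0).shuffle(images)
    split = int(0.2 * len(images))             # assumed 20%-train protocol
    train += [(img, class_dir.name) for img in images[:split]]
    test += [(img, class_dir.name) for img in images[split:]]
print(len(train), "training and", len(test), "test images")
```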

Journal ArticleDOI
TL;DR: A multilevel DL architecture targeting land cover and crop type classification from multitemporal multisource satellite imagery, in which an ensemble of CNNs outperforms MLPs and better discriminates certain summer crop types.
Abstract: Deep learning (DL) is a powerful state-of-the-art technique for image processing, including remote sensing (RS) images. This letter describes a multilevel DL architecture that targets land cover and crop type classification from multitemporal multisource satellite imagery. The pillars of the architecture are an unsupervised neural network (NN), which is used for optical imagery segmentation and restoration of data missing due to clouds and shadows, and an ensemble of supervised NNs. As the basic supervised architectures, we use a traditional fully connected multilayer perceptron (MLP) and the approach most commonly used in the RS community, random forest, and compare them with convolutional NNs (CNNs). Experiments are carried out for the Joint Experiment of Crop Assessment and Monitoring (JECAM) test site in Ukraine, for classification of crops in a heterogeneous environment using nineteen multitemporal scenes acquired by the Landsat-8 and Sentinel-1A RS satellites. The architecture with an ensemble of CNNs outperforms the one with MLPs, allowing us to better discriminate certain summer crop types, in particular maize and soybeans, and yielding target accuracies of more than 85% for all major crops (wheat, maize, sunflower, soybeans, and sugar beet).

1,155 citations
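
A simplified sketch of the supervised stage follows: a small ensemble of CNNs classifies pixels from a stacked multitemporal input, with predictions averaged across members. Scene and band counts, the network size, and the omission of the unsupervised restoration stage are toy assumptions.

```python
# Sketch: ensemble of CNNs over stacked multitemporal imagery.
import torch
import torch.nn as nn

torch.manual_seed(0)
# 19 scenes x 6 bands stacked per pixel, taken as 5x5 spatial neighborhoods.
X = torch.randn(128, 19 * 6, 5, 5)
y = torch.randint(0, 5, (128,))            # 5 toy crop classes

def make_member():
    return nn.Sequential(
        nn.Conv2d(19 * 6, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 5))

members = [make_member() for _ in range(3)]
for net in members:                         # train each ensemble member
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(50):
        opt.zero_grad()
        nn.functional.cross_entropy(net(X), y).backward()
        opt.step()

with torch.no_grad():                       # average the members' predictions
    probs = torch.stack([net(X).softmax(1) for net in members]).mean(0)
print((probs.argmax(1) == y).float().mean())
```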