Journal ArticleDOI

More Diverse Means Better: Multimodal Deep Learning Meets Remote-Sensing Imagery Classification

TL;DR: In this article, a general multimodal deep learning (MDL) framework is proposed for geoscience and remote sensing (RS) applications; it is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with CNNs.
Abstract: Classification and identification of the materials lying over or beneath the earth's surface have long been a fundamental but challenging research topic in geoscience and remote sensing (RS) and have garnered growing attention owing to recent advances in deep learning. Although deep networks have been successfully applied in single-modality-dominated classification tasks, their performance inevitably hits a bottleneck in complex scenes that need to be finely classified, due to the limited diversity of information. In this work, we provide a baseline solution to this difficulty by developing a general multimodal deep learning (MDL) framework. In particular, we also investigate a special case of multimodality learning (MML), cross-modality learning (CML), which exists widely in RS image classification applications. By focusing on "what," "where," and "how" to fuse, we show different fusion strategies as well as how to train deep networks and build the network architecture. Specifically, five fusion architectures are introduced and developed, and further unified in our MDL framework. More significantly, our framework is not limited to pixel-wise classification tasks but is also applicable to spatial information modeling with convolutional neural networks (CNNs). To validate the effectiveness and superiority of the MDL framework, extensive experiments related to the settings of MML and CML are conducted on two different multimodal RS data sets. Furthermore, the codes and data sets will be available at https://github.com/danfenghong/IEEE_TGRS_MDL-RS, contributing to the RS community.
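The abstract's five fusion architectures are not reproduced on this page, but the core "where/how to fuse" idea can be sketched in a few lines. The PyTorch snippet below is a minimal illustration of one strategy, concatenation-based middle (feature-level) fusion of two modality encoders; the layer sizes, class names, and the concatenation choice are assumptions for illustration, not the authors' exact MDL-RS design.

```python
# Minimal sketch of feature-level (middle) fusion for two RS modalities,
# e.g., hyperspectral + LiDAR pixel vectors. Sizes and names are
# illustrative, not the published MDL-RS architecture.
import torch
import torch.nn as nn

class MiddleFusionNet(nn.Module):
    def __init__(self, dim_a, dim_b, n_classes, hidden=128):
        super().__init__()
        # One encoder per modality; "where" to fuse = after these blocks.
        self.enc_a = nn.Sequential(nn.Linear(dim_a, hidden), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(dim_b, hidden), nn.ReLU())
        # "How" to fuse: here, simple concatenation of the two feature sets.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x_a, x_b):
        z = torch.cat([self.enc_a(x_a), self.enc_b(x_b)], dim=-1)
        return self.head(z)

net = MiddleFusionNet(dim_a=144, dim_b=21, n_classes=15)
logits = net(torch.randn(8, 144), torch.randn(8, 21))  # batch of 8 pixels
```

Early fusion would instead concatenate the raw inputs before any encoder, and late fusion would combine per-modality logits; the "where" in the abstract is exactly this placement choice.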
Citations
Journal ArticleDOI
TL;DR: This article presents an attention-based adaptive spectral–spatial kernel improved residual network (A²S²K-ResNet) with spectral attention to capture discriminative spectral–spatial features for HSI classification in an end-to-end training fashion.
Abstract: Hyperspectral images (HSIs) provide rich spectral–spatial information across hundreds of stacked contiguous narrow bands. Due to the existence of noise and band correlation, the selection of informative spectral–spatial kernel features poses a challenge. This is often addressed by using convolutional neural networks (CNNs) with receptive fields (RFs) of fixed size. However, these solutions cannot enable neurons to effectively adjust RF sizes and cross-channel dependencies when forward and backward propagation is used to optimize the network. In this article, we present an attention-based adaptive spectral–spatial kernel improved residual network (A²S²K-ResNet) with spectral attention to capture discriminative spectral–spatial features for HSI classification in an end-to-end training fashion. In particular, the proposed network learns selective 3-D convolutional kernels to jointly extract spectral–spatial features using improved 3-D ResBlocks and adopts an efficient feature recalibration (EFR) mechanism to boost the classification performance. Extensive experiments are performed on three well-known hyperspectral data sets, i.e., IP, KSC, and UP, and the proposed A²S²K-ResNet provides better classification results in terms of overall accuracy (OA), average accuracy (AA), and Kappa compared with the existing methods investigated. The source code will be made available at https://github.com/suvojit-0x55aa/A2S2K-ResNet.

185 citations
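The "efficient feature recalibration (EFR)" described above reads like channel-wise attention in the squeeze-and-excitation family. As a hedged sketch (the class name and reduction ratio are assumptions, not the published module), recalibration of a feature map can be written as:

```python
# Hedged sketch of channel-wise feature recalibration in the spirit of the
# paper's EFR mechanism (squeeze-and-excitation style); not the exact module.
import torch
import torch.nn as nn

class Recalibrate(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                   # x: (B, C, H, W) feature maps
        w = self.fc(x.mean(dim=(2, 3)))     # squeeze: global average pool
        return x * w[:, :, None, None]      # excite: reweight each channel

x = torch.randn(2, 64, 9, 9)
y = Recalibrate(64)(x)                      # same shape, recalibrated
```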

Journal ArticleDOI
TL;DR: In this paper, the authors propose a concept consisting of connected vehicles that exploit vehicular ad hoc network (VANET) communication, an embedded system integrated with sensors that acquire the static and dynamic parameters of the electric vehicle, and cloud integration with big data analytics tools.
Abstract: Testing and implementation of integrated and intelligent transport systems (IITS) for an electric vehicle need many high-performance and high-precision subsystems. The existing systems confine themselves to limited features and suffer from driving-range anxiety, charging and discharging time issues, and inter- and intra-vehicle communication problems. These issues are the critical barriers to the penetration of EVs into a smart grid. This paper proposes a concept that consists of connected vehicles exploiting vehicular ad hoc network (VANET) communication, an embedded system integrated with sensors that acquire the static and dynamic parameters of the electric vehicle, and cloud integration with big data analytics tools. Vehicle control information is generated by machine learning-based control systems. This paper also focuses on improving the overall performance (discharge time and cycle life) of a lithium-ion battery, increasing the range of the electric vehicle, enhancing the safety of the battery, acquiring the static and dynamic parameters and driving pattern of the electric vehicle, establishing VANET communication, and handling and analyzing the acquired data with the help of various big data analytics techniques.

173 citations

Journal ArticleDOI
TL;DR: Compared to convex models, nonconvex modeling, which is capable of characterizing more complex real scenes and providing model interpretability technically and theoretically, has proven to be a feasible solution that reduces the gap between challenging HS vision tasks and currently advanced intelligent data processing models.
Abstract: Hyperspectral (HS) imaging, also known as image spectrometry, is a landmark technique in geoscience and remote sensing (RS). In the past decade, enormous efforts have been made to process and analyze these HS products, mainly by seasoned experts. However, with an ever-growing volume of data, the bulk of costs in manpower and material resources poses new challenges for reducing the burden of manual labor and improving efficiency. For this reason, it is urgent that more intelligent and automatic approaches for various HS RS applications be developed. Machine learning (ML) tools with convex optimization have successfully undertaken the tasks of numerous artificial intelligence (AI)-related applications; however, their ability to handle complex practical problems remains limited, particularly for HS data, due to the effects of various spectral variabilities in the process of HS imaging and the complexity and redundancy of higher-dimensional HS signals. Compared to convex models, nonconvex modeling, which is capable of characterizing more complex real scenes and providing model interpretability technically and theoretically, has proven to be a feasible solution that reduces the gap between challenging HS vision tasks and currently advanced intelligent data processing models.

168 citations
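To make the survey's convex-versus-nonconvex contrast concrete, a toy sparse-regression example helps: the proximal step for the convex l1 penalty is soft-thresholding, while the nonconvex l0 penalty yields hard-thresholding. The sketch below uses synthetic data and is only an illustration of that modeling gap, not any method from the paper.

```python
# Toy contrast between convex and nonconvex sparse modeling: solve
# min_x ||Ax - b||^2 + penalty(x) by proximal gradient steps. The l1 prox
# is soft-thresholding (convex); the l0 prox is hard-thresholding
# (nonconvex). Data are synthetic, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100)
x_true[[3, 40, 77]] = [1.5, -2.0, 1.0]           # a 3-sparse ground truth
b = A @ x_true

def prox_grad(prox, lam, steps=500):
    x = np.zeros(100)
    t = 1.0 / np.linalg.norm(A, 2) ** 2          # step size 1 / L
    for _ in range(steps):
        x = prox(x - t * A.T @ (A @ x - b), lam * t)
    return x

soft = lambda v, s: np.sign(v) * np.maximum(np.abs(v) - s, 0.0)  # l1 prox
hard = lambda v, s: np.where(v * v > 2.0 * s, v, 0.0)            # l0 prox

x_l1 = prox_grad(soft, lam=0.1)
x_l0 = prox_grad(hard, lam=0.1)
print(np.flatnonzero(x_l0))                      # recovered support
```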

Journal ArticleDOI
TL;DR: Extensive experiments demonstrate the superiority and advancement of the S2FL model in the task of land cover classification in comparison with previously proposed state-of-the-art baselines.
Abstract: As remote sensing (RS) data obtained from different sensors become widely and openly available, multimodal data processing and analysis techniques have been garnering increasing interest in the RS and geoscience community. However, due to the gap between different modalities in terms of imaging sensors, resolutions, and contents, embedding their complementary information into a consistent, compact, accurate, and discriminative representation remains, to a great extent, challenging. To this end, we propose a shared and specific feature learning (S2FL) model. S2FL is capable of decomposing multimodal RS data into modality-shared and modality-specific components, enabling the blending of information from multiple modalities more effectively, particularly for heterogeneous data sources. Moreover, to better assess multimodal baselines and the newly proposed S2FL model, three multimodal RS benchmark data sets, i.e., Houston2013 (hyperspectral and multispectral data), Berlin (hyperspectral and synthetic aperture radar (SAR) data), and Augsburg (hyperspectral, SAR, and digital surface model (DSM) data), are released and used for land cover classification. Extensive experiments conducted on the three data sets demonstrate the superiority and advancement of our S2FL model in the task of land cover classification in comparison with previously proposed state-of-the-art baselines. Furthermore, the baseline codes and data sets used in this paper will be made freely available at https://github.com/danfenghong/ISPRS_S2FL.

117 citations
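A minimal sketch of the shared/specific decomposition idea follows: project each modality into a shared subspace plus a modality-specific one, and encourage the shared codes of co-registered pixels to agree. All names, dimensions, and loss choices here are assumptions for illustration, not the published S2FL formulation.

```python
# Hedged sketch of shared + specific feature learning for two modalities.
import torch
import torch.nn as nn

class SharedSpecific(nn.Module):
    def __init__(self, dim_a, dim_b, d_shared=32, d_spec=16):
        super().__init__()
        self.shared_a = nn.Linear(dim_a, d_shared)
        self.shared_b = nn.Linear(dim_b, d_shared)
        self.spec_a = nn.Linear(dim_a, d_spec)
        self.spec_b = nn.Linear(dim_b, d_spec)

    def forward(self, x_a, x_b):
        s_a, s_b = self.shared_a(x_a), self.shared_b(x_b)
        # Alignment loss: shared codes of the same pixel should match.
        align = (s_a - s_b).pow(2).mean()
        feats = torch.cat([s_a + s_b, self.spec_a(x_a), self.spec_b(x_b)], -1)
        return feats, align   # feats feed a classifier; align joins the loss

net = SharedSpecific(dim_a=244, dim_b=4)
feats, align = net(torch.randn(8, 244), torch.randn(8, 4))
```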

Journal ArticleDOI
TL;DR: In this article, the authors propose an endmember-guided unmixing network (EGU-Net), a two-stream Siamese deep network that learns an additional network from pure or nearly pure endmembers to correct the weights of another unmixing network by sharing network parameters and adding spectrally meaningful constraints.
Abstract: Over the past decades, enormous efforts have been made to improve the performance of linear or nonlinear mixing models for hyperspectral unmixing (HU), yet their ability to simultaneously generalize across various spectral variabilities (SVs) and extract physically meaningful endmembers remains limited, due to weak data fitting and reconstruction and sensitivity to the various SVs. Inspired by the powerful learning ability of deep learning (DL), we attempt to develop a general DL approach for HU that fully considers the properties of endmembers extracted from the hyperspectral imagery, called the endmember-guided unmixing network (EGU-Net). Going beyond a standalone autoencoder-like architecture, EGU-Net is a two-stream Siamese deep network that learns an additional network from pure or nearly pure endmembers to correct the weights of another unmixing network by sharing network parameters and adding spectrally meaningful constraints (e.g., nonnegativity and sum-to-one) toward a more accurate and interpretable unmixing solution. Furthermore, the resulting general framework is not limited to pixel-wise spectral unmixing but is also applicable to spatial information modeling with convolutional operators for spatial–spectral unmixing. Experimental results on three different data sets with ground-truth abundance maps for each material demonstrate the effectiveness and superiority of EGU-Net over state-of-the-art unmixing algorithms. The codes will be available from the website: https://github.com/danfenghong/IEEE_TNNLS_EGU-Net.

96 citations
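One standard way to realize the nonnegativity and sum-to-one abundance constraints mentioned above is a softmax output over an autoencoder bottleneck; the sketch below shows only that mechanism and is not the two-stream EGU-Net itself (band counts and layer sizes are assumptions).

```python
# Hedged sketch: a softmax bottleneck enforces abundances >= 0 that sum to
# one; the linear decoder's weight rows then act like endmember spectra.
import torch
import torch.nn as nn

n_bands, n_endmembers = 200, 5
encoder = nn.Sequential(nn.Linear(n_bands, 64), nn.ReLU(),
                        nn.Linear(64, n_endmembers))
decoder = nn.Linear(n_endmembers, n_bands, bias=False)

pixels = torch.rand(16, n_bands)
abund = torch.softmax(encoder(pixels), dim=-1)  # nonnegative, sums to one
recon = decoder(abund)
loss = (recon - pixels).pow(2).mean()           # reconstruction objective
```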

References
Journal ArticleDOI
Jacob Cohen
TL;DR: In this article, the author presents a procedure for having two or more judges independently categorize a sample of units so that the degree and significance of their agreement can be determined, since it is important to establish the extent to which such nominal-scale judgments are reproducible, i.e., reliable.
Abstract: CONSIDER Table 1. It represents in its formal characteristics a situation which arises in the clinical-social-personality areas of psychology, where it frequently occurs that the only useful level of measurement obtainable is nominal scaling (Stevens, 1951, pp. 25–26), i.e., placement in a set of k unordered categories. Because the categorizing of the units is a consequence of some complex judgment process performed by a "two-legged meter" (Stevens, 1958), it becomes important to determine the extent to which these judgments are reproducible, i.e., reliable. The procedure which suggests itself is that of having two (or more) judges independently categorize a sample of units and determine the degree, significance, and…

34,965 citations
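The agreement statistic this paper introduces, Cohen's kappa, corrects observed agreement p_o for chance agreement p_e via kappa = (p_o - p_e) / (1 - p_e). A from-scratch computation on toy labels (the rater data here are invented for illustration):

```python
# Cohen's kappa for two raters: kappa = (p_o - p_e) / (1 - p_e), where p_o
# is observed agreement and p_e is chance agreement from the marginals.
import numpy as np

rater1 = np.array([0, 1, 1, 2, 2, 2, 0, 1, 2, 0])
rater2 = np.array([0, 1, 2, 2, 2, 1, 0, 1, 2, 0])

k = max(rater1.max(), rater2.max()) + 1
conf = np.zeros((k, k))
for a, b in zip(rater1, rater2):
    conf[a, b] += 1                          # joint frequency table
conf /= conf.sum()

p_o = np.trace(conf)                         # observed agreement
p_e = conf.sum(axis=1) @ conf.sum(axis=0)    # chance agreement
kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 3))                       # 0.697 for these toy labels
```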

Proceedings Article
Sergey Ioffe, Christian Szegedy
06 Jul 2015
TL;DR: Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization, and in some cases eliminates the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.82% top-5 test error, exceeding the accuracy of human raters.

30,843 citations
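The transform itself is short: normalize each feature over the mini-batch, then scale and shift with learned parameters gamma and beta. Below is a minimal NumPy sketch of the training-time forward pass only; at inference the paper replaces mini-batch statistics with running averages, which this sketch omits.

```python
# Minimal sketch of the batch normalization transform described above.
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                  # per-feature mini-batch mean
    var = x.var(axis=0)                  # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta          # learned scale and shift

x = np.random.randn(32, 4) * 10 + 3      # mini-batch of 32, 4 features
y = batch_norm(x, gamma=np.ones(4), beta=np.zeros(4))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))  # ~0 and ~1
```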

Posted Content
Sergey Ioffe, Christian Szegedy
TL;DR: Batch Normalization as mentioned in this paper normalizes layer inputs for each training mini-batch to reduce the internal covariate shift in deep neural networks, and achieves state-of-the-art performance on ImageNet.
Abstract: Training Deep Neural Networks is complicated by the fact that the distribution of each layer's inputs changes during training, as the parameters of the previous layers change. This slows down the training by requiring lower learning rates and careful parameter initialization, and makes it notoriously hard to train models with saturating nonlinearities. We refer to this phenomenon as internal covariate shift, and address the problem by normalizing layer inputs. Our method draws its strength from making normalization a part of the model architecture and performing the normalization for each training mini-batch. Batch Normalization allows us to use much higher learning rates and be less careful about initialization. It also acts as a regularizer, in some cases eliminating the need for Dropout. Applied to a state-of-the-art image classification model, Batch Normalization achieves the same accuracy with 14 times fewer training steps, and beats the original model by a significant margin. Using an ensemble of batch-normalized networks, we improve upon the best published result on ImageNet classification: reaching 4.9% top-5 validation error (and 4.8% test error), exceeding the accuracy of human raters.

17,184 citations

Posted Content
TL;DR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.

11,866 citations

Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) was proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk, achieving a 4.94% top-5 test error on the ImageNet 2012 classification dataset.
Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first to surpass the reported human-level performance (5.1%, [26]) on this dataset.

11,732 citations
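Both ideas in this paper fit in a few lines: PReLU adds a learnable slope for negative inputs, and He initialization scales weights to account for the rectifier. The sketch below shows the plain-ReLU variant of the initialization (std = sqrt(2 / fan_in)); PyTorch ships both pieces as nn.PReLU and nn.init.kaiming_normal_.

```python
# From-scratch sketch of PReLU and He ("Kaiming") initialization.
import torch

def prelu(x, a):                  # a: learnable slope for negative inputs
    return torch.where(x >= 0, x, a * x)

def he_init(fan_in, fan_out):     # weights ~ N(0, 2 / fan_in)
    return torch.randn(fan_out, fan_in) * (2.0 / fan_in) ** 0.5

W = he_init(256, 128)
a = torch.full((128,), 0.25, requires_grad=True)  # 0.25: the paper's start
x = torch.randn(8, 256)
y = prelu(x @ W.T, a)             # one rectified layer, (8, 128) output
```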