Journal ArticleDOI

Change Detection Based on Deep Siamese Convolutional Network for Optical Aerial Images

Yang Zhan, Kun Fu, Menglong Yan, Xian Sun, Hongqi Wang, Xiaosong Qiu
30 Aug 2017 - IEEE Geoscience and Remote Sensing Letters (IEEE) - Vol. 14, Iss. 10, pp. 1845-1849
TL;DR: A novel supervised change detection method based on a deep siamese convolutional network for optical aerial images, whose results are comparable to, or even better than, those of two state-of-the-art methods in terms of F-measure.
Abstract: In this letter, we propose a novel supervised change detection method based on a deep siamese convolutional network for optical aerial images. We train a siamese convolutional network using the weighted contrastive loss. The novelty of the method is that the siamese network learns to extract features directly from the image pairs. Compared with the hand-crafted features used by conventional change detection methods, the extracted features are more abstract and robust. Furthermore, because of the weighted contrastive loss function, the features have a useful property: the feature vectors of a changed pixel pair are far away from each other, while those of an unchanged pixel pair are close. Therefore, we use the distance between the feature vectors to detect changes between the image pair. Even simple threshold segmentation on the distance map obtains good performance. For improvement, we use a $k$-nearest neighbor approach to update the initial result. Experimental results show that the proposed method produces results comparable to, or even better than, those of the two state-of-the-art methods in terms of F-measure.
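The detection step described in the abstract (compute a per-pixel distance between the two feature maps, then threshold it) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the feature-map shape `(H, W, C)`, the synthetic features, and the threshold value are all assumptions made for illustration, and the $k$-nearest neighbor refinement is omitted because the abstract does not specify it.

```python
import numpy as np

def distance_map(feat1, feat2):
    """Per-pixel Euclidean distance between two (H, W, C) feature maps."""
    return np.sqrt(np.sum((feat1 - feat2) ** 2, axis=-1))

def threshold_change_map(dist, threshold):
    """Binary change map: True where the feature distance exceeds the threshold."""
    return dist > threshold

# Synthetic stand-ins for the features a siamese network would produce
# (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
h, w, c = 4, 4, 8
feat_t1 = rng.normal(size=(h, w, c))
feat_t2 = feat_t1.copy()
feat_t2[1:3, 1:3, :] += 5.0  # simulate a changed 2x2 region

dist = distance_map(feat_t1, feat_t2)
change = threshold_change_map(dist, threshold=1.0)  # threshold chosen arbitrarily
print(change.astype(int))
```

In practice the threshold would have to be tuned on validation data; the value above only separates the synthetic changed region from the identical background.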
Citations
Journal ArticleDOI
TL;DR: This work proposes a novel Siamese-based spatial–temporal attention neural network, which improves the F1-score of the baseline model from 83.9 to 87.3 with acceptable computational overhead and introduces a CD dataset LEVIR-CD, which is two orders of magnitude larger than other public datasets of this field.
Abstract: Remote sensing image change detection (CD) is done to identify desired significant changes between bitemporal images. Given two co-registered images taken at different times, the illumination variations and misregistration errors overwhelm the real object changes. Exploring the relationships among different spatial–temporal pixels may improve the performances of CD methods. In our work, we propose a novel Siamese-based spatial–temporal attention neural network. In contrast to previous methods that separately encode the bitemporal images without referring to any useful spatial–temporal dependency, we design a CD self-attention mechanism to model the spatial–temporal relationships. We integrate a new CD self-attention module in the procedure of feature extraction. Our self-attention module calculates the attention weights between any two pixels at different times and positions and uses them to generate more discriminative features. Considering that the object may have different scales, we partition the image into multi-scale subregions and introduce the self-attention in each subregion. In this way, we could capture spatial–temporal dependencies at various scales, thereby generating better representations to accommodate objects of various sizes. We also introduce a CD dataset LEVIR-CD, which is two orders of magnitude larger than other public datasets of this field. LEVIR-CD consists of a large set of bitemporal Google Earth images, with 637 image pairs (1024 × 1024) and over 31 k independently labeled change instances. Our proposed attention module improves the F1-score of our baseline model from 83.9 to 87.3 with acceptable computational overhead. Experimental results on a public remote sensing image CD dataset show our method outperforms several other state-of-the-art methods.

552 citations


Cites background or methods from "Change Detection Based on Deep Siam..."

  • ...Remote sensing image CD requires pixel-wise prediction and benefits from the dense features by FCN based methods [46,47]....

    [...]

  • ...The embedding space can be learned by deep Siamese fully convolutional networks (FCN) [27,28], which contains two identical networks sharing the same weight, each independently generating the feature maps for each temporal image....

    [...]

  • ...During the last few years, deep metric learning has been applied in many remote sensing applications [27,28,60,61]....

    [...]

  • ...The first approach used an FCN to separately classify the land use of each temporal image and then determined the change type by the change trajectory....

    [...]

  • ...The results of DSCNN, rRL and TBSRL are reported by [28]....

    [...]

Proceedings ArticleDOI
05 Oct 2018
TL;DR: This paper presents three fully convolutional neural network architectures which perform change detection using a pair of coregistered images, and proposes two Siamese extensions of fully Convolutional networks which use heuristics about the current problem to achieve the best results.
Abstract: This paper presents three fully convolutional neural network architectures which perform change detection using a pair of coregistered images. Most notably, we propose two Siamese extensions of fully convolutional networks which use heuristics about the current problem to achieve the best results in our tests on two open change detection datasets, using both RGB and multispectral images. We show that our system is able to learn from scratch using annotated change detection images. Our architectures achieve better performance than previously proposed methods, while being at least 500 times faster than related systems. This work is a step towards efficient processing of data from large scale Earth observation systems such as Copernicus or Landsat.

484 citations


Cites methods or result from "Change Detection Based on Deep Siam..."

  • ...Comparison between the results obtained by the method presented in [11] and the ones described in this paper on the Air Change dataset....

    [...]

  • ...For the AC dataset, the methods used for comparison were DSCN [11], CXM [4], and SCCN [8], using the values claimed by Zhan et al. in [11]....

    [...]

  • ...The proposed techniques have followed the tendencies of computer vision and image analysis: at first, pixels were analyzed directly using manually crafted techniques; later on, descriptors began to be used in conjunction with simple machine learning techniques [6]; recently, more elaborate machine learning techniques (deep learning) are dominating most problems in the image analysis field, and this evolution is slowly reaching the problem of change detection [7, 8, 9, 10, 11, 3, 12]....

    [...]

  • ...For the AC dataset, we followed the data split that was proposed in [11]: the top left 748x448 rectangle of the....

    [...]

Journal ArticleDOI
TL;DR: A novel end-to-end CD method based on UNet++, an effective encoder-decoder architecture for semantic segmentation, in which change maps can be learned from scratch using available annotated datasets and which outperforms other state-of-the-art CD methods.
Abstract: Change detection (CD) is essential to the accurate understanding of land surface changes using available Earth observation data. Due to its great advantages in deep feature representation and nonlinear problem modeling, deep learning is becoming increasingly popular for solving CD tasks in the remote-sensing community. However, most existing deep learning-based CD methods are implemented by either generating difference images using deep features or learning change relations between pixel patches, which leads to error accumulation problems since many intermediate processing steps are needed to obtain final change maps. To address the above-mentioned issues, a novel end-to-end CD method is proposed based on an effective encoder-decoder architecture for semantic segmentation named UNet++, where change maps can be learned from scratch using available annotated datasets. First, co-registered image pairs are concatenated as an input for the improved UNet++ network, where both global and fine-grained information can be utilized to generate feature maps with high spatial accuracy. Then, the fusion strategy of multiple side outputs is adopted to combine change maps from different semantic levels, thereby generating a final change map with high accuracy. The effectiveness and reliability of our proposed CD method are verified on very-high-resolution (VHR) satellite image datasets. Extensive experimental results have shown that our proposed approach outperforms the other state-of-the-art CD methods.

408 citations

Journal ArticleDOI
TL;DR: A dual attentive fully convolutional Siamese network is proposed for change detection in high-resolution images, together with a weighted double-margin contrastive loss that addresses sample imbalance (unchanged samples are much more abundant than changed samples), which is one of the main reasons for pseudochanges.
Abstract: Change detection is a basic task of remote sensing image processing. The research objective is to identify the change information of interest and filter out the irrelevant change information as interference factors. Recently, the rise in deep learning has provided new tools for change detection, which have yielded impressive results. However, the available methods focus mainly on the difference information between multitemporal remote sensing images and lack robustness to pseudochange information. To overcome the lack of resistance in current methods to pseudochanges, in this article, we propose a new method, namely, dual attentive fully convolutional Siamese networks, for change detection in high-resolution images. Through the dual attention mechanism, long-range dependencies are captured to obtain more discriminant feature representations to enhance the recognition performance of the model. Moreover, sample imbalance is a serious problem in change detection, i.e., unchanged samples are much more abundant than changed samples, which is one of the main reasons for pseudochanges. We propose the weighted double-margin contrastive loss to address this problem by punishing attention to unchanged feature pairs and increasing attention to changed feature pairs. The experimental results of our method on the change detection dataset and the building change detection dataset demonstrate that compared with other baseline methods, the proposed method realizes maximum improvements of 2.9% and 4.2%, respectively, in the F1 score. Our PyTorch implementation is available at https://github.com/lehaifeng/DASNet.

324 citations


Cites methods from "Change Detection Based on Deep Siam..."

  • ...SCCN [28] uses a deep symmetrical network to study changes in remote sensing images, and DSCN [29] uses two branch networks that share weights for feature extraction and uses the features that are obtained by the last layer of the two branches for threshold segmentation to obtain a binary change map....

    [...]

Journal ArticleDOI
TL;DR: A novel unsupervised context-sensitive framework, deep change vector analysis (DCVA), is proposed for CD in multitemporal VHR images; it exploits convolutional neural network (CNN) features, and experimental results on multitemporal data sets of Worldview-2, Pleiades, and Quickbird images confirm the effectiveness of the proposed method.
Abstract: Change detection (CD) in multitemporal images is an important application of remote sensing. Recent technological evolution provided very high spatial resolution (VHR) multitemporal optical satellite images showing high spatial correlation among pixels and requiring an effective modeling of spatial context to accurately capture change information. Here, we propose a novel unsupervised context-sensitive framework, deep change vector analysis (DCVA), for CD in multitemporal VHR images that exploits convolutional neural network (CNN) features. To have an unsupervised system, DCVA starts from a suboptimal pretrained multilayered CNN for obtaining deep features that can model the spatial relationship among neighboring pixels and thus complex objects. An automatic feature selection strategy is employed layerwise to select features emphasizing both high and low prior probability change information. Selected features from multiple layers are combined into a deep feature hypervector providing a multiscale scene representation. The use of the same pretrained CNN for semantic segmentation of single images enables us to obtain coherent multitemporal deep feature hypervectors that can be compared pixelwise to obtain deep change vectors that also model spatial context information. Deep change vectors are analyzed based on their magnitude to identify changed pixels. Then, deep change vectors corresponding to identified changed pixels are binarized to obtain compressed binary deep change vectors that preserve information about the direction (kind) of change. Changed pixels are analyzed for multiple CD based on the binary features, thus implicitly using the spatial information. Experimental results on multitemporal data sets of Worldview-2, Pleiades, and Quickbird images confirm the effectiveness of the proposed method.

310 citations


Cites methods from "Change Detection Based on Deep Siam..."

  • ...[34] proposed a supervised CD method for optical aerial images based on the deep Siamese network....

    [...]

References
Posted Content
TL;DR: Caffe is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU ($\approx$ 2.5 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.

12,531 citations

Posted Content
TL;DR: This work proposes a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit and derives a robust initialization method that particularly considers the rectifier nonlinearities.
Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%, Russakovsky et al.) on this visual recognition challenge.
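The two ideas in this abstract can be sketched in a few lines of NumPy. PReLU is f(x) = x for x > 0 and f(x) = a·x otherwise, with a learnable slope a. The initialization draws weights from a zero-mean Gaussian with standard deviation sqrt(2 / ((1 + a²) · fan_in)), which is the forward-propagation form derived in that paper and reduces to sqrt(2 / fan_in) for plain ReLU (a = 0). The function names, the layer sizes, and the fixed slope of 0.25 are assumptions for illustration; in a real network a would be trained.

```python
import numpy as np

def prelu(x, a):
    """Parametric ReLU: identity for positive inputs, slope `a` for the rest."""
    return np.where(x > 0, x, a * x)

def he_normal(fan_in, fan_out, a=0.0, rng=None):
    """Zero-mean Gaussian weights with std = sqrt(2 / ((1 + a^2) * fan_in))."""
    rng = np.random.default_rng() if rng is None else rng
    std = np.sqrt(2.0 / ((1.0 + a ** 2) * fan_in))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

x = np.array([-2.0, 0.0, 3.0])
y = prelu(x, a=0.25)  # negative input is scaled by 0.25, others pass through

# Hypothetical layer sizes; with a = 0 the target std is sqrt(2 / 512) = 0.0625.
W = he_normal(512, 256, rng=np.random.default_rng(0))
print(y, W.std())
```

The empirical standard deviation of `W` should land close to the target value, which is what keeps activation variance roughly constant from layer to layer in deep rectifier networks.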

11,866 citations


"Change Detection Based on Deep Siam..." refers methods in this paper

  • ...The weights of each convolutional layer are initialized with the Msra algorithm [15]....

    [...]

Proceedings ArticleDOI
07 Dec 2015
TL;DR: In this paper, a Parametric Rectified Linear Unit (PReLU) was proposed to improve model fitting with nearly zero extra computational cost and little overfitting risk, which achieved a 4.94% top-5 test error on ImageNet 2012 classification dataset.
Abstract: Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on the learnable activation and advanced initialization, we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66% [33]). To our knowledge, our result is the first to surpass the reported human-level performance (5.1%, [26]) on this dataset.

11,732 citations

Proceedings ArticleDOI
03 Nov 2014
TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits industry and internet-scale media needs by CUDA GPU computation, processing over 40 million images a day on a single K40 or Titan GPU (approx 2 ms per image). By separating model representation from actual implementation, Caffe allows experimentation and seamless switching among platforms for ease of development and deployment from prototyping machines to cloud environments. Caffe is maintained and developed by the Berkeley Vision and Learning Center (BVLC) with the help of an active community of contributors on GitHub. It powers ongoing research projects, large-scale industrial applications, and startup prototypes in vision, speech, and multimedia.

10,161 citations


"Change Detection Based on Deep Siam..." refers methods in this paper

  • ...3) Optimization: We implement the proposed siamese network using the Caffe [14] framework....

    [...]

Proceedings ArticleDOI
17 Jun 2006
TL;DR: This work presents a method - called Dimensionality Reduction by Learning an Invariant Mapping (DrLIM) - for learning a globally coherent nonlinear function that maps the data evenly to the output manifold.
Abstract: Dimensionality reduction involves mapping a set of high dimensional input points onto a low dimensional manifold so that "similar" points in input space are mapped to nearby points on the manifold. We present a method - called Dimensionality Reduction by Learning an Invariant Mapping (DrLIM) - for learning a globally coherent nonlinear function that maps the data evenly to the output manifold. The learning relies solely on neighborhood relationships and does not require any distance measure in the input space. The method can learn mappings that are invariant to certain transformations of the inputs, as is demonstrated with a number of experiments. Comparisons are made to other techniques, in particular LLE.

4,524 citations


"Change Detection Based on Deep Siam..." refers background or methods in this paper

  • ...As [10] presented, the contrastive loss can produce the abovementioned function when it gets a minimum value....

    [...]

  • ..., the numbers of changed and unchanged pixels vary greatly) in change detection, we use a weighted contrastive loss [10], in which not only the unchanged pixels but also the changed ones are considered as the objective function when training the network....

    [...]

  • ...In [10], $L_U$ and $L_C$ are defined as follows:...

    [...]

  • ...$L_U$ and $L_C$ must be designed such that $D_{i,j}$ would produce a low value for a pair of unchanged pixels and a high value for the changed pixel pair when $L$ gets the minimum value [10]....

    [...]

  • ...Define $D_w(X_1, X_2)_{i,j}$ to be the Euclidean distance between the feature vectors $G_w(X_1)_{i,j}$ and $G_w(X_2)_{i,j}$ [10]....

    [...]
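The excerpts above describe a loss built from an unchanged-pair term and a changed-pair term over a per-pixel feature distance. A minimal sketch follows, assuming the standard contrastive-loss terms from [10]: half the squared distance for unchanged pairs, and half the squared hinge max(0, m - D) for changed pairs. The excerpts do not state the weighting scheme, so the inverse-frequency class weights used here, the margin value, and the function name are assumptions for illustration rather than the paper's definition.

```python
import numpy as np

def weighted_contrastive_loss(dist, label, margin=2.0,
                              w_unchanged=None, w_changed=None):
    """Mean weighted contrastive loss over a distance map.

    dist:  (H, W) per-pixel feature distances
    label: (H, W) array, 1 for changed pixels and 0 for unchanged pixels
    """
    label = label.astype(bool)
    n = label.size
    n_changed = int(label.sum())
    n_unchanged = n - n_changed
    # Assumed weighting: inverse class frequency, so the rare class counts more.
    if w_unchanged is None:
        w_unchanged = n / (2.0 * max(n_unchanged, 1))
    if w_changed is None:
        w_changed = n / (2.0 * max(n_changed, 1))
    loss_u = 0.5 * dist ** 2                            # pulls unchanged pairs together
    loss_c = 0.5 * np.maximum(0.0, margin - dist) ** 2  # pushes changed pairs past the margin
    per_pixel = np.where(label, w_changed * loss_c, w_unchanged * loss_u)
    return float(per_pixel.mean())

label = np.array([[0, 0], [0, 1]])
good_dist = np.array([[0.1, 0.2], [0.1, 3.0]])  # small where unchanged, large where changed
bad_dist = np.array([[3.0, 3.0], [3.0, 0.1]])   # the opposite pattern

good = weighted_contrastive_loss(good_dist, label)
bad = weighted_contrastive_loss(bad_dist, label)
print(good, bad)
```

The loss is small when unchanged pairs have small distances and changed pairs exceed the margin, which is the property the excerpts require of the objective at its minimum.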