Journal Article

Joint Dictionary Learning for Multispectral Change Detection

01 Apr 2017, IEEE Transactions on Cybernetics (IEEE Trans Cybern), Vol. 47, Iss. 4, pp. 884-897
TL;DR: An improved sparse coding method for change detection is proposed. Its automatic threshold selection minimizes the reconstruction errors of the changed pixels without a prior assumption about the spectral signature, so the threshold can adapt to different data owing to joint dictionary learning.
Abstract: Change detection is one of the most important applications of remote sensing technology. It is a challenging task due to the obvious variations in the radiometric values of spectral signatures and the limited capability of utilizing spectral information. In this paper, an improved sparse coding method for change detection is proposed. The intuition behind the proposed method is that unchanged pixels in different images can be well reconstructed by the joint dictionary, which corresponds to knowledge of unchanged pixels, while changed pixels cannot. First, a query image pair is projected onto the joint dictionary to constitute the knowledge of unchanged pixels. Then the reconstruction error is obtained to discriminate between the changed and unchanged pixels in the different images. To select the proper thresholds for determining changed regions, an automatic threshold selection strategy is presented that minimizes the reconstruction errors of the changed pixels. Extensive experiments have been conducted on multispectral data, and the results, compared with those of state-of-the-art methods, demonstrate the superiority of the proposed method. The contributions of the proposed method can be summarized as follows: 1) joint dictionary learning is proposed to explore the intrinsic information of different images for change detection, so that change detection can be transformed into a sparse representation problem; to the authors' knowledge, few publications utilize joint dictionary learning in change detection; 2) an automatic threshold selection strategy is presented, which minimizes the reconstruction errors of the changed pixels without a prior assumption about the spectral signature; as a result, the threshold value provided by the proposed method can adapt to different data due to the characteristics of joint dictionary learning; and 3) the proposed method makes no prior assumption about the modeling and handling of the spectral signature, so it can be adapted to different data.
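The reconstruction-error idea in the abstract can be illustrated with a small NumPy sketch. It assumes co-registered pixel spectra stored column-wise (bands x pixels) and makes two loudly labeled simplifications: the joint dictionary is built from normalized training samples assumed to be unchanged rather than learned, and the threshold comes from a generic two-cluster split rather than the paper's automatic selection strategy. All helper names (`joint_vectors`, `sample_joint_dictionary`, `omp`, `reconstruction_errors`, `two_means_threshold`) and the synthetic data are hypothetical.

```python
import numpy as np


def joint_vectors(X1, X2):
    """Stack co-registered pixels of two dates; column i becomes [x1_i; x2_i]."""
    return np.vstack([X1, X2])


def sample_joint_dictionary(Z_train, n_atoms, rng):
    """Simplified stand-in for joint dictionary learning (assumption).

    Atoms are normalized joint vectors of training pixels assumed unchanged;
    a learned dictionary would instead be optimized for sparse reconstruction.
    """
    idx = rng.choice(Z_train.shape[1], size=n_atoms, replace=False)
    D = Z_train[:, idx]
    return D / np.linalg.norm(D, axis=0, keepdims=True)


def omp(D, z, k):
    """Orthogonal matching pursuit: sparse code of z using at most k atoms of D."""
    residual, support, coef = z.copy(), [], np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:  # best atom already selected: no further progress
            break
        support.append(j)
        coef, *_ = np.linalg.lstsq(D[:, support], z, rcond=None)
        residual = z - D[:, support] @ coef
    s = np.zeros(D.shape[1])
    s[support] = coef
    return s


def reconstruction_errors(D, Z, k):
    """L2 reconstruction error of every joint pixel vector (column of Z)."""
    return np.array([np.linalg.norm(z - D @ omp(D, z, k)) for z in Z.T])


def two_means_threshold(e, iters=100):
    """Generic two-cluster split of the errors (swapped in, not the paper's rule)."""
    t = e.mean()
    for _ in range(iters):
        lo, hi = e[e <= t], e[e > t]
        if lo.size == 0 or hi.size == 0:
            break
        t_new = 0.5 * (lo.mean() + hi.mean())
        if np.isclose(t_new, t):
            break
        t = t_new
    return t


# Synthetic example (all values are illustrative assumptions).
rng = np.random.default_rng(0)
B, n_end, gain = 8, 3, 1.3                  # bands, endmembers, radiometric gain
E = rng.uniform(0.1, 1.0, size=(B, n_end))  # hypothetical endmember spectra
A_train = rng.dirichlet(np.ones(n_end), size=200).T
X_train = E @ A_train
D = sample_joint_dictionary(joint_vectors(X_train, gain * X_train), 40, rng)

A = rng.dirichlet(np.ones(n_end), size=100).T
X1 = E @ A
X2 = gain * X1                                    # unchanged: same cover, new radiometry
X2[:, 50:] = rng.uniform(0.1, 1.0, size=(B, 50))  # last 50 pixels change cover
errors = reconstruction_errors(D, joint_vectors(X1, X2), k=n_end)
changed = errors > two_means_threshold(errors)
```

In this noise-free example, unchanged pixels lie in the span of the joint atoms, so their errors are essentially zero, while pixels whose second-date spectrum was replaced leave a clearly nonzero residual and can be flagged. Real imagery adds noise, so errors of unchanged pixels would be small rather than zero.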
Citations
Journal Article
TL;DR: This paper proposes a simple but effective method to learn discriminative CNNs (D-CNNs) to boost the performance of remote sensing image scene classification and comprehensively evaluates the proposed method on three publicly available benchmark data sets using three off-the-shelf CNN models.
Abstract: Remote sensing image scene classification is an active and challenging task driven by many applications. More recently, with the advances of deep learning models, especially convolutional neural networks (CNNs), the performance of remote sensing image scene classification has been significantly improved due to the powerful feature representations learnt through CNNs. Although great success has been obtained so far, within-class diversity and between-class similarity are still two big challenges. To address these problems, in this paper, we propose a simple but effective method to learn discriminative CNNs (D-CNNs) to boost the performance of remote sensing image scene classification. Different from the traditional CNN models that minimize only the cross-entropy loss, our proposed D-CNN models are trained by optimizing a new discriminative objective function. To this end, apart from minimizing the classification error, we also explicitly impose a metric learning regularization term on the CNN features. The metric learning regularization forces the D-CNN models to be more discriminative so that, in the new D-CNN feature spaces, images from the same scene class are mapped close to each other and images of different classes are mapped as far apart as possible. In the experiments, we comprehensively evaluate the proposed method on three publicly available benchmark data sets using three off-the-shelf CNN models. Experimental results demonstrate that our proposed D-CNN methods outperform the existing baseline methods and achieve state-of-the-art results on all three data sets.

1,001 citations


Cites methods from "Joint Dictionary Learning for Multi..."

  • ...Typical unsupervised feature learning methods include, but not limited to, principal component analysis, k-means clustering, sparse coding [26]–[28], [33], [44] and autoencoder [29], [31]....

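As a rough illustration of the D-CNN objective described above (cross-entropy plus a metric-learning regularizer on CNN features), the NumPy sketch below adds a generic contrastive hinge term: same-class feature pairs are pulled together and different-class pairs are pushed beyond a margin. The exact form of the D-CNN regularizer is not given here, so the pairwise form, `margin`, and `lam` are assumptions; in practice the logits and features would come from a CNN.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits has shape (N, classes)."""
    z = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def contrastive_term(features, labels, margin=1.0):
    """Generic pairwise metric-learning penalty (assumed form, not D-CNN's exact term).

    Same-class pairs are penalized by squared distance; different-class pairs
    are penalized only when closer than `margin`.
    """
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    pull = (dist ** 2)[same & off_diag]
    push = np.maximum(0.0, margin - dist)[~same] ** 2
    pairs = np.concatenate([pull, push])
    return pairs.mean() if pairs.size else 0.0


def discriminative_loss(logits, features, labels, lam=0.1, margin=1.0):
    """Classification loss plus a weighted metric-learning regularizer."""
    return cross_entropy(logits, labels) + lam * contrastive_term(features, labels, margin)
```

When features of different classes are already farther apart than the margin and same-class features coincide, the regularizer vanishes and only the classification loss remains.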

Journal Article
TL;DR: Wang et al. proposed a deep video saliency network consisting of two modules that capture the spatial and temporal saliency information, respectively, and directly produce spatiotemporal saliency inference without time-consuming optical flow computation.
Abstract: This paper proposes a deep learning model to efficiently detect salient regions in videos. It addresses two important issues: 1) deep video saliency model training in the absence of sufficiently large and pixel-wise annotated video data and 2) fast video saliency training and detection. The proposed deep video saliency network consists of two modules, for capturing the spatial and temporal saliency information, respectively. The dynamic saliency model, explicitly incorporating saliency estimates from the static saliency model, directly produces spatiotemporal saliency inference without time-consuming optical flow computation. We further propose a novel data augmentation technique that simulates video training data from existing annotated image data sets, which enables our network to learn diverse saliency information and prevents overfitting with the limited number of training videos. Leveraging our synthetic video data (150K video sequences) and real videos, our deep video saliency model successfully learns both spatial and temporal saliency cues, thus producing accurate spatiotemporal saliency estimates. We advance the state of the art on the densely annotated video segmentation data set (MAE of 0.06) and the Freiburg-Berkeley Motion Segmentation data set (MAE of 0.07), and do so with much improved speed (2 fps with all steps).

550 citations

Journal Article
TL;DR: In this article, a novel deep learning architecture, ResUNet-a, is proposed for the task of semantic segmentation of monotemporal very high-resolution aerial images.
Abstract: Scene understanding of high resolution aerial images is of great importance for the task of automated monitoring in various remote sensing applications. Due to the large within-class and small between-class variance in pixel values of objects of interest, this remains a challenging task. In recent years, deep convolutional neural networks have started being used in remote sensing applications and demonstrate state-of-the-art performance for pixel-level classification of objects. Here we propose a reliable framework that achieves high performance on the task of semantic segmentation of monotemporal very high resolution aerial images. Our framework consists of a novel deep learning architecture, ResUNet-a, and a novel loss function based on the Dice loss. ResUNet-a uses a UNet encoder/decoder backbone, in combination with residual connections, atrous convolutions, pyramid scene parsing pooling and multi-tasking inference. ResUNet-a infers sequentially the boundary of the objects, the distance transform of the segmentation mask, the segmentation mask and a colored reconstruction of the input. Each of the tasks is conditioned on the inference of the previous ones, thus establishing a conditioned relationship between the various tasks, as described through the architecture's computation graph. We analyse the performance of several flavours of the Generalized Dice loss for semantic segmentation, and we introduce a novel variant loss function for semantic segmentation of objects that has excellent convergence properties and behaves well even in the presence of highly imbalanced classes. The performance of our modeling framework is evaluated on the ISPRS 2D Potsdam dataset. Results show state-of-the-art performance with an average F1 score of 92.9% over all classes for our best model.

462 citations
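The Generalized Dice loss mentioned in the ResUNet-a abstract weights each class inversely to the square of its reference volume, which helps with imbalanced classes. The sketch below implements that commonly used formulation in NumPy; it is not ResUNet-a's own novel variant, which is not described here, and the `eps` smoothing constant is an assumption.

```python
import numpy as np


def generalized_dice_loss(probs, onehot, eps=1e-7):
    """Generalized Dice loss for arrays of shape (n_pixels, n_classes).

    probs: predicted class probabilities; onehot: reference labels.
    Class weights are 1 / (reference volume)^2; eps guards against empty
    classes (which then receive a very large weight) and zero division.
    """
    w = 1.0 / (onehot.sum(axis=0) ** 2 + eps)
    intersect = (w * (probs * onehot).sum(axis=0)).sum()
    union = (w * (probs + onehot).sum(axis=0)).sum()
    return 1.0 - 2.0 * intersect / (union + eps)
```

A perfect prediction drives the loss toward 0, and a prediction with no overlap with the reference gives exactly 1.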

Journal Article
TL;DR: An unsupervised representation learning method is proposed to investigate deconvolution networks for remote sensing scene classification; it outperforms most state-of-the-art results, which demonstrates its effectiveness.
Abstract: With the rapid development of satellite sensor technology, high spatial resolution remote sensing (HSR) data have attracted extensive attention in military and civilian applications. In order to make full use of these data, remote sensing scene classification becomes an important and necessary precedent task. In this paper, an unsupervised representation learning method is proposed to investigate deconvolution networks for remote sensing scene classification. First, a shallow weighted deconvolution network is utilized to learn a set of feature maps and filters for each image by minimizing the reconstruction error between the input image and the convolution result. The learned feature maps can capture the abundant edge and texture information of high spatial resolution images, which is important for remote sensing images. After that, the spatial pyramid model (SPM) is used to aggregate features at different scales to maintain the spatial layout of HSR image scenes. A discriminative representation for HSR images is obtained by combining the proposed weighted deconvolution model and SPM. Finally, the representation vector is input into a support vector machine to finish classification. We apply our method to two challenging HSR image data sets: the UCMerced data set with 21 scene categories and the Sydney data set with seven land-use categories. All the experimental results achieved by the proposed method outperform most state-of-the-art methods, which demonstrates the effectiveness of the proposed method.

254 citations
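Spatial pyramid aggregation, as described in the abstract above, pools feature maps over increasingly fine grids and concatenates the results so that coarse spatial layout is retained. The sketch assumes a single feature map of shape (H, W, C), pyramid levels 1x1, 2x2, and 4x4, and max pooling per cell; the original method's levels and pooling operator may differ, and `spatial_pyramid_pool` is a hypothetical name.

```python
import numpy as np


def spatial_pyramid_pool(fmap, levels=(1, 2, 4)):
    """Concatenate per-cell max-pooled features over an L x L grid for each level."""
    H, W, C = fmap.shape
    assert H >= max(levels) and W >= max(levels), "map too small for finest level"
    parts = []
    for L in levels:
        rows = np.array_split(np.arange(H), L)  # near-equal row bands
        cols = np.array_split(np.arange(W), L)  # near-equal column bands
        for r in rows:
            for c in cols:
                cell = fmap[r[0]:r[-1] + 1, c[0]:c[-1] + 1, :]
                parts.append(cell.max(axis=(0, 1)))  # one C-vector per cell
    return np.concatenate(parts)  # length C * sum(L * L for L in levels)
```

For a map with C channels and the default levels, the output has C * (1 + 4 + 16) = 21C entries regardless of H and W, which is what lets images of different sizes feed a fixed-length classifier such as an SVM.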

Journal Article
TL;DR: A novel deep model based on convolutional neural networks (CNNs), an end-to-end self-cascaded network (ScasNet), is proposed; for confusing manmade objects, ScasNet improves labeling coherence through sequential global-to-local context aggregation.
Abstract: Semantic labeling for very high resolution (VHR) images in urban areas is of significant importance in a wide range of remote sensing applications. However, many confusing manmade objects and intricate fine-structured objects make it very difficult to obtain both coherent and accurate labeling results. For this challenging task, we propose a novel deep model with convolutional neural networks (CNNs), i.e., an end-to-end self-cascaded network (ScasNet). Specifically, for confusing manmade objects, ScasNet improves the labeling coherence with sequential global-to-local context aggregation. Technically, multi-scale contexts are captured on the output of a CNN encoder, and then they are successively aggregated in a self-cascaded manner. Meanwhile, for fine-structured objects, ScasNet boosts the labeling accuracy with a coarse-to-fine refinement strategy. It progressively refines the target objects using the low-level features learned by the CNN's shallow layers. In addition, to correct the latent fitting residual caused by multi-feature fusion inside ScasNet, a dedicated residual correction scheme is proposed. It greatly improves the effectiveness of ScasNet. Extensive experimental results on three public datasets, including two challenging benchmarks, show that ScasNet achieves state-of-the-art performance.

231 citations

References
Journal Article
TL;DR: Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions.
Abstract: The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries (stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few). Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis Pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.

9,950 citations


"Joint Dictionary Learning for Multi..." cites this paper for background

  • ...Because (2) is NP-hard in general, some relaxation strategies, including basis pursuit [47] and orthogonal matching pursuit [48], are exploited to approximate the solution of (2)....

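Basis pursuit itself minimizes the l1 norm of the coefficients subject to exact reconstruction, which the abstract above solves as a linear program with a primal-dual interior-point method. As a lighter sketch, the code below solves the penalized variant, basis pursuit denoising (minimize 0.5*||Ax - b||^2 + lam*||x||_1), with iterative soft thresholding (ISTA), a swapped-in first-order solver; `lam`, `n_iter`, and the function names are illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||v||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def bpdn_ista(A, b, lam, n_iter=500):
    """ISTA for min_x 0.5 * ||A x - b||^2 + lam * ||x||_1 (columns of A = atoms)."""
    L = np.linalg.norm(A, 2) ** 2  # Lipschitz constant of the smooth term's gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x
```

With a step size of 1/L, each ISTA iteration does not increase the objective, so the result is never worse than the all-zero decomposition; smaller `lam` values move the solution closer to exact basis pursuit.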

Journal Article
TL;DR: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions, chosen in order to best match the signal structures.
Abstract: The authors introduce an algorithm, called matching pursuit, that decomposes any signal into a linear expansion of waveforms that are selected from a redundant dictionary of functions. These waveforms are chosen in order to best match the signal structures. Matching pursuits are general procedures to compute adaptive signal representations. With a dictionary of Gabor functions a matching pursuit defines an adaptive time-frequency transform. They derive a signal energy distribution in the time-frequency plane, which does not include interference terms, unlike Wigner and Cohen class distributions. A matching pursuit isolates the signal structures that are coherent with respect to a given dictionary. An application to pattern extraction from noisy signals is described. They compare a matching pursuit decomposition with a signal expansion over an optimized wavepacket orthonormal basis, selected with the algorithm of Coifman and Wickerhauser (see IEEE Trans. Informat. Theory, vol. 38, Mar. 1992).

9,380 citations


"Joint Dictionary Learning for Multi..." cites this paper for background or methods

  • ...Because (2) is NP-hard in general, some relaxation strategies, including basis pursuit [47] and orthogonal matching pursuit [48], are exploited to approximate the solution of (2)....


  • ...Broadly, the techniques to the selection of dictionary can be categorized into two families: 1) transformation-based methods [48] and 2) learning-based...

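The greedy procedure described in the matching pursuit abstract can be sketched in a few lines: at each step, pick the dictionary atom most correlated with the current residual, add its contribution to the expansion, and subtract it from the residual. This sketch assumes a finite dictionary stored as a matrix `D` with unit-norm columns (rather than a continuous Gabor family); the function name and stopping parameters are hypothetical.

```python
import numpy as np


def matching_pursuit(D, x, n_iter=10, tol=1e-10):
    """Greedy matching pursuit of signal x over dictionary D (unit-norm columns).

    Returns the expansion coefficients and the final residual, so that
    x == D @ coef + residual holds at every step.
    """
    residual = np.array(x, dtype=float)
    coef = np.zeros(D.shape[1])
    for _ in range(n_iter):
        corr = D.T @ residual                # inner products with every atom
        j = int(np.argmax(np.abs(corr)))     # best-matching atom
        coef[j] += corr[j]                   # atoms may be selected repeatedly
        residual -= corr[j] * D[:, j]        # remove that atom's contribution
        if np.linalg.norm(residual) < tol:
            break
    return coef, residual
```

Unlike orthogonal matching pursuit, which refits all selected coefficients by least squares, plain matching pursuit can pick the same atom more than once, which is why coefficients are accumulated rather than overwritten.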

Journal Article
TL;DR: This work gives examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution, and obtains reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.
Abstract: The time-frequency and time-scale communities have recently developed a large number of overcomplete waveform dictionaries (stationary wavelets, wavelet packets, cosine packets, chirplets, and warplets, to name a few). Decomposition into overcomplete systems is not unique, and several methods for decomposition have been proposed, including the method of frames (MOF), matching pursuit (MP), and, for special dictionaries, the best orthogonal basis (BOB). Basis pursuit (BP) is a principle for decomposing a signal into an "optimal" superposition of dictionary elements, where optimal means having the smallest l1 norm of coefficients among all such decompositions. We give examples exhibiting several advantages over MOF, MP, and BOB, including better sparsity and superresolution. BP has interesting relations to ideas in areas as diverse as ill-posed problems, abstract harmonic analysis, total variation denoising, and multiscale edge denoising. BP in highly overcomplete dictionaries leads to large-scale optimization problems. With signals of length 8192 and a wavelet packet dictionary, one gets an equivalent linear program of size 8192 by 212,992. Such problems can be attacked successfully only because of recent advances in linear and quadratic programming by interior-point methods. We obtain reasonable success with a primal-dual logarithmic barrier method and conjugate-gradient solver.

4,387 citations

Journal Article
TL;DR: An evaluation of results indicates that various procedures of change detection produce different maps of change even in the same environment.
Abstract: A variety of procedures for change detection based on comparison of multitemporal digital remote sensing data have been developed. An evaluation of results indicates that various procedures of change detection produce different maps of change even in the same environment.

3,361 citations

Proceedings Article
04 Dec 2006
TL;DR: These algorithms are applied to natural images and it is demonstrated that the inferred sparse codes exhibit end-stopping and non-classical receptive field surround suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons.
Abstract: Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higher-level features in the data. However, finding sparse codes remains a very difficult computational problem. In this paper, we present efficient sparse coding algorithms that are based on iteratively solving two convex optimization problems: an L1-regularized least squares problem and an L2-constrained least squares problem. We propose novel algorithms to solve both of these optimization problems. Our algorithms result in a significant speedup for sparse coding, allowing us to learn larger sparse codes than possible with previously described algorithms. We apply these algorithms to natural images and demonstrate that the inferred sparse codes exhibit end-stopping and non-classical receptive field surround suppression and, therefore, may provide a partial explanation for these two phenomena in V1 neurons.

2,731 citations


"Joint Dictionary Learning for Multi..." cites this paper for background or methods

  • ...The constraint in (9) prevents the dictionary D from being large, which will lead to arbitrarily small values of sparse coefficients [42]....


  • ...In each feature-sign step [42], [46], [53], the analytical solution $\hat{s}_i^{\text{new}}$ is calculated by (12) under the current active set and signs....


  • ...According to the feature-sign search method proposed in [42] and [53], the subdifferential of the (12) is discussed in the situation for different values of the coefficient $s_i^{(j)}$....


  • ...Sparsity constraint has shown promising results in finding succinct representations of stimuli [42]....


  • ...In this paper, a Lagrange dual method [42] is adopted to compute the dictionary D....

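The two-step scheme in the efficient sparse coding abstract (an L1-regularized least-squares problem for the codes and an L2-constrained least-squares problem for the basis) can be sketched by alternating between the two convex subproblems. This is a hedged illustration only: the codes are updated with ISTA and the dictionary with projected gradient descent, both swapped in for the feature-sign search and Lagrange dual methods the snippets refer to; the function names, `beta`, `c`, and iteration counts are hypothetical.

```python
import numpy as np


def soft_threshold(V, t):
    """Elementwise shrinkage: proximal operator of t * ||V||_1."""
    return np.sign(V) * np.maximum(np.abs(V) - t, 0.0)


def update_codes(X, B, S, beta, n_iter=50):
    """L1-regularized least squares: min_S ||X - B S||_F^2 + beta * sum|S| (ISTA)."""
    L = 2 * np.linalg.norm(B, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2 * B.T @ (B @ S - X)
        S = soft_threshold(S - grad / L, beta / L)
    return S


def update_dictionary(X, B, S, c=1.0, n_iter=50):
    """L2-constrained least squares: min_B ||X - B S||_F^2, ||b_j||^2 <= c (projected gradient)."""
    L = 2 * np.linalg.norm(S, 2) ** 2 + 1e-12
    for _ in range(n_iter):
        B = B - 2 * (B @ S - X) @ S.T / L
        norms = np.linalg.norm(B, axis=0)
        B = B * np.minimum(1.0, np.sqrt(c) / np.maximum(norms, 1e-12))  # project columns
    return B


def sparse_coding(X, n_atoms, beta=0.1, c=1.0, n_outer=20, seed=0):
    """Alternate the two convex subproblems, starting from a random unit-norm basis."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((X.shape[0], n_atoms))
    B /= np.linalg.norm(B, axis=0)
    S = np.zeros((n_atoms, X.shape[1]))
    for _ in range(n_outer):
        S = update_codes(X, B, S, beta)
        B = update_dictionary(X, B, S, c)
    return B, S
```

The column-norm constraint plays the role described in the snippet above: without it, the basis could grow arbitrarily while the sparse coefficients shrink toward zero.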