Hyperspectral Image Classification via Kernel Sparse Representation

Yi Chen¹, Nasser M. Nasrabadi², Fellow, IEEE, and Trac D. Tran¹, Senior Member, IEEE

¹Department of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD 21218
²US Army Research Laboratory, 2800 Powder Mill Road, Adelphi, MD 20783

This work has been partially supported by the National Science Foundation (NSF) under Grants CCF-1117545 and CCF-0728893; the Army Research Office (ARO) under Grant 58110-MA-II and Grant 60219-MA; and the Office of Naval Research (ONR) under Grant N102-183-0208.
Abstract—In this paper, a novel nonlinear technique for hy-
perspectral image classification is proposed. Our approach relies
on sparsely representing a test sample in terms of all of the
training samples in a feature space induced by a kernel function.
For each test pixel in the feature space, a sparse representation
vector is obtained by decomposing the test pixel over a training
dictionary, also in the same feature space, by using a kernel-based
greedy pursuit algorithm. The recovered sparse representation
vector is then used directly to determine the class label of the
test pixel. Projecting the samples into a high-dimensional feature
space and kernelizing the sparse representation improves the
data separability between different classes, providing a higher
classification accuracy compared to the more conventional linear
sparsity-based classification algorithms. Moreover, the spatial
coherency across neighboring pixels is also incorporated through
a kernelized joint sparsity model, where all of the pixels within
a small neighborhood are jointly represented in the feature
space by selecting a few common training samples. Kernel
greedy optimization algorithms are suggested in this paper to
solve the kernel versions of the single-pixel and multi-pixel
joint sparsity-based recovery problems. Experimental results on
several hyperspectral images show that the proposed technique
outperforms the linear sparsity-based classification technique, as
well as the classical Support Vector Machines and sparse kernel
logistic regression classifiers.
I. INTRODUCTION
Hyperspectral imaging sensors capture images in hundreds
of continuous narrow spectral bands, spanning the visible to
infrared spectrum. Each pixel in a hyperspectral image (HSI)
is represented by a vector whose entries correspond to various
spectral-band responses. Different materials usually reflect
electromagnetic energy differently at specific wavelengths.
This enables discrimination of materials based on their spectral
characteristics. One of the most important applications of HSI
is image classification, where pixels are labeled to one of the
classes based on their spectral characteristics, given a small
set of training data for each class. Various techniques have
been developed for HSI classification. Among the previous
approaches, the support vector machine (SVM) [1], [2] has
proven to be a powerful tool to solve many supervised clas-
sification problems and has shown good performance in hyperspectral classification as well [3]–[5]. Variations of SVM-
based algorithms have also been proposed to improve the
classification accuracy. These variations include transductive
SVM, which exploits both labeled and unlabeled samples [6],
and SVM with composite kernels, which incorporates spatial
information directly in the SVM kernels [7]. Multinomial
logistic regression [8] is another widely used classifier, which
uses the logistic function to provide the posterior probability.
A fast algorithm for sparse multinomial logistic regression
has been developed in [9] and successfully adopted for HSI
segmentation in [10], [11]. Some of the other recent HSI
classification techniques can be found in [12]–[17]. In these re-
cent methods, a feature extraction strategy is proposed in [12]
for classification which generalizes the linear discriminant
analysis and nonparametric discriminant analysis. In [13], the
derivative information of the spectral signatures is exploited as
features and then decisions obtained from spectral reflectance
and derivative information are fused for the final decisions.
In [14], each image band is decomposed into intrinsic mode
functions (IMFs) which are adaptive to local properties via
empirical mode decomposition and then SVM is applied to
the lower-order IMFs for classification. In [15], the k-nearest-
neighbor classifier is applied to the local manifolds to exploit
the intrinsic nonlinear structure of hyperspectral images. A
semi-supervised classification algorithm is proposed in [16] in
order to use a kernel machine which is iteratively updated
by manifold regularization. In [17] the results from multi-
ple classification/segmentation techniques are fused by post-
processing to generate the final spectral-spatial classification
map. Most of the above-mentioned HSI classification
techniques do not directly incorporate the spatial or the con-
textual information into the classifier.
Recently, sparse representation [18], [19] has also been pro-
posed to solve many computer vision tasks [20]–[25], where
the usage of sparsity as a prior often leads to state-of-the-
art performance. Sparse representation has also been applied
to HSI target detection and classification [26]–[28], relying on
the observation that hyperspectral pixels belonging to the same
class approximately lie in the same low-dimensional subspace.
Thus, an unknown test pixel can be sparsely represented by
a few training samples (atoms) from a given dictionary and
the corresponding sparse representation vector will implicitly
encode the class information. The sparse representation-based classifier is different from the conventional sparse classifier
SVM in the following aspects. SVM is a discriminative model,
while the sparse representation method can be viewed as a
generative model, where the signal (pixel) is expressed as a
linear combination of atoms [19]. SVM is a binary classifier
that finds the separating hyperplane between two classes
(multi-class SVM requires a one-against-one or one-against-
all strategy). The sparse representation-based classifier, in contrast, operates from
a reconstruction point of view. The sparse decomposition of
the test pixel over the entire dictionary implicitly leads to
a competition between the subspaces (classes) and thus the
recovered sparse representation is discriminative. Moreover, in
SVM, there is an explicit training stage. The SVM classifier
is trained only once and then this classifier with its fixed
sparse support vectors is used to classify all of the test data.
On the other hand, in our proposed approach, a new sparse
representation vector is extracted for each test pixel and is thus
adaptive, representing the sparsely selected atoms which are
adapted to reconstruct the current test pixel.
Hyperspectral images are usually smooth in the sense that the
pixels in a small neighborhood represent the same material and
have similar spectral characteristics. Various techniques have
been proposed recently to exploit the contextual correlation
within HSI which have notably improved the classification
and segmentation performance. Post-processing procedures are
used in [29], [30] on the individually-labeled samples based on
certain decision rules to impose the smoothness. Markov ran-
dom fields exploit the statistical dependency among neighbor-
ing pixels and are usually applied in Bayesian approaches [11].
The composite kernel approach [7] is another way to in-
corporate the spatial information, which explicitly extracts
spatial information for each spectral pixel and then combines
the spectral and spatial information via kernel composition.
Joint sparsity model [31] is exploited in sparsity-based HSI
target detection and classification [27], [28], where the neigh-
boring pixels are simultaneously represented by a sparse
linear combination of a few common training samples. Each
pixel, although sharing the same common support, might have
weighting coefficients taking on different values. In this way,
the smoothness across neighboring spectral pixels is enforced
directly in the classification stage, and no post-processing
steps are performed. The details of composite kernels and the
joint sparsity model will be further discussed in the following
sections.
It is well known that for the classical HSI clas-
sification and target detection algorithms, the use of kernel
methods yields a significant performance improvement [5],
[32], because the kernel-based algorithms implicitly exploit
the higher-order structure of the given data which may not be
captured by the linear models. Therefore, if the data set is not
linearly separable, kernel methods [33]–[36] can be applied
to project the data into a nonlinear feature space in which
the data becomes more separable. In practical implementation,
the kernel trick [37] is often used in order to avoid explicitly
evaluating the data in the feature space.
In this paper, we propose a new HSI classification algorithm
based on kernel sparse representation by assuming that a test
pixel can be linearly represented by a few training samples in
the feature space. The kernel sparse representation vector is
then obtained by decomposing the test pixel represented in a
high dimensional feature space over a structured dictionary
consisting of training samples from all of the classes in
the same feature space. The recovered sparse vector is used
directly for classification. Although the proposed approach has
a formulation similar to previous kernel regression approaches
with a sparse prior such as kernel matching pursuit [33],
kernel basis pursuit [34], and generalized LASSO [38], the
underlying ideas are quite different. The objective of these
previous approaches is to approximate a function as a linear
combination of dictionary functions, which are the kernels
centered at the training points, by minimizing certain loss
function evaluated at these training points and subject to a
sparsity prior. Therefore, the target vector for fitting consists
of the observations of the function value at the training points,
and the dictionary is then the dictionary functions evaluated at
the training points which turns out to be the kernel matrix. In
our proposed approach, the target vector is the test pixel itself
in the feature space. It is not the similarity measure between
the test sample and training samples and may not have an
explicit expression. The dictionary also consists of the training
samples in the feature space and does not have an explicit expression either. The recovered sparse representation vector
can be viewed as a discriminative feature extracted from the
test pixel and is used directly for classification.
The contextual correlation between pixels within a small
spatial neighborhood can be incorporated into the kernel sparse
representation through the joint sparsity model [31], where all
neighboring pixels are simultaneously represented by a linear
combination of a few common training samples in the feature
space. Furthermore, the composite kernel approach [7] can
also be used with the proposed kernel sparse representation
model in order to combine spectral and spatial information.
Efficient kernel-based optimization algorithms are discussed in
this paper for the recovery of the kernel sparse representations
for both single-pixel and multi-pixel joint sparsity models.
Notation-wise, vectors and matrices are denoted by lower- and upper-case bold letters, respectively. For a vector $\boldsymbol{\alpha} \in \mathbb{R}^N$ and an index set $\Lambda \subseteq \{1,\ldots,N\}$ with $|\Lambda| = t$, $\boldsymbol{\alpha}_\Lambda \in \mathbb{R}^t$ is the portion of $\boldsymbol{\alpha}$ indexed on $\Lambda$. For a matrix $\boldsymbol{S} \in \mathbb{R}^{N_1 \times N_2}$ and index sets $\Lambda_1 \subseteq \{1,\ldots,N_1\}$ with $|\Lambda_1| = t_1$ and $\Lambda_2 \subseteq \{1,\ldots,N_2\}$ with $|\Lambda_2| = t_2$, $\boldsymbol{S}_{\Lambda_1,:} \in \mathbb{R}^{t_1 \times N_2}$ is the submatrix of $\boldsymbol{S}$ consisting of the $t_1$ rows of $\boldsymbol{S}$ indexed on $\Lambda_1$, $\boldsymbol{S}_{:,\Lambda_2} \in \mathbb{R}^{N_1 \times t_2}$ consists of the $t_2$ columns of $\boldsymbol{S}$ indexed on $\Lambda_2$, and $\boldsymbol{S}_{\Lambda_1,\Lambda_2} \in \mathbb{R}^{t_1 \times t_2}$ is formed by the rows and columns of $\boldsymbol{S}$ indexed on $\Lambda_1$ and $\Lambda_2$, respectively.
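As a concrete illustration of this indexing notation (using 0-based NumPy indexing rather than the 1-based convention above), a minimal sketch with made-up sizes and index sets might look as follows.

```python
import numpy as np

# Hypothetical sizes and index sets, chosen only to illustrate the notation.
N, N1, N2 = 6, 5, 4
alpha = np.arange(N, dtype=float)                    # a vector in R^N
S = np.arange(N1 * N2, dtype=float).reshape(N1, N2)  # a matrix in R^{N1 x N2}

Lam = np.array([1, 3, 4])    # index set Lambda, |Lambda| = t = 3
Lam1 = np.array([0, 2])      # Lambda_1, a subset of {0, ..., N1-1}
Lam2 = np.array([1, 3])      # Lambda_2, a subset of {0, ..., N2-1}

alpha_Lam = alpha[Lam]                # alpha_Lambda in R^t
S_rows = S[Lam1, :]                   # S_{Lambda_1,:} in R^{t1 x N2}
S_cols = S[:, Lam2]                   # S_{:,Lambda_2} in R^{N1 x t2}
S_sub = S[np.ix_(Lam1, Lam2)]         # S_{Lambda_1,Lambda_2} in R^{t1 x t2}

print(alpha_Lam.shape, S_rows.shape, S_cols.shape, S_sub.shape)
```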
The remainder of this paper is structured as follows. Sec-
tion II briefly introduces the sparsity-based HSI classification
technique. Section III defines the sparsity models in the feature
space, then discusses how to incorporate spatial information,
and describes the kernel sparse recovery algorithms. Experi-
mental results are shown in Section IV, and conclusions are
drawn in Section V.
II. SPARSITY-BASED HSI CLASSIFICATION
This section briefly introduces the sparsity-based algorithm
for HSI classification, and more details can be found in [26]–
[28]. It is assumed that the spectral signatures of pixels belonging to the same class approximately lie in the same low-dimensional subspace. Thus, an unknown test sample $\boldsymbol{x} \in \mathbb{R}^B$, where $B$ is the number of spectral bands, can be written as a sparse linear combination of all of the training pixels as
$$\boldsymbol{x} = \boldsymbol{A}\boldsymbol{\alpha}, \quad (1)$$
where $\boldsymbol{A} = [\boldsymbol{a}_1\ \boldsymbol{a}_2\ \cdots\ \boldsymbol{a}_N] \in \mathbb{R}^{B \times N}$ is a structured dictionary whose columns $\{\boldsymbol{a}_i\}_{i=1,2,\ldots,N}$ are $N$ training samples (referred to as atoms) from all classes, and $\boldsymbol{\alpha} \in \mathbb{R}^N$ is an unknown sparse vector. The index set on which $\boldsymbol{\alpha}$ has nonzero entries is the support of $\boldsymbol{\alpha}$. The number of nonzero entries in $\boldsymbol{\alpha}$ is called the sparsity level $K$ of $\boldsymbol{\alpha}$, denoted by $K = \|\boldsymbol{\alpha}\|_0$. Given the dictionary $\boldsymbol{A}$, the sparse coefficient vector $\boldsymbol{\alpha}$ is obtained by solving
$$\hat{\boldsymbol{\alpha}} = \arg\min \|\boldsymbol{x} - \boldsymbol{A}\boldsymbol{\alpha}\|_2 \quad \text{subject to} \quad \|\boldsymbol{\alpha}\|_0 \le K_0, \quad (2)$$
where $K_0$ is a preset upper bound on the sparsity level.
The problem in (2) is NP-hard, which can be approximately
solved by greedy algorithms, such as Orthogonal Matching
Pursuit (OMP) [39] or Subspace Pursuit (SP) [40]. The class
label of $\boldsymbol{x}$ is determined by the minimal residual between $\boldsymbol{x}$ and its approximation from each class sub-dictionary:
$$\mathrm{Class}(\boldsymbol{x}) = \arg\min_{m=1,\ldots,M} \left\|\boldsymbol{x} - \boldsymbol{A}_{:,\Omega_m}\hat{\boldsymbol{\alpha}}_{\Omega_m}\right\|_2, \quad (3)$$
where $\Omega_m \subset \{1,2,\ldots,N\}$ is the index set associated with the training samples belonging to the $m$th class. As pointed out in [25], the sparse representation-based classifier can be viewed as a generalization of the nearest neighbor classifier [41].
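To make the pixel-wise procedure (1)–(3) concrete, the following NumPy sketch pairs a basic OMP routine for problem (2) with the minimal-residual rule (3). It is an illustrative sketch with made-up data, not the implementation used in this paper.

```python
import numpy as np

def omp(A, x, K0):
    """Greedy OMP approximation of (2): find a K0-sparse alpha with x ~ A @ alpha."""
    B, N = A.shape
    residual = x.copy()
    support = []
    alpha = np.zeros(N)
    for _ in range(K0):
        # Atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Least-squares fit on the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], x, rcond=None)
        residual = x - A[:, support] @ coef
    alpha[support] = coef
    return alpha

def classify_pixel(A, labels, x, K0=5):
    """Minimal-residual rule (3); labels[i] is the class of dictionary atom i."""
    alpha = omp(A, x, K0)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(x - A[:, labels == m] @ alpha[labels == m])
                 for m in classes]
    return classes[int(np.argmin(residuals))]

# Toy example with random placeholder data (not real hyperspectral pixels).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))               # 40 training atoms, 50 bands
A /= np.linalg.norm(A, axis=0)                  # l2-normalized atoms
labels = np.repeat(np.arange(4), 10)            # 4 classes, 10 atoms each
x = A[:, 3] + 0.01 * rng.standard_normal(50)    # noisy copy of a class-0 atom
print(classify_pixel(A, labels, x))
```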
In HSI, pixels within a small neighborhood usually consist
of similar materials and, thus, their spectral characteristics are
highly correlated. The spatial correlation between neighboring
pixels can be incorporated through a joint sparsity model [27],
[31] by assuming the underlying sparse vectors associated with
these pixels share a common sparsity pattern as follows. Let
$\{\boldsymbol{x}_t\}_{t=1,\ldots,T}$ be $T$ pixels in a spatial neighborhood centered at $\boldsymbol{x}_1$. These pixels can be compactly represented as
$$\boldsymbol{X} = [\boldsymbol{x}_1\ \boldsymbol{x}_2\ \cdots\ \boldsymbol{x}_T] = [\boldsymbol{A}\boldsymbol{\alpha}_1\ \boldsymbol{A}\boldsymbol{\alpha}_2\ \cdots\ \boldsymbol{A}\boldsymbol{\alpha}_T] = \boldsymbol{A}\underbrace{[\boldsymbol{\alpha}_1\ \boldsymbol{\alpha}_2\ \cdots\ \boldsymbol{\alpha}_T]}_{\boldsymbol{S}} = \boldsymbol{A}\boldsymbol{S}. \quad (4)$$
In the joint sparsity model, the sparse vectors $\{\boldsymbol{\alpha}_t\}_{t=1,\ldots,T}$ share the same support $\Lambda$ and, thus, $\boldsymbol{S}$ is a sparse matrix with only $|\Lambda|$ nonzero rows. The row-sparse matrix $\boldsymbol{S}$ can be recovered by solving the following optimization problem:
$$\hat{\boldsymbol{S}} = \arg\min \|\boldsymbol{X} - \boldsymbol{A}\boldsymbol{S}\|_F \quad \text{subject to} \quad \|\boldsymbol{S}\|_{\mathrm{row},0} \le K_0, \quad (5)$$
where $\|\boldsymbol{S}\|_{\mathrm{row},0}$ denotes the number of nonzero rows of $\boldsymbol{S}$ and $\|\cdot\|_F$ denotes the Frobenius norm. The problem in (5) can be approximately solved by the simultaneous versions of OMP (SOMP) [31] or SP (SSP) [28]. The label of the center pixel $\boldsymbol{x}_1$ is then determined by the minimal total residual:
$$\mathrm{Class}(\boldsymbol{x}_1) = \arg\min_{m=1,\ldots,M} \left\|\boldsymbol{X} - \boldsymbol{A}_{:,\Omega_m}\hat{\boldsymbol{S}}_{\Omega_m,:}\right\|_F. \quad (6)$$
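A corresponding sketch for the joint model (4)–(6) is given below; the SOMP-style greedy loop and the ℓ2 row-selection rule are illustrative choices, not the exact SOMP/SSP procedures of [28], [31].

```python
import numpy as np

def somp(A, X, K0):
    """Greedy row-sparse approximation of (5): X ~ A @ S with K0 nonzero rows of S."""
    N = A.shape[1]
    T = X.shape[1]
    support = []
    R = X.copy()                                   # residual matrix
    S = np.zeros((N, T))
    for _ in range(K0):
        # Correlations of every atom with all T residuals; pick the row
        # with the largest l2-norm (one common choice of l_p norm).
        C = A.T @ R
        idx = int(np.argmax(np.linalg.norm(C, axis=1)))
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(A[:, support], X, rcond=None)
        R = X - A[:, support] @ coef
    S[support, :] = coef
    return S

def classify_center(A, labels, X, K0=5):
    """Minimal total residual rule (6) for the neighborhood matrix X."""
    S = somp(A, X, K0)
    classes = np.unique(labels)
    res = [np.linalg.norm(X - A[:, labels == m] @ S[labels == m, :], 'fro')
           for m in classes]
    return classes[int(np.argmin(res))]
```

Usage mirrors the pixel-wise example above, with the columns of X holding the T pixels of the neighborhood.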
III. KERNEL SPARSE REPRESENTATION
If the classes in the dataset are not linearly separable,
then the kernel methods can be used to project the data
into a feature space, in which the classes become linearly
separable [1]. The kernel function $\kappa: \mathbb{R}^B \times \mathbb{R}^B \mapsto \mathbb{R}$ is defined as the inner product
$$\kappa(\boldsymbol{x}_i,\boldsymbol{x}_j) = \left\langle \phi(\boldsymbol{x}_i), \phi(\boldsymbol{x}_j) \right\rangle. \quad (7)$$
Commonly used kernels include the radial basis function (RBF) kernel $\kappa(\boldsymbol{x}_i,\boldsymbol{x}_j) = \exp\!\left(-\gamma\|\boldsymbol{x}_i - \boldsymbol{x}_j\|^2\right)$ with $\gamma > 0$ controlling the width of the RBF, and the order-$d$ homogeneous and inhomogeneous polynomial kernels $\kappa(\boldsymbol{x}_i,\boldsymbol{x}_j) = (\boldsymbol{x}_i \cdot \boldsymbol{x}_j)^d$ and $\kappa(\boldsymbol{x}_i,\boldsymbol{x}_j) = (\boldsymbol{x}_i \cdot \boldsymbol{x}_j + 1)^d$
, respectively. In this section, we
describe how the sparsity models in Section II can be extended
to a feature space induced by a kernel function.
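For reference, the kernels listed above can be computed as follows; this is a generic sketch (rows are samples, and the values of γ and d are arbitrary examples), not code from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def poly_kernel(X, Y, d=2, inhomogeneous=True):
    """Order-d polynomial kernel: (x . y + c)^d with c = 1 (inhomogeneous) or 0."""
    c = 1.0 if inhomogeneous else 0.0
    return (X @ Y.T + c) ** d

# Example: kernel matrix K_A between training atoms (rows are samples here).
rng = np.random.default_rng(0)
atoms = rng.standard_normal((40, 50))     # 40 atoms with 50 spectral bands
K_A = rbf_kernel(atoms, atoms, gamma=0.5)
print(K_A.shape)                          # (40, 40)
```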
A. Pixel-wise Sparsity in Feature Space
Let $\boldsymbol{x} \in \mathbb{R}^B$ be the data point of interest and $\phi(\boldsymbol{x})$ be its representation in the feature space. The kernel sparse representation of a sample $\boldsymbol{x}$ in terms of the training atoms $\boldsymbol{a}_i$'s can be formulated as
$$\phi(\boldsymbol{x}) = \underbrace{\left[\phi(\boldsymbol{a}_1)\ \cdots\ \phi(\boldsymbol{a}_N)\right]}_{\boldsymbol{A}_\phi}\,\underbrace{\left[\alpha_1\ \cdots\ \alpha_N\right]^T}_{\boldsymbol{\alpha}} = \boldsymbol{A}_\phi\boldsymbol{\alpha}, \quad (8)$$
where the columns of $\boldsymbol{A}_\phi$ are the representations of the training samples in the feature space and $\boldsymbol{\alpha}$ is assumed to be a sparse vector.
Similar to the linear sparse recovery problem in (2), $\boldsymbol{\alpha}$ can be recovered by solving
$$\hat{\boldsymbol{\alpha}} = \arg\min \left\|\phi(\boldsymbol{x}) - \boldsymbol{A}_\phi\boldsymbol{\alpha}\right\|_2 \quad \text{subject to} \quad \|\boldsymbol{\alpha}\|_0 \le K_0. \quad (9)$$
The problem in (9) can be approximately solved by kernelizing the OMP and SP algorithms (denoted by KOMP and KSP, respectively). Note that in the above problem formulation, we are solving for the sparse vector $\boldsymbol{\alpha}$ directly in the feature space using the implicit feature vectors, rather than evaluating the kernel functions at the training points.
In KOMP and KSP, essentially each dot product operation in OMP/SP is replaced by the kernel trick in (7). Let $\boldsymbol{K}_{\boldsymbol{A}} \in \mathbb{R}^{N \times N}$ be the kernel matrix whose $(i,j)$th entry is $\kappa(\boldsymbol{a}_i,\boldsymbol{a}_j)$, and $\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}} \in \mathbb{R}^N$ be the vector whose $i$th entry is $\kappa(\boldsymbol{a}_i,\boldsymbol{x})$. Using the feature representations, the correlation (dot product) between a pixel $\phi(\boldsymbol{x})$ and a dictionary atom $\phi(\boldsymbol{a}_i)$ is then computed by
$$c_i = \left\langle \phi(\boldsymbol{x}), \phi(\boldsymbol{a}_i) \right\rangle = \kappa(\boldsymbol{x},\boldsymbol{a}_i) = \left(\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}\right)_i, \quad (10)$$
the orthogonal projection coefficient of $\phi(\boldsymbol{x})$ onto a set of selected dictionary atoms $\{\phi(\boldsymbol{a}_n)\}_{n\in\Lambda}$ is given as
$$\boldsymbol{p}_\Lambda = \left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda,\Lambda}\right)^{-1}\left(\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}\right)_\Lambda, \quad (11)$$
and the residual vector between $\phi(\boldsymbol{x})$ and its approximation using the selected atoms $\{\phi(\boldsymbol{a}_n)\}_{n\in\Lambda} = (\boldsymbol{A}_\phi)_{:,\Lambda}$ is then expressed as
$$\phi(\boldsymbol{r}) = \phi(\boldsymbol{x}) - (\boldsymbol{A}_\phi)_{:,\Lambda}\left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda,\Lambda}\right)^{-1}\left(\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}\right)_\Lambda. \quad (12)$$
Note that the feature representation of the residual vector $\phi(\boldsymbol{r})$ in (12) cannot be evaluated explicitly. However, the correlation between $\phi(\boldsymbol{r})$ and an atom $\phi(\boldsymbol{a}_i)$ can be computed by
$$c_i = \left\langle \phi(\boldsymbol{r}), \phi(\boldsymbol{a}_i) \right\rangle = \left(\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}\right)_i - (\boldsymbol{K}_{\boldsymbol{A}})_{i,\Lambda}\left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda,\Lambda}\right)^{-1}\left(\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}\right)_\Lambda. \quad (13)$$

The KOMP and KSP greedy algorithms, similar to the linear OMP and SP algorithms, are used to locate the support $\Lambda$ of the sparse vector $\hat{\boldsymbol{\alpha}}$. At each iteration, the KOMP algorithm augments the support set by a single index, given by $\lambda = \arg\max_{i=1,\ldots,N} c_i$ with $c_i$ defined in (13) and $\phi(\boldsymbol{r})$ being the residual vector from the previous iteration, until $K_0$ atoms are selected or the approximation error (i.e., the norm of the residual vector in (12)) falls below a preset threshold. The KSP algorithm maintains a set of $K_0$ indices with a backtracking mechanism. At each iteration, the index set is refined by adding $K_0$ new candidates, whose associated atoms have the $K_0$ highest correlations (13) with the residual vector from the previous iteration, and then discarding the $K_0$ least significant ones from the resulting list of $2K_0$ candidates. This process repeats until a certain stopping criterion is met. In both the KOMP and KSP algorithms, after the support set $\Lambda$ of $\hat{\boldsymbol{\alpha}}$ is determined, the entries of $\hat{\boldsymbol{\alpha}}$ indexed on $\Lambda$ are computed by the orthogonal projection of the test pixel onto the selected dictionary atoms using (11). The KOMP/KSP algorithms can be viewed as special cases, with $T = 1$, of the kernelized SOMP/SSP algorithms (Algorithms 1 and 2) proposed in the next section, respectively; the details are thus omitted herein.
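A minimal KOMP-style sketch operating purely on kernel values via (10)–(13), with the final coefficients obtained from the projection (11), is shown below. It assumes precomputed K_A and k_{A,x} (e.g., from the RBF helper above), omits the regularization term λI introduced in Section III-B, and is an illustration rather than the authors' implementation.

```python
import numpy as np

def komp(K_A, k_Ax, K0):
    """Kernelized OMP sketch: greedy support selection via (10)-(13),
    then coefficients on the final support via the projection (11)."""
    N = K_A.shape[0]
    support = []
    for _ in range(K0):
        if support:
            # Correlation of each atom with the implicit residual, eq. (13).
            G_inv = np.linalg.inv(K_A[np.ix_(support, support)])
            proj = G_inv @ k_Ax[support]
            corr = k_Ax - K_A[:, support] @ proj
        else:
            # First iteration: correlation with phi(x) itself, eq. (10).
            corr = k_Ax.copy()
        corr[support] = 0.0                       # do not reselect chosen atoms
        support.append(int(np.argmax(np.abs(corr))))
    # Entries of alpha on the support, eq. (11).
    alpha = np.zeros(N)
    alpha[support] = np.linalg.solve(K_A[np.ix_(support, support)],
                                     k_Ax[support])
    return alpha, support
```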
Once the sparse vector $\hat{\boldsymbol{\alpha}}$ is recovered, the residual between the test sample and the $m$th-class reconstruction in the high-dimensional feature space is computed by
$$r_m(\boldsymbol{x}) = \left\|\phi(\boldsymbol{x}) - (\boldsymbol{A}_\phi)_{:,\Omega_m}\hat{\boldsymbol{\alpha}}_{\Omega_m}\right\| = \left\langle \phi(\boldsymbol{x}) - (\boldsymbol{A}_\phi)_{:,\Omega_m}\hat{\boldsymbol{\alpha}}_{\Omega_m},\ \phi(\boldsymbol{x}) - (\boldsymbol{A}_\phi)_{:,\Omega_m}\hat{\boldsymbol{\alpha}}_{\Omega_m} \right\rangle^{1/2}$$
$$= \left(\kappa(\boldsymbol{x},\boldsymbol{x}) - 2\,\hat{\boldsymbol{\alpha}}_{\Omega_m}^{T}\left(\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}\right)_{\Omega_m} + \hat{\boldsymbol{\alpha}}_{\Omega_m}^{T}(\boldsymbol{K}_{\boldsymbol{A}})_{\Omega_m,\Omega_m}\hat{\boldsymbol{\alpha}}_{\Omega_m}\right)^{1/2}, \quad (14)$$
where $\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}$ and $\boldsymbol{K}_{\boldsymbol{A}}$ are as defined above, and $\Omega_m$ is the index set associated with the $m$th class. The class label of $\boldsymbol{x}$ is determined as
$$\mathrm{Class}(\boldsymbol{x}) = \arg\min_{m=1,\ldots,M} r_m(\boldsymbol{x}). \quad (15)$$
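Given the recovered sparse vector and the same kernel quantities, the decision rule (14)–(15) can be evaluated without ever forming φ(x) explicitly; the sketch below assumes a labels array mapping each atom to its class, as in the earlier examples.

```python
import numpy as np

def kernel_class_residuals(alpha, K_A, k_Ax, kxx, labels):
    """Per-class residual r_m(x) of eq. (14); kxx = kappa(x, x)."""
    classes = np.unique(labels)
    res = []
    for m in classes:
        idx = np.where(labels == m)[0]            # index set Omega_m
        a_m = alpha[idx]
        r2 = (kxx - 2.0 * a_m @ k_Ax[idx]
              + a_m @ K_A[np.ix_(idx, idx)] @ a_m)
        res.append(np.sqrt(max(r2, 0.0)))         # guard tiny negative round-off
    return classes, np.array(res)

def classify_kernel(alpha, K_A, k_Ax, kxx, labels):
    """Minimal-residual decision rule of eq. (15)."""
    classes, res = kernel_class_residuals(alpha, K_A, k_Ax, kxx, labels)
    return classes[int(np.argmin(res))]
```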
B. Joint Sparsity in Feature Space
The joint sparsity model in (4) can also be extended to the
feature space as follows:
$$\boldsymbol{X}_\phi = \left[\phi(\boldsymbol{x}_1)\ \cdots\ \phi(\boldsymbol{x}_T)\right] = \left[\boldsymbol{A}_\phi\boldsymbol{\alpha}_1\ \cdots\ \boldsymbol{A}_\phi\boldsymbol{\alpha}_T\right] = \boldsymbol{A}_\phi\underbrace{\left[\boldsymbol{\alpha}_1\ \cdots\ \boldsymbol{\alpha}_T\right]}_{\boldsymbol{S}} = \boldsymbol{A}_\phi\boldsymbol{S}, \quad (16)$$
where the vectors $\{\boldsymbol{\alpha}_t\}_{t=1,\ldots,T}$ share the same support. The row-sparse matrix $\boldsymbol{S}$ is recovered by solving
$$\hat{\boldsymbol{S}} = \arg\min \left\|\boldsymbol{X}_\phi - \boldsymbol{A}_\phi\boldsymbol{S}\right\|_F \quad \text{subject to} \quad \|\boldsymbol{S}\|_{\mathrm{row},0} \le K_0. \quad (17)$$
In this paper, we propose the kernelized SOMP (KSOMP) and the kernelized SSP (KSSP) algorithms in order to approximately solve the joint sparse recovery problem in (17). In KSOMP, at every iteration, the atom that simultaneously yields the best approximation to all of the $T$ pixels (or residuals after initialization) is selected. Specifically, let $\boldsymbol{C} \in \mathbb{R}^{N \times T}$ be the correlation matrix whose $(i,j)$th entry is the correlation between $\phi(\boldsymbol{a}_i)$ and $\phi(\boldsymbol{r}_j)$, where $\phi(\boldsymbol{r}_j)$ is the residual vector of $\phi(\boldsymbol{x}_j)$. The new atom is then selected as the one associated with the row of $\boldsymbol{C}$ that has the maximal $\ell_p$-norm for some $p \ge 1$. The KSOMP algorithm is summarized in Algorithm 1. Note that when computing the projection in (11) and the correlation in (13), a regularization term $\lambda\boldsymbol{I}$ is added in order to obtain a stable inversion, where $\lambda$ is typically a small scalar (e.g., on the order of $10^{-5}$) and $\boldsymbol{I}$ is an identity matrix whose dimensionality should be clear from the context.
Input: $B \times N$ dictionary $\boldsymbol{A} = [\boldsymbol{a}_1\ \cdots\ \boldsymbol{a}_N]$, $B \times T$ data matrix $\boldsymbol{X} = [\boldsymbol{x}_1\ \cdots\ \boldsymbol{x}_T]$, kernel function $\kappa$, and a stopping criterion
Initialization: compute the kernel matrix $\boldsymbol{K}_{\boldsymbol{A}} \in \mathbb{R}^{N \times N}$ whose $(i,j)$th entry is $\kappa(\boldsymbol{a}_i,\boldsymbol{a}_j)$ and the matrix $\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}} \in \mathbb{R}^{N \times T}$ whose $(i,j)$th entry is $\kappa(\boldsymbol{a}_i,\boldsymbol{x}_j)$. Set the index set $\Lambda_0 = \arg\max_{i=1,\ldots,N} \left\|(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{i,:}\right\|_p$ with some $p \ge 1$ and the iteration counter $t = 1$.
while stopping criterion has not been met do
(1) Compute the correlation matrix $\boldsymbol{C} = \boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}} - (\boldsymbol{K}_{\boldsymbol{A}})_{:,\Lambda_{t-1}}\left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda_{t-1},\Lambda_{t-1}} + \lambda\boldsymbol{I}\right)^{-1}(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{\Lambda_{t-1},:} \in \mathbb{R}^{N \times T}$
(2) Select the new index as $\lambda_t = \arg\max_{i=1,\ldots,N} \|\boldsymbol{C}_{i,:}\|_p$, $p \ge 1$
(3) Update the index set $\Lambda_t = \Lambda_{t-1} \cup \{\lambda_t\}$
(4) $t \leftarrow t + 1$
end while
Output: Index set $\Lambda = \Lambda_{t-1}$ and the sparse representation $\hat{\boldsymbol{S}}$, whose nonzero rows indexed by $\Lambda$ are $\hat{\boldsymbol{S}}_{\Lambda,:} = \left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda,\Lambda} + \lambda\boldsymbol{I}\right)^{-1}(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{\Lambda,:}$
Algorithm 1: Kernelized Simultaneous Orthogonal Matching Pursuit (KSOMP)
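The following NumPy sketch mirrors the steps of Algorithm 1, using the ℓ2-norm for the ℓp row selection and a fixed number of iterations as the stopping criterion; it is a simplified illustration of KSOMP, not a reference implementation.

```python
import numpy as np

def ksomp(K_A, K_AX, K0, lam=1e-5):
    """Kernelized SOMP (Algorithm 1 sketch).
    K_A:  N x N kernel matrix of the dictionary atoms.
    K_AX: N x T kernel matrix between atoms and the T neighboring pixels."""
    N, T = K_AX.shape
    # Initialization: index maximizing the l2-norm of the rows of K_AX.
    support = [int(np.argmax(np.linalg.norm(K_AX, axis=1)))]
    for _ in range(K0 - 1):
        # Step (1): correlation matrix C between atoms and current residuals.
        G = K_A[np.ix_(support, support)] + lam * np.eye(len(support))
        C = K_AX - K_A[:, support] @ np.linalg.solve(G, K_AX[support, :])
        # Step (2): row of C with the largest l2-norm.
        scores = np.linalg.norm(C, axis=1)
        scores[support] = -np.inf                 # avoid reselecting an index
        # Step (3): augment the support set.
        support.append(int(np.argmax(scores)))
    # Output: nonzero rows of S on the final support.
    G = K_A[np.ix_(support, support)] + lam * np.eye(len(support))
    S = np.zeros((N, T))
    S[support, :] = np.linalg.solve(G, K_AX[support, :])
    return S, support
```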
Similarly, KSSP is a simultaneous version of KSP in which the $K_0$ atoms that best simultaneously approximate all of the $T$ residuals in terms of the $\ell_p$-norm are chosen. The KSSP algorithm is summarized in Algorithm 2. Note that the step for computing the residual vectors (12) is incorporated into the computation of the correlation vector in Step (1) of both KSOMP and KSSP.
Once the matrix $\hat{\boldsymbol{S}}$ is recovered, the total residual between the $T$ neighboring pixels and their approximations from the $m$th-class training samples is computed by
$$r_m(\boldsymbol{x}_1) = \left(\sum_{i=1}^{T}\left(\kappa(\boldsymbol{x}_i,\boldsymbol{x}_i) - 2\,\hat{\boldsymbol{S}}_{\Omega_m,i}^{T}\left(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}}\right)_{\Omega_m,i} + \hat{\boldsymbol{S}}_{\Omega_m,i}^{T}(\boldsymbol{K}_{\boldsymbol{A}})_{\Omega_m,\Omega_m}\hat{\boldsymbol{S}}_{\Omega_m,i}\right)\right)^{1/2}, \quad (18)$$
where $\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}}$ and $\boldsymbol{K}_{\boldsymbol{A}}$ are as defined in Algorithms 1 and 2, and $\Omega_m \subset \{1,2,\ldots,N\}$ is the index set associated with the $m$th class. The label for the center pixel $\boldsymbol{x}_1$ is then determined by the total residual
$$\mathrm{Class}(\boldsymbol{x}_1) = \arg\min_{m=1,\ldots,M} r_m(\boldsymbol{x}_1). \quad (19)$$
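Once the row-sparse matrix has been recovered by KSOMP or KSSP, the labeling rule (18)–(19) again reduces to kernel evaluations. The sketch below follows the variable conventions of the earlier examples (a labels array over the atoms and a vector kxx of self-similarities) and is an illustrative assumption rather than the paper's code.

```python
import numpy as np

def joint_kernel_classify(S, K_A, K_AX, kxx, labels):
    """Total residual r_m(x_1) of eq. (18) and the decision rule (19).
    S:    N x T recovered row-sparse matrix.
    kxx:  length-T vector with kxx[i] = kappa(x_i, x_i)."""
    classes = np.unique(labels)
    residuals = []
    for m in classes:
        idx = np.where(labels == m)[0]            # index set Omega_m
        Sm = S[idx, :]                            # rows of S for class m
        Km = K_A[np.ix_(idx, idx)]
        r2 = 0.0
        for i in range(S.shape[1]):
            s = Sm[:, i]
            r2 += kxx[i] - 2.0 * s @ K_AX[idx, i] + s @ Km @ s
        residuals.append(np.sqrt(max(r2, 0.0)))
    return classes[int(np.argmin(residuals))]
```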
Input: $B \times N$ dictionary $\boldsymbol{A} = [\boldsymbol{a}_1\ \cdots\ \boldsymbol{a}_N]$, $B \times T$ data matrix $\boldsymbol{X} = [\boldsymbol{x}_1\ \cdots\ \boldsymbol{x}_T]$, kernel function $\kappa$, and a stopping criterion
Initialization: compute the kernel matrices $\boldsymbol{K}_{\boldsymbol{A}}$ and $\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}} \in \mathbb{R}^{N \times T}$ as in Algorithm 1 (Initialization). Set the index set $\Lambda_0 = \{K_0$ indices corresponding to the $K_0$ largest values of $\|(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{i,:}\|_p$, $p \ge 1$, $i = 1,\ldots,N\}$, and set the iteration counter $t = 1$.
while stopping criterion has not been met do
(1) Compute the correlation matrix $\boldsymbol{C} = \boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}} - (\boldsymbol{K}_{\boldsymbol{A}})_{:,\Lambda_{t-1}}\left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda_{t-1},\Lambda_{t-1}} + \lambda\boldsymbol{I}\right)^{-1}(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{\Lambda_{t-1},:} \in \mathbb{R}^{N \times T}$
(2) Find the index set $I = \{K_0$ indices corresponding to the $K_0$ largest values of $\|\boldsymbol{C}_{i,:}\|_p$, $p \ge 1$, $i = 1,\ldots,N\}$
(3) Update the candidate index set $\tilde{\Lambda}_t = \Lambda_{t-1} \cup I$
(4) Compute the projection coefficients $\boldsymbol{P} = \left((\boldsymbol{K}_{\boldsymbol{A}})_{\tilde{\Lambda}_t,\tilde{\Lambda}_t} + \lambda\boldsymbol{I}\right)^{-1}(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{\tilde{\Lambda}_t,:} \in \mathbb{R}^{2K_0 \times T}$
(5) Update the index set $\Lambda_t = \{K_0$ indices in $\tilde{\Lambda}_t$ corresponding to the $K_0$ largest values of $\|\boldsymbol{P}_{i,:}\|_p$, $p \ge 1\}$
(6) $t \leftarrow t + 1$
end while
Output: Index set $\Lambda = \Lambda_{t-1}$ and the sparse representation $\hat{\boldsymbol{S}}$, whose nonzero rows indexed by $\Lambda$ are $\hat{\boldsymbol{S}}_{\Lambda,:} = \left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda,\Lambda} + \lambda\boldsymbol{I}\right)^{-1}(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{\Lambda,:}$
Algorithm 2: Kernelized Simultaneous Subspace Pursuit (KSSP)

C. Kernel Sparse Representation with a Composite Kernel
Another way to address the contextual correlation within HSI is through a composite kernel [7], which takes into account the spatial correlation between neighboring pixels by combining kernels dedicated to the spectral and spatial information. The composite kernel approach has been shown
to significantly outperform the spectral-only classifier in HSI
classification [42]. This method, although originally proposed
for SVM, can be readily incorporated into other classifiers
which operate in the feature space, such as kernel logistic
regression and the kernel sparse representation-based classifier
proposed in this paper. Specifically, let $\boldsymbol{x}_i^w$ be the spectral pixel at location $i$ in a hyperspectral image and $\boldsymbol{x}_i^s$ be the spatial information extracted from a small neighborhood centered at location $i$, which is usually the mean and/or the standard deviation of the pixels within the neighborhood. The new pixel entity at this location can be redefined as $\boldsymbol{x}_i = \{\boldsymbol{x}_i^w, \boldsymbol{x}_i^s\}$. Note that in the previous sections $\boldsymbol{x}_i$ contains only spectral information (i.e., $\boldsymbol{x}_i = \boldsymbol{x}_i^w$). The spectral and spatial information can then be combined in a variety of ways, including stacking, direct summation, weighted summation, and cross-information kernels [7]. In this paper, we consider the weighted summation kernel, which is shown to yield the best classification performance compared to the other types of composite kernels [7]. The kernel function in this case is
$$\kappa(\boldsymbol{x}_i,\boldsymbol{x}_j) = \mu\,\kappa_s\!\left(\boldsymbol{x}_i^s,\boldsymbol{x}_j^s\right) + (1-\mu)\,\kappa_w\!\left(\boldsymbol{x}_i^w,\boldsymbol{x}_j^w\right), \quad (20)$$
where $\mu \in (0,1)$, and $\kappa_s$ and $\kappa_w$ are the kernel functions of the spatial and spectral features, respectively.
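Equation (20) amounts to mixing two precomputed kernel matrices. A self-contained sketch with RBF kernels for both κ_s and κ_w (the choice of kernels, µ, and γ values here are illustrative assumptions) is given below.

```python
import numpy as np

def rbf(X, Y, gamma):
    """RBF kernel matrix (rows of X and Y are samples)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def composite_kernel(Xw_a, Xw_b, Xs_a, Xs_b, mu=0.4, gamma_w=0.5, gamma_s=0.5):
    """Weighted-summation composite kernel of eq. (20):
    mu * kappa_s(spatial) + (1 - mu) * kappa_w(spectral)."""
    return mu * rbf(Xs_a, Xs_b, gamma_s) + (1.0 - mu) * rbf(Xw_a, Xw_b, gamma_w)
```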
The composite kernels can be directly applied to the pixel-wise sparsity model in the feature space in (8). The sparse representation vector can be recovered using the KOMP or KSP algorithm, where the kernel matrix $\boldsymbol{K}_{\boldsymbol{A}}$ is now a weighted summation of the spectral and spatial kernel matrices of the training dictionary $\boldsymbol{A}$, and the vector $\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}$ also needs to be modified accordingly.
It is worth noting that the composite kernel approach is
different from the kernel joint sparsity model discussed in
Section III-B. The joint sparsity model involves only the
spatial information of the test pixels, and no prior knowledge
about the neighbors of the training pixels is needed. On the
other hand, for the composite kernels, the spatial information
for both the training and test sets is necessary. Moreover, the joint
sparsity model does not assume a sum or average of the same
samples, but treats all pixels in a small neighborhood equally
and finds the sparsity pattern that simultaneously represents
these pixels.
IV. EXPERIMENTAL RESULTS
In this section, we show the effectiveness of the proposed
algorithms on classification of several hyperspectral datasets.
For each image, we solve the sparse recovery problems
in (2), (5), (9), and (17) for each test sample, and then deter-
mine the class by the minimal residual (the results are denoted
by OMP/SP, KOMP/KSP, SOMP/SSP, and KSOMP/KSSP,
respectively). The results of KOMP and KSP with composite
kernels, as discussed in Section III-C, are denoted by KOM-
PCK and KSPCK, respectively. The classification results are
then compared visually and quantitatively to those obtained
by the classical SVM classifier and sparse multinomial kernel
logistic regression (KLR). For SVM and KLR classifiers,
we use a spectral-only kernel (denoted by SVM/KLR), as
well as a composite kernel (denoted by SVMCK/KLRCK).
In all classifiers with a composite kernel, we use a weighted
summation kernel and the spatial information is the mean
of pixels in a small neighborhood. The parameters for KLR,
KLRCK, SVM, and SVMCK are obtained by cross-validation.
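For illustration, the spatial mean feature mentioned above could be extracted as in the following sketch; the window size, padding strategy, and cube dimensions are placeholders rather than the exact preprocessing used in the experiments.

```python
import numpy as np

def spatial_mean_features(cube, window=9):
    """Mean spectral vector over a window x window neighborhood per pixel.
    cube: H x W x B hyperspectral image; returns an H x W x B array."""
    H, W, B = cube.shape
    r = window // 2
    # Pad by reflection so border pixels also get a full neighborhood.
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode='reflect')
    out = np.empty_like(cube, dtype=float)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + window, j:j + window, :]
            out[i, j, :] = patch.mean(axis=(0, 1))
    return out

# Example: a random 20 x 20 cube with 10 bands (placeholder data).
cube = np.random.default_rng(0).standard_normal((20, 20, 10))
spatial = spatial_mean_features(cube, window=9)
print(spatial.shape)   # (20, 20, 10)
```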
The first hyperspectral image in our experiments is the
Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) im-
age Indian Pines [43]. The AVIRIS sensor generates 220
bands across the spectral range from 0.2 to 2.4 µm. In the
experiments, the number of bands is reduced to 200 by
removing 20 water absorption bands. This image has spatial
resolution of 20 m per pixel and spatial dimension 145×145. It
contains 16 ground-truth classes. For each class, we randomly
choose around 10% of the labeled samples for training and use
the remaining 90% for testing, as shown in Table I and Fig. 1.
Radial Basis Function (RBF) kernels are used in all kernel-
based classifiers (i.e., SVM, SVMCK, KLR, KLRCK, KOMP,
KSP, KSOMP, KSSP, KOMPCK, and KSPCK). Since this
image consists of large homogeneous regions, a large spatial window of size 9 × 9 (T = 81) is used in classifiers with a composite kernel and the joint sparsity models (4) and (16).
The classification performance for each of the 16 classes,
overall accuracy (OA), average accuracy (AA), and the κ
