Hyperspectral Image Classification via Kernel Sparse Representation

Yi Chen¹, Nasser M. Nasrabadi², Fellow, IEEE, and Trac D. Tran¹, Senior Member, IEEE

¹Department of Electrical and Computer Engineering, The Johns Hopkins University, 3400 N. Charles Street, Baltimore, MD 21218
²US Army Research Laboratory, 2800 Powder Mill Road, Adelphi, MD 20783

This work has been partially supported by the National Science Foundation (NSF) under Grants CCF-1117545 and CCF-0728893; the Army Research Office (ARO) under Grant 58110-MA-II and Grant 60219-MA; and the Office of Naval Research (ONR) under Grant N102-183-0208.
Abstract—In this paper, a novel nonlinear technique for hy-
perspectral image classification is proposed. Our approach relies
on sparsely representing a test sample in terms of all of the
training samples in a feature space induced by a kernel function.
For each test pixel in the feature space, a sparse representation
vector is obtained by decomposing the test pixel over a training
dictionary, also in the same feature space, by using a kernel-based
greedy pursuit algorithm. The recovered sparse representation
vector is then used directly to determine the class label of the
test pixel. Projecting the samples into a high-dimensional feature
space and kernelizing the sparse representation improves the
data separability between different classes, providing a higher
classification accuracy compared to the more conventional linear
sparsity-based classification algorithms. Moreover, the spatial
coherency across neighboring pixels is also incorporated through
a kernelized joint sparsity model, where all of the pixels within
a small neighborhood are jointly represented in the feature
space by selecting a few common training samples. Kernel
greedy optimization algorithms are suggested in this paper to
solve the kernel versions of the single-pixel and multi-pixel
joint sparsity-based recovery problems. Experimental results on
several hyperspectral images show that the proposed technique
outperforms the linear sparsity-based classification technique, as
well as the classical Support Vector Machines and sparse kernel
logistic regression classifiers.
I. INTRODUCTION
Hyperspectral imaging sensors capture images in hundreds
of continuous narrow spectral bands, spanning the visible to
infrared spectrum. Each pixel in a hyperspectral image (HSI)
is represented by a vector whose entries correspond to various
spectral-band responses. Different materials usually reflect
electromagnetic energy differently at specific wavelengths.
This enables discrimination of materials based on their spectral
characteristics. One of the most important applications of HSI
is image classification, where pixels are labeled to one of the
classes based on their spectral characteristics, given a small
set of training data for each class. Various techniques have
been developed for HSI classification. Among the previous
approaches, the support vector machine (SVM) [1], [2] has
proven to be a powerful tool to solve many supervised clas-
sification problems and has shown good performance in hyperspectral classification as well [3]–[5]. Variations of SVM-
based algorithms have also been proposed to improve the
classification accuracy. These variations include transductive
SVM, which exploits both labeled and unlabeled samples [6],
and SVM with composite kernels, which incorporates spatial
information directly in the SVM kernels [7]. Multinomial
logistic regression [8] is another widely used classifier, which
uses the logistic function to provide the posterior probability.
A fast algorithm for sparse multinomial logistic regression
has been developed in [9] and successfully adopted for HSI
segmentation in [10], [11]. Some of the other recent HSI
classification techniques can be found in [12]–[17]. In these re-
cent methods, a feature extraction strategy is proposed in [12]
for classification which generalizes the linear discriminant
analysis and nonparametric discriminant analysis. In [13], the
derivative information of the spectral signatures is exploited as
features and then decisions obtained from spectral reflectance
and derivative information are fused for the final decisions.
In [14], each image band is decomposed into intrinsic mode
functions (IMFs) which are adaptive to local properties via
empirical mode decomposition and then SVM is applied to
the lower-order IMFs for classification. In [15], the k-nearest-
neighbor classifier is applied to the local manifolds to exploit
the intrinsic nonlinear structure of hyperspectral images. A
semi-supervised classification algorithm is proposed in [16] in
order to use a kernel machine which is iteratively updated
by manifold regularization. In [17] the results from multi-
ple classification/segmentation techniques are fused by post-
processing to generate the final spectral-spatial classification
map. Most of the above-mentioned HSI classification
techniques do not directly incorporate the spatial or the con-
textual information into the classifier.
Recently, sparse representation [18], [19] has also been pro-
posed to solve many computer vision tasks [20]–[25], where
the usage of sparsity as a prior often leads to state-of-the-
art performance. Sparse representation has also been applied
to HSI target detection and classification [26]–[28], relying on
the observation that hyperspectral pixels belonging to the same
class approximately lie in the same low-dimensional subspace.
Thus, an unknown test pixel can be sparsely represented by
a few training samples (atoms) from a given dictionary and
the corresponding sparse representation vector will implicitly
encode the class information. The sparse representation-based classifier is different from the conventional sparse classifier
SVM in the following aspects. SVM is a discriminative model,
while the sparse representation method can be viewed as a
generative model, where the signal (pixel) is expressed as a
linear combination of atoms [19]. SVM is a binary classifier
that finds the separating hyperplane between two classes
(multi-class SVM requires a one-against-one or one-against-
all strategy). The sparse representation-based classifier, in contrast, operates from
a reconstruction point of view. The sparse decomposition of
the test pixel over the entire dictionary implicitly leads to
a competition between the subspaces (classes) and thus the
recovered sparse representation is discriminative. Moreover, in
SVM, there is an explicit training stage. The SVM classifier
is trained only once and then this classifier with its fixed
sparse support vectors is used to classify all of the test data.
On the other hand, in our proposed approach, a new sparse
representation vector is extracted for each test pixel and is thus
adaptive, representing the sparsely selected atoms which are
adapted to reconstruct the current test pixel.
Hyperspectral images are usually smooth in the sense that the
pixels in a small neighborhood represent the same material and
have similar spectral characteristics. Various techniques have
been proposed recently to exploit the contextual correlation
within HSI which have notably improved the classification
and segmentation performance. Post-processing procedures are
used in [29], [30] on the individually-labeled samples based on
certain decision rules to impose the smoothness. Markov ran-
dom fields exploit the statistical dependency among neighbor-
ing pixels and are usually applied in Bayesian approaches [11].
The composite kernel approach [7] is another way to in-
corporate the spatial information, which explicitly extracts
spatial information for each spectral pixel and then combines
the spectral and spatial information via kernel composition.
Joint sparsity model [31] is exploited in sparsity-based HSI
target detection and classification [27], [28], where the neigh-
boring pixels are simultaneously represented by a sparse
linear combination of a few common training samples. Each
pixel, although sharing the same common support, might have
weighting coefficients taking on different values. In this way,
the smoothness across neighboring spectral pixels is enforced
directly in the classification stage, and no post-processing
steps are performed. The details of composite kernels and the
joint sparsity model will be further discussed in the following
sections.
It is well known that for the classical HSI clas-
sification and target detection algorithms, the use of kernel
methods yields a significant performance improvement [5],
[32], because the kernel-based algorithms implicitly exploit
the higher-order structure of the given data which may not be
captured by the linear models. Therefore, if the data set is not
linearly separable, kernel methods [33]–[36] can be applied
to project the data into a nonlinear feature space in which
the data becomes more separable. In practical implementation,
the kernel trick [37] is often used in order to avoid explicitly
evaluating the data in the feature space.
In this paper, we propose a new HSI classification algorithm
based on kernel sparse representation by assuming that a test
pixel can be linearly represented by a few training samples in
the feature space. The kernel sparse representation vector is
then obtained by decomposing the test pixel represented in a
high dimensional feature space over a structured dictionary
consisting of training samples from all of the classes in
the same feature space. The recovered sparse vector is used
directly for classification. Although the proposed approach has
a formulation similar to previous kernel regression approaches
with a sparse prior such as kernel matching pursuit [33],
kernel basis pursuit [34], and generalized LASSO [38], the
underlying ideas are quite different. The objective of these
previous approaches is to approximate a function as a linear
combination of dictionary functions, which are the kernels
centered at the training points, by minimizing certain loss
function evaluated at these training points and subject to a
sparsity prior. Therefore, the target vector for fitting consists
of the observations of the function value at the training points,
and the dictionary is then the dictionary functions evaluated at
the training points which turns out to be the kernel matrix. In
our proposed approach, the target vector is the test pixel itself
in the feature space. It is not the similarity measure between
the test sample and training samples and may not have an
explicit expression. The dictionary also consists of the training
samples in the feature space and does not have an explicit expression either. The recovered sparse representation vector
can be viewed as a discriminative feature extracted from the
test pixel and is used directly for classification.
The contextual correlation between pixels within a small
spatial neighborhood can be incorporated into the kernel sparse
representation through the joint sparsity model [31], where all
neighboring pixels are simultaneously represented by a linear
combination of a few common training samples in the feature
space. Furthermore, the composite kernel approach [7] can
also be used with the proposed kernel sparse representation
model in order to combine spectral and spatial information.
Efficient kernel-based optimization algorithms are discussed in
this paper for the recovery of the kernel sparse representations
for both single-pixel and multi-pixel joint sparsity models.
Notation-wise, vectors and matrices are denoted by lower- and upper-case bold letters, respectively. For a vector $\boldsymbol{\alpha} \in \mathbb{R}^N$ and an index set $\Lambda \subseteq \{1,\ldots,N\}$ with $|\Lambda| = t$, $\boldsymbol{\alpha}_\Lambda \in \mathbb{R}^t$ is the portion of $\boldsymbol{\alpha}$ indexed on $\Lambda$. For a matrix $\boldsymbol{S} \in \mathbb{R}^{N_1 \times N_2}$ and index sets $\Lambda_1 \subseteq \{1,\ldots,N_1\}$ with $|\Lambda_1| = t_1$ and $\Lambda_2 \subseteq \{1,\ldots,N_2\}$ with $|\Lambda_2| = t_2$, $\boldsymbol{S}_{\Lambda_1,:} \in \mathbb{R}^{t_1 \times N_2}$ is the submatrix of $\boldsymbol{S}$ consisting of the $t_1$ rows of $\boldsymbol{S}$ indexed on $\Lambda_1$, $\boldsymbol{S}_{:,\Lambda_2} \in \mathbb{R}^{N_1 \times t_2}$ consists of the $t_2$ columns of $\boldsymbol{S}$ indexed on $\Lambda_2$, and $\boldsymbol{S}_{\Lambda_1,\Lambda_2} \in \mathbb{R}^{t_1 \times t_2}$ is formed by the rows and columns of $\boldsymbol{S}$ indexed on $\Lambda_1$ and $\Lambda_2$, respectively.
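As a concrete illustration of this indexing notation (using 0-based NumPy indexing rather than the 1-based convention above), a minimal sketch with made-up sizes and index sets might look as follows.

```python
import numpy as np

# Hypothetical sizes and index sets, chosen only to illustrate the notation.
N, N1, N2 = 6, 5, 4
alpha = np.arange(N, dtype=float)                    # a vector in R^N
S = np.arange(N1 * N2, dtype=float).reshape(N1, N2)  # a matrix in R^{N1 x N2}

Lam = np.array([1, 3, 4])    # index set Lambda, |Lambda| = t = 3
Lam1 = np.array([0, 2])      # Lambda_1, a subset of {0, ..., N1-1}
Lam2 = np.array([1, 3])      # Lambda_2, a subset of {0, ..., N2-1}

alpha_Lam = alpha[Lam]                # alpha_Lambda in R^t
S_rows = S[Lam1, :]                   # S_{Lambda_1,:} in R^{t1 x N2}
S_cols = S[:, Lam2]                   # S_{:,Lambda_2} in R^{N1 x t2}
S_sub = S[np.ix_(Lam1, Lam2)]         # S_{Lambda_1,Lambda_2} in R^{t1 x t2}

print(alpha_Lam.shape, S_rows.shape, S_cols.shape, S_sub.shape)
```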
The remainder of this paper is structured as follows. Sec-
tion II briefly introduces the sparsity-based HSI classification
technique. Section III defines the sparsity models in the feature
space, then discusses how to incorporate spatial information,
and describes the kernel sparse recovery algorithms. Experi-
mental results are shown in Section IV, and conclusions are
drawn in Section V.
II. SPARSITY-BASED HSI CLASSIFICATION
This section briefly introduces the sparsity-based algorithm
for HSI classification, and more details can be found in [26]–
[28]. It is assumed that the spectral signatures of pixels belonging to the same class approximately lie in the same low-dimensional subspace. Thus, an unknown test sample $\boldsymbol{x} \in \mathbb{R}^B$, where $B$ is the number of spectral bands, can be written as a sparse linear combination of all of the training pixels as
$$\boldsymbol{x} = \boldsymbol{A}\boldsymbol{\alpha}, \quad (1)$$
where $\boldsymbol{A} = [\boldsymbol{a}_1\ \boldsymbol{a}_2\ \cdots\ \boldsymbol{a}_N] \in \mathbb{R}^{B \times N}$ is a structured dictionary whose columns $\{\boldsymbol{a}_i\}_{i=1,2,\ldots,N}$ are $N$ training samples (referred to as atoms) from all classes, and $\boldsymbol{\alpha} \in \mathbb{R}^N$ is an unknown sparse vector. The index set on which $\boldsymbol{\alpha}$ has nonzero entries is the support of $\boldsymbol{\alpha}$. The number of nonzero entries in $\boldsymbol{\alpha}$ is called the sparsity level $K$ of $\boldsymbol{\alpha}$, denoted by $K = \|\boldsymbol{\alpha}\|_0$. Given the dictionary $\boldsymbol{A}$, the sparse coefficient vector $\boldsymbol{\alpha}$ is obtained by solving
$$\hat{\boldsymbol{\alpha}} = \arg\min \|\boldsymbol{x} - \boldsymbol{A}\boldsymbol{\alpha}\|_2 \quad \text{subject to} \quad \|\boldsymbol{\alpha}\|_0 \le K_0, \quad (2)$$
where $K_0$ is a preset upper bound on the sparsity level.
The problem in (2) is NP-hard, which can be approximately
solved by greedy algorithms, such as Orthogonal Matching
Pursuit (OMP) [39] or Subspace Pursuit (SP) [40]. The class
label of $\boldsymbol{x}$ is determined by the minimal residual between $\boldsymbol{x}$ and its approximation from each class sub-dictionary:
$$\mathrm{Class}(\boldsymbol{x}) = \arg\min_{m=1,\ldots,M} \left\|\boldsymbol{x} - \boldsymbol{A}_{:,\Omega_m}\hat{\boldsymbol{\alpha}}_{\Omega_m}\right\|_2, \quad (3)$$
where $\Omega_m \subset \{1,2,\ldots,N\}$ is the index set associated with the training samples belonging to the $m$th class. As pointed out in [25], the sparse representation-based classifier can be viewed as a generalization of the nearest neighbor classifier [41].
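To make the pixel-wise procedure (1)–(3) concrete, the following NumPy sketch pairs a basic OMP routine for problem (2) with the minimal-residual rule (3). It is an illustrative sketch with made-up data, not the implementation used in this paper.

```python
import numpy as np

def omp(A, x, K0):
    """Greedy OMP approximation of (2): find a K0-sparse alpha with x ~ A @ alpha."""
    B, N = A.shape
    residual = x.copy()
    support = []
    alpha = np.zeros(N)
    for _ in range(K0):
        # Atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(A.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Least-squares fit on the current support, then update the residual.
        coef, *_ = np.linalg.lstsq(A[:, support], x, rcond=None)
        residual = x - A[:, support] @ coef
    alpha[support] = coef
    return alpha

def classify_pixel(A, labels, x, K0=5):
    """Minimal-residual rule (3); labels[i] is the class of dictionary atom i."""
    alpha = omp(A, x, K0)
    classes = np.unique(labels)
    residuals = [np.linalg.norm(x - A[:, labels == m] @ alpha[labels == m])
                 for m in classes]
    return classes[int(np.argmin(residuals))]

# Toy example with random placeholder data (not real hyperspectral pixels).
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 40))               # 40 training atoms, 50 bands
A /= np.linalg.norm(A, axis=0)                  # l2-normalized atoms
labels = np.repeat(np.arange(4), 10)            # 4 classes, 10 atoms each
x = A[:, 3] + 0.01 * rng.standard_normal(50)    # noisy copy of a class-0 atom
print(classify_pixel(A, labels, x))
```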
In HSI, pixels within a small neighborhood usually consist
of similar materials and, thus, their spectral characteristics are
highly correlated. The spatial correlation between neighboring
pixels can be incorporated through a joint sparsity model [27],
[31] by assuming the underlying sparse vectors associated with
these pixels share a common sparsity pattern as follows. Let
$\{\boldsymbol{x}_t\}_{t=1,\ldots,T}$ be $T$ pixels in a spatial neighborhood centered at $\boldsymbol{x}_1$. These pixels can be compactly represented as
$$\boldsymbol{X} = [\boldsymbol{x}_1\ \boldsymbol{x}_2\ \cdots\ \boldsymbol{x}_T] = [\boldsymbol{A}\boldsymbol{\alpha}_1\ \boldsymbol{A}\boldsymbol{\alpha}_2\ \cdots\ \boldsymbol{A}\boldsymbol{\alpha}_T] = \boldsymbol{A}\underbrace{[\boldsymbol{\alpha}_1\ \boldsymbol{\alpha}_2\ \cdots\ \boldsymbol{\alpha}_T]}_{\boldsymbol{S}} = \boldsymbol{A}\boldsymbol{S}. \quad (4)$$
In the joint sparsity model, the sparse vectors $\{\boldsymbol{\alpha}_t\}_{t=1,\ldots,T}$ share the same support $\Lambda$ and, thus, $\boldsymbol{S}$ is a sparse matrix with only $|\Lambda|$ nonzero rows. The row-sparse matrix $\boldsymbol{S}$ can be recovered by solving the following optimization problem:
$$\hat{\boldsymbol{S}} = \arg\min \|\boldsymbol{X} - \boldsymbol{A}\boldsymbol{S}\|_F \quad \text{subject to} \quad \|\boldsymbol{S}\|_{\mathrm{row},0} \le K_0, \quad (5)$$
where $\|\boldsymbol{S}\|_{\mathrm{row},0}$ denotes the number of nonzero rows of $\boldsymbol{S}$ and $\|\cdot\|_F$ denotes the Frobenius norm. The problem in (5) can be approximately solved by the simultaneous versions of OMP (SOMP) [31] or SP (SSP) [28]. The label of the center pixel $\boldsymbol{x}_1$ is then determined by the minimal total residual:
$$\mathrm{Class}(\boldsymbol{x}_1) = \arg\min_{m=1,\ldots,M} \left\|\boldsymbol{X} - \boldsymbol{A}_{:,\Omega_m}\hat{\boldsymbol{S}}_{\Omega_m,:}\right\|_F. \quad (6)$$
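A corresponding sketch for the joint model (4)–(6) is given below; the SOMP-style greedy loop and the ℓ2 row-selection rule are illustrative choices, not the exact SOMP/SSP procedures of [28], [31].

```python
import numpy as np

def somp(A, X, K0):
    """Greedy row-sparse approximation of (5): X ~ A @ S with K0 nonzero rows of S."""
    N = A.shape[1]
    T = X.shape[1]
    support = []
    R = X.copy()                                   # residual matrix
    S = np.zeros((N, T))
    for _ in range(K0):
        # Correlations of every atom with all T residuals; pick the row
        # with the largest l2-norm (one common choice of l_p norm).
        C = A.T @ R
        idx = int(np.argmax(np.linalg.norm(C, axis=1)))
        if idx not in support:
            support.append(idx)
        coef, *_ = np.linalg.lstsq(A[:, support], X, rcond=None)
        R = X - A[:, support] @ coef
    S[support, :] = coef
    return S

def classify_center(A, labels, X, K0=5):
    """Minimal total residual rule (6) for the neighborhood matrix X."""
    S = somp(A, X, K0)
    classes = np.unique(labels)
    res = [np.linalg.norm(X - A[:, labels == m] @ S[labels == m, :], 'fro')
           for m in classes]
    return classes[int(np.argmin(res))]
```

Usage mirrors the pixel-wise example above, with the columns of X holding the T pixels of the neighborhood.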
III. KERNEL SPARSE REPRESENTATION
If the classes in the dataset are not linearly separable,
then the kernel methods can be used to project the data
into a feature space, in which the classes become linearly
separable [1]. The kernel function $\kappa: \mathbb{R}^B \times \mathbb{R}^B \mapsto \mathbb{R}$ is defined as the inner product
$$\kappa(\boldsymbol{x}_i,\boldsymbol{x}_j) = \left\langle \phi(\boldsymbol{x}_i), \phi(\boldsymbol{x}_j) \right\rangle. \quad (7)$$
Commonly used kernels include the radial basis function (RBF) kernel $\kappa(\boldsymbol{x}_i,\boldsymbol{x}_j) = \exp\!\left(-\gamma\|\boldsymbol{x}_i - \boldsymbol{x}_j\|^2\right)$ with $\gamma > 0$ controlling the width of the RBF, and the order-$d$ homogeneous and inhomogeneous polynomial kernels $\kappa(\boldsymbol{x}_i,\boldsymbol{x}_j) = (\boldsymbol{x}_i \cdot \boldsymbol{x}_j)^d$ and $\kappa(\boldsymbol{x}_i,\boldsymbol{x}_j) = (\boldsymbol{x}_i \cdot \boldsymbol{x}_j + 1)^d$
, respectively. In this section, we
describe how the sparsity models in Section II can be extended
to a feature space induced by a kernel function.
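For reference, the kernels listed above can be computed as follows; this is a generic sketch (rows are samples, and the values of γ and d are arbitrary examples), not code from the paper.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """RBF kernel matrix: K[i, j] = exp(-gamma * ||x_i - y_j||^2)."""
    sq = (np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def poly_kernel(X, Y, d=2, inhomogeneous=True):
    """Order-d polynomial kernel: (x . y + c)^d with c = 1 (inhomogeneous) or 0."""
    c = 1.0 if inhomogeneous else 0.0
    return (X @ Y.T + c) ** d

# Example: kernel matrix K_A between training atoms (rows are samples here).
rng = np.random.default_rng(0)
atoms = rng.standard_normal((40, 50))     # 40 atoms with 50 spectral bands
K_A = rbf_kernel(atoms, atoms, gamma=0.5)
print(K_A.shape)                          # (40, 40)
```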
A. Pixel-wise Sparsity in Feature Space
Let $\boldsymbol{x} \in \mathbb{R}^B$ be the data point of interest and $\phi(\boldsymbol{x})$ be its representation in the feature space. The kernel sparse representation of a sample $\boldsymbol{x}$ in terms of the training atoms $\boldsymbol{a}_i$'s can be formulated as
$$\phi(\boldsymbol{x}) = \underbrace{\left[\phi(\boldsymbol{a}_1)\ \cdots\ \phi(\boldsymbol{a}_N)\right]}_{\boldsymbol{A}_\phi}\,\underbrace{\left[\alpha_1\ \cdots\ \alpha_N\right]^T}_{\boldsymbol{\alpha}} = \boldsymbol{A}_\phi\boldsymbol{\alpha}, \quad (8)$$
where the columns of $\boldsymbol{A}_\phi$ are the representations of the training samples in the feature space and $\boldsymbol{\alpha}$ is assumed to be a sparse vector.
Similar to the linear sparse recovery problem in (2), $\boldsymbol{\alpha}$ can be recovered by solving
$$\hat{\boldsymbol{\alpha}} = \arg\min \left\|\phi(\boldsymbol{x}) - \boldsymbol{A}_\phi\boldsymbol{\alpha}\right\|_2 \quad \text{subject to} \quad \|\boldsymbol{\alpha}\|_0 \le K_0. \quad (9)$$
The problem in (9) can be approximately solved by kernelizing the OMP and SP algorithms (denoted by KOMP and KSP, respectively). Note that in the above problem formulation, we are solving for the sparse vector $\boldsymbol{\alpha}$ directly in the feature space using the implicit feature vectors, rather than evaluating the kernel functions at the training points.
In KOMP and KSP, essentially each dot product operation in OMP/SP is replaced by the kernel trick in (7). Let $\boldsymbol{K}_{\boldsymbol{A}} \in \mathbb{R}^{N \times N}$ be the kernel matrix whose $(i,j)$th entry is $\kappa(\boldsymbol{a}_i,\boldsymbol{a}_j)$, and $\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}} \in \mathbb{R}^N$ be the vector whose $i$th entry is $\kappa(\boldsymbol{a}_i,\boldsymbol{x})$. Using the feature representations, the correlation (dot product) between a pixel $\phi(\boldsymbol{x})$ and a dictionary atom $\phi(\boldsymbol{a}_i)$ is then computed by
$$c_i = \left\langle \phi(\boldsymbol{x}), \phi(\boldsymbol{a}_i) \right\rangle = \kappa(\boldsymbol{x},\boldsymbol{a}_i) = \left(\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}\right)_i, \quad (10)$$
the orthogonal projection coefficient of $\phi(\boldsymbol{x})$ onto a set of selected dictionary atoms $\{\phi(\boldsymbol{a}_n)\}_{n\in\Lambda}$ is given as
$$\boldsymbol{p}_\Lambda = \left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda,\Lambda}\right)^{-1}\left(\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}\right)_\Lambda, \quad (11)$$
and the residual vector between $\phi(\boldsymbol{x})$ and its approximation using the selected atoms $\{\phi(\boldsymbol{a}_n)\}_{n\in\Lambda} = (\boldsymbol{A}_\phi)_{:,\Lambda}$ is then expressed as
$$\phi(\boldsymbol{r}) = \phi(\boldsymbol{x}) - (\boldsymbol{A}_\phi)_{:,\Lambda}\left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda,\Lambda}\right)^{-1}\left(\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}\right)_\Lambda. \quad (12)$$
Note that the feature representation of the residual vector $\phi(\boldsymbol{r})$ in (12) cannot be evaluated explicitly. However, the correlation between $\phi(\boldsymbol{r})$ and an atom $\phi(\boldsymbol{a}_i)$ can be computed by
$$c_i = \left\langle \phi(\boldsymbol{r}), \phi(\boldsymbol{a}_i) \right\rangle = \left(\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}\right)_i - (\boldsymbol{K}_{\boldsymbol{A}})_{i,\Lambda}\left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda,\Lambda}\right)^{-1}\left(\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}\right)_\Lambda. \quad (13)$$

The KOMP and KSP greedy algorithms, similar to the linear OMP and SP algorithms, are used to locate the support $\Lambda$ of the sparse vector $\hat{\boldsymbol{\alpha}}$. At each iteration, the KOMP algorithm augments the support set by a single index, given by $\lambda = \arg\max_{i=1,\ldots,N} c_i$ with $c_i$ defined in (13) and $\phi(\boldsymbol{r})$ being the residual vector from the previous iteration, until $K_0$ atoms are selected or the approximation error (i.e., the norm of the residual vector in (12)) falls below a preset threshold. The KSP algorithm maintains a set of $K_0$ indices with a backtracking mechanism. At each iteration, the index set is refined by adding $K_0$ new candidates, whose associated atoms have the $K_0$ highest correlations (13) with the residual vector from the previous iteration, and then discarding the $K_0$ least significant ones from the resulting list of $2K_0$ candidates. This process repeats until a certain stopping criterion is met. In both the KOMP and KSP algorithms, after the support set $\Lambda$ of $\hat{\boldsymbol{\alpha}}$ is determined, the entries of $\hat{\boldsymbol{\alpha}}$ indexed on $\Lambda$ are computed by the orthogonal projection of the test pixel onto the selected dictionary atoms using (11). The KOMP/KSP algorithms can be viewed as special cases, with $T = 1$, of the kernelized SOMP/SSP algorithms (Algorithms 1 and 2) proposed in the next section, respectively; the details are thus omitted herein.
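A minimal KOMP-style sketch operating purely on kernel values via (10)–(13), with the final coefficients obtained from the projection (11), is shown below. It assumes precomputed K_A and k_{A,x} (e.g., from the RBF helper above), omits the regularization term λI introduced in Section III-B, and is an illustration rather than the authors' implementation.

```python
import numpy as np

def komp(K_A, k_Ax, K0):
    """Kernelized OMP sketch: greedy support selection via (10)-(13),
    then coefficients on the final support via the projection (11)."""
    N = K_A.shape[0]
    support = []
    for _ in range(K0):
        if support:
            # Correlation of each atom with the implicit residual, eq. (13).
            G_inv = np.linalg.inv(K_A[np.ix_(support, support)])
            proj = G_inv @ k_Ax[support]
            corr = k_Ax - K_A[:, support] @ proj
        else:
            # First iteration: correlation with phi(x) itself, eq. (10).
            corr = k_Ax.copy()
        corr[support] = 0.0                       # do not reselect chosen atoms
        support.append(int(np.argmax(np.abs(corr))))
    # Entries of alpha on the support, eq. (11).
    alpha = np.zeros(N)
    alpha[support] = np.linalg.solve(K_A[np.ix_(support, support)],
                                     k_Ax[support])
    return alpha, support
```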
Once the sparse vector $\hat{\boldsymbol{\alpha}}$ is recovered, the residual between the test sample and the $m$th-class reconstruction in the high-dimensional feature space is computed by
$$r_m(\boldsymbol{x}) = \left\|\phi(\boldsymbol{x}) - (\boldsymbol{A}_\phi)_{:,\Omega_m}\hat{\boldsymbol{\alpha}}_{\Omega_m}\right\| = \left\langle \phi(\boldsymbol{x}) - (\boldsymbol{A}_\phi)_{:,\Omega_m}\hat{\boldsymbol{\alpha}}_{\Omega_m},\ \phi(\boldsymbol{x}) - (\boldsymbol{A}_\phi)_{:,\Omega_m}\hat{\boldsymbol{\alpha}}_{\Omega_m} \right\rangle^{1/2}$$
$$= \left(\kappa(\boldsymbol{x},\boldsymbol{x}) - 2\,\hat{\boldsymbol{\alpha}}_{\Omega_m}^{T}\left(\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}\right)_{\Omega_m} + \hat{\boldsymbol{\alpha}}_{\Omega_m}^{T}(\boldsymbol{K}_{\boldsymbol{A}})_{\Omega_m,\Omega_m}\hat{\boldsymbol{\alpha}}_{\Omega_m}\right)^{1/2}, \quad (14)$$
where $\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}$ and $\boldsymbol{K}_{\boldsymbol{A}}$ are as defined above, and $\Omega_m$ is the index set associated with the $m$th class. The class label of $\boldsymbol{x}$ is determined as
$$\mathrm{Class}(\boldsymbol{x}) = \arg\min_{m=1,\ldots,M} r_m(\boldsymbol{x}). \quad (15)$$
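Given the recovered sparse vector and the same kernel quantities, the decision rule (14)–(15) can be evaluated without ever forming φ(x) explicitly; the sketch below assumes a labels array mapping each atom to its class, as in the earlier examples.

```python
import numpy as np

def kernel_class_residuals(alpha, K_A, k_Ax, kxx, labels):
    """Per-class residual r_m(x) of eq. (14); kxx = kappa(x, x)."""
    classes = np.unique(labels)
    res = []
    for m in classes:
        idx = np.where(labels == m)[0]            # index set Omega_m
        a_m = alpha[idx]
        r2 = (kxx - 2.0 * a_m @ k_Ax[idx]
              + a_m @ K_A[np.ix_(idx, idx)] @ a_m)
        res.append(np.sqrt(max(r2, 0.0)))         # guard tiny negative round-off
    return classes, np.array(res)

def classify_kernel(alpha, K_A, k_Ax, kxx, labels):
    """Minimal-residual decision rule of eq. (15)."""
    classes, res = kernel_class_residuals(alpha, K_A, k_Ax, kxx, labels)
    return classes[int(np.argmin(res))]
```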
B. Joint Sparsity in Feature Space
The joint sparsity model in (4) can also be extended to the
feature space as follows:
$$\boldsymbol{X}_\phi = \left[\phi(\boldsymbol{x}_1)\ \cdots\ \phi(\boldsymbol{x}_T)\right] = \left[\boldsymbol{A}_\phi\boldsymbol{\alpha}_1\ \cdots\ \boldsymbol{A}_\phi\boldsymbol{\alpha}_T\right] = \boldsymbol{A}_\phi\underbrace{\left[\boldsymbol{\alpha}_1\ \cdots\ \boldsymbol{\alpha}_T\right]}_{\boldsymbol{S}} = \boldsymbol{A}_\phi\boldsymbol{S}, \quad (16)$$
where the vectors $\{\boldsymbol{\alpha}_t\}_{t=1,\ldots,T}$ share the same support. The row-sparse matrix $\boldsymbol{S}$ is recovered by solving
$$\hat{\boldsymbol{S}} = \arg\min \left\|\boldsymbol{X}_\phi - \boldsymbol{A}_\phi\boldsymbol{S}\right\|_F \quad \text{subject to} \quad \|\boldsymbol{S}\|_{\mathrm{row},0} \le K_0. \quad (17)$$
In this paper, we propose the kernelized SOMP (KSOMP) and the kernelized SSP (KSSP) algorithms in order to approximately solve the joint sparse recovery problem in (17). In KSOMP, at every iteration, the atom that simultaneously yields the best approximation to all of the $T$ pixels (or residuals after initialization) is selected. Specifically, let $\boldsymbol{C} \in \mathbb{R}^{N \times T}$ be the correlation matrix whose $(i,j)$th entry is the correlation between $\phi(\boldsymbol{a}_i)$ and $\phi(\boldsymbol{r}_j)$, where $\phi(\boldsymbol{r}_j)$ is the residual vector of $\phi(\boldsymbol{x}_j)$. The new atom is then selected as the one associated with the row of $\boldsymbol{C}$ that has the maximal $\ell_p$-norm for some $p \ge 1$. The KSOMP algorithm is summarized in Algorithm 1. Note that when computing the projection in (11) and the correlation in (13), a regularization term $\lambda\boldsymbol{I}$ is added in order to obtain a stable inversion, where $\lambda$ is typically a small scalar (e.g., on the order of $10^{-5}$) and $\boldsymbol{I}$ is an identity matrix whose dimensionality should be clear from the context.
Input: $B \times N$ dictionary $\boldsymbol{A} = [\boldsymbol{a}_1\ \cdots\ \boldsymbol{a}_N]$, $B \times T$ data matrix $\boldsymbol{X} = [\boldsymbol{x}_1\ \cdots\ \boldsymbol{x}_T]$, kernel function $\kappa$, and a stopping criterion
Initialization: compute the kernel matrix $\boldsymbol{K}_{\boldsymbol{A}} \in \mathbb{R}^{N \times N}$ whose $(i,j)$th entry is $\kappa(\boldsymbol{a}_i,\boldsymbol{a}_j)$ and the matrix $\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}} \in \mathbb{R}^{N \times T}$ whose $(i,j)$th entry is $\kappa(\boldsymbol{a}_i,\boldsymbol{x}_j)$. Set the index set $\Lambda_0 = \arg\max_{i=1,\ldots,N} \left\|(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{i,:}\right\|_p$ with some $p \ge 1$ and the iteration counter $t = 1$.
while stopping criterion has not been met do
(1) Compute the correlation matrix $\boldsymbol{C} = \boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}} - (\boldsymbol{K}_{\boldsymbol{A}})_{:,\Lambda_{t-1}}\left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda_{t-1},\Lambda_{t-1}} + \lambda\boldsymbol{I}\right)^{-1}(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{\Lambda_{t-1},:} \in \mathbb{R}^{N \times T}$
(2) Select the new index as $\lambda_t = \arg\max_{i=1,\ldots,N} \|\boldsymbol{C}_{i,:}\|_p$, $p \ge 1$
(3) Update the index set $\Lambda_t = \Lambda_{t-1} \cup \{\lambda_t\}$
(4) $t \leftarrow t + 1$
end while
Output: Index set $\Lambda = \Lambda_{t-1}$ and the sparse representation $\hat{\boldsymbol{S}}$, whose nonzero rows indexed by $\Lambda$ are $\hat{\boldsymbol{S}}_{\Lambda,:} = \left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda,\Lambda} + \lambda\boldsymbol{I}\right)^{-1}(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{\Lambda,:}$
Algorithm 1: Kernelized Simultaneous Orthogonal Matching Pursuit (KSOMP)
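The following NumPy sketch mirrors the steps of Algorithm 1, using the ℓ2-norm for the ℓp row selection and a fixed number of iterations as the stopping criterion; it is a simplified illustration of KSOMP, not a reference implementation.

```python
import numpy as np

def ksomp(K_A, K_AX, K0, lam=1e-5):
    """Kernelized SOMP (Algorithm 1 sketch).
    K_A:  N x N kernel matrix of the dictionary atoms.
    K_AX: N x T kernel matrix between atoms and the T neighboring pixels."""
    N, T = K_AX.shape
    # Initialization: index maximizing the l2-norm of the rows of K_AX.
    support = [int(np.argmax(np.linalg.norm(K_AX, axis=1)))]
    for _ in range(K0 - 1):
        # Step (1): correlation matrix C between atoms and current residuals.
        G = K_A[np.ix_(support, support)] + lam * np.eye(len(support))
        C = K_AX - K_A[:, support] @ np.linalg.solve(G, K_AX[support, :])
        # Step (2): row of C with the largest l2-norm.
        scores = np.linalg.norm(C, axis=1)
        scores[support] = -np.inf                 # avoid reselecting an index
        # Step (3): augment the support set.
        support.append(int(np.argmax(scores)))
    # Output: nonzero rows of S on the final support.
    G = K_A[np.ix_(support, support)] + lam * np.eye(len(support))
    S = np.zeros((N, T))
    S[support, :] = np.linalg.solve(G, K_AX[support, :])
    return S, support
```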
Similarly, KSSP is a simultaneous version of KSP in which the $K_0$ atoms that best simultaneously approximate all of the $T$ residuals in terms of the $\ell_p$-norm are chosen. The KSSP algorithm is summarized in Algorithm 2. Note that the step for computing the residual vectors (12) is incorporated into the computation of the correlation vector in Step (1) of both KSOMP and KSSP.
Once the matrix $\hat{\boldsymbol{S}}$ is recovered, the total residual between the $T$ neighboring pixels and their approximations from the $m$th-class training samples is computed by
$$r_m(\boldsymbol{x}_1) = \left(\sum_{i=1}^{T}\left(\kappa(\boldsymbol{x}_i,\boldsymbol{x}_i) - 2\,\hat{\boldsymbol{S}}_{\Omega_m,i}^{T}\left(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}}\right)_{\Omega_m,i} + \hat{\boldsymbol{S}}_{\Omega_m,i}^{T}(\boldsymbol{K}_{\boldsymbol{A}})_{\Omega_m,\Omega_m}\hat{\boldsymbol{S}}_{\Omega_m,i}\right)\right)^{1/2}, \quad (18)$$
where $\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}}$ and $\boldsymbol{K}_{\boldsymbol{A}}$ are as defined in Algorithms 1 and 2, and $\Omega_m \subset \{1,2,\ldots,N\}$ is the index set associated with the $m$th class. The label for the center pixel $\boldsymbol{x}_1$ is then determined by the total residual
$$\mathrm{Class}(\boldsymbol{x}_1) = \arg\min_{m=1,\ldots,M} r_m(\boldsymbol{x}_1). \quad (19)$$
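Once the row-sparse matrix has been recovered by KSOMP or KSSP, the labeling rule (18)–(19) again reduces to kernel evaluations. The sketch below follows the variable conventions of the earlier examples (a labels array over the atoms and a vector kxx of self-similarities) and is an illustrative assumption rather than the paper's code.

```python
import numpy as np

def joint_kernel_classify(S, K_A, K_AX, kxx, labels):
    """Total residual r_m(x_1) of eq. (18) and the decision rule (19).
    S:    N x T recovered row-sparse matrix.
    kxx:  length-T vector with kxx[i] = kappa(x_i, x_i)."""
    classes = np.unique(labels)
    residuals = []
    for m in classes:
        idx = np.where(labels == m)[0]            # index set Omega_m
        Sm = S[idx, :]                            # rows of S for class m
        Km = K_A[np.ix_(idx, idx)]
        r2 = 0.0
        for i in range(S.shape[1]):
            s = Sm[:, i]
            r2 += kxx[i] - 2.0 * s @ K_AX[idx, i] + s @ Km @ s
        residuals.append(np.sqrt(max(r2, 0.0)))
    return classes[int(np.argmin(residuals))]
```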
Input: $B \times N$ dictionary $\boldsymbol{A} = [\boldsymbol{a}_1\ \cdots\ \boldsymbol{a}_N]$, $B \times T$ data matrix $\boldsymbol{X} = [\boldsymbol{x}_1\ \cdots\ \boldsymbol{x}_T]$, kernel function $\kappa$, and a stopping criterion
Initialization: compute the kernel matrices $\boldsymbol{K}_{\boldsymbol{A}}$ and $\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}} \in \mathbb{R}^{N \times T}$ as in Algorithm 1 (Initialization). Set the index set $\Lambda_0 = \{K_0$ indices corresponding to the $K_0$ largest values of $\|(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{i,:}\|_p$, $p \ge 1$, $i = 1,\ldots,N\}$, and set the iteration counter $t = 1$.
while stopping criterion has not been met do
(1) Compute the correlation matrix $\boldsymbol{C} = \boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}} - (\boldsymbol{K}_{\boldsymbol{A}})_{:,\Lambda_{t-1}}\left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda_{t-1},\Lambda_{t-1}} + \lambda\boldsymbol{I}\right)^{-1}(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{\Lambda_{t-1},:} \in \mathbb{R}^{N \times T}$
(2) Find the index set $I = \{K_0$ indices corresponding to the $K_0$ largest values of $\|\boldsymbol{C}_{i,:}\|_p$, $p \ge 1$, $i = 1,\ldots,N\}$
(3) Update the candidate index set $\tilde{\Lambda}_t = \Lambda_{t-1} \cup I$
(4) Compute the projection coefficients $\boldsymbol{P} = \left((\boldsymbol{K}_{\boldsymbol{A}})_{\tilde{\Lambda}_t,\tilde{\Lambda}_t} + \lambda\boldsymbol{I}\right)^{-1}(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{\tilde{\Lambda}_t,:} \in \mathbb{R}^{2K_0 \times T}$
(5) Update the index set $\Lambda_t = \{K_0$ indices in $\tilde{\Lambda}_t$ corresponding to the $K_0$ largest values of $\|\boldsymbol{P}_{i,:}\|_p$, $p \ge 1\}$
(6) $t \leftarrow t + 1$
end while
Output: Index set $\Lambda = \Lambda_{t-1}$ and the sparse representation $\hat{\boldsymbol{S}}$, whose nonzero rows indexed by $\Lambda$ are $\hat{\boldsymbol{S}}_{\Lambda,:} = \left((\boldsymbol{K}_{\boldsymbol{A}})_{\Lambda,\Lambda} + \lambda\boldsymbol{I}\right)^{-1}(\boldsymbol{K}_{\boldsymbol{A},\boldsymbol{X}})_{\Lambda,:}$
Algorithm 2: Kernelized Simultaneous Subspace Pursuit (KSSP)

C. Kernel Sparse Representation with a Composite Kernel
Another way to address the contextual correlation within HSI is through a composite kernel [7], which takes into account the spatial correlation between neighboring pixels by combining kernels dedicated to the spectral and spatial information. The composite kernel approach has been shown
to significantly outperform the spectral-only classifier in HSI
classification [42]. This method, although originally proposed
for SVM, can be readily incorporated into other classifiers
which operate in the feature space, such as kernel logistic
regression and the kernel sparse representation-based classifier
proposed in this paper. Specifically, let $\boldsymbol{x}_i^w$ be the spectral pixel at location $i$ in a hyperspectral image and $\boldsymbol{x}_i^s$ be the spatial information extracted from a small neighborhood centered at location $i$, which is usually the mean and/or the standard deviation of the pixels within the neighborhood. The new pixel entity at this location can be redefined as $\boldsymbol{x}_i = \{\boldsymbol{x}_i^w, \boldsymbol{x}_i^s\}$. Note that in the previous sections $\boldsymbol{x}_i$ contains only spectral information (i.e., $\boldsymbol{x}_i = \boldsymbol{x}_i^w$). The spectral and spatial information can then be combined in a variety of ways, including stacking, direct summation, weighted summation, and cross-information kernels [7]. In this paper, we consider the weighted summation kernel, which is shown to yield the best classification performance compared to the other types of composite kernels [7]. The kernel function in this case is
$$\kappa(\boldsymbol{x}_i,\boldsymbol{x}_j) = \mu\,\kappa_s\!\left(\boldsymbol{x}_i^s,\boldsymbol{x}_j^s\right) + (1-\mu)\,\kappa_w\!\left(\boldsymbol{x}_i^w,\boldsymbol{x}_j^w\right), \quad (20)$$
where $\mu \in (0,1)$, and $\kappa_s$ and $\kappa_w$ are the kernel functions of the spatial and spectral features, respectively.
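Equation (20) amounts to mixing two precomputed kernel matrices. A self-contained sketch with RBF kernels for both κ_s and κ_w (the choice of kernels, µ, and γ values here are illustrative assumptions) is given below.

```python
import numpy as np

def rbf(X, Y, gamma):
    """RBF kernel matrix (rows of X and Y are samples)."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def composite_kernel(Xw_a, Xw_b, Xs_a, Xs_b, mu=0.4, gamma_w=0.5, gamma_s=0.5):
    """Weighted-summation composite kernel of eq. (20):
    mu * kappa_s(spatial) + (1 - mu) * kappa_w(spectral)."""
    return mu * rbf(Xs_a, Xs_b, gamma_s) + (1.0 - mu) * rbf(Xw_a, Xw_b, gamma_w)
```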
The composite kernels can be directly applied to the pixel-wise sparsity model in the feature space in (8). The sparse representation vector can be recovered using the KOMP or KSP algorithm, where the kernel matrix $\boldsymbol{K}_{\boldsymbol{A}}$ is now a weighted summation of the spectral and spatial kernel matrices of the training dictionary $\boldsymbol{A}$, and the vector $\boldsymbol{k}_{\boldsymbol{A},\boldsymbol{x}}$ also needs to be modified accordingly.
It is worth noting that the composite kernel approach is
different from the kernel joint sparsity model discussed in
Section III-B. The joint sparsity model involves only the
spatial information of the test pixels, and no prior knowledge
about the neighbors of the training pixels is needed. On the
other hand, for the composite kernels, the spatial information
for both the training and test sets is necessary. Moreover, the joint
sparsity model does not assume a sum or average of the same
samples, but treats all pixels in a small neighborhood equally
and finds the sparsity pattern that simultaneously represents
these pixels.
IV. EXPERIMENTAL RESULTS
In this section, we show the effectiveness of the proposed
algorithms on classification of several hyperspectral datasets.
For each image, we solve the sparse recovery problems
in (2), (5), (9), and (17) for each test sample, and then deter-
mine the class by the minimal residual (the results are denoted
by OMP/SP, KOMP/KSP, SOMP/SSP, and KSOMP/KSSP,
respectively). The results of KOMP and KSP with composite
kernels, as discussed in Section III-C, are denoted by KOM-
PCK and KSPCK, respectively. The classification results are
then compared visually and quantitatively to those obtained
by the classical SVM classifier and sparse multinomial kernel
logistic regression (KLR). For SVM and KLR classifiers,
we use a spectral-only kernel (denoted by SVM/KLR), as
well as a composite kernel (denoted by SVMCK/KLRCK).
In all classifiers with a composite kernel, we use a weighted
summation kernel and the spatial information is the mean
of pixels in a small neighborhood. The parameters for KLR,
KLRCK, SVM, and SVMCK are obtained by cross-validation.
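For illustration, the spatial mean feature mentioned above could be extracted as in the following sketch; the window size, padding strategy, and cube dimensions are placeholders rather than the exact preprocessing used in the experiments.

```python
import numpy as np

def spatial_mean_features(cube, window=9):
    """Mean spectral vector over a window x window neighborhood per pixel.
    cube: H x W x B hyperspectral image; returns an H x W x B array."""
    H, W, B = cube.shape
    r = window // 2
    # Pad by reflection so border pixels also get a full neighborhood.
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode='reflect')
    out = np.empty_like(cube, dtype=float)
    for i in range(H):
        for j in range(W):
            patch = padded[i:i + window, j:j + window, :]
            out[i, j, :] = patch.mean(axis=(0, 1))
    return out

# Example: a random 20 x 20 cube with 10 bands (placeholder data).
cube = np.random.default_rng(0).standard_normal((20, 20, 10))
spatial = spatial_mean_features(cube, window=9)
print(spatial.shape)   # (20, 20, 10)
```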
The first hyperspectral image in our experiments is the
Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) im-
age Indian Pines [43]. The AVIRIS sensor generates 220
bands across the spectral range from 0.2 to 2.4 µm. In the
experiments, the number of bands is reduced to 200 by
removing 20 water absorption bands. This image has spatial
resolution of 20 m per pixel and spatial dimension 145×145. It
contains 16 ground-truth classes. For each class, we randomly
choose around 10% of the labeled samples for training and use
the remaining 90% for testing, as shown in Table I and Fig. 1.
Radial Basis Function (RBF) kernels are used in all kernel-
based classifiers (i.e., SVM, SVMCK, KLR, KLRCK, KOMP,
KSP, KSOMP, KSSP, KOMPCK, and KSPCK). Since this
image consists of large homogeneous regions, a large spatial window of size 9 × 9 (T = 81) is used in classifiers with a composite kernel and the joint sparsity models (4) and (16).
The classification performance for each of the 16 classes,
overall accuracy (OA), average accuracy (AA), and the κ
