What are the contributions in this paper?

This paper introduces a matrix cofactorization approach to perform spectral unmixing and classification jointly. After formulating the unmixing and classification tasks as matrix factorization problems, a link is introduced between the two coding matrices, namely the abundance matrix and the feature matrix.

What are the future works in this paper?

M. Belgiu and L. Drăguţ, “Random forest in remote sensing: A review of applications and future directions,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 114, pp. 24–31, 2016. [3]

What are the two conventional metrics used to evaluate the classification accuracy?

To evaluate the classification accuracy, two conventional metrics are used, namely Cohen’s kappa coefficient and the averaged F1-score over all classes [18].
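As an illustration of the two metrics named above, the following minimal sketch computes Cohen's kappa and the class-averaged F1-score from a confusion matrix. The function names and the toy labels are hypothetical and are not taken from the paper; it only assumes the standard textbook definitions of both metrics.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1.0
    return cm

def cohen_kappa(cm):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e)."""
    n = cm.sum()
    p_observed = np.trace(cm) / n
    p_expected = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / n ** 2
    return (p_observed - p_expected) / (1.0 - p_expected)

def mean_f1(cm):
    """Per-class F1 = 2 TP / (2 TP + FP + FN), averaged over classes."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    return float(np.mean(2.0 * tp / (2.0 * tp + fp + fn)))

# Toy labels (hypothetical), three classes.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)
kappa = cohen_kappa(cm)
f1 = mean_f1(cm)
```

Both functions assume that every class appears at least once in the labels or the predictions, otherwise the per-class F1 ratio is undefined.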

What is the purpose of this paper?

This paper introduces a unified framework to jointly perform spectral unmixing and classification by means of a cofactorization problem.

What is the RMSE for the unmixing?

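To evaluate the unmixing results quantitatively, the reconstruction error (RE) and root global mean squared error (RMSE) are considered, i.e., $\mathrm{RE} = \sqrt{\frac{1}{PL} \|\mathbf{Y} - \mathbf{M}\hat{\mathbf{A}}\|_F^2}$, $\mathrm{RMSE} = \sqrt{\frac{1}{PR} \|\mathbf{A}_{\mathrm{true}} - \ldots}$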

What are the results of the proposed cofactorization framework?

Results reported in Table I show that the proposed cofactorization framework outperforms both RF and D-KSVD in terms of classification.

(Open Access) Matrix Cofactorization for Joint Unmixing and Classification of Hyperspectral Images (2019) | Adrien Lagrange

Q: What is the classification rule for a linear classifier?

Considering a linear classifier parametrized by the matrix $\mathbf{Q} \in \mathbb{R}^{C \times K}$, a vector-wise nonlinear mapping φ(·), such as a sigmoid or a softmax operator, is then applied to the output of the classifier.

Q: What is the penalization of the kmeans clustering problem?

Two constraints are considered in this k-means clustering problem: i) a positivity constraint on $\mathbf{B}$ since centroids are expected to be interpretable as mean abundance vectors and ii) the vectors $\mathbf{z}_p$ ($p \in \mathcal{P}$) are assumed to be defined on the K-dimensional probability simplex $\mathbb{S}_K$.

Official URL

DOI : https://doi.org/10.23919/EUSIPCO.2019.8903037

Any correspondence concerning this service should be sent

to the repository administrator: tech-oatao@listes-diff.inp-toulouse.fr

This is an author’s version published in:

http://oatao.univ-toulouse.fr/24982

Open

Archive

Ouverte

OATAO is an open access repository that collects the work of Toulouse

researchers and makes it freely available over the web where possible

To cite this version: Lagrange, Adrien and Fauvel, Mathieu and

May, Stéphane and Bioucas-Dias, José M. and Dobigeon, Nicolas

Matrix Cofactorization for Joint Unmixing and Classification of

Hyperspectral Images. (2019) In: 27th European Signal Processing

Conference (EUSIPCO 2019), 2 September 2019 - 6 September 2019

(A Coruna, Spain).

Abstract—This paper introduces a matrix cofactorization approach to perform spectral unmixing and classification jointly. After formulating the unmixing and classification tasks as matrix factorization problems, a link is introduced between the two coding matrices, namely the abundance matrix and the feature matrix. This coupling term can be interpreted as a clustering term where the abundance vectors are clustered and the resulting attribution vectors are then used as feature vectors. The overall non-smooth, non-convex optimization problem is solved using a proximal alternating linearized minimization (PALM) algorithm, ensuring convergence to a critical point. The quality of the obtained results is finally assessed by comparison to other conventional algorithms on a semi-synthetic yet realistic dataset.

Index Terms—supervised learning, spectral unmixing, cofactorization, hyperspectral images.

I. INTRODUCTION

Following the fast increase of available remote sensing images, many methods have been proposed to extract information from such specific data. In particular, classification algorithms have received a lot of attention from the scientific community. The emergence of state-of-the-art algorithms such as convolutional neural networks [1] or random forests [2] has brought unprecedentedly good results. In the so-called supervised classification framework, these algorithms make it possible to infer a classification rule from a reduced number of examples provided by an expert. This rule is then used to attribute to unknown pixels a class among a predefined set of classes. Although very efficient, classification methods remain a limited analysis of the image since they only attribute a single class to each pixel when it is sometimes possible to extract more information. In the specific case of hyperspectral images (HSI), images capture a very rich signal since each pixel is a sampling of the reflectance spectrum of the corresponding area, typically in the visible and infrared spectral domains with hundreds of measurements. To fully exploit the available information, it is interesting to resort to alternative methods of interpretation such as representation learning methods, namely spectral unmixing in the case of HSI [3]. Spectral unmixing is a physics-based model which assumes that a given pixel, i.e., a given measured spectrum, is the result of the combination of a reduced number of elementary spectra called endmembers, each specific to a given material. The aim of unmixing methods is to infer the proportion of each material present in the pixel. The obtained abundance maps display the spatial distribution of the materials in the observed scene.

Part of this work has been supported by Centre National d'Études Spatiales (CNES), Occitanie Region, EU FP7 through the ERANETMED JC-WATER program, MapInvPlnt Project ANR-15-NMED-0002-02 and ANR-3IA Artificial and Natural Intelligence Toulouse Institute (ANITI).

Even if classification and spectral unmixing are two widely-used techniques, very few attempts have been made to combine them. Most of these works [4], [5] intend to improve classification results by using spectral unmixing to identify mixed pixels and then processing the identified mixed pixels specifically. Instead of using the two methods sequentially, the method proposed in this paper introduces the idea of a joint unmixing and classification. This method is formulated as a cofactorization problem, an approach known to produce valuable results in many application fields such as music source separation [6] and image analysis [7]. The core concept is to express the two problems of interest, namely spectral unmixing and classification, as factorization problems and then to introduce a coupling term to intertwine the two estimations. Similarly to [8], the coupling term is defined as a clustering term where the abundance vectors provided by the unmixing step are clustered and the resulting attribution vectors are then used as feature vectors for the classification. The overall optimization problem is non-convex and non-smooth. Such problems are known to be challenging to solve but, building on recent advances in optimization, the PALM algorithm proposed in [9] is used as an optimization scheme, thus guaranteeing convergence to a critical point of the objective function.

The rest of this paper is organized as follows. Section II defines the two factorization tasks and introduces the global cofactorization problem. Then, the method used to minimize the resulting criterion is presented in Section III. Finally, the method is tested and compared to other unmixing and classification methods in Section IV. Section V draws some conclusions and perspectives.

Matrix cofactorization for joint unmixing and classification of hyperspectral images

Adrien Lagrange⋆, Mathieu Fauvel†, Stéphane May‡, José M. Bioucas-Dias⋄ and Nicolas Dobigeon⋆

⋆ IRIT/INP-ENSEEIHT, University of Toulouse, Toulouse, France
† Centre d'Études Spatiales de la BIOsphère (CESBIO), INRA, Toulouse, France
‡ Centre National d'Études Spatiales (CNES), DCT/SI/AP, Toulouse, France
⋄ Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, 1049-001 Lisbon, Portugal
firstname.name@{enseeiht,inra,cnes,enseeiht}.fr, bioucas@lx.it.pt

II. PROBLEM STATEMENT

As presented in Sections II-A and II-B, spectral unmixing and supervised classification are commonly expressed as factorization problems. We propose to derive a unified framework by considering a global cofactorization problem. It relies on a link between the two factorization problems in order to perform a joint estimation. In the proposed model, the link is made between the abundance matrix and the feature matrix. More precisely, the coupling term is expressed as a clustering term over the abundance vectors where the attribution vectors to the clusters are also the feature vectors of the classification, as detailed in Section II-C.

A. Spectral unmixing

Each pixel of an HSI is an L-dimensional measurement of a reflectance spectrum. Physics models this spectrum as a combination of R elementary spectra, gathered in the so-called endmember matrix $\mathbf{M} \in \mathbb{R}^{L \times R}$, each characterizing a specific material. The spectral unmixing task aims at retrieving the so-called abundance vectors $\mathbf{a}_p \in \mathbb{R}^{R}$, with $R \ll L$, from the spectrum $\mathbf{y}_p \in \mathbb{R}^{L}$ of the pth pixel ($p \in \mathcal{P}$ where $\mathcal{P} \triangleq \{1, \ldots, P\}$ is the set of pixel indexes). These abundance vectors describe the mixture contained in the pixel. Using the conventional linear mixture model, the spectral unmixing problem can be expressed as follows

$$\min_{\mathbf{M}, \mathbf{A}} \; \frac{1}{2} \|\mathbf{Y} - \mathbf{M}\mathbf{A}\|_F^2 + \lambda_a \|\mathbf{A}\|_1 + \imath_{\mathbb{R}_+^{R \times P}}(\mathbf{A}) \qquad (1)$$

where matrix $\mathbf{Y} \in \mathbb{R}^{L \times P}$ gathers the P pixel spectra and $\mathbf{A} \in \mathbb{R}^{R \times P}$ the abundance vectors. In addition to the data fitting term, two penalization terms are considered in the proposed unmixing model. The term $\imath_{\mathbb{R}_+^{R \times P}}(\mathbf{A})$ enforces a nonnegativity constraint, ensuring an additive decomposition of the spectra. The second penalization $\lambda_a \|\mathbf{A}\|_1$ is a sparsity penalization promoting the concept that only a few endmembers are active in a given pixel. In the following work, the choice has been made to discard the estimation of the endmember matrix for the sake of simplicity. The endmember matrix is assumed to be known or estimated beforehand.
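The unmixing step with a fixed endmember matrix can be sketched as a proximal gradient iteration. The code below is only an illustration and not the authors' implementation: it assumes a squared Frobenius data-fitting term with a 1/2 factor and an ℓ1 sparsity penalty, and the function names, the random endmembers and the value of `lam_a` are hypothetical.

```python
import numpy as np

def unmixing_objective(Y, M, A, lam_a):
    """0.5 * ||Y - M A||_F^2 + lam_a * ||A||_1 (A is assumed nonnegative)."""
    residual = Y - M @ A
    return 0.5 * np.sum(residual ** 2) + lam_a * np.sum(np.abs(A))

def sparse_nonneg_unmix(Y, M, lam_a=1e-3, n_iter=500):
    """Proximal gradient iterations on A with the endmember matrix M held fixed."""
    A = np.zeros((M.shape[1], Y.shape[1]))
    # Lipschitz constant of the gradient of the data-fitting term.
    lip = np.linalg.norm(M.T @ M, 2)
    for _ in range(n_iter):
        grad = M.T @ (M @ A - Y)
        # Prox of lam_a * ||.||_1 combined with nonnegativity:
        # shift by the threshold, then clip at zero.
        A = np.maximum(A - grad / lip - lam_a / lip, 0.0)
    return A

# Synthetic data (hypothetical sizes): L bands, R endmembers, P pixels.
rng = np.random.default_rng(0)
L, R, P = 50, 5, 200
M = rng.random((L, R))
A_true = rng.dirichlet(np.ones(R) * 0.3, size=P).T
Y = M @ A_true + 0.001 * rng.standard_normal((L, P))
A_hat = sparse_nonneg_unmix(Y, M)
```

With a step size equal to the inverse Lipschitz constant, each iteration does not increase the objective, so the result can be checked against the value at the all-zero starting point.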

B. Classification

In the context of supervised classification, a subset of pixels is available with their corresponding groundtruth. The index subset of labeled pixels is denoted hereafter $\mathcal{L}$ while the index subset of unlabeled pixels is $\mathcal{U}$ ($\mathcal{L} \cap \mathcal{U} = \emptyset$ and $\mathcal{L} \cup \mathcal{U} = \mathcal{P}$). Classification intends to assign one of the C classes to each pixel. In practice, classifying can be formulated as estimating a $C \times P$ matrix $\mathbf{C}$ whose columns correspond to unknown C-dimensional attribution vectors $\mathbf{c}_p$ ($p \in \mathcal{U}$). Each vector is made of 0 except for $c_{i,p} = 1$ when the pth pixel is assigned the ith class. Numerous decision rules have been proposed to carry out classification. Most of them rely on the use of feature vectors $\mathbf{z}_p \in \mathbb{R}^{K}$ ($p \in \mathcal{P}$) associated with the P pixels, gathered in the matrix $\mathbf{Z} \in \mathbb{R}^{K \times P}$. Considering a linear classifier parametrized by the matrix $\mathbf{Q} \in \mathbb{R}^{C \times K}$, a vector-wise nonlinear mapping φ(·), such as a sigmoid or a softmax operator, is then applied to the output of the classifier. Finally, the classification rule can be expressed as the matrix factorization problem

$$\min_{\mathbf{Q}, \mathbf{C}_{\mathcal{U}}} \; J_c(\mathbf{C}, \phi(\mathbf{Q}\mathbf{Z})) + \imath_{\mathbb{S}_C^{|\mathcal{U}|}}(\mathbf{C}_{\mathcal{U}}) \qquad (2)$$

where $J_c(\cdot, \cdot)$ is a cost function measuring the quality of the estimated attribution vectors $\phi(\mathbf{Q}\mathbf{z}_p)$, $\mathbf{C}_{\mathcal{U}}$ gathers the columns of $\mathbf{C}$ indexed by $\mathcal{U}$, and $\mathbb{S}_C$ is the C-dimensional probability simplex ensuring nonnegativity and sum-to-one constraints of the attribution vectors. In this work, the cost function $J_c(\cdot, \cdot)$ has been chosen as the cross-entropy, defined in a multi-class problem as

$$J_c(\mathbf{C}, \hat{\mathbf{C}}) = - \sum_{p \in \mathcal{P}} d_p \sum_{i \in \mathcal{C}} c_{i,p} \log(\hat{c}_{i,p}) \qquad (3)$$

with

$$d_p = \begin{cases} \dfrac{1}{|\mathcal{L}_i|}, & \text{if } p \in \mathcal{L}_i \\ \dfrac{1}{|\mathcal{U}|}, & \text{if } p \in \mathcal{U}, \end{cases} \qquad (4)$$

where $\mathcal{L}_i$ is the subset of labeled pixels belonging to class i, $\hat{\mathbf{c}}_p$ is the estimated attribution vector and $\mathbf{c}_p$ the true one. The weighting coefficients $d_p$ adjust the cost function with respect to the sizes of the training and test sets, in particular in the case of unbalanced classes. This particular loss function has been extensively used in the context of neural networks [10]. Moreover, the nonlinear mapping φ(·) is chosen as a sigmoid, which makes the proposed classifier interpretable as a one-layer neural network.
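The weighted cross-entropy with a sigmoid classifier described above can be sketched as follows. This is a minimal illustration under stated assumptions: the weights are taken as the inverse of the size of the labeled class (or of the unlabeled set), unlabeled pixels are encoded with the hypothetical label -1, and all function names and toy values are introduced here for the example only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pixel_weights(labels, n_classes):
    """d_p = 1/|L_i| for labeled pixels of class i, 1/|U| for unlabeled ones (-1)."""
    labels = np.asarray(labels)
    d = np.empty(labels.shape, dtype=float)
    unlabeled = labels < 0
    if unlabeled.any():
        d[unlabeled] = 1.0 / unlabeled.sum()
    for i in range(n_classes):
        in_class = labels == i
        if in_class.any():
            d[in_class] = 1.0 / in_class.sum()
    return d

def weighted_cross_entropy(C, Q, Z, d, eps=1e-12):
    """-sum_p d_p sum_i c_ip log(sigmoid(q_i z_p)); eps guards the logarithm."""
    C_hat = sigmoid(Q @ Z)
    return -np.sum(d * np.sum(C * np.log(C_hat + eps), axis=0))

# Toy example (hypothetical): 2 classes, 2 features, 6 pixels, 3 of them unlabeled.
labels = [0, 0, 1, -1, -1, -1]
d = pixel_weights(labels, n_classes=2)
Z = np.array([[1, 1, 0, 1, 0, 1],
              [0, 0, 1, 0, 1, 0]], dtype=float)
C = np.array([[1, 1, 0, 0.5, 0.5, 0.5],
              [0, 0, 1, 0.5, 0.5, 0.5]], dtype=float)
loss_informative = weighted_cross_entropy(C, 5.0 * np.eye(2), Z, d)
loss_uninformative = weighted_cross_entropy(C, np.zeros((2, 2)), Z, d)
```

A classifier whose scores agree with the labeled pixels yields a lower loss than the all-zero classifier, which outputs 0.5 for every class.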

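To consider a more elaborate case, it is also possible to add a set of penalizations/constraints. In particular, a penalization of the classifier parameters $\mathbf{Q}$ is considered to prevent an artificial decrease of the loss function. This penalization is based on a Frobenius norm and is well-known in the neural network community where it is referred to as weight decay. The second considered penalization is a spatial regularization enforced through a smoothed weighted vectorial total variation norm (vTV). This regularization promotes a piece-wise constant solution for the classification map $\mathbf{C}$. The overall resulting problem can be written

$$\min_{\mathbf{Q}, \mathbf{C}_{\mathcal{U}}} \; - \sum_{p \in \mathcal{P}} d_p \sum_{i \in \mathcal{C}} c_{i,p} \log\left( \frac{1}{1 + \exp(-\mathbf{q}_{i:} \mathbf{z}_p)} \right) + \lambda_q \|\mathbf{Q}\|_F^2 + \lambda_c \|\mathbf{C}\|_{\mathrm{vTV}} + \imath_{\mathbb{S}_C^{|\mathcal{U}|}}(\mathbf{C}_{\mathcal{U}}) \qquad (5)$$

where $\mathbf{q}_{i:}$ is the ith row of $\mathbf{Q}$, $\lambda_q$ and $\lambda_c$ weight the regularization terms and

$$\|\mathbf{C}\|_{\mathrm{vTV}} = \sum_{p=1}^{P} \beta_{m,n} \sqrt{ \left\| [\nabla_h \mathbf{C}]_{(m,n)} \right\|_2^2 + \left\| [\nabla_v \mathbf{C}]_{(m,n)} \right\|_2^2 + \epsilon } \qquad (6)$$

where $(m, n)$ is the spatial position of the pth pixel, $\epsilon > 0$ is a smoothing parameter and $[\nabla_h(\cdot)]_{(m,n)}$ and $[\nabla_v(\cdot)]_{(m,n)}$ denote horizontal and vertical discrete gradients (1)

$$[\nabla_h \mathbf{C}]_{(m,n)} = \mathbf{c}_{(m+1,n)} - \mathbf{c}_{(m,n)}$$
$$[\nabla_v \mathbf{C}]_{(m,n)} = \mathbf{c}_{(m,n+1)} - \mathbf{c}_{(m,n)}$$

The weighting coefficients $\beta_{m,n}$ are introduced to account for the natural boundaries present in the image. They are computed beforehand using external data containing information on the spatial structures, e.g., a panchromatic image or a LIDAR image [11]. An example of such weights is described in Section IV.

(1) With a slight abuse of notation, $\mathbf{c}_{(m,n)}$ refers to the pth column of $\mathbf{C}$ where the pth pixel is spatially indexed by (m, n).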
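To make the spatial regularization concrete, the sketch below evaluates a smoothed, weighted vectorial total variation on a classification map stored as an image cube. It is an illustration only: the array layout, the zero gradient assumed at the last row and column, and the function name `smoothed_vtv` are assumptions made for this example.

```python
import numpy as np

def smoothed_vtv(C_img, beta, eps=0.01):
    """Smoothed weighted vTV of a map C_img of shape (H, W, n_classes).

    beta has shape (H, W) and holds one weight per pixel.
    """
    C_img = np.asarray(C_img, dtype=float)
    grad_h = np.zeros_like(C_img)
    grad_v = np.zeros_like(C_img)
    # Forward differences; the last row/column keeps a zero gradient (assumption).
    grad_h[:-1, :, :] = C_img[1:, :, :] - C_img[:-1, :, :]   # c(m+1,n) - c(m,n)
    grad_v[:, :-1, :] = C_img[:, 1:, :] - C_img[:, :-1, :]   # c(m,n+1) - c(m,n)
    sq_norm = np.sum(grad_h ** 2 + grad_v ** 2, axis=2)
    return float(np.sum(beta * np.sqrt(sq_norm + eps)))

# Toy maps (hypothetical): a constant map and a map with one vertical boundary.
H, W, n_classes = 4, 5, 3
beta = np.ones((H, W))
flat_map = np.zeros((H, W, n_classes))
flat_map[:, :, 0] = 1.0
edge_map = flat_map.copy()
edge_map[:, 3:, 0] = 0.0
edge_map[:, 3:, 1] = 1.0
vtv_flat = smoothed_vtv(flat_map, beta)
vtv_edge = smoothed_vtv(edge_map, beta)
```

On a constant map every pixel contributes the square root of the smoothing parameter, so the penalty only grows where the class attribution changes between neighbors.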

[Figure 1: block diagram with three stages (UNMIXING, CLUSTERING, CLASSIFICATION); labeled elements: Image, Endm., Abund., min ‖A − BZ‖, Features, Classifier, Classification.]

Fig. 1. Structure of the cofactorization model. Variables in blue stand for observations or available external data. Variables in olive green are linked through the clustering term. The variable in a dotted box is assumed to be known beforehand.

C. Clustering

To define a global cofactorization problem, a relation is drawn between the activation matrices of the two factorization problems, namely the abundance matrix and the feature matrix. More specifically, following the idea developed in [8], a clustering term is introduced as a coupling. Abundance vectors are clustered and the resulting attribution vectors are then used as feature vectors for the classification. Ideally, clustering attribution vectors $\mathbf{z}_p \in \mathbb{R}^{K}$ are filled with zeros except for $z_{k,p} = 1$ when $\mathbf{a}_p$ is associated with the kth cluster. The well-known k-means is chosen to perform this task since it is easily expressed as an optimization problem

$$\min_{\mathbf{Z}, \mathbf{B}} \; \frac{1}{2} \|\mathbf{A} - \mathbf{B}\mathbf{Z}\|_F^2 + \imath_{\mathbb{S}_K^{P}}(\mathbf{Z}) + \imath_{\mathbb{R}_+^{R \times K}}(\mathbf{B}) \qquad (7)$$

where the columns of $\mathbf{B} \in \mathbb{R}^{R \times K}$ stand for the centroids of the K clusters. Two constraints are considered in this k-means clustering problem: i) a positivity constraint on $\mathbf{B}$ since centroids are expected to be interpretable as mean abundance vectors and ii) the vectors $\mathbf{z}_p$ ($p \in \mathcal{P}$) are assumed to be defined on the K-dimensional probability simplex $\mathbb{S}_K$. Thus, the resulting clustering method is a particular instance of k-means where the attribution vectors are relaxed and can be interpreted as the collection of probabilities to belong to each of the clusters.
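The relaxed attribution vectors live on the probability simplex, so an algorithm handling this constraint needs a projection onto it. The sketch below shows one common way to do this, a sort-based Euclidean projection applied column by column, together with the clustering cost. Neither the choice of this projection algorithm nor the function names come from the paper; the 1/2 factor in the cost is also an assumption.

```python
import numpy as np

def project_columns_to_simplex(V):
    """Euclidean projection of each column of V onto the probability simplex."""
    K, P = V.shape
    U = -np.sort(-V, axis=0)                  # columns sorted in decreasing order
    css = np.cumsum(U, axis=0) - 1.0
    ks = np.arange(1, K + 1)[:, None]
    cond = U - css / ks > 0
    # Index of the last row where the condition holds, for every column.
    rho = K - 1 - np.argmax(cond[::-1, :], axis=0)
    theta = css[rho, np.arange(P)] / (rho + 1)
    return np.maximum(V - theta, 0.0)

def clustering_cost(A, B, Z):
    """0.5 * ||A - B Z||_F^2, the coupling term between abundances and features."""
    return 0.5 * np.sum((A - B @ Z) ** 2)

# Toy check (hypothetical values).
V = np.array([[0.2, 2.0, -1.0],
              [0.3, 0.0, 0.5],
              [0.5, 0.0, 3.0]])
Z = project_columns_to_simplex(V)
B = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
A = B @ Z
cost = clustering_cost(A, B, Z)
```

A column that already lies on the simplex is left unchanged, while the other columns are shifted and clipped so that their entries are nonnegative and sum to one.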

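D. Multi-objective problem

The two factorization problems corresponding to the spectral unmixing and classification tasks have been expressed and the link between these two problems has been set up through the clustering term. The global cofactorization problem, illustrated in Figure 1, is finally formulated as

$$\begin{aligned} \min_{\mathbf{A}, \mathbf{Q}, \mathbf{Z}, \mathbf{C}_{\mathcal{U}}, \mathbf{B}} \; & \frac{\lambda_0}{2} \|\mathbf{Y} - \mathbf{M}\mathbf{A}\|_F^2 + \lambda_a \|\mathbf{A}\|_1 + \imath_{\mathbb{R}_+^{R \times P}}(\mathbf{A}) \\ & - \lambda_1 \sum_{p \in \mathcal{P}} d_p \sum_{i \in \mathcal{C}} c_{i,p} \log\left( \frac{1}{1 + \exp(-\mathbf{q}_{i:} \mathbf{z}_p)} \right) + \lambda_q \|\mathbf{Q}\|_F^2 + \lambda_c \|\mathbf{C}\|_{\mathrm{vTV}} + \imath_{\mathbb{S}_C^{|\mathcal{U}|}}(\mathbf{C}_{\mathcal{U}}) \\ & + \frac{\lambda_2}{2} \|\mathbf{A} - \mathbf{B}\mathbf{Z}\|_F^2 + \imath_{\mathbb{S}_K^{P}}(\mathbf{Z}) + \imath_{\mathbb{R}_+^{R \times K}}(\mathbf{B}) \end{aligned} \qquad (8)$$

where $\lambda_0$, $\lambda_1$ and $\lambda_2$ are introduced to weight the contribution of the various terms.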

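III. OPTIMIZATION SCHEME

The proposed global optimization problem (8) is non-convex and non-smooth. Such problems are usually very challenging to solve. To handle it, we propose to resort to the PALM algorithm proposed in [9]. The PALM algorithm ensures convergence to a critical point, i.e., a local minimum of the objective function. To apply PALM, the objective is rewritten as a sum of independent non-smooth terms $f_j(\cdot)$ ($j \in \{0, \ldots, 3\}$) and a smooth coupling term $g(\cdot)$

$$\min_{\mathbf{A}, \mathbf{B}, \mathbf{Z}, \mathbf{Q}, \mathbf{C}_{\mathcal{U}}} \; f_0(\mathbf{A}) + f_1(\mathbf{B}) + f_2(\mathbf{Z}) + f_3(\mathbf{C}_{\mathcal{U}}) + g(\mathbf{A}, \mathbf{B}, \mathbf{Z}, \mathbf{C}_{\mathcal{U}}, \mathbf{Q})$$

where

$$f_0(\mathbf{A}) = \imath_{\mathbb{R}_+^{R \times P}}(\mathbf{A}) + \lambda_a \|\mathbf{A}\|_1, \qquad f_1(\mathbf{B}) = \imath_{\mathbb{R}_+^{R \times K}}(\mathbf{B}),$$
$$f_2(\mathbf{Z}) = \imath_{\mathbb{S}_K^{P}}(\mathbf{Z}), \qquad f_3(\mathbf{C}_{\mathcal{U}}) = \imath_{\mathbb{S}_C^{|\mathcal{U}|}}(\mathbf{C}_{\mathcal{U}}),$$

$$\begin{aligned} g(\mathbf{A}, \mathbf{B}, \mathbf{Z}, \mathbf{C}_{\mathcal{U}}, \mathbf{Q}) = \; & \frac{\lambda_0}{2} \|\mathbf{Y} - \mathbf{M}\mathbf{A}\|_F^2 - \lambda_1 \sum_{p \in \mathcal{P}} d_p \sum_{i \in \mathcal{C}} c_{i,p} \log\left( \frac{1}{1 + \exp(-\mathbf{q}_{i:} \mathbf{z}_p)} \right) \\ & + \frac{\lambda_2}{2} \|\mathbf{A} - \mathbf{B}\mathbf{Z}\|_F^2 + \lambda_q \|\mathbf{Q}\|_F^2 + \lambda_c \|\mathbf{C}\|_{\mathrm{vTV}}. \end{aligned}$$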

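Algorithm 1: PALM

Initialize variables $\mathbf{A}^0$, $\mathbf{B}^0$, $\mathbf{Z}^0$, $\mathbf{C}_{\mathcal{U}}^0$ and $\mathbf{Q}^0$;
Set $\alpha > 1$;
while stopping criterion not reached do
    $\mathbf{A}^{k+1} \in \mathrm{prox}^{f_0}_{\alpha L_A}\left( \mathbf{A}^k - \frac{1}{\alpha L_A} \nabla_{\mathbf{A}} g(\mathbf{A}^k, \mathbf{B}^k, \mathbf{Z}^k, \mathbf{C}_{\mathcal{U}}^k, \mathbf{Q}^k) \right)$;
    $\mathbf{B}^{k+1} \in \mathrm{prox}^{f_1}_{\alpha L_B}\left( \mathbf{B}^k - \frac{1}{\alpha L_B} \nabla_{\mathbf{B}} g(\mathbf{A}^{k+1}, \mathbf{B}^k, \mathbf{Z}^k, \mathbf{C}_{\mathcal{U}}^k, \mathbf{Q}^k) \right)$;
    $\mathbf{Z}^{k+1} \in \mathrm{prox}^{f_2}_{\alpha L_Z}\left( \mathbf{Z}^k - \frac{1}{\alpha L_Z} \nabla_{\mathbf{Z}} g(\mathbf{A}^{k+1}, \mathbf{B}^{k+1}, \mathbf{Z}^k, \mathbf{C}_{\mathcal{U}}^k, \mathbf{Q}^k) \right)$;
    $\mathbf{Q}^{k+1} = \mathbf{Q}^k - \frac{1}{\alpha L_Q} \nabla_{\mathbf{Q}} g(\mathbf{A}^{k+1}, \mathbf{B}^{k+1}, \mathbf{Z}^{k+1}, \mathbf{C}_{\mathcal{U}}^k, \mathbf{Q}^k)$;
    $\mathbf{C}_{\mathcal{U}}^{k+1} \in \mathrm{prox}^{f_3}_{\alpha L_C}\left( \mathbf{C}_{\mathcal{U}}^k - \frac{1}{\alpha L_C} \nabla_{\mathbf{C}_{\mathcal{U}}} g(\mathbf{A}^{k+1}, \mathbf{B}^{k+1}, \mathbf{Z}^{k+1}, \mathbf{C}_{\mathcal{U}}^k, \mathbf{Q}^{k+1}) \right)$;
end
return $\mathbf{A}^{\mathrm{end}}$, $\mathbf{B}^{\mathrm{end}}$, $\mathbf{Z}^{\mathrm{end}}$, $\mathbf{Q}^{\mathrm{end}}$, $\mathbf{C}_{\mathcal{U}}^{\mathrm{end}}$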

The concept of this algorithm is to perform a proximal gradient descent according to each variable alternately. To apply PALM, the functions $f_j(\cdot)$ have to be proper, lower semi-continuous and extended real-valued. A sufficient condition on the function $g(\cdot)$ is to be $C^2$, i.e., with continuous first and second derivatives, and its partial gradients have to be globally Lipschitz. $L_X$ denotes herein the Lipschitz constant associated with the partial gradient according to $\mathbf{X}$. The detailed steps of the algorithm are summarized in Algorithm 1 and further theoretical details are available in [9].

In practice, one needs to be able to compute the partial gradient and its associated Lipschitz constant to perform the gradient descent. It is also necessary to compute the proximal operator associated with the non-smooth terms. In the present case, the partial gradients are easily computed and are all globally Lipschitz. The only problematic term is the vTV term, which is not gradient-Lipschitz in its canonical form. To alleviate this issue, a smoothed counterpart has been introduced in (6) with a smoothing parameter $\epsilon > 0$. As for the proximal operators, they are well-known [12] except for $f_0(\cdot)$. For $f_0(\cdot)$, it is necessary to resort to the composition of the proximal operators associated with the nonnegativity constraint and the $\ell_1$-norm, which is possible here according to [13].
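The alternating scheme described in this section can be sketched on a small two-block problem. The code below is not the authors' implementation: it is a toy nonnegative sparse factorization, chosen because its partial gradients and Lipschitz constants are simple, and every name in it (`palm`, `grads`, `lips`, `proxes`, `lam`) is hypothetical. The prox of the first block composes the ℓ1 shrinkage and the nonnegative projection into a single shifted clipping.

```python
import numpy as np

def palm(x0, grads, lips, proxes, alpha=1.1, n_iter=200):
    """Alternate proximal gradient steps over the blocks of variables.

    grads[j](x) and lips[j](x) return the partial gradient of the smooth term
    and its Lipschitz constant for block j; proxes[j](v, step) applies the
    proximal operator of the non-smooth term of block j.
    """
    x = [v.copy() for v in x0]
    for _ in range(n_iter):
        for j in range(len(x)):
            step = 1.0 / (alpha * max(lips[j](x), 1e-12))
            x[j] = proxes[j](x[j] - step * grads[j](x), step)
    return x

# Toy problem (hypothetical): X ~ B A, A sparse and nonnegative, B nonnegative.
rng = np.random.default_rng(0)
X = rng.random((8, 3)) @ rng.random((3, 40))
lam = 1e-2

def objective(x):
    A, B = x
    return 0.5 * np.sum((X - B @ A) ** 2) + lam * np.sum(np.abs(A))

grads = [lambda x: x[1].T @ (x[1] @ x[0] - X),      # gradient w.r.t. A
         lambda x: (x[1] @ x[0] - X) @ x[0].T]      # gradient w.r.t. B
lips = [lambda x: np.linalg.norm(x[1].T @ x[1], 2),
        lambda x: np.linalg.norm(x[0] @ x[0].T, 2)]
proxes = [lambda v, step: np.maximum(v - lam * step, 0.0),  # l1 + nonnegativity
          lambda v, step: np.maximum(v, 0.0)]               # nonnegativity only

x_init = [rng.random((3, 40)), rng.random((8, 3))]
A_hat, B_hat = palm(x_init, grads, lips, proxes)
```

Because each step uses a step size smaller than the inverse of the current block Lipschitz constant, the objective does not increase from one update to the next, which gives a simple sanity check on the iterates.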

IV. EXPERIMENTS

Data generation – The HSI used to perform the experiments is a semi-synthetic image. More specifically, the image has been generated using a real HSI. The real image has been unmixed using a fully constrained least squares (FCLS) algorithm [14] using R = 5 endmembers extracted with the well-known VCA algorithm [15]. The obtained abundance maps have then been used to generate a new synthetic image using pure spectra from the hyperspectral library ASTER [16]. The groundtruth of the original data, composed of C = 3 classes, has been preserved to assess the quality of the classification. A color composition, a panchromatic version and the groundtruth are presented in Figure 2. The subset of the image used as training data is also shown in Figure 2.

Fig. 2. Synthetic image: (a) colored composition of the HSI $\mathbf{Y}$, (b) panchromatic image $\mathbf{y}_{\mathrm{PAN}}$, (c) classification ground-truth, (d) training set.

Initialization and convergence – As stated before, cofactorization is a non-convex problem and PALM only ensures convergence to a local minimum of the objective function. It is thus important to carefully initialize the estimated variables in order to reach a relevant solution. In the presented experiment, the abundance matrix $\mathbf{A}^0$ has been initialized by solving $\min_{\mathbf{A} \in \mathbb{R}_+^{R \times P}} \|\mathbf{Y} - \mathbf{M}\mathbf{A}\|_F^2$ using a projected gradient algorithm. Then, a k-means algorithm has been applied to the obtained abundance vectors and the resulting centroids and attribution vectors have been used to initialize $\mathbf{B}^0$ and $\mathbf{Z}^0$. On the other hand, the classifier parameters $\mathbf{Q}^0$ and the classification matrix $\mathbf{C}_{\mathcal{U}}^0$ have been initialized randomly.

In order to assess the convergence of the optimization scheme, the normalized difference between two consecutive values of the objective function is monitored. When this value reaches a certain threshold ($10^{-4}$ for this experiment), the optimization process stops and the last estimation is assumed to be close enough to the solution.
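The stopping rule based on consecutive objective values can be sketched as below. The paper does not detail the normalization, so dividing by the magnitude of the previous value is an assumption, and the function name `has_converged` and the sample values are hypothetical.

```python
def has_converged(prev_value, value, tol=1e-4):
    """True when the normalized change of the objective falls below tol.

    Normalizing by the previous objective value is an assumption; a small
    floor avoids a division by zero.
    """
    return abs(prev_value - value) / max(abs(prev_value), 1e-12) < tol

# Hypothetical sequence of objective values produced by an optimizer.
values = [100.0, 60.0, 45.0, 44.9, 44.899]
stop_index = None
for k in range(1, len(values)):
    if has_converged(values[k - 1], values[k]):
        stop_index = k
        break
```

In this toy sequence the loop stops at the last value, where the relative change is about 2e-5.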

TABLE I
UNMIXING AND CLASSIFICATION RESULTS.

Model            Kappa    F1-mean   RMSE(Â)   RE      Time (s)
RF               0.817    0.842     N/A       N/A     0.4
FCLS             N/A      N/A       0.0701    0.224   1.2
CBPDN            N/A      N/A       0.0792    0.229   2
D-KSVD           0.494    0.554     N/A       0.923   70
Cofact.          0.847    0.870     0.0504    0.750   180
Cofact. + vTV    0.874    0.895     0.0526    0.752   81

Hyperparameters – Multiple hyperparameters λ have been introduced in problem (8) to weight the various terms of the objective function. For practical use, these parameters have been normalized by the size and dynamics of the corresponding variables. These normalized parameters, denoted …, have been empirically tuned to obtain consistent results (… = 1, … = $10^{-3}$, … = 0.15). For the last hyperparameter, weighting the spatial regularization, two values have been considered, 0 and 0.1, standing respectively for the case without and with spatial regularization. The definition of the vTV regularization also includes parameters which have to be properly set. First, the smoothing parameter is set to $\epsilon = 0.01$ to ensure the gradient-Lipschitz property without modifying substantially the TV-norm. Secondly, it is necessary to define the weighting coefficients $\beta_{m,n}$. They have been computed from a panchromatic image $\mathbf{y}_{\mathrm{PAN}}$, shown in Figure 2, generated by normalizing hyperspectral bands by their mean and then summing them. More precisely, to account for possible homogeneous areas in the image, they are defined as follows

$$\beta_{m,n} = \ldots \quad \text{with} \quad \ldots = \left( [\nabla \mathbf{y}_{\mathrm{PAN}}] + \sigma \right)^{-1}$$

where $\sigma = 0.01$ controls the variation of the weights and avoids numerical issues.
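The construction of the panchromatic image and of edge-aware weights can be sketched as follows. The exact definition of the weights is only partly recoverable from the text, so this sketch rests on explicit assumptions: the gradient is measured by the Euclidean magnitude of forward differences, no further normalization of the weights is applied, and the function names are hypothetical.

```python
import numpy as np

def panchromatic(hsi):
    """hsi: cube of shape (H, W, L). Each band is divided by its mean, then bands are summed."""
    band_means = hsi.mean(axis=(0, 1), keepdims=True)
    return np.sum(hsi / band_means, axis=2)

def edge_aware_weights(y_pan, sigma=0.01):
    """Weights that decrease where the panchromatic image varies strongly.

    Assumption: inverse of (gradient magnitude + sigma), without normalization.
    """
    grad_h = np.zeros_like(y_pan)
    grad_v = np.zeros_like(y_pan)
    grad_h[:-1, :] = y_pan[1:, :] - y_pan[:-1, :]
    grad_v[:, :-1] = y_pan[:, 1:] - y_pan[:, :-1]
    magnitude = np.sqrt(grad_h ** 2 + grad_v ** 2)
    return 1.0 / (magnitude + sigma)

# Toy cube (hypothetical): two homogeneous regions separated by a vertical boundary.
hsi = np.ones((4, 4, 3))
hsi[:, 2:, :] = 2.0
y_pan = panchromatic(hsi)
weights = edge_aware_weights(y_pan)
```

Pixels inside a homogeneous region receive the largest weight, 1/σ, so the regularization is strong there, while pixels next to the boundary receive a much smaller weight and are allowed to change class.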

Compared methods – To assess the quality of the unmixing and classification results, the proposed method has been compared to several well-known unmixing and classification algorithms. Regarding classification, we considered the random forest (RF) algorithm, known to perform very well to classify HSI. Parameters of the RF (number of trees, depth) have been adjusted using grid search and cross-validation. The discriminative K-SVD (D-KSVD) method has been used as a benchmark [17]. This model is also a cofactorization method but with a simpler approach where the two coding matrices $\mathbf{A}$ and $\mathbf{Z}$ are imposed to be equal. In this case, the first term is not a spectral unmixing task but rather a dictionary learning task where dictionary elements are assumed to be discriminative for the classification task. Only a sparsity penalization is considered for D-KSVD using an ℓ-norm.

As for the unmixing comparison, we considered two methods described in [14]. The first method is the fully constrained least squares method (FCLS) where the corresponding optimization problem is defined as the data fitting term with a positivity and sum-to-one constraint on abundance vectors $\mathbf{a}_p$. The second method is the constrained basis pursuit denoising (CBPDN) corresponding to problem (1). The hyperparameter $\lambda_a$, weighting the sparsity penalty, is also adjusted using grid search and cross-validation. It should be noted that all unmixing methods use directly the correct endmember matrix $\mathbf{M}$ which has been used to generate the data. Additionally, the endmember matrix is used to initialize the dictionary of the D-KSVD method.


References

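Early results of bypass surgery using only bilateral internal thoracic arteries with Y-anastomosis in patients with multivessel coronary artery disease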

Assessing the accuracy of remotely sensed data : principles and practices

Random forest in remote sensing: A review of applications and future directions

Vertex component analysis: a fast algorithm to unmix hyperspectral data

Hyperspectral Unmixing Overview: Geometrical, Statistical, and Sparse Regression-Based Approaches
