Discriminatively Embedded K-Means for Multi-view Clustering

doi:10.1109/CVPR.2016.578

Jinglin Xu

1

, Junwei Han

1

, Feiping Nie

2

1

School of Automation,

2

School of Computer Science and Center for OPTIMAL,

Northwestern Polytechnical University

Xi’an, 710072, P. R. China

xujignlinlove, junweihan2010, feipingnie@gmail.com

Abstract

In real world applicatio ns, more and more data, for ex-

ample, image/video data, are high dimensional a nd repre-

sented by m ultiple views which d escribe different perspec-

tives of the data. Efﬁciently clustering such data is a cha l-

lenge. To address this problem, this paper proposes a novel

multi-view clustering method called Discriminatively Em-

bedded K-Means (DEKM), which embeds the synchronous

learning of multiple discriminative subspaces into multi-

view K-Means clustering to construct a uniﬁed framework,

and adaptively control the intercoordinations between these

subspaces simultaneou sly. In this framework, we ﬁrstly

design a weighted multi-view Line ar Discriminant Analy-

sis (LDA), and the n develop an unsupervised optimization

scheme to alternatively le arn the common clustering indi-

cator, multiple discriminative subspaces and weights for

heterogeneous features with convergence. Comprehensive

evaluations on three benchmark datasets and comparison-

s with several state-of-the-art multi-view clustering algo-

rithms demonstrate the superiority of the proposed work.

1. Introduction

As a fundamental technique in machine learning, p at-

tern recognition and computer vision ﬁelds, clustering is to

assign data of similar patterns into the same cluster and re-

ﬂect the intrinsic structure of the da ta . In past decades, a

variety of classical clustering algorithms such as K-Mea ns

Clustering [ 15] and Spectral Clustering [24, 25] have been

invented.

In recent years, due to the rap id developmen t of infor-

mation technology, we are often confronted with data repre-

sented by heterogeneous features. These features are gener-

ated by using various feature construction ways. One good

example is image/v ideo data. A large number of different

visual descriptors, such as SIFT [20], HOG [7], LBP [22],

GIST [23], CMT [30] and CENT [29], have been proposed

to characterize the rich content of image/video data from

different perspectives. Each type of features may capture

the speciﬁc information about the v isu al data. To cluster

these data, one challenge is how to integrate the strengths

of various heterogeneous features by exploring the rich in-

formation among them, which certainly can lead to more

accurate and robust clustering performanc e than by using

each individual type of features.

Nowadays, the data is often represented by very high di-

mensional features, which renders another challeng e for the

clustering. A number of earlier efforts have been devoted

to addressing these two challenges. Focusing on one chal-

lenge that data is very high dimensional, many dimension-

ality redu ction-based clustering methods [12, 10, 26, 16]

have been developed, which mostly concern simultan eous

subspace selection by L DA and clustering. These methods

generally are more appropriate for single-view data clus-

tering. Altho ugh they may be extended to multi-view data

clustering task by simply concatenatin g different views as

input or integrating each view of clustering results to the ﬁ-

nal results, these extended methods still cannot achieve the

satisfactory performance due to the lack of intercoordina-

tion and complementation between different views during

clustering.

Focusing on another challenge that data is represented by

multi-view, a school of unsupervised multi-view clustering

methods have been presented. Although these methods can

achieve interactions among heterogeneous features, there

still exist some problems regarding heavy computa tional

complexity or curse of dimensionality. Most of these meth-

ods can be roughly classiﬁed into two categories: Multi-

View K-Means Clustering (MVKM) and Multi-View Spec-

tral Clustering ( MVSC). Many MVSC approaches essen-

tially extend the Spectral Clustering from single view to

multiple views and are mainly based on similarity graph-

s or matrices. Although this kind of multi-view clustering

algorithm s [8, 32, 21, 18, 19, 4, 14, 27, 5] can achieve en-

courag ing performance, they still have two main drawback-

s. On the one hand, the construction of the similarity graph

for high dimensional data is a he avy work becau se many

5356

factors must be considered, such as the choice of similarity

function and the type of similarity graph. This heavy work

may greatly affect the ﬁnal clustering performance. On the

other hand, MVSC algorithms generally need to build prop-

er similarity graph for each view. The more the numbe r of

different views, the more complex constructing similarity

graphs will be. Thus, MVSC algorithms cannot effectively

tackle high-dimensional multi-view data clustering.

Different f rom MVSC algorithm s, MVKM approaches

are more superior to deal with high-dimensional da ta be-

cause they do not need to construct a similarity graph for

each view. This kind of methods is originally derived from

the G-orthogonal non -negative matrix factorization (NMF)

which is equivalent to relaxed K-Means clusterin g (RKM)

[9]. Recently, Cai et al. [3] proposed the robust multi-view

K-Means clustering (RMVKM) b y using ℓ

2,1

-norm [11] to

replace the ℓ

2

-norm and learning individual weight for each

view. However, RMVKM was performed in the original

feature space without any discriminative subspace learning

mechanism that may render curse of dimensionality when

dealing with multi-view and high dimensional data. In ad-

dition, although the work in [31] also extende d the model

from [10] to the multi-view case, they sum the scatter ma-

trices and produc e a separate cluster assignment for each

view, wh ic h is quite different from the proposed method.

According to above mentioned analysis, both directly ex-

tending single-view to multi-view and existing multi-view

algorithm s are far from thoroughly addressing the multi-

view clustering issue. In this paper, we propose a novel un-

supervised multi-v iew scheme aim ing to address above two

challenges. The proposed method DEKM embeds the syn-

chronous lear ning of multiple discr iminative subspaces into

multi-view K-Means clustering to construct a uniﬁed fra me-

work, and a daptively contro l the in te rcoord inations betwe en

different views simultaneously.

The highlights o f DEKM method are in two aspects.

Firstly, learning mu ltiple discriminative subspaces is ful-

ﬁlled sy nchronously. Under this uniﬁed and embedd ed

framework, DEKM realizes the intercoordination of these

subspaces and further makes them complemen t each oth-

er. Second ly, DEKM develops an intertwined and iterative

optimization instead of just applying existing methods in

an iterative manner, which not only maintain s the relative

indepen dency on diff erent d iscriminative subspaces, but al-

so keeps the consistency of clustering results of multiple

views. This multi-v iew extension is the ﬁrst work among

the earliest efforts to sum the clustering objectives via a

weighted way. These are quite different from several recent

works. Comp rehensive evaluations on several benchmark

image datasets an d comparisons with some state-of-the-a rt

multi-view clustering appro aches demonstrate the efﬁcien-

cy and superiority of DEKM.

2. The proposed framework

2.1. Formulation

According to [17], the trace ratio LDA for single-view

was d eﬁned as follows:

W= arg max

W

T

W=I

m

T r(W

T

S

B

W)

T r(W

T

S

W

W)

(1)

where W ∈ R

d×m

denotes the projectio n matrix which is

a set of orthogonal and normalized vectors. It enables to

reduce the dimensionality from d to m. S

B

and S

W

denote

the between-c la ss scatter matrix and the within-class scatter

matrix, respectively.

Suppose that X ∈ R

d×N

is the data matrix with N sam-

ples and d-dimension after centralization and G ∈ R

N×C

is

the clustering indicator matrix where each row of G denotes

the clustering indicator vector for each sample, and C is the

number of clusters. G

ic

= 1(i = 1, ..., N ; c = 1, ..., C) if the

i-th sample be longs to the c-th class and G

ic

= 0 otherwise.

Using G, S

B

and S

W

can be rewritten as:

S

B

=XG(G

T

G)

−1

G

T

X

T

S

W

=XX

T

−XG(G

T

G)

−1

G

T

X

T

(2)

Because of S

T

= S

B

+S

W

, (1 ) is equivalent to the fol-

lowing problem:

W= arg max

W

T

W=I

m

T r(W

T

S

B

W)

T r(W

T

S

T

W)

(3)

We know th a t (3), as a supervised method, can seek a dis-

criminative subspace to separate different cla sses maximal-

ly. Recently, the combination of dimensionality reduction

and clustering has become a hot issue [12, 10, 26, 16]. How-

ever, those methods are only designed for single-view issue.

In this paper, we ﬁrstly design a weighted multi-view LDA

and then develop an unsupervised optimiza tion scheme to

solve this multi-view framework.

Given M types of heterogeneous features, k =

1, 2, ..., M, we suppose X

k

∈ R

d

k

×N

as the data matrix for

the k -th view. Referring to the deﬁnition of trace ratio LDA,

we propose that, for two d

k

×d

k

positive semi-deﬁnite ma-

trices S

k

B

and S

k

T

, the weighted multi-view trace ratio LDA

can be deﬁned as ﬁnding M differen projection matrices

W

k

|

M

k=1

respectively:

W

k

|

M

k=1

= arg max

W

T

k

W

k

=I

m

k

|

M

k=1

M

X

k=1

(α

k

)

γ

T r(W

T

k

S

k

B

W

k

)

T r(W

T

k

S

k

T

W

k

)

(4)

where W

k

denotes the projection matrix which reduces the

dimensionality from d

k

to m

k

in the k-th view. α

k

is the

weight for each view and γ is the par ameter to control the

5357

weights distribution. S

k

B

and S

k

T

denote the S

B

and S

T

in

the k-th view, respec tively:

S

k

B

=X

k

G(G

T

G)

−1

G

T

X

T

k

, S

k

T

=X

k

X

T

k

(5)

It is apparent that the weighted multi- view LDA, i.e. (4),

is still sup ervised. However, in the real ap plications, label-

ing data is very expensive. Without any label information,

we know neither projection matrices W

k

|

M

k=1

nor cluster-

ing indicator matrix G of (4), which is adverse f or doing

high-dimensional clustering. Thus, we propose an unsuper-

vised optimization scheme to solve the following weighted

multi-view LDA:

max

W

k

|

M

k=1

,

α

k

|

M

k=1

,G

M

X

k=1

(α

k

)

γ

"

T r(W

T

k

X

k

G(G

T

G)

−1

G

T

X

T

k

W

k

)

T r(W

T

k

S

k

T

W

k

)

−1

#

s.t.W

T

k

W

k

=I

m

k

|

M

k=1

, G ∈ Ind,

M

X

k=1

α

k

=1, α

k

≥0

(6)

where Ind is a set of clustering indicator matrices.

2.2. Optimization

The key difﬁculty of solving (6) is that (6) has become an

unsuper vised complex matter. In other words, the numera-

tor of (6), X

k

G(G

T

G)

−1

G

T

X

T

k

, actually S

k

B

, is closely re-

lated to G. However, W

k

|

M

k=1

, α

k

|

M

k=1

and G a re unknown.

To simultaneously obtain these variables in a better way, we

offer the Theorem 1 to transform (6) into a more tractable

framework (7) which is the proposed method DEKM. Ac-

tually, W

k

|

M

k=1

are n ot dec oupled in (7) since G is also a

variable to be optimiz e d.

Theorem 1. Solving (6) is equivalent to solvin g the follow-

ing o bjective function:

min

W

k

|

M

k=1

,

α

k

|

M

k=1

,G

M

X

k=1

(α

k

)

γ

||W

T

k

X

k

− F

k

G

T

||

2

F

T r(W

T

k

S

k

T

W

k

)

s.t.W

T

k

W

k

=I

m

k

|

M

k=1

, G ∈ Ind,

M

X

k=1

α

k

=1, α

k

≥0

(7)

Proof. Obviously, using the properties of matrix trace, (7)

can be rewritten as the following formula:

min

W

k

|

M

k=1

,

α

k

|

M

k=1

,G

M

X

k=1

(α

k

)

γ

T r



(W

T

k

X

k

−F

k

G

T

)

T

(W

T

k

X

k

−F

k

G

T

)



T r(W

T

k

S

k

T

W

k

)

= min

W

k

|

M

k=1

,

α

k

|

M

k=1

,G

M

X

k=1

(α

k

)

γ



T r(X

T

k

W

k

W

T

k

X

k

)−2T r(F

T

k

W

T

k

X

k

G)

+T r(F

k

G

T

GF

T

k

)



T r(W

T

k

S

k

T

W

k

)

(8)

Due to solving the minimum, we get its derivative with re-

spect to F

k

. Ignoring irrelevant terms and using the rules of

matrix derivative, we can obtain:

F

k

= W

T

k

X

k

G(G

T

G)

−1

(9)

Excitingly, F

k

∈ R

m

k

×C

is the cluster centroid in discrimi-

native subspace for the k-th view. Substituting (9) into (8),

there is:

min

W

k

|

M

k=1

,

α

k

|

M

k=1

,G

M

X

k=1

(α

k

)

γ



1−

T r(W

T

k

X

k

G(G

T

G)

−1

G

T

X

T

k

W

k

)

T r(W

T

k

S

k

T

W

k

)



⇔ max

W

k

|

M

k=1

,

α

k

|

M

k=1

,G

M

X

k=1

(α

k

)

γ

"

T r(W

T

k

X

k

G(G

T

G)

−1

G

T

X

T

k

W

k

)

T r(W

T

k

S

k

T

W

k

)

−1

#

(10)

Therefore, solving (6) is equivalent to solving (7).

Further, we decompose (7) into three subproblems and

solve them via alternate iteration method.

Step1: Solving G when W

k

|

M

k=1

, F

k

|

M

k=1

and α

k

|

M

k=1

are ﬁxed.

Obtaining G via a weighted multi-view K-Means clus-

tering is an unsupervised learning stage. The clustering in-

dicator matrix G is unknown and we search the optimal so -

lution of G among multiple low-dimensio nal discr iminative

subspaces.

We separate X

k

and G into independent vectors respec-

tively. Then (7) can be replaced by the following problem:

min

G

M

X

k=1

(α

k

)

γ

||W

T

k

X

k

−F

k

G

T

||

2

F

= min

G

N

X

i=1

M

X

k=1

(α

k

)

γ

||W

T

k

x

i

k

−F

k

g

i

||

2

s.t.G∈Ind, g

i

∈G, g

ic

∈{0, 1},

C

X

c=1

g

ic

=1

(11)

where x

i

k

is the i-th column of X

k

, which correspon ds to

the i-th sam ple in the k-th view and g

i

is the i-th row of

G, which denotes the clu ster ing indicator vector for the i-th

sample. Assigning G into (11) one by one is equivalent to

tackling the following problem for the i-th sample :

c

∗

=arg min

c

M

X

k=1

(α

k

)

γ

||W

T

k

x

i

k

−F

k

e

c

||

2

(12)

where e

c

is the c-th row of identity matrix I

C

and c

∗

means

that the c

∗

-th element of g

i

is 1 and others are 0. There are

only C kin ds of candidate clustering indicator vectors, so

we can easily ﬁnd out the solution of (12).

5358

Step2: Solving W

k

|

M

k=1

and F

k

|

M

k=1

when G and

α

k

|

M

k=1

are ﬁxed.

Calculating W

k

|

M

k=1

and F

k

|

M

k=1

via a weighted multi-

view LDA is a supervised learning stage. Moreover, the

discriminative subspace W

k

for each view is closely related

to the clustering indicator matr ix G and its weight α

k

.

From (9), we know that F

k

is a function of W

k

and G.

When G and α

k

|

M

k=1

are ﬁxed, substituting (9) into (7) and

omitting constant terms, the obje c tive function becomes:

min

W

k

|

M

k=1

M

X

k=1

T r(W

T

k

˜

S

k

W

k

)

T r(W

T

k

S

k

T

W

k

)

, s.t.W

T

k

W

k

= I

m

k

|

M

k=1

(13)

where

˜

S

k

W

= (α

k

)

γ

[X

k

X

T

k

−X

k

G(G

T

G)

−1

G

T

X

T

k

] denotes

the weighted within-class scatter matrix for the k-th view.

Thus, so lving (13) equals to solving the following formula:

max

W

k

|

M

k=1

M

X

k=1

T r(W

T

k

˜

S

k

B

W

k

)

T r(W

T

k

S

k

T

W

k

)

, s.t.W

T

k

W

k

= I

m

k

|

M

k=1

(14)

where

˜

S

k

B

=(α

k

)

γ

X

k

G(G

T

G)

−1

G

T

X

T

k

denotes the weight-

ed between-class scatter matrix for the k-th view. (14) jo int-

ly optimizes M distinct discriminative subspaces in paral-

lel. The solution W

k

for each view is solved by a trace ratio

LDA whe n G and α

k

|

M

k=1

are ﬁxed.

Step3: Solving α

k

|

M

k=1

when W

k

|

M

k=1

and G are ﬁxed.

Learning the non-negative normalized weight α

k

for

each view assigns the more discriminative image feature

with higher weight. To derive the solution of α

k

|

M

k=1

, we

rewrite (7) as:

min

α

k

|

M

k=1

M

X

k=1

(α

k

)

γ

H

k

, s.t.

M

X

k=1

α

k

=1, α

k

≥ 0

(15)

where

H

k

=

kW

T

k

X

k

−F

k

G

T

k

2

F

T r(W

T

k

S

k

T

W

k

)

(16)

Thus, the Lagrange function of (15) is:

M

X

k=1

(α

k

)

γ

H

k

− λ(

M

X

k=1

α

k

− 1) (17)

where λ is the Lagrange multiplier. In order to get the op-

timal solution, we set the derivative of (17) with respect to

α

k

to zero an d then substitute the result into the constraint

P

M

k=1

α

k

=1. There is:

α

k

=

(γH

k

)

1

1−γ

P

M

v=1

(γH

v

)

1

1−γ

(18)

Algorithm 1 The algorithm of DEKM method

Input:

Data for M views {X

k

|k = 1, 2, ..., M}, X

k

∈ R

d

k

×N

. The number of

clusters C. The reduced dimension m

k

for each view and the parameter

γ.

Output:

The projection matrix W

k

, cluster centroid matrix F

k

and weight α

k

for the k-th view. The common clustering indicator matrix G.

Initialization:

Set t = 0. Initialize G ∈ Ind. Initialize W

k

by W

T

k

W

k

= I

m

k

and

initialize the weight α

k

=1/M for the k-th view.

While not converge do

1: Calculate G by :

c

∗

=arg min

c

M

X

k=1

(α

k

)

γ

||W

T

k

x

i

k

−F

k

e

c

||

2

2: Calculate F

k

by F

k

= W

T

k

X

k

G(G

T

G)

−1

and update W

k

|

M

k=1

by

max

W

k

|

M

k=1

M

X

k=1

T r(W

T

k

˜

S

k

B

W

k

)

T r(W

T

k

S

k

T

W

k

)

3: Update α

k

|

M

k=1

by:

α

k

=

(γH

k

)

1

1−γ

P

M

v=1

(γH

v

)

1

1−γ

End While, return W

k

|

M

k=1

, G and α

k

|

M

k=1

To sum up, in Algo rithm 1 , we can obtain G via Step1,

which is equivalent to the Discr iminative K-Means inc lud-

ing the interrelations among multi-view features. Updating

W

k

|

M

k=1

via Step2 is the dimensionality reduction for eac h

view. Updating α

k

|

M

k=1

via Step3 fulﬁlls the learning of

multiple we ights simultaneously. Then we repea t this pro-

cess iteratively until the objective function value becomes

converged.

3. Convergence anal ysis

As mentioned above, DEKM is a uniﬁed and embedded

multi-view f ramework solved by an unsupervised optimiza-

tion scheme. It is o bvious that when we transform (6) into

(7), it can be d ivided into three subproblems. Here we show

the following p roof to verify the convergence of Discrimi-

natively Embedded K-Means (DEKM) algorithm.

Theorem 2. In ea ch iteration, no matter the objective func-

tion value of (6) or that of its variant (7), which all decrease

until the algo rithm converges.

Proof. Supposin g after the t-th itera tion, we have obtained

W

(t)

k

|

M

k=1

, G

(t)

and α

(t)

k

|

M

k=1

. In the t +1-th iteration, we

ﬁrstly ﬁx G and α

k

|

M

k=1

as G

(t)

and α

(t)

k

|

M

k=1

respectively,

and then solve W

(t+1)

k

for each view. Thus, when G

(t)

and

α

(t)

k

|

M

k=1

are ﬁxed, according to (6), W

(t+1)

k

can be so lved

5359

by th e following equation:

W

(t+1)

k

=arg max

W

k

(α

(t)

k

)

γ

· · ·



T r[W

(t)T

k

X

k

G

(t)

(G

(t)T

G

(t)

)

−1

G

(t)T

X

T

k

W

(t)

k

]

T r(W

(t)T

k

S

k

T

W

(t)

k

)

−1



=arg min

W

k

(α

(t)

k

)

γ

· · ·



T r[W

(t)T

k

(S

k

T

−X

k

G

(t)

(G

(t)T

G

(t)

)

−1

G

(t)T

X

T

k

)W

(t)

k

]

T r(W

(t)T

k

S

k

T

W

(t)

k

)



(19)

Referring to the way of argumen tation f or [6], through

rewriting (19) we have:

T r



W

(t+1)T

k

˜

S

k(t)

W

(t+1)

k



T r



W

(t+1)T

k

S

k

T

W

(t+1)

k



≤

T r



W

(t)T

k

˜

S

k(t)

W

(t)

k



T r



W

(t)T

k

S

k

T

W

(t)

k



(20)

where

˜

S

k(t)

W

= (α

(t)

k

)

γ

[S

k

T

− X

k

G

(t)

(G

(t)T

G

(t)

)

−1

G

(t)T

X

T

k

]

= (α

(t)

k

)

γ

S

k(t)

W

and it denotes the weighted within-class scatter matrix for

the k-th view at th e t-th iteration.

In the same way, we ﬁx W

k

|

M

k=1

and α

k

|

M

k=1

as W

(t)

k

|

M

k=1

and α

(t)

k

|

M

k=1

respectively, and solve for G

(t+1)

. According

to (6), we can obtain:

G

(t+1)

=arg max

G

M

X

k=1

(α

(t)

k

)

γ

· · ·



T r[W

(t)T

k

X

k

G

(t)

(G

(t)T

G

(t)

)

−1

G

(t)T

X

T

k

W

(t)

k

]

T r(W

(t)T

k

S

k

T

W

(t)

k

)

−1



=arg min

G

M

X

k=1

(α

(t)

k

)

γ

· · ·



T r[W

(t)T

k

(S

k

T

−X

k

G

(t)

(G

(t)T

G

(t)

)

−1

G

(t)T

X

T

k

)W

(t)

k

]

T r(W

(t)T

k

S

k

T

W

(t)

k

)



(21)

By rewriting (21), there is:

M

X

k=1

T r



W

(t)T

k

˜

S

k(t+1)

W

(t)

k



T r



W

(t)T

k

S

k

T

W

(t)

k



≤

M

X

k=1

T r



W

(t)T

k

˜

S

k(t)

W

(t)

k



T r



W

(t)T

k

S

k

T

W

(t)

k



(22)

where

˜

S

k(t+1)

W

=(α

(t)

k

)

γ



S

k

T

−X

k

G

(t+1)

(G

(t+1)T

G

(t+1)

)

−1

G

(t+1)T

X

T

k



= (α

(t)

k

)

γ

S

k(t+1)

W

and it is the weighted within-class scatter matrix for the k-th

view at the t+1-th iteration.

Similarly, we ﬁx W

k

|

M

k=1

and G as W

(t)

k

|

M

k=1

and G

(t)

respectively, and solve for α

(t+1)

k

|

M

k=1

. According to (6),

for each view, α

(t+1)

k

can be calculated by:

α

(t+1)

k

=arg max

α

k

(α

(t)

k

)

γ

· · ·



T r[W

(t)T

k

X

k

G

(t)

(G

(t)T

G

(t)

)

−1

G

(t)T

X

T

k

W

(t)

k

]

T r(W

(t)T

k

S

k

T

W

(t)

k

)

−1



=arg min

α

k

(α

(t)

k

)

γ

· · ·



T r[W

(t)T

k

(S

k

T

−X

k

G

(t)

(G

(t)T

G

(t)

)

−1

G

(t)T

X

T

k

)W

(t)

k

]

T r(W

(t)T

k

S

k

T

W

(t)

k

)



(23)

Thus, (23) can be further rewritten as follows:

(α

(t+1)

k

)

γ

T r



W

(t)T

k

S

k(t)

W

(t)

k



T r



W

(t)T

k

S

k

T

W

(t)

k



≤(α

(t)

k

)

γ

T r



W

(t)T

k

S

k(t)

W

(t)

k



T r



W

(t)T

k

S

k

T

W

(t)

k



(24)

Integrating (20), (22) and (24), we arrive at:

M

X

k=1

T r



W

(t+1)T

k

(α

(t+1)

k

)

γ

S

k(t+1)

W

(t+1)

k



T r



W

(t+1)T

k

S

k

T

W

(t+1)

k



≤

M

X

k=1

T r



W

(t)T

k

(α

(t+1)

k

)

γ

S

k(t+1)

W

(t)

k



T r



W

(t)T

k

S

k

T

W

(t)

k



≤

M

X

k=1

T r



W

(t)T

k

(α

(t+1)

k

)

γ

S

k(t)

W

(t)

k



T r



W

(t)T

k

S

k

T

W

(t)

k



≤

M

X

k=1

T r



W

(t)T

k

(α

(t)

k

)

γ

S

k(t)

W

(t)

k



T r



W

(t)T

k

S

k

T

W

(t)

k



(25)

Thus, (25) proves that (6) and its variant (7) are lower

bounded and their objective function value decreases after

each iteration.

4. Experiments

In this section, we evaluate the performance of DEK-

M on three benchmark datasets in ter ms of two standard

clustering evaluation metrics, namely A ccuracy (ACC) [2]

and Norm alized Mutual Information (NMI) [2]. Before do

anything, we need to centralize the data and normalize all

values in the range of [−1, 1].

4.1. Datasets

In our experiments, by following [3], thre e benchmark

image datasets in c luding Caltech101 [13], MSRC [28] and

5360

Discriminatively Embedded K-Means for Multi-view Clustering

Citations

Generalized Latent Multi-View Subspace Clustering

Latent Multi-view Subspace Clustering

Multiview Consensus Graph Clustering

Multi-View Clustering via Deep Matrix Factorization.

Multi-view Clustering: A Survey

References

Distinctive Image Features from Scale-Invariant Keypoints

Histograms of oriented gradients for human detection

UCI Machine Learning Repository

Distinctive Image Features from Scale-Invariant Keypoints

Multiresolution gray-scale and rotation invariant texture classification with local binary patterns

Related Papers (5)

Co-regularized Multi-view Spectral Clustering

A Co-training Approach for Multi-view Spectral Clustering

Multi-view clustering via joint nonnegative matrix factorization

On Spectral Clustering: Analysis and an algorithm

Normalized cuts and image segmentation