
Discriminatively Embedded K-Means for Multi-view Clustering
Jinglin Xu^1, Junwei Han^1, Feiping Nie^2
^1 School of Automation, ^2 School of Computer Science and Center for OPTIMAL,
Northwestern Polytechnical University, Xi'an, 710072, P. R. China
{xujinglinlove, junweihan2010, feipingnie}@gmail.com
Abstract
In real-world applications, more and more data, for example, image/video data, are high dimensional and represented by multiple views which describe different perspectives of the data. Efficiently clustering such data is a challenge. To address this problem, this paper proposes a novel multi-view clustering method called Discriminatively Embedded K-Means (DEKM), which embeds the synchronous learning of multiple discriminative subspaces into multi-view K-Means clustering to construct a unified framework, and adaptively controls the intercoordinations between these subspaces simultaneously. In this framework, we first design a weighted multi-view Linear Discriminant Analysis (LDA), and then develop an unsupervised optimization scheme to alternately learn the common clustering indicator, the multiple discriminative subspaces and the weights for heterogeneous features, with guaranteed convergence. Comprehensive evaluations on three benchmark datasets and comparisons with several state-of-the-art multi-view clustering algorithms demonstrate the superiority of the proposed work.
1. Introduction
As a fundamental technique in the machine learning, pattern recognition and computer vision fields, clustering assigns data of similar patterns to the same cluster and reflects the intrinsic structure of the data. In past decades, a variety of classical clustering algorithms such as K-Means Clustering [15] and Spectral Clustering [24, 25] have been invented.
In recent years, due to the rapid development of information technology, we are often confronted with data represented by heterogeneous features. These features are generated by various feature construction methods. One good example is image/video data. A large number of different visual descriptors, such as SIFT [20], HOG [7], LBP [22], GIST [23], CMT [30] and CENT [29], have been proposed to characterize the rich content of image/video data from different perspectives. Each type of feature may capture specific information about the visual data. To cluster these data, one challenge is how to integrate the strengths of various heterogeneous features by exploring the rich information among them, which can certainly lead to more accurate and robust clustering performance than using each individual type of feature.
Nowadays, data are often represented by very high dimensional features, which poses another challenge for clustering. A number of earlier efforts have been devoted to addressing these two challenges. Focusing on the first challenge, that data is very high dimensional, many dimensionality reduction-based clustering methods [12, 10, 26, 16] have been developed, which mostly concern simultaneous subspace selection by LDA and clustering. These methods are generally more appropriate for single-view data clustering. Although they may be extended to the multi-view clustering task by simply concatenating different views as input or integrating each view's clustering results into the final results, these extended methods still cannot achieve satisfactory performance due to the lack of intercoordination and complementation between different views during clustering.
Focusing on the second challenge, that data is represented by multiple views, a school of unsupervised multi-view clustering methods has been presented. Although these methods can achieve interactions among heterogeneous features, there still exist problems of heavy computational complexity or the curse of dimensionality. Most of these methods can be roughly classified into two categories: Multi-View K-Means Clustering (MVKM) and Multi-View Spectral Clustering (MVSC). Many MVSC approaches essentially extend Spectral Clustering from a single view to multiple views and are mainly based on similarity graphs or matrices. Although this kind of multi-view clustering algorithm [8, 32, 21, 18, 19, 4, 14, 27, 5] can achieve encouraging performance, it still has two main drawbacks. On the one hand, constructing the similarity graph for high dimensional data is heavy work because many factors must be considered, such as the choice of similarity function and the type of similarity graph, and this work may greatly affect the final clustering performance. On the other hand, MVSC algorithms generally need to build a proper similarity graph for each view; the more views there are, the more complex constructing the similarity graphs becomes. Thus, MVSC algorithms cannot effectively tackle high-dimensional multi-view data clustering.
Different from MVSC algorithms, MVKM approaches are better suited to dealing with high-dimensional data because they do not need to construct a similarity graph for each view. This kind of method originally derives from G-orthogonal non-negative matrix factorization (NMF), which is equivalent to relaxed K-Means clustering (RKM) [9]. Recently, Cai et al. [3] proposed robust multi-view K-Means clustering (RMVKM) by using the $\ell_{2,1}$-norm [11] to replace the $\ell_{2}$-norm and learning an individual weight for each view. However, RMVKM operates in the original feature space without any discriminative subspace learning mechanism, which may incur the curse of dimensionality when dealing with multi-view and high dimensional data. In addition, although the work in [31] also extended the model from [10] to the multi-view case, it sums the scatter matrices and produces a separate cluster assignment for each view, which is quite different from the proposed method.
According to the above analysis, both directly extending single-view methods to multi-view and the existing multi-view algorithms are far from thoroughly addressing the multi-view clustering issue. In this paper, we propose a novel unsupervised multi-view scheme aiming to address the above two challenges. The proposed method, DEKM, embeds the synchronous learning of multiple discriminative subspaces into multi-view K-Means clustering to construct a unified framework, and adaptively controls the intercoordinations between different views simultaneously.
The highlights of the DEKM method are two-fold. Firstly, the learning of multiple discriminative subspaces is fulfilled synchronously. Under this unified and embedded framework, DEKM realizes the intercoordination of these subspaces and further makes them complement each other. Secondly, DEKM develops an intertwined and iterative optimization instead of just applying existing methods in an iterative manner, which not only maintains the relative independence of the different discriminative subspaces, but also keeps the clustering results consistent across multiple views. This multi-view extension is among the earliest efforts to sum the clustering objectives in a weighted way. These points are quite different from several recent works. Comprehensive evaluations on several benchmark image datasets and comparisons with state-of-the-art multi-view clustering approaches demonstrate the efficiency and superiority of DEKM.
2. The proposed framework
2.1. Formulation
According to [17], the trace ratio LDA for single-view was defined as follows:

$$W^{*} = \arg\max_{W^{T}W=I_{m}} \frac{Tr(W^{T}S_{B}W)}{Tr(W^{T}S_{W}W)} \qquad (1)$$

where $W \in \mathbb{R}^{d\times m}$ denotes the projection matrix, a set of orthogonal and normalized vectors, which reduces the dimensionality from d to m. $S_{B}$ and $S_{W}$ denote the between-class scatter matrix and the within-class scatter matrix, respectively.
Suppose that $X \in \mathbb{R}^{d\times N}$ is the data matrix with N samples of dimension d after centralization, and $G \in \mathbb{R}^{N\times C}$ is the clustering indicator matrix, where each row of G denotes the clustering indicator vector for one sample and C is the number of clusters: $G_{ic}=1$ $(i=1,\ldots,N;\ c=1,\ldots,C)$ if the i-th sample belongs to the c-th class and $G_{ic}=0$ otherwise. Using G, $S_{B}$ and $S_{W}$ can be rewritten as:

$$S_{B} = XG(G^{T}G)^{-1}G^{T}X^{T}, \quad S_{W} = XX^{T} - XG(G^{T}G)^{-1}G^{T}X^{T} \qquad (2)$$
Because $S_{T} = S_{B} + S_{W}$, (1) is equivalent to the following problem:

$$W^{*} = \arg\max_{W^{T}W=I_{m}} \frac{Tr(W^{T}S_{B}W)}{Tr(W^{T}S_{T}W)} \qquad (3)$$
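To make these definitions concrete, the following minimal numpy sketch (with illustrative toy shapes, not part of the original implementation) computes the scatter matrices in (2) and the identity $S_{T}=S_{B}+S_{W}$ used in (3):

import numpy as np

# Toy centralized data matrix X (d x N) and one-hot indicator matrix G (N x C).
d, N, C = 5, 30, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((d, N))
X -= X.mean(axis=1, keepdims=True)      # centralize, as the text assumes
labels = rng.integers(0, C, size=N)
G = np.eye(C)[labels]                   # N x C clustering indicator matrix

# Assumes every cluster is non-empty, so G^T G is invertible.
P = G @ np.linalg.inv(G.T @ G) @ G.T    # N x N projection onto cluster means
S_B = X @ P @ X.T                       # between-class scatter, eq. (2)
S_W = X @ X.T - S_B                     # within-class scatter, eq. (2)
S_T = S_B + S_W                         # total scatter, used in (3)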
We know that (3), as a supervised method, can seek a discriminative subspace to separate different classes maximally. Recently, the combination of dimensionality reduction and clustering has become a hot issue [12, 10, 26, 16]. However, those methods are only designed for the single-view setting. In this paper, we first design a weighted multi-view LDA and then develop an unsupervised optimization scheme to solve this multi-view framework.
Given M types of heterogeneous features, $k=1,2,\ldots,M$, we suppose $X_{k}\in\mathbb{R}^{d_{k}\times N}$ is the data matrix for the k-th view. Referring to the definition of trace ratio LDA, we propose that, for two $d_{k}\times d_{k}$ positive semi-definite matrices $S_{B}^{k}$ and $S_{T}^{k}$, the weighted multi-view trace ratio LDA can be defined as finding M different projection matrices $W_{k}|_{k=1}^{M}$ respectively:

$$W_{k}|_{k=1}^{M} = \arg\max_{W_{k}^{T}W_{k}=I_{m_{k}}|_{k=1}^{M}} \sum_{k=1}^{M}(\alpha_{k})^{\gamma}\,\frac{Tr(W_{k}^{T}S_{B}^{k}W_{k})}{Tr(W_{k}^{T}S_{T}^{k}W_{k})} \qquad (4)$$
where $W_{k}$ denotes the projection matrix which reduces the dimensionality from $d_{k}$ to $m_{k}$ in the k-th view, $\alpha_{k}$ is the weight for each view, and $\gamma$ is the parameter that controls the weight distribution. $S_{B}^{k}$ and $S_{T}^{k}$ denote $S_{B}$ and $S_{T}$ in the k-th view, respectively:

$$S_{B}^{k} = X_{k}G(G^{T}G)^{-1}G^{T}X_{k}^{T}, \quad S_{T}^{k} = X_{k}X_{k}^{T} \qquad (5)$$
It is apparent that the weighted multi-view LDA, i.e. (4), is still supervised. However, in real applications, labeling data is very expensive. Without any label information, we know neither the projection matrices $W_{k}|_{k=1}^{M}$ nor the clustering indicator matrix G of (4), which is adverse for high-dimensional clustering. Thus, we propose an unsupervised optimization scheme to solve the following weighted multi-view LDA:
multi-view LDA:
max
W
k
|
M
k=1
,
α
k
|
M
k=1
,G
M
X
k=1
(α
k
)
γ
"
T r(W
T
k
X
k
G(G
T
G)
1
G
T
X
T
k
W
k
)
T r(W
T
k
S
k
T
W
k
)
1
#
s.t.W
T
k
W
k
=I
m
k
|
M
k=1
, G Ind,
M
X
k=1
α
k
=1, α
k
0
(6)
where Ind is the set of clustering indicator matrices.
2.2. Optimization
The key difficulty in solving (6) is that it has become an unsupervised and complex problem. In other words, the numerator of (6), $X_{k}G(G^{T}G)^{-1}G^{T}X_{k}^{T}$, which is actually $S_{B}^{k}$, is closely related to G. However, $W_{k}|_{k=1}^{M}$, $\alpha_{k}|_{k=1}^{M}$ and G are all unknown. To obtain these variables simultaneously in a better way, we offer Theorem 1 to transform (6) into a more tractable framework (7), which is the proposed method DEKM. Note that $W_{k}|_{k=1}^{M}$ are not decoupled in (7), since G is also a variable to be optimized.
Theorem 1. Solving (6) is equivalent to solving the following objective function:

$$\min_{W_{k}|_{k=1}^{M},\,\alpha_{k}|_{k=1}^{M},\,G}\ \sum_{k=1}^{M}(\alpha_{k})^{\gamma}\,\frac{\|W_{k}^{T}X_{k}-F_{k}G^{T}\|_{F}^{2}}{Tr(W_{k}^{T}S_{T}^{k}W_{k})}$$
$$s.t.\ W_{k}^{T}W_{k}=I_{m_{k}}|_{k=1}^{M},\ G\in Ind,\ \sum_{k=1}^{M}\alpha_{k}=1,\ \alpha_{k}\geq 0 \qquad (7)$$
Proof. Obviously, using the properties of the matrix trace, (7) can be rewritten as the following formula:

$$\min_{W_{k}|_{k=1}^{M},\,\alpha_{k}|_{k=1}^{M},\,G}\ \sum_{k=1}^{M}(\alpha_{k})^{\gamma}\,\frac{Tr\left[(W_{k}^{T}X_{k}-F_{k}G^{T})^{T}(W_{k}^{T}X_{k}-F_{k}G^{T})\right]}{Tr(W_{k}^{T}S_{T}^{k}W_{k})}$$
$$=\min_{W_{k}|_{k=1}^{M},\,\alpha_{k}|_{k=1}^{M},\,G}\ \sum_{k=1}^{M}(\alpha_{k})^{\gamma}\,\frac{Tr(X_{k}^{T}W_{k}W_{k}^{T}X_{k})-2Tr(F_{k}^{T}W_{k}^{T}X_{k}G)+Tr(F_{k}G^{T}GF_{k}^{T})}{Tr(W_{k}^{T}S_{T}^{k}W_{k})} \qquad (8)$$
Since we are solving a minimization, we take the derivative of (8) with respect to $F_{k}$. Ignoring irrelevant terms and using the rules of matrix derivatives, we obtain:

$$F_{k}=W_{k}^{T}X_{k}G(G^{T}G)^{-1} \qquad (9)$$

Notably, $F_{k}\in\mathbb{R}^{m_{k}\times C}$ is the cluster centroid matrix in the discriminative subspace for the k-th view. Substituting (9) into (8), there is:
$$\min_{W_{k}|_{k=1}^{M},\,\alpha_{k}|_{k=1}^{M},\,G}\ \sum_{k=1}^{M}(\alpha_{k})^{\gamma}\left[1-\frac{Tr(W_{k}^{T}X_{k}G(G^{T}G)^{-1}G^{T}X_{k}^{T}W_{k})}{Tr(W_{k}^{T}S_{T}^{k}W_{k})}\right]$$
$$\Leftrightarrow\ \max_{W_{k}|_{k=1}^{M},\,\alpha_{k}|_{k=1}^{M},\,G}\ \sum_{k=1}^{M}(\alpha_{k})^{\gamma}\left[\frac{Tr(W_{k}^{T}X_{k}G(G^{T}G)^{-1}G^{T}X_{k}^{T}W_{k})}{Tr(W_{k}^{T}S_{T}^{k}W_{k})}-1\right] \qquad (10)$$
Therefore, solving (6) is equivalent to solving (7).
Further, we decompose (7) into three subproblems and solve them via an alternating iteration method.
Step 1: Solving G when $W_{k}|_{k=1}^{M}$, $F_{k}|_{k=1}^{M}$ and $\alpha_{k}|_{k=1}^{M}$ are fixed.

Obtaining G via weighted multi-view K-Means clustering is an unsupervised learning stage. The clustering indicator matrix G is unknown, and we search for its optimal solution among multiple low-dimensional discriminative subspaces.

We separate $X_{k}$ and G into independent vectors respectively. Then (7) can be replaced by the following problem:
$$\min_{G}\ \sum_{k=1}^{M}(\alpha_{k})^{\gamma}\,\|W_{k}^{T}X_{k}-F_{k}G^{T}\|_{F}^{2}=\min_{G}\ \sum_{i=1}^{N}\sum_{k=1}^{M}(\alpha_{k})^{\gamma}\,\|W_{k}^{T}x_{k}^{i}-F_{k}g_{i}\|_{2}^{2}$$
$$s.t.\ G\in Ind,\ g_{i}\in G,\ g_{ic}\in\{0,1\},\ \sum_{c=1}^{C}g_{ic}=1 \qquad (11)$$

where $x_{k}^{i}$ is the i-th column of $X_{k}$, corresponding to the i-th sample in the k-th view, and $g_{i}$ is the i-th row of G, denoting the clustering indicator vector for the i-th sample. Plugging $g_{i}$ into (11) one sample at a time is equivalent to tackling the following problem for the i-th sample:
$$c^{*}=\arg\min_{c}\ \sum_{k=1}^{M}(\alpha_{k})^{\gamma}\,\|W_{k}^{T}x_{k}^{i}-F_{k}e_{c}\|_{2}^{2} \qquad (12)$$

where $e_{c}$ is the c-th row of the identity matrix $I_{C}$, and $c^{*}$ means that the $c^{*}$-th element of $g_{i}$ is 1 while the others are 0. There are only C candidate clustering indicator vectors, so the solution of (12) is easily found.
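To make this assignment step concrete, here is a minimal numpy sketch of the update (11)-(12); the function and variable names are ours, for illustration only, not taken from the authors' code:

import numpy as np

def update_G(X, W, F, alpha, gamma):
    # X[k]: d_k x N view matrix; W[k]: d_k x m_k projection;
    # F[k]: m_k x C centroid matrix; alpha: length-M weight vector.
    M, N, C = len(X), X[0].shape[1], F[0].shape[1]
    cost = np.zeros((N, C))
    for k in range(M):
        Y = W[k].T @ X[k]                            # samples in subspace k
        # squared distance of every sample to every centroid in view k
        d2 = ((Y[:, :, None] - F[k][:, None, :])**2).sum(axis=0)   # N x C
        cost += (alpha[k]**gamma) * d2
    labels = cost.argmin(axis=1)                     # c* of (12), per sample
    return np.eye(C)[labels]                         # one-hot indicator G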

Step 2: Solving $W_{k}|_{k=1}^{M}$ and $F_{k}|_{k=1}^{M}$ when G and $\alpha_{k}|_{k=1}^{M}$ are fixed.

Calculating $W_{k}|_{k=1}^{M}$ and $F_{k}|_{k=1}^{M}$ via the weighted multi-view LDA is a supervised learning stage. Moreover, the discriminative subspace $W_{k}$ for each view is closely related to the clustering indicator matrix G and its weight $\alpha_{k}$.

From (9), we know that $F_{k}$ is a function of $W_{k}$ and G. When G and $\alpha_{k}|_{k=1}^{M}$ are fixed, substituting (9) into (7) and omitting constant terms, the objective function becomes:
$$\min_{W_{k}|_{k=1}^{M}}\ \sum_{k=1}^{M}\frac{Tr(W_{k}^{T}\tilde{S}_{W}^{k}W_{k})}{Tr(W_{k}^{T}S_{T}^{k}W_{k})},\quad s.t.\ W_{k}^{T}W_{k}=I_{m_{k}}|_{k=1}^{M} \qquad (13)$$

where $\tilde{S}_{W}^{k}=(\alpha_{k})^{\gamma}[X_{k}X_{k}^{T}-X_{k}G(G^{T}G)^{-1}G^{T}X_{k}^{T}]$ denotes the weighted within-class scatter matrix for the k-th view. Thus, solving (13) is equivalent to solving the following formula:
$$\max_{W_{k}|_{k=1}^{M}}\ \sum_{k=1}^{M}\frac{Tr(W_{k}^{T}\tilde{S}_{B}^{k}W_{k})}{Tr(W_{k}^{T}S_{T}^{k}W_{k})},\quad s.t.\ W_{k}^{T}W_{k}=I_{m_{k}}|_{k=1}^{M} \qquad (14)$$

where $\tilde{S}_{B}^{k}=(\alpha_{k})^{\gamma}X_{k}G(G^{T}G)^{-1}G^{T}X_{k}^{T}$ denotes the weighted between-class scatter matrix for the k-th view. (14) jointly optimizes M distinct discriminative subspaces in parallel. The solution $W_{k}$ for each view is obtained by a trace ratio LDA when G and $\alpha_{k}|_{k=1}^{M}$ are fixed.
Step 3: Solving $\alpha_{k}|_{k=1}^{M}$ when $W_{k}|_{k=1}^{M}$ and G are fixed.

Learning the non-negative normalized weight $\alpha_{k}$ for each view assigns higher weights to the more discriminative image features. To derive the solution of $\alpha_{k}|_{k=1}^{M}$, we rewrite (7) as:
rewrite (7) as:
min
α
k
|
M
k=1
M
X
k=1
(α
k
)
γ
H
k
, s.t.
M
X
k=1
α
k
=1, α
k
0
(15)
where
H
k
=
kW
T
k
X
k
F
k
G
T
k
2
F
T r(W
T
k
S
k
T
W
k
)
(16)
Thus, the Lagrange function of (15) is:

$$\sum_{k=1}^{M}(\alpha_{k})^{\gamma}H_{k}-\lambda\left(\sum_{k=1}^{M}\alpha_{k}-1\right) \qquad (17)$$

where $\lambda$ is the Lagrange multiplier. To obtain the optimal solution, we set the derivative of (17) with respect to $\alpha_{k}$ to zero and then substitute the result into the constraint $\sum_{k=1}^{M}\alpha_{k}=1$. This gives:

$$\alpha_{k}=\frac{(\gamma H_{k})^{\frac{1}{1-\gamma}}}{\sum_{v=1}^{M}(\gamma H_{v})^{\frac{1}{1-\gamma}}} \qquad (18)$$
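In code, the closed-form update (18) is essentially a one-liner. A minimal sketch, with illustrative names and assuming $\gamma\neq 1$ and all $H_{k}>0$:

import numpy as np

def update_alpha(H, gamma):
    # H: length-M array of the per-view values H_k from (16), all positive.
    p = (gamma * H) ** (1.0 / (1.0 - gamma))
    return p / p.sum()                      # weights sum to 1, as in (18)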
Algorithm 1: The DEKM method

Input: Data for M views $\{X_{k}\,|\,k=1,2,\ldots,M\}$, $X_{k}\in\mathbb{R}^{d_{k}\times N}$; the number of clusters C; the reduced dimension $m_{k}$ for each view; the parameter $\gamma$.
Output: The projection matrix $W_{k}$, cluster centroid matrix $F_{k}$ and weight $\alpha_{k}$ for the k-th view; the common clustering indicator matrix G.
Initialization: Set t = 0. Initialize $G\in Ind$. Initialize $W_{k}$ such that $W_{k}^{T}W_{k}=I_{m_{k}}$ and initialize the weight $\alpha_{k}=1/M$ for the k-th view.

While not converged do
1: Calculate G by:
$$c^{*}=\arg\min_{c}\ \sum_{k=1}^{M}(\alpha_{k})^{\gamma}\,\|W_{k}^{T}x_{k}^{i}-F_{k}e_{c}\|_{2}^{2}$$
2: Calculate $F_{k}$ by $F_{k}=W_{k}^{T}X_{k}G(G^{T}G)^{-1}$ and update $W_{k}|_{k=1}^{M}$ by:
$$\max_{W_{k}|_{k=1}^{M}}\ \sum_{k=1}^{M}\frac{Tr(W_{k}^{T}\tilde{S}_{B}^{k}W_{k})}{Tr(W_{k}^{T}S_{T}^{k}W_{k})}$$
3: Update $\alpha_{k}|_{k=1}^{M}$ by:
$$\alpha_{k}=\frac{(\gamma H_{k})^{\frac{1}{1-\gamma}}}{\sum_{v=1}^{M}(\gamma H_{v})^{\frac{1}{1-\gamma}}}$$
End While. Return $W_{k}|_{k=1}^{M}$, G and $\alpha_{k}|_{k=1}^{M}$.
To sum up, in Algorithm 1 we obtain G via Step 1, which is equivalent to a Discriminative K-Means that includes the interrelations among multi-view features. Updating $W_{k}|_{k=1}^{M}$ via Step 2 performs the dimensionality reduction for each view. Updating $\alpha_{k}|_{k=1}^{M}$ via Step 3 fulfills the learning of the multiple weights simultaneously. We repeat this process iteratively until the objective function value converges; a compact sketch of the whole loop is shown below.
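For concreteness, the following numpy sketch wires the three steps together, reusing the helper functions sketched earlier (update_G, trace_ratio, update_alpha). The initialization, step ordering within a sweep, iteration count, and all names are illustrative; this is not the authors' released code:

import numpy as np

def dekm(X, C, m, gamma, n_iter=30):
    # X: list of M view matrices, X[k] is d_k x N; m: list of target dims m_k.
    M, N = len(X), X[0].shape[1]
    rng = np.random.default_rng(0)
    G = np.eye(C)[rng.integers(0, C, size=N)]       # random G in Ind
    alpha = np.full(M, 1.0 / M)                     # alpha_k = 1/M
    for _ in range(n_iter):
        # Assumes no cluster becomes empty, so G^T G stays invertible.
        Ginv = np.linalg.inv(G.T @ G)
        P = G @ Ginv @ G.T
        # Step 2: per-view trace ratio LDA; the (alpha_k)^gamma factor
        # scales the whole ratio and so drops out of each per-view argmax.
        W = [trace_ratio(Xk @ P @ Xk.T, Xk @ Xk.T, mk)
             for Xk, mk in zip(X, m)]
        F = [Wk.T @ Xk @ G @ Ginv for Wk, Xk in zip(W, X)]   # centroids, eq. (9)
        # Step 1: reassign samples with the current subspaces and weights.
        G = update_G(X, W, F, alpha, gamma)
        # Step 3: refresh the view weights via (16) and (18).
        H = np.array([np.linalg.norm(Wk.T @ Xk - Fk @ G.T)**2
                      / np.trace(Wk.T @ Xk @ Xk.T @ Wk)
                      for Wk, Xk, Fk in zip(W, X, F)])
        alpha = update_alpha(H, gamma)
    return G, W, alpha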
3. Convergence analysis
As mentioned above, DEKM is a unified and embedded multi-view framework solved by an unsupervised optimization scheme. Clearly, once (6) is transformed into (7), it can be divided into three subproblems. Here we give the following proof to verify the convergence of the Discriminatively Embedded K-Means (DEKM) algorithm.

Theorem 2. In each iteration, the objective function values of both (6) and its variant (7) decrease until the algorithm converges.
Proof. Suppose that after the t-th iteration we have obtained $W_{k}^{(t)}|_{k=1}^{M}$, $G^{(t)}$ and $\alpha_{k}^{(t)}|_{k=1}^{M}$. In the (t+1)-th iteration, we first fix G and $\alpha_{k}|_{k=1}^{M}$ as $G^{(t)}$ and $\alpha_{k}^{(t)}|_{k=1}^{M}$ respectively, and then solve $W_{k}^{(t+1)}$ for each view. Thus, when $G^{(t)}$ and $\alpha_{k}^{(t)}|_{k=1}^{M}$ are fixed, according to (6), $W_{k}^{(t+1)}$ can be solved by the following equation:

$$W_{k}^{(t+1)}=\arg\max_{W_{k}}\ (\alpha_{k}^{(t)})^{\gamma}\left[\frac{Tr[W_{k}^{T}X_{k}G^{(t)}(G^{(t)T}G^{(t)})^{-1}G^{(t)T}X_{k}^{T}W_{k}]}{Tr(W_{k}^{T}S_{T}^{k}W_{k})}-1\right]$$
$$=\arg\min_{W_{k}}\ (\alpha_{k}^{(t)})^{\gamma}\,\frac{Tr[W_{k}^{T}(S_{T}^{k}-X_{k}G^{(t)}(G^{(t)T}G^{(t)})^{-1}G^{(t)T}X_{k}^{T})W_{k}]}{Tr(W_{k}^{T}S_{T}^{k}W_{k})} \qquad (19)$$
Referring to the argumentation in [6], by rewriting (19) we have:

$$\frac{Tr\left(W_{k}^{(t+1)T}\tilde{S}_{W}^{k(t)}W_{k}^{(t+1)}\right)}{Tr\left(W_{k}^{(t+1)T}S_{T}^{k}W_{k}^{(t+1)}\right)}\leq\frac{Tr\left(W_{k}^{(t)T}\tilde{S}_{W}^{k(t)}W_{k}^{(t)}\right)}{Tr\left(W_{k}^{(t)T}S_{T}^{k}W_{k}^{(t)}\right)} \qquad (20)$$

where $\tilde{S}_{W}^{k(t)}=(\alpha_{k}^{(t)})^{\gamma}[S_{T}^{k}-X_{k}G^{(t)}(G^{(t)T}G^{(t)})^{-1}G^{(t)T}X_{k}^{T}]=(\alpha_{k}^{(t)})^{\gamma}S_{W}^{k(t)}$ denotes the weighted within-class scatter matrix for the k-th view at the t-th iteration.
In the same way, we fix $W_{k}|_{k=1}^{M}$ and $\alpha_{k}|_{k=1}^{M}$ as $W_{k}^{(t)}|_{k=1}^{M}$ and $\alpha_{k}^{(t)}|_{k=1}^{M}$ respectively, and solve for $G^{(t+1)}$. According to (6), we can obtain:
$$G^{(t+1)}=\arg\max_{G}\ \sum_{k=1}^{M}(\alpha_{k}^{(t)})^{\gamma}\left[\frac{Tr[W_{k}^{(t)T}X_{k}G(G^{T}G)^{-1}G^{T}X_{k}^{T}W_{k}^{(t)}]}{Tr(W_{k}^{(t)T}S_{T}^{k}W_{k}^{(t)})}-1\right]$$
$$=\arg\min_{G}\ \sum_{k=1}^{M}(\alpha_{k}^{(t)})^{\gamma}\,\frac{Tr[W_{k}^{(t)T}(S_{T}^{k}-X_{k}G(G^{T}G)^{-1}G^{T}X_{k}^{T})W_{k}^{(t)}]}{Tr(W_{k}^{(t)T}S_{T}^{k}W_{k}^{(t)})} \qquad (21)$$
By rewriting (21), there is:

$$\sum_{k=1}^{M}\frac{Tr\left(W_{k}^{(t)T}\tilde{S}_{W}^{k(t+1)}W_{k}^{(t)}\right)}{Tr\left(W_{k}^{(t)T}S_{T}^{k}W_{k}^{(t)}\right)}\leq\sum_{k=1}^{M}\frac{Tr\left(W_{k}^{(t)T}\tilde{S}_{W}^{k(t)}W_{k}^{(t)}\right)}{Tr\left(W_{k}^{(t)T}S_{T}^{k}W_{k}^{(t)}\right)} \qquad (22)$$

where $\tilde{S}_{W}^{k(t+1)}=(\alpha_{k}^{(t)})^{\gamma}[S_{T}^{k}-X_{k}G^{(t+1)}(G^{(t+1)T}G^{(t+1)})^{-1}G^{(t+1)T}X_{k}^{T}]=(\alpha_{k}^{(t)})^{\gamma}S_{W}^{k(t+1)}$ is the weighted within-class scatter matrix for the k-th view at the (t+1)-th iteration.
Similarly, we fix $W_{k}|_{k=1}^{M}$ and G as $W_{k}^{(t)}|_{k=1}^{M}$ and $G^{(t)}$ respectively, and solve for $\alpha_{k}^{(t+1)}|_{k=1}^{M}$. According to (6), for each view, $\alpha_{k}^{(t+1)}$ can be calculated by:
can be calculated by:
α
(t+1)
k
=arg max
α
k
(α
(t)
k
)
γ
· · ·
· · ·
T r[W
(t)T
k
X
k
G
(t)
(G
(t)T
G
(t)
)
1
G
(t)T
X
T
k
W
(t)
k
]
T r(W
(t)T
k
S
k
T
W
(t)
k
)
1
=arg min
α
k
(α
(t)
k
)
γ
· · ·
· · ·
T r[W
(t)T
k
(S
k
T
X
k
G
(t)
(G
(t)T
G
(t)
)
1
G
(t)T
X
T
k
)W
(t)
k
]
T r(W
(t)T
k
S
k
T
W
(t)
k
)
(23)
Thus, (23) can be further rewritten as follows:

$$(\alpha_{k}^{(t+1)})^{\gamma}\,\frac{Tr\left(W_{k}^{(t)T}S_{W}^{k(t)}W_{k}^{(t)}\right)}{Tr\left(W_{k}^{(t)T}S_{T}^{k}W_{k}^{(t)}\right)}\leq(\alpha_{k}^{(t)})^{\gamma}\,\frac{Tr\left(W_{k}^{(t)T}S_{W}^{k(t)}W_{k}^{(t)}\right)}{Tr\left(W_{k}^{(t)T}S_{T}^{k}W_{k}^{(t)}\right)} \qquad (24)$$
Integrating (20), (22) and (24), we arrive at:

$$\sum_{k=1}^{M}\frac{Tr\left(W_{k}^{(t+1)T}(\alpha_{k}^{(t+1)})^{\gamma}S_{W}^{k(t+1)}W_{k}^{(t+1)}\right)}{Tr\left(W_{k}^{(t+1)T}S_{T}^{k}W_{k}^{(t+1)}\right)}\leq\sum_{k=1}^{M}\frac{Tr\left(W_{k}^{(t)T}(\alpha_{k}^{(t+1)})^{\gamma}S_{W}^{k(t+1)}W_{k}^{(t)}\right)}{Tr\left(W_{k}^{(t)T}S_{T}^{k}W_{k}^{(t)}\right)}$$
$$\leq\sum_{k=1}^{M}\frac{Tr\left(W_{k}^{(t)T}(\alpha_{k}^{(t+1)})^{\gamma}S_{W}^{k(t)}W_{k}^{(t)}\right)}{Tr\left(W_{k}^{(t)T}S_{T}^{k}W_{k}^{(t)}\right)}\leq\sum_{k=1}^{M}\frac{Tr\left(W_{k}^{(t)T}(\alpha_{k}^{(t)})^{\gamma}S_{W}^{k(t)}W_{k}^{(t)}\right)}{Tr\left(W_{k}^{(t)T}S_{T}^{k}W_{k}^{(t)}\right)} \qquad (25)$$

Thus, (25) proves that (6) and its variant (7) are lower bounded and that their objective function values decrease after each iteration.
4. Experiments
In this section, we evaluate the performance of DEK-
M on three benchmark datasets in ter ms of two standard
clustering evaluation metrics, namely A ccuracy (ACC) [2]
and Norm alized Mutual Information (NMI) [2]. Before do
anything, we need to centralize the data and normalize all
values in the range of [1, 1].
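As a minimal sketch, this preprocessing for one view matrix (assumed to be $d_{k}\times N$) could look like:

import numpy as np

def preprocess(Xk):
    # Center each feature dimension, then scale all values into [-1, 1].
    Xk = Xk - Xk.mean(axis=1, keepdims=True)
    return Xk / max(np.abs(Xk).max(), 1e-12)   # guard against all-zero input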
4.1. Datasets
In our experiments, following [3], three benchmark image datasets including Caltech101 [13], MSRC [28] and

References

Lowe, D. G. Distinctive Image Features from Scale-Invariant Keypoints. International Journal of Computer Vision, 60(2):91-110, 2004.

Dalal, N., and Triggs, B. Histograms of Oriented Gradients for Human Detection. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.

Ojala, T., Pietikäinen, M., and Mäenpää, T. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7):971-987, 2002.