Graph Regularized Nonnegative Matrix
Factorization for Data Representation
Deng Cai, Member, IEEE, Xiaofei He, Senior Member, IEEE,
Jiawei Han, Fellow, IEEE, and Thomas S. Huang, Fellow, IEEE
Abstract—Matrix factorization techniques have been frequently applied in information retrieval, computer vision, and pattern
recognition. Among them, Nonnegative Matrix Factorization (NMF) has received considerable attention due to its psychological and
physiological interpretation of naturally occurring data whose representation may be parts based in the human brain. On the other
hand, from the geometric perspective, the data is usually sampled from a low-dimensional manifold embedded in a high-dimensional
ambient space. One then hopes to find a compact representation, which uncovers the hidden semantics and simultaneously respects
the intrinsic geometric structure. In this paper, we propose a novel algorithm, called Graph Regularized Nonnegative Matrix
Factorization (GNMF), for this purpose. In GNMF, an affinity graph is constructed to encode the geometrical information and we seek a
matrix factorization, which respects the graph structure. Our empirical study shows encouraging results of the proposed algorithm in
comparison to the state-of-the-art algorithms on real-world problems.
Index Terms—Nonnegative matrix factorization, graph Laplacian, manifold regularization, clustering.
1 INTRODUCTION
The techniques for matrix factorization have become
popular in recent years for data representation. In many
problems in information retrieval, computer vision, and
pattern recognition, the input data matrix is of very high
dimension. This makes learning from example infeasible [15].
One then hopes to find two or more lower dimensional
matrices whose product provides a good approximation to
the original one. The canonical matrix factorization techni-
ques include LU decomposition, QR decomposition, vector
quantization, and Singular Value Decomposition (SVD).
SVD is one of the most frequently used matrix factorization techniques. A singular value decomposition of an $M \times N$ matrix $X$ has the following form:

$$X = U \Sigma V^T,$$

where $U$ is an $M \times M$ orthogonal matrix, $V$ is an $N \times N$ orthogonal matrix, and $\Sigma$ is an $M \times N$ diagonal matrix with $\Sigma_{ij} = 0$ if $i \neq j$ and $\Sigma_{ii} \geq 0$. The quantities $\Sigma_{ii}$ are called the
singular values of X, and the columns of U and V are called
left and right singular vectors, respectively. By removing
those singular vectors corresponding to sufficiently small
singular values, we get a low-rank approximation to the
original matrix. This approximation is optimal in terms of
the reconstruction error, and thus optimal for data
representation when euclidean structure is concerned. For
this reason, SVD has been applied to various real-world
applications such as face recognition (eigenface, [40]) and
document representation (latent semantic indexing, [11]).
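As a small illustration of this low-rank idea (not part of the original paper), the following NumPy sketch keeps only the $r$ largest singular values to form the best rank-$r$ approximation in the Frobenius norm; the matrix sizes and the rank are arbitrary choices for the example.

```python
import numpy as np

# Hypothetical example: a random "data" matrix and a target rank r.
X = np.random.rand(100, 50)
r = 10

# Full SVD: X = U @ diag(s) @ Vt, with singular values s in decreasing order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Keep only the r largest singular values/vectors.
X_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

print("rank-r reconstruction error:", np.linalg.norm(X - X_r, "fro"))
```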
Previous studies have shown that there is psychological
and physiological evidence for parts-based representation
in the human brain [34], [41], [31]. The Nonnegative Matrix
Factorization (NMF) algorithm is proposed to learn the
parts of objects like human faces and text documents [33],
[26]. NMF aims to find two nonnegative matrices whose
product provides a good approximation to the original
matrix. The nonnegative constraints lead to a parts-based
representation because they allow only additive, not
subtractive, combinations. NMF has been shown to be
superior to SVD in face recognition [29] and document
clustering [42]. It is optimal for learning the parts of objects.
Recently, various researchers (see [39], [35], [1], [36], [2])
have considered the case when the data is drawn from
sampling a probability distribution that has support on or
near to a submanifold of the ambient space. Here, a
$d$-dimensional submanifold of a euclidean space $\mathbb{R}^M$ is a subset $\mathcal{M}^d \subset \mathbb{R}^M$, which locally looks like a flat $d$-dimensional euclidean space [28]. In order to detect the underlying manifold structure, many manifold learning algorithms
have been proposed, such as Locally Linear Embedding
(LLE) [35], ISOMAP [39], and Laplacian Eigenmap [1]. All
of these algorithms use the so-called locally invariant idea
[18], i.e., the nearby points are likely to have similar
embeddings. It has been shown that learning performance
can be significantly enhanced if the geometrical structure is
exploited and the local invariance is considered.

Motivated by recent progress in matrix factorization and
manifold learning [2], [5], [6], [7], in this paper we propose a
novel algorithm, called Graph regularized Nonnegative
Matrix Factorization (GNMF), which explicitly considers
the local invariance. We encode the geometrical information
of the data space by constructing a nearest neighbor graph.
Our goal is to find a parts-based representation space in
which two data points are sufficiently close to each other if
they are connected in the graph. To achieve this, we design
a new matrix factorization objective function and incorpo-
rate the graph structure into it. We also develop an
optimization scheme to solve the objective function based
on iterative updates of the two factor matrices. This leads to
a new parts-based data representation which respects the
geometrical structure of the data space. The convergence
proof of our optimization scheme is provided.
It is worthwhile to highlight several aspects of the
proposed approach here:
1. While the standard NMF fits the data in a euclidean
space, our algorithm exploits the intrinsic geometry
of the data distribution and incorporates it as an
additional regularization term. Hence, our algorithm
is particularly applicable when the data are sampled
from a submanifold which is embedded in high-
dimensional ambient space.
2. Our algorithm constructs a nearest neighbor graph
to model the manifold structure. The weight matrix
of the graph is highly sparse. Therefore, the multi-
plicative update rules for GNMF are very efficient.
By preserving the graph structure, our algorithm can
have more discriminating power than the standard
NMF algorithm.
3. Recent studies [17], [13] show that NMF is closely related to Probabilistic Latent Semantic Analysis (PLSA) [21]. The latter is one of the most popular topic modeling algorithms. Specifically, NMF with KL-divergence formulation is equivalent to PLSA [13]. From this viewpoint, the proposed GNMF approach also provides a principled way for incorporating the geometrical structure into topic modeling.
4. The proposed framework is a general one that can
leverage the power of both NMF and graph Laplacian
regularization. Besides the nearest neighbor informa-
tion, other knowledge (e.g., label information, social
network structure) about the data can also be used to
construct the graph. This naturally leads to other
extensions (e.g., semi-supervised NMF).
The rest of the paper is organized as follows: In Section 2,
we give a brief review of NMF. Section 3 introduces our
algorithm and provides a convergence proof of our
optimization scheme. Extensive experimental results on
clustering are presented in Section 4. Finally, we provide
some concluding remarks and suggestions for future work
in Section 5.
2 A BRIEF REVIEW OF NMF
NMF [26] is a matrix factorization algorithm that focuses
on the analysis of data matrices whose elements are
nonnegative.
Given a data matrix $X = [x_1, \ldots, x_N] \in \mathbb{R}^{M \times N}$, each column of $X$ is a sample vector. NMF aims to find two nonnegative matrices $U = [u_{ik}] \in \mathbb{R}^{M \times K}$ and $V = [v_{jk}] \in \mathbb{R}^{N \times K}$ whose product can well approximate the original matrix $X$:

$$X \approx U V^T.$$
There are two commonly used cost functions that quantify
the quality of the approximation. The first one is the square of
the euclidean distance between two matrices (the square of the Frobenius norm of the difference of the two matrices) [33]:
$$O_1 = \|X - UV^T\|^2 = \sum_{i,j} \Big( x_{ij} - \sum_{k=1}^{K} u_{ik} v_{jk} \Big)^2. \qquad (1)$$
The second one is the “divergence” between two
matrices [27]:
$$O_2 = D(X \,\|\, UV^T) = \sum_{i,j} \Big( x_{ij} \log \frac{x_{ij}}{y_{ij}} - x_{ij} + y_{ij} \Big), \qquad (2)$$

where $Y = [y_{ij}] = UV^T$. This cost function is referred to as the "divergence" of $X$ from $Y$ instead of the "distance" between $X$ and $Y$ because it is not symmetric. In other words, $D(X\|Y) \neq D(Y\|X)$. It reduces to the Kullback-Leibler divergence, or relative entropy, when $\sum_{ij} x_{ij} = \sum_{ij} y_{ij} = 1$, so that $X$ and $Y$ can be regarded as normalized probability distributions. We will refer to $O_1$ as the F-norm formulation and to $O_2$ as the divergence formulation in the rest of the paper.
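For concreteness, both cost functions can be evaluated with a few lines of NumPy. The sketch below is only illustrative: the function names are ours, and the small constant added inside the logarithm to guard against zero entries is a numerical safeguard not discussed in the paper.

```python
import numpy as np

def nmf_fnorm_cost(X, U, V):
    """F-norm cost O_1 = ||X - U V^T||^2, as in (1)."""
    R = X - U @ V.T
    return np.sum(R ** 2)

def nmf_divergence_cost(X, U, V, eps=1e-12):
    """Divergence cost O_2 = D(X || U V^T), as in (2).

    eps guards the logarithm and division when entries of X or Y are zero;
    this safeguard is our addition, not part of the paper.
    """
    Y = U @ V.T
    return np.sum(X * np.log((X + eps) / (Y + eps)) - X + Y)
```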
Although the objective functions $O_1$ in (1) and $O_2$ in (2) are convex in $U$ only or $V$ only, they are not convex in both variables together. Therefore, it is unrealistic to expect an algorithm to find the global minimum of $O_1$ (or $O_2$). Lee and Seung [27] presented two iterative update algorithms. The algorithm minimizing the objective function $O_1$ in (1) is as follows:
$$u_{ik} \leftarrow u_{ik} \frac{(XV)_{ik}}{(UV^T V)_{ik}}, \qquad v_{jk} \leftarrow v_{jk} \frac{(X^T U)_{jk}}{(VU^T U)_{jk}}.$$
The algorithm minimizing the objective function $O_2$ in (2) is

$$u_{ik} \leftarrow u_{ik} \frac{\sum_j \big( x_{ij} v_{jk} / \sum_k u_{ik} v_{jk} \big)}{\sum_j v_{jk}}, \qquad
v_{jk} \leftarrow v_{jk} \frac{\sum_i \big( x_{ij} u_{ik} / \sum_k u_{ik} v_{jk} \big)}{\sum_i u_{ik}}.$$
It is proven that the above two algorithms will find local minima of the objective functions $O_1$ and $O_2$ [27].
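As a reference point for the GNMF updates derived later, here is a minimal NumPy sketch of the multiplicative updates for the F-norm formulation; the initialization, iteration count, and the eps added to the denominators are our own choices, not specified by the paper.

```python
import numpy as np

def nmf_multiplicative(X, K, n_iter=200, eps=1e-10, seed=0):
    """Lee-Seung style multiplicative updates minimizing ||X - U V^T||^2."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    U = rng.random((M, K))
    V = rng.random((N, K))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)    # u_ik <- u_ik (XV)_ik / (U V^T V)_ik
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)  # v_jk <- v_jk (X^T U)_jk / (V U^T U)_jk
    return U, V
```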
In reality, we have $K \ll M$ and $K \ll N$. Thus, NMF essentially tries to find a compressed approximation of the original data matrix. We can view this approximation column by column as

$$x_j \approx \sum_{k=1}^{K} u_k v_{jk}, \qquad (3)$$

where $u_k$ is the $k$th column vector of $U$. Thus, each data vector $x_j$ is approximated by a linear combination of the columns of $U$, weighted by the components of $V$. Therefore, $U$ can be regarded as containing a basis, that is, optimized
for the linear approximation of the data in $X$. Let $z_j^T$ denote the $j$th row of $V$, $z_j = [v_{j1}, \ldots, v_{jK}]^T$. $z_j$ can be regarded as the new representation of the $j$th data point with respect to the new basis $U$. Since relatively few basis vectors are used to represent many data vectors, a good approximation can only be achieved if the basis vectors discover structure that is latent in the data [27].
The nonnegative constraints on U and V only allow
additive combinations among different bases. This is the
most significant difference between NMF and the other
matrix factorization methods, e.g., SVD. Unlike SVD, no
subtractions can occur in NMF. For this reason, it is
believed that NMF can learn a parts-based representation
[26]. The advantages of this parts-based representation have
been observed in many real-world problems such as face
analysis [29], document clustering [42], and DNA gene
expression analysis [3].
3 GRAPH REGULARIZED NONNEGATIVE MATRIX FACTORIZATION
By using the nonnegative constraints, NMF can learn a
parts-based representation. However, NMF performs this
learning in the euclidean space. It fails to discover the
intrinsic geometrical and discriminating structure of the
data space, which is essential for real-world applications.
In this section, we introduce our GNMF algorithm, which
avoids this limitation by incorporating a geometrically
based regularizer.
3.1 NMF with Manifold Regularization
Recall that NMF tries to find a set of basis vectors that can
be used to best approximate the data. One might further
hope that the basis vectors can respect the intrinsic
Riemannian structure, rather than the ambient euclidean structure. A natural assumption here could be that if two data points $x_j, x_l$ are close in the intrinsic geometry of the data distribution, then $z_j$ and $z_l$, the representations of these two points with respect to the new basis, are also close to each other. This assumption is usually referred to as the local invariance assumption [1], [19], [7], which plays an essential role in the development of various kinds of algorithms, including dimensionality reduction algorithms [1] and semi-supervised learning algorithms [2], [46], [45].
Recent studies in spectral graph theory [9] and manifold
learning theory [1] have demonstrated that the local
geometric structure can be effectively modeled through a
nearest neighbor graph on a scatter of data points. Consider
a graph with N vertices, where each vertex corresponds to a
data point. For each data point $x_j$, we find its $p$ nearest neighbors and put edges between $x_j$ and its neighbors.
There are many choices to define the weight matrix W on
the graph. Three of the most commonly used are as follows:
1. 0-1 Weighting. $W_{jl} = 1$ if and only if nodes $j$ and $l$ are connected by an edge. This is the simplest weighting method and is very easy to compute.

2. Heat Kernel Weighting. If nodes $j$ and $l$ are connected, put

$$W_{jl} = e^{-\frac{\|x_j - x_l\|^2}{\sigma}}.$$

Heat kernel weighting has an intrinsic connection to the Laplace-Beltrami operator on differentiable functions on a manifold [1].

3. Dot-Product Weighting. If nodes $j$ and $l$ are connected, put

$$W_{jl} = x_j^T x_l.$$

Note that if $x$ is normalized to 1, the dot product of two vectors is equivalent to the cosine similarity of the two vectors.
The $W_{jl}$ is used to measure the closeness of two points $x_j$ and $x_l$. The different similarity measures are suitable for different situations. For example, the cosine similarity (dot-product weighting) is very popular in the IR community (for processing documents), while for image data, the heat kernel weight may be a better choice. Since $W_{jl}$ in our paper is only for measuring the closeness, we do not treat the different weighting schemes separately.
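To make the graph construction concrete, the sketch below builds a p-nearest-neighbor graph with heat kernel weights and the corresponding degree and Laplacian matrices $D$ and $L = D - W$. The symmetrization step (keeping an edge if either point is among the other's p nearest neighbors) is one common convention and is our assumption here, not prescribed by the paper.

```python
import numpy as np

def knn_heat_kernel_graph(X, p=5, sigma=1.0):
    """Build W (heat kernel weights on a p-NN graph), D, and L = D - W.

    X: data matrix with one sample per column (M x N), as in the paper.
    """
    N = X.shape[1]
    # Pairwise squared euclidean distances between columns of X.
    sq_norms = np.sum(X ** 2, axis=0)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2 * X.T @ X
    np.fill_diagonal(dist2, np.inf)           # exclude self-edges

    W = np.zeros((N, N))
    for j in range(N):
        neighbors = np.argsort(dist2[j])[:p]  # p nearest neighbors of x_j
        W[j, neighbors] = np.exp(-dist2[j, neighbors] / sigma)
    W = np.maximum(W, W.T)                    # symmetrize the graph

    D = np.diag(W.sum(axis=1))                # degree matrix, D_jj = sum_l W_jl
    L = D - W                                 # graph Laplacian
    return W, D, L
```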
The low-dimensional representation of $x_j$ with respect to the new basis is $z_j = [v_{j1}, \ldots, v_{jK}]^T$. Again, we can use either euclidean distance

$$d(z_j, z_l) = \|z_j - z_l\|^2,$$

or divergence

$$D(z_j \,\|\, z_l) = \sum_{k=1}^{K} \Big( v_{jk} \log \frac{v_{jk}}{v_{lk}} - v_{jk} + v_{lk} \Big),$$

to measure the "dissimilarity" between the low-dimensional representations of two data points with respect to the new basis.
With the above defined weight matrix $W$, we can use the following two terms to measure the smoothness of the low-dimensional representation:

$$R_2 = \frac{1}{2} \sum_{j,l=1}^{N} \big( D(z_j \| z_l) + D(z_l \| z_j) \big) W_{jl}
     = \frac{1}{2} \sum_{j,l=1}^{N} \sum_{k=1}^{K} \Big( v_{jk} \log \frac{v_{jk}}{v_{lk}} + v_{lk} \log \frac{v_{lk}}{v_{jk}} \Big) W_{jl}, \qquad (4)$$

and
$$R_1 = \frac{1}{2} \sum_{j,l=1}^{N} \|z_j - z_l\|^2 W_{jl}
     = \sum_{j=1}^{N} z_j^T z_j D_{jj} - \sum_{j,l=1}^{N} z_j^T z_l W_{jl}
     = \mathrm{Tr}(V^T D V) - \mathrm{Tr}(V^T W V) = \mathrm{Tr}(V^T L V), \qquad (5)$$

where $\mathrm{Tr}(\cdot)$ denotes the trace of a matrix and $D$ is a diagonal matrix whose entries are column (or row, since $W$ is symmetric) sums of $W$, $D_{jj} = \sum_l W_{jl}$. $L = D - W$, which is called the graph Laplacian [9].
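A quick numerical check of the identity in (5) can be done with arbitrary random data; the snippet below is purely illustrative and uses our own variable names.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 6, 3
W = rng.random((N, N))
W = (W + W.T) / 2                      # symmetric weight matrix
D = np.diag(W.sum(axis=1))
L = D - W                              # graph Laplacian
V = rng.random((N, K))                 # rows of V are the representations z_j

# Left-hand side of (5): pairwise smoothness penalty.
R1 = 0.5 * sum(np.sum((V[j] - V[l]) ** 2) * W[j, l]
               for j in range(N) for l in range(N))

# Right-hand side of (5): Tr(V^T L V).
print(np.isclose(R1, np.trace(V.T @ L @ V)))   # prints True
```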
By minimizing $R_1$ (or $R_2$), we expect that if two data points $x_j$ and $x_l$ are close (i.e., $W_{jl}$ is big), $z_j$ and $z_l$ are also close to each other. Combining this geometrically based regularizer with the original NMF objective function leads to our GNMF.
Given a data matrix $X = [x_{ij}] \in \mathbb{R}^{M \times N}$, our GNMF aims to find two nonnegative matrices $U = [u_{ik}] \in \mathbb{R}^{M \times K}$ and $V = [v_{jk}] \in \mathbb{R}^{N \times K}$. Similarly to NMF, we can also use two "distance" measures here. If the euclidean distance is used, GNMF minimizes the objective function as follows:

$$O_1 = \|X - UV^T\|^2 + \lambda \, \mathrm{Tr}(V^T L V). \qquad (6)$$
If the divergence is used, GNMF minimizes

$$O_2 = \sum_{i=1}^{M} \sum_{j=1}^{N} \Big( x_{ij} \log \frac{x_{ij}}{\sum_{k=1}^{K} u_{ik} v_{jk}} - x_{ij} + \sum_{k=1}^{K} u_{ik} v_{jk} \Big)
+ \frac{\lambda}{2} \sum_{j=1}^{N} \sum_{l=1}^{N} \sum_{k=1}^{K} \Big( v_{jk} \log \frac{v_{jk}}{v_{lk}} + v_{lk} \log \frac{v_{lk}}{v_{jk}} \Big) W_{jl}, \qquad (7)$$

where the regularization parameter $\lambda \geq 0$ controls the smoothness of the new representation.
3.2 Updating Rules Minimizing (6)
The objective functions $O_1$ and $O_2$ of GNMF in (6) and (7) are not convex in both $U$ and $V$ together. Therefore, it is unrealistic to expect an algorithm to find the global minima. In the following, we introduce two iterative algorithms which can achieve local minima.
We first discuss how to minimize the objective function $O_1$, which can be rewritten as

$$O_1 = \mathrm{Tr}\big( (X - UV^T)(X - UV^T)^T \big) + \lambda \, \mathrm{Tr}(V^T L V)
     = \mathrm{Tr}(XX^T) - 2\,\mathrm{Tr}(XVU^T) + \mathrm{Tr}(UV^T V U^T) + \lambda \, \mathrm{Tr}(V^T L V), \qquad (8)$$
where the second equality applies the matrix properties $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$ and $\mathrm{Tr}(A) = \mathrm{Tr}(A^T)$. Let $\psi_{ik}$ and $\phi_{jk}$ be the Lagrange multipliers for the constraints $u_{ik} \geq 0$ and $v_{jk} \geq 0$, respectively, and let $\Psi = [\psi_{ik}]$, $\Phi = [\phi_{jk}]$. The Lagrangian $\mathcal{L}$ is

$$\mathcal{L} = \mathrm{Tr}(XX^T) - 2\,\mathrm{Tr}(XVU^T) + \mathrm{Tr}(UV^T V U^T) + \lambda \, \mathrm{Tr}(V^T L V) + \mathrm{Tr}(\Psi U^T) + \mathrm{Tr}(\Phi V^T). \qquad (9)$$
The partial derivatives of $\mathcal{L}$ with respect to $U$ and $V$ are

$$\frac{\partial \mathcal{L}}{\partial U} = -2XV + 2UV^T V + \Psi, \qquad (10)$$

$$\frac{\partial \mathcal{L}}{\partial V} = -2X^T U + 2VU^T U + 2\lambda L V + \Phi. \qquad (11)$$

Using the KKT conditions $\psi_{ik} u_{ik} = 0$ and $\phi_{jk} v_{jk} = 0$, we get the following equations for $u_{ik}$ and $v_{jk}$:

$$-(XV)_{ik} u_{ik} + (UV^T V)_{ik} u_{ik} = 0, \qquad (12)$$

$$-(X^T U)_{jk} v_{jk} + (VU^T U)_{jk} v_{jk} + \lambda (LV)_{jk} v_{jk} = 0. \qquad (13)$$
These equations lead to the following updating rules:

$$u_{ik} \leftarrow u_{ik} \frac{(XV)_{ik}}{(UV^T V)_{ik}}, \qquad (14)$$

$$v_{jk} \leftarrow v_{jk} \frac{(X^T U + \lambda W V)_{jk}}{(VU^T U + \lambda D V)_{jk}}. \qquad (15)$$
Regarding these two updating rules, we have the following theorem:

Theorem 1. The objective function $O_1$ in (6) is nonincreasing under the updating rules in (14) and (15).
Please see the Appendix for a detailed proof for the
above theorem. Our proof essentially follows the idea in the
proof of Lee and Seung’s [27] paper for the original NMF.
Recent studies [8], [30] show that Lee and Seung’s [27]
multiplicative algorithm cannot guarantee the convergence
to a stationary point. Particularly, Lin [30] suggests minor
modifications on Lee and Seung’s algorithm, which can
converge. Our updating rules in (14) and (15) are essentially
similar to the updating rules for NMF, and therefore, Lin’s
modifications can also be applied.
When $\lambda = 0$, it is easy to check that the updating rules in (14) and (15) reduce to the updating rules of the original NMF.
For the objective function of NMF, it is easy to check that if $U$ and $V$ are a solution, then $UD$, $VD^{-1}$ will also form a solution for any positive diagonal matrix $D$. To eliminate this uncertainty, in practice people further require that the euclidean length of each column vector in matrix $U$ (or $V$) be 1 [42]. The matrix $V$ (or $U$) will be adjusted accordingly so that $UV^T$ does not change. This can be achieved by

$$u_{ik} \leftarrow \frac{u_{ik}}{\sqrt{\sum_i u_{ik}^2}}, \qquad v_{jk} \leftarrow v_{jk} \sqrt{\sum_i u_{ik}^2}. \qquad (16)$$

Our GNMF also adopts this strategy. After the multiplicative updating procedure converges, we set the euclidean length of each column vector in matrix $U$ to 1 and adjust the matrix $V$ so that $UV^T$ does not change.
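Putting the pieces together, a minimal sketch of the GNMF multiplicative updates (14)-(15) followed by the normalization in (16) might look as follows. The iteration count, initialization, default value of lam, and the eps safeguard in the denominators are our own choices for illustration.

```python
import numpy as np

def gnmf_fnorm(X, W, K, lam=100.0, n_iter=200, eps=1e-10, seed=0):
    """GNMF with the F-norm objective (6): ||X - U V^T||^2 + lam * Tr(V^T L V)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    D = np.diag(W.sum(axis=1))
    U = rng.random((M, K))
    V = rng.random((N, K))
    for _ in range(n_iter):
        # (14): u_ik <- u_ik (XV)_ik / (U V^T V)_ik
        U *= (X @ V) / (U @ (V.T @ V) + eps)
        # (15): v_jk <- v_jk (X^T U + lam W V)_jk / (V U^T U + lam D V)_jk
        V *= (X.T @ U + lam * W @ V) / (V @ (U.T @ U) + lam * D @ V + eps)
    # (16): normalize columns of U to unit euclidean length and rescale V
    # so that U V^T is unchanged.
    norms = np.sqrt(np.sum(U ** 2, axis=0))
    U /= norms
    V *= norms
    return U, V
```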
3.3 Connection to Gradient Descent Method
Another general algorithm for minimizing the objective
function of GNMF in (6) is gradient descent [25]. For our
problem, gradient descent leads to the following additive
update rules:
$$u_{ik} \leftarrow u_{ik} + \eta_{ik} \frac{\partial O_1}{\partial u_{ik}}, \qquad v_{jk} \leftarrow v_{jk} + \delta_{jk} \frac{\partial O_1}{\partial v_{jk}}. \qquad (17)$$

The $\eta_{ik}$ and $\delta_{jk}$ are usually referred to as step size parameters. As long as $\eta_{ik}$ and $\delta_{jk}$ are sufficiently small, the above updates should reduce $O_1$ unless $U$ and $V$ are at a stationary point.
Generally speaking, it is relatively difficult to set these step size parameters while still maintaining the nonnegativity of $u_{ik}$ and $v_{jk}$. However, with the special form of the partial derivatives, we can use some tricks to set the step size parameters automatically. Let $\eta_{ik} = -u_{ik} / 2(UV^T V)_{ik}$; we have

$$u_{ik} + \eta_{ik} \frac{\partial O_1}{\partial u_{ik}}
= u_{ik} - \frac{u_{ik}}{2(UV^T V)_{ik}} \frac{\partial O_1}{\partial u_{ik}}
= u_{ik} - \frac{u_{ik}}{2(UV^T V)_{ik}} \big( -2(XV)_{ik} + 2(UV^T V)_{ik} \big)
= u_{ik} \frac{(XV)_{ik}}{(UV^T V)_{ik}}. \qquad (18)$$
Similarly, letting $\delta_{jk} = -v_{jk} / 2(VU^T U + \lambda D V)_{jk}$, we have

$$v_{jk} + \delta_{jk} \frac{\partial O_1}{\partial v_{jk}}
= v_{jk} - \frac{v_{jk}}{2(VU^T U + \lambda D V)_{jk}} \frac{\partial O_1}{\partial v_{jk}}
= v_{jk} - \frac{v_{jk}}{2(VU^T U + \lambda D V)_{jk}} \big( -2(X^T U)_{jk} + 2(VU^T U)_{jk} + 2\lambda (LV)_{jk} \big)
= v_{jk} \frac{(X^T U + \lambda W V)_{jk}}{(VU^T U + \lambda D V)_{jk}}. \qquad (19)$$
Now, it is clear that the multiplicative updating rules in (14)
and (15) are special cases of gradient descent with an
automatic step parameter selection. The advantage of
multiplicative updating rules is the guarantee of nonnega-
tivity of U and V. Theorem 1 also guarantees that the
multiplicative updating rules in (14) and (15) converge to a
local optimum.
3.4 Updating Rules Minimizing (7)
For the divergence formulation of GNMF, we also have two
updating rules, which can achieve a local minimum of (7):
$$u_{ik} \leftarrow u_{ik} \frac{\sum_j \big( x_{ij} v_{jk} / \sum_k u_{ik} v_{jk} \big)}{\sum_j v_{jk}}, \qquad (20)$$

$$\mathbf{v}_k \leftarrow \Big( \sum_i u_{ik} I + \lambda L \Big)^{-1}
\begin{bmatrix}
v_{1k} \sum_i \big( x_{i1} u_{ik} / \sum_k u_{ik} v_{1k} \big) \\
v_{2k} \sum_i \big( x_{i2} u_{ik} / \sum_k u_{ik} v_{2k} \big) \\
\vdots \\
v_{Nk} \sum_i \big( x_{iN} u_{ik} / \sum_k u_{ik} v_{Nk} \big)
\end{bmatrix}, \qquad (21)$$

where $\mathbf{v}_k$ is the $k$th column of $V$ and $I$ is an $N \times N$ identity matrix.
Similarly, we have the following theorem:

Theorem 2. The objective function $O_2$ in (7) is nonincreasing with the updating rules in (20) and (21). The objective function is invariant under these updates if and only if $U$ and $V$ are at a stationary point.
Please see the Appendix for a detailed proof. The
updating rules in this section (minimizing the divergence
formulation of (7)) are different from the updating rules in
Section 3.2 (minimizing the F-norm formulation). For the
divergence formulation of NMF, previous studies [16]
successfully analyzed the convergence property of the
multiplicative algorithm [27] from EM algorithm’s max-
imum likelihood point of view. Such an analysis is also
valid in the GNMF case.
When $\lambda = 0$, it is easy to check that the updating rules in (20) and (21) reduce to the updating rules of the original NMF.
3.5 Computational Complexity Analysis
In this section, we discuss the extra computational cost of our
proposed algorithm in comparison to standard NMF.
Specifically, we provide the computational complexity
analysis of GNMF for both the F-Norm and KL-Divergence
formulations.
The common way to express the complexity of one
algorithm is using big O notation [10]. However, this is not
precise enough to differentiate between the complexities of
GNMF and NMF. Thus, we count the arithmetic operations
for each algorithm.
Based on the updating rules, it is not hard to count the
arithmetic operations of each iteration in NMF. We
summarize the result in Table 1. For GNMF, it is important
to note that W is a sparse matrix. If we use a p-nearest neighbor graph, the average number of nonzero elements in each row of W is p. Thus, we only need NpK flam (a floating-point
addition and multiplication) to compute WV. We also
summarize the arithmetic operations for GNMF in Table 1.
The updating rule (21) in GNMF with the divergence formulation involves inverting a large matrix $\sum_i u_{ik} I + \lambda L$.
In reality, there is no need to actually compute the inversion. We only need to solve the following linear system of equations:

$$\Big( \sum_i u_{ik} I + \lambda L \Big) \mathbf{v}_k =
\begin{bmatrix}
v_{1k} \sum_i \big( x_{i1} u_{ik} / \sum_k u_{ik} v_{1k} \big) \\
v_{2k} \sum_i \big( x_{i2} u_{ik} / \sum_k u_{ik} v_{2k} \big) \\
\vdots \\
v_{Nk} \sum_i \big( x_{iN} u_{ik} / \sum_k u_{ik} v_{Nk} \big)
\end{bmatrix}.$$
Since the matrix $\sum_i u_{ik} I + \lambda L$ is symmetric, positive definite, and sparse, we can use the iterative Conjugate Gradient (CG) algorithm [20] to solve this linear system of equations very efficiently. In each iteration, CG needs to compute matrix-vector products of the form $(\sum_i u_{ik} I + \lambda L)\mathbf{p}$. The remaining workload of CG in each iteration is 4N flam. Thus, the time cost of CG in
TABLE 1
Computational Operation Counts for Each Iteration in NMF and GNMF
fladd: a floating-point addition, flmlt: a floating-point multiplication, fldiv: a floating-point division.
N: the number of sample points, M: the number of features, K: the number of factors.
p: the number of nearest neighbors, q: the number of iterations in Conjugate Gradient (CG).
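To illustrate how the V-update in (21) can be carried out without forming the matrix inverse, the sketch below solves the sparse linear system for one column v_k with SciPy's conjugate gradient routine. The function and variable names, the eps safeguard, and the convergence tolerance are our own choices; the paper only specifies the system to be solved.

```python
import numpy as np
from scipy.sparse import identity
from scipy.sparse.linalg import cg

def update_vk_divergence(X, U, V, L_sparse, lam, k, eps=1e-12):
    """One divergence-formulation update of the kth column of V, cf. (21)."""
    M, N = X.shape
    Y = U @ V.T + eps                          # current approximation, y_ij
    # Right-hand side entries: b_j = v_jk * sum_i x_ij u_ik / sum_k u_ik v_jk.
    b = V[:, k] * ((X / Y).T @ U[:, k])
    # System matrix: (sum_i u_ik) I + lam * L, symmetric positive definite and sparse.
    A = U[:, k].sum() * identity(N, format="csr") + lam * L_sparse
    v_k, info = cg(A, b, atol=1e-8)
    if info == 0:                              # CG converged
        V[:, k] = v_k
    return V
```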
