Nonnegative Matrix Factorization:
A Comprehensive Review
Yu-Xiong Wang, Student Member, IEEE, and Yu-Jin Zhang, Senior Member, IEEE
Abstract—Nonnegative Matrix Factorization (NMF), a relatively novel paradigm for dimensionality reduction, has been in the ascendant since its inception. It incorporates the nonnegativity constraint, thus obtaining the parts-based representation and correspondingly enhancing the interpretability of the issue. This survey paper mainly focuses on the theoretical research into NMF over the last 5 years, where the principles, basic models, properties, and algorithms of NMF along with its various modifications, extensions, and generalizations are summarized systematically. The existing NMF algorithms are divided into four categories: Basic NMF (BNMF), Constrained NMF (CNMF), Structured NMF (SNMF), and Generalized NMF (GNMF), upon which the design principles, characteristics, problems, relationships, and evolution of these algorithms are presented and analyzed comprehensively. Some related work not on NMF itself, which NMF should learn from or has connections with, is also covered. Moreover, some open issues that remain to be solved are discussed. Several relevant application areas of NMF are also briefly described. This survey aims to construct an integrated, state-of-the-art framework for the NMF concept, from which follow-up research may benefit.
Index Terms—Data mining, dimensionality reduction, multivariate data analysis, nonnegative matrix factorization (NMF)
1 INTRODUCTION
One of the basic concepts deeply rooted in science and
engineering is that there must be something simple,
compact, and elegant playing the fundamental roles under
the apparent chaos and complexity. This is also the case in
signal processing, data analysis, data mining, pattern
recognition, and machine learning. With the increasing quantities of available raw data due to developments in sensor and computer technology, how to obtain an effective representation by an appropriate dimensionality reduction technique has become important, necessary, and challenging in multivariate data analysis. Generally
speaking, two basic properties are supposed to be satisfied:
first, the dimension of the original data should be reduced;
second, the principal components, hidden concepts, promi-
nent features, or latent variables of the data, depending on
the application context, should be identified efficaciously.
In many cases, the primitive data sets or observations are
organized as data matrices (or tensors), and described by
linear (or multilinear) combination models; whereupon the
formulation of dimensionality reduction can be regarded as,
from the algebraic perspective, decomposing the original
data matrix into two factor matrices. The canonical methods,
such as Principal Component Analysis (PCA), Linear
Discriminant Analysis (LDA), Independent Component
Analysis (ICA), Vector Quantization (VQ), etc., are the
exemplars of such low-rank approximations. They differ
from one another in the statistical properties attributable to
the different constraints imposed on the component
matrices and their underlying structures; however, they have in common that there is no constraint on the sign of the elements in the factorized matrices. In other
words, the negative component or the subtractive combina-
tion is allowed in the representation. By contrast, a new
paradigm of factorization—Nonnegative Matrix Factoriza-
tion (NMF), which incorporates the nonnegativity constraint
and thus obtains the parts-based representation as well as
enhancing the interpretability of the issue correspondingly,
was initiated by Paatero and Tapper [1], [2] together with
Lee and Seung [3], [4].
As a matter of fact, the notion of NMF has a long history
under the name “self modeling curve resolution” in
chemometrics, where the vectors are continuous curves
rather than discrete vectors [5]. NMF was first introduced
by Paatero and Tapper as the concept of Positive Matrix
Factorization, which concentrated on a specific application
with Byzantine algorithms. These shortcomings limit both
the theoretical analysis, such as the convergence of the
algorithms or the properties of the solutions, and the
generalization of the algorithms in other applications.
Fortunately, NMF was popularized by Lee and Seung due
to their contributing work of a simple yet effective
algorithmic procedure, and more importantly the emphasis
on its potential value of parts-based representation.
Far beyond a mathematical exploration, the philosophy
underlying NMF, which tries to formulate a feasible model
for learning object parts, is closely relevant to the mechanism of perception. While the parts-based representation seems
intuitive, it is indeed on the basis of physiological and
psychological evidence: perception of the whole is based on
perception of its parts [6], one of the core concepts in certain
computational theories of recognition problems. In fact there
are two complementary connotations in nonnegativity—
nonnegative component and purely additive combination.
On the one hand, the negative values of both observations

and latent components are physically meaningless in many kinds of real-world data analysis tasks, such as those involving image, spectral, and gene data. Meanwhile, the discovered prototypes
commonly correspond with certain semantic interpretation.
For instance, in face recognition, the learned basis images are
localized rather than holistic, resembling parts of faces, such
as eyes, nose, mouth, and cheeks [3]. On the other hand,
objects of interest are most naturally characterized by the inventory of their parts, and the exclusively additive combination means that they can be reassembled by adding the required parts together, similar to identikits. NMF thereupon has achieved great success in real-world scenarios and tasks. In
document clustering, NMF surpasses the classic methods,
such as spectral clustering, not only in accuracy improve-
ment but also in latent semantic topic identification [7].
To boot, the nonnegativity constraint naturally leads to a sort of sparseness [3], which has proved to be a highly effective representation distinguished from both the completely distributed and the solely active component descriptions [8]. When NMF is interpreted as a neural network
learning algorithm depicting how the visible variables are
generated from the hidden ones, the parts-based represen-
tation is obtained from the additive model. A positive
number indicates the presence and a zero value represents
the absence of some event or component. This conforms
nicely to the dualistic properties of neural activity and
synaptic strengths in neurophysiology: either excitatory or
inhibitory without changing sign [3].
Because of the enhanced semantic interpretability under
the nonnegativity and the ensuing sparsity, NMF has
become an imperative tool in multivariate data analysis,
and been widely used in the fields of mathematics,
optimization, neural computing, pattern recognition and
machine learning [9], data mining [10], signal processing
[11], image engineering and computer vision [11], spectral
data analysis [12], bioinformatics [13], chemometrics [1],
geophysics [14], finance and economics [15]. More specifi-
cally, such applications include text data mining [16], digital
watermark, image denoising [17], image restoration, image
segmentation [18], image fusion, image classification [19],
image retrieval, face hallucination, face recognition [20],
facial expression recognition [21], audio pattern separation
[22], music genre classification [23], speech recognition,
microarray analysis, blind source separation [24], spectro-
scopy [25], gene expression classification [26], cell analysis,
EEG signal processing [17], pathologic diagnosis, email
surveillance [10], online discussion participation prediction,
network security, automatic personalized summarization,
identification of compounds in atmosphere analysis [14],
earthquake prediction, stock market pricing [15], and so on.
There have been numerous results devoted to NMF
research since its inception. Researchers from various fields,
mathematicians, statisticians, computer scientists, biolo-
gists, and neuroscientists, have explored the NMF concept
from diverse perspectives. So a systematic survey is of
necessity and consequence. Although there have been such survey papers as [27], [28], [12], [13], [10], [11], [29] and one book [9], they fail to reflect either the most recent or the most comprehensive results. This review paper will summarize
the principles, basic models, properties, and algorithms of
NMF systematically over the last 5 years, including its
various modifications, extensions, and generalizations. A
taxonomy is accordingly proposed to logically group them, which has not been presented before. Besides these, some
related work not on NMF that NMF should learn from or
has connections with will also be involved. Furthermore, this survey mainly focuses on the theoretical research rather than specific applications, although practical usage will also be touched upon. It aims to construct an integrated, state-of-the-art framework for the NMF concept, from which follow-up research may benefit.
In conclusion, the theory of NMF has advanced sig-
nificantly by now yet is still a work in progress. To be
specific: 1) the properties of NMF itself have been explored more deeply, whereas a firm statistical underpinning like those of the traditional factorization methods (PCA or LDA) has not been fully developed (partly due to its knottiness); 2) some problems like the ones mentioned in [29] have been solved, especially those with additional constraints; nevertheless, many other questions remain open.
The existing NMF algorithms are divided into four categories here, as given in Fig. 1, following some unified criteria:
1. Basic NMF (BNMF), which only imposes the non-
negativity constraint.
2. Constrained NMF (CNMF), which imposes some
additional constraints as regularization.
3. Structured NMF (SNMF), which modifies the stan-
dard factorization formulations.
4. Generalized NMF (GNMF), which breaks through
the conventional data types or factorization modes
in a broad sense.
The model level from Basic to Generalized NMF
becomes broader. Therein Basic NMF formulates the
fundamental analytical framework upon which all other
NMF models are built. We will present the optimization
tools and computational methods to efficiently and robustly
solve Basic NMF. Moreover, the pragmatic issue of NMF
with respect to large-scale data sets and online processing
will also be discussed.
Constrained NMF is categorized into four subclasses:
1. Sparse NMF (SPNMF), which imposes the sparse-
ness constraint.
2. Orthogonal NMF (ONMF), which imposes the
orthogonality constraint.
3. Discriminant NMF (DNMF), which involves the
information for classification and discrimination.
4. NMF on manifold (MNMF), which preserves the
local topological properties.
We will demonstrate why these morphological constraints
are essentially necessary and how to incorporate them into
the existing solution framework of Basic NMF.
Correspondingly, Structured NMF is categorized into
three subclasses:
1. Weighted NMF (WNMF), which attaches different
weights to different elements regarding their relative
importance.
WANG AND ZHANG: NONNEGATIVE MATRIX FACTORIZATION: A COMPREHENSIVE REVIEW 1337

2. Convolutive NMF (CVNMF), which considers the
time-frequency domain factorization.
3. Nonnegative Matrix Trifactorization (NMTF), which
decomposes the data matrix into three factor
matrices.
Besides, Generalized NMF is categorized into four
subclasses:
1. Semi-NMF, which relaxes the nonnegativity con-
straint only on the specific factor matrix.
2. Nonnegative Tensor Factorization ( NTF), whi ch
generalizes the matrix-form data to higher dimen-
sional tensors.
3. Nonnegative Matrix-Set Factorization (NMSF),
which extends the data sets from matrices to
matrix-sets.
4. Kernel NMF (KNMF), which is the nonlinear model
of NMF.
The remainder of this paper is organized as follows: first, the mathematical formulation of the NMF model is presented, and the unearthed properties of NMF are summarized. Then the algorithmic details of the foregoing categories of NMF are elaborated. Finally, conclusions are drawn, and some open issues that remain to be solved are discussed.
2 CONCEPT AND PROPERTIES OF NMF
Definition. Given an $M$-dimensional random vector $x$ with nonnegative elements, whose $N$ observations are denoted as $x_j, j = 1, 2, \ldots, N$, let the data matrix be $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}_{\geq 0}^{M \times N}$. NMF seeks to decompose $X$ into a nonnegative $M \times L$ basis matrix $U = [u_1, u_2, \ldots, u_L] \in \mathbb{R}_{\geq 0}^{M \times L}$ and a nonnegative $L \times N$ coefficient matrix $V = [v_1, v_2, \ldots, v_N] \in \mathbb{R}_{\geq 0}^{L \times N}$, such that $X \approx UV$, where $\mathbb{R}_{\geq 0}^{M \times N}$ stands for the set of $M \times N$ element-wise nonnegative matrices. This can also be written as the equivalent vector formula $x_j \approx \sum_{i=1}^{L} u_i V_{ij}$.
It is obvious that $v_j$ is the weight coefficient of the observation $x_j$ on the columns of $U$, the basis vectors or the latent feature vectors of $X$. Hence, NMF decomposes each datum into a linear combination of the basis vectors.
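To make the definition concrete, the following minimal sketch (a hypothetical illustration, not code from the paper) factorizes a small nonnegative data matrix with scikit-learn's NMF solver, whose outputs map directly onto the $U$ and $V$ above:

```python
# A minimal sketch of the definition X ~= UV; any NMF solver would do,
# and scikit-learn's is used here purely for illustration.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
M, N, L = 6, 10, 3                      # dimensions with L << min(M, N)
X = rng.random((M, N))                  # element-wise nonnegative data matrix

model = NMF(n_components=L, init="random", random_state=0, max_iter=500)
U = model.fit_transform(X)              # M x L nonnegative basis matrix
V = model.components_                   # L x N nonnegative coefficient matrix

# Each observation x_j is approximated by a nonnegative linear
# combination of the columns of U: x_j ~= sum_i u_i * V[i, j].
x0_approx = U @ V[:, 0]
print(np.linalg.norm(X - U @ V, "fro"))  # approximation error
```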
Because of the initial condition $L \ll \min(M, N)$, the obtained basis vectors are incomplete over the original vector space. In other words, this approach tries to represent the high-dimensional stochastic pattern with far fewer bases, so the perfect approximation can be achieved successfully only if the intrinsic features are identified in $U$.
Here, we discuss the relationship between $L$ and $M$, $N$ a little more. In most cases, NMF is viewed as a dimensionality reduction and feature extraction technique with $L \ll M$, $L \ll N$; that is, the basis set learned from the NMF model is incomplete, and the energy is compacted. In general, however, $L$ can be smaller than, equal to, or larger than $M$. But there are fundamental differences in the decomposition for $L < M$ and $L > M$. It is a sort of sparse coding and compressed sensing with an overcomplete basis when $L > M$. Hence, $L$ need not be limited by the dimensionality of the data, which is useful for some applications, like classification. In this situation, it may benefit from the sparseness due to both nonnegativity and redundant representation. One approach to obtain this NMF model is to perform the decomposition on the residue matrix $E = X - UV$ repeatedly and sequentially [30], as sketched below.
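The following sketch illustrates that sequential idea under stated assumptions; it is not the algorithm of [30] itself, and clipping the residue at zero (so each subproblem stays nonnegative) is an assumption made here for illustration:

```python
# Hypothetical sketch of the sequential scheme: factorize, subtract,
# and repeat on the residue. Clipping at zero keeps each subproblem
# nonnegative (an assumption made for this illustration).
import numpy as np
from sklearn.decomposition import NMF

def sequential_nmf(X, L_per_stage, n_stages):
    residue = X.copy()
    factors = []
    for _ in range(n_stages):
        model = NMF(n_components=L_per_stage, init="random",
                    random_state=0, max_iter=500)
        U = model.fit_transform(residue)
        V = model.components_
        factors.append((U, V))
        residue = np.maximum(residue - U @ V, 0.0)  # E = max(X - UV, 0)
    return factors
```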
As a kind of matrix factorization model, three essential
questions need answering: 1) existence, whether the
nontrivial NMF solutions exist; 2) uniqueness, under what
assumptions NMF is, at least in some sense, unique;
3) effectiveness, under what assumptions NMF is able to
recover the “right answer.” The existence was shown via
the theory of Completely Positive (CP) Factorization for the
first time in [31]. The last two concerns were first mentioned
and discussed from a geometric viewpoint in [32].
Complete NMF $X = UV$ is considered first for the analysis of existence, convexity, and computational complexity. The trivial solution always exists as $U = X$ and $V = I_N$. By relating NMF to CP Factorization, Vasiloglou et al. showed that every nonnegative matrix has a nontrivial complete NMF [31]. As such, CP Factorization is a special case, where a nonnegative matrix $X \in \mathbb{R}_{\geq 0}^{M \times M}$ is CP if it can be factored in the form $X = U U^{\mathsf{T}}$, $U \in \mathbb{R}_{\geq 0}^{M \times L}$. The minimum $L$ is called the CP-rank of $X$.
Fig. 1. The categorization of NMF models and algorithms.

Combining the fact that the set of CP matrices forms a convex cone with the fact that the solution to NMF belongs to a CP cone, solving NMF is a convex optimization problem [31].
Nevertheless, finding a practical description of the CP
cone is still open, and it remains hard to formulate NMF
as a convex optimization problem, despite a convex relaxation to rank reduction with theoretical merit proposed in [31].
Using the bilinear model, complete NMF can be rewritten as a linear combination of rank-one nonnegative matrices expressed by
$$X = \sum_{i=1}^{L} U_{\cdot i} V_{i \cdot} = \sum_{i=1}^{L} U_{\cdot i} \circ (V_{i \cdot})^{\mathsf{T}}, \qquad (1)$$
where $U_{\cdot i}$ is the $i$th column vector of $U$ while $V_{i \cdot}$ is the $i$th row vector of $V$, and $\circ$ denotes the outer product of two vectors. The smallest $L$ making the decomposition possible is called the nonnegative rank of the nonnegative matrix $X$, denoted as $\operatorname{rank}_+(X)$. And it satisfies the following trivial bounds [33]:
$$\operatorname{rank}(X) \leq \operatorname{rank}_+(X) \leq \min(M, N). \qquad (2)$$
While PCA can be solved in polynomial time, the optimization problem of NMF, with respect to determining the nonnegative rank and computing the associated factorization, is more difficult than its unconstrained counterpart. It is in fact NP-hard when both the dimension and the factorization rank of $X$ are required to increase, which was proved by relating it to the NP-hard intermediate simplex problem by Vavasis [34]. This is also a corollary of CP programming, since the CP cone cannot be described in polynomial time despite its convexity. In the special case when $\operatorname{rank}(X) = 1$, complete NMF can be solved in polynomial time. However, the complexity of complete NMF for fixed factorization rank generally remains unknown [35].
Another related work is the so-called Nonnegative Rank Factorization (NRF), focusing on the situation of $\operatorname{rank}(X) = \operatorname{rank}_+(X)$, i.e., selecting $\operatorname{rank}(X)$ as the minimum $L$ [33]. This is not always possible, and only a nonnegative matrix for which a corresponding simplicial cone (a polyhedral cone is simplicial if its vertex rays are linearly independent) exists has an NRF [36].
In most cases, the approximation version of NMF, $X \approx UV$, instead of the complete factorization is widely utilized. An alternative generative model is
$$X = UV + E, \qquad (3)$$
where $E \in \mathbb{R}^{M \times N}$ is the residue or noise matrix representing the approximation error.
These two modes of NMF are essentially coupled with
each other, though much more attention is devoted to the
latter. The theoretical results on complete NMF will be
helpful to design more efficient NMF algorithms [31], [34].
The selection of the factorization rank $L$ of NMF may be more credible if a tighter bound for the nonnegative rank is obtained [37].
In essence, NMF is an ill-posed problem with nonunique
solutions [32], [38]. From the geometric perspective, NMF
can be viewed as finding a simplicial cone involving all the
data points in the positive orthant. Given a simplicial cone
satisfying all these conditions, it is not difficult to construct
another cone containing the former one to meet the same
conditions, so the nesting can continue indefinitely, thus leading to an ill-defined factorization notion. From the
algebraic perspective, if there exists a solution $X \approx U_0 V_0$, let $U = U_0 D$, $V = D^{-1} V_0$; then $X \approx UV$. If a nonsingular matrix and its inverse are both nonnegative, then the matrix is a generalized permutation with the form $PS$, where $P$ and $S$ are permutation and scaling matrices, respectively. So the permutation and scaling ambiguities for NMF are inevitable. For that matter, NMF is called a unique factorization up to a permutation and a scaling transformation when $D = PS$. Unfortunately, there are many ways to select a rotational matrix $D$ which is not necessarily a generalized permutation or even a nonnegative matrix, such that the transformed factor matrices $U$ and $V$ are still nonnegative. In other words, the sole nonnegativity constraint in itself will not suffice to guarantee the uniqueness, let alone the effectiveness. Nevertheless, the uniqueness will be achieved if the original data satisfy a certain generative model. Intuitively, if $U_0$ and $V_0$ are sufficiently sparse, only generalized permutation matrices are possible rotation matrices satisfying the nonnegativity constraint. Strictly speaking, this is called the boundary close condition for the sufficiency and necessity of the uniqueness of the NMF solution [39]. Deep discussions of this issue can be found in [32], [38], [39], [40], [41], and [42]. In practice, incorporating additional constraints such as sparseness in the factor matrices or normalizing the columns of $U$ (respectively, rows of $V$) to unit length is helpful in alleviating the rotational indeterminacy [9].
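The scaling ambiguity is easy to exhibit numerically; in the toy example below, a positive diagonal $D$ (a special case of the generalized permutation $PS$) produces a second, equally valid nonnegative factorization with exactly the same product:

```python
# Illustration of the scaling ambiguity: for any positive diagonal D,
# (U0 D, D^{-1} V0) is another nonnegative factorization with the same
# product, so nonnegativity alone cannot pin down a unique solution.
import numpy as np

rng = np.random.default_rng(2)
U0 = rng.random((4, 2))
V0 = rng.random((2, 5))

D = np.diag([2.0, 0.5])                 # positive scaling, trivially invertible
U, V = U0 @ D, np.linalg.inv(D) @ V0    # both factors stay nonnegative

assert np.allclose(U0 @ V0, U @ V)      # identical approximation of X
```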
It was hoped that NMF would produce an intrinsically
parts-based and sparse representation in unsupervised
mode [3], which is the most inspiring benefit of NMF.
Intuitively, this can be explained by the fact that the stationary points of NMF solutions will typically be located at the boundary of the feasible domain due to the first-order optimality conditions, leading to zero elements [37]. Further
experiments by Li et al. have shown, however, that the pure
additivity does not necessarily mean sparsity and that NMF
will not necessarily learn the localized features [43].
Furthermore, NMF is equivalent to k-means clustering when using the Squared Euclidean Distance (SED) [44], [45],
while tantamount to Probabilistic Latent Semantic Analy-
sis (PLSA) when using Generalized Kullback-Leibler
Divergence (GKLD) as the objective function [46], [47].
So far we may conclude that the merits of NMF, parts-
based representation and sparseness included, come at the
price of more complexity. Besides, SVD or PCA always has a more compact spectrum than NMF [31]. You just cannot
have the best of both worlds.
3 BASIC NMF ALGORITHMS
The cynosure of Basic NMF is trying to find more efficient and effective solutions to the NMF problem under the sole
nonnegativity constraint, which lays the foundation for the
practicability of NMF. Due to its NP-hardness and lack of
appropriate convex formulations, the nonconvex formula-
tions with relatively easy solvability are generally adopted,

and only local minima are achievable in a reasonable
computational time. Hence, the classic and also more
practical approach is to perform alternating minimization
of a suitable cost function serving as the similarity measure between $X$ and the product $UV$. The different optimization models vary from one another mainly in the objective functions and the optimization procedures.
These optimization models, which even suggest some possible directions for the solutions to Constrained, Structured, and Generalized NMF, are the kernel discussions of this section. We will first summarize the objective
functions. Then the details about the classic Basic NMF
framework and the paragon algorithms are presented.
Moreover, some new vision of NMF, such as the geometric
formulation of NMF, and the pragmatic issue of NMF, such
as large-scale data sets, online processing, parallel comput-
ing, and incremental NMF, will be discussed. In the last part
of this section, some other relevant issues are also involved.
3.1 Similarity Measures or Objective Functions
In order to quantify the difference between the original data $X$ and the approximation $UV$, a similarity measure $D(X \| UV)$ needs to be defined first. This is also the objective function of the optimization model. These similarity measures can be either distances or divergences, and the corresponding objective functions can be either a sole cost function or optionally a set of cost functions with the same global minima to be minimized sequentially or simultaneously.
The most commonly used objective functions are SED (i.e., Frobenius norm) (4) and GKLD (i.e., I-divergence) (5) [4]:
$$D_F(X \| UV) = \frac{1}{2} \| X - UV \|_F^2 = \frac{1}{2} \sum_{ij} \left( X_{ij} - [UV]_{ij} \right)^2, \qquad (4)$$
$$D_{KL}(X \| UV) = \sum_{ij} \left( X_{ij} \ln \frac{X_{ij}}{[UV]_{ij}} - X_{ij} + [UV]_{ij} \right). \qquad (5)$$
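For reference, the two objective functions transcribe directly into code; the sketch below assumes strictly positive entries in $X$ and $UV$ so that the logarithm in (5) is well defined:

```python
# Direct transcriptions of (4) and (5).
import numpy as np

def sed(X, U, V):
    """Squared Euclidean distance, D_F(X || UV) in (4)."""
    R = X - U @ V
    return 0.5 * np.sum(R**2)

def gkld(X, U, V):
    """Generalized Kullback-Leibler (I-)divergence, D_KL(X || UV) in (5).
    Assumes X and UV have strictly positive entries."""
    P = U @ V
    return np.sum(X * np.log(X / P) - X + P)
```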
There are some drawbacks of GKLD; in particular, the gradients needed in optimization heavily depend on the scales of the factorizing matrices, leading to many iterations. Thus, the original KLD is renewed for NMF by normalizing the input data in [48]. Other cost functions consist of the Minkowski family of metrics known as the $\ell_p$-norm, the Earth Mover's distance metric [18], $\alpha$-divergence [17], $\beta$-divergence [49], $\gamma$-divergence [50], Csiszár's $\varphi$-divergence [51], Bregman divergence [52], and $\alpha$-$\beta$-divergence [53]. Most of them are element-wise measures. Some similarity measures are more robust with respect to noise and outliers, such as the hypersurface cost function [54], $\gamma$-divergence [50], and $\alpha$-$\beta$-divergence [53].
Statistically, different similarity measures can be determined based on prior knowledge about the probability distribution of the noise, which actually reflects the statistical structure of the signals and the disclosed components. For example, the SED minimization can be seen as a maximum likelihood estimator where the difference is due to additive Gaussian noise, whereas GKLD can be shown to be equivalent to the Expectation Maximization (EM) algorithm and maximum likelihood for Poisson processes [9]. Given that the optimization problem is not jointly convex in both $U$ and $V$ but is separately convex in either $U$ or $V$, alternating minimization is seemingly the feasible direction; a sketch of this scheme follows. A phenomenon worthy of notice is that although the generative model of NMF is linear, the inference computation is nonlinear.
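The sketch below is one possible realization of that alternating scheme (not a specific published algorithm): with one factor fixed, each half-step reduces to convex nonnegative least squares problems, solved here column by column with SciPy's NNLS routine:

```python
# A sketch of alternating nonnegative least squares exploiting the
# separate convexity: with U fixed, each column of V solves a convex
# NNLS problem, and symmetrically for the rows of U.
import numpy as np
from scipy.optimize import nnls

def anls_sweep(X, U, V):
    M, N = X.shape
    for j in range(N):                   # update V column by column
        V[:, j], _ = nnls(U, X[:, j])
    for i in range(M):                   # update U row by row
        U[i, :], _ = nnls(V.T, X[i, :])
    return U, V
```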
3.2 Classic Basic NMF Optimization Framework
The prototypical multiplicative update rules originated by Lee and Seung, the SED-MU and GKLD-MU [4], are still widely used as the baseline. The SED-MU and GKLD-
MU algorithms use SED and GKLD as objective functions,
respectively, and both apply iterative multiplicative updates
as the optimization approach similar to EM algorithms. In
essence, they can be viewed as adaptive rescaled gradient
descent algorithms. Considering the efficiency, they are
relatively simple and parameter free with low cost per
iteration, but they converge slowly due to a first-order
convergence rate [28], [55]. Regarding the quality of the
solutions, Lee and Seung claimed that the multiplicative
update rules converge to a local minimum [4]. Gonzales and Zhang indicated, however, that the gradient and the property of continual nonincrease by no means ensure convergence to a limit point that is also a stationary point, which can be understood under the Karush-Kuhn-Tucker (KKT) optimality conditions [55], [56]. So the accurate
conclusion is that the algorithms converge to a stationary
point which is not necessarily a local minimum when the
limit point is in the interior of the feasible region; its
stationarity cannot even be determined when the limit point
lies on the boundary of the feasible region [10]. However, a
minor modification in their step size of the gradient descent
formula achieves a first-order stationary point [57]. Another
drawback is the strong correlation enforced by the multi-
plication. Once an element in the factor matrices becomes
zero, it must remain zero. This means a gradual shrinkage of the feasible region, which is harmful for obtaining a superior solution. In practice, to reduce numerical difficulties, like numerical instabilities or ill-conditioning, normalization of the $\ell_1$ or $\ell_2$ norm of the columns of $U$ is often needed as an extra procedure, yet this simple trick changes the original optimization problem, thereby making the search for the global minimum more complicated. Besides, to preclude the computational difficulty due to division by zero, an extra positive additive value in the denominator is helpful [56].
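For concreteness, the SED-MU updates of Lee and Seung [4] transcribe into a few lines; the small constant eps in the denominators is the positive additive value mentioned above, and the column normalization is omitted here for brevity:

```python
# A compact transcription of the SED-MU rules of Lee and Seung [4].
import numpy as np

def sed_mu(X, L, n_iter=200, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    M, N = X.shape
    U = rng.random((M, L))
    V = rng.random((L, N))
    for _ in range(n_iter):
        V *= (U.T @ X) / (U.T @ U @ V + eps)   # V <- V .* (U^T X) ./ (U^T U V)
        U *= (X @ V.T) / (U @ V @ V.T + eps)   # U <- U .* (X V^T) ./ (U V V^T)
    return U, V
```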
To accelerate the convergence rate, one popular method is to apply gradient descent algorithms with additive update rules. Other techniques such as conjugate gradient, projected gradient, and more sophisticated second-order schemes like Newton and quasi-Newton methods are also in consideration. They choose an appropriate descent direction, such as the gradient direction, and update the elements of the factor matrices additively at a certain learning rate. They differ from one another in either the descent direction or the learning rate strategy. To satisfy the nonnegativity constraint, the updated matrices are brought back to the feasible region, namely the nonnegative orthant, by an additional projection, like simply setting all negative elements to zero. Usually, under certain mild additional conditions, they can guarantee first-order stationarity. These are the algorithms widely developed in Basic NMF in recent years.
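A minimal sketch of one such additive update follows; the fixed step size is an assumption for illustration, whereas practical methods such as Lin's projected gradient [56] choose it, e.g., by line search:

```python
# One additive projected-gradient step for the SED objective (4):
# move along the negative gradient, then project back onto the
# nonnegative orthant by zeroing negative entries.
import numpy as np

def pg_step(X, U, V, step=1e-3):
    grad_U = (U @ V - X) @ V.T              # gradient of (4) w.r.t. U
    grad_V = U.T @ (U @ V - X)              # gradient of (4) w.r.t. V
    U = np.maximum(U - step * grad_U, 0.0)  # projection onto U >= 0
    V = np.maximum(V - step * grad_V, 0.0)  # projection onto V >= 0
    return U, V
```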
