CONVEX AND SEMI-NONNEGATIVE MATRIX FACTORIZATIONS: DING, LI AND JORDAN 1
Convex and Semi-Nonnegative Matrix
Factorizations
Chris Ding, Tao Li, and Michael I. Jordan
Chris Ding is with the Department of Computer Science and Engineering, University of Texas, Arlington, TX 76019. Tao Li
is with the School of Computer Science at Florida International University, Miami, FL 33199, USA. Michael I. Jordan is with
the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California
at Berkeley, CA 94720, USA.
November 5, 2008 DRAFT

Abstract

We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form X = FG^T, we focus on algorithms in which G is restricted to contain nonnegative entries, but allow the data matrix X to have mixed signs, thus extending the applicable range of NMF methods. We also consider algorithms in which the basis vectors of F are constrained to be convex combinations of the data points. This is used for a kernel extension of NMF. We provide algorithms for computing these new factorizations and we provide supporting theoretical analysis. We also analyze the relationships between our algorithms and clustering algorithms, and consider the implications for sparseness of solutions. Finally, we present experimental results that explore the properties of these new methods.
Index Terms
Nonnegative Matrix Factorization, Singular Value Decomposition, Clustering
I. INTRODUCTION
Matrix factorization is a unifying theme in numerical linear algebra. A wide variety of matrix
factorization algorithms have been developed over many decades, providing a numerical platform
for matrix operations such as solving linear systems, spectral decomposition, and subspace
identification. Some of these algorithms have also proven useful in statistical data analysis, most
notably the singular value decomposition (SVD), which underlies principal component analysis
(PCA).
Recent work in machine learning has focused on matrix factorizations that directly target some
of the special features of statistical data analysis. In particular, nonnegative matrix factorization
(NMF) (1; 2) focuses on the analysis of data matrices whose elements are nonnegative, a common
occurrence in data sets derived from text and images. Moreover, NMF yields nonnegative factors,
which can be advantageous from the point of view of interpretability.
The scope of research on NMF has grown rapidly in recent years. NMF has been shown
to be useful in a variety of applied settings, including environmetrics (3), chemometrics (4),
pattern recognition (5), multimedia data analysis (6), text mining (7; 8), DNA gene expression
analysis (9; 10) and protein interaction (11). Algorithmic extensions of NMF have been developed
to accommodate a variety of objective functions (12; 13) and a variety of data analysis problems,
including classification (14) and collaborative filtering (15). A number of studies have focused on
further developing computational methodologies for NMF (16; 17; 18; 19). Finally, researchers
have begun to explore some of the relationships between matrix factorizations and K-means
clustering (20), making use of the least-squares objective of NMF; as we emphasize in the
current paper, this relationship has implications for the interpretability of matrix factors. NMF
with the Kullback-Leibler (KL) divergence objective has been shown (21; 13) to be equivalent
to probabilistic latent semantic analysis (22) which has been further developed into the fully-
probabilistic latent Dirichlet allocation model (23; 24).
Our goal in this paper is to expand the repertoire of nonnegative matrix factorization. Our focus
is on algorithms that constrain the matrix factors; we do not require the data matrix to be similarly
constrained. In particular, we develop NMF-like algorithms that yield nonnegative factors but
do not require the data matrix to be nonnegative. This extends the range of application of NMF
ideas. Moreover, by focusing on constraints on the matrix factors, we are able to strengthen
the connections between NMF and K-means clustering. Note in particular that the result of a
K-means clustering run can be written as a matrix factorization X = FG^T, where X is the
data matrix, F contains the cluster centroids, and G contains the cluster membership indicators.
Although F typically has entries with both positive and negative signs, G is nonnegative. This
motivates us to propose general factorizations in which G is restricted to be nonnegative and
F is unconstrained. We also consider algorithms that constrain F; in particular, by restricting the
columns of F to be convex combinations of data points in X, we obtain a matrix factorization
that can be interpreted in terms of weighted cluster centroids.
The paper is organized as follows. In Section II we present the new matrix factorizations and
in Section III we present algorithms for computing these factorizations. Section IV provides a
theoretical analysis which provides insights into the sparseness of matrix factors for a convex
variant of NMF. In Section V we consider extensions of Convex-NMF and the relationships
of NMF-like factorizations. In Section V-A we show that a convex variant of NMF has the
advantage that it is readily kernelized. In Section VI we present comparative experiments that
show that constraining the F factors to be convex combinations of input data enhances their
interpretability. We also present experiments that compare the performance of the NMF variants
to K-means clustering, where we assess the extent to which the imposition of constraints that
aim to enhance interpretability leads to poorer clustering performance. Finally, we present our
conclusions in Section VII.
II. SEMI-NMF AND CONVEX-NMF
Let the input data matrix X = (x_1, . . . , x_n) contain a collection of n data vectors as columns.
We consider factorizations of the form:

X ≈ FG^T,  (1)

where X ∈ R^{p×n}, F ∈ R^{p×k} and G ∈ R^{n×k}. For example, the SVD can be written in this form.
In the case of the SVD, there are no restrictions on the signs of F and G; moreover, the data
matrix X is also unconstrained. NMF can also be written in this form, where the data matrix
X is assumed to be nonnegative, as are the factors F and G. We now consider some additional
examples.
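The template in Eq. (1) can be made concrete with a minimal NumPy sketch (the dimensions p = 4, n = 6, k = 2 are hypothetical, chosen only for illustration); the SVD case, in which neither factor is sign-constrained, serves as the example:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, k = 4, 6, 2
X = rng.standard_normal((p, n))   # mixed-sign data matrix; one data vector per column

# Truncated SVD written in the X ≈ F G^T form of Eq. (1):
# F = U_k S_k is p x k, G = V_k is n x k; neither factor is sign-constrained.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
F = U[:, :k] * s[:k]              # p x k: left singular vectors scaled by singular values
G = Vt[:k, :].T                   # n x k: right singular vectors

approx = F @ G.T                  # rank-k approximation of X
err = np.linalg.norm(X - approx)  # Frobenius-norm approximation error
```

By the Eckart-Young theorem, this rank-k product is the best possible Frobenius-norm approximation of X at rank k; the NMF variants discussed below trade some of this accuracy for sign constraints on the factors.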
A. Semi-NMF
When the data matrix is unconstrained (i.e., it may have mixed signs), we consider a factor-
ization that we refer to as Semi-NMF, in which we restrict G to be nonnegative while placing
no restriction on the signs of F .
We can motivate Semi-NMF from the perspective of clustering. Suppose we do a K-means
clustering on X and obtain cluster centroids F = (f_1, . . . , f_k). Let G denote the cluster indicators:
i.e., g_ik = 1 if x_i belongs to cluster c_k; g_ik = 0 otherwise. We can write the K-means clustering
objective function as

J_K-means = Σ_{i=1}^{n} Σ_{k=1}^{K} g_ik ||x_i − f_k||^2 = ||X − FG^T||^2.

In this paper, ||v|| denotes the L_2 norm of a vector v and ||A|| denotes the Frobenius norm
of a matrix A. We see that the K-means clustering objective can be alternatively viewed as
an objective function for matrix approximation. Moreover, this approximation will generally be
tighter if we relax the optimization by allowing g_ik to range over values in (0, 1), or values in
(0, ∞). This yields the Semi-NMF matrix factorization.
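The identity J_K-means = ||X − FG^T||^2 can be checked numerically. The sketch below (with a hypothetical assignment of n = 8 points to k = 2 clusters) builds the binary indicator G and centroid matrix F from given labels and verifies that the two expressions of the objective agree:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, k = 3, 8, 2
X = rng.standard_normal((p, n))            # columns are data points (mixed signs allowed)

# A hard cluster assignment (arbitrary here, for illustration only):
labels = np.array([0, 0, 1, 1, 0, 1, 0, 1])

# G: n x k binary indicator, g_ik = 1 iff x_i belongs to cluster k.
G = np.zeros((n, k))
G[np.arange(n), labels] = 1.0

# F: p x k matrix whose columns are the cluster centroids f_k.
F = np.stack([X[:, labels == c].mean(axis=1) for c in range(k)], axis=1)

# K-means objective: sum over points of the squared distance to their centroid ...
J_kmeans = sum(np.linalg.norm(X[:, i] - F[:, labels[i]]) ** 2 for i in range(n))
# ... equals the matrix-approximation error ||X - F G^T||_F^2.
J_matrix = np.linalg.norm(X - F @ G.T) ** 2
assert np.isclose(J_kmeans, J_matrix)
```

Relaxing the binary entries of G to nonnegative reals, as Semi-NMF does, can only decrease this residual.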
B. Convex-NMF
While in NMF and Semi-NMF there are no constraints on the basis vectors F = (f_1, · · · , f_k),
for reasons of interpretability it may be useful to impose the constraint that the vectors defining
F lie within the column space of X:

f_ℓ = w_{1ℓ} x_1 + · · · + w_{nℓ} x_n = Xw_ℓ,  or  F = XW.  (2)

Moreover, again for reasons of interpretability, we may wish to restrict ourselves to convex
combinations of the columns of X. This constraint has the advantage that we could interpret the
columns f_ℓ as weighted sums of certain data points; in particular, these columns would capture
a notion of centroids. We refer to this restricted form of the F factor as Convex-NMF. Convex-NMF
applies to both nonnegative and mixed-sign data matrices. As we will see, Convex-NMF
has an interesting property: the factors W and G both tend to be very sparse.
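The structural constraint of Eq. (2) is easy to sketch in NumPy (the dimensions and the random W below are hypothetical; an actual Convex-NMF algorithm would fit W): each column of W is nonnegative and sums to 1, so each basis vector f_ℓ = Xw_ℓ is a convex combination, i.e. a weighted centroid, of the data points.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, k = 3, 6, 2
X = rng.standard_normal((p, n))    # mixed-sign data is fine for Convex-NMF

# W: n x k with nonnegative columns summing to 1, so every basis vector
# f_l = X w_l lies in the convex hull of the data columns.
W = rng.random((n, k))
W /= W.sum(axis=0, keepdims=True)  # normalize each column to a convex-weight vector

F = X @ W                          # p x k basis of weighted centroids
assert np.all(W >= 0)
assert np.allclose(W.sum(axis=0), 1.0)
```

Because each f_ℓ stays in the convex hull of the columns of X, it inherits the scale and sign structure of the data, which is what makes the factors interpretable as cluster centroids.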
(25) considered a model in which the F factors were restricted to the unit interval; i.e.,
0 ≤ F_ik ≤ 1. This so-called convex coding does not require the f_k to be nonnegative linear
combinations of input data vectors and thus in general does not capture the notion of a cluster
centroid. Indeed, the emphasis in (25) and in (1; 2) is the parts-of-whole encoding provided by
NMF, not the relationship of nonnegative factorizations to vector quantization.
To summarize our development thus far, let us write the different factorizations as follows:

SVD:         X_± ≈ F_± G_±^T   (3)
NMF:         X_+ ≈ F_+ G_+^T   (4)
Semi-NMF:    X_± ≈ F_± G_+^T   (5)
Convex-NMF:  X_± ≈ X_± W_+ G_+^T,   (6)

where the subscripts are intended to suggest the constraints imposed by the different factorizations.
Before turning to a presentation of algorithms for computing Semi-NMF and Convex-NMF
factorizations and supporting theoretical analysis, we provide an illustrative example.