CONVEX AND SEMI-NONNEGATIVE MATRIX FACTORIZATIONS: DING, LI AND JORDAN 1
Convex and Semi-Nonnegative Matrix
Factorizations
Chris Ding, Tao Li, and Michael I. Jordan
Chris Ding is with the Department of Computer Science and Engineering, University of Texas, Arlington, TX 76019. Tao Li
is with the School of Computer Science at Florida International University, Miami, FL 33199, USA. Michael I. Jordan is with
the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California
at Berkeley, CA 94720, USA.
November 5, 2008 DRAFT

Abstract

We present several new variations on the theme of nonnegative matrix factorization (NMF). Considering factorizations of the form X = FG^T, we focus on algorithms in which G is restricted to contain nonnegative entries, but allow the data matrix X to have mixed signs, thus extending the applicable range of NMF methods. We also consider algorithms in which the basis vectors of F are constrained to be convex combinations of the data points. This is used for a kernel extension of NMF. We provide algorithms for computing these new factorizations and we provide supporting theoretical analysis. We also analyze the relationships between our algorithms and clustering algorithms, and consider the implications for sparseness of solutions. Finally, we present experimental results that explore the properties of these new methods.
Index Terms
Nonnegative Matrix Factorization, Singular Value Decomposition, Clustering
I. INTRODUCTION
Matrix factorization is a unifying theme in numerical linear algebra. A wide variety of matrix
factorization algorithms have been developed over many decades, providing a numerical platform
for matrix operations such as solving linear systems, spectral decomposition, and subspace
identification. Some of these algorithms have also proven useful in statistical data analysis, most
notably the singular value decomposition (SVD), which underlies principal component analysis
(PCA).
Recent work in machine learning has focused on matrix factorizations that directly target some
of the special features of statistical data analysis. In particular, nonnegative matrix factorization
(NMF) (1; 2) focuses on the analysis of data matrices whose elements are nonnegative, a common
occurrence in data sets derived from text and images. Moreover, NMF yields nonnegative factors,
which can be advantageous from the point of view of interpretability.
The scope of research on NMF has grown rapidly in recent years. NMF has been shown
to be useful in a variety of applied settings, including environmetrics (3), chemometrics (4),
pattern recognition (5), multimedia data analysis (6), text mining (7; 8), DNA gene expression
analysis (9; 10) and protein interaction (11). Algorithmic extensions of NMF have been developed
to accommodate a variety of objective functions (12; 13) and a variety of data analysis problems,
including classification (14) and collaborative filtering (15). A number of studies have focused on
further developing computational methodologies for NMF (16; 17; 18; 19). Finally, researchers
have begun to explore some of the relationships between matrix factorizations and K-means
clustering (20), making use of the least-squares objective of NMF; as we emphasize in the
current paper, this relationship has implications for the interpretability of matrix factors. NMF
with the Kullback-Leibler (KL) divergence objective has been shown (21; 13) to be equivalent
to probabilistic latent semantic analysis (22) which has been further developed into the fully-
probabilistic latent Dirichlet allocation model (23; 24).
Our goal in this paper is to expand the repertoire of nonnegative matrix factorization. Our focus
is on algorithms that constrain the matrix factors; we do not require the data matrix to be similarly
constrained. In particular, we develop NMF-like algorithms that yield nonnegative factors but
do not require the data matrix to be nonnegative. This extends the range of application of NMF
ideas. Moreover, by focusing on constraints on the matrix factors, we are able to strengthen
the connections between NMF and K-means clustering. Note in particular that the result of a
K-means clustering run can be written as a matrix factorization X = FG^T, where X is the
data matrix, F contains the cluster centroids, and G contains the cluster membership indicators.
Although F typically has entries with both positive and negative signs, G is nonnegative. This
motivates us to propose general factorizations in which G is restricted to be nonnegative and
F is unconstrained. We also consider algorithms that constrain F; in particular, by restricting the
columns of F to be convex combinations of data points in X, we obtain a matrix factorization
that can be interpreted in terms of weighted cluster centroids.
The paper is organized as follows. In Section II we present the new matrix factorizations and
in Section III we present algorithms for computing these factorizations. Section IV provides a
theoretical analysis which provides insights into the sparseness of matrix factors for a convex
variant of NMF. In Section V we consider extensions of Convex-NMF and the relationships
of NMF-like factorizations. In Section V-A we show that a convex variant of NMF has the
advantage that it is readily kernelized. In Section VI we present comparative experiments that
show that constraining the F factors to be convex combinations of input data enhances their
interpretability. We also present experiments that compare the performance of the NMF variants
to K-means clustering, where we assess the extent to which the imposition of constraints that
aim to enhance interpretability leads to poorer clustering performance. Finally, we present our
conclusions in Section VII.
II. SEMI-NMF AND CONVEX-NMF
Let the input data matrix X = (x_1, . . . , x_n) contain a collection of n data vectors as columns.
We consider factorizations of the form:

X ≈ FG^T,  (1)

where X ∈ R^{p×n}, F ∈ R^{p×k} and G ∈ R^{n×k}. For example, the SVD can be written in this form.
In the case of the SVD, there are no restrictions on the signs of F and G; moreover, the data
matrix X is also unconstrained. NMF can also be written in this form, where the data matrix
X is assumed to be nonnegative, as are the factors F and G. We now consider some additional
examples.
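The template in Eq. (1) can be made concrete with a minimal NumPy sketch (the dimensions p = 4, n = 6, k = 2 are hypothetical, chosen only for illustration); the SVD case, in which neither factor is sign-constrained, serves as the example:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, k = 4, 6, 2
X = rng.standard_normal((p, n))   # mixed-sign data matrix; one data vector per column

# Truncated SVD written in the X ≈ F G^T form of Eq. (1):
# F = U_k S_k is p x k, G = V_k is n x k; neither factor is sign-constrained.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
F = U[:, :k] * s[:k]              # p x k: left singular vectors scaled by singular values
G = Vt[:k, :].T                   # n x k: right singular vectors

approx = F @ G.T                  # rank-k approximation of X
err = np.linalg.norm(X - approx)  # Frobenius-norm approximation error
```

By the Eckart-Young theorem, this rank-k product is the best possible Frobenius-norm approximation of X at rank k; the NMF variants discussed below trade some of this accuracy for sign constraints on the factors.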
A. Semi-NMF
When the data matrix is unconstrained (i.e., it may have mixed signs), we consider a factor-
ization that we refer to as Semi-NMF, in which we restrict G to be nonnegative while placing
no restriction on the signs of F .
We can motivate Semi-NMF from the perspective of clustering. Suppose we do a K-means
clustering on X and obtain cluster centroids F = (f_1, . . . , f_k). Let G denote the cluster indicators:
i.e., g_ik = 1 if x_i belongs to cluster c_k; g_ik = 0 otherwise. We can write the K-means clustering
objective function as

J_K-means = Σ_{i=1}^{n} Σ_{k=1}^{K} g_ik ||x_i − f_k||^2 = ||X − FG^T||^2.

In this paper, ||v|| denotes the L_2 norm of a vector v and ||A|| denotes the Frobenius norm
of a matrix A. We see that the K-means clustering objective can be alternatively viewed as
an objective function for matrix approximation. Moreover, this approximation will generally be
tighter if we relax the optimization by allowing g_ik to range over values in (0, 1), or values in
(0, ∞). This yields the Semi-NMF matrix factorization.
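The identity J_K-means = ||X − FG^T||^2 can be checked numerically. The sketch below (with a hypothetical assignment of n = 8 points to k = 2 clusters) builds the binary indicator G and centroid matrix F from given labels and verifies that the two expressions of the objective agree:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n, k = 3, 8, 2
X = rng.standard_normal((p, n))            # columns are data points (mixed signs allowed)

# A hard cluster assignment (arbitrary here, for illustration only):
labels = np.array([0, 0, 1, 1, 0, 1, 0, 1])

# G: n x k binary indicator, g_ik = 1 iff x_i belongs to cluster k.
G = np.zeros((n, k))
G[np.arange(n), labels] = 1.0

# F: p x k matrix whose columns are the cluster centroids f_k.
F = np.stack([X[:, labels == c].mean(axis=1) for c in range(k)], axis=1)

# K-means objective: sum over points of the squared distance to their centroid ...
J_kmeans = sum(np.linalg.norm(X[:, i] - F[:, labels[i]]) ** 2 for i in range(n))
# ... equals the matrix-approximation error ||X - F G^T||_F^2.
J_matrix = np.linalg.norm(X - F @ G.T) ** 2
assert np.isclose(J_kmeans, J_matrix)
```

Relaxing the binary entries of G to nonnegative reals, as Semi-NMF does, can only decrease this residual.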
B. Convex-NMF
While in NMF and Semi-NMF there are no constraints on the basis vectors F = (f_1, · · · , f_k),
for reasons of interpretability it may be useful to impose the constraint that the vectors defining
F lie within the column space of X:

f_ℓ = w_{1ℓ} x_1 + · · · + w_{nℓ} x_n = Xw_ℓ,  or  F = XW.  (2)

Moreover, again for reasons of interpretability, we may wish to restrict ourselves to convex
combinations of the columns of X. This constraint has the advantage that we could interpret the
columns f_ℓ as weighted sums of certain data points; in particular, these columns would capture
a notion of centroids. We refer to this restricted form of the F factor as Convex-NMF. Convex-NMF
applies to both nonnegative and mixed-sign data matrices. As we will see, Convex-NMF
has an interesting property: the factors W and G both tend to be very sparse.
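The structural constraint of Eq. (2) is easy to sketch in NumPy (the dimensions and the random W below are hypothetical; an actual Convex-NMF algorithm would fit W): each column of W is nonnegative and sums to 1, so each basis vector f_ℓ = Xw_ℓ is a convex combination, i.e. a weighted centroid, of the data points.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, k = 3, 6, 2
X = rng.standard_normal((p, n))    # mixed-sign data is fine for Convex-NMF

# W: n x k with nonnegative columns summing to 1, so every basis vector
# f_l = X w_l lies in the convex hull of the data columns.
W = rng.random((n, k))
W /= W.sum(axis=0, keepdims=True)  # normalize each column to a convex-weight vector

F = X @ W                          # p x k basis of weighted centroids
assert np.all(W >= 0)
assert np.allclose(W.sum(axis=0), 1.0)
```

Because each f_ℓ stays in the convex hull of the columns of X, it inherits the scale and sign structure of the data, which is what makes the factors interpretable as cluster centroids.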
(25) considered a model in which the F factors were restricted to the unit interval; i.e.,
0 ≤ F_ik ≤ 1. This so-called convex coding does not require the f_k to be nonnegative linear
combinations of input data vectors and thus in general does not capture the notion of a cluster
centroid. Indeed, the emphasis in (25) and in (1; 2) is the parts-of-whole encoding provided by
NMF, not the relationship of nonnegative factorizations to vector quantization.
To summarize our development thus far, let us write the different factorizations as follows:

SVD:         X_± ≈ F_± G_±^T   (3)
NMF:         X_+ ≈ F_+ G_+^T   (4)
Semi-NMF:    X_± ≈ F_± G_+^T   (5)
Convex-NMF:  X_± ≈ X_± W_+ G_+^T,   (6)

where the subscripts are intended to suggest the constraints imposed by the different factorizations.
Before turning to a presentation of algorithms for computing Semi-NMF and Convex-NMF
factorizations and supporting theoretical analysis, we provide an illustrative example.