
SIAM J. OPTIM.                                © 2010 Society for Industrial and Applied Mathematics
Vol. 20, No. 4, pp. 1956–1982
A SINGULAR VALUE THRESHOLDING ALGORITHM FOR MATRIX COMPLETION∗

JIAN-FENG CAI†, EMMANUEL J. CANDÈS‡, AND ZUOWEI SHEN§

∗Received by the editors October 23, 2008; accepted for publication (in revised form) January 5, 2010; published electronically March 3, 2010. http://www.siam.org/journals/siopt/20-4/73897.html
†Department of Mathematics, University of California, Los Angeles, CA 90095 (cai@math.ucla.edu). This author is supported by the Wavelets and Information Processing Programme under a grant from DSTA, Singapore.
‡Applied and Computational Mathematics, Caltech, Pasadena, CA 91125 (candes@stanford.edu). This author is partially supported by the Waterman Award from the National Science Foundation and by ONR grant N00014-08-1-0749.
§Department of Mathematics, National University of Singapore, 117543 Singapore (matzuows@nus.edu.sg). This author is supported in part by grant R-146-000-113-112 from the National University of Singapore.
Abstract. This paper introduces a novel algorithm to approximate the matrix with minimum nuclear norm among all matrices obeying a set of convex constraints. This problem may be understood as the convex relaxation of a rank minimization problem and arises in many important applications as in the task of recovering a large matrix from a small subset of its entries (the famous Netflix problem). Off-the-shelf algorithms such as interior point methods are not directly amenable to large problems of this kind with over a million unknown entries. This paper develops a simple first-order and easy-to-implement algorithm that is extremely efficient at addressing problems in which the optimal solution has low rank. The algorithm is iterative, produces a sequence of matrices $\{X^k, Y^k\}$, and at each step mainly performs a soft-thresholding operation on the singular values of the matrix $Y^k$. There are two remarkable features making this attractive for low-rank matrix completion problems. The first is that the soft-thresholding operation is applied to a sparse matrix; the second is that the rank of the iterates $\{X^k\}$ is empirically nondecreasing. Both these facts allow the algorithm to make use of very minimal storage space and keep the computational cost of each iteration low. On the theoretical side, we provide a convergence analysis showing that the sequence of iterates converges. On the practical side, we provide numerical examples in which $1,000 \times 1,000$ matrices are recovered in less than a minute on a modest desktop computer. We also demonstrate that our approach is amenable to very large scale problems by recovering matrices of rank about 10 with nearly a billion unknowns from just about 0.4% of their sampled entries. Our methods are connected with the recent literature on linearized Bregman iterations for $\ell_1$ minimization, and we develop a framework in which one can understand these algorithms in terms of well-known Lagrange multiplier algorithms.
Key words. nuclear norm minimization, matrix completion, singular value thresholding, Lagrange dual function, Uzawa’s algorithm, linearized Bregman iteration
AMS subject classifications. 90C25, 15A83, 65K05
DOI. 10.1137/080738970
1. Introduction.
1.1. Motivation. There is a rapidly growing interest in the recovery of an un-
known low-rank or approximately low-rank matrix from very limited information.
This problem occurs in many areas of engineering and applied science such as ma-
chine learning [2, 3, 4], control [53], and computer vision; see [61]. As a motivating
example, consider the problem of recovering a data matrix from a sampling of its
entries. This routinely comes up whenever one collects partially filled out surveys,
and one would like to infer the many missing entries. In the area of recommender
systems, users submit ratings on a subset of entries in a database, and the vendor
provides recommendations based on the user’s preferences. Because users only rate a
few items, one would like to infer their preference for unrated items; this is the famous
Netflix problem [1]. Recovering a rectangular matrix from a sampling of its entries is
known as the matrix completion problem. The issue is, of course, that this problem
is extraordinarily ill posed since with fewer samples than entries we have infinitely
many completions. Therefore, it is apparently impossible to identify which of these
candidate solutions is indeed the “correct” one without some additional information.
In many instances, however, the matrix we wish to recover has low rank or ap-
proximately low rank. For instance, the Netflix data matrix of all user ratings may
be approximately low rank because it is commonly believed that only a few factors
contribute to anyone’s taste or preference. In computer vision, inferring scene geom-
etry and camera motion from a sequence of images is a well-studied problem known
as the structure-from-motion problem. This is an ill-conditioned problem for objects
that may be distant with respect to their size, or especially for “missing data” which
occur because of occlusion or tracking failures. However, when properly stacked and
indexed, these images form a matrix which has very low rank (e.g., rank 3 under or-
thography) [24,61]. Other examples of low-rank matrix fitting abound; e.g., in control
(system identification), machine learning (multiclass learning), and so on. Having said
this, the premise that the unknown has (approximately) low rank radically changes
the problem, making the search for solutions feasible since the lowest-rank solution
now tends to be the right one.
In a recent paper [16], Candès and Recht showed that matrix completion is not
as ill posed as people thought. Indeed, they proved that most low-rank matrices can
be recovered exactly from most sets of sampled entries even though these sets have
surprisingly small cardinality, and more importantly, they proved that this can be
done by solving a simple convex optimization problem. To state their results, suppose
to simplify that the unknown matrix $M \in \mathbb{R}^{n \times n}$ is square and that one has available $m$ sampled entries $\{M_{ij} : (i, j) \in \Omega\}$, where $\Omega$ is a random subset of cardinality $m$. Then [16] proves that most matrices $M$ of rank $r$ can be perfectly recovered by solving the optimization problem

(1.1)    minimize    $\|X\|_*$
         subject to  $X_{ij} = M_{ij}$,  $(i, j) \in \Omega$,

provided that the number of samples obeys $m \ge C n^{6/5} r \log n$ for some positive numerical constant $C$.$^1$ In (1.1), the functional $\|X\|_*$ is the nuclear norm of the matrix $X$, which is the sum of its singular values. The optimization problem (1.1) is convex and can be recast as a semidefinite program [34, 35]. In some sense, this is the tightest convex relaxation of the NP-hard rank minimization problem

(1.2)    minimize    $\mathrm{rank}(X)$
         subject to  $X_{ij} = M_{ij}$,  $(i, j) \in \Omega$,

since the nuclear ball $\{X : \|X\|_* \le 1\}$ is the convex hull of the set of rank-one matrices with spectral norm bounded by one. Another interpretation of Candès and Recht’s result is that under suitable conditions, the rank minimization program (1.2) and the convex program (1.1) are formally equivalent in the sense that they have exactly the same unique solution.
$^1$Note that an $n \times n$ matrix of rank $r$ depends upon $r(2n - r)$ degrees of freedom.

1.2. Algorithm outline. Because minimizing the nuclear norm both provably recovers the lowest-rank matrix subject to constraints (see [56] for related results) and
gives generally good empirical results in a variety of situations, it is understandably of
great interest to develop numerical methods for solving (1.1). In [16], this optimization
problem was solved using one of the most advanced semidefinite programming solvers,
namely, SDPT3 [59]. This solver and others like SeDuMi are based on interior-point
methods and are problematic when the size of the matrix is large because they need
to solve huge systems of linear equations to compute the Newton direction. In fact,
SDPT3 can only handle n × n matrices with n ≤ 100. Presumably, one could resort
to iterative solvers such as the method of conjugate gradients to solve for the Newton
step, but this is problematic as well since it is well known that the condition number
of the Newton system increases rapidly as one gets closer to the solution. In addition,
none of these general purpose solvers use the fact that the solution may have low
rank. We refer the reader to [50] for some recent progress on interior-point methods
concerning some special nuclear norm-minimization problems.
This paper develops the singular value thresholding algorithm for approximately
solving the nuclear norm minimization problem (1.1) and, by extension, problems of
the form
(1.3)    minimize    $\|X\|_*$
         subject to  $\mathcal{A}(X) = b$,

where $\mathcal{A}$ is a linear operator acting on the space of $n_1 \times n_2$ matrices and $b \in \mathbb{R}^m$. This
algorithm is a simple first-order method and is especially well suited for problems of
very large sizes in which the solution has low rank. We sketch this algorithm in the
special matrix completion setting and let $\mathcal{P}_\Omega$ be the orthogonal projector onto the span of matrices vanishing outside of $\Omega$ so that the $(i, j)$th component of $\mathcal{P}_\Omega(X)$ is equal to $X_{ij}$ if $(i, j) \in \Omega$ and zero otherwise. Our problem may be expressed as

(1.4)    minimize    $\|X\|_*$
         subject to  $\mathcal{P}_\Omega(X) = \mathcal{P}_\Omega(M)$,

with optimization variable $X \in \mathbb{R}^{n_1 \times n_2}$. Fix $\tau > 0$ and a sequence $\{\delta_k\}_{k \ge 1}$ of scalar step sizes. Then, starting with $Y^0 = 0 \in \mathbb{R}^{n_1 \times n_2}$, the algorithm inductively defines

(1.5)    $X^k = \operatorname{shrink}(Y^{k-1}, \tau)$,
         $Y^k = Y^{k-1} + \delta_k \mathcal{P}_\Omega(M - X^k)$
until a stopping criterion is reached. In (1.5), $\operatorname{shrink}(Y, \tau)$ is a nonlinear function which applies a soft-thresholding rule at level $\tau$ to the singular values of the input matrix; see section 2 for details. The key property here is that for large values of $\tau$, the sequence $\{X^k\}$ converges to a solution which very nearly minimizes (1.4). Hence,
at each step, one only needs to compute at most one singular value decomposition
and perform a few elementary matrix additions. Two important remarks are in order:
1. Sparsity. For each $k \ge 0$, $Y^k$ vanishes outside of $\Omega$ and is, therefore, sparse, a fact which can be used to evaluate the shrink function rapidly.
2. Low-rank property. The matrices $X^k$ turn out to have low rank, and hence the algorithm has a minimum storage requirement since we need to keep only principal factors in memory.
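To make the iteration (1.5) and the two remarks above concrete, here is a minimal NumPy sketch added for illustration; it is not the authors' implementation. The function name svt_complete, the stopping rule, and the default parameters are our own choices, and the sketch computes a full dense SVD of $Y^k$ at every step instead of exploiting the sparsity of $Y^k$ with a partial SVD as the remarks suggest.

```python
import numpy as np

def svt_complete(M, mask, tau, delta, max_iters=500, tol=1e-4):
    """Sketch of the SVT iteration (1.5):
    X^k = shrink(Y^{k-1}, tau),  Y^k = Y^{k-1} + delta * P_Omega(M - X^k).
    `mask` is the 0/1 indicator of the sampled set Omega."""
    Y = np.zeros_like(M)
    X = np.zeros_like(M)
    for _ in range(max_iters):
        # Singular value shrinkage: soft-threshold the singular values of Y at level tau.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        X = (U * np.maximum(s - tau, 0.0)) @ Vt
        # Step on the sampled entries only; Y therefore stays supported on Omega (remark 1).
        residual = mask * (M - X)
        Y = Y + delta * residual
        # Stop when the observed entries are reproduced to relative accuracy `tol`.
        if np.linalg.norm(residual) <= tol * np.linalg.norm(mask * M):
            break
    return X
```

For an $n \times n$ test matrix with $m$ observed entries, one might call svt_complete(mask * M_true, mask, tau=5 * n, delta=1.2 * n * n / m); these parameter values are indicative only, and the paper's own recommendations appear in its numerical section, which lies outside the excerpt reproduced here.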
Our numerical experiments demonstrate that the proposed algorithm can solve
problems, in Matlab, involving matrices of size 30,000 × 30,000 having close to a
billion unknowns in 17 minutes on a standard desktop computer with a 1.86 GHz
CPU (dual core with Matlab’s multithreading option enabled) and 3 GB of memory.
As a consequence, the singular value thresholding algorithm may become a rather
powerful computational tool for large scale matrix completion.

1.3. General formulation. The singular value thresholding algorithm can be
adapted to deal with other types of convex constraints. For instance, it may address
problems of the form
(1.6)    minimize    $\|X\|_*$
         subject to  $f_i(X) \le 0$,  $i = 1, \ldots, m$,

where each $f_i$ is a Lipschitz convex function (note that one can handle linear equality constraints by considering pairs of affine functionals). In the simpler case where the $f_i$’s are affine functionals, the general algorithm goes through a sequence of iterations
which greatly resemble (1.5). This is useful because this enables the development of
numerical algorithms which are effective for recovering matrices from a small subset
of sampled entries possibly contaminated with noise.
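To spell out the parenthetical remark about linear equality constraints in (1.6): writing the $i$th component of a linear constraint as $\langle A_i, X \rangle = b_i$ (our notation, not the paper's), the equality can be encoded by the pair of affine, hence Lipschitz convex, functionals

$$ f_{2i-1}(X) = \langle A_i, X\rangle - b_i \le 0, \qquad f_{2i}(X) = b_i - \langle A_i, X\rangle \le 0, $$

so that equality-constrained problems such as (1.3) fit the general form (1.6).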
1.4. Contents and notations. The rest of the paper is organized as follows.
In section 2, we derive the singular value thresholding (SVT) algorithm for the ma-
trix completion problem and recast it in terms of a well-known Lagrange multiplier
algorithm. In section 3, we extend the SVT algorithm and formulate a general itera-
tion which is applicable to general convex constraints. In section 4, we establish the
convergence results for the iterations given in sections 2 and 3. We demonstrate the
performance and effectiveness of the algorithm through numerical examples in sec-
tion 5, and review additional implementation details. Finally, we conclude the paper
with a short discussion in section 6.
Before continuing, we provide here a brief summary of the notations used through-
out the paper. Matrices are bold capital, vectors are bold lowercase, and scalars or
entries are not bold. For instance, $X$ is a matrix and $X_{ij}$ its $(i, j)$th entry. Likewise, $x$ is a vector and $x_i$ its $i$th component. The nuclear norm of a matrix is denoted by $\|X\|_*$, the Frobenius norm by $\|X\|_F$, and the spectral norm by $\|X\|_2$; note that these are, respectively, the 1-norm, the 2-norm, and the sup-norm of the vector of singular values. The adjoint of a matrix $X$ is $X^*$ and similarly for vectors. The notation $\mathrm{diag}(x)$, where $x$ is a vector, stands for the diagonal matrix with $\{x_i\}$ as diagonal elements. We denote by $\langle X, Y\rangle = \mathrm{trace}(X^*Y)$ the standard inner product between two matrices ($\|X\|_F^2 = \langle X, X\rangle$). The Cauchy–Schwarz inequality gives $\langle X, Y\rangle \le \|X\|_F\|Y\|_F$, and it is well known that we also have $\langle X, Y\rangle \le \|X\|_*\|Y\|_2$ (the spectral and nuclear norms are dual to one another); see, e.g., [16, 56].
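As a quick sanity check of these norm identities and duality inequalities (added here for illustration; not part of the paper), the following snippet verifies them numerically on small random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))
Y = rng.standard_normal((5, 4))

s = np.linalg.svd(X, compute_uv=False)        # singular values of X
nuclear = s.sum()                             # ||X||_* = 1-norm of the singular values
frob = np.linalg.norm(X, 'fro')               # ||X||_F = 2-norm of the singular values
spectral_Y = np.linalg.norm(Y, 2)             # ||Y||_2 = largest singular value of Y
inner = np.trace(X.T @ Y)                     # <X, Y> = trace(X^* Y) for real matrices

assert np.isclose(frob, np.linalg.norm(s))                  # Frobenius vs. singular values
assert inner <= frob * np.linalg.norm(Y, 'fro') + 1e-12     # Cauchy–Schwarz
assert inner <= nuclear * spectral_Y + 1e-12                # nuclear/spectral duality
```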
2. The singular value thresholding algorithm. This section introduces the
singular value thresholding algorithm and discusses some of its basic properties. We
begin with the definition of a key building block, namely, the singular value thresh-
olding operator.
2.1. The singular value shrinkage operator. Consider the singular value
decomposition (SVD) of a matrix $X \in \mathbb{R}^{n_1 \times n_2}$ of rank $r$,

(2.1)   $X = U\Sigma V^*$,   $\Sigma = \mathrm{diag}(\{\sigma_i\}_{1 \le i \le r})$,

where $U$ and $V$ are, respectively, $n_1 \times r$ and $n_2 \times r$ matrices with orthonormal columns, and the singular values $\sigma_i$ are positive (unless specified otherwise, we will always assume that the SVD of a matrix is given in the reduced form above). For each $\tau \ge 0$, we introduce the soft-thresholding operator $\mathcal{D}_\tau$ defined as follows:

(2.2)   $\mathcal{D}_\tau(X) := U\mathcal{D}_\tau(\Sigma)V^*$,   $\mathcal{D}_\tau(\Sigma) = \mathrm{diag}(\{(\sigma_i - \tau)_+\})$,

where $t_+$ is the positive part of $t$, namely, $t_+ = \max(0, t)$. In words, this operator
simply applies a soft-thresholding rule to the singular values of X, effectively shrinking
these toward zero. This is the reason why we will also refer to this transformation
as the singular value shrinkage operator. Even though the SVD may not be unique,
it is easy to see that the singular value shrinkage operator is well defined, and we
do not elaborate further on this issue. In some sense, this shrinkage operator is a
straightforward extension of the soft-thresholding rule for scalars and vectors. In
particular, note that if many of the singular values of X are below the threshold
$\tau$, the rank of $\mathcal{D}_\tau(X)$ may be considerably lower than that of $X$, just like the soft-
thresholding rule applied to vectors leads to sparser outputs whenever some entries
of the input are below threshold.
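The rank-reduction effect just described is easy to observe numerically. The short sketch below (our illustration; the helper name shrink_operator and the test matrix are made up for this example) applies $\mathcal{D}_\tau$ to a noisy low-rank matrix and shows the rank dropping to that of the underlying signal once the small singular values fall below the threshold:

```python
import numpy as np

def shrink_operator(X, tau):
    """Singular value shrinkage D_tau: soft-threshold the singular values of X at level tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(1)
# Rank-3 signal plus small noise: a few large singular values, many tiny ones.
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))
X = A + 0.01 * rng.standard_normal((50, 40))

Xs = shrink_operator(X, tau=1.0)
print(np.linalg.matrix_rank(X), np.linalg.matrix_rank(Xs))   # e.g., 40 vs. 3
```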
The singular value thresholding operator is the proximity operator associated with
the nuclear norm. Details about the proximity operator can be found in, e.g., [42].
Theorem 2.1.$^2$ For each $\tau \ge 0$ and $Y \in \mathbb{R}^{n_1 \times n_2}$, the singular value shrinkage operator (2.2) obeys

(2.3)   $\mathcal{D}_\tau(Y) = \arg\min_{X} \left\{ \tfrac{1}{2}\|X - Y\|_F^2 + \tau\|X\|_* \right\}$.

Proof. Since the function $h_0(X) := \tau\|X\|_* + \tfrac{1}{2}\|X - Y\|_F^2$ is strictly convex, it is easy to see that there exists a unique minimizer, and we thus need to prove that it is equal to $\mathcal{D}_\tau(Y)$. To do this, recall the definition of a subgradient of a convex function $f : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}$. We say that $Z$ is a subgradient of $f$ at $X_0$, denoted $Z \in \partial f(X_0)$, if

(2.4)   $f(X) \ge f(X_0) + \langle Z, X - X_0 \rangle$

for all $X$. Now $\hat{X}$ minimizes $h_0$ if and only if $0$ is a subgradient of the functional $h_0$ at the point $\hat{X}$, i.e.,

(2.5)   $0 \in \hat{X} - Y + \tau\,\partial\|\hat{X}\|_*$,

where $\partial\|\hat{X}\|_*$ is the set of subgradients of the nuclear norm. Let $X \in \mathbb{R}^{n_1 \times n_2}$ be an arbitrary matrix and $U\Sigma V^*$ be its SVD. It is known [16, 46, 64] that

(2.6)   $\partial\|X\|_* = \left\{ UV^* + W : W \in \mathbb{R}^{n_1 \times n_2},\ U^*W = 0,\ WV = 0,\ \|W\|_2 \le 1 \right\}$.

Set $\hat{X} := \mathcal{D}_\tau(Y)$ for short. In order to show that $\hat{X}$ obeys (2.5), decompose the SVD of $Y$ as $Y = U_0\Sigma_0V_0^* + U_1\Sigma_1V_1^*$, where $U_0$, $V_0$ (resp., $U_1$, $V_1$) are the singular vectors associated with singular values greater than $\tau$ (resp., smaller than or equal to $\tau$). With these notations, we have $\hat{X} = U_0(\Sigma_0 - \tau I)V_0^*$ and, therefore,

$Y - \hat{X} = \tau(U_0V_0^* + W)$,   $W = \tau^{-1}U_1\Sigma_1V_1^*$.

By definition, $U_0^*W = 0$, $WV_0 = 0$, and since the diagonal elements of $\Sigma_1$ have magnitudes bounded by $\tau$, we also have $\|W\|_2 \le 1$. Hence $Y - \hat{X} \in \tau\,\partial\|\hat{X}\|_*$, which concludes the proof.

$^2$One reviewer pointed out that a similar result had been mentioned in a talk given by Donald Goldfarb at the Foundations of Computational Mathematics conference which took place in Hong Kong in June 2008.
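To complement the proof of Theorem 2.1, the optimality condition (2.5) together with the subgradient characterization (2.6) can be verified numerically on a random instance. The snippet below (an added sanity check, not from the paper) forms $\hat{X} = \mathcal{D}_\tau(Y)$, reads off $U_0$, $V_0$ from its reduced SVD, computes the candidate $W$ from $Y - \hat{X} = \tau(U_0V_0^* + W)$, and checks the three defining properties of a nuclear-norm subgradient at $\hat{X}$:

```python
import numpy as np

def shrink_operator(Y, tau):
    """Singular value shrinkage D_tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(2)
Y, tau = rng.standard_normal((8, 6)), 1.0
X_hat = shrink_operator(Y, tau)

# Reduced SVD of X_hat: its left/right singular vectors are U_0, V_0 from the proof.
U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
r = int(np.sum(s > 1e-10))
U0, V0 = U[:, :r], Vt[:r, :].T

# Y - X_hat = tau * (U0 V0^* + W)  =>  W = (Y - X_hat) / tau - U0 V0^*.
W = (Y - X_hat) / tau - U0 @ V0.T
print(np.allclose(U0.T @ W, 0))            # U_0^* W = 0
print(np.allclose(W @ V0, 0))              # W V_0 = 0
print(np.linalg.norm(W, 2) <= 1 + 1e-10)   # ||W||_2 <= 1
```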
