DOI:10.1145/2184319.2184343
Exact Matrix Completion via Convex Optimization
By Emmanuel Candès and Benjamin Recht
Abstract
Suppose that one observes an incomplete subset of entries
selected from a low-rank matrix. When is it possible to com-
plete the matrix and recover the entries that have not been
seen? We demonstrate that in very general settings, one
can perfectly recover all of the missing entries from most
sufficiently large subsets by solving a convex programming
problem that finds the matrix with the minimum nuclear
norm agreeing with the observed entries. The techniques
used in this analysis draw upon parallels in the field of
compressed sensing, demonstrating that objects other
than signals and images can be perfectly reconstructed
from very limited information.
1. INTRODUCTION
In many practical problems of interest, one would like to
recover a matrix from a sampling of its entries. As a moti-
vating example, consider the task of inferring answers in
a partially filled out survey in which questions are asked
to a collection of individuals. Then we can form a matrix
where the rows index the individuals and the columns
index the questions. We collect data to fill out this table,
but unfortunately, many questions are left unanswered. Is
it possible to make an educated guess about what the miss-
ing answers should be? How can one make such a guess?
Formally, we may view this problem as follows. We are interested in recovering a data matrix M with n_1 rows and n_2 columns but have access to only m of its entries, where m is much smaller than the total number of entries, n_1 n_2. Can one recover the matrix M from m of its entries? In general, everyone would agree that this is impossible without some additional information.
In many instances, however, the matrix we wish to
recover is known to be structured in the sense that it is low-
rank or approximately low-rank. (We recall for complete-
ness that a matrix has rank r if its rows or columns span an
r- dimensional space.) Consider the following two scenarios
as prototypical examples.
• The Netflix problem. In the area of recommender systems, users submit ratings on a subset of entries in a database, and the vendor provides recommendations based on the user's preferences.[31] Because users only rate a few items, one would like to infer their preference for unrated items. A special instance of this problem is the now famous Netflix problem.[24] Users (rows of the data matrix) are given the opportunity to rate movies (columns of the data matrix), but users typically rate only very few movies so that there are very few scattered observed entries of this data matrix. Yet, one would like to complete this matrix so that the vendor (here Netflix) might recommend titles that any particular user is likely to be willing to order. In this case, the data matrix of all user-ratings may be approximately low-rank, because only a few factors contribute to an individual's tastes or preferences.
• Triangulation from incomplete data. Suppose we are given partial information about the distances between objects and would like to reconstruct the low-dimensional geometry describing their locations. For example, we may have a network of low-power, wirelessly networked sensors scattered randomly across a region. Suppose each sensor only has the ability to construct distance estimates based on signal strength readings from its nearest fellow sensors. From these local distance estimates, we can form a partially observed distance matrix. We can then estimate the true distance matrix whose rank will be equal to 2 if the sensors are located in a plane or 3 if they are located in three-dimensional space.[26, 32] In this case, we only need to observe a few distances per node to have enough information to reconstruct the positions of the objects.
These examples are of course far from exhaustive and
there are many other problems which fall in this general
category.
Suppose for simplicity that we wish to recover a square n × n matrix M of rank r. Although M contains n^2 numbers, our assumption that its rank is r means that it can be represented exactly by its singular value decomposition (SVD)

    M = U S V^T = Σ_{k=1}^{r} s_k u_k v_k^T,    (1.1)

where V^T denotes the transpose of V. S is an r × r diagonal matrix with real, positive elements s_k > 0. U is an n × r matrix with orthonormal columns u_1, …, u_r. That is, u_k^T u_k = 1 and u_i^T u_j = 0 if i ≠ j. V is also n × r with orthonormal columns v_1, …, v_r. The column space of M is spanned by the columns of U, and the row space is spanned by the columns of V.
The number of degrees of freedom associated with a rank r matrix M is r(2n − r). To see this, note that S has r nonzero entries, and U and V each have nr total entries. Since U and V each satisfy r(r + 1)/2 orthogonality constraints, the total number of degrees of freedom is r + 2nr − r(r + 1) = r(2n − r).
Thus, when r is much smaller than n, there are significantly fewer degrees of freedom than the size of M would suggest. The question is then whether M can be recovered from a suitably chosen sampling of its entries without collecting n^2 measurements.

(The original version of this paper was published in Foundations of Computational Mathematics 9, 6 (2009), 717–772.)
In this paper, we demonstrate that most low-rank matrices can indeed be recovered from a very sparse sampling of their entries. In Section 2, we summarize
the main results of our paper, highlighting the necessary
assumptions, algorithmic ingredients, and theoretical
foundations of reconstructing matrices from a presented
collection of entries. In Section 3, we survey the subse-
quent developments in this area, including refinements
and important extensions of our theory. We close with a
discussion of further progress and advances in low-rank
and sparse modeling.
2. MATRIX COMPLETION
Which matrices?
In general, one cannot hope to be able to recover a low-rank matrix from a sample of its entries. Consider the rank 1 matrix M equal to

    M = e_1 x^T    (2.1)

for some vector x, where here and throughout, e_i is the ith canonical basis vector in Euclidean space (the vector with all entries equal to 0 but the ith equal to 1). The matrix M has the entries of x along its first row and all the other entries are 0. Clearly, this matrix cannot be recovered from a sampling of its entries unless we see all of the entries in the first row. As another example, the matrix e_1 e_n^T is a matrix with a 1 in the (1, n) entry and 0s everywhere else. If we do not see this upper right corner, then we cannot distinguish the matrix from the all 0s matrix.
Even if it is impossible to recover all low-rank matrices
from a set of sampled entries, can one recover most of them?
To investigate this possibility, we introduce a simple model
of low-rank matrices.
D 2.1. Let M be a rank r matrix with SVD defined
by (1.1). Then we say that M belongs to the random orthog-
onal model if the family {u
k
}
1 k r
is selected uniformly at
random among all families of r orthonormal vectors, and
similarly for {v
k
}
1 k r
. The two families may or may not be
independent of each other. We make no assumptions about the
singular values, s
k
.
If a matrix is sampled from the random orthogonal
model, then we would expect most of the entries to be non-
zero. This model is convenient in the sense that it is both
very concrete and simple, and useful in the sense that it will
help us fix the main ideas. In the sequel, however, we will
consider far more general models. The question for now is
whether or not one can recover such a generic matrix from
a sampling of its entries.
Which sampling sets?
Clearly, one cannot hope to reconstruct any low-rank matrix M—even of rank 1—if the sampling set avoids any column or row of M. Suppose that M is of rank 1 and of the form xy^T, x, y ∈ R^n, so that the (i, j) entry is given by M_ij = x_i y_j. Then, if we do not have samples from the first row, one could never infer the value of the first component x_1, as no information about x_1 is observed. There is, of course, nothing special about the first row and this argument extends to any row or column. To have any hope of recovering an unknown matrix, one needs to have access to at least one observation per row and one observation per column.
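This necessary condition is easy to check numerically. The sketch below is illustrative code of our own (the helper name sample_mask is not from the paper); it draws m entry locations uniformly at random without replacement and verifies that every row and every column contains at least one observation.

```python
import numpy as np

def sample_mask(n1, n2, m, rng=None):
    """Boolean n1 x n2 mask with exactly m observed entries, chosen uniformly at random."""
    rng = np.random.default_rng(rng)
    idx = rng.choice(n1 * n2, size=m, replace=False)
    mask = np.zeros(n1 * n2, dtype=bool)
    mask[idx] = True
    return mask.reshape(n1, n2)

mask = sample_mask(50, 50, m=1000, rng=0)
# Necessary (but by no means sufficient) condition for exact recovery:
print(mask.any(axis=1).all(), mask.any(axis=0).all())  # every row and column observed?
```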
This example demonstrates that there are sampling sets where one would not even be able to recover matrices of rank 1. But what happens for typical sampling sets? Can one recover a low-rank matrix from almost all sampling sets of cardinality m? Formally, suppose that the set Ω of locations corresponding to the observed entries ((i, j) ∈ Ω if M_ij is observed) is a set of cardinality m sampled uniformly at random. Then, can one recover a generic low-rank matrix M, perhaps with very large probability, from the knowledge of the value of its entries in the set Ω?
Which algorithm?
If the number of measurements is sufficiently large, and if the entries are close to uniformly distributed, one might hope that there is only one low-rank matrix with these entries. If this were true, one would want to recover the data matrix by solving the optimization problem

    minimize   rank(X)
    subject to X_ij = M_ij,  (i, j) ∈ Ω,    (2.2)

where X is the decision variable and rank(X) is equal to the rank of the matrix X. The program (2.2) is a common sense approach which simply seeks the simplest explanation fitting the observed data. If there were only one low-rank object fitting the data, the solution of (2.2) would recover M perfectly. This is unfortunately of little practical use, because not only is this optimization problem NP-hard but also all known algorithms which provide exact solutions require time doubly exponential in the dimension n of the matrix in both theory and practice.
If a matrix has rank r, then it has exactly r nonzero singular values, so that the rank function in (2.2) is simply the number of nonvanishing singular values. In this paper, we consider an alternative which minimizes the sum of the singular values over the constraint set. This sum is called the nuclear norm,

    ||X||_* = Σ_{k=1}^{n} s_k(X),

where, here and below, s_k(X) denotes the kth largest singular value of X. The heuristic optimization we study is then given by

    minimize   ||X||_*
    subject to X_ij = M_ij,  (i, j) ∈ Ω.    (2.3)

Whereas the rank function is equal to the number of nonvanishing singular values, the nuclear norm equals their sum. The nuclear norm is to the rank functional what the convex ℓ1 norm is to the ℓ0 norm in the area of sparse signal recovery. The main point here is that the nuclear norm is a convex function and can be optimized efficiently via semidefinite programming.[14]
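For illustration, problem (2.3) can be stated in a few lines with a modern convex modeling tool. The sketch below uses CVXPY (our choice here; the original experiments reported in Section 2.2 used SeDuMi) and is a minimal template rather than a tuned solver.

```python
import numpy as np
import cvxpy as cp

def nuclear_norm_complete(M_obs, mask):
    """Solve (2.3): minimize ||X||_* subject to X_ij = M_ij on the observed entries.

    M_obs : array holding the observed values (entries outside mask are ignored)
    mask  : boolean array, True where an entry is observed
    """
    W = mask.astype(float)
    X = cp.Variable(M_obs.shape)
    # Enforce agreement with M only on the observed entries.
    constraints = [cp.multiply(W, X) == cp.multiply(W, M_obs)]
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    return X.value
```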
There are many norms one could define for a given
matrix. The operator norm is the largest singular value.
The Frobenius norm is equal to the square root of the sum
of the squares of the entries. This norm is akin to the stan-
dard Euclidean norm on a real vector space. Why should the
nuclear norm provide lower rank solutions than either of
these two more commonly studied norms?
One can gain further intuition by analyzing the geometric structure of the nuclear norm ball. The unit nuclear norm ball is precisely the convex hull of the rank 1 matrices of unit Frobenius norm. The nuclear norm minimization problem (2.3) can be interpreted as inflating the unit ball until it just touches the affine space X_ij = M_ij. Such an intersection will occur at an extreme point of the nuclear norm ball, and these extreme points are sparse convex combinations of rank 1 matrices. That is, the extreme points of the nuclear norm ball have low rank. This phenomenon is depicted graphically in Figure 1. There, we plot the unit ball of the nuclear norm for matrices parametrized as

    [ x  y ]
    [ y  z ].

The extreme points of this cylindrical object are the rank 1 matrices with unit Frobenius norm. The red line in this figure is a "random," one-dimensional, affine subspace which, as expected, intersects the nuclear norm ball at a rank 1 matrix.
As further motivation, an interesting connection exists between the nuclear norm and popular algorithms in data-mining and collaborative filtering. In these fields, researchers commonly aim to find an explicit factorization X = LR^T that agrees with the measured entries. Here L and R are n × k matrices. Since there are many possible such factorizations that might agree with the observations, a common approach searches for matrices L and R that have Frobenius norm as small as possible, that is, the solution of the optimization problem

    minimize   (1/2)(||L||_F^2 + ||R||_F^2)
    subject to X = LR^T,
               X_ij = M_ij,  (i, j) ∈ Ω,    (2.4)

where we are minimizing with respect to L ∈ R^{n×k}, R ∈ R^{n×k}, and X ∈ R^{n×n}, and ||·||_F denotes the Frobenius norm. Surprisingly, the optimization problem (2.4) is equivalent to minimization of the nuclear norm subject to the same equality constraints, provided k is chosen to be larger than the rank of the optimum of the nuclear norm problem (2.3).[30]
To get an intuition for this equivalence, take any matrix X of rank k. Suppose the SVD is X = USV^T. If we set L = US^{1/2} and R = VS^{1/2}, we see that

    (1/2)(||L||_F^2 + ||R||_F^2) = Σ_j s_j(X) = ||X||_*,

because ||u_j s_j^{1/2}||^2 = ||v_j s_j^{1/2}||^2 = s_j for all j. Thus, the minimizer of (2.3), factored in this way, is feasible for (2.4), and so the optimal value of (2.4) is at most the optimal value of (2.3). The full equivalence can be seen via an appeal to semidefinite programming and can be found in Recht et al.[30]
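A three-line numerical check of this identity (purely illustrative):

```python
import numpy as np

X = np.random.default_rng(0).standard_normal((8, 5))
U, s, Vt = np.linalg.svd(X, full_matrices=False)
L, R = U * np.sqrt(s), Vt.T * np.sqrt(s)            # L = U S^{1/2}, R = V S^{1/2}
print(0.5 * (np.linalg.norm(L)**2 + np.linalg.norm(R)**2), s.sum())  # both equal ||X||_*
```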
The main advantage of this reformulation (2.4) is to substantially decrease the number of decision variables from n^2 to 2nr. For large problems, this leads to a significant reduction in computation time, such that very large instances can be solved on a desktop computer. On the other hand, the formulation (2.4) is nonconvex and thus potentially has local minima that are not globally optimal. Nonetheless, this factored approximation (2.4) of the nuclear norm is one of the most successful stand-alone approaches to solving the Netflix Prize problem.[16, 24] Indeed, it was one of the foundational components of the winning team's prediction engine.
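In practice, the factored formulation is usually attacked with alternating minimization or gradient descent on (L, R). The sketch below is our own illustration of the idea, using a regularized least-squares variant of (2.4) in which each row of L, then of R, is updated by ridge regression on the observed entries; it is not the code used in the Netflix Prize systems cited above.

```python
import numpy as np

def factored_complete(M, mask, k, lam=1e-3, iters=100, seed=0):
    """Alternating minimization of
        sum over observed (i,j) of (M_ij - (L R^T)_ij)^2 + lam (||L||_F^2 + ||R||_F^2),
    a regularized relative of problem (2.4)."""
    rng = np.random.default_rng(seed)
    n1, n2 = M.shape
    L = rng.standard_normal((n1, k))
    R = rng.standard_normal((n2, k))
    for _ in range(iters):
        for i in range(n1):                      # update row i of L (ridge regression)
            cols = np.nonzero(mask[i])[0]
            A = R[cols]
            L[i] = np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ M[i, cols])
        for j in range(n2):                      # update row j of R symmetrically
            rows = np.nonzero(mask[:, j])[0]
            A = L[rows]
            R[j] = np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ M[rows, j])
    return L @ R.T
```

With k at or slightly above the true rank, this nonconvex iteration often converges in practice to the same completion as the convex program (2.3), although, as noted above, it carries no general guarantee against spurious local minima.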
2.1. Main results
As seen in our first example (2.1), it is impossible to recover
a matrix which is equal to 0 in nearly all of its entries unless
we see all the entries of the matrix. This is particularly likely
if the singular vectors of a matrix M have most of their mass
concentrated in a few coordinates. For instance, consider
the rank 2 symmetric matrix M given by

    M = s_1 e_1 e_1^T + s_2 e_2 e_2^T,

where the singular values s_1, s_2 are arbitrary. Then, this matrix
vanishes everywhere except in the top-left 2 × 2 corner, and
one would basically need to see all the entries of M to be
able to recover this matrix exactly. There is an endless list
of examples of this sort. Hence, we arrive at the notion that
the singular vectors need to be sufficiently spread across
all components—that is, uncorrelated with the standard basis—in order to minimize the number of observations needed to recover a low-rank matrix. This motivates the following definition.

[Figure 1. Unit ball of the nuclear norm for symmetric 2 × 2 matrices. The red line depicts a random one-dimensional affine space. Such a subspace will generically intersect a sufficiently large nuclear norm ball at a rank one matrix.]
D 2.2. Let U be a subspace of R
n
of dimension r
and P
U
be the orthogonal projection onto U. Then the coher-
ence of U (vis-à-vis the standard basis (e
i
)) is defined to be
Note that for any subspace, the smallest μ(U) can be is 1, achieved, for example, if U is spanned by vectors whose entries all have magnitude 1/√n. The largest possible value for μ(U) is n/r, which would correspond to any subspace that contains a standard basis element. Matrices whose column and row spaces have low coherence are likely not to vanish in too many entries and are our most likely candidates for matrices that are recoverable from a few samples. As we discuss below, subspaces sampled from the random orthogonal model (Definition 2.1) have nearly minimal coherence.
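Definition 2.2 translates directly into code. The helper below (hypothetical, for illustration) computes μ(U) from a matrix whose orthonormal columns span U, for example the factor U of the SVD (1.1).

```python
import numpy as np

def coherence(U):
    """mu(U) = (n / r) * max_i ||P_U e_i||^2 for an n x r matrix U with orthonormal columns.

    Since P_U = U U^T, the quantity ||P_U e_i||^2 is simply the squared
    Euclidean norm of the ith row of U.
    """
    n, r = U.shape
    row_norms_sq = np.sum(U ** 2, axis=1)
    return (n / r) * row_norms_sq.max()
```

For a subspace drawn from the random orthogonal model (e.g., U obtained from np.linalg.qr of an n × r Gaussian matrix), this value concentrates near its minimum possible value 1 up to logarithmic factors, which is the sense in which such subspaces have nearly minimal coherence.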
To state our main result, we introduce two assumptions about an n_1 × n_2, rank r matrix M whose SVD is given by (1.1) and with column and row spaces denoted by U and V, respectively.

A0 The coherences obey max(μ(U), μ(V)) ≤ μ_0 for some μ_0 > 0.

A1 The n_1 × n_2 matrix Σ_k u_k v_k^T has a maximum entry bounded by μ_1 √(r/(n_1 n_2)) in absolute value for some μ_1 > 0.

These definitions implicitly define two critical parameters, μ_0 and μ_1. These μ's may depend on r and n_1, n_2. Moreover, note that A1 always holds with μ_1 = μ_0 √r, since the (i, j)th entry of the matrix Σ_k u_k v_k^T is given by Σ_k u_k(i) v_k(j), and by the Cauchy–Schwarz inequality,

    |Σ_k u_k(i) v_k(j)| ≤ √(Σ_k u_k(i)^2) √(Σ_k v_k(j)^2) ≤ μ_0 r / √(n_1 n_2).

Hence, for sufficiently small ranks, μ_1 is comparable to μ_0. We say that a subspace U ⊆ R^n is incoherent with the standard basis if μ(U) is at most logarithmic in n. As we show in the full version of this paper, for larger ranks, both subspaces selected from the uniform distribution and spaces constructed as the span of singular vectors with bounded entries are not only incoherent with the standard basis but also obey A1 with high probability for values of μ_1 at most logarithmic in n_1 and/or n_2.
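Both parameters can be estimated empirically from the SVD factors; the sketch below (again our own illustrative code, reusing the coherence helper above) reports μ_0 and the smallest μ_1 for which A1 holds.

```python
import numpy as np

def mu0_mu1(U, V):
    """Empirical coherence parameters for assumptions A0 and A1.

    U : n1 x r and V : n2 x r matrices of left/right singular vectors.
    """
    n1, r = U.shape
    n2, _ = V.shape
    mu0 = max(coherence(U), coherence(V))         # A0: max(mu(U), mu(V)) <= mu0
    E = U @ V.T                                   # the matrix sum_k u_k v_k^T
    mu1 = np.abs(E).max() * np.sqrt(n1 * n2 / r)  # smallest mu1 satisfying A1
    return mu0, mu1
```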
We are now in a position to state our main result: if a
matrix has row and column spaces that are incoherent with
the standard basis, then nuclear norm minimization can
recover this matrix from a random sampling of a small num-
ber of entries.
T 2.3. Let M be an n
1
× n
2
matrix of rank r obey-
ing A0 and A1 and put n = max(n
1
, n
2
). Suppose we observe m
entries of M with locations sampled uniformly at random. Then
there exist constants C, c such that if
for some b > 2, then the minimizer to the problem (2.3) is unique
and equal to M with probability at least 1 − cn
b
. For
this estimate can be improved to
with the same probability of success.
Theorem 2.3, proven in the full version of this paper, asserts that if the coherence is low, few samples are required to recover M. For example, if μ_0 is a small constant and the rank is not too large, then the recovery is exact with large probability provided that

    m ≥ C n^{5/4} r log n.    (2.5)
We give two illustrative examples of matrices with inco-
herent column and row spaces. This list is by no means
exhaustive.
1. The first example is the random orthogonal model (see Definition 2.1). For values of the rank r greater than log n, μ(U) and μ(V) are O(1) and μ_1 = O(log n), both with very large probability. Hence, the recovery is exact on most sampling sets provided that m ≥ C n^{5/4} r log n. When r ≤ n^{1/5}, we can strengthen this bound to m ≥ C n^{6/5} r log n.
2. The second example is more general and simply requires that the components of the singular vectors of M are small. Assume that the u_j's and v_j's obey

    max_i |u_j(i)|^2 ≤ μ_B / n,   max_i |v_j(i)|^2 ≤ μ_B / n,    (2.6)

for some value of μ_B = O(1). Then, the maximum coherence is at most μ_B, since μ(U) ≤ μ_B and μ(V) ≤ μ_B. Further, we show in the full version of this paper that A1 holds most of the time with μ_1 at most logarithmic in n. Thus, for matrices with singular vectors obeying (2.6), the recovery is exact provided that m obeys (2.5) for values of the rank not exceeding μ_B^{−1} n^{1/5}.
2.2. Numerical validation
To demonstrate the practical applicability of the nuclear norm heuristic for recovering low-rank matrices from their entries, we conducted a series of numerical experiments for a variety of matrix sizes n, ranks r, and numbers of entries m. For each (n, m, r) triple, we repeated the following procedure 50 times. We generated M, an n × n matrix of rank r, by sampling two n × r factors M_L and M_R with i.i.d. Gaussian entries and setting M = M_L M_R^T. We sampled a subset Ω of m entries uniformly at random. Then, the nuclear norm minimization problem was solved using the semidefinite programming solver SeDuMi.[33] We declared M to be recovered if the solution returned by the solver, X_opt, satisfied ||X_opt − M||_F / ||M||_F < 10^{−3}. Figure 2 shows the results of these experiments for n = 50. The x-axis corresponds to the fraction of the entries of the matrix that are revealed to the SDP solver. The y-axis corresponds to the ratio between the dimension of the rank r matrices, d_r = r(2n − r), and the number of measurements m.
Note that the axes range from 0 to 1, as a value greater than 1 on the x-axis corresponds to an overdetermined linear system where the semidefinite program always succeeds, and a value greater than 1 on the y-axis corresponds to a regime in which there are infinitely many rank r matrices with the provided entries. The color of each cell in the figures reflects the empirical recovery rate of the 50 runs (scaled between 0 and 1). Interestingly, the experiments reveal very similar plots for different n, suggesting that our theoretical upper bounds on recovery may be rather conservative.
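The experimental protocol just described is easy to reproduce at small scale with the sketches given earlier (sample_mask and nuclear_norm_complete); the loop below is an illustrative reconstruction, not the original SeDuMi-based code, and computes the empirical recovery rate for a single (n, m, r) triple.

```python
import numpy as np

def recovery_rate(n, m, r, trials=50, tol=1e-3, seed=0):
    """Fraction of random rank-r instances recovered by nuclear norm minimization."""
    rng = np.random.default_rng(seed)
    successes = 0
    for _ in range(trials):
        # M = M_L M_R^T with i.i.d. Gaussian n x r factors, as in the experiments above.
        M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
        mask = sample_mask(n, n, m, rng=rng)
        X = nuclear_norm_complete(M, mask)
        if np.linalg.norm(X - M) / np.linalg.norm(M) < tol:
            successes += 1
    return successes / trials
```

Sweeping m and r and plotting the success rate produces a phase-transition diagram in the spirit of Figure 2; the positive semidefinite experiment described next only requires declaring the decision variable as X = cp.Variable((n, n), PSD=True) inside the completion routine.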
For a second experiment, we generated random positive semidefinite matrices and tried to recover them from their entries using the nuclear norm heuristic. As above, we repeated the same procedure 50 times for each (n, m, r) triple. We generated M, an n × n positive semidefinite matrix of rank r, by sampling an n × r factor M_F with i.i.d. Gaussian entries and setting M = M_F M_F^T. We sampled a subset Ω of m entries uniformly at random. Then, we solved the nuclear norm minimization problem with an additional constraint that the decision variable be positive definite. Figure 2(b) shows the results of these experiments for n = 50. The x-axis again corresponds to the fraction of the entries of the matrix that are revealed to the solver, but, in this case, the number of measurements is divided by D_n = n(n + 1)/2, the number of unique entries in a positive semidefinite matrix, and the dimension of the rank r matrices is d_r = nr − r(r − 1)/2. The color of each cell is chosen in the same fashion as in the experiment with full matrices. Interestingly, the recovery region is much larger for positive semidefinite matrices, and future work is needed to investigate if the theoretical scaling is also more favorable in this scenario of low-rank matrix completion.
These phase transition diagrams reveal a considerably smaller region of parameter space than the Gaussian models studied in Recht et al.[30] In the experiments in Recht et al.,[30] M was generated in the same fashion as above, but, in the place of sampling entries, we generated m random Gaussian projections of the data (see the discussion in Section 2.4). In these experiments, the recovery regime is far larger than that in the case of sampling entries, but this is not particularly surprising, as each Gaussian observation measures a contribution from every entry in the matrix M.
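Both measurement models fit the same convex template: only the linear constraints change. The sketch below (illustrative only) solves nuclear norm minimization subject to general linear measurements ⟨A_k, X⟩ = b_k, which covers random Gaussian projections as well as sampling coefficients in other bases, as discussed next.

```python
import numpy as np
import cvxpy as cp

def recover_from_measurements(A_list, b, shape):
    """min ||X||_*  subject to  <A_k, X> = b_k for each measurement matrix A_k."""
    X = cp.Variable(shape)
    constraints = [cp.sum(cp.multiply(A, X)) == bk for A, bk in zip(A_list, b)]
    cp.Problem(cp.Minimize(cp.normNuc(X)), constraints).solve()
    return X.value

# Example: m random Gaussian measurements of a rank-r matrix M.
rng = np.random.default_rng(0)
n, r, m = 20, 2, 240
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
A_list = [rng.standard_normal((n, n)) for _ in range(m)]
b = [float(np.sum(A * M)) for A in A_list]
X = recover_from_measurements(A_list, b, (n, n))
print(np.linalg.norm(X - M) / np.linalg.norm(M))   # small if m is large enough
```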
2.3. More general bases
Our main result (Theorem 2.3) extends to a variety of other low-rank matrix completion problems beyond the sampling of entries. Indeed, suppose we have two orthonormal bases f_1, …, f_n and g_1, …, g_n of R^n, and that we are interested in solving the rank minimization problem

    minimize   rank(X)
    subject to f_i^T X g_j = f_i^T M g_j,  (i, j) ∈ Ω.    (2.7)
The machine learning community’s interest in specialized
algorithms for multiclass and multitask learning provides
a motivating example (see, e.g., Amit et al.[1] and Argyriou et al.[2]). In multiclass learning, the goal is to build multiple classifiers with the same training data to distinguish
between more than two categories. For example, in face
recognition, one might want to classify whether an image
patch corresponds to an eye, nose, or mouth. In multitask
learning, we have a large set of data and a variety of dif-
ferent classification tasks, but, for each task, only partial
subsets of the data are relevant. For instance, in activity
recognition, we may have acquired sets of observations of
multiple subjects and want to determine if each observed
person is walking or running. However, a different classi-
fier is desired for each individual, and it is not clear how
having access to the full collection of observations can
improve classification performance. Multitask learning
aims to take advantage of access to the full database to
improve performance on individual tasks. A description
of how to apply our results to the multiclass setting can be
[Figure 2. Recovery of full matrices from their entries. For each (n, m, r) triple, we repeated the following procedure 50 times. A matrix M of rank r and a subset of m entries were selected at random. Then, we solved the nuclear norm minimization for X subject to X_ij = M_ij on the selected entries. We declared M to be recovered if ||X_opt − M||_F / ||M||_F < 10^{−3}. The results are shown for (a) general 50 × 50 matrices and (b) 50 × 50 positive definite matrices. The color of each cell reflects the empirical recovery rate (scaled between 0 and 1). White denotes perfect recovery in all experiments, and black denotes failure for all experiments. Axes: x-axis m/n^2 in (a) and m/D_n in (b); y-axis d_r/m in both panels.]
