Journal ArticleDOI

Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion

01 Oct 2011-Annals of Statistics (Institute of Mathematical Statistics)-Vol. 39, Iss: 5, pp 2302-2329
TL;DR: In this article, a new nuclear-norm penalized estimator of A0 was proposed and a general sharp oracle inequality was established for this estimator for arbitrary values of n, m1 and m2 under the condition of isometry in expectation.
Abstract: This paper deals with the trace regression model where n entries or linear combinations of entries of an unknown m1 × m2 matrix A0 corrupted by noise are observed. We propose a new nuclear-norm penalized estimator of A0 and establish a general sharp oracle inequality for this estimator for arbitrary values of n, m1, m2 under the condition of isometry in expectation. Then this method is applied to the matrix completion problem. In this case, the estimator admits a simple explicit form, and we prove that it satisfies oracle inequalities with faster rates of convergence than in previous works. These inequalities are valid, in particular, in the high-dimensional setting m1m2 ≫ n. We show that the obtained rates are optimal up to logarithmic factors in a minimax sense and also derive, for any fixed matrix A0, a nonminimax lower bound on the rate of convergence of our estimator, which coincides with the upper bound up to a constant factor. Finally, we show that our procedure provides an exact recovery of the rank of A0 with probability close to 1. We also discuss the statistical learning setting where there is no underlying model determined by A0, and the aim is to find the best trace regression model approximating the data. As a by-product, we show that, under the restricted eigenvalue condition, the usual vector Lasso estimator satisfies a sharp oracle inequality (i.e., an oracle inequality with leading constant 1).
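
The abstract notes that, in the matrix completion case, the estimator admits a simple explicit form; under uniform sampling this amounts to soft-thresholding the singular values of a rescaled empirical matrix. The Python sketch below is a minimal illustration of that idea, not the paper's exact construction: the sampling scheme, the (m1*m2)/n rescaling and the choice of the threshold lam are assumptions made here for concreteness.

import numpy as np

def soft_threshold_svd(M, lam):
    # Shrink the singular values of M toward zero by lam.
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - lam, 0.0)) @ Vt

def complete_matrix(rows, cols, vals, m1, m2, lam):
    # Build the rescaled empirical matrix from n noisy observed entries
    # (uniform sampling with replacement is assumed), then shrink its spectrum.
    n = len(vals)
    X = np.zeros((m1, m2))
    for i, j, y in zip(rows, cols, vals):
        X[i, j] += y
    X *= (m1 * m2) / n
    return soft_threshold_svd(X, lam)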


Citations
Journal ArticleDOI
TL;DR: In this article, the authors proposed a method to construct confidence intervals for individual coefficients and linear combinations of several of them in a linear regression model by turning the regression data into an approximate Gaussian sequence of point estimators of individual regression coefficients.
Abstract: The purpose of this paper is to propose methodologies for statistical inference of low dimensional parameters with high dimensional data. We focus on constructing confidence intervals for individual coefficients and linear combinations of several of them in a linear regression model, although our ideas are applicable in a much broader context. The theoretical results that are presented provide sufficient conditions for the asymptotic normality of the proposed estimators along with a consistent estimator for their finite dimensional covariance matrices. These sufficient conditions allow the number of variables to exceed the sample size and the presence of many small non-zero coefficients. Our methods and theory apply to interval estimation of a preconceived regression coefficient or contrast as well as simultaneous interval estimation of many regression coefficients. Moreover, the method proposed turns the regression data into an approximate Gaussian sequence of point estimators of individual regression coefficients, which can be used to select variables after proper thresholding. The simulation results that are presented demonstrate the accuracy of the coverage probability of the confidence intervals proposed as well as other desirable properties, strongly supporting the theoretical results.
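
One standard way to write the de-biased estimator behind such confidence intervals is shown below; here beta-hat is an initial lasso fit, x_j is the j-th column of the design matrix X, z_j is the residual from a lasso regression of x_j on the remaining columns, and sigma-hat is a noise-level estimate. The notation follows the conventional presentation of this construction rather than quoting the paper:

$$\hat b_j \;=\; \hat\beta_j + \frac{z_j^\top (y - X\hat\beta)}{z_j^\top x_j}, \qquad \hat b_j \;\pm\; 1.96\,\hat\sigma\,\frac{\|z_j\|_2}{|z_j^\top x_j|}.$$

The second expression gives an approximate 95% confidence interval for the j-th coefficient once asymptotic normality holds.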

892 citations

Book
11 Apr 2019
TL;DR: This book provides a self-contained introduction to the area of high-dimensional statistics, aimed at the first-year graduate level, and includes chapters that are focused on core methodology and theory - including tail bounds, concentration inequalities, uniform laws and empirical process, and random matrices.
Abstract: Recent years have witnessed an explosion in the volume and variety of data collected in all scientific disciplines and industrial settings. Such massive data sets present a number of challenges to researchers in statistics and machine learning. This book provides a self-contained introduction to the area of high-dimensional statistics, aimed at the first-year graduate level. It includes chapters that are focused on core methodology and theory - including tail bounds, concentration inequalities, uniform laws and empirical processes, and random matrices - as well as chapters devoted to in-depth exploration of particular model classes - including sparse linear models, matrix models with rank constraints, graphical models, and various types of non-parametric models. With hundreds of worked examples and exercises, this text is intended both for courses and for self-study by graduate students and researchers in statistics, machine learning, and related fields who must understand, apply, and adapt modern statistical methods suited to large-scale data.

748 citations

Journal ArticleDOI
TL;DR: In this article, the authors proposed a scaled lasso method that jointly estimates the regression coefficients and the noise level in a linear model via convex minimization of a penalized joint loss function.
Abstract: Scaled sparse linear regression jointly estimates the regression coefficients and noise level in a linear model. It chooses an equilibrium with a sparse regression method by iteratively estimating the noise level via the mean residual square and scaling the penalty in proportion to the estimated noise level. The iterative algorithm costs little beyond the computation of a path or grid of the sparse regression estimator for penalty levels above a proper threshold. For the scaled lasso, the algorithm is a gradient descent in a convex minimization of a penalized joint loss function for the regression coefficients and noise level. Under mild regularity conditions, we prove that the scaled lasso simultaneously yields an estimator for the noise level and an estimated coefficient vector satisfying certain oracle inequalities for prediction, the estimation of the noise level and the regression coefficients. These inequalities provide sufficient conditions for the consistency and asymptotic normality of the noise-level estimator, including certain cases where the number of variables is of greater order than the sample size. Parallel results are provided for least-squares estimation after model selection by the scaled lasso. Numerical results demonstrate the superior performance of the proposed methods over an earlier proposal of joint convex minimization. Some key words: Convex minimization; Estimation after model selection; Iterative algorithm; Linear regression; Oracle inequality; Penalized least squares; Scale invariance; Variance estimation.
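
A minimal Python sketch of the alternating scheme described above, using scikit-learn's Lasso as the sparse regression step; the penalty level lam0 (for instance, of order sqrt(2 log(p)/n)) and the exact correspondence with the paper's tuning constants are assumptions made here for illustration.

import numpy as np
from sklearn.linear_model import Lasso

def scaled_lasso(X, y, lam0, n_iter=50, tol=1e-8):
    # Alternate between a lasso fit with penalty proportional to the current
    # noise-level estimate and a noise estimate from the mean residual square.
    sigma = np.std(y)
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        fit = Lasso(alpha=sigma * lam0, fit_intercept=False).fit(X, y)
        coef = fit.coef_
        sigma_new = np.sqrt(np.mean((y - X @ coef) ** 2))
        if abs(sigma_new - sigma) < tol:
            sigma = sigma_new
            break
        sigma = sigma_new
    return coef, sigma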

454 citations

Journal ArticleDOI
TL;DR: This paper provides an overview of modern techniques for exploiting low-rank structure to perform matrix recovery in these settings, surveying recent advances in this rapidly developing field.
Abstract: Low-rank matrices play a fundamental role in modeling and computational methods for signal processing and machine learning. In many applications where low-rank matrices arise, these matrices cannot be fully sampled or directly observed, and one encounters the problem of recovering the matrix given only incomplete and indirect observations. This paper provides an overview of modern techniques for exploiting low-rank structure to perform matrix recovery in these settings, providing a survey of recent advances in this rapidly-developing field. Specific attention is paid to the algorithms most commonly used in practice, the existing theoretical guarantees for these algorithms, and representative practical applications of these techniques.

424 citations

Journal ArticleDOI
TL;DR: This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has "a little bit of structure" and achieves the minimax error rate up to a constant factor.
Abstract: Consider the problem of estimating the entries of a large matrix, when the observed entries are noisy versions of a small random fraction of the original entries. This problem has received widespread attention in recent times, especially after the pioneering works of Emmanuel Candes and collaborators. This paper introduces a simple estimation procedure, called Universal Singular Value Thresholding (USVT), that works for any matrix that has "a little bit of structure." Surprisingly, this simple estimator achieves the minimax error rate up to a constant factor. The method is applied to solve problems related to low rank matrix estimation, blockmodels, distance matrix completion, latent space models, positive definite matrix completion, graphon estimation and generalized Bradley--Terry models for pairwise comparison.
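
A minimal Python sketch of the USVT procedure for a partially observed matrix whose entries are assumed to lie in a known range; the (2 + eta) * sqrt(n * p_hat) threshold follows the usual presentation of the method and should be treated as illustrative.

import numpy as np

def usvt(Y, mask, eta=0.01, bounds=None):
    # Y: observed values (arbitrary where mask is False); mask: True if observed.
    n = max(Y.shape)
    p_hat = mask.mean()                           # estimated sampling fraction
    filled = np.where(mask, Y, 0.0)
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    keep = s >= (2.0 + eta) * np.sqrt(n * p_hat)  # drop small singular values
    est = (U[:, keep] * s[keep]) @ Vt[keep, :] / p_hat
    if bounds is not None:                        # clip to the known entry range
        est = np.clip(est, bounds[0], bounds[1])
    return est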

405 citations

References
Journal ArticleDOI
TL;DR: It is proved that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries, and that objects other than signals and images can be perfectly reconstructed from very limited information.
Abstract: We consider a problem of considerable practical interest: the recovery of a data matrix from a sampling of its entries. Suppose that we observe m entries selected uniformly at random from a matrix M. Can we complete the matrix and recover the entries that we have not seen? We show that one can perfectly recover most low-rank matrices from what appears to be an incomplete set of entries. We prove that if the number m of sampled entries obeys $$m\ge C\,n^{1.2}r\log n$$ for some positive numerical constant C, then with very high probability, most n×n matrices of rank r can be perfectly recovered by solving a simple convex optimization program. This program finds the matrix with minimum nuclear norm that fits the data. The condition above assumes that the rank is not too large. However, if one replaces the 1.2 exponent with 1.25, then the result holds for all values of the rank. Similar results hold for arbitrary rectangular matrices as well. Our results are connected with the recent literature on compressed sensing, and show that objects other than signals and images can be perfectly reconstructed from very limited information.
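
The convex program described above, finding the matrix of minimum nuclear norm that agrees with the observed entries, can be stated directly with a generic convex solver. The cvxpy sketch below is illustrative and only practical for modest matrix sizes; large instances call for specialized first-order algorithms.

import cvxpy as cp
import numpy as np

def nuclear_norm_complete(M_obs, mask):
    # mask: array with 1.0 where an entry of M_obs was observed, 0.0 elsewhere.
    X = cp.Variable(M_obs.shape)
    data_fit = cp.multiply(mask, X) == cp.multiply(mask, M_obs)
    prob = cp.Problem(cp.Minimize(cp.norm(X, "nuc")), [data_fit])
    prob.solve()
    return X.value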

5,274 citations

Journal ArticleDOI
TL;DR: It is shown that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum-rank solution can be recovered by solving a convex optimization problem, namely, the minimization of the nuclear norm over the given affine space.
Abstract: The affine rank minimization problem consists of finding a matrix of minimum rank that satisfies a given system of linear equality constraints. Such problems have appeared in the literature of a diverse set of fields including system identification and control, Euclidean embedding, and collaborative filtering. Although specific instances can often be solved with specialized algorithms, the general affine rank minimization problem is NP-hard because it contains vector cardinality minimization as a special case. In this paper, we show that if a certain restricted isometry property holds for the linear transformation defining the constraints, the minimum-rank solution can be recovered by solving a convex optimization problem, namely, the minimization of the nuclear norm over the given affine space. We present several random ensembles of equations where the restricted isometry property holds with overwhelming probability, provided the codimension of the subspace is sufficiently large. The techniques used in our analysis have strong parallels in the compressed sensing framework. We discuss how affine rank minimization generalizes this preexisting concept and outline a dictionary relating concepts from cardinality minimization to those of rank minimization. We also discuss several algorithmic approaches to minimizing the nuclear norm and illustrate our results with numerical examples.
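
The restricted isometry property referred to above is usually stated as follows: a linear map $\mathcal{A}$ has restricted isometry constant $\delta_r$ if, for every matrix $X$ of rank at most $r$,

$$(1-\delta_r)\,\|X\|_F^2 \;\le\; \|\mathcal{A}(X)\|_2^2 \;\le\; (1+\delta_r)\,\|X\|_F^2,$$

and the convex relaxation analyzed is $\min \|X\|_*$ subject to $\mathcal{A}(X)=b$. This rendering follows the standard presentation rather than quoting the paper's exact statement.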

3,432 citations

Book
22 Oct 2008
TL;DR: The main idea is to introduce the fundamental concepts of the theory while maintaining the exposition suitable for a first approach in the field, and many important and useful results on optimal and adaptive estimation are provided.
Abstract: This is a concise text developed from lecture notes and ready to be used for a course on the graduate level. The main idea is to introduce the fundamental concepts of the theory while keeping the exposition suitable for a first approach to the field. Therefore, the results are not always given in the most general form but rather under assumptions that lead to shorter or more elegant proofs. The book has three chapters. Chapter 1 presents basic nonparametric regression and density estimators and analyzes their properties. Chapter 2 is devoted to a detailed treatment of minimax lower bounds. Chapter 3 develops more advanced topics: Pinsker's theorem, oracle inequalities, Stein shrinkage, and sharp minimax adaptivity. This book will be useful for researchers and graduate students interested in theoretical aspects of smoothing techniques. Many important and useful results on optimal and adaptive estimation are provided. As one of the leading mathematical statisticians working in nonparametrics, the author is an authority on the subject.

2,842 citations

Book
01 Jan 1990
TL;DR: This book gives a systematic treatment of matrix perturbation theory, covering norms and metrics, linear systems and least squares problems, the perturbation of eigenvalues, invariant subspaces, and generalized eigenvalue problems.
Abstract: Preliminaries. Norms and Metrics. Linear Systems and Least Squares Problems. The Perturbation of Eigenvalues. Invariant Subspaces. Generalized Eigenvalue Problems.

2,754 citations

Journal ArticleDOI
TL;DR: In this article, the Lasso estimator and the Dantzig selector are shown to exhibit similar behavior under a sparsity scenario; for both methods, oracle inequalities for the prediction risk in the general nonparametric regression model are derived in parallel, as well as bounds on the ℓp estimation loss for 1 ≤ p ≤ 2.
Abstract: We show that, under a sparsity scenario, the Lasso estimator and the Dantzig selector exhibit similar behavior. For both methods, we derive, in parallel, oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓp estimation loss for 1 ≤ p ≤ 2 in the linear model when the number of variables can be much larger than the sample size.
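
For reference, the two estimators compared above can be written, up to the normalization conventions used in the paper, as

$$\hat\beta^{\mathrm{Lasso}} = \arg\min_{\beta}\ \frac{1}{n}\|y - X\beta\|_2^2 + 2\lambda\|\beta\|_1, \qquad \hat\beta^{\mathrm{Dantzig}} = \arg\min\Big\{\|\beta\|_1 : \big\|\tfrac{1}{n}X^\top(y - X\beta)\big\|_\infty \le \lambda\Big\},$$

where $\lambda > 0$ is a common tuning parameter and the sparsity scenario refers to the true coefficient vector having few non-zero entries.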

2,524 citations