
Showing papers by "Joel A. Tropp" published in 2012


Journal ArticleDOI
TL;DR: This paper presents new probability inequalities for sums of independent, random, self-adjoint matrices and provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Hoeffding, and McDiarmid.
Abstract: This paper presents new probability inequalities for sums of independent, random, self-adjoint matrices. These results place simple and easily verifiable hypotheses on the summands, and they deliver strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum. Tail bounds for the norm of a sum of random rectangular matrices follow as an immediate corollary. The proof techniques also yield some information about matrix-valued martingales. In other words, this paper provides noncommutative generalizations of the classical bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Hoeffding, and McDiarmid. The matrix inequalities promise the same diversity of application, ease of use, and strength of conclusion that have made the scalar inequalities so valuable.
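
For orientation, one representative member of this family is the matrix Bernstein inequality for bounded summands. The statement below paraphrases the bounded case: the X_k are independent, centered, self-adjoint d-dimensional random matrices with λ_max(X_k) ≤ R almost surely, and t ≥ 0.

```latex
% Matrix Bernstein, bounded case: X_k independent, self-adjoint, d x d,
% with E[X_k] = 0 and lambda_max(X_k) <= R almost surely.
\mathbb{P}\left\{ \lambda_{\max}\Bigl(\textstyle\sum_k X_k\Bigr) \ge t \right\}
  \;\le\; d \cdot \exp\!\left( \frac{-t^2/2}{\sigma^2 + Rt/3} \right),
\qquad \sigma^2 := \Bigl\lVert \textstyle\sum_k \mathbb{E}\bigl[X_k^2\bigr] \Bigr\rVert .
```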

1,675 citations


Journal ArticleDOI
TL;DR: This paper demonstrates that the sth-order restricted isometry constant is small when the number m of samples satisfies m ≳ (s log n)^(3/2), where n is the length of the pulse.

212 citations


Posted Content
TL;DR: In this article, a data-driven nonnegative matrix factorization (NMF) algorithm based on linear programming is proposed, where the most salient features in the data are used to express the remaining features.
Abstract: This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization where the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX along with some linear constraints. The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. A theoretical analysis demonstrates that this approach has guarantees similar to those of the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method extends to more general noise models and leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation can factor a multigigabyte matrix in a matter of minutes.
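
As a concrete illustration of the self-expression model X ≈ CX, the sketch below builds an exactly separable nonnegative matrix, picks out salient rows with a successive-projection heuristic (a stand-in for the paper's linear program, not the algorithm itself), and then expresses every row as a nonnegative combination of the selected ones. All names are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

# Toy separable data: the first r rows of X are the salient features themselves,
# and every other row is a convex combination of them.
rng = np.random.default_rng(0)
r, n, p = 3, 12, 8
H = rng.random((r, p))                      # salient features
W = rng.random((n, r))
W[:r] = np.eye(r)                           # embed the features as rows of X
W[r:] /= W[r:].sum(axis=1, keepdims=True)   # mixtures are convex combinations
X = W @ H

def select_rows(X, r):
    """Successive projection: repeatedly take the row of largest norm and
    project the remaining rows off its direction (a stand-in for the LP)."""
    R, chosen = X.copy(), []
    for _ in range(r):
        j = int(np.argmax(np.linalg.norm(R, axis=1)))
        chosen.append(j)
        u = R[j] / np.linalg.norm(R[j])
        R = R - np.outer(R @ u, u)
    return chosen

rows = select_rows(X, r)

# Express every row of X in the selected rows: X ~= C X, with C supported
# on the chosen columns and nonnegative weights found by NNLS.
C = np.zeros((n, n))
for i in range(n):
    coeffs, _ = nnls(X[rows].T, X[i])
    C[i, rows] = coeffs

print(np.allclose(C @ X, X, atol=1e-8))     # True on exactly separable data
```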

200 citations


Proceedings Article
03 Dec 2012
TL;DR: A data-driven model for the factorization, in which the most salient features in the data are used to express the remaining features; the method extends to more general noise models and leads to efficient, scalable algorithms.
Abstract: This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization where the most salient features in the data are used to express the remaining features. More precisely, given a data matrix X, the algorithm identifies a matrix C that satisfies X ≈ CX and some linear constraints. The constraints are chosen to ensure that the matrix C selects features; these features can then be used to find a low-rank NMF of X. A theoretical analysis demonstrates that this approach has guarantees similar to those of the recent NMF algorithm of Arora et al. (2012). In contrast with this earlier work, the proposed method extends to more general noise models and leads to efficient, scalable algorithms. Experiments with synthetic and real datasets provide evidence that the new approach is also superior in practice. An optimized C++ implementation can factor a multigigabyte matrix in a matter of minutes.

119 citations


Journal ArticleDOI
TL;DR: In this article, a convex optimization problem, called REAPER, is described, which can reliably fit a low-dimensional model to data consisting of noisy inliers and outliers; it reaches the convex formulation through a relaxation of the set of orthogonal projectors.
Abstract: Consider a dataset of vector-valued observations that consists of noisy inliers, which are explained well by a low-dimensional subspace, along with some number of outliers. This work describes a convex optimization problem, called REAPER, that can reliably fit a low-dimensional model to this type of data. This approach parameterizes linear subspaces using orthogonal projectors, and it uses a relaxation of the set of orthogonal projectors to reach the convex formulation. The paper provides an efficient algorithm for solving the REAPER problem, and it documents numerical experiments which confirm that REAPER can dependably find linear structure in synthetic and natural data. In addition, when the inliers lie near a low-dimensional subspace, there is a rigorous theory that describes when REAPER can approximate this subspace.
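
For reference, the convex program has the following shape (paraphrased from the paper's formulation; the x_i denote the observations and d the target subspace dimension, with the constraint set relaxing the nonconvex family of rank-d orthogonal projectors):

```latex
\begin{aligned}
\underset{P}{\text{minimize}} \quad & \sum_{i} \bigl\lVert x_i - P x_i \bigr\rVert_2 \\
\text{subject to} \quad & 0 \preceq P \preceq \mathrm{I}, \qquad \operatorname{tr} P = d .
\end{aligned}
```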

98 citations


Posted Content
TL;DR: In this paper, a framework based on convex optimization is proposed for demixing: identifying two structured signals given only their sum and prior information about their structures.
Abstract: Demixing refers to the challenge of identifying two structured signals given only the sum of the two signals and prior information about their structures. Examples include the problem of separating a signal that is sparse with respect to one basis from a signal that is sparse with respect to a second basis, and the problem of decomposing an observed matrix into a low-rank matrix plus a sparse matrix. This paper describes and analyzes a framework, based on convex optimization, for solving these demixing problems, and many others. This work introduces a randomized signal model which ensures that the two structures are incoherent, i.e., generically oriented. For an observation from this model, this approach identifies a summary statistic that reflects the complexity of a particular signal. The difficulty of separating two structured, incoherent signals depends only on the total complexity of the two structures. Some applications include (i) demixing two signals that are sparse in mutually incoherent bases; (ii) decoding spread-spectrum transmissions in the presence of impulsive errors; and (iii) removing sparse corruptions from a low-rank matrix. In each case, the theoretical analysis of the convex demixing method closely matches its empirical behavior.
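
A minimal sketch of application (i) appears below: two components, one sparse in the standard basis and one sparse in the (incoherent) DCT basis, are separated from their sum by minimizing a sum of l1 norms subject to the observation constraint. This uses cvxpy as a generic solver and illustrates the convex demixing idea rather than the paper's own code; recovery is typically exact for sufficiently sparse, generically oriented components.

```python
import numpy as np
import cvxpy as cp
from scipy.fft import dct

rng = np.random.default_rng(1)
n = 128
D = dct(np.eye(n), axis=0, norm="ortho")   # orthonormal DCT basis, incoherent with the identity

# Ground truth: a few spikes plus a few DCT atoms.
x0 = np.zeros(n); x0[rng.choice(n, 3, replace=False)] = rng.standard_normal(3)
y0 = np.zeros(n); y0[rng.choice(n, 3, replace=False)] = rng.standard_normal(3)
z = x0 + D @ y0                            # the observed superposition

# Convex demixing: minimize a sum of sparsity surrogates subject to the data.
x, y = cp.Variable(n), cp.Variable(n)
problem = cp.Problem(cp.Minimize(cp.norm1(x) + cp.norm1(y)), [x + D @ y == z])
problem.solve()

print(np.allclose(x.value, x0, atol=1e-5), np.allclose(y.value, y0, atol=1e-5))
```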

85 citations


ReportDOI
17 Feb 2012
TL;DR: The paper provides an efficient algorithm for solving the REAPER problem, and it documents numerical experiments which confirm that REAPER can dependably find linear structure in synthetic and natural data.
Abstract: Consider a dataset of vector-valued observations that consists of a modest number of noisy inliers, which are explained well by a low-dimensional subspace, along with a large number of outliers, which have no linear structure. This work describes a convex optimization problem, called REAPER, that can reliably fit a low-dimensional model to this type of data. The paper provides an efficient algorithm for solving the REAPER problem, and it documents numerical experiments which confirm that REAPER can dependably find linear structure in synthetic and natural data. In addition, when the inliers are contained in a low-dimensional subspace, there is a rigorous theory that describes when REAPER can recover the subspace exactly.

57 citations


Posted Content
08 May 2012
TL;DR: This work introduces a randomized signal model which ensures that the two structures are incoherent, i.e., generically oriented, and it describes and analyzes a framework, based on convex optimization, for solving deconvolution problems, among many others.

44 citations


Journal ArticleDOI
TL;DR: In this paper, a short analysis of the masked covariance estimator by means of a matrix concentration inequality is provided, and it is shown that n = O(B log^2 p) samples suffice to estimate a banded covariance matrix with bandwidth B up to a relative spectral-norm error.
Abstract: Covariance estimation becomes challenging in the regime where the number p of variables outstrips the number n of samples available to construct the estimate. One way to circumvent this problem is to assume that the covariance matrix is nearly sparse and to focus on estimating only the significant entries. To analyze this approach, Levina and Vershynin (2011) introduce a formalism called masked covariance estimation, where each entry of the sample covariance estimator is reweighted to reflect an a priori assessment of its importance. This paper provides a short analysis of the masked sample covariance estimator by means of a matrix concentration inequality. The main result applies to general distributions with at least four moments. Specialized to the case of a Gaussian distribution, the theory offers qualitative improvements over earlier work. For example, the new results show that n = O(B log^2 p) samples suffice to estimate a banded covariance matrix with bandwidth B up to a relative spectral-norm error, in contrast to the sample complexity n = O(B log^5 p) obtained by Levina and Vershynin.
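
The masked estimator itself is simple to write down; the sketch below implements the formalism as described, reweighting each entry of the sample covariance by a mask M, here a banded 0-1 mask with bandwidth B. Names are illustrative.

```python
import numpy as np

def masked_sample_covariance(samples, mask):
    """Entrywise reweighting of the sample covariance by the mask:
    each entry of Sigma_hat is multiplied by the corresponding mask entry."""
    centered = samples - samples.mean(axis=0)
    sigma_hat = centered.T @ centered / len(samples)
    return mask * sigma_hat

# A banded 0-1 mask with bandwidth B keeps only entries near the diagonal.
p, B = 50, 3
idx = np.arange(p)
banded_mask = (np.abs(idx[:, None] - idx[None, :]) <= B).astype(float)

samples = np.random.default_rng(2).standard_normal((200, p))   # n = 200 draws
estimate = masked_sample_covariance(samples, banded_mask)
```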

41 citations


Journal ArticleDOI
01 May 2012
TL;DR: In this article, the authors provide a succinct proof of a 1973 theorem of Lieb that establishes the concavity of a trace function, relying on a deep result from quantum information theory, the joint convexity of quantum relative entropy, as well as a recent argument due to Carlen and Lieb.
Abstract: This paper provides a succinct proof of a 1973 theorem of Lieb that establishes the concavity of a certain trace function. The development relies on a deep result from quantum information theory, the joint convexity of quantum relative entropy, as well as a recent argument due to Carlen and Lieb.
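
The theorem in question states that, for each fixed self-adjoint matrix H, the trace function below is concave on the cone of positive-definite matrices:

```latex
A \;\longmapsto\; \operatorname{tr} \exp\bigl( H + \log A \bigr),
\qquad A \succ 0 .
```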

37 citations


Journal ArticleDOI
TL;DR: In this paper, a matrix extension of the scalar concentration theory developed by Sourav Chatterjee using Stein's method of exchangeable pairs is presented, yielding matrix generalizations of the classical inequalities due to Hoeffding, Bernstein, Khintchine, and Rosenthal.
Abstract: This paper derives exponential concentration inequalities and polynomial moment inequalities for the spectral norm of a random matrix. The analysis requires a matrix extension of the scalar concentration theory developed by Sourav Chatterjee using Stein's method of exchangeable pairs. When applied to a sum of independent random matrices, this approach yields matrix generalizations of the classical inequalities due to Hoeffding, Bernstein, Khintchine and Rosenthal. The same technique delivers bounds for sums of dependent random matrices and more general matrix-valued functions of dependent random variables.

ReportDOI
01 Feb 2012
TL;DR: In this article, a new analysis of the masked sample covariance estimator is presented, based on the matrix Laplace transform method; the main result applies to general subgaussian distributions.
Abstract: Covariance estimation becomes challenging in the regime where the number p of variables outstrips the number n of samples available to construct the estimate. One way to circumvent this problem is to assume that the covariance matrix is nearly sparse and to focus on estimating only the significant entries. To analyze this approach, Levina and Vershynin (2011) introduce a formalism called masked covariance estimation, where each entry of the sample covariance estimator is reweighted to reflect an a priori assessment of its importance. This paper provides a new analysis of the masked sample covariance estimator based on the matrix Laplace transform method. The main result applies to general subgaussian distributions. Specialized to the case of a Gaussian distribution, the theory offers qualitative improvements over earlier work. For example, the new results show that n = O(B log^2 p) samples suffice to estimate a banded covariance matrix with bandwidth B up to a relative spectral-norm error, in contrast to the sample complexity n = O(B log^5 p) obtained by Levina and Vershynin.

Journal ArticleDOI
TL;DR: In this paper, it is shown that the expectation of an arbitrary norm of a random matrix drawn from the Stiefel manifold can be bounded in terms of the expected norm of a standard Gaussian matrix with the same dimensions.
Abstract: This note demonstrates that it is possible to bound the expectation of an arbitrary norm of a random matrix drawn from the Stiefel manifold in terms of the expected norm of a standard Gaussian matrix with the same dimensions. A related comparison holds for any convex function of a random matrix drawn from the Stiefel manifold. For certain norms, a reversed inequality is also valid.