
Showing papers by "Joel A. Tropp published in 2011"


Journal ArticleDOI
TL;DR: This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation, and presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions.
Abstract: Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the $k$ dominant components of the singular value decomposition of an $m \times n$ matrix. (i) For a dense input matrix, randomized algorithms require $O(mn \log(k))$ floating-point operations (flops) in contrast to $O(mnk)$ for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to $O(k)$ passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
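For readers who want to see the shape of the two-stage scheme the abstract describes, here is a minimal numpy sketch. The function name, the Gaussian test matrix, and the oversampling parameter p are illustrative choices, not prescriptions from the paper:

```python
import numpy as np

def randomized_svd(A, k, p=10):
    """Rank-k SVD via a randomized range finder; p is an oversampling margin."""
    m, n = A.shape
    # Stage A: sample the range of A with a Gaussian test matrix.
    Omega = np.random.randn(n, k + p)
    Y = A @ Omega                      # Y captures most of the action of A
    Q, _ = np.linalg.qr(Y)             # orthonormal basis for the sampled range
    # Stage B: compress A to the subspace, then factor the small matrix.
    B = Q.T @ A                        # (k+p) x n reduced matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]

# Example: approximate a 500 x 300 matrix of numerical rank ~20.
A = np.random.randn(500, 20) @ np.random.randn(20, 300)
U, s, Vt = randomized_svd(A, k=20)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # tiny relative error
```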

3,248 citations


01 Jan 2011
TL;DR: In this article, the authors present a modular framework for constructing randomized algorithms that compute partial matrix decompositions, which use random sampling to identify a subspace that captures most of the action of a matrix.
Abstract: Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the k dominant components of the singular value decomposition of an m × n matrix. (i) For a dense input matrix, randomized algorithms require O(mn log(k)) floating-point operations (flops) in contrast to O(mnk) for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to O(k) passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.

494 citations


Journal ArticleDOI
TL;DR: In this article, an improved analysis of a structured dimension reduction map called the subsampled randomized Hadamard transform is presented, and the new proof is much simpler than previous approaches, and it offers optimal constants in the estimate on the number of dimensions required for the embedding.
Abstract: This paper presents an improved analysis of a structured dimension-reduction map called the subsampled randomized Hadamard transform. This argument demonstrates that the map preserves the Euclidean geometry of an entire subspace of vectors. The new proof is much simpler than previous approaches, and it offers — for the first time — optimal constants in the estimate on the number of dimensions required for the embedding.
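As a concrete illustration of the map under discussion, the following numpy sketch builds a dense SRHT and applies it to an orthonormal basis. A real implementation would replace the explicit Hadamard matrix with a fast Walsh-Hadamard transform, and the scaling convention here is one common choice, not necessarily the paper's:

```python
import numpy as np
from scipy.linalg import hadamard

def srht(X, r, seed=0):
    """Apply a subsampled randomized Hadamard transform to the rows of X.

    X has n rows with n a power of 2; r mixed rows are kept.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    D = rng.choice([-1.0, 1.0], size=n)           # random sign flips
    H = hadamard(n) / np.sqrt(n)                  # orthonormal Hadamard matrix
    mixed = H @ (D[:, None] * X)                  # HDx: spreads energy evenly
    rows = rng.choice(n, size=r, replace=False)   # uniform row subsampling
    return np.sqrt(n / r) * mixed[rows]           # rescale to preserve norms

# Embed a 10-dimensional subspace of R^1024 into R^64.
X = np.linalg.qr(np.random.randn(1024, 10))[0]
Y = srht(X, r=64)
print(np.linalg.svd(Y, compute_uv=False))         # singular values near 1
```

The printed singular values staying close to 1 is exactly the subspace-embedding property the abstract refers to: the Euclidean geometry of the whole 10-dimensional subspace is approximately preserved.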

350 citations


Journal ArticleDOI
TL;DR: Oliveira, as mentioned in this paper, showed that the large-deviation behavior of a martingale is controlled by the predictable quadratic variation and a uniform upper bound for the martingale difference sequence.
Abstract: Freedman's inequality is a martingale counterpart to Bernstein's inequality. This result shows that the large-deviation behavior of a martingale is controlled by the predictable quadratic variation and a uniform upper bound for the martingale difference sequence. Oliveira has recently established a natural extension of Freedman's inequality that provides tail bounds for the maximum singular value of a matrix-valued martingale. This note describes a different proof of the matrix Freedman inequality that depends on a deep theorem of Lieb from matrix analysis. This argument delivers sharp constants in the matrix Freedman inequality, and it also yields tail bounds for other types of matrix martingales. The new techniques are adapted from recent work by the present author.
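For reference, the matrix Freedman inequality the abstract describes takes roughly the following form in the self-adjoint case (a paraphrase of the standard statement, not a quotation from the note):

```latex
% Matrix Freedman inequality (self-adjoint case, paraphrased):
% (Y_k) is a martingale in d x d self-adjoint matrices with Y_0 = 0, whose
% difference sequence X_k = Y_k - Y_{k-1} satisfies lambda_max(X_k) <= R.
\[
  W_k := \sum_{j=1}^{k} \mathbb{E}\bigl[ X_j^2 \mid \mathcal{F}_{j-1} \bigr]
  \qquad \text{(predictable quadratic variation)}
\]
\[
  \mathbb{P}\Bigl\{ \exists k \ge 0 :\ \lambda_{\max}(Y_k) \ge t
    \ \text{and}\ \lVert W_k \rVert \le \sigma^2 \Bigr\}
  \;\le\; d \cdot \exp\!\Bigl( \frac{-t^2/2}{\sigma^2 + Rt/3} \Bigr).
\]
```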

148 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed two approaches for robust principal component analysis based on semidefinite programming: the first seeks directions of large spread in the data while damping the effect of outliers, and the second separates corrupted observations out of a low-rank model for the data.
Abstract: The performance of principal component analysis suffers badly in the presence of outliers. This paper proposes two novel approaches for robust principal component analysis based on semidefinite programming. The first method, maximum mean absolute deviation rounding, seeks directions of large spread in the data while damping the effect of outliers. The second method produces a low-leverage decomposition of the data that attempts to form a low-rank model for the data by separating out corrupted observations. This paper also presents efficient computational methods for solving these semidefinite programs. Numerical experiments confirm the value of these new techniques.
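The failure mode in the opening sentence is easy to reproduce. The numpy snippet below (an illustration only, not the paper's semidefinite-programming methods) shows a single gross outlier swinging the leading principal direction:

```python
import numpy as np

rng = np.random.default_rng(1)
# Clean data: 200 points spread mostly along the x-axis.
X = rng.normal(size=(200, 2)) * np.array([5.0, 1.0])

def top_direction(X):
    """Leading principal direction of mean-centered data."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

print(top_direction(X))                   # close to (+-1, 0)
X_bad = np.vstack([X, [0.0, 500.0]])      # add one gross outlier
print(top_direction(X_bad))               # swings toward (0, +-1)
```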

125 citations


Posted Content
TL;DR: Oliveira, as discussed by the authors, showed that the large-deviation behavior of a martingale is controlled by the predictable quadratic variation and a uniform upper bound for the martingale difference sequence.
Abstract: Freedman's inequality is a martingale counterpart to Bernstein's inequality. This result shows that the large-deviation behavior of a martingale is controlled by the predictable quadratic variation and a uniform upper bound for the martingale difference sequence. Oliveira has recently established a natural extension of Freedman's inequality that provides tail bounds for the maximum singular value of a matrix-valued martingale. This note describes a different proof of the matrix Freedman inequality that depends on a deep theorem of Lieb from matrix analysis. This argument delivers sharp constants in the matrix Freedman inequality, and it also yields tail bounds for other types of matrix martingales. The new techniques are adapted from recent work by the present author.

106 citations


ReportDOI
16 Jan 2011
TL;DR: In this article, the authors present probability inequalities for sums of adapted sequences of random, self-adjoint matrices; the results frame simple, easily verifiable hypotheses on the summands and yield strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum.
Abstract: This report presents probability inequalities for sums of adapted sequences of random, self-adjoint matrices. The results frame simple, easily verifiable hypotheses on the summands, and they yield strong conclusions about the large-deviation behavior of the maximum eigenvalue of the sum. The methods also specialize to sums of independent random matrices.
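The independent-sum specialization mentioned in the last sentence includes the matrix Bernstein bound, which in its standard self-adjoint form reads roughly as follows (a paraphrase, not a quotation from the report):

```latex
% Matrix Bernstein bound (independent-sum specialization, paraphrased):
% X_1, ..., X_n independent, zero-mean, self-adjoint d x d random matrices
% with lambda_max(X_k) <= R almost surely.
\[
  \sigma^2 := \Bigl\lVert \sum_{k} \mathbb{E}\, X_k^2 \Bigr\rVert,
  \qquad
  \mathbb{P}\Bigl\{ \lambda_{\max}\Bigl( \sum_{k} X_k \Bigr) \ge t \Bigr\}
  \;\le\; d \cdot \exp\!\Bigl( \frac{-t^2/2}{\sigma^2 + Rt/3} \Bigr).
\]
```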

76 citations


Journal ArticleDOI
TL;DR: This paper establishes the restricted isometry property for a Gabor system generated by n² time–frequency shifts of a random window function in n dimensions by showing that the sth order restricted isometry constant of the associated n × n² Gabor synthesis matrix is small.
Abstract: This paper establishes the restricted isometry property for a Gabor system generated by n² time–frequency shifts of a random window function in n dimensions. The sth order restricted isometry constant of the associated n × n² Gabor synthesis matrix is small provided that s ≤ c n^(2/3) / log² n. This bound provides a qualitative improvement over previous estimates, which achieve only quadratic scaling of the sparsity s with respect to n. The proof depends on an estimate for the expected supremum of a second-order chaos.
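A direct, unoptimized construction of the synthesis matrix makes the setup concrete. In this sketch a normalized complex Gaussian vector stands in for the paper's random window, which may differ from the distribution actually analyzed:

```python
import numpy as np

def gabor_synthesis_matrix(g):
    """All n^2 time-frequency shifts of a window g in C^n, as columns.

    Column (k, l) is g cyclically shifted by k samples and modulated
    by the l-th discrete frequency.
    """
    n = len(g)
    cols = []
    for k in range(n):                              # time shifts
        shifted = np.roll(g, k)
        for l in range(n):                          # frequency shifts
            mod = np.exp(2j * np.pi * l * np.arange(n) / n)
            cols.append(mod * shifted)
    return np.column_stack(cols)                    # n x n^2

# A random window, normalized to the complex unit sphere.
n = 16
g = np.random.randn(n) + 1j * np.random.randn(n)
g /= np.linalg.norm(g)
Phi = gabor_synthesis_matrix(g)
print(Phi.shape)                                    # (16, 256)
```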

67 citations


ReportDOI
TL;DR: The minimax Laplace transform, as discussed by the authors, is a modification of the cumulant-based matrix Laplace transform method that yields both upper and lower bounds on each eigenvalue of a sum of random self-adjoint matrices.
Abstract: This work introduces the minimax Laplace transform method, a modification of the cumulant-based matrix Laplace transform method developed in [Tro11c] that yields both upper and lower bounds on each eigenvalue of a sum of random self-adjoint matrices. This machinery is used to derive eigenvalue analogs of the classical Chernoff, Bennett, and Bernstein bounds. Two examples demonstrate the efficacy of the minimax Laplace transform. The first concerns the effects of column sparsification on the spectrum of a matrix with orthonormal rows. Here, the behavior of the singular values can be described in terms of coherence-like quantities. The second example addresses the question of relative accuracy in the estimation of eigenvalues of the covariance matrix of a random process. Standard results on the convergence of sample covariance matrices provide bounds on the number of samples needed to obtain relative accuracy in the spectral norm, but these results only guarantee relative accuracy in the estimate of the maximum eigenvalue. The minimax Laplace transform argument establishes that if the lowest eigenvalues decay sufficiently fast, Ω(ε^(-2) κ_l^2 l log p) samples, where κ_l = λ_1(C)/λ_l(C), are sufficient to ensure that the dominant l eigenvalues of the covariance matrix of an N(0,C) random vector are estimated to within a factor of 1 ± ε with high probability.
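The estimation problem in the second example is easy to probe numerically. The snippet below (an illustration of the problem, not of the minimax Laplace transform method itself) checks the per-eigenvalue relative error of the sample covariance for a fast-decaying spectrum:

```python
import numpy as np

rng = np.random.default_rng(2)
p = 100
# Covariance with fast-decaying spectrum, as in the abstract's setting.
eigs = 1.0 / np.arange(1, p + 1) ** 2
C = np.diag(eigs)

n = 2000                                  # number of samples
X = rng.multivariate_normal(np.zeros(p), C, size=n)
C_hat = X.T @ X / n                       # sample covariance (known zero mean)

l = 5                                     # dominant eigenvalues to check
est = np.linalg.eigvalsh(C_hat)[::-1][:l]
print(np.abs(est - eigs[:l]) / eigs[:l])  # per-eigenvalue relative error
```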

56 citations


Posted Content
TL;DR: In this paper, a short analysis of the masked covariance estimator by means of a matrix concentration inequality is provided, and it is shown that n = O(B log² p) samples suffice to estimate a banded covariance matrix with bandwidth B up to a relative spectral-norm error.
Abstract: Covariance estimation becomes challenging in the regime where the number p of variables outstrips the number n of samples available to construct the estimate. One way to circumvent this problem is to assume that the covariance matrix is nearly sparse and to focus on estimating only the significant entries. To analyze this approach, Levina and Vershynin (2011) introduce a formalism called masked covariance estimation, where each entry of the sample covariance estimator is reweighted to reflect an a priori assessment of its importance. This paper provides a short analysis of the masked sample covariance estimator by means of a matrix concentration inequality. The main result applies to general distributions with at least four moments. Specialized to the case of a Gaussian distribution, the theory offers qualitative improvements over earlier work. For example, the new results show that n = O(B log^2 p) samples suffice to estimate a banded covariance matrix with bandwidth B up to a relative spectral-norm error, in contrast to the sample complexity n = O(B log^5 p) obtained by Levina and Vershynin.
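A banded mask is the simplest instance of the formalism. The sketch below (illustrative only; the function name and conventions are not from the paper) builds the masked sample covariance estimator for bandwidth B:

```python
import numpy as np

def masked_sample_covariance(X, B):
    """Banded masked covariance estimator: keep entries within bandwidth B.

    X is an n x p data matrix (rows are zero-mean samples); the mask M
    zeroes out entries (i, j) of the sample covariance with |i - j| >= B.
    """
    n, p = X.shape
    S = X.T @ X / n                                   # sample covariance
    idx = np.arange(p)
    M = (np.abs(idx[:, None] - idx[None, :]) < B).astype(float)
    return M * S                                      # entrywise reweighting

# Example: 50 variables, bandwidth 3.
rng = np.random.default_rng(3)
X = rng.normal(size=(500, 50))
print(masked_sample_covariance(X, B=3)[:4, :4])       # banded estimate
```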

41 citations


Journal ArticleDOI
TL;DR: A succinct proof of a 1973 theorem of Lieb that establishes the concavity of a certain trace function is provided; the proof relies on a deep result from quantum information theory (the joint convexity of quantum relative entropy) as well as a recent argument due to Carlen and Lieb.
Abstract: This note provides a succinct proof of a 1973 theorem of Lieb that establishes the concavity of a certain trace function. The development relies on a deep result from quantum information theory, the joint convexity of quantum relative entropy, as well as a recent argument due to Carlen and Lieb.
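For context, the trace function in question can be stated as follows (a standard phrasing of Lieb's 1973 theorem, not a quotation from the note):

```latex
% Lieb's 1973 theorem (the trace function in question):
% for any fixed self-adjoint matrix H, the map
\[
  A \;\longmapsto\; \operatorname{tr} \exp\bigl( H + \log A \bigr)
\]
% is concave on the cone of positive-definite matrices.
```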

Journal ArticleDOI
TL;DR: In this article, it was shown that it is possible to bound the expectation of a random matrix drawn from the Stiefel manifold in terms of the expected norm of a standard Gaussian matrix with the same dimensions.
Abstract: This note demonstrates that it is possible to bound the expectation of an arbitrary norm of a random matrix drawn from the Stiefel manifold in terms of the expected norm of a standard Gaussian matrix with the same dimensions. A related comparison holds for any convex function of a random matrix drawn from the Stiefel manifold. For certain norms, a reversed inequality is also valid.

Journal ArticleDOI
TL;DR: In this article, the authors established the restricted isometry property for finite dimensional Gabor systems, that is, for families of time shifts of a randomly chosen window function, and developed bounds for a corresponding chaos process.
Abstract: We establish the restricted isometry property for finite dimensional Gabor systems, that is, for families of time–frequency shifts of a randomly chosen window function. We show that the $s$-th order restricted isometry constant of the associated $n \times n^2$ Gabor synthesis matrix is small provided $s \leq c \, n^{2/3} / \log^2 n$. This improves on previous estimates that exhibit quadratic scaling of $n$ in $s$. Our proof develops bounds for a corresponding chaos process.