Open Access · Journal Article · DOI

Low-Rank Approximation and Regression in Input Sparsity Time

Kenneth L. Clarkson, +1 more
30 Jan 2017 · Vol. 63, Iss. 6, pp 54
TLDR
A new distribution over m × n matrices S is designed so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ∥SAx∥2 = (1 ± ε)∥Ax∥2 simultaneously for all x ∈ ℝᵈ.
Abstract
We design a new distribution over m × n matrices S so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ∥SAx∥2 = (1 ± ε)∥Ax∥2 simultaneously for all x ∈ ℝᵈ. Here, m is bounded by a polynomial in rε⁻¹, and the parameter ε ∈ (0, 1]. Such a matrix S is called a subspace embedding. Furthermore, SA can be computed in O(nnz(A)) time, where nnz(A) is the number of nonzero entries of A. This improves over all previous subspace embeddings, for which computing SA required at least Ω(nd log d) time. We call these S sparse embedding matrices.

Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓp regression. More specifically, let b be an n × 1 vector, ε > 0 a small enough value, and k, p ≥ 1 integers. Our results include the following.

- Regression: The regression problem is to find a d × 1 vector x′ for which ∥Ax′ − b∥p ≤ (1 + ε) min_x ∥Ax − b∥p. For the Euclidean case p = 2, we obtain an algorithm running in O(nnz(A)) + Õ(d³ε⁻²) time, and another in O(nnz(A) log(1/ε)) + Õ(d³ log(1/ε)) time. (Here, Õ(f) = f · log^O(1)(f).) For p ∈ [1, ∞) more generally, we obtain an algorithm running in O(nnz(A) log n) + Õ(rε⁻¹)^C time, for a fixed constant C.

- Low-rank approximation: We give an algorithm to obtain a rank-k matrix Âk such that ∥A − Âk∥F ≤ (1 + ε)∥A − Ak∥F, where Ak is the best rank-k approximation to A. (That is, Ak is the output of principal component analysis, produced by a truncated singular value decomposition, useful for latent semantic indexing and many other statistical problems.) Our algorithm runs in O(nnz(A)) + Õ(nk²ε⁻⁴ + k³ε⁻⁵) time.

- Leverage scores: We give an algorithm to estimate the leverage scores of A, up to a constant factor, in O(nnz(A) log n) + Õ(r³) time.
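The sparse embedding described in the abstract (one random ±1 entry per column of S) lends itself to a short illustration. The NumPy sketch below is a minimal, illustrative version, not the paper's tuned implementation: it builds the sketch SA in a single pass over the rows of A (which is O(nnz(A)) work when A is stored sparsely), then uses it to sketch-and-solve a p = 2 regression. The helper name sparse_embed and the choice of sketch dimension m are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_embed(A, m):
    """Apply an m x n sparse (one-nonzero-per-column) embedding S to A.

    Each row i of A is hashed to a random bucket h(i) with a random
    sign sigma(i), so forming SA takes one pass over A's rows: this is
    O(nnz(A)) time when A is sparse.
    """
    n = A.shape[0]
    rows = rng.integers(0, m, size=n)        # h(i): target bucket for row i
    signs = rng.choice([-1.0, 1.0], size=n)  # sigma(i): random sign for row i
    SA = np.zeros((m, A.shape[1]))
    for i in range(n):                       # single pass over the rows
        SA[rows[i]] += signs[i] * A[i]
    return SA

# Sketch-and-solve least squares: min_x ||Ax - b||_2.
n, d = 5000, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

m = 50 * d * d                               # m = poly(d, 1/eps); generous here
SA, Sb = np.hsplit(sparse_embed(np.column_stack([A, b]), m), [d])
x_sketch = np.linalg.lstsq(SA, Sb, rcond=None)[0].ravel()
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]

# The sketched solution's residual should be within (1 + eps) of optimal.
r_sketch = np.linalg.norm(A @ x_sketch - b)
r_exact = np.linalg.norm(A @ x_exact - b)
print(r_sketch / r_exact)                    # close to 1
```

Note that the sketched problem has only m = O(d²) rows regardless of n, so the dominant cost for large n is the single pass forming SA and Sb.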

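The leverage scores mentioned in the abstract have a simple exact definition that the paper's O(nnz(A) log n) + Õ(r³) algorithm approximates: the i-th leverage score is the squared norm of row i of any orthonormal basis for A's column space. The snippet below is a naive O(nd²) illustration of that definition (via thin QR), not the paper's fast estimator; the matrix sizes are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 5))  # tall full-rank matrix, rank r = 5

# Exact leverage scores: squared row norms of an orthonormal basis Q
# for the column space of A (here computed with a thin QR).
Q, _ = np.linalg.qr(A)
scores = np.sum(Q * Q, axis=1)

# Leverage scores lie in [0, 1] and sum to rank(A) = 5.
print(scores.min(), scores.max(), scores.sum())
```

Rows with large leverage scores are the ones that most influence the column space, which is why constant-factor estimates of these scores suffice for the sampling-based algorithms the paper builds on them.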

Citations
Proceedings ArticleDOI

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

TL;DR: In this paper, the authors show how to approximate a data matrix A with a much smaller sketch Ã that can be used to solve a general class of constrained k-rank approximation problems to within (1 + ε) error.
Posted Content

OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings

TL;DR: The main result is essentially a Bai-Yin type theorem in random matrix theory and is likely to be of independent interest: for any fixed U ∈ ℝ^(n × d) with orthonormal columns and random sparse Π, all singular values of ΠU lie in [1 − ε, 1 + ε] with good probability.
Proceedings ArticleDOI

Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression

TL;DR: In this article, a low-distortion embedding matrix Π ∈ ℝ^(O(poly(d)) × n) that embeds Ap, the ℓp subspace spanned by A's columns, into (ℝ^(O(poly(d))), ∥·∥p), was constructed in O(nnz(A)) time.
Journal ArticleDOI

RandNLA: randomized numerical linear algebra

TL;DR: RandNLA is an interdisciplinary research area that exploits randomization as a computational resource to develop improved algorithms for large-scale linear algebra problems, and it promises a sound algorithmic and statistical foundation for modern large-scale data analysis.
Journal ArticleDOI

Modeling and optimization for big data analytics: (Statistical) learning tools for our era of data deluge

TL;DR: This article contributes to the ongoing cross-disciplinary efforts in data science by putting forth encompassing models capturing a wide range of SP-relevant data analytic tasks, such as principal component analysis (PCA), dictionary learning (DL), compressive sampling (CS), and subspace clustering.
References
Journal ArticleDOI

Authoritative sources in a hyperlinked environment

TL;DR: This work proposes and tests an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure; the formulation has connections to the eigenvectors of certain matrices associated with the link graph.
Book

Numerical Linear Algebra

Posted Content

Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions

TL;DR: In this article, a modular framework for constructing randomized algorithms that compute partial matrix decompositions is presented. The framework uses random sampling to identify a subspace that captures most of the action of a matrix; the input matrix is then compressed to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization.