Open Access · Journal Article · DOI

Low-Rank Approximation and Regression in Input Sparsity Time

Kenneth L. Clarkson, +1 more
30 Jan 2017 · Vol. 63, Iss. 6, pp 54
TLDR
A new distribution over m × n matrices S is designed so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ∥SAx∥2 = (1 ± ε)∥Ax∥2 simultaneously for all x ∈ ℝᵈ.
Abstract
We design a new distribution over m × n matrices S so that, for any fixed n × d matrix A of rank r, with probability at least 9/10, ∥SAx∥2 = (1 ± ε)∥Ax∥2 simultaneously for all x ∈ ℝᵈ. Here, m is bounded by a polynomial in rε⁻¹, and the parameter ε ∈ (0, 1]. Such a matrix S is called a subspace embedding. Furthermore, SA can be computed in O(nnz(A)) time, where nnz(A) is the number of nonzero entries of A. This improves over all previous subspace embeddings, for which computing SA required at least Ω(nd log d) time. We call these S sparse embedding matrices.

Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓp regression. More specifically, let b be an n × 1 vector, ε > 0 a small enough value, and k, p ≥ 1 integers. Our results include the following.

- Regression: The regression problem is to find a d × 1 vector x′ for which ∥Ax′ − b∥p ≤ (1 + ε) min_x ∥Ax − b∥p. For the Euclidean case p = 2, we obtain an algorithm running in O(nnz(A)) + Õ(d³ε⁻²) time, and another in O(nnz(A) log(1/ε)) + Õ(d³ log(1/ε)) time. (Here, Õ(f) = f · log^O(1)(f).) For p ∈ [1, ∞) more generally, we obtain an algorithm running in O(nnz(A) log n) + Õ(rε⁻¹)^C time, for a fixed constant C.

- Low-rank approximation: We give an algorithm to obtain a rank-k matrix Âk such that ∥A − Âk∥F ≤ (1 + ε)∥A − Ak∥F, where Ak is the best rank-k approximation to A. (That is, Ak is the output of principal component analysis, produced by a truncated singular value decomposition, useful for latent semantic indexing and many other statistical problems.) Our algorithm runs in O(nnz(A)) + Õ(nk²ε⁻⁴ + k³ε⁻⁵) time.

- Leverage scores: We give an algorithm to estimate the leverage scores of A, up to a constant factor, in O(nnz(A) log n) + Õ(r³) time.
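The sparse embedding described in the abstract (one random ±1 entry per column of S) lends itself to a short illustration. The NumPy sketch below is a minimal, illustrative version, not the paper's tuned implementation: it builds the sketch SA in a single pass over the rows of A (which is O(nnz(A)) work when A is stored sparsely), then uses it to sketch-and-solve a p = 2 regression. The helper name sparse_embed and the choice of sketch dimension m are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_embed(A, m):
    """Apply an m x n sparse (one-nonzero-per-column) embedding S to A.

    Each row i of A is hashed to a random bucket h(i) with a random
    sign sigma(i), so forming SA takes one pass over A's rows: this is
    O(nnz(A)) time when A is sparse.
    """
    n = A.shape[0]
    rows = rng.integers(0, m, size=n)        # h(i): target bucket for row i
    signs = rng.choice([-1.0, 1.0], size=n)  # sigma(i): random sign for row i
    SA = np.zeros((m, A.shape[1]))
    for i in range(n):                       # single pass over the rows
        SA[rows[i]] += signs[i] * A[i]
    return SA

# Sketch-and-solve least squares: min_x ||Ax - b||_2.
n, d = 5000, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

m = 50 * d * d                               # m = poly(d, 1/eps); generous here
SA, Sb = np.hsplit(sparse_embed(np.column_stack([A, b]), m), [d])
x_sketch = np.linalg.lstsq(SA, Sb, rcond=None)[0].ravel()
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]

# The sketched solution's residual should be within (1 + eps) of optimal.
r_sketch = np.linalg.norm(A @ x_sketch - b)
r_exact = np.linalg.norm(A @ x_exact - b)
print(r_sketch / r_exact)                    # close to 1
```

Note that the sketched problem has only m = O(d²) rows regardless of n, so the dominant cost for large n is the single pass forming SA and Sb.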

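The leverage scores mentioned in the abstract have a simple exact definition that the paper's O(nnz(A) log n) + Õ(r³) algorithm approximates: the i-th leverage score is the squared norm of row i of any orthonormal basis for A's column space. The snippet below is a naive O(nd²) illustration of that definition (via thin QR), not the paper's fast estimator; the matrix sizes are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((1000, 5))  # tall full-rank matrix, rank r = 5

# Exact leverage scores: squared row norms of an orthonormal basis Q
# for the column space of A (here computed with a thin QR).
Q, _ = np.linalg.qr(A)
scores = np.sum(Q * Q, axis=1)

# Leverage scores lie in [0, 1] and sum to rank(A) = 5.
print(scores.min(), scores.max(), scores.sum())
```

Rows with large leverage scores are the ones that most influence the column space, which is why constant-factor estimates of these scores suffice for the sampling-based algorithms the paper builds on them.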

Citations
Proceedings ArticleDOI

Dimensionality Reduction for k-Means Clustering and Low Rank Approximation

TL;DR: In this paper, the authors show how to approximate a data matrix A with a much smaller sketch Ã that can be used to solve a general class of constrained k-rank approximation problems to within (1 + ε) error.
Posted Content

OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings

TL;DR: The main result is essentially a Bai-Yin type theorem in random matrix theory and is likely to be of independent interest: for any fixed U ∈ ℝ^(n × d) with orthonormal columns and random sparse Π, all singular values of ΠU lie in [1 − ε, 1 + ε] with good probability.
Proceedings ArticleDOI

Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression

TL;DR: In this article, a low-distortion embedding matrix Π ∈ ℝ^(O(poly(d)) × n) that embeds Ap, the ℓp subspace spanned by A's columns, into (ℝ^(O(poly(d))), ∥·∥p), was constructed in O(nnz(A)) time.
Journal ArticleDOI

RandNLA: randomized numerical linear algebra

TL;DR: RandNLA is an interdisciplinary research area that exploits randomization as a computational resource to develop improved algorithms for large-scale linear algebra problems, and it promises a sound algorithmic and statistical foundation for modern large-scale data analysis.
Journal ArticleDOI

Modeling and optimization for big data analytics: (Statistical) learning tools for our era of data deluge

TL;DR: This article contributes to the ongoing cross-disciplinary efforts in data science by putting forth encompassing models capturing a wide range of SP-relevant data analytic tasks, such as principal component analysis (PCA), dictionary learning (DL), compressive sampling (CS), and subspace clustering.
References
Journal ArticleDOI

Authoritative sources in a hyperlinked environment

TL;DR: This work proposes and tests an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure; the formulation has connections to the eigenvectors of certain matrices associated with the link graph.
Book

Numerical Linear Algebra

Posted Content

Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions

TL;DR: In this article, a modular framework for constructing randomized algorithms that compute partial matrix decompositions is presented. The framework uses random sampling to identify a subspace that captures most of the action of a matrix; the input matrix is then compressed to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization.