Proceedings ArticleDOI

Low rank approximation and regression in input sparsity time

01 Jun 2013 - pp. 81-90
TL;DR: The fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓp-regression are obtained.
Abstract: We design a new distribution over poly(r ε^{-1}) × n matrices S so that for any fixed n × d matrix A of rank r, with probability at least 9/10, ||SAx||_2 = (1 ± ε)||Ax||_2 simultaneously for all x ∈ R^d. Such a matrix S is called a subspace embedding. Furthermore, SA can be computed in O(nnz(A)) + Õ(r^2 ε^{-2}) time, where nnz(A) is the number of non-zero entries of A. This improves over all previous subspace embeddings, which required at least Ω(nd log d) time to achieve this property. We call our matrices S sparse embedding matrices. Using our sparse embedding matrices, we obtain the fastest known algorithms for overconstrained least-squares regression, low-rank approximation, approximating all leverage scores, and ℓp-regression: to output an x' for which ||Ax' - b||_2 ≤ (1+ε) min_x ||Ax - b||_2 for an n × d matrix A and an n × 1 column vector b, we obtain an algorithm running in O(nnz(A)) + Õ(d^3 ε^{-2}) time, and another in O(nnz(A) log(1/ε)) + Õ(d^3 log(1/ε)) time (here Õ(f) = f ⋅ log^{O(1)}(f)); to obtain a decomposition of an n × n matrix A into a product of an n × k matrix L, a k × k diagonal matrix D, and an n × k matrix W (so that A ≈ L D Wᵀ), for which ||A - L D Wᵀ||_F ≤ (1+ε)||A - A_k||_F, where A_k is the best rank-k approximation, our algorithm runs in O(nnz(A)) + Õ(nk^2 ε^{-4} log n + k^3 ε^{-5} log^2 n) time; to output an approximation to all leverage scores of an n × d input matrix A simultaneously, with constant relative error, our algorithms run in O(nnz(A) log n) + Õ(r^3) time; to output an x' for which ||Ax' - b||_p ≤ (1+ε) min_x ||Ax - b||_p for an n × d matrix A and an n × 1 column vector b, we obtain an algorithm running in O(nnz(A) log n) + poly(r ε^{-1}) time, for any constant p with 1 ≤ p < ∞.
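As an illustration of the sketch-and-solve use of such sparse embedding matrices, the following minimal NumPy sketch builds an S with a single random ±1 entry per column, applies it to A and b in time linear in their entries, and solves the smaller regression problem. The sketch-dimension constant (4d^2/ε^2) and the synthetic test data are illustrative assumptions, not the paper's exact parameters.

import numpy as np

def sparse_embedding(n, m, rng):
    # Each column of the m x n matrix S has a single nonzero entry, +1 or -1,
    # placed in a uniformly random row (the abstract's sparse embedding construction).
    rows = rng.integers(0, m, size=n)
    signs = rng.choice([-1.0, 1.0], size=n)
    return rows, signs

def apply_sketch(rows, signs, m, M):
    # Compute S @ M by scattering signed rows of M; time is linear in the entries of M.
    SM = np.zeros((m, M.shape[1]))
    np.add.at(SM, rows, signs[:, None] * M)
    return SM

rng = np.random.default_rng(0)
n, d, eps = 100_000, 20, 0.5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

m = int(4 * d**2 / eps**2)          # sketch dimension; the constant 4 is an assumption
rows, signs = sparse_embedding(n, m, rng)
SA = apply_sketch(rows, signs, m, A)
Sb = apply_sketch(rows, signs, m, b[:, None]).ravel()

x_sketch = np.linalg.lstsq(SA, Sb, rcond=None)[0]   # solve the small sketched problem
x_exact = np.linalg.lstsq(A, b, rcond=None)[0]
# ratio of residuals: close to 1, at most about 1 + eps with good probability
print(np.linalg.norm(A @ x_sketch - b) / np.linalg.norm(A @ x_exact - b))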
Citations
Book
29 May 2015
TL;DR: An invitation to matrix concentration inequalities: a flexible, easy-to-use, and powerful family of tools for studying the random matrices that now appear throughout theoretical, applied, and computational mathematics.
Abstract: Random matrices now play a role in many areas of theoretical, applied, and computational mathematics. Therefore, it is desirable to have tools for studying random matrices that are flexible, easy to use, and powerful. Over the last fifteen years, researchers have developed a remarkable family of results, called matrix concentration inequalities, that achieve all of these goals. This monograph offers an invitation to the field of matrix concentration inequalities. It begins with some history of random matrix theory; it describes a flexible model for random matrices that is suitable for many problems; and it discusses the most important matrix concentration results. To demonstrate the value of these techniques, the presentation includes examples drawn from statistics, machine learning, optimization, combinatorics, algorithms, scientific computing, and beyond.
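As one representative member of this family (stated here in its standard form; matching the monograph's exact notation is an assumption), the matrix Bernstein inequality bounds the spectral norm of a sum of independent, mean-zero, d × d self-adjoint random matrices X_1, ..., X_k with ||X_i|| ≤ L almost surely:

\[
  \Pr\Bigl[\Bigl\|\textstyle\sum_{i=1}^{k} X_i\Bigr\| \ge t\Bigr]
  \;\le\; 2d\,\exp\!\Bigl(\frac{-t^2/2}{\sigma^2 + Lt/3}\Bigr),
  \qquad \sigma^2 = \Bigl\|\textstyle\sum_{i=1}^{k}\mathbb{E}[X_i^2]\Bigr\|,
  \quad t \ge 0.
\]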

690 citations

Book
David P. Woodruff
14 Nov 2014
TL;DR: A survey of linear sketching algorithms for numerical linear algebra can be found in this paper, where the authors consider least squares as well as robust regression problems, low rank approximation, and graph sparsification.
Abstract: This survey highlights the recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, whereby given a matrix, one first compresses it to a much smaller matrix by multiplying it by a (usually) random matrix with certain properties. Much of the expensive computation can then be performed on the smaller matrix, thereby accelerating the solution for the original problem. In this survey we consider least squares as well as robust regression problems, low rank approximation, and graph sparsification. We also discuss a number of variants of these problems. Finally, we discuss the limitations of sketching methods.

584 citations

Journal ArticleDOI
David P. Woodruff
TL;DR: This survey highlights the recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, and considers least squares as well as robust regression problems, low rank approximation, and graph sparsification.
Abstract: This survey highlights the recent advances in algorithms for numerical linear algebra that have come from the technique of linear sketching, whereby given a matrix, one first compresses it to a much smaller matrix by multiplying it by a (usually) random matrix with certain properties. Much of the expensive computation can then be performed on the smaller matrix, thereby accelerating the solution for the original problem. In this survey we consider least squares as well as robust regression problems, low rank approximation, and graph sparsification. We also discuss a number of variants of these problems. Finally, we discuss the limitations of sketching methods.

335 citations

Posted Content
TL;DR: In this paper, the authors developed and analyzed a method to reduce the size of a very large set of data points in a high dimensional Euclidean space R^d to a small set of weighted points such that the result of a predetermined data analysis task on the reduced set is approximately the same as that for the original point set.
Abstract: We develop and analyze a method to reduce the size of a very large set of data points in a high dimensional Euclidean space R^d to a small set of weighted points such that the result of a predetermined data analysis task on the reduced set is approximately the same as that for the original point set. For example, computing the first k principal components of the reduced set will return approximately the first k principal components of the original set, or computing the centers of a k-means clustering on the reduced set will return an approximation for the original set. Such a reduced set is also known as a coreset. The main new feature of our construction is that the cardinality of the reduced set is independent of the dimension d of the input space and that the sets are mergeable. The latter property means that the union of two reduced sets is a reduced set for the union of the two original sets (this property has recently also been called composability, see Indyk et al., PODS 2014). It allows us to turn our methods into streaming or distributed algorithms using standard approaches. For problems such as k-means and subspace approximation the coreset sizes are also independent of the number of input points. Our method is based on projecting the points on a low dimensional subspace and reducing the cardinality of the points inside this subspace using known methods. The proposed approach works for a wide range of data analysis techniques including k-means clustering, principal component analysis and subspace clustering. The main conceptual contribution is a new coreset definition that allows one to charge costs that appear for every solution to an additive constant.
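The following is only an illustrative sketch of the generic project-then-reduce idea (project onto a low-dimensional subspace, then replace clusters of projected points by weighted representatives); it is not the paper's coreset construction and carries none of its guarantees. The subspace dimension k and the number of representatives m are assumptions chosen for the example.

import numpy as np

def reduce_points(P, k, m, rng):
    # Project P (n x d) onto its top-k right singular subspace, then collapse the
    # projected points to m weighted representatives (nearest of m sampled points).
    _, _, Vt = np.linalg.svd(P - P.mean(axis=0), full_matrices=False)
    Q = Vt[:k].T                       # d x k orthonormal basis
    Z = P @ Q                          # projected points, n x k
    reps = Z[rng.choice(len(Z), size=m, replace=False)]
    d2 = ((Z[:, None, :] - reps[None, :, :]) ** 2).sum(-1)
    idx = d2.argmin(axis=1)            # nearest representative for each point
    weights = np.bincount(idx, minlength=m).astype(float)
    return reps, weights

rng = np.random.default_rng(1)
P = rng.standard_normal((2000, 50))
reps, w = reduce_points(P, k=5, m=100, rng=rng)
print(reps.shape, w.sum())             # (100, 5) 2000.0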

318 citations

Proceedings ArticleDOI
14 Jun 2015
TL;DR: In this paper, the authors show how to approximate a data matrix A with a much smaller sketch Ã that can be used to solve a general class of constrained k-rank approximation problems to within (1+ε) error.
Abstract: We show how to approximate a data matrix A with a much smaller sketch Ã that can be used to solve a general class of constrained k-rank approximation problems to within (1+ε) error. Importantly, this class includes k-means clustering and unconstrained low rank approximation (i.e. principal component analysis). By reducing data points to just O(k) dimensions, we generically accelerate any exact, approximate, or heuristic algorithm for these ubiquitous problems. For k-means dimensionality reduction, we provide (1+ε) relative error results for many common sketching techniques, including random row projection, column selection, and approximate SVD. For approximate principal component analysis, we give a simple alternative to known algorithms that has applications in the streaming setting. Additionally, we extend recent work on column-based matrix reconstruction, giving column subsets that not only 'cover' a good subspace for A, but can be used directly to compute this subspace. Finally, for k-means clustering, we show how to achieve a (9+ε) approximation by Johnson-Lindenstrauss projecting data to just O(log k / ε^2) dimensions. This is the first result that leverages the specific structure of k-means to achieve dimension independent of input size and sublinear in k.
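The last result (cluster after a Johnson-Lindenstrauss projection to O(log k / ε^2) dimensions) can be exercised with a short sketch. The constant in the target dimension, the synthetic data, and the use of scikit-learn's KMeans are illustrative assumptions, not the paper's prescription.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n, d, k, eps = 5000, 500, 10, 0.5
X = rng.standard_normal((n, d)) + rng.integers(0, k, size=n)[:, None]  # crude planted clusters

m = int(4 * np.log(k) / eps**2)                # target dimension O(log k / eps^2); constant assumed
G = rng.standard_normal((d, m)) / np.sqrt(m)   # dense Johnson-Lindenstrauss projection
Y = X @ G                                      # project to m dimensions before clustering

labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Y).labels_
# evaluate the clustering found in the sketch on the original high-dimensional points
centers = np.vstack([X[labels == j].mean(axis=0) for j in range(k)])
cost = ((X - centers[labels]) ** 2).sum()
print(m, cost)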

314 citations

References
Journal ArticleDOI
Jon Kleinberg
TL;DR: This work proposes and tests an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure; the formulation has connections to the eigenvectors of certain matrices associated with the link graph.
Abstract: The network structure of a hyperlinked environment can be a rich source of information about the content of the environment, provided we have effective means for understanding it. We develop a set of algorithmic tools for extracting information from the link structures of such environments, and report on experiments that demonstrate their effectiveness in a variety of contexts on the World Wide Web. The central issue we address within our framework is the distillation of broad search topics, through the discovery of “authoritative” information sources on such topics. We propose and test an algorithmic formulation of the notion of authority, based on the relationship between a set of relevant authoritative pages and the set of “hub pages” that join them together in the link structure. Our formulation has connections to the eigenvectors of certain matrices associated with the link graph; these connections in turn motivate additional heuristics for link-based analysis.
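The hubs-and-authorities computation described above reduces to a power iteration on the link graph's adjacency matrix; the minimal NumPy sketch below shows the two mutually reinforcing update steps. The toy graph, iteration count, and normalization are illustrative assumptions.

import numpy as np

def hits(A, iters=50):
    # A[i, j] = 1 when page i links to page j.
    n = A.shape[0]
    hubs = np.ones(n)
    auths = np.ones(n)
    for _ in range(iters):
        auths = A.T @ hubs            # a page is authoritative if good hubs point to it
        auths /= np.linalg.norm(auths)
        hubs = A @ auths              # a page is a good hub if it points to authorities
        hubs /= np.linalg.norm(hubs)
    # auths converges to the principal eigenvector of A.T @ A, hubs to that of A @ A.T
    return hubs, auths

# toy link graph: pages 0 and 1 are hubs pointing at authorities 2, 3, 4
A = np.zeros((5, 5))
A[0, 2:] = 1
A[1, 2:] = 1
hubs, auths = hits(A)
print(np.round(hubs, 3), np.round(auths, 3))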

8,328 citations

Book
01 Jan 1997

3,140 citations

Posted Content
TL;DR: In this article, a modular framework for constructing randomized algorithms that compute partial matrix decompositions is presented: random sampling is used to identify a subspace that captures most of the action of a matrix, the input matrix is then compressed to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization.
Abstract: Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed---either explicitly or implicitly---to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis.
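The two-stage framework the abstract describes can be sketched in a few lines: a randomized range finder followed by a deterministic factorization of the compressed matrix. The Gaussian test matrix, the oversampling parameter p = 10, and the toy input below are illustrative assumptions in the spirit of this line of work, not a transcription of the paper's algorithms.

import numpy as np

def randomized_low_rank(A, k, p=10, rng=None):
    # Stage 1: random sampling to find a subspace capturing most of A's action.
    if rng is None:
        rng = np.random.default_rng(0)
    Omega = rng.standard_normal((A.shape[1], k + p))   # random test matrix (p = oversampling)
    Q, _ = np.linalg.qr(A @ Omega)                     # orthonormal basis for the sampled range
    # Stage 2: compress A to the subspace and factor the small matrix deterministically.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return Q @ Ub[:, :k], s[:k], Vt[:k]

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 300))   # exactly rank 40
U, s, Vt = randomized_low_rank(A, k=40, rng=rng)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))           # near machine precision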

2,356 citations

Journal ArticleDOI
08 Jul 2002
TL;DR: This work presents a 1-pass algorithm for estimating the most frequent items in a data stream using limited storage space, which achieves better space bounds than the previously known best algorithms for this problem for several natural distributions on the item frequencies.
Abstract: We present a 1-pass algorithm for estimating the most frequent items in a data stream using limited storage space. Our method relies on a data structure called a COUNT SKETCH, which allows us to reliably estimate the frequencies of frequent items in the stream. Our algorithm achieves better space bounds than the previously known best algorithms for this problem for several natural distributions on the item frequencies. In addition, our algorithm leads directly to a 2-pass algorithm for the problem of estimating the items with the largest (absolute) change in frequency between two data streams. To our knowledge, this latter problem has not been previously studied in the literature.
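A minimal sketch of a COUNT SKETCH-style structure as described above: each of several hash tables maps an item to a bucket with a random sign, and an item's frequency is estimated as the median of its signed counters. The table depth/width and the simple salted hashes below are illustrative assumptions, not the paper's parameters.

import numpy as np

class CountSketch:
    def __init__(self, depth=5, width=256, seed=0):
        rng = np.random.default_rng(seed)
        self.depth, self.width = depth, width
        self.tables = np.zeros((depth, width))
        # per-row salts for the bucket and sign hashes (simple illustrative hashing)
        self.bucket_salt = rng.integers(1, 2**31, size=depth)
        self.sign_salt = rng.integers(1, 2**31, size=depth)

    def _hashes(self, item):
        h = hash(item)
        buckets = (h * self.bucket_salt) % self.width
        signs = np.where(((h * self.sign_salt) >> 7) % 2 == 0, 1, -1)
        return buckets, signs

    def update(self, item, count=1):
        buckets, signs = self._hashes(item)
        self.tables[np.arange(self.depth), buckets] += signs * count

    def estimate(self, item):
        buckets, signs = self._hashes(item)
        return float(np.median(signs * self.tables[np.arange(self.depth), buckets]))

# one pass over a stream with a couple of heavy hitters
rng = np.random.default_rng(1)
stream = list(rng.integers(0, 1000, size=20_000)) + [7] * 5000 + [13] * 3000
cs = CountSketch()
for x in stream:
    cs.update(int(x))
print(cs.estimate(7), cs.estimate(13))   # close to 5000 and 3000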

1,589 citations

Proceedings ArticleDOI
21 Oct 2006
TL;DR: In this paper, the authors present one of the first relative-error (1 + ε) approximation algorithms for the singular value decomposition of an m × n matrix A with M non-zero entries that requires only 2 passes over the data, together with the first o(nd^2)-time (1 + ε) relative-error approximation algorithm for n × d linear (ℓ2) regression.
Abstract: Recently several results appeared that show significant reduction in time for matrix multiplication, singular value decomposition as well as linear (ℓ2) regression, all based on data dependent random sampling. Our key idea is that low dimensional embeddings can be used to eliminate data dependence and provide more versatile, linear time pass efficient matrix computation. Our main contribution is summarized as follows. (1) Independent of the recent results of Har-Peled and of Deshpande and Vempala, one of the first, and to the best of our knowledge the most efficient, relative-error (1 + ε) ||A - A_k||_F approximation algorithms for the singular value decomposition of an m × n matrix A with M non-zero entries; it requires 2 passes over the data and runs in time O((M(k/ε + k log k) + (n + m)(k/ε + k log k)^2) log(1/δ)). (2) The first o(nd^2) time (1 + ε) relative-error approximation algorithm for n × d linear (ℓ2) regression. (3) A matrix multiplication and norm approximation algorithm that easily applies to implicitly given matrices and can be used as a black box probability boosting tool.
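The matrix multiplication approximation idea in item (3) can be sketched in a few lines: multiply the sketched factors (SA)ᵀ(SB) instead of AᵀB. The dense Gaussian embedding and its size are illustrative assumptions; the paper works with faster, more structured embeddings.

import numpy as np

rng = np.random.default_rng(0)
n, d1, d2, m = 10_000, 30, 40, 500

A = rng.standard_normal((n, d1))
B = rng.standard_normal((n, d2))

S = rng.standard_normal((m, n)) / np.sqrt(m)   # low-dimensional embedding with E[S.T @ S] = I
approx = (S @ A).T @ (S @ B)                   # computed from the m x d1 and m x d2 sketches
exact = A.T @ B

rel = np.linalg.norm(approx - exact) / (np.linalg.norm(A) * np.linalg.norm(B))
print(rel)   # roughly on the order of 1/sqrt(m) relative to ||A||_F ||B||_F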

852 citations