Author
Alex Gittens
Other affiliations: University of California, Berkeley; eBay; California Institute of Technology
Bio: Alex Gittens is an academic researcher from Rensselaer Polytechnic Institute. The author has contributed to research in topics such as Spark (mathematics) and Kernel (statistics). The author has an h-index of 21 and has co-authored 52 publications receiving 1,634 citations. Previous affiliations of Alex Gittens include University of California, Berkeley and eBay.
Papers
Proceedings Article
16 Jun 2013
TL;DR: An empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices, complemented by a suite of worst-case theoretical bounds for both random sampling and random projection methods.
Abstract: We reconsider randomized algorithms for the low-rank approximation of SPSD matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods, and they point to differences between uniform and nonuniform sampling methods based on leverage scores. We complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds--e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error.
308 citations
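To make the paper's uniform-versus-leverage-score comparison concrete, the following minimal Python sketch (our construction, not the authors' code; the RBF kernel, bandwidth, and cluster layout are arbitrary illustrative choices) builds an SPSD kernel matrix with markedly nonuniform leverage scores and compares Nystrom approximations under the two sampling schemes.

```python
# Minimal sketch: uniform vs. leverage-score column sampling for a
# Nystrom-style low-rank approximation of an SPSD kernel matrix A.
import numpy as np

rng = np.random.default_rng(0)

def nystrom(A, idx):
    """Nystrom approximation of SPSD A from the columns indexed by idx."""
    C = A[:, idx]                       # sampled columns
    W = A[np.ix_(idx, idx)]             # intersection block
    return C @ np.linalg.pinv(W) @ C.T

# Toy data: two tight clusters plus a few outliers, which makes the
# rank-k leverage scores of the RBF kernel matrix nonuniform.
X = np.vstack([rng.standard_normal((240, 5)) * 0.1,
               rng.standard_normal((240, 5)) * 0.1 + 3.0,
               rng.standard_normal((20, 5)) * 4.0])
sq = np.sum(X**2, axis=1)
A = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T) / 2.0)

k, c = 10, 40
# Rank-k leverage scores: squared row norms of the top-k eigenvectors.
_, V = np.linalg.eigh(A)
lev = np.sum(V[:, -k:] ** 2, axis=1)

uni = rng.choice(A.shape[0], size=c, replace=False)
nonuni = rng.choice(A.shape[0], size=c, replace=False, p=lev / lev.sum())

for name, idx in [("uniform", uni), ("leverage", nonuni)]:
    err = np.linalg.norm(A - nystrom(A, idx)) / np.linalg.norm(A)
    print(f"{name:8s} sampling, relative Frobenius error: {err:.3f}")
```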
Journal Article
TL;DR: In this paper, the authors evaluate the performance of sampling and projection algorithms for the low-rank approximation of symmetric positive semi-definite matrices such as Laplacian and kernel matrices.
Abstract: We reconsider randomized algorithms for the low-rank approximation of symmetric positive semi-definite (SPSD) matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our results highlight complementary aspects of sampling versus projection methods; they characterize the effects of common data preprocessing steps on the performance of these algorithms; and they point to important differences between uniform sampling and nonuniform sampling methods based on leverage scores. In addition, our empirical results illustrate that existing theory is so weak that it does not provide even a qualitative guide to practice. Thus, we complement our empirical results with a suite of worst-case theoretical bounds for both random sampling and random projection methods. These bounds are qualitatively superior to existing bounds--e.g., improved additive-error bounds for spectral and Frobenius norm error and relative-error bounds for trace norm error--and they point to future directions to make these algorithms useful in even larger-scale machine learning applications.
189 citations
TL;DR: This article addresses the efficacy, in the Frobenius and spectral norms, of an SRHT-based low-rank matrix approximation technique introduced by Woolfe, Liberty, Rokhlin, and Tygert, and produces several results on matrix operations with SRHTs that may be of independent interest.
Abstract: Several recent randomized linear algebra algorithms rely upon fast dimension reduction methods. A popular choice is the subsampled randomized Hadamard transform (SRHT). In this article, we address the efficacy, in the Frobenius and spectral norms, of an SRHT-based low-rank matrix approximation technique introduced by Woolfe, Liberty, Rokhlin, and Tygert. We establish a slightly better Frobenius norm error bound than is currently available, and a much sharper spectral norm error bound (in the presence of reasonable decay of the singular values). Along the way, we produce several results on matrix operations with SRHTs (such as approximate matrix multiplication) that may be of independent interest. Our approach builds upon Tropp's analysis in "Improved Analysis of the Subsampled Randomized Hadamard Transform" [Adv. Adaptive Data Anal., 3 (2011), pp. 115--126].
126 citations
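The SRHT-based scheme the paper analyzes can be sketched in a few lines. The version below is a hedged illustration, not the paper's code: it forms the Hadamard matrix densely via scipy rather than using a fast transform, and it assumes the number of columns is a power of two (pad with zeros otherwise).

```python
# Sketch: rank-restricted approximation of A via an SRHT column sketch,
# then projection of A onto the range of the sketch.
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)

def srht_low_rank(A, ell):
    """Approximation of A (with rank at most ell) from an SRHT sketch.

    Assumes A has n columns with n a power of two.
    """
    m, n = A.shape
    D = rng.choice([-1.0, 1.0], size=n)           # random sign flips
    H = hadamard(n) / np.sqrt(n)                  # normalized Hadamard matrix
    idx = rng.choice(n, size=ell, replace=False)  # uniform subsampling
    # Apply the SRHT to the columns of A: Y = A D H R^T (suitably scaled).
    Y = (A * D) @ H[:, idx] * np.sqrt(n / ell)
    Q, _ = np.linalg.qr(Y)                        # basis for the sketched range
    return Q @ (Q.T @ A)

# Example: a 512 x 512 matrix with rapidly decaying singular values.
U, _ = np.linalg.qr(rng.standard_normal((512, 512)))
s = 2.0 ** -np.arange(512)
A = (U * s) @ U.T

A_hat = srht_low_rank(A, 32)
print("spectral norm error:", np.linalg.norm(A - A_hat, 2))
```

In practice the dense Hadamard multiply would be replaced by an O(n log n) fast Walsh-Hadamard transform; the dense form is used here only to keep the sketch self-contained.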
Posted Content
TL;DR: In this article, the efficacy of an SRHT-based low-rank matrix approximation technique is investigated; a slightly better Frobenius norm error bound is established, along with a much sharper spectral norm error bound in the presence of reasonable decay of the singular values.
Abstract: Several recent randomized linear algebra algorithms rely upon fast dimension reduction methods. A popular choice is the Subsampled Randomized Hadamard Transform (SRHT). In this article, we address the efficacy, in the Frobenius and spectral norms, of an SRHT-based low-rank matrix approximation technique introduced by Woolfe, Liberty, Rokhlin, and Tygert. We establish a slightly better Frobenius norm error bound than currently available, and a much sharper spectral norm error bound (in the presence of reasonable decay of the singular values). Along the way, we produce several results on matrix operations with SRHTs (such as approximate matrix multiplication) that may be of independent interest. Our approach builds upon Tropp's analysis in "Improved analysis of the Subsampled Randomized Hadamard Transform".
88 citations
Posted Content
TL;DR: This paper provides the first relative-error bound on the spectral norm error incurred in this process, which follows from a natural connection between the Nystrom extension and the column subset selection problem.
Abstract: The naive Nystrom extension forms a low-rank approximation to a positive-semidefinite matrix by uniformly randomly sampling from its columns. This paper provides the first relative-error bound on the spectral norm error incurred in this process. This bound follows from a natural connection between the Nystrom extension and the column subset selection problem. The main tool is a matrix Chernoff bound for sampling without replacement.
83 citations
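To make the relative-error flavor of this bound concrete, here is a small illustrative experiment (ours, with arbitrary choices of the dimension n, target rank k, and sample size c) comparing the spectral norm error of the naive Nystrom extension against the best possible rank-k error, which for an SPSD matrix equals its (k+1)-st largest eigenvalue.

```python
# Small experiment: naive Nystrom extension from uniformly sampled columns
# vs. the best rank-k approximation error in the spectral norm.
import numpy as np

rng = np.random.default_rng(1)

n, k, c = 400, 8, 60
# SPSD test matrix: rank-k "signal" plus a small full-rank tail.
G = rng.standard_normal((n, k))
A = G @ G.T + 1e-3 * np.eye(n)

idx = rng.choice(n, size=c, replace=False)        # uniform column sample
C = A[:, idx]
W = A[np.ix_(idx, idx)]
A_nys = C @ np.linalg.pinv(W) @ C.T               # naive Nystrom extension

eigs = np.linalg.eigvalsh(A)[::-1]                # eigenvalues, descending
best_k_err = eigs[k]                              # best rank-k spectral error
nys_err = np.linalg.norm(A - A_nys, 2)
print(f"Nystrom spectral error {nys_err:.4f} vs best rank-{k} error {best_k_err:.4f}")
```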
Cited by
TL;DR: This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation, and presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions.
Abstract: Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets. This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed—either explicitly or implicitly—to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, robustness, and/or speed. These claims are supported by extensive numerical experiments and a detailed error analysis. The specific benefits of randomized techniques depend on the computational environment. Consider the model problem of finding the $k$ dominant components of the singular value decomposition of an $m \times n$ matrix. (i) For a dense input matrix, randomized algorithms require $O(mn \log(k))$ floating-point operations (flops) in contrast to $O(mnk)$ for classical algorithms. (ii) For a sparse input matrix, the flop count matches classical Krylov subspace methods, but the randomized approach is more robust and can easily be reorganized to exploit multiprocessor architectures. (iii) For a matrix that is too large to fit in fast memory, the randomized techniques require only a constant number of passes over the data, as opposed to $O(k)$ passes for classical algorithms. In fact, it is sometimes possible to perform matrix approximation with a single pass over the data.
3,248 citations
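The two-stage recipe described in this abstract translates almost line for line into code. The compact sketch below is our rendering of the framework, not the survey's reference implementation; the oversampling parameter p and the number of power iterations q are the usual tunable knobs.

```python
# Sketch: randomized range finding (Stage A) followed by a deterministic
# factorization of the compressed matrix (Stage B).
import numpy as np

rng = np.random.default_rng(0)

def randomized_svd(A, k, p=10, q=1):
    """Approximate rank-k SVD of A with oversampling p and q power iterations."""
    m, n = A.shape
    # Stage A: find an orthonormal basis Q capturing most of A's action.
    Omega = rng.standard_normal((n, k + p))   # random test matrix
    Y = A @ Omega
    for _ in range(q):                        # power iterations sharpen the basis
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)
    # Stage B: deterministic SVD of the small compressed matrix B = Q^T A.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Usage on a matrix with decaying spectrum:
A = rng.standard_normal((1000, 300)) @ np.diag(0.9 ** np.arange(300)) \
    @ rng.standard_normal((300, 300))
U, s, Vt = randomized_svd(A, k=20)
print("relative error:", np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))
```

Power iterations trade extra passes over the data for a sharper basis when the spectrum decays slowly; for numerically delicate problems one would re-orthonormalize between iterations.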
TL;DR: This paper reviews significant deep learning related models and methods that have been employed for numerous NLP tasks and provides a walk-through of their evolution.
Abstract: Deep learning methods employ multiple processing layers to learn hierarchical representations of data, and have produced state-of-the-art results in many domains. Recently, a variety of model designs and methods have blossomed in the context of natural language processing (NLP). In this paper, we review significant deep learning related models and methods that have been employed for numerous NLP tasks and provide a walk-through of their evolution. We also summarize, compare and contrast the various models and put forward a detailed understanding of the past, present and future of deep learning in NLP.
2,466 citations
Posted Content
TL;DR: In this article, a modular framework for constructing randomized algorithms that compute partial matrix decompositions is presented, which uses random sampling to identify a subspace that captures most of the action of a matrix and then the input matrix is compressed to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization.
Abstract: Low-rank matrix approximations, such as the truncated singular value decomposition and the rank-revealing QR decomposition, play a central role in data analysis and scientific computing. This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation. These techniques exploit modern computational architectures more fully than classical methods and open the possibility of dealing with truly massive data sets.
This paper presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions. These methods use random sampling to identify a subspace that captures most of the action of a matrix. The input matrix is then compressed---either explicitly or implicitly---to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization. In many cases, this approach beats its classical competitors in terms of accuracy, speed, and robustness. These claims are supported by extensive numerical experiments and a detailed error analysis.
2,356 citations
25 Apr 2018
TL;DR: An overview of core ideas in GSP and their connection to conventional digital signal processing are provided, along with a brief historical perspective to highlight how concepts recently developed build on top of prior research in other areas.
Abstract: Research in graph signal processing (GSP) aims to develop tools for processing data defined on irregular graph domains. In this paper, we first provide an overview of core ideas in GSP and their connection to conventional digital signal processing, along with a brief historical perspective to highlight how concepts recently developed in GSP build on top of prior research in other areas. We then summarize recent advances in developing basic GSP tools, including methods for sampling, filtering, or graph learning. Next, we review progress in several application areas using GSP, including processing and analysis of sensor network data, biological data, and applications to image processing and machine learning.
1,306 citations
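As a taste of the basic GSP tools this survey covers, the sketch below (illustrative only; the random graph model and the filter response are arbitrary choices of ours, not the paper's) builds a graph Fourier basis from the Laplacian eigenvectors and applies a low-pass spectral filter to a graph signal.

```python
# Sketch: graph Fourier transform and low-pass filtering of a graph signal.
import numpy as np

rng = np.random.default_rng(0)

# Random undirected graph on n nodes (Erdos-Renyi adjacency).
n = 50
W = (rng.random((n, n)) < 0.1).astype(float)
W = np.triu(W, 1)
W = W + W.T                                      # symmetric, no self-loops
L = np.diag(W.sum(axis=1)) - W                   # combinatorial Laplacian

lam, U = np.linalg.eigh(L)                       # graph frequencies and basis

x = rng.standard_normal(n)                       # a noisy graph signal
x_hat = U.T @ x                                  # graph Fourier transform
h = 1.0 / (1.0 + 2.0 * lam)                      # low-pass spectral response
y = U @ (h * x_hat)                              # filtered signal

# Quadratic form x^T L x measures smoothness; filtering should reduce it.
print("smoothness before/after:", x @ L @ x, y @ L @ y)
```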
Book
27 Sep 2018
TL;DR: A broad range of illustrations is embedded throughout, including classical and modern results for covariance estimation, clustering, networks, semidefinite programming, coding, dimension reduction, matrix completion, machine learning, compressed sensing, and sparse regression.
Abstract: High-dimensional probability offers insight into the behavior of random vectors, random matrices, random subspaces, and objects used to quantify uncertainty in high dimensions. Drawing on ideas from probability, analysis, and geometry, it lends itself to applications in mathematics, statistics, theoretical computer science, signal processing, optimization, and more. It is the first to integrate theory, key tools, and modern applications of high-dimensional probability. Concentration inequalities form the core, and it covers both classical results such as Hoeffding's and Chernoff's inequalities and modern developments such as the matrix Bernstein inequality. It then introduces the powerful methods based on stochastic processes, including such tools as Slepian's, Sudakov's, and Dudley's inequalities, as well as generic chaining and bounds based on VC dimension. A broad range of illustrations is embedded throughout, including classical and modern results for covariance estimation, clustering, networks, semidefinite programming, coding, dimension reduction, matrix completion, machine learning, compressed sensing, and sparse regression.
1,190 citations
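For reference, the matrix Bernstein inequality highlighted in this description can be stated as follows (in Tropp's self-adjoint formulation, which is the standard one; the book also treats rectangular variants).

```latex
% Matrix Bernstein inequality (self-adjoint case).
% Let $X_1,\dots,X_N$ be independent self-adjoint $d \times d$ random
% matrices with $\mathbb{E} X_i = 0$ and $\|X_i\| \le L$ almost surely,
% and set $\sigma^2 = \bigl\| \sum_{i} \mathbb{E} X_i^2 \bigr\|$.
% Then for every $t \ge 0$,
\[
  \mathbb{P}\Bigl\{ \Bigl\| \sum_{i=1}^{N} X_i \Bigr\| \ge t \Bigr\}
  \;\le\; 2d \exp\!\left( \frac{-t^2/2}{\sigma^2 + Lt/3} \right).
\]
```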