Open Access · Posted Content

A randomized algorithm for principal component analysis

TLDR
This work describes an efficient algorithm for the low-rank approximation of matrices that achieves accuracy very close to the best possible, for matrices of arbitrary sizes.
Abstract
Principal component analysis (PCA) requires the computation of a low-rank approximation to a matrix containing the data being analyzed. In many applications of PCA, the best possible accuracy of any rank-deficient approximation is at most a few digits (measured in the spectral norm, relative to the spectral norm of the matrix being approximated). In such circumstances, efficient algorithms have not come with guarantees of good accuracy, unless one or both dimensions of the matrix being approximated are small. We describe an efficient algorithm for the low-rank approximation of matrices that produces accuracy very close to the best possible, for matrices of arbitrary sizes. We illustrate our theoretical results via several numerical examples.
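The abstract measures accuracy in the spectral norm, relative to the spectral norm of the matrix, against the best possible rank-deficient approximation. The short sketch below is a generic NumPy illustration (not the paper's algorithm; the matrix, the rank k, and the synthetic spectrum are invented for the example) of how that best-possible accuracy can be computed and checked against the Eckart-Young optimum, the (k+1)-st singular value.

    # A minimal sketch (not the paper's algorithm): the best spectral-norm error of
    # any rank-k approximation to A equals its (k+1)-st singular value, so the
    # "best possible accuracy" from the abstract can be measured directly.
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, k = 200, 100, 10

    # Synthetic data matrix with a slowly decaying spectrum (illustrative only).
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = 1.0 / np.arange(1, n + 1)          # singular values 1, 1/2, 1/3, ...
    A = U @ np.diag(s) @ V.T

    # Best rank-k approximation via the truncated SVD (Eckart-Young).
    Uk, sk, Vkt = np.linalg.svd(A, full_matrices=False)
    A_k = Uk[:, :k] @ np.diag(sk[:k]) @ Vkt[:k, :]

    # Relative spectral-norm error; the optimum is sigma_{k+1} / sigma_1.
    err = np.linalg.norm(A - A_k, 2) / np.linalg.norm(A, 2)
    print(err, sk[k] / sk[0])              # the two numbers agree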


Citations
Posted Content

Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions

TL;DR: In this article, the authors present a modular framework for constructing randomized algorithms that compute partial matrix decompositions: random sampling identifies a subspace that captures most of the action of a matrix, the input matrix is compressed to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization.
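The two-stage framework in this TL;DR reduces to a few lines of linear algebra. The sketch below is a minimal NumPy illustration of those stages (random sampling to capture the range, then a deterministic SVD of the compressed matrix); the function name randomized_svd and the oversampling parameter p are illustrative choices, not the authors' reference code.

    # A minimal sketch of the two-stage framework described above (illustrative,
    # not a reference implementation): random sampling captures the range of A,
    # then a small deterministic SVD finishes the factorization.
    import numpy as np

    def randomized_svd(A, k, p=10, rng=None):
        """Approximate rank-k SVD of A using a Gaussian test matrix.

        k : target rank
        p : oversampling parameter (a small constant such as 5-10)
        """
        rng = np.random.default_rng(rng)
        m, n = A.shape
        # Stage 1: identify a subspace capturing most of the action of A.
        Omega = rng.standard_normal((n, k + p))
        Q, _ = np.linalg.qr(A @ Omega)          # orthonormal basis for the sample
        # Stage 2: compress A to that subspace and factor it deterministically.
        B = Q.T @ A                             # (k+p) x n, small
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

    # Example: compare against the exact truncated SVD.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 80)) @ rng.standard_normal((80, 300))
    U, s, Vt = randomized_svd(A, k=20)
    print(np.linalg.norm(A - (U * s) @ Vt, 2) / np.linalg.norm(A, 2))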
Journal Article

Machine Learning for Fluid Mechanics

TL;DR: This article gives an overview of machine learning for fluid mechanics, addressing the strengths and limitations of these methods from the perspective of scientific inquiry that treats data as an inherent part of modeling, experimentation, and simulation.
Posted Content

Randomized algorithms for matrices and data

TL;DR: This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis.
Journal Article

Simultaneous seismic data denoising and reconstruction via multichannel singular spectrum analysis

TL;DR: In this article, a rank reduction algorithm for simultaneous reconstruction and random noise attenuation of seismic records is proposed, which is based on multichannel singular spectrum analysis (MSSA).
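The rank-reduction step at the heart of (M)SSA can be illustrated on a single channel. The sketch below is a simplified, hedged single-channel SSA denoiser (Hankel embedding, truncated SVD, anti-diagonal averaging); the cited method additionally operates on multichannel frequency slices, and the window length, rank, and test signal here are invented for the example.

    # A simplified single-channel illustration of the rank-reduction idea behind
    # (M)SSA denoising: Hankel embedding, truncated SVD, anti-diagonal averaging.
    import numpy as np

    def ssa_denoise(x, window, rank):
        """Denoise a 1-D signal by truncating the SVD of its Hankel embedding."""
        n = len(x)
        L, K = window, n - window + 1
        # Hankel (trajectory) matrix: column j holds x[j : j + L].
        H = np.column_stack([x[j:j + L] for j in range(K)])
        # Rank reduction: keep only the leading singular components.
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        H_r = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        # Anti-diagonal averaging maps the low-rank matrix back to a signal.
        out = np.zeros(n)
        counts = np.zeros(n)
        for j in range(K):
            out[j:j + L] += H_r[:, j]
            counts[j:j + L] += 1
        return out / counts

    # Example: a noisy sum of two sinusoids (Hankel rank about 4).
    t = np.linspace(0, 1, 400)
    clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
    noisy = clean + 0.3 * np.random.default_rng(2).standard_normal(t.size)
    denoised = ssa_denoise(noisy, window=80, rank=4)
    print(np.linalg.norm(denoised - clean) / np.linalg.norm(clean))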
References
Journal Article

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval is described that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
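The core LSI computation is a truncated SVD of the term-document matrix, with queries folded into the resulting low-dimensional space. The sketch below is a toy NumPy illustration under that assumption; the corpus, the number of latent dimensions k, and the query are invented for the example.

    # A hedged sketch of the core LSI computation: factor the term-document matrix
    # with a truncated SVD and compare documents and queries in the low-dimensional
    # "semantic" space.  The toy corpus and k are illustrative only.
    import numpy as np

    docs = ["human machine interface", "user interface system",
            "graph of trees", "trees and graph minors"]
    vocab = sorted({w for d in docs for w in d.split()})
    # Term-document matrix of raw counts (rows: terms, columns: documents).
    A = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

    k = 2                                        # number of latent dimensions
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T    # documents in the latent space

    # Fold a query into the same space: q_hat = q^T U_k S_k^{-1}.
    q = np.array([[("interface" == w) + ("user" == w) for w in vocab]], float)
    q_vec = q @ U[:, :k] / s[:k]

    # Cosine similarity between the query and each document.
    sims = (doc_vecs @ q_vec.T).ravel() / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-12)
    print(sims)                                  # the first two documents should score highest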
Book

The algebraic eigenvalue problem

TL;DR: Contents: theoretical background; perturbation theory; error analysis; solution of linear algebraic equations; Hermitian matrices; reduction of a general matrix to condensed form; eigenvalues of matrices of condensed forms; the LR and QR algorithms; iterative methods; bibliography.
Proceedings Article

Latent semantic indexing: a probabilistic analysis

TL;DR: It is proved that under certain conditions LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance.
Proceedings Article

EM Algorithms for PCA and SPCA

TL;DR: An expectation-maximization (EM) algorithm for principal component analysis (PCA) which allows a few eigenvectors and eigenvalues to be extracted from large collections of high dimensional data and defines a proper density model in the data space.
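The EM iteration for PCA alternates a projection step with a subspace re-estimation step. The sketch below is a hedged NumPy illustration of that idea in the zero-noise limit; the function name em_pca, the iteration count, and the synthetic data are illustrative, and it recovers only the principal subspace rather than the full probabilistic model of the cited paper.

    # A hedged sketch of the EM iteration for PCA (zero-noise limit): alternate an
    # E-step that projects the data onto the current subspace with an M-step that
    # re-estimates the subspace from those projections.
    import numpy as np

    def em_pca(Y, k, n_iter=100, rng=None):
        """Return a basis Q (d x k) for the leading principal subspace of Y (d x n)."""
        rng = np.random.default_rng(rng)
        Y = Y - Y.mean(axis=1, keepdims=True)        # centre the data
        d, n = Y.shape
        W = rng.standard_normal((d, k))              # random initial subspace
        for _ in range(n_iter):
            # E-step: expected latent coordinates X given the current W.
            X = np.linalg.solve(W.T @ W, W.T @ Y)    # k x n
            # M-step: new W maximizing the fit given X.
            W = Y @ X.T @ np.linalg.inv(X @ X.T)     # d x k
        Q, _ = np.linalg.qr(W)                       # orthonormalize the result
        return Q

    # Example: the recovered subspace matches the top-k left singular vectors.
    rng = np.random.default_rng(3)
    B = rng.standard_normal((50, 5)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0])
    Y = B @ rng.standard_normal((5, 1000))
    Q = em_pca(Y, k=3)
    U = np.linalg.svd(Y - Y.mean(axis=1, keepdims=True), full_matrices=False)[0][:, :3]
    print(np.linalg.norm(Q @ Q.T - U @ U.T))         # small: same subspace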
Proceedings Article

Improved Approximation Algorithms for Large Matrices via Random Projections

TL;DR: In this paper, the authors present a (1 + ε)-approximation algorithm for the singular value decomposition of an m × n matrix A with M non-zero entries that requires 2 passes over the data and runs in time O(n²).
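A two-pass, projection-based approximation of this general flavor can be sketched in a few lines. The code below is an illustrative NumPy version, not a line-for-line reproduction of the cited algorithm: pass one forms a random-sign sketch of the rows of A, pass two projects A onto the top right singular vectors of that sketch, and the result is compared with the best possible rank-k Frobenius error; the sketch size r is an invented choice.

    # A hedged sketch of a two-pass, random-projection low-rank approximation
    # (illustrative; not the cited algorithm as published).
    import numpy as np

    def two_pass_low_rank(A, k, r, rng=None):
        rng = np.random.default_rng(rng)
        m, n = A.shape
        # Pass 1 over A: compress its rows with a random +/-1 test matrix.
        Pi = rng.choice([-1.0, 1.0], size=(r, m))
        S = Pi @ A                                   # r x n sketch
        _, _, Vt = np.linalg.svd(S, full_matrices=False)
        Vk = Vt[:k].T                                # n x k
        # Pass 2 over A: project onto the subspace spanned by Vk.
        return (A @ Vk) @ Vk.T

    rng = np.random.default_rng(4)
    A = rng.standard_normal((1000, 200)) @ rng.standard_normal((200, 400))
    k = 10
    A_hat = two_pass_low_rank(A, k, r=4 * k)
    # Compare with the best possible rank-k error (Frobenius norm).
    _, s, _ = np.linalg.svd(A, full_matrices=False)
    best = np.sqrt((s[k:] ** 2).sum())
    print(np.linalg.norm(A - A_hat, "fro") / best)   # close to 1, i.e. (1 + eps)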