Open Access · Posted Content

A randomized algorithm for principal component analysis

TLDR
This work describes an efficient algorithm for the low-rank approximation of matrices that achieves accuracy very close to the best possible, for matrices of arbitrary sizes.
Abstract
Principal component analysis (PCA) requires the computation of a low-rank approximation to a matrix containing the data being analyzed. In many applications of PCA, the best possible accuracy of any rank-deficient approximation is at most a few digits (measured in the spectral norm, relative to the spectral norm of the matrix being approximated). In such circumstances, efficient algorithms have not come with guarantees of good accuracy, unless one or both dimensions of the matrix being approximated are small. We describe an efficient algorithm for the low-rank approximation of matrices that produces accuracy very close to the best possible, for matrices of arbitrary sizes. We illustrate our theoretical results via several numerical examples.
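The abstract measures accuracy in the spectral norm, relative to the spectral norm of the matrix, against the best possible rank-deficient approximation. The short sketch below is a generic NumPy illustration (not the paper's algorithm; the matrix, the rank k, and the synthetic spectrum are invented for the example) of how that best-possible accuracy can be computed and checked against the Eckart-Young optimum, the (k+1)-st singular value.

    # A minimal sketch (not the paper's algorithm): the best spectral-norm error of
    # any rank-k approximation to A equals its (k+1)-st singular value, so the
    # "best possible accuracy" from the abstract can be measured directly.
    import numpy as np

    rng = np.random.default_rng(0)
    m, n, k = 200, 100, 10

    # Synthetic data matrix with a slowly decaying spectrum (illustrative only).
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = 1.0 / np.arange(1, n + 1)          # singular values 1, 1/2, 1/3, ...
    A = U @ np.diag(s) @ V.T

    # Best rank-k approximation via the truncated SVD (Eckart-Young).
    Uk, sk, Vkt = np.linalg.svd(A, full_matrices=False)
    A_k = Uk[:, :k] @ np.diag(sk[:k]) @ Vkt[:k, :]

    # Relative spectral-norm error; the optimum is sigma_{k+1} / sigma_1.
    err = np.linalg.norm(A - A_k, 2) / np.linalg.norm(A, 2)
    print(err, sk[k] / sk[0])              # the two numbers agree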


Citations
Posted Content

Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions

TL;DR: In this article, the authors present a modular framework for constructing randomized algorithms that compute partial matrix decompositions: random sampling identifies a subspace that captures most of the action of a matrix, the input matrix is compressed to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization.
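The two-stage framework in this TL;DR reduces to a few lines of linear algebra. The sketch below is a minimal NumPy illustration of those stages (random sampling to capture the range, then a deterministic SVD of the compressed matrix); the function name randomized_svd and the oversampling parameter p are illustrative choices, not the authors' reference code.

    # A minimal sketch of the two-stage framework described above (illustrative,
    # not a reference implementation): random sampling captures the range of A,
    # then a small deterministic SVD finishes the factorization.
    import numpy as np

    def randomized_svd(A, k, p=10, rng=None):
        """Approximate rank-k SVD of A using a Gaussian test matrix.

        k : target rank
        p : oversampling parameter (a small constant such as 5-10)
        """
        rng = np.random.default_rng(rng)
        m, n = A.shape
        # Stage 1: identify a subspace capturing most of the action of A.
        Omega = rng.standard_normal((n, k + p))
        Q, _ = np.linalg.qr(A @ Omega)          # orthonormal basis for the sample
        # Stage 2: compress A to that subspace and factor it deterministically.
        B = Q.T @ A                             # (k+p) x n, small
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

    # Example: compare against the exact truncated SVD.
    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 80)) @ rng.standard_normal((80, 300))
    U, s, Vt = randomized_svd(A, k=20)
    print(np.linalg.norm(A - (U * s) @ Vt, 2) / np.linalg.norm(A, 2))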
Journal Article

Machine Learning for Fluid Mechanics

TL;DR: This article gives an overview of machine learning for fluid mechanics, addressing the strengths and limitations of these methods from the perspective of scientific inquiry that treats data as an inherent part of modeling, experimentation, and simulation.
Posted Content

Randomized algorithms for matrices and data

TL;DR: This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis.
Journal Article

Simultaneous seismic data denoising and reconstruction via multichannel singular spectrum analysis

TL;DR: In this article, a rank reduction algorithm for simultaneous reconstruction and random noise attenuation of seismic records is proposed, which is based on multichannel singular spectrum analysis (MSSA).
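The rank-reduction step at the heart of (M)SSA can be illustrated on a single channel. The sketch below is a simplified, hedged single-channel SSA denoiser (Hankel embedding, truncated SVD, anti-diagonal averaging); the cited method additionally operates on multichannel frequency slices, and the window length, rank, and test signal here are invented for the example.

    # A simplified single-channel illustration of the rank-reduction idea behind
    # (M)SSA denoising: Hankel embedding, truncated SVD, anti-diagonal averaging.
    import numpy as np

    def ssa_denoise(x, window, rank):
        """Denoise a 1-D signal by truncating the SVD of its Hankel embedding."""
        n = len(x)
        L, K = window, n - window + 1
        # Hankel (trajectory) matrix: column j holds x[j : j + L].
        H = np.column_stack([x[j:j + L] for j in range(K)])
        # Rank reduction: keep only the leading singular components.
        U, s, Vt = np.linalg.svd(H, full_matrices=False)
        H_r = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
        # Anti-diagonal averaging maps the low-rank matrix back to a signal.
        out = np.zeros(n)
        counts = np.zeros(n)
        for j in range(K):
            out[j:j + L] += H_r[:, j]
            counts[j:j + L] += 1
        return out / counts

    # Example: a noisy sum of two sinusoids (Hankel rank about 4).
    t = np.linspace(0, 1, 400)
    clean = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 12 * t)
    noisy = clean + 0.3 * np.random.default_rng(2).standard_normal(t.size)
    denoised = ssa_denoise(noisy, window=80, rank=4)
    print(np.linalg.norm(denoised - clean) / np.linalg.norm(clean))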
References
Journal Article

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval is described that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
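The core LSI computation is a truncated SVD of the term-document matrix, with queries folded into the resulting low-dimensional space. The sketch below is a toy NumPy illustration under that assumption; the corpus, the number of latent dimensions k, and the query are invented for the example.

    # A hedged sketch of the core LSI computation: factor the term-document matrix
    # with a truncated SVD and compare documents and queries in the low-dimensional
    # "semantic" space.  The toy corpus and k are illustrative only.
    import numpy as np

    docs = ["human machine interface", "user interface system",
            "graph of trees", "trees and graph minors"]
    vocab = sorted({w for d in docs for w in d.split()})
    # Term-document matrix of raw counts (rows: terms, columns: documents).
    A = np.array([[d.split().count(w) for d in docs] for w in vocab], float)

    k = 2                                        # number of latent dimensions
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T    # documents in the latent space

    # Fold a query into the same space: q_hat = q^T U_k S_k^{-1}.
    q = np.array([[("interface" == w) + ("user" == w) for w in vocab]], float)
    q_vec = q @ U[:, :k] / s[:k]

    # Cosine similarity between the query and each document.
    sims = (doc_vecs @ q_vec.T).ravel() / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec) + 1e-12)
    print(sims)                                  # the first two documents should score highest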
Book

The algebraic eigenvalue problem

TL;DR: Contents: theoretical background; perturbation theory; error analysis; solution of linear algebraic equations; Hermitian matrices; reduction of a general matrix to condensed form; eigenvalues of matrices of condensed forms; the LR and QR algorithms; iterative methods; bibliography.
Proceedings Article

Latent semantic indexing: a probabilistic analysis

TL;DR: It is proved that under certain conditions LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance.
Proceedings Article

EM Algorithms for PCA and SPCA

TL;DR: An expectation-maximization (EM) algorithm for principal component analysis (PCA) which allows a few eigenvectors and eigenvalues to be extracted from large collections of high dimensional data and defines a proper density model in the data space.
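The EM iteration for PCA alternates a projection step with a subspace re-estimation step. The sketch below is a hedged NumPy illustration of that idea in the zero-noise limit; the function name em_pca, the iteration count, and the synthetic data are illustrative, and it recovers only the principal subspace rather than the full probabilistic model of the cited paper.

    # A hedged sketch of the EM iteration for PCA (zero-noise limit): alternate an
    # E-step that projects the data onto the current subspace with an M-step that
    # re-estimates the subspace from those projections.
    import numpy as np

    def em_pca(Y, k, n_iter=100, rng=None):
        """Return a basis Q (d x k) for the leading principal subspace of Y (d x n)."""
        rng = np.random.default_rng(rng)
        Y = Y - Y.mean(axis=1, keepdims=True)        # centre the data
        d, n = Y.shape
        W = rng.standard_normal((d, k))              # random initial subspace
        for _ in range(n_iter):
            # E-step: expected latent coordinates X given the current W.
            X = np.linalg.solve(W.T @ W, W.T @ Y)    # k x n
            # M-step: new W maximizing the fit given X.
            W = Y @ X.T @ np.linalg.inv(X @ X.T)     # d x k
        Q, _ = np.linalg.qr(W)                       # orthonormalize the result
        return Q

    # Example: the recovered subspace matches the top-k left singular vectors.
    rng = np.random.default_rng(3)
    B = rng.standard_normal((50, 5)) * np.array([5.0, 4.0, 3.0, 2.0, 1.0])
    Y = B @ rng.standard_normal((5, 1000))
    Q = em_pca(Y, k=3)
    U = np.linalg.svd(Y - Y.mean(axis=1, keepdims=True), full_matrices=False)[0][:, :3]
    print(np.linalg.norm(Q @ Q.T - U @ U.T))         # small: same subspace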
Proceedings Article

Improved Approximation Algorithms for Large Matrices via Random Projections

TL;DR: In this paper, the authors present a (1 + ε)-approximation algorithm for the singular value decomposition of an m × n matrix A with M non-zero entries that requires 2 passes over the data and runs in time O(n²).
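A two-pass, projection-based approximation of this general flavor can be sketched in a few lines. The code below is an illustrative NumPy version, not a line-for-line reproduction of the cited algorithm: pass one forms a random-sign sketch of the rows of A, pass two projects A onto the top right singular vectors of that sketch, and the result is compared with the best possible rank-k Frobenius error; the sketch size r is an invented choice.

    # A hedged sketch of a two-pass, random-projection low-rank approximation
    # (illustrative; not the cited algorithm as published).
    import numpy as np

    def two_pass_low_rank(A, k, r, rng=None):
        rng = np.random.default_rng(rng)
        m, n = A.shape
        # Pass 1 over A: compress its rows with a random +/-1 test matrix.
        Pi = rng.choice([-1.0, 1.0], size=(r, m))
        S = Pi @ A                                   # r x n sketch
        _, _, Vt = np.linalg.svd(S, full_matrices=False)
        Vk = Vt[:k].T                                # n x k
        # Pass 2 over A: project onto the subspace spanned by Vk.
        return (A @ Vk) @ Vk.T

    rng = np.random.default_rng(4)
    A = rng.standard_normal((1000, 200)) @ rng.standard_normal((200, 400))
    k = 10
    A_hat = two_pass_low_rank(A, k, r=4 * k)
    # Compare with the best possible rank-k error (Frobenius norm).
    _, s, _ = np.linalg.svd(A, full_matrices=False)
    best = np.sqrt((s[k:] ** 2).sum())
    print(np.linalg.norm(A - A_hat, "fro") / best)   # close to 1, i.e. (1 + eps)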