Open Access · Posted Content
A randomized algorithm for principal component analysis
TLDR
This work describes an efficient algorithm for the low-rank approximation of matrices that achieves accuracy very close to the best possible, for matrices of arbitrary sizes.

Abstract:
Principal component analysis (PCA) requires the computation of a low-rank approximation to a matrix containing the data being analyzed. In many applications of PCA, the best possible accuracy of any rank-deficient approximation is at most a few digits (measured in the spectral norm, relative to the spectral norm of the matrix being approximated). In such circumstances, efficient algorithms have not come with guarantees of good accuracy, unless one or both dimensions of the matrix being approximated are small. We describe an efficient algorithm for the low-rank approximation of matrices that produces accuracy very close to the best possible, for matrices of arbitrary sizes. We illustrate our theoretical results via several numerical examples.
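The kind of randomized low-rank approximation the abstract describes can be sketched in NumPy. This is a generic random-projection scheme with power iterations, not necessarily the paper's exact algorithm; the parameter names and oversampling/iteration counts are illustrative assumptions.

```python
import numpy as np

def randomized_low_rank(A, k, n_oversample=10, n_iter=2, seed=0):
    """Sketch of a randomized rank-k approximation (generic
    random-projection scheme; parameters are illustrative)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sample the range of A with a Gaussian test matrix.
    Omega = rng.standard_normal((n, k + n_oversample))
    Y = A @ Omega
    # A few power iterations sharpen the captured subspace when
    # the singular values decay slowly.
    for _ in range(n_iter):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)
    # Compress A to the subspace and take an exact SVD of the
    # small matrix B.
    B = Q.T @ A
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_b
    return U[:, :k], s[:k], Vt[:k, :]

# Usage: approximate a matrix whose spectrum decays rapidly.
rng = np.random.default_rng(1)
A = (rng.standard_normal((200, 50))
     @ np.diag(0.5 ** np.arange(50))
     @ rng.standard_normal((50, 100)))
U, s, Vt = randomized_low_rank(A, k=5)
err = np.linalg.norm(A - U @ (s[:, None] * Vt), 2)
```

The spectral-norm error `err` is then close to the best possible rank-5 error, which is the accuracy guarantee the abstract refers to.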
Citations
Posted Content
Scikit-learn: Machine Learning in Python
Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Edouard Duchesnay +18 more
TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
Posted Content
Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions
TL;DR: This article presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions: random sampling identifies a subspace that captures most of the action of a matrix, the input matrix is compressed to this subspace, and the reduced matrix is manipulated deterministically to obtain the desired low-rank factorization.
Journal ArticleDOI
Machine Learning for Fluid Mechanics
TL;DR: This article provides an overview of machine learning for fluid mechanics, addressing the strengths and limitations of these methods from the perspective of scientific inquiry that considers data an inherent part of modeling, experimentation, and simulation.
Posted Content
Randomized algorithms for matrices and data
TL;DR: This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis.
Journal ArticleDOI
Simultaneous seismic data denoising and reconstruction via multichannel singular spectrum analysis
TL;DR: In this article, a rank reduction algorithm for simultaneous reconstruction and random noise attenuation of seismic records is proposed, which is based on multichannel singular spectrum analysis (MSSA).
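The rank-reduction idea behind singular spectrum analysis can be illustrated with a simplified single-channel version (a 1-D stand-in for the multichannel MSSA of the article; the window length and rank below are illustrative): embed the signal in a Hankel matrix, truncate its SVD, and average anti-diagonals to reconstruct a denoised signal.

```python
import numpy as np

def ssa_denoise(x, window, rank):
    """Simplified single-channel SSA denoising: Hankel embedding,
    rank truncation, anti-diagonal averaging."""
    n = len(x)
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds x[j : j + window].
    H = np.column_stack([x[j:j + window] for j in range(k)])
    # Keep only the leading `rank` singular components.
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    H_r = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    # Anti-diagonal averaging maps the low-rank matrix back to a signal.
    out = np.zeros(n)
    counts = np.zeros(n)
    for j in range(k):
        out[j:j + window] += H_r[:, j]
        counts[j:j + window] += 1
    return out / counts

t = np.linspace(0, 1, 500)
clean = np.sin(2 * np.pi * 5 * t)             # a sinusoid has rank-2 embedding
noisy = clean + 0.5 * np.random.default_rng(0).standard_normal(500)
denoised = ssa_denoise(noisy, window=100, rank=2)
```

The rank-2 truncation retains the sinusoid while discarding most of the noise energy, which is the mechanism the seismic MSSA method exploits per channel.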
References
Journal ArticleDOI
Indexing by Latent Semantic Analysis
TL;DR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents ("semantic structure") in order to improve the detection of relevant documents on the basis of terms found in queries.
Book
The algebraic eigenvalue problem
TL;DR: Topics covered include theoretical background, perturbation theory, error analysis, the solution of linear algebraic equations, Hermitian matrices, reduction of a general matrix to condensed form, eigenvalues of matrices of condensed forms, the LR and QR algorithms, iterative methods, and a bibliography.
Proceedings ArticleDOI
Latent semantic indexing: a probabilistic analysis
TL;DR: It is proved that under certain conditions LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance.
Proceedings Article
EM Algorithms for PCA and SPCA
TL;DR: An expectation-maximization (EM) algorithm for principal component analysis (PCA) which allows a few eigenvectors and eigenvalues to be extracted from large collections of high dimensional data and defines a proper density model in the data space.
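The noise-free limit of the EM iteration described above can be sketched in a few lines of NumPy; the function name and iteration count are illustrative, and this is a sketch of the general idea rather than the paper's exact algorithm. The iteration alternates between solving for latent coordinates X given loadings W and re-fitting W given X, and the columns of W converge to the leading principal subspace.

```python
import numpy as np

def em_pca(Y, k, n_iter=50, seed=0):
    """Sketch of (noise-free) EM for PCA on centered d x n data Y:
    alternate least-squares solves for latent coordinates X and
    loadings W; returns an orthonormal basis for span(W)."""
    rng = np.random.default_rng(seed)
    d, _ = Y.shape
    W = rng.standard_normal((d, k))
    for _ in range(n_iter):
        # E-step: latent coordinates given the current loadings.
        X = np.linalg.solve(W.T @ W, W.T @ Y)
        # M-step: loadings given the latent coordinates.
        W = Y @ X.T @ np.linalg.inv(X @ X.T)
    Q, _ = np.linalg.qr(W)   # orthonormal basis for the subspace
    return Q

rng = np.random.default_rng(1)
Y = rng.standard_normal((20, 3)) @ rng.standard_normal((3, 1000))  # rank-3 data
Y -= Y.mean(axis=1, keepdims=True)
Q = em_pca(Y, k=3)
```

Each step costs only matrix products with Y plus solves of tiny k x k systems, which is why the method scales to large collections of high-dimensional data, as the TL;DR notes.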
Proceedings ArticleDOI
Improved Approximation Algorithms for Large Matrices via Random Projections
TL;DR: In this paper, the authors present a (1 + ε)-approximation algorithm for the singular value decomposition of an m × n matrix A with M non-zero entries that requires two passes over the data and runs in time O(n²).