Relative-Error $CUR$ Matrix Decompositions
TL;DR: Subspace sampling, introduced by the authors, is a sampling method for low-rank matrix decompositions that comes with relative-error guarantees; previously, it was not even known whether such decompositions exist in general.

Abstract:
Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and are thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an $m\times n$ matrix $A$ and a rank parameter $k$. In our first algorithm, a matrix $C$ of columns of $A$ is chosen, and we let $A'=CC^+A$, where $C^+$ is the Moore-Penrose generalized inverse of $C$. In our second algorithm $C$, $U$, $R$ are chosen, and we let $A'=CUR$. ($C$ and $R$ are matrices that consist of actual columns and rows, respectively, of $A$, and $U$ is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least $1-\delta$, $\|A-A'\|_F\leq(1+\epsilon)\,\|A-A_k\|_F$, where $A_k$ is the “best” rank-$k$ approximation provided by truncating the SVD of $A$, and where $\|X\|_F$ is the Frobenius norm of the matrix $X$. The number of columns of $C$ and rows of $R$ is a low-degree polynomial in $k$, $1/\epsilon$, and $\log(1/\delta)$. Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants of these matrix decompositions over the last ten years. However, our two algorithms are the first polynomial-time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist. Both of our algorithms are simple, and they run in time of the order needed to approximately compute the top $k$ singular vectors of $A$.
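As a concrete illustration of the first algorithm, the sketch below (ours, not the paper's code; the name `cx_approx` and its parameters are hypothetical) samples columns with probabilities derived from the top-$k$ right singular vectors and forms $A' = CC^+A$ with NumPy:

```python
import numpy as np

def cx_approx(A, k, c, rng=None):
    """Sketch of a column-based low-rank approximation A' = C C^+ A.

    Columns are sampled with probabilities proportional to the squared
    row norms of the top-k right singular vectors of A (a simplified
    form of the paper's subspace sampling, without rescaling).
    """
    rng = np.random.default_rng(rng)
    # Top-k right singular vectors of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k].T                        # n x k, orthonormal columns
    # Subspace-sampling probabilities; they sum to 1 since Vk is orthonormal.
    p = (Vk ** 2).sum(axis=1) / k
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    C = A[:, idx]                        # c actual columns of A
    # A' = C C^+ A, with C^+ the Moore-Penrose pseudoinverse.
    return C @ (np.linalg.pinv(C) @ A)

A = np.random.default_rng(0).standard_normal((50, 30))
A_approx = cx_approx(A, k=5, c=20, rng=1)
```

Since $CC^+$ is an orthogonal projection, the approximation error can never exceed $\|A\|_F$, and when $A$ has exact rank $k$ and the sampled columns span its column space, the reconstruction is exact.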
The technical crux of our analysis is a novel, intuitive sampling method we introduce in this paper called “subspace sampling.” In subspace sampling, the sampling probabilities depend on the Euclidean norms of the rows of the top singular vectors. This allows us to obtain provable relative-error guarantees by deconvoluting “subspace” information and “size-of-$A$” information in the input matrix. This technique is likely to be useful for other matrix approximation and data analysis problems.
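The CUR construction can be sketched in the same spirit. In the hypothetical helper below (our names, not the paper's), columns and rows are drawn by subspace sampling from the top-$k$ right and left singular vectors respectively; for the middle matrix we use $U = C^+AR^+$, which is the Frobenius-optimal choice for fixed $C$ and $R$, rather than the paper's generalized inverse of the sampled intersection:

```python
import numpy as np

def cur_approx(A, k, c, r, rng=None):
    """Sketch of a CUR decomposition A ~ C U R (not the paper's exact
    algorithm: no rescaling, and U = C^+ A R^+ instead of a generalized
    inverse of the C/R intersection)."""
    rng = np.random.default_rng(rng)
    U_sv, _, Vt = np.linalg.svd(A, full_matrices=False)
    # Subspace-sampling probabilities for columns and rows.
    p_col = (Vt[:k] ** 2).sum(axis=0) / k
    p_row = (U_sv[:, :k] ** 2).sum(axis=1) / k
    cols = rng.choice(A.shape[1], size=c, replace=True, p=p_col)
    rows = rng.choice(A.shape[0], size=r, replace=True, p=p_row)
    C, R = A[:, cols], A[rows, :]        # actual columns and rows of A
    # Frobenius-optimal middle matrix for this C and R.
    U = np.linalg.pinv(C) @ A @ np.linalg.pinv(R)
    return C, U, R

A = np.random.default_rng(0).standard_normal((50, 30))
C, U, R = cur_approx(A, k=5, c=20, r=20, rng=1)
```

Because $C$ and $R$ are actual columns and rows of the data, the factors inherit the sparsity and interpretability of the original features, which is the point of the decomposition.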
Citations
Journal Article
Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions
TL;DR: This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation, and presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions.
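The core primitive in that randomized framework can be sketched in a few lines: multiply $A$ by a random test matrix to sample its range, orthonormalize, and compute the SVD of the small compressed matrix. This is our own minimal sketch (function and parameter names are assumptions, not the survey's code):

```python
import numpy as np

def randomized_svd(A, k, oversample=10, rng=None):
    """Minimal sketch of a randomized rank-k SVD: a Gaussian test
    matrix samples the range of A, QR gives an orthonormal basis Q,
    and the SVD of the small matrix Q^T A is lifted back."""
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)       # basis for the sampled range
    B = Q.T @ A                          # small (k+p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

M = np.random.default_rng(0).standard_normal((60, 5)) \
    @ np.random.default_rng(1).standard_normal((5, 40))
U, s, Vt = randomized_svd(M, k=5, rng=2)
```

When $A$ has exact rank $k$, the sampled range almost surely captures the full column space and the factorization is exact up to floating-point error; oversampling controls the accuracy in the general case.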
Journal Article
Hybrid Whale Optimization Algorithm with Simulated Annealing for Feature Selection
Majdi Mafarja, Seyedali Mirjalili +1 more
TL;DR: The experimental results confirm the efficiency of the proposed approaches in improving the classification accuracy compared to other wrapper-based algorithms, which insures the ability of WOA algorithm in searching the feature space and selecting the most informative attributes for classification tasks.
Journal Article
CUR matrix decompositions for improved data analysis
TL;DR: An algorithm is presented that preferentially chooses columns and rows that exhibit high “statistical leverage” and exert a disproportionately large “influence” on the best low-rank fit of the data matrix, obtaining improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work.
Posted Content
Randomized algorithms for matrices and data
TL;DR: This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis.
Book
Sketching as a Tool for Numerical Linear Algebra
TL;DR: A survey of linear sketching algorithms for numerical linear algebra, covering least squares as well as robust regression problems, low-rank approximation, and graph sparsification.
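The sketch-and-solve idea for least squares, one of the central examples in that line of work, fits in a few lines: compress the tall problem with a random sketching matrix and solve the small problem instead. A minimal illustration of ours (a dense Gaussian sketch for simplicity; fast structured transforms are used in practice, and the names here are hypothetical):

```python
import numpy as np

def sketched_lstsq(A, b, s, rng=None):
    """Sketch-and-solve least squares: solve min ||S A x - S b||_2 for
    a random s x m Gaussian sketch S, giving an approximate solution
    to min ||A x - b||_2 at the cost of an s x n problem."""
    rng = np.random.default_rng(rng)
    S = rng.standard_normal((s, A.shape[0])) / np.sqrt(s)
    x, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 10))
b = A @ np.arange(10.0) + rng.standard_normal(200)
x_sk = sketched_lstsq(A, b, s=100, rng=1)
```

With a sketch size modestly larger than the number of unknowns, the sketched solution's residual is within a small factor of the optimal residual with high probability.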
References
Book
Matrix Analysis
Roger A. Horn, Charles R. Johnson +1 more
TL;DR: In this book, the authors present both classic and recent results of matrix analysis using canonical forms as a unifying theme, and demonstrate their importance in a variety of applications of linear algebra and matrix theory.
Journal Article
An introduction to variable and feature selection
Isabelle Guyon, André Elisseeff +1 more
TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
Journal Article
A haplotype map of the human genome
John W. Belmont, Andrew Boudreau, Suzanne M. Leal, Paul Hardenbol +232 more (The International HapMap Consortium)
TL;DR: A public database of common variation in the human genome: more than one million single nucleotide polymorphisms for which accurate and complete genotypes have been obtained in 269 DNA samples from four populations, including ten 500-kilobase regions in which essentially all information about common DNA variation has been extracted.
Book
Generalized inverses: theory and applications
Adi Ben-Israel, T. N. E. Greville +1 more
TL;DR: In this book, the Moore-Penrose inverse is described as a generalized inverse of a linear operator between Hilbert spaces, and a spectral theory for rectangular matrices is proposed.