Open Access Journal Article

Relative-Error $CUR$ Matrix Decompositions

TLDR
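This paper introduces subspace sampling, a randomized method for choosing actual columns and rows of a matrix, and uses it to obtain the first polynomial-time $CC^+A$ and $CUR$ low-rank decompositions with relative-error guarantees. Before this work, it was not known in some cases whether such decompositions exist.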
Abstract
Many data analysis applications deal with large matrices and involve approximating the matrix using a small number of “components.” Typically, these components are linear combinations of the rows and columns of the matrix, and are thus difficult to interpret in terms of the original features of the input data. In this paper, we propose and study matrix approximations that are explicitly expressed in terms of a small number of columns and/or rows of the data matrix, and thereby more amenable to interpretation in terms of the original data. Our main algorithmic results are two randomized algorithms which take as input an $m\times n$ matrix $A$ and a rank parameter $k$. In our first algorithm, $C$ is chosen, and we let $A'=CC^+A$, where $C^+$ is the Moore-Penrose generalized inverse of $C$. In our second algorithm $C$, $U$, $R$ are chosen, and we let $A'=CUR$. ($C$ and $R$ are matrices that consist of actual columns and rows, respectively, of $A$, and $U$ is a generalized inverse of their intersection.) For each algorithm, we show that with probability at least $1-\delta$, $\|A-A'\|_F\leq(1+\epsilon)\,\|A-A_k\|_F$, where $A_k$ is the “best” rank-$k$ approximation provided by truncating the SVD of $A$, and where $\|X\|_F$ is the Frobenius norm of the matrix $X$. The number of columns of $C$ and rows of $R$ is a low-degree polynomial in $k$, $1/\epsilon$, and $\log(1/\delta)$. Both the Numerical Linear Algebra community and the Theoretical Computer Science community have studied variants of these matrix decompositions over the last ten years. However, our two algorithms are the first polynomial time algorithms for such low-rank matrix approximations that come with relative-error guarantees; previously, in some cases, it was not even known whether such matrix decompositions exist. Both of our algorithms are simple and they take time of the order needed to approximately compute the top $k$ singular vectors of $A$. 
The technical crux of our analysis is a novel, intuitive sampling method we introduce in this paper called “subspace sampling.” In subspace sampling, the sampling probabilities depend on the Euclidean norms of the rows of the top singular vectors. This allows us to obtain provable relative-error guarantees by deconvoluting “subspace” information and “size-of-$A$” information in the input matrix. This technique is likely to be useful for other matrix approximation and data analysis problems.
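
As a rough illustration of the first algorithm, the sketch below samples columns by subspace sampling and forms $A'=CC^+A$. It assumes the sampling probability of column $j$ is the squared Euclidean norm of the $j$-th row of the matrix of top-$k$ right singular vectors, divided by $k$, and that $c$ columns are drawn with replacement; the abstract states only that the probabilities depend on these row norms. The function names, the test matrix, and the values of `k` and `c` are hypothetical choices for the example, and an exact SVD is used here where a large-scale implementation would use an approximate one.

```python
import numpy as np


def subspace_sampling_probs(A, k):
    """Column probabilities from the top-k right singular vectors of A.

    Assumed form: p_j is the squared Euclidean norm of the j-th row of V_k
    (an n x k matrix), divided by k so that the probabilities sum to one.
    """
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk_t = Vt[:k, :]                      # k x n; column j is row j of V_k
    p = np.sum(Vk_t ** 2, axis=0) / k
    return p / p.sum()                    # renormalise to absorb round-off


def sample_columns(A, k, c, rng):
    """Draw c column indices i.i.d. from the subspace-sampling distribution."""
    p = subspace_sampling_probs(A, k)
    idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
    # Rescaling is conventional in sampling analyses; it does not change
    # the column span, so it does not change C C^+ A.
    C = A[:, idx] / np.sqrt(c * p[idx])
    return C, idx


def project_onto_columns(A, C):
    """Return C C^+ A, the projection of A onto the column span of C."""
    return C @ (np.linalg.pinv(C) @ A)


def best_rank_k(A, k):
    """Return A_k, the rank-k truncation of the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


# Hypothetical test matrix: rank 5 plus a small amount of noise.
rng = np.random.default_rng(0)
A = rng.standard_normal((60, 5)) @ rng.standard_normal((5, 40))
A += 0.01 * rng.standard_normal(A.shape)

k, c = 5, 20
C, idx = sample_columns(A, k, c, rng)
err = np.linalg.norm(A - project_onto_columns(A, C))   # Frobenius norm
opt = np.linalg.norm(A - best_rank_k(A, k))
print(f"||A - CC^+A||_F = {err:.4f}, ||A - A_k||_F = {opt:.4f}")
```

Comparing the two printed norms over several random seeds gives an empirical feel for the $(1+\epsilon)$ factor; the number of columns needed for a given $\epsilon$ and $\delta$ is specified in the paper, not by this sketch.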


Citations
Journal Article

Finding Structure with Randomness: Probabilistic Algorithms for Constructing Approximate Matrix Decompositions

TL;DR: This work surveys and extends recent research which demonstrates that randomization offers a powerful tool for performing low-rank matrix approximation, and presents a modular framework for constructing randomized algorithms that compute partial matrix decompositions.
Journal Article

Hybrid Whale Optimization Algorithm with Simulated Annealing for Feature Selection

TL;DR: The experimental results confirm the efficiency of the proposed approaches in improving the classification accuracy compared to other wrapper-based algorithms, which ensures the ability of the WOA algorithm to search the feature space and select the most informative attributes for classification tasks.
Journal Article

CUR matrix decompositions for improved data analysis

TL;DR: An algorithm is presented that preferentially chooses columns and rows that exhibit high “statistical leverage” and exert a disproportionately large “influence” on the best low-rank fit of the data matrix, obtaining improved relative-error and constant-factor approximation guarantees in worst-case analysis, as opposed to the much coarser additive-error guarantees of prior work.
Posted Content

Randomized algorithms for matrices and data

TL;DR: This monograph will provide a detailed overview of recent work on the theory of randomized matrix algorithms as well as the application of those ideas to the solution of practical problems in large-scale data analysis.
Book

Sketching as a Tool for Numerical Linear Algebra

TL;DR: This book surveys linear sketching algorithms for numerical linear algebra, covering least squares and robust regression problems, low-rank approximation, and graph sparsification.