Low-distortion subspace embeddings in input-sparsity time and applications to robust linear regression
Frequently Asked Questions (16)
Q2. What is the common parameterized family of regression problems?
A parameterized family of regression problems that is of particular interest is the overconstrained ℓp regression problem: given a matrix A ∈ R^{n×d} with n > d, a vector b ∈ R^n, and p ∈ [1, ∞), find min_{x∈R^d} ‖Ax − b‖p.
Q3. What is the simplest way to construct a subspace-preserving sampling?
Given R ∈ R^{d×d} such that AR^{-1} is well-conditioned in the ℓp norm, the authors can construct a (1 ± ε)-distortion embedding, specifically a subspace-preserving sampling, of Ap in O(nnz(A) · log n) additional time, succeeding with constant probability.
Q4. What is the way to solve the ℓp regression problem?
Given an ℓp regression problem specified by A ∈ R^{n×d}, b ∈ R^n, and p ∈ [1, ∞), let S be a (1 ± ε)-distortion embedding matrix of the subspace spanned by A's columns and b from Lemma 3, and let x̂ be an optimal solution to the subsampled problem min_{x∈R^d} ‖SAx − Sb‖p. Then x̂ is a (1 + ε)/(1 − ε)-approximate solution to the original problem.
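To make this sketch-and-solve recipe concrete, here is a minimal sketch for the p = 2 case; the dense Gaussian S in the illustration is only a convenient stand-in for the input-sparsity-time embeddings discussed in this paper, and all variable names are illustrative.

```python
import numpy as np

def sketch_and_solve_l2(A, b, S):
    """Solve min_x ||S(Ax - b)||_2 as a proxy for min_x ||Ax - b||_2.

    If S is a (1 +/- eps)-distortion embedding of span([A, b]), the
    returned x_hat is a (1+eps)/(1-eps)-approximate minimizer.
    """
    SA, Sb = S @ A, S @ b
    x_hat, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x_hat

# Illustration with a dense Gaussian sketch (a stand-in embedding).
rng = np.random.default_rng(0)
n, d, s = 10_000, 5, 200
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
S = rng.standard_normal((s, n)) / np.sqrt(s)
x_hat = sketch_and_solve_l2(A, b, S)
```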
Q5. What is the simplest way to embed a subspace?
The authors are interested in fast embedding of Ap into a d-dimensional subspace of (R^{poly(d)}, ‖·‖p), with distortion either poly(d) or (1 ± ε) for some ε > 0, as well as applications of this embedding to problems such as ℓp regression.
Q6. What is the embedding dimension for ℓp, p ∈ [1, 2)?
The (1 ± ε)-distortion subspace embedding for ℓp, p ∈ [1, 2), that the authors construct from the input-sparsity-time embedding and the fast subspace-preserving sampling has embedding dimension s = O(poly(d) · log(1/ε)/ε^2), where the somewhat large poly(d) term directly multiplies the log(1/ε)/ε^2 term.
Q7. What is the simplest way to compute a subspace-preserving sampling?
Given a matrix A ∈ R^{n×d}, p ∈ [1, ∞), ε > 0, and a matrix R ∈ R^{d×d} such that AR^{-1} is well-conditioned, it takes O(nnz(A) · log n) time to compute a sampling matrix S ∈ R^{s×n} (with only one nonzero element per row) with s = O(κ̄_p^p(AR^{-1}) · poly(d) · log(1/ε)/ε^2) such that, with constant probability, S is a (1 ± ε)-distortion, subspace-preserving embedding of Ap.
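A minimal sketch of this sampling step, assuming R is already available; sampling rows with probabilities proportional to the row-wise ℓp norms of the well-conditioned basis AR^{-1} is the usual importance-sampling recipe, and the Poisson-sampling variant below (with a variable number of kept rows) is illustrative rather than the paper's exact construction.

```python
import numpy as np

def subspace_preserving_sample(A, R, p, s, rng):
    """Sample ~s rows of A with probabilities proportional to the
    row-wise l_p norms of the well-conditioned basis U = A R^{-1},
    rescaling kept rows so ||SAx||_p^p is unbiased for ||Ax||_p^p.

    Returns kept row indices and the rescaled rows (i.e., SA for a
    sampling matrix S with one nonzero per row).
    """
    U = A @ np.linalg.inv(R)                  # well-conditioned basis
    weights = (np.abs(U) ** p).sum(axis=1)    # ||U_i||_p^p per row
    probs = np.minimum(1.0, s * weights / weights.sum())
    keep = rng.random(A.shape[0]) < probs     # Poisson row sampling
    scales = (1.0 / probs[keep]) ** (1.0 / p)
    return np.flatnonzero(keep), scales[:, None] * A[keep]
```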
Q8. What is the embedding dimension in the two theorems?
In Theorem 2 and Theorem 4, the embedding dimension is s = O(poly(d) · log(1/ε)/ε^2), where the poly(d) term is a somewhat large polynomial of d that directly multiplies the log(1/ε)/ε^2 term.
Q9. What is the way to solve the ℓ1 regression problem?
In addition, the authors can use it to compute a (1 + ε)-approximation to the ℓ1 regression problem in O(nnz(A) · log n + poly(d/ε)) time, which in turn leads to immediate improvements in ℓ1-based matrix approximation objectives, e.g., for the ℓ1 subspace approximation problem [6, 29, 10].
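Because the subsampled problem has only poly(d/ε) rows, the ℓ1 solve is cheap; a minimal sketch using the standard linear-programming reformulation of ℓ1 regression (this reformulation is textbook material, not specific to this paper):

```python
import numpy as np
from scipy.optimize import linprog

def l1_regression(M, c):
    """Solve min_x ||Mx - c||_1 as an LP over variables [x (d), t (m)]:
    minimize sum(t) subject to -t <= Mx - c <= t."""
    m, d = M.shape
    obj = np.concatenate([np.zeros(d), np.ones(m)])
    A_ub = np.block([[M, -np.eye(m)],    #  Mx - t <=  c
                     [-M, -np.eye(m)]])  # -Mx - t <= -c
    b_ub = np.concatenate([c, -c])
    bounds = [(None, None)] * d + [(0, None)] * m
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:d]
```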
Q10. How can the authors use sparse embeddings to solve ℓp regression problems?
The authors also show that, by coupling with recent work on fast subspace-preserving sampling from [10], these embeddings can be used to provide (1 + ε)-approximate solutions to ℓp regression problems, for p ∈ [1, 2], in nearly input-sparsity time. (The leverage scores driving such sampling are, in the ℓ2 case, the diagonal elements of the projection matrix onto the span of A; see [20, 15] for details. They can be generalized to ℓ1 and other ℓp norms [10], as well as to arbitrary n × d matrices with both n and d large [21, 15].)
Q11. How did Clarkson and Woodruff achieve their improved results for ℓ2-based problems?
Clarkson and Woodruff achieve their improved results for ℓ2-based problems by showing how to construct such a Π with s = poly(d/ε) and showing that it can be applied to an arbitrary A in O(nnz(A)) time [11].
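Such a Π can be realized as a sparse embedding (CountSketch-style) matrix with a single random ±1 per column, so that forming ΠA costs one pass over the nonzeros of A; a minimal sketch (the helper name is illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix

def sparse_embedding(n, s, rng):
    """A CountSketch-style Pi in R^{s x n}: each column has a single
    +/-1 entry in a uniformly random row, so Pi @ A takes O(nnz(A))."""
    rows = rng.integers(0, s, size=n)        # target row per column
    signs = rng.choice([-1.0, 1.0], size=n)  # random sign per column
    return csr_matrix((signs, (rows, np.arange(n))), shape=(s, n))

rng = np.random.default_rng(1)
Pi = sparse_embedding(n=100_000, s=2_000, rng=rng)
# Pi @ A touches each nonzero of A exactly once.
```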
Q12. How long does it take to compute aj?
Without any prior knowledge, the authors have to scan at least a constant fraction of the input to guarantee that aj is observed with constant probability, which takes O(nnz(A)) time.
Q13. What is the definition of a p-stable distribution D over R?
Definition 4. A distribution D over R is called p-stable if, for any m real numbers a1, . . . , am, the authors have ∑_{i=1}^m ai·Xi ≃ (∑_{i=1}^m |ai|^p)^{1/p} · X, where the Xi are i.i.d. draws from D and X ∼ D. By "X ≃ Y", the authors mean that X and Y have the same distribution.
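For p = 1 the standard Cauchy distribution is 1-stable, which gives a quick numerical check of Definition 4; a minimal sketch (the test vector a is arbitrary, and medians of absolute values are compared because Cauchy variables have no mean):

```python
import numpy as np

# 1-stability check: sum_i a_i X_i should match ||a||_1 * X in
# distribution, for X, X_i i.i.d. standard Cauchy.
rng = np.random.default_rng(2)
a = np.array([3.0, -1.0, 0.5])
N = 200_000
lhs = rng.standard_cauchy((N, a.size)) @ a      # sum_i a_i X_i
rhs = np.abs(a).sum() * rng.standard_cauchy(N)  # ||a||_1 * X
print(np.median(np.abs(lhs)), np.median(np.abs(rhs)))  # nearly equal
```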
Q14. Whom do the authors acknowledge regarding the first version of this paper?
The authors want to thank P. Drineas for reading the first version of this paper and pointing out that the embedding dimension in Theorem 1 can be easily improved from O(d^4/ε^2) to O(d^2/ε^2) using the same technique.
Q15. What is the simplest way to compute a (1 + ε)/(1 − ε)-approximate solution to an ℓp regression problem?
Given a subspace-preserving sampling algorithm, Clarkson et al. [10, Theorem 5.4] show it is straightforward to compute a (1 + ε)/(1 − ε)-approximate solution to an ℓp regression problem.
Q16. What is the proof technique for the ℓ2 subspace embedding?
Although their simpler direct proof leads to a better result for the ℓ2 subspace embedding, the technique used in the proof of Clarkson and Woodruff [11], which splits coordinates into "heavy" and "light" sets based on the leverage scores, highlights an important structural property of ℓ2 subspaces: only a small subset of coordinates can have large ℓ2 leverage scores.
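Since the ℓ2 leverage scores are the squared row norms of any orthonormal basis for the column span of A and sum to rank(A) ≤ d, at most d/τ of them can exceed a threshold τ, which is exactly the heavy/light structure exploited above. A minimal sketch (the threshold value is illustrative):

```python
import numpy as np

def l2_leverage_scores(A):
    """Leverage scores = squared row norms of an orthonormal basis Q
    of range(A); they sum to rank(A), so few rows can be 'heavy'."""
    Q, _ = np.linalg.qr(A)
    return (Q ** 2).sum(axis=1)

rng = np.random.default_rng(3)
A = rng.standard_normal((1000, 10))
scores = l2_leverage_scores(A)
tau = 0.05                                # illustrative threshold
heavy = np.flatnonzero(scores >= tau)     # at most rank(A)/tau rows
light = np.flatnonzero(scores < tau)
print(len(heavy), scores.sum())           # scores sum to rank(A) = 10
```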