Open Access · Posted Content

Faster Projective Clustering Approximation of Big Data.

TLDR
This work reduces the size of existing coresets by providing the first $O(\log(m))$ approximation for clustering to $m$ lines in $O(ndm)$ time, and proves that for a sufficiently large $m$ this yields a coreset for projective clustering.
Abstract
In projective clustering we are given a set of $n$ points in $R^d$ and wish to cluster them into a set $S$ of $k$ linear subspaces in $R^d$ according to some given distance function. An $\varepsilon$-coreset for this problem is a weighted (scaled) subset of the input points such that for every such possible $S$ the sum of these distances is approximated up to a factor of $(1+\varepsilon)$. We propose to reduce the size of existing coresets by providing the first $O(\log(m))$ approximation for clustering to $m$ lines in $O(ndm)$ time, compared to the existing $\exp(m)$ solution. We then project the points onto these lines and prove that for a sufficiently large $m$ we obtain a coreset for projective clustering. Our algorithm also generalizes to handle outliers. Experimental results and open code are also provided.
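To make the objective concrete, the following NumPy sketch (an illustration, not the authors' released code) evaluates the sum-of-distances cost for clustering to $m$ lines through the origin and performs the projection step described above; the function names and the use of Euclidean distance are assumptions. Both routines take $O(ndm)$ time, matching the stated bound.

```python
import numpy as np

def line_clustering_cost(P, L):
    """Sum over points of the Euclidean distance to the nearest line.

    P: (n, d) array of points.
    L: (m, d) array of unit vectors spanning m lines through the origin.
    Returns (cost, nearest), where nearest[i] indexes the closest line.
    """
    # For a unit direction l, the squared distance of p to span(l) is
    # ||p||^2 - <p, l>^2 (the line passes through the origin).
    proj = P @ L.T                                # (n, m) projection lengths
    sq_dist = (P ** 2).sum(axis=1)[:, None] - proj ** 2
    sq_dist = np.maximum(sq_dist, 0.0)            # guard tiny negatives
    nearest = sq_dist.argmin(axis=1)
    cost = np.sqrt(sq_dist[np.arange(len(P)), nearest]).sum()
    return cost, nearest

def project_to_lines(P, L, nearest):
    """Replace each point by its orthogonal projection onto its nearest line."""
    u = L[nearest]                                # (n, d) chosen directions
    return (P * u).sum(axis=1)[:, None] * u
```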


References
Proceedings Article

Coresets and sketches for high dimensional subspace approximation problems

TL;DR: The results of [7] for approximate linear regression, distances to subspace approximation, and optimal rank-j approximation are extended to error measures other than the Frobenius norm, and it is shown that bounded precision can lead to further improvements.
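For context, the subspace approximation measures in question aggregate per-point Euclidean distances to a j-dimensional subspace. A minimal sketch (an illustration, not the paper's algorithm), assuming an orthonormal basis for the subspace; the exponent z selects the error measure, with z = 2 recovering the Frobenius (squared-error) case:

```python
import numpy as np

def subspace_distances(P, V):
    """Euclidean distances from the rows of P (n points in R^d) to the
    j-dimensional subspace spanned by the orthonormal rows of V (j x d)."""
    residual = P - (P @ V.T) @ V   # component orthogonal to span(V)
    return np.linalg.norm(residual, axis=1)

def subspace_cost(P, V, z=2):
    """Sum of z-th powers of the distances: z = 2 gives the Frobenius
    measure; other z give the non-Frobenius error measures."""
    return (subspace_distances(P, V) ** z).sum()
```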
Proceedings Article

A fast and efficient algorithm for low-rank approximation of a matrix

TL;DR: A fast and efficient algorithm which first pre-processes matrix A in order to spread out the information (energy) of every column of A, then randomly selects some of its columns (or rows) and generates a rank-k approximation from the row space of the selected set.
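As a rough illustration of that recipe (random column sign flips stand in for the paper's energy-spreading preprocessing, and uniform sampling for its selection rule, so this is a sketch under assumptions rather than the published algorithm):

```python
import numpy as np

def sampled_low_rank(A, k, s, rng=None):
    """Rank-k approximation of A built from s uniformly sampled rows."""
    rng = np.random.default_rng(rng)
    signs = rng.choice([-1.0, 1.0], size=A.shape[1])
    B = A * signs                        # spread energy across columns
    rows = rng.choice(B.shape[0], size=s, replace=False)
    Q, _ = np.linalg.qr(B[rows].T)       # (d, s) basis of sampled row space
    C = B @ Q                            # coordinates in that subspace
    # Best rank-k approximation inside the sampled row space.
    U, sig, Vt = np.linalg.svd(C, full_matrices=False)
    Ck = U[:, :k] * sig[:k] @ Vt[:k]
    return (Ck @ Q.T) * signs            # undo the sign preprocessing
```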
Posted Content

Optimal approximate matrix product in terms of stable rank

TL;DR: In this article, the spectral norm guarantee for approximate matrix multiplication was shown for a dimensionality-reducing map having $m = O(\tilde{r}/\varepsilon^2)$ rows, where $\tilde{r}$ is the stable rank.
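A minimal sketch of approximate matrix product in this spirit, using a Gaussian map for simplicity (the paper's optimal construction may differ); here $\tilde{r}$ is the stable rank $\|A\|_F^2/\|A\|_2^2$:

```python
import numpy as np

def sketched_product(A, B, m, rng=None):
    """Approximate A.T @ B by sketching both factors with the same
    m-row Gaussian map S, so (S A).T (S B) ~ A.T B."""
    rng = np.random.default_rng(rng)
    S = rng.standard_normal((m, A.shape[0])) / np.sqrt(m)
    return (S @ A).T @ (S @ B)

def stable_rank(A):
    """||A||_F^2 / ||A||_2^2; the number of rows m scales with this."""
    return np.linalg.norm(A, 'fro') ** 2 / np.linalg.norm(A, 2) ** 2
```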
Proceedings Article

Efficient subspace approximation algorithms

TL;DR: A randomized algorithm takes as input P, k, and a parameter 0 < ε < 1, and returns a k-subspace that with probability at least 1/2 has a fit that is at most (1+ε) times that of the optimal k-subspace.
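For intuition, a heavily simplified randomized variant that uses spans of sampled points as candidate k-subspaces; this is not the paper's sampling scheme and carries no (1+ε) guarantee:

```python
import numpy as np

def sample_k_subspace(P, k, trials, rng=None):
    """Keep the best of several candidate k-subspaces, each spanned by
    k points sampled uniformly from P; fit = sum of distances."""
    rng = np.random.default_rng(rng)
    best_V, best_cost = None, np.inf
    for _ in range(trials):
        idx = rng.choice(len(P), size=k, replace=False)
        V, _ = np.linalg.qr(P[idx].T)              # orthonormal basis (d, k)
        resid = P - (P @ V) @ V.T                  # orthogonal components
        cost = np.linalg.norm(resid, axis=1).sum() # sum-of-distances fit
        if cost < best_cost:
            best_V, best_cost = V.T, cost          # stored as (k, d) basis
    return best_V, best_cost
```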
Posted Content

Dimensionality Reduction of Massive Sparse Datasets Using Coresets

TL;DR: A new framework for deterministic coreset constructions, based on a reduction to the problem of counting items in a stream, is presented, and the coresets are used to compute a non-trivial approximation to the PCA of very large but sparse databases such as the Wikipedia document-term matrix.
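The computation such coresets target can be sketched as follows; the matrix dimensions and the use of SciPy's svds are illustrative assumptions:

```python
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Stand-in for a huge sparse database (e.g., a document-term matrix);
# centering is skipped here, as densifying such a matrix is exactly
# what coreset-based approaches try to avoid.
A = sparse_random(100_000, 10_000, density=1e-4, format='csr', random_state=0)
U, s, Vt = svds(A, k=10)   # top-10 singular triplets, sparse-friendly
print(Vt.shape)            # (10, 10000): approximate principal directions
```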