Open Access · Posted Content

Faster Projective Clustering Approximation of Big Data.

TLDR
This work reduces the size of existing coresets by providing the first $O(\log(m))$ approximation for clustering to $m$ lines in $O(ndm)$ time, and proves that for a sufficiently large $m$ this yields a coreset for projective clustering.
Abstract
In projective clustering we are given a set of $n$ points in $R^d$ and wish to cluster them into a set $S$ of $k$ linear subspaces in $R^d$ according to some given distance function. An $\varepsilon$-coreset for this problem is a weighted (scaled) subset of the input points such that for every such possible $S$ the sum of these distances is approximated up to a factor of $(1+\varepsilon)$. We propose to reduce the size of existing coresets by providing the first $O(\log(m))$ approximation for clustering to $m$ lines in $O(ndm)$ time, compared to the existing $\exp(m)$ solution. We then project the points onto these lines and prove that for a sufficiently large $m$ we obtain a coreset for projective clustering. Our algorithm also generalizes to handle outliers. Experimental results and open code are also provided.
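To make the objective concrete, the following NumPy sketch (an illustration, not the authors' released code) evaluates the sum-of-distances cost for clustering to $m$ lines through the origin and performs the projection step described above; the function names and the use of Euclidean distance are assumptions. Both routines take $O(ndm)$ time, matching the stated bound.

```python
import numpy as np

def line_clustering_cost(P, L):
    """Sum over points of the Euclidean distance to the nearest line.

    P: (n, d) array of points.
    L: (m, d) array of unit vectors spanning m lines through the origin.
    Returns (cost, nearest), where nearest[i] indexes the closest line.
    """
    # For a unit direction l, the squared distance of p to span(l) is
    # ||p||^2 - <p, l>^2 (the line passes through the origin).
    proj = P @ L.T                                # (n, m) projection lengths
    sq_dist = (P ** 2).sum(axis=1)[:, None] - proj ** 2
    sq_dist = np.maximum(sq_dist, 0.0)            # guard tiny negatives
    nearest = sq_dist.argmin(axis=1)
    cost = np.sqrt(sq_dist[np.arange(len(P)), nearest]).sum()
    return cost, nearest

def project_to_lines(P, L, nearest):
    """Replace each point by its orthogonal projection onto its nearest line."""
    u = L[nearest]                                # (n, d) chosen directions
    return (P * u).sum(axis=1)[:, None] * u
```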


References
Proceedings Article

Coresets and sketches for high dimensional subspace approximation problems

TL;DR: The results of [7] for approximate linear regression, distances to subspace approximation, and optimal rank-j approximation are extended to error measures other than the Frobenius norm, and it is shown that bounded precision can lead to further improvements.
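For context, the subspace approximation measures in question aggregate per-point Euclidean distances to a j-dimensional subspace. A minimal sketch (an illustration, not the paper's algorithm), assuming an orthonormal basis for the subspace; the exponent z selects the error measure, with z = 2 recovering the Frobenius (squared-error) case:

```python
import numpy as np

def subspace_distances(P, V):
    """Euclidean distances from the rows of P (n points in R^d) to the
    j-dimensional subspace spanned by the orthonormal rows of V (j x d)."""
    residual = P - (P @ V.T) @ V   # component orthogonal to span(V)
    return np.linalg.norm(residual, axis=1)

def subspace_cost(P, V, z=2):
    """Sum of z-th powers of the distances: z = 2 gives the Frobenius
    measure; other z give the non-Frobenius error measures."""
    return (subspace_distances(P, V) ** z).sum()
```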
Proceedings Article

A fast and efficient algorithm for low-rank approximation of a matrix

TL;DR: A fast and efficient algorithm which first pre-processes matrix A in order to spread out the information (energy) of every column of A, then randomly selects some of its columns (or rows) and generates a rank-k approximation from the row space of the selected set.
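As a rough illustration of that recipe (random column sign flips stand in for the paper's energy-spreading preprocessing, and uniform sampling for its selection rule, so this is a sketch under assumptions rather than the published algorithm):

```python
import numpy as np

def sampled_low_rank(A, k, s, rng=None):
    """Rank-k approximation of A built from s uniformly sampled rows."""
    rng = np.random.default_rng(rng)
    signs = rng.choice([-1.0, 1.0], size=A.shape[1])
    B = A * signs                        # spread energy across columns
    rows = rng.choice(B.shape[0], size=s, replace=False)
    Q, _ = np.linalg.qr(B[rows].T)       # (d, s) basis of sampled row space
    C = B @ Q                            # coordinates in that subspace
    # Best rank-k approximation inside the sampled row space.
    U, sig, Vt = np.linalg.svd(C, full_matrices=False)
    Ck = U[:, :k] * sig[:k] @ Vt[:k]
    return (Ck @ Q.T) * signs            # undo the sign preprocessing
```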
Posted Content

Optimal approximate matrix product in terms of stable rank

TL;DR: In this article, the spectral norm guarantee for approximate matrix multiplication was shown for a dimensionality-reducing map having $m = O(\tilde{r}/\varepsilon^2)$ rows, where $\tilde{r}$ is the stable rank.
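A minimal sketch of approximate matrix product in this spirit, using a Gaussian map for simplicity (the paper's optimal construction may differ); here $\tilde{r}$ is the stable rank $\|A\|_F^2/\|A\|_2^2$:

```python
import numpy as np

def sketched_product(A, B, m, rng=None):
    """Approximate A.T @ B by sketching both factors with the same
    m-row Gaussian map S, so (S A).T (S B) ~ A.T B."""
    rng = np.random.default_rng(rng)
    S = rng.standard_normal((m, A.shape[0])) / np.sqrt(m)
    return (S @ A).T @ (S @ B)

def stable_rank(A):
    """||A||_F^2 / ||A||_2^2; the number of rows m scales with this."""
    return np.linalg.norm(A, 'fro') ** 2 / np.linalg.norm(A, 2) ** 2
```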
Proceedings Article

Efficient subspace approximation algorithms

TL;DR: A randomized algorithm takes as input P, k, and a parameter 0 < ε < 1, and returns a k-subspace that with probability at least 1/2 has a fit that is at most (1+ε) times that of the optimal k-subspace.
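For intuition, a heavily simplified randomized variant that uses spans of sampled points as candidate k-subspaces; this is not the paper's sampling scheme and carries no (1+ε) guarantee:

```python
import numpy as np

def sample_k_subspace(P, k, trials, rng=None):
    """Keep the best of several candidate k-subspaces, each spanned by
    k points sampled uniformly from P; fit = sum of distances."""
    rng = np.random.default_rng(rng)
    best_V, best_cost = None, np.inf
    for _ in range(trials):
        idx = rng.choice(len(P), size=k, replace=False)
        V, _ = np.linalg.qr(P[idx].T)              # orthonormal basis (d, k)
        resid = P - (P @ V) @ V.T                  # orthogonal components
        cost = np.linalg.norm(resid, axis=1).sum() # sum-of-distances fit
        if cost < best_cost:
            best_V, best_cost = V.T, cost          # stored as (k, d) basis
    return best_V, best_cost
```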
Posted Content

Dimensionality Reduction of Massive Sparse Datasets Using Coresets

TL;DR: A new framework for deterministic coreset constructions, based on a reduction to the problem of counting items in a stream, is presented, and the coresets are used to compute a non-trivial approximation to the PCA of very large but sparse databases such as the Wikipedia document-term matrix.
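The computation such coresets target can be sketched as follows; the matrix dimensions and the use of SciPy's svds are illustrative assumptions:

```python
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Stand-in for a huge sparse database (e.g., a document-term matrix);
# centering is skipped here, as densifying such a matrix is exactly
# what coreset-based approaches try to avoid.
A = sparse_random(100_000, 10_000, density=1e-4, format='csr', random_state=0)
U, s, Vt = svds(A, k=10)   # top-10 singular triplets, sparse-friendly
print(Vt.shape)            # (10, 10000): approximate principal directions
```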