scispace - formally typeset
Open AccessPosted Content

Faster Projective Clustering Approximation of Big Data.

TLDR
This work suggests to reduce the size of existing coresets by suggesting the first $O(\log(m))$ approximation for the case of lines clustering in $O(ndm)$ time, and proves that for a sufficiently large $m$ the authors obtain a coreset for projective clustering.
Abstract
In projective clustering we are given a set of n points in $R^d$ and wish to cluster them to a set $S$ of $k$ linear subspaces in $R^d$ according to some given distance function. An $\eps$-coreset for this problem is a weighted (scaled) subset of the input points such that for every such possible $S$ the sum of these distances is approximated up to a factor of $(1+\eps)$. We suggest to reduce the size of existing coresets by suggesting the first $O(\log(m))$ approximation for the case of $m$ lines clustering in $O(ndm)$ time, compared to the existing $\exp(m)$ solution. We then project the points on these lines and prove that for a sufficiently large $m$ we obtain a coreset for projective clustering. Our algorithm also generalize to handle outliers. Experimental results and open code are also provided.

read more