Open AccessPosted Content
Faster Projective Clustering Approximation of Big Data.
TLDR
This work suggests to reduce the size of existing coresets by suggesting the first $O(\log(m))$ approximation for the case of lines clustering in $O(ndm)$ time, and proves that for a sufficiently large $m$ the authors obtain a coreset for projective clustering.Abstract:
In projective clustering we are given a set of n points in $R^d$ and wish to cluster them to a set $S$ of $k$ linear subspaces in $R^d$ according to some given distance function. An $\eps$-coreset for this problem is a weighted (scaled) subset of the input points such that for every such possible $S$ the sum of these distances is approximated up to a factor of $(1+\eps)$. We suggest to reduce the size of existing coresets by suggesting the first $O(\log(m))$ approximation for the case of $m$ lines clustering in $O(ndm)$ time, compared to the existing $\exp(m)$ solution. We then project the points on these lines and prove that for a sufficiently large $m$ we obtain a coreset for projective clustering. Our algorithm also generalize to handle outliers. Experimental results and open code are also provided.read more
References
More filters
Posted Content
On the k-Means/Median Cost Function.
Anup Bhattacharya,Ragesh Jaiswal +1 more
TL;DR: The techniques generalize for the metric $k$-median problem in arbitrary metrics and give bounds in terms of the doubling dimension of the metric.