Linear time algorithms for clustering problems in any dimensions

doi:10.1007/11523468_111

Book Chapter

Amit Kumar, +2 more

- Vol. 3580, pp 1374-1385

TLDR

This work generalizes the k-means algorithm and shows that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness), resulting in $O(2^{(k/\varepsilon)^{O(1)}}dn)$-time $(1+\varepsilon)$-approximation algorithms for these problems.

Abstract:

We generalize the k-means algorithm presented by the authors [14] and show that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness). We prove these properties for the k-median and the discrete k-means clustering problems, resulting in $O(2^{(k/\varepsilon)^{O(1)}}dn)$-time $(1+\varepsilon)$-approximation algorithms for these problems. These are the first algorithms for these problems linear in the size of the input ($nd$ for $n$ points in $d$ dimensions), independent of dimensions in the exponent, assuming $k$ and $\varepsilon$ to be fixed. A key ingredient of the k-median result is a $(1+\varepsilon)$-approximation algorithm for the 1-median problem which has running time $O(2^{(1/\varepsilon)^{O(1)}}d)$. The previous best known algorithm for this problem had linear running time.
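For readers unfamiliar with the objectives named in the abstract, the following is a minimal sketch of the standard textbook cost functions for k-median and k-means, plus a check for the discrete variant (centers restricted to input points). It is not the paper's algorithm. The function names and the sample data are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence

Point = Sequence[float]


def k_median_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum of Euclidean distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def k_means_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """Sum of squared Euclidean distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)


def is_discrete_solution(points: Sequence[Point], centers: Sequence[Point]) -> bool:
    """Discrete variants require every center to be one of the input points."""
    input_set = {tuple(p) for p in points}
    return all(tuple(c) in input_set for c in centers)


# Hypothetical toy instance: n = 4 points in d = 2 dimensions, k = 2 centers.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
free_centers = [(0.0, 1.0), (10.0, 1.0)]      # centers anywhere in the plane
discrete_centers = [(0.0, 0.0), (10.0, 0.0)]  # centers drawn from the input

free_median = k_median_cost(points, free_centers)
free_means = k_means_cost(points, free_centers)
discrete_means = k_means_cost(points, discrete_centers)
```

Evaluating either cost for a given set of centers takes time proportional to $nkd$. A $(1+\varepsilon)$-approximation is a set of centers whose cost is at most $(1+\varepsilon)$ times the optimum for the corresponding objective.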

Citations

Journal Article

On Coresets for $k$-Median and $k$-Means Clustering in Metric and Euclidean Spaces and Their Applications

Ke Chen

- 01 Sep 2009 -

SIAM Journal on Computing

TL;DR: These are the first streaming algorithms, for those problems, that have space complexity with polynomial dependency on the dimension, using $O(d^2k^2\varepsilon^{-2}\log^8n)$ space.

Proceedings Article

A PTAS for k-means clustering based on weak coresets

Dan Feldman, +2 more

TL;DR: Every unweighted point set P has a weak coreset of size Poly(k,1/ε) for the k-means clustering problem, i.e. its size is independent of the cardinality |P| of the point set and the dimension d of the Euclidean space R^d.

Proceedings Article

Smaller coresets for k-median and k-means clustering

Sariel Har-Peled, +1 more

TL;DR: It is shown that there exists a (k, ε)-coreset for k-median and k-means clustering of n points in R^d, which is of size independent of n.

Journal Article

Linear-time approximation schemes for clustering problems in any dimensions

Amit Kumar, +2 more

- 08 Feb 2010 -

Journal of the ACM

TL;DR: This work presents a general approach for designing approximation algorithms for a fundamental class of geometric clustering problems in arbitrary dimensions and leads to simple randomized algorithms for the k-means, k-median and discrete k-means problems.

Journal Article

Smaller Coresets for k-Median and k-Means Clustering

Sariel Har-Peled, +1 more

- 01 Jan 2007 -

Discrete and Computational Geometry

TL;DR: It is shown that there exists a $(k,\varepsilon)$-coreset for k-median and k-means clustering of n points in ${\cal R}^d$, which is of size independent of n.

References

Book

Pattern Classification

Peter E. Hart, +2 more

Journal Article

Indexing by Latent Semantic Analysis

Scott Deerwester, +4 more

- 01 Sep 1990 -

Journal of the Association for Informati...

TL;DR: A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.

Pattern Classification (2nd ed.)

Richard O. Duda, +2 more

Journal Article

Color indexing

Michael J. Swain, +1 more

- 01 Nov 1991 -

International Journal of Computer Vision

TL;DR: In this paper, color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models, and they can differentiate among a large number of objects.

Journal Article

Syntactic clustering of the Web

Andrei Z. Broder, +3 more

TL;DR: An efficient way to determine the syntactic similarity of files is developed and applied to every document on the World Wide Web, and a clustering of all the documents that are syntactically similar is built.

Related Papers (5)

A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions

On coresets for k-means and k-median clustering

Approximate clustering via core-sets

Application of weighted Voronoi diagrams and randomization to variance-based k-clustering

Approximation schemes for Euclidean k-medians and related problems