Book Chapter

Linear time algorithms for clustering problems in any dimensions

TLDR
This work generalizes the k-means algorithm and shows that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness), resulting in $O(2^{(k/\varepsilon)^{O(1)}} dn)$ time $(1+\varepsilon)$-approximation algorithms for these problems.
Abstract
We generalize the k-means algorithm presented by the authors [14] and show that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness). We prove these properties for the k-median and the discrete k-means clustering problems, resulting in $O(2^{(k/\varepsilon)^{O(1)}} dn)$ time $(1+\varepsilon)$-approximation algorithms for these problems. These are the first algorithms for these problems linear in the size of the input (nd for n points in d dimensions), independent of dimensions in the exponent, assuming k and $\varepsilon$ to be fixed. A key ingredient of the k-median result is a $(1+\varepsilon)$-approximation algorithm for the 1-median problem which has running time $O(2^{(1/\varepsilon)^{O(1)}} d)$. The previous best known algorithm for this problem had linear running time.
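For readers unfamiliar with the objectives named in the abstract, the following minimal sketch evaluates them on a toy input. It is not the authors' algorithm: it only computes the standard cost functions (k-median sums Euclidean distances to the nearest center, k-means sums squared distances, and the discrete k-means variant restricts centers to input points). The function names and the sample data are assumptions made for illustration.

```python
import math


def nearest_dist(point, centers):
    """Euclidean distance from `point` to its closest center."""
    return min(math.dist(point, c) for c in centers)


def k_median_cost(points, centers):
    """k-median objective: sum of distances to the nearest center."""
    return sum(nearest_dist(p, centers) for p in points)


def k_means_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(nearest_dist(p, centers) ** 2 for p in points)


def is_discrete_solution(points, centers):
    """Discrete k-means only allows centers that are input points."""
    point_set = {tuple(p) for p in points}
    return all(tuple(c) in point_set for c in centers)


# Hypothetical toy data: two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
free_centers = [(0.0, 1.0), (10.0, 1.0)]      # midpoints, not input points
discrete_centers = [(0.0, 0.0), (10.0, 0.0)]  # chosen from the input

if __name__ == "__main__":
    print(k_median_cost(points, free_centers))
    print(k_means_cost(points, free_centers))
    print(is_discrete_solution(points, free_centers))
    print(k_means_cost(points, discrete_centers))
```

A $(1+\varepsilon)$-approximation in the abstract's sense is a set of k centers whose cost under the relevant objective is at most $(1+\varepsilon)$ times the optimum.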


Citations
Journal Article

On Coresets for $k$-Median and $k$-Means Clustering in Metric and Euclidean Spaces and Their Applications

TL;DR: These are the first streaming algorithms, for those problems, that have space complexity with polynomial dependency on the dimension, using $O(d^2k^2\varepsilon^{-2}\log^8n)$ space.
Proceedings Article

A PTAS for k-means clustering based on weak coresets

TL;DR: Every unweighted point set P has a weak coreset of size Poly(k,1/ε) for the k-means clustering problem, i.e. its size is independent of the cardinality |P| of the point set and the dimension d of the Euclidean space $\mathbb{R}^d$.
Proceedings Article

Smaller coresets for k-median and k-means clustering

TL;DR: It is shown that there exists a (k, ε)-coreset for k-median and k-means clustering of n points in $\mathbb{R}^d$, which is of size independent of n.
Journal Article

Linear-time approximation schemes for clustering problems in any dimensions

TL;DR: This work presents a general approach for designing approximation algorithms for a fundamental class of geometric clustering problems in arbitrary dimensions, leading to simple randomized algorithms for the k-means, k-median and discrete k-means problems.
Journal Article

Smaller Coresets for k-Median and k-Means Clustering

TL;DR: It is shown that there exists a $(k,\varepsilon)$-coreset for k-median and k-means clustering of n points in ${\cal R}^d$, which is of size independent of n.
References
Journal Article

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Journal Article

Color indexing

TL;DR: This paper shows that color histograms of multicolored objects provide a robust, efficient cue for indexing into a large database of models and can differentiate among a large number of objects.
Journal Article

Syntactic clustering of the Web

TL;DR: An efficient way to determine the syntactic similarity of files is developed and applied to every document on the World Wide Web, and a clustering of all the documents that are syntactically similar is built.