A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions
Citations
SLIC Superpixels Compared to State-of-the-Art Superpixel Methods
k-means++: the advantages of careful seeding
Scalable k-means++
Fast approximate spectral clustering
The Planar k-Means Problem is NP-Hard
References
Indexing by Latent Semantic Analysis
Color indexing
Syntactic clustering of the Web
Frequently Asked Questions (12)
Q2. What have the authors stated for future work in "A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions"?
An interesting direction for further research is to extend their methods for other clustering problems.
Q3. What is the widely studied problem in computer science?
The problem of clustering a group of data items into similar groups is one of the most widely studied problems in computer science.
Q4. What are some of the applications of clustering?
Clustering has applications in a variety of areas, for example, data mining, information retrieval, image processing, and web search ([5, 7, 14, 9]).
Q5. How can the authors approximate the cost of the optimal k-means clustering?
Using the notion of balanced clusters in conjunction with Lemma 2.2, by eliminating at most (1 + µ)γ|P| outliers, the authors can approximate the cost of the optimal k-means clustering with at most γ|P| outliers.
Q6. What is the problem to get a polynomial time?
It is an open problem to obtain a polynomial-time (1 + ε)-approximation algorithm for the k-means clustering problem when n, k, and d are not constants.
Q7. What is the description of the k-means problem?
Some work has been devoted to finding (1 + ε)-approximation algorithms for the k-means problem, where ε can be an arbitrarily small constant.
Q8. Why is it necessary to compute the 2-means solution in iteration i?
This is needed because when the authors find a candidate c′_2 in iteration i + 1, they need to compute the 2-means solution in which all points in P − Q′_i are assigned to c′_1 and the points in Q′_i are assigned to the nearer of c′_1 and c′_2.
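The assignment rule in this answer can be sketched as a small cost computation. This is an illustrative sketch, not the authors' code; the names `c1`, `c2`, and `Qp` stand in for c′_1, c′_2, and Q′_i from the answer.

```python
def two_means_cost(P, Qp, c1, c2):
    """Cost of the constrained 2-means assignment described in Q8:
    points outside Qp are forced to c1, while points in Qp go to the
    nearer of c1 and c2. Names are illustrative (c1 ~ c'_1, etc.)."""
    def sq_dist(p, c):
        return sum((pi - ci) ** 2 for pi, ci in zip(p, c))

    qset = set(Qp)
    cost = 0.0
    for p in P:
        if p in qset:
            cost += min(sq_dist(p, c1), sq_dist(p, c2))
        else:
            cost += sq_dist(p, c1)  # forced assignment to c1
    return cost
```

For example, with P = [(0, 0), (4, 0)], the point (4, 0) only benefits from the candidate center c2 = (4, 0) if it belongs to Qp; otherwise it pays its full squared distance to c1.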
Q9. What is the way to get a good approximation to the optimal 1-means?
If the authors choose m as 2/ε, then with probability at least 1/2, they get a (1 + ε)-approximation to ∆₁(P) by taking the center as the centroid of T.
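The sampling step described here can be sketched as follows. This is a hedged illustration of the stated lemma, not the authors' implementation: it draws m = ⌈2/ε⌉ points uniformly at random and returns their centroid as the candidate 1-means center.

```python
import math
import random

def sampled_1means_center(P, eps):
    """Sample m = ceil(2/eps) points of P uniformly with replacement
    and return their centroid. Per the lemma in the answer, this is a
    (1 + eps)-approximate 1-means center with probability >= 1/2."""
    m = max(1, math.ceil(2 / eps))
    T = [random.choice(P) for _ in range(m)]
    dim = len(P[0])
    return tuple(sum(t[i] for t in T) / m for i in range(dim))
```

The point of the lemma is that m depends only on ε, not on |P| or the dimension, which is what makes the overall algorithm linear time.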
Q10. What is the common definition of clustering?
Most of these definitions begin by defining a notion of distance between two data items and then try to form clusters so that data items with small distance between them get clustered together.
Q11. What is the cost of assigning points to the centers in C?
Algorithm Irred-k-means(Q, m, k, C, α, Sum)
Inputs:
Q: remaining point set
m: number of cluster centers yet to be found
k: total number of clusters
C: set of k − m cluster centers found so far
α: approximation factor
Sum: the cost of assigning points in P − Q to the centers in C
Output: the clustering of the points in Q in k clusters.
Q12. What is the kmeans cost of a set of k points?
Given a set K of k points, which the authors also denote as centers, define the k-means cost of P with respect to K, ∆(P, K), as ∆(P, K) = ∑_{p∈P} d(p, K)², where d(p, K) denotes the distance between p and the closest point to p in K.
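This cost function is direct to compute. The sketch below follows the definition in the answer; the function name is illustrative.

```python
def kmeans_cost(P, K):
    """k-means cost Delta(P, K): for each point p in P, take the squared
    distance to its nearest center in K, and sum over all points."""
    return sum(
        min(sum((pi - ki) ** 2 for pi, ki in zip(p, k)) for k in K)
        for p in P
    )
```

For example, with P = [(0, 0), (0, 2), (10, 0)] and K = [(0, 1), (10, 0)], the first two points each contribute squared distance 1 to the center (0, 1) and the third contributes 0, for a total cost of 2.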