NP-hardness of Euclidean sum-of-squares clustering
TLDR
A recent proof of NP-hardness of Euclidean sum-of-squares clustering, due to Drineas et al. (Mach. Learn. 56:9–33, 2004), is not valid. An alternate short proof is provided.
Citations
Journal Article
A comparative study of efficient initialization methods for the k-means clustering algorithm
TL;DR: Eight commonly used linear-time-complexity initialization methods for k-means are compared; it is demonstrated that popular initialization methods often perform poorly and that there are in fact strong alternatives to them.
Journal Article
Scalable k-means++
TL;DR: In this article, the authors show how to drastically reduce the number of passes needed to obtain, in parallel, a good k-means initialization of quality comparable to k-means++.
Book Chapter
The Planar k-Means Problem is NP-Hard
TL;DR: It is shown that this well-known problem is NP-hard even for instances in the plane, answering an open question posed by Dasgupta [6].
Posted Content
Scalable K-Means++
TL;DR: It is proved that the proposed initialization algorithm k-means|| obtains a nearly optimal solution after a logarithmic number of passes, and experimental evaluation on real-world large-scale data demonstrates that k-means|| outperforms k-means++ in both sequential and parallel settings.
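The oversampling idea behind k-means|| can be sketched as follows. This is a minimal single-machine illustration, not the authors' parallel implementation: in each round, every point is added to the candidate set independently with probability proportional to its squared distance to the current candidates, scaled by an oversampling factor `ell`; the final reclustering of the candidates down to k centers is omitted here. The function name `kmeans_parallel_init` is hypothetical.

```python
import random

def kmeans_parallel_init(points, k, ell, rounds=5, rng=random.Random(0)):
    """Sketch of k-means|| oversampling (assumed simplification).
    `k` is kept only for context: a real implementation would finish
    by reclustering the returned candidates down to k centers."""
    def d2(p, C):
        # squared Euclidean distance from p to its nearest candidate in C
        return min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in C)

    C = [rng.choice(points)]          # start from one uniformly random point
    for _ in range(rounds):
        phi = sum(d2(p, C) for p in points)   # current clustering cost
        if phi == 0:
            break
        # add each point with probability min(1, ell * d^2(p, C) / phi)
        C += [p for p in points
              if p not in C and rng.random() < min(1.0, ell * d2(p, C) / phi)]
    return C
```

Because each round touches every point once, the expected number of passes over the data is logarithmic in the initial cost, which is the property the TL;DR refers to.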
Proceedings Article
k-Shape: Efficient and Accurate Clustering of Time Series
John Paparrizos, Luis Gravano, et al.
TL;DR: k-Shape uses a normalized version of the cross-correlation measure to compare the shapes of time series, and develops a method to compute cluster centroids, which are used in every iteration to update the assignment of time series to clusters.
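The shape-comparison step described in this TL;DR can be sketched as a shift-invariant distance: slide one series against the other over all lags, take the maximum cross-correlation, and normalize by the norms of both series. This is an illustrative pure-Python sketch in the spirit of k-Shape's shape-based distance, not the authors' implementation; the name `sbd` is an assumption.

```python
def sbd(x, y):
    """Shape-based distance sketch: 1 - max normalized cross-correlation
    over all shifts, giving a value in [0, 2] (0 = identical shape)."""
    n = len(x)
    norm = (sum(a * a for a in x) * sum(b * b for b in y)) ** 0.5
    # maximum inner product of x with y shifted by s positions
    best = max(sum(x[i] * y[i - s] for i in range(n) if 0 <= i - s < n)
               for s in range(-(n - 1), n))
    return 1.0 - best / norm
```

Because the maximum is taken over all shifts, two series that differ only by a time shift get distance (near) zero, which is exactly why a shape-aware measure is preferred over plain Euclidean distance for time series.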
References
Some methods for classification and analysis of multivariate observations
TL;DR: The k-means algorithm as mentioned in this paper partitions an N-dimensional population into k sets on the basis of a sample; the process, a generalization of the ordinary sample mean, is shown to give partitions that are reasonably efficient in the sense of within-class variance.
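The partitioning process this TL;DR describes is the classic alternate-and-update loop (Lloyd's iteration): assign each point to its nearest center, then move each center to the mean of its cluster. A minimal sketch, assuming plain Euclidean points as tuples and a hypothetical function name `lloyd`:

```python
def lloyd(points, centers, iters=10):
    """Plain Lloyd's iteration for k-means (illustrative sketch)."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # index of the nearest center by squared Euclidean distance
            j = min(range(len(centers)),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        # move each center to its cluster mean; empty clusters keep theirs
        centers = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c
                   else centers[i]
                   for i, c in enumerate(clusters)]
    return centers
```

Each iteration can only decrease the within-class variance, which is the sense of efficiency the TL;DR mentions; the NP-hardness result of the main paper concerns finding the globally optimal partition, not this local-improvement loop.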
Proceedings Article
k-means++: the advantages of careful seeding
TL;DR: By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is Θ(log k)-competitive with the optimal clustering.
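The careful-seeding technique can be sketched directly from this description: the first center is a uniformly random data point, and each subsequent center is sampled with probability proportional to its squared distance to the nearest already-chosen center. A minimal sketch with the hypothetical name `kmeans_pp_seed`, not the authors' code:

```python
import random

def kmeans_pp_seed(points, k, rng=random.Random(0)):
    """k-means++ seeding sketch: D^2-weighted sampling of initial centers."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # squared distance from each point to its nearest chosen center
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        total = sum(d2)
        if total == 0:  # every point already coincides with some center
            centers.append(rng.choice(points))
            continue
        # sample the next center with probability d2[i] / total
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers
```

Far-away points get proportionally larger sampling weight, which is what drives the Θ(log k) competitiveness guarantee quoted above.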
Journal Article
Cluster analysis and mathematical programming
Pierre Hansen, Brigitte Jaumard, et al.
TL;DR: In this article, a survey of clustering from a mathematical programming viewpoint is presented, focusing on solution methods, i.e., dynamic programming, graph theoretical algorithms, branch-and-bound, cutting planes, column generation and heuristics.
Journal Article
Clustering Large Graphs via the Singular Value Decomposition
TL;DR: This paper considers the problem of partitioning a set of m points in n-dimensional Euclidean space into k clusters, and studies a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points. It is argued that the relaxation provides a generalized clustering which is useful in its own right.
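The continuous relaxation in this TL;DR has a closed-form solution: the best rank-k subspace for sum-of-squared distances is spanned by the top k right singular vectors of the data matrix, so the optimal relaxed cost is the sum of the remaining squared singular values. A minimal sketch (the function name `relaxed_cost` is an assumption), using NumPy's SVD:

```python
import numpy as np

def relaxed_cost(X, k):
    """Optimal cost of the rank-k subspace relaxation (sketch):
    sum of squared singular values beyond the top k."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    return float((s[k:] ** 2).sum())
```

This value lower-bounds the discrete sum-of-squares clustering cost, which is why the relaxation is useful even though the discrete problem itself is NP-hard.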
Journal Article
Online clustering of parallel data streams
Jürgen Beringer, Eyke Hüllermeier, et al.
TL;DR: This paper develops an efficient online version of the classical k-means clustering algorithm, whose efficiency is mainly due to a scalable online transformation of the original data that allows fast computation of approximate distances between streams.