Open Access · Journal ArticleDOI

NP-hardness of Euclidean sum-of-squares clustering

TLDR
A recent proof of NP-hardness of Euclidean sum-of-squares clustering, due to Drineas et al. (Mach. Learn. 56:9–33, 2004), is not valid; an alternate short proof is provided.
Abstract
A recent proof of NP-hardness of Euclidean sum-of-squares clustering, due to Drineas et al. (Mach. Learn. 56:9–33, 2004), is not valid. An alternate short proof is provided.



Citations
Journal ArticleDOI

A comparative study of efficient initialization methods for the k-means clustering algorithm

TL;DR: Eight commonly used linear-time-complexity initialization methods are compared, demonstrating that popular initialization methods often perform poorly and that strong alternatives to them exist.
Journal ArticleDOI

Scalable k-means++

TL;DR: In this article, the authors show how to reduce the number of passes needed to obtain, in parallel, a good initialization for k-means.
Book ChapterDOI

The Planar k-Means Problem is NP-Hard

TL;DR: It is shown that this well-known problem is NP-hard even for instances in the plane, answering an open question posed by Dasgupta [6].
Posted Content

Scalable K-Means++

TL;DR: It is proved that the proposed initialization algorithm k-means|| obtains a nearly optimal solution after a logarithmic number of passes, and experimental evaluation on real-world large-scale data demonstrates that k-means|| outperforms k-means++ in both sequential and parallel settings.
Proceedings ArticleDOI

k-Shape: Efficient and Accurate Clustering of Time Series

TL;DR: k-Shape uses a normalized version of the cross-correlation measure to take the shapes of time series into account when comparing them, and develops a method to compute cluster centroids, which are used in every iteration to update the assignment of time series to clusters.
References

Some methods for classification and analysis of multivariate observations

TL;DR: The k-means process partitions an N-dimensional population into k sets on the basis of a sample; the process generalizes the ordinary sample mean and is shown to give partitions that are reasonably efficient in the sense of within-class variance.
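The partition-and-update process described in this summary is what is now called Lloyd's algorithm. A minimal sketch, assuming points are given as tuples and initial centers are supplied by the caller (the function name `lloyd_kmeans` and the fixed iteration count are illustrative choices, not from the paper):

```python
def lloyd_kmeans(points, centers, iters=20):
    """Plain Lloyd iterations: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)),
                    key=lambda c: sum((p[i] - centers[c][i]) ** 2
                                      for i in range(len(p))))
            clusters[j].append(p)
        # update step: each center becomes the mean of its cluster
        # (an empty cluster keeps its previous center)
        centers = [
            tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers, clusters
```

Each iteration can only decrease the within-class variance, which is why the procedure converges to a local (not necessarily global) optimum.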
Proceedings ArticleDOI

k-means++: the advantages of careful seeding

TL;DR: By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is O(log k)-competitive with the optimal clustering.
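The seeding technique referred to here is D² sampling: the first center is chosen uniformly at random, and each subsequent center is chosen with probability proportional to its squared distance from the nearest center already picked. A minimal sketch under those assumptions (the function name `dsq_seed` is illustrative):

```python
import random

def dsq_seed(points, k, seed=0):
    """D^2 seeding in the style of k-means++: pick the first center
    uniformly, then pick each further center with probability
    proportional to its squared distance to the nearest chosen center."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        # squared distance from each point to its nearest chosen center
        d2 = [min(sum((p[i] - c[i]) ** 2 for i in range(len(p)))
                  for c in centers)
              for p in points]
        # sample the next center with probability proportional to d2
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers
```

Because far-away points are more likely to be chosen, the initial centers tend to spread across the clusters, which is what drives the O(log k) competitiveness guarantee.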
Journal ArticleDOI

Cluster analysis and mathematical programming

TL;DR: In this article, a survey of clustering from a mathematical programming viewpoint is presented, focusing on solution methods, i.e., dynamic programming, graph theoretical algorithms, branch-and-bound, cutting planes, column generation and heuristics.
Journal ArticleDOI

Clustering Large Graphs via the Singular Value Decomposition

TL;DR: This paper considers the problem of partitioning a set of m points in n-dimensional Euclidean space into k clusters, and studies a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances of the m points to V. It argues that the relaxation provides a generalized clustering that is useful in its own right.
Journal ArticleDOI

Online clustering of parallel data streams

TL;DR: This paper develops an efficient online version of the classical k-means clustering algorithm, based on a scalable online transformation of the original data that allows fast computation of approximate distances between streams.