NP-hardness of Euclidean sum-of-squares clustering
TLDR
A recent proof of NP-hardness of Euclidean sum-of-squares clustering, due to Drineas et al. (Mach. Learn. 56:9–33, 2004), is not valid. An alternate short proof is provided.
Citations
Journal Article
A comparative study of efficient initialization methods for the k-means clustering algorithm
TL;DR: Eight commonly used linear-time-complexity initialization methods for k-means are compared; it is demonstrated that popular initialization methods often perform poorly and that there are in fact strong alternatives to them.
Journal Article
Scalable k-means++
TL;DR: In this article, the authors show how to drastically reduce the number of passes needed to obtain, in parallel, a good k-means initialization of quality comparable to k-means++.
Book Chapter
The Planar k-Means Problem is NP-Hard
TL;DR: It is shown that this well-known problem is NP-hard even for instances in the plane, answering an open question posed by Dasgupta [6].
Posted Content
Scalable K-Means++
TL;DR: It is proved that the proposed initialization algorithm k-means|| obtains a nearly optimal solution after a logarithmic number of passes, and experimental evaluation on real-world large-scale data demonstrates that k-means|| outperforms k-means++ in both sequential and parallel settings.
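The oversampling idea behind k-means|| can be sketched as follows. This is a minimal single-machine illustration, not the authors' parallel implementation: in each round, every point is added to the candidate set independently with probability proportional to its squared distance to the current candidates, scaled by an oversampling factor `ell`; the final reclustering of the candidates down to k centers is omitted here. The function name `kmeans_parallel_init` is hypothetical.

```python
import random

def kmeans_parallel_init(points, k, ell, rounds=5, rng=random.Random(0)):
    """Sketch of k-means|| oversampling (assumed simplification).
    `k` is kept only for context: a real implementation would finish
    by reclustering the returned candidates down to k centers."""
    def d2(p, C):
        # squared Euclidean distance from p to its nearest candidate in C
        return min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in C)

    C = [rng.choice(points)]          # start from one uniformly random point
    for _ in range(rounds):
        phi = sum(d2(p, C) for p in points)   # current clustering cost
        if phi == 0:
            break
        # add each point with probability min(1, ell * d^2(p, C) / phi)
        C += [p for p in points
              if p not in C and rng.random() < min(1.0, ell * d2(p, C) / phi)]
    return C
```

Because each round touches every point once, the expected number of passes over the data is logarithmic in the initial cost, which is the property the TL;DR refers to.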
Proceedings Article
k-Shape: Efficient and Accurate Clustering of Time Series
John Paparrizos, Luis Gravano, et al.
TL;DR: k-Shape uses a normalized version of the cross-correlation measure to compare the shapes of time series, and develops a method to compute cluster centroids, which are used in every iteration to update the assignment of time series to clusters.
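The shape-comparison step described in this TL;DR can be sketched as a shift-invariant distance: slide one series against the other over all lags, take the maximum cross-correlation, and normalize by the norms of both series. This is an illustrative pure-Python sketch in the spirit of k-Shape's shape-based distance, not the authors' implementation; the name `sbd` is an assumption.

```python
def sbd(x, y):
    """Shape-based distance sketch: 1 - max normalized cross-correlation
    over all shifts, giving a value in [0, 2] (0 = identical shape)."""
    n = len(x)
    norm = (sum(a * a for a in x) * sum(b * b for b in y)) ** 0.5
    # maximum inner product of x with y shifted by s positions
    best = max(sum(x[i] * y[i - s] for i in range(n) if 0 <= i - s < n)
               for s in range(-(n - 1), n))
    return 1.0 - best / norm
```

Because the maximum is taken over all shifts, two series that differ only by a time shift get distance (near) zero, which is exactly why a shape-aware measure is preferred over plain Euclidean distance for time series.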
References
Some methods for classification and analysis of multivariate observations
TL;DR: The k-means algorithm as mentioned in this paper partitions an N-dimensional population into k sets on the basis of a sample; the process, a generalization of the ordinary sample mean, is shown to give partitions that are reasonably efficient in the sense of within-class variance.
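The partitioning process this TL;DR describes is the classic alternate-and-update loop (Lloyd's iteration): assign each point to its nearest center, then move each center to the mean of its cluster. A minimal sketch, assuming plain Euclidean points as tuples and a hypothetical function name `lloyd`:

```python
def lloyd(points, centers, iters=10):
    """Plain Lloyd's iteration for k-means (illustrative sketch)."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # index of the nearest center by squared Euclidean distance
            j = min(range(len(centers)),
                    key=lambda i: sum((a - b) ** 2
                                      for a, b in zip(p, centers[i])))
            clusters[j].append(p)
        # move each center to its cluster mean; empty clusters keep theirs
        centers = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c
                   else centers[i]
                   for i, c in enumerate(clusters)]
    return centers
```

Each iteration can only decrease the within-class variance, which is the sense of efficiency the TL;DR mentions; the NP-hardness result of the main paper concerns finding the globally optimal partition, not this local-improvement loop.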
Proceedings Article
k-means++: the advantages of careful seeding
TL;DR: By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is Θ(log k)-competitive with the optimal clustering.
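The careful-seeding technique can be sketched directly from this description: the first center is a uniformly random data point, and each subsequent center is sampled with probability proportional to its squared distance to the nearest already-chosen center. A minimal sketch with the hypothetical name `kmeans_pp_seed`, not the authors' code:

```python
import random

def kmeans_pp_seed(points, k, rng=random.Random(0)):
    """k-means++ seeding sketch: D^2-weighted sampling of initial centers."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # squared distance from each point to its nearest chosen center
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        total = sum(d2)
        if total == 0:  # every point already coincides with some center
            centers.append(rng.choice(points))
            continue
        # sample the next center with probability d2[i] / total
        r = rng.uniform(0, total)
        acc = 0.0
        for p, w in zip(points, d2):
            acc += w
            if acc >= r:
                centers.append(p)
                break
    return centers
```

Far-away points get proportionally larger sampling weight, which is what drives the Θ(log k) competitiveness guarantee quoted above.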
Journal Article
Cluster analysis and mathematical programming
Pierre Hansen, Brigitte Jaumard, et al.
TL;DR: In this article, a survey of clustering from a mathematical programming viewpoint is presented, focusing on solution methods, i.e., dynamic programming, graph theoretical algorithms, branch-and-bound, cutting planes, column generation and heuristics.
Journal Article
Clustering Large Graphs via the Singular Value Decomposition
TL;DR: This paper considers the problem of partitioning a set of m points in n-dimensional Euclidean space into k clusters, and studies a continuous relaxation of this discrete problem: find the k-dimensional subspace V that minimizes the sum of squared distances to V of the m points. It is argued that the relaxation provides a generalized clustering which is useful in its own right.
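The continuous relaxation in this TL;DR has a closed-form solution: the best rank-k subspace for sum-of-squared distances is spanned by the top k right singular vectors of the data matrix, so the optimal relaxed cost is the sum of the remaining squared singular values. A minimal sketch (the function name `relaxed_cost` is an assumption), using NumPy's SVD:

```python
import numpy as np

def relaxed_cost(X, k):
    """Optimal cost of the rank-k subspace relaxation (sketch):
    sum of squared singular values beyond the top k."""
    s = np.linalg.svd(np.asarray(X, dtype=float), compute_uv=False)
    return float((s[k:] ** 2).sum())
```

This value lower-bounds the discrete sum-of-squares clustering cost, which is why the relaxation is useful even though the discrete problem itself is NP-hard.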
Journal Article
Online clustering of parallel data streams
Jürgen Beringer, Eyke Hüllermeier, et al.
TL;DR: This paper develops an efficient online version of the classical k-means clustering algorithm, whose efficiency is mainly due to a scalable online transformation of the original data that allows fast computation of approximate distances between streams.