Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms
Sara Ahmadian,Ashkan Norouzi-Fard,Ola Svensson,Justin Ward +3 more
- pp 61-72
TLDR
A new primal-dual approach is presented that exploits the geometric structure of k-means and satisfies the hard constraint that at most k clusters are selected, without deteriorating the approximation guarantee.
Abstract:
Clustering is a classic topic in optimization with k-means being one of the most fundamental such problems. In the absence of any restrictions on the input, the best known algorithm for k-means with a provable guarantee is a simple local search heuristic yielding an approximation guarantee of 9+ε, a ratio that is known to be tight with respect to such methods. We overcome this barrier by presenting a new primal-dual approach that allows us to (1) exploit the geometric structure of k-means and (2) satisfy the hard constraint that at most k clusters are selected without deteriorating the approximation guarantee. Our main result is a 6.357-approximation algorithm with respect to the standard LP relaxation. Our techniques are quite general and we also show improved guarantees for the general version of k-means where the underlying metric is not required to be Euclidean and for k-median in Euclidean metrics.
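To make the objective and the local search baseline mentioned in the abstract concrete, here is a minimal Python sketch. It is not the primal-dual algorithm of the paper. It assumes that candidate centers are restricted to the input points and that only one center is swapped at a time, and it makes no attempt to reproduce the 9+ε guarantee. The names `kmeans_cost` and `single_swap_local_search` are hypothetical and chosen for illustration.

```python
from itertools import product


def kmeans_cost(points, centers):
    """k-means objective: sum of squared Euclidean distances to the nearest center."""
    return sum(
        min(sum((p - c) ** 2 for p, c in zip(point, center)) for center in centers)
        for point in points
    )


def single_swap_local_search(points, k, eps=1e-9):
    """Illustrative single-swap local search (assumption: centers are input points).

    Starts from the first k points and repeatedly replaces one center by another
    input point whenever that lowers the cost by more than eps.
    Assumes the points are distinct and k <= len(points).
    """
    centers = list(points[:k])
    cost = kmeans_cost(points, centers)
    improved = True
    while improved:
        improved = False
        for idx, candidate in product(range(k), points):
            if candidate in centers:
                continue
            # Swap the center at position idx for the candidate point.
            trial = centers[:idx] + [candidate] + centers[idx + 1:]
            trial_cost = kmeans_cost(points, trial)
            if trial_cost < cost - eps:
                centers, cost = trial, trial_cost
                improved = True
                break  # restart the scan from the improved solution
    return centers, cost


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(single_swap_local_search(pts, 2))
```

On the four points in the usage example, the search starts with both centers in the left pair (cost 201) and stops after one swap with one center in each pair (cost 2). The hard constraint from the abstract shows up here as the fixed length k of `centers`.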
Citations
Book ChapterDOI
Fair Coresets and Streaming Algorithms for Fair k-means
TL;DR: In this paper, the authors proposed a streaming PTAS for fair k-means in the case of two colors (and exact balances), which leads to a constant factor algorithm in the streaming model when combined with the coreset.
Proceedings ArticleDOI
Constant approximation for k-median and k-means with outliers via iterative rounding
TL;DR: For k-means with outliers, Chen et al. as discussed by the authors gave an O(1)-approximation algorithm for matroid and knapsack median problems, which is the best known approximation algorithm for k-median with outliers.
Journal ArticleDOI
Spectral rotation for deep one-step clustering
TL;DR: A deep spectral clustering method that embeds four parts in a unified framework and develops a two-task deep clustering model with linear activation functions to output effective clustering results.
Proceedings ArticleDOI
Socially Fair k-Means Clustering
TL;DR: It is found that on benchmark datasets, Fair-Lloyd exhibits unbiased performance by ensuring that all groups have equal costs in the output k-clustering, while incurring a negligible increase in running time, thus making it a viable fair option wherever k-means is currently used.
Proceedings Article
(Individual) Fairness for k-Clustering
Sepideh Mahabadi,Ali Vakilian +1 more
TL;DR: The $k$-median ($k$-means) cost of the solution is within a constant factor of the cost of an optimal fair $k$-clustering, and the solution approximately satisfies the fairness condition.
References
Journal ArticleDOI
Least squares quantization in PCM
TL;DR: In this article, the authors derived necessary conditions for any finite number of quanta and associated quantization intervals of an optimum finite quantization scheme to achieve minimum average quantization noise power.
Least Squares Quantization in PCM
TL;DR: The corresponding result for any finite number of quanta is derived; that is, necessary conditions are found that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy.
Proceedings ArticleDOI
k-means++: the advantages of careful seeding
TL;DR: By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is Θ(log k)-competitive with the optimal clustering.
Book
Approximation Algorithms
TL;DR: Covering the basic techniques used in the latest research work, the author consolidates progress made so far, including some very recent and promising results, and conveys the beauty and excitement of work in the field.
Book ChapterDOI
Data Clustering: 50 Years Beyond K-means
TL;DR: Cluster analysis is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics; such grouping is one of the most fundamental modes of understanding and learning.