Open Access Proceedings Article

Better Guarantees for k-Means and Euclidean k-Median by Primal-Dual Algorithms

TL;DR: A new primal-dual approach is presented that exploits the geometric structure of k-means and satisfies the hard constraint that at most k clusters are selected, without deteriorating the approximation guarantee.
Abstract
Clustering is a classic topic in optimization, with k-means being one of the most fundamental such problems. In the absence of any restrictions on the input, the best known algorithm for k-means with a provable guarantee is a simple local search heuristic yielding an approximation guarantee of 9+ε, a ratio that is known to be tight with respect to such methods. We overcome this barrier by presenting a new primal-dual approach that allows us to (1) exploit the geometric structure of k-means and (2) satisfy the hard constraint that at most k clusters are selected, without deteriorating the approximation guarantee. Our main result is a 6.357-approximation algorithm with respect to the standard LP relaxation. Our techniques are quite general, and we also show improved guarantees for the general version of k-means, where the underlying metric is not required to be Euclidean, and for k-median in Euclidean metrics.
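The objective that all of these approximation ratios bound is the k-means cost: the sum of squared Euclidean distances from each point to its nearest chosen center. A minimal sketch of that cost function (the function name and example data are illustrative, not from the paper):

```python
import numpy as np

def kmeans_cost(points, centers):
    """Sum of squared Euclidean distances from each point to its
    nearest center -- the quantity the approximation ratios bound."""
    # pairwise squared distances, shape (n_points, n_centers)
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # each point pays for its nearest center
    return d2.min(axis=1).sum()

points = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
centers = np.array([[0.5, 0.0], [10.0, 0.0]])
print(kmeans_cost(points, centers))  # 0.25 + 0.25 + 0.0 = 0.5
```

An α-approximation algorithm (e.g. the 6.357-approximation above) returns at most k centers whose cost is at most α times the minimum possible cost over all choices of k centers.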


Citations
Book Chapter

Fair Coresets and Streaming Algorithms for Fair k-means

TL;DR: In this paper, the authors propose a streaming PTAS for fair k-means in the case of two colors (and exact balances), which, combined with their coreset construction, yields a constant-factor algorithm in the streaming model.
Proceedings Article

Constant approximation for k-median and k-means with outliers via iterative rounding

TL;DR: An iterative LP-rounding framework yields O(1)-approximation algorithms for k-median and k-means with outliers, the best known approximation guarantees for these problems.
Journal Article

Spectral rotation for deep one-step clustering

TL;DR: A deep spectral clustering method that embeds four components in a unified framework and develops a two-task deep clustering model with linear activation functions to directly output an effective clustering result.
Proceedings Article

Socially Fair k-Means Clustering

TL;DR: It is found that on benchmark datasets, Fair-Lloyd exhibits unbiased performance by ensuring that all groups have equal costs in the output k-clustering, while incurring a negligible increase in running time, thus making it a viable fair option wherever k-means is currently used.
Proceedings Article

(Individual) Fairness for k-Clustering

TL;DR: The k-median (k-means) cost of the solution is within a constant factor of the cost of an optimal fair k-clustering, and the solution approximately satisfies the fairness condition.
References
Journal Article

Least squares quantization in PCM

TL;DR: In this article, the author derives necessary conditions that the quanta and associated quantization intervals of an optimum finite quantization scheme must satisfy to minimize average quantization noise power.
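Lloyd's optimality conditions are the basis of the classic local search heuristic for k-means: alternately assign each point to its nearest center, then move each center to the mean of its assigned points. A minimal sketch of one such iteration (function name and example data are illustrative):

```python
import numpy as np

def lloyd_step(points, centers):
    """One Lloyd iteration: assign each point to its nearest center,
    then move each center to the mean of its assigned points."""
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    new_centers = centers.copy()
    for j in range(len(centers)):
        members = points[labels == j]
        if len(members) > 0:  # leave a center in place if it lost all points
            new_centers[j] = members.mean(axis=0)
    return new_centers, labels

points = np.array([[0.0], [1.0], [10.0], [11.0]])
centers = np.array([[0.0], [10.0]])
centers, labels = lloyd_step(points, centers)
print(centers.ravel())  # [ 0.5 10.5]
```

Each step can only decrease the cost, so the iteration converges, but only to a local optimum; this is why seeding (k-means++) and the approximation algorithms cited above matter.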
Proceedings Article

k-means++: the advantages of careful seeding

TL;DR: By augmenting k-means with a very simple, randomized seeding technique, this work obtains an algorithm that is Θ(log k)-competitive with the optimal clustering.
Book

Approximation Algorithms

TL;DR: Covering the basic techniques used in the latest research work, the author consolidates progress made so far, including some very recent and promising results, and conveys the beauty and excitement of work in the field.
Book Chapter

Data Clustering: 50 Years Beyond K-means

TL;DR: Cluster analysis is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics, and is one of the most fundamental modes of understanding and learning.