Journal Article

Polynomial-time approximation schemes for geometric min-sum median clustering

TLDR
It is shown that certain random linear transformations over the Hamming cube have properties similar to (though weaker than) those guaranteed by the Johnson–Lindenstrauss lemma, and these transformations are used to solve NP-hard clustering problems in the cube as well as in geometric settings.
Abstract
The Johnson–Lindenstrauss lemma states that n points in a high-dimensional Hilbert space can be embedded with small distortion of the distances into an O(log n)-dimensional space by applying a random linear transformation. We show that similar (though weaker) properties hold for certain random linear transformations over the Hamming cube. We use these transformations to solve NP-hard clustering problems in the cube as well as in geometric settings.

More specifically, we address the following clustering problem. Given n points in a larger set (e.g., ℝ^d) endowed with a distance function (e.g., L_2 distance), we would like to partition the data set into k disjoint clusters, each with a "cluster center," so as to minimize the sum over all data points of the distance between the point and the center of the cluster containing the point. The problem is provably NP-hard in some high-dimensional geometric settings, even for k = 2. We give polynomial-time approximation schemes for this problem in several settings, including the binary cube {0,1}^d with Hamming distance, and ℝ^d with either L_1 distance, L_2 distance, or the square of L_2 distance. In all these settings, the best previous results were constant-factor approximation guarantees.

We note that our problem is similar in flavor to the k-median problem (and the related facility location problem), which has been considered in graph-theoretic and fixed-dimensional geometric settings, where it becomes hard when k is part of the input. In contrast, we study the problem when k is fixed but the dimension is part of the input.
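To make the objective concrete: for points x_1, ..., x_n, cluster labels σ(i) ∈ {1, ..., k} and centers c_1, ..., c_k, the cost is the sum over i of dist(x_i, c_σ(i)). The minimal Python sketch below evaluates that cost under the four distances named in the abstract and, for very small instances in {0,1}^d, finds the exact optimum by exhaustive search. All function names are hypothetical, chosen for illustration. The search is deliberately brute force (its running time grows as 2^(d·k)) and is not the paper's polynomial-time approximation scheme.

```python
import math
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

Point = Sequence[float]
Metric = Callable[[Point, Point], float]


# The four distance settings named in the abstract (hypothetical helper names).
def hamming(p: Point, q: Point) -> float:
    return sum(1 for a, b in zip(p, q) if a != b)


def l1(p: Point, q: Point) -> float:
    return sum(abs(a - b) for a, b in zip(p, q))


def l2_squared(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def l2(p: Point, q: Point) -> float:
    return math.sqrt(l2_squared(p, q))


def min_sum_cost(points: List[Point], assignment: List[int],
                 centers: List[Point], dist: Metric) -> float:
    """Sum over all points of the distance to the center of the point's cluster."""
    return sum(dist(p, centers[c]) for p, c in zip(points, assignment))


def exact_hamming_clustering(points: List[Point],
                             k: int) -> Tuple[float, Optional[tuple]]:
    """Exact optimum on {0,1}^d under Hamming distance, by exhaustive search.

    For fixed centers, sending each point to its nearest center is optimal,
    so it suffices to try every k-tuple of cube vertices as centers.
    Exponential in d and k: usable only on toy inputs.
    """
    d = len(points[0])
    cube = list(product((0, 1), repeat=d))
    best_cost, best_centers = math.inf, None
    for centers in product(cube, repeat=k):
        cost = sum(min(hamming(p, c) for c in centers) for p in points)
        if cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_cost, best_centers


if __name__ == "__main__":
    pts = [(0, 0, 0), (0, 0, 1), (1, 1, 1), (1, 1, 0)]
    # Cost of the partition {000, 001} / {111, 110} with centers 000 and 111.
    print(min_sum_cost(pts, [0, 0, 1, 1], [(0, 0, 0), (1, 1, 1)], hamming))
    print(exact_hamming_clustering(pts, 2)[0])
    print(exact_hamming_clustering(pts, 1)[0])
```

On this toy input the two-cluster optimum has cost 2 (two of the four distinct points cannot coincide with a center), while a single center costs 6, because every coordinate splits the points two against two.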


Citations
Journal Article

Clustering Large Graphs via the Singular Value Decomposition

TL;DR: This paper considers the problem of partitioning a set of m points in n-dimensional Euclidean space into k clusters. It studies a continuous relaxation of this discrete problem (find the k-dimensional subspace V that minimizes the sum of squared distances from the m points to V) and argues that the relaxation yields a generalized clustering that is useful in its own right.
Journal Article

The Effectiveness of Lloyd-Type Methods for the k-Means Problem

TL;DR: This work investigates variants of Lloyd's heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and proposes and justifies a clusterability criterion for data sets.
Proceedings Article

The Effectiveness of Lloyd-Type Methods for the k-Means Problem

TL;DR: This work investigates variants of Lloyd's heuristic for clustering high dimensional data in an attempt to explain its popularity (a half century after its introduction) among practitioners, and proposes and justifies a clusterability criterion for data sets.
Proceedings Article

Matrix approximation and projective clustering via volume sampling

TL;DR: This paper proves that the additive error drops exponentially by iterating the sampling in an adaptive manner, and gives a pass-efficient algorithm for computing low-rank approximation with reduced additive error.

Sensor Network Optimization Using a Genetic Algorithm

TL;DR: By using a GA to partition a sensor network into a number of independent clusters, this approach can greatly reduce the total communication distance, thus prolonging the network lifetime and addressing the shortest-distance optimization problem.