Sudipto Guha

Researcher at University of Pennsylvania

Publications - 177
Citations - 21513

Sudipto Guha is an academic researcher at the University of Pennsylvania. The author has contributed to research on topics including approximation algorithms and cluster analysis. The author has an h-index of 61 and has co-authored 177 publications receiving 20547 citations. Previous affiliations of Sudipto Guha include Amazon.com and AT&T.

Papers
Proceedings ArticleDOI

CURE: an efficient clustering algorithm for large databases

TL;DR: This work proposes CURE, a new clustering algorithm that is more robust to outliers and identifies clusters having non-spherical shapes and wide variances in size. It demonstrates that random sampling and partitioning enable CURE not only to outperform existing algorithms but also to scale well for large databases without sacrificing clustering quality.
Journal ArticleDOI

ROCK: a robust clustering algorithm for categorical attributes

TL;DR: This paper develops a robust hierarchical clustering algorithm, ROCK, that employs links rather than distances when merging clusters, and indicates that ROCK not only generates better quality clusters than traditional algorithms but also exhibits good scalability properties.
Proceedings ArticleDOI

ROCK: a robust clustering algorithm for categorical attributes

TL;DR: This work develops a robust hierarchical clustering algorithm, ROCK, that employs links and not distances when merging clusters, and shows that ROCK not only generates better quality clusters than traditional algorithms, but also exhibits good scalability properties.
Journal ArticleDOI

Clustering data streams: Theory and practice

TL;DR: This work describes a streaming algorithm that effectively clusters large data streams and provides empirical evidence of the algorithm's performance on synthetic and real data streams.
Proceedings ArticleDOI

Clustering data streams

TL;DR: This work gives constant-factor approximation algorithms for the k-median problem in the data stream model of computation in a single pass, and shows negative results implying that these algorithms cannot be improved in a certain sense.