The Challenges of Clustering High Dimensional Data

doi:10.1007/978-3-662-08968-2_16

Book Chapter (Open Access)

Michael Steinbach, +2 more

- pp 273-309

TLDR

This chapter provides a short introduction to cluster analysis and presents a brief overview of several recent techniques for clustering high dimensional data, including a more detailed description of the authors' own recent work, which uses a concept-based clustering approach.

Abstract:

Cluster analysis divides data into groups (clusters) for the purposes of summarization or improved understanding. For example, cluster analysis has been used to group related documents for browsing, to find genes and proteins that have similar functionality, or as a means of data compression. While clustering has a long history and a large number of clustering techniques have been developed in statistics, pattern recognition, data mining, and other fields, significant challenges still remain. In this chapter we provide a short introduction to cluster analysis, and then focus on the challenge of clustering high dimensional data. We present a brief overview of several recent techniques, including a more detailed description of recent work of our own which uses a concept-based clustering approach.

Citations

Journal Article

When is nearest neighbor meaningful

Kevin S. Beyer, +3 more

- 01 Jan 1999 -

Lecture Notes in Computer Science

TL;DR: The authors explore the effect of dimensionality on the nearest neighbor problem and show that, under a broad set of conditions (much broader than independent and identically distributed dimensions), the distance to the nearest data point approaches the distance to the farthest data point as dimensionality increases.

Proceedings Article

Unsupervised deep embedding for clustering analysis

Junyuan Xie, +2 more

TL;DR: Deep Embedded Clustering (DEC), proposed by the authors, learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective.

Journal Article

A Brief Survey of Text Mining.

Andreas Hotho, +2 more

- 01 Jan 2005 -

Ldv Forum

TL;DR: The main analysis tasks (preprocessing, classification, clustering, information extraction, and visualization) are described, and a number of successful applications of text mining are discussed.

Journal ArticleDOI

An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data

Liping Jing, +2 more

- 01 Aug 2007 -

IEEE Transactions on Knowledge and Data ...

TL;DR: This paper presents a new k-means type algorithm for clustering high-dimensional objects in subspaces; it can generate better clustering results than other subspace clustering algorithms and also scales to large data sets.

Journal ArticleDOI

K-means properties on six clustering benchmark datasets

Pasi Fränti, +1 more

- 01 Dec 2018 -

Applied Intelligence

TL;DR: The results show that overlap is critical, and that k-means starts to work effectively when the overlap reaches the 4% level.
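
The nearest-neighbor result summarized in the Beyer et al. entry above can be illustrated with a small experiment. The sketch below is not the authors' analysis: it assumes independent, uniformly distributed coordinates (a narrower setting than the conditions the paper covers), and the function name `relative_contrast` and its parameters are hypothetical. It measures how much farther the farthest point is than the nearest one, relative to the nearest distance, for a random query point.

```python
import math
import random


def relative_contrast(n_points, dim, seed=0):
    """Return (d_max - d_min) / d_min for one random query point.

    Assumption for illustration: query and data points are drawn
    uniformly at random from the unit hypercube [0, 1]^dim.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # Euclidean distance
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # The contrast should shrink as the number of dimensions grows.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(500, dim), 3))
```

Under these assumptions the printed ratio falls steadily with dimension, which is why distance-based clustering criteria tend to lose discriminating power in high dimensional spaces.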

References

PDF

Open Access

More filters

Book

Introduction to Algorithms

Thomas H. Cormen, +2 more

TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.

Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

Martin Ester, +3 more

TL;DR: This paper proposes a density-based notion of clusters designed to discover clusters of arbitrary shape. The resulting algorithm can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.

Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

Martin Ester, +3 more

TL;DR: DBSCAN, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape, is presented; it requires only one input parameter and supports the user in determining an appropriate value for it.

Journal ArticleDOI

Data clustering: a review

Anil K. Jain, +2 more

- 01 Sep 1999 -

ACM Computing Surveys

TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.

Proceedings Article

Fast algorithms for mining association rules

Rakesh Agrawal, +1 more

TL;DR: Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented, and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.
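
The density-based notion of clusters described in the DBSCAN entries above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it takes both a neighborhood radius `eps` and a density threshold `min_pts` as explicit arguments, and it uses a brute-force neighbor search (an assumption made for brevity; a spatial index would replace it on large databases). The function name `dbscan` and the `NOISE` constant are chosen here for illustration.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not yet visited"

    def neighbors(i):
        # Brute-force region query; the result includes point i itself.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # Point i is a core point: start a new cluster and expand it.
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is also a core point
        cluster += 1
    return labels
```

Because a cluster grows by chaining together dense neighborhoods rather than by measuring distance to a centroid, it can follow arbitrary shapes, and isolated points are left labeled as noise.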

Related Papers (5)

Some methods for classification and analysis of multivariate observations

Data clustering: a review

A density-based algorithm for discovering clusters in large spatial databases with noise

Silhouettes: a graphical aid to the interpretation and validation of cluster analysis

Visualizing Data using t-SNE