Open Access Book Chapter

The Challenges of Clustering High Dimensional Data

Michael Steinbach, +2 more
- pp 273-309
TLDR
This chapter provides a short introduction to cluster analysis and presents a brief overview of several recent techniques, including a more detailed description of the authors' recent work, which uses a concept-based clustering approach.
Abstract
Cluster analysis divides data into groups (clusters) for the purposes of summarization or improved understanding. For example, cluster analysis has been used to group related documents for browsing, to find genes and proteins that have similar functionality, or as a means of data compression. While clustering has a long history and a large number of clustering techniques have been developed in statistics, pattern recognition, data mining, and other fields, significant challenges still remain. In this chapter we provide a short introduction to cluster analysis, and then focus on the challenge of clustering high dimensional data. We present a brief overview of several recent techniques, including a more detailed description of recent work of our own which uses a concept-based clustering approach.


Citations
Journal Article

When is nearest neighbor meaningful

TL;DR: In this article, the authors explore the effect of dimensionality on the nearest neighbor problem and show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point.
Proceedings Article

Unsupervised deep embedding for clustering analysis

TL;DR: Deep Embedded Clustering (DEC), as discussed by the authors, learns a mapping from the data space to a lower-dimensional feature space in which it iteratively optimizes a clustering objective.
Journal Article

A Brief Survey of Text Mining.

TL;DR: The main analysis tasks (preprocessing, classification, clustering, information extraction, and visualization) are described, and a number of successful applications of text mining are discussed.
Journal Article

An Entropy Weighting k-Means Algorithm for Subspace Clustering of High-Dimensional Sparse Data

TL;DR: This paper presents a new k-means type algorithm for clustering high-dimensional objects in subspaces; the algorithm can generate better clustering results than other subspace clustering algorithms and is also scalable to large data sets.
Journal Article

K-means properties on six clustering benchmark datasets

TL;DR: The results show that overlap is critical, and that k-means starts to work effectively when the overlap reaches the 4% level.
References
Book

Introduction to Algorithms

TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

TL;DR: In this paper, a density-based notion of clusters is proposed to discover clusters of arbitrary shape, which can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.
Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

TL;DR: DBSCAN, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape, is presented. It requires only one input parameter and supports the user in determining an appropriate value for it.
Journal Article

Data clustering: a review

TL;DR: An overview of pattern clustering methods from a statistical pattern recognition perspective is presented, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners.
Proceedings Article

Fast algorithms for mining association rules

TL;DR: Two new algorithms for solving this problem that are fundamentally different from the known algorithms are presented, and empirical evaluation shows that these algorithms outperform the known algorithms by factors ranging from three for small problems to more than an order of magnitude for large problems.