An algorithm for generating artificial test clusters

doi:10.1007/BF02294153

Journal Article

Glenn W. Milligan

- 01 Mar 1985 -

Psychometrika

- Vol. 50, Iss: 1, pp 123-127

TLDR

An algorithm for generating artificial data sets which contain distinct nonoverlapping clusters is presented, useful for generating test data sets for Monte Carlo validation research conducted on clustering methods or statistics.

Abstract:

An algorithm for generating artificial data sets which contain distinct nonoverlapping clusters is presented. The algorithm is useful for generating test data sets for Monte Carlo validation research conducted on clustering methods or statistics. The algorithm generates data sets which contain either 1, 2, 3, 4, or 5 clusters. By default, the data are embedded in either a 4, 6, or 8 dimensional space. Three different patterns for assigning the points to the clusters are provided. One pattern assigns the points equally to the clusters while the remaining two schemes produce clusters of unequal sizes. Finally, a number of methods for introducing error in the data have been incorporated in the algorithm.
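The abstract's main ingredients (a chosen number of clusters, a chosen dimensionality, and equal or unequal cluster sizes) can be illustrated with a minimal Python sketch. This is not the algorithm published in the paper. The function names, the pattern names `small_first` and `large_first`, the 10% and 60% shares, the uniform sampling, and the rule that separates clusters along the first dimension are all assumptions made for illustration, and the error-perturbation methods mentioned in the abstract are not modeled.

```python
import random


def cluster_sizes(n_points, n_clusters, pattern="equal"):
    """Split n_points among n_clusters (hypothetical helper).

    "equal" divides the points as evenly as possible. The two unequal
    patterns give the first cluster an assumed fixed share of the points
    and divide the remainder evenly among the other clusters.
    """
    if pattern == "equal" or n_clusters == 1:
        base, extra = divmod(n_points, n_clusters)
        return [base + (1 if i < extra else 0) for i in range(n_clusters)]
    share = {"small_first": 0.10, "large_first": 0.60}[pattern]  # assumed shares
    first = max(1, round(share * n_points))
    return [first] + cluster_sizes(n_points - first, n_clusters - 1, "equal")


def generate_clusters(n_points=50, n_clusters=3, n_dims=4,
                      pattern="equal", gap=1.0, seed=None):
    """Return (points, labels) with clusters that cannot overlap.

    Assumed separation rule: each cluster occupies its own interval on
    dimension 0, and consecutive intervals are `gap` apart, so clusters
    are disjoint on that dimension whatever happens on the others.
    """
    rng = random.Random(seed)
    points, labels = [], []
    low = 0.0
    for label, size in enumerate(cluster_sizes(n_points, n_clusters, pattern)):
        width = rng.uniform(1.0, 3.0)  # extent of this cluster on dimension 0
        for _ in range(size):
            first = rng.uniform(low, low + width)
            rest = [rng.uniform(0.0, 10.0) for _ in range(n_dims - 1)]
            points.append([first] + rest)
            labels.append(label)
        low += width + gap  # next cluster starts beyond the gap
    return points, labels


if __name__ == "__main__":
    pts, labs = generate_clusters(n_points=50, n_clusters=5, n_dims=6,
                                  pattern="large_first", seed=1)
    print(len(pts), len(pts[0]), [labs.count(k) for k in range(5)])
```

Because the separation is enforced on a single dimension, a clustering method under test can in principle recover the labels exactly, which is the property a Monte Carlo validation study needs from its ground truth.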

Citations

Journal Article

Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values

Zhexue Huang

- 01 Sep 1998 -

Data Mining and Knowledge Discovery

TL;DR: Two algorithms which extend the k-means algorithm to categorical domains and domains with mixed numeric and categorical values are presented and are shown to be efficient when clustering large data sets, which is critical to data mining applications.

Journal Article

K‐means clustering: A half‐century synthesis

Douglas Steinley

- 01 May 2006 -

British Journal of Mathematical and Stat...

TL;DR: This paper synthesizes the results, methodology, and research conducted concerning the K-means clustering method over the last fifty years, leading to a unifying treatment of K-means and some of its extensions.

Journal Article

A study of standardization of variables in cluster analysis

Glenn W. Milligan, +1 more

- 01 Sep 1988 -

Journal of Classification

TL;DR: The present simulation study examined the standardization problem and found that those approaches which standardize by division by the range of the variable gave consistently superior recovery of the underlying cluster structure.

Journal Article

Methodology review: Clustering methods

Glenn W. Milligan, +1 more

- 01 Dec 1987 -

Applied Psychological Measurement

TL;DR: A review of clustering methodology is presented, with emphasis on algorithm performance and the resulting implications for applied research, and two sets of recommendations are offered.

Book Chapter

A Data-Clustering Algorithm on Distributed Memory Multiprocessors

Inderjit S. Dhillon, +1 more

TL;DR: To cluster the increasingly massive data sets that are common today in data and text mining, a parallel implementation of the k-means clustering algorithm based on the message passing model is proposed, and it is shown analytically that the speedup and the scaleup of the algorithm approach the optimal as the number of data points increases.

References

Book

Cluster Analysis

Brian Everitt, +2 more

TL;DR: This fourth edition of the highly successful Cluster Analysis represents a thorough revision of the third edition and covers new and developing areas such as classification likelihood and neural networks for clustering.

Book

Clustering Algorithms

John A. Hartigan

Journal Article

An examination of procedures for determining the number of clusters in a data set

Glenn W. Milligan, +1 more

- 01 Jun 1985 -

Psychometrika

TL;DR: A Monte Carlo evaluation of 30 procedures for determining the number of clusters was conducted on artificial data sets which contained either 2, 3, 4, or 5 distinct nonoverlapping clusters to provide a variety of clustering solutions.

Journal Article

An examination of the effect of six types of error perturbation on fifteen clustering algorithms

Glenn W. Milligan

- 01 Sep 1980 -

Psychometrika

TL;DR: An evaluation of several clustering methods indicated that the hierarchical methods were differentially sensitive to the type of error perturbation and two alternative starting procedures for the nonhierarchical methods produced greatly enhanced cluster recovery and were found to be robust to all of the types of error examined.

Journal Article

A Review of Classification

R. M. Cormack
