Journal Article

An algorithm for generating artificial test clusters

Glenn W. Milligan
01 Mar 1985, Vol. 50, Iss. 1, pp. 123-127
Abstract
An algorithm for generating artificial data sets that contain distinct, nonoverlapping clusters is presented. The algorithm is useful for generating test data sets for Monte Carlo validation research on clustering methods or statistics. It generates data sets that contain 1, 2, 3, 4, or 5 clusters. By default, the data are embedded in a 4-, 6-, or 8-dimensional space. Three patterns for assigning points to clusters are provided: one assigns the points equally to the clusters, while the other two produce clusters of unequal sizes. Finally, several methods for introducing error into the data are incorporated in the algorithm.
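
The abstract gives the generator's parameters but not its internals, so the following Python sketch is only an illustration of the general idea, not the published procedure. Everything specific here is an assumption: the function names `cluster_sizes` and `generate_clusters`, the pattern names `"one_small"` and `"one_large"` with their 10% and 60% fractions, the use of uniform draws inside boxes that are separated along the first dimension to guarantee nonoverlap, and the single additive Gaussian error option.

```python
import random


def cluster_sizes(n_points, n_clusters, pattern="equal"):
    """Split n_points among n_clusters.

    The pattern names and the 10% / 60% fractions are illustrative
    assumptions, not values taken from the abstract.
    """
    if n_clusters == 1:
        return [n_points]
    if pattern == "equal":
        base, extra = divmod(n_points, n_clusters)
        return [base + (i < extra) for i in range(n_clusters)]
    if pattern == "one_small":
        first = max(1, round(0.10 * n_points))
    elif pattern == "one_large":
        first = round(0.60 * n_points)
    else:
        raise ValueError(f"unknown pattern: {pattern}")
    # Spread the remaining points as evenly as possible over the other clusters.
    base, extra = divmod(n_points - first, n_clusters - 1)
    return [first] + [base + (i < extra) for i in range(n_clusters - 1)]


def generate_clusters(n_points=50, n_clusters=3, n_dims=4, pattern="equal",
                      half_width=1.0, gap=1.0, noise_sd=0.0, seed=None):
    """Return (points, labels) for a toy data set with separated clusters.

    Assumption: each cluster is a box of side 2 * half_width, and boxes are
    spaced along dimension 0 with an empty gap between them, so the
    error-free clusters cannot overlap.
    """
    rng = random.Random(seed)
    points, labels = [], []
    for k, size in enumerate(cluster_sizes(n_points, n_clusters, pattern)):
        center = [k * (2 * half_width + gap)]
        center += [rng.uniform(-5.0, 5.0) for _ in range(n_dims - 1)]
        for _ in range(size):
            p = [c + rng.uniform(-half_width, half_width) for c in center]
            if noise_sd > 0:
                # One simple way to perturb the data; with error added,
                # separation is no longer guaranteed.
                p = [x + rng.gauss(0.0, noise_sd) for x in p]
            points.append(p)
            labels.append(k)
    return points, labels


if __name__ == "__main__":
    pts, labs = generate_clusters(n_points=50, n_clusters=4, n_dims=6,
                                  pattern="one_large", seed=0)
    print(len(pts), len(pts[0]), [labs.count(k) for k in range(4)])
```

Separating the boxes along a single dimension is the simplest way to guarantee nonoverlap in a sketch like this; a generator meant for validation studies would likely vary which dimensions carry the separation.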
