Journal Article

A study of standardization of variables in cluster analysis

Glenn W. Milligan, +1 more
01 Sep 1988, Vol. 5, Iss. 2, pp. 181-204
TLDR
The present simulation study examined the standardization problem and found that those approaches which standardize by division by the range of the variable gave consistently superior recovery of the underlying cluster structure.
Abstract
A methodological problem in applied clustering involves the decision of whether or not to standardize the input variables prior to the computation of a Euclidean distance dissimilarity measure. Existing results have been mixed, with some studies recommending standardization and others suggesting that it may not be desirable. The existence of numerous approaches to standardization complicates the decision process. The present simulation study examined the standardization problem. A variety of data structures were generated which varied the intercluster spacing and the scales for the variables. The data sets were examined in four different types of error environments. These involved error-free data, error-perturbed distances, inclusion of outliers, and the addition of random noise dimensions. Recovery of true cluster structure as found by four clustering methods was measured at the correct partition level and at reduced levels of coverage. Results for eight standardization strategies are presented. It was found that those approaches which standardize by division by the range of the variable gave consistently superior recovery of the underlying cluster structure. The result held over different error conditions, separation distances, clustering methods, and coverage levels. The traditional z-score transformation was found to be less effective in several situations.
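
To make the contrast in the abstract concrete, the following is a minimal sketch of two standardizations applied before computing Euclidean distances: the z-score transformation and one form of division by the range. The abstract does not give the study's exact formulas, so the function names (`zscore`, `range_scale`, `euclidean_distances`), the choice to subtract the minimum before dividing by the range, the use of the sample standard deviation, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def zscore(X):
    """Center each column and divide by its sample standard deviation.

    Assumes no column is constant (otherwise the divisor is zero).
    """
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def range_scale(X):
    """Subtract each column's minimum and divide by its range (max - min).

    This is one possible range-based standardization, chosen for
    illustration. Assumes no column is constant.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    return (X - col_min) / (X.max(axis=0) - col_min)

def euclidean_distances(X):
    """Return the full matrix of pairwise Euclidean distances between rows."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical toy data: two variables measured on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 1100.0],
              [9.0, 5000.0],
              [10.0, 5200.0]])

D_raw = euclidean_distances(X)                  # dominated by the second variable
D_z = euclidean_distances(zscore(X))            # each variable has unit variance
D_range = euclidean_distances(range_scale(X))   # each variable spans [0, 1]
```

Any of the three distance matrices could then be passed to a clustering routine. Without standardization, the variable with the larger scale dominates the distances, which is the situation the standardization decision is meant to address.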

