
Showing papers on "Rand index" published in 2001


Journal Article
TL;DR: A variable-selection heuristic for K-means cluster analysis, based on the adjusted Rand index, proved extremely effective at eliminating masking variables in Monte Carlo tests and yielded greater cluster stability when applied to real-world financial services data.
Abstract: One of the most vexing problems in cluster analysis is the selection and/or weighting of variables in order to include those that truly define cluster structure, while eliminating those that might mask such structure. This paper presents a variable-selection heuristic for nonhierarchical (K-means) cluster analysis based on the adjusted Rand index for measuring cluster recovery. The heuristic was subjected to Monte Carlo testing across more than 2200 datasets with known cluster structure. The results indicate the heuristic is extremely effective at eliminating masking variables. A cluster analysis of real-world financial services data revealed that using the variable-selection heuristic prior to the K-means algorithm resulted in greater cluster stability.
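
As a rough illustration of the adjusted Rand index that the abstract uses to measure cluster recovery, the sketch below computes it from two label vectors in the commonly used pair-counting form. The function name `adjusted_rand_index` and the handling of the degenerate case are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # assumption: fewer than two items counts as perfect agreement

    # Contingency counts: cell sizes, row totals, and column totals.
    cell_counts = Counter(zip(labels_a, labels_b))
    a_counts = Counter(labels_a)
    b_counts = Counter(labels_b)

    # Number of item pairs placed together in both, in A, and in B.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_a = sum(comb(c, 2) for c in a_counts.values())
    sum_b = sum(comb(c, 2) for c in b_counts.values())

    expected = sum_a * sum_b / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_a + sum_b)       # largest attainable agreement
    if max_index == expected:
        return 1.0  # assumption: both partitions are trivial (all-in-one or all singletons)
    return (sum_cells - expected) / (max_index - expected)
```

For example, `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` evaluates to 1.0, because the two partitions are identical up to relabeling, while partitions that agree no better than chance score near 0.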

131 citations


Book Chapter
21 Aug 2001
TL;DR: Bagged clustering was applied to binary data from the Austrian National Guest Survey (summer season 1997) to identify behavioral market segments from vacation activity information; the bootstrap part of the algorithm provides a means of assessing segment stability for each input variable.
Abstract: Binary survey data from the Austrian National Guest Survey conducted in the summer season of 1997 were used to identify behavioral market segments on the basis of vacation activity information. Bagged clustering overcomes a number of difficulties typically encountered when partitioning large binary data sets: The partitions have greater structural stability over repetitions of the algorithm and the question of the "correct" number of clusters is less important because of the hierarchical step of the cluster analysis. Finally, the bootstrap part of the algorithm provides means for assessing and visualizing segment stability for each input variable.
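
The bootstrap and hierarchical steps mentioned in the abstract can be sketched roughly as follows: run K-means on several bootstrap samples, pool the resulting centers, merge the centers hierarchically, and assign each observation to the group of its nearest center. This is a minimal sketch of one common reading of bagged clustering. The function names, squared Euclidean distance, average linkage, and all parameter values are assumptions for illustration, not settings reported in the chapter.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd iterations; an empty group keeps its previous center."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


def merge_centers(centers, n_clusters):
    """Average-linkage agglomeration; returns one group id per center."""
    groups = [[i] for i in range(len(centers))]

    def linkage(g, h):
        total = sum(sq_dist(centers[i], centers[j]) for i in g for j in h)
        return total / (len(g) * len(h))

    while len(groups) > n_clusters:
        pairs = ((a, b) for a in range(len(groups))
                 for b in range(a + 1, len(groups)))
        a, b = min(pairs, key=lambda ab: linkage(groups[ab[0]], groups[ab[1]]))
        merged = groups.pop(b)  # b > a, so index a stays valid
        groups[a].extend(merged)

    labels = [0] * len(centers)
    for gid, members in enumerate(groups):
        for i in members:
            labels[i] = gid
    return labels


def bagged_clustering(points, k_base, n_boot, n_clusters, seed=0):
    """Return one cluster label per point (hypothetical helper)."""
    rng = random.Random(seed)
    centers = []
    for _ in range(n_boot):
        sample = [rng.choice(points) for _ in points]  # bootstrap resample
        centers.extend(kmeans(sample, k_base, rng))
    center_labels = merge_centers(centers, n_clusters)
    return [
        center_labels[min(range(len(centers)),
                          key=lambda i: sq_dist(p, centers[i]))]
        for p in points
    ]


# Toy binary data: two activity profiles, ten respondents each.
toy = [(1, 1, 0, 0)] * 10 + [(0, 0, 1, 1)] * 10
toy_labels = bagged_clustering(toy, k_base=2, n_boot=5, n_clusters=2)
```

Because the final partition comes from cutting a hierarchy built on the pooled centers, `n_clusters` can be varied after the bootstrap runs without repeating them, which is one way to read the abstract's remark that the "correct" number of clusters matters less.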

12 citations