Journal ArticleDOI

Cumulative Voting Consensus Method for Partitions with Variable Number of Clusters

TL;DR
This paper proposes new consensus clustering algorithms with linear computational complexity in n and introduces the idea of cumulative voting as a solution to the problem of cluster label alignment, in which, unlike the common one-to-one voting scheme, a probabilistic mapping is computed.
Abstract
Over the past few years, there has been a renewed interest in the consensus clustering problem. Several new methods have been proposed for finding a consensus partition for a set of n data objects that optimally summarizes an ensemble. In this paper, we propose new consensus clustering algorithms with linear computational complexity in n. We consider clusterings generated with a random number of clusters, which we describe by categorical random variables. We introduce the idea of cumulative voting as a solution to the problem of cluster label alignment, where, unlike the common one-to-one voting scheme, a probabilistic mapping is computed. We seek a first summary of the ensemble that minimizes the average squared distance between the mapped partitions and the optimal representation of the ensemble, where the selection criterion for the reference clustering is defined based on maximizing the information content, as measured by the entropy. We describe cumulative vote weighting schemes and corresponding algorithms to compute an empirical probability distribution summarizing the ensemble. Given the arbitrary number of clusters of the input partitions, we formulate the problem of extracting the optimal consensus as that of finding a compressed summary of the estimated distribution that preserves the maximum relevant information. An efficient solution is obtained using an agglomerative algorithm that minimizes the average generalized Jensen-Shannon divergence within clusters. The empirical study demonstrates significant gains in accuracy and superior performance compared to several recent consensus clustering algorithms.
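The cumulative-voting step described in the abstract lends itself to a short sketch. The Python snippet below is a minimal illustration under simplifying assumptions, not the authors' implementation: the input partitions are hard labelings, the reference clustering is passed in directly rather than selected by the entropy criterion, and the second-stage agglomeration that minimizes the generalized Jensen-Shannon divergence is omitted. The names `one_hot` and `cumulative_vote` are hypothetical.

```python
import numpy as np

def one_hot(labels):
    """n x k indicator matrix for an integer labeling 0..k-1."""
    labels = np.asarray(labels)
    return np.eye(labels.max() + 1)[labels]

def cumulative_vote(partitions, ref):
    """Cumulative voting: average the probabilistically mapped partitions.

    partitions : list of length-n integer label arrays; the number of
                 clusters may differ from one partition to the next.
    ref        : length-n integer label array used as the reference
                 clustering (the paper selects it by maximum entropy;
                 here it is simply given).
    Returns an n x k0 matrix whose rows are empirical distributions
    over the k0 reference clusters.
    """
    U0 = one_hot(ref)                       # n x k0 reference indicators
    acc = np.zeros_like(U0)
    for labels in partitions:
        Ui = one_hot(labels)                # n x ki indicators
        sizes = Ui.sum(axis=0)              # cluster sizes in partition i
        # Probabilistic (many-to-many) label alignment: W[l, c] is the
        # fraction of cluster l's members lying in reference cluster c,
        # i.e., the least-squares solution of Ui @ W ~= U0.
        W = (Ui.T @ U0) / sizes[:, None]
        acc += Ui @ W                       # mapped partition, n x k0
    return acc / len(partitions)

# Toy usage: three partitions of six objects with 3, 2, and 4 clusters.
ensemble = [np.array([0, 0, 1, 1, 2, 2]),
            np.array([0, 0, 0, 1, 1, 1]),
            np.array([0, 1, 1, 2, 2, 3])]
P = cumulative_vote(ensemble, ref=ensemble[0])
consensus = P.argmax(axis=1)                # hard consensus labels
```

Averaging the mapped partitions yields the empirical probability distribution that the paper's second stage then compresses into the final consensus; the argmax above is only a crude stand-in for that agglomerative step.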


Citations
Journal ArticleDOI

A survey of clustering ensemble algorithms

TL;DR: Presents an overview of clustering ensemble methods useful to the community of clustering practitioners, along with a taxonomy of these techniques and an illustration of some important applications.
Journal ArticleDOI

Cluster ensembles

TL;DR: Describes a variety of algorithms that have been proposed to address the cluster ensemble problem, organizing them into conceptual categories that bring out common threads and lessons learned while highlighting unique features of individual approaches.
Journal ArticleDOI

Optimized Data Fusion for Kernel k-Means Clustering

TL;DR: Simulated and real-life data fusion applications are experimentally studied, and the results validate that the proposed algorithm has comparable performance and, moreover, is more efficient on large-scale data sets.
Journal ArticleDOI

K-Means-Based Consensus Clustering: A Unified View

TL;DR: This paper reveals a necessary and sufficient condition for utility functions that work for KCC and investigates some important factors, such as the quality and diversity of the basic partitionings, that may affect the performance of KCC.
Journal ArticleDOI

Spectral Ensemble Clustering via Weighted K-Means: Theoretical and Practical Evidence

TL;DR: This work discloses the theoretical equivalence between SEC and weighted K-means clustering and derives the latent consensus function of SEC, which, to the best of the authors' knowledge, is the first to bridge co-association-matrix-based methods to methods with explicit global objective functions.
References
Journal ArticleDOI

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and they are also applicable to regression.
Book

Elements of information theory

TL;DR: The authors examine the roles of entropy, inequality, and randomness in the design and construction of codes.
Book

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

TL;DR: The authors describe the important ideas in data mining, inference, and prediction within a common conceptual framework; the emphasis is on concepts rather than mathematics, with liberal use of color graphics.
Journal ArticleDOI

Bagging predictors

Leo Breiman
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.