Journal ArticleDOI

Cumulative Voting Consensus Method for Partitions with Variable Number of Clusters

TL;DR
This paper proposes new consensus clustering algorithms with linear computational complexity in n and introduces the idea of cumulative voting as a solution to the problem of cluster label alignment, in which, unlike the common one-to-one voting scheme, a probabilistic mapping is computed.
Abstract
Over the past few years, there has been a renewed interest in the consensus clustering problem. Several new methods have been proposed for finding a consensus partition for a set of n data objects that optimally summarizes an ensemble. In this paper, we propose new consensus clustering algorithms with linear computational complexity in n. We consider clusterings generated with a random number of clusters, which we describe by categorical random variables. We introduce the idea of cumulative voting as a solution to the problem of cluster label alignment, where, unlike the common one-to-one voting scheme, a probabilistic mapping is computed. We seek a first summary of the ensemble that minimizes the average squared distance between the mapped partitions and the optimal representation of the ensemble, where the selection criterion for the reference clustering is defined based on maximizing the information content, as measured by the entropy. We describe cumulative vote weighting schemes and corresponding algorithms to compute an empirical probability distribution summarizing the ensemble. Given the arbitrary number of clusters of the input partitions, we formulate the problem of extracting the optimal consensus as that of finding a compressed summary of the estimated distribution that preserves the maximum relevant information. An efficient solution is obtained using an agglomerative algorithm that minimizes the average generalized Jensen-Shannon divergence within clusters. The empirical study demonstrates significant gains in accuracy and superior performance compared to several recent consensus clustering algorithms.
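The cumulative-voting step described in the abstract lends itself to a short sketch. The Python snippet below is a minimal illustration under simplifying assumptions, not the authors' implementation: the input partitions are hard labelings, the reference clustering is passed in directly rather than selected by the entropy criterion, and the second-stage agglomeration that minimizes the generalized Jensen-Shannon divergence is omitted. The names `one_hot` and `cumulative_vote` are hypothetical.

```python
import numpy as np

def one_hot(labels):
    """n x k indicator matrix for an integer labeling 0..k-1."""
    labels = np.asarray(labels)
    return np.eye(labels.max() + 1)[labels]

def cumulative_vote(partitions, ref):
    """Cumulative voting: average the probabilistically mapped partitions.

    partitions : list of length-n integer label arrays; the number of
                 clusters may differ from one partition to the next.
    ref        : length-n integer label array used as the reference
                 clustering (the paper selects it by maximum entropy;
                 here it is simply given).
    Returns an n x k0 matrix whose rows are empirical distributions
    over the k0 reference clusters.
    """
    U0 = one_hot(ref)                       # n x k0 reference indicators
    acc = np.zeros_like(U0)
    for labels in partitions:
        Ui = one_hot(labels)                # n x ki indicators
        sizes = Ui.sum(axis=0)              # cluster sizes in partition i
        # Probabilistic (many-to-many) label alignment: W[l, c] is the
        # fraction of cluster l's members lying in reference cluster c,
        # i.e., the least-squares solution of Ui @ W ~= U0.
        W = (Ui.T @ U0) / sizes[:, None]
        acc += Ui @ W                       # mapped partition, n x k0
    return acc / len(partitions)

# Toy usage: three partitions of six objects with 3, 2, and 4 clusters.
ensemble = [np.array([0, 0, 1, 1, 2, 2]),
            np.array([0, 0, 0, 1, 1, 1]),
            np.array([0, 1, 1, 2, 2, 3])]
P = cumulative_vote(ensemble, ref=ensemble[0])
consensus = P.argmax(axis=1)                # hard consensus labels
```

Averaging the mapped partitions yields the empirical probability distribution that the paper's second stage then compresses into the final consensus; the argmax above is only a crude stand-in for that agglomerative step.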


Citations
Journal ArticleDOI

A survey of clustering ensemble algorithms

TL;DR: Presents an overview of clustering ensemble methods useful to the community of clustering practitioners, along with a taxonomy of these techniques and an illustration of some important applications.
Journal ArticleDOI

Cluster ensembles

TL;DR: Describes a variety of algorithms that have been proposed to address the cluster ensemble problem, organizing them into conceptual categories that bring out common threads and lessons learned while highlighting unique features of individual approaches.
Journal ArticleDOI

Optimized Data Fusion for Kernel k-Means Clustering

TL;DR: Simulated and real-life data fusion applications are experimentally studied, and the results validate that the proposed algorithm has comparable performance and, moreover, is more efficient on large-scale data sets.
Journal ArticleDOI

K-Means-Based Consensus Clustering: A Unified View

TL;DR: This paper reveals a necessary and sufficient condition for utility functions that work for KCC and investigates some important factors, such as the quality and diversity of the basic partitionings, that may affect the performance of KCC.
Journal ArticleDOI

Spectral Ensemble Clustering via Weighted K-Means: Theoretical and Practical Evidence

TL;DR: This work discloses the theoretical equivalence between SEC and weighted K-means clustering and derives the latent consensus function of SEC, which, to the best of the authors' knowledge, is the first to bridge co-association-matrix-based methods to methods with explicit global objective functions.
References
Journal ArticleDOI

Random Forests

TL;DR: Internal estimates monitor error, strength, and correlation; these are used to show the response to increasing the number of features used in the forest, and they are also applicable to regression.
Book

Elements of information theory

TL;DR: The authors examine the roles of entropy, inequality, and randomness in the design and construction of codes.
Book

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

TL;DR: The authors describe the important ideas in data mining, inference, and prediction within a common conceptual framework; the emphasis is on concepts rather than mathematics, with liberal use of color graphics.
Journal ArticleDOI

Bagging predictors

Leo Breiman
TL;DR: Tests on real and simulated data sets using classification and regression trees and subset selection in linear regression show that bagging can give substantial gains in accuracy.