Topic

Rand index

About: Rand index is a research topic. Over its lifetime, 630 publications have been published on this topic, receiving 20,373 citations.
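For orientation, the Rand index compares two partitions of the same objects by counting object pairs that are treated consistently (grouped together in both partitions, or separated in both); the adjusted Rand index additionally corrects this count for chance agreement. The following is a minimal, illustrative pure-Python sketch of the unadjusted index (not taken from any of the papers listed below):

```python
# Minimal sketch: the Rand index is the fraction of object pairs on which
# two labelings agree (paired together in both, or separated in both).
from itertools import combinations

def rand_index(labels_a, labels_b):
    pairs = list(combinations(range(len(labels_a)), 2))
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)

print(rand_index([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # 10/15 ≈ 0.667
```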


Papers
Journal Article
TL;DR: In this paper, a comprehensive comparative study of a representative set of community detection methods is presented, in which community-oriented topological measures are used to qualify the communities and evaluate their deviation from the reference structure.
Abstract: Community detection is one of the most active fields in complex network analysis, due to its potential value in practical applications. Many works inspired by different paradigms are devoted to the development of algorithmic solutions allowing the network structure in such cohesive subgroups to be revealed. Comparative studies reported in the literature usually rely on a performance measure considering the community structure as a partition (Rand index, normalized mutual information, etc.). However, this type of comparison neglects the topological properties of the communities. In this paper, we present a comprehensive comparative study of a representative set of community detection methods, in which we adopt both types of evaluation. Community-oriented topological measures are used to qualify the communities and evaluate their deviation from the reference structure. In order to mimic real-world systems, we use artificially generated realistic networks. It turns out there is no equivalence between the two approaches: a high performance does not necessarily correspond to correct topological properties, and vice versa. They can therefore be considered as complementary, and we recommend applying both of them in order to perform a complete and accurate assessment.

135 citations
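As a concrete illustration of the partition-based evaluation this abstract refers to (the label vectors below are made up, and this is not the authors' code), both measures are available in scikit-learn once community assignments are written as flat node-to-community label vectors:

```python
# Hypothetical node-to-community labels for a 10-node network.
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

reference = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]  # reference community structure
detected  = [0, 0, 0, 1, 1, 1, 1, 2, 2, 0]  # output of some detection method

print("Adjusted Rand index:", adjusted_rand_score(reference, detected))
print("Normalized mutual info:", normalized_mutual_info_score(reference, detected))
```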

Journal Article
TL;DR: A cluster analysis of real-world financial services data revealed that using the variable-selection heuristic prior to the K-means algorithm resulted in greater cluster stability, indicating the heuristic is extremely effective at eliminating masking variables.
Abstract: One of the most vexing problems in cluster analysis is the selection and/or weighting of variables in order to include those that truly define cluster structure, while eliminating those that might mask such structure. This paper presents a variable-selection heuristic for nonhierarchical (K-means) cluster analysis based on the adjusted Rand index for measuring cluster recovery. The heuristic was subjected to Monte Carlo testing across more than 2200 datasets with known cluster structure. The results indicate the heuristic is extremely effective at eliminating masking variables. A cluster analysis of real-world financial services data revealed that using the variable-selection heuristic prior to the K-means algorithm resulted in greater cluster stability.

131 citations
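A minimal sketch in the spirit of such ARI-based variable screening is shown below. It is not the paper's exact heuristic; the function name, threshold, and synthetic data are assumptions for illustration. The idea is that variables whose single-variable partitions agree (by adjusted Rand index) with those of other variables are likely cluster-defining, while masking variables show low agreement:

```python
# Illustrative sketch only: ARI-based variable screening before K-means.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def screen_variables(X, n_clusters, min_mean_ari=0.2, random_state=0):
    """Keep columns whose single-variable K-means partitions agree,
    on average (by adjusted Rand index), with those of the other columns."""
    n_vars = X.shape[1]
    labels = [
        KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
        .fit_predict(X[:, [j]])
        for j in range(n_vars)
    ]
    keep = []
    for j in range(n_vars):
        agreement = [adjusted_rand_score(labels[j], labels[k])
                     for k in range(n_vars) if k != j]
        if np.mean(agreement) >= min_mean_ari:
            keep.append(j)
    return keep

# Two cluster-defining variables plus one noise ("masking") variable.
rng = np.random.default_rng(0)
informative = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
                         rng.normal(3.0, 0.3, (50, 2))])
X = np.hstack([informative, rng.normal(0.0, 1.0, (100, 1))])
print(screen_variables(X, n_clusters=2))  # typically keeps columns 0 and 1
```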

Journal Article
TL;DR: This paper generalizes many of the classical indices that have been used with outputs of crisp clustering algorithms so that they are applicable for candidate partitions of any type (i.e., crisp or soft, with soft comprising the fuzzy, probabilistic, and possibilistic cases).
Abstract: When clustering produces more than one candidate to partition a finite set of objects O, there are two approaches to validation (i.e., selection of a “best” partition, and implicitly, a best value for c, which is the number of clusters in O). First, we may use an internal index, which evaluates each partition separately. Second, we may compare pairs of candidates with each other, or with a reference partition that purports to represent the “true” cluster structure in the objects. This paper generalizes many of the classical indices that have been used with outputs of crisp clustering algorithms so that they are applicable for candidate partitions of any type (i.e., crisp or soft, with soft comprising the fuzzy, probabilistic, and possibilistic cases). Space prevents inclusion of all of the possible generalizations that can be realized this way. Here, we concentrate on the Rand index and its modifications. We compare our fuzzy-Rand index with those of Campello, Hullermeier and Rifqi, and Brouwer, and show that our extension of the Rand index is O(n), while the other three are all O(n²). Numerical examples are given to illustrate various facets of the new indices. In particular, we show that our indices can be used, even when the partitions are probabilistic or possibilistic, and that our method of generalization is valid for any index that depends only on the entries of the classical (i.e., four-pair types) contingency table for this problem.

123 citations
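One natural way to extend pair-counting indices to soft partitions, sketched below purely as an illustration (an assumed construction, not necessarily the authors' O(n) formulation), is to build a soft contingency table N = Uaᵀ·Ub from the two membership matrices and evaluate the usual adjusted-Rand formula on it; for crisp 0/1 memberships this reduces to the classical adjusted Rand index:

```python
# Hypothetical sketch: soft contingency table fed into the usual ARI formula.
import numpy as np

def soft_adjusted_rand(Ua, Ub):
    """Ua, Ub: (n_objects x n_clusters) membership matrices, rows sum to 1."""
    comb2 = lambda x: x * (x - 1) / 2.0
    N = Ua.T @ Ub                        # soft contingency table
    n = Ua.shape[0]
    sum_ij = comb2(N).sum()
    sum_a = comb2(N.sum(axis=1)).sum()   # row marginals
    sum_b = comb2(N.sum(axis=0)).sum()   # column marginals
    expected = sum_a * sum_b / comb2(n)
    max_index = 0.5 * (sum_a + sum_b)
    return (sum_ij - expected) / (max_index - expected)

# Crisp sanity check: identical hard partitions give an index of 1.
U = np.eye(3)[[0, 0, 1, 1, 2, 2]]
print(soft_adjusted_rand(U, U))          # 1.0
```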

Journal Article
TL;DR: This paper solves the key technical challenge of analytically computing the expected value and variance of generalized IT measures and proposes guidelines for using ARI and AMI as external validation indices.
Abstract: Adjusted-for-chance measures are widely used to compare partitions/clusterings of the same data set. In particular, the Adjusted Rand Index (ARI), based on pair-counting, and the Adjusted Mutual Information (AMI), based on Shannon information theory, are very popular in the clustering community. Nonetheless, it is an open problem which application scenarios best suit each measure, and guidelines in the literature for their usage are sparse, with the result that users often resort to using both. Generalized Information Theoretic (IT) measures based on the Tsallis entropy have been shown to link pair-counting and Shannon IT measures. In this paper, we aim to bridge the gap between adjustment of measures based on pair-counting and measures based on information theory. We solve the key technical challenge of analytically computing the expected value and variance of generalized IT measures. This allows us to propose adjustments of generalized IT measures, which reduce to well-known adjusted clustering comparison measures as special cases. Using the theory of generalized IT measures, we are able to propose the following guidelines for using ARI and AMI as external validation indices: ARI should be used when the reference clustering has large, equal-sized clusters; AMI should be used when the reference clustering is unbalanced and there exist small clusters.

123 citations
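A small illustration of the external-validation usage discussed above (the label vectors are made up, and scikit-learn's ARI/AMI implementations are used here rather than the paper's generalized IT framework):

```python
# Made-up labelings; per the guideline above, AMI is the more appropriate
# index here because the reference clustering is unbalanced with small clusters.
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score

reference = [0] * 16 + [1] * 2 + [2] * 2      # one dominant cluster, two small ones
candidate = [0] * 16 + [1] * 2 + [1] * 2      # merges the two small clusters

print("ARI:", adjusted_rand_score(reference, candidate))
print("AMI:", adjusted_mutual_info_score(reference, candidate))
```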

Journal Article
TL;DR: A comprehensive comparative study of a representative set of community detection methods, in which community-oriented topological measures are used to qualify the communities and evaluate their deviation from the reference structure; it turns out there is no equivalence between the two approaches.
Abstract: Community detection is one of the most active fields in complex network analysis, due to its potential value in practical applications. Many works inspired by different paradigms are devoted to the development of algorithmic solutions allowing the network structure in such cohesive subgroups to be revealed. Comparative studies reported in the literature usually rely on a performance measure considering the community structure as a partition (Rand index, normalized mutual information, etc.). However, this type of comparison neglects the topological properties of the communities. In this article, we present a comprehensive comparative study of a representative set of community detection methods, in which we adopt both types of evaluation. Community-oriented topological measures are used to qualify the communities and evaluate their deviation from the reference structure. In order to mimic real-world systems, we use artificially generated realistic networks. It turns out there is no equivalence between the two approaches: a high performance does not necessarily correspond to correct topological properties, and vice versa. They can therefore be considered as complementary, and we recommend applying both of them in order to perform a complete and accurate assessment.

121 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 83% related
Support vector machine: 73.6K papers, 1.7M citations, 80% related
Feature (computer vision): 128.2K papers, 1.7M citations, 78% related
Deep learning: 79.8K papers, 2.1M citations, 78% related
Feature extraction: 111.8K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year	Papers
2023	8
2022	22
2021	70
2020	64
2019	45
2018	42