
Rand index

About: Rand index is a research topic. Over its lifetime, 630 publications have been published within this topic, receiving 20,373 citations.


Papers
Journal ArticleDOI
TL;DR: The present paper introduces a generalization of the Hubert and Arabie adjusted Rand index, called the Omega index, which can be applied to situations where both, one, or neither of the solutions being compared is non-disjoint.
Abstract: Cluster recovery indices are more important than ever, because of the necessity for comparing the large number of clustering procedures available today. Of the cluster recovery indices prominent in contemporary literature, the Hubert and Arabie (1985) adjustment to the Rand index (1971) has been demonstrated to have the most desirable properties (Milligan & Cooper, 1986). However, use of the Hubert and Arabie adjustment to the Rand index is limited to cluster solutions involving non-overlapping, or disjoint, clusters. The present paper introduces a generalization of the Hubert and Arabie adjusted Rand index. This generalization, called the Omega index, can be applied to situations where both, one, or neither of the solutions being compared is non-disjoint. In the special case where both solutions are disjoint, the Omega index is equivalent to the Hubert and Arabie adjusted Rand index.
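In the disjoint case described above, the Hubert-Arabie adjusted Rand index can be computed directly; a minimal sketch, assuming scikit-learn is available (the labelings below are illustrative, not from the paper):

```python
# Hubert-Arabie adjusted Rand index between two disjoint clusterings,
# assuming scikit-learn is installed; labelings are illustrative.
from sklearn.metrics import adjusted_rand_score

truth = [0, 0, 0, 1, 1, 2]   # one disjoint clustering
found = [0, 0, 1, 1, 1, 2]   # another disjoint clustering of the same objects

print(adjusted_rand_score(truth, truth))  # identical partitions score 1.0
print(adjusted_rand_score(truth, found))  # chance-corrected partial agreement
```

For overlapping (non-disjoint) solutions as handled by the Omega index, a pairwise index like this no longer applies directly, which is the gap the paper addresses.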

154 citations

Journal ArticleDOI
TL;DR: It is shown that one can calculate the Hubert-Arabie adjusted Rand index by first forming the fourfold contingency table counting the number of pairs of objects that were placed in the same cluster in both partitions.
Abstract: It is shown that one can calculate the Hubert-Arabie adjusted Rand index by first forming the fourfold contingency table counting the number of pairs of objects that were placed in the same cluster in both partitions, in the same cluster in one partition but in different clusters in the other partition, and in different clusters in both, and then computing Cohen's κ on this fourfold table.
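The recipe in the abstract can be sketched in a few lines of plain Python: build the fourfold table over object pairs, then apply Cohen's κ to it (the example labelings are illustrative):

```python
# Fourfold pair table + Cohen's kappa, as described in the abstract.
# Labelings are illustrative; kappa on this table equals the adjusted Rand index.
from itertools import combinations

def pair_table(u, v):
    """Counts of object pairs: (same/same, same/diff, diff/same, diff/diff)."""
    a = b = c = d = 0
    for i, j in combinations(range(len(u)), 2):
        same_u, same_v = u[i] == u[j], v[i] == v[j]
        if same_u and same_v:
            a += 1
        elif same_u:
            b += 1
        elif same_v:
            c += 1
        else:
            d += 1
    return a, b, c, d

def cohens_kappa(a, b, c, d):
    """Cohen's kappa on the 2x2 pair table."""
    n = a + b + c + d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (observed - expected) / (1 - expected)

u = [0, 0, 1, 1]
v = [0, 1, 0, 1]
print(cohens_kappa(*pair_table(u, v)))  # -0.5, matching the adjusted Rand index
```

Here the two labelings agree only on pairs separated in both, so the chance-corrected agreement is negative.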

153 citations

Journal ArticleDOI
TL;DR: New criteria for estimating a clustering, which are based on the posterior expected adjusted Rand index, are proposed and are shown to possess a shrinkage property and outperform Binder's loss in a simulation study and in an application to gene expression data.
Abstract: In this paper we address the problem of obtaining a single clustering estimate ĉ based on an MCMC sample of clusterings c(1), c(2), …, c(M) from the posterior distribution of a Bayesian cluster model. Methods to derive ĉ when the number of groups K varies between the clusterings are reviewed and discussed. These include the maximum a posteriori (MAP) estimate and methods based on the posterior similarity matrix, a matrix containing the posterior probabilities that observations i and j are in the same cluster. The posterior similarity matrix is related to a commonly used loss function by Binder (1978). Minimization of the loss is shown to be equivalent to maximizing the Rand index between estimated and true clustering. We propose new criteria for estimating a clustering, which are based on the posterior expected adjusted Rand index. The criteria are shown to possess a shrinkage property and outperform Binder's loss in a simulation study and in an application to gene expression data. They also perform favorably compared to other clustering procedures.
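The posterior similarity matrix central to this approach is straightforward to build from MCMC draws; a small sketch with hypothetical draws (pure Python, not the paper's code):

```python
# Posterior similarity matrix from hypothetical MCMC clustering draws:
# entry (i, j) is the fraction of draws placing observations i and j together.
def posterior_similarity(draws):
    n_draws, n_obs = len(draws), len(draws[0])
    return [[sum(c[i] == c[j] for c in draws) / n_draws
             for j in range(n_obs)]
            for i in range(n_obs)]

draws = [[0, 0, 1],   # draw 1: observations 0 and 1 share a cluster
         [0, 1, 1]]   # draw 2: observations 1 and 2 share a cluster
psm = posterior_similarity(draws)
print(psm[0][1], psm[1][2])  # 0.5 0.5 -- each pairing appears in half the draws
```

The paper's criteria then score candidate clusterings against this matrix via the posterior expected adjusted Rand index.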

145 citations

Journal ArticleDOI
08 Mar 2017-PeerJ
TL;DR: A new binning method, BinSanity, that utilizes the clustering algorithm affinity propagation (AP) to cluster assemblies by coverage, with composition-based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms.
Abstract: Metagenomics has become an integral part of defining microbial diversity in various environments. Many ecosystems have characteristically low biomass and few cultured representatives. Linking potential metabolisms to phylogeny in environmental microorganisms is important for interpreting microbial community functions and the impacts these communities have on geochemical cycles. However, metagenomic studies face the computational hurdle of 'binning' contigs into phylogenetically related units or putative genomes. Binning methods have been implemented with varying approaches such as k-means clustering, Gaussian mixture models, hierarchical clustering, neural networks, and two-way clustering; however, many of these suffer from biases against low coverage/abundance organisms and closely related taxa/strains. We introduce a new binning method, BinSanity, that utilizes the clustering algorithm affinity propagation (AP) to cluster assemblies by coverage, with composition-based refinement (tetranucleotide frequency and percent GC content) to optimize bins containing multiple source organisms. This separation of composition- and coverage-based clustering reduces bias for closely related taxa. BinSanity was developed and tested on artificial metagenomes varying in size and complexity. Results indicate that BinSanity has higher precision, recall, and Adjusted Rand Index compared to five commonly implemented methods. When tested on a previously published environmental metagenome, BinSanity generated high-completion and low-redundancy bins corresponding with the published metagenome-assembled genomes.
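The clustering step alone can be sketched with off-the-shelf affinity propagation; this is not BinSanity itself, and the toy 2-D points merely stand in for contig coverage/composition vectors (assumes scikit-learn is available):

```python
# Affinity propagation on toy points standing in for contig feature vectors;
# a hedged sketch of the clustering step only, not the BinSanity pipeline.
from sklearn.cluster import AffinityPropagation
from sklearn.metrics import adjusted_rand_score

X = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0],    # contigs from a first organism
     [5.0, 5.1], [5.1, 5.0], [5.0, 5.0]]    # contigs from a second organism
truth = [0, 0, 0, 1, 1, 1]

labels = AffinityPropagation(random_state=0).fit(X).labels_
print(adjusted_rand_score(truth, labels))   # 1.0 if the two groups separate cleanly
```

The Adjusted Rand Index used in the paper's benchmark is exactly this kind of comparison between recovered bins and the known source organisms.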

143 citations

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work proposes a new metric called the warping error that tolerates disagreements over boundary location, penalizes topological disagreements, and can be used directly as a cost function for learning boundary detection, in a method that it is called Boundary Learning by Optimization with Topological Constraints (BLOTC).
Abstract: Recent studies have shown that machine learning can improve the accuracy of detecting object boundaries in images. In the standard approach, a boundary detector is trained by minimizing its pixel-level disagreement with human boundary tracings. This naive metric is problematic because it is overly sensitive to boundary locations. This problem is solved by metrics provided with the Berkeley Segmentation Dataset, but these can be insensitive to topological differences, such as gaps in boundaries. Furthermore, the Berkeley metrics have not been useful as cost functions for supervised learning. Using concepts from digital topology, we propose a new metric called the warping error that tolerates disagreements over boundary location, penalizes topological disagreements, and can be used directly as a cost function for learning boundary detection, in a method that we call Boundary Learning by Optimization with Topological Constraints (BLOTC). We trained boundary detectors on electron microscopic images of neurons, using both BLOTC and standard training. BLOTC produced substantially better performance on a 1.2 million pixel test set, as measured by both the warping error and the Rand index evaluated on segmentations generated from the boundary labelings. We also find our approach yields significantly better segmentation performance than either gPb-OWT-UCM or multiscale normalized cut, as well as Boosted Edge Learning trained directly on our data.
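The (unadjusted) Rand index used above to score segmentations follows directly from its definition over object pairs; a minimal sketch with illustrative pixel labelings:

```python
# Plain (unadjusted) Rand index between two labelings, from its definition:
# the fraction of object pairs on which the labelings agree. Labels are
# illustrative stand-ins for flattened segmentation labels.
from itertools import combinations

def rand_index(u, v):
    pairs = list(combinations(range(len(u)), 2))
    agree = sum((u[i] == u[j]) == (v[i] == v[j]) for i, j in pairs)
    return agree / len(pairs)

seg_a = [0, 0, 1, 1]   # e.g. segment labels of four pixels
seg_b = [0, 1, 0, 1]
print(rand_index(seg_a, seg_a))  # 1.0
print(rand_index(seg_a, seg_b))  # 0.333...: agreement on only 2 of 6 pairs
```

Evaluating it on segmentations, as in the paper, amounts to applying this to the per-pixel segment labels of the two segmentations.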

139 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
83% related
Support vector machine
73.6K papers, 1.7M citations
80% related
Feature (computer vision)
128.2K papers, 1.7M citations
78% related
Deep learning
79.8K papers, 2.1M citations
78% related
Feature extraction
111.8K papers, 2.1M citations
78% related
Performance
Metrics
No. of papers in the topic in previous years
Year   Papers
2023   8
2022   22
2021   70
2020   64
2019   45
2018   42