Topic

Dunn index

About: Dunn index is a research topic. Over its lifetime, 150 publications have appeared on this topic, receiving 24,021 citations.


Papers
Proceedings Article
04 Nov 2020
TL;DR: In this paper, an unsupervised learning technique was used to identify subtypes of cancer using gene expression data obtained from CBioportal, which can help in improving the efficacy and reducing the toxicity of the treatments by identifying clues to find target therapeutics.
Abstract: This study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying subtypes of cancer helps in improving the efficacy and reducing the toxicity of treatments by identifying clues to find target therapeutics. The process of gene expression data analysis is described in three steps: preprocessing, clustering, and cluster validation. Gene expression data obtained from CBioportal was analyzed in this research using unsupervised learning techniques. Partitioning around medoids, K-means, and hierarchical clustering techniques with different distance and linkage measures were used in the initial clustering of expression data. After cluster identification, cluster validation was conducted using internal measures such as the silhouette and Dunn indices. Relative measures were used to identify the optimal number of clusters. External validation, such as comparing the classes with clinical variables and visually analyzing the classes using heatmaps, was also conducted. After heatmap filtering, it was identified that the three cluster analysis results have meaningful clusters. The cluster analysis with 3 clusters identified using K-means clustering has significant expression patterns in each cluster.
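For readers unfamiliar with the Dunn index used in the validation step above, here is a minimal sketch; it is not the study's own code. It assumes one common formulation: the smallest Euclidean distance between points in different clusters, divided by the largest cluster diameter (the maximum pairwise distance inside any one cluster). The function name `dunn_index` and the toy points are illustrative assumptions.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def dunn_index(points, labels):
    """Return min between-cluster distance / max within-cluster diameter.

    Assumed formulation: single-linkage separation and complete diameter.
    """
    # Group the points by their cluster label.
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Largest diameter: maximum pairwise distance inside any single cluster.
    max_diameter = max(
        (dist(a, b)
         for members in clusters.values()
         for a, b in combinations(members, 2)),
        default=0.0,  # every cluster is a singleton
    )

    # Smallest separation: minimum distance between points of different clusters.
    min_separation = min(
        dist(a, b)
        for first, second in combinations(clusters.values(), 2)
        for a in first
        for b in second
    )

    if max_diameter == 0.0:
        return float("inf")  # degenerate case: all clusters have zero diameter
    return min_separation / max_diameter


# Toy data: two tight groups, five units apart.
toy_points = [(0, 0), (0, 1), (5, 0), (5, 1)]
print(dunn_index(toy_points, [0, 0, 1, 1]))  # well-separated labeling
print(dunn_index(toy_points, [0, 1, 0, 1]))  # labeling that mixes the groups
```

Higher values indicate compact, well-separated clusters, so the first labeling scores higher than the second. Other definitions of separation and diameter exist and give different values.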
Journal Article
TL;DR: By providing a better treatment of the noise inherent in repeated measurements and taking into account multiple layers of poly(A) site data, PASCCA could be a general tool for clustering and analyzing APA-specific gene expression data.
Abstract: Alternative polyadenylation (APA) has emerged as a pervasive mechanism that contributes to the transcriptome complexity and dynamics of gene regulation. The current tsunami of whole genome poly(A) site data from various conditions generated by 3′ end sequencing provides a valuable data source for the study of APA-related gene expression. Cluster analysis is a powerful technique for investigating the association structure among genes; however, conventional gene clustering methods are not suitable for APA-related data as they fail to consider the information of poly(A) sites (e.g., location, abundance, number, etc.) within each gene or measure the association among poly(A) sites between two genes. Here we proposed a computational framework, named PASCCA, for clustering genes from replicated or unreplicated poly(A) site data using canonical correlation analysis (CCA). PASCCA incorporates multiple layers of gene expression data from both the poly(A) site level and gene level and takes into account the number of replicates and the variability within each experimental group. Moreover, PASCCA characterizes poly(A) sites in various ways including the abundance and relative usage, which can exploit the advantages of 3′ end deep sequencing in quantifying APA sites. Using both real and synthetic poly(A) site data sets, the cluster analysis demonstrates that PASCCA outperforms other widely-used distance measures under five performance metrics including connectivity, the Dunn index, average distance, average distance between means, and the biological homogeneity index. We also used PASCCA to infer APA-specific gene modules from recently published poly(A) site data of rice and discovered some distinct functional gene modules. We have made PASCCA an easy-to-use R package for APA-related gene expression analyses, including the characterization of poly(A) sites, quantification of association between genes, and clustering of genes.
By providing a better treatment of the noise inherent in repeated measurements and taking into account multiple layers of poly(A) site data, PASCCA could be a general tool for clustering and analyzing APA-specific gene expression data. PASCCA could be used to elucidate the dynamic interplay of genes and their APA sites among various biological conditions from emerging 3′ end sequencing data to address the complex biological phenomenon.
Book Chapter
20 Feb 2020
TL;DR: In this article, the authors explore the impact of dimensionality on existing standard data stream clustering algorithms and compare them across stream dimensions using six performance parameters, namely the adjusted Rand index, Dunn index, entropy, F1 measure, purity, and within-cluster sum of squares.
Abstract: Handling stream data is a tedious task. Recently, numerous techniques have been presented for analysing stream data. Stream data clustering is one of the important tasks in stream data mining. A number of application programming interfaces (APIs) are available for implementing stream data clustering. These APIs can handle stream data of any dimension. The objective of the presented paper is to explore the impact of dimensionality on the existing standard data stream clustering algorithms. Selected standard data stream clustering algorithms are compared across stream dimensions using six performance parameters, namely the adjusted Rand index, Dunn index, entropy, F1 measure, purity, and within-cluster sum of squares.
Book Chapter
01 Jan 2020
TL;DR: An algorithm is described for selecting the optimal number of seed points for unknown data, based on two important internal cluster validity indices, namely the Dunn index and the silhouette index; Shannon’s entropy with a distance threshold is used to calculate the position of each seed point.
Abstract: In the present world, clustering is considered to be the most important data mining tool, applied to huge data sets to support future decision-making processes. It is an unsupervised classification technique by which data points are grouped to form homogeneous entities. Cluster analysis is used to find clusters in unlabeled data. The position of the seed points primarily affects the performance of most partitional clustering techniques. The correct number of clusters in a dataset plays an important role in judging the quality of a partitional clustering technique. Selecting the initial seeds of K-means clustering is a critical problem for forming the optimal number of clusters with the benefit of fast stability. In this paper, we describe an algorithm for selecting the optimal number of seed points for unknown data, based on two important internal cluster validity indices, namely the Dunn index and the silhouette index. Here, Shannon’s entropy with a distance threshold is used to calculate the position of each seed point. The algorithm is applied to different datasets, and the results are comparatively better than those of other methods. Moreover, comparisons have been made with other algorithms in terms of different parameters to distinguish the novelty of our proposed method.
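The general idea of choosing the number of clusters with an internal validity index can be sketched as follows. This is only an illustration using the mean silhouette score; it does not reproduce the paper's entropy-based seed selection. The function names `mean_silhouette` and `pick_k`, the Euclidean distance, and the hand-written candidate labelings are all assumptions made for the example.

```python
from math import dist  # Euclidean distance, Python 3.8+


def mean_silhouette(points, labels):
    """Average silhouette score over all points (singleton clusters score 0)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes 0
        # a: mean distance to the other members of the point's own cluster
        # (the distance from the point to itself is 0, so divide by len - 1).
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def pick_k(points, candidates):
    """Return the k whose labeling has the highest mean silhouette.

    `candidates` maps each k to a labeling produced by any clustering method.
    """
    return max(candidates, key=lambda k: mean_silhouette(points, candidates[k]))


# Toy data: three tight pairs; the labelings are written by hand for illustration.
toy_points = [(0, 0), (0, 1), (5, 0), (5, 1), (10, 0), (10, 1)]
toy_candidates = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}
print(pick_k(toy_points, toy_candidates))
```

In practice the candidate labelings would come from running a clustering algorithm such as K-means once per value of k, and the Dunn index could be substituted as the scoring function.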
Journal Article
18 Feb 2021
TL;DR: In this paper, a decision-theoretic rough set-based neighborhood selection process is developed for self-organizing maps, and the results are evaluated in terms of DB index, Dunn index, quantization error, ARI, and NMI.
Abstract: A decision theoretic rough set-based neighborhood selection process is developed for self-organizing maps. While the neighborhood of the winner neuron is selected based on the probability of its associativity to the winner neuron, the selected neighborhood is updated using a new method which combines the probability of its associativity and the Gaussian function. This approach provides better results as compared to self-organizing map and other clustering algorithms on several real-life datasets. The results are evaluated in terms of DB index, Dunn index, quantization error, ARI, and NMI.

Network Information
Related Topics (5)
Feature selection: 41.4K papers, 1M citations, 70% related
Support vector machine: 73.6K papers, 1.7M citations, 69% related
Genetic algorithm: 67.5K papers, 1.2M citations, 68% related
Cluster analysis: 146.5K papers, 2.9M citations, 68% related
Web service: 57.6K papers, 989K citations, 66% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2021    20
2020    28
2019    17
2018    13
2017    10
2016    11