Topic

Dunn index

About: Dunn index is a research topic. Over its lifetime, 150 publications have been published within this topic, receiving 24,021 citations.

Papers

Proceedings Article•DOI•

Analysis of Expression Data Using Unsupervised Techniques

M. A. I. Perera¹, C. R. Wijesinghe¹, A. R. Weerasinghe¹•Institutions (1)

University of Colombo¹

04 Nov 2020

TL;DR: In this paper, unsupervised learning techniques were used to identify subtypes of cancer from gene expression data obtained from CBioportal; identifying subtypes can help improve the efficacy and reduce the toxicity of treatments by providing clues for finding target therapeutics.

Abstract: This study was conducted to review and identify the unsupervised techniques that can be employed to analyze gene expression data in order to identify better subtypes of tumors. Identifying subtypes of cancer helps in improving the efficacy and reducing the toxicity of treatments by providing clues for finding target therapeutics. The process of gene expression data analysis is described in three steps: preprocessing, clustering, and cluster validation. Gene expression data obtained from CBioportal was analyzed in this research using unsupervised learning techniques. Partitioning around medoids, K-means and hierarchical clustering techniques with different distance and linkage measures were used for the initial clustering of the expression data. After cluster identification, cluster validation was conducted using internal measures such as the Silhouette and Dunn indices. Relative measures were used to identify the optimal number of clusters. External validations, such as comparing the classes with clinical variables and visually analyzing the classes using heatmaps, were conducted. After heatmap filtering, it was identified that the three cluster analysis results have meaningful clusters. The cluster analysis with 3 clusters identified using K-means clustering has significant expression patterns in each cluster.
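
As an illustration of the internal validation step mentioned in this abstract, here is a minimal Python sketch of the Dunn index. It assumes one common formulation: the smallest Euclidean distance between two points in different clusters, divided by the largest distance between two points in the same cluster, so that higher values suggest compact, well-separated clusters. The function name `dunn_index` and the toy coordinates are illustrative assumptions and are not taken from the paper, which does not state here which variant or software it used.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available from Python 3.8


def dunn_index(clusters):
    """Return (min inter-cluster distance) / (max intra-cluster diameter).

    `clusters` is a list of clusters; each cluster is a list of
    coordinate tuples. This is a hypothetical helper for illustration.
    """
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")

    # Diameter: largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )

    # Separation: smallest distance between points of different clusters.
    min_separation = min(
        dist(p, q)
        for a, b in combinations(clusters, 2)
        for p in a
        for q in b
    )

    if max_diameter == 0.0:
        # Every cluster is a single point (or identical points).
        return float("inf")
    return min_separation / max_diameter


# Toy data, made up for illustration: two tight, well-separated groups.
toy = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(10.0, 0.0), (10.0, 1.0), (11.0, 0.0)],
]
print(dunn_index(toy))
```

This brute-force version compares every pair of points, so it is only practical for small datasets; other formulations replace the separation or diameter with centroid-based or average distances.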

Journal Article•DOI•

Cluster analysis of replicated alternative polyadenylation data using canonical correlation analysis

Wenbin Ye¹, Yuqi Long¹, Guoli Ji¹, Yaru Su², Pengchao Ye¹, Hongjuan Fu¹, Xiaohui Wu¹•Institutions (2)

Xiamen University¹, Fuzhou University²

22 Jan 2019-BMC Genomics

TL;DR: By providing a better treatment of the noise inherent in repeated measurements and taking into account multiple layers of poly(A) site data, PASCCA could be a general tool for clustering and analyzing APA-specific gene expression data.

Abstract: Alternative polyadenylation (APA) has emerged as a pervasive mechanism that contributes to the transcriptome complexity and dynamics of gene regulation. The current tsunami of whole genome poly(A) site data from various conditions generated by 3′ end sequencing provides a valuable data source for the study of APA-related gene expression. Cluster analysis is a powerful technique for investigating the association structure among genes; however, conventional gene clustering methods are not suitable for APA-related data as they fail to consider the information of poly(A) sites (e.g., location, abundance, number, etc.) within each gene or measure the association among poly(A) sites between two genes. Here we proposed a computational framework, named PASCCA, for clustering genes from replicated or unreplicated poly(A) site data using canonical correlation analysis (CCA). PASCCA incorporates multiple layers of gene expression data from both the poly(A) site level and gene level and takes into account the number of replicates and the variability within each experimental group. Moreover, PASCCA characterizes poly(A) sites in various ways including the abundance and relative usage, which can exploit the advantages of 3′ end deep sequencing in quantifying APA sites. Using both real and synthetic poly(A) site data sets, the cluster analysis demonstrates that PASCCA outperforms other widely-used distance measures under five performance metrics including connectivity, the Dunn index, average distance, average distance between means, and the biological homogeneity index. We also used PASCCA to infer APA-specific gene modules from recently published poly(A) site data of rice and discovered some distinct functional gene modules. We have made PASCCA an easy-to-use R package for APA-related gene expression analyses, including the characterization of poly(A) sites, quantification of association between genes, and clustering of genes.
By providing a better treatment of the noise inherent in repeated measurements and taking into account multiple layers of poly(A) site data, PASCCA could be a general tool for clustering and analyzing APA-specific gene expression data. PASCCA could be used to elucidate the dynamic interplay of genes and their APA sites among various biological conditions from emerging 3′ end sequencing data to address the complex biological phenomenon.

Book Chapter•DOI•

Impact of Dimensionality on the Evaluation of Stream Data Clustering Algorithms

Naresh Kumar Nagwani¹•Institutions (1)

National Institute of Technology, Raipur¹

20 Feb 2020

TL;DR: In this article, the authors explore the impact of dimensionality on the existing standard data stream clustering algorithms and compare them for different stream dimensions using six performance parameters, namely adjusted Rand index, Dunn index, entropy, F1 measure, purity and within-cluster sum of squares.

Abstract: Handling stream data is a tedious task. Recently, numerous techniques have been presented for analysing stream data. Stream data clustering is one of the important tasks in stream data mining. A number of application programming interfaces (APIs) are available for implementing stream data clustering. These APIs can handle stream data of any dimension. The objective of the presented paper is to explore the impact of dimensionality on the existing standard data stream clustering algorithms. Selected standard data stream clustering algorithms are compared for different stream dimensions using six performance parameters, namely adjusted Rand index, Dunn index, entropy, F1 measure, purity and within-cluster sum of squares.

Book Chapter•DOI•

Optimal Number of Seed Point Selection Algorithm of Unknown Dataset.

Kuntal Chowdhury¹, Debasis Chaudhuri², Arup Kumar Pal¹•Institutions (2)

Indian Institutes of Technology¹, Defence Research and Development Organisation²

01 Jan 2020

TL;DR: An algorithm for selecting the optimal number of seed points for an unknown dataset is described, based on two important internal cluster validity indices, namely the Dunn index and the Silhouette index; Shannon’s entropy with a threshold value of distance is used to calculate the position of the seed point.

Abstract: In the present world, clustering is considered to be the most important data mining tool, which is applied to huge data to help futuristic decision-making processes. It is an unsupervised classification technique by which data points are grouped to form homogeneous entities. Cluster analysis is used to find clusters in unlabeled data. The position of the seed points primarily affects the performance of most partitional clustering techniques. The correct number of clusters in a dataset plays an important role in judging the quality of a partitional clustering technique. Selecting the initial seeds of K-means clustering is a critical problem for forming the optimal number of clusters with the benefit of fast stability. In this paper, we describe an algorithm for selecting the optimal number of seed points for an unknown dataset based on two important internal cluster validity indices, namely, the Dunn index and the Silhouette index. Here, Shannon’s entropy with a threshold value of distance has been used to calculate the position of the seed point. The algorithm is applied to different datasets and the results are comparatively better than those of other methods. Moreover, comparisons have been made with other algorithms in terms of different parameters to distinguish the novelty of our proposed method.
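
To make concrete how an internal validity index can rank candidate partitions, here is a minimal Python sketch of the mean silhouette coefficient. It is not the authors' seed-selection algorithm (the entropy-based seeding is not reproduced); it only assumes the standard silhouette definition with Euclidean distance. The function name `silhouette_score` and the toy points are hypothetical and chosen for illustration.

```python
from math import dist  # Euclidean distance, available from Python 3.8
from statistics import mean


def silhouette_score(clusters):
    """Mean silhouette coefficient over all points.

    `clusters` is a list of clusters; each cluster is a list of
    coordinate tuples. Values lie in [-1, 1]; higher is better.
    """
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                # By convention a singleton cluster scores 0.
                scores.append(0.0)
                continue
            # a: mean distance to the other members of p's own cluster
            # (the distance from p to itself contributes 0 to the sum).
            a = sum(dist(p, q) for q in own) / (len(own) - 1)
            # b: smallest mean distance from p to any other cluster.
            b = min(
                mean(dist(p, q) for q in other)
                for j, other in enumerate(clusters)
                if j != i
            )
            denom = max(a, b)
            scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Two candidate partitions of the same four made-up points.
good = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 0.0), (10.0, 1.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 1.0), (10.0, 1.0)]]
print(silhouette_score(good), silhouette_score(bad))
```

In this toy case the partition that groups nearby points together receives the higher score, which is the kind of comparison that lets an index-driven procedure prefer one candidate number of clusters over another.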

Journal Article•DOI•

Decision Theoretic Rough Set-Based Neighborhood for Self-Organizing Map

Shubhra Sankar Ray, Sresht Agrawal, Sudip Ghosh

18 Feb 2021

TL;DR: In this paper, a decision theoretic rough set-based neighborhood selection process is developed for self-organizing maps, and the results are evaluated in terms of DB index, Dunn index, quantization error, ARI, and NMI.

Abstract: A decision theoretic rough set-based neighborhood selection process is developed for self-organizing maps. While the neighborhood of the winner neuron is selected based on the probability of its associativity to the winner neuron, the selected neighborhood is updated using a new method which combines the probability of its associativity and the Gaussian function. This approach provides better results as compared to self-organizing map and other clustering algorithms on several real-life datasets. The results are evaluated in terms of DB index, Dunn index, quantization error, ARI, and NMI.

Metrics

Papers: 150
Citations: 29,671

No. of papers in the topic in previous years
Year	Papers
2021	20
2020	28
2019	17
2018	13
2017	10
2016	11
