Comparative Study of K-Means, Partitioning Around Medoids, Agglomerative Hierarchical, and DIANA Clustering Algorithms by Using Cancer Datasets
Md. Bipul Hossen,Md. Rabiul Auwul +1 more
- Vol. 5, Iss: 1, pp 20
TLDR
This study provides practical evaluation frameworks for assessing clustering results on gene expression cancer datasets and determines that, among these four clustering algorithms, PAM is best for the Affymetrix datasets and DIANA is best for the cDNA datasets.
Abstract:
Clustering plays a fundamental role in exploring data, making predictions, and overcoming anomalies in the data. Iterative algorithms group observations that share similar characteristics into clusters. As the volume of real-world data grows day by day, the challenge of perceiving and interpreting the resulting mass of data, which often consists of millions of measurements, is compounded by the intricacy of the huge number of genes in biological networks. To address this challenge, we use clustering algorithms. In this study, we compare four of the most popular clustering algorithms, K-Means, PAM, Agglomerative Hierarchical, and DIANA, evaluating them on eight real cancer gene expression datasets (four Affymetrix and four cDNA) and on a simulated dataset. The comparison is based on seven popular cluster validity indices: Average Silhouette Index, Corrected Rand Index, Variation of Information, Dunn Index, Calinski-Harabasz Index, Separation Index, and Pearson Gamma. We determine that, among these four algorithms, PAM performs best on the Affymetrix datasets and DIANA performs best on the cDNA datasets. This study provides a practical evaluation framework for assessing clustering results on gene expression cancer datasets.
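To make the evaluation idea concrete, the following is a minimal sketch of three of the seven validity indices named in the abstract (Average Silhouette Index, Dunn Index, and Corrected Rand Index), written in plain Python. The abstract does not state which software the authors used, so the function names, the Euclidean distance choice, and the six-point toy dataset here are all illustrative assumptions, not the study's implementation.

```python
from collections import Counter
from itertools import combinations
from math import comb, dist


def average_silhouette(points, labels):
    """Mean silhouette width; assumes at least two clusters. Higher is better."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:  # singleton cluster: silhouette width is taken as 0
            continue
        # a: mean distance to the point's own cluster
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by largest cluster diameter.

    Assumes at least two clusters and at least one cluster with two points.
    """
    pairs = list(combinations(zip(points, labels), 2))
    separation = min(dist(p, q) for (p, lp), (q, lq) in pairs if lp != lq)
    diameter = max(dist(p, q) for (p, lp), (q, lq) in pairs if lp == lq)
    return separation / diameter


def corrected_rand(labels_true, labels_pred):
    """Chance-corrected (adjusted) Rand index between two partitions."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
truth = [0, 0, 0, 1, 1, 1]
good = [0, 0, 0, 1, 1, 1]  # a partition that recovers the two groups
bad = [0, 1, 0, 1, 0, 1]   # a partition that mixes the two groups

for name, labels in [("good", good), ("bad", bad)]:
    print(
        name,
        round(average_silhouette(points, labels), 3),
        round(dunn_index(points, labels), 3),
        round(corrected_rand(truth, labels), 3),
    )
```

The silhouette and Dunn indices are internal measures that need only the data and the partition, whereas the Corrected Rand Index is external and needs known class labels, which is why a comparison of this kind typically reports both types.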
Citations
Optimizing the Division of Study Class Groups Using the Partitioning Around Medoids (PAM) Method
TL;DR: The Partitioning Around Medoids (PAM) method has succeeded in optimizing class grouping by calculating the closest distance between the achievement and intelligence of each student.
Journal Article
On the Selection of Appropriate Proximity Measurement for Gene Expression Data
TL;DR: This paper investigates the appropriate proximity measurement for gene expression data and provides a comparative study of five proximity measures: Euclidean distance, Manhattan distance, Pearson correlation, Spearman correlation, Cosine distance and Silhouette Index.
Proceedings Article
Computer Network Information Security Threat Identification Technology Based on Big Data Clustering Algorithm
TL;DR: In this article, a big data clustering algorithm is proposed to address the shortcomings of existing research on computer network information security threat identification technology. Building on a discussion of big data clustering algorithms and computer network information security, the article briefly introduces the configuration of the dataset and the experimental environment.
Proceedings Article
New Approach of Covid-19 Prevention by Implemented Combination of Decision Support System Algorithm
Eddy Soeryanto Soegoto,Yeffry Handoko Putra,Rahma Wahdiniwaty,Zuriani Ahmad Zukarnain,Noorihan Abdul Rahman +4 more
TL;DR: In this article, three algorithms are used: the K-Means algorithm for pattern clustering, the AHP algorithm for determining the level of the Covid-19 pandemic, and the Naïve Bayes algorithm for relating candidate symptom pairs to Covid-19 transmission.
References
Some methods for classification and analysis of multivariate observations
TL;DR: The k-means process described in this paper partitions an N-dimensional population into k sets on the basis of a sample. The k-means concept generalizes the ordinary sample mean, and the process is shown to give partitions that are reasonably efficient in the sense of within-class variance.
Journal Article
Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.
Todd R. Golub,Donna K. Slonim,Pablo Tamayo,Christine Huard,Michelle Gaasenbeek,Jill P. Mesirov,Hilary A. Coller,Mignon L. Loh,James R. Downing,Michael A. Caligiuri,Clara D. Bloomfield,Eric S. Lander +12 more
TL;DR: A generic approach to cancer classification based on gene expression monitoring by DNA microarrays is described and applied to human acute leukemias as a test case and suggests a general strategy for discovering and predicting cancer classes for other types of cancer, independent of previous biological knowledge.
Journal Article
Quantitative monitoring of gene expression patterns with a complementary DNA microarray.
TL;DR: A high-capacity system was developed to monitor the expression of many genes in parallel by means of simultaneous, two-color fluorescence hybridization, which enabled detection of rare transcripts in probe mixtures derived from 2 micrograms of total cellular messenger RNA.