Comparative Study of K-Means, Partitioning Around Medoids, Agglomerative Hierarchical, and DIANA Clustering Algorithms by Using Cancer Datasets

doi:10.11648/J.BSI.20200501.14

Open Access | Journal Article

- Vol. 5, Iss: 1, pp 20

TLDR

This study provides a practical evaluation framework for assessing clustering results on gene expression cancer datasets and finds that, among the four algorithms compared, PAM performs best on the Affymetrix datasets and DIANA performs best on the cDNA datasets.

Abstract:

Clustering plays a fundamental role in exploring data, making predictions, and handling anomalies in the data. Observations that share similar characteristics are grouped into clusters using iterative algorithms. As the volume of real-world data grows day by day, so do the challenges of perceiving and interpreting it: such data often consist of millions of measurements, and the difficulty is compounded by the complexity of biological networks involving huge numbers of genes. To address this challenge, we use clustering algorithms. In this study, we compare four of the most popular clustering algorithms: K-Means, PAM, Agglomerative Hierarchical, and DIANA. These are evaluated on eight real cancer gene expression datasets (four Affymetrix and four cDNA) and on a simulated dataset. The comparison is based on seven popular cluster validity indices: Average Silhouette Index, Corrected Rand Index, Variation of Information, Dunn Index, Calinski-Harabasz Index, Separation Index, and Pearson Gamma. We determine that, among these four clustering algorithms, PAM is best for the Affymetrix datasets and DIANA is best for the cDNA datasets. This study provides a practical evaluation framework for assessing clustering results on gene expression cancer datasets.
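To make the idea of a cluster validity index concrete, the following is a minimal sketch of two of the indices named in the abstract, the Average Silhouette Index and the Dunn Index, written in plain Python. It is not the authors' implementation. The function names, the toy two-dimensional points, and the choice of Euclidean distance are all assumptions made for illustration; the study itself works on gene expression data and may use different proximity measures.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_silhouette(points, labels):
    """Mean silhouette width over all points; singleton clusters score 0."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    total = 0.0
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            continue  # singleton cluster: s(i) = 0 by convention
        # a: mean distance to the other members of the point's own cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def dunn_index(points, labels):
    """Smallest between-cluster distance over largest within-cluster distance."""
    min_between = math.inf
    max_within = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = euclidean(points[i], points[j])
            if labels[i] == labels[j]:
                max_within = max(max_within, d)
            else:
                min_between = min(min_between, d)
    if max_within == 0.0:
        raise ValueError("Dunn index undefined when every cluster has diameter 0")
    return min_between / max_within


# Hypothetical toy data: two well-separated groups of three points each.
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # deliberately mixes the two groups

if __name__ == "__main__":
    for name, labels in [("good", good_labels), ("bad", bad_labels)]:
        print(name,
              round(average_silhouette(toy_points, labels), 3),
              round(dunn_index(toy_points, labels), 3))
```

Both indices are internal: they score a partition using only the data and the labels, with higher values indicating more compact and better separated clusters. In this sketch the partition that matches the two visible groups scores higher on both indices than the mixed one, which is the kind of comparison a validity index is meant to support when ranking the output of different clustering algorithms.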

Citations

Optimizing the Division of Study Class Groups Using the Partitioning Around Medoids (PAM) Method

On the Selection of Appropriate Proximity Measurement for Gene Expression Data

Computer Network Information Security Threat Identification Technology Based on Big Data Clustering Algorithm

New Approach of Covid-19 Prevention by Implemented Combination of Decision Support System Algorithm

References

Some methods for classification and analysis of multivariate observations

Molecular classification of cancer: class discovery and class prediction by gene expression monitoring.

Quantitative monitoring of gene expression patterns with a complementary DNA microarray.

Algorithms for clustering data

Related Papers (5)

A comparison study of clustering validity indices

Performance of an Ensemble Clustering Algorithm on Biological Data Sets

A Hierarchical Clustering Algorithm Based on Silhouette Index for Cancer Subtype Discovery from Omics Data

An incremental clustering of gene expression data

Shared farthest neighbor approach to clustering of high dimensionality, low cardinality data