Open Access Journal Article

Clustering Algorithms: Their Application to Gene Expression Data

Abstract
Gene expression data hide vital information needed to understand the biological processes that take place in a particular organism in relation to its environment. Deciphering the hidden patterns in gene expression data offers a powerful way to strengthen our understanding of functional genomics. The complexity of biological networks and the large number of genes involved make the resulting mass of data, which consists of millions of measurements, difficult to comprehend and interpret; these data also exhibit vagueness, imprecision, and noise. Clustering is therefore an essential first step toward addressing these challenges: within the data mining process, it reveals natural structures and identifies interesting patterns in the underlying data. Clustering of gene expression data has proven useful for revealing the natural structure inherent in the data, understanding gene functions, cellular processes, and cell subtypes, mining useful information from noisy data, and understanding gene regulation. Another benefit of clustering gene expression data is the identification of homology, which is very important in vaccine design. This review examines the various clustering algorithms applicable to gene expression data in order to discover and provide useful knowledge about which clustering technique will guarantee stability and a high degree of accuracy in the analysis.

Citations
Journal Article

SGAClust: Semi-supervised Graph Attraction Clustering of gene expression data

TL;DR: A graph-theoretic clustering algorithm called GAClust is proposed that groups co-expressed genes into the same cluster while also detecting noise genes; SGAClust was found to outperform the unsupervised algorithms.
Dissertation

Application and comparison of four clustering models for GTEx data

TL;DR: This master's thesis (TFM) implements all four clustering models, searches for the optimal number of clusters using statistical criteria such as BIC, AIC, and the elbow method, and finally visualizes the resulting partitions and compares them using external validation metrics.
Journal Article

Mapping the Complex Transcriptional Landscape of the Phytopathogenic Bacterium Dickeya dadantii

TL;DR: In this paper, a comprehensive, annotated transcriptomic map of D. dadantii was obtained with a computational method combining five independent transcriptomic data sets: (i) paired-end RNA sequencing (RNA-seq) data for a precise reconstruction of the RNA landscape; (ii) DNA microarray data providing transcriptional responses to a broad variety of environmental conditions; (iii) long-read Nanopore native RNA-seq data for isoform-level transcriptome validation and determination of transcription termination sites; (iv) differential RNA sequencing (dRNA-seq) data for the precise mapping of transcription start sites.

Archetypal solution spaces for clustering gene expression datasets in identification of cancer subtypes

TL;DR: In this paper, the authors used energy landscape theory to determine the organization of the solution space for a variety of gene expression datasets using the K-means clustering algorithm.
Posted Content

Multiple latent clusterisation model for the inference of RNA life-cycle kinetic rates from sequencing data

TL;DR: A widespread, coordinated ("choral") regulation of the three kinetic rates is uncovered in murine fibroblasts in response to activation of the proto-oncogene MYC, which was not previously observed in this biological system.
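Two of the citing works above rely on K-means. The GTEx thesis chooses the number of clusters with criteria such as the elbow method, and the archetypal-solution-space paper studies how K-means solutions are organized. The sketch below illustrates both ideas under simple assumptions:

- plain Lloyd (batch) iterations seeded with randomly chosen data points; this is a common way to compute k-means, whereas MacQueen's original procedure updates the means one point at a time;
- a made-up 2-D data set;
- hypothetical helper names (`lloyd_kmeans`, `elbow_curve`).

For each k, the sketch keeps the lowest within-cluster sum of squares (SSE) over several random restarts; that is the quantity plotted in an elbow plot. It also counts how many distinct final SSE values the restarts reached, as a crude hint that Lloyd's iterations stop in local minima. This count is not the energy-landscape analysis of that paper.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd_kmeans(points, k, rng, max_iter=100):
    """One run of Lloyd's k-means, seeded with k randomly chosen data points.

    Returns (labels, centroids, sse); sse is the within-cluster sum of
    squared distances of each point to its assigned centroid."""
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
        converged = new_labels == labels
        labels = new_labels
        if converged:
            break
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse

def elbow_curve(points, k_values, restarts=20, seed=0):
    """For each k: (lowest SSE over random restarts, number of distinct SSEs).

    The second value is a crude hint of how many different local minima
    the restarts ended in; it is not a full solution-space analysis."""
    rng = random.Random(seed)
    curve = {}
    for k in k_values:
        sses = [lloyd_kmeans(points, k, rng)[2] for _ in range(restarts)]
        curve[k] = (min(sses), len({round(s, 6) for s in sses}))
    return curve

# Made-up 2-D toy data with three tight, well-separated groups; real
# expression profiles would have one coordinate per condition or sample.
data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
        (5.0, 5.0), (5.2, 4.9), (4.9, 5.1),
        (0.0, 5.0), (0.1, 5.2), (-0.2, 4.9)]

if __name__ == "__main__":
    for k, (best_sse, n_distinct) in elbow_curve(data, range(1, 6)).items():
        print(f"k={k}: best SSE={best_sse:.3f}, distinct solutions={n_distinct}")
```

On this toy data, the best SSE should drop steeply from k = 1 to k = 3 and only marginally afterwards, which places the elbow at k = 3. BIC and AIC, also mentioned above, need a likelihood (for example from a Gaussian mixture model) and are not shown.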
References
Journal Article

Scikit-learn: Machine Learning in Python

TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.

Some methods for classification and analysis of multivariate observations

TL;DR: The k-means procedure described in this paper, a generalization of the ordinary sample mean, partitions an N-dimensional population into k sets on the basis of a sample; it is shown to give partitions that are reasonably efficient in the sense of within-class variance.
Proceedings Article

A density-based algorithm for discovering clusters in large spatial databases with noise

TL;DR: In this paper, a density-based notion of clusters and the algorithm DBSCAN are proposed to discover clusters of arbitrary shape. DBSCAN can be used for class identification in large spatial databases and is shown to be more efficient than the well-known algorithm CLARANS.
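The density-based reference above grows clusters from "core" points, which have at least `min_pts` points (here counting themselves) within radius `eps`. Points that no core point can reach are labelled as noise, which is relevant to the noisy expression data described in the abstract. The sketch below is a minimal version of that procedure, built on the following assumptions:

- Euclidean distance;
- a brute-force O(n²) neighbour search rather than a spatial index;
- made-up parameter values and toy points;
- illustrative names (`region_query`, `dbscan`, `NOISE`) that are not the paper's code.

```python
import math

NOISE = -1  # label for points that belong to no cluster

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE.

    A point is a core point if its eps-neighbourhood (itself included)
    holds at least min_pts points; clusters grow outward from core points."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later turn out to be a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = list(neighbours)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, not core
            if labels[j] is not None:
                continue  # already assigned; do not expand again
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:  # j is core: keep expanding
                seeds.extend(j_neighbours)
    return labels

# Made-up toy data: a border point, two dense groups, and one outlier.
data = [(1.35, 1.0),
        (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
        (4.0, 4.0), (4.1, 4.0), (4.0, 4.1), (4.1, 4.1),
        (9.0, 0.0)]

if __name__ == "__main__":
    print(dbscan(data, eps=0.3, min_pts=4))  # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The border point at index 0 has only three points within `eps`, so it is first marked as noise. It is relabelled when cluster 0 expands through its core neighbours. The isolated last point stays noise.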
Trending Questions
What are applications of clustering algorithms?

Applications of clustering algorithms include revealing natural structures in gene expression data, understanding gene functions, identifying cell subtypes, mining information from noisy data, and aiding in vaccine design.