Computational cluster validation in post-genomic data analysis

doi:10.1093/BIOINFORMATICS/BTI517

Open Access, Journal Article

Julia Handl, +2 more

01 Aug 2005

Bioinformatics

Vol. 21, Iss. 15, pp. 3201-3212

TLDR

This article reviews cluster validation techniques, with a particular focus on their application to post-genomic data analysis.

Abstract:

Motivation: The discovery of novel biological knowledge from the ab initio analysis of post-genomic data relies upon the use of unsupervised processing methods, in particular clustering techniques. Much recent research in bioinformatics has therefore been focused on the transfer of clustering methods introduced in other scientific fields and on the development of novel algorithms specifically designed to tackle the challenges posed by post-genomic data. The partitions returned by a clustering algorithm are commonly validated using visual inspection and concordance with prior biological knowledge; whether the clusters actually correspond to the real structure in the data is somewhat less frequently considered. Suitable computational cluster validation techniques are available in the general data-mining literature, but have been given only a fraction of the same attention in bioinformatics.

Results: This review paper aims to familiarize the reader with the battery of techniques available for the validation of clustering results, with a particular focus on their application to post-genomic data analysis. Synthetic and real biological datasets are used to demonstrate the benefits, and also some of the perils, of analytical cluster validation.

Availability: The software used in the experiments is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/

Contact: J.Handl@postgrad.manchester.ac.uk

Supplementary information: Enlarged colour plots are provided in the Supplementary Material, which is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/
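As an illustration of what a computational check on "whether the clusters actually correspond to the real structure in the data" can look like, the sketch below computes the silhouette width, one widely used internal validation measure. It is not taken from the authors' software. The function names, the toy points and the two labelings are hypothetical, and Euclidean distance is an assumption made for simplicity.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def silhouette_widths(points, labels):
    """Return the silhouette width s(i) for every point.

    s(i) = (b - a) / max(a, b), where a is the mean distance from point i to
    the other members of its own cluster and b is the smallest mean distance
    from point i to the members of any other cluster.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # By convention, a point in a singleton cluster scores 0.
            widths.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return widths


def mean_silhouette(points, labels):
    """Average silhouette width over all points (higher is better)."""
    widths = silhouette_widths(points, labels)
    return sum(widths) / len(widths)


# Toy data (hypothetical): two well-separated groups in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = [0, 0, 0, 1, 1, 1]  # partition that matches the visible groups
bad = [0, 1, 0, 1, 0, 1]   # partition that cuts across them

print(mean_silhouette(points, good))
print(mean_silhouette(points, bad))
```

On this toy data the partition that follows the two groups scores close to 1, while the one that cuts across them scores much lower. An internal measure like this needs no prior biological knowledge, but it does depend on the chosen distance function.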

Citations

The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups

A systematic comparison and evaluation of biclustering methods for gene expression data

Computational Analysis of Microarray Data

Is my network module preserved and reproducible

Statistical strategies for avoiding false discoveries in metabolomics and related experiments

References

An introduction to the bootstrap

Some methods for classification and analysis of multivariate observations

Pattern Classification

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

A density-based algorithm for discovering clusters in large spatial databases with noise

Related Papers (5)

Silhouettes: a graphical aid to the interpretation and validation of cluster analysis

Cluster analysis and display of genome-wide expression patterns

Finding Groups in Data: An Introduction to Cluster Analysis

Data clustering: a review

Algorithms for clustering data