Open Access journal article

Computational cluster validation in post-genomic data analysis

Julia Handl et al.
Published 1 August 2005; Vol. 21, Issue 15, pp. 3201-3212
TL;DR: The authors review computational techniques for validating clustering results, with a particular focus on their application to the post-genomic analysis of biological data.
Abstract
Motivation: The discovery of novel biological knowledge from the ab initio analysis of post-genomic data relies on unsupervised processing methods, in particular clustering techniques. Much recent research in bioinformatics has therefore focused on transferring clustering methods introduced in other scientific fields and on developing novel algorithms specifically designed to tackle the challenges posed by post-genomic data. The partitions returned by a clustering algorithm are commonly validated by visual inspection and by concordance with prior biological knowledge; whether the clusters actually correspond to the real structure in the data is considered less often. Suitable computational cluster validation techniques are available in the general data-mining literature, but have received comparatively little attention in bioinformatics.

Results: This review aims to familiarize the reader with the battery of techniques available for validating clustering results, with a particular focus on their application to post-genomic data analysis. Synthetic and real biological datasets are used to demonstrate the benefits, and also some of the perils, of analytical cluster validation.

Availability: The software used in the experiments is available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/

Contact: J.Handl@postgrad.manchester.ac.uk

Supplementary information: Enlarged colour plots are provided in the Supplementary Material, available at http://dbkweb.ch.umist.ac.uk/handl/clustervalidation/
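The abstract separates two questions: whether a partition agrees with prior biological knowledge (external validation) and whether it reflects real structure in the data (internal validation). As a minimal sketch, assuming Euclidean distance and plain Python 3.8+ (for `math.dist`), the code below implements one widely used index of each kind: the silhouette width (internal) and the adjusted Rand index (external). The function names are hypothetical and are not taken from the authors' software at the URL above, and this abstract does not name the specific indices the review evaluates.

```python
import math
from collections import defaultdict

# Hypothetical helper names; Euclidean distance is an assumption of this sketch.


def silhouette_width(points, labels):
    """Internal validation: mean silhouette width of a partition.

    Ranges from -1 to 1; higher values mean points sit closer to their own
    cluster than to the nearest other cluster. Points in singleton clusters
    get s(i) = 0 (the usual convention).
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # singleton: s(i) = 0 adds nothing to the sum
        # a(i): mean distance to the other members of i's own cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, points[j]) for j in members) / len(members)
            for lab, members in clusters.items()
            if lab != labels[i]
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


def adjusted_rand_index(labels_a, labels_b):
    """External validation: adjusted Rand index between two partitions.

    Equals 1 for identical partitions (up to relabelling) and is close to 0
    for chance-level agreement; it can be negative.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two labelings of the same n >= 2 items")

    def pairs(k):
        return k * (k - 1) / 2

    # Contingency table: number of items in cluster x of A and cluster y of B
    table = defaultdict(int)
    for x, y in zip(labels_a, labels_b):
        table[(x, y)] += 1
    rows, cols = defaultdict(int), defaultdict(int)
    for (x, y), count in table.items():
        rows[x] += count
        cols[y] += count

    index = sum(pairs(c) for c in table.values())
    sum_rows = sum(pairs(c) for c in rows.values())
    sum_cols = sum(pairs(c) for c in cols.values())
    expected = sum_rows * sum_cols / pairs(len(labels_a))
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single cluster
    return (index - expected) / (maximum - expected)


# Usage: two well-separated groups, scored against a reference labelling
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
found = [0, 0, 0, 1, 1, 1]
reference = [1, 1, 1, 0, 0, 0]  # same grouping, different label names
print(round(silhouette_width(points, found), 2))
print(adjusted_rand_index(found, reference))  # 1.0
```

Neither index substitutes for the other: a high silhouette width says the clusters are compact and well separated under the chosen distance, not that they are biologically meaningful, while the adjusted Rand index is only as informative as the reference labelling it is compared against.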


Citations
Journal article

A systematic comparison and evaluation of biclustering methods for gene expression data

TL;DR: Proposes a methodology for comparing and validating biclustering methods, including a simple binary reference model that captures the essential features of most biclustering approaches, and a fast divide-and-conquer algorithm (Bimax).
Journal article

Is my network module preserved and reproducible?

TL;DR: This work studies several types of network preservation statistics that do not require a module assignment in the test network, and finds that several human cortical modules are less preserved in chimpanzees.
Journal article

Statistical strategies for avoiding false discoveries in metabolomics and related experiments

TL;DR: Provides a list of some of the simpler checks that might improve one's confidence that a candidate biomarker is not simply a statistical artefact, and suggests a series of preferred tests and visualisation tools that can help readers and authors assess papers.