Judging the Quality of Gene Expression-Based Clustering Methods Using Gene Annotation
TLDR
It is concluded that enrichment of clusters for biological function is, in general, highest at rather low cluster numbers, and that no method outperforms Euclidean distance for ratio-based measurements, or Pearson distance for non-ratio-based measurements, at the optimal choice of cluster number.
Abstract:
We compare several commonly used expression-based gene clustering algorithms using a figure of merit based on the mutual information between cluster membership and known gene attributes. By studying various publicly available expression data sets we conclude that enrichment of clusters for biological function is, in general, highest at rather low cluster numbers. As a measure of dissimilarity between the expression patterns of two genes, no method outperforms Euclidean distance for ratio-based measurements, or Pearson distance for non-ratio-based measurements at the optimal choice of cluster number. We show the self-organized-map approach to be best for both measurement types at higher numbers of clusters. Clusters of genes derived from single- and average-linkage hierarchical clustering tend to produce worse-than-random results.
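The mutual-information figure of merit can be sketched roughly as follows. This sketch makes several assumptions. Each annotation (for example, a functional category) is encoded as a binary attribute per gene. The score is the sum, over attributes, of the mutual information between cluster label and attribute. The helper names (`figure_of_merit`, `z_score_vs_random`) and the permutation baseline are illustrative, not the authors' published implementation. Shuffling cluster labels while keeping cluster sizes fixed is one way to make "worse-than-random" concrete.

```python
import math
import random
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired observations."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def figure_of_merit(clusters: Sequence[int], attributes: Sequence[Sequence[int]]) -> float:
    """Hypothetical score: sum over binary attributes of I(cluster; attribute)."""
    return sum(mutual_information(clusters, attr) for attr in attributes)


def z_score_vs_random(clusters: Sequence[int], attributes: Sequence[Sequence[int]],
                      n_shuffles: int = 200, seed: int = 0) -> float:
    """Compare the observed score with scores of randomly permuted cluster labels.

    Permuting labels keeps every cluster's size fixed, so the null model
    differs from the real clustering only in which genes land together.
    """
    rng = random.Random(seed)
    observed = figure_of_merit(clusters, attributes)
    shuffled = list(clusters)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null_scores.append(figure_of_merit(shuffled, attributes))
    mean = sum(null_scores) / n_shuffles
    sd = math.sqrt(sum((s - mean) ** 2 for s in null_scores) / (n_shuffles - 1))
    if sd == 0:
        return 0.0  # no variation under the null; the z-score is undefined
    return (observed - mean) / sd


# Toy data: 8 genes in 2 clusters; attribute 0 matches the clusters, attribute 1 does not.
clusters = [0, 0, 0, 0, 1, 1, 1, 1]
attributes = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
]
print(figure_of_merit(clusters, attributes))        # 1.0
print(z_score_vs_random(clusters, attributes) > 0)  # True
```

A positive z-score means the clustering shares more information with the annotations than same-sized random clusters do. A negative one corresponds to the worse-than-random behaviour reported for single- and average-linkage clusters.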
[The algorithm described is available at http://llama.med.harvard.edu, under Software.]
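The two dissimilarity measures compared in the abstract can be illustrated with a short Python sketch. Pearson distance is written here as 1 - r, where r is the Pearson correlation coefficient. This is a common convention, assumed here rather than taken from the paper. The expression profiles are made up. They show that Pearson distance treats two genes with the same shape but different overall levels as identical, while Euclidean distance keeps them far apart. One plausible reading is that this scale invariance helps for non-ratio (absolute-intensity) measurements, whereas ratio-based data are already expressed relative to a reference. That rationale is an interpretation, not a claim from the source.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Straight-line distance; sensitive to the magnitude of each measurement."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """1 - Pearson correlation; ignores per-gene offset and positive scale, ranges 0..2."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    ss_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if ss_x == 0 or ss_y == 0:
        raise ValueError("Pearson distance is undefined for a constant profile")
    return 1.0 - cov / (ss_x * ss_y)


# Hypothetical expression profiles across four conditions.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same shape as gene_a, ten times the level
gene_c = [4.0, 3.0, 2.0, 1.0]      # mirror image of gene_a

print(euclidean_distance(gene_a, gene_b))  # about 49.3: far apart in magnitude
print(pearson_distance(gene_a, gene_b))    # about 0: identical shape
print(pearson_distance(gene_a, gene_c))    # about 2: perfectly anti-correlated
```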
Citations
Journal Article
Gene regulatory network inference: Data integration in dynamic models—A review
TL;DR: This review deals with the reconstruction of gene regulatory networks (GRNs) from experimental data through computational methods, and discusses approaches that enable modelling of the dynamics of gene regulatory systems.
Journal Article
Mapping the backbone of science
TL;DR: A new map representing the structure of all of science, based on journal articles and covering both the natural and social sciences, is presented; biochemistry appears as the most interdisciplinary discipline in science.
Journal Article
clValid: An R Package for Cluster Validation
TL;DR: The R package clValid contains functions for validating the results of a clustering analysis; the user can choose from nine clustering algorithms in existing R packages, including hierarchical clustering, K-means, and self-organizing maps (SOM).