Model-based clustering and data transformations for gene expression data.

doi:10.1093/BIOINFORMATICS/17.10.977

Ka Yee Yeung, +4 more

- 01 Oct 2001 -

Bioinformatics

- Vol. 17, Iss: 10, pp 977-987

TLDR

The model-based approach has superior performance on synthetic data sets, consistently selecting the correct model and the number of clusters, and the validity of the Gaussian mixture assumption on different transformations of real data is explored.

Abstract:

Motivation: Clustering is a useful exploratory technique for the analysis of gene expression data. Many different heuristic clustering algorithms have been proposed in this context. Clustering algorithms based on probability models offer a principled alternative to heuristic algorithms. In particular, model-based clustering assumes that the data is generated by a finite mixture of underlying probability distributions such as multivariate normal distributions. The issues of selecting a ‘good’ clustering method and determining the ‘correct’ number of clusters are reduced to model selection problems in the probability framework. Gaussian mixture models have been shown to be a powerful tool for clustering in many applications.

Results: We benchmarked the performance of model-based clustering on several synthetic and real gene expression data sets for which external evaluation criteria were available. The model-based approach has superior performance on our synthetic data sets, consistently selecting the correct model and the number of clusters. On real expression data, the model-based approach produced clusters of quality comparable to a leading heuristic clustering algorithm, but with the key advantage of suggesting the number of clusters and an appropriate model. We also explored the validity of the Gaussian mixture assumption on different transformations of real data. We also assessed the degree to which these real gene expression data sets fit multivariate Gaussian distributions both before and after subjecting them to commonly used data transformations. Suitably chosen transformations seem to result in reasonable fits.

Availability: MCLUST is available at http://www.stat.washington.edu/fraley/mclust. The software for the diagonal model is under development.
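The idea that choosing the number of clusters becomes a model selection problem can be illustrated with a small sketch. The code below is not MCLUST and not the authors' implementation: it is a minimal one-dimensional Gaussian mixture fitted by EM in plain Python, scored with the Bayesian Information Criterion (BIC) of Schwarz (1978), which appears in the reference list. The abstract does not name the selection criterion, so the use of BIC here is an assumption, as are the sign convention (larger is better), the toy data, and all function names (`fit_gmm_1d`, `bic`).

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of a univariate normal distribution at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def point_log_terms(x, weights, means, variances):
    """Per-component values of log(weight_j) + log N(x | mean_j, var_j)."""
    return [math.log(w) + log_normal_pdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm_1d(data, k, n_iter=100, var_floor=1e-3):
    """Fit a k-component 1-D Gaussian mixture by EM.

    Returns (weights, means, variances, log_likelihood).
    Hypothetical helper for illustration; requires len(data) >= k.
    """
    xs = sorted(data)
    n = len(xs)
    # Deterministic start: split the sorted data into k contiguous chunks.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    variances = [max(sum((x - m) ** 2 for x in c) / len(c), var_floor)
                 for c, m in zip(chunks, means)]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            terms = point_log_terms(x, weights, means, variances)
            norm = log_sum_exp(terms)
            resp.append([math.exp(t - norm) for t in terms])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                var_floor)

    loglik = sum(log_sum_exp(point_log_terms(x, weights, means, variances))
                 for x in xs)
    return weights, means, variances, loglik


def bic(loglik, k, n):
    """BIC in the 'larger is better' convention: 2*loglik - p*log(n).

    For a 1-D mixture with unequal variances, p = (k - 1) + k + k.
    """
    n_params = 3 * k - 1
    return 2.0 * loglik - n_params * math.log(n)


# Toy data (an assumption for illustration): two well-separated groups.
rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(100)]
        + [rng.gauss(10.0, 1.0) for _ in range(100)])

# Score candidate numbers of clusters and keep the one with the highest BIC.
scores = {}
for k in range(1, 5):
    loglik = fit_gmm_1d(data, k)[3]
    scores[k] = bic(loglik, k, len(data))
best_k = max(scores, key=scores.get)
print("BIC by number of clusters:", scores)
print("selected number of clusters:", best_k)
```

Real expression data are multivariate, so a practical analysis would also compare covariance structures (for example diagonal versus unconstrained) alongside the number of components; the sketch only varies the number of components.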

Citations

Model-Based Clustering, Discriminant Analysis, and Density Estimation

Chris Fraley, +1 more

- 01 Jun 2002 -

Journal of the American Statistical Asso...

TL;DR: This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.

mclust 5: Clustering, Classification and Density Estimation Using Gaussian Finite Mixture Models.

Luca Scrucca, +3 more

- 01 Aug 2016 -

R Journal

TL;DR: This updated version of mclust adds new covariance structures, dimension reduction capabilities for visualisation, model selection criteria, initialisation strategies for the EM algorithm, and bootstrap-based inference, making it a full-featured R package for data analysis via finite mixture modelling.

Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data

Stefano Monti, +3 more

- 01 Jul 2003 -

Machine Learning

TL;DR: A new methodology of class discovery and clustering validation tailored to the task of analyzing gene expression data is presented, and in conjunction with resampling techniques, it provides for a method to represent the consensus across multiple runs of a clustering algorithm and to assess the stability of the discovered clusters.

Cluster analysis for gene expression data: a survey

Daxin Jiang, +2 more

- 01 Nov 2004 -

IEEE Transactions on Knowledge and Data ...

TL;DR: This paper divides cluster analysis for gene expression data into three categories, presents specific challenges pertinent to each clustering category and introduces several representative approaches, and suggests the promising trends in this field.

Genomic and transcriptional aberrations linked to breast cancer pathophysiologies.

Koei Chin, +28 more

- 01 Dec 2006 -

Cancer Cell

TL;DR: It is shown that the recurrent CNAs differ between tumor subtypes defined by expression pattern and that stratification of patients according to outcome can be improved by measuring both expression and copy number, especially high-level amplification.

References

Estimating the Dimension of a Model

Gideon Schwarz

- 01 Mar 1978 -

Annals of Statistics

TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.

Cluster analysis and display of genome-wide expression patterns

Michael B. Eisen, +3 more

- 08 Dec 1998 -

Proceedings of the National Academy of S...

TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.

An Analysis of Transformations

George E. P. Box, +1 more

- 01 Jul 1964 -

Journal of the royal statistical society...

TL;DR: In this article, Lindley et al. make the less restrictive assumption that such a normal, homoscedastic, linear model is appropriate after some suitable transformation has been applied to the y's.

Objective Criteria for the Evaluation of Clustering Methods

William M. Rand

- 01 Dec 1971 -

Journal of the American Statistical Asso...

TL;DR: This article proposes several criteria which isolate specific aspects of the performance of a method, such as its retrieval of inherent structure, its sensitivity to resampling and the stability of its results in the light of new data.
