Open Access
Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling
TLDR
Using singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized "eigengenes" × "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
Abstract:
We describe the use of singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized "eigengenes" × "eigenarrays" space, where the eigengenes (or eigenarrays) are unique orthonormal superpositions of the genes (or arrays). Normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. After normalization and sorting, the significant eigengenes and eigenarrays can be associated with observed genome-wide effects of regulators, or with measured samples, in which these regulators are overactive or underactive, respectively.
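The three steps named in the abstract (decompose, filter out components, sort) can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `decompose`, `filter_out`, and `sort_genes` are hypothetical, the toy matrix is random rather than real expression data, and the choice of which component to drop is arbitrary here (in practice it would be a component inferred to represent noise or an artifact). The squared-singular-value share is included only as one common way to gauge a component's relative weight.

```python
import numpy as np


def decompose(expression):
    """Thin SVD of a genes x arrays matrix.

    Returns (eigenarrays, singular_values, eigengenes):
    columns of `eigenarrays` are orthonormal patterns across genes,
    rows of `eigengenes` are orthonormal patterns across arrays.
    """
    eigenarrays, singular_values, eigengenes = np.linalg.svd(
        expression, full_matrices=False
    )
    return eigenarrays, singular_values, eigengenes


def filter_out(expression, drop):
    """Rebuild the matrix with the components listed in `drop` zeroed."""
    eigenarrays, singular_values, eigengenes = decompose(expression)
    kept = singular_values.copy()
    kept[np.asarray(list(drop), dtype=int)] = 0.0
    # Scale each eigenarray column by its (possibly zeroed) singular value.
    return (eigenarrays * kept) @ eigengenes


def sort_genes(expression, component=0):
    """Order gene indices by their projection onto one eigengene."""
    _, _, eigengenes = decompose(expression)
    projection = expression @ eigengenes[component]
    return np.argsort(projection)


# Hypothetical toy data: 6 genes x 4 arrays of random values.
rng = np.random.default_rng(0)
toy = rng.normal(size=(6, 4))

u, s, vt = decompose(toy)
weights = s**2 / np.sum(s**2)           # relative weight of each component
normalized = filter_out(toy, drop=[0])  # drop the first component (arbitrary)
order = sort_genes(normalized, component=0)
```

After `filter_out`, the rebuilt matrix has no remaining projection onto the dropped eigengene, which is the sense in which that pattern has been removed from every gene's profile.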
Citations
Journal Article
Adjusting batch effects in microarray expression data using empirical Bayes methods
TL;DR: This paper proposes parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples.
Journal Article
The Perseus computational platform for comprehensive analysis of (prote)omics data.
Stefka Tyanova, Tikira Temu, Pavel Sinitcyn, Arthur Carlson, Marco Y. Hein, Tamar Geiger, Matthias Mann, Jürgen Cox
TL;DR: The Perseus software platform was developed to support biological and biomedical researchers in interpreting protein quantification, interaction and post-translational modification data and it is anticipated that Perseus's arsenal of algorithms and its intuitive usability will empower interdisciplinary analysis of complex large data sets.
Journal Article
Model-Based Clustering, Discriminant Analysis, and Density Estimation
Chris Fraley, Adrian E. Raftery
TL;DR: This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.
Journal Article
Missing value estimation methods for DNA microarrays.
Olga G. Troyanskaya, Michael N. Cantor, Gavin Sherlock, Patrick O. Brown, Trevor Hastie, Robert Tibshirani, David Botstein, Russ B. Altman
TL;DR: It is shown that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and that both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros).
Journal Article
Sparse Principal Component Analysis
TL;DR: This work introduces a new method called sparse principal component analysis (SPCA) using the lasso (elastic net) to produce modified principal components with sparse loadings and shows that PCA can be formulated as a regression-type optimization problem.
References
Book
A wavelet tour of signal processing
TL;DR: An introduction to a Transient World and an Approximation Tour of Wavelet Packet and Local Cosine Bases.
Journal Article
Cluster analysis and display of genome-wide expression patterns
TL;DR: A system of cluster analysis for genome-wide expression data from DNA microarray hybridization is described that uses standard statistical algorithms to arrange genes according to similarity in pattern of gene expression, finding in the budding yeast Saccharomyces cerevisiae that clustering gene expression data groups together efficiently genes of known similar function.
Journal Article
Quantitative monitoring of gene expression patterns with a complementary DNA microarray.
TL;DR: A high-capacity system was developed to monitor the expression of many genes in parallel by means of simultaneous, two-color fluorescence hybridization, which enabled detection of rare transcripts in probe mixtures derived from 2 micrograms of total cellular messenger RNA.
Book
An Introduction to Multivariate Statistical Analysis
TL;DR: In this article, the distribution of the Mean Vector and the Covariance Matrix and the Generalized T2-Statistic is analyzed. But the distribution is not shown to be independent of sets of Variates.