Open Access

Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling

TLDR
Using singular value decomposition in transforming genome-wide expression data from genes x arrays space to reduced diagonalized "eigengenes" x "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
Abstract
We describe the use of singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized “eigengenes” × “eigenarrays” space, where the eigengenes (or eigenarrays) are unique orthonormal superpositions of the genes (or arrays). Normalizing the data by filtering out the eigengenes (and eigenarrays) that are inferred to represent noise or experimental artifacts enables meaningful comparison of the expression of different genes across different arrays in different experiments. Sorting the data according to the eigengenes and eigenarrays gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype, respectively. After normalization and sorting, the significant eigengenes and eigenarrays can be associated with observed genome-wide effects of regulators, or with measured samples, in which these regulators are overactive or underactive, respectively.
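The transformation and filtering described in the abstract can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic, the function name `svd_filter` is hypothetical, the choice to drop the first eigengene is arbitrary, and sorting genes by their weight in a single eigenarray is a simplification of the sorting the abstract mentions.

```python
import numpy as np

def svd_filter(expr, drop):
    """Zero out selected eigengene/eigenarray pairs of a genes x arrays matrix.

    Hypothetical helper for illustration only.
    """
    # Thin SVD: expr = U @ diag(s) @ Vt
    # Rows of Vt are eigengenes (patterns across arrays);
    # columns of U are eigenarrays (patterns across genes).
    U, s, Vt = np.linalg.svd(expr, full_matrices=False)
    s_filtered = s.copy()
    s_filtered[list(drop)] = 0.0  # remove the chosen components
    return (U * s_filtered) @ Vt, U, s, Vt

# Synthetic data (an assumption): a constant offset, one oscillating
# pattern, and a small amount of noise.
rng = np.random.default_rng(0)
n_genes, n_arrays = 200, 12
t = np.arange(n_arrays)
offset = np.outer(rng.normal(5.0, 1.0, n_genes), np.ones(n_arrays))
signal = np.outer(rng.normal(0.0, 1.0, n_genes), np.sin(2 * np.pi * t / n_arrays))
expr = offset + signal + 0.05 * rng.normal(size=(n_genes, n_arrays))

# Filter out the first eigengene, treated here as an artifact for illustration.
filtered, U, s, Vt = svd_filter(expr, drop=[0])

# Simplified sorting: order genes by their weight in the second eigenarray.
order = np.argsort(U[:, 1])
sorted_expr = filtered[order]
```

In practice, which eigengenes represent noise or artifacts has to be inferred from the data; the index passed to `drop` here is a placeholder.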


Citations
Journal Article

Adjusting batch effects in microarray expression data using empirical Bayes methods

TL;DR: This paper proposes parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples.
Journal Article

The Perseus computational platform for comprehensive analysis of (prote)omics data.

TL;DR: The Perseus software platform was developed to support biological and biomedical researchers in interpreting protein quantification, interaction and post-translational modification data and it is anticipated that Perseus's arsenal of algorithms and its intuitive usability will empower interdisciplinary analysis of complex large data sets.
Journal Article

Model-Based Clustering, Discriminant Analysis, and Density Estimation

TL;DR: This work reviews a general methodology for model-based clustering that provides a principled statistical approach to important practical questions that arise in cluster analysis, such as how many clusters are there, which clustering method should be used, and how should outliers be handled.
Journal Article

Missing value estimation methods for DNA microarrays.

TL;DR: It is shown that KNNimpute appears to provide a more robust and sensitive method for missing value estimation than SVDimpute, and that both SVDimpute and KNNimpute surpass the commonly used row average method (as well as filling missing values with zeros).
Journal Article

Sparse Principal Component Analysis

TL;DR: This work introduces a new method called sparse principal component analysis (SPCA) using the lasso (elastic net) to produce modified principal components with sparse loadings and shows that PCA can be formulated as a regression-type optimization problem.