LSimpute: accurate estimation of missing values in microarray data with least squares methods.
TL;DR: Novel methods for estimation of missing values in microarray data sets that are based on the least squares principle, and that utilize correlations between both genes and arrays, are presented.
Abstract:
Microarray experiments generate data sets with information on the expression levels of thousands of genes in a set of biological samples. Unfortunately, such experiments often produce multiple missing expression values, normally due to various experimental problems. As many algorithms for gene expression analysis require a complete data matrix as input, the missing values have to be estimated in order to analyze the available data. Alternatively, genes and arrays can be removed until no missing values remain. However, for genes or arrays with only a small number of missing values, it is desirable to impute those values. For the subsequent analysis to be as informative as possible, it is essential that the estimates for the missing gene expression values are accurate. Even a small number of badly estimated missing values in the data might be enough for clustering methods, such as hierarchical clustering or K-means clustering, to produce misleading results. Thus, accurate methods for missing value estimation are needed. We present novel methods for estimation of missing values in microarray data sets that are based on the least squares principle, and that utilize correlations between both genes and arrays. For this set of methods, we use the common reference name LSimpute. We compare the estimation accuracy of our methods with the widely used KNNimpute on three complete data matrices from public data sets by randomly knocking out data (labeling as missing). From these tests, we conclude that our LSimpute methods produce estimates that are consistently more accurate than those obtained using KNNimpute. Additionally, we examine a more classic approach to missing value estimation based on expectation maximization (EM). We refer to our EM implementations as EMimpute, and the estimation errors of the EMimpute methods are compared with those of our novel methods. The results indicate that, on average, the estimates from our best performing LSimpute method are at least as accurate as those from the best EMimpute algorithm.
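The two ideas in the abstract, least squares estimation from correlated genes and evaluation by knocking out known values, can be illustrated with a minimal Python sketch. This is not the authors' LSimpute implementation. The function names (`ls_impute_gene`, `row_mean_impute`, `knockout_rmse`), the neighbour count `k`, the squared-correlation weighting, and the row-mean fallback are all assumptions chosen for illustration. It shows only a simplified gene-based variant: each missing value is predicted by single regressions on the most correlated genes, and the predictions are averaged.

```python
import numpy as np

def ls_impute_gene(X, k=10):
    """Fill NaNs in X (genes x arrays) by single least squares regressions
    on the k genes most correlated with each incomplete gene (illustrative)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    observed = ~np.isnan(X)
    for g in np.where(~observed.all(axis=1))[0]:
        # Rank the other genes by |Pearson r| over arrays observed in both.
        candidates = []
        for h in range(X.shape[0]):
            shared = observed[g] & observed[h]
            if h == g or shared.sum() < 3:
                continue
            x, y = X[h, shared], X[g, shared]
            if x.std() == 0 or y.std() == 0:
                continue
            candidates.append((abs(np.corrcoef(x, y)[0, 1]), h))
        candidates.sort(reverse=True)
        for j in np.where(~observed[g])[0]:
            preds, weights = [], []
            for abs_r, h in candidates[:k]:
                if not observed[h, j]:
                    continue  # neighbour is also missing on this array
                shared = observed[g] & observed[h]
                slope, intercept = np.polyfit(X[h, shared], X[g, shared], 1)
                preds.append(intercept + slope * X[h, j])
                weights.append(abs_r ** 2)  # assumed weighting, not the paper's
            if preds and sum(weights) > 0:
                filled[g, j] = np.average(preds, weights=weights)
            else:
                filled[g, j] = np.nanmean(X[g])  # assumed fallback: row mean
    return filled

def row_mean_impute(X):
    """Naive baseline: replace each NaN with the mean of its gene (row)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    row_means = np.nanmean(X, axis=1, keepdims=True)
    missing = np.isnan(X)
    filled[missing] = np.broadcast_to(row_means, X.shape)[missing]
    return filled

def knockout_rmse(complete, impute, fraction=0.05, seed=0):
    """Knock out a random fraction of a complete matrix, impute the
    knocked-out entries, and return the RMSE against the true values."""
    rng = np.random.default_rng(seed)
    mask = rng.random(complete.shape) < fraction
    damaged = complete.copy()
    damaged[mask] = np.nan
    estimate = impute(damaged)
    return np.sqrt(np.mean((estimate[mask] - complete[mask]) ** 2))
```

Calling `knockout_rmse(data, ls_impute_gene)` and `knockout_rmse(data, row_mean_impute)` on the same complete matrix `data` with the same seed compares the two estimators on an identical set of knocked-out entries, which mirrors the evaluation design described in the abstract.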
Citations
Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling
TL;DR: Using singular value decomposition in transforming genome-wide expression data from genes x arrays space to reduced diagonalized "eigengenes" x "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
Journal Article
Activation of IFN pathways and plasmacytoid dendritic cell recruitment in target organs of primary Sjögren’s syndrome
Jacques-Eric Gottenberg, Nicolas Cagnard, Carlo Lucchesi, Franck Letourneur, Sylvie Mistou, Thierry Lazure, Sébastien Jacques, Nathalie Ba, Marc Ittah, Christine Lepajolec, Marc Labetoulle, M. Ardizzone, Jean Sibilia, Catherine Fournier, Gilles Chiocchia, Xavier Mariette
TL;DR: The results support the pathogenic interaction between the innate and adaptive immune system in pSS, and the persistence of the IFN signature might be related to a vicious circle, in which the environment interacts with genetic factors to drive the stimulation of salivary TLRs.
Journal Article
Missing value estimation for DNA microarray gene expression data: local least squares imputation
TL;DR: Imputation methods based on the least squares formulation are proposed to estimate missing values in gene expression data; they exploit local similarity structures in the data as well as a least squares optimization process.
Journal Article
Gene expression profiling of minor salivary glands clearly distinguishes primary Sjögren's syndrome patients from healthy control subjects.
TL;DR: In this article, a 16K complementary DNA microarray was used to identify gene expression signatures in minor salivary glands (MSGs) from patients with Sjögren's syndrome (SS).
Journal Article
Integrative analysis of transcriptomic and proteomic data: challenges, solutions and applications.
TL;DR: Some of the challenges associated with integrative analyses are outlined, some preliminary statistical solutions are presented, and some new applications of integrated transcriptomic and proteomic analysis to the investigation of post-transcriptional regulation are discussed.