Journal Article (Open Access)

LSimpute: accurate estimation of missing values in microarray data with least squares methods.

TLDR
Novel methods for estimation of missing values in microarray data sets that are based on the least squares principle, and that utilize correlations between both genes and arrays are presented.
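The least-squares idea in the TLDR can be illustrated with a simplified, gene-based sketch. It assumes a NumPy matrix with genes as rows, arrays as columns and `NaN` marking missing entries. For each missing value it fits a one-predictor least-squares regression on each of the k most correlated genes and combines the predictions with correlation-based weights. The function name `ls_impute_gene`, the default `k=10` and the weight formula are illustrative assumptions, not the published LSimpute implementation, and the array-based part of the method is omitted.

```python
import numpy as np


def ls_impute_gene(X, k=10, eps=1e-6):
    """Hypothetical sketch: impute NaNs in a genes x arrays matrix.

    For each gene with missing entries, regress it on each of its k most
    correlated genes (one predictor at a time, ordinary least squares) and
    average the per-gene predictions. The weight formula is an assumption.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    missing = np.isnan(X)
    n_genes = X.shape[0]

    for g in range(n_genes):
        miss_cols = np.flatnonzero(missing[g])
        if miss_cols.size == 0:
            continue
        obs_g = ~missing[g]

        # Rank the other genes by |Pearson r| over arrays observed in both.
        ranked = []
        for h in range(n_genes):
            if h == g:
                continue
            shared = obs_g & ~missing[h]
            if shared.sum() < 3:
                continue
            x, y = X[h, shared], X[g, shared]
            if x.std() == 0 or y.std() == 0:
                continue
            r = np.corrcoef(x, y)[0, 1]
            ranked.append((abs(r), h, shared))
        ranked.sort(key=lambda t: t[0], reverse=True)

        for j in miss_cols:
            preds, weights = [], []
            for abs_r, h, shared in ranked:
                if missing[h, j]:
                    continue  # the predictor gene must be observed in array j
                # Least-squares line y = intercept + slope * x.
                slope, intercept = np.polyfit(X[h, shared], X[g, shared], 1)
                preds.append(intercept + slope * X[h, j])
                r2 = abs_r ** 2
                weights.append((r2 / (1.0 - r2 + eps)) ** 2)  # assumed weighting
                if len(preds) == k:
                    break
            if preds:
                w = np.asarray(weights)
                out[g, j] = np.average(preds, weights=w) if w.sum() > 0 else np.mean(preds)
            else:
                out[g, j] = X[g, obs_g].mean()  # fallback: gene's observed mean
    return out
```

Strongly correlated genes dominate the weighted average, so a gene that is an exact linear function of another is recovered almost exactly.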
Abstract
Microarray experiments generate data sets with information on the expression levels of thousands of genes in a set of biological samples. Unfortunately, such experiments often produce multiple missing expression values, normally due to various experimental problems. As many algorithms for gene expression analysis require a complete data matrix as input, the missing values have to be estimated before the available data can be analyzed. Alternatively, genes and arrays can be removed until no missing values remain. However, for genes or arrays with only a small number of missing values, it is desirable to impute those values. For the subsequent analysis to be as informative as possible, it is essential that the estimates for the missing gene expression values are accurate. A small number of badly estimated missing values in the data might be enough for clustering methods, such as hierarchical clustering or K-means clustering, to produce misleading results. Thus, accurate methods for missing value estimation are needed. We present novel methods for estimation of missing values in microarray data sets that are based on the least squares principle, and that utilize correlations between both genes and arrays. For this set of methods, we use the common reference name LSimpute. We compare the estimation accuracy of our methods with the widely used KNNimpute on three complete data matrices from public data sets by randomly knocking out data (labeling them as missing). From these tests, we conclude that our LSimpute methods produce estimates that are consistently more accurate than those obtained using KNNimpute. Additionally, we examine a more classic approach to missing value estimation based on expectation maximization (EM). We refer to our EM implementations as EMimpute and compare their estimation errors with those produced by our novel methods.
The results indicate that on average, the estimates from our best performing LSimpute method are at least as accurate as those from the best EMimpute algorithm.
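The knockout protocol described in the abstract (masking entries of a complete matrix, re-estimating them and scoring the estimates) can be sketched as follows, together with a simplified KNNimpute-style baseline and a row-mean baseline. All function names are hypothetical. The inverse-distance weighting, the root-mean-square distance over shared arrays and the use of RMSE as the error measure are assumptions, since the abstract does not specify them; this is not the reference KNNimpute code.

```python
import numpy as np


def knn_impute(X, k=10):
    """Simplified KNNimpute-style estimator (illustrative assumption).

    A missing X[g, j] becomes the inverse-distance-weighted mean of X[h, j]
    over the k genes h closest to g, using RMS distance on shared arrays.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    missing = np.isnan(X)
    for g in np.flatnonzero(missing.any(axis=1)):
        dists = []
        for h in range(X.shape[0]):
            if h == g:
                continue
            shared = ~missing[g] & ~missing[h]
            if not shared.any():
                continue
            d = np.sqrt(np.mean((X[g, shared] - X[h, shared]) ** 2))
            dists.append((d, h))
        dists.sort()
        for j in np.flatnonzero(missing[g]):
            # Nearest genes that are actually observed in array j.
            neigh = [(d, h) for d, h in dists if not missing[h, j]][:k]
            if not neigh:
                out[g, j] = np.nanmean(X[:, j])  # fallback: array mean
                continue
            w = np.array([1.0 / (d + 1e-12) for d, _ in neigh])
            vals = np.array([X[h, j] for _, h in neigh])
            out[g, j] = np.dot(w, vals) / w.sum()
    return out


def row_mean_impute(X):
    """Baseline: replace each NaN with its gene's observed mean."""
    X = np.asarray(X, dtype=float)
    means = np.nanmean(X, axis=1, keepdims=True)
    return np.where(np.isnan(X), means, X)


def knockout_rmse(X_complete, impute, frac=0.05, seed=0):
    """Mask a random fraction of a complete matrix, impute it, and return
    the RMSE over the masked entries only."""
    X_complete = np.asarray(X_complete, dtype=float)
    rng = np.random.default_rng(seed)
    n_mask = max(1, int(round(frac * X_complete.size)))
    flat = rng.choice(X_complete.size, size=n_mask, replace=False)
    mask = np.zeros(X_complete.size, dtype=bool)
    mask[flat] = True
    mask = mask.reshape(X_complete.shape)

    X_missing = X_complete.copy()
    X_missing[mask] = np.nan  # "knock out" the selected entries
    X_hat = impute(X_missing)
    err = X_hat[mask] - X_complete[mask]
    return float(np.sqrt(np.mean(err ** 2)))
```

Calling `knockout_rmse(X, knn_impute)` and `knockout_rmse(X, row_mean_impute)` with the same `seed` scores both estimators on the same masked entries, which keeps the comparison paired.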


Citations

Singular Value Decomposition for Genome-Wide Expression Data Processing and Modeling

TL;DR: Using singular value decomposition in transforming genome-wide expression data from genes × arrays space to reduced diagonalized "eigengenes" × "eigenarrays" space gives a global picture of the dynamics of gene expression, in which individual genes and arrays appear to be classified into groups of similar regulation and function, or similar cellular state and biological phenotype.
Journal Article

Activation of IFN pathways and plasmacytoid dendritic cell recruitment in target organs of primary Sjögren’s syndrome

TL;DR: The results support the pathogenic interaction between the innate and adaptive immune system in pSS, and the persistence of the IFN signature might be related to a vicious circle, in which the environment interacts with genetic factors to drive the stimulation of salivary TLRs.
Journal Article

Missing value estimation for DNA microarray gene expression data: local least squares imputation

TL;DR: Imputation methods based on the least squares formulation are proposed to estimate missing values in the gene expression data, which exploit local similarity structures in the data as well as least squares optimization process.
Journal Article

Gene expression profiling of minor salivary glands clearly distinguishes primary Sjögren's syndrome patients from healthy control subjects.

TL;DR: In this article, a 16K complementary DNA microarray was used to identify gene expression signatures in minor salivary glands (MSGs) from patients with Sjögren's syndrome (SS).
Journal Article

Integrative analysis of transcriptomic and proteomic data: challenges, solutions and applications.

TL;DR: Some of the challenges associated with integrative analyses are outlined, preliminary statistical solutions are presented, and new applications of integrated transcriptomic and proteomic analysis to the investigation of post-transcriptional regulation are discussed.