Open Access · Journal Article

Adjusting batch effects in microarray expression data using empirical Bayes methods

W. Evan Johnson, +2 more
Published: 01 Jan 2007
Vol. 8, Issue 1, pp. 118-127
TL;DR: This paper proposes parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples.
Abstract
Summary: Non-biological experimental variation, or “batch effects,” is commonly observed across multiple batches of microarray experiments and often makes combining data from these batches difficult. The ability to combine microarray data sets lets researchers increase the statistical power to detect biological phenomena in studies where logistical considerations restrict sample size or that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted with much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that are robust to outliers in small sample sizes and perform comparably to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.
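
The abstract names the approach without stating the model. As a rough illustration only, here is a minimal Python sketch that assumes the location/scale form usually associated with this method, Y_ijg = α_g + γ_ig + δ_ig·ε_ijg for gene g, sample j and batch i, with no biological covariates. Each gene is standardized, per-batch additive (γ) and multiplicative (δ²) effects are estimated, and those estimates are shrunk toward normal and inverse-gamma priors whose hyperparameters are fit across genes by the method of moments. Only a parametric variant is shown; `combat_sketch` is a hypothetical name, not the interface of the software linked above, and the code is not the authors' implementation.

```python
from statistics import mean, variance


def combat_sketch(data, batches, tol=1e-4, max_iter=1000):
    """Hypothetical parametric empirical Bayes location/scale batch adjustment.

    data:    list of genes; each gene is a list of expression values, one per sample
    batches: batch label for each sample, in the same order as the inner lists
    Assumptions: no biological covariates, no missing values, at least two
    genes, at least two samples per batch, and non-degenerate variances.
    """
    n_samples = len(batches)
    labels = sorted(set(batches))
    idx = {b: [j for j, lab in enumerate(batches) if lab == b] for b in labels}

    # Step 1: standardize each gene with its overall mean (alpha) and a pooled
    # within-batch SD (squared residuals from batch means, averaged over samples).
    alpha, sigma, z = [], [], []
    for row in data:
        a = mean(row)
        bmeans = {b: mean([row[j] for j in idx[b]]) for b in labels}
        ss = sum((row[j] - bmeans[batches[j]]) ** 2 for j in range(n_samples))
        s = (ss / n_samples) ** 0.5
        alpha.append(a)
        sigma.append(s)
        z.append([(x - a) / s for x in row])

    adjusted = [list(row) for row in data]
    for b in labels:
        cols = idx[b]
        n = len(cols)
        # Step 2: per-gene estimates of the additive (gamma) and
        # multiplicative (delta^2) batch effects on the standardized scale.
        g_hat = [mean([zr[j] for j in cols]) for zr in z]
        d_hat = [variance([zr[j] for j in cols]) for zr in z]
        # Method-of-moments hyperparameters pooled across genes:
        # normal prior for gamma, inverse-gamma(shape, scale) prior for delta^2.
        g_bar, t2 = mean(g_hat), variance(g_hat)
        m, s2 = mean(d_hat), variance(d_hat)
        shape = (2 * s2 + m ** 2) / s2
        scale = (m * s2 + m ** 3) / s2
        # Step 3: alternate the two conditional posterior means until stable.
        g_old, d_old = g_hat[:], d_hat[:]
        for _ in range(max_iter):
            g_new = [(t2 * n * gh + d * g_bar) / (t2 * n + d)
                     for gh, d in zip(g_hat, d_old)]
            sum2 = [sum((zr[j] - gn) ** 2 for j in cols) for zr, gn in zip(z, g_new)]
            d_new = [(0.5 * sq + scale) / (n / 2 + shape - 1) for sq in sum2]
            change = max(abs(new - old) / (abs(old) + 1e-12)
                         for new, old in zip(g_new + d_new, g_old + d_old))
            g_old, d_old = g_new, d_new
            if change < tol:
                break
        # Step 4: remove the shrunken effects and map back to the data scale.
        for gi, zr in enumerate(z):
            for j in cols:
                adjusted[gi][j] = (sigma[gi] * (zr[j] - g_old[gi]) / d_old[gi] ** 0.5
                                   + alpha[gi])
    return adjusted
```

The point the sketch tries to capture is borrowing strength across genes: with only a few arrays per batch, each gene's own batch estimates are noisy, so the posterior means pull them toward batch-wide averages. That pooling is what lets this kind of adjustment work at the small batch sizes the abstract is concerned with.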


Citing articles

Integrating single-cell transcriptomic data across different conditions, technologies, and species

TL;DR: An analytical strategy for integrating scRNA-seq data sets based on common sources of variation is introduced, enabling the identification of shared populations across data sets and downstream comparative analysis.

GSVA: gene set variation analysis for microarray and RNA-seq data

TL;DR: This work introduces Gene Set Variation Analysis (GSVA), a GSE method that estimates variation of pathway activity over a sample population in an unsupervised manner and constitutes a starting point to build pathway-centric models of biology.

The consensus molecular subtypes of colorectal cancer

TL;DR: An international consortium dedicated to large-scale data sharing and analytics across expert groups was formed; its analysis showed marked interconnectivity among six independent classification systems, which coalesced into four consensus molecular subtypes (CMSs) with distinguishing features.

The sva package for removing batch effects and other unwanted variation in high-throughput experiments

TL;DR: This paper describes the sva package, which supports surrogate variable estimation with the sva function, direct adjustment for known batch effects with the ComBat function, and adjustment for batch and latent variables in prediction problems with the fsva function.