scispace - formally typeset
Search or ask a question
Author

Kristen J. Antonellis

Bio: Kristen J. Antonellis is an academic researcher from Johns Hopkins University. The author has contributed to research in topics: Affymetrix GeneChip Operating Software & Normalization (statistics). The author has an hindex of 2, co-authored 2 publications receiving 10255 citations.

Papers
More filters
Journal ArticleDOI
TL;DR: There is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities, and the exploratory data analyses of the probe level data motivate a new summary measure that is a robust multi-array average (RMA) of background-adjusted, normalized, and log-transformed PM values.
Abstract: SUMMARY In this paper we report exploratory analyses of high-density oligonucleotide array data from the Affymetrix GeneChip R � system with the objective of improving upon currently used measures of gene expression. Our analyses make use of three data sets: a small experimental study consisting of five MGU74A mouse GeneChip R � arrays, part of the data from an extensive spike-in study conducted by Gene Logic and Wyeth’s Genetics Institute involving 95 HG-U95A human GeneChip R � arrays; and part of a dilution study conducted by Gene Logic involving 75 HG-U95A GeneChip R � arrays. We display some familiar features of the perfect match and mismatch probe ( PM and MM )v alues of these data, and examine the variance–mean relationship with probe-level data from probes believed to be defective, and so delivering noise only. We explain why we need to normalize the arrays to one another using probe level intensities. We then examine the behavior of the PM and MM using spike-in data and assess three commonly used summary measures: Affymetrix’s (i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong multiplicative model-based expression index (MBEI). The exploratory data analyses of the probe level data motivate a new summary measure that is a robust multiarray average (RMA) of background-adjusted, normalized, and log-transformed PM values. We evaluate the four expression summary measures using the dilution study data, assessing their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike-in data. We conclude that there is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities. ∗ To whom correspondence should be addressed

10,711 citations

Patent
06 Jun 2002
TL;DR: The authors proposed methods of normalizing hybridization reactions that are designed to select normalization control genes, specifically 5'-3'-, and middle portions of the these genes, that hybridize similarly to a probe array and that produce the most consistently linear curve of hybridization signal over a range of normalisation control gene segment concentrations.
Abstract: The present invention includes methods of normalizing hybridization reactions that are designed to select normalization control genes, specifically 5'-3'-, and middle portions of the these genes, that hybridize similarly to a probe array and that produce the most consistently linear curve of hybridization signal over a range of normalization control gene segment concentrations. These methods have applicability across a broad spectrum of hybridization formats.

4 citations


Cited by
More filters
Journal ArticleDOI
TL;DR: Three methods of performing normalization at the probe intensity level are presented: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure and the simplest and quickest complete data method is found to perform favorably.
Abstract: Motivation: When running experiments that involve multiple high density oligonucleotide arrays, it is important to remove sources of variation between arrays of non-biological origin. Normalization is a process for reducing this variation. It is common to see non-linear relations between arrays and the standard normalization provided by Affymetrix does not perform well in these situations. Results: We present three methods of performing normalization at the probe intensity level. These methods are called complete data methods because they make use of data from all arrays in an experiment to form the normalizing relation. These algorithms are compared to two methods that make use of a baseline array: a one number scaling based algorithm and a method that uses a non-linear normalizing relation by comparing the variability and bias of an expression measure. Two publicly available datasets are used to carry out the comparisons. The simplest and quickest complete data method is found to perform favorably. Availabilty: Software implementing all three of the complete data normalization methods is available as part of the R package Affy, which is a part of the Bioconductor project http://www.bioconductor.org. Contact: bolstad@stat.berkeley.edu Supplementary information: Additional figures may be found at http://www.stat.berkeley.edu/∼bolstad/normalize/ index.html

8,324 citations

Journal ArticleDOI
TL;DR: This paper proposed parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that is robust to outliers in small sample sizes and performs comparable to existing methods for large samples.
Abstract: SUMMARY Non-biological experimental variation or “batch effects” are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that is robust to outliers in small sample sizes and performs comparable to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.

6,319 citations

Journal ArticleDOI
TL;DR: A simple and effective method for performing normalization is outlined and dramatically improved results for inferring differential expression in simulated and publicly available data sets are shown.
Abstract: The fine detail provided by sequencing-based transcriptome surveys suggests that RNA-seq is likely to become the platform of choice for interrogating steady state RNA. In order to discover biologically important changes in expression, we show that normalization continues to be an essential step in the analysis. We outline a simple and effective method for performing normalization and show dramatically improved results for inferring differential expression in simulated and publicly available data sets.

6,042 citations

Journal ArticleDOI
TL;DR: It is found that the performance of the current version of the default expression measure provided by Affymetrix Microarray Suite can be significantly improved by the use of probe level summaries derived from empirically motivated statistical models.
Abstract: High density oligonucleotide array technology is widely used in many areas of biomedical research for quantitative and highly parallel measurements of gene expression. Affymetrix GeneChip arrays are the most popular. In this technology each gene is typically represented by a set of 11–20 pairs of probes. In order to obtain expression measures it is necessary to summarize the probe level data. Using two extensive spike-in studies and a dilution study, we developed a set of tools for assessing the effectiveness of expression measures. We found that the performance of the current version of the default expression measure provided by Affymetrix Microarray Suite can be significantly improved by the use of probe level summaries derived from empirically motivated statistical models. In particular, improvements in the ability to detect differentially expressed genes are demonstrated.

5,119 citations

Journal ArticleDOI
TL;DR: The affy package is an R package of functions and classes for the analysis of oligonucleotide arrays manufactured by Affymetrix that provides the user with extreme flexibility when carrying out an analysis and make it possible to access and manipulate probe intensity data.
Abstract: Motivation: The processing of the Affymetrix GeneChip data has been a recent focus for data analysts. Alternatives to the original procedure have been proposed and some of these new methods are widely used. Results: The affy package is an R package of functions and classes for the analysis of oligonucleotide arrays manufactured by Affymetrix. The package is currently in its second release, affy provides the user with extreme flexibility when carrying out an analysis and make it possible to access and manipulate probe intensity data. In this paper, we present the main classes and functions in the package and demonstrate how they can be used to process probe-level data. We also demonstrate the importance of probe-level analysis when using the Affymetrix GeneChip platform.

4,822 citations