scispace - formally typeset
Search or ask a question
JournalISSN: 1465-4644

Biostatistics 

Oxford University Press
About: Biostatistics is an academic journal published by Oxford University Press. The journal publishes majorly in the area(s): Covariate & Estimator. It has an ISSN identifier of 1465-4644. Over the lifetime, 1269 publications have been published receiving 84407 citations. The journal is also known as: biometry & biometrics.


Papers
More filters
Journal ArticleDOI
TL;DR: There is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities, and the exploratory data analyses of the probe level data motivate a new summary measure that is a robust multi-array average (RMA) of background-adjusted, normalized, and log-transformed PM values.
Abstract: SUMMARY In this paper we report exploratory analyses of high-density oligonucleotide array data from the Affymetrix GeneChip R � system with the objective of improving upon currently used measures of gene expression. Our analyses make use of three data sets: a small experimental study consisting of five MGU74A mouse GeneChip R � arrays, part of the data from an extensive spike-in study conducted by Gene Logic and Wyeth’s Genetics Institute involving 95 HG-U95A human GeneChip R � arrays; and part of a dilution study conducted by Gene Logic involving 75 HG-U95A GeneChip R � arrays. We display some familiar features of the perfect match and mismatch probe ( PM and MM )v alues of these data, and examine the variance–mean relationship with probe-level data from probes believed to be defective, and so delivering noise only. We explain why we need to normalize the arrays to one another using probe level intensities. We then examine the behavior of the PM and MM using spike-in data and assess three commonly used summary measures: Affymetrix’s (i) average difference (AvDiff) and (ii) MAS 5.0 signal, and (iii) the Li and Wong multiplicative model-based expression index (MBEI). The exploratory data analyses of the probe level data motivate a new summary measure that is a robust multiarray average (RMA) of background-adjusted, normalized, and log-transformed PM values. We evaluate the four expression summary measures using the dilution study data, assessing their behavior in terms of bias, variance and (for MBEI and RMA) model fit. Finally, we evaluate the algorithms in terms of their ability to detect known levels of differential expression using the spike-in data. We conclude that there is no obvious downside to using RMA and attaching a standard error (SE) to this quantity using a linear model which removes probe-specific affinities. ∗ To whom correspondence should be addressed

10,711 citations

Journal ArticleDOI
TL;DR: This paper proposed parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that is robust to outliers in small sample sizes and performs comparable to existing methods for large samples.
Abstract: SUMMARY Non-biological experimental variation or “batch effects” are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and non-parametric empirical Bayes frameworks for adjusting data for batch effects that is robust to outliers in small sample sizes and performs comparable to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice. Software for our method is freely available at: http://biosun1.harvard.edu/complab/batch/.

6,319 citations

Journal ArticleDOI
TL;DR: Using a coordinate descent procedure for the lasso, a simple algorithm is developed that solves a 1000-node problem in at most a minute and is 30-4000 times faster than competing methods.
Abstract: We consider the problem of estimating sparse graphs by a lasso penalty applied to the inverse covariance matrix. Using a coordinate descent procedure for the lasso, we develop a simple algorithm--the graphical lasso--that is remarkably fast: It solves a 1000-node problem ( approximately 500,000 parameters) in at most a minute and is 30-4000 times faster than competing methods. It also provides a conceptual link between the exact problem and the approximation suggested by Meinshausen and Buhlmann (2006). We illustrate the method on some cell-signaling data from proteomics.

5,577 citations

Journal ArticleDOI
TL;DR: A modification ofbinary segmentation is developed, which is called circular binary segmentation, to translate noisy intensity measurements into regions of equal copy number in DNA sequence copy number.
Abstract: DNA sequence copy number is the number of copies of DNA at a region of a genome. Cancer progression often involves alterations in DNA copy number. Newly developed microarray technologies enable simultaneous measurement of copy number at thousands of sites in a genome. We have developed a modification of binary segmentation, which we call circular binary segmentation, to translate noisy intensity measurements into regions of equal copy number. The method is evaluated by simulation and is demonstrated on cell line data with known copy number alterations and on a breast cancer cell line data set.

2,269 citations

Journal ArticleDOI
TL;DR: A penalized matrix decomposition (PMD), a new framework for computing a rank-K approximation for a matrix, and establishes connections between the SCoTLASS method for sparse principal component analysis and the method of Zou and others (2006).
Abstract: SUMMARY We present a penalized matrix decomposition (PMD), a new framework for computing a rank-K approximation for a matrix. We approximate the matrix X as ˆ X = � K=1 dkukv T , where dk, uk, and

1,540 citations

Performance
Metrics
No. of papers from the Journal in previous years
YearPapers
202317
202267
202198
202087
201958
201845