Author

Michael N. Gould

Bio: Michael N. Gould is an academic researcher from University of Wisconsin-Madison. The author has contributed to research in topics: Carcinogenesis & Perillyl alcohol. The author has an h-index of 49, co-authored 174 publications receiving 9597 citations. Previous affiliations of Michael N. Gould include Morgridge Institute for Research & University of North Carolina at Chapel Hill.


Papers
Journal Article
TL;DR: EBSeq, which builds on empirical Bayesian methods, is developed for identifying DE isoforms in an RNA-seq experiment comparing two or more biological conditions, and it also proves to be a robust approach for identifying DE genes.
Abstract: Motivation: Messenger RNA expression is important in normal development and differentiation, as well as in manifestation of disease. RNA-seq experiments allow for the identification of differentially expressed (DE) genes and their corresponding isoforms on a genome-wide scale. However, statistical methods are required to ensure that accurate identifications are made. A number of methods exist for identifying DE genes, but far fewer are available for identifying DE isoforms. When isoform DE is of interest, investigators often apply gene-level (count-based) methods directly to estimates of isoform counts. Doing so is not recommended. In short, estimating isoform expression is relatively straightforward for some groups of isoforms, but more challenging for others. This results in estimation uncertainty that varies across isoform groups. Count-based methods were not designed to accommodate this varying uncertainty, and consequently, application of them for isoform inference results in reduced power for some classes of isoforms and increased false discoveries for others. Results: Taking advantage of the merits of empirical Bayesian methods, we have developed EBSeq for identifying DE isoforms in an RNA-seq experiment comparing two or more biological conditions. Results demonstrate substantially improved power and performance of EBSeq for identifying DE isoforms. EBSeq also proves to be a robust approach for identifying DE genes. Availability and implementation: An R package containing examples and sample datasets is available at http://www.biostat.wisc.edu/~kendzior/EBSEQ/.

1,048 citations

Journal Article
TL;DR: Inference for most genes is not adversely affected by pooling, and it is recommended that pooling be done when fewer than three arrays are used in each condition, and for larger designs, pooling does not significantly improve inferences if few subjects are pooled.
Abstract: Over 15% of the data sets catalogued in the Gene Expression Omnibus Database involve RNA samples that have been pooled before hybridization. Pooling affects data quality and inference, but the exact effects are not yet known because pooling has not been systematically studied in the context of microarray experiments. Here we report on the results of an experiment designed to evaluate the utility of pooling and the impact on identifying differentially expressed genes. We find that inference for most genes is not adversely affected by pooling, and we recommend that pooling be done when fewer than three arrays are used in each condition. For larger designs, pooling does not significantly improve inferences if few subjects are pooled. The realized benefits in this case do not outweigh the price paid for loss of individual specific information. Pooling is beneficial when many subjects are pooled, provided that independent samples contribute to multiple pools.

486 citations

Journal Article
TL;DR: A general empirical Bayes modelling approach which allows for replicate expression profiles in multiple conditions is proposed and used in a study of mammary cancer in the rat, where four distinct patterns of expression are possible.
Abstract: DNA microarrays provide for unprecedented large-scale views of gene expression and, as a result, have emerged as a fundamental measurement tool in the study of diverse biological systems. Statistical questions abound, but many traditional data analytic approaches do not apply, in large part because thousands of individual genes are measured with relatively little replication. Empirical Bayes methods provide a natural approach to microarray data analysis because they can significantly reduce the dimensionality of an inference problem while compensating for relatively few replicates by using information across the array. We propose a general empirical Bayes modelling approach which allows for replicate expression profiles in multiple conditions. The hierarchical mixture model accounts for differences among genes in their average expression levels, differential expression for a given gene among cell types, and measurement fluctuations. Two distinct parameterizations are considered: a model based on Gamma distributed measurements and one based on log-normally distributed measurements. False discovery rate and related operating characteristics of the methodology are assessed in a simulation study. We also show how the posterior odds of differential expression in one version of the model is related to the ratio of the arithmetic mean to the geometric mean of the two sample means. The methodology is used in a study of mammary cancer in the rat, where four distinct patterns of expression are possible. Copyright © 2003 John Wiley & Sons, Ltd.

389 citations

Journal Article
TL;DR: This study determined the effects of dietary supplementation with the flavonol quercetin on both 7,12-dimethylbenz(a)anthracene (DMBA)- and N-nitrosomethylurea-induced mammary cancer in female Sprague-Dawley rats.
Abstract: The effects of dietary supplementation of flavonol quercetin on both 7,12-dimethylbenz(a)anthracene (DMBA)- and N-nitrosomethylurea-induced mammary cancer in female Sprague-Dawley rats were determined. Quercetin diet was started 1 wk before intragastric instillation of DMBA (65 mg/kg of body weight) or i.v. injection of N-nitrosomethylurea (50 mg/kg of body weight) and was continued during the entire period (20 wk) of the experiment. Dietary quercetin inhibited both the incidence and the number of palpable rat mammary tumors; rats fed on 2% quercetin had 25% less incidence of mammary cancer, while the average number of mammary tumors per rat was reduced by 39% at 20 wk post-DMBA administration compared to animals on a control diet. In a separate experiment, a 5% quercetin diet elicited a greater inhibitory effect on the induction of rat mammary tumors by DMBA than was observed with a 2% quercetin diet. The inhibitory effect of quercetin on mammary tumor incidence in rats on 2% and 5% diets and on tumor multiplicity in animals on a 5% diet was statistically significant (P < 0.05). In addition, the risk of the development of a palpable tumor (as determined by the nonparametric estimate of the hazard function) in the quercetin-fed group was lower than the group on control diet throughout the course of the experiment. Furthermore, 5% dietary quercetin significantly inhibited (P < 0.05), although to a lesser extent than observed in DMBA-induced tumor formation, both the incidence and the number of palpable mammary tumors per rat induced by N-nitrosomethylurea. Dietary quercetin did not elicit any detectable sign of toxicity. The gain in body weight in rats on the quercetin diet and the quantity of diet consumed per rat per week were similar to those for rats on the control diet.

305 citations

Journal Article
TL;DR: This work outlines achievements in rat gene discovery to date, shows how these findings have been translated to human disease, and documents an increasing pace of discovery of new disease genes, pathways and mechanisms.
Abstract: The rat is an important system for modeling human disease. Four years ago, the rich 150-year history of rat research was transformed by the sequencing of the rat genome, ushering in an era of exceptional opportunity for identifying genes and pathways underlying disease phenotypes. Genome-wide association studies in human populations have recently provided a direct approach for finding robust genetic associations in common diseases, but identifying the precise genes and their mechanisms of action remains problematic. In the context of significant progress in rat genomic resources over the past decade, we outline achievements in rat gene discovery to date, show how these findings have been translated to human disease, and document an increasing pace of discovery of new disease genes, pathways and mechanisms. Finally, we present a set of principles that justify continuing and strengthening genetic studies in the rat model, and further development of genomic infrastructure for rat research.

296 citations


Cited by
Journal Article
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .

47,038 citations

Posted Content
17 Nov 2014 - bioRxiv
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

17,014 citations

Journal Article
TL;DR: The hierarchical model of Lonnstedt and Speed (2002) is developed into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples and the moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom.
Abstract: The problem of identifying differentially expressed genes in designed microarray experiments is considered. Lonnstedt and Speed (2002) derived an expression for the posterior odds of differential expression in a replicated two-color experiment using a simple hierarchical parametric model. The purpose of this paper is to develop the hierarchical model of Lonnstedt and Speed (2002) into a practical approach for general microarray experiments with arbitrary numbers of treatments and RNA samples. The model is reset in the context of general linear models with arbitrary coefficients and contrasts of interest. The approach applies equally well to both single channel and two color microarray experiments. Consistent, closed form estimators are derived for the hyperparameters in the model. The estimators proposed have robust behavior even for small numbers of arrays and allow for incomplete data arising from spot filtering or spot quality weights. The posterior odds statistic is reformulated in terms of a moderated t-statistic in which posterior residual standard deviations are used in place of ordinary standard deviations. The empirical Bayes approach is equivalent to shrinkage of the estimated sample variances towards a pooled estimate, resulting in far more stable inference when the number of arrays is small. The use of moderated t-statistics has the advantage over the posterior odds that the number of hyperparameters which need to be estimated is reduced; in particular, knowledge of the non-null prior for the fold changes is not required. The moderated t-statistic is shown to follow a t-distribution with augmented degrees of freedom. The moderated t inferential approach extends to accommodate tests of composite null hypotheses through the use of moderated F-statistics. The performance of the methods is demonstrated in a simulation study. Results are presented for two publicly available data sets.

11,864 citations
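
The variance shrinkage described in the abstract above can be illustrated with a minimal Python sketch. It assumes the prior variance and prior degrees of freedom are already known (the paper estimates them from all genes, which is not shown here), and the function and parameter names are hypothetical, chosen only for this illustration.

```python
import math


def moderated_t(effect, sample_var, resid_df, unscaled_var, prior_var, prior_df):
    """Moderated t-statistic for a single gene.

    Assumption: prior_var and prior_df are supplied by the caller; estimating
    these hyperparameters from the full set of genes is out of scope here.
    """
    # Posterior variance: degrees-of-freedom-weighted average of the prior
    # variance and the gene's own sample variance (shrinkage toward the prior).
    post_var = (prior_df * prior_var + resid_df * sample_var) / (prior_df + resid_df)
    # Same form as an ordinary t-statistic, with the posterior standard
    # deviation used in place of the ordinary one.
    t_stat = effect / math.sqrt(post_var * unscaled_var)
    # Augmented degrees of freedom: residual plus prior.
    total_df = prior_df + resid_df
    return t_stat, total_df, post_var


# Hypothetical numbers: a gene with an unusually small sample variance.
t_mod, df_mod, s2_post = moderated_t(
    effect=1.0, sample_var=0.04, resid_df=2,
    unscaled_var=0.5, prior_var=0.25, prior_df=4,
)
t_ordinary = 1.0 / math.sqrt(0.04 * 0.5)
print(t_mod, df_mod, s2_post, t_ordinary)
```

In this made-up case the gene's small sample variance is pulled toward the prior, so the moderated statistic is smaller than the ordinary one while being referred to a t-distribution with more degrees of freedom.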

Journal Article
TL;DR: The factors underlying the influence of the different classes of polyphenols in enhancing their resistance to oxidation are discussed and support the contention that the partition coefficients of the flavonoids as well as their rates of reaction with the relevant radicals define the antioxidant activities in the lipophilic phase.

8,513 citations

Journal Article
TL;DR: This protocol provides a workflow for genome-independent transcriptome analysis leveraging the Trinity platform and presents Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes.
Abstract: De novo assembly of RNA-seq data enables researchers to study transcriptomes without the need for a genome sequence; this approach can be usefully applied, for instance, in research on 'non-model organisms' of ecological and evolutionary importance, cancer samples or the microbiome. In this protocol we describe the use of the Trinity platform for de novo transcriptome assembly from RNA-seq data in non-model organisms. We also present Trinity-supported companion utilities for downstream applications, including RSEM for transcript abundance estimation, R/Bioconductor packages for identifying differentially expressed transcripts across samples and approaches to identify protein-coding genes. In the procedure, we provide a workflow for genome-independent transcriptome analysis leveraging the Trinity platform. The software, documentation and demonstrations are freely available from http://trinityrnaseq.sourceforge.net. The run time of this protocol is highly dependent on the size and complexity of data to be analyzed. The example data set analyzed in the procedure detailed herein can be processed in less than 5 h.

6,369 citations