Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

doi:10.1186/S13059-014-0550-8


Journal Article

Michael I. Love¹,², Wolfgang Huber, Simon Anders

Harvard University¹, Max Planck Society²

05 Dec 2014 - Genome Biology (BioMed Central) - Vol. 15, Iss. 12, p. 550

TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.


Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html .
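The idea of shrinking fold changes can be illustrated with a minimal sketch. It is not the DESeq2 implementation, which works on a count-based generalized linear model and estimates the prior from the data. The sketch below assumes a simpler normal approximation: each observed log fold change has a known standard error, and the true values follow a zero-centred normal prior whose variance `prior_var` is supplied by hand. The function name and all inputs are hypothetical.

```python
from typing import List


def shrink_log_fold_changes(
    lfc: List[float], se: List[float], prior_var: float
) -> List[float]:
    """Posterior mean of each log fold change under a zero-centred normal prior.

    Assumption (illustrative only): lfc[i] ~ Normal(true[i], se[i]**2) and
    true[i] ~ Normal(0, prior_var), with prior_var chosen by the caller.
    """
    if prior_var <= 0:
        raise ValueError("prior_var must be positive")
    shrunk = []
    for estimate, std_err in zip(lfc, se):
        # Noisy estimates (large standard error) get a small weight and are
        # pulled strongly towards zero; precise estimates barely move.
        weight = prior_var / (prior_var + std_err ** 2)
        shrunk.append(weight * estimate)
    return shrunk


if __name__ == "__main__":
    raw = [2.0, 2.0, -0.5]
    std_errors = [0.2, 2.0, 0.1]
    print(shrink_log_fold_changes(raw, std_errors, prior_var=1.0))
```

Two genes with the same raw estimate end up with different shrunken values when their standard errors differ, which is the sense in which shrinkage focuses attention on the strength of an effect rather than on its mere presence.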


Citations


Journal Article

Development of Poly(A)-ClickSeq as a Tool Enabling Simultaneous Genome-wide Poly(A)-site identification and Differential Expression Analysis.

Nathan D. Elrod¹, Elizabeth Jaworski¹, Ping Ji¹, Eric J. Wagner, Andrew Routh

University of Texas Medical Branch¹

15 Feb 2019 - Methods

TL;DR: PAC-seq is shown to count transcripts accurately and sensitively for differential gene expression analysis, to identify alternative poly(A) sites, and to determine the precise nucleotides of the poly(A) tail boundaries.


22 citations

Journal Article

Multi-omics reveals clinically relevant proliferative drive associated with mTOR-MYC-OXPHOS activity in chronic lymphocytic leukemia

Junyan Lu¹,², Ester Cannizzaro³, Fabienne Meier-Abt³, Sebastian Scheinost⁴, Peter-Martin Bruch⁴,⁵,⁶, Holly A. R. Giles¹,², Almut Lütge³, Jennifer Hüllein²,⁴, Lena Wagner⁴, Brian Giacopelli⁷, Ferran Nadeu⁸, Julio Delgado⁸, Elias Campo⁸, Maurizio Mangolini⁹, Ingo Ringshausen⁹, Martin Böttcher¹⁰, Dimitrios Mougiakakos¹⁰, Andrea Jacobs¹¹, Bernd Bodenmiller¹¹, Sascha Dietrich, Christopher C. Oakes⁷, Thorsten Zenz³,⁴, Wolfgang Huber¹,²

Molecular Medicine Partnership Unit¹, European Bioinformatics Institute², University of Zurich³, German Cancer Research Center⁴, University Hospital Heidelberg⁵, Heidelberg University⁶, Ohio State University⁷, University of Barcelona⁸, University of Cambridge⁹, University of Erlangen-Nuremberg¹⁰, ETH Zurich¹¹

01 Jul 2021

TL;DR: The authors devised a method for simultaneous subgroup discovery across multiple data types and applied it to genomic, transcriptomic, DNA methylation and ex-vivo drug response data from 217 chronic lymphocytic leukemia (CLL) cases.


Abstract: Chronic Lymphocytic Leukemia (CLL) has a complex pattern of driver mutations and much of its clinical diversity remains unexplained. We devised a method for simultaneous subgroup discovery across multiple data types and applied it to genomic, transcriptomic, DNA methylation and ex-vivo drug response data from 217 Chronic Lymphocytic Leukemia (CLL) cases. We uncovered a biological axis of heterogeneity strongly associated with clinical behavior and orthogonal to the known biomarkers. We validated its presence and clinical relevance in four independent cohorts (n=547 patients). We find that this axis captures the proliferative drive (PD) of CLL cells, as it associates with lymphocyte doubling rate, global hypomethylation, accumulation of driver aberrations and response to pro-proliferative stimuli. CLL-PD was linked to the activation of mTOR-MYC-oxidative phosphorylation (OXPHOS) through transcriptomic, proteomic and single cell resolution analysis. CLL-PD is a key determinant of disease outcome in CLL. Our multi-table integration approach may be applicable to other tumors whose inter-individual differences are currently unexplained.


22 citations

Journal Article

Regeneration in the sponge Sycon ciliatum partly mimics postlarval development.

Anael Soubigou¹, Ethan G. Ross², Yousef Touhami, Nathan A. M. Chrismas, Vengamanaidu Modepalli

University of Paris¹, University of Southampton²

15 Nov 2020 - Development

TL;DR: It is found that sponge regeneration is orchestrated by recruiting pathways similar to those utilized in embryonic development, and the importance of apoptosis in remodelling the primmorphs to initiate re-development is revealed.


Abstract: Somatic cells dissociated from an adult sponge can re-organize and develop into a juvenile-like sponge, a remarkable phenomenon of regeneration. However, the extent to which regeneration recapitulates embryonic developmental pathways has remained enigmatic. We have standardized and established a sponge Sycon ciliatum regeneration protocol from dissociated cells. From the morphological analysis, we demonstrated that dissociated sponge cells follow a series of morphological events resembling postembryonic development. We performed high-throughput sequencing on regenerating samples and compared the data with regular postlarval development. Our comparative transcriptomic analysis illuminates that sponge regeneration is equally as dynamic as embryogenesis. We find that sponge regeneration is orchestrated by recruiting pathways like those utilized in embryonic development. We further demonstrated that sponge regeneration is accompanied by cell death at early stages, revealing the importance of apoptosis in remodelling the primmorphs to initiate re-development. Since sponges are likely to be the first branch of extant multicellular animals, we suggest that this system can be explored to study the genetic features underlying the evolution of multicellularity and regeneration.


22 citations

Journal Article

Gene Expression in Solitary Fibrous Tumors (SFTs) Correlates with Anatomic Localization and NAB2-STAT6 Gene Fusion Variants.

Matthias Bieg¹, Evgeny A. Moskalev², Rainer Will³, Simone Hebele², Matthias Schwarzbach, Sanja Schmeck, Peter Hohenberger⁴, Jens Jakob⁵, Bernd Kasper⁴, Timo Gaiser⁴, Philip Ströbel⁵, Eva Wardelmann⁶, Udo Kontny⁷, Till Braunschweig⁷, Horia Sirbu², Robert Grützmann², Norbert Meidenbauer², Naveed Ishaque¹, Roland Eils¹, Stefan Wiemann³, Arndt Hartmann², Abbas Agaimy², Karen J. Fritchie⁸, Caterina Giannini⁹, Florian Haller²

Charité¹, University of Erlangen-Nuremberg², German Cancer Research Center³, Heidelberg University⁴, University of Göttingen⁵, University of Münster⁶, RWTH Aachen University⁷, Mayo Clinic⁸, University of Bologna⁹

23 Jan 2021 - American Journal of Pathology

TL;DR: In this article, the authors employed next-generation sequencing-based gene expression profiling to identify significant differences in gene expression associated with anatomic localization and NAB2-STAT6 gene fusion variants.


Abstract: Solitary fibrous tumors (SFTs) harbor recurrent NAB2-STAT6 gene fusions, promoting constitutional up-regulation of oncogenic early growth response 1 (EGR1)-dependent gene expression. SFTs with the most common canonical NAB2 exon 4–STAT6 exon 2 fusion variant are often located in the thorax (pleuropulmonary) and are less cellular with abundant collagen. In contrast, SFTs with NAB2 exon 6–STAT6 exon 16/17 fusion variants typically display a cellular round to ovoid cell morphology and are often located in the deep soft tissue of the retroperitoneum and intra-abdominal pelvic region or in the meninges. Here, we employed next-generation sequencing–based gene expression profiling to identify significant differences in gene expression associated with anatomic localization and NAB2-STAT6 gene fusion variants. SFTs with the NAB2 exon 4–STAT6 exon 2 fusion variant showed a transcriptional signature enriched for genes involved in DNA binding, gene transcription, and nuclear localization, whereas SFTs with the NAB2 exon 6–STAT6 exon 16/17 fusion variants were enriched for genes involved in tyrosine kinase signaling, cell proliferation, and cytoplasmic localization. Specific transcription factor binding motifs were enriched among differentially expressed genes in SFTs with different fusion variants, implicating co–transcription factors in the modification of chimeric NGFI-A binding protein 2 (NAB2)-STAT6–dependent deregulation of EGR1-dependent gene expression. In summary, this study establishes a potential molecular biologic basis for clinicopathologic differences in SFTs with distinct NAB2-STAT6 gene fusion variants.


22 citations

Journal Article

ALDH1A3 Coordinates Metabolism With Gene Regulation in Pulmonary Arterial Hypertension.

Dan Li¹, Ning-Yi Shao¹,², Jan-Renier A. J. Moonen¹, Zhixin Zhao¹, Minyi Shi¹, Shoichiro Otsuki¹, Lingli Wang¹, Tiffany Nguyen¹, Elaine Yan¹, David Marciano¹, Kévin Contrepois¹, Caiyun G. Li¹, Joseph C. Wu¹, Michael Snyder¹, Marlene Rabinovitch¹

Stanford University¹, University of Macau²

25 May 2021 - Circulation

TL;DR: This article starts from the observation that metabolic alterations provide substrates that influence chromatin structure to regulate the gene expression that determines cell function in health and disease.


Abstract: Background: Metabolic alterations provide substrates that influence chromatin structure to regulate gene expression that determines cell function in health and disease. Heightened proliferation of ...


22 citations


References


Journal Article

Controlling the false discovery rate: a practical and powerful approach to multiple testing

Yoav Benjamini, Yosef Hochberg

01 Jan 1995 - Journal of the Royal Statistical Society, Series B (Methodological)

TL;DR: This paper presents a different approach to problems of multiple significance testing: controlling the expected proportion of falsely rejected hypotheses, the false discovery rate, which is equivalent to the FWER when all hypotheses are true but is smaller otherwise.


Abstract: Summary: The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.


83,420 citations

"Moderated estimation of fold change..." refers to methods in this paper

...The Wald test P values from the subset of genes that pass an independent filtering step, described in the next section, are adjusted for multiple testing using the procedure of Benjamini and Hochberg [21]....
[...]
...The Wald test p-values from the subset of genes that pass an independent filtering step, described in the next section, are adjusted for multiple testing using the procedure of Benjamini and Hochberg [20]....
[...]
...For all algorithms returning P values, the P values from genes with non-zero sum of read counts across samples were adjusted using the Benjamini–Hochberg procedure [21]....
[...]
...The Wald test P values from the subset of genes that pass the independent filtering step are adjusted for multiple testing using the procedure of Benjamini and Hochberg [21]....
[...]
...The Wald test p-values from the subset of genes which pass the independent filtering step are adjusted for multiple testing using the procedure of Benjamini and Hochberg [20]....
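
The adjustment these excerpts refer to can be sketched in a few lines. The function below is a minimal illustration of the Benjamini-Hochberg step-up adjustment, written for this page rather than taken from DESeq2 or any other package; the function name is hypothetical. It returns adjusted p-values in the input order, so a gene is called significant at false discovery rate q when its adjusted value is at most q.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values never decrease as the raw p-value increases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw p-value is multiplied by the number of tests divided by its rank, and the running minimum enforces monotonicity, which is what makes the procedure "sequential" in the sense of the abstract.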

Journal Article

Handbook of Mathematical Functions

Milton Abramowitz, Irene A. Stegun, Donald A. McQuarrie

01 Feb 1966 - American Journal of Physics

46,339 citations

Journal Article

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Mark D. Robinson¹, Davis J. McCarthy¹, Gordon K. Smyth¹

Walter and Eliza Hall Institute of Medical Research¹

01 Jan 2010 - Bioinformatics

TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.


Abstract: Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).


29,413 citations

"Moderated estimation of fold change..." refers to methods in this paper

...The Negative Binomial based approaches compared were DESeq (old) [4], edgeR [32], edgeR with the robust option [33], DSS [6] and EBSeq [34]....
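
What "overdispersed Poisson" means for count data can be made concrete with the negative binomial distribution named in the excerpt above. The sketch below is illustrative only and is not code from edgeR or DESeq2. It assumes the common mean and dispersion parameterization, in which the variance equals mean + dispersion × mean², so that a dispersion near zero approaches the Poisson case. Function names are hypothetical.

```python
import math


def nb_variance(mean: float, dispersion: float) -> float:
    """Variance of a negative binomial count: mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


def nb_log_pmf(count: int, mean: float, dispersion: float) -> float:
    """Log-probability of `count` for a negative binomial with the given
    mean (> 0) and dispersion (> 0)."""
    size = 1.0 / dispersion  # the "size" parameter r of the distribution
    return (
        math.lgamma(count + size) - math.lgamma(size) - math.lgamma(count + 1)
        + size * math.log(size / (size + mean))
        + count * math.log(mean / (size + mean))
    )


if __name__ == "__main__":
    # A Poisson count with mean 100 has variance 100; with dispersion 0.1
    # the negative binomial variance is much larger.
    print(nb_variance(100.0, 0.1))
    print(math.exp(nb_log_pmf(3, mean=5.0, dispersion=0.5)))
```

The quadratic term is what lets the model absorb biological variability between replicates on top of the Poisson sampling noise.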

Book

Generalized Linear Models

Peter McCullagh¹, John A. Nelder

Imperial College London¹

01 Jan 1983

TL;DR: A generalization of the analysis of variance is given for generalized linear models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).


Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).


23,215 citations
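
The "iterative weighted linear regression" described in this abstract can be sketched for the Poisson case with a log link. The code below is a minimal illustration under simplifying assumptions (one covariate plus an intercept, a fixed number of iterations, no convergence check or safeguards), not the algorithm as implemented in any particular package. The function name is hypothetical.

```python
import math
from typing import List, Tuple


def poisson_irls(
    x: List[float], y: List[float], iterations: int = 25
) -> Tuple[float, float]:
    """Fit log(E[y]) = b0 + b1 * x by iteratively reweighted least squares."""
    # Start from a constant model at the overall mean (y must not be all zero).
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        # Accumulate the weighted normal equations for the working response z.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi          # linear predictor
            mu = math.exp(eta)          # fitted mean under the log link
            w = mu                      # Poisson weight with a log link
            z = eta + (yi - mu) / mu    # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted least squares system in closed form.
        det = s_w * s_wxx - s_wx * s_wx
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1


if __name__ == "__main__":
    print(poisson_irls([0.0, 1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 4.0, 9.0, 18.0]))
```

Each pass is an ordinary weighted regression of the working response on the covariate; repeating it drives the coefficients to the maximum likelihood estimates.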

Book

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

Trevor Hastie¹, Robert Tibshirani, Jerome H. Friedman

University of New South Wales¹

28 Jul 2013

TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.


Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.


19,261 citations