Journal ArticleDOI

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

05 Dec 2014-Genome Biology (BioMed Central)-Vol. 15, Iss: 12, pp 550-550
TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.


Citations
Journal ArticleDOI
15 Feb 2019-Methods
TL;DR: PAC-seq is shown to be able to accurately and sensitively count transcripts for differential gene expression analysis, as well as identify alternative poly(A) sites and determine the precise nucleotides of the poly(A) tail boundaries.

22 citations

Journal ArticleDOI
01 Jul 2021
TL;DR: In this paper, the authors devised a method for simultaneous subgroup discovery across multiple data types and applied it to genomic, transcriptomic, DNA methylation and ex-vivo drug response data from 217 chronic lymphocytic leukemia (CLL) cases.
Abstract: Chronic Lymphocytic Leukemia (CLL) has a complex pattern of driver mutations and much of its clinical diversity remains unexplained. We devised a method for simultaneous subgroup discovery across multiple data types and applied it to genomic, transcriptomic, DNA methylation and ex-vivo drug response data from 217 Chronic Lymphocytic Leukemia (CLL) cases. We uncovered a biological axis of heterogeneity strongly associated with clinical behavior and orthogonal to the known biomarkers. We validated its presence and clinical relevance in four independent cohorts (n=547 patients). We find that this axis captures the proliferative drive (PD) of CLL cells, as it associates with lymphocyte doubling rate, global hypomethylation, accumulation of driver aberrations and response to pro-proliferative stimuli. CLL-PD was linked to the activation of mTOR-MYC-oxidative phosphorylation (OXPHOS) through transcriptomic, proteomic and single cell resolution analysis. CLL-PD is a key determinant of disease outcome in CLL. Our multi-table integration approach may be applicable to other tumors whose inter-individual differences are currently unexplained.

22 citations

Journal ArticleDOI
TL;DR: It is found that sponge regeneration is orchestrated by recruiting pathways similar to those utilized in embryonic development, and the importance of apoptosis in remodelling the primmorphs to initiate re-development is revealed.
Abstract: Somatic cells dissociated from an adult sponge can re-organize and develop into a juvenile-like sponge, a remarkable phenomenon of regeneration. However, the extent to which regeneration recapitulates embryonic developmental pathways has remained enigmatic. We have standardized and established a sponge Sycon ciliatum regeneration protocol from dissociated cells. From the morphological analysis, we demonstrated that dissociated sponge cells follow a series of morphological events resembling postembryonic development. We performed high-throughput sequencing on regenerating samples and compared the data with regular postlarval development. Our comparative transcriptomic analysis illuminates that sponge regeneration is equally as dynamic as embryogenesis. We find that sponge regeneration is orchestrated by recruiting pathways like those utilized in embryonic development. We further demonstrated that sponge regeneration is accompanied by cell death at early stages, revealing the importance of apoptosis in remodelling the primmorphs to initiate re-development. Since sponges are likely to be the first branch of extant multicellular animals, we suggest that this system can be explored to study the genetic features underlying the evolution of multicellularity and regeneration.

22 citations

Journal ArticleDOI
TL;DR: In this article, the authors employed next-generation sequencing-based gene expression profiling to identify significant differences in gene expression associated with anatomic localization and NAB2-STAT6 gene fusion variants.
Abstract: Solitary fibrous tumors (SFTs) harbor recurrent NAB2-STAT6 gene fusions, promoting constitutional up-regulation of oncogenic early growth response 1 (EGR1)-dependent gene expression. SFTs with the most common canonical NAB2 exon 4–STAT6 exon 2 fusion variant are often located in the thorax (pleuropulmonary) and are less cellular with abundant collagen. In contrast, SFTs with NAB2 exon 6–STAT6 exon 16/17 fusion variants typically display a cellular round to ovoid cell morphology and are often located in the deep soft tissue of the retroperitoneum and intra-abdominal pelvic region or in the meninges. Here, we employed next-generation sequencing–based gene expression profiling to identify significant differences in gene expression associated with anatomic localization and NAB2-STAT6 gene fusion variants. SFTs with the NAB2 exon 4–STAT6 exon 2 fusion variant showed a transcriptional signature enriched for genes involved in DNA binding, gene transcription, and nuclear localization, whereas SFTs with the NAB2 exon 6–STAT6 exon 16/17 fusion variants were enriched for genes involved in tyrosine kinase signaling, cell proliferation, and cytoplasmic localization. Specific transcription factor binding motifs were enriched among differentially expressed genes in SFTs with different fusion variants, implicating co–transcription factors in the modification of chimeric NGFI-A binding protein 2 (NAB2)-STAT6–dependent deregulation of EGR1-dependent gene expression. In summary, this study establishes a potential molecular biologic basis for clinicopathologic differences in SFTs with distinct NAB2-STAT6 gene fusion variants.

22 citations

Journal ArticleDOI
TL;DR: This article reports that metabolic alterations provide substrates that influence chromatin structure and thereby regulate the gene expression that determines cell function in health and disease.
Abstract: Background: Metabolic alterations provide substrates that influence chromatin structure to regulate gene expression that determines cell function in health and disease. Heightened proliferation of ...

22 citations

References
Journal ArticleDOI
TL;DR: In this paper, a different approach to problems of multiple significance testing is presented, which calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the familywise error rate (FWER) when all hypotheses are true but is smaller otherwise.
Abstract: Summary: The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.

83,420 citations


"Moderated estimation of fold change..." refers to methods in this paper

  • ...The Wald test P values from the subset of genes that pass an independent filtering step, described in the next section, are adjusted for multiple testing using the procedure of Benjamini and Hochberg [21]....

    [...]

  • ...The Wald test p-values from the subset of genes that pass an independent filtering step, described in the next section, are adjusted for multiple testing using the procedure of Benjamini and Hochberg [20]....

    [...]

  • ...For all algorithms returning P values, the P values from genes with non-zero sum of read counts across samples were adjusted using the Benjamini–Hochberg procedure [21]....

    [...]

  • ...The Wald test P values from the subset of genes that pass the independent filtering step are adjusted for multiple testing using the procedure of Benjamini and Hochberg [21]....

    [...]

  • ...The Wald test p-values from the subset of genes which pass the independent filtering step are adjusted for multiple testing using the procedure of Benjamini and Hochberg [20]....

    [...]
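
The excerpts above describe adjusting Wald test P values with the Benjamini–Hochberg procedure, but only for genes that pass a filtering step. A minimal sketch of that flow in plain Python follows. The gene names, P values, and the fixed cutoff `MIN_MEAN` are hypothetical, and the fixed cutoff is only a stand-in for illustration, not the independent filtering rule used by DESeq2.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-gene results: (mean normalized count, Wald test p-value).
genes = {
    "geneA": (250.0, 0.001),
    "geneB": (90.0, 0.008),
    "geneC": (40.0, 0.039),
    "geneD": (15.0, 0.041),
    "geneE": (0.5, 0.60),
}
MIN_MEAN = 1.0  # assumed filter cutoff, for illustration only

# Only genes passing the filter enter the multiple-testing adjustment.
kept = [g for g, (mean, _) in genes.items() if mean >= MIN_MEAN]
padj = dict(zip(kept, bh_adjust([genes[g][1] for g in kept])))
```

Because `geneE` is filtered out before adjustment, the remaining four P values are corrected for four tests rather than five, which is where the potential gain in power from filtering comes from.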

Journal ArticleDOI
TL;DR: edgeR is a Bioconductor software package for examining differential expression of replicated count data. It uses an overdispersed Poisson model to account for both biological and technical variability, and empirical Bayes methods to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Abstract: Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).
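
To make "overdispersed Poisson" concrete, the sketch below assumes the usual negative binomial parameterization, in which the variance is the Poisson variance plus a term quadratic in the mean. The function names and the crude method-of-moments estimator are illustrative assumptions; they are not edgeR's empirical Bayes estimation.

```python
def nb_variance(mean, dispersion):
    """Variance under a negative binomial model with the given mean:
    the Poisson variance (mean) plus an overdispersion term."""
    return mean + dispersion * mean ** 2


def moment_dispersion(counts):
    """Crude method-of-moments dispersion estimate, floored at zero.

    Illustrative only: with few replicates this estimate is very noisy,
    which is the motivation for moderating it across transcripts.
    """
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    if mean == 0:
        return 0.0
    return max(0.0, (var - mean) / mean ** 2)


# Hypothetical replicate counts for one transcript.
replicates = [10, 20, 30]
dispersion = moment_dispersion(replicates)
variance = nb_variance(20.0, dispersion)
```

With a dispersion of zero, `nb_variance` reduces to the plain Poisson case where the variance equals the mean.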

29,413 citations


"Moderated estimation of fold change..." refers to methods in this paper

  • ...The Negative Binomial based approaches compared were DESeq (old) [4], edgeR [32], edgeR with the robust option [33], DSS [6] and EBSeq [34]....

    [...]

Book
01 Jan 1983
TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).
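
The iterative weighted linear regression mentioned in this abstract can be sketched for one simple case: a Poisson model with a log link, an intercept, and a single covariate. The function name, the fixed iteration count, and the toy data are assumptions made for illustration; a real implementation would check convergence and handle general design matrices.

```python
import math


def irls_poisson(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x by iteratively reweighted least squares."""
    # Start from the intercept-only fit: the log of the mean count.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(iterations):
        # Accumulate the weighted normal equations for the 2x2 system.
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi          # linear predictor
            mu = math.exp(eta)          # fitted mean under the log link
            w = mu                      # Poisson working weight
            z = eta + (yi - mu) / mu    # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted least-squares system by Cramer's rule.
        det = s_w * s_wxx - s_wx ** 2
        b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        b1 = (s_w * s_wxz - s_wx * s_wz) / det
    return b0, b1


# Toy two-group design: x is a 0/1 group indicator, y holds the counts.
b0, b1 = irls_poisson([0, 0, 1, 1], [2, 4, 9, 15])
```

In this two-group example the maximum likelihood fit reproduces the group means (3 and 12), so the estimates approach log 3 for the intercept and log 4 for the group effect.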

23,215 citations

Book
28 Jul 2013
TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

19,261 citations