
Journal ArticleDOI

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

05 Dec 2014-Genome Biology (BioMed Central)-Vol. 15, Iss: 12, pp 550-550

TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.

Abstract: In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.bioconductor.org/packages/release/bioc/html/DESeq2.html.




Citations
Journal ArticleDOI
TL;DR: This work presents HTSeq, a Python library to facilitate the rapid development of custom scripts for high-throughput sequencing data analysis, and presents htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes.
Abstract: Motivation: A large choice of tools exists for many standard tasks in the analysis of high-throughput sequencing (HTS) data. However, once a project deviates from standard workflows, custom scripts are needed. Results: We present HTSeq, a Python library to facilitate the rapid development of such scripts. HTSeq offers parsers for many common data formats in HTS projects, as well as classes to represent data, such as genomic coordinates, sequences, sequencing reads, alignments, gene model information and variant calls, and provides data structures that allow for querying via genomic coordinates. We also present htseq-count, a tool developed with HTSeq that preprocesses RNA-Seq data for differential expression analysis by counting the overlap of reads with genes. Availability and implementation: HTSeq is released as an open-source software under the GNU General Public Licence and available from http://www-huber.embl.de/HTSeq or from the Python Package Index at https://pypi.python.org/pypi/HTSeq. Contact: sanders@fs.tum.de

11,833 citations

Journal ArticleDOI
13 Jun 2019-Cell
TL;DR: A strategy to "anchor" diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities.
Abstract: Single-cell transcriptomics has transformed our ability to characterize cell states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets to better understand cellular identity and function. Here, we develop a strategy to "anchor" diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities. After demonstrating improvement over existing methods for integrating scRNA-seq data, we anchor scRNA-seq experiments with scATAC-seq to explore chromatin differences in closely related interneuron subsets and project protein expression measurements onto a bone marrow atlas to characterize lymphocyte populations. Lastly, we harmonize in situ gene expression and scRNA-seq datasets, allowing transcriptome-wide imputation of spatial gene expression patterns. Our work presents a strategy for the assembly of harmonized references and transfer of information across datasets.

3,853 citations


Cites methods from "Moderated estimation of fold change..."

  • ...To identify differentially-expressed genes between the CD69+ and CD69- sorted populations, we used DESeq2 [Love et al., 2014] and filtered for significant genes with a log2-fold change in expression greater than 1.5 and a q-value of less than 0.01 [Storey and Tibshirani, 2003]....

    [...]


Journal ArticleDOI
TL;DR: This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts.
Abstract: High-throughput sequencing of mRNA (RNA-seq) has become the standard method for measuring and comparing the levels of gene expression in a wide variety of species and conditions. RNA-seq experiments generate very large, complex data sets that demand fast, accurate and flexible software to reduce the raw read data to comprehensible results. HISAT (hierarchical indexing for spliced alignment of transcripts), StringTie and Ballgown are free, open-source software tools for comprehensive analysis of RNA-seq experiments. Together, they allow scientists to align reads to a genome, assemble transcripts including novel splice variants, compute the abundance of these transcripts in each sample and compare experiments to identify differentially expressed genes and transcripts. This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts. The protocol's execution time depends on the computing resources, but it typically takes under 45 min of computer time. HISAT, StringTie and Ballgown are available from http://ccb.jhu.edu/software.shtml.

2,234 citations

Journal ArticleDOI
28 May 2020-Cell
TL;DR: It is proposed that reduced innate antiviral defenses coupled with exuberant inflammatory cytokine production are the defining and driving features of COVID-19.
Abstract: Viral pandemics, such as the one caused by SARS-CoV-2, pose an imminent threat to humanity. Because of its recent emergence, there is a paucity of information regarding viral behavior and host response following SARS-CoV-2 infection. Here we offer an in-depth analysis of the transcriptional response to SARS-CoV-2 compared with other respiratory viruses. Cell and animal models of SARS-CoV-2 infection, in addition to transcriptional and serum profiling of COVID-19 patients, consistently revealed a unique and inappropriate inflammatory response. This response is defined by low levels of type I and III interferons juxtaposed to elevated chemokines and high expression of IL-6. We propose that reduced innate antiviral defenses coupled with exuberant inflammatory cytokine production are the defining and driving features of COVID-19.

2,083 citations


Cites background or methods from "Moderated estimation of fold change..."

  • ...1.10 Illumina http://basespace.illumina.com/dashboard; DESeq2, Love et al., 2014, https://bioconductor.org/packages/release/bioc/html/DESeq2.html; STRING, Szklarczyk et al., 2019, https://string-db.org/; gplots, CRAN, https://cran.r-project.org/web/packages/gplots/index.html; PMA, Witten et al., 2009, https://cran.r-project.org/web/packages/PMA/index.html; ggplot2, Tidyverse, https://ggplot2.tidyverse.org/; Bowtie2, Langmead and Salzberg, 2012, http://bowtie-bio.sourceforge.net/bowtie2/index.shtml; ImmGen, Yoshida et al., 2019, http://www.immgen.org/ ll...

    [...]


  • ...Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2....

    [...]

  • ...Raw reads were aligned to the human genome (hg19) using the RNA-Seq Alignment App on Basespace (Illumina, CA), following differential expression analysis using DESeq2 (Love et al., 2014)....

    [...]

Journal ArticleDOI
Peter Bailey, David K. Chang, Katia Nones, Amber L. Johns, Ann-Marie Patch, Marie-Claude Gingras, David Miller, Angelika N. Christ, Timothy J. C. Bruxner, Michael C.J. Quinn, Craig Nourse, Murtaugh Lc, Ivon Harliwong, Senel Idrisoglu, Suzanne Manning, Ehsan Nourbakhsh, Shivangi Wani, J. Lynn Fink, Oliver Holmes, Chin, Matthew J. Anderson, Stephen H. Kazakoff, Conrad Leonard, Felicity Newell, Nicola Waddell, Scott Wood, Qinying Xu, Peter J. Wilson, Nicole Cloonan, Karin S. Kassahn, Darrin Taylor, Kelly Quek, Alan J. Robertson, Lorena Pantano, Laura Mincarelli, Luis Navarro Sanchez, Lisa Evers, Jianmin Wu, Mark Pinese, Mark J. Cowley, Jones, Emily K. Colvin, Adnan Nagrial, Emily S. Humphrey, Lorraine A. Chantrill, Amanda Mawson, Jeremy L. Humphris, Angela Chou, Marina Pajic, Christopher J. Scarlett, Andreia V. Pinho, Marc Giry-Laterriere, Ilse Rooman, Jaswinder S. Samra, James G. Kench, Jessica A. Lovell, Neil D. Merrett, Christopher W. Toon, Krishna Epari, Nam Q. Nguyen, Andrew Barbour, Nikolajs Zeps, Kim Moran-Jones, Nigel B. Jamieson, Janet Graham, Fraser Duthie, Karin A. Oien, Hair J, Robert Grützmann, Anirban Maitra, Christine A. Iacobuzio-Donahue, Christopher L. Wolfgang, Richard A. Morgan, Rita T. Lawlor, Corbo, Claudio Bassi, Borislav Rusev, Paola Capelli, Roberto Salvia, Giampaolo Tortora, Debabrata Mukhopadhyay, Gloria M. Petersen, Munzy Dm, William E. Fisher, Saadia A. Karim, Eshleman, Ralph H. Hruban, Christian Pilarsky, Jennifer P. Morton, Owen J. Sansom, Aldo Scarpa, Elizabeth A. Musgrove, Ulla-Maja Bailey, Oliver Hofmann, R. L. Sutherland, David A. Wheeler, Anthony J. Gill, Richard A. Gibbs, John V. Pearson, Andrew V. Biankin, Sean M. Grimmond
03 Mar 2016-Nature
TL;DR: Detailed genomic analysis of 456 pancreatic ductal adenocarcinomas identified 32 recurrently mutated genes that aggregate into 10 pathways: KRAS, TGF-β, WNT, NOTCH, ROBO/SLIT signalling, G1/S transition, SWI-SNF, chromatin modification, DNA repair and RNA processing.
Abstract: Integrated genomic analysis of 456 pancreatic ductal adenocarcinomas identified 32 recurrently mutated genes that aggregate into 10 pathways: KRAS, TGF-β, WNT, NOTCH, ROBO/SLIT signalling, G1/S transition, SWI-SNF, chromatin modification, DNA repair and RNA processing. Expression analysis defined 4 subtypes: (1) squamous; (2) pancreatic progenitor; (3) immunogenic; and (4) aberrantly differentiated endocrine exocrine (ADEX) that correlate with histopathological characteristics. Squamous tumours are enriched for TP53 and KDM6A mutations, upregulation of the TP63∆N transcriptional network, hypermethylation of pancreatic endodermal cell-fate determining genes and have a poor prognosis. Pancreatic progenitor tumours preferentially express genes involved in early pancreatic development (FOXA2/3, PDX1 and MNX1). ADEX tumours displayed upregulation of genes that regulate networks involved in KRAS activation, exocrine (NR5A2 and RBPJL), and endocrine differentiation (NEUROD1 and NKX2-2). Immunogenic tumours contained upregulated immune networks including pathways involved in acquired immune suppression. These data infer differences in the molecular evolution of pancreatic cancer subtypes and identify opportunities for therapeutic development.

1,820 citations


References
Journal ArticleDOI
Abstract: Summary: The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.

71,936 citations


"Moderated estimation of fold change..." refers methods in this paper

  • ...The Wald test P values from the subset of genes that pass an independent filtering step, described in the next section, are adjusted for multiple testing using the procedure of Benjamini and Hochberg [21]....

    [...]

  • ...The Wald test p-values from the subset of genes that pass an independent filtering step, described in the next section, are adjusted for multiple testing using the procedure of Benjamini and Hochberg [20]....

    [...]

  • ...For all algorithms returning P values, the P values from genes with non-zero sum of read counts across samples were adjusted using the Benjamini–Hochberg procedure [21]....

    [...]

  • ...The Wald test P values from the subset of genes that pass the independent filtering step are adjusted for multiple testing using the procedure of Benjamini and Hochberg [21]....

    [...]

  • ...The Wald test p-values from the subset of genes which pass the independent filtering step are adjusted for multiple testing using the procedure of Benjamini and Hochberg [20]....

    [...]
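
The adjustment step quoted in the excerpts above can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure. This is not DESeq2's own code; the function name `benjamini_hochberg` and the example p-values are hypothetical, and the independent filtering step mentioned in the excerpts is assumed to have been applied already.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order.

    Illustrative helper (hypothetical name), not part of any package.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    # with a running minimum of p * m / rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Made-up p-values for four genes that passed filtering.
    example = [0.01, 0.04, 0.03, 0.005]
    print(benjamini_hochberg(example))
```

Genes whose adjusted value falls below a chosen threshold are then reported at that false discovery rate.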

Book
01 Jan 1983
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions; the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).

23,204 citations
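
As a rough illustration of the iterative weighted linear regression described in this abstract, the sketch below fits a Poisson model with a log link and a single covariate. It is a simplified assumption-laden example: the function name `poisson_irls`, the fixed iteration count, and the toy counts are all hypothetical, and DESeq2 itself uses a Negative Binomial model with additional shrinkage that is not shown here.

```python
import math


def poisson_irls(x, y, iterations=25):
    """Fit log(mu) = b0 + b1 * x by iteratively reweighted least squares.

    Hypothetical helper for illustration; assumes x is not constant.
    """
    # Start the linear predictor from the (offset) log counts.
    eta = [math.log(yi + 0.5) for yi in y]
    b0 = b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the Poisson log-link model.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted least squares for two parameters, solved in closed form.
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
        eta = [b0 + b1 * xi for xi in x]
    return b0, b1


if __name__ == "__main__":
    # Made-up counts: three control samples (x = 0), three treated (x = 1).
    x = [0, 0, 0, 1, 1, 1]
    y = [10, 12, 8, 40, 35, 45]
    b0, b1 = poisson_irls(x, y)
    print(b0, b1)
```

With a binary covariate, the slope `b1` is the natural-log fold change between the two group means, which is the kind of quantity a count-based differential expression analysis reports (usually on the log2 scale).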

Journal ArticleDOI
Abstract: Summary: It is expected that emerging digital gene expression (DGE) technologies will overtake microarray technologies in the near future for many functional genomics applications. One of the fundamental data analysis tasks, especially for gene expression studies, involves determining whether there is evidence that counts for a transcript or exon are significantly different across experimental conditions. edgeR is a Bioconductor software package for examining differential expression of replicated count data. An overdispersed Poisson model is used to account for both biological and technical variability. Empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference. The methodology can be used even with the most minimal levels of replication, provided at least one phenotype or experimental condition is replicated. The software may have other applications beyond sequencing data, such as proteome peptide count data. Availability: The package is freely available under the LGPL licence from the Bioconductor web site (http://bioconductor.org).

21,575 citations


"Moderated estimation of fold change..." refers methods in this paper

  • ...The Negative Binomial based approaches compared were DESeq (old) [4], edgeR [32], edgeR with the robust option [33], DSS [6] and EBSeq [34]....

    [...]

Book
28 Jul 2013
Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting, the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for "wide" data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

18,981 citations