Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2
Citations
Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown
Normalization and microbial differential abundance strategies depend upon data characteristics
Cohesin Loss Eliminates All Loop Domains
Shotgun metagenomics, from sampling to analysis
References
Controlling the false discovery rate: a practical and powerful approach to multiple testing
Handbook of Mathematical Functions
edgeR: a Bioconductor package for differential expression analysis of digital gene expression data
Generalized Linear Models
The Elements of Statistical Learning: Data Mining, Inference, and Prediction
Frequently Asked Questions
Q2. What was used to compare a hierarchical clustering with the true cluster membership?
The adjusted Rand index [37] was used to compare a hierarchical clustering based on various distances with the true cluster membership.
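The adjusted Rand index counts how many pairs of items two partitions agree on (same cluster in both, or different clusters in both) and corrects that count for chance. It is 1 for identical partitions and about 0 for random ones. Below is a minimal standard-library sketch of the usual chance-corrected formula. The function name is ours, and this is not the evaluation code used in the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Chance-corrected agreement between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label vectors must have the same length")
    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: nothing to disagree about

    # Contingency table between the two clusterings, plus its margins.
    cell_counts = Counter(zip(labels_true, labels_pred))
    row_counts = Counter(labels_true)
    col_counts = Counter(labels_pred)

    # Number of item pairs placed together in each table cell and margin.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())

    expected = sum_rows * sum_cols / total_pairs
    maximum = (sum_rows + sum_cols) / 2
    if maximum == expected:  # degenerate case, e.g. one cluster in both
        return 1.0
    return (sum_cells - expected) / (maximum - expected)


# Cluster labels are arbitrary, so a renamed but identical partition scores 1.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))  # -0.5
```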
Q3. Why do methods that treat each gene separately lack power?
With the small numbers of replicates typical of RNA-seq experiments, inferential methods that treat each gene separately suffer from a lack of power, because their within-group variance estimates are highly uncertain.
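A worked example shows how large this uncertainty is. It assumes normally distributed measurements purely for illustration (DESeq2 itself models counts with the negative binomial distribution). The sample variance $s^2$ from $n$ replicates with true variance $\sigma^2$ satisfies

```latex
\frac{(n-1)\,s^2}{\sigma^2} \sim \chi^2_{n-1}
\quad\Longrightarrow\quad
\operatorname{Var}(s^2) = \frac{2\sigma^4}{n-1},
\qquad
\frac{\operatorname{SD}(s^2)}{\sigma^2} = \sqrt{\frac{2}{n-1}} .
```

With $n = 3$ the standard deviation of $s^2$ equals $\sigma^2$ itself. Because $s^2/\sigma^2$ is then exponentially distributed with mean 1, there is roughly a 53% chance that a single gene's estimate is off by a factor of two or more ($1 - e^{-1/2} + e^{-2} \approx 0.53$).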
Q4. What is a disadvantage of the rlog transformation relative to the VST?
A disadvantage of the regularized logarithm (rlog) transformation relative to the variance-stabilizing transformation (VST) is that the ordering of genes within a sample will change if neighboring genes undergo shrinkage of different strength.
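The toy example below illustrates this. The VST is sketched as the textbook negative binomial variance-stabilizing function $(2/\sqrt{\alpha})\,\operatorname{asinh}(\sqrt{\alpha x})$ with one fixed dispersion $\alpha$. It applies the same increasing function to every gene, so within-sample order is preserved. The rlog is replaced by a hypothetical per-gene shrinkage of $\log_2(x+1)$ toward the gene's mean, using gene-specific weights. Both functions and all numbers are assumptions chosen for illustration. Neither is DESeq2's implementation.

```python
import math


def nb_vst(x, alpha):
    """Textbook VST for a negative binomial with variance mu + alpha * mu**2.

    Assumes a single fixed dispersion alpha shared by all genes.
    """
    return 2.0 * math.asinh(math.sqrt(alpha * x)) / math.sqrt(alpha)


def toy_shrunken_log(x, gene_mean_log2, weight):
    """Hypothetical rlog-like value: pull log2(x + 1) toward the gene's mean.

    weight = 1 means no shrinkage; weight = 0 means full shrinkage to the mean.
    """
    return gene_mean_log2 + weight * (math.log2(x + 1) - gene_mean_log2)


# Normalized counts of two genes in the same sample: gene A is higher.
count_a, count_b = 100, 90

# One increasing function for all genes, so the within-sample order is kept.
vst_a, vst_b = nb_vst(count_a, 0.05), nb_vst(count_b, 0.05)

# Gene A is shrunk strongly toward a low mean, gene B only weakly.
shrunk_a = toy_shrunken_log(count_a, gene_mean_log2=5.0, weight=0.1)
shrunk_b = toy_shrunken_log(count_b, gene_mean_log2=6.5, weight=0.9)

print(vst_a > vst_b)        # True: order preserved
print(shrunk_a > shrunk_b)  # False: order reversed
```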
Q5. How are outliers handled by default?
By default, outliers in conditions with six or fewer replicates cause the whole gene to be flagged and removed from subsequent analysis, including P value adjustment for multiple testing.
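The default rule can be sketched as a small decision function. The text states only that six or fewer replicates lead to flagging. The snippet therefore assumes that replacement applies from seven replicates upward, which matches the replacement behavior described under Q10. The function and constant names are hypothetical. We believe the corresponding `DESeq()` argument is `minReplicatesForReplace`, but the package reference manual is the authority.

```python
from typing import Dict

# Assumption for illustration: six or fewer replicates trigger flagging,
# so replacement is taken to start at seven replicates.
MIN_REPLICATES_FOR_REPLACE = 7


def outlier_policy(replicates_per_condition: Dict[str, int],
                   outlier_condition: str) -> str:
    """Return the (hypothetical) action for a gene with an outlier count."""
    n = replicates_per_condition[outlier_condition]
    if n < MIN_REPLICATES_FOR_REPLACE:
        # Too few replicates: flag the whole gene, which then receives no
        # P value and is left out of the multiple-testing adjustment.
        return "flag_gene"
    # Enough replicates: replace the outlier count instead of dropping the gene.
    return "replace_outlier"


print(outlier_policy({"control": 3, "treated": 3}, "treated"))  # flag_gene
print(outlier_policy({"control": 8, "treated": 8}, "control"))  # replace_outlier
```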
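Q6. Why must P values from per-gene tests such as the Wald test be adjusted for multiple testing?
Because RNA-seq and other genome-wide experiments involve a very large number of tests, the multiple testing problem needs to be addressed. For example, testing 20,000 genes at a nominal 5% level would be expected to yield about 1,000 false positives even if no gene were differentially expressed.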
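The standard remedy, listed among the references above, is the Benjamini-Hochberg procedure for controlling the false discovery rate. DESeq2's `results()` applies it by default. Below is a minimal standard-library sketch of the adjustment: each P value of rank $k$ among $m$ is scaled by $m/k$, and monotonicity is then enforced from the largest P value downward. The function name is ours.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, so that adjusted values
    # never increase as the raw P values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(p, 4) for p in benjamini_hochberg([0.01, 0.04, 0.03, 0.2])])
# [0.04, 0.0533, 0.0533, 0.2]
```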
Q7. What are the use cases of DESeq2?
DESeq2's use cases are not limited to RNA-seq data or other transcriptomics assays; many kinds of high-throughput count data can be analyzed with it.
Q8. Why was the permutation-based SAMseq method expected to produce few significant calls?
The permutation-based SAMseq method was expected to rarely produce adjusted P values below 0.1 in the evaluation set, because a three-versus-three comparison does not allow enough permutations.
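The arithmetic behind this expectation can be checked directly. The snippet below counts the possible group assignments when six samples are split three versus three, and derives the smallest P value that a naive per-gene permutation test could report. It is a simplified illustration of the granularity problem, not SAMseq's actual procedure.

```python
from itertools import combinations
from math import comb

n_per_group = 3
samples = range(2 * n_per_group)

# Every way of choosing which three of the six samples carry the "treated" label.
assignments = list(combinations(samples, n_per_group))
assert len(assignments) == comb(6, 3)

# In a naive per-gene permutation test the observed labelling is one of these
# assignments, so no P value can be smaller than 1 / 20.
min_p = 1 / len(assignments)
print(len(assignments), min_p)  # 20 0.05

# With a statistic that ignores which group is called "treated" (e.g. |t|),
# swapping the labels gives the same value, leaving only 10 distinct splits.
min_p_symmetric = 2 / len(assignments)
print(min_p_symmetric)  # 0.1
```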
Q9. How can the approach used in DESeq2 be extended to isoform-specific analysis?
The approach used in DESeq2 can also be extended to isoform-specific analysis, either through generalized linear modeling at the exon level with a gene-specific mean, as in the DEXSeq package [30], or through counting evidence for alternative isoforms in splice graphs [31,32].
Q10. What value replaces an outlier, and why is this conservative?
The outlier is replaced with the value predicted by the null hypothesis of no differential expression; this is a more conservative choice than simply omitting the outlier.
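A sketch of such a null-hypothesis prediction: under no differential expression, all samples share one expression level. A robust estimate of that level is a trimmed mean of the size-factor-normalized counts across all samples, and multiplying it back by the outlier sample's size factor gives a count on the original scale. The 20% trimming and the helper names are assumptions. We believe DESeq2 uses a trimmed mean of this kind, but this is not its code.

```python
import math


def trimmed_mean(values, trim=0.2):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    xs = sorted(values)
    k = int(math.floor(trim * len(xs)))
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)


def replace_outlier(counts, size_factors, outlier_index, trim=0.2):
    """Hypothetical helper: swap one outlier count for a null-model prediction."""
    # Put all samples on a common scale by dividing out the size factors.
    normalized = [c / s for c, s in zip(counts, size_factors)]
    # Under no differential expression, one level applies to every sample;
    # trimming keeps the outlier itself from dragging that estimate up.
    baseline = trimmed_mean(normalized, trim)
    replaced = list(counts)
    replaced[outlier_index] = round(baseline * size_factors[outlier_index])
    return replaced


counts = [10, 12, 11, 9, 10, 13, 500, 11]
print(replace_outlier(counts, [1.0] * len(counts), outlier_index=6))
# [10, 12, 11, 9, 10, 13, 11, 11]
```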
Q11. What decision boundary does a thresholded test produce?
Figure 4A demonstrates how a thresholded test, whose null hypothesis is that the absolute log fold change (LFC) is at most a specified threshold, gives rise to a curved decision boundary: to reach significance, the estimated LFC has to exceed the specified threshold by an amount that depends on the available information.
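The curved boundary follows from a Wald-type statistic $(|\hat\beta| - \theta)/\mathrm{SE}$. The larger the standard error (that is, the less information a gene carries), the further the estimate must lie beyond the threshold $\theta$. The sketch below assumes a normal approximation and a two-sided tail probability for the alternative $|\beta| > \theta$. DESeq2 exposes this kind of test through the `lfcThreshold` and `altHypothesis` arguments of `results()`, but consult its manual for the exact formula. The function names here are ours.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def thresholded_wald_pvalue(lfc, se, threshold):
    """P value for H0: |LFC| <= threshold under a normal approximation."""
    z = (abs(lfc) - threshold) / se
    # If |LFC| is below the threshold, z < 0 and the bound below caps P at 1.
    return min(1.0, 2.0 * (1.0 - STD_NORMAL.cdf(z)))


def minimum_significant_lfc(se, threshold, alpha=0.05):
    """Smallest |LFC| with P < alpha: the decision boundary for a given SE."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return threshold + z_crit * se


# The required effect grows with the standard error: a curved boundary.
for se in (0.1, 0.3, 0.6):
    print(se, round(minimum_significant_lfc(se, threshold=1.0), 3))
# 0.1 1.196
# 0.3 1.588
# 0.6 2.176
```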
Q12. How do the shrunken LFC estimates depend on a gene's mean count?
Compared with the unshrunken maximum-likelihood estimates, the shrunken LFC estimates are more evenly spread around zero, and for very weakly expressed genes (with less than one read per sample on average), LFCs hardly deviate from zero, reflecting that accurate LFC estimates are not possible there.
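Why weakly expressed genes are pulled to zero can be seen in a normal-normal simplification of shrinkage. The maximum-likelihood LFC is treated as normal around the true LFC with standard error SE, and the true LFC gets a zero-centred normal prior with standard deviation $\sigma_0$. The posterior mean is then the estimate multiplied by $\sigma_0^2/(\sigma_0^2 + \mathrm{SE}^2)$. This closed form is an illustrative simplification, not DESeq2's fitting procedure, and the prior width below is an assumed value.

```python
def shrink_lfc(mle_lfc, se, prior_sd):
    """Posterior mean of an LFC under a zero-centred normal prior.

    Normal-normal simplification: the MLE ~ N(true LFC, se**2) and the
    true LFC ~ N(0, prior_sd**2).
    """
    weight = prior_sd**2 / (prior_sd**2 + se**2)
    return weight * mle_lfc


# Same MLE of 2, but very different precision: a well-measured gene keeps
# almost all of its LFC, while a very low-count gene (large SE) is pulled to ~0.
print(round(shrink_lfc(2.0, se=0.1, prior_sd=1.0), 2))  # 1.98
print(round(shrink_lfc(2.0, se=3.0, prior_sd=1.0), 2))  # 0.2
```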
Q13. Why is the usual null hypothesis of zero LFC often implausible?
The usual null hypothesis is that a gene's LFC between conditions is exactly zero. However, if any biological processes are genuinely affected by the difference in experimental treatment, this null hypothesis implies that the gene under consideration is perfectly decoupled from these processes.
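In symbols, write $\beta_i$ for the LFC of gene $i$ and $\theta > 0$ for a user-chosen threshold (notation chosen here for illustration). The thresholded test replaces the point null with a composite one, so a gene is called only if its change is demonstrably larger than $\theta$:

```latex
H_0:\ \beta_i = 0
\qquad\text{versus}\qquad
H_0^{(\theta)}:\ |\beta_i| \le \theta
\quad\text{against}\quad
H_1^{(\theta)}:\ |\beta_i| > \theta .
```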
Q14. How does the rlog transformation compare with the ordinary logarithm?
Figure 5 provides diagnostic plots of the normalized counts under the ordinary logarithm with a pseudocount of 1 and the rlog transformation, showing that the rlog both stabilizes the variance through the range of the mean of counts and helps to find meaningful patterns in the data.
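The variance problem that the rlog is designed to address can be quantified for the ordinary logarithm with a pseudocount of 1. The sketch assumes negative binomial counts with mean $\mu$ and one fixed dispersion $\alpha$ (variance $\mu + \alpha\mu^2$), which are illustrative values rather than fitted ones. It computes the exact standard deviation of $\log_2(X+1)$ by summing the probability mass function. It does not implement the rlog itself.

```python
import math


def nb_logpmf(k, mu, alpha):
    """Log PMF of a negative binomial with mean mu and variance mu + alpha*mu**2."""
    r = 1.0 / alpha  # "size" parameter
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def sd_of_log2_pseudocount(mu, alpha):
    """Exact SD of log2(X + 1) for X ~ NB(mu, alpha), by summing the PMF."""
    sd_x = math.sqrt(mu + alpha * mu * mu)
    upper = int(mu + 20 * sd_x) + 50  # far enough into the right tail
    mean = second_moment = 0.0
    for k in range(upper + 1):
        p = math.exp(nb_logpmf(k, mu, alpha))
        y = math.log2(k + 1)
        mean += p * y
        second_moment += p * y * y
    return math.sqrt(max(second_moment - mean * mean, 0.0))


for mu in (1, 5, 50, 1000):
    print(mu, round(sd_of_log2_pseudocount(mu, alpha=0.01), 3))
```

With these settings the spread is far larger at low means than at high ones. As $\mu$ grows it approaches about $\sqrt{\alpha}/\ln 2 \approx 0.14$, the level that a variance-stabilizing transformation would aim to hold across the whole range.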
Q15. How can differences in average transcript length between conditions be accounted for?
If estimates of average transcript length are available for the conditions, they can be incorporated into the DESeq2 framework as gene- and sample-specific normalization factors.
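One plausible way to build such a gene × sample matrix is sketched below, in the spirit of how tximport-derived lengths are commonly used. Each gene's lengths are centred to a geometric mean of 1 across samples. Median-of-ratios size factors are then estimated on the length-corrected counts, and the two are multiplied. The helper names and this particular combination are assumptions for illustration, not DESeq2's code.

```python
import math
from statistics import median


def median_of_ratios_size_factors(counts):
    """counts: one list of per-sample values per gene. Genes with a zero are skipped."""
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if min(row) <= 0:
            continue  # the geometric mean would be 0, so the ratio is undefined
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalization_factors(counts, avg_tx_length):
    """Hypothetical helper: combine library size and average transcript length."""
    # Centre each gene's lengths so their geometric mean across samples is 1.
    length_factors = []
    for row in avg_tx_length:
        log_gm = sum(math.log(x) for x in row) / len(row)
        length_factors.append([x / math.exp(log_gm) for x in row])
    # Estimate library-size factors on length-corrected counts.
    corrected = [[c / l for c, l in zip(crow, lrow)]
                 for crow, lrow in zip(counts, length_factors)]
    size = median_of_ratios_size_factors(corrected)
    # Gene- and sample-specific factor = length part x library-size part.
    return [[l * s for l, s in zip(lrow, size)] for lrow in length_factors]


counts = [[10, 20], [30, 60], [5, 10]]  # sample 2 sequenced twice as deeply
lengths = [[1000, 1000], [1500, 1500], [800, 800]]  # no length differences
nf = normalization_factors(counts, lengths)
print([round(x, 3) for x in nf[0]])  # [0.707, 1.414]
```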