scispace - formally typeset
Open AccessPosted ContentDOI

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

Reads0
Chats0
TLDR
This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Abstract
In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read counts per gene in RNA-Seq data, for evidence of systematic changes across experimental conditions. Small replicate numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We present DESeq2, a method for differential analysis of count data. DESeq2 uses shrinkage estimation for dispersions and fold changes to improve stability and interpretability of the estimates. This enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression and facilitates downstream tasks such as gene ranking and visualization. DESeq2 is available as an R/Bioconductor package.

read more

Content maybe subject to copyright    Report

Love et al. Genome Biology
(2014) 15:550
DOI 10.1186/s13059-014-0550-8
METHO D Open Access
Moderated estimation of fold change and
dispersion for RNA-seq data with DESeq2
Michael I Love
1,2,3
,WolfgangHuber
2
and Simon Anders
2*
Abstract
In comparative high-throughput sequencing assays, a fundamental task is the analysis of count data, such as read
counts per gene in RNA-seq, for evidence of systematic changes across experimental conditions. Small replicate
numbers, discreteness, large dynamic range and the presence of outliers require a suitable statistical approach. We
present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold
changes to improve stability and interpretability of estimates. This enables a more quantitative analysis focused on the
strength rather than the mere presence of differential expression. The DESeq2 package is available at http://www.
bioconductor.org/packages/release/bioc/html/DESeq2.html.
Background
The rapid adoption of high-throughput sequencing (HTS)
technologies for genomic studies has resulted in a need
for statistical methods to assess quantitative differences
between experiments. An important task here is the anal-
ysis of RNA sequencing (RNA-seq) data with the aim
of finding genes that are differentially expressed acro s s
groups of samples. This task is general: methods for it are
typically also applicable for other comparative HTS assays,
including chromatin immunoprecipitation sequencing,
chromosome conformation capture, or counting observe d
taxa in metagenomic studies.
Besides the need to account for the s pecifi cs of count
data, such as non-normality and a dep endence of the vari-
ance on the mean, a core challenge is the small number
of samples in typical HTS experiments often a s few as
two or three replicates per condition. Inferential methods
that treat each gene separately suffer here from lack of
power, due to the high uncertainty of within-group vari-
ance estimates. In high-throughput assays, this limitation
can be overcome by po o ling information across genes,
specifically, by exploiting assumptions about the similarity
of the variances of different genes measured in the same
experiment [1].
*Correspondence: sanders@fs.tum.de
2
Genome Biology Unit, European Molecular Biology Laboratory,
Meyerhofstrasse 1, 69117 Heidelberg, Germany
Full list of author information is available at the end of the article
Many methods for differential expression analysis of
RNA-seq data p erform such information sharing across
genes for variance (or, equivalently, dispersion) estima-
tion. edgeR [2,3] moderates the dispersion estimate for
each gene toward a common estimate across all genes, or
toward a local estimate from genes with similar expres-
sion strength, using a weighted conditional likelihood.
Our DESeq method [4] detects and correct s dispersion
estimates that are too low through modeling of the depen-
dence of the dispersion on the average expression strength
over all samples. BBSeq [5] models the dispersion on
the mean, with the mean absolute deviation of disper-
sion estimates used to reduce the influence of outliers.
DSS [6] use s a Bayesian approach to provide an estimate
for the dispersion for individual genes that accounts for
the heterogeneity of dispersion values for different genes .
baySeq [7] and ShrinkBayes [8] e stimate priors for a
Bayesian model over all genes, and then provide posterior
probabilities or false discovery rates (FDRs) for differential
expression.
The most common approach in the comparative anal-
ysis of transcriptomics data is to test the null hypothesis
that the logarithmic fold change (LFC) between treat-
ment and control for a gene’s expression is exactly zero,
i.e., that the gene is not at all affected by the treatment.
Often the goal of differential analysis is to produce a list of
genes passing multiple-test adjustment, ranked by P value.
However, small changes, even if statistically highly signif-
icant, might not be the most interesting candidates for
© 2014 Love et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction
in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Love et al. Genome Biology
(2014) 15:550
Page 2 of 21
further investigation. Ranking by fold change, on the other
hand, is complicated by the noisiness of L FC estimates for
genes with low counts. Furthermore, the number of genes
called significantly differentially expressed depends as
much on the sample size and other aspects of experimen-
tal design as it does on the biology of the experiment
and well-powered experiments often generate an over-
whelmingly long list of hits [9]. We, therefore, developed
a s tatistical framework to facilitate gene ranking and visu-
alization based on stable estimation of effect sizes (LFCs),
as well a s testing of differential expression with respect to
user-defined thresholds of biological significance.
Here we present DESeq2, a successor to our DESeq
method [4]. DESeq2 integrates methodological advances
with several novel features to facilitate a more quantita-
tive analysis of comparative RNA-seq data using shrinkage
estimators for dispersion and fold change. We demon-
strate the advantages of DESeq2s new features by describ-
ing a number of applications possible with shrunken fold
changes and their estimates of standard error, including
improved gene ranking and visualization, hypothesis tests
above and below a threshold, and the regularized loga-
rithm transformation for quality asse ssment and cluster-
ing of overdispersed count data. We furthermore compare
DESeq2s statistical power with existing tools, revealing
that our methodology has high sensitivity and precision,
while controlling the false posi tive rate. DESeq2 is avail-
able [10] as an R/Bioconductor package [11].
Results and discussion
Model and normalization
The s tarting point of a DESeq2 analysis is a count matri x
K with one row for each gene i and one column for each
sample j. The matrix entries K
ij
indicate the number of
sequencing reads that have been unambiguously mapped
to a gene in a sample. Note that although we refer in this
paper to counts of reads in genes, the methods presented
herecanbeappliedaswelltootherkindsofHTScount
data. For each gene, we f it a generalized linear model
(GLM) [12] a s follows.
We model read counts K
ij
as following a negative bino-
mial distribution (sometimes also called a gamma-Poisson
distribution) with mean μ
ij
and dispersion α
i
.Themeanis
taken as a quantity q
ij
, proportional to the concentration
of cDNA fragments from the gene in the sample, scaled by
a normalization factor s
ij
, i.e., μ
ij
= s
ij
q
ij
. For many appli-
cations, the same constant s
j
can be used for all genes in
a sample, which then accounts for differences in sequenc-
ing depth between samples. To estimate these size factors,
the DESeq2 package offers the median-of-ratios method
already used in DESeq [4]. However, it can be adv anta-
geous to calculate gene-specific normalization factors s
ij
to account for further s ources of technic al bia ses such as
differing dependence on GC content, gene length or the
like, using published methods [13,14], and these can be
supplied instead.
We use GLMs with a logarithmic link, log
2
q
ij
=
r
x
jr
β
ir
, w ith design matrix elements x
jr
and co efficients
β
ir
. In the simplest case of a comparison between two
groups, such as treated and control samples, the des ign
matrix elements indicate whether a sample j is treated
or not, and the GLM f it returns coefficients indicating
the overall expression s trength of the gene and the log
2
fold change between treatment and control. The use of
linear models, however, provides the flexibility to also ana-
lyze more complex designs, as is often useful in genomic
studies [15].
Empirical Bayes shrinkage for dispersion estimation
Within-group variability, i.e., the v ariability between repli-
cates, is mo deled by the dispersion parameter α
i
,which
describes the variance of counts via Var K
ij
= μ
ij
+ α
i
μ
2
ij
.
Accurate estimation of the dispe rsion parameter α
i
is crit-
ical for the statistical inference of differential expression.
For studies with large sample sizes this is usually not
a problem. For controlled experiments, however, sample
sizes tend to be smaller (experimental designs with as lit-
tle as two or three replicates are common and reasonable),
resulting in highly variable dispersion estimates for each
gene. If used directly, these noisy estimates would com-
promise the accuracy of differential expression testing.
One sensible solution is to share information across
genes. In DESeq2, we assume that genes of similar aver-
age expression strength have similar dispersion. We here
explain the concepts of our approach using as examples a
dataset by Bottomly et al. [16] with RNA-seq data for mice
of two different strains and a dataset by Pickrell et al. [17]
with RNA-seq data for human lymphoblastoid cell lines.
For the mathematical details, see Methods.
We first treat each gene separately and estimate gene-
wise dispersion estimates (using maximum likelihood),
which rely only on the data of each indiv idual gene
(black dots in Figure 1). Next, we determine the location
parameter of the distr i bution of these estimates; to allow
for dep endence on average expression strength, we fit a
smooth curve, a s shown by the re d line in Figure 1. This
provide s an accurate estimate for the expected dispersion
value for genes of a given expression strength but does not
represent deviations of individual genes from this overall
trend. We then shrink the gene-wise dispersion estimates
toward the values pre dicted by the curve to obtain final
dispersion values (blue arrow heads). We use an empiri-
cal Bayes approach (Methods), which lets the strength of
shrinkage depend (i) on an estimate of how close true dis-
persion values tend to be to the fit and (ii) on the degrees
of freedom: as the sample size increase s, the shrinkage
decreases in strength, and eventually becomes negligi-
ble. Our approach therefore accounts for gene-specific

Love et al. Genome Biology
(2014) 15:550
Page 3 of 21
Figure 1 Shrinkage estimation of dispersion. Plot of dispersion estimates over the average expression strength (A) for the Bottomly et al. [16]
dataset with six samples across two groups and (B) for five samples from the Pickrell et al. [17] dataset, fitting only an intercept term. First, gene-wise
MLEs are obtained using only the respective gene’s data (black dots). Then, a curve (red) is fit to the MLEs to capture the overall trend of
dispersion-mean dependence. This fit is used as a prior mean for a second estimation r ound, which results in the final MAP estimates of dispersion
(arrow heads). This can be understood as a shrinkage (along the blue arrows) of the noisy gene-wise estimates toward the consensus represented
by the red line. The black points c ircled in blue are detected as dispersion outliers and not shrunk toward the prior (shrinkage would follow the
dotted line). For clarity, only a subset of genes is shown, which is enriched for dispersion outliers. Additional file 1: Figure S1 displays the same data
but with dispersions of all genes shown. MAP, maximum a posteriori; MLE, maximum-likelihood estimate.
variation to the extent that the data provide this informa-
tion, while the fitted curve aids estimation and tes ting in
less information-rich settings.
Our approach is similar to the one used by DSS [6],
in that both methods sequentially estimate a prior dis-
tribution for the tr ue dispersion values around the fit,
and then provide the maximum a poster iori (MAP) as
the final estimate. It differs from the previous imple-
mentation of DESe q, which used the maximum of the
fitted curve and the gene-wise dispersion estimate as the
final estimate and tended to overestimate the dispersions
(Additional file 1: Fig ure S2). The approach of DESeq2
differs from that of edgeR [3], a s DESe q2 estimates the
width of the prior distribution f rom the data and there-
fore automatically controls theamountofshrinkagebased
on the observed properties of the data. In contrast, the
default steps in edgeR require a user-adjustable parameter,
the prior degrees of freedom, which weighs the contribu-
tion of the individual gene estimate and edgeRs dispersion
fit.
Note that in Figure 1 a number of genes with gene-
wise dispersion estimates below the curve have their final
estimates raised substantially. The shrinkage procedure
thereby helps avoid potential false positives, which can
result from underestimate s of dispersion. If, on the other
hand, an individual gene’s dispersion is f ar above the dis-
tribution of the gene-wise dispersion e stimates of other
genes, then the shrinkage would lead to a greatly reduced
final estimate of dispersion. We reasoned that in many
cases , the reason for extraordinarily high dispersion of a
gene is that it does not obey our modeling assumptions;
some genes may show much higher variability than others
for biological or technic al reasons, even though they have
the same average expression levels. In the se case s, infer-
ence based on the shrunken disp ersion estimates could
lead to undesirable false positive calls. DESeq2 handles
these cas es by using the gene-wise estimate instead of
the shrunken estimate when the former is more than 2
residual standard deviations above the curv e.
Empirical Bayes shrinkage for fold-change estimation
A common difficulty in the analysis of HTS data is the
strong variance of LFC estimates for genes with low read
count. We demonstrate this issue using the dataset by
Bottomly et al. [16]. As visuali ze d in Figure 2A, weakly
expressed genes seem to show much stronger differ-
ences between the compared mouse strains than s trongly
expressed genes. This phenomenon, seen in most HTS
datasets, is a direct consequence of dealing with count
data, in which ratios are inherently noisier when counts
are low. This heteroskedasticity (variance of LFCs depend-
ing on mean count) complicates downstream analysis and
data interpretation, as it makes effect sizes difficult to
compare across the dynamic range of the data.
DESeq2 overcomes this issue by shrinking LFC esti-
mates toward zero in a manner such that shrinkage is
stronger when the available information for a gene is
low, which may be because counts are low, dispersion
is high or there are few degrees of freedom. We again
employ an empirical Bayes procedure: we f irst perform

Love et al. Genome Biology
(2014) 15:550
Page 4 of 21
Figure 2 Effect of shrinkage on logarithmic fold change estimates. Plots of the (A) MLE (i.e., no shrinkage) and (B) MAP estimate (i.e., with
shrinkage) for the LFCs attributable to mouse strain, over the average expression strength for a ten vs eleven sample comparison of the Bottomly
et al. [16] dataset. Small triangles at the top and bottom of the plots indicate points that would fall outside of the plotting window. Two genes with
similar mean count and MLE logarithmic fold change are highlighted with green and purple circles. (C) The counts (normalized by size factors s
j
) for
these genes reveal low dispersion for the gene in green and high dispersion for the gene in purple. (D) Density plots of the likelihoods (solid lines,
scaled to integrate to 1) and the posteriors (dashed lines) for the green and purple genes and of the prior (solid black line): due to the higher
dispersion of the purple gene, its likelihood is wider and less peaked (indicating less information), and the prior has more influence on its posterior
than for the green gene. The stronger curvature of the green posterior at its maximum translates to a smaller reported standard error for the MAP
LFC estimate (horizontal error bar). adj., adjusted; LFC, logarithmic fold change; MAP, maximum a posteriori; MLE, maximum-likelihood estimate.
ordinar y GLM fit s to obtain maximum-likelihood esti-
mates (MLEs) for the LFCs and then fit a zero-centered
normal dis tribution to the observed distribution of MLEs
over all genes. This distribution is used as a prior on LFCs
in a second round of GLM fits, and the MAP estimates
are kept as final estimates of LFC. Furthermore, a stan-
dard error for each estimate is rep orted, which is deri ved
from the posteriors curvature at its maximum (see
Methods for details ). These shrunken LFCs and their stan-
dard errors are used in the Wald tests for differential
expression described in the next section.
The resulting MAP LFCs are biased toward zero in a
manner that removes the problem of exaggerated LFCs for
low counts. As Figure 2B shows, the strongest LFCs are no
longer exhibited by genes with weakest expression. Rather,
the estimates are more evenlyspreadaroundzero,and
for very weakly expressed genes (with less than one read
per sample on average), LFCs hardly deviate from zero,
reflecting that accurate LFC estimates are not possible
here.
The strength of shrinkage does not depend simply on
the mean count, but rather on the amount of informa-
tion available for the fold change e stimation (as indicated
by the obs erved Fisher information; see Methods). Two
genes with equal expression strength but different dis-
persions will experience a different amount of shrinkage
(Figure 2C,D). The shrinkage of LFC estimates can be
described as a bias-variance trade-off [18]: for genes with
little information for LFC estimation, a reduction of the
strong variance is bought at the cost of accepting a bias
toward zero, and this can result in an overall reduc-
tion in mean squared error, e.g., when comparing to LFC

Love et al. Genome Biology
(2014) 15:550
Page 5 of 21
estimates from a new dataset. Genes with high informa-
tion for LFC estimation will have, in our approach, LFCs
with both low bias and low variance. Furthermore, as the
degrees of freedom increase, and the experiment pro-
vides more information for LFC estimation, the shrunken
estimates will converge to the unshrunken estimates. We
note that other Bayesian efforts toward moderating fold
changes for RNA-seq include hierarchical models [8,19]
and the GFOLD (or generalized fold change) tool [20],
which use s a posterior distribution of LFCs.
The shrunken MAP LFCs offer a more reproducible
quantification of transcriptional differences than standard
MLE LFCs. To demonstrate this, we split the Bottomly
et al. samples equally into two groups, I and II, such that
each group contained a balanced split of the strains, sim-
ulating a scenario where an experiment (samples in group
I) is performed, analyzed and reported, and then indepen-
dently replicated (samples in group II). Within each g roup,
we estimated LFCs betwe en the strains and compared
between groups I and II, using the MLE LFCs (Figure 3A)
and using the MAP LFCs (Fig ure 3B). Because the
shrinkage moves large LFCs that are not well supported
by the data toward zero, the agreement b etween the
two independent sample groups increases considerably.
Therefore, shrunken fold-change estimates offer a more
reliable basis for quantitative conclusions than normal
MLEs.
This makes shrunken LFCs also suitable for ranking
genes, e.g., to prioritize them for follow-up experiments.
For example, if we sort the genes in the two s ample g roups
of Figure 3 by unshrunken LFC estimates, and consider
the 100 genes with the strongest up- or down-regulation
in group I, w e f ind only 21 of these again among the top
100 up- or down-regulated genes in group II. However, if
we rank the genes by shrunken LFC estimates, the overlap
improves to 81 of 100 genes ( Additional file 1: Figure S3).
A simpler often used method is to add a fixed num-
ber (pseudocount) to all counts before forming ratios .
However, this re quire s the choice of a tuning parame-
ter and only reacts to one of the sources of uncertainty,
low counts, but not to gene-specific dispersion differences
or sample size. We demonstrate this in the Benchmarks
section below.
Hypothesis tests for differential expression
After GLMs are fit for each gene, one may test whether
each model coefficient differs significantly from zero.
DESeq2 reports the standard error for each shrunken LFC
estimate, obtained from the curvature of the coefficient’s
po sterior (dashe d lines in Figure 2D) at its maximum.
For significance testing, DESe q2 uses a Wald test: the
shrunken estimate of LFC is divided by its s tandard error,
resulting in a z-statistic, which is compared to a standard
normal distri bution. (See Methods for details.) The Wald
test allows testing of individual coefficients , or contrasts
of coefficients, without the need to fit a reduced model as
with the likelihood ratio test, though the likelihood ratio
test is also available a s an option in DESeq2.TheWaldtest
P values from the subset of genes that pass an independent
filtering step, describ ed in the next section, are adjusted
for multiple testing using the procedure of Benjamini and
Hochberg [21].
Automatic i ndependent filtering
Duetothelargenumberoftestsperformedintheanaly-
sis of RNA-seq and other genome-wide experiments, the
multiple testing problem needs to be addressed. A popu-
lar objective is control or estimation of the FDR . Multiple
Figure 3 Stability of logarithmic fold changes. DESeq2 is run on equally split halves of the data of Bottomly et al. [16], and the LFCs from t he
halves are plotted against each other. (A) MLEs, i.e., without LFC shrinkage. (B) MAP estimates, i.e., with shrinkage. Points in the top left and bottom
right quadrants indicate genes with a change of sign of LFC. Red points indicate genes with adjusted P value < 0.1. The legend displays the
root-mean-square error of the estimates in group I compared to those in group II. LFC, logarithmic fold change; MAP, maximum a posteriori; MLE,
maximum-likelihood estimate; RMSE, root-mean-square error.

Figures
Citations
More filters
Journal ArticleDOI

Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2

TL;DR: This work presents DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates, which enables a more quantitative analysis focused on the strength rather than the mere presence of differential expression.
Journal ArticleDOI

Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown

TL;DR: This protocol describes all the steps necessary to process a large set of raw sequencing reads and create lists of gene transcripts, expression levels, and differentially expressed genes and transcripts.
Journal ArticleDOI

Shotgun metagenomics, from sampling to analysis

TL;DR: Computational approaches to overcome the challenges that affect both assembly-based and mapping-based metagenomic profiling, particularly of high-complexity samples or environments containing organisms with limited similarity to sequenced genomes, are needed.
References
More filters
Journal ArticleDOI

Controlling the false discovery rate: a practical and powerful approach to multiple testing

TL;DR: In this paper, a different approach to problems of multiple significance testing is presented, which calls for controlling the expected proportion of falsely rejected hypotheses -the false discovery rate, which is equivalent to the FWER when all hypotheses are true but is smaller otherwise.
Journal ArticleDOI

edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

TL;DR: EdgeR as mentioned in this paper is a Bioconductor software package for examining differential expression of replicated count data, which uses an overdispersed Poisson model to account for both biological and technical variability and empirical Bayes methods are used to moderate the degree of overdispersion across transcripts, improving the reliability of inference.
Book

Generalized Linear Models

TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log- likelihoods, illustrated by examples relating to four distributions; the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Book

The Elements of Statistical Learning: Data Mining, Inference, and Prediction

TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Related Papers (5)
Frequently Asked Questions (15)
Q1. What are the contributions mentioned in the paper "Moderated estimation of fold change and dispersion for rna-seq data with deseq2" ?

The authors present DESeq2, a method for differential analysis of count data, using shrinkage estimation for dispersions and fold changes to improve stability and interpretability of estimates. An important task here is the analysis of RNA sequencing ( RNA-seq ) data with the aim of finding genes that are differentially expressed across groups of samples. Tum. de 2Genome Biology Unit, European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany Full list of author information is available at the end of the article Many methods for differential expression analysis of RNA-seq data perform such information sharing across genes for variance ( or, equivalently, dispersion ) estimation. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http: //creativecommons. org/licenses/by/4. 0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver ( http: //creativecommons. org/publicdomain/zero/1. 0/ ) applies to the data made available in this article, unless otherwise stated. Here the authors present DESeq2, a successor to their DESeq method [ 4 ]. The authors demonstrate the advantages ofDESeq2 ’ s new features by describing a number of applications possible with shrunken fold changes and their estimates of standard error, including improved gene ranking and visualization, hypothesis tests above and below a threshold, and the regularized logarithm transformation for quality assessment and clustering of overdispersed count data. Note that although the authors refer in this paper to counts of reads in genes, the methods presented here can be applied as well to other kinds of HTS count data. For each gene, the authors fit a generalized linear model ( GLM ) [ 12 ] as follows. The authors model read counts Kij as following a negative binomial distribution ( sometimes also called a gamma-Poisson distribution ) withmeanμij and dispersion αi. The use of linearmodels, however, provides the flexibility to also analyze more complex designs, as is often useful in genomic studies [ 15 ]. The authors here explain the concepts of their approach using as examples a dataset by Bottomly et al. [ 16 ] with RNA-seq data formice of two different strains and a dataset by Pickrell et al. [ 17 ] with RNA-seq data for human lymphoblastoid cell lines. Next, the authors determine the location parameter of the distribution of these estimates ; to allow for dependence on average expression strength, they fit a smooth curve, as shown by the red line in Figure 1. This provides an accurate estimate for the expected dispersion value for genes of a given expression strength but does not represent deviations of individual genes from this overall trend. Their approach therefore accounts for gene-specific Love et al. The black points circled in blue are detected as dispersion outliers and not shrunk toward the prior ( shrinkage would follow the dotted line ). Variation to the extent that the data provide this information, while the fitted curve aids estimation and testing in less information-rich settings. Their approach is similar to the one used by DSS [ 6 ], in that both methods sequentially estimate a prior distribution for the true dispersion values around the fit, and then provide the maximum a posteriori ( MAP ) as the final estimate. The authors reasoned that in many cases, the reason for extraordinarily high dispersion of a gene is that it does not obey their modeling assumptions ; some genes may showmuch higher variability than others for biological or technical reasons, even though they have the same average expression levels. The authors demonstrate this issue using the dataset by Bottomly et al. [ 16 ]. The authors again employ an empirical Bayes procedure: they first perform Love et al. The stronger curvature of the green posterior at its maximum translates to a smaller reported standard error for the MAP LFC estimate ( horizontal error bar ). Genes with high information for LFC estimation will have, in their approach, LFCs with both low bias and low variance. Furthermore, as the degrees of freedom increase, and the experiment provides more information for LFC estimation, the shrunken estimates will converge to the unshrunken estimates. To demonstrate this, the authors split the Bottomly et al. samples equally into two groups, I and II, such that each group contained a balanced split of the strains, simulating a scenario where an experiment ( samples in group I ) is performed, analyzed and reported, and then independently replicated ( samples in group II ). This makes shrunken LFCs also suitable for ranking genes, e. g., to prioritize them for follow-up experiments. For example, if the authors sort the genes in the two sample groups of Figure 3 by unshrunken LFC estimates, and consider the 100 genes with the strongest upor down-regulation in group I, they find only 21 of these again among the top 100 upor down-regulated genes in group II. However, if the authors rank the genes by shrunken LFC estimates, the overlap improves to 81 of 100 genes ( Additional file 1: Figure S3 ). The authors demonstrate this in the Benchmarks section below. However, the loss can be reduced if genes that have little or no chance of being detected as differentially expressed are omitted from the testing, provided that the criterion for omission is independent of the test statistic under the null hypothesis [ 22 ] ( see Methods ). For well-powered experiments, however, a statistical test against the conventional null hypothesis of zero LFC may report genes with statistically significant changes that are so weak in effect strength that they could be considered irrelevant or distracting. However, this approach loses the benefit of an easily interpretable FDR, as the reported P value and adjusted P value still correspond to the test of zero LFC. It is therefore desirable to include the threshold in the statistical testing procedure directly, i. e., not to filter post hoc on a reported fold-change estimate, but rather to evaluate statistically directly whether there is sufficient evidence that the LFC is above the chosen threshold. The authors note that related approaches to generate gene lists that satisfy both statistical and biological significance criteria have been previously discussed for microarray data [ 23 ] and recently for sequencing data [ 19 ]. For such analyses, DESeq2 offers a test of the composite null hypothesis |βir| ≥ θ, which will report genes as significant for which there is evidence that their LFC is weaker than θ. As the aim of differential expression analysis is typically to find consistently upor down-regulated genes, it is useful to consider diagnostics for detecting individual observations that overly influence the LFC estimate and P value for a gene. While the original fitted means are heavily influenced by a single sample with a large count, the corrected LFCs provide a better fit to the majority of the samples. When the authors consider the variance of each gene, computed across samples, these variances are stabilized – i. e., approximately the same, or homoskedastic – after the rlog transformation, while they would otherwise strongly depend on the mean counts. Note that while the rlog transformation builds upon on their LFC shrinkage approach, it is distinct from and not part of the statistical inference Love et al. This is in contrast to the variance-stabilizing transformation ( VST ) for overdispersed counts introduced in DESeq [ 4 ]: while the VST is also effective at stabilizing variance, it does not directly take into account differences in size factors ; and in datasets with large variation in sequencing depth ( dynamic range of size factors 4 ) the authors observed undesirable artifacts in the performance of the VST. Both the rlog transformation and the VST are provided in the DESeq2 package. The authors demonstrate the use of the rlog transformation on the RNA-seq dataset of Hammer et al. [ 26 ], wherein RNA was sequenced from the dorsal root ganglion of rats that had undergone spinal nerve ligation and controls, at 2 weeks and at 2 months after the ligation. Love et al. Genome Biology ( 2014 ) 15:550 Page 2 of 21 further investigation. Furthermore, the number of genes called significantly differentially expressed depends as much on the sample size and other aspects of experimental design as it does on the biology of the experiment – and well-powered experiments often generate an overwhelmingly long list of hits [ 9 ]. The authors furthermore compare DESeq2 ’ s statistical power with existing tools, revealing that their methodology has high sensitivity and precision, while controlling the false positive rate. However, it can be advantageous to calculate gene-specific normalization factors sij to account for further sources of technical biases such as differing dependence on GC content, gene length or the like, using published methods [ 13,14 ], and these can be supplied instead. The shrinkage procedure thereby helps avoid potential false positives, which can result from underestimates of dispersion. Furthermore, a standard error for each estimate is reported, which is derived from the posterior ’ s curvature at its maximum ( see Methods for details ). 

The adjusted Rand index [37] was used to compare a hierarchical clustering based on various distances with the true cluster membership. 

Inferential methods that treat each gene separately suffer here from lack of power, due to the high uncertainty of within-group variance estimates. 

A disadvantage of the rlog transformation with respect to the VST is, however, that the ordering of genes within a sample will change if neighboring genes undergo shrinkage of different strength. 

By default, outliers in conditions with six or fewer replicates cause the whole gene to be flagged and removed from subsequent analysis, including P value adjustment for multiple testing. 

Due to the large number of tests performed in the analysis of RNA-seq and other genome-wide experiments, the multiple testing problem needs to be addressed. 

Its use cases are not limited to RNA-seq data or other transcriptomics assays; rather, many kinds of high-throughput count data can be used. 

It was expected that the permutation-based SAMseq method would rarely produce adjusted P value < 0.1 in the evaluation set, because the three vs three comparison does not enable enough permutations. 

In addition, the approach used in DESeq2 can be extended to isoformspecific analysis, either through generalized linear modeling at the exon level with a gene-specific mean as in the DEXSeq package [30] or through counting evidence for alternative isoforms in splice graphs [31,32]. 

As the outlier is replaced with the value predicted by the null hypothesis of no differential expression, this is a more conservative choice than simply omitting the outlier. 

Figure 4A demonstrates how such a thresholded test gives rise to a curved decision boundary: to reach significance, the estimated LFC has to exceed the specified threshold by an amount that depends on the available information. 

the estimates are more evenly spread around zero, and for very weakly expressed genes (with less than one read per sample on average), LFCs hardly deviate from zero, reflecting that accurate LFC estimates are not possible here. 

if any biological processes are genuinely affected by the difference in experimental treatment, this null hypothesis implies that the gene under consideration is perfectly decoupled from these processes. 

Figure 5 provides diagnostic plots of the normalized counts under the ordinary logarithm with a pseudocount of 1 and the rlog transformation, showing that the rlog both stabilizes the variance through the range of the mean of counts and helps to find meaningful patterns in the data. 

if estimates for average transcript length are available for the conditions, these can be incorporated into the DESeq2 framework as gene- and sample-specific normalization factors.