Dual RNA-Sequencing to Elucidate the Plant-Pathogen Duel.

Dual RNA-seq to Elucidate the Plant–Pathogen Duel
Sanushka Naidoo1*, Erik Andrei Visser1, Lizahn Zwart1, Yves du Toit1, Vijai Bhadauria2 and Louise Simone Shuey1*
1 Department of Genetics, Forestry and Agricultural Biotechnology Institute, Genomics Research Institute, University of Pretoria, Pretoria, South Africa.
2 Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, Saskatoon, SK, Canada.
*Correspondence: Sanushka.Naidoo@fabi.up.ac.za and Louise.Shuey@fabi.up.ac.za
https://doi.org/10.21775/cimb.027.127
Abstract
RNA-sequencing technology has been widely adopted to investigate host responses during infection with pathogens. Dual RNA-sequencing (RNA-seq) allows the simultaneous capture of pathogen-specific transcripts during infection, providing a more complete view of the interaction. In this review, we focus on the design of dual RNA-seq experiments and the application of downstream data analysis to gain biological insight into both sides of the interaction. Recent literature in this area demonstrates the power of the dual RNA-seq approach and shows that it is not limited to model systems where genomic resources are available. Sequencing costs continue to decrease and single cell transcriptomics is becoming more feasible. In combination with proteomics and metabolomics studies, these technological advances are likely to contribute to our understanding of the temporal and spatial aspects of dynamic plant–pathogen interactions.
A dual approach in planta
The interaction between plants and pathogens is an active and dynamic process that can be likened to a duel. Plants have complex defence mechanisms that can be rendered ineffective when pathogens interfere with one of the various processes required for host defence. These processes include penetration resistance, recognition by Pattern Recognition Receptors (PRRs), phytohormone signalling pathways, secretory pathways, secondary metabolite production, and plant cell death (Dou and Zhou, 2012). Until recently, transcriptomic approaches have been applied in the host and pathogen separately to obtain the gene expression profile of each organism and gain insight into infection biology or host defence mechanisms.
RNA sequencing (RNA-seq) is a powerful technology that does not rely on any prior knowledge of transcripts and can generate vast quantities of data at much lower cost than older techniques such as microarrays (Pareek et al., 2011; Wilhelm and Landry, 2009). An advantage of RNA-seq in the field of plant–pathogen interactions is that both plant and pathogen transcripts can be detected simultaneously and accurately in the same sample. This tactic, known as dual RNA-seq, in planta RNA-seq, simultaneous RNA-seq, or comparative RNA-seq, is a relatively new technique in both the plant and medical fields. In plants, it allows for the study of plant–pathogen interactions in herbaceous crops (Chen et al., 2013; Kunjeti et al., 2012; Lowe et al., 2014) as well as trees (Hayden et al., 2014; Liang et al., 2014; Teixeira et al., 2014). This review outlines technical considerations for dual RNA-seq experiments, summarizes recent insights drawn from such approaches in plant–pathogen interactions, and provides an overview of the next generation of dual approaches. Since this technique is most useful to study interactions with pathogens with complex prokaryotic and eukaryotic genomes, viral pathogens have not been included in this review.
Curr. Issues Mol. Biol. Vol. 27
It’s all in the design
Experimental design considerations for a dual RNA-seq experiment can be divided into three broad categories: sample generation, data generation and data analysis. An overview of the process can be found in Fig. 8.1.
Figure 8.1 Flow chart of a dual RNA-seq experiment with example software programs. (A) Experimental design considerations can include comparisons between resistant and susceptible interactions, normalized to a mock inoculated control. (B) Different library preparation options include enrichments for mRNA, small RNAs, stranded RNA and total RNA. (C) The sequencing platform can vary based on availability and aim of the study. Paired-end sequencing on the Illumina platform is a common approach for RNA-seq. Deep sequencing is required for dual RNA-seq approaches. Read length can vary depending on the application and sequencing platform used. (D) Downstream read quality control can be implemented. Filtering for a minimum Phred quality score of Q30 is generally optimal, but the threshold is data dependent. (E) Dual RNA-seq is performed by mapping to both the host and pathogen reference genome sequences, or to the host first with the remaining reads mapped to the pathogen reference. Endophyte contamination can be removed by mapping to common contaminant sequences obtained from databases such as RefSeq. (F) If a reference is not available, a de novo transcriptome can be assembled from the reads. (G) The mapping approach will differ based on the type of reference used and different methods can be used when mapping to the host or the pathogen. (H) Other programs commonly used for expression quantification include HTSeq, featureCounts (Liao et al., 2014) and Limma (Ritchie et al., 2015). (I) Differential Gene Expression (DGE) analysis can be performed using a number of methods. The two examples listed here can be used for both transcriptome and genome-based DGE analysis. (J) Genes identified as differentially expressed can be used in programs and databases such as BiNGO (Maere et al., 2005), MapMan (Thimm et al., 2004) and KEGG (Ogata et al., 1999) to predict biological significance (Maere et al., 2005).
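The Q30 threshold in panel (D) can be made concrete with a short sketch. It assumes FASTQ quality strings in the common Phred+33 encoding and filters on the mean quality of a read, which is only one of several possible criteria. The function names are hypothetical and are not taken from any of the programs in the figure.

```python
def phred_to_error_prob(q):
    """Error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_qualities(quality_string, offset=33):
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def passes_mean_quality(quality_string, min_mean_q=30.0):
    """Keep a read only if its mean Phred score reaches the threshold.

    Mean-quality filtering is an assumption made for illustration; the
    appropriate criterion and threshold are data dependent.
    """
    scores = decode_qualities(quality_string)
    if not scores:
        return False
    return sum(scores) / len(scores) >= min_mean_q
```

Under this definition Q30 corresponds to an expected error rate of one base call in a thousand, and lowering `min_mean_q` retains more reads at the cost of more base-call errors.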
Sample generation
When considering experimental design for sample generation, the main factors include trial design, sample harvesting approach and sample handling (reviewed in Yang and Wei, 2015).
Two important trial design and sample harvesting considerations for dual RNA-seq experiments are the predicted gene number in the pathogen and host genomes, and the relative amounts of pathogen and host cells within a given sample (Westermann et al., 2012). Both of these factors influence the amount of pathogen RNA relative to host RNA within a sample. A lower ratio of pathogen to host RNA requires greater sequencing depth to capture the full extent of biological variation within the pathogen.
A dual RNA-seq experiment considering an interaction between a eukaryotic host and prokaryotic pathogen requires approximately 10 to 20 times as many reads as would usually be required. This is partly due to the smaller amount of cellular RNA within prokaryotic cells relative to eukaryotes (Westermann et al., 2012). While the relative amounts of cellular RNA between host and pathogen are more similar in eukaryote–eukaryote interactions, the higher read coverage is still necessary due to the lower quantity of pathogen versus host cells, which results in less pathogen RNA per sample.
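The link between pathogen RNA fraction and sequencing depth can be illustrated with simple arithmetic. The sketch below assumes that reads are sampled in proportion to RNA abundance, so that the fraction of pathogen reads equals the fraction of pathogen RNA in the library. This is a simplification, and the function names and figures are hypothetical.

```python
import math


def fold_increase_in_depth(pathogen_fraction):
    """Fold increase in total reads needed to match a pathogen-only library.

    Assumes reads are drawn in proportion to RNA abundance.
    """
    if not 0 < pathogen_fraction <= 1:
        raise ValueError("pathogen_fraction must be in (0, 1]")
    return 1.0 / pathogen_fraction


def total_reads_for_pathogen_target(target_pathogen_reads, pathogen_fraction):
    """Total reads to sequence so the expected pathogen read count hits the target."""
    return math.ceil(target_pathogen_reads * fold_increase_in_depth(pathogen_fraction))
```

Under this simple model, a pathogen fraction of 5 to 10% of the sequenced RNA translates into a 20- to 10-fold increase in total reads, which is the same order of magnitude as the range quoted above.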
An important trial design consideration specific to dual RNA-seq experiments is the inclusion of a control for pathogen gene expression. This can be done by comparing in planta expression of pathogen genes to in planta gene expression of a non-pathogenic strain and/or pathogen gene expression in an agar culture or spore suspension (Kawahara et al., 2012). Synthetic RNA spike-ins can also be included to quantify both pathogen and host RNA (Box 8.1).
Data generation
The main experimental design considerations for data generation include the level of sample replication, library construction and sequencing (Liu et al., 2014).
Sample replication
One of the first factors to consider in experimental design is the level of sample replication (Auer and Doerge, 2010). Sample replication is divided into technical replication, which is defined as performing the same analysis multiple times on the same sample, and biological replication, a study-dependent term that can be loosely defined as harvesting the same type of sample from the same type of organism from the same conditions.
Technical variation arises when errors occur in the experimental procedure and can be accounted for through technical replication. Illumina sequencing produces negligible technical variability, removing the need for technical replication in RNA-seq experiments (Marioni et al., 2008). However, when coverage is low for certain transcripts, technical variation can still arise (McIntyre et al., 2011). Thus, technical replication should be considered for dual RNA-seq experiments where there is low representation of pathogen RNA within a sample, resulting in low coverage of pathogen transcripts.
Box 8.1 Total RNA quantication
It is not always possible to accurately predict the amounts of host and pathogen RNA that will be present
in a sample. While it is possible to measure the amount of host and pathogen DNA in a sample using
qRT-PCR, this is not always an accurate reection of total host and pathogen RNA. This problem can be
circumvented by the addition of RNA spike-ins to samples. An RNA spike-in for RNA-seq is a mixture
of synthesized RNA transcripts of known sequence, concentration and abundance. While inclusion of
an RNA spike-in could increase the cost of sequencing due to increased coverage requirements, it can
be used to measure sensitivity and accuracy of sequencing as well as to detect biases that can occur
during RNA-seq (Jiang et al., 2011). Furthermore, standard curves can be generated from RNA spike-ins.
This allows for more accurate quantication of transcript abundance (Jiang et al., 2011). In dual RNA-
seq experiments, it is possible to use these standard curves to estimate host and pathogen RNA levels
within a sample. However, it is important to ensure that none of the spike-in sequences are present in the
genome of either host or pathogen, as this could preclude accurate quantication of genes containing
similar sequences and the use of those spike-in sequences (Jiang et al., 2011).
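The standard curve idea in Box 8.1 can be sketched as a straight-line fit on log-transformed values. The example below assumes a log-log linear relationship between observed spike-in read counts and known input concentrations, fitted by ordinary least squares. Both the model and the function names are illustrative assumptions, not the procedure of Jiang et al. (2011), and in practice counts would first be normalized for transcript length.

```python
import math


def fit_standard_curve(known_concentrations, observed_counts):
    """Least-squares fit of log10(concentration) = slope * log10(count) + intercept."""
    xs = [math.log10(c) for c in observed_counts]
    ys = [math.log10(c) for c in known_concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(count, slope, intercept):
    """Read a concentration off the fitted curve for an observed count."""
    return 10 ** (slope * math.log10(count) + intercept)


# Hypothetical spike-in mix: four transcripts spanning three orders of magnitude.
slope, intercept = fit_standard_curve([1, 10, 100, 1000], [20, 200, 2000, 20000])
```

The fitted `slope` and `intercept` can then be passed to `estimate_concentration` for host or pathogen transcripts in the same library. Zero counts have to be excluded or offset before the log transform.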
While technical replication can be excluded due to reliability of the technology, biological replication remains crucial to all RNA-seq experiments. Besides accounting for biological variation (Hansen et al., 2011; Nettleton, 2014), biological replication significantly affects the power and accuracy of differential expression analyses. Liu et al. (2014) showed that increasing the number of biological replicates sequenced increased the number of accurately identified differentially expressed genes, whereas increased read depth produced diminishing returns for both statistical power and the precision with which differential expression is detected. This is especially important in dual RNA-seq experiments where biological variation is introduced from both pathogen and host.
Library construction and sequencing
The main factors to consider during library construction and sequencing are depletion methods, strandedness, insert size, read length, and read depth. The use of strand-specific rather than non-strand-specific libraries [reviewed in Levin et al. (2010)] allows the accurate detection of anti-sense transcription and can allow accurate expression quantification of overlapping transcripts. Thus, strand-specific sequencing in dual RNA-seq experiments could enable detection of evidence suggesting host–pathogen interaction through anti-sense transcription.
The choice of insert size is dependent on the complexity of the transcriptome and the target RNA species (reviewed in Head et al., 2014). Insert size selection can be a limiting factor in which RNA species can be analysed because inclusion of a size selection step during library preparation results in loss of transcripts shorter than the selected insert size. Insert size selection also imposes an upper limit on read length, since reads longer than the insert size will sequence into adapters, providing no new information.
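The upper limit that insert size places on useful read length amounts to a one-line calculation. The sketch below uses hypothetical function names and assumes that every base read beyond the end of the insert is adapter sequence.

```python
def adapter_read_through(read_length, insert_size):
    """Number of bases per read that run past the insert into the adapter."""
    return max(0, read_length - insert_size)


def informative_bases_per_read(read_length, insert_size):
    """Bases per read that actually derive from the RNA fragment."""
    return min(read_length, insert_size)
```

For example, a 150 bp read from a 100 bp insert yields 100 informative bases and 50 bases of adapter, so sequencing beyond 100 bp adds cost without adding information for that fragment.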
Apart from insert size, the choice of read length is dependent mainly on the objectives of the study and the quality of the reference sequence used for mapping. If a high quality and well annotated reference sequence is available, increasing read length above 50 bp is unnecessary for accurate detection of differential expression (Chhangawala et al., 2015). Similarly, sequencing of paired-end instead of single-end reads does not significantly affect detection of differential expression in these cases (Chhangawala et al., 2015). Conversely, when studying organisms with less well defined reference sequences, sequencing of longer paired-end reads increases the accuracy of splice junction detection (Chhangawala et al., 2015). When no reference sequence is available, it is often assumed that longer reads equate to increased accuracy for de novo assembly. Similar to the detection of differential expression, however, there seems to be a species-specific threshold beyond which increasing read length becomes redundant (Chang et al., 2014).
In cases where a high quality reference is available, less coverage is required for accurate transcript identification and quantification, compared to cases where a reference is missing. This is because gaps in an assembly arising from low coverage can be filled using the underlying reference sequence. For studies relying on de novo assembly, a predicted minimum of 30× total reference coverage is required (Martin and Wang, 2011), while genome-guided assembly can be accomplished with coverage below 10× (Denoeud et al., 2008).
For an RNA-seq experiment to be representative, it is important to make sure that the number of reads is sufficient to account for the least represented RNA species. This is also referred to as sufficient sequencing depth. To obtain adequate depth for a dual RNA-seq experiment, enough reads need to be sequenced to have at least 1× coverage of the least represented pathogen transcript in the sample with the lowest level of pathogen to host RNA. However, it is almost impossible to know this information when performing de novo RNA-seq experiments.
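When rough estimates of these quantities are available, the depth criterion above can be turned into a back-of-the-envelope calculation. The sketch below assumes that expected coverage of a transcript equals the number of reads drawn from it times the read length, divided by the transcript length, and that reads are sampled in proportion to abundance. All parameter names are hypothetical, and sampling noise is ignored.

```python
import math


def reads_for_target_coverage(target_coverage, transcript_length, read_length,
                              pathogen_fraction, transcript_fraction):
    """Total reads needed for the expected coverage of one pathogen transcript.

    pathogen_fraction: share of all reads expected to come from the pathogen.
    transcript_fraction: share of pathogen reads expected from this transcript.
    """
    for name, value in (("pathogen_fraction", pathogen_fraction),
                        ("transcript_fraction", transcript_fraction)):
        if not 0 < value <= 1:
            raise ValueError(name + " must be in (0, 1]")
    # Reads that must fall on the transcript to reach the target coverage.
    reads_on_transcript = target_coverage * transcript_length / read_length
    return math.ceil(reads_on_transcript / (pathogen_fraction * transcript_fraction))
```

Because the result is an expectation, a real experiment would need a safety margin above it, and the calculation should be run for the sample with the lowest pathogen fraction.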
Techniques to deplete or enrich certain RNA species, such as RNA fractionation and poly(A) selection, can enhance detection of transcripts with low expression in eukaryotes (Sims et al., 2014). Depletion of the rRNA fraction can further reduce the required sequencing depth of an experiment and, unlike poly(A) selection, allow for detection of non-poly(A) transcripts. Although depletion-based methods allow for selection of non-poly(A) RNA species, these methods can bias quantification of abundant transcripts and decrease exon coverage and power to detect splice junctions due to the presence of sequenced introns from pre-mRNA in eukaryotes (Martin and Wang, 2011; Sims et al., 2014).
Data analysis
As with sample and data generation, data analysis considerations are dependent on the underlying biological questions. Data analysis for the majority of RNA-seq experiments follows three sequential steps: (1) quality control, (2) mapping, expression quantification and DE analysis, and (3) downstream analysis. Due to the variety of tools and platforms that can be used for RNA-seq data analysis (Grant et al., 2011), programs typically used for RNA-seq data analysis may be created for specific analysis types and tested within a specific experimental context. Thus, it is often advisable to repeat an analysis using different programs and compare the outputs.
Quality control
Quality control for dual RNA-seq studies is similar to that used for traditional RNA-seq studies. However, contaminant filtering becomes more complicated as reads from both host and pathogen need to be retained. While reads originating from the host and pathogen can be separated by mapping to the host and pathogen reference sequences (Schulze et al., 2015; Westermann et al., 2012), contamination of various forms should be considered in order to improve the accuracy and efficiency with which genes and transcripts are mapped and quantified. Contamination may occur in two main forms: non-mRNA species (which constitute the majority of the total RNA extracted) and reads representing mRNA extracted from organisms (saprophytes and endophytes) other than the organisms of interest. These forms of contamination may skew the quantification of genes and transcripts when assembling and mapping reads to the reference. Westermann et al. (2012) provide insight into dealing with contaminating RNA, which is species- and study-dependent.
Contamination in the form of RNA extracted from endophytic or saprophytic organisms is especially important in plant–pathogen interaction studies. Saprophytes may be present at the sites of wounding due to the degradation of tissue that occurs, while endophytes colonize areas below the surface of the plant tissue without causing symptoms. Thus RNA from these types of organisms can be present in RNA-seq libraries. While surface sterilization could be used to decrease the presence of saprophytes, the process is time consuming and may result in damage to host RNA. Surface sterilization could also result in decreased pathogen representation, which is counterproductive for a dual RNA-seq experiment. Therefore, removal of these contaminating sequences requires bioinformatics intervention. This can be accomplished through stringent mapping of data to a database of common contaminant cDNA sequences constructed from databases such as RefSeq, UniRef100 and GenBank (Ikeue et al., 2015). In cases where reference genomes are available for known endophytes and saprophytes, stringent alignment to these references could also be used to filter reads (Zuluaga et al., 2015).
Mapping, expression quantication and
dierential expression analysis
Mapping is the reconstruction of the transcriptome
through alignment of reads to a reference sequence.
In dual RNA-seq experiments, reads are mapped
to the host reference sequence and the unaligned
sequences are retained and mapped to the patho-
gen reference sequence (Teixeira et al., 2014). A
common program used for read alignment to a
reference is the short read aligner Bowtie, which is
part of the Tophat package of the Tuxedo pipeline
(Trapnell et al., 2012). Box 8.2 describes mapping
and splice site determination. Bowtie allows the
user to set the number of mismatches between the
query and reference sequence, eectively seing a
stringency threshold for the alignment (Langmead
and Salzberg, 2012). is aects the stringency
with which reads will be aligned and eectively
assigned to the host or pathogen reference.
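The host-first, pathogen-second logic and the effect of the mismatch threshold can be illustrated with a deliberately naive toy. The sketch below is not Bowtie and does not reproduce its algorithm or options. It assumes ungapped, forward-strand matching against short reference strings, and all names and sequences are hypothetical.

```python
def best_mismatches(read, reference):
    """Fewest mismatches of the read against any same-length reference window."""
    n = len(read)
    best = None
    for start in range(len(reference) - n + 1):
        window = reference[start:start + n]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if best is None or mismatches < best:
            best = mismatches
    return best  # None if the reference is shorter than the read


def assign_reads(reads, host_ref, pathogen_ref, max_mismatches=1):
    """Map to the host first; reads that fail are then tried on the pathogen."""
    assigned = {"host": [], "pathogen": [], "unassigned": []}
    for read in reads:
        host_mm = best_mismatches(read, host_ref)
        if host_mm is not None and host_mm <= max_mismatches:
            assigned["host"].append(read)
            continue
        pathogen_mm = best_mismatches(read, pathogen_ref)
        if pathogen_mm is not None and pathogen_mm <= max_mismatches:
            assigned["pathogen"].append(read)
        else:
            assigned["unassigned"].append(read)
    return assigned
```

Raising `max_mismatches` in this toy moves ambiguous reads out of the unassigned pool and into the host library, because the host is tested first. This is one reason the stringency setting matters when host and pathogen share conserved sequences.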
Once the reads have been assembled and filtered into host and pathogen libraries, transcript abundance quantification and differential expression analysis can be performed (Boxes 8.3 and 8.4, respectively). Expression levels are quantified by counting the number of reads mapped to each gene/transcript, normalized across the length of the gene/transcript to account for bias across abundant gene regions, relative to the number of reads in the original library. Programs like Cufflinks (Trapnell et al., 2012) and RSEM (Li and Dewey, 2011) can be used to accurately quantify relative numbers of genes and transcripts. Differential expression analysis is commonly performed using packages such as Cuffdiff, DESeq and edgeR (Anders and Huber, 2010; Robinson et al., 2010; Trapnell et al., 2013).
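The normalization described above, read counts scaled by feature length and by library size, corresponds to the familiar reads per kilobase per million mapped reads (RPKM) measure. The sketch below is a minimal illustration with hypothetical gene names and counts. It is not the estimation procedure used by Cufflinks or RSEM, and it assumes that the host and pathogen libraries are normalized separately.

```python
def rpkm(counts, lengths_bp):
    """RPKM per gene: count * 1e9 / (gene length in bp * total mapped reads)."""
    total = sum(counts.values())
    if total == 0:
        return {gene: 0.0 for gene in counts}
    return {
        gene: (count * 1e9) / (lengths_bp[gene] * total)
        for gene, count in counts.items()
    }


# Hypothetical pathogen library: equal counts, but gene_b is twice as long.
pathogen_counts = {"gene_a": 500, "gene_b": 500}
pathogen_lengths = {"gene_a": 1000, "gene_b": 2000}
pathogen_rpkm = rpkm(pathogen_counts, pathogen_lengths)
```

Here `gene_a` receives twice the RPKM of `gene_b` despite identical counts, which is the length correction at work. Count-based differential expression packages generally expect raw counts rather than values normalized in this way.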

References
Anders and Huber (2010). Differential expression analysis for sequence count data.
Robinson et al. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.
Thimm et al. (2004). MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes.
Trapnell et al. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.
Trapnell et al. (2013). Differential analysis of gene regulation at transcript resolution with RNA-seq.