Dual RNA-Sequencing to Elucidate the Plant-Pathogen Duel.

doi:10.21775/CIMB.027.127

Dual RNA-seq to Elucidate the Plant–Pathogen Duel

Sanushka Naidoo¹*, Erik Andrei Visser¹, Lizahn Zwart¹, Yves du Toit¹, Vijai Bhadauria² and Louise Simone Shuey¹*

¹ Department of Genetics, Forestry and Agricultural Biotechnology Institute, Genomics Research Institute, University of Pretoria, Pretoria, South Africa.

² Crop Development Centre and Department of Plant Sciences, University of Saskatchewan, Saskatoon, SK, Canada.

*Correspondence: Sanushka.Naidoo@fabi.up.ac.za and Louise.Shuey@fabi.up.ac.za

hps://doi.org/10.21775/cimb.027.127

Abstract

RNA-sequencing technology has been widely adopted to investigate host responses during infection with pathogens. Dual RNA-sequencing (RNA-seq) allows the simultaneous capture of pathogen-specific transcripts during infection, providing a more complete view of the interaction. In this review, we focus on the design of dual RNA-seq experiments and the application of downstream data analysis to gain biological insight into both sides of the interaction. Recent literature in this area demonstrates the power of the dual RNA-seq approach and shows that it is not limited to model systems where genomic resources are available. Sequencing costs continue to decrease and single-cell transcriptomics is becoming more feasible. In combination with proteomics and metabolomics studies, these technological advances are likely to contribute to our understanding of the temporal and spatial aspects of dynamic plant–pathogen interactions.

A dual approach in planta

The interaction between plants and pathogens is an active and dynamic process that can be likened to a duel. Plants have complex defence mechanisms that can be rendered ineffective when pathogens interfere with one of the various processes required for host defence. These processes include penetration resistance, recognition by Pattern Recognition Receptors (PRRs), phytohormone signalling pathways, secretory pathways, secondary metabolite production, and plant cell death (Dou and Zhou, 2012). Until recently, transcriptomic approaches were applied to the host and pathogen separately to obtain the gene expression profile of each organism and gain insight into infection biology or host defence mechanisms.

RNA sequencing (RNA-seq) is a powerful technology that does not rely on any prior knowledge of transcripts and can generate vast quantities of data at much lower cost than older techniques such as microarrays (Pareek et al., 2011; Wilhelm and Landry, 2009). An advantage of RNA-seq in the field of plant–pathogen interactions is that both plant and pathogen transcripts can be detected simultaneously and accurately in the same sample. This tactic, known as dual RNA-seq, in planta RNA-seq, simultaneous RNA-seq, or comparative RNA-seq, is a relatively new technique in both the plant and medical fields. In plants, it allows for the study of plant–pathogen interactions in herbaceous crops (Chen et al., 2013; Kunjeti et al., 2012; Lowe et al., 2014) as well as trees (Hayden et al., 2014; Liang et al., 2014; Teixeira et al., 2014). This review outlines technical considerations for dual RNA-seq experiments, summarizes recent insights drawn from such approaches in plant–pathogen interactions, and provides an overview of the next generation of dual approaches. Since this technique is most useful for studying interactions with pathogens that have complex prokaryotic and eukaryotic genomes, viral pathogens have not been included in this review.

Curr. Issues Mol. Biol. Vol. 27

It’s all in the design

Experimental design considerations for a dual RNA-seq experiment can be divided into three broad categories: sample generation, data generation and data analysis. An overview of the process can be found in Fig. 8.1.

Figure 8.1 Flow chart of a dual RNA-seq experiment with example software programs. (A) Experimental design considerations can include comparisons between resistant and susceptible interactions, normalized to a mock inoculated control. (B) Different library preparation options include enrichments for mRNA, small RNAs, stranded RNA and total RNA. (C) The sequencing platform can vary based on availability and aim of the study. Paired-end sequencing on the Illumina platform is a common approach for RNA-seq. Deep sequencing is required for dual RNA-seq approaches. Read length can vary depending on the application and sequencing platform used. (D) Downstream read quality control can be implemented. Filtering for a minimum Phred quality score of Q30 is generally optimal, but the threshold is data dependent. (E) Dual RNA-seq is performed by mapping to both the host and pathogen reference genome sequences, or to the host first with the remaining reads mapped to the pathogen reference. Endophyte contamination can be removed by mapping to common contaminant sequences obtained from databases such as RefSeq. (F) If a reference is not available, a de novo transcriptome can be assembled from the reads. (G) The mapping approach will differ based on the type of reference used and different methods can be used when mapping to the host or the pathogen. (H) Other programs commonly used for expression quantification include HTSeq, featureCounts (Liao et al., 2014) and Limma (Ritchie et al., 2015). (I) Differential Gene Expression (DGE) analysis can be performed using a number of methods. The two examples listed here can be used for both transcriptome and genome-based DGE analysis. (J) Genes identified as differentially expressed can be used in programs and databases such as BinGO (Maere et al., 2005), MapMan (Thimm et al., 2004) and KEGG (Ogata et al., 1999) to predict biological significance (Maere et al., 2005).
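The quality filter mentioned in step (D) can be illustrated with a minimal Python sketch. It assumes Phred+33 encoded FASTQ quality strings and a mean-quality criterion (rather than a per-base one); both choices, and all function names, are assumptions made for illustration and do not correspond to any particular tool.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def error_probability(q):
    """Convert a Phred score Q to a base-call error probability: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passes_quality(quality_string, min_mean_q=30, offset=33):
    """Return True if the mean Phred score of the read meets the threshold.

    The Q30 default mirrors the figure legend; the threshold is data dependent.
    """
    scores = phred_scores(quality_string, offset)
    if not scores:
        return False
    return sum(scores) / len(scores) >= min_mean_q
```

A score of Q30 corresponds to an expected error rate of about one miscalled base in 1000, which is why it is a common starting threshold.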


Sample generation

When considering experimental design for sample generation, the main factors include trial design, sample harvesting approach and sample handling (reviewed in Yang and Wei, 2015).

Two important trial design and sample harvesting considerations for dual RNA-seq experiments are the predicted gene number in the pathogen and host genomes, and the relative amounts of pathogen and host cells within a given sample (Westermann et al., 2012). Both of these factors influence the amount of pathogen RNA relative to host RNA within a sample. A lower ratio of pathogen to host RNA requires greater sequencing depth to capture the full extent of biological variation within the pathogen.

A dual RNA-seq experiment considering an interaction between a eukaryotic host and prokaryotic pathogen requires approximately 10 to 20 times as many reads as would usually be required. This is partly due to the smaller amount of cellular RNA within prokaryotic cells relative to eukaryotes (Westermann et al., 2012). While the relative amounts of cellular RNA between host and pathogen are more similar in eukaryote–eukaryote interactions, the higher read coverage is still necessary due to the lower quantity of pathogen versus host cells, which results in less pathogen RNA per sample.
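The relationship between the pathogen share of a library and the required sequencing effort can be sketched with simple arithmetic. The snippet below is a back-of-the-envelope illustration, assuming that reads are sampled in proportion to RNA abundance; the function names and the example fractions are hypothetical and are not taken from the cited studies.

```python
import math


def total_reads_needed(pathogen_reads_target, pathogen_fraction):
    """Total reads to sequence so that, on average, the desired number
    of reads originates from the pathogen.

    pathogen_fraction is the assumed share of reads derived from
    pathogen RNA (a value in (0, 1]).
    """
    if not 0 < pathogen_fraction <= 1:
        raise ValueError("pathogen_fraction must be in (0, 1]")
    return math.ceil(pathogen_reads_target / pathogen_fraction)


def fold_increase(pathogen_fraction):
    """Fold increase in reads relative to a pathogen-only library."""
    if not 0 < pathogen_fraction <= 1:
        raise ValueError("pathogen_fraction must be in (0, 1]")
    return 1 / pathogen_fraction
```

Under this simple model, a sample in which one read in sixteen derives from the pathogen would need sixteen times the reads of a pathogen-only library to reach the same pathogen depth.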

An important trial design consideration specific to dual RNA-seq experiments is the inclusion of a control for pathogen gene expression. This can be done by comparing in planta expression of pathogen genes to in planta gene expression of a non-pathogenic strain and/or pathogen gene expression in an agar culture or spore suspension (Kawahara et al., 2012). Synthetic RNA spike-ins can also be included to quantify both pathogen and host RNA (Box 8.1).

Data generation

The main experimental design considerations for data generation include the level of sample replication, library construction and sequencing (Liu et al., 2014).

Sample replication

One of the first factors to consider in experimental design is the level of sample replication (Auer and Doerge, 2010). Sample replication is divided into technical replication, which is defined as performing the same analysis multiple times on the same sample, and biological replication, a study-dependent term that can be loosely defined as harvesting the same type of sample from the same type of organism under the same conditions.

Technical variation arises when errors occur in the experimental procedure and can be accounted for through technical replication. Illumina sequencing produces negligible technical variability, removing the need for technical replication in RNA-seq experiments (Marioni et al., 2008). However, when coverage is low for certain transcripts, technical variation can still arise (McIntyre et al., 2011). Thus, technical replication should be considered for dual RNA-seq experiments where there is low representation of pathogen RNA within a sample, resulting in low coverage of pathogen transcripts.

Box 8.1 Total RNA quantification

It is not always possible to accurately predict the amounts of host and pathogen RNA that will be present in a sample. While it is possible to measure the amount of host and pathogen DNA in a sample using qRT-PCR, this is not always an accurate reflection of total host and pathogen RNA. This problem can be circumvented by the addition of RNA spike-ins to samples. An RNA spike-in for RNA-seq is a mixture of synthesized RNA transcripts of known sequence, concentration and abundance. While inclusion of an RNA spike-in could increase the cost of sequencing due to increased coverage requirements, it can be used to measure sensitivity and accuracy of sequencing as well as to detect biases that can occur during RNA-seq (Jiang et al., 2011). Furthermore, standard curves can be generated from RNA spike-ins. This allows for more accurate quantification of transcript abundance (Jiang et al., 2011). In dual RNA-seq experiments, it is possible to use these standard curves to estimate host and pathogen RNA levels within a sample. However, it is important to ensure that none of the spike-in sequences are present in the genome of either host or pathogen, as this could preclude accurate quantification of genes containing similar sequences and the use of those spike-in sequences (Jiang et al., 2011).
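How a standard curve might be derived from spike-ins can be sketched as follows. This is a minimal illustration that assumes a linear relationship between log10 spike-in concentration and log10 read count, fitted by ordinary least squares; the function names and any example values are hypothetical, and real spike-in analyses may use different models.

```python
import math


def fit_standard_curve(concentrations, read_counts):
    """Fit log10(read count) = slope * log10(concentration) + intercept
    by ordinary least squares. All inputs must be positive."""
    xs = [math.log10(c) for c in concentrations]
    ys = [math.log10(r) for r in read_counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_concentration(read_count, slope, intercept):
    """Invert the fitted curve to estimate the concentration of a
    transcript from its observed read count."""
    return 10 ** ((math.log10(read_count) - intercept) / slope)
```

Once fitted, the same curve can be applied to host and pathogen transcripts within the sample, provided their read counts fall within the range spanned by the spike-ins.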


While technical replication can be excluded due to the reliability of the technology, biological replication remains crucial to all RNA-seq experiments. Besides accounting for biological variation (Hansen et al., 2011; Nettleton, 2014), biological replication significantly affects the power and accuracy of differential expression analyses. Liu et al. (2014) showed that increasing the number of biological replicates sequenced increased the number of accurately identified differentially expressed genes, whereas increased read depth produced diminishing returns for both statistical power and the precision with which differential expression is detected. This is especially important in dual RNA-seq experiments where biological variation is introduced from both pathogen and host.

Library construction and sequencing

The main factors to consider during library construction and sequencing are depletion methods, strandedness, insert size, read length, and read depth. The use of strand-specific rather than non-strand-specific libraries (reviewed in Levin et al., 2010) allows the accurate detection of anti-sense transcription and can allow accurate expression quantification of overlapping transcripts. Thus, strand-specific sequencing in dual RNA-seq experiments could enable detection of evidence suggesting host–pathogen interaction through anti-sense transcription.

The choice of insert size is dependent on the complexity of the transcriptome and the target RNA species (reviewed in Head et al., 2014). Insert size selection can be a limiting factor in which RNA species can be analysed because inclusion of a size selection step during library preparation results in loss of transcripts shorter than the selected insert size. Insert size selection also imposes an upper limit on read length, since reads longer than the insert size will sequence into adapters, providing no new information.

Apart from insert size, the choice of read length is dependent mainly on the objectives of the study and the quality of the reference sequence used for mapping. If a high quality and well annotated reference sequence is available, increasing read length above 50 bp is unnecessary for accurate detection of differential expression (Chhangawala et al., 2015). Similarly, sequencing of paired-end instead of single-end reads does not significantly affect detection of differential expression in these cases (Chhangawala et al., 2015). Conversely, when studying organisms with less well defined reference sequences, sequencing of longer paired-end reads increases the accuracy of splice junction detection (Chhangawala et al., 2015). When no reference sequence is available, it is often assumed that longer reads equate to increased accuracy for de novo assembly. Similar to the detection of differential expression, however, there seems to be a species-specific threshold beyond which increasing read length becomes redundant (Chang et al., 2014).

In cases where a high quality reference is available, less coverage is required for accurate transcript identification and quantification, compared to cases where a reference is missing. This is because gaps in an assembly arising from low coverage can be filled using the underlying reference sequence. For studies relying on de novo assembly, a predicted minimum of 30× total reference coverage is required (Martin and Wang, 2011), while genome-guided assembly can be accomplished with coverage below 10× (Denoeud et al., 2008).

For an RNA-seq experiment to be representative, it is important to make sure that the number of reads is sufficient to account for the least represented RNA species. This is also referred to as sufficient sequencing depth. To obtain adequate depth for a dual RNA-seq experiment, enough reads need to be sequenced to have at least 1× coverage of the least represented pathogen transcript in the sample with the lowest level of pathogen to host RNA. However, it is almost impossible to know this information when performing de novo RNA-seq experiments.
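The 1× coverage criterion can be made concrete with a short calculation. The sketch below assumes uniform read placement along a transcript, so that coverage is the number of reads multiplied by read length and divided by transcript length; the reads-per-million abundance parameter and the function names are hypothetical inputs chosen for illustration.

```python
import math


def reads_for_coverage(transcript_length, read_length, target_coverage=1.0):
    """Number of reads that must map to a transcript to reach the target
    mean coverage, using coverage = reads * read_length / transcript_length."""
    return math.ceil(target_coverage * transcript_length / read_length)


def library_size_for_transcript(transcript_length, read_length,
                                reads_per_million, target_coverage=1.0):
    """Total library size needed when the transcript is expected to
    attract `reads_per_million` reads per million sequenced reads."""
    if reads_per_million <= 0:
        raise ValueError("reads_per_million must be positive")
    needed = reads_for_coverage(transcript_length, read_length, target_coverage)
    return math.ceil(needed * 1e6 / reads_per_million)
```

In practice the abundance of the rarest pathogen transcript is unknown in advance, so a calculation of this kind is best treated as a planning aid, for example after a pilot sequencing run.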

Techniques to deplete or enrich certain RNA species, such as RNA fractionation and poly(A) selection, can enhance detection of transcripts with low expression in eukaryotes (Sims et al., 2014). Depletion of the rRNA fraction can further reduce the required sequencing depth of an experiment and, unlike poly(A) selection, allow for detection of non-poly(A) transcripts. Although depletion-based methods allow for selection of non-poly(A) RNA species, these methods can bias quantification of abundant transcripts and decrease exon coverage and power to detect splice junctions due to the presence of sequenced introns from pre-mRNA in eukaryotes (Martin and Wang, 2011; Sims et al., 2014).


Data analysis

As with sample and data generation, data analysis considerations are dependent on the underlying biological questions. Data analysis for the majority of RNA-seq experiments follows three sequential steps: (1) quality control, (2) mapping, expression quantification and differential expression (DE) analysis, and (3) downstream analysis. Many tools and platforms can be used for RNA-seq data analysis (Grant et al., 2011), and individual programs may be created for specific analysis types and tested within a specific experimental context. Thus, it is often advisable to repeat an analysis using different programs and compare the outputs.

Quality control

Quality control for dual RNA-seq studies is similar to that used for traditional RNA-seq studies. However, contaminant filtering becomes more complicated as reads from both host and pathogen need to be retained. While reads originating from the host and pathogen can be separated by mapping to the host and pathogen reference sequences (Schulze et al., 2015; Westermann et al., 2012), contamination of various forms should be considered in order to improve the accuracy and efficiency with which genes and transcripts are mapped and quantified. Contamination may occur in two main forms: non-mRNA species (which constitute the majority of the total RNA extracted) and reads representing mRNA extracted from organisms (saprophytes and endophytes) other than the organisms of interest. These forms of contamination may skew the quantification of genes and transcripts when assembling and mapping reads to the reference. Westermann et al. (2012) provide insight into dealing with contaminating RNA, which is species and study dependent.

Contamination in the form of RNA extracted from endophytic or saprophytic organisms is especially important in plant–pathogen interaction studies. Saprophytes may be present at the sites of wounding due to the degradation of tissue that occurs, while endophytes colonize areas below the surface of the plant tissue without causing symptoms. Thus RNA from these types of organisms can be present in RNA-seq libraries. While surface sterilization could be used to decrease the presence of saprophytes, the process is time consuming and may result in damage to host RNA. Surface sterilization could also result in decreased pathogen representation, which is counterproductive for a dual RNA-seq experiment. Therefore, removal of these contaminating sequences requires bioinformatics intervention. This can be accomplished through stringent mapping of data to a database of common contaminant cDNA sequences constructed from databases such as RefSeq, UniRef100 and GenBank (Ikeue et al., 2015). In cases where reference genomes are available for known endophytes and saprophytes, stringent alignment to these references could also be used to filter reads (Zuluaga et al., 2015).

Mapping, expression quantification and differential expression analysis

Mapping is the reconstruction of the transcriptome through alignment of reads to a reference sequence. In dual RNA-seq experiments, reads are mapped to the host reference sequence and the unaligned sequences are retained and mapped to the pathogen reference sequence (Teixeira et al., 2014). A common program used for read alignment to a reference is the short read aligner Bowtie, which is part of the TopHat package of the Tuxedo pipeline (Trapnell et al., 2012). Box 8.2 describes mapping and splice site determination. Bowtie allows the user to set the number of mismatches between the query and reference sequence, effectively setting a stringency threshold for the alignment (Langmead and Salzberg, 2012). This threshold in turn determines which reads are aligned and assigned to the host or pathogen reference.
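The host-first assignment logic and the effect of the mismatch threshold can be illustrated with a toy Python sketch. This is not how Bowtie works internally: it uses a naive ungapped scan over short reference strings purely to show the decision flow, and the function names and parameters are hypothetical.

```python
def min_mismatches(read, reference):
    """Fewest mismatches between the read and any same-length window
    of the reference (naive, ungapped scan). Returns None if the
    reference is shorter than the read."""
    k = len(read)
    best = None
    for start in range(len(reference) - k + 1):
        window = reference[start:start + k]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best:
            best = mismatches
    return best


def partition_reads(reads, host_ref, pathogen_ref, max_mismatches=1):
    """Assign each read to the host first; reads that fail to align to
    the host are then tried against the pathogen reference."""
    host, pathogen, unassigned = [], [], []
    for read in reads:
        mm = min_mismatches(read, host_ref)
        if mm is not None and mm <= max_mismatches:
            host.append(read)
            continue
        mm = min_mismatches(read, pathogen_ref)
        if mm is not None and mm <= max_mismatches:
            pathogen.append(read)
        else:
            unassigned.append(read)
    return host, pathogen, unassigned
```

Lowering `max_mismatches` makes assignment more stringent: a read with a single sequencing error is retained as a host read at a threshold of one but falls through to the unassigned pool at a threshold of zero.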

Once the reads have been assembled and filtered into host and pathogen libraries, transcript abundance quantification and differential expression analysis can be performed (Boxes 8.3 and 8.4, respectively). Expression levels are quantified by counting the number of reads mapped to each gene/transcript, normalized across the length of the gene/transcript to account for bias across abundant gene regions, relative to the number of reads in the original library. Programs like Cufflinks (Trapnell et al., 2012) and RSEM (Li and Dewey, 2011) can be used to accurately quantify the relative abundance of genes and transcripts. Differential expression analysis is commonly performed using packages such as Cuffdiff, DESeq and edgeR (Anders and Huber, 2010; Robinson et al., 2010; Trapnell et al., 2013).
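The length and library-size normalization described above resembles the widely used reads per kilobase per million mapped reads (RPKM) measure. The sketch below shows that calculation as one possible instance; it is not the exact model used by Cufflinks or RSEM, and the choice to normalize host and pathogen genes against their own organism-specific read totals is an assumption made for illustration.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    read_count:          reads mapped to the gene or transcript
    gene_length_bp:      gene or transcript length in base pairs
    total_mapped_reads:  reads mapped in the library used for scaling
                         (assumed here to be the host-only or
                         pathogen-only library, respectively)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)
```

For example, 500 reads on a 2000 bp gene in a library of 10 million mapped reads gives an RPKM of 25.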


References

Robinson et al. (2010). edgeR: a Bioconductor package for differential expression analysis of digital gene expression data.

Anders and Huber (2010). Differential expression analysis for sequence count data.

Trapnell et al. (2012). Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks.

Trapnell et al. (2013). Differential analysis of gene regulation at transcript resolution with RNA-seq.

Thimm et al. (2004). MAPMAN: a user-driven tool to display genomics data sets onto diagrams of metabolic pathways and other biological processes.
