
Nuclear ribosomal spacer regions in plant phylogenetics: problems and prospects


https://helda.helsinki.fi
Poczai, Péter
2010
Poczai, P & Hyvönen, J 2010, 'Nuclear ribosomal spacer regions in plant phylogenetics: problems and prospects', Molecular Biology Reports, vol. 37, no. 4, pp. 1897-1912. https://doi.org/10.1007/s11033-009-9630-3
http://hdl.handle.net/10138/29505
https://doi.org/10.1007/s11033-009-9630-3
submittedVersion
Downloaded from Helda, University of Helsinki institutional repository.

Péter Poczai · Jaakko Hyvönen
Received: 22 December 2008 / Accepted: 9 July 2009 / Published online: 21 July 2009
© Springer Science+Business Media B.V. 2009
Abstract The nuclear ribosomal DNA (rDNA) locus coding for the 18S, 5.8S and 26S rRNAs is represented in tandem arrays in the plant genome. These consecutive gene blocks, each consisting of several regions, are widely applied in plant phylogenetics. The regions coding for the rRNA subunits have the lowest rate of evolution. Spacer regions such as the internal transcribed spacers (ITS) and external transcribed spacers (ETS) are also widely used in phylogenetics. The fact that these regions are present in many copies in the plant genome is an advantage in laboratory practice but can be a problem for phylogenetic analysis. Beyond routine use, the rDNA regions offer great potential for studying complex evolutionary mechanisms, such as reticulate events or array duplications. The understanding of these processes is based on the observation that the multiple copies of rDNA regions are homogenized through concerted evolution. Where this homogenization is incomplete, paralogous copies persist, and these can be misleading when incorporated in phylogenetic analyses. The coexistence of non-functional copies or pseudogenes with orthologues in a single individual further complicates the analysis. This article summarizes information about the structure and utility of the phylogenetically informative spacer regions of the rDNA, namely the internal and external transcribed spacers and the intergenic spacer (IGS).
Keywords Internal transcribed spacer (ITS) · External transcribed spacer (ETS) · Intergenic spacer (IGS) · Nuclear ribosomal DNA (rDNA) · Phylogenetics
Introduction
The ribosomal RNA (rRNA) genes and their spacer regions have become widely used as a source of phylogenetic information across the entire breadth of life [1]. The popularity of the rDNA locus in phylogenetics may be attributed to the fact that it serves the same function in all free-living organisms and has the same, or almost the same, structure across a wide range of taxa. The coding regions, such as the small- and large-subunit genes, represent some of the most conserved sequences in eukaryotes [2, 3], a result of strong selection against loss-of-function mutations in components of the ribosomal subunits [4]. The most conserved part appears to be the 3′ end of the 26S rDNA, which contains the α-sarcin/ricin (S/R) loop [5]. The rDNA locus provides significant information for phylogenetic research and can be used at different taxonomic levels, because its regions are conserved to different degrees. The spacer regions of the rDNA locus carry information useful for plant systematics from the species to the generic level. Owing to their high sequence variability and divergence, they have also been used in studies of speciation and biogeography. There are three notable spacer regions: the external and internal transcribed spacers (ETS, ITS) and the intergenic spacer (IGS). The general properties of these rDNA spacer regions are reviewed here in a phylogenetic context. Besides a general description of the organization and structure of each spacer, recent advances in the utilization of each unit are also discussed. Some of these advances are well summarized in other studies (e.g., for ITS), whereas the relevant new findings for ETS and IGS have not been adequately reviewed. Thus, the aim of this study is to summarize the features of all rDNA spacer regions suitable for phylogenetic research.

P. Poczai (✉)
Department of Plant Sciences and Biotechnology, Georgikon Faculty, University of Pannonia, Festetics 7, 8360 Keszthely, Hungary
e-mail: guanine@ex1.georgikon.hu

J. Hyvönen
Plant Biology, University of Helsinki, P.O. Box 65, FI-00014 Helsinki, Finland
e-mail: jaakko.hyvonen@helsinki.fi

Mol Biol Rep (2010) 37:1897–1912
DOI 10.1007/s11033-009-9630-3
The internal transcribed spacer as a phylogenetic marker
The internal transcribed spacer (ITS) is intercalated in the 18S-5.8S-26S region, separating the elements of the rDNA locus (Fig. 1). The ITS region consists of three parts: ITS1, ITS2 and the highly conserved 5.8S rDNA exon located between them [6]. The total length of this region varies between 500 and 750 bp in angiosperms [7], while in other seed plants it can be much longer, up to 1,500–3,500 bp [8, 9]. Both spacers are transcribed as part of the rRNA precursor but are removed by specific cleavage during the maturation of the ribosomal RNAs [10–12]. ITS2 has been shown to be necessary for the formation of the large subunit (LSU) rRNA during ribosome biogenesis [13]. The correct higher-order structure of both spacers is important for directing endonucleases to the proper cleavage sites [14].
Although the length of ITS2 varies greatly between organisms, Hadjiolova et al. [15] identified structurally homologous domains in mammals and Saccharomyces cerevisiae. In contrast to the coding regions, spacers such as the ITS evolve more quickly, and since its first application by Porter and Collins [16] the ITS region has become extensively used as a marker for phylogenetic reconstruction at different taxonomic levels. As part of the transcriptional unit of rDNA, the ITS is present in virtually all organisms [11]. The advantages of this region are: (1) biparental inheritance, in contrast to the maternally inherited chloroplast and mitochondrial markers; (2) easy PCR amplification, with several universal primers available for a wide range of organisms; (3) multicopy structure; (4) moderate size, allowing easy sequencing; and (5) according to published studies, a level of variation that makes it suitable for evolutionary studies at the species or generic level [7–9]. Álvarez and Wendel [17] and Baldwin et al. [7] note that this variability arises from frequent nucleotide polymorphisms and common insertions/deletions in the sequence. This high rate of divergence also makes the ITS a valuable source for studying population differentiation and phylogeography [18–21]. The ITS has been widely used across the whole tree of life, including fungi [22–31], animals [32–36], different groups of 'algae' [14, 37–39], lichens, and bryophytes [40, 41]. In addition, the corresponding 16S–23S spacer is often used in the other two domains of life, Archaea and Bacteria [42–46], for which RISSC, a database of ribosomal 16S–23S RNA genes and spacer regions, has been developed to provide easy access to this information [47].
The high copy number allows highly reproducible amplification and sequencing results, as well as the potential to study concerted and reticulate evolution. The number of phylogenetic studies using ITS is increasing, and the number of publicly available ITS sequences has tripled since 2003 [11]. The most intensively studied plant families are Asteraceae, Fabaceae, Orchidaceae, Poaceae, Brassicaceae and Apiaceae. At the genus level, for example, more than 1,000 sequences are available for different species of Carex (NCBI GenBank nucleotide search performed on 15.02.2009).
Fig. 1 Schematic presentation of the universal structure of the rDNA region in plants. (a) The chromosomal location of the rDNA regions. (b) Tandem arrays of the consecutive gene blocks (18S-5.8S-26S). In the tandem arrays, each gene block is separated by an intergenic spacer (IGS) comprising the 5′ and 3′ external transcribed spacers (ETS). The two ETS regions are separated by a non-transcribed spacer (NTS). The transcription initiation site (TIS) marks the start position of the 5′ ETS. The small-subunit gene (18S) and the large-subunit genes (5.8S and 26S) are separated by internal transcribed spacer 1 (ITS1, between 18S and 5.8S) and internal transcribed spacer 2 (ITS2, between 5.8S and 26S).
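The repeat organization shown in Fig. 1 can be written down as a small data structure, which makes the position of each spacer explicit. This is a minimal sketch: region names follow the figure caption, lengths are deliberately left out, and the names TRANSCRIBED_UNIT, tandem_array and flanks are hypothetical.

```python
from typing import List, Tuple

# Regions of one transcribed unit, 5' to 3', following Fig. 1.
TRANSCRIBED_UNIT: List[str] = ["5' ETS", "18S", "ITS1", "5.8S", "ITS2", "26S", "3' ETS"]

# Non-transcribed spacer between consecutive units; the 3' ETS of one unit,
# the NTS and the 5' ETS of the next unit together make up the IGS.
NTS = "NTS"


def tandem_array(n_units: int) -> List[str]:
    """Region order along n_units consecutive repeats of the rDNA unit."""
    regions: List[str] = []
    for i in range(n_units):
        if i > 0:
            regions.append(NTS)  # spacer core separating neighbouring units
        regions.extend(TRANSCRIBED_UNIT)
    return regions


def flanks(region: str) -> Tuple[str, str]:
    """Regions immediately 5' and 3' of an internal region of one unit."""
    idx = TRANSCRIBED_UNIT.index(region)  # raises ValueError for unknown names
    if idx in (0, len(TRANSCRIBED_UNIT) - 1):
        raise ValueError(f"{region} lies at the edge of the transcribed unit")
    return TRANSCRIBED_UNIT[idx - 1], TRANSCRIBED_UNIT[idx + 1]


if __name__ == "__main__":
    print(flanks("ITS1"))         # the spacer between 18S and 5.8S
    print(tandem_array(2)[5:10])  # 26S, 3' ETS, NTS, 5' ETS, 18S
```

Keeping the unit as an ordered list means that questions such as "which genes flank ITS2?" or "what lies between two consecutive 26S genes?" become simple index lookups.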
Despite these advantages, there are many drawbacks to using rDNA ITS data in evolutionary studies. There are hundreds or thousands of ITS copies in a typical plant genome [17]. Inferring phylogeny from multigene families such as ITS can lead to erroneous results, because the repeats present in a single eukaryote genome can vary [48]. Evidence now suggests that such variation among the ITS sequences of an organism is found only in organisms that are hybrids or polyploids [49].
Multiple rDNA arrays and paralogy
Several ribosomal loci, both transcriptionally active and inactive, are usually present in plant genomes [50]. Ribosomes are the workhorses of protein biosynthesis, translating mRNA into polypeptide chains, and are therefore extremely important structures in the cell; many copies of the rRNA genes are required to meet the organism's demand for this process. The number of these copies and their distribution in the plant genome are highly variable [51–56]. Because both ITS regions are part of the rDNA transcription unit that gives rise to the cytoplasmic ribosomes, they are present in hundreds or, in some cases, thousands of tandem copies [57, 58]. Owing to this high copy number, the region is recognized as a multicopy gene family, which makes it easy to amplify by PCR. This is an advantage, but it can be a problem in phylogenetic analyses if paralogous sequences are present. The general assumption in phylogenetic studies, however, is that all ribosomal copies within a genome have nearly identical sequences because of functional constraints. Orthologous genes and gene products from different species are the basic requirement for phylogenetic inferences about common ancestry among species [59]. Unidentified paralogous relationships and infrequent recombination between paralogues can result in erroneous species phylogenies [60]. Paralogous sequences can occur at many levels: within an individual, among individuals within a species, and among species. Identifying intra-individual paralogues and determining which of them are maintained and shared with other species is a potential problem in phylogenetic analysis. PCR amplification poses another problem, because the amplified ITS sequence is a consensus of many targets sharing the same priming sites in one or several loci, usually located on separate chromosomes. This consensus sequence, used as a row of data in the phylogenetic matrix, is a molecular phenotype from which the genotype of the organism cannot always be inferred [50]. It is also impossible to determine the zygosity of the marker. Two types of alternative copies can be detected with PCR. First, copies from different loci may have the same length but carry single nucleotide polymorphisms (SNPs) at different positions. Second, copies can differ in length because of insertion/deletion events. Both types arise when different ITS repeats are merged within a single genome via hybridization (including allopolyploidy) or introgression. These processes are very common in plants; recent estimates suggest that 70% of all angiosperms have experienced one or more episodes of polyploidization [61].
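The two classes of alternative copies described above, same-length copies differing by SNPs and copies differing in length through insertion/deletion events, can be told apart once the copies (for example, cloned amplicons) have been aligned. A minimal sketch, assuming a pairwise alignment that uses '-' as the gap character; the function and field names and the toy sequences are hypothetical, not real ITS data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CopyDifferences:
    snp_positions: List[int]    # alignment columns with two different bases
    indel_positions: List[int]  # alignment columns where one copy has a gap


def compare_aligned_copies(a: str, b: str, gap: str = "-") -> CopyDifferences:
    """Classify the differences between two aligned copies of a repeat."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    snps: List[int] = []
    indels: List[int] = []
    for i, (x, y) in enumerate(zip(a.upper(), b.upper())):
        if x == y:
            continue  # identical base, or a gap shared by both copies
        if x == gap or y == gap:
            indels.append(i)
        else:
            snps.append(i)
    return CopyDifferences(snps, indels)


if __name__ == "__main__":
    # Two hypothetical aligned copies (not real ITS data).
    diffs = compare_aligned_copies("ACGTTGCA", "ACGATG-A")
    print(diffs.snp_positions, diffs.indel_positions)  # [3] [6]
```

In practice the alignment step itself determines where indels are placed, so positions reported this way are only as reliable as the alignment they come from.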
Concerted evolution
In plants the ribosomal genes are present in many copies. For example, in Arabidopsis thaliana more than 1,400 genes encode rRNAs; they occur on different chromosomes, and each rDNA array is largely homogeneous for specific polymorphic alleles [62]. All copies within and among ribosomal loci are expected to be homogenized through genomic turnover mechanisms such as gene conversion (the non-reciprocal transfer of genetic information between similar sequences) and unequal crossing over [63]. This phenomenon was first reported by Schlötterer and Tautz [64], and later by Polanco et al. [65], who studied polymorphisms within the ITS in populations of Drosophila. They found that individual rDNA arrays are homogenized for different polymorphic alleles, which indicates that intrachromosomal recombination events occur at much higher rates than recombination between homologous chromosomes at the rDNA locus. Intragenomic rDNA diversity is generally low, as a result of concerted evolution within and between ribosomal loci [66]. Concerted evolution completely, or almost completely, eliminates inter-repeat sequence variation among the multiple rDNA arrays of an organism [17].
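To make the homogenizing effect of gene conversion concrete, the toy simulation below treats an rDNA array as a list of repeat variants and repeatedly copies one repeat over another. It is a deliberately simplified sketch under stated assumptions: whole repeats are converted at once, donors and recipients are chosen uniformly at random, and mutation, selection and unequal crossing over are ignored; the function names and variant labels are hypothetical.

```python
import random
from typing import List


def distinct_variants(repeats: List[str]) -> int:
    """Number of different sequence variants in an array."""
    return len(set(repeats))


def simulate_gene_conversion(repeats: List[str], events: int, seed: int = 0) -> List[str]:
    """Toy model of homogenization by gene conversion.

    In each event a randomly chosen donor repeat is copied over a randomly
    chosen recipient repeat (a non-reciprocal transfer). Mutation, selection
    and unequal crossing over are deliberately left out.
    """
    if len(repeats) < 2:
        return list(repeats)
    rng = random.Random(seed)
    array = list(repeats)
    for _ in range(events):
        donor, recipient = rng.sample(range(len(array)), 2)
        array[recipient] = array[donor]
    return array


if __name__ == "__main__":
    start = ["A"] * 5 + ["B"] * 5 + ["C"] * 5 + ["D"] * 5  # four hypothetical variants
    for events in (0, 50, 500, 5000):
        result = simulate_gene_conversion(start, events, seed=1)
        print(events, distinct_variants(result))
```

Because each event can only copy a variant that is already present, the number of distinct variants never increases, and with enough events the array drifts toward a single variant, which is the homogenization described above.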
The fact that the ribosomal multigene family evolves through concerted evolution makes phylogenetic analysis considerably more difficult. It is important to recognize that concerted evolution is a complex process. According to various authors, it passes through distinct stages that give rise to different classes of repeats in the plant genome [49, 67–69]. These stages can be important in phylogenetics, raising several questions: Have the copies been homogenized completely? Are any heterogeneous sequences present? Which copy is the dominant sequence in the genome? Do the sequences within an individual vary? Moreover, concerted evolution does not act immediately after organismal processes such as hybridization or polyploidization, or after genomic changes such as gene and chromosome segment duplication and various forms of homologous and non-homologous recombination [11]. Thus, divergent rDNA copies can be present throughout the genome, disturbing phylogenetic analysis and sequencing.
Because paralogous copies arise through polyploidy or hybridization, they can be used to study these processes. The presence of parental rDNA repeat types in a hybrid is determined by many forces affecting their molecular evolution [70]. Whether these alternative copies can be detected depends on their number. In recent hybrids, both parental types are almost always present [71]. Such hybridization can easily be revealed by direct sequencing, which shows an additive pattern of sequence variation: at sites that differ between the parental species, the sequencing signal shows two different nucleotides. According to Rauscher et al. [70], it is unclear how common a repeat type must be, relative to the other parental type, to be detected. In Gossypium spp. the homogenization process was complete, leaving no easily traceable evidence in the ITS region to track polyploidy [69]. In contrast, in the Glycine tomentella complex the ITS region was successfully used to evaluate parental relationships and hybrid speciation [70]; in this study, repeat-specific and exclusion PCR primers were designed to detect rare parental ITS types. In another study, Koch et al. [72] clarified the multiple hybrid origin of natural populations of Arabis divaricarpa, the putative hybrid of A. holboellii and A. drummondii. They detected multiple intra-individual ITS copies in several A. divaricarpa accessions that were also present in the parental species. However, concerted evolution in this case also resulted in distinct ITS types in the hybrid A. divaricarpa and in the parental taxa. In other groups, such as Potamogeton [73], Bromus [74], Nymphaea [75], Armeria [76] and Cardamine [77], ITS was a valuable source for revealing complex reticulate events involving putative hybrids [78–81].
Concerted evolution is sometimes incomplete, and some copies of the tandem arrays become non-functional pseudogenes [50]. Mayol and Rosselló [82] reanalyzed datasets generated by two independent laboratory teams [83, 84] for the study of the systematics of the genus Quercus. Their surprising result was that the divergent ITS alleles reported by one of the teams were non-functional paralogous copies (pseudogenes). They also concluded that incorporating these ITS paralogues in evolutionary studies can lead to erroneous hypotheses about phylogeny. Standard definitions of pseudogenes are difficult to apply to rDNA pseudogenes. In the context of phylogenetic reconstruction, Bailey et al. [85] defined rDNA pseudogenes as sequences with a nucleotide divergence pattern that has not been constrained by function, irrespective of expression patterns.
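The additive pattern produced by direct sequencing of a hybrid can be mimicked by pooling aligned copies and reporting columns in which more than one base is present with an IUPAC ambiguity code. This is a sketch under explicit assumptions: the copies are already aligned and ungapped, and the min_fraction detection threshold is a hypothetical parameter (as noted above, how common a repeat type must be to be detected is unclear); real chromatogram base calling works on peak heights, not copy counts. The function name pooled_consensus and the toy sequences are illustrative only.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

# IUPAC ambiguity codes keyed by the set of bases they stand for.
IUPAC: Dict[FrozenSet[str], str] = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def pooled_consensus(copies: List[str], min_fraction: float = 0.2) -> str:
    """Approximate the direct-sequencing read-out of a pool of aligned copies.

    A base counts as detected in a column if it occurs in at least
    min_fraction of the copies (a hypothetical detection threshold);
    two or more detected bases are reported as one IUPAC ambiguity code.
    """
    # With four possible bases, the commonest one always reaches 1/4 of a column,
    # so a threshold of at most 0.25 guarantees at least one detected base.
    if not 0 < min_fraction <= 0.25:
        raise ValueError("min_fraction must be in (0, 0.25]")
    if not copies or len({len(c) for c in copies}) != 1:
        raise ValueError("need at least one copy, all of the same aligned length")
    consensus = []
    for column in zip(*(c.upper() for c in copies)):
        if not set(column) <= set("ACGT"):
            raise ValueError("this sketch handles ungapped A/C/G/T columns only")
        counts = Counter(column)
        detected = {b for b, n in counts.items() if n / len(column) >= min_fraction}
        consensus.append(IUPAC[frozenset(detected)])
    return "".join(consensus)


if __name__ == "__main__":
    parent_1, parent_2 = "ACGTAC", "ACATAT"  # hypothetical parental repeat types
    print(pooled_consensus([parent_1] * 6 + [parent_2] * 4))  # ACRTAY
    print(pooled_consensus([parent_1] * 9 + [parent_2] * 1))  # ACGTAC
```

When the second parental type falls below the threshold, the polymorphic columns vanish from the consensus, which mirrors how rare parental types can go unnoticed without repeat-specific primers.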
Secondary structure modeling of the ITS region
The construction of secondary structure models of the ITS RNA transcript has been proposed as a novel tool for phylogenetics. New methods have also made such analyses easier through user-friendly interfaces (e.g., online databases and programs). This recent advance enables phylogenies to be inferred not only from sequence information but also from predicted secondary structures. The fact that single-stranded rRNA chains fold into secondary structures containing stems and various loops, defined by base pairing, opened a new field for inferring phylogenies. During phylogenetic analysis it is hard to determine whether a pseudogene or a paralogous sequence has interfered with the results. ITS2 is a well-suited marker with broad use in low-level phylogenetic analyses, as its sequence evolves quite fast. This feature, which makes the region useful for analyses at the generic and infrageneric levels, is a 'hindrance' to applying the marker in broader phylogenetic analyses [86]. The possibility of predicting the folding structure has enhanced the role of ITS in phylogenetic studies, since sequence alignment can then be guided by secondary structure [87].
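To illustrate what predicting a folding structure from base pairing involves, the sketch below uses the classic Nussinov dynamic-programming algorithm, which simply maximizes the number of nested base pairs (Watson-Crick plus G-U) and reports one optimal result in dot-bracket notation. It is a swapped-in teaching example, not the method of the studies cited here: programs used for ITS2 folding typically rely on free-energy models, and the minimum hairpin loop of three bases is an assumption.

```python
from typing import List, Tuple

# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Maximum number of nested base pairs and one dot-bracket structure.

    Hairpin loops must enclose at least min_loop unpaired bases.
    DNA input is read as RNA (T -> U).
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return 0, ""
    dp: List[List[int]] = [[0] * n for _ in range(n)]

    def can_pair(i: int, k: int) -> bool:
        return k - i - 1 >= min_loop and (seq[i], seq[k]) in PAIRS

    # dp[i][j] = maximum number of pairs in seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # base i left unpaired
            for k in range(i + min_loop + 1, j + 1):  # base i paired with base k
                if can_pair(i, k):
                    outside = dp[k + 1][j] if k < j else 0
                    best = max(best, dp[i + 1][k - 1] + 1 + outside)
            dp[i][j] = best

    # Traceback: recover one optimal set of pairs.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # too short to contain any pair
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            outside = dp[k + 1][j] if k < j else 0
            if can_pair(i, k) and dp[i][j] == dp[i + 1][k - 1] + 1 + outside:
                structure[i], structure[k] = "(", ")"
                stack.append((i + 1, k - 1))
                if k < j:
                    stack.append((k + 1, j))
                break
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

Pair-count maximization ignores stacking energies and loop penalties, so it can favour structures a thermodynamic model would reject; its value here is only to show how stems and loops emerge from base-pairing rules.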
Comparisons of the structure of the ITS2 RNA transcript revealed a conserved core shared by different species. Many methods have been applied to infer the secondary structure of ITS2, such as electron microscopy [88], chemical and structure probing [89], and site-directed mutagenesis [90, 91], and various software tools have been developed for this purpose [86]. The surprising result of these studies is that the examined eukaryote groups share the same general ITS2 secondary structure [92]. It was concluded that the secondary structure of ITS2 consists of four helices. Among plants, the nucleotide sequence evolves most rapidly in region IV, followed by helix I [48]. It was also described that 'helix II is more stabile, and characteristically has a pyrimidine-pyrimidine bulge while helix III contains on the 5′ side the single most conserved primary sequence, a region of approx. 20 bp encompassing the TGGT' [48].
The fundamental role of the helicoidal ring of ITS2 during pre-rRNA processing is to trigger the maturation of the 26S rRNA, as it was observed that the lack of its structure blocks the production of the mature large


Although concerted evolution and the repetitive nature of the ITS could hinder its routine use, the region still has great potential for studying more complex evolutionary relationships. The incorporation of newly developed protocols and methods to study divergence among and within repeat types of multigene families, such as the rDNA locus, is a developing area providing new data on phylogeny in both higher- and lower-level evolutionary studies. Besides the routine use of the nuclear ribosomal spacer regions (ITS, ETS or IGS), searching for alternative repeat types that have escaped homogenization by concerted evolution is a field that deserves much further attention in future. ETS sequences can be used instead of, or in combination with, ITS sequences when the ITS provides a relatively weak phylogenetic signal.


The ETS region can be used successfully in phylogenetic studies where ITS seems to have only a weak signal, such as in recently diverged lineages, because it shares the same favorable features of the ITS, and it is generally known to evolve faster and to contain more phylogenetically informative characters than the ITS in plants [145, 155, 161]. 

The most noticeable result of this study is that, despite the rapid evolution of the IGS sequences within and between the two legume tribes, some motifs have been conserved in their sequence and relative position.

Primer design for this region can also be problematic, because the rDNA IGS is known for a gradual decrease in sequence conservation from the 18S gene upstream toward the center of the IGS, which consists of repetitive elements [120, 121].

The fact that these regions are more variable in length or in their sequence composition makes them underutilized, because the routine amplification with available universal PCR primers is not always successful. 


The method developed by Schultz et al. [86] to predict ITS2 structures is based on the Needleman–Wunsch algorithm [96], but applies a BLAST search with the newly predicted structure in the database to compare it with others [86, 97, 98]. 
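For reference, the Needleman–Wunsch global alignment [96] mentioned above fills a score matrix by dynamic programming and traces back one optimal path. The sketch below uses a linear gap penalty; the match, mismatch and gap scores are arbitrary illustrative values, and it does not reproduce the structure-based procedure of Schultz et al. [86].

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty."""
    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned against leading gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned against leading gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right cell to recover one optimal alignment.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Because the whole length of both sequences is aligned, this global approach suits comparisons of complete homologous regions, whereas local methods such as BLAST look for the best-matching subsequences.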


The conformational similarities in the higher-order predicted structures of the RNA transcript might be attributed to stronger functional constraints on ITS2.
