
Structural characterization and evolutionary analyses of the Coccidioides immitis and Coccidioides posadasii mitochondrial genomes

TL;DR: This work describes the assembly and annotation of mitochondrial reference genomes for two representative strains of Coccidioides immitis and C. posadasii, and identifies the fourteen mitochondrial protein-coding genes common to most fungal mitochondria.
Abstract: Fungal mitochondrial genomes encode genes involved in crucial cellular processes, such as oxidative phosphorylation and mitochondrial translation, and these genes have been used as molecular markers in population genetics studies. Coccidioides immitis and C. posadasii are endemic fungal pathogens that cause coccidioidomycosis in arid regions across both American continents. To date, almost one hundred Coccidioides strains have been sequenced. These studies have focused exclusively on inferring patterns of variation in nuclear genomes (nucDNA); the mitochondrial genomes (mtDNA) have not been studied. In this report, we describe the assembly and annotation of mitochondrial reference genomes for two representative strains of C. posadasii and C. immitis, and assess population variation among 77 published genomes. The circular-mapping mtDNA molecules are 68.2 kb in C. immitis and 75.1 kb in C. posadasii. We identified the fourteen mitochondrial protein-coding genes common to most fungal mitochondria, as well as genes encoding the small and large ribosomal RNAs (rns and rnl), the RNA subunit of RNase P (rnpB), and 26 tRNAs, organized in polycistronic transcription units that are mostly syntenic across different populations and species of Coccidioides. Both Coccidioides species are characterized by a large number of group I and II introns, harboring twice as many of these elements as closely related Onygenales. The introns contain complete or truncated ORFs with high similarity to homing endonucleases of the LAGLIDADG and GIY-YIG families. Phylogenetic comparison of the mtDNA and nucDNA shows discordance, possibly due to differences in patterns of inheritance. In summary, this work represents the first complete assessment of mitochondrial genomes among several isolates of both species of Coccidioides, and provides a foundation for future functional work.


Marcus de Melo Teixeira (1,2,*), B. Franz Lang (3,*), Daniel R. Matute (4), Jason E. Stajich (5,6), Bridget Barker (1,@)

(1) Pathogen and Microbiome Institute, Northern Arizona University, Flagstaff, AZ, USA
(2) Faculty of Medicine, University of Brasília-DF, Brazil
(3) Robert Cedergren Centre for Bioinformatics and Génomiques, Département de Biochimie, Université de Montréal, Montréal-QC, Canada
(4) Department of Biology, University of North Carolina, Chapel Hill, USA
(5) Institute for Integrative Genome Biology, University of California, Riverside, CA, 92521
(6) Department of Microbiology and Plant Pathology, University of California, Riverside, CA, 92521

(*) These authors contributed equally to the manuscript

(@) Correspondence: Bridget M. Barker, bridget.barker@nau.edu. Pathogen and Microbiome Institute, Northern Arizona University, Applied Research & Development Building, 1395 S. Knoles Drive, Flagstaff, Arizona 86011-4073

Keywords: Coccidioides, coccidioidomycosis, mitochondrial, introns group I and II.

ORCIDs:
MMT: 0000-0003-1763-3464
BFL: 0000-0003-1035-5449
DRM: 0000-0002-7597-602X
JES: 0000-0002-7591-0020
BMB: 0000-0002-3439-4517
bioRxiv preprint doi: https://doi.org/10.1101/2020.09.14.296954; this version posted September 14, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Introduction
Fungal mitochondrial genomes exist as either linear or circular-mapping molecules and range from ~17.6 kb (e.g. Schizosaccharomyces pombe, GenBank ID MK618090.1) to well over 200 kb (e.g. 272,238 bp in Morchella importuna (1)). They usually encode proteins involved in oxidative phosphorylation, the main source of ATP production in the cell, as well as two ribosomal RNAs and a set of tRNAs involved in mitochondrial translation. More specifically, fungal mitochondrial protein-coding genes fall into several classes: seven subunits of NADH:ubiquinone oxidoreductase (nad; not present in a number of Saccharomycotina and in fission yeasts (2)), cytochrome b (cob), three subunits of cytochrome oxidase (cox), and up to three ATP synthase subunits (atp; the presence of atp8 and atp9 varies among fungal taxa) (3). In addition, a gene encoding a ribosomal protein (rps3) is present in most fungal mitochondrial genomes. Mitochondrial protein-coding genes are frequently interspersed with genes that encode structural RNAs: the small and large subunit ribosomal RNAs (rns and rnl), the RNA subunit of RNase P (rnpB), which occurs infrequently across fungi, and variable numbers of tRNAs. Notable exceptions are the nad genes, which tend to be organized in operon-like structures, with some of the genes overlapping without discernible intergenic regions (e.g., nad4L situated upstream of nad5, overlapping by one to a dozen or more nucleotides) (3).

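As an illustrative aside, the gene classes listed above can be tallied in a few lines of Python. This is a minimal sketch, not part of the study's pipeline: the individual subunit names (nad1 to nad6 plus nad4L, cox1 to cox3, atp6, atp8, atp9) are the conventional fungal gene names and are an assumption here, since the text names only the classes and a few examples. The helper names `count_core_genes` and `missing_genes` are hypothetical.

```python
# Core protein-coding gene classes described in the text. The per-subunit
# names are conventional fungal mtDNA gene names (an assumption: the text
# itself gives only the classes and a few examples such as nad4L and nad5).
CORE_PROTEIN_GENES = {
    "nad": ["nad1", "nad2", "nad3", "nad4", "nad4L", "nad5", "nad6"],
    "cob": ["cob"],
    "cox": ["cox1", "cox2", "cox3"],
    "atp": ["atp6", "atp8", "atp9"],
}


def count_core_genes(classes=CORE_PROTEIN_GENES):
    """Return the total number of genes summed over all classes."""
    return sum(len(genes) for genes in classes.values())


def missing_genes(annotated, classes=CORE_PROTEIN_GENES):
    """Return the expected core genes absent from an annotation, sorted."""
    expected = {gene for genes in classes.values() for gene in genes}
    return sorted(expected - set(annotated))


if __name__ == "__main__":
    # 7 nad + 1 cob + 3 cox + 3 atp
    print(count_core_genes())
    # Hypothetical partial annotation, used only to show the check.
    print(missing_genes(["cox1", "cob", "nad5"]))
```

Under these assumptions the tally is 7 + 1 + 3 + 3 = 14 genes; rps3 and the structural RNA genes are counted separately.
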
Mitochondrial genes in fungi contain highly variable numbers of group I and II introns that are inserted in protein-coding as well as rRNA genes (4). For instance, Endoconidiophora species seem to contain more than 80 mitochondrial introns (5), which can create gene annotation challenges, especially when transcriptome data are not available. Both intron groups may contain complete or truncated ORFs that encode either homing endonucleases of the LAGLIDADG and GIY-YIG families, or reverse transcriptases/maturases (6). If present, these proteins direct intron transfer among mitochondrial genomes of genetically compatible fungal isolates or, less frequently, across genera and even kingdom boundaries (7). Mitochondrial DNA (mtDNA)-encoded genes are particularly prone to crossing species boundaries. Because intron transfer via homing endonucleases involves genetic co-conversion of flanking exon sequences, phylogenetic inferences using mtDNA, especially genes with high intron numbers (e.g., cox1, cob and rnl (3, 8)), may reveal replacement of coding regions related to ongoing intron invasion.

In this study, we focus on describing the mitogenomes of Coccidioides immitis and C. posadasii (Ascomycota, Onygenales), which are fungal species endemic to both American continents and the causative agents of coccidioidomycosis (9). This disease is most frequently reported in the “Lower Sonoran Life Zone” in California, Arizona, Texas, and northwestern Mexico (10). However, the disease is also reported in arid and semi-arid areas throughout the American continents (11). The two species have a complex evolutionary history dominated by biogeographic distribution patterns (12, 13). Coccidioides immitis has been found in California and Baja Mexico as well as in eastern Washington state, and each region harbors unique genotypes (14-16). Coccidioides posadasii is present throughout Arizona, Texas, and Central and South America, and its population structure has been described as comprising an Arizona population, a Texas/Mexico/South America (TX/MX/SA) population, and a distinct Caribbean population (13).

Notably, nucDNA studies have found extensive differentiation between species of Coccidioides, with some evidence for gene flow between species (17, 18). The two species, C. immitis and C. posadasii, can be discriminated based on polymorphisms found in the first intron of the cox1 gene (19). Yet no studies have addressed whether mtDNA reflects the divergence of nucDNA, or whether mtDNA has moved between Coccidioides species or among populations. In this study we: i) describe the full circular-mapping mitogenomes of C. posadasii and C. immitis, ii) compare their core genes, structural RNAs, and group I and II introns with those of other Onygenales species, and iii) compare the evolutionary trajectories of the mitochondrial and nuclear genomes among publicly available genomes of these medically important fungal pathogens.

Materials and Methods

Mitochondrial genome assembly and annotation
Paired-end Illumina sequence reads from 20 Coccidioides immitis and 57 C. posadasii isolates were retrieved from the Sequence Read Archive (SRA); accessions and details are listed in Table S1. Following cleaning and quality-clipping of reads with Trimmomatic v0.35, we assembled the genomes of C. posadasii Tucson-2 and C. immitis WA221 using the SPAdes Genome Assembler v3.14.0 (20) with kmer sizes of 61, 91, and 127. We identified mitochondrial contigs in this initial assembly using similarity searches with expected fungal genes. To minimize assembly error, we (i) used Rcorrector [Song, L., Florea, L. Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads. GigaSci 4, 48 (2015).] for read correction, (ii) reduced the number of Illumina reads to a target kmer coverage of the mtDNA of 30-50x, (iii) identified reads mapping against the mitochondrial contigs with Bowtie2 (21), and (iv) reassembled these reads with SPAdes, resulting in preliminary (uncorrected) mitogenome assemblies. In a final step, all reads of the reduced 30-50x read set were aligned back to the preliminary assembly with Bowtie2 and analyzed for kmer coverage with Bedtools v2 [...]

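The read-reduction step (ii) can be illustrated with a short calculation. The sketch below is not the authors' script, and the text does not say how the reduction was performed. It assumes uniform random subsampling of reads and uses the standard conversion between base coverage C and k-mer coverage C_k for reads of length L, C_k = C * (L - k + 1) / L. The function names and the numbers in the example (150 bp reads, 2000x base coverage) are hypothetical.

```python
def kmer_coverage(base_coverage, read_length, k):
    """Convert base coverage to k-mer coverage: C_k = C * (L - k + 1) / L."""
    if k > read_length:
        raise ValueError("k must not exceed the read length")
    return base_coverage * (read_length - k + 1) / read_length


def subsample_fraction(current_coverage, target_coverage):
    """Fraction of reads to keep so coverage drops to the target (capped at 1)."""
    if current_coverage <= 0 or target_coverage <= 0:
        raise ValueError("coverages must be positive")
    return min(1.0, target_coverage / current_coverage)


if __name__ == "__main__":
    # Hypothetical inputs: 150 bp reads, mtDNA at 2000x base coverage, k = 127.
    ck = kmer_coverage(2000, 150, 127)
    # Aim for the middle of the 30-50x window named in the text.
    frac = subsample_fraction(ck, 40)
    print(f"k-mer coverage {ck:.1f}x, keep {frac:.3f} of reads")
```

Because mtDNA is typically present at much higher copy number than nucDNA, a fraction computed this way would usually be well below 1; the cap at 1 only covers the case where coverage is already under the target.
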
