Comparative genomic analysis of the thermophilic biomass-degrading fungi Myceliophthora thermophila and Thielavia terrestris
Introduction
- Furthermore, biomass-degrading enzymes from thermophilic fungi consistently demonstrate higher hydrolytic capacity 4 despite the fact that extracellular enzyme titers (in grams per liter) are typically lower than those from more conventionally used species such as Trichoderma or Aspergillus.
RESULTS
Genomes summary
- Among thermophilic fungi, M. thermophila and T. terrestris are two of the best characterized in terms of thermostable enzymes and cellulolytic activity 1–4.
AUTHOR CONTRIBUTIONS
- The final text of the manuscript was written by R.M.B. and A.T., and reviewed by I.V.G., who together also coordinated the overall analysis.
- A.T. coordinated the transcriptome and exo-proteome work, and analyzed the transcriptomes.
- D.O.N. analyzed the mating types and phylogeny of thermophilic fungi.
- D.T. characterized the biochemical properties of the xylanases.
- Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.
Evidence of RIP in thermophilic fungi
- Both of the thermophilic species (M. thermophila and T. terrestris) examined in this study show evidence of directional mutation that may be attributed to RIP.
- Sizeable portions of the T. terrestris and M. thermophila genomes appear to comprise transposable elements degraded by RIP.
- In stark contrast, the non-thermophilic species C. globosum is the first member of the subphylum Pezizomycotina in which no evidence of RIP has been found.
Chromatin structure and dynamics
- M. thermophila and T. terrestris each has at least 178 genes (Supplementary Table 23) with clear sequence identity to known yeast genes involved in chromatin structure or dynamics.
- Interestingly, there are only two M. thermophila and three T. terrestris non-orthologous proteins involved in chromatin structure and dynamics.
- The profiles of enzymes involved in the deconstruction of complex carbohydrates were analyzed by a double clustering procedure to compare the number of enzymes in each CAZyme family among various fungi (Supplementary Fig. 3).
- The two thermophiles clustered together, demonstrating the similarity of their overall CAZyme profiles.
- Therefore, in their analysis the authors compared the growth on a specific substrate to the growth on glucose and used the relative difference as a comparison between the species (Supplementary Fig. 5).
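The normalization to growth on glucose can be illustrated with a minimal sketch. The formula below (difference from glucose growth, divided by glucose growth) is one reading of "relative difference" and is an assumption, as are the function name and the idea that growth is recorded as a single number such as a colony diameter; the source does not specify either.

```python
def relative_growth(growth_on_substrate, growth_on_glucose):
    """Relative difference between growth on a substrate and growth on glucose.

    Assumption: growth is a single positive measurement (for example a colony
    diameter); the source does not state the unit or the exact formula.
    """
    if growth_on_glucose <= 0:
        raise ValueError("growth on glucose must be positive")
    return (growth_on_substrate - growth_on_glucose) / growth_on_glucose


# Hypothetical measurements for one species on two carbon sources.
measurements = {"glucose": 40.0, "cellulose": 30.0}
score = relative_growth(measurements["cellulose"], measurements["glucose"])
```

Because each species is normalized to its own glucose value, the resulting scores can be compared between species that differ in overall growth rate.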
Proteases and peptidases
- Sequences were counted as peptidases only if both INTERPROSCAN 15 and the MEROPS batch BLAST 16 server (http://merops.sanger.ac.uk) identified them as peptidases.
- That was Mycth_54466, whose nearest homolog (27% identical, 48% similar, 16% gap) in S. cerevisiae is YBR286W, a biochemically characterized aminopeptidase.
- Of all glutamic and aspartic peptidases identified, 100% and approximately 75%, respectively, had signal sequences.
- No threonine peptidases in either organism had signal sequences.
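The two-tool identification criterion used in this section amounts to a set intersection. The following is a minimal sketch under stated assumptions: the function name and the protein identifiers are hypothetical, and the inputs are assumed to be plain collections of identifiers already extracted from each tool's output (no INTERPROSCAN or MEROPS API is called).

```python
def classify_peptidase_calls(interpro_hits, merops_hits):
    """Split candidate peptidases by which tool(s) flagged them.

    Only sequences flagged by both tools are accepted, mirroring the
    criterion described in the text.
    """
    interpro_hits = set(interpro_hits)
    merops_hits = set(merops_hits)
    return {
        "accepted": interpro_hits & merops_hits,
        "interpro_only": interpro_hits - merops_hits,
        "merops_only": merops_hits - interpro_hits,
    }


# Hypothetical identifiers, for illustration only.
calls = classify_peptidase_calls(
    interpro_hits=["prot_1", "prot_2", "prot_3"],
    merops_hits=["prot_2", "prot_3", "prot_4"],
)
```

Keeping the single-tool sets separate makes it easy to report how many candidates were excluded by the stricter criterion.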
Oxidoreductases
- Basidiomycete fungi produce a variety of extracellular oxidoreductases (e.g., lignin peroxidases and manganese peroxidases) that are believed to play a role in degradation of lignin.
- None have been directly implicated in lignin decomposition.
- Only three of these appear to be true orthologues of each other, showing 68-77% sequence identity.
- Each thermophile genome also encodes a likely cellobiose dehydrogenase (EC 1.1.99.18), an extracellular hemoflavoenzyme produced by several wood-degrading fungi 17 and also a likely copper radical oxidase (glyoxal oxidase).
- Supplementary Table 15 enumerates the predicted extracellular oxidoreductases encoded in the genomes of five Ascomycete fungi.
Oxidative stress proteins
- Oxidative stress is associated with growth in an oxygen rich environment.
- A large variety of enzymes broadly termed peroxidases (EC 1.11.1._) are capable of reducing peroxides to water, alcohols or oxygen.
- Genes that fit into the broad category of peroxidases were compared across the three genomes, including the catalases, catalase-peroxidases and other peroxidases.
- The results summarized in Supplementary Table 17 show some differences in copy number (C. globosum has the most), but nothing indicative of a pattern associated with a thermophilic lifestyle.
- The extra C. globosum catalase gene (PID 13820) is unusual in that it most closely aligns with Eurotiomycete sequences, not the Sordariomycetes to which the three fungi examined in this work belong.
Transporters
- Bioinformatics analysis of membrane transporters in the genomes of M. thermophila and T. terrestris identified a total of 201 and 496 predicted cytoplasmic transporters, respectively (Supplementary Tables 18, 24, 25). Both genomes encode a broad array of transporters for the uptake of sugars and sugar-phosphates, amino acids, oligopeptides, carboxylates, and nucleosides.
- Additionally, their genomes encode a large number of major facilitator superfamily (MFS) uptake transporters of unknown specificity, suggesting that they may be capable of uptake of a range of more esoteric carbon sources.
- Both organisms also include a swathe of multidrug efflux transporters that are presumably involved in secretion of secondary metabolites and protection against exogenous toxic compounds.
- Similarly, there are fewer than half as many ATP Binding Cassette (ABC) Superfamily transporters encoded by the M. thermophila genome (19 versus 44).
- T. terrestris does possess a variety of predicted transport capabilities that are not present in M. thermophila, including transporters for arsenite (ArsB family), chromate (Chr family) and tellurite (TDT family); heavy metals (VIT, NRamp and ILT families); and sodium and calcium ion channels (VIC and Annexin families).
Membrane responses to temperature
- Common strategies for adaptation of the lipid membrane to tolerance of high or low temperatures include changes in sterol content, the ratio of saturated to unsaturated fatty acids, or alterations to fatty acid chain length 18, 19, 20.
- In temperature-shift transcriptome experiments with either species, none of the transcript levels for the ergosterol pathway genes varied by more than a factor of two in either direction (data not shown).
- In filamentous ascomycetes the major cell wall polysaccharides include chitin, 1,3-β-glucan, 1,3-β-/1,4-β-glucan, and 1,3-α-glucan.
- The correspondence between the two sets of proteins is not one-to-one.
- M. thermophila and T. terrestris each have orthologous proteins similar to EglC, BtgC, and BtgE in the Bgl2 family; M. thermophila has two more proteins similar to EglC with no orthologue in T. terrestris.
Secondary metabolism
- T. terrestris and M. thermophila contain unremarkable numbers of polyketide synthase (PKS) or non-ribosomal peptide synthetase (NRPS) genes.
- The genome of T. terrestris possesses three NRPS genes and one hybrid NRPS-PKS gene, whereas M. thermophila encodes five NRPSs and three NRPS-PKS hybrids.
- Interestingly, both genomes encode PKS genes (Thite_35447 and Mycth_101261) that, based on high amino acid identity (~60%), are possibly orthologues of the predicted octaketide producing PKS from Aspergillus nidulans (AN0150.2) which is responsible for production of emodin, emodin-derivatives and monodictyphenone 21, 22.
- The other PKS genes have putative orthologues in other fungi but were not associated with specific compounds.
- The genomes of both thermophiles encode putative orthologues of LaeA (Thite_2121390 and Mycth_2294559), the global regulator of secondary metabolism in Aspergillus species 23, suggesting that a similar mechanism controls secondary metabolite gene clusters among these diverse ascomycete species.
Melanin pigment genes
- Melanogenesis is required in development, stress management and pathogenesis in filamentous ascomycetes, and is thus considered a key secondary metabolic pathway.
- In total, 18 predicted genes in T. terrestris showed significant similarity to pig-1 and cmr-1, and each of these ORFs contained GAL4-like and/or zinc-finger DNA-binding domains.
- Initial investigations also revealed that T. terrestris homologs potentially involved in melanin biosynthesis appear to show at least partial clustering; chromosome 3 contains at least four (of 16 total) potential melanin biosynthesis genes [PKS (Thite52153), tyrosinase (Thite2118068) and two naphthalene reductases (Thite2145571 and Thite2130018)] distributed over approximately 1 Mb.
- The extent to which this molecule is involved in general stress management in this, and other members of the Chaetomiaceae, is unclear.
Genome Sequencing
- All sequencing reads for the whole genome shotgun sequencing were collected with standard Sanger sequencing protocols on ABI 3730XL capillary sequencing machines.
- Initially, the authors targeted all low quality regions and gaps with computationally selected sequencing reactions completed with 4:1 BigDye terminator: dGTP chemistry (Applied Biosystems).
- After the completion of the automated rounds, a trained finisher manually inspected each assembly.
- These reactions included additional custom primer walks on plasmid subclones or fosmids.
Genome assembly
- For both genomes, the sequencing reads were assembled using a modified version of ARACHNE v.20071016 26 with parameters maxcliq1=100, correct1_passes=0 and BINGE_AND_PURGE=True.
- Mb, 8 scaffolds larger than 100 kb, and total scaffold size of 37.0 Mb. Each scaffold was screened against bacterial proteins, organelle sequences and GenBank and removed if found to be a contaminant.
- Additional scaffolds were removed if the scaffold contained only unanchored rDNA sequences.
Construction and analysis of ESTs
- Poly A+ RNA was isolated from total RNA (pooled RNA from cells grown in MY50 (rich medium) for T. terrestris and 1% cellulose and 1% pectin pooled culture from M. thermophila) using the Absolutely mRNA Purification Kit and manufacturer’s instructions (Stratagene, La Jolla, CA).
- Approximately 1-2 μg of poly A+ RNA, reverse transcriptase SuperScript II and oligo dT-NotI primer (5' GACTAGTTCTAGATCGCGAGCGGCCGCCCT15VN 3') were used to synthesize first strand cDNA.
- EST sequences with fewer than 100 high-quality bases were removed.
- For clustering, ESTs were evaluated with MALIGN, a kmer-based alignment tool (Chapman, unpublished), which clusters ESTs based on sequence overlap (kmer = 16, seed length requirement = 32, alignment ID >= 98%).
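The EST length filter described above can be sketched as follows. The 100-base cutoff comes from the text; the Phred score of 20 used to define a "high-quality" base is an assumption (the source does not state the quality threshold), and the function names and read names are hypothetical.

```python
def count_high_quality_bases(quality_scores, min_phred=20):
    """Count bases whose quality score meets the threshold.

    Assumption: 'high quality' means Phred >= 20; the source gives no cutoff.
    """
    return sum(1 for q in quality_scores if q >= min_phred)


def filter_ests(ests, min_hq_bases=100, min_phred=20):
    """Keep ESTs with at least `min_hq_bases` high-quality bases.

    `ests` maps an EST name to its list of per-base quality scores.
    """
    return {
        name: quals
        for name, quals in ests.items()
        if count_high_quality_bases(quals, min_phred) >= min_hq_bases
    }


# Hypothetical reads: est_1 has 150 good bases, est_2 only 99.
ests = {
    "est_1": [30] * 150,
    "est_2": [30] * 99 + [10] * 50,
}
kept = filter_ests(ests)
```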
Genome annotation
- Genomic assembly scaffolds were masked using REPEATMASKER 32 and the REPBASE library of 234 fungal transposable elements 33.
- GENEWISE models were extended where possible using scaffold data to find start and stop codons.
- The C. globosum genome assembly and gene models, used for comparison to the two thermophiles, were downloaded from the Broad Institute Chaetomium globosum Database at http://www.broadinstitute.org/annotation/genome/chaetomium_globosum.
- All predicted gene models were functionally annotated using SIGNALP 41, TMHMM 42, INTERPROSCAN 15, BLASTp 31 against the nr database, and hardware-accelerated double-affine Smith-Waterman alignments (deCypherSW; http://www.timelogic.com/decypher_sw.html) against SWISSPROT (http://www.expasy.org/sprot/), the Kyoto Encyclopedia of Genes and Genomes (KEGG) 43, and the eukaryotic orthologous groups of proteins database (KOG) 44.
- Segmental duplications were selected as duplicated genome fragments with a minimum of three genes in each fragment and with at least 50% of genes between fragments being homologs of each other.
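The segmental duplication criterion can be expressed as a short predicate. This is a sketch under one interpretation, which is an assumption: the 50% requirement is applied to each fragment separately, counting genes that have at least one homolog in the other fragment. The function name, gene identifiers, and the representation of homology as a list of gene pairs are all hypothetical.

```python
def is_segmental_duplication(fragment_a, fragment_b, homolog_pairs,
                             min_genes=3, min_fraction=0.5):
    """Return True if two genome fragments qualify as a segmental duplication.

    fragment_a, fragment_b: lists of gene identifiers in each fragment.
    homolog_pairs: iterable of (gene, gene) pairs considered homologous.
    """
    if len(fragment_a) < min_genes or len(fragment_b) < min_genes:
        return False

    pairs = {frozenset(pair) for pair in homolog_pairs}

    def fraction_with_homolog(source, target):
        # Fraction of genes in `source` with at least one homolog in `target`.
        matched = sum(
            1 for gene in source
            if any(frozenset((gene, other)) in pairs for other in target)
        )
        return matched / len(source)

    return (fraction_with_homolog(fragment_a, fragment_b) >= min_fraction
            and fraction_with_homolog(fragment_b, fragment_a) >= min_fraction)


# Hypothetical fragments: two of four genes in A match two of three in B.
frag_a = ["a1", "a2", "a3", "a4"]
frag_b = ["b1", "b2", "b3"]
homologs = [("a1", "b1"), ("a2", "b2")]
result = is_segmental_duplication(frag_a, frag_b, homologs)
```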
Transcriptome analysis
- Total RNA samples were isolated from mycelia as described 48.
- After the mapped reads were sorted by genomic position, potential splice junctions were extracted from the reads with gapped alignments, and filtered.
- Equal amounts of proteins (45 µg) were fractionated by SDS-PAGE on 2.4 cm long, 7-15% polyacrylamide gels.
- Distiller version 2.0.0 (Matrix Science) was used to generate the peak list, with the following detection parameters: correlation threshold 0.4; minimum S/N 5; precursor selection tolerance 3 Da. Peptide identifications were accepted if they could be established at greater than 95.0% probability as specified by the Peptide Prophet algorithm 54.
Prediction of transport proteins
- Complete protein sequence datasets from both genomes were analyzed using the TransAAP pipeline 55 for their predicted complement of membrane transport proteins.
- This approach combines BLAST searches against a curated membrane transport protein database (Transport DB), as well as HMM searches and COG-based searches against membrane transporter protein families.
Prediction of fungal cell wall proteins
- For M. thermophila and T. terrestris, the closest relative with well-characterized cell wall proteins is Aspergillus, especially A. nidulans 56.
- To identify orthologs within protein families, the sequences of the query proteins and the BLASTP hits were aligned with T-COFFEE, and an average distance tree (using BLOSUM62 distances) was calculated from the multiple alignments with JALVIEW.
- GPI anchor sites were predicted with Big-Pi 58.
Prediction of Proteins Involved in Chromatin Structure and Dynamics
- The search was initiated using the KOG database 59.
- All gene models in the COG Chromatin Structure and Dynamics section were analyzed using BLASTP 31 and CDD (conserved domain database) 60.
- Function was assigned based either on having over 45% identity at the protein level as determined using BLASTP or a conserved domain as identified by CDD or both.
- A local BLASTP was also performed with the downloaded genomes of M. thermophila, T. terrestris and C. globosum.
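The function-assignment rule in this section (over 45% protein identity by BLASTP, a conserved domain found by CDD, or both) can be sketched as a small decision function. The function name, the return labels, and the input representation are hypothetical; the inputs are assumed to have been parsed from BLASTP and CDD results beforehand.

```python
def assign_function(percent_identity, has_conserved_domain, min_identity=45.0):
    """Return the kind of evidence supporting a functional assignment.

    percent_identity: best BLASTP identity at the protein level, or None.
    has_conserved_domain: True if CDD reported a relevant conserved domain.
    Returns "identity+domain", "identity", "domain", or None (no assignment).
    """
    # "Over 45%" is read as a strict inequality.
    by_identity = percent_identity is not None and percent_identity > min_identity
    if by_identity and has_conserved_domain:
        return "identity+domain"
    if by_identity:
        return "identity"
    if has_conserved_domain:
        return "domain"
    return None


# Hypothetical candidates: (percent identity, conserved domain found).
candidates = {"gene_1": (60.0, True), "gene_2": (30.0, True), "gene_3": (45.0, False)}
evidence = {name: assign_function(*values) for name, values in candidates.items()}
```

Recording which line of evidence supported each assignment keeps the identity-only and domain-only calls distinguishable in later review.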
PfamID Mycth Thite Chagl Neucr Necha Trire Anig Pfam Description
- Anig, Aspergillus niger; Chagl, Chaetomium globosum; Mycth, Myceliophthora thermophila; Necha, Nectria haematococca; Neucr, Neurospora crassa.
- Supplementary Table 4. Comparison of the number of predicted CAZymes for Myceliophthora thermophila and Thielavia terrestris with eight mesophilic, filamentous fungi: GH, glycoside hydrolase; GT, glycosyl transferase; PL, polysaccharide lyase; CE, carbohydrate esterase; CBM, carbohydrate-binding module; and EXPN, expansin.
Organism A V L I C P M Y F H W D N E Q S T R K G
- See Supplementary Table 6 for genus names.
- Amino acid substitutions by organism pair: Mycth-Chagl, Mycth-Thite, Thite-Chagl, Trire-Triat, Trire-Trivi, Triat-Trivi.
T. reesei 0 0 3 0 0 0
- A comparison of predicted extracellular oxidoreductases encoded in the genomes of five filamentous Ascomycete fungi.
- The closest orthologues, as determined by ClustalW alignments are located on the same row of the table.
- Three additional acetyl-CoA C-acetyltransferases were present in the automated annotation of each genome, but the two shown are the most similar to the S. cerevisiae Erg10p.
Frequently Asked Questions (19)
Q2. Why was a single representative model chosen for each locus?
Because multiple gene models were generated for each locus, a single representative model was algorithmically chosen based on model quality.
Q3. What are the roles of chitinases in fungi?
In fungi, enzymes that break down chitin (collectively termed chitinases) are believed to have autolytic, nutritional, morphogenetic and mycoparasitic roles.
Q4. How many peptidases were identified in each genome?
A total of approximately 150 peptidase sequences were identified in each genome (143 in T. terrestris and 159 in M. thermophila).
Q6. What was the common method of predicting multigene families?
Multigene families were predicted with the Markov clustering algorithm (MCL) 46, using BLASTp alignment scores between proteins as a similarity metric.
Q7. How many chromatin remodeling proteins are found in M. thermophila?
Forty-six chromatin remodeling proteins were identified for both M. thermophila and T. terrestris, including ten members of the SWI/SNF complex, four condensins, ten SAGA complex factors, six INO80 complex factors and one from the FACT complex 9, 10, 11.
Q9. How were the amplified cDNAs cloned and used to transform A. niger?
The amplified cDNAs were cloned into the A. niger expression vector using the Gateway recombination method (Invitrogen) and used to transform A. niger.
Q10. How many SOD genes are in C. globosum?
There are a total of six SOD genes in both C. globosum and T. terrestris, whereas M. thermophila has five, missing the Cu-Zn SOD orthologue that is predicted to be secreted.
Q11. How many substitutions were found when comparing thermophilic pairs?
The authors also found 20 significantly asymmetric substitutions when comparing the thermophilic pair M. thermophila and T. terrestris.
Q12. What is the way to analyze amino acid adaptations in thermophiles?
Another approach to analyze the potential amino acid adaptations in thermophiles is to align closely related thermophilic and mesophilic proteins to detect substitutional asymmetry, i.e., when certain aligned amino acids appear to occur substantially more often in either mesophilic or thermophilic proteins 4, 5, 6.
Q13. How many pairs of substitutions deviated significantly from the expected ratio?
For each thermophilic-mesophilic pair (M.t - C.g and T.t - C.g), the authors found 29 and 36 pairs of substitutions, respectively (out of 190), with significant deviation from an expected 1:1 ratio (Bonferroni-corrected chi-square test, P < 10^-5).
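The asymmetry test can be illustrated with a minimal sketch using only the Python standard library. It assumes a one-degree-of-freedom chi-square goodness-of-fit test of the two substitution directions against a 1:1 expectation, with the Bonferroni correction applied by multiplying the P value by the 190 unordered amino acid pairs; the source names the test but not these details, and the function names and counts are hypothetical.

```python
import math
from itertools import combinations

AMINO_ACIDS = "AVLICPMYFHWDNEQSTRKG"
# Number of unordered amino acid pairs: C(20, 2) = 190.
N_TESTS = len(list(combinations(AMINO_ACIDS, 2)))


def asymmetry_p_value(n_forward, n_reverse, n_tests=N_TESTS):
    """Bonferroni-corrected P value for deviation from a 1:1 ratio.

    n_forward: count of a->b substitutions (e.g. mesophile to thermophile).
    n_reverse: count of b->a substitutions in the same alignments.
    """
    total = n_forward + n_reverse
    if total == 0:
        return 1.0
    # Chi-square statistic with expected counts total/2 in each direction.
    chi2 = (n_forward - n_reverse) ** 2 / total
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2.0))
    return min(1.0, p * n_tests)


def is_asymmetric(n_forward, n_reverse, alpha=1e-5):
    """True if the corrected P value falls below the significance cutoff."""
    return asymmetry_p_value(n_forward, n_reverse) < alpha
```

For example, hypothetical counts of 600 versus 400 give a chi-square statistic of 40 and pass the cutoff, whereas 520 versus 480 do not.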
Q14. How many additional sequences were identified in each genome?
There were approximately 50 additional sequences in each genome identified as peptidases only by INTERPROSCAN and not by MEROPS batch BLAST.
Q15. What are the reasons why a cluster may have more than one consensus sequence?
Clusters may have more than one consensus sequence for various reasons, including: the clone has a long insert, the clones are splice variants, or consensus sequences are erroneously not assembled.
Q16. Why are there fewer transporters in M. thermophila than in T. terrestris?
The lower number of transporters encoded by M. thermophila compared with T. terrestris is largely due to decreased numbers of paralogues in large transporter families, for example, 86 M. thermophila MFS transporters compared to 221 members in T. terrestris.
Q17. What fraction of serine, metallo- and cysteine peptidases had signal sequences?
In contrast, only a quarter of serine peptidases, a fifth of metallo-peptidases and a tenth of cysteine peptidases had signal sequences.
Q18. Why do the authors find asymmetric substitutions in the Trichoderma species?
Because none of the Trichoderma species are thermophilic and because many asymmetric substitutions coincide in the two analyses, the authors conclude that most of the asymmetric substitutions are probably not related to high temperature adaptability.
Q19. How many ABC transporters are encoded by M. thermophila?
Fewer than half as many ATP Binding Cassette (ABC) Superfamily transporters are encoded by the M. thermophila genome as by the T. terrestris genome (19 versus 44).