
Comparative genomic analysis of the thermophilic biomass-degrading fungi Myceliophthora thermophila and Thielavia terrestris

TL;DR: These genomes are the first described for thermophilic eukaryotes and the first complete telomere-to-telomere genomes for filamentous fungi and suggest that both thermophiles are capable of hydrolyzing all major polysaccharides found in biomass.
Abstract: Thermostable enzymes and thermophilic cell factories may afford economic advantages in the production of many chemicals and biomass-based fuels. Here we describe and compare the genomes of two thermophilic fungi, Myceliophthora thermophila and Thielavia terrestris. To our knowledge, these genomes are the first described for thermophilic eukaryotes and the first complete telomere-to-telomere genomes for filamentous fungi. Genome analyses and experimental data suggest that both thermophiles are capable of hydrolyzing all major polysaccharides found in biomass. Examination of transcriptome data and secreted proteins suggests that the two fungi use shared approaches in the hydrolysis of cellulose and xylan but distinct mechanisms in pectin degradation. Characterization of the biomass-hydrolyzing activity of recombinant enzymes suggests that these organisms are highly efficient in biomass decomposition at both moderate and high temperatures. Furthermore, we present evidence suggesting that aside from representing a potential reservoir of thermostable enzymes, thermophilic fungi are amenable to manipulation using classical and molecular genetics.

Summary

Introduction

  • Furthermore, biomass-degrading enzymes from thermophilic fungi consistently demonstrate higher hydrolytic capacity 4 despite the fact that extracellular enzyme titers (in grams per liter) are typically lower than those from more conventionally used species such as Trichoderma or Aspergillus.
  • RESULTS, Genomes summary: Among thermophilic fungi, M. thermophila and T. terrestris are two of the best characterized in terms of thermostable enzymes and cellulolytic activity 1–4.

AUTHOR CONTRIBUTIONS

  • The final text of the manuscript was written by R.M.B. and A.T., and reviewed by I.V.G., who together also coordinated the overall analysis.
  • A.T. coordinated the transcriptome and exo-proteome work, and analyzed the transcriptomes.
  • D.O.N. analyzed the mating types and phylogeny of thermophilic fungi.
  • D.T. characterized the biochemical properties of the xylanases.
  • Reprints and permissions information is available online at http://www.nature.com/reprints/index.html.

Evidence of RIP in thermophilic fungi

  • Both of the thermophilic species (M. thermophila and T. terrestris) examined in this study show evidence of directional mutation that may be attributed to RIP.
  • Sizeable portions of the T. terrestris and M. thermophila genomes appear to comprise transposable elements degraded by RIP.
  • In stark contrast, the non-thermophilic species C. globosum is the first member of the subphylum Pezizomycotina in which no evidence of RIP has been found.

Chromatin structure and dynamics

  • M. thermophila and T. terrestris each have at least 178 genes (Supplementary Table 23) with clear sequence identity to known yeast genes involved in chromatin structure or dynamics.
  • Interestingly, there are only two M. thermophila and three T. terrestris non-orthologous proteins involved in chromatin structure and dynamics.
  • The profiles of enzymes involved in the deconstruction of complex carbohydrates were analyzed by a double clustering procedure to compare the number of enzymes in each CAZyme family among various fungi (Supplementary Fig. 3).
  • The two thermophiles clustered together, demonstrating the similarity of their overall CAZyme profiles.
  • Therefore, in their analysis the authors compared the growth on a specific substrate to the growth on glucose and used the relative difference as a comparison between the species (Supplementary Fig. 5).
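
The growth comparison described in the last point can be sketched as a small calculation. This is a minimal illustration that assumes the "relative difference" is the signed difference from the glucose reference divided by the glucose value; the source does not give the exact formula, and the function name and the measurements below are hypothetical.

```python
def relative_growth(substrate_growth: float, glucose_growth: float) -> float:
    """Return growth on a substrate as a signed fraction of growth on glucose.

    Assumed formula (not stated in the source):
    (substrate - glucose) / glucose.
    """
    if glucose_growth <= 0:
        raise ValueError("glucose reference growth must be positive")
    return (substrate_growth - glucose_growth) / glucose_growth


# Hypothetical colony diameters in mm (illustrative values, not data from the paper).
glucose_mm = 40.0
growth_mm = {"cellulose": 30.0, "xylan": 44.0}

for substrate, value in growth_mm.items():
    print(substrate, round(relative_growth(value, glucose_mm), 3))
```

Normalizing each species to its own glucose reference is what makes values from organisms with different baseline growth rates comparable.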

Proteases and peptidases

  • The criterion for identification as peptidases was that they were identified by both INTERPROSCAN 15 and the MEROPS batch BLAST 16 server (http://merops.sanger.ac.uk) as peptidases.
  • That was Mycth_54466, whose nearest homolog (27% identical, 48% similar, 16% gap) in S. cerevisiae is YBR286W, a biochemically characterized aminopeptidase.
  • Of all glutamic and aspartic peptidases identified, 100% and approximately 75%, respectively, had signal sequences.
  • No threonine peptidases in either organism had signal sequences.
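
The identification criterion above (a gene model counts as a peptidase only when both tools agree) amounts to a set intersection. The sketch below assumes the hits from each tool have already been parsed into sets of gene identifiers; apart from Mycth_54466, which is named in the text, the identifiers are hypothetical placeholders.

```python
def confirmed_peptidases(interpro_hits: set, merops_hits: set) -> set:
    """Keep only gene models called peptidases by both tools."""
    return interpro_hits & merops_hits


# Mycth_54466 appears in the text; gene_A, gene_B and gene_C are hypothetical.
interpro_hits = {"Mycth_54466", "gene_A", "gene_B"}
merops_hits = {"Mycth_54466", "gene_B", "gene_C"}

print(sorted(confirmed_peptidases(interpro_hits, merops_hits)))
```

Requiring agreement trades sensitivity for specificity: a model flagged by only one tool is dropped.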

Oxidoreductases

  • Basidiomycete fungi produce a variety of extracellular oxidoreductases (e.g., lignin peroxidases and manganese peroxidases) that are believed to play a role in degradation of lignin.
  • None have been directly implicated in lignin decomposition.
  • Only three of these appear to be true orthologues of each other, showing 68-77% sequence identity.
  • Each thermophile genome also encodes a likely cellobiose dehydrogenase (EC 1.1.99.18), an extracellular hemoflavoenzyme produced by several wood-degrading fungi 17 and also a likely copper radical oxidase (glyoxal oxidase).
  • Supplementary Table 15 enumerates the predicted extracellular oxidoreductases encoded in the genomes of five Ascomycete fungi.

Oxidative stress proteins

  • Oxidative stress is associated with growth in an oxygen rich environment.
  • A large variety of enzymes broadly termed peroxidases (EC 1.11.1._) are capable of reducing peroxides to water, alcohols or oxygen.
  • Genes that fit into the broad category of peroxidases were compared across the three genomes, including the catalases, catalase-peroxidases and other peroxidases.
  • The results summarized in Supplementary Table 17 show that there are some differences in copy number (C. globosum has the most), but nothing indicative of a pattern associated with a thermophilic lifestyle.
  • The extra C. globosum catalase gene (PID 13820) is unusual in that it most closely aligns with Eurotiomycete sequences, not the Sordariomycetes to which the three fungi examined in this work belong.

Transporters

  • Bioinformatics analysis of membrane transporters in the genomes of M. thermophila and T. terrestris identified a total of 201 and 496 predicted cytoplasmic transporters, respectively (Supplementary Tables 18, 24, 25). Both genomes encode a broad array of transporters for the uptake of sugars and sugar-phosphates, amino acids, oligopeptides, carboxylates, and nucleosides.
  • Additionally, their genomes encode a large number of major facilitator superfamily (MFS) uptake transporters of unknown specificity, suggesting that they may be capable of uptake of a range of more esoteric carbon sources.
  • Both organisms also include a swathe of multidrug efflux transporters that are presumably involved in secretion of secondary metabolites and protection against exogenous toxic compounds.
  • Similarly, there are fewer than half as many ATP Binding Cassette (ABC) Superfamily transporters encoded by the M. thermophila genome (19 versus 44).
  • T. terrestris does possess a variety of predicted transport capabilities that are not present in M. thermophila including transporters for arsenite (ArsB family), chromate (Chr family), tellurite (TDT family); heavy metals (VIT, NRamp and ILT families); and sodium and calcium ion channels (VIC and Annexin families).

Membrane responses to temperature

  • Common strategies for adaptation of the lipid membrane to tolerance of high or low temperatures include changes in sterol content, the ratio of saturated to unsaturated fatty acids, or alterations to fatty acid chain length 18, 19, 20.
  • In temperature-shift transcriptome experiments with either species, none of the transcript levels for the ergosterol pathway genes varied by more than a factor of two in either direction (data not shown).
  • In filamentous ascomycetes the major cell wall polysaccharides include chitin, 1,3-β-glucan, 1,3-β-/1,4-β-glucan, and 1,3-α-glucan.
  • The correspondence between the two sets of proteins is not one-to-one.
  • M. thermophila and T. terrestris each have orthologous proteins similar to EglC, BtgC, and BtgE in the Bgl2 family; M. thermophila has two more proteins similar to EglC with no orthologue in T. terrestris.

Secondary metabolism

  • T. terrestris and M. thermophila contain unremarkable numbers of polyketide synthase (PKS) or non-ribosomal peptide synthetase (NRPS) genes.
  • The genome of T. terrestris possesses three NRPS genes and one hybrid NRPS-PKS gene, whereas M. thermophila encodes five NRPSs and three NRPS-PKS hybrids.
  • Interestingly, both genomes encode PKS genes (Thite_35447 and Mycth_101261) that, based on high amino acid identity (~60%), are possibly orthologues of the predicted octaketide producing PKS from Aspergillus nidulans (AN0150.2) which is responsible for production of emodin, emodin-derivatives and monodictyphenone 21, 22.
  • The other PKS genes have putative orthologues in other fungi but were not associated with specific compounds.
  • The genomes of both thermophiles encode putative orthologues of LaeA (Thite_2121390 and Mycth_2294559), the global regulator of secondary metabolism in Aspergillus species 23 suggesting a similar mechanism controlling secondary metabolite gene clusters among these diverse ascomycete species.

Melanin pigment genes

  • Melanogenesis is required in development, stress management and pathogenesis in filamentous ascomycetes, and is thus considered a key secondary metabolic pathway.
  • In total, 18 predicted genes in T. terrestris showed significant similarity to pig-1 and cmr-1, and each of these ORFs contained either GAL4-like and/or zinc-finger DNA binding domains.
  • Initial investigations also revealed that T. terrestris homologs potentially involved in melanin biosynthesis appear to show at least partial clustering; chromosome 3 contains at least four (of 16 total) potential melanin biosynthesis genes [PKS (Thite52153), tyrosinase (Thite2118068) and two naphthalene reductases (Thite2145571 and Thite2130018)] distributed over approximately 1 Mb.
  • The extent to which this molecule is involved in general stress management in this, and other members of the Chaetomiaceae, is unclear.

Genome Sequencing

  • All sequencing reads for the whole genome shotgun sequencing were collected with standard Sanger sequencing protocols on ABI 3730XL capillary sequencing machines.
  • Initially, the authors targeted all low quality regions and gaps with computationally selected sequencing reactions completed with 4:1 BigDye terminator: dGTP chemistry (Applied Biosystems).
  • After the completion of the automated rounds, a trained finisher manually inspected each assembly.
  • These reactions included additional custom primer walks on plasmid subclones or fosmids.

Genome assembly

  • For both genomes, the sequencing reads were assembled using a modified version of ARACHNE v.20071016 26 with parameters maxcliq1=100, correct1_passes=0 and BINGE_AND_PURGE=True.
  • Mb, 8 scaffolds larger than 100 kb, and total scaffold size of 37.0 Mb. Each scaffold was screened against bacterial proteins, organelle sequences and GenBank and removed if found to be a contaminant.
  • Additional scaffolds were removed if the scaffold contained only unanchored rDNA sequences.

Construction and analysis of ESTs

  • Poly A+ RNA was isolated from total RNA (pooled RNA from cells grown in MY50 (rich medium) for T. terrestris and 1% cellulose and 1% pectin pooled culture from M. thermophila) using the Absolutely mRNA Purification Kit and manufacturer’s instructions (Stratagene, La Jolla, CA).
  • Approximately 1-2 μg of poly A+ RNA, reverse transcriptase SuperScript II and oligo dT-NotI primer (5' GACTAGTTCTAGATCGCGAGCGGCCGCCCT15VN 3') were used to synthesize first strand cDNA.
  • EST sequences with less than 100 high quality bases were removed.
  • For clustering, ESTs were evaluated with MALIGN, a kmer based alignment tool (Chapman, unpublished), which clusters ESTs based on sequence overlap (kmer = 16, seed length requirement = 32, alignment ID >= 98%).
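
To make the idea of k-mer based clustering concrete, here is a minimal sketch. It is not MALIGN, which is unpublished; it only illustrates single-linkage grouping of sequences that share at least one exact 16-mer. The seed-length requirement (32) and the alignment identity threshold (98%) mentioned above are deliberately left out, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def kmers(seq: str, k: int = 16) -> set:
    """Return the set of all exact k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_shared_kmers(ests: dict, k: int = 16) -> list:
    """Single-linkage clustering: two ESTs are linked if they share any k-mer."""
    parent = {name: name for name in ests}

    def find(x: str) -> str:
        # Union-find lookup with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kmer_sets = {name: kmers(seq, k) for name, seq in ests.items()}
    for a, b in combinations(ests, 2):
        if kmer_sets[a] & kmer_sets[b]:
            parent[find(a)] = find(b)

    clusters = {}
    for name in ests:
        clusters.setdefault(find(name), set()).add(name)
    return list(clusters.values())


# Toy sequences (hypothetical): est_a and est_b overlap by 20 bases, est_c is unrelated.
toy_ests = {
    "est_a": "ACGTACGTTAGCCGATAGGCTTAACCGG",
    "est_b": "TAGCCGATAGGCTTAACCGGTTTTGGGGCCCCAAAA",
    "est_c": "GGGGGGGGGGGGGGGGGGGGCCCCCCCC",
}
print(sorted(sorted(c) for c in cluster_by_shared_kmers(toy_ests)))
```

The all-pairs comparison is quadratic in the number of ESTs; a production tool would index k-mers instead, but the grouping logic is the same.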

Genome annotation

  • Genomic assembly scaffolds were masked using REPEATMASKER 32 and the REPBASE library of 234 fungal transposable elements 33.
  • GENEWISE models were extended where possible using scaffold data to find start and stop codons.
  • The C. globosum genome assembly and gene models, used for comparison to the two thermophiles, were downloaded from the Broad Institute Chaetomium globosum Database at http://www.broadinstitute.org/annotation/genome/chaetomium_globosum.
  • All predicted gene models were functionally annotated using SIGNALP 41, TMHMM 42, INTERPROSCAN 15, BLASTp 31 against the nr database, and hardware-accelerated double-affine Smith-Waterman alignments (deCypherSW; http://www.timelogic.com/decypher_sw.html) against SWISSPROT (http://www.expasy.org/sprot/), the Kyoto Encyclopedia of Genes and Genomes (KEGG) 43, and the eukaryotic orthologous groups of proteins database (KOG) 44.
  • Segmental duplications were selected as duplicated genome fragments with a minimum of three genes in each fragment and at least 50% of the genes between fragments being homologs of each other.
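
The segmental duplication rule in the last point can be written as a simple predicate. This sketch rests on one reading of the criterion, which the source does not spell out: a gene counts as matched when it has a homolog in the other fragment, and the 50% threshold must hold in both directions. All names and gene identifiers are hypothetical.

```python
def is_segmental_duplication(frag_a, frag_b, homolog_pairs,
                             min_genes=3, min_fraction=0.5):
    """Apply the stated thresholds to a pair of candidate fragments.

    frag_a, frag_b: lists of gene identifiers in each fragment.
    homolog_pairs: iterable of (gene, gene) tuples judged homologous.
    """
    if len(frag_a) < min_genes or len(frag_b) < min_genes:
        return False

    # Store pairs as unordered sets so (x, y) and (y, x) are equivalent.
    pairs = {frozenset(p) for p in homolog_pairs}

    def matched_fraction(src, dst):
        matched = sum(
            1 for g in src if any(frozenset((g, h)) in pairs for h in dst)
        )
        return matched / len(src)

    return (matched_fraction(frag_a, frag_b) >= min_fraction
            and matched_fraction(frag_b, frag_a) >= min_fraction)


# Hypothetical fragments: two of four genes in A have homologs in B.
frag_a = ["a1", "a2", "a3", "a4"]
frag_b = ["b1", "b2", "b3"]
print(is_segmental_duplication(frag_a, frag_b, [("a1", "b1"), ("a2", "b2")]))
```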

Transcriptome analysis

  • Total RNA samples were isolated from mycelia as described 48.
  • After the mapped reads were sorted by genomic position, potential splice junctions were extracted from the reads with gapped alignments, and filtered.
  • Equal amounts of proteins (45 µg) were fractionated by SDS-PAGE on 2.4 cm long, 7-15% polyacrylamide gels.
  • Distiller version 2.0.0 (Matrix Science) was used to generate the peak list, with the following detection parameters: correlation threshold 0.4; minimum S/N 5; precursor selection tolerance 3 Da. Peptide identifications were accepted if they could be established at greater than 95.0% probability as specified by the Peptide Prophet algorithm 54.
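
The acceptance rule for peptide identifications (probability greater than 95.0%) can be sketched as a filter over scored hits. The record type, the field names and the example peptides are hypothetical; real Peptide Prophet output has its own format, which is not reproduced here.

```python
from typing import NamedTuple


class PeptideHit(NamedTuple):
    sequence: str       # peptide sequence (hypothetical examples below)
    probability: float  # identification probability between 0 and 1


def accept_peptides(hits, threshold=0.95):
    """Keep hits whose probability is strictly greater than the threshold."""
    return [hit for hit in hits if hit.probability > threshold]


hits = [
    PeptideHit("LSDVTGK", 0.99),
    PeptideHit("AEFGWR", 0.95),
    PeptideHit("MKTNYR", 0.50),
]
for hit in accept_peptides(hits):
    print(hit.sequence, hit.probability)
```

The comparison is strict because the text says "greater than 95.0%", so a hit at exactly 0.95 is rejected.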

Prediction of transport proteins

  • Complete protein sequence datasets from both genomes were analyzed using the TransAAP pipeline 55 for their predicted complement of membrane transport proteins.
  • This approach combines BLAST searches against a curated membrane transport protein database (Transport DB), as well as HMM searches and COG-based searches against membrane transporter protein families.

Prediction of fungal cell wall proteins

  • For M. thermophila and T. terrestris, the closest relative with well-characterized cell wall proteins is Aspergillus, especially A. nidulans 56.
  • To identify orthologs within protein families, the sequences of the query proteins and the BLASTP hits were aligned with T-COFFEE, and an average distance tree (using BLOSUM62 distances) was calculated from the multiple alignments with JALVIEW.
  • GPI anchor sites were predicted with Big-Pi 58.

Prediction of Proteins Involved in Chromatin Structure and Dynamics

  • The search was initiated using the KOG database 59.
  • All gene models in the COG Chromatin Structure and Dynamics section were analyzed using BLASTP 31 and CDD (conserved domain database) 60.
  • Function was assigned based either on having over 45% identity at the protein level as determined using BLASTP or a conserved domain as identified by CDD or both.
  • A local BLASTP was also performed with the downloaded genomes of M. thermophila, T. terrestris and C. globosum.
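
The function-assignment rule above (over 45% protein identity by BLASTP, a conserved domain found by CDD, or both) can be expressed as a small decision function. The sketch assumes the percent identity and the domain call have already been extracted from the tool outputs; the function name and return labels are hypothetical.

```python
from typing import Optional


def assignment_evidence(percent_identity: float,
                        has_conserved_domain: bool,
                        identity_cutoff: float = 45.0) -> Optional[str]:
    """Return which line of evidence supports a functional assignment.

    Returns "both", "blastp", "cdd", or None when neither criterion is met.
    """
    by_identity = percent_identity > identity_cutoff  # "over 45% identity"
    if by_identity and has_conserved_domain:
        return "both"
    if by_identity:
        return "blastp"
    if has_conserved_domain:
        return "cdd"
    return None


# Hypothetical candidates: (percent identity, conserved domain found).
for identity, domain in [(62.0, True), (48.5, False), (30.0, True), (45.0, False)]:
    print(identity, domain, assignment_evidence(identity, domain))
```

Recording the evidence type, and not just a yes/no call, makes it easy to review assignments that rest on a single criterion.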

PfamID Mycth Thite Chagl Neucr Necha Trire Anig Pfam Description

  • Anig, Aspergillus niger; Chagl, Chaetomium globosum; Mycth, Myceliophthora thermophila; Necha, Nectria haematococca; Neucr, Neurospora crassa.
  • Supplementary Table 4. Comparison of the number of predicted CAZymes for Myceliophthora thermophila and Thielavia terrestris with eight mesophilic, filamentous fungi: GH, glycoside hydrolase; GT, glycosyl transferase; PL, polysaccharide lyase; CE, carbohydrate esterase; CBM, carbohydrate-binding module; and EXPN, expansin.

Organism A V L I C P M Y F H W D N E Q S T R K G

  • See Supplementary Table 6 for genus names.
  • Amino acid substitution by organism pair: Mycth-Chagl, Mycth-Thite, Thite-Chagl, Trire-Triat, Trire-Trivi, Triat-Trivi.

T. reesei 0 0 3 0 0 0

  • A comparison of predicted extracellular oxidoreductases encoded in the genomes of five filamentous Ascomycete fungi.
  • The closest orthologues, as determined by ClustalW alignments, are located on the same row of the table.
  • Three additional acetyl-CoA C-acetyltransferases were present in the automated annotation of each genome, but the two shown are the most similar to the S. cerevisiae Erg10p.


Lawrence Berkeley National Laboratory
Recent Work
Title
Comparative genomic analysis of the thermophilic biomass-degrading fungi Myceliophthora
thermophila and Thielavia terrestris
Permalink
https://escholarship.org/uc/item/90w6c49d
Authors
Berka, Randy M.
Grigoriev, Igor V.
Otillar, Robert
et al.
Publication Date
2011-10-02

Comparative genomic analysis of the thermophilic biomass-degrading fungi Myceliophthora thermophila and Thielavia terrestris

Randy M Berka 1,15; Igor V Grigoriev 2,15; Robert Otillar 2; Asaf Salamov 2; Jane Grimwood 3; Ian Reid 4; Nadeeza Ishmael 4; Tricia John 4; Corinne Darmond 4; Marie-Claude Moisan 4; Bernard Henrissat 5; Pedro M Coutinho 5; Vincent Lombard 5; Donald O Natvig 6; Erika Lindquist 2; Jeremy Schmutz 3; Susan Lucas 2; Paul Harris 1; Justin Powlowski 4; Annie Bellemare 4; David Taylor 4; Gregory Butler 4; Ronald P de Vries 7,8; Iris E Allijn 7; Joost van den Brink 7; Sophia Ushinsky 4; Reginald Storms 4; Amy J Powell 9; Ian T Paulsen 10; Liam D H Elbourne 10; Scott E Baker 11; Jon Magnuson 11; Sylvie LaBoissiere 12; A John Clutterbuck 13; Diego Martinez 6,14; Mark Wogulis 1; Alfredo Lopez de Leon 1; Michael W Rey 1 & Adrian Tsang 4,15

1 Novozymes, Inc., Davis, California, USA.
2 US Department of Energy Joint Genome Institute, Walnut Creek, California, USA.
3 HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, USA.
4 Centre for Structural and Functional Genomics, Concordia University, Montreal, Quebec, Canada.
5 Architecture et Fonction des Macromolécules Biologiques, CNRS/Universités de Provence/Université de la Mediterranée, Marseille, France.
6 Department of Biology, University of New Mexico, Albuquerque, New Mexico, USA.
7 CBS-KNAW Fungal Biodiversity Centre, Utrecht, The Netherlands.
8 Microbiology and Kluyver Centre for Genomics of Industrial Fermentation, Utrecht University, Utrecht, The Netherlands.
9 Sandia National Laboratory, Albuquerque, New Mexico, USA.
10 Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia.
11 Fungal Biotechnology Team, Pacific Northwest National Laboratory, Richland, Washington, USA.
12 McGill University and Génome Québec Innovation Centre, Montreal, Canada.
13 University of Glasgow, Glasgow, UK.
14 Present address: Broad Institute of MIT & Harvard, Cambridge, Massachusetts, USA.
15 These authors contributed equally to this work.

October 2011

The work conducted by the U.S. Department of Energy Joint Genome Institute is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

DISCLAIMER
This document was prepared as an account of work sponsored by the United States
Government. While this document is believed to contain correct information, neither the
United States Government nor any agency thereof, nor The Regents of the University of
California, nor any of their employees, makes any warranty, express or implied, or assumes
any legal responsibility for the accuracy, completeness, or usefulness of any information,
apparatus, product, or process disclosed, or represents that its use would not infringe
privately owned rights. Reference herein to any specific commercial product, process, or
service by its trade name, trademark, manufacturer, or otherwise, does not necessarily
constitute or imply its endorsement, recommendation, or favoring by the United States
Government or any agency thereof, or The Regents of the University of California. The views
and opinions of authors expressed herein do not necessarily state or reflect those of the
United States Government or any agency thereof or The Regents of the University of
California.

Correspondence should be addressed to A.T. (tsang@gene.concordia.ca).

Received 16 May; accepted 18 August; published online 02 October 2011; doi:10.1038/nbt1976

Thermostable enzymes and thermophilic cell factories may afford economic advantages in the production of many chemicals and biomass-based fuels. Here we describe and compare the genomes of two thermophilic fungi, Myceliophthora thermophila and Thielavia terrestris. To our knowledge, these genomes are the first described for thermophilic eukaryotes and the first complete telomere-to-telomere genomes for filamentous fungi. Genome analyses and experimental data suggest that both thermophiles are capable of hydrolyzing all major polysaccharides found in biomass. Examination of transcriptome data and secreted proteins suggests that the two fungi use shared approaches in the hydrolysis of cellulose and xylan but distinct mechanisms in pectin degradation. Characterization of the biomass-hydrolyzing activity of recombinant enzymes suggests that these organisms are highly efficient in biomass decomposition at both moderate and high temperatures. Furthermore, we present evidence suggesting that aside from representing a potential reservoir of thermostable enzymes, thermophilic fungi are amenable to manipulation using classical and molecular genetics.

Rapid, efficient and robust enzymatic degradation of biomass-derived polysaccharides is currently a major challenge for biofuel production. A prerequisite is the availability of enzymes that hydrolyze cellulose, hemicellulose and other polysaccharides into fermentable sugars at conditions suitable for industrial use. The best studied and most widely used cellulases and hemicellulases are produced by Trichoderma, Aspergillus and Penicillium species, and they are most effective over a temperature range from 40 °C to ~50 °C. At these temperatures, complete saccharification of biomass polysaccharides (>90% conversion to fermentable sugars) requires long reaction times, during which hydrolysis reactors are susceptible to contamination. One way to overcome these obstacles is to raise the reaction temperature, thereby increasing hydrolytic rates and reducing contamination risks. However, implementing higher reaction temperatures requires the deployment of enzymes that are more thermostable than the available preparations from mesophilic fungi. Additional advantages of elevated hydrolysis temperatures include enhanced mass transfer, reduced substrate viscosity, and the potential for enzyme recycling 1.

Thermophilic fungi represent a potential reservoir of thermostable enzymes for industrial applications. They can also potentially be developed into cell factories to support production of chemicals and materials at elevated temperatures. Enzymes from thermophilic fungi often tolerate higher temperatures than enzymes from mesophilic species, and some show stability at 70–80 °C 1,2. Notably, it has been reported that the cellulolytic activity of some thermophilic species was several times higher than that of the most active cellulolytic mesophiles 3. Furthermore, biomass-degrading enzymes from thermophilic fungi consistently demonstrate higher hydrolytic capacity 4 despite the fact that extracellular enzyme titers (in grams per liter) are typically lower than those from more conventionally used species such as Trichoderma or Aspergillus.

Citations
More filters
Journal ArticleDOI
TL;DR: MycoCosm is a fungal genomics portal developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other 'omics' data by providing interactive web-based tools.
Abstract: MycoCosm is a fungal genomics portal (http://jgi.doe.gov/fungi), developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other 'omics' data by providing interactive web-based tools. MycoCosm also promotes and facilitates user community participation through the nomination of new species of fungi for sequencing, and the annotation and analysis of resulting data. By efficiently filling gaps in the Fungal Tree of Life, MycoCosm will help address important problems associated with energy and the environment, taking advantage of growing fungal genomics resources.

1,037 citations

Journal ArticleDOI
TL;DR: Major updates of the Genome Portal in the past 2 years are described with a specific emphasis on efficient handling of the rapidly growing amount of diverse genomic data accumulated in JGI.
Abstract: The US Department of Energy (DOE) Joint Genome Institute (JGI), a national user facility, serves the diverse scientific community by providing integrated high-throughput sequencing and computational analysis to enable system-based scientific approaches in support of DOE missions related to clean energy generation and environmental characterization The JGI Genome Portal (http://genomejgidoegov) provides unified access to all JGI genomic databases and analytical tools The JGI maintains extensive data management systems and specialized analytical capabilities to manage and interpret complex genomic data A user can search, download and explore multiple data sets available for all DOE JGI sequencing projects including their status, assemblies and annotations of sequenced genomes Here we describe major updates of the Genome Portal in the past 2 years with a specific emphasis on efficient handling of the rapidly growing amount of diverse genomic data accumulated in JGI

724 citations


Cites background from "Comparative genomic analysis of the..."

  • ...Genome Browser tracks shown for a thermophile Thielavia terrestris (14) include GC content (light blue), VISTA-based genome conservation (blue and red curve), automatically predicted (blue) and manually curated (red) gene models, transcriptomics (light green) and proteomics (dark green) data, PFAM domains (orange), BLASTx hits against proteins of related organism (blue), and repeats (black)....

    [...]

Journal ArticleDOI
TL;DR: The general organization of the JGI Genome Portal is described and the most recent addition, MycoCosm, a new integrated fungal genomics resource is described.
Abstract: The Department of Energy (DOE) Joint Genome Institute (JGI) is a national user facility with massive-scale DNA sequencing and analysis capabilities dedicated to advancing genomics for bioenergy and environmental applications. Beyond generating tens of trillions of DNA bases annually, the Institute develops and maintains data management systems and specialized analytical capabilities to manage and interpret complex genomic data sets, and to enable an expanding community of users around the world to analyze these data in different contexts over the web. The JGI Genome Portal (http://genome.jgi.doe.gov) provides a unified access point to all JGI genomic databases and analytical tools. A user can find all DOE JGI sequencing projects and their status, search for and download assemblies and annotations of sequenced genomes, and interactively explore those genomes and compare them with other sequenced microbes, fungi, plants or metagenomes using specialized systems tailored to each particular class of organisms. We describe here the general organization of the Genome Portal and the most recent addition, MycoCosm (http://jgi.doe.gov/fungi), a new integrated fungal genomics resource.

526 citations


Cites background from "Comparative genomic analysis of the..."

  • ...Genome Browser tracks shown for a thermophile Thielavia terrestris (14) include GC content (light blue), VISTA-based genome conservation (blue and red curve), automatically predicted (blue) and manually curated (red) gene models, transcriptomics (light green) and proteomics (dark green) data, PFAM domains (orange), BLASTx hits against proteins of related organism (blue), and repeats (black)....

    [...]

Journal ArticleDOI
TL;DR: The Dothideomycetes are one of the largest groups of fungi with a high level of ecological diversity including many plant pathogens infecting a broad range of hosts as mentioned in this paper.
Abstract: The class Dothideomycetes is one of the largest groups of fungi with a high level of ecological diversity including many plant pathogens infecting a broad range of hosts. Here, we compare genome features of 18 members of this class, including 6 necrotrophs, 9 (hemi)biotrophs and 3 saprotrophs, to analyze genome structure, evolution, and the diverse strategies of pathogenesis. The Dothideomycetes most likely evolved from a common ancestor more than 280 million years ago. The 18 genome sequences differ dramatically in size due to variation in repetitive content, but show much less variation in number of (core) genes. Gene order appears to have been rearranged mostly within chromosomal boundaries by multiple inversions, in extant genomes frequently demarcated by adjacent simple repeats. Several Dothideomycetes contain one or more gene-poor, transposable element (TE)-rich putatively dispensable chromosomes of unknown function. The 18 Dothideomycetes offer an extensive catalogue of genes involved in cellulose degradation, proteolysis, secondary metabolism, and cysteine-rich small secreted proteins. Ancestors of the two major orders of plant pathogens in the Dothideomycetes, the Capnodiales and Pleosporales, may have had different modes of pathogenesis, with the former having fewer of these genes than the latter. Many of these genes are enriched in proximity to transposable elements, suggesting faster evolution because of the effects of repeat induced point (RIP) mutations. A syntenic block of genes, including oxidoreductases, is conserved in most Dothideomycetes and upregulated during infection in L. maculans, suggesting a possible function in response to oxidative stress.

514 citations

Journal ArticleDOI
TL;DR: The recent convergence of crystallographic and biochemical in vitro analysis of nucleoporins, the components of the NPC, with cryo-electron microscopic imaging of the entire NPC in situ has provided a first pseudo-atomic view of its central core and revealed that an unexpected network of short linear motifs is an important spatial organization principle.
Abstract: Nuclear pore complexes (NPCs) are large protein assemblies that form channels in the nuclear envelope and constitute major routes for nucleocytoplasmic communication. Insights into the complex structure of NPCs provide the basis for understanding their functions and reveal how the dysfunction of their structural components, nucleoporins, contributes to human disease.

471 citations

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches and can be used simultaneously to achieve even greater alignment speeds.
Abstract: Bowtie is an ultrafast, memory-efficient alignment program for aligning short DNA sequence reads to large genomes. For the human genome, Burrows-Wheeler indexing allows Bowtie to align more than 25 million reads per CPU hour with a memory footprint of approximately 1.3 gigabytes. Bowtie extends previous Burrows-Wheeler techniques with a novel quality-aware backtracking algorithm that permits mismatches. Multiple processor cores can be used simultaneously to achieve even greater alignment speeds. Bowtie is open source http://bowtie.cbcb.umd.edu.

20,335 citations

Journal ArticleDOI
TL;DR: The results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.
Abstract: High-throughput mRNA sequencing (RNA-Seq) promises simultaneous transcript discovery and abundance estimation. However, this would require algorithms that are not restricted by prior gene annotations and that account for alternative transcription and splicing. Here we introduce such algorithms in an open-source software program called Cufflinks. To test Cufflinks, we sequenced and analyzed >430 million paired 75-bp RNA-Seq reads from a mouse myoblast cell line over a differentiation time series. We detected 13,692 known transcripts and 3,724 previously unannotated ones, 62% of which are supported by independent expression data or by homologous genes in other species. Over the time series, 330 genes showed complete switches in the dominant transcription start site (TSS) or splice isoform, and we observed more subtle shifts in 1,304 other genes. These results suggest that Cufflinks can illuminate the substantial regulatory flexibility and complexity in even this well-studied model of muscle development and that it can improve transcriptome-based genome annotation.

13,337 citations

Journal ArticleDOI
TL;DR: A simplified scoring system is proposed that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length.
Abstract: A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.

12,003 citations

Journal ArticleDOI
TL;DR: The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer.
Abstract: Motivation: A new protocol for sequencing the messenger RNA in a cell, known as RNA-Seq, generates millions of short sequence fragments in a single run. These fragments, or ‘reads’, can be used to measure levels of gene expression and to identify novel splice variants of genes. However, current software for aligning RNA-Seq data to a genome relies on known splice junctions and cannot identify novel ones. TopHat is an efficient read-mapping algorithm designed to align reads from an RNA-Seq experiment to a reference genome without relying on known splice sites. Results: We mapped the RNA-Seq reads from a recent mammalian RNA-Seq experiment and recovered more than 72% of the splice junctions reported by the annotation-based software from that study, along with nearly 20 000 previously unreported junctions. The TopHat pipeline is much faster than previous systems, mapping nearly 2.2 million reads per CPU hour, which is sufficient to process an entire RNA-Seq experiment in less than a day on a standard desktop computer. We describe several challenges unique to ab initio splice site discovery from RNA-Seq reads that will require further algorithm development. Availability: TopHat is free, open-source software available from http://tophat.cbcb.umd.edu Contact: cole@cs.umd.edu Supplementary information: Supplementary data are available at Bioinformatics online.

11,473 citations

Frequently Asked Questions (19)
Q1. What was the function to determine if the splice junctions were overlapping?

To pass the filter, the splice junctions needed to have introns with GT-AG, GC-AG, or AT-AC donor-acceptor pairs, and to score above 0.5 in a discriminant function that combined the donor-acceptor type, the number and distribution of reads spanning the junction, the number of reads mapping inside the intron relative to the number of spanning reads, and the presence of overlapping splice junctions.
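
The two-part filter described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Junction` class and its field names are hypothetical, and the discriminant score is taken as a precomputed input because the source does not give the function itself. Only the three donor-acceptor pairs and the 0.5 cutoff come from the text.

```python
from dataclasses import dataclass

# Donor-acceptor dinucleotide pairs named in the text.
ALLOWED_PAIRS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
# Score cutoff named in the text; junctions must score strictly above it.
SCORE_CUTOFF = 0.5


@dataclass
class Junction:
    """Hypothetical record for one candidate splice junction."""
    intron_seq: str  # intron sequence, 5'->3' on the transcribed strand
    score: float     # precomputed discriminant score (formula not in the source)


def passes_filter(junction: Junction) -> bool:
    """Return True if the intron has an allowed donor-acceptor pair
    and the junction scores above the cutoff."""
    seq = junction.intron_seq.upper()
    if len(seq) < 4:
        # Too short to carry both a donor and an acceptor dinucleotide.
        return False
    pair = (seq[:2], seq[-2:])
    return pair in ALLOWED_PAIRS and junction.score > SCORE_CUTOFF
```

A caller would apply `passes_filter` to each candidate junction and keep those that return True.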

Because multiple gene models were generated for each locus, a single representative model was algorithmically chosen based on model quality.

In fungi, enzymes that break down chitin (collectively termed chitinases) are believed to have autolytic, nutritional, morphological and mycoparasitic roles.

A total of approximately 150 peptidase sequences were identified in each genome (143 in T. terrestris and 159 in M. thermophila). 

In filamentous ascomycetes the major cell wall polysaccharides include chitin, 1,3-β-glucan, 1,3-β-/1,4-β-glucan, and 1,3-α-glucan.

Multigene families were predicted with the Markov clustering algorithm (MCL) 46, using BLASTp alignment scores between proteins as a similarity metric. 
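
To make the clustering idea concrete, here is a toy pure-Python re-implementation of the Markov clustering loop (expansion followed by inflation on a column-stochastic matrix). It is a sketch, not the MCL program used in the study: the function name `mcl_clusters`, the inflation value of 2.0, the self-loop weighting and the numeric tolerances are all assumptions chosen for illustration, and the input is assumed to be a dictionary of pairwise BLASTp scores.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]


def _matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl_clusters(scores: Dict[Tuple[str, str], float],
                 inflation: float = 2.0,   # assumed value, not from the source
                 max_iter: int = 100,
                 tol: float = 1e-9) -> Set[FrozenSet[str]]:
    """Cluster proteins from pairwise similarity scores with a minimal MCL loop."""
    names = sorted({name for pair in scores for name in pair})
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), s in scores.items():
        i, j = index[a], index[b]
        if i != j:
            # Treat similarity as symmetric; keep the best score per pair.
            m[i][j] = max(m[i][j], s)
            m[j][i] = max(m[j][i], s)
    for j in range(n):
        # Self-loop weighted like the strongest edge (an illustrative choice).
        col_max = max(m[i][j] for i in range(n))
        m[j][j] = col_max if col_max > 0 else 1.0
    m = _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)                      # expansion step
        inflated = _normalize_columns(                # inflation step
            [[v ** inflation for v in row] for row in expanded])
        delta = max((abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n)), default=0.0)
        m = inflated
        if delta < tol:
            break
    clusters: Set[FrozenSet[str]] = set()
    for i in range(n):
        # Each row that retains flow defines a cluster of the columns feeding it.
        members = frozenset(names[j] for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters
```

For real genome-scale data the dedicated MCL software is the appropriate tool; this loop is cubic in the number of proteins and is meant only to show how similarity scores turn into family assignments.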

Forty-six chromatin remodeling proteins were identified in both M. thermophila and T. terrestris, including ten members of the SWI/SNF complex, four condensins, ten SAGA complex factors, six INO80 complex factors and one from the FACT complex 9, 10, 11.

Common strategies for adaptation of the lipid membrane to tolerance of high or low temperatures include changes in sterol content, the ratio of saturated to unsaturated fatty acids, or alterations to fatty acid chain length 18, 19, 20. 

The amplified cDNAs were cloned into the A. niger expression vector using the Gateway recombination method (Invitrogen) and used to transform A. niger.

There are a total of six SOD genes in both C. globosum and T. terrestris, whereas M. thermophila has five, missing the Cu-Zn SOD orthologue that is predicted to be secreted.

The authors also found 20 significantly asymmetric substitutions when comparing the thermophilic pair M. thermophila and T. terrestris.

Another approach to analyze the potential amino acid adaptations in thermophiles is to align closely related thermophilic and mesophilic proteins to detect substitutional asymmetry, i.e., when certain aligned amino acids appear to occur substantially more often in either mesophilic or thermophilic proteins 4, 5, 6. 

For each thermophilic-mesophilic pair (M.t - C.g and T.t - C.g), the authors found 29 and 36 pairs of substitutions, respectively (out of 190), with significant deviation from an expected 1:1 ratio (Bonferroni-corrected chi-square test, P < 10⁻⁵).
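
The asymmetry test can be sketched as below. The 190 pairs correspond to the number of unordered pairs among 20 amino acids, and the chi-square statistic against a 1:1 expectation has one degree of freedom. The function names are hypothetical, and the assumption that the 10⁻⁵ threshold applies to the P value after multiplying by the number of tests is one reading of the source, not something it confirms.

```python
import math

# Number of unordered amino acid pairs: C(20, 2) = 190, matching the text.
N_PAIRS = math.comb(20, 2)


def asymmetry_p_value(forward: int, reverse: int) -> float:
    """Chi-square test (1 degree of freedom) of a 1:1 ratio between
    forward (X -> Y) and reverse (Y -> X) substitution counts."""
    total = forward + reverse
    if total == 0:
        return 1.0  # no observations, no evidence of asymmetry
    # With expected counts of total/2 each, the statistic reduces to this form.
    chi2 = (forward - reverse) ** 2 / total
    # Survival function of the chi-square distribution with 1 d.o.f.
    return math.erfc(math.sqrt(chi2 / 2.0))


def is_asymmetric(forward: int, reverse: int,
                  alpha: float = 1e-5, n_tests: int = N_PAIRS) -> bool:
    """Apply a Bonferroni correction (assumed: p * n_tests) before thresholding."""
    corrected = min(1.0, asymmetry_p_value(forward, reverse) * n_tests)
    return corrected < alpha
```

For instance, `is_asymmetric(300, 100)` evaluates one substitution pair observed 300 times in one direction and 100 times in the other.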

There were approximately 50 additional sequences in each genome identified as peptidases only by INTERPROSCAN and not by MEROPS batch BLAST. 

Clusters may have more than one consensus sequence for several reasons: the clone has a long insert, the clones are splice variants, or the consensus sequences were erroneously left unassembled.

The lower number of transporters encoded by M. thermophila compared with T. terrestris is largely due to decreased numbers of paralogues in large transporter families, for example, 86 M. thermophila MFS transporters compared to 221 members in T. terrestris. 

In contrast, only a quarter of serine peptidases, a fifth of metallo-peptidases and a tenth of cysteine peptidases had signal sequences. 

Because none of the Trichoderma species are thermophilic and because many asymmetric substitutions coincide in the two analyses, the authors conclude that most of the asymmetric substitutions are probably not related to high temperature adaptability.

There are fewer than half as many ATP Binding Cassette (ABC) Superfamily transporters encoded by the M. thermophila genome (19 versus 44).