Journal ArticleDOI

MycoCosm portal: gearing up for 1000 fungal genomes

TL;DR: MycoCosm is a fungal genomics portal developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other 'omics' data by providing interactive web-based tools.
Abstract: MycoCosm is a fungal genomics portal (http://jgi.doe.gov/fungi), developed by the US Department of Energy Joint Genome Institute to support integration, analysis and dissemination of fungal genome sequences and other 'omics' data by providing interactive web-based tools. MycoCosm also promotes and facilitates user community participation through the nomination of new species of fungi for sequencing, and the annotation and analysis of resulting data. By efficiently filling gaps in the Fungal Tree of Life, MycoCosm will help address important problems associated with energy and the environment, taking advantage of growing fungal genomics resources.


Citations
Journal ArticleDOI
TL;DR: Large-scale molecular surveys have provided novel insights into the diversity, spatial and temporal dynamics of mycorrhizal fungal communities, and network theory makes it possible to analyze interactions between plant-fungal partners as complex underground multi-species networks.
Abstract: Almost all land plants form symbiotic associations with mycorrhizal fungi. These below-ground fungi play a key role in terrestrial ecosystems as they regulate nutrient and carbon cycles, and influence soil structure and ecosystem multifunctionality. Up to 80% of plant N and P is provided by mycorrhizal fungi and many plant species depend on these symbionts for growth and survival. Estimates suggest that there are c. 50 000 fungal species that form mycorrhizal associations with c. 250 000 plant species. The development of high-throughput molecular tools has helped us to better understand the biology, evolution, and biodiversity of mycorrhizal associations. Nuclear genome assemblies and gene annotations of 33 mycorrhizal fungal species are now available providing fascinating opportunities to deepen our understanding of the mycorrhizal lifestyle, the metabolic capabilities of these plant symbionts, the molecular dialogue between symbionts, and evolutionary adaptations across a range of mycorrhizal associations. Large-scale molecular surveys have provided novel insights into the diversity, spatial and temporal dynamics of mycorrhizal fungal communities. At the ecological level, network theory makes it possible to analyze interactions between plant-fungal partners as complex underground multi-species networks. Our analysis suggests that nestedness, modularity and specificity of mycorrhizal networks vary and depend on mycorrhizal type. Mechanistic models explaining partner choice, resource exchange, and coevolution in mycorrhizal associations have been developed and are being tested. This review ends with major frontiers for further research.

1,223 citations


Cites background from "MycoCosm portal: gearing up for 100..."

  • ...Genome sequences are now available for several mycorrhizal fungi and are valuable for resolving long-standing issues about their Genome sequences and annotations can be assessed through the JGI MycoCosm portal (http://genome.jgi-psf.org/programs/fungi/index.jsf; Grigoriev et al., 2014)....

    [...]

Journal ArticleDOI
TL;DR: Convergent evolution of the mycorrhizal habit in fungi occurred via the repeated evolution of a 'symbiosis toolkit', with reduced numbers of PCWDEs and lineage-specific suites of mycorrhiza-induced genes.
Abstract: To elucidate the genetic bases of mycorrhizal lifestyle evolution, we sequenced new fungal genomes, including 13 ectomycorrhizal (ECM), orchid (ORM) and ericoid (ERM) species, and five saprotrophs, which we analyzed along with other fungal genomes. Ectomycorrhizal fungi have a reduced complement of genes encoding plant cell wall-degrading enzymes (PCWDEs), as compared to their ancestral wood decayers. Nevertheless, they have retained a unique array of PCWDEs, thus suggesting that they possess diverse abilities to decompose lignocellulose. Similar functional categories of nonorthologous genes are induced in symbiosis. Of induced genes, 7-38% are orphan genes, including genes that encode secreted effector-like proteins. Convergent evolution of the mycorrhizal habit in fungi occurred via the repeated evolution of a 'symbiosis toolkit', with reduced numbers of PCWDEs and lineage-specific suites of mycorrhiza-induced genes.

799 citations

Journal ArticleDOI
TL;DR: The results indicate that the prevailing paradigm of white rot vs. brown rot does not capture the diversity of fungal wood decay mechanisms, and suggest a continuum rather than a dichotomy between the white-rot and brown-rot modes of wood decay.
Abstract: Basidiomycota (basidiomycetes) make up 32% of the described fungi and include most wood-decaying species, as well as pathogens and mutualistic symbionts. Wood-decaying basidiomycetes have typically been classified as either white rot or brown rot, based on the ability (in white rot only) to degrade lignin along with cellulose and hemicellulose. Prior genomic comparisons suggested that the two decay modes can be distinguished based on the presence or absence of ligninolytic class II peroxidases (PODs), as well as the abundance of enzymes acting directly on crystalline cellulose (reduced in brown rot). To assess the generality of the white-rot/brown-rot classification paradigm, we compared the genomes of 33 basidiomycetes, including four newly sequenced wood decayers, and performed phylogenetically informed principal-components analysis (PCA) of a broad range of gene families encoding plant biomass-degrading enzymes. The newly sequenced Botryobasidium botryosum and Jaapia argillacea genomes lack PODs but possess diverse enzymes acting on crystalline cellulose, and they group close to the model white-rot species Phanerochaete chrysosporium in the PCA. Furthermore, laboratory assays showed that both B. botryosum and J. argillacea can degrade all polymeric components of woody plant cell walls, a characteristic of white rot. We also found expansions in reducing polyketide synthase genes specific to the brown-rot fungi. Our results suggest a continuum rather than a dichotomy between the white-rot and brown-rot modes of wood decay. A more nuanced categorization of rot types is needed, based on an improved understanding of the genomics and biochemistry of wood decay.

588 citations


Cites background from "MycoCosm portal: gearing up for 100..."

  • ...Genome assemblies and annotations for the organisms used in this study are available via the JGI Genome Portal MycoCosm (http://jgi.doe.gov/fungi; see also Table S1)....

    [...]

  • ...Grigoriev IV, et al. (2014) MycoCosm portal: Gearing up for 1000 fungal genomes....

    [...]

  • ...All genomes were annotated using the JGI Annotation Pipeline (77), which combines several gene prediction and annotation methods with transcriptomics data, and integrates the annotated genomes into MycoCosm (78), a Web-based fungal resource for comparative analysis....

    [...]

Journal ArticleDOI
TL;DR: The gut mycobiome of the Human Microbiome Project (HMP) cohort was investigated by sequencing the Internal Transcribed Spacer 2 (ITS2) region as well as the 18S rRNA gene; ITS2 sequencing gave greater resolution of mycobiome membership, suggesting that it is a more sensitive method for studying the mycobiome of stool samples.
Abstract: Most studies describing the human gut microbiome in healthy and diseased states have emphasized the bacterial component, but the fungal microbiome (i.e., the mycobiome) is beginning to gain recognition as a fundamental part of our microbiome. To date, human gut mycobiome studies have primarily been disease centric or in small cohorts of healthy individuals. To contribute to existing knowledge of the human mycobiome, we investigated the gut mycobiome of the Human Microbiome Project (HMP) cohort by sequencing the Internal Transcribed Spacer 2 (ITS2) region as well as the 18S rRNA gene. Three hundred seventeen HMP stool samples were analyzed by ITS2 sequencing. Fecal fungal diversity was significantly lower in comparison to bacterial diversity. Yeast dominated the samples, comprising eight of the top 15 most abundant genera. Specifically, fungal communities were characterized by a high prevalence of Saccharomyces, Malassezia, and Candida, with S. cerevisiae, M. restricta, and C. albicans operational taxonomic units (OTUs) present in 96.8, 88.3, and 80.8% of samples, respectively. There was a high degree of inter- and intra-volunteer variability in fungal communities. However, S. cerevisiae, M. restricta, and C. albicans OTUs were found in 92.2, 78.3, and 63.6% of volunteers, respectively, in all samples donated over an approximately 1-year period. Metagenomic and 18S rRNA gene sequencing data agreed with ITS2 results; however, ITS2 sequencing provided greater resolution of the relatively low abundance mycobiome constituents. Compared to bacterial communities, the human gut mycobiome is low in diversity and dominated by yeast including Saccharomyces, Malassezia, and Candida. Both inter- and intra-volunteer variability in the HMP cohort were high, revealing that unlike bacterial communities, an individual’s mycobiome is no more similar to itself over time than to another person’s. Nonetheless, several fungal species persisted across a majority of samples, evidence that a core gut mycobiome may exist. ITS2 sequencing data provided greater resolution of the mycobiome membership compared to metagenomic and 18S rRNA gene sequencing data, suggesting that it is a more sensitive method for studying the mycobiome of stool samples.

558 citations


Cites background from "MycoCosm portal: gearing up for 100..."

  • ...Finally, availability of fungal genomes is also lacking compared to bacteria, though there are efforts underway to change this [49]....

    [...]

Journal ArticleDOI
TL;DR: Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with the MMseqs2 software for fast and sensitive protein sequence searching and clustering.
Abstract: We present three clustered protein sequence databases, Uniclust90, Uniclust50, Uniclust30 and three databases of multiple sequence alignments (MSAs), Uniboost10, Uniboost20 and Uniboost30, as a resource for protein sequence analysis, function prediction and sequence searches. The Uniclust databases cluster UniProtKB sequences at the level of 90%, 50% and 30% pairwise sequence identity. Uniclust90 and Uniclust50 clusters showed better consistency of functional annotation than those of UniRef90 and UniRef50, owing to an optimised clustering pipeline that runs with our MMseqs2 software for fast and sensitive protein sequence searching and clustering. Uniclust sequences are annotated with matches to Pfam, SCOP domains, and proteins in the PDB, using our HHblits homology detection tool. Due to its high sensitivity, Uniclust contains 17% more Pfam domain annotations than UniProt. Uniboost MSAs of three diversities are built by enriching the Uniclust30 MSAs with local sequence matches from MMseqs2 profile searches through Uniclust30. All databases can be downloaded from the Uniclust server at uniclust.mmseqs.com. Users can search clusters by keywords and explore their MSAs, taxonomic representation, and annotations. Uniclust is updated every two months with the new UniProt release.

469 citations


Cites methods from "MycoCosm portal: gearing up for 100..."

  • ...gz: archive containing three files with Pfam, SCOP, and PDB annotations, each formatted as tab-separated lists with nine columns: (1,2) identifiers for query and target, (3-5, 6-8) domain start and end-position and total sequence length for both UniProt and database sequence, (9) HHblits E-value....

    [...]
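The nine-column, tab-separated layout described in the snippet above can be read with a few lines of Python. This is a minimal sketch, assuming the column order given in the snippet (query and target identifiers, then start, end and total length for the UniProt sequence, the same three values for the database sequence, then the HHblits E-value). The class name `DomainHit`, the function name `parse_annotation_lines` and the sample identifiers are hypothetical and are not part of the Uniclust distribution.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class DomainHit:
    """One row of a nine-column annotation file (field names are illustrative)."""
    query_id: str
    target_id: str
    query_start: int
    query_end: int
    query_length: int
    target_start: int
    target_end: int
    target_length: int
    evalue: float


def parse_annotation_lines(lines: Iterable[str]) -> Iterator[DomainHit]:
    """Yield one DomainHit per non-empty, tab-separated line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) != 9:
            raise ValueError(
                f"line {line_number}: expected 9 columns, found {len(fields)}"
            )
        # Columns 3-8 are integer coordinates and lengths; column 9 is the E-value.
        q_start, q_end, q_len, t_start, t_end, t_len = (int(x) for x in fields[2:8])
        yield DomainHit(
            fields[0], fields[1],
            q_start, q_end, q_len,
            t_start, t_end, t_len,
            float(fields[8]),
        )


# Hypothetical example row; the identifiers and numbers are made up for illustration.
sample = ["Q_EXAMPLE\tT_EXAMPLE\t10\t120\t300\t1\t110\t115\t1e-30\n"]
hits = list(parse_annotation_lines(sample))
```

Raising on a wrong column count, rather than skipping the row, makes a mismatch between the assumed and the actual layout visible immediately.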

References
Journal ArticleDOI
TL;DR: The definition and use of family-specific, manually curated gathering thresholds are explained and some of the features of domains of unknown function (also known as DUFs) are discussed, which constitute a rapidly growing class of families within Pfam.
Abstract: Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11,912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

14,075 citations

Journal ArticleDOI
TL;DR: A new membrane protein topology prediction method, TMHMM, based on a hidden Markov model is described and validated, and it is discovered that proteins with N(in)-C(in) topologies are strongly preferred in all examined organisms, except Caenorhabditis elegans, where the large number of 7TM receptors increases the counts for N(out)-C(in) topologies.

11,453 citations


"MycoCosm portal: gearing up for 100..." refers methods in this paper

  • ...SignalP (14) is used to detect the sequence motifs responsible for protein localization, TMHMM (15) identifies possible transmembrane domains and InterProScan (16) predicts functional domains from Pfam (17) and other databases....

    [...]

Journal ArticleDOI
TL;DR: Pfam is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0; since the last update article 2 years earlier, 1182 new families have been generated while sequence coverage of UniProtKB was maintained at nearly 80%.
Abstract: Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

9,415 citations


"MycoCosm portal: gearing up for 100..." refers methods in this paper

  • ...SignalP (14) is used to detect the sequence motifs responsible for protein localization, TMHMM (15) identifies possible transmembrane domains and InterProScan (16) predicts functional domains from Pfam (17) and other databases....

    [...]

Journal ArticleDOI
TL;DR: SignalP 4.0 was the best signal-peptide predictor for all three organism types but was not in all cases as good as SignalP 3.0 according to cleavage-site sensitivity or signal-peptide correlation when there are no transmembrane proteins present.
Abstract: We benchmarked SignalP 4.0 against SignalP 3.0 and ten other signal peptide prediction algorithms (Fig. 1). We compared prediction performance using the Matthews correlation coefficient (16), for which each sequence was counted as a true or false positive or negative. To test SignalP 4.0 performance, we did not use data that had been used in training the networks or selecting the optimal architecture, and the test data did not contain homologs to the training and optimization data (Supplementary Methods). The test set for SignalP 3.0 was also independent of the training set because we removed sequences used to construct SignalP 3.0 and their homologs from the benchmark data. For other algorithms more recent than SignalP 3.0, the benchmark data may include data used to train the methods, possibly leading to slight overestimations of their performance. Our results show that SignalP 4.0 was the best signal-peptide predictor for all three organism types (Fig. 1). This comes at a price, however, because SignalP 4.0 was not in all cases as good as SignalP 3.0 according to cleavage-site sensitivity or signal-peptide correlation when there are no transmembrane proteins present (Supplementary Results). An ideal method would have the best...
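
The Matthews correlation coefficient named in this abstract is computed from the four counts it mentions (true and false positives, true and false negatives). A minimal sketch in Python, assuming a binary classification such as signal peptide versus no signal peptide; the function name and the convention of returning 0.0 when the denominator is zero are illustrative choices, not taken from the SignalP paper:

```python
import math


def matthews_corrcoef(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: report 0.0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Illustrative counts only, not results from the benchmark.
perfect = matthews_corrcoef(tp=50, fp=0, tn=50, fn=0)
inverted = matthews_corrcoef(tp=0, fp=50, tn=0, fn=50)
chance = matthews_corrcoef(tp=25, fp=25, tn=25, fn=25)
```

The coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to +1 (every prediction right), which makes it usable when positive and negative classes differ in size.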

8,370 citations


"MycoCosm portal: gearing up for 100..." refers methods in this paper

  • ...SignalP (14) is used to detect the sequence motifs responsible for protein localization, TMHMM (15) identifies possible transmembrane domains and InterProScan (16) predicts functional domains from Pfam (17) and other databases....

    [...]

Journal ArticleDOI
TL;DR: This article reports KEGG Mapper, a collection of tools for KEGG PATHWAY, BRITE and MODULE mapping that enables integration and interpretation of large-scale data sets, and describes recent enhancements to the KEGG content, especially the incorporation of disease and drug information used in practice and in society, to support translational bioinformatics.
Abstract: Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/ or http://www.kegg.jp/) is a database resource that integrates genomic, chemical and systemic functional information. In particular, gene catalogs from completely sequenced genomes are linked to higher-level systemic functions of the cell, the organism and the ecosystem. Major efforts have been undertaken to manually create a knowledge base for such systemic functions by capturing and organizing experimental knowledge in computable forms; namely, in the forms of KEGG pathway maps, BRITE functional hierarchies and KEGG modules. Continuous efforts have also been made to develop and improve the cross-species annotation procedure for linking genomes to the molecular networks through the KEGG Orthology system. Here we report KEGG Mapper, a collection of tools for KEGG PATHWAY, BRITE and MODULE mapping, enabling integration and interpretation of large-scale data sets. We also report a variant of the KEGG mapping procedure to extend the knowledge base, where different types of data and knowledge, such as disease genes and drug targets, are integrated as part of the KEGG molecular networks. Finally, we describe recent enhancements to the KEGG content, especially the incorporation of disease and drug information used in practice and in society, to support translational bioinformatics.

4,259 citations


"MycoCosm portal: gearing up for 100..." refers background in this paper

  • ...Interpro, KEGG and Swiss-Prot hits are used to map gene ontology (GO) terms (21)....

    [...]

  • ...Protein alignments to the NCBI’s nonredundant (http://www.ncbi.nlm.nih.gov), Swiss-Prot (18), KEGG (19) and KOG (20) databases additionally facilitate functional interpretation....

    [...]

  • ...Additionally, summaries listing numbers of genes by category in the GO, KEGG and KOG classifications are accessible from the portal menu, and can be compared with other selected genomes to explore gene family expansions and contractions across genomes....

    [...]

  • ...Mycocosm therefore includes tools that integrate single genomes into a comparative context, such as the ability to visualize variation in gene counts in different GO, KEGG and KOG categories across a user-selected assortment of genomes....

    [...]