
Journal ArticleDOI

antiSMASH 5.0: updates to the secondary metabolite genome mining pipeline

02 Jul 2019-Nucleic Acids Research (Oxford University Press)-Vol. 47

TL;DR: antiSMASH 5 adds detection rules for clusters encoding the biosynthesis of acyl-amino acids, β-lactones, fungal RiPPs, RaS-RiPPs, polybrominated diphenyl ethers, C-nucleosides, PPY-like ketones and lipolanthines and provides more detailed predictions for type II polyketide synthase-encoding gene clusters.

Abstract: Secondary metabolites produced by bacteria and fungi are an important source of antimicrobials and other bioactive compounds. In recent years, genome mining has seen broad applications in identifying and characterizing new compounds as well as in metabolic engineering. Since 2011, the 'antibiotics and secondary metabolite analysis shell-antiSMASH' (https://antismash.secondarymetabolites.org) has assisted researchers in this, both as a web server and a standalone tool. It has established itself as the most widely used tool for identifying and analysing biosynthetic gene clusters (BGCs) in bacterial and fungal genome sequences. Here, we present an entirely redesigned and extended version 5 of antiSMASH. antiSMASH 5 adds detection rules for clusters encoding the biosynthesis of acyl-amino acids, β-lactones, fungal RiPPs, RaS-RiPPs, polybrominated diphenyl ethers, C-nucleosides, PPY-like ketones and lipolanthines. For type II polyketide synthase-encoding gene clusters, antiSMASH 5 now offers more detailed predictions. The HTML output visualization has been redesigned to improve the navigation and visual representation of annotations. We have again improved the runtime of analysis steps, making it possible to deliver comprehensive annotations for bacterial genomes within a few minutes. A new output file in the standard JavaScript object notation (JSON) format is aimed at downstream tools that process antiSMASH results programmatically.

Topics: Fungal genetics (53%)


Citations

Journal ArticleDOI
TL;DR: It is argued that the future of antibiotic discovery looks bright as new technologies such as genome mining and editing are deployed to discover new natural products with diverse bioactivities.
Abstract: The first antibiotic, salvarsan, was deployed in 1910. In just over 100 years antibiotics have drastically changed modern medicine and extended the average human lifespan by 23 years. The discovery of penicillin in 1928 started the golden age of natural product antibiotic discovery that peaked in the mid-1950s. Since then, a gradual decline in antibiotic discovery and development and the evolution of drug resistance in many human pathogens has led to the current antimicrobial resistance crisis. Here we give an overview of the history of antibiotic discovery, the major classes of antibiotics and where they come from. We argue that the future of antibiotic discovery looks bright as new technologies such as genome mining and editing are deployed to discover new natural products with diverse bioactivities. We also report on the current state of antibiotic development, with 45 drugs currently going through the clinical trials pipeline, including several new classes with novel modes of action that are in phase 3 clinical trials. Overall, there are promising signs for antibiotic discovery, but changes in financial models are required to translate scientific advances into clinically approved antibiotics.

196 citations


Journal ArticleDOI
TL;DR: MIBiG 2.0 is presented, which encompasses major updates to the schema, the data, and the online repository itself, and improves the user experience by adding new features such as query searches and a statistics page, and enabled direct link-outs to chemical structure databases.
Abstract: Fueled by the explosion of (meta)genomic data, genome mining of specialized metabolites has become a major technology for drug discovery and studying microbiome ecology. In these efforts, computational tools like antiSMASH have played a central role through the analysis of Biosynthetic Gene Clusters (BGCs). Thousands of candidate BGCs from microbial genomes have been identified and stored in public databases. Interpreting the function and novelty of these predicted BGCs requires comparison with a well-documented set of BGCs of known function. The MIBiG (Minimum Information about a Biosynthetic Gene Cluster) Data Standard and Repository was established in 2015 to enable curation and storage of known BGCs. Here, we present MIBiG 2.0, which encompasses major updates to the schema, the data, and the online repository itself. Over the past five years, 851 new BGCs have been added. Additionally, we performed extensive manual data curation of all entries to improve the annotation quality of our repository. We also redesigned the data schema to ensure the compliance of future annotations. Finally, we improved the user experience by adding new features such as query searches and a statistics page, and enabled direct link-outs to chemical structure databases. The repository is accessible online at https://mibig.secondarymetabolites.org/.

186 citations


Journal ArticleDOI
TL;DR: The utility of this collection of >10,000 metagenomes collected from diverse habitats covering all of Earth’s continents and oceans is demonstrated for understanding secondary-metabolite biosynthetic potential and for resolving thousands of new host linkages to uncultivated viruses.
Abstract: The reconstruction of bacterial and archaeal genomes from shotgun metagenomes has enabled insights into the ecology and evolution of environmental and host-associated microbiomes. Here we applied this approach to >10,000 metagenomes collected from diverse habitats covering all of Earth’s continents and oceans, including metagenomes from human and animal hosts, engineered environments, and natural and agricultural soils, to capture extant microbial, metabolic and functional potential. This comprehensive catalog includes 52,515 metagenome-assembled genomes representing 12,556 novel candidate species-level operational taxonomic units spanning 135 phyla. The catalog expands the known phylogenetic diversity of bacteria and archaea by 44% and is broadly available for streamlined comparative analyses, interactive exploration, metabolic modeling and bulk download. We demonstrate the utility of this collection for understanding secondary-metabolite biosynthetic potential and for resolving thousands of new host linkages to uncultivated viruses. This resource underscores the value of genome-centric approaches for revealing genomic properties of uncultivated microorganisms that affect ecosystem processes.

111 citations


Journal ArticleDOI
TL;DR: The review discusses the new classes of RiPPs that have been discovered, the advances in the understanding of the installation of both primary and secondary post-translational modifications, and the mechanisms by which the enzymes recognize the leader peptides in their substrates.
Abstract: Covering: up to June 2020. Ribosomally-synthesized and post-translationally modified peptides (RiPPs) are a large group of natural products. A community-driven review in 2013 described the emerging commonalities in the biosynthesis of RiPPs and the opportunities they offered for bioengineering and genome mining. Since then, the field has seen tremendous advances in understanding of the mechanisms by which nature assembles these compounds, in engineering their biosynthetic machinery for a wide range of applications, and in the discovery of entirely new RiPP families using bioinformatic tools developed specifically for this compound class. The First International Conference on RiPPs was held in 2019, and the meeting participants assembled the current review describing new developments since 2013. The review discusses the new classes of RiPPs that have been discovered, the advances in our understanding of the installation of both primary and secondary post-translational modifications, and the mechanisms by which the enzymes recognize the leader peptides in their substrates. In addition, genome mining tools used for RiPP discovery are discussed as well as various strategies for RiPP engineering. An outlook section presents directions for future research.

105 citations


Journal ArticleDOI
Abstract: Many microorganisms produce natural products that form the basis of antimicrobials, antivirals, and other drugs. Genome mining is routinely used to complement screening-based workflows to discover novel natural products. Since 2011, the "antibiotics and secondary metabolite analysis shell-antiSMASH" (https://antismash.secondarymetabolites.org/) has supported researchers in their microbial genome mining tasks, both as a free-to-use web server and as a standalone tool under an OSI-approved open-source license. It is currently the most widely used tool for detecting and characterising biosynthetic gene clusters (BGCs) in bacteria and fungi. Here, we present the updated version 6 of antiSMASH. antiSMASH 6 increases the number of supported cluster types from 58 to 71, displays the modular structure of multi-modular BGCs, adds a new BGC comparison algorithm, allows for the integration of results from other prediction tools, and more effectively detects tailoring enzymes in RiPP clusters.

90 citations


References

Journal ArticleDOI
TL;DR: Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set, and the facility to view the relationship between families within a clan has been improved by the introduction of a new tool.
Abstract: In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteomes sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.

4,236 citations


"antiSMASH 5.0: updates to the secon..." refers methods in this paper

  • ...To facilitate these and other GO-based analyses, antiSMASH 5 includes an option to automatically annotate GO terms on Pfam domains....

    [...]

  • ...These are based on identifying co-occurring conserved core enzymes in the genome using HMM-profiles that were derived from Pfam (20), SMART (21), BAGEL (22) or Yadav et al. (23), or that were created specifically for antiSMASH....

    [...]

  • ...As antiSMASH can automatically annotate Pfam domains, the GO annotation functionality makes use of the Pfam to GO mapping supplied by the Gene Ontology Consortium’s website (37)....

    [...]

  • ...If the ID of a predicted Pfam domain in an antiSMASH record is present in the Pfam to GO mapping, the respective GO terms are assigned and presented in the ‘gene details’ panel....

    [...]
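The Pfam-to-GO lookup described in the excerpts above can be sketched roughly as follows. This is a minimal illustration, not antiSMASH's own code: it assumes the mapping file uses the usual `pfam2go` line layout (`Pfam:<ID> <name> > GO:<term name> ; GO:<term ID>`, with comment lines starting with `!`), and the function names `parse_pfam2go` and `annotate` are hypothetical. The sample lines only demonstrate the format. Stripping a version suffix such as `.21` from a domain ID is likewise an assumption about how IDs may appear in a record.

```python
from collections import defaultdict

# Illustrative lines in the assumed pfam2go layout (format demonstration only).
SAMPLE = [
    "!comment lines start with an exclamation mark",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor activity ; GO:0004930",
    "Pfam:PF00001 7tm_1 > GO:G protein-coupled receptor signaling pathway ; GO:0007186",
]


def parse_pfam2go(lines):
    """Build {pfam_id: [(go_id, go_name), ...]} from pfam2go-style lines."""
    mapping = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue  # skip blanks and header comments
        left, _, right = line.partition(" > ")
        if not right:
            continue  # not a mapping line
        token = left.split()[0]
        pfam_id = token[5:] if token.startswith("Pfam:") else token
        name, _, go_id = right.rpartition(" ; ")
        if name.startswith("GO:"):
            name = name[3:]
        mapping[pfam_id].append((go_id.strip(), name.strip()))
    return dict(mapping)


def annotate(domain_ids, mapping):
    """Return GO terms for each domain ID that is present in the mapping."""
    result = {}
    for domain_id in domain_ids:
        base_id = domain_id.split(".")[0]  # assumed: drop a version suffix
        if base_id in mapping:
            result[domain_id] = mapping[base_id]
    return result


if __name__ == "__main__":
    go_map = parse_pfam2go(SAMPLE)
    for domain, terms in annotate(["PF00001.21", "PF99999"], go_map).items():
        print(domain, terms)
```

Domains without an entry in the mapping are simply left unannotated, which mirrors the "if the ID is present" condition in the excerpt.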


Journal ArticleDOI
TL;DR: This contribution is a completely updated and expanded version of the four prior analogous reviews that were published in this journal in 1997, 2003, 2007, and 2012, and the time frame has been extended to cover the 34 years from January 1, 1981, to December 31, 2014, for all diseases worldwide, and from 1950 (earliest so far identified) to December 2014 for all approved antitumor drugs worldwide.
Abstract: This contribution is a completely updated and expanded version of the four prior analogous reviews that were published in this journal in 1997, 2003, 2007, and 2012. In the case of all approved therapeutic agents, the time frame has been extended to cover the 34 years from January 1, 1981, to December 31, 2014, for all diseases worldwide, and from 1950 (earliest so far identified) to December 2014 for all approved antitumor drugs worldwide. As mentioned in the 2012 review, we have continued to utilize our secondary subdivision of a “natural product mimic”, or “NM”, to join the original primary divisions and the designation “natural product botanical”, or “NB”, to cover those botanical “defined mixtures” now recognized as drug entities by the U.S. FDA (and similar organizations). From the data presented in this review, the utilization of natural products and/or their novel structures, in order to discover and develop the final drug entity, is still alive and well. For example, in the area of cancer, over t...

3,538 citations


Journal ArticleDOI
Abstract: Microbial secondary metabolism constitutes a rich source of antibiotics, chemotherapeutics, insecticides and other high-value chemicals. Genome mining of gene clusters that encode the biosynthetic pathways for these metabolites has become a key methodology for novel compound discovery. In 2011, we introduced antiSMASH, a web server and stand-alone tool for the automatic genomic identification and analysis of biosynthetic gene clusters, available at http://antismash.secondarymetabolites.org. Here, we present version 3.0 of antiSMASH, which has undergone major improvements. A full integration of the recently published ClusterFinder algorithm now allows using this probabilistic algorithm to detect putative gene clusters of unknown types. Also, a new dereplication variant of the ClusterBlast module now identifies similarities of identified clusters to any of 1172 clusters with known end products. At the enzyme level, active sites of key biosynthetic enzymes are now pinpointed through a curated pattern-matching procedure and Enzyme Commission numbers are assigned to functionally classify all enzyme-coding genes. Additionally, chemical structure prediction has been improved by incorporating polyketide reduction states. Finally, in order for users to be able to organize and analyze multiple antiSMASH outputs in a private setting, a new XML output module allows offline editing of antiSMASH annotations within the Geneious software.

1,575 citations


Journal ArticleDOI
TL;DR: The current contents of the GO knowledgebase are summarized, and several new features and improvements to the ontology, the annotations and the tools are presented, including extensions to the resource and increasing support for descriptions of causal models of biological systems and network biology.
Abstract: The Gene Ontology (GO) is a comprehensive resource of computable knowledge regarding the functions of genes and gene products. As such, it is extensively used by the biomedical research community for the analysis of -omics and related data. Our continued focus is on improving the quality and utility of the GO resources, and we welcome and encourage input from researchers in all areas of biology. In this update, we summarize the current contents of the GO knowledgebase, and present several new features and improvements that have been made to the ontology, the annotations and the tools. Among the highlights are 1) developments that facilitate access to, and application of, the GO knowledgebase, and 2) extensions to the resource as well as increasing support for descriptions of causal models of biological systems and network biology. To learn more, visit http://geneontology.org/.

1,356 citations


Journal ArticleDOI
TL;DR: This work presents the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view.
Abstract: Bacterial and fungal secondary metabolism is a rich source of novel bioactive compounds with potential pharmaceutical applications as antibiotics, anti-tumor drugs or cholesterol-lowering drugs. To find new drug candidates, microbiologists are increasingly relying on sequencing genomes of a wide variety of microbes. However, rapidly and reliably pinpointing all the potential gene clusters for secondary metabolites in dozens of newly sequenced genomes has been extremely challenging, due to their biochemical heterogeneity, the presence of unknown enzymes and the dispersed nature of the necessary specialized bioinformatics tools and resources. Here, we present antiSMASH (antibiotics & Secondary Metabolite Analysis Shell), the first comprehensive pipeline capable of identifying biosynthetic loci covering the whole range of known secondary metabolite compound classes (polyketides, non-ribosomal peptides, terpenes, aminoglycosides, aminocoumarins, indolocarbazoles, lantibiotics, bacteriocins, nucleosides, beta-lactams, butyrolactones, siderophores, melanins and others). It aligns the identified regions at the gene cluster level to their nearest relatives from a database containing all other known gene clusters, and integrates or cross-links all previously available secondary-metabolite specific gene analysis methods in one interactive view. antiSMASH is available at http://antismash.secondarymetabolites.org.

1,222 citations


"antiSMASH 5.0: updates to the secon..." refers background or methods in this paper

  • ...Initially released in 2011 (6), it has since been further extended and improved (7–12), and is currently used by thousands of academic and industrial scientists worldwide to identify so called secondary metabolite ‘biosynthetic gene clusters’ (BGCs) in their genomes of interest....

    [...]

  • ...The KnownClusterBlast and ClusterBlast search functions use an algorithm first described in antiSMASH 1 (6), which also is in use in a generalized version in MultiGeneBlast (39)....

    [...]