The confluence of big data and evolutionary genome mining for the discovery of natural products.
TL;DR: This review covers the literature from 2003 to 2021 and highlights examples in which big data and evolutionary analyses have been combined to provide bioinformatic resources and tools for discovering novel natural products and their biosynthetic enzymes.
About: This article was published in Natural Product Reports on 2021-11-17. It has received 21 citations to date.
Citations
01 Jan 2014
TL;DR: By performing a systematic computational analysis of BGC evolution, this work derives evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules.
Abstract: © 2014 Medema et al. Bacterial secondary metabolites are widely used as antibiotics, anticancer drugs, insecticides and food additives. Attempts to engineer their biosynthetic gene clusters (BGCs) to produce unnatural metabolites with improved properties a
117 citations
TL;DR: The authors analyzed ~170,000 bacterial genomes and ~47,000 metagenome assembled genomes using a modified BiG-SLiCE and the new clust-o-matic algorithm.
Abstract: Bacterial specialized metabolites are a proven source of antibiotics and cancer therapies, but whether we have sampled all the secondary metabolite chemical diversity of cultivated bacteria is not known. We analysed ~170,000 bacterial genomes and ~47,000 metagenome assembled genomes (MAGs) using a modified BiG-SLiCE and the new clust-o-matic algorithm. We estimate that only 3% of the natural products potentially encoded in bacterial genomes have been experimentally characterized. We show that the variation in secondary metabolite biosynthetic diversity drops significantly at the genus level, identifying it as an appropriate taxonomic rank for comparison. Equal comparison of genera based on relative evolutionary distance revealed that Streptomyces bacteria encode the largest biosynthetic diversity by far, with Amycolatopsis, Kutzneria and Micromonospora also encoding substantial diversity. Finally, we find that several less-well-studied taxa, such as Weeksellaceae (Bacteroidota), Myxococcaceae (Myxococcota), Pleurocapsa and Nostocaceae (Cyanobacteria), have potential to produce highly diverse sets of secondary metabolites that warrant further investigation.
60 citations
TL;DR: Results from this model community show that bacterial BGC expression and chemical output depend on the identity and biosynthetic capacity of coculture partners, suggesting community composition and microbiome interactions may shape the regulation of secondary metabolism in nature.
Abstract (Significance): Microbial communities have been implicated in human and plant disease and are essential to global biogeochemical cycles. However, our ability to reliably alter these communities is limited by insufficient understanding of the networks that drive community processes. The goal of this study was to understand how community membership alters secondary metabolism in a model microbial community. We found that community species composition affects expression of biosynthetic genes and abundance of metabolites. Dramatic changes were observed when the biosynthetic gene cluster of one metabolite, koreenceine, was deleted, suggesting that interspecies interaction networks may be driven by secondary metabolites. This work offers an approach to dissecting the flow of information through communities, which could lead to strategies for manipulating community function.
11 citations
TL;DR: Genomics-based approaches for prioritizing candidate BGCs extracted from large-scale genomic data are discussed, highlighting studies that have successfully produced compounds with high chemical novelty, novel biosynthetic pathways, and potent bioactivities.
Abstract: Large-scale genome-mining analyses have identified an enormous number of cryptic biosynthetic gene clusters (BGCs) as a great source of novel bioactive natural products. Given the sheer number of natural product (NP) candidates, effective strategies and computational methods are key to choosing appropriate BGCs for further NP characterization and production. This review discusses genomics-based approaches for prioritizing candidate BGCs extracted from large-scale genomic data, highlighting studies that have successfully produced compounds with high chemical novelty, novel biosynthetic pathways, and potent bioactivities. We group these studies by their BGC-prioritization logic: detecting the presence of resistance genes, using phylogenomic analysis as a guide, and targeting specific chemical structures. We also briefly comment on the different bioinformatics tools used in the field and examine practical considerations when undertaking a large-scale genome-mining study.
8 citations
TL;DR: This work considers innovative approaches which have led to prioritization of strain targets and have mitigated rediscovery rates, and discusses integration of principles of comparative evolutionary studies and retrobiosynthetic predictions to better understand biosynthetic mechanistic details and link genome sequence to structure.
Abstract: The pairing of analytical chemistry with genomic techniques represents a new wave in natural product chemistry. With the increasing availability of sequencing and assembly of microbial genomes, it is now possible to interrogate the biosynthetic capability of producers of valuable secondary metabolites. However, without the development of robust, accessible, medium- to high-throughput tools, the bottleneck in pairing metabolic potential with compound isolation will persist. Several innovative approaches have proven useful in the nascent stages of microbial genome-informed drug discovery. Here, we consider a number of these approaches, which have led to prioritization of strain targets and have mitigated rediscovery rates. Likewise, we discuss integration of principles of comparative evolutionary studies and retrobiosynthetic predictions to better understand biosynthetic mechanistic details and link genome sequence to structure. Lastly, we discuss advances in engineering, chemistry, molecular networking, and other computational approaches that are accelerating progress in the field of omic-informed natural product drug discovery. Together, these strategies enhance the synergy between cutting-edge omics, chemical characterization, and computational technologies that pitch the discovery of natural products with pharmaceutical and other potential applications to the crest of the wave, where progress is ripe for rapid advances.
7 citations
References
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.
88,255 citations
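To make the maximal segment pair (MSP) score concrete, the Python sketch below computes it exactly by brute force: it scans every diagonal (every fixed offset between the two sequences) and keeps the best-scoring ungapped run using Kadane's maximum-subarray rule. This is only an illustration of the quantity BLAST targets, not how BLAST computes it; BLAST's speed comes from approximating this score by seeding on short high-scoring word hits and extending them. The function name and the +5/-4 match/mismatch scores are assumptions chosen for illustration.

```python
def msp_score(a: str, b: str, match: int = 5, mismatch: int = -4) -> int:
    """Exact MSP score: the best ungapped local alignment score of a and b.

    Assumed scoring: a simple match/mismatch scheme (no substitution matrix).
    """
    best = 0
    # Offsets from -(len(b) - 1) to len(a) - 1 cover every diagonal.
    for offset in range(-(len(b) - 1), len(a)):
        running = 0
        i = max(offset, 0)   # start position in a
        j = i - offset       # corresponding start position in b
        while i < len(a) and j < len(b):
            step = match if a[i] == b[j] else mismatch
            # Kadane's rule: drop the segment once its score falls below zero.
            running = max(0, running + step)
            best = max(best, running)
            i += 1
            j += 1
    return best


if __name__ == "__main__":
    # A single mismatch is bridged because the flanking matches outweigh it.
    print(msp_score("ACGTACGT", "ACGAACGT"))
```

The exhaustive scan costs time proportional to the product of the two sequence lengths, which is exactly the cost that BLAST's word-seeding heuristic avoids when searching large databases.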
TL;DR: It is proposed that a formal system of organisms be established in which above the level of kingdom there exists a new taxon called a "domain." Life on this planet would be seen as comprising three domains, the Bacteria, the Archaea, and the Eucarya, each containing two or more kingdoms.
Abstract: Molecular structures and sequences are generally more revealing of evolutionary relationships than are classical phenotypes (particularly so among microorganisms). Consequently, the basis for the definition of taxa has progressively shifted from the organismal to the cellular to the molecular level. Molecular comparisons show that life on this planet divides into three primary groupings, commonly known as the eubacteria, the archaebacteria, and the eukaryotes. The three are very dissimilar, the differences that separate them being of a more profound nature than the differences that separate typical kingdoms, such as animals and plants. Unfortunately, neither of the conventionally accepted views of the natural relationships among living systems (i.e., the five-kingdom taxonomy or the eukaryote-prokaryote dichotomy) reflects this primary tripartite division of the living world. To remedy this situation we propose that a formal system of organisms be established in which above the level of kingdom there exists a new taxon called a "domain." Life on this planet would then be seen as comprising three domains, the Bacteria, the Archaea, and the Eucarya, each containing two or more kingdoms. (The Eucarya, for example, contain Animalia, Plantae, Fungi, and a number of others yet to be defined.) Although taxonomic structure within the Bacteria and Eucarya is not treated herein, Archaea is formally subdivided into the two kingdoms Euryarchaeota (encompassing the methanogens and their phenotypically diverse relatives) and Crenarchaeota (comprising the relatively tight clustering of extremely thermophilic archaebacteria, whose general phenotype appears to resemble most the ancestral phenotype of the Archaea).
5,689 citations
TL;DR: The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes that is stable, extensible, and freely available to all researchers.
Abstract: Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics has shifted from generating sequences to analyzing them. High-throughput, low-cost next-generation sequencing has opened metagenomics to a wide range of researchers. A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing them against both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats. The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis – the availability of high-performance computing for annotating the data. URL: http://metagenomics.nmpdr.org
3,322 citations
TL;DR: The short history, specific features and future prospects of research on microbial metabolites, including antibiotics and other bioactive metabolites, are summarized.
Abstract: The short history, specific features and future prospects of research on microbial metabolites, including antibiotics and other bioactive metabolites, are summarized. The microbial origin, the diversity of producing species, the functions and various bioactivities of metabolites, and the unique features of their chemical structures are discussed, mainly on the basis of statistical data. The possible number of metabolites that may be discovered in the future, the problems of dereplicating newly isolated compounds, and new trends and prospects of the research are also discussed.
2,706 citations
Affiliations: University of California, San Diego; University of Montana; Stanford University; Scripps Institution of Oceanography; National Autonomous University of Mexico; Salk Institute for Biological Studies; San Diego State University; Strathclyde Institute of Pharmacy and Biomedical Sciences; Lawrence Berkeley National Laboratory; Harvard University; University of Rennes; University of Minnesota; University of Lorraine; Technical University of Denmark; University of California, Los Angeles; J. Craig Venter Institute; University of Washington; ETH Zurich; University of Illinois at Chicago; National Sun Yat-sen University; Academia Sinica; University of Münster; Victoria University of Wellington; University of North Carolina at Chapel Hill; Indiana University; Smithsonian Tropical Research Institute; Federal University of Mato Grosso do Sul; University of São Paulo; University of Notre Dame; University of California, Santa Cruz; Oregon State University; University of California, Berkeley; Florida International University; University of Hawaii at Manoa; University of Geneva; Institut de Chimie des Substances Naturelles; Pacific Northwest National Laboratory; National Institutes of Health; Chinese Academy of Sciences
TL;DR: In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations and data-driven social-networking should facilitate identification of spectra and foster collaborations.
Abstract: The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry (MS) techniques are well-suited to high-throughput characterization of NP, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social Molecular Networking (GNPS; http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social-networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of 'living data' through continuous reanalysis of deposited data.
2,365 citations
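GNPS organizes MS/MS spectra into molecular networks by linking spectra that are sufficiently similar. The Python sketch below illustrates the idea with a plain cosine score over intensity vectors binned by m/z, then draws an edge between every pair of spectra that meets a threshold. GNPS itself uses a modified cosine score that can also match fragment peaks shifted by the precursor mass difference, so treat this as a simplified stand-in; the function names, the 0.5 m/z bin width and the 0.7 edge threshold are illustrative assumptions, not GNPS parameters.

```python
import math
from collections import defaultdict


def binned_cosine(spec_a, spec_b, bin_width=0.5):
    """Cosine similarity of two spectra given as lists of (mz, intensity).

    Simplified, assumed scoring: peaks are summed into fixed-width m/z bins;
    no precursor-shift matching as in GNPS's modified cosine.
    """
    def to_bins(spec):
        bins = defaultdict(float)
        for mz, intensity in spec:
            bins[int(mz // bin_width)] += intensity
        return bins

    a, b = to_bins(spec_a), to_bins(spec_b)
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def molecular_network(spectra, threshold=0.7):
    """Return (name1, name2, score) edges for spectrum pairs above threshold."""
    edges = []
    names = sorted(spectra)
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = binned_cosine(spectra[x], spectra[y])
            if score >= threshold:
                edges.append((x, y, round(score, 3)))
    return edges


if __name__ == "__main__":
    # Hypothetical toy spectra: s1 and s2 share fragments, s3 does not.
    spectra = {
        "s1": [(100.0, 10.0), (150.2, 5.0)],
        "s2": [(100.1, 10.0), (150.3, 5.0)],
        "s3": [(300.0, 7.0)],
    }
    print(molecular_network(spectra))
```

Fixed bins are the simplest choice but can split two nearly identical peaks across a bin boundary; matching peaks within an m/z tolerance, as spectral-networking tools typically do, avoids that artifact.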