Journal ArticleDOI

The confluence of big data and evolutionary genome mining for the discovery of natural products.

17 Nov 2021 - Natural Product Reports (The Royal Society of Chemistry) - Vol. 38, Iss. 11, pp. 2024-2040
TL;DR: This review covers literature between 2003-2021 and highlights examples where Big Data and evolutionary analyses have been combined to provide bioinformatic resources and tools for the discovery of novel natural products and their biosynthetic enzymes.
About: This article is published in Natural Product Reports. The article was published on 2021-11-17. It has received 21 citations to date.
Citations
01 Jan 2014
TL;DR: By performing a systematic computational analysis of BGC evolution, this work derives evidence for three findings that shed light on the ways in which, despite these constraints, nature successfully invents new molecules.
Abstract: © 2014 Medema et al. Bacterial secondary metabolites are widely used as antibiotics, anticancer drugs, insecticides and food additives. Attempts to engineer their biosynthetic gene clusters (BGCs) to produce unnatural metabolites with improved properties a

117 citations

Journal ArticleDOI
TL;DR: The authors analyzed ~170,000 bacterial genomes and ~47,000 metagenome assembled genomes using a modified BiG-SLiCE and the new clust-o-matic algorithm.
Abstract: Bacterial specialized metabolites are a proven source of antibiotics and cancer therapies, but whether we have sampled all the secondary metabolite chemical diversity of cultivated bacteria is not known. We analysed ~170,000 bacterial genomes and ~47,000 metagenome assembled genomes (MAGs) using a modified BiG-SLiCE and the new clust-o-matic algorithm. We estimate that only 3% of the natural products potentially encoded in bacterial genomes have been experimentally characterized. We show that the variation in secondary metabolite biosynthetic diversity drops significantly at the genus level, identifying it as an appropriate taxonomic rank for comparison. Equal comparison of genera based on relative evolutionary distance revealed that Streptomyces bacteria encode the largest biosynthetic diversity by far, with Amycolatopsis, Kutzneria and Micromonospora also encoding substantial diversity. Finally, we find that several less-well-studied taxa, such as Weeksellaceae (Bacteroidota), Myxococcaceae (Myxococcota), Pleurocapsa and Nostocaceae (Cyanobacteria), have potential to produce highly diverse sets of secondary metabolites that warrant further investigation.

60 citations

Journal ArticleDOI
TL;DR: Results from this model community show that bacterial BGC expression and chemical output depend on the identity and biosynthetic capacity of coculture partners, suggesting community composition and microbiome interactions may shape the regulation of secondary metabolism in nature.
Abstract: Significance: Microbial communities have been implicated in human and plant disease and are essential to global biogeochemical cycles. However, our ability to reliably alter these communities is limited by insufficient understanding of the networks that drive community processes. The goal of this study was to understand how community membership alters secondary metabolism in a model microbial community. We found that community species composition affects expression of biosynthetic genes and abundance of metabolites. Dramatic changes were observed when the biosynthetic gene cluster of one metabolite, koreenceine, was deleted, suggesting that interspecies interaction networks may be driven by secondary metabolites. This work offers an approach to dissecting the flow of information through communities, which could lead to strategies for manipulating community function.

11 citations

Journal ArticleDOI
TL;DR: Genomics-based approaches for prioritizing candidate BGCs extracted from large-scale genomic data are discussed, by highlighting studies that have successfully produced compounds with high chemical novelty, novel biosynthesis pathway, and potent bioactivities.
Abstract: Large-scale genome-mining analyses have identified an enormous number of cryptic biosynthetic gene clusters (BGCs) as a great source of novel bioactive natural products. Given the sheer number of natural product (NP) candidates, effective strategies and computational methods are keys to choosing appropriate BGCs for further NP characterization and production. This review discusses genomics-based approaches for prioritizing candidate BGCs extracted from large-scale genomic data, by highlighting studies that have successfully produced compounds with high chemical novelty, novel biosynthesis pathway, and potent bioactivities. We group these studies based on their BGC-prioritization logics: detecting presence of resistance genes, use of phylogenomics analysis as a guide, and targeting for specific chemical structures. We also briefly comment on the different bioinformatics tools used in the field and examine practical considerations when employing a large-scale genome mining study.

8 citations

Journal ArticleDOI
TL;DR: This work considers innovative approaches which have led to prioritization of strain targets and have mitigated rediscovery rates, and discusses integration of principles of comparative evolutionary studies and retrobiosynthetic predictions to better understand biosynthetic mechanistic details and link genome sequence to structure.
Abstract: The pairing of analytical chemistry with genomic techniques represents a new wave in natural product chemistry. With an increase in the availability of sequencing and assembly of microbial genomes, interrogation into the biosynthetic capability of producers with valuable secondary metabolites is possible. However, without the development of robust, accessible, and medium to high throughput tools, the bottleneck in pairing metabolic potential and compound isolation will continue. Several innovative approaches have proven useful in the nascent stages of microbial genome-informed drug discovery. Here, we consider a number of these approaches which have led to prioritization of strain targets and have mitigated rediscovery rates. Likewise, we discuss integration of principles of comparative evolutionary studies and retrobiosynthetic predictions to better understand biosynthetic mechanistic details and link genome sequence to structure. Lastly, we discuss advances in engineering, chemistry, and molecular networking and other computational approaches that are accelerating progress in the field of omic-informed natural product drug discovery. Together, these strategies enhance the synergy between cutting edge omics, chemical characterization, and computational technologies that pitch the discovery of natural products with pharmaceutical and other potential applications to the crest of the wave where progress is ripe for rapid advances.

7 citations

References
Journal ArticleDOI
TL;DR: A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score.

88,255 citations

Journal ArticleDOI
TL;DR: It is proposed that a formal system of organisms be established in which above the level of kingdom there exists a new taxon called a "domain." Life on this planet would be seen as comprising three domains, the Bacteria, the Archaea, and the Eucarya, each containing two or more kingdoms.
Abstract: Molecular structures and sequences are generally more revealing of evolutionary relationships than are classical phenotypes (particularly so among microorganisms). Consequently, the basis for the definition of taxa has progressively shifted from the organismal to the cellular to the molecular level. Molecular comparisons show that life on this planet divides into three primary groupings, commonly known as the eubacteria, the archaebacteria, and the eukaryotes. The three are very dissimilar, the differences that separate them being of a more profound nature than the differences that separate typical kingdoms, such as animals and plants. Unfortunately, neither of the conventionally accepted views of the natural relationships among living systems--i.e., the five-kingdom taxonomy or the eukaryote-prokaryote dichotomy--reflects this primary tripartite division of the living world. To remedy this situation we propose that a formal system of organisms be established in which above the level of kingdom there exists a new taxon called a "domain." Life on this planet would then be seen as comprising three domains, the Bacteria, the Archaea, and the Eucarya, each containing two or more kingdoms. (The Eucarya, for example, contain Animalia, Plantae, Fungi, and a number of others yet to be defined). Although taxonomic structure within the Bacteria and Eucarya is not treated herein, Archaea is formally subdivided into the two kingdoms Euryarchaeota (encompassing the methanogens and their phenotypically diverse relatives) and Crenarchaeota (comprising the relatively tight clustering of extremely thermophilic archaebacteria, whose general phenotype appears to resemble most the ancestral phenotype of the Archaea).

5,689 citations

Journal ArticleDOI
TL;DR: The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes that is stable, extensible, and freely available to all researchers.
Abstract: Random community genomes (metagenomes) are now commonly used to study microbes in different environments. Over the past few years, the major challenge associated with metagenomics shifted from generating to analyzing sequences. High-throughput, low-cost next-generation sequencing has provided access to metagenomics to a wide range of researchers. A high-throughput pipeline has been constructed to provide high-performance computing to all researchers interested in using metagenomics. The pipeline produces automated functional assignments of sequences in the metagenome by comparing both protein and nucleotide databases. Phylogenetic and functional summaries of the metagenomes are generated, and tools for comparative metagenomics are incorporated into the standard views. User access is controlled to ensure data privacy, but the collaborative environment underpinning the service provides a framework for sharing datasets between multiple users. In the metagenomics RAST, all users retain full control of their data, and everything is available for download in a variety of formats. The open-source metagenomics RAST service provides a new paradigm for the annotation and analysis of metagenomes. With built-in support for multiple data sources and a back end that houses abstract data types, the metagenomics RAST is stable, extensible, and freely available to all researchers. This service has removed one of the primary bottlenecks in metagenome sequence analysis – the availability of high-performance computing for annotating the data. http://metagenomics.nmpdr.org

3,322 citations

Journal ArticleDOI
TL;DR: The short history, specific features and future prospects of research of microbial metabolites, including antibiotics and other bioactive metabolites, are summarized.
Abstract: The short history, specific features and future prospects of research of microbial metabolites, including antibiotics and other bioactive metabolites, are summarized. The microbial origin, diversity of producing species, functions and various bioactivities of metabolites, and unique features of their chemical structures are discussed, mainly on the basis of statistical data. The possible number of metabolites that may be discovered in the future, the problems of dereplication of newly isolated compounds, and the new trends and prospects of the research are also discussed.

2,706 citations

Journal ArticleDOI
Mingxun Wang, Jeremy Carver, Vanessa V. Phelan, Laura M. Sanchez, Neha Garg, Yao Peng, Don D. Nguyen, Jeramie D. Watrous, Clifford A. Kapono, Tal Luzzatto-Knaan, Carla Porto, Amina Bouslimani, Alexey V. Melnik, Michael J. Meehan, Wei-Ting Liu, Max Crüsemann, Paul D. Boudreau, Eduardo Esquenazi, Mario Sandoval-Calderón, Roland D. Kersten, Laura A. Pace, Robert A. Quinn, Katherine R. Duncan, Cheng-Chih Hsu, Dimitrios J. Floros, Ronnie G. Gavilan, Karin Kleigrewe, Trent R. Northen, Rachel J. Dutton, Delphine Parrot, Erin E. Carlson, Bertrand Aigle, Charlotte Frydenlund Michelsen, Lars Jelsbak, Christian Sohlenkamp, Pavel A. Pevzner, Anna Edlund, Jeffrey S. McLean, Jörn Piel, Brian T. Murphy, Lena Gerwick, Chih-Chuang Liaw, Yu-Liang Yang, Hans-Ulrich Humpf, Maria Maansson, Robert A. Keyzers, Amy C. Sims, Andrew R. Johnson, Ashley M. Sidebottom, Brian E. Sedio, Andreas Klitgaard, Charles B. Larson, Cristopher A. Boya P., Daniel Torres-Mendoza, David Gonzalez, Denise Brentan Silva, Lucas Miranda Marques, Daniel P. Demarque, Egle Pociute, Ellis C. O’Neill, Enora Briand, Eric J. N. Helfrich, Eve A. Granatosky, Evgenia Glukhov, Florian Ryffel, Hailey Houson, Hosein Mohimani, Jenan J. Kharbush, Yi Zeng, Julia A. Vorholt, Kenji L. Kurita, Pep Charusanti, Kerry L. McPhail, Kristian Fog Nielsen, Lisa Vuong, Maryam Elfeki, Matthew F. Traxler, Niclas Engene, Nobuhiro Koyama, Oliver B. Vining, Ralph S. Baric, Ricardo Pianta Rodrigues da Silva, Samantha J. Mascuch, Sophie Tomasi, Stefan Jenkins, Venkat R. Macherla, Thomas Hoffman, Vinayak Agarwal, Philip G. Williams, Jingqui Dai, Ram P. Neupane, Joshua R. Gurr, Andrés M. C. Rodríguez, Anne Lamsa, Chen Zhang, Kathleen Dorrestein, Brendan M. Duggan, Jehad Almaliti, Pierre-Marie Allard, Prasad Phapale, Louis-Félix Nothias, Theodore Alexandrov, Marc Litaudon, Jean-Luc Wolfender, Jennifer E. Kyle, Thomas O. Metz, Tyler Peryea, Dac-Trung Nguyen, Danielle VanLeer, Paul Shinn, Ajit Jadhav, Rolf Müller, Katrina M. Waters, Wenyuan Shi, Xueting Liu, Lixin Zhang, Rob Knight, Paul R. Jensen, Bernhard O. Palsson, Kit Pogliano, Roger G. Linington, Marcelino Gutiérrez, Norberto Peporine Lopes, William H. Gerwick, Bradley S. Moore, Pieter C. Dorrestein, Nuno Bandeira
TL;DR: In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations and data-driven social-networking should facilitate identification of spectra and foster collaborations.
Abstract: The potential of the diverse chemistries present in natural products (NP) for biotechnology and medicine remains untapped because NP databases are not searchable with raw data and the NP community has no way to share data other than in published papers. Although mass spectrometry (MS) techniques are well-suited to high-throughput characterization of NP, there is a pressing need for an infrastructure to enable sharing and curation of data. We present Global Natural Products Social Molecular Networking (GNPS; http://gnps.ucsd.edu), an open-access knowledge base for community-wide organization and sharing of raw, processed or identified tandem mass (MS/MS) spectrometry data. In GNPS, crowdsourced curation of freely available community-wide reference MS libraries will underpin improved annotations. Data-driven social-networking should facilitate identification of spectra and foster collaborations. We also introduce the concept of 'living data' through continuous reanalysis of deposited data.

2,365 citations
