Open access | Posted Content | DOI: 10.1101/2020.12.19.423547

Vertical inheritance governs biosynthetic gene cluster evolution and chemical diversification

02 Mar 2021, bioRxiv (Cold Spring Harbor Laboratory)
Abstract: While specialized metabolites are thought to mediate ecological interactions, the evolutionary processes driving their diversification, particularly among closely related lineages, remain poorly understood. Here, we examine the evolutionary dynamics governing the distribution of natural product biosynthetic gene clusters (BGCs) using 118 strains within the marine actinomycete genus Salinispora. While previous evidence indicated that horizontal gene transfer (HGT) largely contributed to BGC diversity, we find that a majority of BGCs in Salinispora genomes are conserved through processes of vertical descent. In particular, vertical inheritance maintained BGCs over evolutionary timescales (millions of years), allowing for BGC diversification among Salinispora species. By coupling the genomic analyses with high-resolution liquid chromatography-tandem mass spectrometry (LC-MS/MS), we found that BGC evolution in Salinispora proceeds largely through gene gain/loss events and constrained recombination, which contribute to interspecies diversity at the gene, pathway, and metabolite levels. Consequently, the evolutionary processes driving BGC diversification had direct consequences for compound production and contributed to chemical diversification, as exemplified in our case study of the medically relevant proteasome inhibitors, the salinosporamides. Together, our results support the concept that specialized metabolites, and their cognate BGCs, represent functional traits associated with niche differentiation among Salinispora species.

Significance: Natural products are traditionally exploited for their pharmaceutical potential, yet what is often overlooked is that the evolution of the biosynthetic gene clusters (BGCs) encoding these small molecules likely affects the diversification of the produced compounds and, by extension, the compounds' activities and ecological functions. And while the prevailing dogma in natural product research holds that frequent and widespread HGT is an integral driver of BGC evolution, we find that the majority of BGC diversity derives from processes of vertical descent, with HGT events being rare. This understanding can inform biosynthetic predictions to identify novel natural products and help uncover how these specialized metabolites contribute to the environmental distribution of microbes.

Citing papers (7 results found)

Open access | Journal Article | DOI: 10.1039/D1NP00006C
Abstract: Covering: up to 2021. Metagenomics has yielded massive amounts of sequencing data offering a glimpse into the biosynthetic potential of the uncultivated microbial majority. While genome-resolved information about microbial communities from nearly every environment on Earth is now available, the ability to accurately predict biocatalytic functions directly from sequencing data remains challenging. Compared to primary metabolic pathways, enzymes involved in secondary metabolism often catalyze specialized reactions with diverse substrates, making these pathways rich resources for the discovery of new enzymology. To date, functional insights gained from studies on environmental DNA (eDNA) have largely relied on PCR- or activity-based screening of eDNA fragments cloned in fosmid or cosmid libraries. As an alternative, shotgun metagenomics holds underexplored potential for the discovery of new enzymes directly from eDNA by avoiding common biases introduced through PCR- or activity-guided functional metagenomics workflows. However, inferring new enzyme functions directly from eDNA is akin to searching for a 'needle in a haystack', because there are no direct links between genotype and phenotype. The goal of this review is to provide a roadmap to navigate shotgun metagenomic sequencing data and identify new candidate biosynthetic enzymes. We cover both computational and experimental strategies to mine metagenomes and explore protein sequence space with a spotlight on natural product biosynthesis. Specifically, we compare in silico methods for enzyme discovery including phylogenetics, sequence similarity networks, genomic context, 3D structure-based approaches, and machine learning techniques. We also discuss various experimental strategies to test computational predictions including heterologous expression and screening.
Finally, we provide an outlook for future directions in the field with an emphasis on meta-omics, single-cell genomics, cell-free expression systems, and sequence-independent methods.

5 Citations

Open access | Posted Content | DOI: 10.1101/2021.08.11.455920
11 Aug 2021, bioRxiv
Abstract: Bacterial secondary metabolites have been studied for decades for their usefulness as drugs, such as antibiotics. However, the identification of new structures has been decelerating, in part due to the rediscovery of known compounds. Meanwhile, multi-resistant pathogens continue to emerge, underscoring the need for new antibiotics. It is unclear how much chemical diversity exists in nature and whether discovery efforts should focus on established antibiotic producers or on understudied taxa. Here, we surveyed around 170,000 bacterial genomes as well as several thousand metagenome-assembled genomes (MAGs) for their diversity in biosynthetic gene clusters (BGCs), which encode the biosynthetic machinery for producing secondary metabolites. We used two distinct algorithms to provide a global overview of the biosynthetic diversity present in the sequenced part of the bacterial kingdom. Our results indicate that only 3% of the genomic potential for natural products has been experimentally discovered. We trace the emergence of most biosynthetic diversity to an evolutionary depth close to the taxonomic rank of genus. Despite enormous differences in potential among taxa, we identify Streptomyces as by far the most biosynthetically diverse genus based on currently available data. At the same time, our analysis highlights multiple promising high-producing taxa that have thus far escaped investigation.

3 Citations

Journal Article | DOI: 10.1039/D1NP00013F
Abstract: This review covers literature from 2003 to 2021. The development and application of genome mining tools has given rise to ever-growing genetic and chemical databases and propelled natural products research into the modern age of Big Data. Likewise, an explosion of evolutionary studies has unveiled genetic patterns of natural products biosynthesis and function that support Darwin's theory of natural selection and other theories of adaptation and diversification. In this review, we aim to highlight how Big Data and evolutionary thinking converge in the study of natural products, and how this has led to an emerging sub-discipline of evolutionary genome mining of natural products. First, we outline general principles to best utilize Big Data in natural products research, addressing key considerations needed to provide evolutionary context. We then highlight successful examples where Big Data and evolutionary analyses have been combined to provide bioinformatic resources and tools for the discovery of novel natural products and their biosynthetic enzymes. Rather than an exhaustive list of evolution-driven discoveries, we highlight examples where Big Data and evolutionary thinking have been embraced for the evolutionary genome mining of natural products. After reviewing the nascent history of this sub-discipline, we discuss the challenges and opportunities of genomic and metabolomic tools with evolutionary foundations and/or implications and provide a future outlook for this emerging and exciting field of natural product research.

2 Citations

Open access | Posted Content | DOI: 10.1101/2021.05.10.443473
11 May 2021, bioRxiv
Abstract: Bacteria of the phylum Acidobacteria are among the most abundant bacteria across soil ecosystems, yet they are represented by comparatively few sequenced genomes, leaving gaps in our understanding of their metabolic diversity. Recently, genomes of Acidobacteria species with unusually large repertoires of biosynthetic gene clusters (BGCs) were reconstructed from grassland soil metagenomes, but how widespread these species are is still unknown. To investigate this, we augmented a dataset of publicly available Acidobacteria genomes with 46 metagenome-assembled genomes recovered from permanently saturated organic-rich soils of a vernal (spring) pool ecosystem in Northern California. We recovered high-quality genomes for three novel species from Candidatus Angelobacter (a proposed subdivision 1 Acidobacterial genus), a genus that is genomically enriched in genes for specialized metabolite biosynthesis. Acidobacteria were particularly abundant in the vernal pool sediments, and a Ca. Angelobacter species was the most abundant bacterial species detected in some samples. We identified numerous diverse biosynthetic gene clusters in these genomes, as well as in genomes of other related Ca. Angelobacter species from additional publicly available soil metagenomes. Metabolic analysis indicates that Ca. Angelobacter are likely aerobes that ferment organic carbon, with the potential to contribute to carbon compound turnover in soils. Using metatranscriptomics, we identified in situ expression of specialized metabolic traits for two species from this genus. In conclusion, we expand genomic sampling of the uncultivated Ca. Angelobacter and show that they are common and sometimes highly abundant members of dry and saturated soil communities, with a high capacity for the synthesis of diverse specialized metabolites.

1 Citation


References (84 results found)

Open access | Journal Article | DOI: 10.1093/BIOINFORMATICS/BTU033
Alexandros Stamatakis
01 May 2014, Bioinformatics
Abstract: Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting postanalyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available. Availability and implementation: The code is available under GNU

18,576 Citations

Open access | Journal Article | DOI: 10.1093/MOLBEV/MSY096
Sudhir Kumar, Glen Stecher, Michael Li, +2 more
Abstract: The Molecular Evolutionary Genetics Analysis (MEGA) software implements many analytical methods and tools for phylogenomics and phylomedicine. Here, we report a transformation of MEGA to enable cross-platform use on Microsoft Windows and Linux operating systems. MEGA X does not require virtualization or emulation software and provides a uniform user experience across platforms. MEGA X has additionally been upgraded to use multiple computing cores for many molecular evolutionary analyses. MEGA X is available in two interfaces (graphical and command line) and can be downloaded from free of charge.

11,718 Citations

Journal Article | DOI: 10.1111/J.1442-9993.1993.TB00438.X
K. R. Clarke
01 Mar 1993, Austral Ecology
Abstract: In the early 1980s, a strategy for graphical representation of multivariate (multi-species) abundance data was introduced into marine ecology by, among others, Field et al. (1982). A decade on, it is instructive to: (i) identify which elements of this often-quoted strategy have proved most useful in practical assessment of community change resulting from pollution impact; and (ii) ask to what extent evolution of techniques in the intervening years has added self-consistency and comprehensiveness to the approach. The pivotal concept has proved to be that of a biologically-relevant definition of similarity of two samples, and its utilization mainly in simple rank form, for example ‘sample A is more similar to sample B than it is to sample C’. Statistical assumptions about the data are thus minimized and the resulting non-parametric techniques will be of very general applicability. From such a starting point, a unified framework needs to encompass: (i) the display of community patterns through clustering and ordination of samples; (ii) identification of species principally responsible for determining sample groupings; (iii) statistical tests for differences in space and time (multivariate analogues of analysis of variance, based on rank similarities); and (iv) the linking of community differences to patterns in the physical and chemical environment (the latter also dictated by rank similarities between samples). Techniques are described that bring such a framework into place, and areas in which problems remain are identified. Accumulated practical experience with these methods is discussed, in particular applications to marine benthos, and it is concluded that they have much to offer practitioners of environmental impact studies on communities.
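The central idea above, a biologically relevant similarity between two samples used mainly in rank form ("sample A is more similar to sample B than it is to sample C"), can be illustrated with a short Python sketch. The abstract does not name a similarity coefficient; the Bray-Curtis coefficient used here is an assumption, chosen because it is a common choice for species-abundance data. The helper names `bray_curtis_similarity` and `rank_similarities` are hypothetical, and tied similarities receive average ranks.

```python
from itertools import combinations


def bray_curtis_similarity(a: list[float], b: list[float]) -> float:
    """Bray-Curtis similarity (0-100) between two species-abundance vectors.

    Assumption: Bray-Curtis is used for illustration; the abstract does not
    prescribe a specific coefficient.
    """
    total = sum(x + y for x, y in zip(a, b))
    if total == 0:
        return 100.0  # two empty samples are treated as identical (a convention)
    return 100.0 * (1.0 - sum(abs(x - y) for x, y in zip(a, b)) / total)


def rank_similarities(samples: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Rank every sample pair by similarity; rank 1 is the most similar pair.

    Pairs with exactly equal similarity share the average of the ranks they span.
    """
    pairs = list(combinations(sorted(samples), 2))
    sims = {p: bray_curtis_similarity(samples[p[0]], samples[p[1]]) for p in pairs}
    ordered = sorted(pairs, key=lambda p: sims[p], reverse=True)
    ranks: dict[tuple[str, str], float] = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of pairs tied with ordered[i].
        j = i
        while j + 1 < len(ordered) and sims[ordered[j + 1]] == sims[ordered[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for p in ordered[i:j + 1]:
            ranks[p] = avg_rank
        i = j + 1
    return ranks


if __name__ == "__main__":
    # Hypothetical abundances of three species in three samples.
    samples = {"A": [10, 0, 5], "B": [8, 1, 5], "C": [0, 10, 2]}
    ranks = rank_similarities(samples)
    # A is more similar to B (rank 1.0) than it is to C (rank 3.0).
    print(ranks[("A", "B")], ranks[("A", "C")])
```

Working only with the ranks, rather than the raw similarity values, is what keeps the downstream statistics non-parametric in the sense the abstract describes.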

11,277 Citations

Open access | Journal Article | DOI: 10.1038/MSB.2011.75
Fabian Sievers, Andreas Wilm, David Dineen, Toby J. Gibson, +8 more
Abstract: Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

10,162 Citations

Open access | Journal Article | DOI: 10.1101/GR.229202
W. James Kent
01 Apr 2002, Genome Research
Abstract: Analyzing vertebrate genomes requires rapid mRNA/DNA and cross-species protein alignments. A new tool, BLAT, is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences. BLAT's speed stems from an index of all nonoverlapping K-mers in the genome. This index fits inside the RAM of inexpensive computers, and need only be computed once for each genome assembly. BLAT has several major stages. It uses the index to find regions in the genome likely to be homologous to the query sequence. It performs an alignment between homologous regions. It stitches together these aligned regions (often exons) into larger alignments (typically genes). Finally, BLAT revisits small internal exons possibly missed at the first stage and adjusts large gap boundaries that have canonical splice sites where feasible. This paper describes how BLAT was optimized. Effects on speed and sensitivity are explored for various K-mer sizes, mismatch schemes, and number of required index matches. BLAT is compared with other alignment programs on various test sets and then used in several genome-wide applications. http://genome.ucsc.edu hosts a web-based BLAT server for the human genome.
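The first BLAT stage described above (an index of all nonoverlapping K-mers in the genome, used to find regions likely to be homologous to the query) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not BLAT's implementation: it uses exact K-mer matches only (no mismatch schemes), groups hits by diagonal (genome position minus query position), and omits the alignment and stitching stages. The names `build_index`, `find_candidate_hits`, and `min_hits` are hypothetical.

```python
from collections import defaultdict


def build_index(genome: str, k: int) -> dict[str, list[int]]:
    """Index every nonoverlapping k-mer of the genome (offsets 0, k, 2k, ...)."""
    index: dict[str, list[int]] = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return dict(index)


def find_candidate_hits(query: str, index: dict[str, list[int]], k: int,
                        min_hits: int = 1) -> list[tuple[int, int]]:
    """Look up every overlapping query k-mer and group hits by diagonal.

    Hypothetical helper, exact-match seeding only. A diagonal is
    genome_pos - query_pos; several hits on one diagonal suggest a region
    worth aligning. Returns (diagonal, hit_count) pairs with at least
    min_hits hits, sorted by diagonal.
    """
    diagonals: dict[int, int] = defaultdict(int)
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], []):
            diagonals[gpos - qpos] += 1
    return sorted((d, n) for d, n in diagonals.items() if n >= min_hits)


if __name__ == "__main__":
    genome = "ACGTACGGTCAATGCA"
    idx = build_index(genome, k=4)      # indexes ACGT, ACGG, TCAA, TGCA
    query = genome[2:12]                # "GTACGGTCAA"
    print(find_candidate_hits(query, idx, k=4, min_hits=2))
    # -> [(2, 2)]: two seed hits place the query at genome offset 2
```

Because the genome is indexed only at offsets 0, K, 2K, ..., while every overlapping K-mer of the query is looked up, any exact match spanning at least 2K - 1 bases is guaranteed to produce at least one hit. The `min_hits` threshold loosely mirrors the "number of required index matches" the abstract mentions.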

7,686 Citations