
Vertical inheritance governs biosynthetic gene cluster evolution and chemical diversification

02 Mar 2021 - bioRxiv (Cold Spring Harbor Laboratory)
TL;DR: In this article, the authors examined the evolutionary dynamics governing the distribution of natural product biosynthetic gene clusters (BGCs) using 118 strains within the marine actinomycete genus Salinispora.
Abstract: While specialized metabolites are thought to mediate ecological interactions, the evolutionary processes driving their diversification, particularly among closely related lineages, remain poorly understood. Here, we examine the evolutionary dynamics governing the distribution of natural product biosynthetic gene clusters (BGCs) using 118 strains within the marine actinomycete genus Salinispora. While previous evidence indicated that horizontal gene transfer (HGT) largely contributed to BGC diversity, we find that a majority of BGCs in Salinispora genomes are conserved through processes of vertical descent. In particular, vertical inheritance maintained BGCs over evolutionary timescales (millions of years) allowing for BGC diversification among Salinispora species. By coupling the genomic analyses with high-resolution tandem mass spectrometry (LC-MS/MS), we identified that BGC evolution in Salinispora proceeds largely through gene gain/loss events and constrained recombination that contributes to interspecies diversity at the gene, pathway, and metabolite levels. Consequently, the evolutionary processes driving BGC diversification had direct consequences for compound production and contributed to chemical diversification, as exemplified in our case study of the medically relevant proteasome inhibitors, the salinosporamides. Together, our results support the concept that specialized metabolites, and their cognate BGCs, represent functional traits associated with niche differentiation among Salinispora species.
Significance: Natural products are traditionally exploited for their pharmaceutical potential; yet what is often overlooked is that the evolution of the biosynthetic gene clusters (BGCs) encoding these small molecules likely affects the diversification of the produced compounds and implicitly has an impact on the compounds’ activities and ecological functions. And while the prevailing dogma in natural product research attributes frequent and widespread horizontal gene transfer (HGT) as an integral driver of BGC evolution, we find that the majority of BGC diversity derives from processes of vertical descent, with HGT events being rare. This understanding can facilitate informed biosynthetic predictions to identify novel natural products, in addition to uncovering how these specialized metabolites contribute to the environmental distribution of microbes.

Summary

RESULTS

Salinispora delineated by biosynthetic potential

  • The authors' molecular clock analysis indicated that Salinispora recently diverged within the Micromonosporaceae family 89.1±37.1 million years ago; yet the genus has already differentiated into nine species.
  • Given the high percentage of species-specific flexible genes associated with specialized metabolism, the authors expected that BGC diversity and distribution would similarly correspond with Salinispora species diversity.
  • To compare BGC composition across species, the 3041 predicted BGCs were grouped into 305 gene cluster families.
  • In contrast, the vast majority of BGCs were shared among strains.
  • As in flexible gene content, the authors found that 43.6% of the variation in GCF composition was explained by species designation, with geography explaining an additional 11.1% (p<0.01).

Drivers of BGC evolution

  • Given that BGC distributions were largely explained by shared phylogenetic history, the authors sought to identify the specific evolutionary processes that may contribute to BGC diversification.
  • To better understand the impact of recombination in structuring the genetic diversity within the nine BGCs, the authors calculated the ratio at which nucleotides are replaced by either recombination or point mutations (r/m).
  • In contrast, most conserved biosynthetic genes showed no evidence of recent selective sweeps, with the relatively high nucleotide diversity indicating that recombination was insufficient to prevent BGC diversification.
  • While salinosporamides A and K were originally reported from S. tropica (40) and S. pacifica (39, 41), respectively, the authors now show that the sal BGC is observed in six of the nine Salinispora species.

DISCUSSION

  • It has become increasingly clear that the fine-scale genomic diversity observed in microbial communities reflects the large number of ecologically distinct lineages that co-occur within microbiomes (44–46).
  • Broadly, their results suggest that specialized metabolites contribute to functional differences capable of promoting ecological differentiation and subsequent fine-scale diversification in microbial communities.
  • Similarly, the authors observed a distinct phylogenetic signal at both the BGC and metabolite levels, indicating that vertical inheritance is a major driver of BGC evolution.
  • By examining the distribution of BGCs among closely related Salinispora species, the authors also detected a strong signal of vertical inheritance, even among BGCs that likely originated from ancestral HGT events.

DATA AVAILABILITY

  • All genomes are publicly available (Table S1).
  • Public datasets for all metabolomic spectra files are available at massive.ucsd.edu (MSV000085890).
  • All other data and relevant code used can be found at https://github.com/alex-b-chase/salBGCevol.

Vertical inheritance governs biosynthetic gene cluster evolution and chemical diversification
Alexander B. Chase¹, Douglas Sweeney¹,², Mitchell N. Muskat¹, Dulce Guillén-Matus¹,², and Paul R. Jensen¹,²
¹ Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California, San Diego, California
² Marine Biology Research Division, Scripps Institution of Oceanography, University of California, San Diego, California
ABSTRACT
While specialized metabolites are thought to mediate ecological interactions, the evolutionary processes driving
their diversification, particularly among closely related lineages, remain poorly understood. Here, we examine
the evolutionary dynamics governing the distribution of natural product biosynthetic gene clusters (BGCs) using
118 strains within the marine actinomycete genus Salinispora. While previous evidence indicated that horizontal
gene transfer (HGT) largely contributed to BGC diversity, we find that a majority of BGCs in Salinispora genomes
are conserved through processes of vertical descent. In particular, vertical inheritance maintained BGCs over
evolutionary timescales (millions of years) allowing for BGC diversification among Salinispora species. By
coupling the genomic analyses with high-resolution tandem mass spectrometry (LC-MS/MS), we identified that
BGC evolution in Salinispora proceeds largely through gene gain/loss events and constrained recombination
that contributes to interspecies diversity at the gene, pathway, and metabolite levels. Consequently, the
evolutionary processes driving BGC diversification had direct consequences for compound production and
contributed to chemical diversification, as exemplified in our case study of the medically relevant proteasome
inhibitors, the salinosporamides. Together, our results support the concept that specialized metabolites, and their
cognate BGCs, represent functional traits associated with niche differentiation among Salinispora species.
KEYWORDS Salinispora | homologous recombination | microbial ecology | speciation
bioRxiv preprint doi: https://doi.org/10.1101/2020.12.19.423547; this version posted March 2, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

Despite linkages between abiotic factors and bacterial diversity (1–3),
the key functional traits driving biotic interactions in microbial
communities remain poorly understood. These functional traits
likely include specialized metabolites, or small molecule natural products,
that are known to modulate ecological interactions between organisms (4).
Specialized metabolites, which include molecules known to act as
antibiotics and siderophores, can drive biotic interactions via mechanisms
such as competition, nutrient uptake, and defense. Taken together, the
functions of these small molecules likely represent a major driver of
microbial community composition (4). Given that microbes produce a wide
variety of biologically active natural products, these compounds may
contribute to the diversification and environmental distribution of microbes,
as has been observed in eukaryotes (5). To date, however, studies of
bacterial specialized metabolite production have largely focused on the
discovery of compounds with pharmaceutical potential as opposed to
understanding their ecological and evolutionary significance.
Marine sediments represent a unique environment to assess the
relationships between microbial diversity and specialized metabolite
production. Sediments are associated with diverse microbial communities
(6) where competition for limited resources is facilitated via the secretion
of small molecules (7, 8). Among bacteria inhabiting marine sediments,
actinomycetes such as the genus Salinispora are well-known for the
production of specialized metabolites (9–11). Salinispora (family:
Micromonosporaceae) is a member of the rare biosphere in surface
sediments (12) and can readily be cultured from tropical and sub-tropical
locations (13). This relatively recently diverged lineage includes nine
closely related species that share >99% 16S rRNA sequence similarity
(14). The presence of this “microdiversity” suggests that fine-scale trait
differences contribute to differential resource utilization and niche
partitioning (15, 16). Indeed, two Salinispora species were recently shown
to employ differential ecological strategies for resource acquisition (17),
providing insights into the ecological and evolutionary mechanisms
contributing to Salinispora diversification.
While Salinispora has proven a robust model for natural product discovery
(18), much remains to be resolved concerning the evolutionary dynamics
of the biosynthetic gene clusters (BGCs) encoding these compounds. Prior
analyses of Salinispora BGCs revealed extensive horizontal gene transfer
(HGT) and exchange both within and between species (19), suggesting a
“plug and play” model of BGC evolution (20). These observations are
consistent with the prevailing view in natural product research that BGCs
are rapidly gained and lost via HGT (21–23), particularly within
Actinobacteria taxa (24). In Salinispora, the exchange of BGCs between
species is likely mediated by conjugative elements, as in Streptomyces
(25), and further facilitated by the absence of geographic barriers in their
distribution (26). At the same time, profiles for the production of specialized
metabolites encoded by Salinispora BGCs revealed species-specific
patterns (27), providing competing models for BGC evolution.
Resolving the evolutionary dynamics driving BGC distributions in
Salinispora can help inform future natural product discovery efforts. By one
account, the horizontal exchange of Salinispora BGCs may be frequent
and highly dependent on the local community (19, 20), resulting in similar
strains from different locations yielding different metabolites. Alternatively,
the small molecules encoded by BGCs may represent phylogenetically
conserved traits (27, 28), in which case similar strains produce similar
metabolites. To reconcile these seemingly contrasting scenarios, we
sought to identify the evolutionary processes contributing to the distribution
of BGCs. We hypothesized that specialized metabolite production is
subject to strong selective pressures, with compounds and their associated
gene clusters representing functional traits contributing to niche
differentiation. If correct, we expected the distribution of BGCs, regardless
of whether they were horizontally acquired at some point in time (19), to be
subsequently maintained within Salinispora through processes of vertical
descent. To better understand the role of vertical inheritance in driving
BGC evolution, we revisited the distribution of BGCs in 118 genomes
across the nine newly described Salinispora species (14). Finally, to
assess the functional consequences of BGC diversification on compound
production, we focused on nine experimentally characterized BGCs and
applied targeted tandem mass spectrometry to detect their associated
molecules. Our results support the hypothesis that specialized metabolites
represent functional traits contributing to ecological differentiation among
closely related Salinispora species.

RESULTS
Salinispora delineated by biosynthetic potential
Our molecular clock analysis indicated that Salinispora recently diverged
within the Micromonosporaceae family 89.1±37.1 million years ago (MYA;
Figure S1); yet the genus has already differentiated into nine species
(Figure 1A). To gain insights into the functional traits that promoted
differentiation within Salinispora, we first investigated differences in gene
content among the 118 Salinispora genomes (i.e., the flexible genome).
Flexible gene composition was highly congruent with species designations
(Figure 1B): strains within the same species shared more flexible
genes than expected by chance, and species designation explained 53.1% of
the variation in the flexible genome (permutational multivariate analysis
of variance [PERMANOVA]; p<0.01). Geographic location, to a lesser degree,
accounted for 8.3% of the variation in flexible gene composition (p<0.01).
Within the flexible genome, we also identified species-specific orthologs,
genes shared by all strains within a species but not observed in any other
Salinispora species. These genes, which should encode the functional
traits that define Salinispora species, were largely annotated as
hypothetical proteins (Figure S2A). Of the available annotations,
18.1±15.3% of the species-specific orthologs were associated with
specialized metabolism (Figure S2B). In addition, when we searched the
genomic regions flanking all species-specific orthologs, regardless of
annotation, we found that 28.9±26.2% were located within the boundaries
of predicted biosynthetic gene clusters (BGCs; Figure S2C).
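The "variation explained" figures reported by PERMANOVA can be illustrated with a small sketch. The code below is not the authors' pipeline: it assumes a Jaccard distance on a gene presence/absence matrix (the distance measure used in the study is not stated here), and the six-strain toy matrix and the function names are hypothetical. It computes R² as the between-group share of the total sum of squared distances and obtains a p-value by permuting the species labels.

```python
import numpy as np

def jaccard_distances(presence):
    """Pairwise Jaccard distances between rows of a boolean presence/absence matrix."""
    presence = np.asarray(presence, dtype=bool)
    inter = (presence[:, None, :] & presence[None, :, :]).sum(-1)
    union = (presence[:, None, :] | presence[None, :, :]).sum(-1)
    return 1.0 - inter / np.maximum(union, 1)

def permanova_r2(dist, labels):
    """Return (R2, pseudo-F) for a one-way PERMANOVA on a square distance matrix."""
    d2 = np.asarray(dist, dtype=float) ** 2
    labels = np.asarray(labels)
    n = len(labels)
    groups = np.unique(labels)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = d2[np.triu_indices(n, k=1)].sum() / n
    # Within-group sum of squares: same quantity computed inside each group.
    ss_within = 0.0
    for g in groups:
        idx = np.where(labels == g)[0]
        sub = d2[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)
    ss_between = ss_total - ss_within
    pseudo_f = (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))
    return ss_between / ss_total, pseudo_f

def permanova(dist, labels, n_perm=999, seed=0):
    """Permutation test: shuffle labels and count pseudo-F values >= the observed one."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    r2, f_obs = permanova_r2(dist, labels)
    hits = 0
    for _ in range(n_perm):
        if permanova_r2(dist, rng.permutation(labels))[1] >= f_obs:
            hits += 1
    return r2, f_obs, (hits + 1) / (n_perm + 1)

# Hypothetical toy data: 6 strains (rows) x 6 flexible genes (columns), 2 species.
toy_presence = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
]
toy_species = ["A", "A", "A", "B", "B", "B"]

r2, pseudo_f, p_value = permanova(jaccard_distances(toy_presence), toy_species)
print(f"R2 = {r2:.3f}, pseudo-F = {pseudo_f:.1f}, p = {p_value:.3f}")
```

In this toy case nearly all of the variation falls between the two species, so R² is close to 1. With only six strains the permutation p-value cannot become very small, which is one reason such tests are run on larger strain sets.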
Given the high percentage of species-specific flexible genes associated
with specialized metabolism, we expected that BGC diversity and
distribution would similarly correspond with Salinispora species diversity.
To address this, we identified a total of 3041 complete or fragmented (on
contig edges) BGCs across all 118 Salinispora genomes (mean = 25.8 per
genome) accounting for 18±2.3% of an average 5.6 Mbp Salinispora
genome. When compared to other bacterial genera, including taxa well-
known for specialized metabolite production (e.g., Moorea and
Streptomyces), Salinispora dedicated the highest genomic percentage to
this form of metabolism (Figure S1), further highlighting its importance in
this genus. Between Salinispora species there was significant variation in
the total number of BGCs (Figure 1C; analysis of variance [ANOVA];
p<0.001) and the genomic percentage dedicated to specialized metabolite
production (Figure S1; ANOVA; p<0.001). In cases where the number of
genome sequences is low (e.g., S. vitiensis), BGC abundances may not
be representative of the species.
To compare BGC composition across species, the 3041 predicted BGCs
were grouped into 305 gene cluster families (GCFs; Figure S3A). Similar
to prior reports (20), 35% of the GCFs were populated by a single BGC
(Figure S3B). It can be inferred that these BGCs represent relatively recent
acquisition events that are not well represented in our genomic dataset.
Despite representing a large percentage of the total GCF diversity, the
singleton BGCs comprised only 3.6% of the BGCs detected among all
strains (108 out of 3041) and equate, on average, to only 0.9 BGCs per
Salinispora genome (inset Figure S3B). In contrast, the vast majority of
BGCs were shared among strains (Figure S3C). As in flexible gene
content, we found that 43.6% of the variation in GCF composition was
explained by species designation (Figure 1D; PERMANOVA; p<0.01), with
geography explaining an additional 11.1% (p<0.01). Correlations between
BGC distributions and species delineations are further supported by BGC
average nucleotide identity (ANI) values, which were highly similar to the
whole-genome ANI values used to delineate species boundaries (Figure
S4). Nonetheless, a small percentage of shared BGCs (1.4%) showed
evidence of relatively recent interspecific transfers (Figure S4B).
Collectively, these results indicate that HGT events provide a mechanism
to expand BGC diversity, while at the same time, BGC composition is
largely driven by processes of vertical descent.
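The summary figures in this paragraph can be checked with simple arithmetic. The sketch below uses only the counts stated in the text (118 genomes, 3041 BGCs, 305 GCFs, 108 singleton BGCs); the variable names are illustrative, and it assumes that each singleton GCF contains exactly one BGC, as the definition implies.

```python
# Counts taken from the text.
n_genomes = 118
n_bgcs = 3041
n_gcfs = 305
n_singleton_bgcs = 108  # one BGC per singleton GCF, by definition

mean_bgcs_per_genome = n_bgcs / n_genomes            # reported as 25.8
singleton_share_of_gcfs = n_singleton_bgcs / n_gcfs  # reported as 35%
singleton_share_of_bgcs = n_singleton_bgcs / n_bgcs  # reported as 3.6%
singletons_per_genome = n_singleton_bgcs / n_genomes # reported as 0.9

print(f"BGCs per genome:         {mean_bgcs_per_genome:.1f}")
print(f"Singleton share of GCFs: {singleton_share_of_gcfs:.0%}")
print(f"Singleton share of BGCs: {singleton_share_of_bgcs:.1%}")
print(f"Singletons per genome:   {singletons_per_genome:.1f}")
```

The contrast between the last three values is the point of the paragraph: singletons make up about a third of the families but only a few percent of the clusters actually observed.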
Drivers of BGC evolution
Given that BGC distributions were largely explained by shared
phylogenetic history, we sought to identify the specific evolutionary
processes that may contribute to BGC diversification. To do so, we
concentrated on nine experimentally characterized BGCs that range in
conservation from species-specific to ubiquitous in the genus (Table S2).
Indeed, the nine BGCs span a range of evolutionary time, including BGCs
that were present prior to Salinispora speciation >100 MYA (i.e., lym, sta,
and spt), recently evolved BGCs (i.e., slc 6.3-18.3 MYA), and BGCs that
are prone to HGT events (Figure S5A). Despite these differences,
phylogenies of all nine BGCs (Figure S5B) revealed that genetic
differentiation within BGCs remained a function of time as they were
maintained by vertical inheritance (Figure 2; multiple r² = 0.84, p<0.001).
The maintenance of BGCs over evolutionary time indicates that other
evolutionary processes can contribute to BGC diversification. An event-
inference parsimony model indicated that a variety of evolutionary
processes, including BGC duplication, transfer, and loss events,
contributed to the observed BGC distributions (Table 1). In particular, there
were frequent intraspecific recombination events, averaging 18.7±14.7
across the nine BGCs, relative to interspecific horizontal transfers
(0.9±1.1), supporting the previous ANI result that horizontal transfers of
BGCs between species are rare (Figure S4). In the few cases where
interspecies transfers occurred (e.g., sal and slc BGCs), the BGCs
remained monophyletic post-transfer (Figure S5B) and their genetic
divergence remained a function of divergence time (Figure 2), supporting
a single transfer event followed by vertical inheritance. The model further
revealed that BGC loss events (Table 1) explained the patchy distributions
of some BGCs. For instance, the lom BGC followed a strict model of
vertical inheritance with predicted loss events in S. mooreana and S.
oceanensis (Figure S5A). Together, these results indicate that
diversification in the nine BGCs is highly correlated to divergence time with
frequent intraspecies recombination and loss events.
To better understand the impact of recombination in structuring the genetic
diversity within the nine BGCs, we calculated the ratio at which nucleotides
are replaced by either recombination or point mutations (r/m). At the
genome level, high levels of recombination (Table 1; r/m=1.8) among
closely related strains (ν=0.03 or 3% genetic divergence among strains)
maintain genetic cohesion within species. In fact, a recombination
network revealed no recent gene flow between recombining populations
within Salinispora species (Figure S6) suggesting this genetic isolation
may be evidence of nascent speciation events. Similarly, at the BGC level,
high levels of recombination (all r/m>1.5) were restricted to events between
closely related strains (ν_mean = 3.5%; Table 1). For example, recombination
in the rif BGC (r/m=13.9) was restricted to strains that have only diverged
by <0.3%. Recombination events were also restricted in the size of their
recombining segments, as the average length of a recombining segment
(δ) was a fraction of the total BGC length (Table 1). Finally, these events
had varying effects on the genetic diversity of the nine BGCs. For instance,
the two most widely distributed BGCs, lym and spt, exhibited drastically
different r/m values (1.6 and 10.2, respectively). As a result, reduced
recombination allowed the lym BGC to evolve in accordance with the core
genome, while frequent recombination of the spt BGC limited its
divergence (Figure 2). While recombination can homogenize genetic
diversity, these events were limited to small sections of the BGC.
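The way r/m depends on the quantities named above can be sketched with one common parameterization, used for example by ClonalFrameML, in which r/m is the product of the relative recombination rate (R/θ), the mean length of a recombining segment (δ), and the divergence of the imported DNA (ν). Whether the values in Table 1 were derived in exactly this way is an assumption, and the R/θ and δ inputs below are hypothetical placeholders, not values from the study; only ν = 0.03 is taken from the text.

```python
def relative_effect_r_over_m(rho_over_theta, delta, nu):
    """r/m = (R/theta) * delta * nu.

    rho_over_theta: recombination events per mutation event (R/theta)
    delta: mean length of a recombining segment, in nucleotides
    nu: per-site divergence of the donor sequence (0.03 means 3%)
    """
    if min(rho_over_theta, delta, nu) < 0:
        raise ValueError("all parameters must be non-negative")
    return rho_over_theta * delta * nu

# Hypothetical inputs for illustration only (not taken from Table 1).
example = relative_effect_r_over_m(rho_over_theta=0.1, delta=500.0, nu=0.03)
print(f"r/m = {example:.2f}")

# Same recombination rate, but shorter segments from closer relatives.
constrained = relative_effect_r_over_m(rho_over_theta=0.1, delta=100.0, nu=0.003)
print(f"r/m (short segments, close donors) = {constrained:.3f}")
```

Under this parameterization, recombination that is confined to short segments exchanged between very similar strains contributes few substitutions per event, which matches the text's description of recombination as constrained.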
Homologous recombination can also facilitate gene-specific sweeps (29),
which can be evidenced by reduced nucleotide diversity. A comparison of the
conserved biosynthetic genes found in the nine BGCs with those found in
the core genome revealed reduced nucleotide diversity in the sal and spo
BGCs in S. pacifica and the slc BGC in S. arenicola (Figure S7A). However,
the BGCs in these three instances were only observed in a small number
of closely related strains (i.e., three S. pacifica and five S. arenicola strains
sharing >99.6% and >99.4% genome-wide ANI, respectively), which likely
accounts for the reduced diversity. In contrast, most conserved
biosynthetic genes showed no evidence of recent selective sweeps, with
the relatively high nucleotide diversity indicating that recombination was
insufficient to prevent BGC diversification. Depressed gene flow observed
in the recombination network (Figure S6) provides further support that
recombination is unlikely to constrain species-level BGC diversification.
Notably, this diversification was neutral based on analyses of selection
coefficients (dN/dS). Among the 134 conserved biosynthetic genes
analyzed, 97.8% had dN/dS<1, indicating neutral, nondirectional selection
(Figure S7B). Thus, it appears that the high nucleotide diversity in BGCs,
such as rif (Figure S7A), is due to ancestral sweep events followed by
neutral divergence and constrained recombination among populations.
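Nucleotide diversity, the statistic used above as a signal of selective sweeps, is the average proportion of sites that differ between pairs of aligned sequences. The sketch below is a minimal illustration rather than the authors' method: the function name and the toy alignments are hypothetical, and it assumes equal-length, gap-free sequences.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Mean pairwise proportion of differing sites (pi) for aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if length == 0 or any(len(s) != length for s in seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    total = 0.0
    n_pairs = 0
    for a, b in combinations(seqs, 2):
        # Count mismatched positions for this pair and normalise by length.
        total += sum(x != y for x, y in zip(a, b)) / length
        n_pairs += 1
    return total / n_pairs

# Hypothetical toy alignments (8 sites each).
diverged_gene = ["ACGTACGT", "ACGTACGT", "ACGTACGA"]
swept_gene = ["ACGTACGT", "ACGTACGT", "ACGTACGT"]

pi_diverged = nucleotide_diversity(diverged_gene)
pi_swept = nucleotide_diversity(swept_gene)
print(f"pi (diverged) = {pi_diverged:.4f}, pi (swept) = {pi_swept:.4f}")
```

A recent sweep leaves near-identical copies and drives the value toward zero, but so does sampling only a few closely related strains, which is the caveat the text raises for the sal, spo, and slc cases.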
Specialized metabolites as functional traits
Since representative BGCs for each of the nine GCFs have been
experimentally characterized (Figure S8) (30–38), we next sought to
understand how the observed evolutionary dynamics contributing to BGC
diversification affected, if at all, production of the compound. By applying
untargeted liquid chromatography, high-resolution tandem mass
spectrometry (LC-MS/MS) to 30 representative strains across the nine
Salinispora species, we first detected a total of 3575 unique molecular
features from culture extracts. The full set of detected features, or the
metabolome, revealed that strains within the same species produced more
similar molecular features than strains between species (Figure 3A;
PERMANOVA; p<0.01), with 44.9% of the variation explained by species
designation. These results, in combination with the GCF composition
(Figure 1D), provide clear evidence that vertical descent plays a major role
in structuring Salinispora specialized metabolism.
At some level, the genetic diversity observed within GCFs should translate
into structural differences in the compounds. Considering the nine BGCs
analyzed above, we noted a range of genetic differences from single
nucleotide polymorphisms (SNPs) to large variations in gene content. To
address how these genetic changes may affect compound production, we
applied targeted LC-MS/MS from the culture extracts to detect 25 known
compounds and many putative analogs encoded by the BGCs (Table S3).
The presence of a BGC did not always equate to product detection (i.e.,
salinichelins and salinipostins were not detected), suggesting the culture
or extraction methods may not have been appropriate. However, the
products of seven BGCs were detected, with strains from the same species
preferentially producing similar compounds and their known associated
analogs (Figure 3B; PERMANOVA, p<0.01). These results indicate that
BGC diversification at the species level translated to finer species-specific
signatures in terms of analog production.
Importantly, the genetic differences associated with BGC diversification
varied in their impact on metabolite production. At the genetic level, the
relatively high intraspecies nucleotide diversity observed within the rif BGC
in S. arenicola (Figure S7A) did not reflect differences in rifamycins
detected (Figure 3B). This diversity, which was considered neutral in terms
of dN/dS values, does not affect compound production as all strains
maintain production of the potent antibiotic rifamycin S in S. arenicola.
Conversely, subtle gene differences in auxiliary enzymes in the sta BGC
affected compound production. While three of the four species with this
BGC produced detectable staurosporine, S. mooreana preferentially
produced 7-hydroxystaurosporine (Figure 3B). Comparative genomics
revealed that the NAD-dependent dehydratase enzyme is missing in S.
mooreana strains with the sta BGC (Figure S9A), which likely accounts for
the presence of the hydroxy group in 7-hydroxystaurosporine produced by
this species (Figure S8). More pronounced interspecies polymorphisms
were observed in the spo BGC between S. tropica and the subset of S.
pacifica (3 of 23 strains) that possess the BGC. While all strains maintain
the type I polyketide synthase (PKS) responsible for the polyketide core
(38), the three S. pacifica strains lack the 45 kbp nonribosomal peptide
synthetase (NRPS) region responsible for biosynthesis of the
cyclohexenone epoxide subunit (Figure S9B). Correspondingly, the S.
pacifica strains did not produce sporolides (Figure 3B) or any other
derivatives that could be identified. Interestingly, the other 20 S. pacifica
strains that lack the spo BGC encode a similar enediyne BGC linked to
cyanosporaside production (19), suggesting the products may perform
similar ecological functions. Together, these results directly link BGC
diversification to structural changes in the encoded metabolites.
Salinosporamides: a case study for BGC evolution
To further illustrate the evolutionary processes driving chemical
diversification, we examined the sal BGC, which encodes the biosynthesis
of the anti-cancer agent salinosporamide A and analogs (30, 39). While
salinosporamides A and K were originally reported from S. tropica (40) and
S. pacifica (39, 41), respectively, we now show that the sal BGC is
observed in six of the nine Salinispora species (Figure 4A). Phylogenetic
analysis provides evidence that the sal BGC was recently transferred
5.4±2.6 MYA between S. arenicola and S. tropica (Figure 4A) but has
otherwise descended vertically within the genus for >50 MYA (Figure S6A).
Notably, the sal BGC is rapidly diverging between species (note deviation
of the sal BGC compared to core genome in Figure 2), while at the same
time being highly conserved within species (Figure S10). Thus, the sal
BGC provides a useful model to address the relationships between species
diversification, BGC composition, and compound production.
We performed targeted metabolomics on 16 strains encoding the sal BGC
across the six Salinispora species (Table S3). All strains maintain the
biosynthetic genes responsible for the core γ-lactam-β-lactone ring
(salABCDEF) and the cyclohexenylalanine amino acid residue, although
the position of the latter varied (Figure 4B). While 15/16 strains produced
detectable amounts of salinosporamides, the analogs and their yields
.CC-BY-NC-ND 4.0 International licenseavailable under a
(which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made
The copyright holder for this preprintthis version posted March 2, 2021. ; https://doi.org/10.1101/2020.12.19.423547doi: bioRxiv preprint

Citations
More filters
Journal ArticleDOI
TL;DR: The authors analyzed ~170,000 bacterial genomes and ~47,000 metagenome assembled genomes using a modified BiG-SLiCE and the new clust-o-matic algorithm.
Abstract: Bacterial specialized metabolites are a proven source of antibiotics and cancer therapies, but whether we have sampled all the secondary metabolite chemical diversity of cultivated bacteria is not known. We analysed ~170,000 bacterial genomes and ~47,000 metagenome assembled genomes (MAGs) using a modified BiG-SLiCE and the new clust-o-matic algorithm. We estimate that only 3% of the natural products potentially encoded in bacterial genomes have been experimentally characterized. We show that the variation in secondary metabolite biosynthetic diversity drops significantly at the genus level, identifying it as an appropriate taxonomic rank for comparison. Equal comparison of genera based on relative evolutionary distance revealed that Streptomyces bacteria encode the largest biosynthetic diversity by far, with Amycolatopsis, Kutzneria and Micromonospora also encoding substantial diversity. Finally, we find that several less-well-studied taxa, such as Weeksellaceae (Bacteroidota), Myxococcaceae (Myxococcota), Pleurocapsa and Nostocaceae (Cyanobacteria), have potential to produce highly diverse sets of secondary metabolites that warrant further investigation.

60 citations

Journal ArticleDOI
TL;DR: A roadmap for navigating shotgun metagenomic sequencing data and identifying new candidate biosynthetic enzymes is provided, along with an outlook on future directions in the field, with an emphasis on meta-omics, single-cell genomics, cell-free expression systems, and sequence-independent methods.

51 citations

Journal ArticleDOI
TL;DR: This review covers literature between 2003-2021 and highlights examples where Big Data and evolutionary analyses have been combined to provide bioinformatic resources and tools for the discovery of novel natural products and their biosynthetic enzymes.

21 citations

Posted ContentDOI
11 Aug 2021-bioRxiv
TL;DR: In this article, the authors surveyed around 170,000 bacterial genomes as well as several thousands of Metagenome Assembled Genomes (MAGs) for their diversity in Biosynthetic Gene Clusters (BGCs) known to encode the biosynthetic machinery for producing secondary metabolites.
Abstract: Bacterial secondary metabolites have been studied for decades for their usefulness as drugs, such as antibiotics. However, the identification of new structures has been decelerating, in part due to rediscovery of known compounds. Meanwhile, multi-resistant pathogens continue to emerge, urging the need for new antibiotics. It is unclear how much chemical diversity exists in Nature and whether discovery efforts should be focused on established antibiotic producers or rather on understudied taxa. Here, we surveyed around 170,000 bacterial genomes as well as several thousands of Metagenome Assembled Genomes (MAGs) for their diversity in Biosynthetic Gene Clusters (BGCs) known to encode the biosynthetic machinery for producing secondary metabolites. We used two distinct algorithms to provide a global overview of the biosynthetic diversity present in the sequenced part of the bacterial kingdom. Our results indicate that only 3% of genomic potential for natural products has been experimentally discovered. We connect the emergence of most biosynthetic diversity in evolutionary history close to the taxonomic rank of genus. Despite enormous differences in potential among taxa, we identify Streptomyces as by far the most biosynthetically diverse based on currently available data. Simultaneously, our analysis highlights multiple promising high-producing taxa that have thus far escaped investigation.

8 citations

References
Journal ArticleDOI
TL;DR: This work presents some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees.
Abstract: Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting postanalyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available. Availability and implementation: The code is available under GNU

23,838 citations

Journal ArticleDOI
TL;DR: The Molecular Evolutionary Genetics Analysis (Mega) software implements many analytical methods and tools for phylogenomics and phylomedicine and has additionally been upgraded to use multiple computing cores for many molecular evolutionary analyses.
Abstract: The Molecular Evolutionary Genetics Analysis (Mega) software implements many analytical methods and tools for phylogenomics and phylomedicine. Here, we report a transformation of Mega to enable cross-platform use on Microsoft Windows and Linux operating systems. Mega X does not require virtualization or emulation software and provides a uniform user experience across platforms. Mega X has additionally been upgraded to use multiple computing cores for many molecular evolutionary analyses. Mega X is available in two interfaces (graphical and command line) and can be downloaded from www.megasoftware.net free of charge.

21,952 citations

Journal ArticleDOI
TL;DR: A new program called Clustal Omega is described, which can align virtually any number of protein sequences quickly and that delivers accurate alignments, and which outperforms other packages in terms of execution time and quality.
Abstract: Multiple sequence alignments are fundamental to many sequence analysis methods. Most alignments are computed using the progressive alignment heuristic. These methods are starting to become a bottleneck in some analysis pipelines when faced with data sets of the size of many thousands of sequences. Some methods allow computation of larger data sets while sacrificing quality, and others produce high-quality alignments, but scale badly with the number of sequences. In this paper, we describe a new program called Clustal Omega, which can align virtually any number of protein sequences quickly and that delivers accurate alignments. The accuracy of the package on smaller test cases is similar to that of the high-quality aligners. On larger data sets, Clustal Omega outperforms other packages in terms of execution time and quality. Clustal Omega also has powerful features for adding sequences to and exploiting information in existing alignments, making use of the vast amount of precomputed information in public databases like Pfam.

12,489 citations

Journal ArticleDOI
TL;DR: Which elements of this often-quoted strategy for graphical representation of multivariate (multi-species) abundance data have proved most useful in practical assessment of community change resulting from pollution impact are identified.
Abstract: In the early 1980s, a strategy for graphical representation of multivariate (multi-species) abundance data was introduced into marine ecology by, among others, Field, et al. (1982). A decade on, it is instructive to: (i) identify which elements of this often-quoted strategy have proved most useful in practical assessment of community change resulting from pollution impact; and (ii) ask to what extent evolution of techniques in the intervening years has added self-consistency and comprehensiveness to the approach. The pivotal concept has proved to be that of a biologically-relevant definition of similarity of two samples, and its utilization mainly in simple rank form, for example ‘sample A is more similar to sample B than it is to sample C’. Statistical assumptions about the data are thus minimized and the resulting non-parametric techniques will be of very general applicability. From such a starting point, a unified framework needs to encompass: (i) the display of community patterns through clustering and ordination of samples; (ii) identification of species principally responsible for determining sample groupings; (iii) statistical tests for differences in space and time (multivariate analogues of analysis of variance, based on rank similarities); and (iv) the linking of community differences to patterns in the physical and chemical environment (the latter also dictated by rank similarities between samples). Techniques are described that bring such a framework into place, and areas in which problems remain are identified. Accumulated practical experience with these methods is discussed, in particular applications to marine benthos, and it is concluded that they have much to offer practitioners of environmental impact studies on communities.

12,446 citations

Journal ArticleDOI
TL;DR: Prokka is introduced, a command line software tool to fully annotate a draft bacterial genome in about 10 min on a typical desktop computer, and produces standards-compliant output files for further analysis or viewing in genome browsers.
Abstract: The multiplex capability and high yield of current day DNA-sequencing instruments has made bacterial whole genome sequencing a routine affair. The subsequent de novo assembly of reads into contigs has been well addressed. The final step of annotating all relevant genomic features on those contigs can be achieved slowly using existing web- and email-based systems, but these are not applicable for sensitive data or integrating into computational pipelines. Here we introduce Prokka, a command line software tool to fully annotate a draft bacterial genome in about 10 min on a typical desktop computer. It produces standards-compliant output files for further analysis or viewing in genome browsers. Availability and implementation: Prokka is implemented in Perl and is freely available under an open source GPLv2 license from http://vicbioinformatics.com/.

10,432 citations

Frequently Asked Questions (8)
Q1. What are the key functional traits driving biotic interactions in Salinispora?

Specialized metabolites, which include molecules known to act as antibiotics and siderophores, can drive biotic interactions via mechanisms such as competition, nutrient uptake, and defense.

Here, the authors examine the evolutionary dynamics governing the distribution of natural product biosynthetic gene clusters (BGCs) using 118 strains within the marine actinomycete genus Salinispora.

By applying untargeted liquid chromatography, high-resolution tandem mass spectrometry (LC-MS/MS) to 30 representative strains across the nine Salinispora species, the authors first detected a total of 3575 unique molecular features from cultured extracts. 

To address this, the authors identified a total of 3041 complete or fragmented (on contig edges) BGCs across all 118 Salinispora genomes (mean = 25.8 per genome) accounting for 18±2.3% of an average 5.6 Mbp Salinispora genome. 
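The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `bgc_summary` and its parameters are hypothetical, and applying the 18±2.3% range to the average 5.6 Mbp genome is a simplifying assumption rather than the authors' calculation.

```python
def bgc_summary(total_bgcs=3041, n_genomes=118, genome_mbp=5.6,
                bgc_pct=18.0, bgc_pct_sd=2.3):
    """Recompute summary figures from the values quoted in the text.

    All defaults are taken from the sentence above; the low/high bounds
    assume the +/- value can be applied directly to the average genome
    size, which is an approximation.
    """
    mean_per_genome = total_bgcs / n_genomes
    bgc_mbp = genome_mbp * bgc_pct / 100
    bgc_mbp_low = genome_mbp * (bgc_pct - bgc_pct_sd) / 100
    bgc_mbp_high = genome_mbp * (bgc_pct + bgc_pct_sd) / 100
    return {
        "mean_bgcs_per_genome": mean_per_genome,
        "bgc_mbp": bgc_mbp,
        "bgc_mbp_low": bgc_mbp_low,
        "bgc_mbp_high": bgc_mbp_high,
    }


summary = bgc_summary()
print(f"Mean BGCs per genome: {summary['mean_bgcs_per_genome']:.1f}")
print(f"Approx. BGC content: {summary['bgc_mbp']:.2f} Mbp "
      f"({summary['bgc_mbp_low']:.2f} to {summary['bgc_mbp_high']:.2f} Mbp)")
```

Under these assumptions, the reported mean of 25.8 BGCs per genome follows directly from 3041/118, and roughly 1 Mbp of an average genome would be devoted to BGCs.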

Among bacteria inhabiting marine sediments, actinomycetes such as the genus Salinispora are well-known for the production of specialized metabolites (9–11). 

While horizontal gene transfer (HGT) may play a major role in expanding BGC diversity, the prevailing view that BGCs are rapidly gained and lost (21–24) remains difficult to discern given that genomic signatures (e.g., GC% and tetranucleotide bias) can be lost over time (51). 

As a result, reduced recombination allowed the lym BGC to evolve in accordance with the core genome, while frequent recombination of the spt BGC limited its divergence (Figure 2). 

At the BGC level, high levels of recombination (all r/m>1.5) were restricted to events between closely related strains (νMEAN=3.5%; Table 1).

Trending Questions (2)
What are the different types of vertical inheritance?

The paper does not provide information about the different types of vertical inheritance.
