
Gene loss and acquisition in lineages of bacteria evolving in a human host environment

TLDR
The genomes of 45 Pseudomonas aeruginosa lineages evolving in the lungs of cystic fibrosis patients are analyzed to identify genes that are lost or acquired during the first years of infection in each of the different lineages, finding that a significant proportion of such genes are associated with virulence.
Abstract
While genome analyses have documented that there are differences in the gene repertoire between evolutionarily distant lineages of the same bacterial species, less is known about the micro-evolutionary dynamics of gene loss and acquisition within lineages of bacteria as they evolve over the timescale of years. This knowledge is valuable for understanding both the basic mutational steps that on long timescales lead to evolutionarily distant bacterial lineages, and the evolution of the individual lineages themselves. When lineages evolve in a human host environment, gene loss and acquisition may furthermore have implications for disease. We analyzed the genomes of 45 Pseudomonas aeruginosa lineages evolving in the lungs of cystic fibrosis patients to identify genes that are lost or acquired during the first years of infection in each of the different lineages. On average, the lineage genome content changed by 88 genes (range 0-473). Genes were more often lost than acquired, and prophage genes were more variable than bacterial genes. We identified genes that were lost or acquired independently across different clonal lineages, i.e., convergent molecular evolution. Convergent evolution suggests that there is selection for loss and acquisition of certain genes in the host environment. We find that a significant proportion of such genes are associated with virulence, a trait previously shown to be important for adaptation. Furthermore, we compared the genomes across lineages to show that within-lineage variable genes more often belonged to genomic content not shared across all lineages. Finally, we used 4,760 genes shared by 446 P. aeruginosa genomes to develop a stable and discriminatory typing scheme for P. aeruginosa clone types (Pactyper, https://github.com/MigleSur/Pactyper).
In sum, our analysis adds to the knowledge on the pace and drivers of gene loss and acquisition in bacteria evolving over multiple years in a human host environment and provides a basis to further understand how gene loss and acquisition play a role in lineage differentiation and host adaptation.


Downloaded from orbit.dtu.dk on: Aug 10, 2022
Gene Loss and Acquisition in Lineages of Pseudomonas aeruginosa Evolving in
Cystic Fibrosis Patient Airways
Gabrielaite, Migle; Johansen, Helle K.; Molin, Søren; Nielsen, Finn C.; Marvig, Rasmus L.
Published in:
mBio
Link to article, DOI:
10.1101/2020.02.03.931667
10.1128/mBio.02359-20
Publication date:
2020
Document Version
Publisher's PDF, also known as Version of record
Link back to DTU Orbit
Citation (APA):
Gabrielaite, M., Johansen, H. K., Molin, S., Nielsen, F. C., & Marvig, R. L. (2020). Gene Loss and Acquisition in
Lineages of Pseudomonas aeruginosa Evolving in Cystic Fibrosis Patient Airways. mBio, 11(5), [e02359-20].
https://doi.org/10.1101/2020.02.03.931667, https://doi.org/10.1128/mBio.02359-20

Gene Loss and Acquisition in Lineages of Pseudomonas
aeruginosa Evolving in Cystic Fibrosis Patient Airways
Migle Gabrielaite (a), Helle K. Johansen (b, c), Søren Molin (d), Finn C. Nielsen (a), Rasmus L. Marvig (a)
(a) Center for Genomic Medicine, Rigshospitalet, Copenhagen, Denmark
(b) Department of Clinical Microbiology, Rigshospitalet, Copenhagen, Denmark
(c) Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
(d) The Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, Lyngby, Denmark
ABSTRACT Genome analyses have documented that there are differences in gene repertoire between evolutionarily distant lineages of the same bacterial species; however, less is known about microevolutionary dynamics of gene loss and acquisition within bacterial lineages as they evolve over years. Here, we analyzed the genomes of 45 Pseudomonas aeruginosa lineages evolving in the lungs of cystic fibrosis (CF) patients to identify genes that are lost or acquired during the first years of infection. On average, lineage genome content changed by 88 genes (range, 0 to 473). Genes were more often lost than acquired, and prophage genes were more variable than bacterial genes. We identified convergent loss or acquisition of the same genes across lineages, suggesting selection for loss and acquisition of certain genes in the host environment. We found that a notable proportion of such genes are associated with virulence, a trait previously shown to be important for adaptation. Furthermore, we compared the genomes across lineages to show that the within-lineage variable genes (i.e., genes that had been lost or acquired during the infection) often belonged to genomic content not shared across all lineages. In sum, our analysis adds to the knowledge on the pace and drivers of gene loss and acquisition in bacteria evolving over years in a human host environment and provides a basis to further understand how gene loss and acquisition play roles in lineage differentiation and host adaptation.
IMPORTANCE Bacterial airway infections, predominantly caused by P. aeruginosa, are a major cause of mortality and morbidity of CF patients. While short insertions and deletions as well as point mutations occurring during infection are well studied, there is a lack of understanding of how gene loss and acquisition play roles in bacterial adaptation to the human airways. Here, we investigated P. aeruginosa within-host evolution with regard to gene loss and acquisition. We show that during long-term infection P. aeruginosa genomes tend to lose genes, in particular, genes related to virulence. This adaptive strategy allows reduction of the genome size and evasion of the host’s immune response. This knowledge is crucial to understand the basic mutational steps that, on the timescale of years, diversify lineages and adds to the identification of bacterial genetic determinants that have implications for CF disease.
KEYWORDS Pseudomonas aeruginosa, computational biology, evolution, genomics,
host-pathogen interactions
Citation Gabrielaite M, Johansen HK, Molin S, Nielsen FC, Marvig RL. 2020. Gene loss and acquisition in lineages of Pseudomonas aeruginosa evolving in cystic fibrosis patient airways. mBio 11:e02359-20. https://doi.org/10.1128/mBio.02359-20.
Editor Joanna B. Goldberg, Emory University School of Medicine
Copyright © 2020 Gabrielaite et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.
Address correspondence to Migle Gabrielaite, migle.gabrielaite@regionh.dk, or Rasmus L. Marvig, rasmus.lykke.marvig@regionh.dk.
Received 20 August 2020
Accepted 22 September 2020
Published 27 October 2020
RESEARCH ARTICLE
Host-Microbe Biology
September/October 2020 Volume 11 Issue 5 e02359-20 mbio.asm.org

Gene acquisition and gene loss are prominent in bacterial evolution and are also crucial during adaptation to new environments (1, 2). In contrast to point mutations, small insertions and deletions (microindels), inversions, and translocations that gradually alter existing genomic content, the acquisition or loss of entire genes rapidly confers large changes to the genomic content, which alter bacterial phenotypes such as virulence, antibiotic resistance, and metabolic capability (3, 4). Thus, genome-wide analysis of gene presence or absence is necessary to better understand bacterial evolution and adaptation (5).
While genome comparison of evolutionarily distant lineages of the same bacterial species gives insight into gene flux over the macroevolutionary scale, there is less knowledge of the pace at which and mechanisms by which genes are lost and acquired at the scale of microevolution, i.e., from studies of evolution of individual bacterial lineages (6, 7). Additionally, we have only a limited understanding of how lineage gene loss and acquisition are driven by selection versus genetic drift (1, 2).
Evolutionary studies on individual bacterial lineages are dependent on the ability to obtain multiple samples of the same lineage, which can be difficult in natural, in vivo environments that constantly change (8, 9), so studies are more easily performed in vitro (10–14). However, Pseudomonas aeruginosa infections in cystic fibrosis (CF) patients represent an infectious disease scenario in which the genomic evolution of individual bacterial lineages can be followed over the years and thus give an opportunity to research bacterial evolution and adaptation in vivo in the human host (15, 16).
There is already a large pool of knowledge on the role of point mutations and microindels in evolution and adaptation of P. aeruginosa in CF patients, whereas gene loss and acquisition have been less extensively investigated (17–19). A better understanding of the genetic changes responsible for P. aeruginosa pathogenicity in CF patients is crucial to improve CF treatment strategies (20–22).
To better understand the role of gene loss and acquisition in within-host evolution and adaptation, we used genomic data from 474 longitudinally collected isolates of P. aeruginosa from children and young CF patients to investigate gene loss and acquisition in lineages of P. aeruginosa as they evolve from the initial invasion of CF airways and onward as they adapt to the human host. In total, 34 patients and 45 different clonal lineages were analyzed, and we aimed to identify gene loss or acquisition events in each of the different lineages to detect patterns across lineages ultimately leading to a better understanding of the genetic basis of bacterial adaptation in the human host.
RESULTS
De novo genome assembly and gene annotation. We previously generated short-read sequencing data for the genomes of 474 isolates of P. aeruginosa sampled from the airways of 34 young CF patients to follow the genomic evolution of bacterial lineages within the host airways over the initial 0 to 9 years of infection (18). While the previous analysis aligned sequence reads to a P. aeruginosa reference genome to identify single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels), we here used the same sequencing reads for de novo assembly of genomes to identify genes that are either lost or acquired during the course of infection.
We successfully de novo assembled the genomes of 446 isolates into 500 scaffolds or fewer (median, 172 scaffolds). The sizes of the assembled genomes ranged from 6,032,338 to 7,593,423 nucleotides (nt), and they contained 5,462 to 7,111 genes. The 446 assembled genomes represented 51 clone types as defined previously by Marvig et al. (2015) (18) (see Fig. S1 in the supplemental material). We grouped the isolates into 45 lineages; i.e., isolates of the same clone type and from the same patient were grouped together to allow identification of within-host accumulated gene differences (Fig. 1). In total, the 45 lineages encompassed 423 isolates distributed among 34 patients as 9 patients were infected with two (n = 7) or more (n = 2) clone types where multiple isolates were available (Fig. S1). The remaining 23 isolates with successful genome assembly were excluded from the analysis as there were no other clonal genomes available for the respective patients (n = 22) or the patient was infected multiple times with the same clone type and no other clonal genomes were available for that lineage (n = 1); i.e., at least two genomes were required for intralineage genome comparison.
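The grouping rule described above (same clone type, same patient, at least two genomes) can be illustrated with a minimal Python sketch. The isolate, patient, and clone type identifiers are hypothetical, and the function name `group_into_lineages` is an illustrative assumption rather than part of the authors' pipeline. The sketch also does not handle the special case of a patient infected several times with the same clone type.

```python
from collections import defaultdict

def group_into_lineages(isolates, min_genomes=2):
    """Group isolates by (patient, clone type) and keep groups with enough genomes.

    isolates: iterable of (isolate_id, patient_id, clone_type) tuples.
    Returns a dict mapping (patient_id, clone_type) to a list of isolate ids.
    """
    groups = defaultdict(list)
    for isolate_id, patient_id, clone_type in isolates:
        groups[(patient_id, clone_type)].append(isolate_id)
    # At least two genomes are needed for an intralineage comparison.
    return {key: ids for key, ids in groups.items() if len(ids) >= min_genomes}

# Hypothetical toy input, not data from the study.
isolates = [
    ("iso1", "P01", "DK01"),
    ("iso2", "P01", "DK01"),
    ("iso3", "P01", "DK02"),  # single genome, so this group is excluded
    ("iso4", "P02", "DK01"),
    ("iso5", "P02", "DK01"),
]
lineages = group_into_lineages(isolates)
```

In this toy example, the same clone type in two different patients yields two separate lineages, while the singleton group is dropped.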
Pan-genomes and identification of gene presence-absence. We analyzed 423 genomes in a two-step process to identify genes that showed variation within or between lineages. First, we compared the genomes of isolates of the same lineage to determine the full set of nonredundant genes found within the lineage, i.e., the lineage pan-genome. The lineage pan-genome consisted of (i) genes present in all isolates of the respective lineage (lineage core genome) and (ii) genes present in only some of the lineage isolates (lineage variable genes), i.e., genes that had been lost or acquired during the infection, referred to here as variable genes (Fig. 1). The lineage pan-genomes consisted of 5,607 to 7,008 genes longer than 150 bp, of which 0 to 473 were variable genes (median, 44 variable genes). A weak positive correlation (Pearson’s correlation coefficient = 0.15, P value = $2.5 \times 10^{-3}$) was identified between the assembly quality (number of scaffolds) and the number of absent genes (Fig. S2A), but this correlation did not explain the observed variability in gene content. Furthermore, by aligning the raw sequencing reads to the pan-genomes of the corresponding lineages, we determined that only 52 of 13,246 genes (0.4%) were incorrectly identified as absent by GenAPI because of a lack of assemblies. These genes were treated as present in all further analyses.
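The definitions of lineage pan-genome, lineage core genome, and variable genes map directly onto set operations. The following Python sketch assumes that gene presence has already been determined per isolate (the study used GenAPI for that step) and that gene identifiers are comparable across isolates. The gene and isolate names are hypothetical.

```python
def lineage_pan_genome(presence):
    """Split a lineage's genes into pan-genome, core genome, and variable genes.

    presence: dict mapping isolate id to the set of gene ids found in that isolate.
    Returns (pan, core, variable) as sets of gene ids.
    """
    gene_sets = list(presence.values())
    pan = set().union(*gene_sets)            # genes seen in at least one isolate
    core = set.intersection(*gene_sets)      # genes seen in every isolate
    variable = pan - core                    # genes seen in only some isolates
    return pan, core, variable

# Hypothetical toy lineage with three isolates.
presence = {
    "iso1": {"geneA", "geneB", "geneC"},
    "iso2": {"geneA", "geneB"},
    "iso3": {"geneA", "geneC", "geneD"},
}
pan, core, variable = lineage_pan_genome(presence)
```

Here only `geneA` is found in all three isolates, so it forms the lineage core genome and the remaining three genes are variable.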
Second, we compared the lineage pan-genomes to determine the full set of 14,462 nonredundant genes found across all lineages, i.e., the aggregated pan-genome (Fig. 1; see also Fig. 2). The aggregated pan-genome consisted of 4,887 genes shared across all lineage pan-genomes (aggregated core genome) and an aggregated accessory genome of 9,575 genes (genes present in only one or some lineage pan-genomes) (Fig. 1; see also Fig. 2). About half (4,932) of the aggregated accessory genes were unique for single lineages, and, overall, the lineage pan-genomes contained 0 to 540 (median, 78) of such lineage-specific genes (see Table S1 in the supplemental material). Furthermore, we found that all 335 genes reported to be essential genes in PAO1 and UCBPP-PA14 (23) were in the aggregated core genome; 29 of these genes were not present in one or more P. aeruginosa isolate genomes (Table S2).
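The second step, aggregating lineage pan-genomes, can be sketched in the same way. This minimal Python example assumes that gene identifiers have already been made comparable across lineages, which in practice requires sequence clustering not shown here. The lineage and gene names are hypothetical. The last line checks the arithmetic reported above for the aggregated core and accessory genomes.

```python
from collections import Counter

def aggregate_pan_genomes(lineage_pan_genomes):
    """Derive aggregated pan, core, accessory, and lineage-specific gene sets.

    lineage_pan_genomes: dict mapping lineage id to the set of gene ids
    in that lineage's pan-genome.
    """
    n_lineages = len(lineage_pan_genomes)
    # Count in how many lineage pan-genomes each gene occurs.
    counts = Counter(g for genes in lineage_pan_genomes.values() for g in genes)
    pan = set(counts)
    core = {g for g, c in counts.items() if c == n_lineages}
    accessory = pan - core
    lineage_specific = {g for g, c in counts.items() if c == 1}
    return pan, core, accessory, lineage_specific

# Hypothetical toy input with three lineages.
toy = {
    "lineage1": {"geneA", "geneB", "geneC"},
    "lineage2": {"geneA", "geneB"},
    "lineage3": {"geneA", "geneD"},
}
agg_pan, agg_core, agg_accessory, agg_specific = aggregate_pan_genomes(toy)

# Reported counts: 4,887 core genes plus 9,575 accessory genes.
reported_total = 4887 + 9575
```

With the reported figures, `reported_total` equals the 14,462 genes of the aggregated pan-genome.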
Aggregated accessory genes were 15-fold more often variable within lineages than genes in the aggregated core genome (Table S1). While several factors might drive the higher turnover of aggregated accessory genes, one explanatory factor could be that the aggregated accessory genome has a larger amount of mobile genetic elements, such as prophage origin sequences. Therefore, we used the ACLAME database to identify and annotate phage and prophage sequences (longer than 150 bp) in the core and accessory genomes of the aggregated pan-genome, respectively. The accessory genome contained 116-fold more prophage genes, and these genes were highly variable over the course of infection; 58% of the prophage sequences in the accessory genome of the aggregated pan-genome were variable within lineages.

FIG 1 Schematic visualization of how bacterial lineages, lineage pan-genomes, within-lineage variable genes, and aggregated pan-genomes, core genomes, and accessory genomes were defined in this study.
Changes in gene content in lineages over the course of infection. Next, we
asked if the variable genes were either lost from or acquired in bacterial lineages. For
this, we defined a gene as lost when it was present in the first isolate but absent in one
or more of the later isolates and defined a gene as acquired when it was absent in the
first isolate but present in one or more of the later isolates. Note that this definition of
gene loss/acquisition might not be accurate as the first isolates might not represent the
most recent common ancestor for the lineage. We found that the variable genes were
more often lost. Of 3,955 variable genes, 3,411 were present in the first isolates and
absent in the later ones, and the opposite was true for only 544 genes. Accordingly, we
concluded that gene loss occurs at least 6 times more often than gene acquisition
(Table S3).
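The loss and acquisition definitions above can be written as a short Python sketch. It assumes that each lineage's isolates are ordered by sampling time and that the first isolate serves as the baseline, which, as noted in the text, may not represent the most recent common ancestor. The gene names are hypothetical; the final ratio uses the counts reported above.

```python
def classify_variable_genes(gene_sets_by_time):
    """Classify variable genes as lost or acquired relative to the first isolate.

    gene_sets_by_time: list of gene-id sets, ordered from earliest to latest isolate.
    Returns (lost, acquired) as sets of gene ids.
    """
    first, later = gene_sets_by_time[0], gene_sets_by_time[1:]
    # Lost: present in the first isolate, absent in one or more later isolates.
    lost = {g for g in first if any(g not in genes for genes in later)}
    # Acquired: absent in the first isolate, present in one or more later isolates.
    acquired = set().union(*later) - first
    return lost, acquired

# Hypothetical toy lineage with three time-ordered isolates.
timeline = [
    {"geneA", "geneB", "geneC"},
    {"geneA", "geneB"},
    {"geneA", "geneC", "geneD"},
]
lost, acquired = classify_variable_genes(timeline)

# Reported counts: 3,411 lost and 544 acquired out of 3,955 variable genes.
loss_to_gain_ratio = 3411 / 544
```

The ratio of the reported counts is slightly above 6, consistent with the statement that gene loss occurs at least 6 times more often than gene acquisition.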
Prophage sequences and plasmids are known to be mobile elements in bacterial
genomes. Prophage genes were found in all 45 lineages by using the ACLAME
database. Prophage genes were among the variable genes in 22 of the lineages, and
the prophage genes were lost in 70% of cases (Table S3); i.e., they were present in the
early isolates and absent in the later ones. In contrast, plasmid genes were not
identified to be lost or acquired in any lineage (the PlasmidFinder database was used
to define plasmid genes). In total, three lineages (P41M3-DK19, P92F3-DK26, and
P72F4-DK19) carried a plasmid belonging to the replicon IncQ2_1.
FIG 2 Presence or absence of 14,462 aggregated pan-genome genes in 45 lineages evolving in cystic fibrosis patients. Blue denotes that the gene is present in all isolates of the lineage. Red denotes that the gene shows variable presence within the lineage. White denotes that the gene is not present in any of the isolates in the lineage.