
Large-scale computational discovery and analysis of virus-derived microbial nanocompartments

18 Mar 2021 - bioRxiv (Cold Spring Harbor Laboratory)
TL;DR: In this article, the authors developed an integrated search strategy to carry out a large-scale computational analysis of prokaryotic genomes with the goal of discovering an exhaustive and curated set of all HK97-fold encapsulin-like systems.
Abstract: Protein compartments represent an important strategy for subcellular spatial control and compartmentalization. Encapsulins are a class of microbial protein compartments defined by the viral HK97-fold of their capsid protein, self-assembly into icosahedral shells, and dedicated cargo loading mechanism for sequestering specific enzymes. Encapsulins are often misannotated and traditional sequence-based searches yield many false positive hits in the form of phage capsids. This has hampered progress in understanding the distribution and functional diversity of encapsulins. Here, we develop an integrated search strategy to carry out a large-scale computational analysis of prokaryotic genomes with the goal of discovering an exhaustive and curated set of all HK97-fold encapsulin-like systems. We report the discovery and analysis of over 6,000 encapsulin-like systems in 31 bacterial and 4 archaeal phyla, including two novel encapsulin families as well as many new operon types that fall within the two already known families. We formulate hypotheses about the biological functions and biomedical relevance of newly identified operons which range from natural product biosynthesis and stress resistance to carbon metabolism and anaerobic hydrogen production. We conduct an evolutionary analysis of encapsulins and related HK97-type virus families and show that they share a common ancestor. We conclude that encapsulins likely evolved from HK97-type bacteriophages. Our study sheds new light on the evolutionary interplay of viruses and cellular organisms, the recruitment of protein folds for novel functions, and the functional diversity of microbial protein organelles.

Summary

Introduction

  • In fact, biological entities like cells and viruses only exist because of the presence of a barrier that separates their interior from the environment.
  • Distinguishing features between eukaryotic lipid-based and prokaryotic protein-based organelles include their size range (micro vs. nano scale) and the fact that protein organelle structure is genetically encoded and thus generally more defined.
  • Still, compartmentalization, however it is achieved, can ultimately serve four distinct functions, namely, the creation of distinct reaction spaces and environments, storage, transport, and regulation.
  • Encapsulins are proposed to be involved in oxidative stress resistance,[9,11-13] iron mineralization and storage,[14,15] anaerobic ammonium oxidation,[16] and sulfur metabolism.
  • The authors report the discovery and analysis of two novel encapsulin families (Family 3 and Family 4) as well as many new operon types that fall within Family 1 and Family 2.

Results and Discussion

  • Distribution, diversity, and classification of encapsulin systems found in prokaryotes.
  • It was discovered that all Pfam families associated with initial search hits belong to a single Pfam clan (CL0373)[19] encompassing the majority of HK97-fold proteins catalogued in the Pfam database.
  • Based on the sequence similarity and Pfam family membership of identified capsid proteins, and the genome-neighborhood composition of associated operons, encapsulin-like systems could be classified into 4 distinct families (Fig. 2).
  • The majority of systems can be found in the phyla Actinobacteria and Proteobacteria followed by Bacteroidetes and Cyanobacteria.
  • Family 2 operon organization is more complex than that of Family 1 due to the variable presence of a cNMP-binding domain (PF00027) fused to the encapsulin capsid component as well as the variable occurrence of two distinct capsid components within a single Family 2 operon.

Family 1 - Classical Encapsulins

  • The authors' dataset of 2,383 Family 1 systems greatly expands the set of the previously described 932 Classical Encapsulins (Fig. 3).
  • Their function within the context of DyP encapsulin operons is currently unknown (Fig. S1A).
  • Unlike ferritin cages with higher symmetries, Flp cargo proteins cannot store precipitated iron in a soluble form by themselves and rely on the encapsulin shell to achieve iron precipitate sequestration.
  • Hemerythrin cargos have further been shown to offer oxidative and nitrosative stress protection when encapsulated.
  • In these systems, an Flp domain is N-terminally fused to the encapsulin capsid protein.

Family 2

  • Family 2 encapsulins are the most abundant class of encapsulins identified in this study and can be broadly grouped into two structurally distinct variants: Family 2A and Family 2B.
  • Instead, they possess an extended N-arm with a short N-terminal α-helix (N-helix), more characteristic of the canonical HK97 fold found in bacteriophages.
  • The putative cNMP-binding domains in Family 2B encapsulins are also highly variable, sharing only 19% pairwise identity between all identified domains.
  • Sequestering a cysteine desulfurase (CD) inside a protein shell might ensure that only a specific co-regulated rhodanese able to interact with the encapsulin capsid exterior can act as the sulfur acceptor, thus making sure that sulfur is channeled to a specific subset of metabolic targets.
  • Family 2-associated terpene cyclases (TCs) can be classified into two groups: 2-MIBS-like cyclases, and geosmin synthase (GS)-like cyclases (Fig. S6).

Family 3 - Natural Product Encapsulins

  • Classes were named based on the most prominent genera encoding a given class.
  • Family 3 BGCs encode diverse components but commonly found genes include sulfotransferases, short-chain dehydrogenases (SDRs), polyketide synthases (PKSs), non-ribosomal peptide synthetases, and amino-group carrier proteins.
  • Further, a subset of identified A-domain encapsulins, including all bacterial representatives, did not have any clearly associated enzymatic components and are thus referred to as Orphan/Unknown.
  • Studies indicate that the biological reductant of OsmC is dihydrolipoamide and not one of the more common cellular reducing agents like thioredoxin or glutathione.[61-63]
  • It is likely that A-domain Encapsulins fulfill a structural function in analogy to all other known HK97-fold proteins and that they retain the ability to self-assemble into higher order structures.

Conclusion

  • The curated set of encapsulin-like systems discovered and analyzed here sheds light on the true functional diversity of microbial protein compartments.
  • Proposed encapsulin functions include roles as reaction spaces for various anabolic (Family 2 and 3) and catabolic (Family 2) processes, storage compartments (Family 1), enzyme regulatory systems (Family 2 and 4) as well as chaperones (Family 4).
  • Encapsulins are found in aerobic and anaerobic microbes that occupy nearly all terrestrial and aquatic habitats as well as host-associated niches.
  • It is possible that other viral capsid protein folds may also have undergone a similar recruitment process and now serve specific host metabolic functions.
  • This idea is supported by the recent description of the involvement of the retrovirus-like capsid protein Arc in inter-neuron nucleic acid transport.


Michael P. Andreas and Tobias W. Giessen*
Department of Biomedical Engineering, University of Michigan Medical School, Ann Arbor, MI, USA
Department of Biological Chemistry, University of Michigan Medical School, Ann Arbor, MI, USA
*correspondence: tgiessen@umich.edu
bioRxiv preprint doi: https://doi.org/10.1101/2021.03.18.436031; this version posted March 18, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license.


Introduction

Spatial compartmentalization is a ubiquitous feature of biological systems.[1] In fact, biological entities like cells and viruses only exist because of the presence of a barrier that separates their interior from the environment. This concept of creating distinct spaces separate from their surroundings extends further to intracellular organization with many layers of sub-compartmentalization found within most cells.[2,3] Intracellular compartments with a proteomically defined interior and a discrete boundary that fulfill distinct biochemical or physiological functions are generally referred to as organelles.[4] This includes lipid-bound organelles, phase-separated structures, and protein-based compartments. Distinguishing features between eukaryotic lipid-based and prokaryotic protein-based organelles include their size range (micro vs. nano scale) and the fact that protein organelle structure is genetically encoded and thus generally more defined. Still, compartmentalization, however it is achieved, can ultimately serve four distinct functions, namely, the creation of distinct reaction spaces and environments, storage, transport, and regulation.[4] Often, compartmentalization can serve multiple of these functions at the same time. More specifically, the functions of intracellular compartments include sequestering toxic reactions and metabolites, creating distinct biochemical environments to stimulate enzyme or pathway activity, and dynamically storing nutrients for later use, among many others.[4]

Encapsulin nanocompartments, or simply encapsulins, are one of the most widespread and diverse classes of protein-based compartments.[5-7] So far, two families of encapsulins have been reported in a variety of bacterial and archaeal phyla.[8-10] They are proposed to be involved in oxidative stress resistance,[9,11-13] iron mineralization and storage,[14,15] anaerobic ammonium oxidation,[16] and sulfur metabolism.[8] All known encapsulins self-assemble from a single capsid protein into compartments between 24 and 42 nm in diameter with either T=1, T=3 or T=4 icosahedral symmetry.[10,12,15] Their defining feature is the ability to selectively encapsulate cargo proteins which include ferritin-like proteins, hemerythrins, peroxidases and desulfurases.[8,9] In classical encapsulins (Family 1), encapsulation is mediated by short C-terminal peptide sequences referred to as targeting peptides (TPs) or cargo-loading peptides (CLPs),[10,15,17] while for Family 2 systems, larger N-terminal protein domains are proposed to mediate encapsulation.[8] For most encapsulin systems, little is known about the specific reasons or functional consequences of enzyme encapsulation. Suggestions include the sequestration of toxic or reactive intermediates as well as enhancing enzyme activity and the prevention of unwanted side reactions. One of the most intriguing features of encapsulins is that, in contrast to all other known protein-based compartments or organelles, their capsid monomer shares the HK97 phage-like fold.[10,12,15] This has led to the suggestion that encapsulins are derived from or in some way connected to the world of phages and viruses.[5,9]
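
The T=1, T=3 and T=4 symmetries mentioned above can be related to shell composition through the standard Caspar-Klug relation, in which the triangulation number is T = h² + hk + k² and an icosahedral shell contains 60·T subunits. The following is a minimal sketch of that arithmetic; the function names are illustrative, and the subunit counts are derived from the general relation rather than quoted from the text.

```python
def triangulation_number(h: int, k: int) -> int:
    """Caspar-Klug triangulation number T = h^2 + h*k + k^2."""
    return h * h + h * k + k * k


def subunits_per_shell(t: int) -> int:
    """An icosahedral shell with triangulation number T has 60*T subunits."""
    return 60 * t


# (h, k) pairs that give the three symmetries named in the text.
for h, k in [(1, 0), (1, 1), (2, 0)]:
    t = triangulation_number(h, k)
    print(f"T={t}: {subunits_per_shell(t)} capsid protein copies")
```

Under this relation, the three reported symmetries correspond to shells of 60, 180 and 240 copies of the single capsid protein.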

Here, we carry out a large-scale in-depth computational analysis of prokaryotic genomes with the goal of discovering and classifying an exhaustive set of all HK97-type protein organelle systems. We develop a Hidden Markov Model (HMM)-, Pfam family-, and genome neighborhood analysis (GNA)-based search strategy and substantially expand the number of identified encapsulin-like operons. We report the discovery and analysis of two novel encapsulin families (Family 3 and Family 4) as well as many new operon types that fall within Family 1 and Family 2. We formulate data-driven hypotheses about the potential biological functions of newly identified operons which will guide future experimental studies of encapsulin-like systems. Further, we conduct a detailed evolutionary analysis of encapsulin-like systems and related HK97-type virus families and show that encapsulins and HK97-type viruses share a common ancestor and that encapsulins likely evolved from HK97-type phages. Our study sheds new light on the evolutionary interplay of viruses and cellular organisms, the recruitment of protein folds for novel functions, and the functional diversity of microbial protein organelles.

Results and Discussion

Distribution, diversity, and classification of encapsulin systems found in prokaryotes

All bacterial and archaeal proteomes available in the UniProtKB[18] database (Family 1, 2, and 4: March 2020; Family 3: February 2021) were analyzed for the presence of encapsulin-like proteins using an HMM-based search strategy. It was discovered that all Pfam families associated with initial search hits belong to a single Pfam clan (CL0373)[19] encompassing the majority of HK97-fold proteins catalogued in the Pfam database. Thus, we supplemented our initial hit dataset with all sequences associated with CL0373. This was followed by GNA-based curation[20] of the expanded dataset to remove all false positives, primarily phage genomes, resulting in a curated list of 6,133 encapsulin-like proteins (Fig. 1 and Supplementary Data 1). Encapsulin-like systems can be found in 31 bacterial and 4 archaeal phyla. Based on the sequence similarity and Pfam family membership of identified capsid proteins, and the genome-neighborhood composition of associated operons, encapsulin-like systems could be classified into 4 distinct families (Fig. 2). Family 1 and 2 represent previously identified encapsulin operon types containing capsid proteins falsely annotated as bacteriocin (PF04454: Linocin_M18) and transcriptional regulator/membrane protein (no Pfam), respectively. Family 1 will be referred to as Classical Encapsulins given the fact that they were the first discovered and are the best characterized. Family 3 and 4 represent newly discovered systems. Family 3 encapsulins are falsely annotated as phage major capsid protein (PF05065: Phage_capsid) and are found embedded within large biosynthetic gene clusters (BGCs) encoding different peptide-based natural products. Therefore, Family 3 was dubbed Natural Product Encapsulins. Family 4 is characterized by a highly truncated encapsulin-like capsid protein which is generally annotated as an uncharacterized protein (PF08967: DUF1884) and arranged in conserved two-component operons with different enzymes. Family 4 proteins represent the A-domain of the canonical HK97-fold with all other domains usually associated with this fold missing. Thus, Family 4 will be referred to as A-domain Encapsulins.

Fig. 1. Distribution of encapsulin-like systems in prokaryotes. Left: Phylogenetic tree based on 108 of the major archaeal and bacterial phyla.[21] Phyla containing encapsulin-like systems are highlighted in blue. Differently colored dots indicate the presence of the respective encapsulin family within the phylum. Right: List of phyla discovered to encode encapsulin-like systems. The Count column shows the number of identified systems and the total number of proteomes available in UniProt (# systems identified / # UniProt proteomes). Ca. refers to candidate phyla. Phylum names colored red show new phyla or uncultured/unclassified organisms not shown in the phylogenetic tree. *Ca. Modulibacteria is not an annotated phylum in UniProt but has been proposed as a candidate phylum.[22]
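
The search, curation and classification steps described above can be pictured as a small filtering pipeline. The sketch below is illustrative only and is not the authors' implementation. The `Hit` record, the `PHAGE_MARKERS` keyword list, the example accessions and all function names are hypothetical, and the real classification also relied on sequence similarity and operon composition, which this toy version reduces to a Pfam lookup. Only the Pfam accessions and family names are taken from the text.

```python
from dataclasses import dataclass, field
from typing import Optional

# Pfam annotations reported in the text for each family's capsid protein.
PFAM_TO_FAMILY = {
    "PF04454": "Family 1 (Classical Encapsulins)",
    "PF05065": "Family 3 (Natural Product Encapsulins)",
    "PF08967": "Family 4 (A-domain Encapsulins)",
}

# Hypothetical keywords used here to flag phage-like genome neighborhoods.
PHAGE_MARKERS = {"terminase", "portal protein", "tail protein"}


@dataclass
class Hit:
    accession: str                      # hypothetical identifier
    pfam: Optional[str]                 # Pfam family of the capsid protein, if any
    neighbors: list[str] = field(default_factory=list)  # neighboring gene annotations


def is_phage_like(hit: Hit) -> bool:
    """Flag a hit whose genome neighborhood contains any phage marker keyword."""
    return any(
        marker in neighbor.lower()
        for neighbor in hit.neighbors
        for marker in PHAGE_MARKERS
    )


def classify(hit: Hit) -> str:
    """Assign a family from the Pfam annotation (simplification: no Pfam -> Family 2)."""
    if hit.pfam is None:
        return "Family 2"
    return PFAM_TO_FAMILY.get(hit.pfam, "unclassified")


def curate(hmm_hits: list[Hit], clan_members: list[Hit]) -> dict[str, str]:
    """Merge HMM hits with clan members, drop phage-like hits, classify the rest."""
    merged = {h.accession: h for h in [*hmm_hits, *clan_members]}
    return {acc: classify(h) for acc, h in merged.items() if not is_phage_like(h)}


hmm_hits = [
    Hit("A1", "PF04454", ["ferritin-like protein"]),
    Hit("B2", "PF05065", ["terminase large subunit", "portal protein"]),
    Hit("D4", None, ["cysteine desulfurase"]),
]
clan_members = [Hit("C3", "PF08967", ["uncharacterized enzyme"])]
result = curate(hmm_hits, clan_members)
print(result)
```

In this toy run the hit `B2` carries the same Pfam annotation as Family 3 but is discarded because its neighborhood looks like a phage genome, which is the kind of false positive the GNA-based curation step is described as removing.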

Classical Encapsulins (Family 1) represent the most widespread family of encapsulin-like systems. They can be found in 31 out of 35 prokaryotic phyla found to encode encapsulin-like operons (Fig. 1). A total of 2,383 Classical Encapsulin operons were discovered, with the phyla Proteobacteria, Actinobacteria and Firmicutes containing the majority of identified systems. However, it should be noted that these phyla

Fig. 2. Novel classification scheme for encapsulin-like operons. Shown are the 4 newly defined families of encapsulins with the respective Pfam annotations if available. Encapsulin-like capsid components are shown in red. Confirmed and proposed cargo proteins are shown in blue. Non-cargo accessory components are shown in grey. The number of identified systems of a given family is shown after the operon in red (I, # identified) and the number of distinct cargo types is shown in cyan (CT, # cargo types). Dotted lines indicate optional presence of operon components. cNMP: cyclic nucleotide-binding domain (orange), Enc: encapsulin-like capsid component. BGC: biosynthetic gene cluster.

