Evolving Infection Paradox of SARS-CoV-2: Fitness Costs Virulence?

TLDR
In this article, the authors proposed a model combining a statistical and structural bioinformatics approach to explain the infection paradox by describing the epistatic effects of the clade-featured co-occurring mutations on viral fitness and virulence.
Abstract
Background
SARS-CoV-2 is continuously spreading worldwide at an unprecedented scale and has evolved into seven clades according to GISAID, four of which (G, GH, GR, and GV) were globally prevalent in 2020. These major predominant clades of SARS-CoV-2 continue to increase COVID-19 cases worldwide; however, after an early rise in 2020, the death-case ratio has been decreasing to a plateau. G clade viruses contain four co-occurring mutations in their genome (C241T + C3037T + C14408T:RdRp.P323L + A23403G:spike.D614G). GR, GH, and GV strains are defined by the presence of these four mutations in addition to the clade-featured mutations GGG28881-28883AAC:N.RG203-204KR, G25563T:ORF3a.Q57H, and C22227T:spike.A222V + C28932T:N.A220V + G29645T, respectively. Research has broadly focused on spike protein mutations, which have direct roles in receptor binding and antigenicity, and thus in viral transmission and replication fitness. However, mutations in other proteins might also affect viral pathogenicity and transmissibility. The main question of this study is how the clade-featured mutations are linked with viral evolution in this pandemic by shaping viral fitness and virulence.
Methodology
We thus proposed a hypothetical model, combining statistical and structural bioinformatics approaches, that endeavors to explain this infection paradox by describing the epistatic effects of the clade-featured co-occurring mutations on viral fitness and virulence.
Results and Discussion
The G and GR/GV clade strains showed a significant positive and negative association, respectively, with the death-case ratio (incidence rate ratio or IRR = 1.03, p < 0.001; and IRR = 0.99/0.97, p < 0.001), whereas GH clade strains showed no association with the death-case ratio. Docking analysis showed the higher infectiousness of a spike mutant through more favorable binding of G614 with elastase-2.
RdRp mutation p.P323L significantly increased genome-wide mutations (p < 0.0001), since a more expandable RdRp (mutant)-NSP8 interaction may accelerate replication. Superior RNA stability and structural variation at NSP3:C241T might affect protein or RNA interactions. Another silent mutation, 5'UTR:C241T, might affect translational efficiency and viral packaging. These G-featured co-occurring mutations might increase the viral load and alter immune responses in the host, and hence could modulate intra-host genomic plasticity. An additional viroporin mutation, ORF3a:p.Q57H, which forms the GH clade, prevents ion permeability through a tighter constriction of the channel pore mediated by an inter-transmembrane-domain interaction between cysteine (C81) and histidine (H57), and possibly reduces viral release and the immune response. GR strains, which carry the four G clade mutations plus N:p.RG203-204KR, would have stabilized RNA interaction through a more flexible and hypo-phosphorylated SR-rich region. GV strains seemingly gained the evolutionary advantage of superspreading events through confounding factors; nevertheless, N:p.A220V might affect RNA binding.
Conclusion
These hypotheses need further retrospective and prospective studies to understand the detailed molecular and evolutionary events featuring the fitness and virulence of SARS-CoV-2.
Highlights
- We speculated an association of particular SARS-CoV-2 clades with death rate.
- The polymerase mutant virus can speed up replication, which corresponds to a higher number of mutations.
- The impact on viral epistasis of evolving mutations in SARS-CoV-2.
- How does the virus change its genotype and circulate with other types, given the overall dynamics of the epidemic?
- Human intervention seems to work well to control viral virulence. This hygiene practice will control the overall severity of the pandemic situation, as recommended by the WHO. Our work gives the same message but explains it through the dominant co-occurring mutations.

Dominant Clade-featured SARS-CoV-2 Co-occurring Mutations Reveals Plausible Epistasis: An in silico based Hypothetical Model

A. S. M. Rubayet Ul Alam (1,#), Ovinu Kibria Islam (1,#), Md. Shazid Hasan (1), Mir Raihanul Islam (2), Shafi Mahmud (3), Hassan M. Al Emran (4), Iqbal Kabir Jahid (1), Keith A. Crandall (5), M. Anwar Hossain (6,7,*)

1 Department of Microbiology, Jashore University of Science and Technology, Jashore-7408, Bangladesh
2 BRAC James P Grant School of Public Health, BRAC University, Bangladesh
3 Genetic Engineering and Biotechnology, University of Rajshahi, Rajshahi-6205, Bangladesh
4 Department of Biomedical Engineering, Jashore University of Science and Technology, Jashore-7408, Bangladesh
5 Computational Biology Institute and Department of Biostatistics & Bioinformatics, Milken Institute School of Public Health, The George Washington University, Washington, DC, USA
6 Jashore University of Science and Technology, Jashore-7408, Bangladesh
7 Department of Microbiology, University of Dhaka, Dhaka-1000, Bangladesh

*Correspondence
M. Anwar Hossain, Jashore University of Science and Technology, Jashore-7408, Bangladesh.
E-mail: hossaina@du.ac.bd, Contact: +8801715363753

# Authors contributed equally
medRxiv preprint doi: https://doi.org/10.1101/2021.02.21.21252137; this version posted July 16, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. All rights reserved. No reuse allowed without permission.
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

ABSTRACT
SARS-CoV-2 has evolved into eight fundamental clades, four of which (G, GH, GR, and GV) were globally prevalent in 2020. The main question here is how the featured co-occurring mutations of these clades are linked with viral fitness; we thus propose a hypothetical model, using an in silico approach, to explain the plausible epistatic effects of those mutations on viral replication and transmission. Molecular docking and dynamics analyses showed the higher infectiousness of a spike mutant through more favorable binding of G614 with elastase-2. RdRp mutation p.P323L significantly increased genome-wide mutations (p < 0.0001), since a more flexible RdRp (mutated)-NSP8 interaction may accelerate replication. Superior RNA stability and structural variation at NSP3:C241T might impact protein and/or RNA interactions. Another silent mutation, 5'UTR:C241T, might affect translational efficiency and viral packaging. These four G-clade-featured co-occurring mutations might increase viral replication. The sentinel GH-clade mutation ORF3a:p.Q57H constricted the ion channel through an inter-transmembrane-domain interaction between cysteine (C81) and histidine (H57), and the GR-clade mutation N:p.RG203-204KR would stabilize RNA interaction through a more flexible and hypo-phosphorylated SR-rich region. GV-clade viruses seemingly gained the evolutionary advantage of the confounding factors; nevertheless, N:p.A220V might modulate RNA binding with no phenotypic effect. Our hypothetical model needs further retrospective and prospective studies to understand the detailed molecular events featuring the fitness of SARS-CoV-2.
Key words
SARS-CoV-2, COVID-19, Infection Paradox, Fitness, Virulence, Clades, Co-occurring mutations

1. Introduction
Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the etiological agent of the COVID-19 pandemic, has gained some extraordinary attributes that make it extremely infectious: a high replication rate, a large burst size, high stability in the environment, strong binding efficiency of the spike glycoprotein (S) receptor-binding domain (RBD) with the human angiotensin-converting enzyme 2 (ACE2) receptor, and an additional furin cleavage site in the S protein [1-3]. In addition, it has proofreading capability, ensuring relatively high-fidelity replication [4]. The virus contains four major structural proteins, the spike glycoprotein (S), envelope (E), membrane (M), and nucleocapsid (N) proteins, along with 16 nonstructural proteins (NSP1 to NSP16) and seven accessory proteins (ORF3a, ORF6, ORF7a, ORF7b, ORF8a, ORF8b, and ORF10) [5,6]. Mutational spectra within the SARS-CoV-2 genome [7,8], spike protein [9], RdRp [10], ORF3a [11], and N protein [12] have been reported.
SARS-CoV-2 has been classified into eight major clades, namely G, GH, GR, GV, S, V, L, and O, by the Global Initiative on Sharing All Influenza Data (GISAID) consortium (https://www.gisaid.org/) based on the dominant core mutations in their genomes; four of these clades (G, GH, GR, and GV) were globally and geographically prevalent in 2020 [13]. Yin [14] reported that the 5'UTR mutation 241C > T co-occurs with three other mutations: 3037C > T (NSP3: C318T), 14408C > T (RdRp: p.P323L), and 23403A > G (S: p.D614G). GISAID refers to viruses containing these co-occurring mutations as clade G (named after the spike D614G mutation), which corresponds to PANGO (https://cov-lineages.org/) lineage B.1 [15,16]. The GR clade, or lineage B.1.1.*, is defined by an additional trinucleotide mutation at 28881-28883 (GGG>AAC), which creates two consecutive amino acid changes, R203K and G204R, in the N protein. Another derivative of the G clade is GH, or lineage B.1.*, characterized by an additional ORF3a:p.Q57H mutation. The GV variant, or lineage B.1.177, features an A222V mutation in the S protein along with the other mutations of clade G [13,16]. N:A220V, ORF10:V30L, and three other synonymous mutations (T445C, C6286T, and C26801G) are also observed in this clade [17].
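To make the clade definitions above concrete, the following Python sketch assigns a genome to G, GR, GH, or GV from its set of nucleotide substitutions in Wuhan-Hu-1 coordinates. It is a hypothetical illustration, not GISAID's assignment procedure: the marker sets are taken from the mutations named in this paper (GR: GGG>AAC at 28881-28883; GH: G25563T for ORF3a:p.Q57H; GV: C22227T for S:p.A222V), the string notation such as "C241T" is an assumption, and the order in which the derived clades are checked is arbitrary.

```python
# Hypothetical clade assignment from clade-featured substitutions
# (Wuhan-Hu-1 coordinates). Illustrative only; not GISAID's algorithm.

G_CORE = {"C241T", "C3037T", "C14408T", "A23403G"}  # four G-clade co-occurring mutations
GR_MARKERS = {"G28881A", "G28882A", "G28883C"}     # GGG>AAC, N:p.R203K + G204R
GH_MARKERS = {"G25563T"}                           # ORF3a:p.Q57H
GV_MARKERS = {"C22227T"}                           # S:p.A222V


def assign_clade(mutations):
    """Return 'GR', 'GH', 'GV', 'G', or None for a set of substitutions."""
    muts = set(mutations)
    if not G_CORE <= muts:
        return None  # lacks the full G-clade backbone
    if GR_MARKERS <= muts:
        return "GR"
    if GH_MARKERS <= muts:
        return "GH"
    if GV_MARKERS <= muts:
        return "GV"
    return "G"  # G backbone without any derived-clade marker
```

For example, a genome carrying only the four core substitutions is assigned to G, and one that additionally carries G28881A, G28882A, and G28883C is assigned to GR.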
The most frequently observed mutation is D614G of the S protein [18], which has direct roles in receptor binding and immunogenicity, and thus in viral immune escape, transmission, and
replication fitness [19,20]. Mutations in proteins other than spike could also affect viral pathogenicity and transmissibility, but the role of these dominant clade-featured mutations has remained largely underestimated. Although the possible role of ORF3a:p.Q57H in the replication cycle [21] has recently been investigated, the molecular perspective was not fully explained there. The effects of 5'UTR:C241T, Leader:T445C, NSP3:C318T, RdRp:p.P323L, N:p.RG203-204KR, and N:p.A220V are still being overlooked.
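The mutation names above mix genome coordinates (e.g. 3037C > T) with gene-local ones (e.g. NSP3: C318T, RdRp: p.P323L). The Python sketch below shows one way to perform that conversion for the genes discussed here. The gene boundaries are our reading of the NC_045512.2 annotation and should be checked against the RefSeq record before reuse; the one-nucleotide shift applied inside RdRp (NSP12) accounts for the -1 ribosomal frameshift at genome position 13468, a nucleotide that is read twice.

```python
# Sketch: map a Wuhan-Hu-1 (NC_045512.2) genome position to a gene-local
# nucleotide position, codon number, and position within the codon.
# Coordinates (1-based, inclusive) are assumed from the RefSeq annotation.

GENES = {
    "NSP3": (2720, 8554),
    "RdRp": (13442, 16236),  # NSP12, spans the ORF1ab -1 frameshift
    "S": (21563, 25384),
    "ORF3a": (25393, 26220),
    "N": (28274, 29533),
}
FRAMESHIFT_SITE = 13468  # read twice; reported here at its first reading


def locate(genome_pos):
    """Return (gene, local_nt, codon, codon_position) or None if unmapped."""
    for gene, (start, end) in GENES.items():
        if start <= genome_pos <= end:
            local = genome_pos - start + 1
            if gene == "RdRp" and genome_pos > FRAMESHIFT_SITE:
                local += 1  # shift after the re-read nucleotide
            codon = (local - 1) // 3 + 1
            codon_pos = (local - 1) % 3 + 1
            return gene, local, codon, codon_pos
    return None  # e.g. 5'UTR positions such as 241
```

For instance, `locate(14408)` returns `("RdRp", 968, 323, 2)`, placing the C>T change at the second position of codon 323, consistent with p.P323L, and `locate(23403)` returns `("S", 1841, 614, 2)`, consistent with p.D614G.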
Different mutations of SARS-CoV-2 may act independently or through epistatic interactions [22,23]; however, it is difficult to determine exactly how some, if not all, of these co-occurring mutations might have gained their selective evolutionary fitness [22,24,25]. Hence, many hypothetical questions remain: What are the impacts of these mutations on protein structures, and what might their functional roles be? How might these mutated proteins interact? Is there any possible role for the co-occurring ‘silent’ mutation? Could these mutations have any plausible impact on viral fitness and virulence? We attempt here to answer these questions through in silico molecular insights into SARS-CoV-2 mutants and the possible interactions of proteins containing co-occurring mutations. Overall, this study aims to determine the plausible individual and/or epistatic impacts of these mutations during replication in terms of viral entry and fusion, evasion of host cell lysis, replication rate, ribonucleoprotein stability, protein-protein interactions, translational capacity, and, ultimately, the probable combined effect on viral transmission and fitness.
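The text does not specify how epistasis would be quantified. One common convention, shown here only as an illustrative assumption, measures pairwise epistasis between mutations A and B as the deviation of the double mutant's log fitness from the sum of the single-mutant effects, where w_0, w_A, w_B, and w_AB denote the fitness of the wild type, the two single mutants, and the double mutant:

```latex
\varepsilon_{AB} = \log\frac{w_{AB}}{w_{0}} - \left(\log\frac{w_{A}}{w_{0}} + \log\frac{w_{B}}{w_{0}}\right)
                 = \log w_{AB} - \log w_{A} - \log w_{B} + \log w_{0}
```

Under this convention, ε_AB = 0 means the two mutations combine multiplicatively (no epistasis), ε_AB > 0 indicates positive epistasis, and ε_AB < 0 indicates negative epistasis.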
2. Materials and Methods
2.1 Retrieval of Sequences and Mutation Analyses
This study analyzed 225,526 high-coverage (<1% Ns and <0.05% unique amino acid mutations) and complete (>29,000 nucleotides) genome sequences out of a total of 316,166 sequences submitted to GISAID from January 01, 2020, to January 03, 2021. Sequences from non-human hosts were removed during dataset preparation. The Wuhan-Hu-1 isolate (Accession ID: NC_045512.2) was used as the reference genome.
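The completeness and coverage thresholds above can be expressed as a simple sequence filter. The Python sketch below is illustrative only: it checks length (>29,000 nt) and the fraction of ambiguous N bases (<1%), treats the host label "Human" as an assumed metadata value, and does not reproduce the <0.05% unique amino acid mutation criterion, which depends on GISAID's own annotation.

```python
MIN_LENGTH = 29_000     # "complete": more than 29,000 nucleotides
MAX_N_FRACTION = 0.01   # "high coverage": fewer than 1% ambiguous N bases


def passes_qc(seq: str, host: str = "Human") -> bool:
    """Return True if a genome passes the host, length, and N-content filters."""
    if host.strip().lower() != "human":  # assumed label for human-host records
        return False
    seq = seq.upper()
    if len(seq) <= MIN_LENGTH:
        return False
    return seq.count("N") / len(seq) < MAX_N_FRACTION
```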
A Python script (https://github.com/hridoy04/counting-mutations) was used to partition a substantial part of the dataset into two subsets based on the RdRp:C14408T mutation and to estimate the genome-wide variations (single nucleotide changes) for each strain. For the genome-wide mutation analysis, a total of 37,179 sequences from our dataset were analyzed (RdRp wild type or ‘C’ variant: 9,815; mutant or ‘T’ variant: 27,364). The frequency of
mutations was tested for significance with the Wilcoxon signed-rank test between the RdRp ‘C’ variant and ‘T’ variant using IBM SPSS Statistics 25.
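The authors' script is available at the repository cited above and is not reproduced here. The following independent Python sketch illustrates the same two steps under stated assumptions: each genome is already aligned to the Wuhan-Hu-1 reference without insertions, so that string index i corresponds to genome position i + 1; only unambiguous A/C/G/T differences are counted; and genomes are split by the base at position 14408. For the comparison, the sketch swaps in the Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation and tie correction, which is the usual choice for two independent groups such as the ‘C’ and ‘T’ subsets; the signed-rank test named in the text is designed for paired observations.

```python
import math

BASES = set("ACGT")


def count_snvs(aligned_seq, reference):
    """Count single-nucleotide differences from the reference,
    skipping ambiguous bases (e.g. N) and gaps."""
    return sum(
        1
        for q, r in zip(aligned_seq.upper(), reference.upper())
        if q in BASES and r in BASES and q != r
    )


def partition_by_site(seqs, reference, pos=14408):
    """Group per-genome mutation counts by the base at a 1-based position."""
    groups = {"C": [], "T": []}
    for s in seqs:
        base = s[pos - 1].upper()
        if base in groups:  # genomes with N or another base here are skipped
            groups[base].append(count_snvs(s, reference))
    return groups


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test, normal
    approximation with tie correction. Returns (U for x, p-value)."""
    data = list(x) + list(y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    order = sorted(range(n), key=lambda i: data[i])
    ranks = [0.0] * n
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and data[order[j + 1]] == data[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of a tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return u1, 1.0  # all values tied: no evidence of a difference
    z = (u1 - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return u1, p
```

A typical call would be `groups = partition_by_site(genomes, reference)` followed by `rank_sum_test(groups["C"], groups["T"])`; with tens of thousands of genomes, the normal approximation is adequate.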
2.2 Stability, Secondary and Three-Dimensional Structure Prediction Analyses of S, RdRp, ORF3a, and N Proteins
DynaMut [26] and FoldX 5.0 [27,28] were used to determine the stability of both wild-type and mutant variants of the N, RdRp, S, and ORF3a proteins. PredictProtein [29] was used to predict the secondary structure and solvent accessibility of both wild-type and mutant variants of those proteins. The SWISS-MODEL homology modeling webtool [30] was used to generate the three-dimensional (3D) structures of the RdRp, S, and ORF3a proteins using 7c2k.1.A, 6xr8.1.A, and 6xdc.1.A, respectively, as templates. Modeller v9.25 [31] was also used to generate the structures against the same templates. I-TASSER [32] in its default protein modeling mode was employed to construct the wild-type and mutant 3D structures of the N protein, since no template structure was available for this protein. The built-in structural assessment tools of SWISS-MODEL (Ramachandran plot, MolProbity, and Quality Estimate) were used to check the quality of the generated structures.
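A Ramachandran plot, one of the quality checks mentioned above, places each residue by its backbone dihedral angles φ (atoms C(i-1), N, Cα, C) and ψ (atoms N, Cα, C, N(i+1)). The Python sketch below computes a signed dihedral angle from four 3D points using the standard atan2 formulation; it illustrates the quantity being assessed and is not the SWISS-MODEL or MolProbity implementation, nor does it classify angles into favored or outlier regions.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees (between -180 and 180) about p1-p2."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

With atom coordinates read from a model, φ for residue i would be `dihedral(c_prev, n, ca, c)` and ψ would be `dihedral(n, ca, c, n_next)`.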
2.3 Molecular Docking and Dynamics of RdRp-NSP8 and Spike-Elastase2 Complexes
Determining the active sites affected by binding is a prerequisite for docking analysis. Based on a previously reported structure [33], we chose residue 323 along with its surrounding residues (315-324) of RdRp and residues 110 to 122 of the NSP8 monomer as the active sites. The passive residues were defined automatically as all surface residues within a 6.5 Å radius around the active residues. Molecular docking of the wild-type and predicted mutant RdRp with the NSP8 monomer from the PDB structure 7C2K was performed using HADDOCK v2.4 to evaluate the interaction [34]. The binding affinity of each docked RdRp-NSP8 complex was predicted using PRODIGY [35]. The number of interfacial contacts (ICs) and the specific contacts were identified for each complex.
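The distance-based selections in this paragraph can be illustrated with a small Python sketch. Residues are represented as a hypothetical mapping from residue number to a list of atom coordinates in Å, and the set of surface residues is assumed to be precomputed (for example, from solvent accessibility), because HADDOCK's own surface definition is not reproduced. Passive residues are surface residues, other than the active ones, with any atom within 6.5 Å of an active residue, as stated in the text. The interfacial-contact function uses a 5.5 Å inter-atomic cutoff, which we understand to be PRODIGY's default; treat that value as an assumption to verify.

```python
import math
from itertools import product


def min_distance(atoms_a, atoms_b):
    """Smallest inter-atomic distance (Å) between two residues."""
    return min(math.dist(a, b) for a, b in product(atoms_a, atoms_b))


def passive_residues(residues, active, surface, radius=6.5):
    """Surface residues (excluding active ones) within `radius` Å of any
    active residue. `residues` maps residue number -> list of (x, y, z)."""
    return sorted(
        r for r in surface
        if r not in active
        and any(min_distance(residues[r], residues[a]) <= radius for a in active)
    )


def interfacial_contacts(chain_a, chain_b, cutoff=5.5):
    """Residue pairs across two chains with any atoms within `cutoff` Å.
    The 5.5 Å default is an assumed PRODIGY-style value."""
    return [
        (ra, rb)
        for ra, rb in product(sorted(chain_a), sorted(chain_b))
        if min_distance(chain_a[ra], chain_b[rb]) <= cutoff
    ]
```

For the RdRp-NSP8 case, the active sets would be `set(range(315, 325))` for RdRp and `set(range(110, 123))` for NSP8, matching residues 315-324 and 110-122 in the text.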
Human neutrophil elastase (hNE), or elastase-2 (PDB ID: 5A0C), was chosen for docking of the S protein based on earlier reports [36]. Here, we employed CPORT [37] to identify the active and passive protein-protein interface residues of hNE. The S protein active sites were chosen based on the target region (594-638) interacting with elastase-2. The passive residues of the S protein were defined automatically, as described for the RdRp-NSP8 docking analysis. Afterward, we individually docked the wild-type (614D) and mutant (614G) S proteins with
References
Mfold web server for nucleic acid folding and hybridization prediction.
Cryo-EM structure of the 2019-nCoV spike in the prefusion conformation.
I-TASSER: a unified platform for automated protein structure and function prediction.