
An Issue of Concern: Unique Truncated ORF8 Protein Variants of SARS-CoV-2

TL;DR: In this article, the authors identified 47 unique truncated ORF8 proteins (T-ORF8) resulting from the Q27STOP mutation among 49055 available B.1.1.7 SARS-CoV-2 sequences.
Abstract: Open reading frame 8 (ORF8) protein is one of the most rapidly evolving accessory proteins in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the causative agent of coronavirus disease 2019 (COVID-19). It was previously reported that the ORF8 protein inhibits presentation of viral antigens by the major histocompatibility complex class I (MHC-I) and interacts with host factors involved in pulmonary inflammation. The ORF8 protein thus helps SARS-CoV-2 to evade immunity and to replicate. Among many contributing mutations, Q27STOP, a mutation in the ORF8 protein, defines the B.1.1.7 lineage of SARS-CoV-2, which is driving the second wave of COVID-19. In the present study, 47 unique truncated ORF8 proteins (T-ORF8) resulting from the Q27STOP mutation were identified among 49055 available B.1.1.7 SARS-CoV-2 sequences. The results show that only one of the 47 T-ORF8 variants spread to over 57 geo-locations in North America and to other continents, including Africa, Asia, Europe, and South America. Based on various quantitative features such as amino acid homology, polar/non-polar sequence homology, Shannon entropy conservation, and other physicochemical properties of all 47 T-ORF8 protein variants, a collection of nine possible unique T-ORF8 variants was defined. Whether T-ORF8 variants function similarly to ORF8 has yet to be investigated. If they do, future COVID-19 waves could be exacerbated, necessitating severe containment measures.

Summary (3 min read)

1. Introduction

  • The world is proceeding through an unprecedented time due to the Coronavirus disease 2019 (COVID-19), of which the causative agent is the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1, 2, 3, 4, 5].
  • ORF8 directly interacts with major histocompatibility complex class I (MHC-I) both in vitro and in vivo and down-regulates it, which impairs antigen presentation and renders infected cells less sensitive to lysis by cytotoxic T lymphocytes [15].
  • The functional implications of SARS-CoV-2 ORF8 have already gained considerable attention, and ORF8 is considered an important component of the immune evasion machinery [11, 18, 19, 20].
  • Thus, it is of utmost importance to gain insight into the functionality of the truncated ORF8 protein variants to comprehend the B.1.1.7 lineage through theoretical and experimental characterization and genomic surveillance worldwide [36].

2. Data acquisition and methods

  • Truncated ORF8 protein (T-ORF8) sequences from five continents (Asia, Africa, Europe, South America, and North America) were downloaded in Fasta format (as of May 18, 2021) from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.gov/).
  • Note that no T-ORF8 protein sequence was found from Oceania as of May 18th, 2021.
  • Further, Fasta files were processed in Matlab-2021a for extracting unique T-ORF8 sequences for each continent.

2.1. Derivation of polar/non-polar sequences and associated phylogeny

  • Every amino acid in a given T-ORF8 sequence was classified as either polar (Q) or non-polar (P).
  • Thus, every unique T-ORF8 became a binary sequence over the two symbols P and Q.
  • The sequence homology of these binary sequences was then derived using the Clustal Omega web suite and used to build a nearest-neighbor phylogenetic relationship among the unique T-ORF8 variants.
  • Further, unique T-ORF8 variants having distinct binary polar/non-polar sequences were extracted [37, 38].

2.2. Frequency distribution of amino acids and phylogeny

  • The frequency of each amino acid present in a T-ORF8 sequence was determined using standard bioinformatics routine in Matlab-2021a.
  • For each T-ORF8 protein, a twenty-dimensional frequency vector was obtained from the frequencies of the twenty standard amino acids.
  • Several inferences were drawn from this frequency distribution of amino acids.
  • The Euclidean distance between the frequency vectors was calculated for each pair of T-ORF8 sequences.
  • From the resulting distance matrix, a phylogenetic relationship was derived using the nearest neighbor-joining method, with the standard routine in Matlab-2021a [39, 40].

2.5. Intrinsic disorder analysis

  • All 47 T-ORF8 variants were subjected to the per-residue disorder analysis, for which PONDR-VSL2 algorithm was employed [46].
  • This tool shows good performance on proteins containing both structure and disorder and was favorably ranked in a recent Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment [47].

2.6. Finding functional motifs

  • The Eukaryotic Linear Motif (ELM) resource (http://elm.eu.org/) was used for finding functional sites in proteins [48].
  • ELMs (also known as short linear motifs (SLiMs)), are short protein interaction sites, which are commonly found in intrinsically disordered regions of proteins and define a wide range of protein functionality.

3. Results

  • Continent-wise, all unique T-ORF8 protein variants were segregated from a set of available truncated ORF8 protein sequences collected from the NCBI database.
  • Further, variability and commonality of the unique T-ORF8 proteins were analyzed from various quantitative measures such as amino acid homology-based phylogeny, frequency distribution of amino acids and associated phylogeny, polarity sequence-based phylogeny, and physicochemical properties.
  • Relying on these features, a set of nine possible unique T-ORF8 variants were identified, which were found to lie within the likelihood of a T-ORF8 variant named P15 (Table 3).

3.1. Characteristics of the unique variants of T-ORF8

  • For each continent, the number of total sequences, the unique truncated ORF8 (T-ORF8) sequences and percentages are presented in Table 1.
  • Note that at the four positions 23, 25, 27, and 41, the amino acids Q and C were both truncated due to mutations at the first and third positions, respectively, of the respective codon.
  • There were 18 geo-locations, where the frequency of spread of the P15 variant was found to be less than 100 (Table 5).

3.2. Evaluation of intrinsic disorder content of 47 T-ORF8 proteins

  • The authors also analyzed the peculiarities of the distribution of per-residue intrinsic disorder predisposition within sequences of 47 T-ORF8 variants.
  • A kinase docking motif that mediates interaction towards the ERK1/2 and P38 subfamilies of MAP kinases and a Ser/Thr residue phosphorylated by the Plk1 kinase are present in 20 clusters, whereas 17 clusters also include a site for attachment of a fucose residue to a serine.

3.3.1. Polarity based variability of T-ORF8 variants

  • Each unique T-ORF8 variant possessed a binary polar/non-polar sequence, and a phylogenetic relationship was obtained based on the homology of these sequences.
  • The number of polar and non-polar residues in the unique T-ORF8 variants was found to be almost balanced (50-50 in percentage).
  • Note that, the P15 variant was placed in a single leaf and found to be distant from the other unique ORF8 variants as per polarity-based homology, although P15 was found to be the closest to the T-ORF8 variants P13 and P14 based on amino acid homology.

3.3.2. Variability of the frequency distribution of amino acids present in T-ORF8 variants

  • The frequency of each amino acid present in the unique T-ORF8 variants was enumerated, and consequently, a twenty-dimensional frequency vector was obtained (Table 8).
  • It was noted that the amino acids arginine, asparagine, aspartic acid, proline and tyrosine were absent in the T-ORF8 P15.
  • In the sequence P14 and P41, asparagine was present with frequency one.
  • It was found that the P15 variant is equidistant (1.41) from all other variants except P30 and P40 which were 1 distance apart from P15.

3.3.4. Molecular and physicochemical informatics of T-ORF8 unique variants

  • For each unique T-ORF8 variant and complete ORF8 protein, several physicochemical and molecular properties were computed using the web-servers as mentioned in section 2.4 (Table 10).
  • Figure 11: (A) Distance matrix of the property vectors of 45 T-ORF8 variants; (B) phylogenetic tree derived from the physicochemical properties.
  • Note that the property vectors of P20 and P30 were highly distant from that of other ORF8 variants due to the huge difference in the extinction coefficients (for P20, EC: 1490 and for P30, EC: 1615).
  • The distances of the property vectors of each of the 45 unique T-ORF8 variants from that of P15 are presented in Table 11.

3.4. Possible T-ORF8 variants in the likelihood of P15 variant

  • Based on the amino acid sequence homology and various other features, such as the frequency distribution of amino acids, Shannon entropy (SE), and physicochemical properties of T-ORF8 variants, a possible cluster of nine unique T-ORF8 variants was derived.
  • Note that the possible T-ORF8 variants were made of the set-theoretic union of the sets of possible T-ORF8 variants which were placed in the likelihood of P15 based on various quantitative measures mentioned in the result subsections.
  • All these nine unique T-ORF8 variants had unique polar/non-polar sequences as discussed in Table 7.

4. Discussion and Concluding Remarks

  • ORF8 is a 121-amino-acid, highly immunogenic SARS-CoV-2 protein with an Ig-like fold and two genotypes (orf8L and orf8S). It interacts with 47 human proteins, 15 of which are drug targets, and has been noted to interact with MHC-I molecules and significantly down-regulate their surface expression on various cell types [16, 49, 50].
  • It seems that ORF8 has only a minor impact, or none, on these activities and/or on the SARS-CoV-2 life cycle, since the virus can survive without functional ORF8, as indicated by the many mutations and truncations that have arisen in its gene and protein, as mentioned above [23, 54].
  • Quantitative characteristics of the 47 unique truncated ORF8 protein variants were examined.


An Issue of Concern: Unique Truncated ORF8 Protein Variants of SARS-CoV-2
Sk. Sarif Hassan (a), Vaishnavi Kodakandla (b), Elrashdy M. Redwan (c), Kenneth Lundstrom (d), Pabitra Pal Choudhury (e), Tarek Mohamed Abd El-Aziz (f), Kazuo Takayama (g), Ramesh Kandimalla (h), Amos Lal (i), Ángel Serrano-Aroca (j), Gajendra Kumar Azad (k), Alaa A. A. Aljabali (l), Giorgio Palu (m), Gaurav Chauhan (n), Parise Adadi (o), Murtaza Tambuwala (p), Adam M. Brufsky (q), Wagner Baetas-da-Cruz (r), Debmalya Barh (s), Nicolas G Bazan (t), Vladimir N. Uversky (u)
(a) Department of Mathematics, Pingla Thana Mahavidyalaya, Maligram, Paschim Medinipur, 721140, West Bengal, India
(b) Department of Life Sciences, Sophia College For Women, University of Mumbai, Bhulabhai Desai Road, Mumbai 400026, India
(c) Faculty of Science, Department of Biological Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia
(d) PanTherapeutics, Rte de Lavaux 49, CH1095 Lutry, Switzerland
(e) Indian Statistical Institute, Applied Statistics Unit, 203 B T Road, Kolkata 700108, India
(f) Department of Cellular and Integrative Physiology, University of Texas Health Science Center at San Antonio, 7703 Floyd Curl Dr, San Antonio, TX 78229-3900, USA, & Zoology Department, Faculty of Science, Minia University, El-Minia 61519, Egypt
(g) Center for iPS Cell Research and Application, Kyoto University, Kyoto 6068507, Japan
(h) Applied Biology, CSIR-Indian Institute of Chemical Technology, Uppal Road, Tarnaka, Hyderabad, 500007, Department of Biochemistry, Kakatiya Medical College, Warangal, Telangana, India
(i) Division of Pulmonary and Critical Care Medicine, Mayo Clinic, Rochester, Minnesota, USA
(j) Biomaterials and Bioengineering Lab, Centro de Investigación Traslacional San Alberto Magno, Universidad Católica de Valencia San Vicente Mártir, c/Guillem de Castro, 94, 46001 Valencia, Valencia, Spain
(k) Department of Zoology, Patna University, Patna, Bihar, India
(l) Department of Pharmaceutics and Pharmaceutical Technology, Yarmouk University, Faculty of Pharmacy, Irbid 566, Jordan
(m) Department of Molecular Medicine, University of Padova, Via Gabelli 63, 35121, Padova, Italy
(n) School of Engineering and Sciences, Tecnologico de Monterrey, Av. Eugenio Garza Sada 2501 Sur, 64849 Monterrey, Nuevo León, Mexico
(o) Department of Food Science, University of Otago, Faculty of Pharmacy, Dunedin 9054, New Zealand
(p) School of Pharmacy and Pharmaceutical Science, Ulster University, Coleraine BT52 1SA, Northern Ireland, UK
(q) University of Pittsburgh School of Medicine, Department of Medicine, Division of Hematology/Oncology, UPMC Hillman Cancer Center, Pittsburgh, PA, USA
(r) Translational Laboratory in Molecular Physiology, Centre for Experimental Surgery, College of Medicine, Federal University of Rio de Janeiro (UFRJ), Rio de Janeiro, Brazil
(s) Centre for Genomics and Applied Gene Technology, Institute of Integrative Omics and Applied Biotechnology (IIOAB), Nonakuri, Purba Medinipur, WB, India, & Departamento de Genética, Ecologia e Evolução, Instituto de Ciências Biológicas, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
(t) Neuroscience Center of Excellence, School of Medicine, LSU Health New Orleans, New Orleans, LA 70112, USA
(u) Department of Molecular Medicine, Morsani College of Medicine, University of South Florida, Tampa, FL 33612, USA
Keywords: SARS-CoV-2, Truncated ORF8 (T-ORF8), Mutations, Continents, B.1.1.7 lineage.
Corresponding author
Email addresses: sarimif@gmail.com (Sk. Sarif Hassan), vaishnavikodakandla13@gmail.com (Vaishnavi Kodakandla), lradwan@kau.edu.sa
(Elrashdy M. Redwan), lundstromkenneth@gmail.com (Kenneth Lundstrom), pabitrapalchoudhury@gmail.com (Pabitra Pal Choudhury),
mohamedt1@uthscsa.edu (Tarek Mohamed Abd El-Aziz), kazuo.takayama@cira.kyoto-u.ac.jp (Kazuo Takayama),
ramesh.kandimalla@gmail.com (Ramesh Kandimalla), manavamos@gmail.com (Amos Lal), angel.serrano@ucv.es (Ángel Serrano-Aroca),
gkazad@patnauniversity.ac.in (Gajendra Kumar Azad), alaaj@yu.edu.jo (Alaa A. A. Aljabali), giorgio.palu@unipd.it (Giorgio Palu),
gchauhan@tec.mx (Gaurav Chauhan), pariseadadi@gmail.com (Parise Adadi), m.tambuwala@ulster.ac.uk (Murtaza Tambuwala),
brufskyam@upmc.edu (Adam M. Brufsky), wagner.baetas@gmail.com (Wagner Baetas-da-Cruz), dr.barh@gmail.com (Debmalya Barh),
nbazan@lsuhsc.edu (Nicolas G Bazan), vuversky@usf.edu (Vladimir N. Uversky)
Submitted to Computational and Structural Biotechnology Journal, Elsevier May 26, 2021
bioRxiv preprint doi: https://doi.org/10.1101/2021.05.25.445557; this version posted May 26, 2021. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license.

1. Introduction
The world is going through an unprecedented time due to coronavirus disease 2019 (COVID-19), the causative agent of
which is severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1, 2, 3, 4, 5]. There are nine open
reading frames (ORFs) that encode accessory proteins important for the modulation of the metabolism in infected
host cells and for innate immunity evasion via a complicated signalome and interactome [6, 7, 8, 9, 10]. The ORF8 protein is
one of the most rapidly evolving accessory proteins among the beta coronaviruses, partly because of its ability to interfere with
host immune responses [11, 12, 13, 14]. It directly interacts with major histocompatibility complex class I (MHC-I) both
in vitro and in vivo and down-regulates it, which impairs antigen presentation and renders infected cells less
sensitive to lysis by cytotoxic T lymphocytes [15]. ORF8 suppresses type I interferon antiviral responses and interacts with
host factors involved in pulmonary inflammation and fibrogenesis [15, 16]. Among all viral proteins interacting with the human
metalloproteome, ORF8 interacts with 10 out of 58 [17]. ORF8 (of SARS-CoV-2 and SARS-CoV) plays crucial roles in
viral pathophysiological events; it dysregulates the TGF-β pathway, which is involved in tissue fibrosis [18]. The functional
implications of SARS-CoV-2 ORF8 have already gained considerable attention, and ORF8 is considered an important component of the
immune evasion machinery [11, 18, 19, 20]. The SARS-CoV-2 ORF8 protein has less than twenty percent amino acid sequence
homology with the SARS-CoV ORF8 and is a rapidly evolving protein [14, 21]. Structural analysis of the ORF8 protein has
provided a molecular framework for understanding the rapid evolution of ORF8, its contributions to COVID-19 pathogenesis,
and the potential for its neutralization by antibodies [22, 23]. The crosstalk between viral (SARS-CoV-2 or
SARS-CoV) infections and the host cell proteome at different levels may enable identification of distinct and common molecular
mechanisms [15]. Of note, SARS-CoV-2 ORF8 interacts with a significant number of host proteins related
to endoplasmic reticulum quality control, glycosylation, and extracellular matrix organization, although the mechanism of
action of ORF8 with respect to those interacting proteins remains uncertain so far [23, 24].
Clade S, a subtype of SARS-CoV-2, was found to carry the L84S mutation in the ORF8 protein sequence
[25, 26, 27]. Presently, among the many variants of SARS-CoV-2, the lineage B.1.1.7 carries a larger than usual number of
genetic changes [28, 29, 30]. Among many non-synonymous mutations, Q27STOP in the ORF8 protein helps to
define the branch leading to lineage B.1.1.7 [31, 32]. The Q27STOP mutation inactivates the ORF8 protein, favoring further
downstream mutations, and could be responsible for the increased transmissibility of the B.1.1.7 variant [28, 33]. The B.1.1.7
variant was found to be more transmissible than the wild-type SARS-CoV-2 and was first detected in September 2020 in
the UK [34, 35]. It began to spread rapidly by mid-December and is correlated with a significant increase in
SARS-CoV-2 infections in the UK and worldwide.
The functional implications of the truncation at position 27 for the immune surveillance of ORF8 remain unclear [18].
Thus, it is of utmost importance to gain insight into the functionality of the truncated ORF8 protein variants to comprehend
the B.1.1.7 lineage through theoretical and experimental characterization and genomic surveillance worldwide [36]. The
present study aimed to characterize the unique variations of truncated ORF8 proteins (T-ORF8) due to the Q27STOP
mutation. Further, this investigation singles out one T-ORF8 variant among the 47 unique T-ORF8 protein
variants present in SARS-CoV-2 worldwide as of May 20th, 2021. Several clusters of the unique T-ORF8 have been identified
based on various bioinformatics features and phylogenetic relationships, along with emerging variants of the unique T-ORF8.
2. Data acquisition and methods
Complete truncated ORF8 protein (T-ORF8) sequences from five continents (Asia, Africa, Europe, South America,
and North America) were downloaded in FASTA format (as of May 18, 2021) from the National Center for Biotechnology
Information (NCBI) database (http://www.ncbi.nlm.nih.gov/). Note that no T-ORF8 protein sequence was found from
Oceania as of May 18th, 2021. The FASTA files were then processed in Matlab-2021a to extract the unique T-ORF8 sequences
for each continent.
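The extraction of unique sequences was done in Matlab-2021a; the idea can be sketched in Python as follows. This is a minimal illustration, not the authors' routine: the function names `parse_fasta` and `unique_sequences` and the record headers are hypothetical, and the three toy sequences are copied from Table 3 (P15 twice, P14 once).

```python
from collections import Counter


def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def unique_sequences(records):
    """Count how many records share each distinct sequence."""
    return Counter(seq for _, seq in records)


# Toy input with placeholder headers (hypothetical, not real accession IDs)
example = """>record_1
MKFLVFLGIITTVAAFHQECSLQSCT
>record_2
MKFLVFLGIITTVAAFHQECSLQSCT
>record_3
MKFLVFLGIITTVAAFHQECSLQSCN
"""

counts = unique_sequences(parse_fasta(example))
for seq, n in counts.most_common():
    print(seq, n)
```

Running the same count per continent file would give the "Total" and "Unique" columns of a table such as Table 1.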
2.1. Derivation of polar/non-polar sequences and associated phylogeny
Every amino acid in a given T-ORF8 sequence was classified as either polar (Q) or non-polar (P). Thus, every unique T-ORF8
became a binary sequence over the two symbols P and Q. The sequence homology of these binary sequences was then derived using the
Clustal Omega web suite and used to build a nearest-neighbor phylogenetic relationship among the unique T-ORF8
variants. Further, unique T-ORF8 variants having distinct binary polar/non-polar sequences were extracted [37, 38].
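The binary encoding can be sketched in a few lines of Python. The text does not list which residues are treated as polar, so the split below (A, V, L, I, M, F, W, P, G as non-polar; the remaining eleven standard residues as polar) is an assumption chosen for illustration, and `to_polarity` is a hypothetical helper name. The sequences are P15, P14, and P13 from Table 3.

```python
# Assumed classification; the source does not specify the exact split
NON_POLAR = set("AVLIMFWPG")
POLAR = set("STNQYCDEKRH")


def to_polarity(seq):
    """Map an amino acid sequence to a P (non-polar) / Q (polar) string."""
    symbols = []
    for aa in seq:
        if aa in NON_POLAR:
            symbols.append("P")
        elif aa in POLAR:
            symbols.append("Q")
        else:
            # Ambiguous residues such as X have no defined polarity here
            raise ValueError(f"unclassified residue: {aa!r}")
    return "".join(symbols)


p15 = "MKFLVFLGIITTVAAFHQECSLQSCT"
p14 = "MKFLVFLGIITTVAAFHQECSLQSCN"  # T -> N at the last position
p13 = "MKFLVFLGIITTVAAFHQECSLQSCI"  # T -> I at the last position

print(to_polarity(p15))
print(to_polarity(p14) == to_polarity(p15))  # polar -> polar keeps the pattern
print(to_polarity(p13) == to_polarity(p15))  # polar -> non-polar changes it
```

Under this assumed split, two variants with different amino acid sequences can share one polarity sequence, which is why the set of distinct binary sequences can be smaller than the set of unique variants.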
2.2. Frequency distribution of amino acids and phylogeny
The frequency of each amino acid present in a T-ORF8 sequence was determined using a standard bioinformatics routine in
Matlab-2021a. For each T-ORF8 protein, a twenty-dimensional frequency vector was obtained from the frequencies of the twenty
standard amino acids. Several inferences were drawn from this frequency distribution of amino acids. The
Euclidean distance between the frequency vectors was calculated for each pair of T-ORF8 sequences.
From the resulting distance matrix, a phylogenetic relationship was derived using the nearest neighbor-joining method, with
the standard routine in Matlab-2021a [39, 40].
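As a small worked example of the frequency vector and the Euclidean distance, the following Python sketch compares two sequences taken from Table 3 (P15 and P14). It assumes that "frequency" means the raw count of each residue; the helper names and the alphabetical ordering of the twenty amino acids are illustrative choices, and the neighbor-joining step is not reproduced here.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty standard residues


def frequency_vector(seq):
    """Return the 20-dimensional vector of residue counts for a sequence."""
    return [seq.count(aa) for aa in AMINO_ACIDS]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


p15 = "MKFLVFLGIITTVAAFHQECSLQSCT"
p14 = "MKFLVFLGIITTVAAFHQECSLQSCN"

d = euclidean(frequency_vector(p15), frequency_vector(p14))
print(round(d, 2))  # 1.41
```

A single substitution between two standard residues lowers one count by one and raises another by one, so the distance is the square root of 2 (about 1.41). A substitution to the ambiguous residue X only lowers one count, giving a distance of 1.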

2.3. Amino acid conservation Shannon entropy
The degree of conservation of amino acids in a T-ORF8 protein was obtained using the well-known information-theoretic
measure called Shannon entropy (SE) [39, 41]. For a given T-ORF8 sequence of length l (here l = 26), the conservation
of amino acids was calculated as follows:
$$SE = -\sum_{i=1}^{20} p_{s_i} \log_{20}(p_{s_i})$$
where $p_{s_i} = k_i / l$ and $k_i$ represents the number of occurrences of an amino acid $s_i$ in the T-ORF8 sequence [42].
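The entropy computation can be sketched in Python as below. The sketch uses the conventional negative sign, so that SE is non-negative, and skips residues that do not occur (their term is taken as zero). Restricting the sum to the twenty standard letters and the function name `shannon_entropy` are assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the twenty standard residues


def shannon_entropy(seq, base=20):
    """Shannon entropy of the residue distribution, with logarithm base 20."""
    length = len(seq)
    total = 0.0
    for aa in AMINO_ACIDS:
        k = seq.count(aa)          # k_i: occurrences of residue s_i
        if k == 0:
            continue               # absent residues contribute nothing
        p = k / length             # p_{s_i} = k_i / l
        total += p * math.log(p, base)
    return -total


p15 = "MKFLVFLGIITTVAAFHQECSLQSCT"  # P15 from Table 3
se = shannon_entropy(p15)
print(se)
```

With base 20, the value lies between 0 (a sequence made of a single residue) and 1 (all twenty residues equally frequent), which makes values comparable across sequences.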
2.4. Prediction of molecular and physicochemical properties
Theoretical pI (PI), extinction coefficient (EC), instability index (II), aliphatic index (AI), protein solubility (PS), grand
average of hydropathicity (GRAVY), and the number of tiny, small, aliphatic, aromatic, non-polar, polar, charged, basic and
acidic residues of all unique T-ORF8 proteins were calculated using the web servers ProtParam, Protein-sol, and EMBOSS
Pepstats [43, 44, 45].
2.5. Intrinsic disorder analysis
All 47 T-ORF8 variants were subjected to the per-residue disorder analysis, for which PONDR-VSL2 algorithm was
employed [46]. This tool shows good performance on proteins containing both structure and disorder and was favorably
ranked in a recent Critical Assessment of protein Intrinsic Disorder prediction (CAID) experiment [47].
2.6. Finding functional motifs
The Eukaryotic Linear Motif (ELM) resource (http://elm.eu.org/) was used for finding functional sites in proteins [48].
ELMs, also known as short linear motifs (SLiMs), are short protein interaction sites that are commonly found in
intrinsically disordered regions of proteins and define a wide range of protein functionality.
3. Results
Continent-wise, all unique T-ORF8 protein variants were segregated from a set of available truncated ORF8 protein
sequences collected from the NCBI database. Further, variability and commonality of the unique T-ORF8 proteins were
analyzed from various quantitative measures such as amino acid homology-based phylogeny, frequency distribution of amino
acids and associated phylogeny, polarity sequence-based phylogeny, and physicochemical properties. Relying on these features,
a set of nine possible unique T-ORF8 variants were identified, which were found to lie within the likelihood of a T-ORF8
variant named P15 (Table 3).
3.1. Characteristics of the unique variants of T-ORF8
For each continent, the number of total sequences, the unique truncated ORF8 (T-ORF8) sequences and percentages are
presented in Table 1.
Table 1: Frequency and percentages of unique T-ORF8 variants (continent-wise)
Percentages of the unique T-ORF8 variants on continents
Continent        Total T-ORF8 (T)   Unique T-ORF8 (U)   Percentage, continent-wise   Percentage, worldwide
Africa           108                1                   0.926                        1.96
Asia             99                 1                   1.01                         1.96
Europe           156                1                   0.641                        1.96
South America    1                  1                   100                          1.96
North America    48691              47                  0.096                        92.16
Worldwide        49055              47                  0.104
The results showed that 47 unique T-ORF8 proteins were present in North America. The unique T-ORF8 variants from
Africa, Asia, Europe, and South America were contained in the set of unique T-ORF8 variants available in North America.
Additionally, there were seven T-ORF8 with amino acid lengths 22, 24, 40 and 41 as of May 18, 2021 available in North
America (Table 2).

Table 2: Truncated ORF8 variants of length other than 26
Accession ID   Length   Date of collection   Geo-location    Remarks
QQX22250.1     22       20-10-2020           USA: KS         Identical sequence
QQX22346.1     22       24-09-2020           USA: MO         Identical sequence
QVF74147.1     24       27-04-2021           USA: Colorado   Worldwide frequency: 01
QRE01295.1     40       13-12-2020           USA: MD         Worldwide frequency: 01
QQX21038.1     41       30-10-2020           USA: OK         Worldwide frequency: 01
QLJ58176.1     41       09-04-2020           USA             Identical sequence
QLJ58236.1     41       16-04-2020           USA             Identical sequence
Note that among the seven T-ORF8 sequences, only five were found to be unique as mentioned in Table 2. As of May
18, 2021 a single copy of the T-ORF8 proteins of amino acid lengths of 24 and 41 (Table 2) were found. There were two
T-ORF8 variants of 41 amino acids available in North America. The most frequent T-ORF8 proteins so far observed were
the T-ORF8 proteins of 26 amino acids. It was observed that the T-ORF8 arose due to truncation at the residue positions
23, 25, 27, 41, and 42 of the complete ORF8 protein (121 aa long sequence). We investigated the possible mutations for
such truncations. A snapshot of the amino acid residues and their possible mutations with respect to the reference sequence
NC_045512 is presented in Figure 1.
Figure 1: Possible mutations for truncation at residue positions 23, 25, 27, 41, and 42 of the ORF8 protein (NC_045512) of SARS-CoV-2.
Note that at the four positions 23, 25, 27, and 41, the amino acids Q and C were both truncated due to mutations at the first and
third positions, respectively, of the respective codon. The amino acid valine (V) was truncated due to three mutations, at
the third, second, and first positions of the codon 'GUG'. Furthermore, it was observed that the mutations at positions
23 and 27 were identical (C to U) and that these base changes were transition mutations, i.e., pyrimidine (purine) to pyrimidine
(purine), whereas the base changes of the truncating mutations at positions 25 and 41 were transversion mutations, i.e.,
pyrimidine (purine) to purine (pyrimidine). For position 42, three mutations were hypothesized, taking place
at the third, second, and first positions of the codon (GUG), i.e., a transition (purine to purine), a transversion
(pyrimidine to purine), and a transversion (purine to pyrimidine), respectively.
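The distinction between the two mutation classes, and the search for single-base changes that create a stop codon, can be sketched in Python. The codons used below (CAA for glutamine, UGU and UGC for cysteine, GUG for valine) are illustrative; apart from GUG, the actual reference codons at each position are not stated in the text and are assumed here. The function names are hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def mutation_type(ref, alt):
    """Classify a base change as a transition or a transversion."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def single_step_stops(codon):
    """List single-base changes that turn a codon into a stop codon.

    Each hit is (codon position, ref base, alt base, mutation type, stop codon).
    """
    hits = []
    for pos in range(3):
        for alt in "ACGU":
            if alt == codon[pos]:
                continue
            mutant = codon[:pos] + alt + codon[pos + 1:]
            if mutant in STOP_CODONS:
                kind = mutation_type(codon[pos], alt)
                hits.append((pos + 1, codon[pos], alt, kind, mutant))
    return hits


for codon in ("CAA", "UGU", "UGC", "GUG"):
    print(codon, single_step_stops(codon))
```

For the glutamine codon the only route to a stop is C to U at the first position (a transition), and for either cysteine codon it is a change to A at the third position (a transversion). No single change turns GUG into a stop codon, which is consistent with more than one mutation being needed at the valine position.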

The list of unique T-ORF8 sequences of 26 amino acids with their representative accession IDs and sequences is presented
in Table 3.
Table 3: List of unique truncated ORF8 proteins and their representative accession IDs
Unique variants of truncated ORF8 proteins (worldwide)
Serial Name Representative Accession ID Unique T-ORF8 Sequence
P1 QVD87830.1 MKFHVFLGIITTVAAFHQECSLQSCT
P2 QUP01097.1 MKFLIFLGIITTVAAFHQECSLQSCT
P3 QUG18382.1 MKFLVFFGIITTVAAFHQECSLQSCT
P4 QVD86462.1 MKFLVFLEIITTVAAFHQECSLQSCT
P5 QVH28344.1 MKFLVFLGIIATVAAFHQECSLQSCT
P6 QVH31850.1 MKFLVFLGIIITVAAFHQECSLQSCT
P7 QVG09588.1 MKFLVFLGIIKTVAAFHQECSLQSCT
P8 QUM37110.1 MKFLVFLGIITPVAAFHQECSLQSCT
P9 QUW14113.1 MKFLVFLGIITTIAAFHQECSLQSCT
P10 QTZ13340.1 MKFLVFLGIITTLAAFHQECSLQSCT
P11 QUR40000.1 MKFLVFLGIITTVAAFHQDCSLQSCT
P12 QVG91448.1 MKFLVFLGIITTVAAFHQECSLQLCT
P13 QUM45811.1 MKFLVFLGIITTVAAFHQECSLQSCI
P14 QUU32993.1 MKFLVFLGIITTVAAFHQECSLQSCN
P15 QVG81736.1 MKFLVFLGIITTVAAFHQECSLQSCT
P16 QUD51009.1 MKFLVFLGIITTVAAFHQECSLQSFT
P17 QTS70520.1 MKFLVFLGIITTVAAFHQECSLQSRT
P18 QUU23055.1 MKFLVFLGIITTVAAFHQECSLQSST
P19 QVE77971.1 MKFLVFLGIITTVAAFHQERSLQSCT
P20 QTW55152.1 MKFLVFLGIITTVAAFHQEYSLQSCT
P21 QUS70793.1 MKFLVFLGIITTVAAFHQGCSLQSCT
P22 QUQ10187.1 MKFLVFLGIITTVAAFRQECSLQSCT
P23 QVH12765.1 MKFLVFLGIITTVAAFYQECSLQSCT
P24 QVH15024.1 MKFLVFLGIITTVAALHQECSLQSCT
P25 QVE01821.1 MKFLVFLGIITTVAASHQECSLQSCT
P26 QUX49158.1 MKFLVFLGIITTVAAVHQECSLQSCT
P27 QTJ05015.1 MKFLVFLGIITTVAVFHQECSLQSCT
P28 QUW13574.1 MKFLVFLGIITTVSAFHQECSLQSCT
P29 QVG29748.1 MKFLVFLGIITTVTAFHQECSLQSCT
P30 QUV63981.1 MKFLVFLGIIXTVAAFHQECSLQSCT
P31 QVE38306.1 MKFLVFLGITTTVAAFHQECSLQSCT
P32 QUX43061.1 MKFLVFLGTITTVAAFHQECSLQSCT
P33 QVH27673.1 MKFLVFLRIITTVAAFHQECSLQSCT
P34 QUL63530.1 MKFLVLLGIITTVAAFHQECSLQSCT
P35 QVE29502.1 MKLLVFLGIITTVAAFHQECSLQSCT
P36 QUV44185.1 MKSLVFLGIITTVAAFHQECSLQSCT
P37 QVH05963.1 MKFLVFLGIITTAAAFHQECSLQSCT
P38 QVD85995.1 MKFLVFLGIITTVAAFDQECSLQSCT
P39 QVD91055.1 MKFLVFLGIITTVAAFHQECSLRSCT
P40 QVI12553.1 MKFLVFLGIITTVAAFHQXCSLQSCT
P41 QVG37762.1 MKFLVFLGIITTVAAFNQECSLQSCT
P42 QVG91352.1 MKFLVFLGIITTVATFHQECSLQSCT
P43 QUX48812.1 MKFLVFLGIITTVVAFHQECSLQSCT
P44 QVE28267.1 MKFLVFLGIMTTVAAFHQECSLQSCT
P45 QVH31598.1 MKFLVFLVIITTVAAFHQECSLQSCT
P46 QVG23542.1 MKILVFLGIITTVAAFHQECSLQSCT
P47 QVF67630.1 MKFFVFLGIITTVAAFHQECSLQSCT
Further, it was found that the unique T-ORF8 variants from Africa, Asia, Europe and South America were identical to
P15, as listed in Table 3.
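To see how an individual row of Table 3 relates to P15, the sequences can be compared position by position. The Python sketch below is illustrative only; `substitutions` is a hypothetical helper, and the two comparison sequences are P1 and P14 as listed in Table 3.

```python
def substitutions(ref, seq):
    """Return (1-based position, ref residue, variant residue) for each mismatch."""
    if len(ref) != len(seq):
        raise ValueError("sequences must have equal length")
    return [(i + 1, a, b) for i, (a, b) in enumerate(zip(ref, seq)) if a != b]


p15 = "MKFLVFLGIITTVAAFHQECSLQSCT"
p1 = "MKFHVFLGIITTVAAFHQECSLQSCT"
p14 = "MKFLVFLGIITTVAAFHQECSLQSCN"

print(substitutions(p15, p1))   # [(4, 'L', 'H')]
print(substitutions(p15, p14))  # [(26, 'T', 'N')]
```

An empty list means the sequence is identical to P15, which is the case reported for the variants from Africa, Asia, Europe, and South America.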
The date of sample collection, geo-location and accession ID of the first identified SARS-CoV-2 containing unique T-ORF8
variants are presented in Table 4.

Frequently Asked Questions (2)
Q1. What are the contributions in "An Issue of Concern: Unique Truncated ORF8 Protein Variants of SARS-CoV-2"?

In the present study, 47 unique truncated ORF8 proteins (T-ORF8) due to the Q27STOP mutations were identified among 49055 available B.1.1.7 SARS-CoV-2 sequences.

In Colorado, one T-ORF8 variant 24 amino acids long was noticed very recently, on April 24, 2021, and this variant is likely to spread further in the future. After Europe, Maryland was the first US state to report the B.1.1.7 variant carrying T-ORF8 P15; although this strain later remained limited in Maryland, it spread to other states, such as Florida and Minnesota (Table 5). A systematic analysis of its peptide map is needed to determine the effects of these mutations/truncations on the diagnostic potential of anti-ORF8 antibodies.