
Genomic epidemiology of a densely sampled COVID19 outbreak in China

TL;DR: An analysis of 20 whole SARS-CoV 2 genomes from a single relatively small and geographically constrained outbreak in Weifang, People's Republic of China finds that these estimates are consistent with reported cases and there is unlikely to be a large undiagnosed burden of infection over the period the authors studied.
Abstract: Analysis of genetic sequence data from the pandemic SARS Coronavirus 2 can provide insights into epidemic origins, worldwide dispersal, and epidemiological history. With few exceptions, genomic epidemiological analysis has focused on geographically distributed data sets with few isolates in any given location. Here we report an analysis of 20 whole SARS-CoV-2 genomes from a single relatively small and geographically constrained outbreak in Weifang, People's Republic of China. Using Bayesian model-based phylodynamic methods, we estimate the reproduction number for the outbreak to be 1.99 (95% CI: 1.48-3.14). We further estimate the number of infections through time and compare these estimates to confirmed diagnoses by the Weifang Centers for Disease Control. We find that these estimates are consistent with reported cases and there is unlikely to be a large undiagnosed burden of infection over the period we studied.

Summary (3 min read)

1. Introduction

  • The authors report a genomic epidemiological analysis of one of the first geographically concentrated community transmission samples of SARS-CoV-2 genetic sequences collected outside of the initial outbreak in Wuhan, China.
  • These data comprise 20 whole-genome sequences from confirmed COVID-19 cases in Weifang, Shandong Province, People’s Republic of China.
  • The data were collected over the course of several weeks up to 10 February 2020, and overlap with a period of intensifying public health and social distancing measures.
  • In contrast to the early spread of COVID-19 in Hubei Province of China, most community transmissions within Weifang took place after these measures were put in place.
  • Using an adaptation of these methods, and based on the local genetic data available, the objective of this study is to evaluate the growth rate and reproduction number in Weifang after seeding events that took place in mid to late January, 2020.

2.1 Epidemiological investigation, sampling and genetic sequencing

  • As of 10 February 2020, Weifang Center for Disease Control and Prevention had identified 136 suspected cases and 214 close contacts; of these, 38 cases were confirmed positive for SARS-CoV-2.
  • Concentration of RNA samples was measured by the Qubit RNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA).
  • The remaining RNA was used to construct the single-stranded circular DNA library with the MGIEasy RNA Library preparation reagent set (MGI, Shenzhen, China).
  • The Coronaviridae-like reads of samples with >100× average sequencing depth across the SARS-CoV-2 genome were subsampled to achieve 100× sequencing depth before being assembled.

2.2.1 Nonlinear epidemiological dynamics in Weifang

  • The maximum number of daily confirmed COVID-19 cases occurred on February 5, but it is unknown when the maximum prevalence of infection occurred.
  • To capture a nonlinear decrease in cases following epidemic peak, and to account for a realistic distribution of generation times, the authors use an extension of the susceptible-exposed-infectious-recovered (SEIR) model (Keeling and Rohani 2011) for epidemic dynamics in Weifang, shown in Equations (1–5).

2.2.2 Variance in transmission rates

  • To estimate total numbers infected, the phylodynamic model must account for epidemiological variables which are known to significantly influence genetic diversity (Lloyd-Smith et al. 2005).
  • Foremost among these is the variance in offspring distribution (number of transmissions per primary case).
  • High variance of transmission rates will reduce genetic diversity of a sample and failure to account for this factor will lead to highly biased estimates of epidemic size (Li et al. 2017).
  • The authors therefore elaborate the SEIR model with an additional compartment J which has a higher transmission rate (s -fold higher) than the I compartment.
  • Upon leaving the incubation period, individuals progress to the J compartment with probability ph, or otherwise to I.

2.2.3 Importation of lineages from Wuhan

  • The outbreak in Weifang was seeded by multiple lineages imported at various times from the rest of China.
  • The equation governing this population is \(\dot{Y}(t) = (q - l)\,Y(t)\) (Equation 6). Migration only depends on the size of variables in the Weifang compartment and thus does not influence epidemic dynamics; it will only influence the inferred probability that a lineage resides within Weifang.

2.2.4 Model fitting

  • Other parameters are fixed based on prior information.
  • This dispersion is similar to values estimated for the 2003 SARS epidemic (Lloyd-Smith et al. 2005).
  • The SEIR model dynamics begin on 10 January.
  • An exploration of this parameterisation is discussed in Supplementary Information Section 4.1.

3. Phylogenetic analysis

  • Using MAFFT (Katoh and Standley 2013), the authors aligned the 20 Weifang sequences with a previous alignment of 57 non-identical SARS-CoV-2 sequences from outside of Weifang, hereafter the 'reference set' (Volz et al. 2020).
  • The distribution of sample dates from inside and outside of Weifang is shown in Fig. 1B.
  • Bayesian phylogenetic analysis was carried out using BEAST 2.6.1 (Bouckaert et al. 2019) with a HKY+G4 substitution model and a strict molecular clock.
  • In order to demonstrate the added utility of the sequence data, the analysis was repeated assuming a constant likelihood, that is sampling only from the prior probability distributions.
  • Code to replicate this analysis and BEAST XML files can be found at https://github.com/emvolz/weifang-sarscov2.

4. Results

  • The number of cases confirmed by Weifang CDC shows that the outbreak peaked early and that the maximum number of cases occurred on 5 February.
  • The authors estimate the peak of daily infections in late January, preceding the time series of confirmed cases by about a week; this is expected due to delays from infection to appearance of symptoms and delays from symptoms to diagnosis.
  • The authors detect a significant decrease in effective reproduction number as the epidemic progressed, during a period (late January) when Weifang was implementing a variety of public health interventions and contact tracing to limit epidemic spread.
  • There is correspondingly low confidence in tree topology (Supplementary Fig. S3), and only two monophyletic Weifang clades had greater than 50 per cent posterior probability, neither of which is larger than two samples.
  • These dates cover a similar range as the posterior TMRCA of all Weifang sequences (Supplementary Fig. S4).

5. Discussion

  • The authors' analysis of 20 SARS-CoV-2 genomes has confirmed independent observations regarding the rate of spread and burden of infection in Weifang, China.
  • Analysis of genetic sequence data provides an alternative source of information about epidemic size.
  • The authors do not find evidence for a large hidden burden of infection within Weifang, with an estimated total number of cases around 365 (102–1174) at the date of last sample, towards the end of the outbreak.
  • (B) Daily estimated infections through time compared to daily reported cases (yellow points).


Genomic epidemiology of a densely sampled COVID-19
outbreak in China
Lily Geidelberg,1,* Olivia Boyd,1 David Jorgensen,1 Igor Siveroni,1,** Fabrícia F. Nascimento,1 Robert Johnson,1 Manon Ragonnet-Cronin,1 Han Fu,1 Haowei Wang,1 Xiaoyue Xi,2 Wei Chen,3 Dehui Liu,3 Yingying Chen,3 Mengmeng Tian,3 Wei Tan,4 Junjie Zai,5 Wanying Sun,6 Jiandong Li,6 Junhua Li,6 Erik M. Volz,1,*,†† Xingguang Li,7,*,†,‡‡ and Qing Nie,3,*,‡,‡‡
1 Department of Infectious Disease Epidemiology and MRC Centre for Global Infectious Disease Analysis, Imperial College London, Norfolk Place W2 1PG, UK
2 Department of Mathematics, Imperial College London, London SW7 2AZ, UK
3 Department of Microbiology, Weifang Center for Disease Control and Prevention, Weifang 261061, China
4 Department of Respiratory Medicine, Weifang People’s Hospital, Weifang 261061, China
5 Immunology Innovation Team, School of Medicine, Ningbo University, Ningbo 315211, China
6 Shenzhen Key Laboratory of Unknown Pathogen Identification, BGI-Shenzhen, Shenzhen 518083, China
7 Department of Hospital Office, The First People’s Hospital of Fangchenggang, Fangchenggang, 538021, China
*Corresponding author: l.geidelberg@imperial.ac.uk (LG); e.volz@imperial.ac.uk (EV); nieqing0454@163.com (QN); xingguanglee@hotmail.com (XL)
Present address: Department of Hospital Office, The First People’s Hospital of Fangchenggang, No. 23, Fangqin Road, Fangchenggang, 538021, China.
Present address: Hubei Engineering Research Center of Viral Vector, Wuhan University of Bioengineering, Wuhan, 430415, China.
§ https://orcid.org/0000-0002-8057-1844
** https://orcid.org/0000-0003-2595-3062
†† https://orcid.org/0000-0001-6268-8937
‡‡ These authors contributed equally to this work.
Abstract
Analysis of genetic sequence data from the SARS-CoV-2 pandemic can provide insights into epidemic origins, worldwide dispersal, and epidemiological history. With few exceptions, genomic epidemiological analysis has focused on geographically distributed data sets with few isolates in any given location. Here, we report an analysis of 20 whole SARS-CoV-2 genomes from a single relatively small and geographically constrained outbreak in Weifang, People’s Republic of China. Using Bayesian model-based phylodynamic methods, we estimate a mean basic reproduction number (R_0) of 3.4 (95% highest posterior density interval: 2.1–5.2) in Weifang, and a mean effective reproduction number (R_t) that falls below 1 on 4 February. We further estimate the number of infections through time and compare these estimates to confirmed diagnoses by the Weifang Centers for Disease Control. We find that these estimates are consistent with reported cases and there is unlikely to be a large undiagnosed burden of infection over the period we studied.
© The Author(s) 2021. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/),
which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Virus Evolution, 2021, 7(1): veaa102
doi: 10.1093/ve/veaa102
Research Article
Downloaded from https://academic.oup.com/ve/article/7/1/veaa102/6170691 by Imperial College London Library user on 29 March 2021

Key words: SARS-CoV-2; phylodynamics; phylogenetics; genetic epidemiology; structured coalescent; modelling.
1. Introduction
We report a genomic epidemiological analysis of one of the first geographically concentrated community transmission samples of SARS-CoV-2 genetic sequences collected outside of the initial outbreak in Wuhan, China. These data comprise 20 whole-genome sequences from confirmed COVID-19 cases in Weifang, Shandong Province, People’s Republic of China. The data were collected over the course of several weeks up to 10 February 2020, and overlap with a period of intensifying public health and social distancing measures. These interventions included public health messaging, establishing phone hot-lines, encouraging home isolation for recent visitors from Wuhan (January 23–26), optimising triage of suspected cases in hospitals (January 24), travel restrictions (January 26), extending school closures, and establishing ‘fever clinics’ for consultation and diagnosis (January 27) (Mao 2020). In contrast to the early spread of COVID-19 in Hubei Province of China, most community transmissions within Weifang took place after these measures were put in place.
Model-based phylodynamic methods have been previously used to analyse sequence data from Wuhan and exported international cases (Volz et al. 2020). Using an adaptation of these methods, and based on the local genetic data available, the objective of this study is to evaluate the growth rate and reproduction number in Weifang after seeding events that took place in mid to late January, 2020. A secondary aim is to provide estimates of the epidemiological trajectory of the Weifang outbreak and to compare them to confirmed diagnosed COVID-19 cases reported by Weifang Centers for Disease Control (CDC), in order to explore whether there was a significant unmeasured burden of infection due to imperfect case ascertainment from mild or asymptomatic illness.
2. Methods and materials
2.1 Epidemiological investigation, sampling and genetic
sequencing
As of 10 February 2020, Weifang Center for Disease Control and Prevention had identified 136 suspected cases and 214 close contacts; of these, 38 cases were confirmed positive for SARS-CoV-2. The median age of patients was 36 (range: 6–75). Two of twenty patients suffered severe or critical illness.
Viral RNA was extracted using the Maxwell 16 Viral Total Nucleic Acid Purification Kit (Promega AS1150) with the magnetic bead method, and the RNeasy Mini Kit (QIAGEN 74104) with the column method. Quantitative reverse transcription polymerase chain reaction (RT-qPCR) was carried out using the 2019 novel coronavirus nucleic acid detection kit (BioGerm, Shanghai, China) to confirm the presence of SARS-CoV-2 viral RNA, with cycle threshold (Ct) values ranging from 17 to 34, targeting the highly conserved region (ORF1ab/N gene) in the SARS-CoV-2 genome.
Concentration of RNA samples was measured by the Qubit RNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA). The enzyme DNase was used to remove host DNA. The remaining RNA was used to construct the single-stranded circular DNA library with the MGIEasy RNA Library preparation reagent set (MGI, Shenzhen, China). Purified RNA was then fragmented. Using these short fragments as templates, random hexamers were used to synthesise the first-strand cDNA and then the second strand. Using the short double-strand DNA, a DNA library was constructed through end repair, adaptor ligation and PCR amplification. PCR products were transformed into a single-strand circular DNA library through DNA denaturation and circularisation. DNA nanoballs (DNBs) were generated with the single-strand circular DNA library by rolling circle replication. The DNBs were loaded into the flow cell and paired-end 100 bp sequencing was performed on the DNBSEQ-T7 platform (MGI, Shenzhen, China). Twenty genomes were assembled, with lengths from 26,840 to 29,882 nucleotides.
Total reads were first processed using Kraken v0.10.5 (default parameters) with a self-built database of Coronaviridae genomes (including SARS, MERS, and SARS-CoV-2 genome sequences downloaded from GISAID, NCBI, and CNGB) to identify Coronaviridae-like reads. To remove low-quality reads, duplications and adaptor contaminations, fastp v0.19.5 (parameters: -q 20 -u 20 -n 1 -l 50) and SOAPnuke v1.5.6 (parameters: -l 20 -q 0.2 -E 50 -n 0.02 -5 0 -Q 2 -G -d) were used. The Coronaviridae-like reads of samples with <100× average sequencing depth were directly assembled de novo with SPAdes v3.14.0 using default settings. The Coronaviridae-like reads of samples with >100× average sequencing depth across the SARS-CoV-2 genome were subsampled to achieve 100× sequencing depth before being assembled.
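The depth-capping step can be illustrated with a short sketch. The text does not say which tool, sampling scheme or random seed was used for subsampling, so the helper name `subsample_to_depth`, the uniform sampling without replacement and the seed below are assumptions made purely for illustration. The idea is only that a sample whose average depth exceeds the target keeps roughly target/observed of its reads.

```python
import random


def subsample_to_depth(reads, mean_depth, target_depth=100.0, seed=0):
    """Randomly keep about target_depth / mean_depth of the reads.

    Hypothetical helper: uniform subsampling without replacement is an
    assumption, not the procedure documented by the authors.
    """
    if mean_depth <= target_depth:
        # Low-coverage samples are assembled from all of their reads.
        return list(reads)
    reads = list(reads)
    keep = round(len(reads) * target_depth / mean_depth)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return rng.sample(reads, keep)
```

For example, 1,000 reads from a sample with an average depth of 400 would be reduced to 250 reads, while a sample at depth 80 would be passed through unchanged.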
The 20 Weifang sequences have mean 1.1 per cent N content
and are deposited in GISAID (gisaid.org).
2.2 Mathematical model
The phylodynamic model is designed to account for (1) nonlinear epidemic dynamics in Weifang with a realistic course of infection (incubation and infectious periods), (2) variance in transmission rates that can influence epidemic size estimates, and (3) migration of lineages in and out of Weifang.
2.2.1 Nonlinear epidemiological dynamics in Weifang
The maximum number of daily confirmed COVID-19 cases occurred on February 5, but it is unknown when the maximum prevalence of infection occurred. To capture a nonlinear decrease in cases following epidemic peak, and to account for a realistic distribution of generation times, we use an extension of the susceptible-exposed-infectious-recovered (SEIR) model (Keeling and Rohani 2011) for epidemic dynamics in Weifang, shown in Equations (1–5).
2.2.2 Variance in transmission rates
To estimate total numbers infected, the phylodynamic model must account for epidemiological variables which are known to significantly influence genetic diversity (Lloyd-Smith et al. 2005). Foremost among these is the variance in offspring distribution (number of transmissions per primary case). We draw on evidence from the earlier SARS epidemic, which indicates that the offspring distribution is highly over-dispersed. High variance of transmission rates will reduce genetic diversity of a sample and failure to account for this factor will lead to highly biased estimates of epidemic size (Li et al. 2017). Recent analyses of sequence data drawn primarily from Wuhan have found that high over-dispersion was required for estimated cases to be consistent with the epidemiological record (Volz et al. 2020). Models assuming low variance in transmission rates between people would generate estimates of cases that are lower than the known number of confirmed cases. Separately, Endo (2020) found that high over-dispersion is required to reconcile estimated reproduction numbers with the observed frequency of international outbreaks. We therefore elaborate the SEIR model with an additional compartment J which has a higher transmission rate (s-fold higher) than the I compartment.
The variance of the implied offspring distribution is calibrated to give a similar over-dispersion to that of the SARS epidemic. Upon leaving the incubation period, individuals progress to the J compartment with probability p_h, or otherwise to I. The model is implemented as a system of ordinary differential equations:
\[ \dot{S}(t) = -\bigl(b\,I(t) + b\,s\,J(t)\bigr)\,\frac{S(t)}{S(t)+E(t)+I(t)+J(t)+R(t)} \tag{1} \]
\[ \dot{E}(t) = \bigl(b\,I(t) + b\,s\,J(t)\bigr)\,\frac{S(t)}{S(t)+E(t)+I(t)+J(t)+R(t)} - c_0\,E(t) \tag{2} \]
\[ \dot{I}(t) = c_0\,(1 - p_h)\,E(t) - c_1\,I(t) \tag{3} \]
\[ \dot{J}(t) = c_0\,p_h\,E(t) - c_1\,J(t) \tag{4} \]
\[ \dot{R}(t) = c_1\,\bigl(I(t) + J(t)\bigr) \tag{5} \]
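To make the structure of Equations (1–5) concrete, the following is a minimal numerical sketch rather than the authors' implementation (the analysis itself was run in PhyDyn). It integrates the five compartments with a hand-written fourth-order Runge-Kutta step. The function names are hypothetical; c0, c1, p_h and s use the fixed values quoted in Section 2.2.4, while the value of b (taken here as a per-day rate) and the initial state are placeholders chosen for illustration, not fitted estimates.

```python
def seijr_rhs(state, b, s, c0, c1, p_h):
    """Right-hand side of Equations (1-5) for state (S, E, I, J, R)."""
    S, E, I, J, R = state
    N = S + E + I + J + R
    new_infections = (b * I + b * s * J) * S / N
    return (
        -new_infections,                # dS/dt
        new_infections - c0 * E,        # dE/dt
        c0 * (1.0 - p_h) * E - c1 * I,  # dI/dt
        c0 * p_h * E - c1 * J,          # dJ/dt
        c1 * (I + J),                   # dR/dt
    )


def rk4_step(state, dt, **params):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = seijr_rhs(state, **params)
    k2 = seijr_rhs(shifted(k1, dt / 2), **params)
    k3 = seijr_rhs(shifted(k2, dt / 2), **params)
    k4 = seijr_rhs(shifted(k3, dt), **params)
    return tuple(
        x + dt / 6 * (d1 + 2 * d2 + 2 * d3 + d4)
        for x, d1, d2, d3, d4 in zip(state, k1, k2, k3, k4)
    )


def simulate(state0, days, dt=0.05, **params):
    """Integrate from state0 for `days` days and return every state visited."""
    steps = int(round(days / dt))
    trajectory = [tuple(state0)]
    for _ in range(steps):
        trajectory.append(rk4_step(trajectory[-1], dt, **params))
    return trajectory


# Illustrative values only: b and the initial state are assumed placeholders.
PARAMS = dict(b=0.06, s=74.0, c0=1 / 4.1, c1=1 / 3.8, p_h=0.20)
trajectory = simulate((550.0, 5.0, 0.0, 0.0, 0.0), days=30, **PARAMS)
```

Because the five derivatives sum to zero, the total S + E + I + J + R stays constant along the trajectory, which is a quick sanity check on any implementation of these equations.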
2.2.3 Importation of lineages from Wuhan
The outbreak in Weifang was seeded by multiple lineages imported at various times from the rest of China. We therefore account for location of sampling in our model. Migration is modelled as a bi-directional process with rates proportional to epidemic size in Weifang. The larger reservoir of COVID-19 cases outside of Weifang (Y(t)) serves as a source of new infections and is assumed to be growing exponentially (at rate q) over this time period.
The equation governing this population is:
\[ \dot{Y}(t) = (q - l)\,Y(t) \tag{6} \]
Migration only depends on the size of variables in the Weifang compartment and thus does not influence epidemic dynamics; it will only influence the inferred probability that a lineage resides within Weifang. For compartment X (E, I, or J), g is the per-lineage rate of migration out of Weifang, and the total rate of migration in and out of Weifang is gX.
2.2.4 Model fitting
During phylodynamic model fitting, g, b and q are estimated. Additionally, we estimate initial sizes of Y, E, and S. Initial values of I, J, and R are fixed at 0. Other parameters are fixed based on prior information. We fix 1/c_0 = 4.1 days and 1/c_1 = 3.8 days (Volz et al. 2020). We set p_h = 0.20 and s = 74, which yields a dispersion of the reproduction number that matches a negative binomial distribution with k = 0.124 for any value of R_0 between 2 and 5. This dispersion is similar to values estimated for the 2003 SARS epidemic (Lloyd-Smith et al. 2005).
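One way to see where a value close to k = 0.124 could come from is a moment-matching calculation. The sketch below rests on assumptions that are not stated in the text: infectious periods are exponentially distributed, transmissions occur as a Poisson process at a rate that is s-fold higher in J than in I, and k is obtained by equating the offspring variance to the negative binomial form R_0 + R_0^2/k. The function name is hypothetical. Under these assumptions k depends only on p_h and s, which would explain why it holds for any value of R_0.

```python
def implied_dispersion(p_h, s):
    """Moment-matched negative binomial k for the two-class (I/J) model.

    With mean offspring m for class I and s*m for class J, each class has
    geometric offspring counts (variance m + m**2), so the mixture variance
    is R0 + 2*E[m**2] - R0**2 and k = R0**2 / (variance - R0).
    The common factor m**2 cancels, so m is set to 1 here.
    """
    mean = (1.0 - p_h) + p_h * s               # R0 in units of m
    second_moment = (1.0 - p_h) + p_h * s**2   # E[m**2] in units of m**2
    return mean**2 / (2.0 * second_moment - mean**2)


k = implied_dispersion(p_h=0.20, s=74.0)
```

With p_h = 0.20 and s = 74 this gives k of about 0.125, close to the quoted 0.124; the small gap may reflect rounding or a different calibration route in the original analysis.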
The phylodynamic model is illustrated in Fig. 1A as a flow-
chart. The SEIR model dynamics begin on 10 January.
It is important to note that the S compartment does not explicitly represent the number of susceptibles in Weifang, but rather it is used as a simple parameterisation to permit R_t to decrease, and for epidemic control to be achieved. An exploration of this parameterisation is discussed in Supplementary Information Section 4.1.
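For reference, the dependence of R_t on S in this parameterisation can be written out. The expressions below are a sketch derived from Equations (1–5) by the usual next-generation argument, using the symbols of Section 2.2.2; they are not quoted from the paper or its supplement.

```latex
\[
R_0 = \frac{b\,(1 - p_h + p_h\,s)}{c_1},
\qquad
R_t = R_0\,\frac{S(t)}{S(t)+E(t)+I(t)+J(t)+R(t)}
\]
```

Under this reading, R_t falls below 1 once the S fraction of the modelled population drops below 1/R_0, which is why the initial size of S acts as a control parameter rather than a census of susceptible people.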
3. Phylogenetic analysis
Using MAFFT (Katoh and Standley 2013), we aligned the 20 Weifang sequences with a previous alignment of 57 non-identical SARS-CoV-2 sequences from outside of Weifang, hereafter the 'reference set' (Volz et al. 2020). The reference set was sampled from the GISAID database (Elbe and Buckland-Merrett 2017) downloaded on June 7, 2020, and explicitly included close genetic matches to sequences from Weifang. An upper bound of 1 May was placed on the date of sampling. The distribution of sample dates from inside and outside of Weifang is shown in Fig. 1B. Of the 57 sequences in the reference set, 20 (35%) were sampled from China.
Maximum likelihood analysis was carried out using IQTree (Minh et al. 2019) with a HKY+G4 substitution model, and a time-scaled tree was estimated using treedater 0.5.0 (Volz and Frost 2017).
Bayesian phylogenetic analysis was carried out using BEAST 2.6.1 (Bouckaert et al. 2019) with a HKY+G4 substitution model and a strict molecular clock. The phylodynamic model was implemented using the PhyDyn package v1.3.7 (Volz and Siveroni 2018) using the QL likelihood approximation and the RK ODE solver. The model was fitted by running 8 MCMC chains of 30 million steps in parallel, and combining chains after removing 50 per cent burn-in. In order to demonstrate the added utility of the sequence data, the analysis was repeated assuming a constant likelihood, that is, sampling only from the prior probability distributions.
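The chain-combining step amounts to discarding the first half of each chain and pooling what remains. The sketch below shows that logic on plain Python lists; it is an illustration only, with a hypothetical function name, and the text does not say which tool was used (BEAST 2 is distributed with LogCombiner for this purpose).

```python
def combine_chains(chains, burnin_fraction=0.5):
    """Drop the first burnin_fraction of each chain, then pool the rest."""
    if not 0.0 <= burnin_fraction < 1.0:
        raise ValueError("burnin_fraction must be in [0, 1)")
    pooled = []
    for chain in chains:
        start = int(len(chain) * burnin_fraction)  # index of first kept sample
        pooled.extend(chain[start:])
    return pooled
```

Each chain is trimmed separately before pooling, so a chain that started far from the posterior mode does not contaminate the combined sample with its early states.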
Figure 1. Epidemiological model and sample times. (A) A diagram representing
the structure of the epidemiological SEIR model which was fitted in tandem
with the time scaled phylogeny. Infected and infectious individuals may occupy
a low (I) or a high (J) transmission rate state to account for high dispersion of the
reproduction number. (B) Sampling density of sequences from inside (yellow)
and outside (grey) of Weifang respectively through time.

The ggtree package was used for all phylogeny visualisations
(Yu et al. 2017).
Code to replicate this analysis and BEAST XML files can be
found at https://github.com/emvolz/weifang-sarscov2.
4. Results
Despite an initial rapid increase in confirmed cases in Weifang in late January and early February, the number of cases confirmed by Weifang CDC shows that the outbreak peaked early and that the maximum number of cases occurred on 5 February. Phylodynamic analysis supports the interpretation that control efforts reduced epidemic growth rates and contributed to eventual control. Estimates of the epidemiological parameters are summarised in Table 1.
The estimated cumulative and daily number of infections are shown in Fig. 2A and B, respectively. We estimate the peak of daily infections in late January, preceding the time series of confirmed cases by about a week; this is expected due to delays from infection to appearance of symptoms and delays from symptoms to diagnosis. The genetic data are strongly informative about timing and size of the epidemic peak: trajectories sampled from the Bayesian prior distribution have a smaller and later epidemic peak (cf. Fig. 2) with much less precision. Our central estimate for the cumulative number infected on 10 February is 365 (highest posterior density (HPD) 102–1174), compared with 38 cumulative confirmed cases. We therefore estimate that around 10 per cent of infections were diagnosed (Supplementary Fig. S5); an unknown proportion of infections will be missed by the surveillance system due to very mild, subclinical or asymptomatic infection. This supports the hypothesis that there was a modest (but not large) burden of infection in Weifang over the period that the sequence data were sampled.
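The figure of around 10 per cent follows from dividing confirmed cases by estimated infections. The sketch below reproduces that arithmetic from the numbers quoted above. Dividing by the HPD endpoints gives only a crude bracket on the ratio, not a posterior interval for the ascertainment fraction, and the variable and function names are illustrative.

```python
confirmed = 38
infections_central, infections_low, infections_high = 365, 102, 1174


def ascertainment(confirmed_cases, estimated_infections):
    """Fraction of estimated infections that were confirmed as cases."""
    return confirmed_cases / estimated_infections


central = ascertainment(confirmed, infections_central)
lowest = ascertainment(confirmed, infections_high)   # most infections
highest = ascertainment(confirmed, infections_low)   # fewest infections
print(f"{central:.1%} (range {lowest:.1%} to {highest:.1%})")
# prints "10.4% (range 3.2% to 37.3%)"
```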
Effective reproduction number over time is shown in Fig. 2C. We estimate R_0 = 3.4 (95% HPD: 2.1–5.2) and the initial growth rate in cases was approximately 22 per cent per day, consistent with rates estimated in other settings and during the early epidemic in Wuhan (Alimohamadi et al. 2020). Sampling from the prior yields a much higher estimate for R_0 with an unrealistic HPD upper bound over 10. We detect a significant decrease in effective reproduction number as the epidemic progressed, during a period (late January) when Weifang was implementing a variety of public health interventions and contact tracing to limit epidemic spread. Our central estimate of R_t drops below 1 on the 4th of February.
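The link between R_0 and the early growth rate can be checked with the standard linearised SEIR relation, in which the growth rate r is the positive root of (r + c_0)(r + c_1) = R_0 c_0 c_1. The sketch below applies it with the fixed durations from Section 2.2.4 (1/c_0 = 4.1 days, 1/c_1 = 3.8 days). The text does not state how the 22 per cent figure was obtained, so this is a consistency check under that assumption rather than a reproduction of the authors' calculation, and the function name is hypothetical.

```python
import math


def seir_growth_rate(r0, c0, c1):
    """Positive root r of (r + c0) * (r + c1) = r0 * c0 * c1."""
    total = c0 + c1
    discriminant = total**2 + 4.0 * (r0 - 1.0) * c0 * c1
    return (-total + math.sqrt(discriminant)) / 2.0


r = seir_growth_rate(r0=3.4, c0=1 / 4.1, c1=1 / 3.8)
doubling_time = math.log(2) / r  # days
```

With R_0 = 3.4 this gives r of roughly 0.21 per day and a doubling time a little over three days, in line with the growth rate quoted above.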
Although previous studies have shown the significance of realistic modelling for fidelity of phylogenetic inference (Moller et al. 2018), our analysis has found that the phylodynamic prior did not greatly influence estimated molecular clock rate or inferred time to most recent common ancestors (TMRCAs). This is likely due to our choice of reference sequence set, which comprised sequences spanning several months of the epidemic and therefore reflected a range of transmission dynamics.
In this analysis, there is a mean of three pairwise differences
among sequences from Weifang; the corresponding number
among the sequences outside of Weifang is eight.
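The mean pairwise difference quoted here is a simple statistic of an alignment. A minimal sketch of one way to compute it follows; skipping sites where either sequence has an N or a gap is an assumption, since the text does not say how ambiguous sites were handled, and the function names are hypothetical.

```python
from itertools import combinations

AMBIGUOUS = {"N", "-"}


def pairwise_differences(seq_a, seq_b):
    """Count sites where two aligned sequences differ, skipping N and gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in AMBIGUOUS and b not in AMBIGUOUS
    )


def mean_pairwise_differences(sequences):
    """Average the difference count over all unordered pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)
```

Applied separately to the 20 Weifang sequences and to the 57 reference sequences, a function of this kind would yield the two means being compared.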
Figure 3 shows the estimated time-scaled maximum clade credibility (MCC) tree including 20 lineages sampled from distinct patients in Weifang and 57 genomes sampled from Wuhan and internationally.
There is correspondingly low confidence in tree topology (Supplementary Fig. S3), and only two monophyletic Weifang clades had greater than 50 per cent posterior probability, neither of which is larger than two samples.
The earliest Weifang sequence was sampled on 25 January
from a patient who first showed symptoms on 16 January.
These dates cover a similar range as the posterior TMRCA of all
Weifang sequences (Supplementary Fig. S4).
5. Discussion
Our analysis of 20 SARS-CoV-2 genomes has confirmed independent observations regarding the rate of spread and burden of infection in Weifang, China. Surveillance of COVID-19 is rendered difficult by high proportions of illness with mild severity and an unknown proportion of asymptomatic infection (Guan 2019). The extent of under-reporting and case ascertainment rates has been widely debated. Analysis of genetic sequence data provides an alternative source of information about epidemic size. We do not find evidence for a large hidden burden of infection within Weifang, with an estimated total number of cases around 365 (102–1174) at the date of last sample, towards the end of the outbreak.
Our decreasing central estimate of R_t over time, falling below 1 on 4 February, suggests a slower rate of spread outside of Wuhan and effective control strategies implemented in late January. It is consistent with a previous modelling study of Shandong province (Zhang et al. 2020), which showed that R_t fell below 1 on 29 January. Our posterior molecular clock rate shown in Table 1 is consistent with previous estimates from SARS-CoV-2 phylogenetic analyses (Nie et al. 2020).
The modest number of sequences from Weifang (twenty) is
a limitation of this study. However, this represents a significant
proportion of the total number of cases reported; there were
Table 1. Summary of primary epidemiological and evolutionary parameters, including Bayesian prior distributions and estimated posteriors.
Parameter                  Prior                                        Posterior mean   95% HPD
Initial infected           Exponential (mean = 1)                       4.8              1.3–10.1
Initial susceptible        Exponential (mean = 500)                     550              117–1501
Migration rate (a)         Exponential (mean = 10)                      1.68             1.03–1.99
Transmission rate          Log-normal (mean log = 3.21, SD log = 0.5)   21.5             13.0–32.1
Reproduction number        Log-normal (mean log = 1.03, SD log = 0.5)   3.4              2.1–5.2
Molecular clock rate (b)   Uniform (0.0007, 0.003)                      0.0013           0.00098–0.0017
Transition/transversion    Log-normal (mean log = 1, SD log = 1.25)     4.6              3.3–6.5
Gamma shape                Exponential (mean = 1)                       0.29             0.0070–1.50
Posterior uncertainty is summarised using a 95 per cent HPD interval.
(a) Units: Migrations per lineage per year.
(b) Units: Substitutions per site per year.

thirty-eight confirmed cases at the date of the last genetic sample (10 February), rising no further than forty-four from 16 February onwards (Fig. 2A). Despite relatively few sequences, our estimated trajectories display uncertainties that are significantly reduced and more realistic, compared with sampling only from the prior.
Further, it is possible that the outbreak observed in Weifang could be due not to community transmission, but rather to multiple importations. However, given that we sampled the reference set from a GISAID database downloaded in June, it is reasonable to assume close genetic matches would have been chosen. A maximum-likelihood tree of the entire alignment (Supplementary Fig. S1) shows that lineages from Weifang have common ancestry with other Chinese lineages at two distinct polytomies and the phylogeny alone gives no information about location of these nodes (Weifang or exogenous). We therefore conclude that the MCC tree in Fig. 3, which reflects significant clustering of the Weifang samples, is reasonable.
Community transmission is further supported by the fact that cases were identified via contact tracing. This forms another limitation, as it suggests non-random sampling of cases in Weifang. This could lead to an underestimate of the total number of cases in Weifang. However, as a large proportion of reported cases were included in this analysis, the bias is unlikely to be too significant.
Finally, the SEIR model structure also presents some limitations. As b has a constant value, R_t can decrease only as a result of depleting susceptibles. The decrease in R_t is therefore a
Figure 2. Epidemiological trajectory of the Weifang SARS-CoV-2 epidemic in 2020 when fitting the SEIR model to genetic data (blue) and sampling only from prior (grey). Solid lines and shaded area reflect posterior median and 95 per cent HPD. The vertical dashed line represents the date of the last sequence sampled in Weifang. (A) Cumulative estimated infections through time compared with cumulative cases (yellow points) reported by Weifang CDC. (B) Daily estimated infections through time compared to daily reported cases (yellow points). (C) Effective reproduction number through time, R_t. The horizontal dotted line indicates R_t = 1.

Citations
Journal ArticleDOI
TL;DR: This work sequences 212 SARS-CoV-2 sequences and uses them to perform a comprehensive analysis to trace the origins and spread of the virus and finds that travelers returning from the United States of America significantly contributed to viral spread in Israel.
Abstract: Full genome sequences are increasingly used to track the geographic spread and transmission dynamics of viral pathogens. Here, with a focus on Israel, we sequence 212 SARS-CoV-2 sequences and use them to perform a comprehensive analysis to trace the origins and spread of the virus. We find that travelers returning from the United States of America significantly contributed to viral spread in Israel, more than their proportion in incoming infected travelers. Using phylodynamic analysis, we estimate that the basic reproduction number of the virus was initially around 2.5, dropping by more than two-thirds following the implementation of social distancing measures. We further report high levels of transmission heterogeneity in SARS-CoV-2 spread, with between 2-10% of infected individuals resulting in 80% of secondary infections. Overall, our findings demonstrate the effectiveness of social distancing measures for reducing viral spread.

128 citations

Journal ArticleDOI
TL;DR: In this paper, the authors describe how phylogenetic and phylodynamic methods provide insight into viral evolution, focusing on the SARS-CoV-2 pandemic, and summarize their contributions to our understanding of SARS-CoV-2 transmission and control.
Abstract: Determining the transmissibility, prevalence and patterns of movement of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections is central to our understanding of the impact of the pandemic and to the design of effective control strategies. Phylogenies (evolutionary trees) have provided key insights into the international spread of SARS-CoV-2 and enabled investigation of individual outbreaks and transmission chains in specific settings. Phylodynamic approaches combine evolutionary, demographic and epidemiological concepts and have helped track virus genetic changes, identify emerging variants and inform public health strategy. Here, we review and synthesize studies that illustrate how phylogenetic and phylodynamic techniques were applied during the first year of the pandemic, and summarize their contributions to our understanding of SARS-CoV-2 transmission and control. In this Review, the authors describe how phylogenetic and phylodynamic methods provide insight into viral evolution, focusing on the SARS-CoV-2 pandemic. The approaches reveal routes and timings of transmission events, and they can assess the effectiveness of various intervention measures aimed at controlling the virus.

52 citations

Journal ArticleDOI
TL;DR: Analysis of 247 full-genome SARS-CoV-2 sequences from two nearby communities in Wisconsin, USA finds surprisingly distinct patterns of viral spread, which suggest patterns of SARS-CoV-2 transmission may vary substantially even in nearby communities.
Abstract: Evidence-based public health approaches that minimize the introduction and spread of new SARS-CoV-2 transmission clusters are urgently needed in the United States and other countries struggling with expanding epidemics. Here we analyze 247 full-genome SARS-CoV-2 sequences from two nearby communities in Wisconsin, USA, and find surprisingly distinct patterns of viral spread. Dane County had the 12th known introduction of SARS-CoV-2 in the United States, but this did not lead to descendant community spread. Instead, the Dane County outbreak was seeded by multiple later introductions, followed by limited community spread. In contrast, relatively few introductions in Milwaukee County led to extensive community spread. We present evidence for reduced viral spread in both counties following the statewide "Safer at Home" order, which went into effect 25 March 2020. Our results suggest patterns of SARS-CoV-2 transmission may vary substantially even in nearby communities. Understanding these local patterns will enable better targeting of public health interventions.

38 citations

Journal ArticleDOI
01 Mar 2022-Gene
TL;DR: In this article, the authors review the impact of genomics on the ongoing COVID-19 pandemic and discuss how genomics technology has aided the investigation of the SARS-CoV-2 outbreak.

11 citations

Journal ArticleDOI
TL;DR: In this article, an ANN architecture was developed to predict the serious pandemic outbreak impact in Qatar, Spain, and Italy, and the verified and validated growth model of COVID-19 for these countries showed the effects of the measures taken by the government and medical sectors to alleviate the pandemic effect and the effort to decrease the spread of the virus in order to reduce the death rate.
Abstract: The present study illustrates the outbreak prediction and analysis on the growth and expansion of the COVID-19 pandemic using artificial neural network (ANN). The first wave of the pandemic outbreak of the novel Coronavirus (SARS-CoV-2) began in September 2019 and continued to March 2020. As declared by the World Health Organization (WHO), this virus affected populations all over the globe, and its accelerated spread is a universal concern. An ANN architecture was developed to predict the serious pandemic outbreak impact in Qatar, Spain, and Italy. Official statistical data gathered from each country until July 6th was used to validate and test the prediction model. The model sensitivity was analyzed using the root mean square error (RMSE), the mean absolute percentage error and the regression coefficient index R2, which yielded highly accurate values of the predicted correlation for the infected and dead cases of 0.99 for the dates considered. The verified and validated growth model of COVID-19 for these countries showed the effects of the measures taken by the government and medical sectors to alleviate the pandemic effect and the effort to decrease the spread of the virus in order to reduce the death rate. The differences in the spread rate were related to different exogenous factors (such as social, political, and health factors, among others) that are difficult to measure. The simple and well-structured ANN model can be adapted to different propagation dynamics and could be useful for health managers and decision-makers to better control and prevent the occurrence of a pandemic.

10 citations

References
Journal ArticleDOI
TL;DR: This version of MAFFT has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update.
Abstract: We report a major update of the MAFFT multiple sequence alignment program. This version has several new features, including options for adding unaligned sequences into an existing alignment, adjustment of direction in nucleotide alignment, constrained alignment and parallel processing, which were implemented after the previous major update. This report shows actual examples to explain how these features work, alone and in combination. Some examples incorrectly aligned by MAFFT are also shown to clarify its limitations. We discuss how to avoid misalignments, and our ongoing efforts to overcome such limitations.

27,771 citations

Journal ArticleDOI
TL;DR: During the first 2 months of the current outbreak, Covid-19 spread rapidly throughout China and caused varying degrees of illness, and patients often presented without fever, and many did not have abnormal radiologic findings.
Abstract: Background Since December 2019, when coronavirus disease 2019 (Covid-19) emerged in Wuhan city and rapidly spread throughout China, data have been needed on the clinical characteristics of...

22,622 citations

Journal ArticleDOI
03 Feb 2020-Nature
TL;DR: Identification and characterization of a new coronavirus (2019-nCoV), which caused an epidemic of acute respiratory syndrome in humans in Wuhan, China, shows that this virus belongs to the species of SARSr-CoV and indicates that the virus is related to a bat coronavirus.
Abstract: Since the outbreak of severe acute respiratory syndrome (SARS) 18 years ago, a large number of SARS-related coronaviruses (SARSr-CoVs) have been discovered in their natural reservoir host, bats1–4. Previous studies have shown that some bat SARSr-CoVs have the potential to infect humans5–7. Here we report the identification and characterization of a new coronavirus (2019-nCoV), which caused an epidemic of acute respiratory syndrome in humans in Wuhan, China. The epidemic, which started on 12 December 2019, had caused 2,794 laboratory-confirmed infections including 80 deaths by 26 January 2020. Full-length genome sequences were obtained from five patients at an early stage of the outbreak. The sequences are almost identical and share 79.6% sequence identity to SARS-CoV. Furthermore, we show that 2019-nCoV is 96% identical at the whole-genome level to a bat coronavirus. Pairwise protein sequence analysis of seven conserved non-structural proteins domains show that this virus belongs to the species of SARSr-CoV. In addition, 2019-nCoV virus isolated from the bronchoalveolar lavage fluid of a critically ill patient could be neutralized by sera from several patients. Notably, we confirmed that 2019-nCoV uses the same cell entry receptor—angiotensin converting enzyme II (ACE2)—as SARS-CoV. Characterization of full-length genome sequences from patients infected with a new coronavirus (2019-nCoV) shows that the sequences are nearly identical and indicates that the virus is related to a bat coronavirus.

16,857 citations


"Genomic epidemiology of a densely s..." refers background in this paper

  • ...…genomic analyses is widely recognized for estimating dates of emergence (Verity Hill, 2020; Gire et al., 2014) and identifying animal reservoirs (Zhou et al., 2020; Dudas et al., 2018), analysis of pathogen sequences also has potential to inform epidemic surveillance and intervention…...

    [...]

Journal ArticleDOI
TL;DR: Some notable features of IQ-TREE version 2 are described and the key advantages over other software are highlighted.
Abstract: IQ-TREE (http://www.iqtree.org, last accessed February 6, 2020) is a user-friendly and widely used software package for phylogenetic inference using maximum likelihood. Since the release of version 1 in 2014, we have continuously expanded IQ-TREE to integrate a plethora of new models of sequence evolution and efficient computational approaches of phylogenetic inference to deal with genomic data. Here, we describe notable features of IQ-TREE version 2 and highlight the key advantages over other software.

4,337 citations


"Genomic epidemiology of a densely s..." refers methods in this paper

  • ...A time scaled phylogeny estimated using IQTree and treedater and using the same data as used for the Bayesian analysis....

    [...]

  • ...Maximum likelihood analysis was carried out using IQTree (Minh et al., 2019) with a HKY+G4 substitution model and a time-scaled tree was estimated using treedater 0.5.0 (Volz and Frost, 2017). Two outliers according to the molecular clock model were identified and removed using ‘treedater’, which was also used to compute the root-to-tip regression. Bayesian phylogenetic analysis was carried out using BEAST 2.6.1 (Bouckaert et al., 2019) using a HKY+G4 substitution model and a strict molecular clock....

    [...]


Book
28 Oct 2007
TL;DR: Mathematical modeling of infectious diseases has progressed dramatically over the past 3 decades and continues to be a valuable tool at the nexus of mathematics, epidemiology, and infectious diseases research.
Abstract: By Matthew James Keeling and Pejman Rohani. Princeton, NJ: Princeton University Press, 2008. 408 pp., Illustrated. $65.00 (hardcover). Mathematical modeling of infectious diseases has progressed dramatically over the past 3 decades and continues to flourish at the nexus of mathematics, epidemiology, and infectious diseases research. Now recognized as a valuable tool, mathematical models are being integrated into the public health decision-making process more than ever before. However, despite rapid advancements in this area, a formal training program for mathematical modeling is lacking, and there are very few books suitable for a broad readership. To support this bridging science, a common language that is understood in all contributing disciplines is required.

3,467 citations


"Genomic epidemiology of a densely s..." refers methods in this paper

  • ...We use a susceptible-exposed-infectious-recovered (SEIR) model (Keeling and Rohani, 2011) for epidemic dynamics in Weifang....

    [...]


Frequently Asked Questions (14)
Q1. What are the contributions in "Genomic epidemiology of a densely sampled covid-19 outbreak in china" ?

Analysis of genetic sequence data from the SARS-CoV-2 pandemic can provide insights into epidemic origins, worldwide dispersal, and epidemiological history. Here, the authors report an analysis of 20 whole SARS-CoV-2 genomes from a single relatively small and geographically constrained outbreak in Weifang, People's Republic of China. The authors further estimate the number of infections through time and compare these estimates to confirmed diagnoses by the Weifang Centers for Disease Control. The authors find that these estimates are consistent with reported cases and there is unlikely to be a large undiagnosed burden of infection over the period they studied. © The Author(s) 2021. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

The phylodynamic model is designed to account for (1) nonlinear epidemic dynamics in Weifang with a realistic course of infection (incubation and infectious periods), (2) variance in transmission rates that can influence epidemic size estimates, and (3) migration of lineages in and out of Weifang.

The Coronaviridae-like reads of samples with >100× average sequencing depth across the SARS-CoV-2 genome were subsampled to achieve 100× sequencing depth before being assembled.
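As a rough illustration of this depth-capping step, the following Python sketch keeps each read with probability 100/depth, so that the expected mean depth falls to about 100. The function names, the per-read random thinning, and the fixed seed are assumptions made for illustration; the source does not say which tool or procedure performed the subsampling.

```python
import random


def subsample_fraction(mean_depth, target_depth=100.0):
    """Fraction of reads to keep so the expected mean depth is about target_depth."""
    if mean_depth <= 0:
        raise ValueError("mean_depth must be positive")
    # Samples already at or below the target are left untouched.
    return min(1.0, target_depth / mean_depth)


def subsample_reads(reads, mean_depth, target_depth=100.0, seed=0):
    """Keep each read independently with the probability computed above.

    `reads` is any sequence of read records (hypothetical; e.g. read IDs).
    """
    keep = subsample_fraction(mean_depth, target_depth)
    rng = random.Random(seed)  # seeded so the selection is reproducible
    return [read for read in reads if rng.random() < keep]
```

For a sample sequenced at 400-fold mean depth, `subsample_fraction(400)` gives 0.25, so roughly a quarter of the reads would be retained.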

Although other methods allow for a time-varying transmission rate (including other PhyDyn model templates) or use models with a piece-wise Rt function (Frost and Volz 2010), their SEIR-type model with constant β required fewer parameters, which is appropriate for an analysis with only 20 internal sequences.

Total reads were first processed using Kraken v0.10.5 (default parameters) with a self-built database of Coronaviridae genomes (including SARS, MERS, and SARS-CoV-2 genome sequences downloaded from GISAID, NCBI, and CNGB) to identify Coronaviridae-like reads. 

The authors estimate the peak of daily infections in late January, preceding the time series of confirmed cases by about a week; this is expected due to delays from infection to appearance of symptoms and delays from symptoms to diagnosis. 
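The lag between the infection peak and the case peak can be illustrated by convolving a daily infection curve with a distribution of delays from infection to diagnosis. The sketch below is a toy example: the infection counts and the delay distribution (5 to 7 days) are invented for illustration and are not estimates from the study.

```python
def convolve_delay(daily_infections, delay_pmf):
    """Expected daily confirmations given infections and a delay distribution.

    delay_pmf[d] is the assumed probability that an infection is confirmed
    d days after it occurred.
    """
    out = [0.0] * (len(daily_infections) + len(delay_pmf) - 1)
    for t, infections in enumerate(daily_infections):
        for d, prob in enumerate(delay_pmf):
            out[t + d] += infections * prob
    return out


# Hypothetical inputs: infections peak on day 3; delays are 5, 6, or 7 days.
toy_infections = [0, 1, 3, 6, 3, 1, 0]
toy_delay_pmf = [0, 0, 0, 0, 0, 0.25, 0.5, 0.25]
toy_cases = convolve_delay(toy_infections, toy_delay_pmf)
peak_shift = toy_cases.index(max(toy_cases)) - toy_infections.index(max(toy_infections))
```

With a delay distribution centred on about six days, the confirmed-case curve peaks about six days after the infection curve, which is the qualitative pattern described above.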

These interventions included public health messaging, establishing phone hot-lines, encouraging home isolation for recent visitors from Wuhan (January 23–26), optimising triage of suspected cases in hospitals (January 24), travel restrictions (January 26), extending school closures, and establishing ‘fever clinics’ for consultation and diagnosis (January 27) (Mao 2020). 

The larger reservoir of COVID-19 cases outside of Weifang (Y(t)) serves as a source of new infections and is assumed to be growing exponentially (at rate ρ) over this time period.
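A minimal deterministic sketch of an SEIR model alongside an exponentially growing external reservoir is given below. It is not the authors' PhyDyn model: it omits lineage migration, transmission-rate variance, and the genealogical likelihood, and the local epidemic and the reservoir are left uncoupled. All parameter names and values (`beta`, `gamma0`, `gamma1`, `rho`, `y0`, population size) are assumptions chosen only to make the example run.

```python
import math


def simulate_seir(beta, gamma0, gamma1, n, e0, days, dt=0.01, y0=1.0, rho=0.1):
    """Euler-integrate a deterministic SEIR model plus a reservoir Y(t).

    beta   : transmission rate (constant, as in the text)
    gamma0 : rate of leaving the exposed compartment (assumed name)
    gamma1 : rate of leaving the infectious compartment (assumed name)
    rho    : exponential growth rate of the external reservoir
    Returns a list of (t, S, E, I, R, Y) tuples.
    """
    s, e, i, r = n - e0, e0, 0.0, 0.0
    steps = int(round(days / dt))
    traj = []
    for k in range(steps + 1):
        t = k * dt
        y = y0 * math.exp(rho * t)  # reservoir grows exponentially
        traj.append((t, s, e, i, r, y))
        new_exposed = beta * s * i / n * dt
        new_infectious = gamma0 * e * dt
        new_recovered = gamma1 * i * dt
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
    return traj
```

In this simplified model the basic reproduction number is `beta / gamma1`, so a call such as `simulate_seir(beta=0.4, gamma0=0.2, gamma1=0.2, n=1000.0, e0=1.0, days=60.0)` corresponds to a value of 2; these numbers are illustrative and are not the fitted estimates.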

The added value of fitting to only 20 local sequences in this analysis demonstrates the utility of phylodynamic modelling for outbreaks as compared with traditional epidemiological modelling fitted only to case data.

As of 10 February 2020, 136 suspected cases and 214 close contacts were diagnosed by Weifang Center for Disease Control and Prevention; of these, 38 cases were confirmed positive with SARS-CoV-2. 

...thirty-eight confirmed cases at the date of the last genetic sample (10 February), rising no further than forty-four from 16 February onwards (Fig. 2A).

High variance of transmission rates will reduce genetic diversity of a sample and failure to account for this factor will lead to highly biased estimates of epidemic size (Li et al. 2017). 

The remaining RNA was used to construct the single-stranded circular DNA library with the MGIEasy RNA Library preparation reagent set (MGI, Shenzhen, China). 

This is likely due to their choice of reference sequence set, which comprised sequences spanning several months of the epidemic, and therefore reflecting a range of transmission dynamics.