Genomic epidemiology of a densely sampled COVID-19 outbreak in China
Summary
1. Introduction
- The authors report a genomic epidemiological analysis of one of the first geographically concentrated community transmission samples of SARS-CoV-2 genetic sequences collected outside of the initial outbreak in Wuhan, China.
- These data comprise 20 whole-genome sequences from confirmed COVID-19 cases in Weifang, Shandong Province, People’s Republic of China.
- The data were collected over the course of several weeks up to 10 February 2020, and overlap with a period of intensifying public health and social distancing measures.
- In contrast to the early spread of COVID-19 in Hubei Province of China, most community transmissions within Weifang took place after these measures were put in place.
- Using an adaptation of phylodynamic methods and the local genetic data available, the study aims to estimate the growth rate and reproduction number in Weifang following the seeding events of mid-to-late January 2020.
2.1 Epidemiological investigation, sampling and genetic sequencing
- As of 10 February 2020, the Weifang Center for Disease Control and Prevention had identified 136 suspected cases and 214 close contacts; of these, 38 were confirmed positive for SARS-CoV-2.
- Concentration of RNA samples was measured by the Qubit RNA HS Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA).
- The remaining RNA was used to construct the single-stranded circular DNA library with the MGIEasy RNA Library preparation reagent set (MGI, Shenzhen, China).
- Coronaviridae-like reads from samples with more than 100× average sequencing depth across the SARS-CoV-2 genome were subsampled to 100× depth before assembly.
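The depth-capping step can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline. It assumes average depth is approximated as total read bases divided by the 29,903 nt SARS-CoV-2 reference length (RefSeq NC_045512.2), whereas a real pipeline would usually measure depth from alignments. It also assumes reads are kept in random order until the 100× target is reached. The function names and the seeded shuffle are hypothetical choices.

```python
import random

# Length of the SARS-CoV-2 reference genome Wuhan-Hu-1 (RefSeq NC_045512.2).
GENOME_LENGTH = 29_903


def mean_depth(read_lengths, genome_length=GENOME_LENGTH):
    """Approximate average depth as total read bases divided by genome length."""
    return sum(read_lengths) / genome_length


def subsample_to_depth(reads, target_depth=100, genome_length=GENOME_LENGTH, seed=0):
    """Randomly keep reads until their combined length covers target_depth.

    `reads` is a list of (read_id, sequence) tuples. Samples already at or
    below the target depth are returned unchanged.
    """
    if mean_depth([len(seq) for _, seq in reads], genome_length) <= target_depth:
        return list(reads)
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)  # seeded so the subsample is reproducible
    needed_bases = target_depth * genome_length
    kept, total = [], 0
    for read_id, seq in shuffled:
        if total >= needed_bases:
            break
        kept.append((read_id, seq))
        total += len(seq)
    return kept


# Hypothetical input: 40,000 reads of 150 nt, roughly 200x depth.
reads = [(f"read{i}", "A" * 150) for i in range(40_000)]
subsample = subsample_to_depth(reads)
print(round(mean_depth([len(seq) for _, seq in subsample]), 1))  # prints 100.0
```

Shuffling and then accumulating reads lands the depth just above the target, never below it. A per-read Bernoulli filter would only hit the target on average.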
2.2.1 Nonlinear epidemiological dynamics in Weifang
- The maximum number of daily confirmed COVID-19 cases occurred on February 5, but it is unknown when the maximum prevalence of infection occurred.
- To capture the nonlinear decrease in cases following the epidemic peak, and to account for a realistic distribution of generation times, the authors model epidemic dynamics in Weifang with an extension of the susceptible-exposed-infectious-recovered (SEIR) model (Keeling and Rohani 2011), shown in Equations (1–5).
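Below is a sketch of the plain deterministic SEIR dynamics that the authors extend. It omits the extra high-transmission compartment described in Section 2.2.2 and is not the PhyDyn model itself. The parameter values, the dataclass, and the forward-Euler scheme are hypothetical choices for illustration. The point is that daily infections rise, peak, and then decline nonlinearly as susceptibles are depleted.

```python
from dataclasses import dataclass


@dataclass
class SEIRParams:
    beta: float             # transmission rate per infectious person per day
    incubation_rate: float  # 1 / mean latent period (per day)
    recovery_rate: float    # 1 / mean infectious period (per day)
    population: float       # total population size N


def simulate_seir(p, s, e, i, r=0.0, days=120, steps_per_day=100):
    """Forward-Euler integration of the deterministic SEIR equations.

    Returns one (day, S, E, I, R, new_infections_that_day) tuple per day.
    """
    dt = 1.0 / steps_per_day
    history = []
    for day in range(1, days + 1):
        new_infections = 0.0
        for _ in range(steps_per_day):
            infection = p.beta * s * i / p.population  # flow S -> E
            progression = p.incubation_rate * e         # flow E -> I
            recovery = p.recovery_rate * i             # flow I -> R
            s -= infection * dt
            e += (infection - progression) * dt
            i += (progression - recovery) * dt
            r += recovery * dt
            new_infections += infection * dt
        history.append((day, s, e, i, r, new_infections))
    return history


# Hypothetical parameters: R0 = beta / recovery_rate = 3, 5-day latent and infectious periods.
params = SEIRParams(beta=0.6, incubation_rate=0.2, recovery_rate=0.2, population=10_000)
history = simulate_seir(params, s=9_990, e=10, i=0)
peak = max(history, key=lambda row: row[5])
print(f"daily infections peak on day {peak[0]}")
```

Each flow is subtracted from one compartment and added to the next, so S + E + I + R stays equal to N up to floating-point error. That makes a simple sanity check on any extension of the model.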
2.2.2 Variance in transmission rates
- To estimate total numbers infected, the phylodynamic model must account for epidemiological variables which are known to significantly influence genetic diversity (Lloyd-Smith et al. 2005).
- Foremost among these is the variance in offspring distribution (number of transmissions per primary case).
- High variance of transmission rates will reduce genetic diversity of a sample and failure to account for this factor will lead to highly biased estimates of epidemic size (Li et al. 2017).
- The authors therefore extend the SEIR model with an additional compartment J, whose transmission rate is τ-fold higher than that of the I compartment.
- On leaving the incubation period, individuals progress to the J compartment with probability p_h, or otherwise to I.
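The following sketch shows how routing a fraction of cases into a faster-transmitting compartment inflates the variance of the offspring distribution. It rests on hypothetical simplifications that are not taken from the paper. I and J share an exponentially distributed infectious period with recovery rate γ, J transmits at τ times the I rate β, and susceptible depletion is ignored. Under these assumptions each case type has a geometric offspring count, and the mean of the mixture is R0 = (β/γ)(1 − p_h + p_h·τ). The dispersion k is obtained by matching the negative-binomial variance R + R²/k.

```python
def offspring_moments(beta, recovery_rate, p_h, tau):
    """Mean and variance of transmissions per case in the I/J mixture.

    Hypothetical simplifications: I and J share an exponentially distributed
    infectious period (rate `recovery_rate`), J transmits at tau * beta, and
    susceptible depletion is ignored. Each type's offspring count is then
    geometric with mean m, for which E[X] = m and E[X^2] = m + 2 * m**2.
    """
    m_i = beta / recovery_rate          # mean offspring of an I case
    m_j = tau * beta / recovery_rate    # mean offspring of a J case
    mean = (1 - p_h) * m_i + p_h * m_j  # R0 of the mixture
    second_moment = (1 - p_h) * (m_i + 2 * m_i**2) + p_h * (m_j + 2 * m_j**2)
    return mean, second_moment - mean**2


def negative_binomial_k(mean, variance):
    """Dispersion k matching Var = R + R**2 / k; smaller k means more overdispersion."""
    return mean**2 / (variance - mean)


# Hypothetical values: 20% of cases enter J and transmit 10-fold faster.
r0, var = offspring_moments(beta=0.2, recovery_rate=0.25, p_h=0.2, tau=10.0)
print(f"R0 = {r0:.2f}, variance = {var:.2f}, k = {negative_binomial_k(r0, var):.2f}")
```

With p_h = 0 the offspring count is a single geometric distribution and k = 1. Any p_h between 0 and 1 combined with τ > 1 pushes k below 1.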
2.2.3 Importation of lineages from Wuhan
- The outbreak in Weifang was seeded by multiple lineages imported at various times from the rest of China.
- The reservoir of infections outside Weifang, Y(t), is governed by $\dot{Y}(t) = (\rho - \mu)\,Y(t)$ (6). Migration depends only on the size of variables in the Weifang compartment and thus does not influence epidemic dynamics; it only influences the inferred probability that a lineage resides within Weifang.
2.2.4 Model fitting
- Other parameters are fixed based on prior information.
- This dispersion is similar to values estimated for the 2003 SARS epidemic (Lloyd-Smith et al. 2005).
- The SEIR model dynamics begin on 10 January.
- An exploration of this parameterisation is discussed in Supplementary Information Section 4.1.
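The dispersion compared with 2003 SARS refers to the negative-binomial offspring model of Lloyd-Smith et al. (2005). In that model each case's individual reproductive number is gamma distributed with mean R and shape k, and the offspring count is Poisson given that number. One way to see what a small k means is to estimate the fraction of cases responsible for 80% of expected transmission. The Monte Carlo sketch below is illustrative only, and the k values are hypothetical rather than the paper's fitted parameters.

```python
import random


def fraction_causing_share(k, share=0.8, mean_r=1.0, n=100_000, seed=1):
    """Monte Carlo fraction of cases responsible for `share` of expected transmission.

    Individual reproductive numbers are drawn from a gamma distribution with
    shape k and mean mean_r; Poisson offspring given these draws gives the
    negative binomial offspring model with dispersion k.
    """
    rng = random.Random(seed)
    # gammavariate(alpha, beta) uses beta as the scale, so the mean is k * (mean_r / k).
    nu = sorted((rng.gammavariate(k, mean_r / k) for _ in range(n)), reverse=True)
    target = share * sum(nu)
    running = 0.0
    for count, value in enumerate(nu, start=1):
        running += value
        if running >= target:
            return count / n
    return 1.0


for k in (0.16, 1.0, 100.0):  # hypothetical dispersion values
    print(f"k = {k}: {fraction_causing_share(k):.0%} of cases cause 80% of transmission")
```

For k = 1 the exact answer is about 44%. As k shrinks, transmission concentrates in an ever smaller fraction of cases, which is the superspreading pattern reported for SARS.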
3. Phylogenetic analysis
- Using MAFFT (Katoh and Standley 2013), the authors aligned the 20 Weifang sequences with a previous alignment of 57 non-identical SARS-CoV-2 sequences from outside Weifang, hereafter the ‘reference set’ (Volz et al. 2020).
- The distribution of sample dates from inside and outside of Weifang is shown in Fig. 1B.
- Bayesian phylogenetic analysis was carried out using BEAST 2.6.1 (Bouckaert et al. 2019) with an HKY+G4 substitution model and a strict molecular clock.
- To demonstrate the added value of the sequence data, the analysis was repeated with a constant likelihood, that is, sampling only from the prior probability distributions.
- Code to replicate this analysis and BEAST XML files can be found at https://github.com/emvolz/weifang-sarscov2.
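The "HKY" part of the substitution model can be illustrated by building its rate matrix. The sketch below uses the standard HKY85 form: off-diagonal rates are proportional to the target base's equilibrium frequency, and transitions are multiplied by κ. It is not BEAST's code. The κ and base-frequency values are hypothetical. The "+G4" part (gamma-distributed rate variation across sites with four discrete categories) and the strict clock are not shown.

```python
def hky_rate_matrix(kappa, freqs):
    """HKY85 instantaneous rate matrix Q for bases ordered A, C, G, T.

    kappa: transition/transversion rate ratio.
    freqs: equilibrium base frequencies (pi_A, pi_C, pi_G, pi_T), summing to 1.
    Q is scaled so the expected substitution rate at equilibrium is 1, which
    makes branch lengths read as expected substitutions per site.
    """
    bases = "ACGT"
    # Transitions: purine <-> purine (A, G) and pyrimidine <-> pyrimidine (C, T).
    transitions = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
    q = [[0.0] * 4 for _ in range(4)]
    for i, x in enumerate(bases):
        for j, y in enumerate(bases):
            if i != j:
                q[i][j] = freqs[j] * (kappa if (x, y) in transitions else 1.0)
        q[i][i] = -sum(q[i])  # rows of a rate matrix sum to zero
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(4))
    return [[value / mean_rate for value in row] for row in q]


# Hypothetical kappa and base frequencies, for illustration only.
Q = hky_rate_matrix(kappa=4.0, freqs=(0.30, 0.18, 0.20, 0.32))
for base, row in zip("ACGT", Q):
    print(base, " ".join(f"{v:7.3f}" for v in row))
```

With κ = 1 and equal base frequencies, the matrix reduces to the Jukes-Cantor model, in which every off-diagonal rate is 1/3 after scaling.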
4. Results
- Confirmed case counts reported by the Weifang CDC show that the outbreak peaked early, with the maximum number of daily cases on 5 February.
- The authors estimate the peak of daily infections in late January, preceding the time series of confirmed cases by about a week; this is expected due to delays from infection to appearance of symptoms and delays from symptoms to diagnosis.
- The authors detect a significant decrease in effective reproduction number as the epidemic progressed, during a period (late January) when Weifang was implementing a variety of public health interventions and contact tracing to limit epidemic spread.
- Confidence in the tree topology is low (Supplementary Fig. S3): only two monophyletic Weifang clades had greater than 50 per cent posterior probability, and neither contains more than two samples.
- These dates cover a similar range to the posterior TMRCA of all Weifang sequences (Supplementary Fig. S4).
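The estimated infection peak falls about a week before the peak in confirmed cases. This lag arises because reported cases are infections convolved with the infection-to-diagnosis delay, which is itself the incubation delay convolved with the symptom-to-diagnosis delay. The sketch below uses hypothetical delay distributions and a hypothetical symmetric epidemic curve; none of these values come from the paper.

```python
def convolve(a, b):
    """Discrete convolution of two non-negative sequences indexed by day."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def peak_day(series):
    """Index of the (first) maximum of a daily series."""
    return max(range(len(series)), key=series.__getitem__)


# Hypothetical delay distributions (index = delay in days):
incubation = [0, 0, 0, 0, 1 / 3, 1 / 3, 1 / 3]  # infection -> symptoms: 4-6 days
reporting = [0, 1 / 3, 1 / 3, 1 / 3]             # symptoms -> diagnosis: 1-3 days
total_delay = convolve(incubation, reporting)    # infection -> diagnosis: 5-9 days

infections = [min(day, 40 - day) for day in range(41)]  # hypothetical epidemic curve
reported = convolve(infections, total_delay)
print(peak_day(infections), peak_day(reported))  # prints 20 27
```

Because both delay distributions here are symmetric, the reported peak shifts by exactly the mean total delay of 7 days. Skewed real-world delays would shift and flatten the curve less predictably.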
5. Discussion
- The authors' analysis of 20 SARS-CoV-2 genomes confirms independent observations regarding the rate of spread and burden of infection in Weifang, China.
- Analysis of genetic sequence data provides an alternative source of information about epidemic size.
- The authors do not find evidence for a large hidden burden of infection within Weifang, with an estimated total number of cases around 365 (102–1174) at the date of last sample, towards the end of the outbreak.
- Figure caption, panel (B): daily estimated infections through time compared with daily reported cases (yellow points).
Frequently Asked Questions
Q2. What is the phylodynamic model for Weifang?
The phylodynamic model is designed to account for (1) nonlinear epidemic dynamics in Weifang with a realistic course of infection (incubation and infectious periods), (2) variance in transmission rates that can influence epidemic size estimates, and (3) migration of lineages in and out of Weifang.
Q4. Why did the authors use an SEIR-type model with a constant transmission rate?
Other methods allow for a time-varying transmission rate (including other PhyDyn model templates) or a piecewise Rt function (Frost and Volz 2010), but the authors' SEIR-type model with constant β requires fewer parameters, which is appropriate for an analysis with only 20 internal sequences.
Q5. What was the first step in identifying Coronaviridae-like reads?
Total reads were first processed using Kraken v0.10.5 (default parameters) with a self-built database of Coronaviridae genomes (including SARS, MERS, and SARS-CoV-2 genome sequences downloaded from GISAID, NCBI, and CNGB) to identify Coronaviridae-like reads.
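After classification, the Coronaviridae-like reads have to be pulled from Kraken's per-read output. Kraken (v1) writes one tab-separated line per read with five fields: C/U status, read ID, assigned taxonomy ID, read length, and the per-k-mer LCA mapping. The filter below is a hypothetical sketch, not the authors' script, and the taxonomy IDs in the example lines are illustrative only. The IDs in a self-built database depend on how that database was constructed. If the database holds only Coronaviridae genomes, every classified read is already Coronaviridae-like, so filtering on status alone is enough.

```python
def classified_read_ids(kraken_lines, target_taxids=None):
    """Collect IDs of reads that Kraken (v1 output format) marked as classified.

    Each Kraken output line has five tab-separated fields: C/U status, read ID,
    assigned taxonomy ID, sequence length, and the per-k-mer LCA mapping.
    If target_taxids is given, only reads assigned to one of those IDs are kept.
    """
    ids = []
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        status, read_id, taxid = fields[0], fields[1], fields[2]
        if status != "C":
            continue  # 'U' means unclassified
        if target_taxids is None or taxid in target_taxids:
            ids.append(read_id)
    return ids


# Hypothetical Kraken output lines; taxonomy IDs here are illustrative only.
example = [
    "C\tread1\t2697049\t150\t2697049:120",
    "U\tread2\t0\t150\t0:120",
    "C\tread3\t11118\t150\t11118:120",
]
print(classified_read_ids(example))                             # prints ['read1', 'read3']
print(classified_read_ids(example, target_taxids={"2697049"}))  # prints ['read1']
```

Matching on exact taxonomy IDs misses reads assigned to descendant taxa unless those IDs are also included in `target_taxids`.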
Q7. What interventions were implemented in Weifang to limit the spread of SARS-CoV-2?
These interventions included public health messaging, establishing phone hot-lines, encouraging home isolation for recent visitors from Wuhan (January 23–26), optimising triage of suspected cases in hospitals (January 24), travel restrictions (January 26), extending school closures, and establishing ‘fever clinics’ for consultation and diagnosis (January 27) (Mao 2020).
Q8. How does the model represent COVID-19 cases outside Weifang?
The larger reservoir of COVID-19 cases outside Weifang, Y(t), serves as a source of new infections and is assumed to grow exponentially (at rate ρ) over this time period.
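Exponential growth of the external reservoir means Y(t) = Y(0)·e^(rt) for a growth rate r, with doubling time ln 2 / r. The sketch below compares that closed form with a simple numerical integration of dY/dt = r·Y. The starting size and growth rate are hypothetical values, not estimates from the paper.

```python
import math


def reservoir_size(y0, growth_rate, t):
    """Closed-form solution of dY/dt = growth_rate * Y with Y(0) = y0."""
    return y0 * math.exp(growth_rate * t)


def reservoir_size_euler(y0, growth_rate, t, steps=10_000):
    """The same ODE integrated numerically, as a check on the closed form."""
    y, dt = y0, t / steps
    for _ in range(steps):
        y += growth_rate * y * dt
    return y


def doubling_time(growth_rate):
    """Time for an exponentially growing population to double."""
    return math.log(2) / growth_rate


# Hypothetical values: 1,000 cases at t = 0 growing at 0.2 per day.
print(reservoir_size(1_000, 0.2, 10), reservoir_size_euler(1_000, 0.2, 10))
print(f"doubling time: {doubling_time(0.2):.2f} days")
```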
Q9. How many local sequences are included in this analysis?
Twenty local sequences are included. The added value of fitting to only these 20 sequences demonstrates the utility of phylodynamic modelling for outbreaks compared with traditional epidemiological modelling fitted only to case data.
Q11. How many confirmed cases were there at the time of the last genetic sample?
There were thirty-eight confirmed cases at the date of the last genetic sample (10 February), and the count rose no further than forty-four from 16 February onwards (Fig. 2A).
Q14. What did the reference sequence set comprise?
The reference set comprised sequences spanning several months of the epidemic, and therefore reflected a range of transmission dynamics.