Detection of SARS-CoV-2 variants in Switzerland by genomic analysis of wastewater samples

Abstract
The SARS-CoV-2 lineages B.1.1.7 and 501.V2, which were first detected in the United Kingdom and South Africa, respectively, are spreading rapidly in the human population. Thus, there is an increased need for genomic and epidemiological surveillance to detect these strains and estimate their abundances. Here, we report a genomic analysis of SARS-CoV-2 in 48 raw wastewater samples collected from three wastewater treatment plants in Switzerland between July 9 and December 21, 2020. We find evidence for the presence of several mutations that define the B.1.1.7 and 501.V2 lineages in some of the samples, including co-occurrences of up to three B.1.1.7 signature mutations on the same amplicon in four samples from Lausanne and one sample from a Swiss ski resort dated December 9 to 21. These findings suggest that the B.1.1.7 strain could be detected by mid-December, two weeks before its first verification in a patient sample from Switzerland. We conclude that sequencing SARS-CoV-2 in community wastewater samples may help detect and monitor the circulation of diverse lineages.


Detection and surveillance of SARS-CoV-2 genomic variants in wastewater
Katharina Jahn (1,2,*), David Dreifuss (1,2,*), Ivan Topolsky (1,2,*), Anina Kull (3), Pravin Ganesanandamoorthy (3), Xavier Fernandez-Cassi (4), Carola Bänziger (3), Alexander J. Devaux (3), Elyse Stachler (3), Lea Caduff (3), Federica Cariti (4), Alex Tuñas Corzón (4), Lara Fuhrmann (1,2), Chaoran Chen (1,2), Kim Philipp Jablonski (1,2), Sarah Nadeau (1,2), Mirjam Feldkamp (1), Christian Beisel (1), Catharine Aquino (5), Tanja Stadler (1,2), Christoph Ort (3), Tamar Kohn (4), Timothy R. Julian (3,6,7), Niko Beerenwinkel (1,2,+)
1 Department of Biosystems Science and Engineering, ETH Zurich, CH-4058 Basel, Switzerland;
2 SIB Swiss Institute of Bioinformatics, CH-1015 Lausanne, Switzerland;
3 Eawag, Swiss Federal Institute of Aquatic Science and Technology, CH-8600 Dübendorf, Switzerland;
4 Laboratory of Environmental Chemistry, School of Architecture, Civil and Environmental Engineering, École Polytechnique Fédérale de Lausanne (EPFL), CH-1015 Lausanne, Switzerland;
5 Functional Genomics Center Zurich, ETH Zurich, CH-8057 Zurich, Switzerland;
6 Swiss Tropical and Public Health Institute, CH-4051 Basel, Switzerland;
7 University of Basel, CH-4055 Basel, Switzerland
* Equal contributions
+ Correspondence to: niko.beerenwinkel@bsse.ethz.ch
Abstract
The emergence of SARS-CoV-2 mutants with altered transmissibility, virulence, or
immunogenicity emphasizes the need for early detection and epidemiological surveillance of
genomic variants. Wastewater samples provide an opportunity to assess circulating viral
lineages in the community. We performed genomic sequencing of 122 wastewater samples
from three locations in Switzerland to analyze the B.1.1.7, B.1.351, and P.1 variants of
SARS-CoV-2 on a population level. We called variant-specific signature mutations and
monitored variant prevalence in the local population over time. To enable early detection of
emerging variants, we developed a bioinformatics tool that uses read pairs carrying multiple
signature mutations as a robust indicator of low-frequency variants. We further devised a
statistical approach to estimate the transmission fitness advantage, a key epidemiological
parameter indicating the speed at which a variant spreads through the population, and
compared the wastewater-based findings to those derived from clinical samples. We found
that the local outbreak of the B.1.1.7 variant in two Swiss cities was observable in
wastewater up to 8 days before its first detection in clinical samples. We detected a high
prevalence of the B.1.1.7 variant in an alpine ski resort popular among British tourists in
December 2020, a time when the variant was still very rare in Switzerland. We found no
evidence of local spread of the B.1.351 and P.1 variants at the monitored locations until the
end of the study (mid-February), which is consistent with clinical samples. Local variant
prevalence can be estimated as well as or better from wastewater samples than from a much
larger number of clinical samples. We found that the transmission fitness advantage of
B.1.1.7, i.e., the relative change in its reproductive number, can be estimated earlier and
from substantially fewer wastewater samples than clinical samples.
Our results show that genomic sequencing of wastewater samples can detect, monitor, and
evaluate genetic variants of SARS-CoV-2 on a population level. Our methodology provides a
blueprint for rapid, unbiased, and cost-efficient genomic surveillance of SARS-CoV-2
variants.
The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC 4.0 International license. medRxiv preprint doi: https://doi.org/10.1101/2021.01.08.21249379; this version posted July 15, 2021.
NOTE: This preprint reports new research that has not been certified by peer review and should not be used to guide clinical practice.

Introduction
The ongoing spread and evolution of SARS-CoV-2 has generated several variants of interest
and variants of concern (1–3), which can affect, to different degrees, transmissibility (1),
disease severity (4), diagnostics, and the effectiveness of treatment (5) and vaccines.
Therefore, early detection and monitoring of local variant spread have become an important
public health task (6).
Viral RNA shed by SARS-CoV-2-infected persons can be detected in sewage collected at
wastewater treatment plants (WWTPs), and its concentration has been shown to correlate
with case reports (7). Moreover, wastewater samples can provide a snapshot of the
circulating viral lineages and their diversity in the community through RT-qPCR analysis (8,9)
or genomic sequencing (9–12). Recently, it has been shown that variant prevalence in
wastewater correlates with clinical data (13). Therefore, variant monitoring in wastewater may
serve as an efficient and complementary approach to genomic epidemiology based on
individual patient samples.
However, it is challenging to analyze wastewater samples for their SARS-CoV-2 genomic
composition, because concentrations of SARS-CoV-2 are low, samples are enriched in PCR
inhibitors, viral genomes may be fragmented, and sewage contains large amounts of
bacterial, human and other viral DNA and RNA genomes. In addition, the data quality
obtained from sequencing the mixture of viral genomes is affected by amplification biases,
sequencing errors, and incomplete phasing information, which further complicates the
detection of an emerging viral lineage that is present only in a small fraction of infected
persons.
Here, we address some of the key challenges and demonstrate that genomic sequencing of
wastewater samples can be used for early detection, quantitative monitoring, and estimation
of transmission fitness of any genetic variant of SARS-CoV-2.
Methods
Study overview
We collected a total of 122 samples from three Swiss wastewater treatment plants (WWTPs)
located in Zurich, Lausanne, and an alpine ski resort between July 2020 and February 2021
(Supplementary Figure S1A). These samples include a dense time series for Zurich
and Lausanne between December 2020 and mid-February 2021. Viral RNA was extracted
from raw influent samples and subjected to amplicon-based next-generation sequencing
(NGS) with 2×250 bp paired-end reads (Figure 1A). We compared normalized
amplicon coverage to clinical SARS-CoV-2 sequencing data to assess general data quality
and performed replicate and spike-in experiments to assess reproducibility and quantifiability
of SARS-CoV-2 variants in wastewater. Then, we searched the 122 wastewater samples for
evidence of the respective signature mutations of the variants B.1.1.7, B.1.351, and P.1
(Supplementary Table S1). For early detection in individual samples, we developed a
bioinformatics tool that searches for groups of signature mutations that can be observed
directly on the same sequencing read pair originating from the same amplicon. We used the
dense time-series data available for December 2020 to mid-February 2021 to estimate the
local prevalence of the B.1.1.7 variant, calculated its fitness advantage and compared our
results to estimates based on clinical data from the same geographical areas.
Wastewater sample collection and preparation
Raw wastewater samples were collected from three Swiss WWTPs: Werdhölzli, Zurich (64
samples, Jul 2020-Feb 2021, population connected: 450,000), Vidy, Lausanne (49 samples,
Sep 2020-Feb 2021, population connected: 240,000), and an alpine ski resort (8 samples,
Dec 2020) (Figure 1A, Supplementary Figure S1). Samples were concentrated and viral
RNA was extracted as described before (14). In brief, 24-hour composite samples (Zurich
and Lausanne) or grab samples (ski resort) were collected in 500 ml polystyrene or
polypropylene plastic bottles, shipped on ice, and stored at 4°C for up to 8 days before
processing. For Zurich samples, 50 ml aliquots were clarified by filtration through a 2 µm
glass fiber filter (Millipore) followed by a 0.22 µm filter (Millipore); for Lausanne and ski
resort samples, 50 ml aliquots were clarified by centrifugation (4,863 × g for 30 minutes).
Clarified samples were then concentrated using centrifugal filter units (Centricon Plus-70
Ultrafilter, 10 kDa, Millipore, USA) by centrifugation at 3,000 × g for 30 minutes. Centricon
cups were inverted and the concentrate was collected by centrifugation at 1,000 × g for
3 minutes. The resulting concentrate (up to 280 µL) was extracted using the QIAamp Viral
RNA Mini Kit (Qiagen, USA) according to the manufacturer’s instructions, adapted to the
larger volumes, and eluted in 80 µL. Samples
collected after February 1 were further purified using OneStep PCR Inhibitor Removal
columns (Zymo Research, USA). RNA extracts were stored at -80°C for up to 4 months
before sequencing.
Genomic sequencing
RNA extracts from wastewater samples were used to produce amplicons and to prepare
libraries according to the COVID-19 ARTIC v3 protocol (15) with minor modifications. Briefly,
extracted RNA was reverse transcribed using the NEB LunaScript RT SuperMix Kit (New
England Biolabs, USA), and the resulting cDNA was amplified with the ARTIC v3 primer panel
from IDT (USA). The amplicons were end-repaired and dA-tailed before ligation of adapters
using NEB Ultra II (New England Biolabs, USA). Fragments carrying adapters on both ends
were selectively enriched by PCR and barcoded with unique dual indices. Libraries were
sequenced on the Illumina NovaSeq 6000 and MiSeq platforms, yielding paired-end reads of
250 bp each.
Mutation calling
NGS data were analyzed using V-pipe (16), a bioinformatics pipeline for end-to-end analysis
of viral sequencing reads obtained from mixed samples. Individual low-frequency mutations
were called based on local haplotype reconstruction using ShoRAH (17). For detecting
mutation co-occurrence, we developed a novel computational tool called Cojac
(CoOccurrence adJusted Analysis and Calling). The ARTIC v3 protocol relies on tiled
amplification, and some amplicons cover multiple positions mutated in a variant
(Supplementary Table S1). Because the samples are sequenced with 2×250 bp paired-end
reads, each ~400 bp amplicon can be fully covered by a single read pair in nearly all instances.
Detecting multiple signature mutations on the same amplicon increases the confidence of
mutation calls at very low variant read counts. This opens the possibility of earlier detection,
while variant concentrations are still too low for reliable detection of individual mutations.
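A rough calculation illustrates why co-occurrence helps. The sketch below assumes, purely for illustration, independent base-calling errors at a per-base rate of 0.1%, spread evenly over the three alternative bases; the study does not report these values, and the calculation ignores PCR errors and chimeric amplicons. The function names are hypothetical.

```python
# Illustrative assumptions (not values from the study):
PER_BASE_ERROR = 1e-3                  # probability that a base is miscalled
P_SPECIFIC_SUB = PER_BASE_ERROR / 3    # the miscall must yield one specific base of three


def chance_cooccurrence(k: int) -> float:
    """Probability that one reference-like read pair shows k given substitutions
    purely through independent sequencing errors."""
    return P_SPECIFIC_SUB ** k


def expected_false_pairs(k: int, n_pairs: int) -> float:
    """Expected number of reference-like read pairs that mimic a k-mutation pattern."""
    return n_pairs * chance_cooccurrence(k)


if __name__ == "__main__":
    n_pairs = 10_000  # hypothetical number of read pairs covering one amplicon
    for k in (1, 2, 3):
        print(k, expected_false_pairs(k, n_pairs))
```

Under these assumptions, about 3.3 of 10,000 reference-like read pairs would show one given signature substitution by chance, but only about 0.001 would show a given pair of substitutions, so even a handful of read pairs carrying two or three signature mutations is unlikely to be pure sequencing noise.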
Cojac takes read alignments (BAM files) as input and counts read pairs with
variant-specific mutational patterns. Cojac can be configured to work with any tiled
amplification scheme and to simultaneously search for multiple variants, each defined by a
list of signature mutations. Cojac is freely available at https://github.com/cbg-ethz/cojac/ or
as a bioconda package at https://bioconda.github.io/recipes/cojac/README.html.
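The co-occurrence counting idea can be sketched as a toy example. This is not Cojac's actual interface: it assumes the read pairs have already been parsed from the BAM file into maps from reference position to observed base (real code would use a BAM reader and would also handle insertions, deletions, and base qualities). The names `Mutation` and `count_cooccurrences`, and the positions in the example, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mutation:
    pos: int  # reference position (hypothetical coordinates in the example)
    alt: str  # variant-specific base


def count_cooccurrences(read_pairs, signature, min_hits=2):
    """Tally read pairs that span every signature position of one amplicon.

    read_pairs: iterable of dicts mapping reference position -> observed base,
    with both mates of a pair already merged.
    Returns (covering, with_min_hits, with_all).
    """
    covering = with_min_hits = with_all = 0
    for pair in read_pairs:
        if any(m.pos not in pair for m in signature):
            continue  # the pair does not span the whole mutation pattern
        covering += 1
        hits = sum(pair[m.pos] == m.alt for m in signature)
        if hits >= min_hits:
            with_min_hits += 1
        if hits == len(signature):
            with_all += 1
    return covering, with_min_hits, with_all


# Hypothetical amplicon carrying three signature substitutions
signature = [Mutation(100, "T"), Mutation(150, "A"), Mutation(180, "G")]
read_pairs = [
    {100: "T", 150: "A", 180: "G"},  # all three variant bases
    {100: "T", 150: "A", 180: "C"},  # two of three
    {100: "A", 150: "C", 180: "C"},  # reference-like
    {100: "T", 150: "A"},            # does not reach position 180: ignored
]
print(count_cooccurrences(read_pairs, signature))  # (3, 2, 1)
```

Requiring the pair to span all positions keeps the denominator (`covering`) comparable across amplicons, which matters when the counts are later turned into frequencies.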
Statistical data analysis
For Zurich, we used the 55 sequencing experiments (excluding 1 failed) covering 46 dates
ranging from December 8, 2020 to February 11, 2021. For Lausanne, we used the 52
sequencing experiments (excluding 4 failed) covering 43 dates ranging from December 8,
2020 to February 13, 2021. When a WWTP sample was sequenced multiple times, we computed
the empirical frequency of each B.1.1.7 signature mutation on that day by averaging its values
across the sequencing experiments. We only used non-synonymous substitutions for
quantification. Frequencies of the B.1.1.7 signature substitutions in wastewater-derived NGS
data were resampled with replacement and averaged per wastewater sample, before being
smoothed across time by locally weighted scatterplot smoothing (lowess) with a bandwidth
of ⅓, as implemented in statsmodels v0.12.1 (18) for Python v3.7.7. This process was
repeated 1000 times to construct bootstrap estimates of the B.1.1.7 per-day frequency
curves. Point estimates of the daily B.1.1.7 prevalence were computed by averaging the
smoothed resampled values, and 95% confidence intervals as their empirical 2.5% and 97.5%
quantiles. For the prevalence of B.1.1.7 in clinical samples, we used whole-genome
sequencing data from randomly selected SARS-CoV-2-positive samples provided by Viollier
AG, as described previously (19). Daily cantonal relative abundances of variants were
estimated as their empirical frequencies in sequenced samples. For each canton, the
sequenced cases were resampled with replacement and aggregated into daily relative
frequencies of B.1.1.7, which were then smoothed temporally using the same lowess
smoother as above. This process was repeated 1000 times to construct bootstrap estimates
of the daily cantonal relative prevalence of B.1.1.7, which were aggregated into point
estimates and confidence intervals as described above.
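The resample, smooth, and summarise loop can be sketched as follows. This is an illustration under assumptions, not the authors' code: it replaces statsmodels' `lowess` with a minimal hand-written tricube local-linear smoother (without the robustness iterations statsmodels performs), treats each day's signature-mutation frequencies as one list to resample, and uses hypothetical function names.

```python
import math
import random
import statistics


def lowess_fit(x, y, frac=1 / 3):
    """Minimal LOWESS: tricube-weighted local linear fit evaluated at every x.
    x should be small day offsets (e.g. days since the first sample) to avoid
    numerical cancellation. No robustifying reweighting is done."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))   # neighbourhood size
    fitted = []
    for xi in x:
        h = sorted(abs(xj - xi) for xj in x)[k - 1] or 1e-12  # distance to k-th nearest
        w = [max(0.0, 1 - (abs(xj - xi) / h) ** 3) ** 3 for xj in x]
        sw = sum(w)
        swx = sum(wj * xj for wj, xj in zip(w, x))
        swy = sum(wj * yj for wj, yj in zip(w, y))
        swxx = sum(wj * xj * xj for wj, xj in zip(w, x))
        swxy = sum(wj * xj * yj for wj, xj, yj in zip(w, x, y))
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:                # too few distinct points: weighted mean
            fitted.append(swy / sw)
            continue
        slope = (sw * swxy - swx * swy) / denom
        fitted.append((swy - slope * swx) / sw + slope * xi)
    return fitted


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def bootstrap_prevalence(days, mutation_freqs, n_boot=1000, frac=1 / 3, seed=0):
    """days: sorted day offsets; mutation_freqs[i]: signature-mutation frequencies
    observed on days[i]. Returns per-day (mean, 2.5% quantile, 97.5% quantile)."""
    rng = random.Random(seed)
    curves = []
    for _ in range(n_boot):
        # resample signature mutations with replacement, average per sample
        daily = [statistics.fmean(rng.choices(f, k=len(f))) for f in mutation_freqs]
        curves.append(lowess_fit(days, daily, frac))
    per_day = list(zip(*curves))              # group bootstrap replicates by day
    return (
        [statistics.fmean(v) for v in per_day],
        [percentile(v, 0.025) for v in per_day],
        [percentile(v, 0.975) for v in per_day],
    )
```

A local linear smoother reproduces a straight line exactly, which makes a convenient sanity check; for real data one would call the statsmodels implementation that the text names instead of this hand-rolled version.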
Estimation of epidemiological parameters
Following Chen et al. (19), we assume that the relative frequency $p(t)$ of the B.1.1.7 variant in the population at time $t$ follows logistic growth with rate $a$ and inflection point $t_0$:

$$p(t) = \frac{\exp\{a(t - t_0)\}}{1 + \exp\{a(t - t_0)\}}$$
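A one-line derivation, using only the definitions above, shows why this model can be fitted as a binomial generalized linear model with a logit link and time as the single covariate. Taking the log-odds of both sides gives

$$\log\frac{p(t)}{1 - p(t)} = a(t - t_0) = \beta_0 + \beta_1 t, \qquad \beta_1 = a, \quad \beta_0 = -a\,t_0,$$

so the growth rate is the fitted slope, $a = \beta_1$, and the inflection point, where $p(t_0) = 1/2$, is recovered as $t_0 = -\beta_0/\beta_1$.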
For the wastewater samples, we further assume that the B.1.1.7 signature mutation counts follow a binomial distribution with expected value equal to $p(t)$ times the total coverage at the respective site. Similarly, we assume that the number of B.1.1.7-positive clinical samples follows a binomial distribution with expected value equal to $p(t)$ times the number of clinical samples analyzed. Maximum likelihood estimates of the model parameters were obtained with a generalized linear model using the stats package of R v3.6.1 (20). Confidence intervals were computed based on the asymptotic normality of the estimators. To account for overdispersion due to the inherently noisy nature of wastewater sequencing data, the confidence intervals were computed using the variance of a quasibinomial (21) distribution. Although clinical data are not expected to exhibit
overdispersion, the same procedure was applied for the sake of consistency. Confidence bands were first generated for the linear predictors and then back-transformed into confidence bands for the regression curves, which ensures that they are restricted to the interval [0,1]. Estimates of the logistic growth parameter $a$ were then transformed into estimates of the transmission fitness advantage $f_d$, assuming the discrete-time model of Chen et al. (19) with generation time $g = 4.8$ days, such that

$$f_d = \exp(a g) - 1.$$

Confidence intervals for the logistic growth parameter $a$ were then back-transformed into confidence intervals for the fitness advantage $f_d$. This inference procedure was repeated at multiple timepoints, using only the clinical and wastewater sequencing data available at each timepoint, to generate online estimates and confidence intervals of what could have been inferred about $f_d$ at that time. These estimates were compared to the estimates of $f_d$ reported in Chen et al. (19) for the Lake Geneva region (population 1.6 million), which includes Lausanne, and the Greater Zurich Area (population 1.5 million). The confidence intervals for these regional estimates of $f_d$ were recomputed by back-transforming the confidence intervals reported for the regional estimates of $a$, so that they could be meaningfully compared with those based on our data.
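The fitting and transformation steps above can be sketched in Python. This is a minimal illustration, not the authors' code (they used R's `glm`). It fits logit p(t) = b0 + b1·t by Newton-Raphson, estimates the quasibinomial dispersion as the Pearson chi-squared statistic divided by the residual degrees of freedom, builds a Wald interval for a, and maps it through f_d = exp(a·g) - 1 with g = 4.8 days. The function names are hypothetical, and the sketch omits safeguards such as step-halving that production GLM routines use.

```python
import math
from statistics import NormalDist

GENERATION_TIME = 4.8  # days, as stated in the text


def _score_information(b0, b1, t, successes, trials):
    """Score vector, Fisher information and Pearson chi^2 of the binomial logit model."""
    u0 = u1 = i00 = i01 = i11 = pearson = 0.0
    for ti, yi, ni in zip(t, successes, trials):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * ti)))
        w = ni * p * (1.0 - p)   # binomial variance (also the IRLS working weight)
        r = yi - ni * p          # raw residual
        u0 += r
        u1 += r * ti
        i00 += w
        i01 += w * ti
        i11 += w * ti * ti
        pearson += r * r / w
    return u0, u1, i00, i01, i11, pearson


def fit_logistic_growth(t, successes, trials, level=0.95, max_iter=100, tol=1e-10):
    """Fit logit p(t) = b0 + b1*t; return a = b1, t0 and a quasibinomial Wald CI for a."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        u0, u1, i00, i01, i11, _ = _score_information(b0, b1, t, successes, trials)
        det = i00 * i11 - i01 * i01
        step0 = (i11 * u0 - i01 * u1) / det   # Newton step = inverse(information) @ score
        step1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    _, _, i00, i01, i11, pearson = _score_information(b0, b1, t, successes, trials)
    dispersion = pearson / (len(t) - 2)       # Pearson chi^2 / residual degrees of freedom
    se_a = math.sqrt(dispersion * i00 / (i00 * i11 - i01 * i01))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {"a": b1, "t0": -b0 / b1, "a_ci": (b1 - z * se_a, b1 + z * se_a),
            "dispersion": dispersion}


def fitness_advantage(a, g=GENERATION_TIME):
    """Discrete-time transmission fitness advantage f_d = exp(a * g) - 1."""
    return math.exp(a * g) - 1.0


def online_estimates(t, successes, trials, min_points=5):
    """Refit using only the data available up to each timepoint (t sorted ascending)."""
    results = []
    for k in range(min_points, len(t) + 1):
        fit = fit_logistic_growth(t[:k], successes[:k], trials[:k])
        lo, hi = fit["a_ci"]
        results.append((t[k - 1], fitness_advantage(fit["a"]),
                        fitness_advantage(lo), fitness_advantage(hi)))
    return results
```

Because f_d is monotone in a, applying `fitness_advantage` to both interval endpoints yields the interval for f_d directly, which is what the back-transformation described above amounts to. `online_estimates` mimics the repeated inference at multiple timepoints by refitting on growing prefixes of the time series.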
Dilution experiment
RNA samples of cultivated SARS-CoV-2 wild type (Wuhan strain) and of a clinical B.1.1.7
strain were obtained. Each RNA sample was diluted in an RNA extract produced from
SARS-CoV-2-free wastewater (November 2019, Lausanne) to a final concentration of 200
gc/µL. Wild type and B.1.1.7 solutions were then mixed at ratios of 10:1, 50:1 and 100:1, and
each mixture was sequenced five times.
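Because both stocks were diluted to the same concentration (200 gc/µL), the expected B.1.1.7 share in each mixture follows directly from the mixing ratio. The sketch below assumes the stated ratios are volumes of wild type to B.1.1.7 and ignores pipetting and amplification variability; the helper name is hypothetical.

```python
from fractions import Fraction


def expected_minor_fraction(major_parts: int, minor_parts: int) -> Fraction:
    """Expected share of the minor strain after mixing equal-concentration stocks
    at a major:minor volume ratio."""
    return Fraction(minor_parts, major_parts + minor_parts)


for ratio in (10, 50, 100):
    share = expected_minor_fraction(ratio, 1)
    print(f"{ratio}:1 -> {share} = {float(share):.2%}")
```

Note that a 10:1 mixture corresponds to 1/11 (about 9.09%), not 10%; likewise 50:1 and 100:1 give about 1.96% and 0.99%, which are the reference values against which measured B.1.1.7 frequencies can be compared.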
Replicate experiment
RNA extract was produced as described above from two samples obtained from the
Lausanne WWTP on January 7, 2021. The extracts were pooled and subsequently divided
into 9 replicate samples for sequencing.
Patient sequences
Per-patient SARS-CoV-2 consensus sequences were downloaded from GISAID (22) for all
samples collected in Switzerland between February 24, 2020, and February 13, 2021, and
not identified as either B.1.1.7, P.1, or B.1.351 (see Supplementary Material for the list of
accession numbers).
Results
We first assessed the quality of genomic sequencing data derived from wastewater samples.
We found that the normalized amplicon coverage obtained from the wastewater samples
was not significantly different from the coverage of clinical samples (Figure 1C) and that it
allowed for calling low-frequency mutations in most genomic regions of most wastewater
samples we analyzed (Supplementary Figure S2). The additional replicate and spike-in
experiments indicate that the relative prevalence of genomic variants can be quantified from
References
R Core Team. R: A language and environment for statistical computing. 2014.
Generalized Linear Models. Book.
Statsmodels: Econometric and Statistical Modeling with Python. Proceedings article.
GISAID: Global initiative on sharing all influenza data - from vision to reality. Journal article.
Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Journal article.