SARS-CoV-2 detection in river and sewage
Utilizing our previously established ES program for Salmonella enterica serovar Typhi24,25 we based our ES for SARS-CoV-2 on 30ml “grab” water samples. Water from rivers, informal sewage systems and a defunct wastewater treatment plant was collected in two phases; from May 2020 – January 2021 we developed our sample framework, field sampling method and laboratory methods and from January 2021 – May 2022 we greatly expanded the scale of our program26 (Fig. 1A). Phase 1 included collections in 7 sites spanning 3 urban areas of Blantyre, Malawi, and Phase 2 scaled the collection to 112 sites covering 22 areas representing river catchment areas of diverse population size in Urban Blantyre (Fig. 1B, Table S1).
In Phase 1, we tested water effluent for SARS-CoV-2 by PCR after filtering through a 0.45-micron filter for SARS-CoV-2. After comparing filtered samples to unfiltered grab samples, we found no difference in SARS-CoV-2 detection (Fig S1A). We also compared two widely used concentration methods, polyethylene glycol (PEG)27 and skimmed milk flocculation (SMF)27,28 concentration and found no significant difference in diagnostic outcome (Fig S1B). We chose grab samples due to cost and straightforwardness in the field and PEG due to more predictable availability of consumables and laboratory infrastructure.
We first detected SARS-CoV-2 on May 11th, 2020, and from May 2020 to May 2022, we found 8.3% (220/2625) of our samples were positive for SARS-CoV-2 by PCR, with Ct range of 30-39.45 (Fig S1C). Sampling rates and positive detections varied over time (Fig. 1A) and by site (Fig. 1B-D). Sewage samples had higher SARS-CoV-2 positivity rates (21% [55/258]) than river water samples (7.2% [170/2367]), however river water samples still yielded a signal which varied in line with the occurrence of new waves of COVID-19 in Blantyre. One or more sample(s) had detectable SARS-CoV-2 at 42/112 sites (Fig. 1B-D). Using Anselin Local Moran’s I and Getis-Ord Gi* analysis we identified statistically significant high-high clusters and hotspots of positivity, respectively. Specifically, increased rates of detection were geographically centered toward the southwest of Blantyre along the Mudi and Naperi rivers which drain through the most densely populated areas of the city (Fig. 1E-F). Moreover, local Moran’s I also identified statistically significant low-low clusters of positivity in the southeast and north of Blantyre indicating areas of significantly low rates of detection. This has allowed us to decrease sample collection sites to 21 highly informative locations that demonstrate clear waves of positivity in ES data, and provide representative surveillance for around 70%, and of the population.
Validation of ES as a predictor of clinical prevalence
Next, we aimed to validate ES as a predictor of clinical cases in the surveillance system. The clinical dataset we utilized was from an active recruitment study, where from November 2020 to July 2021, a maximum of 20 individuals/day who met the WHO syndromic definition of COVID-19 were tested. From August 2021, a maximum of 21 patients per site per weekday were recruited, at a 2:1 ratio of those who were suspected cases and those who were not, to capture asymptomatic transmission29. The active community surveillance program tracked both symptomatic and asymptomatic cases as well as number of diagnostic tests conducted. Using a quasi-binomial generalized linear model, we found that the patterns of SARS-CoV-2 detection in ES samples significantly predict the rate of positivity in the community surveillance system for all waves (p < 0.001) (Table 1). The estimated intercepts in the model can inform the sensitivity of the ES system, and we find an ES detection rate of zero corresponds with approximately 5–6% prevalence in the community in the Delta and Omicron waves, indicating lack of sensitivity of the ES system below this prevalence rate. As COVID-19 increased in the community, with each wave driven by a new VOC, so did detection of SARS-CoV-2 in wastewater.
Table 1
Estimated model parameters for the predictive model of Covid cases showing wave-specific coefficients for the ES rate as a predictor (α), as well as wave-specific intercepts (β) denoted for each wave driven by a VOC.
|
Estimate
|
Std. Error
|
Pr(>|t|)
|
\({\alpha }_{Beta}\)
|
-4.65926
|
0.493613
|
6.84E-19
|
\({\alpha }_{Delta}\)
|
-2.90824
|
0.16818
|
3.18E-48
|
\({\alpha }_{Omicron}\)
|
-2.77117
|
0.203336
|
6.73E-34
|
\({\beta }_{Beta}\)
|
23.32583
|
2.950117
|
3.95E-14
|
\({\beta }_{Delta}\)
|
10.2418
|
0.893807
|
7.72E-26
|
\({\beta }_{Omicron}\)
|
11.40473
|
1.196987
|
3.50E-19
|
Further, we allowed for wave-specific lagged effects. For the Beta wave, we found that the best fit of the model resulted in the ES predating peak prevalence by almost two months (Fig S2). Other COVID-19 waves were more closely aligned with ES data (Fig S2). Further, we detected differences in wave-specific sensitivities of the ES system (Table 1, Fig S3). A 20% detection rate in ES during the beta-wave corresponded to a 50% prevalence rate in the community. The same rate of detection in ES would correspond to a 30% prevalence rate if it occurred in the delta wave, indicating a possible decline in healthcare seeking or reporting in the second wave.
Finally, we tested the robustness of these estimates to uncertainty in the ES data. We generated 1,000 realizations from the multivariate normal distribution parameterized by the ES regression model covariates (Fig S4). The realizations were significant (p < 0.05) for 100% and 99.9% of iterations for Delta and Omicron, respectively, and 92% for the Beta wave (Fig S5), reflecting uncertainty in the first wave’s predictive power based on fewer samples during this early time period.
Comparison of the peaks of SARS-CoV-2 from ES in relation to clinical datasets
We next wanted to compare peaks of SARS-CoV-2 detections across datasets. We utilized a secondary dataset reflecting clinical patterns of COVID-19, data reported to the Blantyre District Health Office (DHO). This dataset contained daily counts of reported city-wide cases and was available from the beginning of ES sampling until January 2022, and represented case detection from symptomatic patients seeking treatment and limited contact tracing. A large proportion of individuals presented to secondary community health centers which is representative of data more typically available in low-income settings. There were key limitations to this dataset. The DHO dataset did not record the total number of tests administered and there was limited testing of asymptomatic individuals mainly at the beginning of the pandemic. In addition, there were periods of low testing due to PCR resources and there was a national switch from PCR based diagnostics to rapid diagnostic tests, but this was not captured in the dataset available. Examining this data, the Delta wave appears significantly smaller than the other waves, with the Omicron wave the largest (Fig S6), and these data were thus distinct from the ES and community-based active surveillance data (Fig. 2).
We compared the timing of the peak of COVID-19 rates across all three datasets. The reported DHO positivity data lagged behind ES data by 2, 31, and 7 days for Wave 1, Beta wave and Omicron wave, respectively, however for the Delta wave the ES data lagged behind the DHO data by 13 days (Table 2). Despite the differences between active surveillance, passive surveillance and ES, the peaks of the waves were very similar, further validating ES as a useful indicator for COVID-19 outbreaks.
Table 2
Estimated date of peak detection. Timing of the peak of each wave between datasets shows both active and passive detection lags behind ES detection except during the Delta wave.
Wave (VOC)
|
Environmental Surveillance
|
Community Active Surveillance
|
Community passive Surveillance (DHO)
|
Wave 1
|
7-13-2020
(5-28-2020, 8-01-2020)
|
NA
|
7-15-2020
(7-14-2020, 7-17-2020)
|
Wave 2 (Beta)
|
12-05-2020
(10-29-2020, 12-23-2020)
|
1-27-2021
(1-22-2021, 1-31-2021)
|
1-05-2021
1-04-2021, 1-06-2021)
|
Wave 3 (Delta)
|
7-22-2021
(7-14-2021, 8-04-2021)
|
7-22-2021
(7-19-2021, 7-25-2021)
|
7-09-2021
(7-08-2021, 7-12-2021)
|
Wave 4 (Omicron)
|
12-28-2021
(12-22-2021, 3–07, 2022)
|
12-30-2022
(12-26-2021, 1-05-2022)
|
1-04-2022
(1-03-2022, 1-05-2022)
|
SARS-COV-2 sequencing and identification of variants of concern.
To better understand if ES can be used to identify VOCs circulating in river and informal sewage, we carried out amplicon sequencing (see Methods) using the Nanopore Minion for 90 samples with the lowest diagnostic cycle threshold (Ct) between 30.2–38.8, of which 85 produced sequences. Although this is now commonly done HIC with refined sewage systems it was unclear if we could identify VOCs from very mixed samples with low viral loads. Of our sequenced samples, only 24/86 samples had > 50% coverage and 52/86 samples had > 20% coverage at 20x depth. To confirm our sequencing results and compare sequencing methodologies we carried out further matched sequencing of 68 samples using the EasySeq method (see Methods) and an Illumina MiSeq or NovaSeq (Fig S7). Of these samples, 27/68 had > 50% coverage and 42/68 samples had > 20% coverage at 5x depth showing that low viral load greatly affected both sequencing methods.
We were able to sequence SARS-CoV-2 from both informal sewage and river water, important for ES in communities with limited to no refined sewage treatment centers. We utilized the Freyja analysis tool19 to identify circulating VOCs (Fig. 3A). We first identified VOCs on the following dates: Beta (Jan 19th, 2021), Delta (May 18th, 2021) and Omicron (28th, Sept 2021) using both Minion and Illumina data. Detection of Beta was consistent with clinical data; however, Delta was detected in wastewater a week before the first clinical detection in Blantyre (May 26th, 2021). Our initial analysis identified Omicron over two months before we observed a clinical case of Omicron in Malawi (Dec 6th, 2021), and 6 weeks before the first identification of this new variant globally (Nov 9th, 2021)30,31. Freyja analysis uncovered some examples of cryptic transmission where Beta continues to pop up throughout the year and Omicron does not fully replace Delta until December (Fig. 3A) which was not identified in our patient data.
At face value the two genomes from September appeared to be Omicron by Freyja and one was identified as Omicron using the ARTIC to Pangolin computational pipeline31. Upon deeper analysis of the two putative Omicron genomes from wastewater samples collected in September 2021, we found that one majority consensus genome had multiple SNPs found in a clinical case (a BA.1.14 lineage virus) from Malawi collected in January 2022 and sequenced a few weeks before our wastewater sample was sequenced (Fig S9A). Although concentration, extraction and library preparation were carried out in an ES only lab the amplicon sequencing was carried out in a shared lab space. We re-sequenced the wastewater sample from the original water sample and the re-sequence did not match the clinical sample and did not have the full Omicron repertoire of SNPs (Fig S9B), confirming our suspicion of contamination. The re-sequenced sample contained key Omicron SNPs including specific to BA.2 and BA.4, which is not in line with the putative early Omicron signatures (Fig S9B).
For the second putative Omicron positive sample from September 2021, we further investigated how to analyze a sample with two VOCs present. For this sample Omicron was estimated to be only about 10% of total virus present, we attempted to produce a unique genome by leveraging physically linked mutations specific to Omicron, SNP-frequency linkage (i.e., similar mutation frequency) and by filtering out reads corresponding to circulating Delta lineages, the major VOC in our sample. We were able to generate an Omicron consensus genome but at low sequencing coverage and many Omicron SNPs were either missing or at frequencies under 50% (Fig. 3B). We further scrutinized this genome to better define if the phylogenetic timestamp of this genome was indicative of an Omicron precursor. Since genomes from wastewater are a mixture of multiple viruses calling a consensus genome is much more complicated and minor decisions (e.g., physically linked vs unlinked SNPs, frequency of each SNP, classifying SNPs that are linked to multiple VOC, etc.) can greatly change the time-aware phylogeny (Fig. 3C). Using a Bayesian phylodynamic approach, we tested the effect of three different consensus calling approaches on the estimated sampling date of the virus (i.e., while blinding our model to the “known” sampling date of the virus). If we included all physically linked and SNP frequency-linked mutations, we estimated sampling date in early 2022 (median Feb 8, 95% HPD = (Jan 16-Feb14)), in line with the global spread of Omicron and not indicative of early emergence in Malawi.
However, since Omicron was identified as a minority component in the mixture, our initial consensus approach may incorrectly include additional mutations associated with Delta lineages at a comparable frequency. Hence, we considered the inclusion of two additional cases, corresponding to using only physically linked mutations and only physically linked mutations plus the frequency-linked C12786T (Orf1a:T4174I) mutation (a lineage-defining mutation for BA.1.14, which later circulated in Malawi). When considering only physically linked mutations we found a median estimated sampling date of October 15th, 2021 (95% HPD= (Aug 21, Nov 14) consistent with early Omicron circulation in Malawi prior to November 2021. Although we cannot rule out that our retrospective detections of Omicron in September 2021 constitute bona fide early evidence of this VOC, given our combined findings, we believe it is more likely than not that our September 2021 detection could be due to laboratory contamination during sequencing thus highlighting the importance of real-time sequencing.
To understand how the ES sample genomes relate to patient data from Malawi as well as SARS-CoV-2 more globally we constructed maximum likelihood phylogenetic trees for Delta and Omicron. We utilized consensus genomes derived from the Freyja analysis that had greater than 50% estimated Omicron abundance. We built trees utilizing patient derived genomes from a time period similar to our samples with a bias toward genomes from Africa. For Delta the ES samples largely cluster with genomes from patients’ samples from Malawi and samples from South Africa (Fig S9).
Next, we sought to understand cryptic transmission of VOCs even with low samples and low coverage. Among samples with ≥ 20% genome coverage 15/51 contained more than one VOC and for samples with ≥ 50% genome coverage 5/28 samples contained more than one VOC. This timeline of cryptic detection and then the emergence of dominant VOCs is consistent with sequencing data from patients in Malawi where new VOCs become dominant in the patient population. Although we have low numbers and low genome coverage, we do see examples of cryptic transmission of known VOCs. We detect Beta intermittently until the end of 2021 and we continue to detect Delta even after Omicron moved through the population (Fig. 3A). Our current patient genomic surveillance failed to detect the continued circulation of these VOCs30.