Included genetic sequence data and association with reported cases
Here, we analyze 3843 VP1 sequences from wild-type polioviruses detected in samples collected between January 2012 and June 2023. During this period, far more genetic sequences came from ES detections than from AFP cases (2727 ES versus 1116 AFP) (Table S1), and the number of ES sites expanded significantly over the period covered (Fig. S6). Throughout this analysis, we divide Afghanistan and Pakistan into 16 regions, defined based on previous epidemiological analysis of polio, other diseases and climatic regions (see map in Fig. 1C and Methods). We provide alternative estimates of key reported parameters obtained with the full dataset (including AFP and ES sequences; presented in the main figures) and using only AFP data, considered a priori less biased than ES detections, which tend to be targeted to large sewage catchments and key high-risk areas for polio transmission.18,19 Figures for the analysis of the AFP data alone are presented in the Supplementary Information. Nevertheless, there was a strong correlation between the number of sequences and the number of reported WPV1 AFP cases across regions (R = 0.75, p = 0.007) and across years (R = 0.82, p = 0.002). This suggests that ES data provide a representative sample across the study area and time period, although the sample is likely biased at the regional level by differences in ES sampling intensity.
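As a minimal sketch of how a correlation of this kind could be computed, assume per-region (or per-year) counts of sequences and of reported WPV1 AFP cases are available as two equal-length lists. The choice of Pearson's coefficient is an assumption, since the text does not state which statistic underlies R, and the counts below are illustrative placeholders rather than values from Table S1.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical placeholder counts per region (NOT the study's data).
sequences_per_region = [10, 40, 25, 80, 5]
afp_cases_per_region = [4, 15, 12, 30, 1]
r = pearson_r(sequences_per_region, afp_cases_per_region)
```

A p-value would additionally require a significance test, which is omitted here.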
Genetic diversity through time
A phylogeny of all sequences shows two clear lineages (here labelled A and B), which drove all WPV1 transmission during the most recent outbreak, in 2019–2020 (Fig. 2). The root of these two poliovirus lineages was found to be in mid-2009 (2009-08-22; 95% HPD: 2009-02-27 to 2010-02-02) when including all sequences, with no significant difference when using AFP data alone (2009-04-16; 95% HPD: 2008-02-23 to 2010-01-21). Lineage A died out in early 2021 and has not been detected since. Three distinct sub-lineages of the B lineage persisted into 2023 (Fig. 2).
Bayesian skyline analysis of AFP data inferred increases in viral diversity during the two major outbreaks, followed by a reduction between the outbreaks and a significant decline since early 2021 (Fig. 2C). In contrast, a lineages-through-time (LTT) analysis of both AFP and ES data demonstrated the early-warning capabilities of ES data, detecting an increase in viral lineages ahead of both major outbreaks, even when AFP data showed no such signal. Towards the end of the study period, the LTT analysis indicated expanding transmission outside the Pakistan Central Corridor and Afghanistan North Corridor, detectable through ES but not AFP. Both methods revealed that viral diversity in early 2021 dropped to very low levels, with only a few lineages present, supporting the three clusters detected in 2023.
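For readers unfamiliar with LTT curves, the following is a generic sketch of the counting step, not the pipeline used in this study. It assumes a time-scaled tree has already been reduced to a list of branches, each a hypothetical (parent_time, child_time) pair in decimal years; the toy tree is invented for illustration.

```python
def lineages_at(branches, t):
    """Count branches alive at time t.

    Each branch is a (parent_time, child_time) pair in decimal years,
    with parent_time < child_time. A branch is counted as alive at t
    when parent_time <= t < child_time.
    """
    return sum(1 for start, end in branches if start <= t < end)

def ltt_curve(branches, times):
    """Return (time, lineage count) pairs for each requested time point."""
    return [(t, lineages_at(branches, t)) for t in times]

# Hypothetical toy tree: the root (2010.0) splits into an internal node
# (2011.0) and tip C (2011.5); the internal node gives tips A and B.
toy_branches = [
    (2010.0, 2011.0),  # root -> internal node
    (2010.0, 2011.5),  # root -> tip C
    (2011.0, 2012.0),  # internal node -> tip A
    (2011.0, 2012.5),  # internal node -> tip B
]
curve = ltt_curve(toy_branches, [2010.5, 2011.25, 2011.75, 2012.25])
# curve == [(2010.5, 2), (2011.25, 3), (2011.75, 2), (2012.25, 1)]
```

Restricting the branch list to those inferred to lie in a given set of regions would give a region-specific curve, under the assumption that branch locations are available from the phylogeographic reconstruction.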
Phylogeographic assessment of viral movement
Movement of poliovirus across the 16 geographic regions over the past decade was inferred using a discrete trait analysis (Methods), and showed repeated cyclical movement of poliovirus between the southern regions of both countries. This particularly affected the South Corridor regions and Karachi (Fig. 2, purple and dark green regions). The virus was also inferred to circulate and expand in the south linked to importation from the North Corridor regions (Fig. 2, deep blue regions). Interestingly, cyclical movement of the B lineage between Karachi and the South Corridor regions during the 2019–2020 outbreak was observed with the full dataset (Fig. 2), but was not picked up when restricting the analysis to AFP data alone (Fig. S3). Indeed, almost all AFP detections in the South Corridor Afghanistan region (72/73) and the Karachi region (6/7) in 2019 and 2020 were linked to the now extinct A lineage.
Overall, 895 (95% HPD 867–922) transitions were inferred between regions over the full time-period when including all data (Fig. 3), compared to only 268 (95% HPD 253–283) without the inclusion of ES data (Fig. S3). Splitting numbers of inferred transitions by region suggests the city of Karachi has been the primary region seeding transmission in other regions over the decade of the study (Fig. 3B). This association is found both with and without the inclusion of ES data, although a significantly greater number of exportations are estimated when both AFP and ES data are considered (all data: 240 exportations; 95% HPD: 212–266, AFP data: 63 exportations; 95% HPD 40–82, Fig. 3B, Fig. S3C). This analysis also highlights major historical importers of poliovirus, with the North and South Corridors of Pakistan being consistently high importers (Fig. 3A). Sindh and the East Pakistan region are found to be major importers only when ES data are included (Fig. 3A and S3B), suggesting that many of these importations, which are only detected through ES, do not circulate for a long enough time to result in paralysis cases.
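The per-region tallies described above can be illustrated with a short sketch. It assumes the inferred transitions on a single tree are available as (source, destination) region pairs, which is a hypothetical input format; the example pairs are invented and the summary across posterior trees (medians and 95% HPD) is not shown.

```python
from collections import Counter

def tally_transitions(transitions):
    """Count exportations and importations per region.

    transitions: iterable of (source_region, destination_region) pairs,
    one per inferred region change on a tree.
    """
    exports = Counter()
    imports = Counter()
    for source, destination in transitions:
        if source == destination:
            continue  # no region change, so not a transition
        exports[source] += 1
        imports[destination] += 1
    return exports, imports

# Hypothetical transitions for illustration only (NOT the study's results).
toy_transitions = [
    ("Karachi", "Sindh"),
    ("Karachi", "South Corridor Pakistan"),
    ("North Corridor Afghanistan", "Karachi"),
    ("Sindh", "Sindh"),  # ignored: same region
]
exports, imports = tally_transitions(toy_transitions)
total_transitions = sum(exports.values())
```

Repeating the tally over each sampled phylogeny would yield a distribution of counts per region from which an interval could be derived.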
Local transmission lineages
We define local transmission lineages (LTLs) as clusters of phylogenetically linked detections in the same geographic region (Fig. 4A shows descriptive diagrams of the splitting process and classification into distinct categories). Using these LTLs, the estimated mean time to detection of a virus imported into a region was 93 days (CI: 85–101) across the whole period. In the regions supporting persistent transmission in 2023 (the Central Corridor Pakistan and North Corridor Afghanistan regions), this was 112 days (CI: 89–134), compared to 92 days (CI: 84–101) in the other regions. Without the early warning given by ES, we estimate an overall mean time to detection of an imported cluster of 166 days (CI: 144–188). This corresponds to an estimated average improvement of 73 days (around two and a half months) in the time to detection of an emerging lineage with the ES system over AFP surveillance alone.
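One plausible reading of the LTL definition is sketched below; the splitting process actually used is the one shown in Fig. 4A and may differ in detail. The sketch assumes every node of a tree carries an inferred region label, and starts a new LTL whenever a node's region differs from its parent's. The dictionaries `parent` and `region` and the toy tree are hypothetical.

```python
def assign_ltls(parent, region):
    """Label each node with a local transmission lineage (LTL) id.

    parent: dict mapping node -> parent node (the root maps to None)
    region: dict mapping node -> region label (inferred for internal nodes)
    A node founds a new LTL if it is the root or if its region differs
    from its parent's region; otherwise it inherits its parent's LTL.
    """
    ltl = {}
    next_id = 0
    for node in parent:
        path = []
        current = node
        # Walk towards the root until reaching a labelled node or a founder.
        while current not in ltl:
            p = parent[current]
            if p is None or region[p] != region[current]:
                ltl[current] = next_id  # founder of a new LTL
                next_id += 1
                break
            path.append(current)
            current = p
        # Every node on the walked path shares the LTL found at its top.
        for n in path:
            ltl[n] = ltl[current]
    return ltl

# Hypothetical toy tree: two separate introductions into Sindh.
parent = {"R": None, "n1": "R", "a": "n1", "b": "n1", "c": "R"}
region = {"R": "Karachi", "n1": "Karachi", "a": "Karachi",
          "b": "Sindh", "c": "Sindh"}
ltl = assign_ltls(parent, region)
```

In this toy example, tips `b` and `c` are both in Sindh but belong to different LTLs, because each descends from a separate inferred importation.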
We classified the reconstructed LTLs into four categories based on the number of detections, the duration of detection, and the number of exportation events associated with each lineage. These categories are ‘Dead end’ (circulating for less than 6 months with fewer than 5 associated sequences), ‘Persistent’ (circulating for more than 1 year with more than 10 associated sequences), ‘Export’ (any lineage with onwards transmission not falling into the previous categories) and ‘Other’ (any remaining lineages not classified). The median number of lineages falling into each category over time is shown in Fig. 4B, with the 95% HPD across 300 sampled phylogenies shown in the Supplementary Information (Fig. S8). Median numbers of lineages obtained with only AFP data are also provided in the Supplementary Information (Fig. S4). Over the full period, an average of 72% (median: 588/816) of all detected LTLs across reconstructed phylogenies were dead ends, not leading to onwards transmission in other regions or persistent circulation, suggesting good population immunity across Pakistan and Afghanistan. Only the Central Corridor Pakistan (median: 8; CrI: 7–9), Karachi (median: 8; CrI: 6–11) and South Corridor Pakistan (median: 8; CrI: 6–10) regions were found to have significantly more than 5 persistent LTLs, highlighting their historic importance as long-term harbors of poliovirus transmission, and key regions to target for poliovirus eradication. Long-term persistence has also been supported in the two North Corridor regions on either side of the border, although with a smaller number of distinct persistent lineages.
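The four categories can be written as a small decision rule. The sketch below is an interpretation rather than the study's code: it applies the categories in the order listed, treats 6 months as 183 days and 1 year as 365 days, and uses hypothetical argument names. The text does not say how a short-lived lineage with exportations is handled; here it is assigned to ‘Dead end’ because that rule is checked first.

```python
SIX_MONTHS_DAYS = 183  # assumed conversion of "6 months"
ONE_YEAR_DAYS = 365    # assumed conversion of "1 year"

def classify_ltl(duration_days, n_sequences, n_exports):
    """Assign an LTL to one of the four categories, checked in listed order."""
    if duration_days < SIX_MONTHS_DAYS and n_sequences < 5:
        return "Dead end"
    if duration_days > ONE_YEAR_DAYS and n_sequences > 10:
        return "Persistent"
    if n_exports > 0:
        return "Export"
    return "Other"

# Hypothetical lineages: (duration in days, sequences, exportation events)
examples = {
    "short, few sequences": classify_ltl(90, 3, 0),
    "long, many sequences": classify_ltl(400, 12, 2),
    "intermediate with exports": classify_ltl(200, 6, 1),
    "intermediate without exports": classify_ltl(200, 6, 0),
}
```

Applying such a rule to every LTL in each sampled phylogeny would give per-category counts that could then be summarised by their median and interval.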
Grouping the LTLs into categories also shows that persistent transmission in most regions was interrupted following the 2019–2020 outbreak (Fig. 4). The North Corridor Afghanistan and Central Corridor Pakistan regions, the current focal regions for polio eradication efforts, are each found to have harbored a single persistent LTL since early 2021. These regions supported the majority of the continued circulation of polio during the low transmission period (clusters 2 and 3 on Fig. 2A). Another persistent lineage was hosted in the city of Karachi, driven by a single detection in wastewater in mid-2023 (cluster 1 on Fig. 2A). This detection was highly divergent from other viruses in the region (5.4% divergent from any previously sequenced virus), with our analysis suggesting this isolate had a most recent common ancestor in May 2020 (2020-05-20; CrI: 2020-01-25 to 2020-08-22), around 3 years before its detection. The reasons for this genetically distinct detection and any potential reservoir which supported its long-term persistence have yet to be identified.
All recent circulation (apart from the single detection in Karachi in mid-2023) has been linked to the two distinct persistent chains of transmission in the North Corridor Afghanistan and Central Corridor Pakistan regions (clusters 2 and 3 on Fig. 2A), with spill-over events into other regions seen towards the end of the included data. Full plots showing which regions poliovirus was imported from and exported to for each LTL are provided in the Supplementary Information (Fig. S2).
Recurring patterns of spread
The 2013–2014 and 2019–2020 outbreaks occurred primarily in the same set of focal regions, with both outbreaks emerging from exportations from the North Corridor (Fig. 2A, Fig. 3C and D). This pattern of exportation repeats towards the end of the included data, in early 2023. It is less clear without ES data, where the main stem (the most common ancestral line) of the phylogenetic tree is inferred to move between regions more regularly (Fig. S3A). Although large numbers of AFP cases were detected in the Central Corridor of Pakistan during both outbreaks, very little exportation is observed from this region into other areas (Fig. 3C and D). The role of the city of Karachi in amplifying and repeatedly exporting poliovirus during outbreak periods, with cyclical movement between this region and the South Corridor on both sides of the border, is also visible on the phylogenetic tree (Fig. 2A and Fig. 3C and D). This cyclical movement is again less clear without the inclusion of ES data, although we do infer exportations from both Karachi and the surrounding Sindh region into the South Corridor with AFP data alone.
Sensitivity to sampling
AFP data are treated here as a proxy for an unbiased sample, for comparison with the mixed AFP and ES analysis, and the results using these data are presented in the Supplementary Information. AFP surveillance is often considered a gold-standard method for poliovirus detection due to the high reporting likelihood of such a severe outcome. Despite this, the non-polio AFP reporting rate, which is independent of poliovirus transmission in each of the regions, increased over time in all regions and varied widely between regions (Fig. S5). A method to incorporate sampling bias into estimates of movements between regions on the AFP tree was applied to the polio data, and the results are presented in the Supplementary Information.20 This method accounts for sampling rate differences between regions at the sampled tips, in a similar way to previously published work.21 The results with this correction support the findings presented using the AFP data only, with all estimated rates falling within the bounds of the corrected analysis.