Ratio of samples affected by mutations in different primer system TRs
A raw list of mutations overlapping PCR primer TRs in the investigated samples is available in a GitHub repository at github.com/csabaiBio/coveo_pcr_primers2021. For details on mutation filtering criteria, see Methods.
A total of 665,325 good-quality SARS-CoV-2 genomic samples collected in 2021 were analyzed from the CoVEO database (see Methods). Most of these samples were Alpha or Delta variants, while the proportion of other VOCs (Variants of Concern) was negligible. Samples that could not be unambiguously assigned to a WHO-designated lineage, or that were classified to a lineage other than Alpha, Beta, Gamma, Delta or Omicron, were grouped under the umbrella term “other variants” (Table 2). The low number of Omicron samples in the dataset reflects the fact that sample collection was limited to the period from 1 January to 31 December 2021.
Table 2. Total number of SARS-CoV-2 samples analyzed, and ratio of samples affected by a genomic variation in at least one investigated TR
| | All samples | Alpha | Beta | Gamma | Delta | Omicron | Other variants |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Number of analyzed samples | 665,325 | 152,891 | 2,664 | 4,219 | 367,727 | 2,459 | 135,365 |
| Ratio of samples with variants in any of the investigated TRs (%) | 96.24 | 99.33 | 98.46 | 98.91 | 99.26 | 99.35 | 84.39 |
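As a quick consistency check on Table 2, the overall ratio can be recomputed as the sample-weighted mean of the per-lineage ratios. The sketch below uses only the numbers printed in the table; the function name `pooled_ratio` is ours and is not part of the original analysis. Because the per-lineage percentages are rounded to two decimals, the recomputed value is expected to agree with the reported 96.24% only up to rounding.

```python
# Per-lineage sample counts and affected ratios (%), copied from Table 2.
counts = {
    "Alpha": 152_891, "Beta": 2_664, "Gamma": 4_219,
    "Delta": 367_727, "Omicron": 2_459, "Other variants": 135_365,
}
affected_pct = {
    "Alpha": 99.33, "Beta": 98.46, "Gamma": 98.91,
    "Delta": 99.26, "Omicron": 99.35, "Other variants": 84.39,
}


def pooled_ratio(counts, pct):
    """Return the sample-weighted mean of per-group percentages."""
    total = sum(counts.values())
    return sum(counts[group] * pct[group] for group in counts) / total


total_samples = sum(counts.values())
overall_pct = pooled_ratio(counts, affected_pct)
print(f"{total_samples} samples, pooled ratio {overall_pct:.2f}%")
```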
We found reliable genetic variations in 1,826 of all 2,188 genomic positions overlapping the 141 primer or probe binding sites (TRs) in the investigated SARS-CoV-2 samples. In many cases, different primer sets target the same sections of the genome. For example, primer systems designed for the E gene of the genome necessarily share some of their TRs due to the short length of the gene (Figure 1a-b). The E gene also has a low estimated mutation rate (Figure 1c), in line with basic intuition that primer systems are best designed to target relatively conserved regions of the genome.
Most of the mutations affecting the TRs were point mutations, with slightly more transitions (1,677) than transversions (1,510), while distinct deletions (79) and insertions (23) were far less numerous.
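For readers less familiar with the terminology, a point mutation is a transition when it exchanges two purines (A, G) or two pyrimidines (C, T), and a transversion otherwise. The following minimal sketch applies this standard definition; the function name `classify_substitution` is hypothetical and does not come from the analysis pipeline. The example calls use SNPs listed in Table 3.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # Transition: both bases are purines, or both are pyrimidines.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


# SNPs from Table 3, written as (reference base, alternate base).
g15451a = classify_substitution("G", "A")
c21618g = classify_substitution("C", "G")
c28977t = classify_substitution("C", "T")

# Ratio of distinct transitions to transversions reported in the text.
ts_tv_ratio = 1677 / 1510
print(g15451a, c21618g, c28977t, round(ts_tv_ratio, 2))
```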
The ratio of samples with any variants in the TR of a given primer system (any of its forward primer/probe/reverse primer regions) was calculated (Figure 2, bottom panel). We found that even for the primer system targeting the seemingly most conserved genomic regions (Mollaei-ORF1ab), 1,062 samples contained at least one mutation in the TRs. On the other hand, the ratio of samples affected by at least one variant is below 10% for 43 of the 53 investigated primer systems. In the TRs of the remaining 10 primer systems, a considerable fraction of the samples had at least one variant. Almost 80% of the samples contained a mutation in the TRs of primer system Niu-N, and about 55% of samples had a variant in the TRs of primer systems Niu-RdRp, Won-RdRp-1, Corman-RdRp, Tombuloglu-RdRp, and Sarkar-S. Furthermore, around 10-20% of samples were mutated in the TRs of primer systems Davi-S-2, Davi-S-1, Young-S and Davi-ORF1a-4.
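The per-system ratio described above can be sketched as an interval-overlap count. This is an illustration only: the data layout (a mapping from primer system to inclusive TR coordinate intervals, and from sample identifier to a set of variant positions), the function name `affected_ratio`, and all coordinates and sample names below are hypothetical and do not reflect the actual pipeline or real primer positions.

```python
def affected_ratio(target_regions, sample_variants):
    """Fraction of samples with at least one variant inside any TR of each system.

    target_regions: {system name: [(start, end), ...]}, inclusive coordinates.
    sample_variants: {sample id: set of variant positions}.
    """
    n_samples = len(sample_variants)
    ratios = {}
    for system, regions in target_regions.items():
        n_hit = sum(
            any(start <= pos <= end for pos in positions for start, end in regions)
            for positions in sample_variants.values()
        )
        ratios[system] = n_hit / n_samples if n_samples else 0.0
    return ratios


# Toy input: two made-up primer systems and four made-up samples.
toy_regions = {
    "sysA": [(100, 120), (150, 175), (200, 220)],  # forward, probe, reverse
    "sysB": [(500, 520)],
}
toy_samples = {
    "s1": {110},
    "s2": {130, 510},
    "s3": set(),
    "s4": {175},
}
toy_ratios = affected_ratio(toy_regions, toy_samples)
print(toy_ratios)
```

A production implementation over hundreds of thousands of samples would more likely index TR positions once (for example in a set or an interval tree) rather than scan every interval for every variant.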
Different variant strains show highly diverse mutational patterns in various primer systems (Figure 2, top six panels). While many samples tend to have a mutated TR in the Niu-N primer system independent of their lineage, the TRs of many primer systems are almost exclusively mutated in samples of a specific variant (e.g. Davi-S-1 and Davi-S-2 systems are mainly affected in Alpha samples, the TRs of the Young-S system are usually mutated in Alpha and Beta samples, etc.).
This result suggests that the performance of a given primer set largely depends on the specific genomic characteristics of the presently circulating most dominant lineages. Thus, PCR efficacy should be dynamically reevaluated throughout the course of the pandemic.
Possible effects of mutations on PCR amplification
We calculated the ratio of samples with a single, two, and three or more genomic variations in the TRs of a given primer system. On average, samples have 1.16 mutations in the TRs of the investigated primer systems. As shown in Figure 3, most of the affected samples have only a single variant position over the TRs (Figure 3a, green bars). Every primer system nevertheless has a few samples with two or more variations in its TRs (Figure 3a, yellow and red bars), although their number is generally below 1,000, accounting for less than 0.5% of all samples. Remarkably, more than 150,000 samples (about 23% of all samples) contained at least three mutations in the TRs of the Niu-N primer system, with one of the samples presenting seven variant positions.
As a next step, we examined the type of the detected variants and their location in the TRs of different primer systems and categorized them as either “high risk” or “moderate risk” mutations (see Methods for details). Depending on the primer set, between 0.012% and 55.25% of the samples contain high risk mutations. The distribution of samples with variants belonging to different risk categories is presented in Figure 3b. Most of the samples that had any mutations in the TRs of a given primer system contained only variants with no drastic effect on PCR efficiency based on their location. For example, the highly mutation-prone TRs of the Niu-N primer system usually contain variants at moderate risk positions, which are unlikely to disrupt the PCR process. In contrast, the TRs of two primer systems (Niu-RdRp and Corman-RdRp) are mutated in high risk positions in many samples, comprising around 55% of the total samples analyzed.
Based on our results, the most common high and moderate risk mutations that were identifiable in the majority of samples are listed in Table 3.
Table 3. Summary of the most frequent mutations in the TRs of investigated PCR primer systems.
| Primer | Mutation | Ratio of mutated samples in the CoVEO database | Ratio of mutated samples by WHO designation (*) |
| --- | --- | --- | --- |
| Corman-RdRp-FH, Niu-RdRp-FH, Tombuloglu-RdRp-FM, Won-RdRp-1-FM | SNP: G15451A | 54.92% | Delta (90.32%), Other variant (24.53%), Beta (<1%), Gamma (<1%), Alpha (<1%) |
| Sarkar-S-FM | SNP: C21618G | 53.15% | Delta (89.14%), Other variant (19.07%) |
| Niu-N-FM | SNP: G28881T | 44.50% | Delta (77.19%), Other variant (9.02%), Alpha (<1%) |
| Davi-S-1-PM, Davi-S-2-PM | SNP: C23271A | 23.09% | Alpha (95.97%), Other variant (5.11%), Delta (<1%) |
| Niu-N-RM | SNP: C28977T | 18.87% | Alpha (80.27%), Other variant (2.09%), Delta (<1%) |
| Young-S-FH | Deletion: ATACATG21764A | 17.09% | Alpha (71.53%), Other variant (3.16%), Beta (<1%), Delta (<1%) |
| Niu-N-FH | “AAC”-triplet: G28881A, G28882A, G28883C | 23.35% | Gamma (87.34%), Alpha (85.44%), Omicron (85.16%), Other variant (13.98%), Beta (<1%), Delta (<1%) |
Primer names are based on the nomenclature: [first author last name]-[target gene name]-[id, when multiple primer systems target the same gene]-[type of oligo: forward (F), reverse (R) or probe (P)]. “M” marks the primers where the variant was defined as a moderate risk mutation; “H” marks the primers where the variant was defined as a high risk mutation. Mutation names are based on the nomenclature: [reference base][genomic position of the start of the variant][alternate non-reference base]. Asterisk: ratio of samples which contain the mutation in a given WHO designation. Lineages with no mutated samples are not listed. Abbreviations: SNP - Single-Nucleotide Polymorphism.
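The mutation nomenclature in the footnote of Table 3 ([reference base(s)][start position][alternate base(s)]) can be parsed mechanically. The sketch below is a minimal illustration under that stated format; the `Mutation` type and the `parse_mutation` function are hypothetical names, and the length-based labelling of deletions and insertions is our assumption about how the reference and alternate alleles are written, not a description of the original pipeline.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    ref: str       # reference allele
    position: int  # genomic position of the start of the variant
    alt: str       # alternate allele
    kind: str      # label inferred from allele lengths (assumption)


_PATTERN = re.compile(r"^([ACGT]+)(\d+)([ACGT]+)$")


def parse_mutation(name: str) -> Mutation:
    """Parse a name such as 'G15451A' or 'ATACATG21764A'."""
    match = _PATTERN.match(name.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized mutation name: {name!r}")
    ref, position, alt = match.group(1), int(match.group(2)), match.group(3)
    if len(ref) == 1 and len(alt) == 1:
        kind = "SNP"
    elif len(ref) > len(alt):
        kind = "deletion"
    elif len(ref) < len(alt):
        kind = "insertion"
    else:
        kind = "multi-nucleotide substitution"
    return Mutation(ref, position, alt, kind)


snp = parse_mutation("G15451A")
deletion = parse_mutation("ATACATG21764A")
print(snp)
print(deletion)
```

Under this reading, the Young-S-FH entry `ATACATG21764A` starts at position 21764 and replaces seven reference bases with a single base, i.e. a six-base deletion.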
Potential false-negative results due to misclassification
Since diagnostic COVID-19 tests generally aim to amplify several gene regions simultaneously, thus employing primer sets of multiple primer systems, we investigated whether there are samples with damaged TRs (see Methods for definition) in multiple primer systems of specific primer sets. We differentiated between samples having a “slight chance of misclassification” and samples “susceptible to misclassification” with a primer set based on the number and ratio of damaged TRs in the primer systems of the given set. Samples with no damaged TRs in the set and sufficient sequencing depth for all of them were regarded as having “no reasonable chance of misclassification” (see Methods for details).
A relatively large number of samples had a slight chance of misclassification with the Niu-, Corman- or Young-sets, with only 10.03%, 29.48% and 37.12% of them, respectively, showing evidence of no damaged TRs at all (Figure 4).
Nevertheless, there is only a negligible number of samples (with a maximum ratio of 0.51% for the Chu-set) susceptible to misclassification with any of the investigated primer sets, and in most cases, only very few TRs of a primer set are damaged simultaneously in each sample. Based on these observations, for most primer sets, a dominant part (49.47-96.83%) of the investigated samples could be reliably detected as positive ones if partially inconclusive results are not rejected automatically by the test protocol (i.e., if a primer set consists of three primer systems, and among them, one is damaged, the result of the PCR is not automatically considered as negative).
An important additional insight is that the ratio of ambiguous samples (not shown in Figure 4), i.e. samples without satisfactory coverage across all TRs for a definite categorization, varies greatly between primer sets. This is partly explained by the fact that the number of primer systems employed by a given set is also highly variable, and statistically there is a smaller chance of obtaining a sample with high enough coverage in all TR positions for 9 primer systems (e.g. for the Won-set 50.53% of all samples were ambiguous) than for a single one (e.g. for the DMSC-set the same ratio was 2.99%). Some primer sets, however, are notable exceptions to this trend. For example, the Davi-set, also containing 9 primer systems, had inconclusive results for only 27.35% of the samples. Conversely, for the Young-set with only 3 primer systems, 43.6% of the samples were ambiguous.
It is also worth noting that primer sets with an overall low proportion of samples susceptible to misclassification can have an increased chance of failure in cohorts of samples belonging to a specific variant. For example, the IP-set showed a reassuring 0.49% for the proportion of samples susceptible to misclassification across all sample groups, but for Omicron samples in particular this ratio increased to 6.75%.
These results suggest that to truly minimize the number of samples susceptible to misclassification, it can be beneficial to simultaneously use three or more primer systems within a single PCR test. This way, even with a damaged TR, more than 50% of the employed primer systems would yield a positive test result. Notably, primer sets with at least 5 primer systems (Won-set, Davi-set, Mollaei-set) were extremely unlikely to misclassify samples due to mutations present in the TRs (see the light red columns in Figure 4, bottom panel).
Additionally, given that primer sets perform differently across variant groups, it is important to continuously monitor the ratio of samples prone to misclassification to determine whether the given primer set is suitable for the detection of SARS-CoV-2 samples of the presently spreading lineage.
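The majority argument above (with three or more primer systems, a single damaged TR still leaves more than 50% of the systems intact) can be made explicit with a few lines of arithmetic. This sketch only illustrates that argument; it does not reproduce the classification criteria defined in Methods, and the function names are hypothetical.

```python
def intact_fraction(n_systems: int, n_damaged: int) -> float:
    """Fraction of primer systems in a set whose TRs are not damaged."""
    if n_systems <= 0 or not 0 <= n_damaged <= n_systems:
        raise ValueError("need n_systems > 0 and 0 <= n_damaged <= n_systems")
    return (n_systems - n_damaged) / n_systems


def majority_intact(n_systems: int, n_damaged: int) -> bool:
    """True if strictly more than half of the primer systems are intact."""
    return intact_fraction(n_systems, n_damaged) > 0.5


# Smallest set size for which one damaged system still leaves a strict majority.
smallest_robust_set = next(n for n in range(1, 10) if majority_intact(n, 1))

for n in (1, 2, 3, 5, 9):
    print(n, f"{intact_fraction(n, 1):.0%}", majority_intact(n, 1))
```

With a single damaged system, a set of two primer systems is left at exactly 50% intact, which is why three is the smallest set size satisfying the strict-majority condition.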
Ratio of samples having a slight chance of or being susceptible to misclassification over time
It is also important to monitor the relative occurrence of variants on the TRs of different primer systems over time to predict whether a primer set is in danger of becoming obsolete as new strains of the virus emerge. The ratio of samples having a slight chance of (Figure 5a) and being susceptible to (Figure 5b) misclassification was calculated over time using a 30-day rolling average method.
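A 30-day rolling average of a daily ratio can be sketched as follows. The exact windowing used for Figure 5 is not specified here, so this example makes two labeled assumptions: the window is trailing (the current day plus the 29 preceding days), and the average is taken over daily ratios rather than over pooled counts. The function name `rolling_mean_ratio` and the toy counts are hypothetical.

```python
from datetime import date, timedelta


def rolling_mean_ratio(daily_counts, window_days=30):
    """Trailing rolling mean of daily ratios.

    daily_counts: {date: (n_flagged, n_total)} per collection day.
    Days without samples are skipped (assumption).
    """
    daily_ratio = {
        day: flagged / total
        for day, (flagged, total) in daily_counts.items()
        if total > 0
    }
    smoothed = {}
    for day in sorted(daily_ratio):
        window_start = day - timedelta(days=window_days - 1)
        values = [r for d, r in daily_ratio.items() if window_start <= d <= day]
        smoothed[day] = sum(values) / len(values)
    return smoothed


# Toy input: made-up daily counts of flagged and total samples.
toy_counts = {
    date(2021, 1, 1): (1, 10),
    date(2021, 1, 2): (3, 10),
    date(2021, 2, 15): (5, 10),
}
toy_smoothed = rolling_mean_ratio(toy_counts)
for day, value in toy_smoothed.items():
    print(day.isoformat(), round(value, 3))
```

Averaging daily ratios gives each day equal weight regardless of how many samples were sequenced that day; pooling counts over the window would instead weight days by sample number, which may be preferable when daily sample sizes vary strongly.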
Most of the primer sets analyzed in this work (with the exceptions of the Davi-, Sarkar- and Tombuloglu-sets) were designed in 2020 at the beginning of the pandemic, with only a few SARS-CoV-2 genomes available, hence the mutational patterns of the more recent Alpha and Delta lineages were inaccessible at the time.
With the appearance of the Alpha variant in early 2021, the number of samples with at least one damaged TR of the Niu- and the Young-sets increased. Around June, with the emergence of the Delta variant, the mutations that damaged the TRs of the Young-set disappeared from the dominant portion of the samples, as Delta variants lack these mutations. At the same time, new mutations appeared in the TRs of the Corman-set, causing the ratio of samples having a slight chance of misclassification with this primer set to increase. This trend seems to have been reversing since the widespread arrival of Omicron samples, which also decreased the ratio of samples having a slight chance of misclassification with the Niu-set. However, the TRs of the Sarkar-set seem to be gaining damaging mutations in Omicron samples, and samples having a slight chance of misclassification with this primer set have therefore become more frequent since November 2021.
The ratio of samples susceptible to misclassification remained below 4% for the entire timeline for all investigated primer sets. It is interesting to observe, however, that the Tombuloglu-set, made public in March 2021, would have been significantly less efficient on samples sequenced before its publication than on samples sequenced after March 2021. This suggests that this primer set was optimized to detect strains emerging right around the time of its development, rendering it a state-of-the-art detection method of the time.
In the case of the Chu-set, it appears that the then dominant Alpha variant may have acquired increasingly frequent mutations that damaged the TRs of this primer set, resulting in an elevated ratio of samples susceptible to misclassification, but the arrival of the new VOC (Delta) decreased this ratio by spreading a different mutational pattern. On the other hand, the emergence of the Omicron variant seems to be negatively affecting the performance of the relatively vulnerable IP-set, which employs only two primer systems.
It is important to note that either the spread of a new variant or simply the emergence of a damaging mutation within the dominant strain might drastically increase the number of samples prone to misclassification for any given primer set. Thus, it is essential to continuously monitor genomic variations overlapping the TRs of primer sets used in routine diagnostics.
Comparison with the GISAID database
We compared our results with genomic variants found in SARS-CoV-2 samples from the GISAID database (www.gisaid.org) [33] collected in the same time period as our original sample set. A total of 6,287,362 GISAID samples were analyzed (number of samples classified by WHO lineage: Alpha: 901,778, Beta: 311,561, Gamma: 407,404, Delta: 3,949,899, Omicron: 418,792). We found genetic variants in all 2,188 genomic positions mapped to 141 primer or probe TRs in the investigated samples. In this analysis, we focused only on point mutations, as the number of insertions and deletions was difficult to determine with high confidence. We found that the ratio of GISAID samples containing mutations of any kind, moderate risk mutations or high risk mutations in the TRs was similar to that in the CoVEO database for all analyzed primer systems. The most frequent mutations overlapping the TRs in the CoVEO database are also present in the GISAID database with a similar frequency of affected samples (G15451A: 63.52%, C21618G: 63.94%, G28881T: 64.32%, C23271A: 17.51%, C28977T: 17.49%, “AAC”-triplet: 27.51%; for comparison see Table 3). When analyzing GISAID samples over time, we found that samples susceptible to misclassification were present at a daily rate of 2% or lower. On the other hand, the daily ratio of samples having a slight chance of misclassification with a certain primer set could reach almost 100%, similarly to our results on the CoVEO database.
The consistent results obtained across multiple databases suggest that the mutations observed in CoVEO samples overlapping the TRs are not sequencing artifacts or by-products of the bioinformatic analysis pipeline, but are in fact true genomic variants that occur frequently and possibly affect PCR test accuracy. Even though the results are in close agreement across data providers, it should be underlined that samples of the CoVEO database were processed with a single, standardized, publicly available workflow, while GISAID consensus sequences are generated individually by data uploaders. Moreover, the CoVEO database contains detailed information about genomic variants (sequencing depth, alternate allele frequency, alternate alleles by read orientation, etc.), which can be utilized to specifically filter variants based on different scientific research requirements.