A cohort of vaccinated and non-vaccinated samples from COVID-19 patients in Kenya.
A total of 601 SARS-CoV-2 samples collected between October 2021 and December 2022 were selected for analysis: 234 from non-vaccinated and 367 from vaccinated patients. All samples were SARS-CoV-2 positive by RT-PCR and had a recorded vaccination status (yes or no). All samples were obtained from residents of the Kenyan counties of Bungoma, Busia, Homabay, Kakamega, Kisii, Migori, Nyamira, Trans Nzoia, Vihiga, and West Pokot (Fig. 1A). The cohort comprised 347 females and 254 males aged between 20 and 50 years (Fig. 1B).
We mapped this cohort onto globally available sequences using UShER, which showed that the sequences belonged mainly to the Delta and Omicron SARS-CoV-2 variants (Fig. 1C), as expected from the timing of sample collection. We identified the most frequently occurring SNPs in this cohort and found that most were located in the S gene, ORF1a/b, and the N gene (Supplementary Data 1). The most frequent mutations in the S gene were D614G, the H69_V70 deletion, T95I, the G142_Y145 deletion, and T547. In ORF1a/b, T3255I, L4715L, the L3674_G3676 deletion, I3758V, and P3395H were the most frequent, while in the N gene, P13L, E31_S33del, R203K, and G204R were the most frequent.
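Ranking the most frequent SNPs amounts to counting, gene by gene, how many samples carry each mutation. The sketch below assumes each sample has already been reduced to a list of `(gene, mutation)` pairs; that input format and the function name `top_mutations_by_gene` are hypothetical, and the toy samples are illustrative rather than data from this cohort.

```python
from collections import Counter, defaultdict


def top_mutations_by_gene(samples, n=5):
    """Return the n most frequent mutations per gene.

    `samples` is a list of per-sample mutation lists; each mutation is a
    (gene, mutation) tuple such as ("S", "D614G"). A mutation is counted at
    most once per sample, so each count equals the number of carrier samples.
    """
    counts = defaultdict(Counter)
    for mutations in samples:
        for gene, mutation in set(mutations):  # de-duplicate within a sample
            counts[gene][mutation] += 1
    return {gene: [m for m, _ in c.most_common(n)] for gene, c in counts.items()}


# Toy input (hypothetical): three samples.
samples = [
    [("S", "D614G"), ("S", "T95I"), ("N", "R203K")],
    [("S", "D614G"), ("N", "R203K"), ("N", "G204R")],
    [("S", "D614G"), ("ORF1ab", "P3395H")],
]
print(top_mutations_by_gene(samples, n=2))
```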
Analysis of recombination events in vaccinated and non-vaccinated patients.
With increased genomic surveillance, SARS-CoV-2 recombination events of interest have been reported globally, highlighting recombination as a key factor in virus evolution 16,20,22. We evaluated recombination events to characterize SARS-CoV-2 genetic evolution in vaccinated and non-vaccinated patients. We used ViReMa (Viral Recombination Mapper), which identifies intrahost recombination events, including deletions, insertions, duplications, copy-back and snap-back events, and viral-host chimeric events, as described previously 18,36,37.
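Recombination mappers describe each event as a junction between a donor and an acceptor coordinate. As a rough illustration of how such junctions relate to the event types listed above, the sketch below classifies a viral-viral junction by the relative positions of its coordinates. The rules are a simplified convention assumed for illustration only, not ViReMa's actual logic or output format; the function name `classify_junction` is hypothetical, and viral-host chimeras are not modelled.

```python
def classify_junction(donor: int, acceptor: int, inserted: str = "",
                      same_strand: bool = True) -> str:
    """Classify a viral-viral junction from 1-based donor/acceptor positions.

    Simplified rules (an assumption, not ViReMa's exact logic):
      - strand switch                            -> copy-back/snap-back
      - acceptor more than 1 nt downstream       -> deletion
      - acceptor directly adjacent, extra bases  -> insertion
      - acceptor at or upstream of the donor     -> duplication
    """
    if not same_strand:
        return "copy-back/snap-back"
    if acceptor > donor + 1:
        return "deletion"  # bases donor+1 .. acceptor-1 are skipped
    if acceptor == donor + 1:
        # Contiguous alignment: an insertion only if unmapped bases sit between.
        return "insertion" if inserted else "no event"
    return "duplication"  # the read jumps back and re-copies upstream sequence


for junction in [(2883, 2902), (28254, 28247), (100, 101)]:
    print(junction, classify_junction(*junction))
```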
An evaluation of SARS-CoV-2 transmission waves reveals differential recombination patterns.
We next assessed the possibility of inter-variant recombination. Following the pattern of SARS-CoV-2 transmission waves in Kenya 41, we grouped samples into two categories (Table 1): samples collected at the peak of transmission of a particular variant (lineage), and samples collected in the transition period between two successive variant waves (interwave) (Fig. 3A). During the initial B.1 transmission wave, we detected an average of 688 deletion and 668 duplication events per patient, followed by a marked increase in the Beta wave, with 2645 deletion and 2676 duplication events. At the peak of the Alpha wave, we detected an average of 957 deletion and 1050 duplication events, and at the peak of the Omicron wave, 276 deletion and 247 duplication events (Fig. 3A; Table 1).
Interwave 2 (between Beta and Alpha) samples showed an average of 3877 deletion and 3904 duplication events per sample, interwave 3 (between Alpha and Delta) 4158 deletion and 4184 duplication events, and interwave 4 (between Delta and Omicron) 8945 deletion and 8674 duplication events (Table 1). The Delta peak had the highest number of recombination events per sample (13629 deletion and 13700 duplication events); apart from this, interwave periods 2 to 4 had more events per sample than the Beta, Alpha, and Omicron peaks. This observation suggests possible inter-variant recombination arising from mixed-variant infections.
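The per-period summary in Table 1 reduces to grouping samples by an assigned peak or interwave label and averaging their event counts. The sketch below does that (the input format and the function names are hypothetical, and period labels are assumed to be assigned beforehand from collection dates), then uses the Table 1 deletion averages to check, for each interwave period, whether it exceeds the peak immediately before and after it in the table's column order.

```python
from collections import defaultdict
from statistics import mean


def average_events(samples):
    """Mean deletion and duplication counts per sample for each period.

    `samples` is an iterable of (period_label, n_deletions, n_duplications).
    """
    grouped = defaultdict(list)
    for period, n_del, n_dup in samples:
        grouped[period].append((n_del, n_dup))
    return {
        period: (mean(d for d, _ in rows), mean(u for _, u in rows))
        for period, rows in grouped.items()
    }


def compare_with_flanking_peaks(order, averages):
    """For each interwave label, is its average above the preceding and the
    following peak? Returns {label: (above_previous, above_next)}."""
    result = {}
    for i, label in enumerate(order):
        if label.startswith("IW") and 0 < i < len(order) - 1:
            value = averages[label]
            result[label] = (value > averages[order[i - 1]],
                             value > averages[order[i + 1]])
    return result


# Average deletions per patient, in the column order of Table 1.
ORDER = ["B.1", "IW1", "Beta", "IW2", "Alpha", "IW3", "Delta", "IW4", "Omicron"]
DELETIONS = {"B.1": 688, "IW1": 364, "Beta": 2645, "IW2": 3877, "Alpha": 957,
             "IW3": 4158, "Delta": 13629, "IW4": 8945, "Omicron": 276}
print(compare_with_flanking_peaks(ORDER, DELETIONS))
```

With the Table 1 deletion averages, IW2 exceeds both flanking peaks, IW3 and IW4 each exceed one neighbouring peak, and IW1 exceeds neither.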
Next, we assessed recombination hotspots in the SARS-CoV-2 genome within and between transmission waves, identifying the location and frequency of recombination events. In the early periods, such as the B.1 wave and interwave 1, recombination events occurred at high frequency at multiple locations across the genome (Fig. 3B). In contrast, the most recent variant (Omicron) had high-frequency recombination hotspots mainly in the ORF1a/b, S, and N genes (Fig. 3B). We also noted more recombination hotspots in most interwave periods than at the transmission wave peaks. These observations suggest that, over time, recombination events in the ORF1a/b, S, and N genes may have been subject to natural selection, and that the number and frequency of recombination hotspots increased during mixed infection (interwave periods). These data provide insight into recombination activity within and between variant transmission waves.
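One simple way to locate hotspots is to count junctions in fixed-size windows along the genome and report the busiest windows. The sketch below assumes 1-based junction coordinates, bins each junction by its donor position, and uses a 100-nt window; the window size, the binning choice, and the function name `recombination_hotspots` are illustrative assumptions, not the procedure used for Fig. 3B.

```python
from collections import Counter


def recombination_hotspots(junctions, window=100, top=3):
    """Count junctions per fixed-size genome window; return the busiest ones.

    `junctions` are (donor, acceptor) pairs with 1-based coordinates; each
    junction is binned by its donor position. Returns a list of
    ((window_start, window_end), count) sorted by decreasing count.
    """
    bins = Counter((donor - 1) // window * window for donor, _ in junctions)
    return [((start + 1, start + window), n) for start, n in bins.most_common(top)]


# Toy junctions (hypothetical counts).
junctions = [(2883, 2902), (2890, 2902), (11286, 11296), (21986, 21996)]
print(recombination_hotspots(junctions, top=1))
```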
Table 1
Average deletion, duplication, and insertion events per patient for each SARS-CoV-2 variant wave and interwave period in Kenya.
Average events per patient | B.1 N = 97 | Interwave 1 (IW1) N = 31 | Beta N = 407 | Interwave 2 (IW2) N = 142 | Alpha N = 306 | Interwave 3 (IW3) N = 38 | Delta N = 442 | Interwave 4 (IW4) N = 19 | Omicron N = 574 |
Deletion | 688 | 364 | 2645 | 3877 | 957 | 4158 | 13629 | 8945 | 276 |
Duplication | 668 | 441 | 2676 | 3904 | 1050 | 4184 | 13700 | 8674 | 247 |
Insertions | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 3 | 1 |
Analysis of SNPs between non-vaccinated and vaccinated patients reveals low-frequency unique non-synonymous mutations.
Recombination analysis identified the genome positions of the four most common deletion events as 2883–2902 and 11286–11296 in ORF1a/b, 21986–21996 in the S gene, and 28362–28372 in the N gene. We analyzed the SNPs occurring in these recombination ‘hotspots’ to gain more insight into the genetic evolution of these regions. Of all SNPs within the recombination hotspots, 66% in non-vaccinated and 69% in vaccinated patients were non-synonymous, whereas the remainder (≤ 34%) were synonymous.
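Whether an SNP is synonymous depends only on whether the altered codon still encodes the same amino acid under the standard genetic code. The sketch below classifies single-base changes within a known reference codon and computes the non-synonymous share; the `(codon, offset, alt)` input format and the function names are hypothetical, and multiple changes within one codon are not handled.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_snv(codon: str, offset: int, alt: str) -> str:
    """Classify a single-base change within one reference codon.

    `offset` is the 0-based position of the change inside the codon and
    `alt` the new base. Changes to or from a stop codon count as
    non-synonymous.
    """
    codon = codon.upper().replace("U", "T")
    mutant = codon[:offset] + alt.upper().replace("U", "T") + codon[offset + 1:]
    same = CODON_TABLE[codon] == CODON_TABLE[mutant]
    return "synonymous" if same else "non-synonymous"


def nonsynonymous_fraction(changes):
    """Share of (codon, offset, alt) changes that alter the amino acid."""
    calls = [classify_snv(*change) for change in changes]
    return calls.count("non-synonymous") / len(calls)


changes = [("GAT", 1, "G"),   # GAT (Asp) -> GGT (Gly), as in D614G
           ("CTA", 2, "G"),   # CTA (Leu) -> CTG (Leu)
           ("ACT", 1, "T")]   # ACT (Thr) -> ATT (Ile)
print(nonsynonymous_fraction(changes))
```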
We also identified overlapping and unique SNPs by vaccination status. As shown in the Venn diagrams, in the S gene, 27% of all SNPs from vaccinated patients and 45% from non-vaccinated patients were unique to their vaccination status (Fig. 4A). In ORF1a/b, 32% of SNPs in vaccinated patients and 57% in non-vaccinated patients were unique to their vaccination status (Fig. 4B), and in the N gene, 42% of SNPs in vaccinated patients and 32% in non-vaccinated patients were unique to their vaccination status (Fig. 4C).
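The percentages read off a Venn diagram are set differences divided by each group's own total. The minimal sketch below uses hypothetical toy SNP sets and a hypothetical function name `unique_fraction`.

```python
def unique_fraction(group, other):
    """Fraction of the SNPs seen in `group` that never occur in `other`."""
    group, other = set(group), set(other)
    return len(group - other) / len(group) if group else 0.0


# Toy SNP sets (hypothetical): two SNPs shared between the groups.
vaccinated = {"D614G", "T95I", "K417Q", "A27V"}
non_vaccinated = {"D614G", "T95I", "S255F"}
print(unique_fraction(vaccinated, non_vaccinated))   # 2 of 4 unique
print(unique_fraction(non_vaccinated, vaccinated))   # 1 of 3 unique
```

Because each group is divided by its own number of SNPs, the same shared set yields different unique percentages for vaccinated and non-vaccinated patients, and the two values for a gene need not sum to 100%.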
Further, we mapped all unique non-synonymous SNPs in the S, ORF1a/b, and N genes to determine their distribution across the functional domains of each gene product. As shown in the schematic representations of the gene products (Fig. 4A–C), mutations in the S gene were distributed across the entire protein, covering all domains (Fig. 4A). In ORF1a/b, by contrast, all unique mutations were concentrated within the N-terminal region of the gene product, between nsp1 and nsp3 (Fig. 4B). This finding is consistent with previous reports showing that nsp3 carries the largest number of non-synonymous mutations in the ORF1a/b region of SARS-CoV-2, and that mutations in this region affect binding of the papain-like protease inhibitors GRL-0617 and S43 42,43. As in the S gene, unique mutations in the N gene were distributed across the entire gene product (Fig. 4C).
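Assigning an ORF1a/b residue to its mature non-structural protein is an interval lookup on the polyprotein cleavage boundaries. The sketch below assumes pp1ab residue numbering and boundaries taken from the Wuhan-Hu-1 (NC_045512.2) annotation; these boundaries should be checked against the reference actually used, and the function name `nsp_for_position` is hypothetical.

```python
from bisect import bisect_right

# First residue of each mature protein in the ORF1ab polyprotein (pp1ab),
# assumed from the NC_045512.2 annotation. nsp11 is omitted because in pp1ab
# its residues coincide with the start of nsp12.
NSP_STARTS = [1, 181, 819, 2764, 3264, 3570, 3860, 3943, 4141, 4254,
              4393, 5325, 5926, 6453, 6799]
NSP_NAMES = ["nsp1", "nsp2", "nsp3", "nsp4", "nsp5", "nsp6", "nsp7", "nsp8",
             "nsp9", "nsp10", "nsp12", "nsp13", "nsp14", "nsp15", "nsp16"]
PP1AB_LENGTH = 7096


def nsp_for_position(pos: int) -> tuple[str, int]:
    """Map a pp1ab residue number to (nsp name, residue number within it)."""
    if not 1 <= pos <= PP1AB_LENGTH:
        raise ValueError(f"position {pos} is outside pp1ab (1-{PP1AB_LENGTH})")
    i = bisect_right(NSP_STARTS, pos) - 1
    return NSP_NAMES[i], pos - NSP_STARTS[i] + 1


for pos in (150, 1763, 3255, 3395):
    print(pos, nsp_for_position(pos))
```

Under this annotation, for example, T3255I and P3395H from the list of frequent ORF1a/b mutations fall at residue 492 of nsp4 and residue 132 of nsp5, respectively.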
In-depth analysis of unique non-synonymous SNPs in the ORF1a/b, S, and N genes.
Focusing only on the unique non-synonymous SNPs in the S, ORF1a/b, and N genes of this cohort, because they are the most likely to alter protein function, we asked whether these SNPs had been reported previously. Our analysis of ORF1a/b identified new, albeit low-frequency, mutations that have not been reported elsewhere. In non-vaccinated patients, these included G150C, P371S, H533Y, E743D, K1763R, S1856P, I3476V, L3919R, T4355I, L4460F, T4847I, N4969S, S5529F, L5624F, L6519R, M6580K, Q6843L, and A7014V (Supplementary Fig. 3A). Of particular interest was I3476V, which appeared in 11 non-vaccinated patients in Nyamira county, Kenya, and was not found in any vaccinated patient. In vaccinated patients, the new, previously unreported mutations included V214A, I281T, A702T, H1141N, V1291F, K1202N, K2741E, A3615V, V3708L, T5355M, L6174S, and S6537F (Supplementary Fig. 3A). The most frequent of these were V214A (n = 6), A702T (n = 4), A3615V (n = 4), and V3708L (n = 4) (Fig. 4A).
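Checking whether a mutation is new amounts to parsing the substitution label and comparing it against a catalogue of previously reported mutations. The sketch below handles simple substitutions only (labels such as R203fs or M210_A211_delins are rejected), drops synonymous labels, and compares against a hypothetical `reported` set; the catalogue contents and the function names are assumptions for illustration.

```python
import re

# Simple substitutions only, e.g. "G150C" or "L4715L"; deletions, frameshifts
# ("fs") and delins labels do not match and are rejected.
SUBSTITUTION = re.compile(r"([A-Z*])(\d+)([A-Z*])")


def parse_substitution(label: str) -> tuple[str, int, str]:
    """Split 'G150C' into ('G', 150, 'C')."""
    match = SUBSTITUTION.fullmatch(label)
    if match is None:
        raise ValueError(f"not a simple substitution: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def novel_nonsynonymous(observed, reported):
    """Observed non-synonymous substitutions absent from `reported`,
    sorted by position."""
    novel = []
    for label in set(observed) - set(reported):
        ref, pos, alt = parse_substitution(label)
        if ref != alt:  # drop synonymous labels such as L4715L
            novel.append((pos, label))
    return [label for _, label in sorted(novel)]


reported = {"T3255I", "P3395H"}  # hypothetical catalogue of known mutations
observed = ["G150C", "P371S", "L4715L", "T3255I", "G150C"]
print(novel_nonsynonymous(observed, reported))
```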
Unique mutations were also identified in the spike protein. In non-vaccinated samples, the mutations T95L, I197T, Y200F, L229F, C432R, F429P, T732I, L858P, A958S, V1096A, I1198V, G1219C, and C1243G are new and have not been reported before (Supplementary Fig. 3B). In vaccinated patients, we identified new unique mutations such as V6I, A27V, T33I, G72E, T95V, R214S, A260D, F318I, R326N, P330S, S371I, K417Q, G431A, L552P, F565L, V622A, Q628K, T638A, V642G, I670V, M740I, P812Q, R847T, L959S, K1038E, F1062L, L1063P, Y1067H, K1086E, V1129A, C1243F, and G1246V (Supplementary Fig. 3B). Notably, K417Q lies at position 417, close to the interface between the spike protein and the host ACE2 receptor. Several studies have reported a lysine-to-threonine substitution (K417T) at this position; in our samples, however, lysine is replaced by glutamine (K417Q).
In the N gene, we found the following previously unreported mutations in non-vaccinated patients: A35V, L45S, D63Y, A173V, A182S, Q228H, M322V, S327L, T329A, K361E, and T362K (Supplementary Fig. 3C). Mutations D3G, T24N, Q28R, R36Q, R40H, Q70H, Y86H, I94T, A152V, A155S, R203fs, G204Q, S206fs, M210_A211_delins, A211S, G212fs, G214fs, Q289H, P344L, and K370N were unique to vaccinated patients (Supplementary Fig. 3C).
Evaluation of a minority variant with linked co-mutations and recombination events.
We sought to determine whether the SNPs unique to each vaccination group in the S, N, and ORF1a/b genes occurred in the same patients and whether they correlated with recombination events. We evaluated the mutations by patient location, number of patients, and frequency of recurrence (Table 2). In the S gene of the non-vaccinated group, samples from Bungoma, Kakamega, Kisii, Migori, and Nyamira counties carried mutations unique to non-vaccinated patients, but recurring mutations were found only in Migori and Nyamira (Table 2). The most frequent was S255F, found in 5 of 8 patients in Nyamira county; G1219C was found in 2 patients in Migori county. In ORF1a/b, unique mutations were found in Bungoma, Kakamega, Migori, and Nyamira counties. The most frequent was I3476V, found in 10 of 15 patients in Nyamira. Other frequent mutations in Nyamira samples were N4969S (5/15), S5229F (5/15), and P1640L (4/15) (Table 2).
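Table 2 reports, for each county, the group-unique mutations carried by at least two patients, expressed as carriers over patients at that location. The sketch below reproduces that tabulation from per-patient mutation sets; the input format, the threshold parameter `min_count`, and the function name are hypothetical, and the toy patients are illustrative only.

```python
from collections import Counter, defaultdict


def recurring_mutations(patients, min_count=2):
    """Per location, mutations carried by at least `min_count` patients.

    `patients` is an iterable of (location, mutations) pairs, where
    `mutations` holds one patient's group-unique mutations for one gene.
    Frequencies are reported as "n/N", N being patients at that location.
    """
    by_location = defaultdict(list)
    for location, mutations in patients:
        by_location[location].append(set(mutations))
    table = {}
    for location, mutation_sets in by_location.items():
        counts = Counter(m for s in mutation_sets for m in s)
        total = len(mutation_sets)
        table[location] = {m: f"{n}/{total}"
                           for m, n in counts.items() if n >= min_count}
    return table


# Toy input (hypothetical), shaped like the S gene rows of Table 2.
patients = [("MIGORI", {"G1219C"}), ("MIGORI", {"G1219C", "T95L"}),
            ("KISII", {"I197T"})]
print(recurring_mutations(patients))
```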
Table 2
Recurring unique mutations in non-vaccinated and vaccinated patients in Kenya, by gene and county.
NON-VACCINATED |
SARS-CoV-2 Gene | Location | # of patients | Recurring mutations | Frequency |
S GENE | BUNGOMA | 23 | NONE | 0 |
| KAKAMEGA | | NONE | 0 |
| KISII | 1 | NONE | 0 |
| MIGORI | 2 | G1219C | 2/2 |
| NYAMIRA | 8 | S255F | 5/8 |
ORF1ab | BUNGOMA | 4 | NONE | 0 |
| KAKAMEGA | 16 | K1763R | 2/16 |
| | | E743D | 2/16 |
| | | H4533Y | 2/16 |
| | | S1856P | 3/16 |
| | | L3919R | 3/16 |
| MIGORI | 7 | L5624F | 2/7 |
| | | M6580K | 2/7 |
| NYAMIRA | 15 | I3476V | 10/15 |
| | | P1640L | 4/15 |
| | | A3209V | 2/15 |
| | | A7014V | 2/15 |
| | | G150C | 3/15 |
| | | N4969S | 5/15 |
| | | S5229F | 5/15 |
| | | T4355I | 2/15 |
VACCINATED |
SARS-CoV-2 Gene | Location | # of patients | Recurring mutations | Frequency |
S GENE | BUNGOMA | 12 | NONE | 0 |
| KAKAMEGA | 4 | L212C | 2/4 |
| | | R214S | 2/4 |
| MIGORI | 14 | R346N | 2/14 |
| BUSIA | 1 | NONE | 0 |
ORF1ab | BUNGOMA | 3 | I4308T | 2/3 |
| KAKAMEGA | 9 | K2741E | 2/9 |
| | | T350I | 2/9 |
| | | K1202N | 2/9 |
| | | L6174S | 3/9 |
| MIGORI | 19 | A3615V | 4/19 |
| | | V2149A | 6/19 |
| | | V3708L | 4/19 |
| | | V702T | 4/19 |
| BUSIA | 2 | H1141N | 2/2 |
Mutations unique to the vaccinated group were also identified in patients from Bungoma, Kakamega, Migori, and Busia (Table 2). In the S gene, L212C (2/4) and R214S (2/4) were the most frequent mutations in Kakamega. In ORF1a/b, V2149A (6/19) was the most frequent mutation in Migori, followed by A3615V (4/19), V3708L (4/19), and V702T (4/19).
We next determined whether these low-frequency unique SNPs in the S gene and ORF1a/b co-occurred in the same patients. Interestingly, in the non-vaccinated cohort, five samples from Nyamira shared the same set of mutations in the S gene and ORF1a/b that were unique to non-vaccinated patients (Fig. 5A). All five patients carried S255F in the S gene and I3476V, N4969S, and S5339F in ORF1a/b. S255F is an important S gene mutation with previously reported immune escape properties 1,4, whereas I3476V, N4969S, and S5339F are all unique to non-vaccinated patients and have not been reported previously (Fig. 5A). This finding suggests possible spread of a minority variant within a localized population, with potential immune-evasion capability conferred by S255F.
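Finding patients who share the same set of group-unique mutations across genes can be done by using each patient's per-gene mutation sets as a grouping key. The sketch below requires an exact match in every listed gene; the input format, the gene labels, and the function name `shared_profiles` are hypothetical, and the toy patients are illustrative only.

```python
from collections import defaultdict


def shared_profiles(patients, genes=("S", "ORF1ab"), min_patients=2):
    """Group patients whose unique mutations are identical in every gene listed.

    `patients` maps a patient ID to {gene: set of group-unique mutations}.
    Patients lacking unique mutations in any listed gene are skipped.
    Returns {profile: sorted patient IDs} for profiles shared by at least
    `min_patients` patients, where a profile is one frozenset per gene.
    """
    groups = defaultdict(list)
    for patient_id, by_gene in patients.items():
        if all(by_gene.get(gene) for gene in genes):
            profile = tuple(frozenset(by_gene[gene]) for gene in genes)
            groups[profile].append(patient_id)
    return {p: sorted(ids) for p, ids in groups.items() if len(ids) >= min_patients}


# Toy patients (hypothetical IDs and mutation sets).
patients = {
    "P1": {"S": {"S255F"}, "ORF1ab": {"I3476V", "N4969S"}},
    "P2": {"S": {"S255F"}, "ORF1ab": {"N4969S", "I3476V"}},
    "P3": {"S": {"S255F"}, "ORF1ab": {"I3476V"}},
    "P4": {"S": set(), "ORF1ab": {"I3476V"}},
}
for profile, ids in shared_profiles(patients).items():
    print([sorted(s) for s in profile], ids)
```

Using frozensets makes the comparison independent of the order in which mutations were listed, so P1 and P2 group together even though their ORF1ab mutations are written in a different order.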
To further characterize the evolution of this minority variant, we compared the ViReMa recombination events in this group with those in other non-vaccinated patients (Fig. 5B). The most frequent recombination event positions in this group were 28247–28254, 76–26480, 75–27047, 4068–21432, 78–27769, and 18606–18985, which differ from the most frequent positions in other non-vaccinated patients: 11286–11296, 2883–2902, 28362–28372, 21986–21996, and 75–21562 (Fig. 5B). This observation suggests that patients carrying this minority variant also had distinct recombination events, which could have functional consequences for viral adaptation, fitness, and infectivity.
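Comparing the dominant recombination positions of two patient groups reduces to ranking junctions by frequency in each group and taking a set difference of the top entries. The sketch below assumes each list holds one `(donor, acceptor)` entry per observed event; the function names, the cut-off `n`, and the toy counts are hypothetical.

```python
from collections import Counter


def top_junctions(junctions, n=5):
    """The n most frequent (donor, acceptor) junctions in a group."""
    return [junction for junction, _ in Counter(junctions).most_common(n)]


def group_specific_junctions(group, others, n=5):
    """Top-n junctions of `group` that are not among the top n of `others`."""
    top_others = set(top_junctions(others, n))
    return [j for j in top_junctions(group, n) if j not in top_others]


# Toy junction lists (hypothetical counts), one entry per observed event.
minority = [(28247, 28254)] * 3 + [(11286, 11296)] * 2 + [(76, 26480)]
others = [(11286, 11296)] * 4 + [(2883, 2902)] * 2
print(group_specific_junctions(minority, others, n=2))
```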