Genetic polymorphic features of KP pvama-1
The 416 bp sequences of pvama-1 flanking the DI (322-737 nucleotide positions) were successfully amplified in 84 P. vivax KP samples. Comparison of the sequences to the reference sequence, Sal I (AF063138), revealed that the 84 Pakistani pvama-1 classified into 57 haplotypes (Figure 2). Analysis of the KP pvama-1 sequences compared to reference Sal I identified a large numbers of single nucleotide polymorphisms (SNPs) in KP pvama-1 sequences. Among these, 68 were non-synonymous SNPs (nsSNPs) causing amino acid substitutions including 53 dimorphic, 10 trimorphic, 3 tetramorphic, and 2 pentamorphic. The two pentamorphic amino acid changes were R112K/T/E/S and S228D/N/R/K, The ten trimorphic amino acid substitution included N132D/G, A141E/G, E145A/G, K190E/Q, T191K/P, A199T/V, S209G/C, P210S/L, P223L/S, and V233L/P. While the three tetramorphic amino acid changes are K120R/S/G, E189N/K/G, and E227V/K/G. These amino acid substitutions were observed at varied frequencies in the KP samples. Among the 68 nsSNPS, the 59 have previously been reported in literature for P. vivax isolates from different geographical origins. However, the rest of 9 nsSNPs were specific to KP samples set (Table 1). These nsSNPs were observed at low frequencies (1.19%). Few nsSNPs such as K120R, N132D, L140I, A141E, K190E, E227V, and S228D were commonly observed with high frequency in KP, as well as some other continental pvama-1 sequences (Table 1). The KP pvama-1 showed overall haplotype diversity (Hd) of 0.978±0.008. A total of 62 segregating sites (S) and 67 mutations were identified for the samples. The Fu and Li D’s test inferred the effect of natural selection on genetic composition. The negative values of Tajima’s D implied an excess of low frequent polymorphism, suggesting the population size expansion (Table 2).
Table 1
The sixty eights SNPs identified in KP P. vivax samples in comparison to the reference pvama1 Sal I (AF063138) sequence
1 | R112K/T/E/S | 15 | L140I* | 29 | V170A | 43 | E201G | 57 | N226D |
2 | P113S | 16 | A141E* | 30 | M171T | 44 | M203T | 58 | E227V* |
3 | G117R# | 17 | N142D | 31 | A172T | 45 | G204D | 59 | S228D* |
4 | D118N | 18 | K144T | 32 | V184A | 46 | R206G | 60 | N231D# |
5 | Q119H | 19 | E145A/G | 33 | K188N | 47 | S209G# | 61 | V233L/P |
6 | K120R* | 20 | K148Q | 34 | E189N/K/G | 48 | P210S/L* | 62 | Y234H# |
7 | F126S | 21 | D149N | 35 | K190E* | 49 | A212V# | 63 | L235S |
8 | N130K | 22 | M153T | 36 | T191K/P | 50 | N214S | 64 | N238D# |
9 | A131T | 23 | I159T | 37 | C192R | 51 | R215T | 65 | R240C |
10 | N132D/G* | 24 | A160T | 38 | H193Y | 525 | V218L | 66 | N241D |
11 | D133N | 25 | L161V | 39 | M194V | 53 | F221L | 67 | D242E |
12 | H134R | 26 | C162R# | 40 | Y196H# | 54 | K222N | 68 | W234R |
13 | S136T | 27 | A166P | 41 | S198P | 55 | P223L# | | |
14 | T139A | 28 | A167P | 42 | A199T/V | 56 | K225E | | |
* The common nsSNPs identified in KP and otherP. vivax samples deposited in Genbank, NCBI. |
#Novel amino acid polymorphism in KP sample acquired from different region and high and low frequency observed. |
Table 2
The neutrality test and genetic polymorphism estimation for Pvama-1 domain-1 DNA sequences of KP-Pakistan and global samples
Countries | Total isolates(n) | Segregating sites (S) | Singleton variable sites | Parsimony informative sites | Total no of Mutation | K | H | Hd± SD | π ± SD | Tajimas D | D*(F&L) | F*(F&L) |
KP (Pakistan) | 84 | 62 | 41 | 21 | 67 | 7.335 | 57 | 0.978 ±0.008 | 0.01763 ±0.00084 | -1.490 P > 0.10 | -5.167 P< 0.02 | (-4.418) P< 0.02 |
China-Myanmar boarder | 73 | 22 | 3 | 19 | 24 | 6.249 | 25 | 0.914 ±0.021 | 0.01502 ±0.00077 | 0.81996 P> 0.10 | 0.00217 P> 0.10 | 0.36221 P> 0.10 |
Iran | 80 | 19 | 4 | 17 | 23 | 7.101 | 30 | 0.975 ±0.010 | 0.01707 ±0.00072 | 1.17202 P> 0.10 | 0.43585 P> 0.10 | 0.81792 P> 0.10 |
Korea | 66 | 23 | 4 | 19 | 23 | 4.205 | 15 | 0.782 ±0.047 | 0.01011 ±0.00111 | -0.40425 P> 0.10 | 0.32761 P> 0.10 | 0.07398 P> 0.10 |
Myanmar | 38 | 45 | 23 | 22 | 47 | 8.580 | 37 | 0.999 ±0.006 | 0.02063 ±0.00091 | -0.83341 P> 0.10 | -1.88701 P> 0.10 | -1.80822 P> 0.10 |
PNG | 100 | 21 | 2 | 19 | 22 | 6.10667 | 28 | 0.941 ±0.009 | 0.01468 ±0.00057 | 1.28215 P> 0.10 | 0.95445 P> 0.10 | 1.28946 P> 0.10 |
Sri lanka | 23 | 15 | 3 | 12 | 15 | 0.01024 | 9 | 0.858 ±0.047 | 0.01024 ±0.00158 | 0.17272 P> 0.10 | 0.44859 P> 0.10 | 0.42662 P> 0.10 |
Venezuela | 73 | 12 | 0 | 12 | 13 | 4.718 | 12 | 0.847 ±0.019 | 0.01134 ±0.00044 | 2.15646 P< 0.05 | 1.43284 P< 0.05 | 2.06299 P< 0.02 |
Thailand | 100 | 18 | 1 | 17 | 23 | 6.6000 | 34 | 0.919 ±0.015 | 0.01587 ±0.00057 | 1.43285 P> 0.10 | 0.59998 P> 0.10 | 1.09630 P> 0.10 |
India | 59 | 23 | 4 | 19 | 26 | 7.10286 | 41 | 0.980 ±0.008 | 0.01707 ±0.000 | 0.86316 P> 0.10 | 0.21989 P> 0.10 | 0.53811 P> 0.10 |
S: number of polymorphic sites (Segregating sites), K: average number of pair-wise nucleotide differences, H: haplotype, Hd: haplotype diversity, π: observed average pair-wise nucleotide diversity, D* (F&L): Fu and Li’s D* value, F* (F&L): Fu and Li’s F* value. P value < 0.05 is considered as significant difference. |
Haplotype Networking Analysis
Total of 57 KP pvama-1 haplotypes were identified for the 84 isolates sequences with the haplotype diversity (Hd) of 0.978 (±0.008). The haplotype (H14) was identified with high frequency and shared among samples collected from six different KP districts including, Kohat, Hungo, Buner, Swat, Timergara and Bannu. The haplotype (H5) was identified as second predominant haplotype shared among samples collected from five different KP districts (i.e. Mardan, Swat, Hungo, Bannu and Kohat). The haplotype (H3) was also found with the highest frequency in samples collected from Swat, Mardan, Peshawar and Bannu. The pairwise AMOVA (Analysis of Molecular Variance) inferred the pairwise distances among haplotypes. The haplotype-53, i.e. predominant in Peshawar samples, was identified as distinct and showed significant genetic differentiation against the haplotype-6 and haplotype-55. The H-6 and H-55 were identified with high frequency in samples collected from Mardan and Peshawar regions respectively. The size of each node in haplotype network plot indicates the frequency of a particular haplotype. The length of the line between nodes is proportion to the number of nucleotide substitutions, composing the haplotypes. The majorly shared haplotypes of KP samples, collected from different districts appeared on shared nodes, however, some haplotypes for samples collected from Timergara, Peshawar, Kohat and Hungo districts occupied distinct nodes in the network plot which inferred their distinctive features (Figure 3).
The functional impact of the novel nsSNPs was assessed with respect to amino acids substitution in the IURs motifs of pvama-1. This region considered important in vaccine designing and diagnosis based on pvama-1 [30]. None of the residue substituted due to KP samples specific novel SNPs are mapped within the disordered regions of the pvama-1. The result showed that two SNPs i.e. M171T, V172T mapped within the IURs motifs and four SNPs i.e. R240C, N241D, D242E and W243R were detected in RBC binding region, while most of the amino acid changes caused by nsSNPs were mapped within the predicted B-cell epitopes of pvama-1. Among novel nsSNPs, the G117R, S209G, A212V, and P223L, being identified in current study, are mapped within the epitopic region of pvama-1 and predicted to modulate the possible host immune response. The top lead epitopes were predicted based on BepiPred - 2.0 threshold score of > 0.5. The region comprises of 240-254 residues have four SNPs (i.e. R240C, N241D, D242E, and W243R) that mapped within the B-cell epitopes as well as RBC binding sites.
Recombination and linkage disequilibrium analysis
The recombination events across pvama-1 and decline of LD index R2 with the increase of nucleotide distance was identified for the KP samples. This speculate high meiotic recombination events across the pvama-1 in the KP samples (Figure 4). The R value for KP samples were observed higher compare to those of East Asian (i.e. China Myanmar boarder, and Korea), South Asia (Sri lanka) samples, while lower than those of the Myanmar samples. The lowest R value for Myanmar samples depicts opportunity of high multiclonal infections, cross fertilization and recombination. The higher values of recombination and rapid LD decay observed in KP and some other geographical samples indicate high meiotic recombination in pvama-1, supporting the recombination as a possible factor to provoke genetic diversity (Table 3).
Table 3
Comparison of different estimates of recombination in Pvama-1(Domain-1) among KP and global P. vivax samples
| Ra | Rb | Rm |
KP, Pakistan | 0.0892 | 37 | 5 |
China Myanmar | 0.0506 | 21 | 5 |
Iran | 0.0846 | 35.1 | 6 |
Korea | 0.0022 | 0.9 | 2 |
Myanmar | 0.1940 | 8.6 | 6 |
PNG | 0.0513 | 21.3 | 5 |
Sri lanka | 0.0154 | 6.4 | 4 |
Venezuela | 0.0381 | 15.8 | 4 |
Thailand | 0.0879 | 36.5 | 5 |
India | 0.1210 | 50.2 | 6 |
Abbreviations: |
Ra recombinant parameter between adjacent sites, Rb recombinant parameter for the whole genes, Rm minimum number of recombinant events. |
Nucleotide diversity across pvama-1 in context of global isolates
The sequences of KP isolates (n = 84) were compared to the global pvama-1 sequences deposited in Genbank. The values of K and π observed for KP sequences were more or less similar to previously reported sequences from Iran and India, while differentiated from rest global sequences (Table 2). The fixation index Fst statistic was used to assess the genetic differentiation across pvama-1 gene among KP samples collected from different regions as well as in context of global samples. The pairwise analysis inferred genetic distinction of samples collected from Swabi district compare to rest of the KP regions. The top Fst differentiation was detected between the Bannu and Swabi isolates (Fst = 0.16258, P-value = 0.00977), followed by Swabi and Kohat (Fst = 0.12932; P-value = 0.04199) samples. The lowest Fst depicted between Swat and Bannu groups (Fst = -0.07427; P- value = 0.96973), followed by Swat and Hungo groups (Fst= -0.06635; P-value = 0.89551) (Figure 5A). In context of global samples, marked genetic distinction inferred for KP samples compare to India, Iran, Thailand, Sri-Lanka, Korea, Venezuela, Myanmar, PNG, and China-Myanmar. High genetic differentiation was observed between KP and Korean samples (Fst = 0.40915). The Korean samples showed significant genetic distinction in pairwise comparison to rest of the global samples. Meanwhile, least genetic differentiation was observed among KP, Iranian, and Indian samples (Figure 5B). The highest pairwise net number of nucleotide variation (DA) and mean pairwise differences (πxy) was observed between Bannu and Swabi samples (Figure 5a), i.e. congruent to Fst analysis. The highest within population genetic differentiation (π) was found for Korean samples followed by South East Asian samples (Figure 5b). Pearson correlation plot showed relationship among KP, Sri lanka, Iran, India and Myanmar samples, congruent to pairwise Fst (Figure S1). The plot showed correlation among the populations in hierarchical order. However, the Korean samples showed high genetic distinction in term of Fst value, probably due to geographical separation. Distinction for Korean samples also depicted in correlation plot. Likewise, the PCA plot also unveiled the samples clustered with more or less same fashion (Figure S2).
The AMOVA test was performed to determine genetic differentiation at single and multiple loci because of variation within a population group as well as between population groups. The AMOVA analysis depicted that genetic diversity in KP samples set mainly arose due to within population differentiation i.e. 100.24%, instead of among groups differentiation (-0.24%). This indicates limited or no genetic differentiation in KP samples despite their geographical distinction across the KP region of Pakistan (Table 4).
Table 4
AMOVA-based genetic differentiation analysis across Pvama-1 (domain-1) in samples acquired from different districts of KP, Pakistan
Source of variation | d.f | Sum of squares deviation | estimates of Variance components | Percentage of variation | P-value |
Among populations | 8 | 28.742 | -0.00888 Va | -0.24% | P= 0.50556 |
Within populations | 75 | 275.650 | 3.67534 Vb | 100.24% | P= 0.50556 |
Total | 83 | 304.393 | 3.66646 | | |
Local Fst value = -0.00242 |