Sequencing of Plasmodium falciparum and analysis of allele frequency
High-quality sequence data obtained from 25 P. falciparum clinical isolates collected from the West Arsi of Ethiopia enabled the identification of 672,956 biallelic SNPs with less than 10% missing SNPs data and < 5% sample missing data in individual isolate. All isolates had 95.95% (645715/672,956) SNPs call. Sequences from the intergenic regions had lower read coverage compared to those sequences in the coding regions, and as a result, 78.92% (531120/672,956) of all SNPs called were located within genes. Of 5,058 genes analyzed, 3,370 genes had at least one SNP (Figure 1A). About 18517 SNPs were polymorphic marker in at least one sample in Ethiopian (n=25) samples of which 43.4 % (8,037/18,517) were non-synonymous coding SNPs, 22.8 % (4,222/18,517) synonymous coding SNPs, 26.6 % (4,932/18,517) in intergenic regions, 3% (587/18,517) intragenic regions and 3.5 % (656/18,517) SNPs in intron region. Similarly, P. falciparum populations from Cambodia (n=46), DR Congo (n=50), Malawi (n=50), and Thailand (n=49) had 32,854, 68,476, 79,250 and 30,427 biallelic polymorphic SNPs marker in at least one sample, respectively (Figure 1B).
In general, all populations had a high percentage of non-synonymous coding SNPs at polymorphic marker consistent with previous findings [17]. SNPs with minor allele frequency (MAF) <5% were common in all analyzed P. falciparum populations following exclusion of monomorphic SNPs in each population. Further, SNPs with minor allele’s frequency of < 5% occurred more frequently in samples from Malawi than in Ethiopian isolates (Figure 2).
Genetic diversity of Ethiopian Plasmodium falciparum population
The overall π value in Ethiopian P. falciparum population was 0.00022 which is relatively lower than the genetic diversity in Malawian P. falciparum [20]. High variability of genetic diversity π values across the chromosomes was observed in Ethiopia with minimum value of 0.00015 in chromosome 12 and maximum value of 0.00045 in chromosome 4 (Figure 3). This was consistent with variable genetic diversity π values observed in different chromosomes of other P. falciparum population [35].
Genomic diversity of Plasmodium falciparum infections
FWS scores ranged from 0.837 to 0.997 (mean = 0.97, median = 0.99) for Ethiopian P. falciparum infections whereas the FWS values in Cambodia ranged from 0.702 to 0.999 (mean = 0.962, median = 0.995), from 0.483 to 0.998 (mean = 0.94, median = 0.994) in Thailand, from 0.321 to 0.998 (mean = 0.94, median = 0.994) in DR Congo and from 0.194 to 0.997 (mean = 0.747, median = 0.762) in Malawi (Figure 4; Additional file 2).
The FWS value of > 0.95 suggests that the individual samples predominantly contained single genotype and could have other additional genotypes in lower proportions. In this study, FWS values of > 0.95 were observed in 84%, 79.6%, 78%, 50% and 36 % of samples from Ethiopia, Thailand, Cambodia, DR Congo, and Malawi, respectively.
The mean FWS scores of Ethiopian P. falciparum population were not significantly different from Cambodia’s (Welch two Sample t-test, P = 0.42) and Thailand’s (Welch two-sample t-test, p = 0.083) at 95% confidence intervals. However, mean FWS was significantly higher in Ethiopia compared to DR Congo (Welch two-sample t-test, p = 5.603e-06) and Malawi (Welch two-sample t-test, p = 3.242e-08) at 95% confidence intervals.
Population structure and admixtures
Analysis using PCA revealed the presence of four clear major population groups of isolates, which were coincident with their geographical origins (Figure 5A-C).
Similarly, the findings from admixture analysis were consistent with the PCA clustering. The isolates from the three regions were distinguished. This admixture analysis showed that four major components could be differentiated with a cluster value of K = 5. Multiple parasite subpopulations were observed in Malawi and DR Congo parasite populations suggestive of gene flow between these two populations (Figure 6). There was no detectable gene flow between the isolates from Ethiopia and East African or Southeast Asia.
The clustering of Ethiopian P. falciparum isolates was consistent with the fixation index (FST) values with or without correcting for sample size. The FST values of Ethiopian isolates versus those from the two other East African regions (DR Congo and Malawi) ranged from 0.08 to 0.09, while FST value of Ethiopian P. falciparum versus the two southeast Asian regions (Thailand and Cambodia) was 0.18 (Table 1).
Table 1: Pairwise population Divergence (measured by FST) among P. falciparum populations.
Signatures of selection in the Plasmodium falciparum isolates
The Ethiopian isolates had the average Tajima’s D value of 0.18 across the entire genome (One sample t-test, p < 2x10-16). There were 1,450 genes that had at least one SNP with TajimaD value > 1 of which 125 genes had at least five SNPs with Tajima D values > 1 of which 125 genes had at least five SNPs with Tajima D values >1 (Additional file 3). These genes include apical membrane antigen-1 (ama1), erythrocyte binding antigen-175 (eba175), merozoites surface protein-1 (msp1),thrombospondin related anonymous protein ( trap), duffy binding like merozoites surface protein (dblmsp), and cytoadherence linked asexual protein 2 (clag2), that were previously reported for the balancing selection [24,29].
The standardized integrated haplotype homozygosity score (IHS) was applied to investigate genome-wide evidence for recent positive directional selection due to drug pressure, immune impact, or other mechanisms. Using |IHS| score of > 2.5 (top 1% of the expected distribution) as a threshold for hits, 36 genes with at least one SNP that could be under significant positive selection were identified, and out of these, 15 genes had at least two SNPs (Table 2; Figure 7).
Table 2: Genes under recent positive directional selection in P. falciparum in Ethiopia as identified using the integrated haplotype score at a significance threshold of P< 0.01. SNPs and ID stand for single nucleotide polymorphisms and gene identification number, respectively.
Thirteen (13) out of above 15 genes under positive directional selection showed both positive balancing and directional selections (Table 3) and these genes includes the vaccine candidate gene SURF4.2 on chromosome 4 and CLAG8 (cytoadherence linked asexual protein 8) on chromosome 8 [36].
Table 3: Genes under both recent positive directional selection and balancing selection in Ethiopian P. falciparum populations.
Interestingly, our analysis failed to detect selection signals in drug-resistance genes such as pfcrt, pfmdr1, pfdhfr, and pfdhps. The reason could be that IHS may not be suitable for detecting positive selection for those SNPs that have reached or are near fixation in the local P. falciparum population [34].
Prevalence of mutations conferring antimalarial drug resistance in Plasmodium falciparum
Table 4 shows inter-population differences in the prevalence of drug resistance genes observed among the P. falciparum global datasets analyzed. In tandem with previous studies [20,37] that suggest temporal differences in the geographical distribution of antimalarial drug resistance mutations, we observed that CQ-resistance alleles (pfcrt-K76T, pfcrt-A220S and pfcrt-Q271E) were fixed in Ethiopia, Cambodia, and Thailand, regions where malaria transmission rates are comparably low; however, the prevalence of these same alleles were 0% in Malawi and ranged from 66% to 72% in DR Congo.
Similarly, drug resistance mutations in pfmdr1 (pfmdr1-N86Y and pfmdr1-Y184F) were also variable among populations. For instance, the Ethiopian parasite population showed presence of 14% pfmdr1-N86Y and 100% pfmdr1-Y184F gene mutations, whereas pfmdr1-N86Y detected in 48% of DR Congo isolates and in 3% of Malawi’s. In addition, pfmdr1-Y184F drug resistance marker was detected in 58% of P. falciparum population in Cambodia, 32% in DR Congo, 35% in Malawi, and 6% in Thailand’s parasite isolates.
Sulfadoxine/pyrimethamine drug resistance mutations were also present in pfdhfr and pfdhps genes in all analyzed P. falciparum populations. The major pyrimethamine resistance-conferring alleles such as pfdhfr-N51I and pfdhfr-C59R were also identified in all parasite populations with fixed or near fixation in frequency. pfdhfr-S108N was fixed in other P. falciparum populations, except in Ethiopia’s. Variable prevalence of drug resistance-conferring alleles was also observed in pfdhps (pfdhps-S436A, pfdhps-G437A, pfdhps-K540N, and pfdhps-A581G), for the parent drug sulfadoxine resistance.
Table 4: Drug resistance conferring allele’s frequency across the 5 P. falciparum populations
In terms of artemisinin resistance, the African population-specific Pfk13-K189T mutation was observed in Ethiopia (in 20% of the samples), DR Congo (17%), and Malawi (13%). This mutation was previously identified in African P. falciparum populations [20,37]. As previously reported [8] the validated and most characterized artemisinin resistance-conferring mutation pfK13-C580Y was identified in Cambodia (36% of the samples) as well as in Thailand (26%), but not in Africa.