Study sites, sample collection
This study was conducted in Kédougou region, (KG, Southern Senegal) and Podor and Matam (PM, Northern Senegal). Kédougou (685 Km from the capital, Dakar) (Fig. 1) is hyper-endemic for malaria with an incidence higher than fifteen malaria cases per 1000 habitants. In this area, malaria transmission is seasonal from July to December with a high entomological inoculation rate (EIR) ranging from 20 to 100 infectious bites/person/ year [31]. In contrast, Northern Senegal (Podor and Matam, Fig. 1) is a malaria pre-elimination area with an incidence of less than five cases per 1,000 habitants and an EIR equal to 1.3 infectious bites/person/ year [31]. Samples were collected from patients with malaria symptoms attending the National Malaria Control Programme’s (NMCP) sentinel site health facilities during malaria transmission season between September and December 2016 (Fig. 1). In the South, venous blood samples were collected on filter paper from patients fulfilling the following inclusion criteria: having fever (axillary temperature ≥ 37.5 C) or history of fever in the previous 48 hours, age ranging from 6 months to 75 years and suffering from uncomplicated P. falciparum malaria with parasite density ≥ 1000 asexual forms per microlitre. Patients who presented signs or symptoms of severe malaria as defined by the WHO [32] and pregnant women were not included in the study. In the North, P. falciparum positive Rapid Diagnostic Tests (RDTs) were collected in health facilities (NMCP sentinel sites) in Matam and Podor.
Amplification and sequence analysis of Pfmsp1 and Pfmsp2
The experimental workflow from amplicon preparation to data analysis are detailed in Fig. 2.
Multiplex PCR amplification of Pfmsp1 and Pfmsp2 genes
Parasite genomic deoxyribonucleic acid (DNA), from filter paper and RDTs was extracted using QIAamp DNA Mini kit (Qiagen, QIAGEN, USA) according to the manufacturer's instructions. P. falciparum molecular identification was performed using the photo-induced electron transfer (PET)-PCR assay [33] on a Roche LightCycler 96 instrument (Roche Molecular Systems,
Inc). Each experimental run included both a negative (no template) and a positive (3D7 P. falciparum strain) control. Samples with a cycle threshold (CT) of 40 or less were scored as positive [33-34]. The two Pfmsp1 and Pfmsp2 polymorphic genes were amplified by multiplex PCR. All PCR reactions were conducted in a 20 μl reaction mixture containing 2 μl of template DNA, 4μl Phusion high-fidelity (HF) PCR master mix, 0.2 μl HF Phusion Taq Polymerase, 250 μM deoxyribonucleotide triphosphate (dNTPs), 0.5 μM of each forward and reverse primer of each gene. Cycling conditions were as follows: initial denaturation at 94 °C for 5 min, followed by 40 cycles of denaturation at 94 °C for 30 s, annealing at 58 °C for 1 min and extension at 68 °C for 1 min 40 s; a final extension was done at 68°C for 5 min [23, 35]. The PCR products were revealed by electrophoresis on 2% agarose gels stained with ethidium bromide and visualized under ultraviolet (UV) trans-illumination (VersaDoc®, BIORAD, Hercules, USA). The size of PCR fragments was estimated using a 100 bp molecular weight ladder (Fig. 2A). The length of PCR product varied from 800bp-1000bp and from 600bp-800bp for Pfmsp1 and Pfmsp2, respectively.
Multiplex amplicon deep sequencing of Pfmsp1 and Pfmsp2
Amplicons from each sample were purified using a 0.6X DNA SPRI (Agencourt AMPure XP beads Beckman Coulter ®, CA, USA). Samples were normalized to 0.25ng/μl concentration using the Qubit® 3.0 Fluorometer and Invitrogen™ Qubit® Quantitation Kit (Life Technologies, Carlsbad, California). The Nextera XT DNA Library Preparation kit (Illumina, USA) was used for library preparation of 1ng of purified PCR product from each sample [36]. Sequencing libraries were cleaned using a 0.6X DNA SPRI and quantified by qPCR using the KAPA Library Quantification kit (KapaBiosystems, MA, USA) on a Roche LightCycler 96 instrument. The concentration of sequencing libraries was normalized and all samples were pooled. The concentration and mean fragment length of the pool were determined using the Agilent High Sensitivity DNA Kit (2100 Expert Software) and the concentration was confirmed by KAPA qPCR. The pool of all samples was sequenced on one run of an Illumina Miseq sequencer with a Miseq v2 reagent Kit (VEROGEN, North America) and 100nt paired end sequencing (Fig. 2B).
Data analysis
Sequencing data was analysed using viral-ngs, version 1.19.2 pipeline implemented on the DNAnexus cloud-based platform [37]. Briefly, raw reads were demultiplexed to individual sample libraries and reads mapping to the human genome or to other known technical contaminants (e.g., sequencing adapters) were removed. After filtering the reads, they were assembled based on Pfmsp1 or Pfmsp2 gene mapping using the Trinity software [38]. For every sample, the number of clones of each gene were determined from the number of contigs generated by Trinity software. Following de-novo assembly, clones were filtered considering an average of k-mer coverage of 2 with length of at least 200bp [2*(k – 1), k = 101]. The average read coverage of the individual contig was 1000 X, the percentage of the average read coverage of individual contig was 1% of the average read coverage of overall contigs and allele frequencies >1%. The clones found were classified into allelic families according to the polymorphism of block 2 of Pfmsp1, and block 3 of Pfmsp2 using the Geneious software (Fig.2.C). For Pfmsp1, K1-like were characterized by amino acid tripeptide repeat units SAQ, SGT, SGA, SGP, MAD20-like by SVA, SGG, SKG, SVT and RO33-like by non-synonymous amino acid change. For Pfmsp2, IC3D7-like presented GA or SG amino acid dipeptides repeat units and a poly-threonine repeat while FC27-like were characterized by 32 amino acid, ADTIASGSQSSTNSASTSTTNNGESQTTTPTA or its variants and 12 amino acid: ESISPSPPITTT.
For genetic diversity and population structure analysis, DNA sequences were first aligned using BioEdit Sequence Alignment Editor V.7.0.5.3 [39]. DnaSP v. 6.0 software was used to estimate haplotype diversity (Hd), nucleotide diversity (π) and Tajima's D (TD) [40-41]. Genetic differentiation between populations, Wright’s F-statistic (Fst), was estimated using Arlequin software version 3.5 [42]. The Geneious software was used to create the neighbor-joining tree in order to identify the number of parasite clusters and parental links between strains.
The MOI mean was calculated by dividing the total number of alleles detected by the total number of samples for each gene [43]. The online Biostatgv Student’s test (https://biostatgv.sentiweb.fr/?module=tests/student) was used to compare MOI between localities. For all tests, comparisons were considered statistically significant when P-values < 0.05.