Multiple Annealing and Looping Based Amplification Cycles (MALBAC) for Single Cell Transcriptomes from P. falciparum Schizonts
Here, we wished to develop a WTA strategy for scRNAseq analysis of the human malaria parasite P. falciparum that is based on quasilinear amplification to achieve broad, uniform and quantitatively reproducible transcriptome coverage. The goal was to significantly improve detections of genuine cell-to-cell (stochastic) gene expression variations within otherwise morphologically uniform parasite populations, such as a single developmental stage. The rationale is to minimize extensive PCR cycling that could reduce the complexity of the final cDNA library by amplification bias 38–40 and is known to be a main limitation of the standard techniques applied in previous studies 26–30, 32. With this rationale, we tested several (linear/quasilinear) strategies including rolling circle amplification (RCA), multiple displacement amplification (MDA) and MALBAC 40,41; and compared these to the PCR-based SMARTseq2 method 42. Comparing 1-cell and 10-cell transcriptomes generated by the four WTA methods, we identify MALBAC as the most suitable technique for the intended study (Supplementary Fig. 1 and Fig. 1a). Out of the four approaches, only MALBAC and SMARTseq2 yielded > 50 % parasite-specific sequencing reads, whereas < 10% of the RCA- and MDA-generated reads mapped to the P. falciparum genome (Supplementary Fig. 1). The results generated by MALBAC, however, exhibited the highest correlations between, both, 1-cell and 10-cell transcriptome replicates, with R2 coefficients of 0.369 and 0.646, compared to 0.098 and 0.148 for SMARTseq2, respectively (Fig. 1a). Finally the MALBAC results exhibited lower standard deviations (SD) in gene expression across replicates with lower SD dependency on the overall mean transcript abundance (Fig. 1a, right panel). This demonstrates that MALBAC captures 1-cell and 10-cell transcriptomes with high quantitative reproducibility across a wide range of transcript abundance, making it most suitable to analyse cell-to-cell gene expression variability.
Hence, we applied MALBAC to generate single cell transcriptomes of P. falciparum schizonts collected from non-isogenic and isogenic parasite populations. Briefly, using fluorescence-assisted cell sorting (FACS) we isolated 300 individual schizonts from a highly synchronized culture of the P. falciparum 3D7MR4 strain; representing a non-isogenic population. Subsequently, we generated an isogenic clonal culture by serial dilution of the 3D7MR4 parasites and collected 100 individual schizonts after fourteen generations of clonal expansion (presumably from a single founding cell). Overall, we detected 5488 transcripts with 2695 being detected in more than 10 % of the non-isogenic group (Supplementary Fig. 2). Similar transcriptome coverage was achieved with the isogenic group of parasites with 5260 transcripts detectable in total. The mean number of transcripts detected per schizont was ~ 1040 in both isogenic and non-isogenic cohorts as compared to ~ 3800 transcripts detected in 100-cell bulk control (n = 15) (Figs. 1b and 1c). This is consistent with the transcriptome generated from samples in which RNA obtained from 10 schizonts was diluted 10-fold to achieve an equivalent amount of total RNA of a single cell (PfRNAdil). Here, we detected a mean of 1145 transcripts per PfRNAdil (n = 49) (Fig. 1b). This represents a significant improvement to the previous studies in which mean/median of genes detected per schizont ranged between 212 to 938 26,27,29. To assess the quality of the MALBAC-generated results, we correlated the average of all 1-cell and 100-cell transcriptomes achieving a R2 value of 0.72, thus confirming that the gene expression signal amplified by MALBAC represents true biological signal detected in bulk samples (Fig. 1d). The R2 values correlating individual 1-cell and 100-cell transcriptomes ranged from 0.02 to 0.33 (Fig. 1d, top left inset). Lastly, majority of the sequenced 1-cell transcriptomes were estimated to be between and around 40 to 44 hours post invasion (HPI) (see below) by correlation to a high resolution bulk 3D7 reference transcriptome 43 (Fig. 1e), thus confirming successful isolation and amplification of late asexual stages.
Schizont Subpopulations with Unique Transcriptional Profiles Identified in P. falciparum Parasites
First, we interrogated the generated dataset for the presence of parasite sub-populations that reflect transcriptional variability at a broader (supracellular) level. Applying a custom designed data quality filter (refer to Methods), we select 271 non-isogenic and 79 isogenic 1-cell transcriptomes and subjected these to data normalization and dimensionality reduction by ZINB-WaVE modelling as described 44. As shown in Figs. 2a and 2b, the first two components of the multidimensional scaling (MDS) account for the maximum differences between 1-cell transcriptomes of both non-isogenic and isogenic parasites. This MDS-based stratification aligned well with the parasite age differentiation (described in Fig. 1e) with a fraction of cells falling into the 36–40 and > 44 HPI groups, respectively (Figs. 2a and 2b). Crucially, similar HPI categorization could be deciphered for the non-isogenic as well as the isogenic population with comparable distribution patterns. Subsequently, we applied resampling-based sequential ensemble clustering (RSEC) to evaluate a possible finer substructure within each dataset that would indicate the existence of distinct subpopulations. Interestingly, eight subpopulations were detected in the non-isogenic parasite group each defined by a specific transcriptional profile (termed as Single-Cell Transcriptional Subpopulations, SCTS) (Fig. 2a). The pseudotemporal ordering followed the estimated age progression of schizonts with SCTS 2,3 and 8 falling onto ~ 36 to 40 HPI and 1,4,5,6 and 7 onto ≥ 44 HPI groups. Interestingly, for the isogenic 1-cell population, identical RSEC clustering parameters did not detect any subpopulations (Fig. 2b). This suggests that while the HPI (age) separation reflects intrinsic properties of schizont maturation of any given schizont population, the SCTSs represent a deeper differentiation of genetically (or otherwise) more diverse parasite cohorts.
Next, we applied the loess regression method and identified 441 genes exhibiting differential expression along the pseudotemporal progression defining the eight identified SCTS (Fig. 2c and Supplementary Table 1). Pathway enrichment analyses using the Malaria Parasite Metabolic Pathways (MPM) database showed that each SCTS carries a unique set of biological functions/pathways represented by the differentially expressed genes (Supplementary Table 2) 45. As shown in Fig. 2d, protein folding, COPI/COPII-mediated vesicular trafficking and protein degradation pathways generally showed higher average expression in the SCTS 1, 2 and 3. This included genes such as, coatomer subunits (α,β,γ), endoplasmic reticulum chaperone GRP170 and E3-ubiquitin ligase, etc. Interestingly, ubiquitin activation was recently shown to be essential for schizonts maturation 46. Expression of pH regulation, nucleosome assembly and phosphatidyl inositol biosynthesis related pathways was also higher in SCTS 1, 2 and 3. This may be indicative of organisation of newly replicated DNA and active membrane biogenesis concomitant with the segregation of daughter parasites (merozoites) during mid-late stage schizogony 47. On the other hand, expression of parasitophorous vacuole membrane associated proteins and PTEX translocon pathway involved in protein export, was found to be highest in SCTS 4 and 5 mainly consisting of late stage schizonts (i.e. > 44 HPI). Interestingly, pathways linked to invasion and exported proteins showed heterogeneous expression across all SCTS (Fig. 2d). For example, SCTS 1, 4 and 6 showed higher average expression of invasion ligands for erythrocyte receptors, merozoite surface proteins and motility, represented by genes such as, erythrocyte binding antigen (eba) 175, merozoite surface protein (msp) 5 and 6, and reticulocyte binding protein homologues. On the other hand, cGMP signalling involved in invasion and egress was uniformly highly expressed across all SCTS with only subtle variations. Taken together, these results suggest the existence of multiple cellular ‘physiological states’ established within a single parasite population over extended periods of cell diversifications (here prolonged in vitro culturing). The SCTS may represent subtle but significant diversion of the schizont maturation process with a putative checkpoint at approximately 44 HPI, all of which may contribute to phenotypic plasticity of malaria parasite prior and during the invasion process.
To assess the relationship between the SCTS-driving transcriptional variation and genetic diversity, we identified single nucleotide polymorphisms (SNPs) from the RNA-seq data for each schizont. A total of 173 and 265 SNPs were detected with an allele frequency > 0.05 in the non-isogenic and isogenic group of parasites, respectively. Using these SNPs, two genetic clusters were observed for the non-isogenic parasites, while no clusters were found in the isogenic group (Figs. 2e and 2f). Interestingly, none of the genetic clusters for non-isogenic parasites showed enrichment of any SCTS, thus suggesting that the observed transcriptional heterogeneity is not driven by the (exonic) SNP profile of the parasites (Fig. 2e). However, we cannot exclude the possibility that other more subtle genetic variations in other regions (such as promoter/intergenic regions) that were beyond detection of the scRNAseq analysis may interline these transcriptional subpopulations.
Single Cell Transcription of Host-Parasite Interaction Factors
Variant expression of surface exported proteins encoded by gene families such as var, rifin and phist has long been of interest due to their immunogenic, host cell remodeling and cytoadhesive properties, essential for parasite survival during asexual blood stages 15,37,48−50. As such, understanding cell-to-cell variation patterns of their individual members may shed more light into their biological relevance. Indeed, our results showed that these three families have fundamentally different transcriptional profiles suggesting their distinct roles in host-parasite interactions (Figs. 3a-3f). For the var gene family, which is believed to be expressed in a mutually exclusive manner, we identify 72 and 62 members out of 105 var transcripts (including pseudogenes) expressed at more than 10 FPKM (normalized sequencing reads expressed as fragments per kilobases per million), in the non-isogenic and isogenic parasite populations, respectively. In the non-isogenic schizonts, we found two members, PF3D7_937800 and PF3D7_0400200 detected at the highest frequency in 28.5 % and 27.8% of single cells respectively (Fig. 3a). In contrast, the isogenic parasites expressed one dominant var gene transcript, PF3D7_937800, at a frequency of 56.7 % followed by PF3D7_0400200 and PF3D7_0400100, at 27.8 % and 22.7 % respectively. This is consistent with the presumed mutual exclusivity suggesting that PF3D7_937800 was the dominant transcript in the “founding cell” of the isogenic population and is still dominant in more than half of the cells. The two other high frequency var genes (PF3D7_0400200 and PF3D7_0400100) likely represent early switches that expanded in the isogenic population within the 14 generations. Even though a single var transcript was detected in 17% and 28% of non-isogenic and isogenic cells respectively, a considerable proportion of cells expressed more than one (2–10) var transcripts (Fig. 3b). Nonetheless, most of these multiple var transcripts are expressed at low levels with a relatively tighter distribution for higher FPKM levels. This suggests that while many cells indeed express one dominant var transcript, a significant proportion of parasite cells express additional members potentially as precursors for var gene switching that was estimated to occur in ~ 2% of each generation 51,52.
In contrast, the rifin gene family exhibited a much more restricted expression pattern both in the non-isogenic and isogenic populations. Most rifin expressing schizonts had only 1–2 rifin transcripts expressed at high levels. This indicates that in schizonts, rifins are more tightly regulated compared to the var genes (Figs. 3c and 3d). First, 102 and 87 out of 185 rifin transcripts (including pseudogenes) were detected at > 10 FPKM in at least one of the non-isogenic and isogenic cells, respectively. Out of these PF3D7_1300600 was expressed at the highest frequency of ~ 12% in non-isogenic parasites. After the isogenic clone expansion, a new rifin gene, PF3D7_0223400, gained a slight dominance being expressed in ~ 14% cells while the original PF3D7_1300600 dropped in its frequency to ~ 9%. In both groups, the remaining rifin transcripts were expressed in less than 11 % of the population with only slight variations between the non-isogenic and isogenic cell populations. On the contrary, phist genes showed expression in relatively larger proportion of schizonts. Expression of PF3D7_1149200, PF3D7_0102200 and PF3D7_1401600 was most frequent in non-isogenic schizonts at ~ 45 %, ~ 38% and ~ 37% as compared to PF3D7_1401600 (52 %), PF3D7_0219800 (34%) and PF3D7_1252800 (28%) in isogenic parasites. Taken together, these data indicate that transcriptional patterns of the rifin and phist genes undergo only slight shifts upon clonal expansion thus distinguishing these from var genes whose mutually exclusive expression is believed to be a key aspect of immune evasion. Interestingly, only a handful of rifin genes appear to be expressed in an individual cell, which suggests their redundant biological roles. Conversely, simultaneous expression of a higher proportion of the phist gene family may indicate a functional diversification of the individual members most (if not all) of which are required for survival of an individual cell. The ability to change their expression profile after a clonal expansion of the parasite population implicate the rifin and phist gene family in phenotypic plasticity of the malaria parasites as well.
The functional analyses of the SCTS (previous section) indicated that genes involved in the merozoite invasion are particularly active in terms of their cell-to-cell variation at the schizont stage. To study this further, we quantified transcriptional variability of individual genes by assigning a variability index score (VIS) calculated as the SD of z-score per gene with respect to PfRNAdil. A striking difference was observed in the overall VIS distribution between non-isogenic and isogenic schizonts, with considerably lower indices observed for invasion genes within the isogenic group (Fig. 4a, left). In particular, msp1, msp3, msp7, high molecular weight rhoptry protein 2 (rhopH2), rhoptry associated protein rap1 and eba181 ranked high on the index (VIS = 9 to 16) in the non-isogenic group whereas much lower VIS were found for the same in isogenic schizonts. As shown in Fig. 4a, the invasion genes can be grouped into two distinct clusters based on their transcription that also outlines their functional significance/relatedness. Gene cluster 1 predominantly consists of merozoite surface proteins (e.g. msp1, msp3, msp7), rhoptry proteins (e.g. ron2, ron4, ron5, ron12, rhopH2) and glideosome-associated protein (e.g. gap50). These genes represent the core invasion factors facilitating the initial merozoite-erythocyte interactions and the process of invasion. In contrast, cluster 2 showed co-expression of reticulocyte binding protein homologues (rh1, rh2a, rh2b, rh4, rh5), erythrocyte binding antigens (eba140, eba175, eba181) and apical membrane antigen ama1. These genes are believed to be crucial for selective surface antigens interactions facilitating merozoite recognition of a suitable erythrocyte for binding 53,54. There is a strong association of the two clusters with the HPI projection (Fig. 4a bottom inset) suggesting their relevance to the merozoite maturation process. However, we also observed presence of HVGs within both clusters suggesting the role of the invasion genes in malaria parasite adaptability; presumably driving distinct invasion phenotypes (see below).
Stochastic Expression of Highly Variable Genes in P. falciparum Schizonts
Further inspection of our data suggested a presence of a small but distinct set of genes with fundamentally broader transcriptional differences indicating stochastic mRNA variations. To identify such HVGs, we applied the F-test analysis (padj < 0.005) to determine transcript variability differences between the 1-cell dataset and PfRNAdils 55. This revealed two groups of HVGs in non-isogenic (88 genes) and isogenic (45 genes) schizonts with an overlap of 11 genes (Figs. 4b and 4c, Supplementary Tables 3 and 4). As expected, invasion genes previously found to have high VIS (such as msp1, msp3, msp7, rap1, rhopH2 and rhopH3) were enriched amongst the HVGs (Fig. 4b). In addition to these, genes involved in diverse cellular functions, such as, exported protein 2 (exp2), endoplasmic reticulum-resident calcium-binding protein (pferc), histone 2B, actin I and folate transporter 1 were also highly variable between single non-isogenic schizonts (Supplementary Table 3). Interestingly, the HVG group in the isogenic schizonts included somewhat different set of functionalities such as erythrocyte vesicle protein 1, peroxiredoxin, calcium dependent protein kinase 1 (cdpk1), rh2b, vacuolar protein sorting-associated protein 29, etc. (Supplementary Table 4). Crucially, the 11 HVGs that overlapped between the non-isogenic and isogenic schizonts were enriched for invasion antigens including msp1, msp3, rap1, rap3, rhopH2 and rhopH3 (Fig. 4c). These were accompanied by other functions such as exp2, AP2 transcription factor, histone H2B, protein phosphatase (putative) and actin I. Interestingly, two of the HVGs, AP2 transcription factor (PF3D7_0420300), and, a putative long chain polyunsaturated fatty acid elongation enzyme (PF3D7_0605900) were also previously reported to have variable expression levels in bulk RNA sequencing of 3D7 schizonts 56. Taken together, here we identified a small but significant group of genes with fundamentally higher levels of cell-to-cell transcriptional variations that is far beyond differential mRNA levels underlying the HPI projections, SCTS and variable antigenic families (see above). Enrichment of invasion genes in this category (along with other functionalities) suggests that maintaining stochastic expression of these genes is crucial (hence conserved) presumably to ensure successful productive invasion across variable host cell surface receptor repertoire/environments. Maintenance of HVGs even within the isogenic population (presumably devoid of fundamental genetic variations) suggests a role of epigenetic factors in these processes. Indeed, the presence of several var genes (10), rifins (2) and surfin1.2 amongst the HVGs in non-isogenic schizonts only supports this concept given their association with a key epigenetic factor HP1, characterizing heterochromatin regions of the P. falciparum genome 57.
RNA-FISH and RT-PCR Validations of Highly Variable Gene Expression in Individual Schizonts
As stated in the beginning, here we wished to derive a scRNAseq based protocol that allows us to assess cell-to-cell variations of P. falciparum parasites of gene expression with high confidence. Using the derived MALBAC approach seemed to generate such data for schizont stages. However, to gain a stronger confidence in this derived method, we experimentally validated the detected stochastic expression for a subset of the HVGs using RNA fluorescence in situ hybridization (RNA-FISH) and quantitative reverse transcription PCR (qRT-PCR) experiments. Specifically, RNA-FISH was performed on individual schizonts by labeling these with eba181 and rh5 specific ViewRNA™ probe sets simultaneously (refer to Methods), and transcripts levels were quantified through confocal microscopy imaging. As shown in Fig. 4e, eba181 transcript intensity is variable between individual schizonts in contrast to rh5, which is known to be essential for 3D7 invasion 58, with low variability hence not classified as HVG. Clearly, parasites with similar number of daughter nuclei (a proxy of developmental maturation) also exhibit highly variable expression of eba181 transcript, while, rh5 RNA-FISH signal remained stable across multiple schizont cells (Fig. 4e). Using a triple labeling protocol, RNA-FISH detected variable degree of transcriptional variability of msp1, followed by msp3 and msp7 measured simultaneously (Fig. 4f). In addition, we also confirmed HVG status of two other invasion genes, i.e., gap45 and rhop2, along with other genes such as exp2, Pferc and cept (choline/ethanolaminephosphotransferase, putative) by qRT-PCR and/or RNA-FISH (Fig. 4d, Supplementary Fig. 3). Overall, these results not only provide experimental validation for the HVG status of these genes but also substantiate the MALBAC-based analyses of stochastic gene expression in P. falciparum schizonts presented in this study.