Detecting complex infections in Trypanosomatids using whole genome sequencing

doi:10.21203/rs.3.rs-4648421/v1


Research Article


This work is licensed under a CC BY 4.0 License


Background

Trypanosomatid parasites are a group of protozoans that cause devastating diseases that disproportionately affect developing countries. These protozoans have developed several mechanisms to adapt to and survive in the mammalian host, such as extensive expansion of multigene families involved in host-parasite interaction, adaptation to invade and modulate host cells, and the presence of aneuploidy and polyploidy. Two mechanisms might result in “complex” isolates, with more than two haplotypes being present in a single sample: multiplicity of infections (MOI) and polyploidy. We have developed and validated a methodology to identify multiclonal infections and polyploidy using Whole Genome Sequencing reads, based on fluctuations in allelic read depth in heterozygous positions, which can be easily implemented in experiments ranging from the sequencing of a single sample to larger population surveys.

Results

The methodology estimates the complexity index (CI) of an isolate, and compares real samples with simulated clonal infections at the individual and population level, excluding regions with somy and gene copy number variation. It was first validated with simulated MOI samples from Leishmania and known polyploid isolates from Trypanosoma cruzi. Then, the approach was used to assess the complexity of infection using genome-wide SNP data from 530 Trypanosomatid samples from four clades, L. donovani/L. infantum, L. braziliensis, T. cruzi and T. brucei, providing an overview of multiclonal infection and polyploidy in these cultured parasites. We show that our method robustly detects complex infections in samples with at least 25x coverage, 100 heterozygous SNPs and where 5–10% of the reads correspond to the secondary clone. We find that relatively small proportions (≤ 7%) of cultured Trypanosomatid isolates are complex.

Conclusions

The method can accurately identify polyploid isolates, and can identify multiclonal infections in scenarios with sufficient genome read coverage. We package our method in a single R script that requires only a standard variant call format (VCF) file to run (https://github.com/jaumlrc/Complex-Infections). Our analyses indicate that multiclonality and polyploidy do occur in all clades, but not very frequently in cultured Trypanosomatids. We caution that our estimates are lower bounds due to the limitations of current laboratory and bioinformatic methods.

Complex infections

polyploidy

multiplicity of infection

Trypanosomatids

aneuploidy

protozoan parasites

Trypanosomatid parasites are a group of protozoans that cause devastating diseases, imposing severe health and economic burdens primarily upon developing countries [1–3] (https://www.paho.org/en/topics/chagas-disease). Among them, African trypanosomiasis, American trypanosomiasis and leishmaniasis, caused respectively by Trypanosoma brucei, Trypanosoma cruzi and species from the Leishmania genus, are Neglected Tropical Diseases (NTDs), with more than one billion people living at risk of infection. These diseases are a part of the WHO NTDs elimination road map for 2021–2030 (WHO/UCN/NTD/2020.01) [3].

Various mechanisms for immune evasion and adaptation to survive in the mammalian host have evolved in these parasites, such as antigenic variation in the extracellular parasite T. brucei [4–7]; extensive expansion of multigene families involved in host-parasite interaction in T. cruzi [8–10]; adaptation to invade and modulate host cells in T. cruzi and Leishmania [11–13]; and the presence of aneuploidy and polyploidy [14–16]. Genome instability, observable within populations as variation in chromosome copy numbers [14], and frequent formation of triploids and tetraploids [17–20] are also features of these species. There is also evidence of the occurrence of multiplicity of infections (MOI) in these parasites, where more than one diploid genotype is observed in the same host, which might have consequences for the parasite biology [21–28]. MOI is an expected consequence of insect vectors taking more than one blood meal (including from different infected individuals) and is important for the resulting meiotic recombination within the vectors [29]. Both MOI and allopolyploidy will result in complex isolates, with more than two haplotypes being present in a single sample.

The complexity of natural infections is relevant to understanding Trypanosomatid biology and disease control, as MOI cases provide direct evidence for genetically diverse infections that could increase the speed at which virulence and drug resistance genes are shared in the population, may be more challenging to treat, and may result in diverse clinical presentations. In general, parasite diversity allows sub-populations to be selected in different environments, increasing adaptability [21, 28, 30].

MOI has already been described in Leishmania infections [22, 23, 31], where there is usually a dominant genotype combined with rare genotypes of the same species; and in the insect vector [32], where different species of the parasite may cohabit the same insect [33], which can result in interspecies hybrids [34]. Multiclonal infections were also described in T. cruzi using microsatellite and marker genes, where they appear to be more prevalent in mammalian reservoirs, such as rodents and opossums, than in human patients [24–26]. There is also evidence of MOI in T. brucei in the mammalian host [27] and in the insect vector [35]. In T. brucei, coinfection with two strains in the mammalian host leads to competitive suppression, enhancing host survival [36], reinforcing that MOI may impact patient clinical outcomes in these parasites.

Hybridization leading to temporary trisomy/tetraploidy has already been demonstrated in Trypanosomatids. In T. cruzi, experimental hybrids originating from diploid parental strains were mostly tetraploid, and underwent genome erosion throughout culture passages, reverting to trisomy [17]. In Leishmania, hybridization was shown to generate diploid, triploid or tetraploid strains [37], both in intraspecies [38, 39] and in interspecies hybrids [40]. This transient presence of four haplotypes (in allotetraploids) in a single cell might increase genetic exchange and recombination, increasing the potential variability, as the parasites revert to trisomy and disomy by genome erosion. In the context of this analysis, only allotetraploids (containing two different diploid genotypes) will be detected, not autotetraploids (containing two copies of the same diploid genotype).

In the present work, we have developed a methodology to identify multiclonal infections and polyploidy in any diploid species using Whole Genome Sequencing (WGS) reads, based on fluctuations in allelic read depth in heterozygous positions, which can be easily implemented in experiments sequencing genomes of one or a few samples, or in larger population surveys. This methodology uses the complexity index (CI) proposed in Franssen et al. [31]. We parameterize this metric by comparing the allelic read depth at heterozygous sites in real samples to simulated clonal infections, in which stochastic allelic depths were generated by binomial trials. This approach was used to assess the complexity of infection in 530 Trypanosomatid isolates from four species/complexes, L. donovani/L. infantum (L. donovani complex), L. braziliensis, T. cruzi and T. brucei, based on genome-wide markers, providing a large overview of multiclonal infection and polyploidy in these parasites. We show that our method robustly detects complex infections with at least 25x coverage and at least 100 heterozygous SNPs. We find that a relatively small proportion (≤ 7%) of cultured Trypanosomatid isolates are complex. For methodological reasons, these proportions represent a lower bound of complex infections in these species.

2.1 Overview

We define the complexity index (CI) as the deviation from the expected 50% of reads in each allele in heterozygous positions, as proposed in Franssen 2021 [31]. It is estimated by the absolute value of the difference between the alternate allele read depth (AARD) in heterozygous positions and 0.5, the expected AARD in diploid, clonal heterozygous SNPs. To estimate the CI of an isolate, we carefully filtered SNP calls, removing SNPs in repetitive regions, aneuploid chromosomes and duplicated genes, as well as samples with low coverage.

2.2 Heterozygous SNP calling and alternate allele read depth (AARD) estimation

Representative whole genome sequencing (WGS) read data from Trypanosomatid isolates were downloaded from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) using Fastq-dump [41]. Only Illumina sequencing reads from publicly available datasets were used (Supplementary_Table_1, Supplementary_Table_2 and Supplementary_Table_3). Each read library was filtered using fastp v2.10.7 [42], with the parameters: average Q20, minimal length 50 and removing the read extremities with base quality lower than Q25. Next, for each species the reads were mapped to an appropriate reference genome, listed in Supplementary_table_4, using BWA-mem v.0.7.17 [43], retaining only reads with mapping quality 30 or higher and removing PCR duplicates using SAMtools v.1.10 [44]. The number of mapped reads was estimated using SAMtools v.1.10. The genome coverage was estimated as the mean coverage of all single copy genes in the genome, using SAMtools depth. The single copy genes were selected using OrthoFinder v. 2.5.4 [45].

For the SNP calls, read groups were assigned for the filtered mapped read libraries, using PicardTools v.2.21.6 (https://github.com/broadinstitute/picard). SNPs and indels were called using the Genome Analysis Toolkit (GATK) v.4.1.0.0 HaplotypeCaller and Freebayes v. 1.3.5 (https://github.com/ekg/freebayes), with a minimum alternative allele read count of 5. Only SNP/indel positions that were identified by both callers were kept. For each dataset, the single-sample VCFs were merged with VCFtools v.0.1.16 and regenotyped using Freebayes. Next, the VCF file was filtered using BCFtools v.1.12 [46] to select only biallelic SNPs with call quality above 200, coverage greater than half of the mean genome coverage (i.e. at least haploid) and lower than twice the genome coverage (i.e. not duplicated), mapping quality 40 or higher, and properly paired reads (-m2 -M2 -i 'TYPE="snp" & QUAL > 200 & INFO/DP > Cov/2 & INFO/DP < Cov*2 & INFO/MQM > 40 & INFO/MQMR > 40 & INFO/PAIRED > 0.9 & INFO/PAIREDR > 0.9'). The only exception was the T. cruzi dataset: as several samples were single-end reads, the “INFO/PAIRED > 0.9 & INFO/PAIREDR > 0.9” conditions were not used. To remove SNP call bias from repetitive regions and paralogous genes, only SNPs in single copy genes were used in subsequent analysis. After filtering, the multisample VCF was split into single sample VCFs, to be used in the complexity pipeline (see below). For the individual sample VCFs, only SNP positions with read depth ≥ 5 in both the reference and alternate alleles were considered as heterozygous. SNPs where the read depth in one allele was > 5, and between 1 and 4 in the other allele, were classified as dubious and not used in the complexity estimation. This was a conservative measure to remove potential noise and sequencing/mapping errors.
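The per-site depth rule at the end of this paragraph can be sketched as a small function. This is a minimal illustration, not the authors' R script: the function name `classify_site` and the label strings are hypothetical, and the handling of a depth of exactly 5 in the better-covered allele of a dubious site (stated as "> 5" in the text) is an assumption.

```python
def classify_site(ref_depth: int, alt_depth: int, min_depth: int = 5) -> str:
    """Label one biallelic SNP position from its per-allele read depths.

    Hypothetical helper: 'heterozygous' sites feed the complexity estimate,
    'dubious' sites are discarded, anything else is not treated as heterozygous.
    """
    # Both alleles supported by at least min_depth reads: usable heterozygous site.
    if ref_depth >= min_depth and alt_depth >= min_depth:
        return "heterozygous"
    low, high = sorted((ref_depth, alt_depth))
    # One well-supported allele plus 1..(min_depth - 1) reads in the other: dubious.
    # Assumption: a depth of exactly min_depth counts as well supported here.
    if high >= min_depth and 1 <= low < min_depth:
        return "dubious"
    return "not_heterozygous"


if __name__ == "__main__":
    for depths in [(12, 9), (20, 3), (30, 0)]:
        print(depths, classify_site(*depths))
```

Keeping the dubious class separate, rather than folding it into either of the other two, mirrors the conservative intent described above: sites with weak support for the second allele neither count as heterozygous nor silently count as homozygous.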

To control for the bias of aneuploidy in the CI estimation, chromosomes with coverage higher than 1.15x or lower than 0.85x of the genome coverage in a sample were excluded from downstream analysis. Similarly, to mitigate bias from gene copy number variants (CNVs), SNPs in genes with coverage higher than 1.15x or lower than 0.85x of their chromosome coverage were also removed. The gene coverage was estimated using SAMtools depth and the gene coordinates from the General Feature Format (GFF) file obtained from TriTrypDB v.55. The chromosomal somy for each sample was estimated using the median read depth coverage of single copy genes in each chromosome with non-outlier coverage (Grubbs' test, P < 0.05), normalised by genome coverage. Data from chromosome 31 of Leishmania and Trypanosoma cruzi were always excluded, as it is consistently supernumerary in all isolates from these species [14]. Only read libraries with genome coverage ≥ 25x were used in subsequent analysis.
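The two coverage filters can be sketched as follows. This is a minimal sketch under stated assumptions: the function names and the dictionary layout are hypothetical, coverages are assumed to be precomputed (for example from SAMtools depth output), and the Grubbs' outlier step and the chromosome 31 exclusion are omitted.

```python
def within_band(coverage: float, reference: float,
                low: float = 0.85, high: float = 1.15) -> bool:
    """True when coverage lies within [low, high] times the reference coverage."""
    ratio = coverage / reference
    return low <= ratio <= high


def usable_genes(genome_cov, chrom_cov, gene_cov):
    """Return genes that pass both the chromosome-level and gene-level filters.

    chrom_cov: {chromosome: coverage}
    gene_cov:  {gene: (chromosome, coverage)}
    """
    # Step 1: drop chromosomes whose coverage deviates from the genome coverage.
    kept_chroms = {c for c, cov in chrom_cov.items() if within_band(cov, genome_cov)}
    # Step 2: within the kept chromosomes, drop genes that deviate from
    # the coverage of their own chromosome (putative CNVs).
    return sorted(
        gene
        for gene, (chrom, cov) in gene_cov.items()
        if chrom in kept_chroms and within_band(cov, chrom_cov[chrom])
    )


if __name__ == "__main__":
    chrom_cov = {"chr1": 40.0, "chr2": 60.0, "chr3": 38.0}
    gene_cov = {"g1": ("chr1", 41.0), "g2": ("chr1", 80.0),
                "g3": ("chr2", 60.0), "g4": ("chr3", 37.0)}
    print(usable_genes(40.0, chrom_cov, gene_cov))
```

In the toy input, chr2 is excluded as a putative trisomy (ratio 1.5), and g2 is excluded as a putative duplicated gene (ratio 2.0 relative to chr1).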

2.3 Complexity evaluation, Cochran-Mantel-Haenszel (CMH) estimation and AARD distribution

The classification of an isolate as complex was based on comparisons between the real data and simulated clonal isolates. Samples classified as complex had to have: a higher CI than clonal simulated isolates; a significant CMH p-value associating the real sample with deviations from the expected allele read counts; and an alternate allele read depth (AARD, the read depth proportion (0 to 1) that corresponds to the alternate allele in a SNP position) distribution that deviates from that of the simulated clonal isolate. Only isolates that were above both the Complexity and CMH cutoffs were assumed to be complex. Details are described below.

Complexity

For each SNP site i, CI_i is the absolute value of the difference between the AARD in that position and the expected AARD in diploid, non-mixed SNP positions within a sample (which is expected to be close to 0.5). To account for the random sampling of reads sequenced from each allele of heterozygous sites, a simulated “clonal-diploid” SNP data sample was generated for each isolate in each population, with the same number of SNPs and read depth as in the real sample, using a series of binomial trials. For each SNP position i in the real sample, we conducted n binomial trials by randomly sampling from a binary array (0 or 1), where 0 represents the reference allele, 1 the alternate allele, and n is the read depth at that position in the real sample. The AARD_i for the ith position in the simulated clone was the sum of the outcomes of the binomial trials (b_j), divided by the total read coverage at site i (n):

$$AARD_i = \frac{\sum_{j=1}^{n} b_j}{n}$$

and the complexity index of this position (CI_i) was calculated as the absolute difference between AARD_i and the expected AARD of 0.5:

$$CI_i = \left|AARD_i - 0.5\right|$$

The CI of an isolate is calculated as the mean of the CI_i values over all of its heterozygous SNPs (R script available in GitHub: https://github.com/jaumlrc/Complex-Infections.git). To classify an isolate as “potentially complex”, the CI had to be higher than the mean + 3 standard deviations (SD) of all simulated clonal isolates in the population. For an isolate to be classified as “complex”, it had to have a CI value > 0.1, which is slightly higher than the cutoff for the simulated data for all trypanosomatid populations (see Results section). We recommend that the CI threshold of 0.1 be used to classify samples in projects with a small number of samples.
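The calculation described in this subsection can be sketched in a few lines of Python. This is an illustrative re-implementation under stated assumptions, not the authors' R script: the function names are hypothetical, each site is assumed to be a `(ref_depth, alt_depth)` pair with a non-zero total, and the binomial trials are drawn with the standard library random generator.

```python
import random
from statistics import mean, stdev


def aard(ref_depth: int, alt_depth: int) -> float:
    """Alternate allele read depth: fraction of reads carrying the alternate allele."""
    return alt_depth / (ref_depth + alt_depth)


def complexity_index(sites) -> float:
    """Mean of CI_i = |AARD_i - 0.5| over all heterozygous sites."""
    return mean(abs(aard(ref, alt) - 0.5) for ref, alt in sites)


def simulate_clonal(sites, rng: random.Random):
    """Simulated clonal-diploid sample with the same sites and read depths.

    For each site, n = ref + alt fair binomial trials decide how many reads
    carry the alternate allele (1) versus the reference allele (0).
    """
    simulated = []
    for ref, alt in sites:
        n = ref + alt
        alt_sim = sum(rng.random() < 0.5 for _ in range(n))
        simulated.append((n - alt_sim, alt_sim))
    return simulated


def population_cutoff(simulated_cis, n_sd: float = 3.0) -> float:
    """Mean + n_sd standard deviations of the CIs of simulated clonal isolates."""
    return mean(simulated_cis) + n_sd * stdev(simulated_cis)


if __name__ == "__main__":
    rng = random.Random(1)
    real = [(30, 10)] * 50  # toy sample with a strongly skewed allele balance
    print("real CI:", complexity_index(real))
    print("simulated clonal CI:", complexity_index(simulate_clonal(real, rng)))
```

Because the simulated sample reuses the read depth of every real site, its CI reflects only the sampling noise expected at that coverage, which is what makes the comparison with the real CI meaningful.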

CMH test: Another metric used to assess the isolate complexity was the CMH test, which tests the association between binary predictors (expected counts of reference and alternate alleles, generating the expected AARD of 0.5) and binary outcomes (observed counts of reference and alternate alleles), considering stratification by a third variable (in our case the position in the genome). In this case, it was used to compare the combined effect of the read depth of each allele across all SNPs when classifying an isolate as complex, comparing real samples and binomial-trial simulated clonal-diploid samples. For each isolate with i heterozygous SNP positions, CMH p-values were generated using i 2x2 contingency tables, where row 1 is the actual read depth of each allele at position i, row 2 is the result of n binomial trials (where n is the total read depth at site i), and the columns are the reference and alternate allele counts. We found that a significance level of p ≤ 10^−10 was a reasonable CMH test threshold (Supplementary_Figure1). R scripts for both tests are available on GitHub (https://github.com/jaumlrc/Complex-Infections.git).
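For readers who want to see the shape of the computation, the sketch below implements the standard Cochran-Mantel-Haenszel chi-square statistic for a stack of 2x2 tables using only the Python standard library. It is an assumption-laden illustration, not the authors' R code: the function name is hypothetical, a continuity correction of 0.5 is assumed, and the p-value uses the identity that the upper tail of a chi-square distribution with one degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def cmh_p_value(tables) -> float:
    """CMH test p-value for 2x2 tables, one table per heterozygous SNP.

    Each table is ((ref_obs, alt_obs), (ref_sim, alt_sim)):
    row 1 holds the observed allele depths, row 2 the simulated clonal depths.
    """
    delta = 0.0      # sum over sites of (observed - expected) for the top-left cell
    variance = 0.0   # sum over sites of the hypergeometric variance of that cell
    for (a, b), (c, d) in tables:
        n1, n2 = a + b, c + d          # row totals
        m1, m2 = a + c, b + d          # column totals
        total = n1 + n2
        if total < 2:
            continue                   # uninformative stratum
        delta += a - n1 * m1 / total
        variance += n1 * n2 * m1 * m2 / (total ** 2 * (total - 1))
    if variance == 0:
        return 1.0
    corrected = max(0.0, abs(delta) - 0.5)   # assumed continuity correction
    statistic = corrected ** 2 / variance
    # Upper tail of chi-square with 1 degree of freedom.
    return math.erfc(math.sqrt(statistic / 2))


if __name__ == "__main__":
    balanced = [((20, 20), (20, 20))] * 100
    skewed = [((30, 10), (20, 20))] * 200
    print(cmh_p_value(balanced), cmh_p_value(skewed))
```

One property of this statistic worth keeping in mind is that the per-site deviations are summed before squaring, so deviations in opposite directions at different sites partly cancel.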

AARD distribution

The AARD distributions for all SNP positions in the real sample and in its paired simulated clonal sample were generated in R, and deviations between the two distributions were taken as evidence for complexity.

2.4 Assessing complexity on mixed samples

To estimate the accuracy of the combined CI and CMH tests to identify multiclonal infections with different proportions of the secondary clone, a collection of 24 L. donovani clones from East Africa (EA) described in Franssen 2021 [31] was used (Supplementary_table_1). To create artificial multiclonal samples and assess the impact of the proportion of a secondary clone on complexity estimations, three clones, ERR205809, ERR205816 and ERR205819, were selected. Each of these three read libraries was combined with the full library of another of these three clones, where the secondary strain was downsampled to 2.5, 5, 10, 15, 25 and 50% of the full combined data, using SAMtools v.1.9 [44], to create ‘MIX’ samples. In each case, the main and secondary clones were permuted, generating a total of 33 combinations. The results from these artificially multiclonal samples were compared with those obtained from clones and simulated clonal isolates, based on their complexity index and CMH test, as previously described.
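One reading of the mixing design that reproduces the reported total of 33 is sketched below. It assumes the six secondary-clone proportions listed in the Results section (2.5, 5, 10, 15, 25 and 50%) and assumes that a 50:50 mixture is counted once per unordered pair of clones, since swapping main and secondary clone gives the same mixture. Both assumptions are ours and are not stated in the text; the function name is hypothetical.

```python
from itertools import combinations, permutations

CLONES = ["ERR205809", "ERR205816", "ERR205819"]
ASYMMETRIC_PERCENT = [2.5, 5, 10, 15, 25]  # main and secondary clone are distinguishable


def mix_design(clones=CLONES, asymmetric=ASYMMETRIC_PERCENT):
    """Enumerate (main, secondary, secondary_percent) for every MIX sample."""
    design = []
    # Ordered pairs: each clone serves as main and as secondary clone.
    for main, secondary in permutations(clones, 2):
        for percent in asymmetric:
            design.append((main, secondary, percent))
    # Assumption: 50:50 mixtures are symmetric, so each unordered pair counts once.
    for first, second in combinations(clones, 2):
        design.append((first, second, 50))
    return design


if __name__ == "__main__":
    design = mix_design()
    print(len(design))
```

Under these assumptions there are six samples at each proportion below 50% and three at 50%, which is consistent with the six samples per proportion referred to in the Results.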

The impact of the number of heterozygous SNPs in the CI evaluation was examined by sub-sampling the number of SNP calls at random to 10, 50, 100, 300 SNPs, in 100 iterations, assessing the number of true positives (number of MIX samples that were classified as complex) and comparing with the estimations with the full set of SNPs.

2.5 Assessing complexity on polyploid samples

Besides multiclonality, another source of complexity is polyploidy, i.e. having extra full sets of chromosomal copies. To evaluate the impact of polyploidy on the CI estimations, we used samples from the T. cruzi dataset described in Matos 2022 [17], containing 8 diploid parental clones and 11 triploid or tetraploid hybrid clones, where the somy of some samples was validated by flow cytometry [17] (Supplementary_Table_2). As performed for the multiclonal isolates, the SNP counts for each sample were downsampled to 10, 50, 100, 300 or the full set, in 100 iterations, and the accuracy of classifying each group as complex was evaluated.

2.6 Assessing complexity on laboratory and field isolates

After validating our complexity estimations with simulated/controlled data, we went on to evaluate field isolates (FI) and lab-derived parasites that were available in NCBI SRA. We estimated the complexity of four Trypanosomatid species/complexes: L. donovani/L. infantum (L. donovani complex), L. braziliensis, T. cruzi and T. brucei, using a total of 573 WGS data sets from publicly available field isolates and laboratory strains (Table 1, Supplementary_Table_3). Only read libraries from samples with read coverage ≥ 25x and where at least 100 heterozygous SNPs were called were used in this analysis. For these samples, the whole genome complexity, as well as the complexity for each chromosome, was estimated. The proportion of chromosomes that were used in the complexity estimation was calculated for each isolate by dividing the number of chromosomes that were used in complexity estimates (had at least one identifiable SNP and were not aneuploid) by the total number of chromosomes in the species: L. donovani = 36; L. braziliensis = 35; T. brucei = 11; T. cruzi = 41. The proportion of complex chromosomes in an isolate was estimated by dividing the number of chromosomes with complexity index ≥ 0.1 by the number of evaluated chromosomes. In practice, most chromosomes were utilised in all CI estimates. Exceptions are isolates with several duplicated chromosomes.

The evaluation of statistical differences in the genome coverage and heterozygous SNPs/Kb in complex and non-complex isolates was performed with Mann–Whitney U test, in R. The Pearson correlation between the genome coverage, SNPs/Kb and complexity was estimated in R. For both analyses the heterozygous SNPs/Kb were estimated dividing the heterozygous SNP numbers by the sum of the lengths of the single copy genes, in each set.

We also estimated the number of heterozygous SNPs in the maxicircle (kDNA) genome, as well as its coverage (as described for the nuclear genome). To identify kDNA SNPs, the maxicircle sequence (L. donovani = BK010877.1, L. braziliensis = OY748431, T. cruzi = MW732647, and T. brucei = M94286.1; downloaded from NCBI) was combined with the genome reference for the read mapping. Only heterozygous SNPs that were outside the repetitive region and had at least 5% of the kDNA genome coverage were considered.

3.1 Genomic phenomena that might alter Trypanosomatid isolate complexity

The proportion of reads in each allele in a heterozygous SNP position can be impacted by several phenomena, including multiclonality (multiple clones or genotypes in a sample, often present in different proportions), polyploidy (multiple copies of all chromosomes) and aneuploidy (multiple copies of some chromosomes) (Fig. 1). In a non-complex clonal, euploid, diploid isolate, the mean alternate allele read depth (AARD), meaning the proportion of reads that correspond to the alternate allele in heterozygous positions, is expected to be close to 0.5, as there will be a similar number of reads mapping to both alleles (Fig. 1A). Hence, when all heterozygous SNPs in a genome are evaluated, the distribution of the AARDs will have a peak at 0.5, with a spread of AARD values expected from ‘random draws’ of reads calling each allele. Some phenomena that have already been observed in Trypanosomatids, such as multiclonality (Fig. 1B), polyploidy (Fig. 1C) and aneuploidy (Fig. 1D), will alter this proportion, changing the distribution peaks or flattening the curve, which may be seen in density plots of AARD values for all heterozygous SNPs in a sample.

To provide a numeric and statistical large-scale evaluation of this deviation from the expected AARD, we estimated the CI: the absolute value of the deviation from the expected 0.5 proportion in each heterozygous SNP position [31]; and compared real samples with simulated clonal isolates at the individual and population level. To exclude deviations in AARD due to paralogous genes and aneuploidy, we included only single-copy gene regions and excluded chromosomes with somy variation and genes with duplications/losses. We estimated cutoffs for complex isolates based on the mean complexity of simulated clonal isolates from population genomic data from various Trypanosomatid clades, and used the Cochran-Mantel-Haenszel (CMH) test to support the evaluation of each isolate.

3.2 Assessing the accuracy of the CI to identify multi-clonal and polyploid isolates, using simulated or controlled data

To evaluate the accuracy of the CI metric to identify multiclonal isolates, we created sequencing read data sets to represent multiclonal isolates from L. donovani, by combining downsampled read files from laboratory cloned field isolates in various proportions. We evaluated the two features that could impact the complexity estimations in multiclonal infections: the proportion of the secondary clone and the number of heterozygous SNP positions.

To simulate multiclonal isolates with different proportions of the secondary clone, WGS data from three L. donovani lab-derived isolates that have been cloned were used: ERR205809, ERR205816 and ERR205819 [31]. Reads from these clones were combined pairwise in proportions of 2.5, 5, 10, 15, 25 and 50% of the reads from the secondary clone, resulting in 33 datasets. The complexity of these mixed samples (MIX) and of the 24 clones from the Franssen 2021 [31] dataset was assessed using two parameters: the CI, which had to be higher than the mean + 3 standard deviations (SD) of the simulated clonal isolates in the population; and the CMH test, which evaluates whether the real isolate AARD differs from that of the expected clonal isolate, with a p-value lower than 10^−10 (Fig. 2). Based on these cutoffs, none of the clones (0%) and 26 (79%) of the MIX samples were classified as complex isolates. When evaluated separately, the CI parameter was the most specific, as only one clone was classified as complex (false positive), compared to 3 for CMH. However, CI was the least sensitive, as it only classified 26/33 (79%) MIX samples as complex, compared to 31/33 (94%) for CMH (Fig. 2A and B, Supplementary Figure_2).

The complexity estimation accuracy was greatly influenced by the proportion of the secondary isolate, where lower proportions resulted in false negative results. None of the six MIX samples where the secondary clone read proportion was 2.5% was classified as complex. Increasing the proportion of the secondary clone resulted in higher accuracy, where five of the six samples where the secondary clone corresponded to 5% of the reads, and all samples where the secondary clone had 10–50% of the reads were classified as complex (Supplementary Figure_2). This was expected, as a low proportion of the secondary clone had a low impact in AARD distributions (Fig. 2B). Hence, our method can detect complex isolates when the secondary clone represents at least 5–10% of the reads.

To evaluate the impact of the number of heterozygous SNPs on the complexity estimation, the heterozygous SNP counts for each MIX sample were downsampled to 10, 50, 100, 300 or the full set, and the accuracy of classifying each group as complex was assessed (Supplementary Figure_2). To remove potential sampling bias, the analysis was repeated in 100 iterations, re-sampling random SNP positions each time, and the final results are a combination of all iterations. When compared with the full dataset, which had between 978 and 5910 SNPs, the use of 10 SNPs resulted in poor accuracy at all proportions of the secondary clone. By using 100 and 300 SNPs, the results were similar to those observed for the full set, with lower accuracy only for samples with ~ 5% of the reads originating from the secondary clone (Supplementary Figure_2, Supplementary_table_5). Hence, the complexity index estimation requires 100 or more heterozygous SNPs to be accurate.

Besides multiclonality, another source of complexity is polyploidy, i.e. having extra full sets of chromosomal copies. To evaluate the impact of polyploidy on the complexity estimations, we used the T. cruzi dataset described in Matos 2022 [17], containing 8 diploid parental clones and 11 triploid or tetraploid hybrid clones, where the somy of some was validated by flow cytometry (Supplementary_Table_2). As performed for the multiclonal isolates, the SNP counts for each sample were downsampled to 10, 50, 100, 300 or the full set, in 100 iterations, and the accuracy of classifying each group as complex was evaluated (Fig. 2C and D; and Supplementary Table 6).

Using the combination of CI and CMH cutoffs, on average 4.4, 73.5, 82.5, 90.9 and 100% of the polyploid isolates, respectively for the 10, 50, 100, 300 or the full set of SNPs, were correctly classified as complex. No parental diploid clones were classified as complex in any replicate. As expected, the triploid isolates had an AARD distribution with peaks at 0.33 and 0.66, while the tetraploid isolates had peaks at 0.25, 0.5 and 0.75. Both triploid and tetraploid isolates had a CI higher than that observed for the diploid isolates (Fig. 2C and D, Supplementary_Figure 3). These results suggest that the complexity estimate can also be used to identify polyploid isolates with reasonable sensitivity (~ 80%) when 100 or more heterozygous SNPs are present.
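The peak positions quoted above follow from counting how many of the chromosome copies carry the alternate allele at a heterozygous site. The short sketch below makes that arithmetic explicit; it is a noise-free idealisation, and the function names are hypothetical.

```python
from fractions import Fraction


def expected_aard_peaks(ploidy: int):
    """Expected AARD values at heterozygous sites: k alternate copies out of `ploidy`."""
    return [Fraction(k, ploidy) for k in range(1, ploidy)]


def peak_deviation(peak: Fraction) -> Fraction:
    """Noise-free CI_i contribution of a site sitting exactly on a peak."""
    return abs(peak - Fraction(1, 2))


if __name__ == "__main__":
    for ploidy in (2, 3, 4):
        peaks = expected_aard_peaks(ploidy)
        print(ploidy, [float(p) for p in peaks], [float(peak_deviation(p)) for p in peaks])
```

For a triploid, every heterozygous site deviates from 0.5 by 1/6 (about 0.17). For a tetraploid, sites at 0.25 or 0.75 deviate by 0.25, while sites at 0.5 contribute no deviation beyond sampling noise.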

Based on these results, we decided to use the combined results of the CI and CMH tests to identify complex isolates, and to only evaluate samples with 100 or more SNPs. This is a conservative approach that minimises false positives while accepting some false negatives, especially in cases where the secondary clone proportion is low.

3.3 Complexity evaluation among Trypanosomatid species

After establishing the accuracy of the CI metric to identify multiclonal and polyploid samples with simulated and controlled data, we estimated the complexity in a total of 530 laboratory and field isolates from L. donovani, L. braziliensis, T. brucei and T. cruzi, identifying a total of 28 complex isolates (Fig. 3, Table 1, Supplementary Figs. 4–7). The CI cutoff was similar among the evaluated species, with the lowest value in T. cruzi (0.072) and the highest in L. braziliensis (0.089), which supports the robustness of the method. We propose a global cutoff of 0.1 (slightly higher than the highest cutoff, in L. braziliensis) as a value that may be used to classify any Trypanosomatid isolate, or possibly other diploid eukaryotic samples, as complex. This allows any researcher to classify single isolates without the need for population data to estimate a custom cutoff. Samples with CI values lower than the global cutoff but still higher than their species cutoff were classified as “potentially complex” and evaluated separately. Only three potentially complex isolates were identified, two in T. cruzi and one in L. donovani.

Even though we removed from each isolate the aneuploid chromosomes that deviated from the mean genome coverage, intra-isolate chromosome mosaicism (mosaic aneuploidy and chromosome imbalance) [47–51] may add noise to complexity measurements in field isolates, by producing unbalanced values in a few chromosomes. Hence, we only consider as “complex” those isolates in which at least 50% of the evaluated chromosomes had a mean complexity value higher than 0.1.
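Putting the criteria together, the final call for one isolate can be sketched as a single predicate. This is an illustrative combination of the thresholds given in the text (global CI cutoff of 0.1, CMH p ≤ 10^−10, at least 50% of evaluated chromosomes complex), not the authors' implementation. The function name and argument layout are hypothetical, and the per-chromosome comparison uses ≥ 0.1 as in the Methods.

```python
def is_complex(genome_ci: float, chrom_cis: dict, cmh_p: float,
               ci_cutoff: float = 0.1, p_cutoff: float = 1e-10,
               min_chrom_fraction: float = 0.5) -> bool:
    """Combine the genome-wide CI, the CMH p-value and the per-chromosome CIs.

    chrom_cis maps each evaluated (non-aneuploid, SNP-containing) chromosome
    to its mean complexity index.
    """
    if not chrom_cis:
        return False  # nothing to evaluate
    n_complex = sum(ci >= ci_cutoff for ci in chrom_cis.values())
    chrom_fraction = n_complex / len(chrom_cis)
    return (
        genome_ci > ci_cutoff
        and cmh_p <= p_cutoff
        and chrom_fraction >= min_chrom_fraction
    )


if __name__ == "__main__":
    widespread = {"chr1": 0.17, "chr2": 0.16, "chr3": 0.18, "chr4": 0.06}
    localised = {"chr1": 0.45, "chr2": 0.05, "chr3": 0.06, "chr4": 0.04}
    print(is_complex(0.15, widespread, 1e-40))  # signal spread over most chromosomes
    print(is_complex(0.15, localised, 1e-40))   # signal driven by a single chromosome
```

The second toy isolate shows why the chromosome criterion matters: its genome-wide CI and CMH p-value pass, but the deviation comes from one chromosome only, which is the pattern expected from mosaic aneuploidy rather than from multiclonality or polyploidy.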

The proportion of isolates that were classified as complex varied across clades: T. cruzi and T. brucei had the lowest (~ 2.5%) and L. braziliensis the highest (30%) proportion of complex samples in the evaluated dataset. Complexity values also varied among isolates, with the lowest value observed in T. brucei SRR17479767 (0.025) and the highest in L. donovani ERR205770 (0.398). Complex isolates have more heterozygous SNPs than non-complex samples (Mann-Whitney p-value = 0.003), especially for L. braziliensis (Mann-Whitney p-value = 2.87 x 10^−6) and T. brucei (Mann-Whitney p-value = 0.0074). This increase was not observed in the evaluated L. donovani samples (Supplementary_Figure_8). The increase in overall heterozygous SNP counts may reflect the presence of allopolyploids or of multiclonal infections, where, in bulk WGS data, clone-specific homozygous SNPs are interpreted as heterozygous SNPs, increasing their counts.

The genome coverage from each set also varied, and this might have an impact on the CI limit of detection, as we only classified SNPs with coverage ≥ 5 in both alleles as heterozygous to be used in the complexity estimations. The median of the genome coverages for each dataset was 39 for L. braziliensis, 31 for L. donovani, 56 for T. brucei and 45 for T. cruzi (Supplementary_Figure_8 G), which due to our cutoff of at least 5 reads in the rarer allele, would only allow the identification of multiclonal infections where the secondary clone proportion of reads was respectively of at least 17%, 16%, 8% and 11%.

When each dataset was evaluated separately, from the 85 evaluated L. donovani samples, 6 were classified as complex (7%), and one as potentially complex. Among the 6 complex isolates, three, ERR205724 (MHOM/SD/82/GILANI), ERR205770 (MHOM/IT/02/ISS2429) and ERR205774 (MHOM/BR/2003/MAM), were already classified as multiclonal by Franssen 2020 [52], and one, ERR3956121 (1052_ToD_1_primary_neg), was classified as complex by Franssen 2021 [31]. In fact, ERR205774 also presented a higher count of heterozygous SNPs in the maxicircle sequence, which further corroborates that it is a multiclonal infection (Supplementary Figure_9). Two isolates, ERR205748 (MHOM/CY/2006/CH32) and ERR205789 (MHOM/SD/62/LRC-L61), were classified by Franssen 2020 [52] as hybrids, and had an AARD distribution compatible with triploidy in our analysis. Hybridizations in trypanosomatids may result in polyploid lineages, which might revert to diploidy by genome erosion [17]. The sample that was classified as potentially complex, ERR3956143 (1073_ToD_1_primary_neg), corresponds to a field isolate obtained from a patient from Ethiopia, which might be multiclonal.

For the L. braziliensis dataset, 13 of the 42 evaluated samples (30%) were classified as complex, and all 13 had previous evidence of being polyploid. Of these, 10 corresponded to experimental tetraploid hybrids described in [53], while SRR21604774 corresponded to a triploid L. braziliensis and Leishmania guyanensis hybrid. The last two samples, ERR2508271 and ERR2508272, correspond to read libraries used in the assembly of the triploid L. braziliensis M2904 genome [16, 54]. We found no strong evidence of multiclonal infections in any of the evaluated L. braziliensis samples.

Of the 211 evaluated T. cruzi samples, five were classified as complex and two as potentially complex. Of the complex set, three were isolated from the insect vector: SRR8503553 (Panstrongylus lignarius in Peru), and SRR3676272 and SRR3676273 (Triatoma dimidiata in Texas) [55–57]. The AARD density peaks in these three samples are similar to those expected for triploid isolates (0.33 and 0.66), suggesting that they are polyploid. The other two complex samples were isolated from chronic chagasic human patients in Panama (SRR3676281, SRR3676310) [55], and had AARD peaks that do not match what is expected for triploid or tetraploid isolates. This suggests that they might be multiclonal infections. In fact, SRR3676310 also had a higher count of heterozygous SNPs in the maxicircle sequence when compared to other T. cruzi isolates (Supplementary_Figure_9), which further supports that it is potentially a multiclonal infection.

Finally, for T. brucei we identified 4 complex isolates in a dataset of 159 samples. Of those, SRR17479764 corresponds to a triploid hybrid from the J10 and KETRI 1738 strains [58], while SRR17479766 was previously suggested to be a multiclonal infection [58]. The final two complex T. brucei strains, ERR270813 and SRR6052140, have AARD profiles similar to those expected for tetraploid and triploid isolates, respectively.
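The AARD peak positions used above to distinguish polyploid from multiclonal isolates follow from simple allele-dosage arithmetic: in a clonal isolate of ploidy n, a heterozygous site carries k of n alternate copies, so peaks are expected at k/n. The sketch below lists these positions and picks the candidate ploidy closest to a set of observed peaks. It is an illustrative sketch with hypothetical function names and a naive distance score, not the classification procedure used in this study.

```python
def expected_aard_peaks(ploidy):
    """Expected alternate-allele read fractions at heterozygous sites (k/ploidy)."""
    if ploidy < 2:
        raise ValueError("ploidy must be at least 2")
    return [k / ploidy for k in range(1, ploidy)]


def peak_mismatch(observed, expected):
    """Symmetric distance: each observed peak to its nearest expected peak and back."""
    if not observed:
        raise ValueError("at least one observed peak is required")
    forward = sum(min(abs(o - e) for e in expected) for o in observed)
    backward = sum(min(abs(e - o) for o in observed) for e in expected)
    return forward + backward


def closest_ploidy(observed, candidates=(2, 3, 4)):
    """Candidate ploidy whose expected peaks best match the observed peaks."""
    return min(candidates, key=lambda p: peak_mismatch(observed, expected_aard_peaks(p)))


print(expected_aard_peaks(3))          # triploid: peaks near 0.33 and 0.66
print(closest_ploidy([0.33, 0.66]))    # best match among diploid, triploid, tetraploid
```

Observed peaks that fit none of the candidate ploidies well, as for the two Panama T. cruzi samples, are the pattern the text interprets as suggestive of a multiclonal infection.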

Table 1

**Complexity evaluation of each Trypanosomatid group of samples.** In the Complex samples column, the number in parentheses indicates the number of potential complex samples.
Species	Sample number	CI threshold	Max. CI	Min. CI	Complex samples	Complex %	Assessment
L. donovani	85	0.083	0.398	0.041	6 (+ 1)	7	4 multiclonal, 2 polyploid
L. braziliensis	42	0.089	0.194	0.035	13	30	13 polyploid
T. brucei	159	0.077	0.167	0.025	4	2.5	1 multiclonal, 3 polyploid
T. cruzi	211	0.072	0.227	0.030	5 (+ 2)	2.3	2 multiclonal (chronic cases), 3 triploid (insect source)
Simulated	33	0.075	0.3	0.05	25	75	-
Polyploid	11	0.084	0.22	0.134	11	100	-

Taken together, these results suggest that complex isolates represent a small percentage of the cultured field isolates for all the TriTryp species evaluated. This is likely a lower bound on the frequency of complex infections under natural conditions, because parasite isolation and culture can reduce complexity and because a low proportion of the secondary clone hampers detection by our method. We suggest that the CI should be estimated for all new WGS data from Trypanosomatid isolates, to identify polyploid or multiclonal isolates before proceeding with downstream analysis. To facilitate this, we provide the R code for this method on GitHub (https://github.com/jaumlrc/Complex-Infections.git), which requires only a variant call format (VCF) file to produce complexity estimates.
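As an indication of how allelic read depths can be read from a VCF file, the sketch below extracts the alternate allele read depth (AARD) proportion at biallelic sites that have at least 5 reads on both alleles. It is not the R script referenced above: it is a minimal Python illustration that assumes a single-sample VCF carrying the standard `AD` FORMAT field, and the function name is hypothetical.

```python
def aard_from_vcf_lines(lines, min_allele_reads=5):
    """Return (chrom, pos, alt_fraction) for biallelic sites passing the read cutoff.

    Assumes a single-sample VCF whose FORMAT column includes AD (ref,alt depths).
    """
    values = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and empty lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10 or "," in fields[4]:
            continue  # need a sample column; skip multiallelic sites
        format_keys = fields[8].split(":")
        sample_values = fields[9].split(":")
        if "AD" not in format_keys:
            continue
        ad_index = format_keys.index("AD")
        if ad_index >= len(sample_values):
            continue
        try:
            ref_reads, alt_reads = (int(x) for x in sample_values[ad_index].split(","))
        except ValueError:
            continue  # missing or malformed depths
        if ref_reads >= min_allele_reads and alt_reads >= min_allele_reads:
            values.append((fields[0], int(fields[1]), alt_reads / (ref_reads + alt_reads)))
    return values


example = [
    "##fileformat=VCFv4.2",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT:AD:DP\t0/1:20,10:30",
    "chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT:AD:DP\t0/1:30,3:33",
]
print(aard_from_vcf_lines(example))
```

The second example site is dropped because its alternate allele has fewer than 5 reads, mirroring the heterozygous SNP cutoff described earlier.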

Discussion

In the present study, we identified and characterised two genome modifications that result in more than two haplotypes being present in a single parasite isolate in Trypanosomatids: multiclonal infections and polyploidy. We developed and validated a method to assess the complexity of Trypanosomatid samples based on WGS reads, and applied it to evaluate complexity in a representative collection of Leishmania, T. cruzi and T. brucei field isolates and lab strains. Our method only uses chromosomes whose somy matches the genome ploidy, and removes genes with evidence of duplication or loss, as these could be confounding factors in the estimations. We identified complex (polyploid or multiclonal) isolates in all evaluated species, and proposed a global complexity index cutoff that can be used for any single Trypanosomatid sample, and likely for samples from other diploid eukaryotes. We provide an R script that can estimate complexity directly from VCF files (https://github.com/jaumlrc/Complex-Infections.git).

In the last decade, the reduction in sequencing costs and the relevance of questions that may be answered with genomic data have resulted in a large increase in the number of studies that generate population WGS data for trypanosomatid parasites [31, 50, 52, 55, 59–64]. However, the occurrence of multiclonal infection and polyploidy is not always assessed in these studies. Complex infections also occur in bacteria, viruses and some protozoan parasites such as Plasmodium, where the main stage that infects humans and other mammalian hosts is haploid [65, 66]. Trypanosomatid parasites are usually diploid, and often aneuploid and/or polyploid [16, 20, 50, 67], and the somy of different chromosomes can vary even within clones [47]. This increases the challenge of estimating complex infections in these parasites. Hence, a method is needed to identify complex infections in these species from WGS data at scale, taking into account gene copy number variants and aneuploidy, that can be used to evaluate publicly available datasets as well as data from future projects.

By using WGS reads, the method that we propose has the advantage of assessing genome-wide SNP variation as evidence for complexity simply and robustly [22, 52]. When compared to methods based on microsatellite loci and marker genes [23–25, 27], the use of WGS data allows aneuploid chromosomes and duplicated genes, which may add noise if not removed, to be identified by read depth and excluded. A clonal aneuploid isolate with a trisomic chromosome containing three different alleles (one on each chromosome copy) at a microsatellite locus might be classified as “multiclonal” in microsatellite analysis, for having more than two alleles. This is especially relevant if the marker is in the trypanosomatid ancestral supernumerary chromosome (TASC), which has been shown to have four stable copies and a higher sequence variability when compared to other chromosomes [14]. By evaluating the complexity in each euploid chromosome from an isolate, we could separate complex infections (multiclonal/polyploid) from “chromosome instability” (CIN) and “mosaic aneuploidy” events [16, 47, 49, 50]. This was achieved by classifying as complex only those field isolates whose complexity evidence is supported by at least half of the evaluated chromosomes.
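The "at least half of the evaluated chromosomes" rule can be expressed as a simple vote across chromosomes, sketched below. This is a hypothetical illustration only: it assumes that each evaluated euploid chromosome has already been flagged as showing complexity evidence or not, and it does not specify how that per-chromosome evidence is computed. The function and parameter names are not from the authors' code.

```python
def is_complex_isolate(chromosome_flags, min_supporting_fraction=0.5):
    """Classify an isolate as complex if enough chromosomes support complexity.

    chromosome_flags: dict mapping chromosome name -> bool, where True means
    that chromosome shows complexity evidence (criterion defined elsewhere).
    """
    flags = list(chromosome_flags.values())
    if not flags:
        raise ValueError("at least one evaluated chromosome is required")
    supporting_fraction = sum(flags) / len(flags)
    return supporting_fraction >= min_supporting_fraction


# Evidence on most chromosomes: consistent with a multiclonal or polyploid isolate.
print(is_complex_isolate({"chr1": True, "chr2": True, "chr3": False}))
# Evidence on a minority of chromosomes: more consistent with mosaic aneuploidy or CIN.
print(is_complex_isolate({"chr1": True, "chr2": False, "chr3": False}))
```

Requiring support from many chromosomes reflects the reasoning in the text: a genome-wide process such as a mixed infection should perturb most chromosomes, whereas chromosome-specific instability should not.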

In the current study, we identified a low proportion of complex infections among the Trypanosomatid field isolates and lab-derived strains, with clear perturbations of the AARD distributions and high CI values. We identified around 7% of complex infections for the L. donovani group, where 4 isolates had evidence of multiclonal infections and 2 isolates had evidence of polyploidy. This is in accordance with [23], where, even though different Leishmania genotypes were identified in different tissues, the number of isolates with MOI in the same tissue was low. For T. cruzi, we identified a very low proportion of complex infections (~2%), which is lower than the ~15–17% reported in the literature for mixed infections between Discrete Typing Units (DTUs) [68] in human patients from Latin America [24, 69, 70], and lower than the ~13% of MOI in the vector Triatoma infestans [71]. This might be partly because a significant proportion of the evaluated T. cruzi isolates were cloned prior to sequencing [61], which would remove multiclonal infections but not polyploidy. As cloned samples could be polyploid, they were still evaluated in this work. Although most of the T. cruzi isolates had an AARD distribution that matched the expectation for a “non-complex clonal, euploid, diploid isolate”, some non-complex isolates had perturbations in the AARD distribution and high CI, such as SRR3676315, SRR3676316, SRR3676317, SRR3676318 and SRR3676319, which had a distribution pattern similar to the potentially complex isolate SRR3676320. These isolates had a high CI in fewer than half of the evaluated chromosomes, which suggests that they have a high level of mosaic aneuploidy and CIN [20, 49–51], rather than being polyploid or multiclonal infections. T. brucei isolates also had a low proportion of complex infections identified in the WGS data (~2%), with only one isolate showing strong evidence of multiclonal infection, which is lower than the 8–20% of multiclonal infections reported in human and vector infections in East Africa [27].

Potential limitations of complexity estimation based on WGS are data collection, processing and genome coverage. Most Trypanosomatid WGS data is obtained from parasites that are isolated from the host and cultured in axenic media or used to infect mice, which might reduce complexity when compared to the variation present in the patient [22, 23]. Different strains might have different growth rates in media, and secondary clones in low proportions might be outcompeted in culture. Methodologies to sequence parasite genomes directly from patient tissue, such as Selective Whole Genome Amplification (SWGA) [72–74], SureSelect [22, 75] and Nanopore adaptive sampling [76, 77], have improved in recent years, and might in the future allow WGS in Trypanosomatids to be done at large scale, affordably and without a culturing step. This might allow complex infection assessments using WGS data to be performed directly from host tissues, as long as allelic proportions are not altered by these methods.

Another potential limitation of complexity estimation for multiclonal infections in WGS data is the proportion of the secondary clone. The proposed method was able to identify complex infections when the secondary clone corresponded to at least 5–10% of the sequencing reads. This is in the range of what was observed in WGS of clinical samples with SureSelect sequencing, where in the three identified complex infections the proportion of the secondary clone was ~6–10% both in SureSelect and in cultured samples [22]. However, the mean genome coverage of the laboratory and field isolates evaluated here varied from 29 to 56 among the datasets, limiting our potential to identify multiclonal infections to cases where the secondary clone corresponded to at least 8–17% of the total reads in the sample. This might lead to an underestimation of the total frequency of complex infections. The multiclonal isolate with the lowest genome coverage was ERR3956121, from the L. donovani group, with a genome coverage of 27. Unlike for multiclonal infections, low coverage should not have a great impact on polyploidy estimations, as the proportion of the rarer allele should be higher in this case.

Due to ethical and health limitations, Trypanosomatid parasite isolates are usually obtained from only one tissue during diagnostic procedures in human patients, such as bone marrow, lymph nodes, spleen, heart, gut or blood. This might lower the estimated frequency of complex infections, as different organs might harbour different parasite strains/variants [23, 78]. A potential option to assess the complexity of Trypanosomatid infections in different organs is to use samples from reservoirs, such as dogs with visceral leishmaniasis in Brazil, where samples can be collected from any tissue post-mortem, with the approval of owners [23]. Similar analyses might be performed on other reservoirs for T. cruzi and T. brucei.

While polyploidy appears to be a relatively common occurrence within Trypanosomatids [14, 17–20], including an example of a two-species allotriploid [79], we detect few multiclonal infections in the Trypanosomatid read libraries examined here (Table 1; four L. donovani, one T. brucei and two T. cruzi). However, all these Trypanosomatid species do undergo some degree of sexual recombination and outcrossing [29], indicating that different genotypes must be present in the same insect at some point to undergo meiosis and outcrossing. We envisage two non-mutually exclusive explanations for these results. First, the occurrence of multiclonal infections may be underestimated. Laboratory culture of strains is likely to result in loss of genotypes that are less fit in culture, reducing the apparent complexity of the infections below our threshold of detection. Consistent with this view, a study of infections in HIV+ and HIV− visceral leishmaniasis cases in Ethiopia found that 6 of the 68 infections (9%) were multiclonal [31], and a study of canine leishmaniasis in the state of Mato Grosso, Brazil identified 9 multilocus (polyclonal) genotypes in different organs, out of 23 genotyped (39%) [23]. These studies indicate that substantial proportions of multiclonal infections do occur in some populations. Second, in other populations, multiclonal infections and outcrossing may be rare. Bottlenecks in the number of parasites transferred to or from vectors may reduce genetic diversity of infections at the outset, and long incubation of parasites within mammalian hosts with selection for the fittest parasite genotype may reduce genetically complex populations to a single clone. A reduction in within-host diversity would be expected to reduce outcrossing, unless vectors frequently feed on more than one host. We expect that the explanations will differ among species and populations, depending on the frequencies of transmission, endemicity and within-host/within-vector population dynamics. In our view, further study of these factors will enhance our understanding of transmission dynamics in Trypanosomatids.

Conclusions

The method we describe can accurately identify polyploid isolates, and can identify multiclonal infections in samples sequenced with modest read depth (> 25x), with as few as 100 heterozygous SNPs, and with as little as 5–10% of reads from the secondary genotype. We find that multiclonality and polyploidy are not frequent in cultured Trypanosomatid field isolates, although there are good reasons to expect that our estimates are lower bounds. Future projects could explore new sequencing methods to identify multiclonal infections, such as single-cell sequencing [80], which could directly identify different clones, and long-read sequencing followed by haplotype phasing, to identify different haplotypes in a sample [81–83]. These methods could quantify the number and proportion of the different clones in a mixed infection.

Declarations

Ethics approval and consent to participate
Not applicable

Consent for publication
Not applicable

Competing interests

The authors declare that they have no competing interests

Funding:

J.L.R.-C. and D.C.J. are supported by a MRC New Investigator Research grant (MR/T016019/1) and by MRC Newton as a component of the UK:Brazil Joint Centre Partnership in Leishmaniasis (MR/S019472/1).

Author Contribution

J.L.R.-C. and D.C.J. conceived and designed the study, and drafted and revised the manuscript.

Acknowledgement

We thank the funding agencies that provided funds for this study, the Medical Research Council (MRC) and the Newton UK:Brazil Joint Centre Partnership in Leishmaniasis. This project was undertaken on the Viking Cluster, which is a high-performance compute facility provided by the University of York. We thank the University of York High Performance Computing service, Viking, and the Research Computing team for computational support.

Data Availability

All the read libraries used in this study are available in NCBI (see Supplementary Tables 1, 2 and 3). The script and test set can be obtained from GitHub: https://github.com/jaumlrc/Complex-Infections.git

References

Burza S, Croft SL, Boelaert M. Leishmaniasis. Lancet. 2018;392:951–70.
Kennedy PGE. Update on human African trypanosomiasis (sleeping sickness). J Neurol. 2019;266:2334–7.
Horn D. A profile of research on the parasitic trypanosomatids and the diseases they cause. PLoS Negl Trop Dis. 2022;16:e0010040.
Vickerman K. Antigenic variation in trypanosomes. Nature. 1978;273:613–7.
Horn D. Antigenic variation in African trypanosomes. Mol Biochem Parasitol. 2014;195:123–9.
Stockdale C, Swiderski MR, Barry JD, McCulloch R. Antigenic variation in Trypanosoma brucei: joining the DOTs. PLoS Biol. 2008;6:e185.
Faria J, Briggs EM, Black JA, McCulloch R. Emergence and adaptation of the cellular machinery directing antigenic variation in the African trypanosome. Curr Opin Microbiol. 2022;70:102209.
De Pablos LM, Osuna A. Multigene families in Trypanosoma cruzi and their role in infectivity. Infect Immun. 2012;80:2258–64.
El-Sayed NM, Myler PJ, Bartholomeu DC, Nilsson D, Aggarwal G, Tran A-N, et al. The genome sequence of Trypanosoma cruzi, etiologic agent of Chagas disease. Science. 2005;309:409–15.
Herreros-Cabello A, Callejas-Hernández F, Gironès N, Fresno M. Trypanosoma Cruzi Genome: Organization, Multi-Gene Families, Transcription, and Biological Implications. Genes. 2020;11.
Gupta G, Oghumu S, Satoskar AR. Mechanisms of immune evasion in leishmaniasis. Adv Appl Microbiol. 2013;82:155–84.
Cardoso MS, Reis-Cunha JL, Bartholomeu DC. Evasion of the Immune Response by Trypanosoma cruzi during Acute Infection. Front Immunol. 2015;6:659.
Fernandes MC, Andrews NW. Host cell invasion by Trypanosoma cruzi: a unique strategy that promotes persistence. FEMS Microbiol Rev. 2012;36:734–47.
Reis-Cunha JL, Pimenta-Carvalho SA, Almeida LV, Coqueiro-Dos-Santos A, Marques CA, Black JA, et al. Ancestral aneuploidy and stable chromosomal duplication resulting in differential genome structure and gene expression control in trypanosomatid parasites. Genome Res. 2024;34:441–53.
Dumetz F, Imamura H, Sanders M, Seblova V, Myskova J, Pescher P, et al. Modulation of Aneuploidy in Leishmania donovani during Adaptation to Different In Vitro and In Vivo Environments and Its Impact on Gene Expression. MBio. 2017;8.
Rogers MB, Hilley JD, Dickens NJ, Wilkes J, Bates PA, Depledge DP, et al. Chromosome and gene copy number variation allow major structural change between species and strains of Leishmania. Genome Res. 2011;21:2129–42.
Matos GM, Lewis MD, Talavera-López C, Yeo M, Grisard EC, Messenger LA, et al. Microevolution of Trypanosoma cruzi reveals hybridization and clonal mechanisms driving rapid genome diversification. Elife. 2022;11.
Louradour I, Ferreira TR, Ghosh K, Shaik J, Sacks D. In Vitro Generation of Leishmania Hybrids. Cell Rep. 2020;31:107507.
Tihon E, Imamura H, Dujardin J-C, Van Den Abbeele J. Evidence for viable and stable triploid Trypanosoma congolense parasites. Parasit Vectors. 2017;10:468.
Black JA, Reis-Cunha JL, Cruz AK, Tosi LRO. Life in plastic, it’s fantastic! How Leishmania exploit genome instability to shape gene expression. Front Cell Infect Microbiol. 2023;13:1102462.
Balmer O, Tanner M. Prevalence and implications of multiple-strain infections. Lancet Infect Dis. 2011;11:868–78.
Domagalska MA, Imamura H, Sanders M, Van den Broeck F, Bhattarai NR, Vanaerschot M, et al. Genomes of Leishmania parasites directly sequenced from patients with visceral leishmaniasis in the Indian subcontinent. PLoS Negl Trop Dis. 2019;13:e0007900.
Cupolillo E, Cavalcanti AS, Ferreira GEM, Boité MC, Morgado FN, Porrozzi R. Occurrence of multiple genotype infection caused by Leishmania infantum in naturally infected dogs. PLoS Negl Trop Dis. 2020;14:e0007986.
Martinez-Perez A, Poveda C, Ramírez JD, Norman F, Gironés N, Guhl F, et al. Prevalence of Trypanosoma cruzi’s Discrete Typing Units in a cohort of Latin American migrants in Spain. Acta Trop. 2016;157:145–50.
Llewellyn MS, Rivett-Carnac JB, Fitzpatrick S, Lewis MD, Yeo M, Gaunt MW, et al. Extraordinary Trypanosoma cruzi diversity within single mammalian reservoir hosts implies a mechanism of diversifying selection. Int J Parasitol. 2011;41:609–14.
Pronovost H, Peterson AC, Chavez BG, Blum MJ, Dumonteil E, Herrera CP. Deep sequencing reveals multiclonality and new discrete typing units of Trypanosoma cruzi in rodents from the southern United States. J Microbiol Immunol Infect. 2020;53:622–33.
Balmer O, Caccone A. Multiple-strain infections of Trypanosoma brucei across Africa. Acta Trop. 2008;107:275–9.
Bose J, Kloesener MH, Schulte RD. Multiple-genotype infections and their complex effect on virulence. Zoology. 2016;119:339–49.
Gutiérrez-Corbo C, Domínguez-Asenjo B, Martínez-Valladares M, Pérez-Pertejo Y, García-Estrada C, Balaña-Fouce R, et al. Reproduction in Trypanosomatids: Past and Present. Biology. 2021;10.
Read AF, Taylor LH. The ecology of genetically diverse infections. Science. 2001;292:1099–102.
Franssen SU, Takele Y, Adem E, Sanders MJ, Müller I, Kropf P, et al. Diversity and Within-Host Evolution of Leishmania donovani from Visceral Leishmaniasis Patients with and without HIV Coinfection in Northern Ethiopia. MBio. 2021;12:e0097121.
Darvishi M, Yaghoobi-Ershadi MR, Shahbazi F, Akhavan AA, Jafari R, Soleimani H, et al. Epidemiological study on sand flies in an endemic focus of cutaneous leishmaniasis, bushehr city, southwestern iran. Front Public Health. 2015;3:14.
Chajbullinova A, Votypka J, Sadlova J, Kvapilova K, Seblova V, Kreisinger J, et al. The development of Leishmania turanica in sand flies and competition with L. major. Parasit Vectors. 2012;5:219.
Lypaczewski P, Matlashewski G. Leishmania donovani hybridisation and introgression in nature: a comparative genomic investigation. Lancet Microbe. 2021;2:e250–8.
MacLeod A, Turner CM, Tait A. A high level of mixed Trypanosoma brucei infections in tsetse flies detected by three hypervariable minisatellites. Mol Biochem Parasitol. 1999;102:237–48.
Balmer O, Stearns SC, Schötzau A, Brun R. Intraspecific competition between co-infecting parasite strains enhances host survival in African trypanosomes. Ecology. 2009;90:3367–78.
Ferreira TR, Sacks DL. Experimental Hybridization in Leishmania: Tools for the Study of Genetic Exchange. Pathogens. 2022;11.
Akopyants NS, Kimblin N, Secundino N, Patrick R, Peters N, Lawyer P, et al. Demonstration of genetic exchange during cyclical development of Leishmania in the sand fly vector. Science. 2009;324:265–8.
Inbar E, Akopyants NS, Charmoy M, Romano A, Lawyer P, Elnaiem D-EA, et al. The mating competence of geographically diverse Leishmania major strains in their natural and unnatural sand fly vectors. PLoS Genet. 2013;9:e1003672.
Romano A, Inbar E, Debrabant A, Charmoy M, Lawyer P, Ribeiro-Gomes F, et al. Cross-species genetic exchange between visceral and cutaneous strains of Leishmania in the sand fly vector. Proc Natl Acad Sci U S A. 2014;111:16808–13.
Nucleotide Sequence I. The sequence read archive. Nucleic acids. 2010.
Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018;34:i884–90.
Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv [q-bio.GN]. 2013.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–9.
Emms DM, Kelly S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 2019;20:238.
Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10.
Negreira GH, Monsieurs P, Imamura H, Maes I, Kuk N, Yagoubat A, et al. High throughput single-cell genome sequencing gives insights into the generation and evolution of mosaic aneuploidy in Leishmania donovani. Nucleic Acids Res. 2022;50:293–305.
Negreira GH, de Groote R, Van Giel D, Monsieurs P, Maes I, de Muylder G, et al. The adaptive roles of aneuploidy and polyclonality in Leishmania in response to environmental stress. EMBO Rep. 2023;24:e57413.
Lachaud L, Bourgeois N, Kuk N, Morelle C, Crobu L, Merlin G, et al. Constitutive mosaic aneuploidy is a unique genetic feature widespread in the Leishmania genus. Microbes Infect. 2014;16:61–6.
Reis-Cunha JL, Rodrigues-Luiz GF, Valdivia HO, Baptista RP, Mendes TAO, de Morais GL, et al. Chromosomal copy number variation reveals differential levels of genomic plasticity in distinct Trypanosoma cruzi strains. BMC Genomics. 2015;16:499.
Reis-Cunha JL, Valdivia HO, Bartholomeu DC. Gene and Chromosomal Copy Number Variations as an Adaptive Mechanism Towards a Parasitic Lifestyle in Trypanosomatids. Curr Genomics. 2018;19:87–97.
Franssen SU, Durrant C, Stark O, Moser B, Downing T, Imamura H, et al. Global genome diversity of the Leishmania donovani complex. Elife. 2020;9.
Louradour I, Ferreira TR, Duge E, Karunaweera N, Paun A, Sacks D. Stress conditions promote Leishmania hybridization in vitro marked by expression of the ancestral gamete fusogen HAP2 as revealed by single-cell RNA-seq. Elife. 2022;11.
González-de la Fuente S, Camacho E, Peiró-Pastor R, Rastrojo A, Carrasco-Ramiro F, Aguado B, et al. Complete and de novo assembly of the Leishmania braziliensis (M2904) genome. Mem Inst Oswaldo Cruz. 2018;114:e180438.
Talavera-López C, Messenger LA, Lewis MD, Yeo M, Reis-Cunha JL, Matos GM, et al. Repeat-Driven Generation of Antigenic Diversity in a Major Human Pathogen, Trypanosoma cruzi. Front Cell Infect Microbiol. 2021;11:614665.
Berry ASF, Salazar-Sánchez R, Castillo-Neyra R, Borrini-Mayorí K, Chipana-Ramos C, Vargas-Maquera M, et al. Immigration and establishment of Trypanosoma cruzi in Arequipa, Peru. PLoS ONE. 2019;14:e0221678.
Berry ASF, Salazar-Sánchez R, Castillo-Neyra R, Borrini-Mayorí K, Chipana-Ramos C, Vargas-Maquera M, et al. Sexual reproduction in a natural Trypanosoma cruzi population. PLoS Negl Trop Dis. 2019;13:e0007392.
Kay C, Peacock L, Williams TA, Gibson W. Signatures of hybridization in Trypanosoma brucei. PLoS Pathog. 2022;18:e1010300.
Reis-Cunha JL, Baptista RP, Rodrigues-Luiz GF, Coqueiro-Dos-Santos A, Valdivia HO, de Almeida LV, et al. Whole genome sequencing of Trypanosoma cruzi field isolates reveals extensive genomic variability and complex aneuploidy patterns within TcII DTU. BMC Genomics. 2018;19:816.
Almeida LV, Coqueiro-Dos-Santos A, Rodriguez-Luiz GF, McCulloch R, Bartholomeu DC, Reis-Cunha JL. Chromosomal copy number variation analysis by next generation sequencing confirms ploidy stability in Trypanosoma brucei subspecies. Microb Genom. 2018;4.
Schwabl P, Imamura H, Van den Broeck F, Costales JA, Maiguashca-Sánchez J, Miles MA, et al. Meiotic sex in Chagas disease parasite Trypanosoma cruzi. Nat Commun. 2019;10:3972.
Weir W, Capewell P, Foth B, Clucas C, Pountain A, Steketee P, et al. Population genomics reveals the origin and asexual evolution of human infective trypanosomes. Elife. 2016;5:e11473.
Zackay A, Cotton JA, Sanders M, Hailu A, Nasereddin A, Warburg A, et al. Genome wide comparison of Ethiopian Leishmania donovani strains reveals differences potentially related to parasite survival. PLoS Genet. 2018;14:e1007133.
Grace CA, Sousa Carvalho KS, Sousa Lima MI, Costa Silva V, Reis-Cunha JL, Brune MJ, et al. Parasite Genotype Is a Major Predictor of Mortality from Visceral Leishmaniasis. MBio. 2022;13:e0206822.
Assefa SA, Preston MD, Campino S, Ocholla H, Sutherland CJ, Clark TG. estMOI: estimating multiplicity of infection using parasite deep sequencing data. Bioinformatics. 2014;30:1292–4.
Zhong D, Koepfli C, Cui L, Yan G. Molecular approaches to determine the multiplicity of Plasmodium infections. Malar J. 2018;17:172.
Reis-Cunha JL, Valdivia HO, Bartholomeu DC. Trypanosomatid Genome Organization and Ploidy. Front Parasitol. 2017;:61–103.
Velásquez-Ortiz N, Herrera G, Hernández C, Muñoz M, Ramírez JD. Discrete typing units of Trypanosoma cruzi: Geographical and biological distribution in the Americas. Sci Data. 2022;9:360.
Perez-Molina JA, Poveda C, Martinez-Perez A, Guhl F, Monge-Maillo B, Fresno M, et al. Distribution of Trypanosoma cruzi discrete typing units in Bolivian migrants in Spain. Infect Genet Evol. 2014;21:440–2.
Cura CI, Lucero RH, Bisio M, Oshiro E, Formichelli LB, Burgos JM, et al. Trypanosoma cruzi discrete typing units in Chagas disease patients from endemic and non-endemic regions of Argentina. Parasitology. 2012;139:516–21.
Perez E, Monje M, Chang B, Buitrago R, Parrado R, Barnabé C, et al. Predominance of hybrid discrete typing units of Trypanosoma cruzi in domestic Triatoma infestans from the Bolivian Gran Chaco region. Infect Genet Evol. 2013;13:116–23.
Pilling OA, Reis-Cunha JL, Grace CA, Berry ASF, Mitchell MW, Yu JA, et al. Selective whole-genome amplification reveals population genetics of Leishmania braziliensis directly from patient skin biopsies. PLoS Pathog. 2023;19:e1011230.
Clarke EL, Sundararaman SA, Seifert SN, Bushman FD, Hahn BH, Brisson D. swga: a primer design toolkit for selective whole genome amplification. Bioinformatics. 2017;33:2071–7.
Leichty AR, Brisson D. Selective whole genome amplification for resequencing target microbial species from complex natural samples. Genetics. 2014;198:473–81.
Cai W, Nunziata S, Rascoe J, Stulberg MJ. SureSelect targeted enrichment, a new cost effective method for the whole genome sequencing of Candidatus Liberibacter asiaticus. Sci Rep. 2019;9:18962.
Martin S, Heavens D, Lan Y, Horsfield S, Clark MD, Leggett RM. Nanopore adaptive sampling: a tool for enrichment of low abundance species in metagenomic samples. Genome Biol. 2022;23:11.
De Meulenaere K, Cuypers WL, Gauglitz JM, Guetens P, Rosanas-Urgell A, Laukens K, et al. Selective whole-genome sequencing of Plasmodium parasites directly from blood samples by nanopore adaptive sampling. MBio. 2024;15:e0196723.
Santi-Rocca J, Fernandez-Cortes F, Chillón-Marinas C, González-Rubio M-L, Martin D, Gironès N, et al. A multi-parametric analysis of Trypanosoma cruzi infection: common pathophysiologic patterns beyond extreme heterogeneity of host responses. Sci Rep. 2017;7:8893.
Van den Broeck F, Heeren S, Maes I, Sanders M, Cotton JA, Cupolillo E, et al. Genome Analysis of Triploid Hybrid Leishmania Parasite from the Neotropics. Emerg Infect Dis. 2023;29:1076–8.
Nawy T. Single-cell sequencing. Nat Methods. 2014;11:18.
Maestri S, Maturo MG, Cosentino E, Marcolungo L, Iadarola B, Fortunati E, et al. A Long-Read Sequencing Approach for Direct Haplotype Phasing in Clinical Settings. Int J Mol Sci. 2020;21.
Kronenberg ZN, Rhie A, Koren S, Concepcion GT, Peluso P, Munson KM, et al. Extended haplotype-phasing of long-read de novo genome assemblies using Hi-C. Nat Commun. 2021;12:1935.
Hosch S, Wagner P, Giger JN, Dubach N, Saavedra E, Perno CF, et al. PHARE: a bioinformatics pipeline for compositional profiling of multiclonal Plasmodium falciparum infections from long-read Nanopore sequencing data. J Antimicrob Chemother. 2024;79:987–96.

Editorial decision: Revision requested
23 Jul, 2024
Reviews received at journal
20 Jul, 2024
Reviews received at journal
19 Jul, 2024
Reviews received at journal
16 Jul, 2024
Reviewers agreed at journal
08 Jul, 2024
Reviewers agreed at journal
05 Jul, 2024
Reviewers agreed at journal
01 Jul, 2024
Reviewers agreed at journal
01 Jul, 2024
Reviewers invited by journal
28 Jun, 2024
Editor assigned by journal
28 Jun, 2024
Submission checks completed at journal
27 Jun, 2024
First submitted to journal
27 Jun, 2024
