The CRISPR locus is an important genetic locus that can be used for bacterial typing in molecular epidemiology analyses [39]. Whereas CRISPR-based typing and comparison of strains has become an established technique for human pathogens, it has remained largely unexplored for plant pathogens [50]. To the best of our knowledge, only a few studies have been published, mostly on a single plant pathogen, E. amylovora [42,43,51]. Very recently, two CRISPR loci, one of which displayed sufficient complexity for being used as a strain subtyping technique, were reported from X. fragariae [44]. CRISPR data, analyzed from a collection of 55 X. fragariae strains, yielded a genetic structure in agreement with that derived from MLVA data targeting 27 microsatellites and 9 minisatellites.
Presence of CRISPR loci in citrus-infecting xanthomonads
In the present study, we analyzed 57 strains of X. citri for the presence of CRISPR loci. Our results demonstrated that both the cas1 gene and the CRISPR array are conserved in all 56 strains of X. citri pv. citri. However, our PCR screen failed to amplify a cas1 gene or a CRISPR array in the X. citri pv. bilvae strain NCPPB 3213. We conclude that at least this X. citri pv. bilvae strain does not have an X. citri pv. citri-type CRISPR/Cas system, which is supported by the absence of CRISPR-related sequences in its draft genome sequence. Notably, other xanthomonads infecting citrus, such as Xanthomonas citri pv. aurantifolii (strains 1566, FDC1559, FDC1561, FDC1609, ICPB 10535, ICPB 11122) and Xanthomonas euvesicatoria pv. citrumelo (strain F1, synonymous to FL1195), also lack CRISPR loci, as indicated by the absence of cas genes and CRISPR arrays in their genome sequences. Hence, CRISPR loci appear to be restricted to X. citri pv. citri among citrus-infecting xanthomonads, and our results demonstrate that the cas1 gene is a useful diagnostic marker for the presence or absence of the CRISPR/Cas system and could be used to differentiate citrus pathogens of the genus Xanthomonas.
CRISPRs in X. citri pv. citri are suited to a simple PCR-based genotyping tool
Compared to other strains of Xanthomonas, such as X. oryzae pv. oryzae [40,41], the CRISPR locus of X. citri pv. citri is rather small. Most strains of X. citri pv. citri have only 23 or fewer spacers, while strains of X. oryzae pv. oryzae have been found to possess between 37 (Xo604) and 77 spacers (Xo21). Consequently, the small size of the X. citri pv. citri CRISPR loci allowed the use of simple conventional PCR to resolve the genetic diversity of different X. citri pv. citri strains. The PCR screening revealed considerable size variation of CRISPR loci among strains of X. citri pv. citri, suggesting that these loci consist of different numbers of spacer/repeat units as a result of spacer deletion or acquisition during their evolutionary history. Analysis of spoligotypes showed that all X. citri pv. citri strains except CFBP 2911 share 23 or fewer spacers, and that the leader-proximal spacer, which corresponds to the most recently acquired spacer, is conserved in most X. citri pv. citri strains (Figure 2). This means that these strains differ only by the loss of one or more of the 23 spacers. The fact that 23 unique sequences make up the repertoire of spacers suggests that this set of strains originated from a common ancestor that harbored all 23 spacers. Strain CFBP 2852 represents the oldest isolate in our set of strains (Table 3), yet it already lacks spacer Xcc_14. It would be interesting to go further back in time by analyzing the spacer repertoire of herbarium specimens that date back to the beginning of the 20th century [52].
Correlations among different DNA-based typing methods
Correlation analyses of AFLP versus CRISPR or minisatellite-based typing (MLVA–31) data revealed a fairly good congruence between these methods. We found more AFLP haplotypes (49 haplotypes) and MLVA–31 haplotypes (37 haplotypes) than CRISPR spoligotypes (25 haplotypes). Hence, the AFLP method appears to better resolve the genetic diversity among strains of X. citri pv. citri than the two other methods but suffers from technical limitations making interlaboratory comparisons difficult to achieve, a characteristic that precludes a wide use for epidemiosurveillance [17].
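Congruence between two typing methods, as summarized by the Mantel values in Table 2, can be illustrated with a permutation-based Mantel test. The following is a minimal sketch and not the implementation used in this study: the function names are hypothetical, and the two 4 × 4 distance matrices are toy values that do not correspond to real strains.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def upper_triangle(matrix, order):
    """Flatten the upper triangle, taking rows and columns in `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def mantel(d1, d2, permutations=999, seed=0):
    """Return (r, p): matrix correlation and one-sided permutation p-value."""
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    at_least_as_large = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel the strains of the second matrix
        if pearson(x, upper_triangle(d2, order)) >= r_obs:
            at_least_as_large += 1
    return r_obs, (at_least_as_large + 1) / (permutations + 1)


# Toy pairwise distances between four hypothetical strains.
d_method_1 = [[0, 1, 4, 5], [1, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]]
d_method_2 = [[0, 1, 3, 4], [1, 0, 2, 5], [3, 2, 0, 1], [4, 5, 1, 0]]

r, p = mantel(d_method_1, d_method_2)
print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```

With only four strains there are just 24 possible relabelings, so the p-value of this toy example is not informative; the sketch is meant to show the mechanics only.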
In general, strains belonging to a certain spoligotype clade in the CRISPR tree also cluster together in the AFLP tree (Figure 3). Exceptions were two strains, LH001–3 (spoligotype 23) and LG117 (spoligotype 25), with exceptionally small numbers of spacers, 12 and 8, respectively, which might explain their misplacement in comparison to the AFLP, MLVA–31 and SNP analyses [17,18,19]. For instance, strain LH001–3 clusters with strains LD007–1, LE116–1, LE117–1, LH37–1 and NCPPB 3562 in the AFLP analysis. The latter five strains belong to spoligotype 15. Strikingly, a single recombinational event, leading to a deletion of spacers Xcc_11 to Xcc_21, could transform spoligotype 15 into spoligotype 23 of strain LH001–3. Evolutionarily speaking, such a scenario would place both strains close to each other. Similarly, both AFLP and spoligotyping cluster strains CFBP 2852, JW160–1, LB100–1 (spoligotype 2), JF090–8, JJ238–10, LG100 (spoligotype 4), NCPPB 3612 (spoligotype 5), NCPPB 3610 (spoligotype 6), and LG116 (spoligotype 7). In addition, the AFLP analysis also places strain LG117 in this cluster. Again, just two recombinational events, deleting spacers Xcc_01 to Xcc_03 and spacers Xcc_12 to Xcc_23, could transform spoligotype 2 into spoligotype 25 of strain LG117.
Indeed, the algorithm used considers only binary information about the presence or absence of individual spacers, and no software is publicly available that considers the minimal number of necessary mutations for tree construction based on spoligotype data. For example, strain NCPPB 3562 contains spacers Xcc_01 to Xcc_13 and spacers Xcc_19 to Xcc_23. In contrast, strain LH001–3 contains only spacers Xcc_01 to Xcc_10 and spacers Xcc_22 to Xcc_23, i.e. this strain lacks six spacers in comparison to strain NCPPB 3562 (Xcc_11, Xcc_12, Xcc_13, Xcc_19, Xcc_20, Xcc_21), thus resulting in a large apparent distance, which does not necessarily reflect the ‘true’ evolutionary distance. However, incorrect placement of a small number of strains is a common feature of many genotyping techniques. This was observed for a few host-restricted strains (JF090–8 and a few relatives), which clustered with pathotype A genetic lineage 2 strains when assayed by minisatellite-based typing (MLVA–31), whereas SNP analysis from complete genome data unambiguously assigned them to pathotype Aw [18,19]. These strains had previously been erroneously assigned to pathotype A*, as they had a Mexican lime-restricted host range and AFLP-based methods did not show any close genetic relatedness to other host-restricted A* or Aw strains [17]. This incorrect placement of a few strains by both spoligotyping and minisatellite-based typing may explain the lower Mantel value between these two techniques, as compared to the values obtained for each of these techniques when compared to AFLP (Table 2).
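The difference between a binary spacer distance and the number of deletion events can be made concrete with a short sketch. It is illustrative only and is not the software used for tree construction in this study: the function names are hypothetical, the spacer content of strains NCPPB 3562, LH001–3 and LG117 is taken from the text, and the content of spoligotype 2 (all 23 spacers except Xcc_14) is an assumption inferred from the text.

```python
def spacers(*ranges):
    """Build an ordered spacer list from inclusive (first, last) ranges."""
    return [n for first, last in ranges for n in range(first, last + 1)]


def binary_distance(a, b):
    """Number of spacers present in exactly one of the two arrays."""
    return len(set(a) ^ set(b))


def deletion_events(ancestor, descendant):
    """Minimal number of contiguous deletions turning ancestor into descendant.

    Returns None if the descendant holds a spacer the ancestor lacks,
    because deletions alone cannot then explain the difference.
    """
    kept = set(descendant)
    if not kept <= set(ancestor):
        return None
    events = 0
    in_gap = False
    for spacer in ancestor:  # walk along the ancestor array in order
        if spacer in kept:
            in_gap = False
        else:
            if not in_gap:
                events += 1  # a new run of missing spacers starts here
            in_gap = True
    return events


ncppb_3562 = spacers((1, 13), (19, 23))     # spoligotype 15
lh001_3 = spacers((1, 10), (22, 23))        # spoligotype 23
spoligotype_2 = spacers((1, 13), (15, 23))  # assumed: only Xcc_14 missing
lg117 = spacers((4, 11))                    # spoligotype 25

print(binary_distance(ncppb_3562, lh001_3), deletion_events(ncppb_3562, lh001_3))
# prints: 6 1
print(binary_distance(spoligotype_2, lg117), deletion_events(spoligotype_2, lg117))
# prints: 14 2
```

Because spacers Xcc_14 to Xcc_18 are already absent from NCPPB 3562, the six spacers missing in LH001–3 are adjacent in the parental array and count as a single event, whereas the binary distance counts each spacer separately.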
Distinguishing pathotypes A and A*
Interestingly, the pathotype A and pathotype A* strains of our dataset, which differ in their citrus host range, can be distinguished from each other by the presence or absence of spacer Xcc_06, which corresponds to the first deletion event in the evolution of pathotype A* spoligotypes. Knowledge of the pathotype is important for disease management and has consequences for regulatory measures. However, conventional determination of pathotypes is laborious, as it requires assaying citrus plants. Moreover, some PCR-based techniques failed to accurately identify pathotype A* strains [53,54]. Apart from whole genome sequence data, the most straightforward method for distinguishing pathotype A* from the other X. citri pv. citri pathotypes is currently MLVA–31 (or its derivative MLVA–12) targeting minisatellites [18,19].
We suggest considering spacer Xcc_06 as a first line of evidence for the identification of pathotype A* strains, using a PCR that combines a spacer Xcc_06-specific primer and a primer annealing to the conserved terminator region, which would be a highly discriminatory assay. Analysis of publicly available genomic resources further confirmed the interest of spacer Xcc_06 as a diagnostic marker. Yet, it cannot be ruled out that hitherto undiscovered spoligotypes exist that could undermine such a diagnostic PCR. It is therefore necessary to sequence more CRISPR arrays or genomes, which would (i) help in estimating the discriminatory power of such an approach at a given geographical scale and (ii) allow complementary PCR schemes to be designed, if necessary.
Origin of spacers
CRISPR arrays represent a signature of the long history of interactions between bacteria and bacteriophages or other extrachromosomal genetic elements. To understand the evolution of CRISPR loci, it is of interest to know where the spacer sequences derive from. To elucidate their origin, we performed BLASTN searches against NCBI GenBank. In addition to the hits in the CRISPR loci of completely sequenced X. citri pv. citri strains, we found significant hits between spacer sequences and five Xanthomonas bacteriophages, a finding that supports the principal mechanism of the CRISPR immune system in bacteria. Homologies with Xanthomonas bacteriophage CP1 (GenBank accession number AB720063) were found for spacers Xcc_36, Xcc_28 and Xcc_25 (Additional file 7: Table S1). Four bacteriophages (CP1, CP2, CP115 and CP122) have been used for quarantine purposes to classify X. citri pv. citri strains based on their phage sensitivity [55,56]. Strains of X. citri pv. citri were variable in their sensitivity to bacteriophages CP1 and CP2 [55,57]. Genomic analyses of bacteriophages CP1 and CP2 reported that the CP1 DNA sequence was detected in the genome sequences of X. campestris bacteriophage phiL7 (GenBank accession number EU717894), X. oryzae bacteriophage OP1 (GenBank accession number AP008979) and Xanthomonas bacteriophage Xp10 (GenBank accession number AY299121) [45]. In addition, a sequence in the genomic contig of the Ralstonia-related blood disease bacterium R229 (GenBank accession number FR854082) was related to spacer Xcc_31; this sequence encodes a DNA-dependent DNA polymerase with homology to DNA polymerases of the Xanthomonas-specific bacteriophages phiL7, OP1 and Xp10 [58,59,60]. Possibly, the genomic sequence of the blood disease bacterium R229 corresponds to a prophage with similarity to Xanthomonas-specific bacteriophages. Therefore, spacer Xcc_31 was likely acquired from a bacteriophage.
Xanthomonas bacteriophages f20-Xaj and f30-Xaj also matched with several spacers of the 14 unique spacers of strain CFBP 2911 (Additional file 7: Table S1). Those two bacteriophages are closely related to each other and belong to the same clade as X. citri pv. citri bacteriophage CP2 [47]. Taken together, this evidence supports the hypothesis that the aforementioned spacers have been acquired from alien DNA most likely derived from bacteriophage CP1 and CP2, which were originally isolated from X. citri pv. citri strains [61].
Using less stringent thresholds (E-value smaller than 1 and no minimum criterion with respect to coverage of the query sequence), we also found a match for spacer Xcc_37 in the Xanthomonas bacteriophage CP1, and a few more bacteriophage-related matches for spacers Xcc_31, Xcc_28, Xcc_27, Xcc_11, and Xcc_10 (Additional file 7: Table S1). With even less stringent criteria, there are also matches with bacteriophage sequences for spacers Xcc_34 (bacteriophage CP1), Xcc_32 (bacteriophage CP1), Xcc_11 (Streptomyces phage Yaboi), and Xcc_2 (Microbacterium phage MementoMori) (data not shown). However, as demonstrated in Additional file 7: Table S1, relaxing the threshold results in an increased number of matches in genomes of diverse bacteria. Therefore, we cannot conclude whether these are bona fide homologs, the sequences of which have diverged over the long time since these spacers were acquired, or merely false positives.
Only four of the 23 older spacers matched sequences in GenBank that did not correspond to the CRISPR arrays of X. citri pv. citri. In all four cases, homology to sequences from integrated prophages or from a filamentous bacteriophage was observed. It was surprising that none of the older and conserved 23 spacers matched a sequence from a bacteriophage genome, whereas all the observed hits of the CFBP 2911-specific spacers corresponded to sequences from bacteriophages that have been isolated over the last 50 years. It is not clear whether this observation is merely due to sampling effects or whether it reflects the fact that the sources of the 23 old spacers became extinct and only a few of the homologous sequences were vertically inherited and thus preserved in the form of prophages or remnants thereof.
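The effect of relaxing the search thresholds can be sketched as a simple filter over tabulated BLASTN hits. This is a hedged illustration only: the `Hit` record, the function names, the three toy hits and the values in `STRICT` are hypothetical and are not taken from this study; only the relaxed setting (E-value smaller than 1, no minimum query coverage) follows the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    spacer: str      # query spacer (placeholder names below)
    subject: str     # description of the matched sequence
    evalue: float    # BLAST E-value of the alignment
    align_len: int   # number of spacer positions covered by the alignment
    query_len: int   # full length of the spacer


def coverage(hit):
    """Fraction of the spacer covered by the alignment."""
    return hit.align_len / hit.query_len


def filter_hits(hits, max_evalue, min_coverage=0.0):
    """Keep hits with an E-value below max_evalue and sufficient coverage."""
    return [h for h in hits
            if h.evalue < max_evalue and coverage(h) >= min_coverage]


# Hypothetical stringent setting; the actual cut-offs are not stated here.
STRICT = {"max_evalue": 1e-3, "min_coverage": 0.9}
# Relaxed setting as described in the text.
RELAXED = {"max_evalue": 1.0, "min_coverage": 0.0}

# Toy hits with made-up values, for illustration only.
toy_hits = [
    Hit("spacer_A", "phage_X", 1e-8, 34, 34),
    Hit("spacer_B", "phage_Y", 0.4, 20, 34),
    Hit("spacer_C", "unrelated_genome", 5.0, 17, 34),
]

strict_hits = filter_hits(toy_hits, **STRICT)
relaxed_hits = filter_hits(toy_hits, **RELAXED)
print(len(strict_hits), len(relaxed_hits))
```

Every hit that passes the strict filter also passes the relaxed one, so relaxing the thresholds can only add matches, which is why the additional matches need to be weighed against the risk of false positives.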
Multiple genetic events have contributed to the CRISPR array diversity within X. citri pv. citri
It is interesting to note that these strains did not acquire new spacers after spacer Xcc_23. Only strain CFBP 2911 acquired 14 new spacers next to the leader sequence, which are not present in any other X. citri pv. citri strain that we have analyzed (Figure 2). This finding can be explained by three scenarios. The first explanation, that these 14 new spacers were deleted in all X. citri pv. citri strains but CFBP 2911, is very unlikely because CFBP 2911 does not represent an ancestral clade at the root of the X. citri pv. citri phylogeny [18]. Second, it is possible, but also unlikely, that none of the 56 strains except for CFBP 2911 has been challenged by alien DNA elements, such as bacteriophages or plasmids, since they acquired spacer Xcc_23. We favor the third hypothesis that the CRISPR immunity system was mutationally inactivated in its ability to acquire new spacers in the ancestor of all 56 X. citri pv. citri strains in our dataset, yet the CRISPR/Cas system was maintained during evolution as a mechanism of protection against bacteriophage infection. Possibly, a revertant evolved which regained the function of spacer acquisition, giving rise to strain CFBP 2911. Given the important role of the Cas proteins for spacer acquisition in the CRISPR/Cas system, we compared the sequences of the cas gene cluster of strain CFBP 2911 with those of other strains. However, we did not find any differences in the Cas protein sequences between CFBP 2911 and the other strains that could explain the regained CRISPR/Cas activity in strain CFBP 2911 (Additional file 8: Figure S7). Interestingly, the csd1/cas8c genes of the majority of strains suffer from a frame-shift mutation due to a short tandem repeat of two base pairs (AG). Yet, strain CFBP 2911 is not the only one that has an intact copy of this gene. Therefore, the reason why strain CFBP 2911 acquired 14 extra spacers is still unclear.
For further insight, it would be interesting to analyze more strains from the same region as CFBP 2911 (i.e. Pakistan), on the assumption that they might have undergone the same evolutionary event(s).
In addition, we found two cases of IS element insertions in CRISPR loci of X. citri pv. citri. One insertion occurred in the repeat between spacers Xcc_20 and Xcc_21 (LB302, LB305, LG115 and NCPPB 3608) and another insertion occurred in spacer Xcc_18 (LG097) (Figure 2, Additional files 3 to 5: Figures S3 to S5). The first four strains originated from India (LG115, NCPPB 3608) and Florida (LB302, LB305). Notably, these strains were all assigned to pathotype Aw and genetic lineage 3 based on minisatellite typing [19]. Interestingly, spacer Xcc_14 was deleted from strains LB302, LB305 and LG115, whereas NCPPB 3608, probably representing the ancestral spoligotype of our dataset, had all 23 spacers. Our results thus further confirm an Indian origin of the Aw strains from Florida, in agreement with outbreak investigation and previously produced genotyping data [18,19,62]. Insertion of IS elements can therefore be another source of polymorphism, as frequently observed in the CRISPR locus of M. tuberculosis [63,64]. Depending on the spoligotyping scheme, insertion of an IS element into either the direct repeat or spacer sequences can influence the spoligotype pattern, resulting in apparent deletion of CRISPR sequence [65]. In such cases, binary data of the spoligotype might be unable to provide sufficient information to accurately establish genotypic relationships among bacterial isolates. This limitation needs to be considered when using spoligotyping data for molecular epidemiological strain tracking and phylogenetic analyses of pathogens [65].
Genealogy of CRISPR spoligotypes
Since the CRISPR arrays of all strains originated from a conserved array of 23 spacers, one can use this information to establish an evolutionary trajectory among the observed spoligotypes. To build such an evolutionary pathway, one could aim to minimize the number of mutational events that are necessary to connect all spoligotypes with each other. Yet, without additional information it is impossible to be absolutely certain about a given scenario because several alternatives might exist with a similar number of postulated mutation (deletion) events. Here, we took advantage of the availability of genome sequence data for 42 out of the 56 X. citri pv. citri strains, which were used to build a robust phylogenetic tree based on whole genome alignment upon removal of regions with signs of recombination [18]. These data provided information about the evolutionary relationships among 21 spoligotypes. Only spoligotypes 7, 13, 21, 21, and 23 were not covered by full genome data. In these cases, information was taken from global studies using AFLP and MLVA data [17,19]. Based on these phylogenetic datasets, which can be considered as evolutionarily neutral, we were able to manually build trees for all observed spoligotypes, with one tree representing pathotype A and another tree representing pathotype A* strains (Figures 4 and 5). Future work including strains representing a larger temporal scale, e.g. from herbarium specimens [52], together with approaches to build time-calibrated phylogenies will help to assess the speed of the molecular CRISPR clock [66].
The phylogenetic trees for pathotype A and A* strains demonstrate the utility and power of spoligotyping for assessing the genealogy of bacterial strains. The pathotype A strains fall into two clades that are distinguished by three early deletion events (Figure 4). One clade consists of two strains from Bangladesh, LG097 and LG102 (spoligotypes 3 and 22, respectively). These two spoligotypes likely derived from a hypothetical intermediate spoligotype (missing link labelled “A” in Figure 4) that lacks spacers Xcc_03 and Xcc_21. The second clade consists of strains that all lack spacer Xcc_14. Loss of spacer Xcc_14, which is represented by strains from India, Bangladesh and the Seychelles (spoligotype 2), can thus be considered as an early event in the evolution of this clade, possibly in connection with the Indian subcontinent being regarded as a likely area of origin of X. citri pv. citri [19,67]. Interestingly, this clade contains two spoligotypes that correspond to strains from West Africa, spoligotype 15 (which also contains strain NCPPB 3562 from India) and spoligotype 14 (which also contains three strains from Brazil, FDC2017, FDC1083 and IAPAR306). Since all the West African strains were isolated after 2005, while the other strains were isolated up to 25 years earlier, it is tempting to speculate that X. citri pv. citri has been introduced into West Africa at least twice, once from the Indian subcontinent and once from South America. Strikingly, this observation is backed by (i) mini- and microsatellite data, where spoligotype 15 corresponds to DAPC cluster 2 and spoligotype 14 corresponds to DAPC cluster 1 [21], and (ii) whole genome data [18].
The pathotype A* strains fall into two clades that are distinguished by the presence or absence of spacers Xcc_9 and Xcc_10 (Figure 5). One clade is restricted to strains from Cambodia and Thailand (spoligotypes 16 and 17), which result from an evolutionary pathway that involved at least four spacer/repeat deletion events (spacers Xcc_03, Xcc_08, Xcc_14/Xcc_15, Xcc_19). The other clade also shows strong geographic structuring. Spoligotype 18, which only contains strains from Iran, probably evolved by two deletion events (spacers Xcc_03/Xcc_04 and spacers Xcc_18/Xcc_19) from spoligotype 9, which only contains strains from Saudi Arabia. Spoligotypes 12 and 13 are restricted to strains from Ethiopia, while spoligotypes 11 and 24 correspond to strains from Pakistan and India, respectively, with their ancestral spoligotype 10 consisting of strains from India, Oman and Saudi Arabia. These findings are consistent with previous minisatellite and whole genome sequence analyses [18,19].
Five of the seven pathotype Aw strains have been sequenced, which also allows their phylogenetic reconstruction [18]. Strain JF090–8 from Oman (1986) diverged early, and its spoligotype 4 can be considered as the ancestor of spoligotype 7, which underwent a subsequent deletion of spacer Xcc_7 (strain LG116 from India, 2006). Spoligotype 1*, which is represented by strain NCPPB 3608 from India (1988) and contains all 23 ancestral spacers, can be considered as the founder of a distinct clade that is characterized by the acquisition of an IS element between spacers Xcc_20 and Xcc_21. Genomic data indicate that strains LG115 (India, 2007), LB302 (Florida, USA, 2002) and LB305 (Florida, USA, 2003), corresponding to spoligotype 2*, are descendants of a spoligotype–1* strain that underwent a deletion of spacer Xcc_14 [18]. Therefore, and because of their geographic separation, it is likely that the deletions of spacer Xcc_14 in spoligotypes 2* and 4 were independent events; hence, effects of homoplasy need to be considered when drawing conclusions from spoligotyping. Nevertheless, we conclude that CRISPR elements provide a new and useful framework for the genealogy of the citrus canker pathogen X. citri pv. citri.
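The parsimony idea, and the ambiguity it leaves, can be illustrated with a small sketch that lists, for a given spoligotype, all candidate parents reachable by the fewest contiguous deletions. It is not the procedure used here, where trees were built manually with the help of genome, AFLP and MLVA data. All names are hypothetical; the spacer content of `type_15` and `type_23` follows strains NCPPB 3562 and LH001–3 as described in the text, while `ancestral_23` and `type_2` (all spacers except Xcc_14) are assumptions.

```python
def spacer_range(*ranges):
    """Ordered spacer list built from inclusive (first, last) ranges."""
    return [n for first, last in ranges for n in range(first, last + 1)]


def deletion_blocks(parent, child):
    """Contiguous deletions needed to turn parent into child, else None."""
    kept = set(child)
    if not kept < set(parent):  # child must be a strict subset of parent
        return None
    blocks = 0
    previous_deleted = False
    for spacer in parent:
        deleted = spacer not in kept
        if deleted and not previous_deleted:
            blocks += 1
        previous_deleted = deleted
    return blocks


def candidate_parents(name, arrays):
    """Names of all arrays that can yield `name` with the fewest deletions."""
    costs = {}
    for other, array in arrays.items():
        if other == name:
            continue
        cost = deletion_blocks(array, arrays[name])
        if cost is not None:
            costs[other] = cost
    if not costs:
        return []
    best = min(costs.values())
    return sorted(parent for parent, cost in costs.items() if cost == best)


arrays = {
    "ancestral_23": spacer_range((1, 23)),       # assumed full array
    "type_2": spacer_range((1, 13), (15, 23)),   # assumed: lacks Xcc_14 only
    "type_15": spacer_range((1, 13), (19, 23)),  # as in NCPPB 3562
    "type_23": spacer_range((1, 10), (22, 23)),  # as in LH001-3
}

for name in arrays:
    print(name, "<-", candidate_parents(name, arrays))
```

In this toy set, `type_23` can be reached by a single deletion from each of the three larger arrays, so counting deletion events alone cannot identify its parent; this is the kind of tie that independent phylogenetic data help to resolve.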