A collection of 832 pathogen strains sampled from 12 wheat cultivars formed a single population
We assembled a collection of 1924 Z. tritici isolates coming from time points in May (C1) and July (C3) during a natural epidemic on winter wheat cultivars from the GABI-panel planted in 2016 in Switzerland30 (Fig. 1a,b; Extended data Fig. 1; Extended data Fig. 2; Supplementary Data 1). We used 12 microsatellite markers to assess the genetic diversity of the collection and identify candidate isolates for whole genome sequencing. We identified 149 alleles in total, with an average of 12.5 alleles per locus (Supplementary Data 2-4)30. We found 1588 multi-locus genotypes (MLGs), of which 85.34% (1356) were found only once and 14.66% (232) at least two times (Extended data Fig. 3a; Supplementary Data 5). The May and July collections showed a high and nearly identical genotype richness and a low abundance of clones, with a Simpson index (lambda) of 0.998 and 0.999 and a genotype abundance (E) of 0.907 and 0.908, respectively (Supplementary Data 5). For collections from each wheat cultivar, the genotype abundance ranged from 0.85 (on Simano) to 0.97 (on Aubusson) (Supplementary Data 6). The high number of unique genotypes found during one epidemic at the field scale reflects the high genetic diversity of Z. tritici described in wheat fields31.
We performed an analysis of molecular variance (AMOVA) to search for genetic differentiation between collection time points and hosts after clone correction and filtering out genotypes with too much missing data and with rare alleles (MAF < 5%). We observed less than 1% of the molecular variance between collection time points (0.2% of variation) and between hosts within collection time points (0.3% of variation) (Extended data Fig. 3b; Supplementary Data 7). The molecular variance increased to 4.4% between samples within hosts, but most molecular variance (95%) was found between samples (Supplementary Data 7). After multi-locus clone correction, we sequenced 845 strains sampled from 12 wheat cultivars exhibiting a wide range of quantitative resistance (Supplementary Data 1; Extended data Fig. 2). We filtered out the low-quality samples and clones undetected with the SSR markers to obtain 5,551,288 high-confidence variants, including SNPs and short indels for 832 samples (Extended data Fig. 3c,d). A principal component analysis of 981,623 biallelic SNPs confirmed that the 832 Z. tritici genotypes from this single-field population belong to a single genetic cluster (Fig. 1c,d). We observed a few strains from the cultivars CH Camedo (n = 11, 16% of the CH Camedo collection) and Arobase (n = 7, 10% of the Arobase collection) that were slightly separated from the main cluster. These strains were collected from different plots and collection time points and could not be linked to meaningful population structure (Fig. 1d). Together, the population diversity and structure assessment of Z. tritici strains from our experimental field revealed a single genetic cluster that was stable throughout the epidemic, in agreement with previous assessments of Z. tritici population structure at small geographical scales32. This lack of genetic structure is consistent with the prior description of a single genetic cluster covering most European Z. tritici genotypes, including Switzerland16. The absence of population stratification enhances the statistical power of GWAS by removing confounding effects and the introduction of false positives, which is especially important for the study of complex traits such as host specialization7.
Using natural infection phenotypes to identify candidate genes for host specialization using GWAS
To better understand the genetic landscape of Z. tritici host specificity, we performed a genome-host association (GHA), using GWAS to identify genetic variants associated with specific wheat cultivars. Briefly, our approach is as follows: for each of the 12 cultivars, we statistically compared allele frequencies in Z. tritici strains isolated from a focal cultivar with a random subsample of Z. tritici strains isolated from the other 11 cultivars. Our assumption was that alleles important for infection in the focal cultivar would be enriched in the strains collected on this cultivar when compared to the overall Z. tritici population in the same field. We corrected for unbalanced group sizes with a bootstrapping approach33,34, i.e., the random subsampling was done 100 times and each GWAS was performed with a linear mixed model35, on variants with minor allele frequency higher than 5% and with less than 20% missing data (Fig. 2a). This resulted in 1200 GWAS tests based on from 928,182 to 978,139 variants (Supplementary Data 8). As a proof-of-concept, we included the cultivar Runal in our 12-host panel. In Runal, quantitative resistance to Swiss strains was already demonstrated to involve a gene-for-gene interaction, including one avirulence factor and a major resistance gene20,36,37. The GHA performed using Z. tritici strains sampled from Runal identified one significantly associated locus that encompassed the recently reported avirulence gene Avr3D1 (Extended data Fig. 4) 20,36,37. This finding illustrated that our GHA approach identifies genes involved in host specialization. We note that the AvrStb6 locus was not significantly associated with any cultivar even though we included Arina, which is known to carry the Stb6 and Stb15 resistance genes 17,19. As our approach relies on an allelic enrichment on a particular host, this suggests the Stb6 resistance gene is widely distributed across our panel of wheat cultivars, as already shown for many wheat varieties and landraces5,38. This illustrates a possible limitation of our approach, which could be avoided by using a tailored experimental design that includes cultivars known to possess specific resistance genes.
We identified 398 single nucleotide polymorphism (SNP) variants associated with 12 different wheat cultivars. The number of variants associated with host specialization on the other 11 cultivars ranged from 13 (Forel) to 88 (CH Camedo) after Bonferroni correction (α = 0.05)(Fig. 2b,c, Extended data Fig. 5, Extended data Fig. 6, Supplementary Data 9). Only one SNP was significantly associated with the cultivar CH Claro, but since it barely reached the Bonferroni correction threshold, we did not consider this locus further (Extended data Fig. 6). We identified 158 genes with more than one overlapping or neighboring significantly associated SNP, including 73 genes containing more than two significant SNPs in their sequences (Supplementary Data 9). Forty-eight of these genes (65%) were annotated to encode diverse functions, including functions that would be expected to affect host-specificity, such as predicted effectors (4 genes), transcription factors (3 genes), plant cell wall degrading enzymes and peptidases (4 genes) and proteins containing kinase-domains (3 genes) expected to affect regulatory processes39. We counted a minimum of two genes specifically associated with CH Combin to a maximum of 20 genes specifically associated with CH Camedo (Supplementary Data 9). These findings suggest that our GHA identified a wide array of functions likely to be associated with host specialization under natural field conditions, which is consistent with the complex genetic architecture of host specialization in the Z. tritici-wheat pathosystem10,36,40. In addition, these results illustrate that molecular plant-microbe interactions extend far beyond the canonical gene-for-gene model of effector-triggered immune responses leading to host resistance41,42 and likely involve a wide range of molecular functions that contribute to promoting pathogen infection and adaptation to a specific host environment1,2,43,44. In plants, GWAS has confirmed the polygenic and dynamic genetic basis of disease resistance7. Here, we depict a similar picture of a polygenic host adaptation that occurs in pathogens.
Functional validation of two candidate genes involved in host specialization
Next, we describe more fully the five most significant loci identified in our GHA analyses. The complete list of significant associations and corresponding effects are compiled in Supplementary Data 9. The top five associations were found on chromosome 1 for Simano and Arina, chromosome 2 for CH Combin, and chromosome 9 for Aubusson and Arobase. Three out of these five loci overlap with genes of unknown functions. The Aubusson-associated locus found on chromosome 9 encompasses three overlapping annotated genes (i.e., ZtIPO323_098020, ZtIPO323_098030, and ZtIPO323_098040; Extended data Fig. 6b, Supplementary Data 9). Interestingly, one gene in the same region (<1kbp downstream of our locus) was previously identified by a GWAS approach to be associated with necrotic lesions on the cultivar Salamouni10, suggesting this region is important for Z. tritici virulence. The cultivar Salamouni contains the three major resistance genes: Stb13, Stb14, and StbSm317. Although we do not know which major STB resistance genes may be present in Aubusson, we speculate that at least one of these genes could be linked to virulence for either or both of these cultivars. The two other strong associations were found in genes with predicted functions related to signaling functions, potentially underlying mechanisms related to response to host quantitative resistance. The second most significant locus, found on chromosome 1 for cultivar Arina, pinpoints a gene encoding a bZIP transcription factor (Extended data Fig. 5f, Supplementary Data 9). Transcription factors can be involved in pathogenicity and host-specific infection mechanisms, including regulation of growth and virulence in Z. tritici 45. A bZIP transcription factor of the vascular wilt pathogen Verticillium dahliae was shown to be crucial for pathogenicity by mediating the regulation of nitric oxide46. Nitric oxide is produced in plants in response to infection and is known to be involved in defense signaling and plant cell death mechanisms 47. This finding, combined with other signaling-related genes such as the predicted MAP kinases we found significantly associated with different host phenotypes, highlights the importance of signaling networks in Z. tritici infections, which is concordant with transcriptomics analyses of Z. tritici-wheat interactions48. Altogether, these findings indicate that our novel GWAS approach provides an effective tool for identifying novel candidate genes involved in host specialization.
We conducted functional studies for two of the top five candidate genes for host specialization, focusing on Arobase, one of the most susceptible cultivars, and Simano, one of the most resistant cultivars in our panel (Fig. 3, Fig 4). The GWA mapping using Arobase strains revealed a strong host-specific association in one gene, ZtIPO323_10370, that encodes a 740 amino acid protein of unknown function (Fig. 3a, b, Supplementary Data 9). The frequency distribution of the most significant SNPs of ZtIPO323_10370 shows an enrichment of the alternative allele in strains from Arobase (present in 85% of the strains - Figure 3c). The GHA mapping using Simano strains identified a predicted effector gene (ZtIPO323_003260; Fig. 4a, b, Supplementary Data 9). The alternative allele of ZtIPO323_003260 was enriched in strains infecting Simano (present in 96% of the strains – Fig. 4c). In both cases, we found the enriched alternative allele was also present in strains found on other wheat cultivars, supporting the hypothesis that these alleles may be generally advantageous for pathogenicity while being especially advantageous on specific hosts. To functionally test the importance of these candidate genes, we generated knockout (KO) lines by targeted gene disruption in the virulent reference strain ST99CH_1A5. Strain ST99CH_1A5 carries the alternative alleles of ZtIPO323_10370 (i.e., Zt1A5_G9462 in ST99CH_1A5) and ZtIPO323_003260 (i.e., Zt1A5_G299 in ST99CH_1A5). We phenotyped the ST99CH_1A5ΔG9462 and ST99CH_1A5ΔG299 KO lines in planta by comparing their virulence (measured as percentage of leaf area covered by lesions, PLACL) on Arobase and Simano, respectively, with their virulence on the susceptible cultivar Drifter, which was not included in our host panel (Fig. 3d,e; Fig. 4d,e). The ST99CH_1A5ΔG9462 lines (n = 2 independent KO) displayed a significant decrease in virulence on Arobase (median PLACL of 14% and 2%) compared to the wild-type strain (median PLACL of 55%) after 21 days post-infection (Fig. 3d, e). On Drifter, the median PLACL reached 70% to 80% on leaves of the two independent ST99CH_1A5ΔG9462 lines, while the wild type had a median PLACL of 92% on the same cultivar. Similarly, the ST99CH_1A5ΔG299 lines (n = 2 independent KO, median PLACL of 23% and 15%) presented a lower PLACL on Simano at 21 days post-infection compared to the wild-type strain (median PLACL of 92%)( Fig. 4d, e). For the two independent ST99CH_1A5ΔG299 lines, the median PLACLs on Drifter were estimated at 74% and 92%, while the wild type showed 93% PLACL. The KO lines showed a generally higher or non-significantly different in vitro growth compared to the wild types (Supplementary information 2, Supplementary Note 1). Taken together, these results indicate that both of the tested candidate genes affect host specialization, with a potential involvement in effector-triggered susceptibility2. We consider it notable that the Arobase pathogenicity-related gene ZtIPO323_10370 had an unknown function and does not carry a predicted secretion signal that is needed to be defined as an effector3,49. Recent studies in the hemibiotroph Pyricularia oryzae have reported an unconventional secretion system that depends on codon usage and tRNA modification for the delivery of cytoplasmic effectors independently of the endoplasmic reticulum-Golgi pathway 2,50. Though this mechanism has yet to be described in Z. tritici, we did find orthologous genes corresponding to the Uba4–Urm1 system of P. oryzae 50 in the functional annotations of IPO323 (i.e. ZtIPO323_075920 and ZtIPO323_028380, respectively)51. This could suggest a similar unconventional secretion of cytoplasmic effectors in Z. tritici, which may partially explain why many non-secreted protein genes of unknown functions can be found in different Z. tritici-wheat association studies8,10. More in-depth functional characterization of these two genes will be needed to better understand how they promote host-specific Z. tritici infection, but the functional validation of these candidate genes illustrates the power of our GHA approach to identify genes involved in host specialization.