The sequencing of the eight zucchini varieties resulted in an average of 40 million raw reads with a minimum value of 28,804,339 reads in “Nano verde di Milano” (NV) and a maximum value of 52,915,127 reads in “Whitaker”. The average mapping rate of the trimmed and filtered reads ranged from 98.7–90.54%. The higher mapping value was provided by “Lungo Bianco di Sicilia” (BDS) and the lower value by “Striato d'Italia” (SI). We identified an average of 1.3 million of raw variants, including SNPs and IN/DEL, in each sample, and 4.7 million raw variants considering the eight varieties all together. After filtering, the number of SNPs was reduced to an average of 485,120 per sample and 3,831,907 in the all-varieties dataset and the number of IN/DEL was reduced to an average of 50,968 per sample and 387,243 in the all-varieties dataset (Table S1). The nucleotide changes identified in our analysis were classified based on their presence in the population. Specifically, 150,217 variants (141,714 SNPs, 4,596 insertions and 3,907 deletions) resulted common to 95% of individuals. Additionally, 2,111,951 variants (1,915,792 SNPs, 96,602 insertions and 99,557 deletions) were observed in the 10 to 95% of individuals. Lastly, the 10% of the individuals was affected by 1,302,967 unique variants (1,180,581 SNPs, 56,966 insertions and 65,420 deletions).
By comparing the distribution of variants along the chromosomes (Figure S1) we inferred the different density of variants between varieties. “Whitaker” showed the higher frequency, while 381e and 968Rb showed the lowest number of variants. The higher frequency observed in “Whitaker” may due by the higher number of mapped reads and to its genetic background [30]. In general, the distribution of variants found along the different chromosomes was homogenous. (Fig. 1; Figure S1). The average read depth across the eight varieties was 29.7, with “Whitaker” exhibiting the lowest value (22.3) and 968Rb the highest (45.6) (Fig. 1, Table S1).
Population structure and genetic variability
In order to infer the genetic structure of the population an admixture-based clustering analysis was performed. Two suitable subpopulation structures (K = 4 and K = 3) were obtained (Fig. 2). The structure based on four subgroups showed a first cluster, including BDS, SPQ and SI and a second cluster composed of 381e and 968Rb and NV, showing mixed background, “Whitaker” and OF remained ungrouped (Fig. 2A). Applying a value of K = 3, OF was merged with the first subgroup (BDS, SPQ and SI) while NV was included within the 381e and 968Rb subgroup (Fig. 2B).
PCA built up on genotypic data displayed that the varieties were divided into three major clusters according to the morphological characteristics. The first two principal components (PC1 and PC2), explained 38% and 18.4% of the total variance, respectively (Fig. 3A). The clustering obtained through PCA is in accordance with the results obtained in the previously admixture-based analysis. Zucchini varieties clustered together, NV clustered with zucchini varieties, close also to the Cocozelle group. In the latter group, SI and BDS are located very close and OF was located at bottom right side. “Whitaker” was on the opposite quadrant close to the left horizontal axis.
The consensus phylogenetic tree (Fig. 3B) confirmed that the varieties were organized in two clusters belonging to Cocozelle and Zucchini group and that “Whitaker” was separated from the rest of the accessions.
The association of alleles at different loci was estimate using LD decay (Fig. 4). Appling the test to the whole varieties collection, LD was low (r2 < 0.3), approximately 60 kb. Hereafter, LD decay was calculated also separately for Zucchini and Cocozelle group, showing to be variable among the different groups. Overall, the results showed a slightly faster decay in Cocozelle group in comparison with Zucchini group. Within groups, the LD decay of Zucchini group was r2 = 0.6, 100kbp, it was lower for Cocozelle group (60kbp; r2 = 0.28) (Fig. 4).
Private SNPs identification and functional annotation
High quality homozygous and heterozygous alternative sites, differing from the homozygous reference allele were identified in all varieties. This resulted in 1.18 million unique SNPs across the eight varieties, although 815 thousand of these private sites were identified in “Whitaker” (Table 1). A SnpEff analysis was performed on the variants in all eight varieties together (all-varieties variant calling set) to identify a set of genetic variants having a potential biological effect. The majority of potential functional SNPs displayed a modifier impact, followed by the low impact, moderate impact and high impact (Table 1). In our study, we identified a total of 49,828 high-effect SNPs that are specific to only one variety, in total, 983 genes have been significantly affected by these SNPs across all eight varieties examined (Table S2). These SNPs may play a crucial role in the observed phenotypic differences among the different varieties. Each zucchini variety had its own set of distinct genetic differences in comparison to the others. “Whitaker” and 381e stood out as they possessed the highest number of unique or private SNPs. In the case of “Whitaker”, this is likely due to its derivation from multiple parental sources, which could account for the increased genetic diversity and unique SNPs. Additionally, 381e possesses introgressions linked to its resistance against ZYMV, where three associated QTLs (SNP1, SNP2, and SNP3) were identified [31, 32]. This finding could explain the abundance of private SNPs observed in this variety. We observed an uneven distribution of unique SNP proportions across different chromosomes, with a notable higher proportion of unique SNPs in variety 381e on chromosomes 1, 8, and 18. Also, 968Rb possesses an introgression associated with PM resistance [33] and exhibits a higher proportion of unique SNPs on chromosome 10 (Table S3). This finding suggested that 381e and 968Rb genomes hold specific regions with higher diversity, potentially related with their ZYMV and PM resistances respectively.
Table 1
Unique SNPs in the eight varieties. BDS (Lungo Bianco di Sicilia), NV (Nano Verde di Milano), SPQ (San Pasquale), OF (Ortolano di Faenza), SI (Striato di Italia), “Whitaker”, 381E and 968Rb.
| BDS | NV | SPQ | OF | SI | WHITAKER | 381E | 968Rb |
Nº of SNPs | 52,928 | 20,129 | 39,323 | 70,163 | 47,207 | 815,131 | 128,336 | 7,364 |
% high effect SNPs | 0.043 | 0.037 | 0.063 | 0.051 | 0.036 | 0.04 | 0.048 | 0.039 |
% low effect SNPs | 2.085 | 2.207 | 2.15 | 2.144 | 2.015 | 2.31 | 3.049 | 1.798 |
% moderate effect SNPs | 1.42 | 1.55 | 1.489 | 1.514 | 1.337 | 1.531 | 1.779 | 1.223 |
% modifier effect SNPs | 96.452 | 96.206 | 96.298 | 96.291 | 96.612 | 96.119 | 95.124 | 96.939 |
Genes involved in plant and fruit morphological traits
As a result of the SNP calling performed for the eight varieties, a multitude of variants related to the morphological traits were discovered. All genotypes tested showed a bush growth habit phenotype, except BDS and NV that exhibited a semi-trailing growth habit. Gene candidates for the dwarf phenotype (Table S4), involved to the gibberellin metabolic pathway were assesses to identify variants. Interestingly, the C. pepo gene Cp4.1LG10g05910, involved in the dwarf growth habit [34] not exhibited variants either in BDS than in NV. By contrast, variants in Cp4.1LG12g07890, an orthologue of a gibberellin 3-beta-dioxygenase 2-like gene related to the dwarf phenotype in C. moschata [35] were found only in BDS and NV. Likewise, Cp4.1LG18g03150, annotated as gibberellin regulated protein had variants both in BDS and NV, suggesting that the semi-trailing habitus could be related to mutated genes in the gibberellin pathway. In addition, variants in 6 genes related to the expansin family were identified in the same genotypes (Cp4.1LG02g13490, Cp4.1LG04g14850, Cp4.1LG04g04730 and Cp4.1LG15g06270). Finally, two SNPs in the purine permease 3 and nitrate regulatory gene 2 protein-like (Cp4.1LG11g04650 and Cp4.1LG11g08000), reported as related to growth habit traits [36, 37] were also observed in semi-trailing genotypes. Four different orthologues genes, described as involved in leaf shape, showed SNPs with modifier effect only in the medium lobed leaf genotypes (BDS, NV and SPQ). While the deeply lobed leaf genotypes OF, “Whitaker”, SI, 381e and 968Rb showed the same allele as the reference genome MU-CU-16, classified as deeply incised leaves [38]. Variants in BDS, NV, and SPQ were observed in key genes associated with leaf development. Specifically, in the class I KNOTTED1-like homeobox proteins (Cp4.1LG02g06180), involved in regulation of the leaf shape in tomato [39], and in Cp4.1LG07g06690 and Cp4.1LG09g01450, Homeobox-leucine zipper proteins, transcription factors related to deeply lobed leaf structures [40]. Additionally, the AP2-like ethylene-responsive transcription factor ANT (Cp4.1LG11g01170), involved in palmately lobed leaf traits [41], exhibited variations (Table S4).
The eight varieties clearly showed phenotypic variability in fruit shape. The four Cocozelle varieties (SI, BDS, SPQ, OF) displayed variants with moderate or modifier effect in six genes encoding a WUSCHEL-related homeobox like, involved in the LOCULE NUMBER (LC) trait [14, 42] (Table S4). Likewise, five genes annotated as calmodulin-binding protein showed SNPs with moderate modifier effect (Table S4). SNPs with modifier effect have also been observed, in these four accessions for genes involved in fruit morphology, such as receptor protein kinase CLAVATA1-like (Cp4.1LG02g08230) [43]; protein LONGIFOLIA 1-like (Cp4.1LG04g14520 and Cp4.1LG20g00830) [13, 44]; OVATE FAMILY PROTEINS (OFPs) (Cp4.1LG19g08040) [45] and E3 ubiquitin-protein ligase RING1-like (Cp4.1LG11g11400) [12] (Table S4).
The fruit color is another important factor involved in fruit quality. Seven of the eight varieties have the same rind color, being BDS the only white variety, which allowed us to search for SNPs with for alternative allele respect the reference genome, (MU-CU-16, green rind). By filtering for SNPs with high and moderate effect, we found BDS variants in two genes related to the carotenoids pathway (Cp4.1LG03g0478 and Cp4.1LG05g11360) and in two genes related to the chlorophyll pathway (Cp4.1LG04g08850 and Cp4.1LG13g11170) [46]. In addition, three genes previously reported as up-regulated during the synthesis of carotenoids in C. pepo with orange and yellow peel [47] such as the transcription factor bHLH82-like (Cp4.1LG04g09000), the two-component response regulator-like APRR2 (Cp4.1LG05g02070) and the transcription termination factor MTERF4 (Cp4.1LG07g08650) showed variants in BDS. Furthermore, BDS variants with high or moderate effects were found in Cp4.1LG05g01000; Cp4.1LG05g10310; Cp4.1LG14g02760 and Cp4.1LG14g02300 (Table S4).
Variants in genes involved in biotic stress
Among the zucchini varieties analyzed in this study, there were two isogenic lines to TF, 968Rb and 381e, one resistant to powdery mildew (PM) and one tolerant to ZYMV, respectively, as well the variety SPQ reported as susceptible to both diseases [32]. In order to identify putative chromosomal regions harboring genes responsible for the different response to the fungus and virus, the variants distribution of the three genotypes was compared along the zucchini genome. A non-uniform pattern of SNP and IN/DEL distribution was observed for some chromosomes, especially for the chromosome 1, the chromosome 8 and the chromosome 10.
A higher variants density was identified in 968Rb, in a region ranging from 1.0 to 2.8 Mb at the beginning of the chromosome 10 (Fig. 5A). By analyzing in the detail this region, we found putative genes related to biotic stress affected by variants. Many genes with high or moderate effect were identified related to the receptor kinase, TF, cell wall metabolism etc. (Fig. 5B). Genes specifically affected in 968Rb included: a MAP3K5-like isoform X1 (Cp4.1LG10g02640) with an IN/DEL with disruptive inframe deletion, the dof zinc finger protein DOF1.4 (Cp4.1LG10g04940), the serine/threonine-protein kinase-like protein CCR1 (Cp4.1LG10g04780). In addition, a moderate effect IN/DEL influenced the 968Rb peroxidase (Cp4.1LG10g04730), located at the beginning of the chromosome 10 (at 1192530 Mb), implicated in defense to fungus. Among the cell wall related genes, a pectate lyase (Cp4.1LG10g04500) was altered by two IN/DELs with modifier effect. Interestingly, genes belonging to a cluster composed of TFs and resistance proteins were affected by variants (Fig. 5B, blue inset and table). In addition, 968Rb and SPQ showed the presence of variant with modifier effects in the susceptibility gene, MLO gene (Cp4.1LG10g05200, CpeMLO11), but in a different location. The Cp4.1LG10g03270 WAT1-related protein, Cp4.1LG10g03390 and Cp4.1LG10g03520 showed IN/DELs with moderate effect in 968Rb and in SPQ but not in 381e. The receptor kinase Cp4.1LG10g01720 and the kinase Cp4.1LG10g12160 showed in the three genotypes moderate and modifier variations with different location.
A relevant difference between the variant distribution in 381e and SPQ and n 968Rb was also identified on the chromosome 8. For what concerned the SNPs, the region ranging from 4.0 to 4.2 Mb, and the region around the 6 Mb showed a higher density in 381e. On the other hand, SPQ showed a high density of SNPs around 3 Mb (Fig. 6A, 6B).
Among the genes affected by variations, some are putatively related to biotic stress being involved in cell wall metabolism, small RNA mediated resistance, transcription, regulation of transcription, RNA processing, chromatin remodeling, and translation repression (Fig. 6B). The gene Cp4.1LG08g00040, harboring the LG08SNP4 reported by Amoroso et al [28], was affected in both 381e and SPQ by mutations, as well as the gene Cp4.1LG08g01460, containing LG08SNP1 [28].
On the chromosome 1, two regions with SNP density differing among the genotypes were identified. A region around 17-17.7 Mb showed a higher number of variants in 381e while a region around 6-6.4 Mb had a higher number of variants in SPQ (Fig. 7). In the region around 17 Mb, (Fig. 7A, red inset, Fig. 7B, zoom of the red inset), were highlighted genes affected by mutations only in 381e such as the N-acetyltransferase domain-containing protein (Cp4.1LG01g24380), the Histone acetyltransferase (Cp4.1LG01g20150), the leukotriene A-4 hydrolase homolog (Cp4.1LG01g24170.1), the protein RNA-directed DNA methylation 3-like (Cp4.1LG01g24630) and RNA binding proteins (Cp4.1LG01g19740.1, Cp4.1LG01g20260). In addition, Cp4.1LG01g20160 harbored high effect mutations in 381e as well as modifier mutations in all genotypes (16 in 381e and 5 in SPQ and 10 in 968Rb). The LG01SNP1, associated to ZYMV resistance [28] was located in the gene Cp4.1LG01g18580, that showed 42 modifier mutations in 381e. Near this gene there were two RNA polymerases (Cp4.1LG01g18460 and Cp4.1LG01g19120) with moderate and high variants in 381e.
By focusing on the region around 6 Mb of the chromosome 1 (Fig. 7A, blue inset), only SPQ showed a SNP peak (Fig. 7C, zoom of blue inset), including a cluster of four cucumisin-like genes (Cp4.1LG01g10880, Cp4.1LG01g10970, Cp4.1LG01g11000 and Cp4.1LG01g10900) with a series of moderate and modifiers mutations only in this genotype. In this area, among the others, the Cytochrome P450 (Cp4.1LG01g11290) was affected by a high number of variations.