The gf trait in watermelon is controlled by a simply inherited gene
The carotenoid composition and content in the mature flesh of COS and PI 192938 were analyzed by liquid chromatography-tandem mass spectrometry (LC-MS/MS) in our previous study (Fang et al., 2020). β-carotene and violaxanthin appeared to be the two pigments showing major differences in content between the two parental materials, and the contents in PI 192938 (16.133 ± 0.952 µg/g) were approximately 53.2-fold higher than those in COS (0.033 ± 0.004 µg/g). Based on the flesh color associated with the two pigments, we speculated that the high accumulation of β-carotene in PI 192938 may be the main reason for the gf trait (Fig. 1a). The flesh color of F1 was canary yellow, similar to that of COS, and four flesh color categories segregated in the F2 generation: gf, pale yellow, canary yellow and gf mixed with canary yellow (Fig. 1b). F2 individuals can also be divided into gf and non-gf groups (pale yellow, canary yellow and gf mixed with canary yellow, Fig. 1c to e). According to these classification criteria, the COS×PI 192938-F2 population in 2018 (84 individuals in total) consisted of 17 gf and 67 non-gf plants and fit a 1:3 (gf:non-gf) segregation ratio (χ2 = 1.016, p = 0.313), whereas in 2019, 279 F2 individuals consisted of 74 gf and 205 non-gf plants, which was also consistent with the 1:3 ratio (χ2 = 0.345, p = 0.557). In the backcross population between F1 and COS, none of the individuals exhibited the gf color. In the backcross population derived from F1 and PI 192938, 39 plants had the gf color, and 45 plants had a non-gf color, which corresponded to a ratio of 1:1 (χ2 = 0.429, p = 0.513, Table 1). Based on the above results, we conclude that the gf trait in watermelon flesh is controlled by a simply inherited gene and that pale yellow flesh is partly dominant to the gf color.
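The segregation tests above can be re-derived from the reported plant counts. The following is a minimal sketch, not the authors' analysis script: it assumes a Pearson chi-square goodness-of-fit test with one degree of freedom and no continuity correction, and the function name `chi_square_ratio` and the population labels are hypothetical.

```python
import math

def chi_square_ratio(observed, expected_ratio):
    """Pearson chi-square goodness-of-fit for two phenotype classes (1 degree of freedom)."""
    if len(observed) != 2 or len(expected_ratio) != 2:
        raise ValueError("this sketch only handles two classes (gf, non-gf)")
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# (gf, non-gf) counts taken from the text, paired with the tested gf:non-gf ratio.
populations = {
    "F2 2018": ((17, 67), (1, 3)),
    "F2 2019": ((74, 205), (1, 3)),
    "Backcross F1 x PI 192938": ((39, 45), (1, 1)),
}

for name, (counts, ratio) in populations.items():
    chi2, p = chi_square_ratio(counts, ratio)
    print(f"{name}: chi2 = {chi2:.3f}, p = {p:.3f}")
```

Under these assumptions, the computed statistics should agree with the reported χ2 and p values to within rounding.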
BSA-seq and recombinant-based delimitation of the gf locus to a 39-Kb region identified ClPsy1 as a candidate gene
After filtering low-quality and short reads, 125,549,810 and 60,739,330 clean read pairs were obtained for COS and PI 192938 with approximately 10.01 (28.76× depth coverage) and 8.56 (22.87× depth coverage) Gbp clean bases, respectively, with Q30 values above 89.89%. A total of 86.50% and 92.48% of the clean reads of COS and PI 192938, respectively, were successfully mapped to the reference genome. For the two bulked pools, 68,585,634 and 68,541,696 clean read pairs were generated from the gf pool (25.7× depth coverage and 92.35% properly mapped ratio) and the pale-yellow flesh pool (25.88× depth coverage and 93.16% properly mapped ratio), respectively, using the Illumina high-throughput sequencing platform. A total of 366,358 and 374,417 SNPs were identified between the reference genome and the two pools, respectively. The SNP index for each identified SNP was calculated, and the average SNP index was computed in 1-Mb windows with a 10-Kb sliding step. By combining the SNP index information from the gf color pool and the pale-yellow flesh pool, the ΔSNP index was calculated and plotted against the genome positions. According to the ΔSNP index value, an obvious signal related to the gf color was detected on chromosome 1, spanning approximately 2.99 Mb (from 8,912,000 bp to 11,900,000 bp, Fig. 2a).
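The ΔSNP-index calculation described above can be sketched as follows. This is an illustrative sketch only, not the pipeline used in the study: it assumes that the SNP index is the fraction of reads carrying the non-reference allele, that the ΔSNP index is the gf-pool index minus the pale-yellow-pool index, and that the averaging uses a 1-Mb window advanced in 10-Kb steps. The function names and the toy read counts are hypothetical.

```python
def snp_index(alt_depth, total_depth):
    """Fraction of reads carrying the non-reference allele at one SNP."""
    return alt_depth / total_depth

def delta_snp_index_windows(snps, chrom_length, window=1_000_000, step=10_000):
    """Average the delta SNP index in sliding windows along one chromosome.

    snps: list of (position, alt_gf, depth_gf, alt_pale, depth_pale) tuples.
    Returns (window_start, window_end, mean_delta) for every window containing SNPs.
    """
    deltas = [(pos, snp_index(alt_gf, dp_gf) - snp_index(alt_pale, dp_pale))
              for pos, alt_gf, dp_gf, alt_pale, dp_pale in snps]
    results = []
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        # Half-open window [start, end): a SNP exactly at `end` belongs to a later window.
        in_window = [d for pos, d in deltas if start <= pos < end]
        if in_window:
            results.append((start, end, sum(in_window) / len(in_window)))
        start += step
    return results

# Toy data (hypothetical read counts), using a coarse 500-Kb step for brevity.
toy_snps = [
    (100_000, 18, 20, 4, 20),     # delta = 0.9 - 0.2
    (500_000, 20, 20, 5, 20),     # delta = 1.0 - 0.25
    (1_500_000, 10, 20, 10, 20),  # delta = 0.5 - 0.5
]
windows = delta_snp_index_windows(toy_snps, chrom_length=2_000_000,
                                  window=1_000_000, step=500_000)
for start, end, mean_delta in windows:
    print(f"{start}-{end}: mean delta SNP index = {mean_delta:.3f}")
```

The nested scan over SNPs for every window keeps the sketch short. A genome-scale run would more likely sort SNPs by position and use a running sum.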
A total of 30 CAPS and two KASP markers evenly distributed in the BSA-seq chromosome segment were developed based on parental line resequencing data, and 10 markers (eight CAPS markers and two KASP markers) were used for initial mapping after polymorphism detection among COS, PI 192938 and their F1 generation. Individuals with a recessive phenotype (gf trait) from 2019 were selected for genotyping with the 10 polymorphic markers. The candidate region was narrowed to a physical distance of 290.214 Kb (from 9,272,322 bp to 9,562,536 bp) using 11 recessive-trait plants (including nine recombinants) between the CAPS markers Chr01_9272322 and Chr01_9562536 with one and two recombinants, respectively (Fig. 2b). To further narrow down the initial mapping region, a larger COS×PI 192938-F2 segregating population including 1,003 individuals was genotyped with the primary flanking markers Chr01_9272322 and Chr01_9562536 in the spring of 2020. A total of 20 recombinants were screened for further fine mapping of the gf gene. Another nine polymorphic markers were developed to genotype the 20 recombinants. The target trait genotype of the dominant recombinants was confirmed based on phenotypic segregation in their F3 families. Finally, the gf locus was delimited between the CAPS markers Chr01_9440282 and Chr01_9479366 (physical distance of approximately 39.08 Kb) with two and nine recombinants, respectively (Fig. 2c).
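The interval sizes quoted above follow directly from the flanking marker coordinates. A small sketch, assuming (as the text implies for Chr01_9272322 and Chr01_9562536) that each marker name encodes its chromosome and base-pair position; the helper names are hypothetical.

```python
def marker_position(marker_name):
    """Parse chromosome and base-pair coordinate from a name such as 'Chr01_9440282'."""
    chrom, pos = marker_name.split("_")
    return chrom, int(pos)

def interval_kb(left_marker, right_marker):
    """Physical distance in Kb between two markers on the same chromosome."""
    left_chrom, left_pos = marker_position(left_marker)
    right_chrom, right_pos = marker_position(right_marker)
    if left_chrom != right_chrom:
        raise ValueError("markers are on different chromosomes")
    return abs(right_pos - left_pos) / 1000

initial = interval_kb("Chr01_9272322", "Chr01_9562536")
fine = interval_kb("Chr01_9440282", "Chr01_9479366")
print(f"initial mapping interval: {initial:.3f} Kb")  # 290.214 Kb
print(f"fine-mapped interval: {fine:.3f} Kb")         # 39.084 Kb
```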
According to the watermelon reference genome, the 39.08-Kb region contained only two annotated candidate genes, Cla97C01G008760 and Cla97C01G008770. Cla97C01G008760 encodes a phytoene synthase protein (ClPsy1), and Cla97C01G008770 was annotated as a GATA zinc finger domain-containing protein. To identify the candidate gene for the gf locus, we first analyzed the genomic variations in the two candidate genes between the parental lines using resequencing data. No polymorphic sites were identified in Cla97C01G008770, whereas one nonsynonymous SNP, SNP9,448,870 (A→G, located in the first exon at the 9,448,870th bp position), was detected in the coding region of Cla97C01G008760 between COS and PI 192938. In COS, base A encodes glutamic acid (Glu), whereas in PI 192938, this base is mutated to base G, resulting in an amino acid change from Glu to lysine (Lys). To further confirm the sequence variation, we cloned the coding regions of the two candidate genes in COS and PI 192938. The same SNP was again found between the two parental lines. We further developed this nonsynonymous SNP into the KASP marker Chr01_9448870 and genotyped the F2 individuals from 2018 and 2019. As a result, plants with the ClPsy1 A:A genotype exhibited the gf color, whereas plants with the ClPsy1 G:A or G:G genotype showed a non-gf color, which indicated that Chr01_9448870 cosegregated with the phenotype in all the plants (Fig. 3a).
Although no variation in the Cla97C01G008770 gene sequence was found between COS and PI 192938, we found some polymorphic sites in the promoter region. To assess whether the two candidate genes differ in expression, we analyzed their expression patterns in COS and PI 192938 flesh tissues collected from different stages of flesh color formation. The results showed that the two parental lines exhibited similar expression trends across the five developmental stages (10, 18, 26, 34 and 42 DAP), and no significant difference in Cla97C01G008770 expression was found (Fig. 3b). For Cla97C01G008760 (ClPsy1), the two parental lines also showed similar expression patterns (the expression level was upregulated gradually during flesh maturation), but at 26 DAP, the allele of ClPsy1 in PI 192938 exhibited significantly higher expression than that in COS, and this increase continued to be observed until 42 DAP (mature stage, Fig. 3c). Our previous research also showed that 26 DAP may be an important developmental stage for flesh color formation (Fang et al., 2020). At this stage, the colored COS and PI 192938 watermelon flesh started to abundantly accumulate carotenoids. Hence, we hypothesized that ClPsy1 is the most likely candidate gene for the gf locus and is responsible for high β-carotene accumulation in watermelon.
Nucleotide variation in the ClPsy1 gene structure among natural watermelon accessions
To examine the allelic diversity of the ClPsy1 gene in natural watermelon groups, we examined the nucleotide variation of the ClPsy1 locus in 26 resequenced accessions with different flesh colors (red, orange, canary yellow, pale yellow, light green and white), including 18 C. lanatus, four C. mucosospermus and four C. amarus accessions (Fig. 4). SNP9,448,870 was still present, but this mutation was not correlated with flesh color among the different watermelon accessions and exhibited no obvious difference between the cultivated and wild-type watermelon groups. These results indicated that this site may not affect carotenoid accumulation. In addition to SNP9,448,870, another nonsynonymous SNP mutation, SNP9,448,438 (C→T, located in the first exon at the 9,448,438th bp position), was also detected. Interestingly, SNP9,448,438 existed only in the C. amarus group, resulting in an amino acid substitution from proline (Pro in the C. lanatus and C. mucosospermus groups) to serine (Ser in the C. amarus group).
To analyze the reason for the low ClPsy1 expression in COS, we cloned the 1,996-bp promoter sequence from four cultivated watermelon varieties: COS, PI 192938, LSW-177 (red flesh) and PI 635597 (canary yellow). The promoter sequences of 14 other cultivated watermelon accessions were extracted from their genome resequencing data. Interestingly, a total of six SNPs (SNP342, SNP598, SNP898, SNP1,257, SNP1,634 and SNP1,694) were detected in the COS promoter region compared with the other 17 watermelon accessions, which exhibited consistent promoter sequences. SNP598 and SNP1,257 were located in the MYC- and MYB-binding sites, respectively, whereas the other four SNPs were not located in the sequence of any cis-acting element (Fig. 5a).
MYB and MYC2 may be important transcription factors regulating the expression level of ClPsy1
We then used the RNA-seq data of COS and PI 192938 flesh tissues (collected at 18, 26 and 42 DAP, data not shown in this manuscript) to obtain an overview of the expression patterns of all MYB and MYC TFs. A total of 65 MYB TFs with reads per kilobase per million mapped reads (RPKM) values were detected, and only two MYB TFs (Cla97C10G196920 and Cla97C02G046390) exhibited expression tendencies similar to that of ClPsy1 in COS and PI 192938. A significant difference in expression began to be observed at 26 DAP in PI 192938 compared with COS and continued to be observed at 42 DAP (mature stage). Among the 22 MYC TFs, Cla97C06G112130 (annotated as a MYC2 transcription factor) also showed a significant difference in expression between COS and PI 192938 at 26 DAP. We further examined the expression levels of Cla97C10G196920, Cla97C02G046390 and Cla97C06G112130 in COS and PI 192938 flesh tissues collected at five developmental stages by qRT-PCR to verify the expression pattern. The results showed the same tendency as the RNA-seq data (Fig. 5b to d). LSW-177 is a red-fleshed watermelon accession with the same genotype in the gene sequence and promoter region as PI 192938. The RNA-seq data of COS and LSW-177 (red flesh) flesh tissues (BioProject number PRJNA338036) were also used to analyze the expression levels of Cla97C10G196920, Cla97C02G046390 and Cla97C06G112130. The results showed that the three TFs also presented a higher expression level in LSW-177 than in COS (Fig. 8a to c). According to the RNA-seq data of SRP012849, the expression levels of the three TFs in watermelon fruit rinds were also clearly lower than those in the flesh of 97103 (red flesh), which indicated that these TFs may be expressed in tissues with high carotenoid accumulation (Fig. 8d to f).
The conserved domains of Cla97C10G196920 (148 aa) and Cla97C02G046390 (110 aa) were extracted and compared against sequences in The Arabidopsis Information Resource (TAIR). The results showed that AtMYB21/AtMYB3, AtMYB24 and AtMYB57 were the top three homologs of Cla97C10G196920, and AtMYB59 and AtMYB48 were the top two homologs of Cla97C02G046390. Cla97C06G112130 has two conserved domains: one is the N-terminal domain of the bHLH-MYC and R2R3-MYB TFs (the N-terminal region of a family of MYB and MYC transcription factors, 156 aa), and the other belongs to the bHLH domain superfamily (70 aa); AtMYC2 exhibited the highest homology. We speculated that ClMYB and ClMYC2 may be two important TFs regulating ClPsy1 expression due to variations in their binding sites in the ClPsy1 promoter region between COS and PI 192938. Although MYB and MYC TFs have many functions, they have not been reported to play a role in flesh color formation in watermelon.
The gene expression and genotype variations in carotenoid pathway genes between COS and PI 192938 provide insight into gf trait formation in watermelon flesh
We examined the transcript abundances of ClPDS, ClZDS, ClCRTISO, ClLCYB, ClCHYB and ClNCED-7 in COS and PI 192938 flesh tissues collected at five developmental stages (10, 18, 26, 34 and 42 DAP) by qRT-PCR (Fig. 6a to f). ClPDS, ClZDS and ClCRTISO exhibited the same expression trend as ClPsy1 between the two parental lines. Their transcript abundance in PI 192938 was always higher than that in COS throughout all developmental stages, particularly at 26 DAP. It has been reported that ClLCYB regulates lycopene accumulation at the protein level (Zhang et al., 2020), and COS and PI 192938 have the same single-nucleotide mutations as red-fleshed watermelon accessions. The mutation of G676 to T676 altered the 226th amino acid from valine (Val) to phenylalanine (Phe), whereas the mutation of G1,305 to C1,305 altered the 435th amino acid from lysine (Lys) to asparagine (Asn). This finding indicated that the ClLCYB protein may have the same function in COS and PI 192938. The lycopene content in the mature flesh of the two parental lines was quite low compared with that in a red-fleshed variety identified in our previous study (Fang et al., 2020). The expression level of ClNCED-7 in both parental lines showed an increasing trend over time after pollination, and COS presented a significantly higher expression level than PI 192938, except at 26 DAP.
The sequence variations in genes encoding enzymes at each step of the carotenoid pathway were also analyzed between COS and PI 192938 using resequencing data. Three SNPs were found in the coding region of ClZDS in PI 192938 compared with that in COS, and two of these SNPs led to amino acid substitutions. Mutation of the 161st base (G→A) resulted in a change in the 54th amino acid from serine (Ser) to asparagine (Asn), and mutation of the 480th base (G→T) resulted in a change in the 160th amino acid from lysine (Lys) to asparagine (Asn). Only one nonsynonymous SNP was detected in ClCRTISO, and mutation of the 526th base (T→C) changed the 176th amino acid from tyrosine (Tyr) to histidine (His). No variations were detected in the coding region of ClPDS or ClNCED-7.
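The correspondence between the reported base positions and amino acid positions can be checked with simple codon arithmetic. The sketch below assumes that base positions are 1-based and counted from the first base of the start codon of each coding sequence; the function name `codon_location` is hypothetical.

```python
def codon_location(base_position):
    """Map a 1-based coding-sequence position to (amino acid index, position within codon)."""
    aa_index = (base_position - 1) // 3 + 1
    codon_pos = (base_position - 1) % 3 + 1
    return aa_index, codon_pos

# Base positions of the nonsynonymous substitutions reported in the text.
reported = [
    ("ClLCYB", 676),
    ("ClLCYB", 1305),
    ("ClZDS", 161),
    ("ClZDS", 480),
    ("ClCRTISO", 526),
]

for gene, base in reported:
    aa, pos = codon_location(base)
    print(f"{gene}: base {base} -> amino acid {aa}, codon position {pos}")
```

Under this assumption, the five base positions map to amino acids 226, 435, 54, 160 and 176, matching the residue numbers given in the text.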
According to the above-described results, we propose a possible explanation for the high β-carotene content in PI 192938 (Fig. 7). High expression of ClPsy1 in PI 192938 contributed to increased phytoene accumulation, and the abundance of phytoene may upregulate the expression of ClPDS, ClZDS and ClCRTISO at each step of the carotenoid metabolism pathway, resulting in the synthesis of higher amounts of zeta-carotene and tetra-cis-lycopene in PI 192938. Tetra-cis-lycopene could be isomerized by ClCRTISO to generate lycopene, which is the carotenoid upstream of β-carotene. The same genotype of the ClLCYB protein may have a similar cyclization effect in PI 192938 and COS, and nearly all lycopene can be cyclized into β-carotene. High expression of ClCHYB increased violaxanthin accumulation, whereas low expression of ClNCED-7 may limit β-carotene metabolism in PI 192938. These factors may be the main reasons for the high accumulation of violaxanthin and β-carotene in PI 192938 and thus its orange flesh color.