Sequencing, variants and population structure of Guangxi tomatoes
In this study, a total of 605 tomato accessions were used for genotyping and subsequent GWAS analysis. Among these accessions, a diverse global collection of 539 accessions were genotyped in previous studies 30, 31,32, and 66 newly sequenced accessions, which showed vines and had small leaves and small red fruits with thin skin, were collected from the mountainous area of Guangxi Province, China (Supplementary Table 1 and Supplementary Fig. 1). A total of 6.25 billion 100-bp paired-end reads were obtained for these Guangxi (GX) accessions, representing a mean depth of 9.85× coverage of the tomato genome (Supplementary Table 1). A total of 4,412,112 million high-quality SNPs were called from the sequencing data of the 605 accessions.
To determine the evolutionary status of the GX tomatoes, we inferred the phylogenetic relationships of the 605 tomato accessions using the genome-wide SNPs. Largely consistent with the previously reported phylogeny30, these tomatoes were divided into six groups (Fig. 1a), which was further supported by the principal component analysis (PCA) and population structure analysis (Fig. 1b,c). Interestingly, GX tomato accessions clustered together and divided the S. lycopersicum var. cerasiforme (SLC) group into two subgroups: South American SLC and Non-South American SLC, suggesting that the GX accessions were introduced to China from South America after domestication and then underwent independent evolution.
The nucleotide diversity decreased from the S. pimpinellifolium (SPIM) group (π=2.61×10-3) to the SLC group (π=1.3×10-3) and to the S. lycopersicum var. lycopersicum (SLL) group (π=0.57×10-3) and GX group (π=0.5×10-3), indicating that a large amount of genetic diversity has been lost for GX tomatoes, possibly because of geographical isolation and narrow ancestral genetic background. Large population divergences between the GX group and the other three groups were observed (Fig. 1d), suggesting that GX tomatoes in China might have accumulated some genetic diversity after their introduction.
Phenotypic variation in the tomato population
In this study, a total of 27 yield-related traits, including fruit number in the second spike (FRNS), fruit number in the third spike (FRNT), flower number on the second inflorescence (FLNS), flower number on the third inflorescence (FLNT), sepal number (SN), petal number (PN), first inflorescence node (FIN), ovary transverse diameter (OTD), ovary longitudinal diameter (OLD), ovary transverse diameter to ovary longitudinal diameter ratio (OTLD), stigma exsertion (SE), stigma shape (SS), sepal length to petal length ratio (SPR), stomatal density (SD), first to second inflorescence node (FSIN), internode length (IL), petal length (PL), stamen length (STAL), stigma length (STIL), sepal length (SL), indeterminate or determinate meristem (IDM), stamen length to (stigma length+ovary longitudinal diameter) ratio (SSR), fruit stalk diameter (FSD), fruit stalk length (FSL), inflorescence type (IT) and fasciated flower (FL), were investigated during the whole growth period of tomato with three replications (Supplementary Fig. 2 and Supplementary Note). These agronomic traits were classified into three categories: four organ location traits, nine organ number traits and 14 organ size traits (Supplementary Tables 2 and 3). For the majority of the agronomic traits, an abundant variation was detected, with the coefficients of variation (CV) ranging from 0.12 for PN to 2.92 for FLNS (Supplementary Table 2). All traits determined in the diverse global collection of tomato accessions (Supplementary Table 3) displayed a broad-sense heritability (H2) greater than 0.7, and 14 had heritability over 0.9 (Supplementary Table 2), suggesting these traits were primarily determined by genotype. Most of the traits showed a normal distribution while PN, SN, FSIN, FSD, FSL and OTD showed skewed distributions (Supplementary Fig. 3). Except IL, the phenotypic values of all agronomic traits were significantly different among three subgroups of tomato (SPIM, SLC and SLL) (Supplementary Fig. 4). For example, the ovary size (OLD and OTD) increased during tomato domestication and improvement, while the flower number (FLNS and FLNT) and fruit number (FRNS and FRNT) declined (Supplementary Fig. 4). Many of the 27 traits were correlated (Supplementary Fig. 5), and the correlation coefficients between some traits were very high, such as SL and PL (r = 0.85), IL and FSIN (r = 0.88), and OTD and SN (r = 0.83). Negative correlation between ovary size (OLD and OTD) and flower and fruit number (FLNS, FLNT, FRNS and FRNT) supported the hypothesis that fruit size was positively selected, and the number of fruits decreases during the process of tomato breeding (Supplementary Fig. 5).
Genome-wide association study for 27 yield-related traits
To reveal the genetic architecture of the vegetative and reproductive organ development in tomato, we performed GWAS on the 27 yield-related traits in the 605 tomato accessions using the genome-wide SNPs. The Manhattan plots of GWAS for all 27 traits are shown in Supplementary Figures 6-32, and detailed information about all significant associations is summarized in Supplementary Table 4.
In tomato, an association locus has been defined as a chromosomal region in which the distance between the adjacent pairs of associated SNPs is less than 200 kb33. According to this definition, a total of 239 suggestive associations (including 148 significant associations) corresponding to 129 loci were identified (Fig. 2 and Supplementary Table 4). For organ location, organ number and organ size traits, 28, 85, and 126 associated loci were identified, respectively (Table 1). On average, each trait had ~9 identified associated loci, with SSR having 20 associated loci whereas SPR only one. We identified four potential GWAS hotspots (density>0.03) which fit perfectly with known QTLs involved in the regulation of the growth and development of tomatoes on chromosomes 2 (LC, locule number), 3 (FA, falsiflora), 6 (SP, self-pruning), and 11 (FAS, fascinated), respectively (Supplementary Fig. 33). For example, the locus overlapping with LC was simultaneously detected for 11 different traits including FLNS, FRNS, FLNT, FRNT, PN, SN, FSD, OTD, OTLD, SS and SSR. The locus overlapping with FAS was detected for 8 different traits including FSIN, IL, IDM, IT, PN, SN, FL, OTD and SS. These results are in line with that many of the agronomic traits were highly correlated (Supplementary Fig. 5). The percentage of phenotypic variation explained by each locus ranged from 1.6 to 69.4% in organ location traits, from 1.2 to 78.5% in organ number traits and from 1.4 to 75.3% in organ size traits, with mean values of 9.8%, 8.0% and 8.6%, respectively (Table 1). Although some traits were controlled by one major locus that explained over 60% of the natural variation, such as SN, IDM, IT and FL, most agronomic traits were determined by multiple moderate-effect loci.
Key candidate genes involved in vegetative and reproductive development
We searched for functional known or unknown candidate genes responsible for the variation of tomato agronomic traits based on the information of gene annotation, phylogenetic analysis of candidate genes with their homologs with known functions, and cross-referencing with results from previous linkage mapping. We were able to identify several plausible candidate genes and possible causative SNPs underlying the yield-related traits (Supplementary Table 5). Taking the 18 associated loci of PN as an example, in addition to previously reported large-effect genes including CLV3, LC, SlWOX4 and SlRRA334, 35, we also identified several new minor-effect candidate genes such as SlCLE2 and SlCLE6 belonging to the CLV3/EMBRYO-SURROUNDING REGION family that plays an important role in regulating stem cell proliferation and differentiation of plant development36 (Supplementary Fig. 34). The new loci identified here provide valuable candidates for future studies that can further our understanding of the genetic regulation of these traits.
For FSIN, IL, PL, STAL and IDM, the association signal at the end of chromosome 6 showed a subtle zigzag pattern, suggesting multiple trait-associated genes present in this small region (Fig. 3a). In detail, we detected three significantly associated SNPs (SL2.50ch06_43765964, SL2.50ch06_44230173 and SL2.50ch06_45972263) corresponding to genes Solyc06g071140, Solyc06g071830 and Solyc06g074350 in this region, respectively (Fig. 3b). Solyc06g074350 corresponds to the known SELF-PRUNING (SP) gene7. Solyc06g071140 (SlELF3), located 28.7 kb downstream of SL2.50ch06_43765964, was annotated as a homolog of the EARLY FLOWERING gene that regulates circadian clock function and flowering in Arabidopsis37, 38, rice39 and pea40 (Supplementary Fig. 35a). Solyc06g071830 (SlBOP4), located 15.3 kb downstream of SL2.50ch06_44230173 and encoding a BTB/POZ protein, was homologous to three other SlBOPs (SlBOP1, SlBOP2 and SlBOP3) that are known to control inflorescence architecture and flower production in tomato16 (Supplementary Fig. 35b), suggested that Solyc06g074350 is likely the candidate gene underlying this locus. To compare association results across different traits, we constructed an association network to visualize complex trait relationships (Fig. 3c). The SNP SL2.50ch06_45972263 near the SP gene was the most significant locus for the IDM trait (P=3.36×10-55) and was repeatedly detected for IL (P=3.05×10-16) and FSIN (P=8.63×10-19). The SNP SL2.50ch06_44230173 near Solyc06g071830 was the second most significant locus of FSIN (P=2.28×10-13) and was repeatedly detected for IL (P=2.85×10-11), STAL (P=2.02×10-10) and PL (P=4.26×10-10). The SNP SL2.50ch06_43765964 near Solyc06g071140 was the third most significant locus for FSIN (P=2.74×10-12) and was repeatedly detected for IL (P=2.72×10-11), STAL (P=1.17×10-10) and PL (P=8.5×10-9). The co-localization between these two organ size traits (PL and STAL) and three organ location traits (IL, FSIN and IDM) suggests a relationship between flowering time and flower size in tomato.
Stigma exsertion, defined as the pistil longer than the stamen (Supplementary Note), has been reported as a key determinant of the plant mating system and contributes to the application of heterosis in plant41, 42. Style2.1, the major QTL responsible for SE in cultivated tomatoes, has been identified on chromosome 2 (Ref. 15). In this study, SE and SSR were significantly associated with SL2.50ch01_84029382 (P=2.78×10-12) and SL2.50ch03_60427735 (P=1.29×10-16) on chromosome 1 and 3, respectively (Supplementary Figs. 26 and 30). SL2.50ch03_60427735 is located 2 bp from the start codon of Solyc03g098070 that encodes an C2H2L domain class transcription factor (referred to as Style3 hereafter). Homologous of Style3 in Arabidopsis, SGR5, has been reported to regulate the gravitropism of inflorescence stems43. Both association and expression analyses supported that Style3 is likely the candidate gene underlying stigma exsertion in tomato (Supplementary Fig. 36). Style3 was highly expressed in flower tissues, especially in style (Supplementary Fig. 36c). Higher expression of Style3 was found in stigma exsertion and stigma flush tomatoes compared with stigma inside tomatoes, while no significant expression difference was observed between stigma exsertion and stigma flush accessions (Supplementary Fig. 36d). Genotyping analysis revealed that all stigma exsertion and stigma flush accessions (52 SLC, 22 SLL, 18 SPIM and 11 other wild accessions) exhibited the C allele, while the stigma inside accessions (6 SLC and 37 SLL accessions) exhibited the T allele of the lead SNP (SL2.50ch03_60427735; Supplementary Fig. 36e and Supplementary Table. 6). These results provide the first evidence that Style3 may be involved in determining the style length.
Despite the widely reported genetic mapping of the fruit size trait, genes responsible for early fruit development remain largely unexplored in tomato5, 6, 44. For the GWAS of ovary size traits, four clear signals, SL2.50ch01_84023965 corresponding to fw1.2, SL2.50ch02_47188498 corresponding to lc, SL2.50ch03_64734105 corresponding to fw3.2 and SL2.50ch11_55052389 corresponding to fas, were identified for OTD and OTLD, but not for OLD, indicating that lateral development in early fruit stages is more genetically regulated in tomato (Supplementary Figs. 23 and 24).
Functional characterization of a candidate gene regulating stomatal density
Stomata, the exchange channels of gas and moisture between the plant and the external atmosphere, is an important trait affecting photosynthesis, transpiration and productivity of plants19. Here, we experimentally verified and further characterized a candidate gene involved in stomatal formation, providing novel functional insights. The phenotype values of stomatal density presented a skewed normal distribution in the natural tomato population investigated here (Supplementary Fig. 3). We obtained two significant loci associated with stomatal density in tomato leaf on chromosome 3 and 11, respectively (Supplementary Fig. 17). The significant association (P=5.59×10-8) between SNP SL2.50ch11_53544569 and stomatal density suggested that a genomic sequence related to SNP SL2.50ch11_53544569 forms the major genetic locus (explain 26.2% of the variation) responsible for the natural variation in stomatal formation of tomato leaves (Fig. 4a). Two major haplotypes, C and T, at the lead SNP (SL2.50ch11_53544569) of the association signal were associated with high-density and low-density stomatal phenotypes in tomato, respectively (Fig. 4b).
There were 22 genes within the 100-kb sequences flanking either side of the lead SNP (Supplementary Table 7). Pairwise linkage disequilibrium (LD) analysis within the 400-kb interval centered on the lead SNP showed that SNPs with high LD to the lead SNP fell into a 70-kb region for 53.51 Mb to 53.58 Mb (Fig. 4a). Solyc11g068970, encoding an aluminum-activated malate transporter (ALMT) (Fig. 4c), was the closest gene to the lead SNP (1.8 kb downstream), and the gene and the SNP were in the same LD block (Supplementary Fig. 37). Previous studies have shown that many ALMTs are expressed in guard cells and contribute to stomatal closure in plants45, 46, therefore, Solyc11g068970 (which we named SlALMT15) was considered as the causal gene candidate for controlling stomatal density in tomato.
A total of 98 orthologs with high amino acid similarity (>50%) to SlALMT15 were identified in different plant species (Supplementary Fig. 38). The SlALMT15 protein was predicted to contain five transmembrane helices and a long C-terminal domain that harbored a conserved WEP-motif (Fig. 4e). To investigate functional allelic variation at the SlALMT15 locus, we analyzed the nucleotide sequence of SlALMT15 in 13 tomato accessions with diverse stomatal density, which revealed 17 polymorphisms including five indels and 12 SNPs in the promoter region, and no polymorphism in the gene region (Supplementary Fig. 39). Except two indels, the remaining 15 polymorphisms led to 36 possible cis element changes in the promoter of SlALMT15 according to PLACE (https://www.dna.affrc.go.jp/PLACE/) (Supplementary Table 8). The spatial and temporal expression patterns of SlALMT15 in high-density stomata accessions (Ts-9 and Ts-53) and low-density stomata accessions (Ts-55 and Ts-52) were then investigated. SlALMT15 showed high expression levels in stem, flower and leaf, but low in fruit and root, with the transcript levels higher in most tissues of Ts-9 and Ts-53 than in Ts-55 and Ts-52 (Fig. 4f), supporting a role of SlALMT15 in positively regulating stomatal density in tomato.
To further functionally characterize the role of SlALMT15 and stomatal formation, we mutated SlALMT15 in vivo using CRISPR/Cas9 in the high-density stomatal accession Ts-9 (Fig. 4d). CRISPR/Cas9-induced knockout mutations (deletions) in SlALMT15 were detected by PCR and further confirmed by DNA sequencing (Fig. 4g). The three investigated mutant lines developed significantly less stomata than Ts-9 in leaves (Fig. 4h and 4i). To investigate whether the SlALMT15 affects drought stress tolerance by affecting stomatal density, six-week-old seedlings from the four studied mutant lines and the wild-type were challenged with drought stress by withholding water for 8 days. Dehydration symptoms (leaf wilting) were observed in both mutants and wild-type plants, but the wilting was significantly more severe in the wild-type plants. Changes of drought-related physiological indicators, including net photosynthetic rate, stomatal conductance, transpiration rate and malondialdehyde (MDA) content, also supported the different degrees of wilting in the mutants and the wild-type plants (Supplementary Fig. 40). Together these results strongly support that SlALMT15 functions in the stomata formation and further affects drought stress tolerance in tomato.
Selective sweeps related to yield-related traits
Long-term domestication and improvement have brought many morphological changes to tomato, such as larger flower and fruit6, stronger stem47, embedded stigma15 and so on. To investigate how artificial selection underlying these changes, we scanned genomic regions for selective signals in the tomato genome. Based on the phylogenetic analysis, we combined the GX group and the SLC group into one group (SLC_GX) for selective sweep identification. In total, we identified 128 domestication sweeps exhibiting lower nucleotide diversity in SLC_GX compared with SPIM, covering 59.85 Mb and harboring 2,492 genes (πSPIM/πSLC_GX> 2.76; Fig. 5a, Supplementary Fig. 41 and Supplementary Table 9). On the other hand, 204 improvement sweeps with a cumulative size of 68.22 Mb were detected and harbored 4,959 genes (πSLC_GX/πSLL> 5.38; Fig. 5b and Supplementary Table 10). Collectively, there were 2,132 and 4,599 genes only involved in the domestication or improvement, respectively, and 360 genes in both (Supplementary Table 11). We found that 62% of genes in domestication sweeps (1,545 out of 2,492) and 63% in improvement sweeps (3,122 out of 4,959) detected in our study were also detected in a previous study30 (Fig. 5c).
To determine the genetic and phenotypic basis of tomato breeding, we compared selective sweeps with the 129 GWAS loci identified in our study (Supplementary Table 4), and observed that 51 out of 129 (39.5%) GWAS loci overlapped with selective sweeps, including three overlapping only with domestication sweeps, 43 only with improvement sweeps and five with both (Fig. 5d and Supplementary Table 12). These 51 loci corresponded to 92.6% of traits (25 out of 27, except SD and STIL) investigated in our study, suggesting that most yield-related traits have been under artificial selection, especially during tomato improvement, consistent with the phenotypic difference of most traits among SPIM, SLC and SLL (Supplementary Fig. 4). Moreover, five loci associated with SSR, FB, FSIN, and IDM were selected during both tomato domestication and improvement, indicating a continuous selection of their corresponding agronomic traits (Fig. 5d).