An allele-defined genome for S. spontaneum
Circular consensus sequencing (CCS) can produce accurate reads from noisy individual subreads, representing a better tradeoff between read length and accuracy, thereby providing a better tool to tackle highly complex auto-polyploid genomes. We obtained a total of 52 Gb of PacBio CCS long reads from 909 Gb of raw sequence data and 417 Gb of Illumina short reads on the Sequel and HiSeq X platforms, respectively (Supplementary Table S1). The Canu (v1.9)13 software package was used to perform initial de novo assembly of the S. spontaneum Np-X genome and yielded an initial contig-level assembly with an N50 of 405 Kb, a significant improvement relative to the previous S. spontaneum AP85-441 genome, which had an N50 of 45 Kb (Table 1 and Supplementary Note). The total size of this initial contig-level genome assembly was 2.76 Gb, which accounts for 97.53% of the Np-X genome size estimated by genome survey based on K-mers (2n = 4x, ~2.83Gb) (Table 1, Supplementary Fig. S1 and Supplementary Note). A total of 59.97 billion (98.73%) of 60.74 billion Illumina short reads could be aligned and covered 99.05% of the assembly (Supplementary Table S2), supporting that a high-quality contig-level genome had been obtained for further processing.
We then used ~105x of high-throughput chromosome conformation capture (Hi-C) data to scaffold the allele-aware chromosome-level auto-tetraploid S. spontaneum Np-X genome using ALLHiC10,14 (Supplementary Fig. S2 and Table S3). A total of 2.74 Gb (98.96%) of contigs were anchored into 40 pseudo-chromosomes comprised of 10 homologous groups with four allelic chromosomes in each group (Fig. 1 and Supplementary Table S4). Comparison of homologous haplotypes A, B, C, and D revealed 8.20 million SNPs, 0.66 million insertion/deletion (InDels), and 4,033 structural variations (SVs) among 10.73 Mb of sequence with heterozygosity of 1.39% in the S. spontaneum Np-X genome (Supplementary Fig. S3 and Supplementary Table S5). A total of 241 (96.27%) complete gene models among the 248 ultra-conserved core eukaryotic genes (CEGs) in CEGMA and 1,389 (96.46%) among 1,440 conserved genes in BUSCO were recalled in our assembly (Supplementary Table S6, S7). The assembly allowed us to predict 30 potential centromeric regions with lengths ranging from 0.59 to 8.63 Mb and 43 telomere regions with monomer copy number from 104 to 1,218 along the 40 chromosomes (Supplementary Table S8, S9).
Allele-defined genome annotation
Allele-defined gene annotation is the key for characterizing autopolyploid genomes. Based on our homologous chromosome-level assembly, we annotated a total of 123,128 protein-coding gene models (Table 1 and Supplementary Table S4) and 93.5% (115,166) of these gene models were functionally annotated via searches of the NR, GO, KEGG, Swiss-Prot, and KOG databases (Supplementary Table S10 and Extended Data 1). Due to the polyploid nature of this genome, many annotated gene models are naturally allelic variations of the same gene. Analysis of allele definition showed that 35,830 gene models were well defined, including 8,645 (24.1%) genes with four alleles, 12,861 (35.9%) with three, 10,781 (30.1%) with two, and 3,543 (0.1%) with one, with an average of 3.4 alleles per gene (Table 1, Supplementary Table S11 and Extended Data 2). These results confirm that the chromosome-level S. spontaneum Np-X assembly and gene annotation are well-organized and allele-defined.
Among the gene models we defined, 99.38% (122,177) and 93.26% (114,656) of CEGs could be identified in the genomes of S. spontaneum AP85-44110 (2n = 4x = 32) and sorghum (Sorghum bicolor15)(2n = 2x = 20) (Extended Data 4), respectively, supporting that these two S. spontaneum accessions share more similar gene content, than do S. spontaneum and sorghum, as expected from their phylogenetic relationships. Comparison of the genome assemblies of S. spontaneum AP85-44110, Saccharum hybrid R5709, Saccharum hybrid SP328011, and sorghum15 showed that, among 34,232 gene families, 1,934 (5%) were unique to S. spontaneum Np-X (Supplementary Fig. S5). The Np-X-specific genes were enriched in several Gene Ontology (GO) categories, including ‘RNA-DNA hybrid ribonuclease activity’, ‘polysaccharide binding’, ‘glycolytic process’, and ‘defense response to fungus’ (Supplementary Fig. S6). Our KEGG enrichment analysis indicated that some of these genes likely participate in glucose-6-phosphate isomerase pathway protein transport (Supplementary Fig. S7).
We identified a total of 1,590 Mb transposable elements (TEs) accounting for 57.52% of the assembled S. spontaneum Np-X genome (Supplementary Table S12), similar to their proportions in the genomes of AP85-44110(56.02%), S. officinarum (57.79%), Miscanthus16 (60.14%), and sorghum15 (61.30%) (Supplementary Fig. S8a and Table S13). Long terminal repeat (LTR) retrotransposons account for 40.64% of the Np-X genome, including 11.47% Ty1-copia and 28.87% Ty3-gypsy, whereas LTRs account for about 39.64% of the AP85-441 genome, 41.59% of the S. officinarum genome, 43.55% of the Miscanthus genome, and 44.66% of the sorghum genome (Supplementary Fig. S8b and Supplementary Table S13).
The genome re-structuring in S. spontaneum
The genome of S. spontaneum Np-X shows strict synteny with that of sorghum except for a sorghum-specific inversion on SbChr0410,15(Fig. 1c). The decrease in the basic chromosome number from 10 to 8 in S. spontaneum AP85-441 was caused by fission following by fusion of its ancestral sorghum chromosome homologs 5 and 8 as well as of rice chromosome homologs 11 and 12, which are paleo-duplicated chromosome pairs (PdCPs) that originated from the ρ event of the Poaceae10,16,17. However, the chromosome forms and numbers resulting from these events were not found in S. spontaneum Np-X (Fig. 1c and 1d). Relative to sorghum, two inversions occurred in S. spontaneum AP85-441 chromosomes APChr2AB and APChr7AB and three chromosomal fragments occurred in AP85-441 chromosome APChr6ABD10, but these chromosomal variations were not detected in S. spontaneum Np-X or the related genus Miscanthus (Supplementary Fig. S4). These results indicated that chromosomal inversions occurred after the event resulting in the chromosome number decrease in S. spontaneum AP85-441, further confirming that S. spontaneum Np-X retained the chromosome forms of the last common ancestor (LCA) of S. spontaneum.
To analyze the scenario of the hypothesized rejoining of Chr5 and Chr8, we analyzed synteny between these S. spontaneum Np-X and S. spontaneum AP85-441 chromosomes (Fig. 2) and found that the recombination breakpoints in S. spontaneum Np-X were located on the centromeres of NpChr5 and NpChr8. We found that S. spontaneum AP85-441 APChr5 is comprised of the ancestral short arms of S. spontaneum Np-X NpChr5 and NpChr6, and APChr6 is comprised of the ancestral long arms of NpChr5 and NpChr7, and that centromere-specific sequences were retained and spread over longer distances in both APChr5 and APChr6 (Fig. 2b). However, although APChr2 is comprised of the ancestral long arms of NpChr8 and NpChr2, its centromere-specific sequences were retained from only ancestral NpChr2. In contrast, APChr7, which is composed of the ancestral short arms of NpChr8 and NpChr2, retained two centromere-specific sequences from the two Np-X chromosomes. Comparative analysis of the homologous genomic regions in Np-X and AP85-441 showed that a 6.2 Mb genomic region of NpChr5 containing 34 genes and a 6.1 Mb segment of NpChr8 containing 16 genes appears to have been lost in the centromeric regions of the AP85-441 genome (Fig. 2c, Supplementary Table S14 and S15).
To further investigate the concerted evolution of the PdCPs (NpChr5 and NpChr8) in Saccharum, close paralogous gene pairs were identified in Chr5 and Chr8 in Np-X as well as in the corresponding chromosomes of AP85-441 (Chr5A (57–89 Mb) and Chr6D (54–90 Mb) compared with Chr2 (98-125 Mb) and Chr7 (62–83 Mb)), Miscanthus16 (Chr10 and Chr14), sorghum15 (Sb05 and Sb08), and rice18 (R11 and R12) (Fig. 2a). The numbers of pairs of gene conversions in each genome in descending order are: 848 pairs in rice, 213 in sorghum, 265 in Np-X, and 114 in AP85-441 (Supplementary Fig. S9, S10). Obviously, AP85-441 retained far fewer converted genes pairs than did Np-X. These phenomena could be the reason that re-structuring of the PdCPs results in the absence of homeologous genes during the concerted evolution of the chromosomal ancestors of NpChr5 and NpChr8.
Gene redundancy on PdCPs in Saccharum
The genes on NpChr05 and NpChr08 displayed lower expression levels than those located on the other chromosomes in the examined tissues, and a similar phenomenon was observed in the AP85-441 (Fig. 2e) and S. officinarum genomes (Supplementary Fig. S11). Moreover, the homologous regions between these two chromosomes displayed significantly higher average synonymous substitution rates (Ks, 0.036) within than did the homologous regions of other chromosomes (0.022) in Np-X (Fig. 2f), suggesting a more rapid evolution of the PdCPs.
Change in the chromatin status of PdCPs across the Poaceae
To investigate the evolution of the three-dimensional (3D) genome in the S. spontaneum Np-X auto-tetraploid, 1,937 million valid interaction read pairs were used to construct chromatin interaction maps for each set of homolog chromosomes at 500-Kb resolution. The frequencies of intra-chromosomal interactions displayed a rapid decrease with increasing linear distance (Supplementary Fig. S12). We found that 32.9–64.2% and 35.8–67.1% of regions on the homologous group chromosomes exist as compartment A (active regions) and compartment B (inactive regions), respectively (Supplementary Fig. S13 and S14), indicating that the distribution of chromatin compartment is disproportionate in the autopolyploid.
The A/B compartment changes were further investigated for the genes with a complete set of four alleles in the collinearity regions. We found that a total of 5,075 (59.9%) genes exhibited three conserved alleles and 2,240 (26.4%) genes exhibited two conserved alleles, while only 1,164 (13.7%) genes were conserved for all four alleles (Supplementary Fig. S15 and Supplementary Table S16). The GO term analysis of these three type genes indicated that the genes exhibited all four alleles conserved were mainly enriched in lignin catabolic process (GO:0046274; FDR < 6.14E-14), oxygen oxidoreductase activity (GO:0052716; FDR < 6.14E-14), multicellular organism development (GO:0007275; FDR < 3.60E-09) and ubiqultin-dependent protein catabolic process (GO:0006511; FDR < 3.60E-09), while the genes with two or three conserved were involved in transferase activity (GO:0016757; FDR < 4.15E-05), protein glycosylation (GO:0006486; FDR < 4.15E-05), superoxide metabolic process (GO:0006801; FDR < 6.71E-04) and peptidylprolyl cis-trans isomerase activity (GO:0003755; FDR < 9.05E-11) (Supplementary Fig. S16). At the chromosome levels, most of genes(68.8%, 59.8%, 81.6%, 56.5%, 84.6% 77.2% and 79.7%) in Chr1, 2, 4, 5, 6, 7 and 10 exhibited three conserved alleles(P < 1.0 × 10-5, Fisher’s exact test) and most genes (83.5%) in Chr9 exhibited two conserved alleles(P < 1.0 × 10-5, Fisher’s exact test), while most genes (55.1% and 42.8%) in only Chr3 and Chr8 exhibited all four conserved alleles(P < 1.0 × 10-5, Fisher’s exact test) (Supplementary Fig. S15 and Table S16). These results suggested that the A/B compartment diverged among sets of homologous chromosomes in this auto-polyploid species.
We further analyzed chromatin status in PdCPs of AP85-44110, Np-X, sorghum15, and rice18. In these four genomes, 44.8–47.5% and 52.5–55.2% of genomic regions exist in compartment A and in compartment B, respectively (Supplementary Table S17). We found that the genes of the homologous chromosomes of NpChr2/5/7/9 (74.0%–87.0%) reside mainly in the most conserved regions among the four genomes, while the genes (41.1–49.9%) that underwent switching from the B to A compartment reside mainly in the homologs of NpChr2/6 (Supplementary Fig. S17 and Fig. S18). It is noteworthy that NpChr5 and NpChr8 displayed the lowest degree of regions exhibiting B to A compartment switching among the homologous chromosome sets in Np-X compared to AP85-441 (Fig. 2d), and all these gene located in the regions of B to A compartment switching are higher expressed in AP85-441 than in Np-X (Supplementary Fig. S19), indicating that the re-structuring of NpChr5 and NpChr8 might have suppressed switching of chromatin status from inactive to active.
The recent polyploidization in Saccharum
Based on Ks estimations, Saccharum diverged from sorghum and Miscanthus ~6.4 Mya (Ks = 0.08) and ~4.0 Mya (Ks = 0.05), respectively. In Saccharum, S. spontaneum split from S. officinarum about 1.6 Mya (Ks=0.02), and the two S. spontaneum accessions, Np-X and AP85-441, separated at ~0.8 Mya (Ks=0.01) (Fig. 3a), demonstrating that chromosome reduction they underwent occurred very recently. Considering that x = 10 appears to be ancestral chromosome number in the Saccharinae-Sorghinae, the auto-octoploid of Saccharum, S. spontaneum SES208 (2n=8x=64) and S. officinarum (2n=8x=80) should have experienced two rounds of WGD, while the auto-tetraploid S. spontaneum Np-X experienced only one round of WGD after its divergence from sorghum (Fig. 3). Our Ks estimates reveal that the homologous chromosomes diverged at Ks = 0.01 in both Np-X and S. officinarum, and at Ks = 0.00 in AP85-441, indicating that the WGD of auto-octoploid S. spontaneum SES208 appeared to be more recent than those of both Np-X and S. officinarum (Fig. 3a).
TEs have long been considered as key genomic features that can trigger genome instability and are thus an important source of evidence for exploring the evolutionary history of genomes. In the four genomes (Np-X, AP85-44110, S. officinarum, sorghum15, and Miscanthus16) we examined, LTR retrotransposons appear to have undergone continuing and recent amplification bursts ranging from 0 to 2 Mya (Fig. 3b) The most recent LTR/Gypsy sequence was used as a reference to identify TE hits, and 10 distinct insertion peaks with identities ranging from 65% to 98% were detected in the four genomes (Fig. 3c and 3d). A Gaussian probability density function (GPDF) analysis estimated that the earliest TE insertion events (P1 and P3) occurred at ~2.0 Mya in Saccharum (Fig. 3d, 3e and Supplementary Fig. S20-22), which preceded the expected divergence time for Saccharum. The P4 insertion peak (71.0–72.2%) that occurred ~1.2 Mya could also be specifically identified in the S. officinarum genome. These results suggested that the divergence between S. spontaneum and S. officinarum occurred between 1.2 and 2.0 Mya, supporting the estimation of ~1.6 Mya based on the Ks values of orthologous gene pairs. The occurrence of the P6 peak (75.8–76.4%) ~0.89 Mya is found specifically in Np-X and AP85-441 and is consistent with the estimation of 0.8 Mya based on the analysis of orthologous gene pairs. Two TE insertion peaks with identities of 91.8–92.4% (P8, ~0.55 Mya) and 96.4–96.8 (P9, ~0.35 Mya) appeared specifically in Np-X genome, indicating that these TE insertion peaks occurred specifically in Np-X. Similarly, P2 (65.4–67.9%, ~1.9 Mya) and P5 (72.7–74.1%, ~1.0 Mya) TE insertion peaks were found in the sorghum genome, and a P7 (83.5–84.2%, ~0.6 Mya) TE insertion peak was found in Miscanthus (Fig. 3d, 3e and Supplementary Fig. S23, S24).
Genes related to the key characteristics of Saccharum
Given that Np-X exhibits the ancestral chromosome forms of S. spontaneum, comparative analysis of the genes in the three Saccharum and sorghum genomes may offer clues to the evolution of key agronomic characteristics of Saccharum. We thus analyzed the core gene families related to sugar accumulation, photosynthesis, and leaf width in Saccharum.
Genes encoding sugar metabolism enzymes and sugar transporters: We identified the key gene families related to sugar metabolism including sucrose phosphate synthase (SPS)19,20, invertase (INV)21, sucrose synthase (SUS)22,23, and fructokinase (FRK)24 from the three Saccharum genomes (Fig. 4a and Supplementary Table S18). Compared to sorghum, the number of INV genes had undergone expansion in S. officinarum, which exhibited high sugar content, but were relatively conserved in both of the S. spontaneum genomes, indicating that expansion of INV genes might be a prerequisite for evolution of sugarcane with high sugar content.
Sugar transporters are crucial for the allocation of sugars to sink and source tissues during the development of plants25-27. We identified a total of 119 genes that are likely members of sugar transporter superfamilies including PLT, STP/HXT, INT, VGT, pGlcT, SFP, TMT, and SWEET (Fig. 4a and Supplementary Table S18). The PLT, SUT, and SFP gene families expanded in the two S. spontaneum genomes, and the STP and VGT families showed more expansion in Np-X than in AP85-441, indicating that the expansion of the STP and VGT families in S. spontaneum Np-X could be a transitional intermediate state. In contrast, the SWEET gene family only expanded in S. spontaneum Np-X, suggesting that contraction of this gene family might have occurred in S. spontaneum AP85-441 after genome re-structuring. In addition, 19 STP and 23 PLT genes had expanded due to tandem duplications in AP85-441, while expansion of the PLT family was only observed in Np-X, indicating that the tandem duplications of STP genes in Saccharum occurred after speciation.
C4 photosynthesis pathway: The C4 photosynthesis pathway was discovered in sugarcane28,29 and has been considered to have higher energy efficiency necessary for generating large biomass. We identified members of nine core gene families that participate in the photosynthesis pathway (CA, PEPC, PEPC-K, NAD-ME, PPDK, NADP-MDH, NADP-ME, PPEK-RP and PEPCK) in the three Saccharum genomes and that of sorghum (Fig. 4a and Supplementary Table S19). Compared to sorghum, gene expansions occurred in the NAD-ME gene family in both Np-X and AP85-441, but not in S. officinarum (Fig. 4a, Supplementary Fig. 25 and Table S19). NpPEPC1, NpPEPC-k1, NpNADP-MDH2, NpNADP-ME2, NpPPDK1, NpPPDK-RP1, NpCA1, and NpCA2, are preferentially expressed in leaves, suggesting that these eight genes might be related to C4 photosynthesis in Np-X, which is consistent with observations in AP85-44110 and S. officinarum (Zhang et al., under review). However, with the exception of NpPEPC1, transcripts of other genes putatively involved in photosynthesis displayed much lower abundance (average TPM = 21.0) than in either AP85-441 (average TPM = 216.9) or S. officinarum (average TPM = 320.9) (Fig. 4b). Thus, it seems that the types of component genes involved in the C4 photosynthesis pathway in Saccharum might have converged as well as the regulation of their respective expression.
Narrow leaf genes: S. officinarum has much larger leaves than S. spontaneum, and among S. spontaneum varieties, Np-X has much smaller leave than AP85-411 (Fig. 1a). About six NARROW LEAF (NAL) genes controlling leaf width have been reported in rice30-35, and among them, NAL1 was shown to have the strongest effect on leaf width. We identified 13 NAL genes in the three Saccharum genomes (Fig. 4a, 4c and Supplementary Fig. S26), and gene expression analysis showed NAL1 and NAL10 transcripts to be expressed at much lower abundance in Np-X than in either AP85-441 or LA-Purple, and both appear to be down-regulated compared to their expression in stem (Fig. 4c), suggesting these as candidate genes affecting leaf width.
The origination and independent polyploidization of S. spontaneum
We generated a total of 4,682 Gbp of resequencing reads for 102 S. spontaneum genomes and the genomes of 14 related species for population genetics analysis, using the reference S. spontaneum Np-X genome (Fig. 5a and Supplementary Table S20). A total of 3,345,380 high-confidence variants including 3,140,400 SNPs and 204,980 InDels were identified in these data. Using the genomes of the 14 related species as outgroup (including Sorghum, Miscanthus, S. officinarum and S. robustum), principal component analysis (PCA) of SNPs showed substantial genetic diversity among S. spontaneum groups (Fig. 5b). PC1 (35.81%) and PC2 (15.64%) separated the S. spontaneum accessions from the outgroup accessions while PC1 vs. PC3 clearly divided the S. spontaneum accessions into four groups, which was consistent with our Maximum Likelihood (ML)-based phylogenetic analysis (Fig. 5a and 5c). The four groups displayed continuous geographic distribution from the Indian subcontinent to eastern and southern Asia. Group I accessions were primarily distributed in the Pakistan, India, and southwestern China (Himalayan region); Group II accessions were mainly distributed in southeastern China including Fujian and Taiwan provinces, and the Philippines; Group III accessions were mainly distributed in southern China including Yunnan, Guangdong, and Hainan provinces, and in Myanmar and Laos; and group IV accessions were primary distributed in Indonesia and India (Fig. 5b). It is noteworthy that the recorded progenitor S. spontaneum of modern sugarcane hybrids, Glagah, clustered into group IV.
Linkage Disequilibrium (LD) decays among the resequenced accessions were relatively low with an average rate of 0.086, supporting that these accessions had not experienced artificial selection (Supplementary Fig. S27). The π values of Groups were 0.24×10-3, 0.31×10-3, 0.29×10-3, and 0.26×10-3, respectively (Supplementary Fig. S28), which verifies the results of our LD analysis. Genetic differentiation values (FST) between any two of the four groups ranged from 0.047 to 0.084 (Supplementary Fig. S28), and each of the neighbor group pairs exhibited lower FST values in comparison to any two of the other four groups.
Further, we estimated admixture proportions and individual ancestry based on the SNP data set (Fig. 5d, Supplementary Fig. S29 and Fig. S30). In the admixture plot at K = 5 (Fig. 5d), each of the five groups exhibited distinct relative monophyletic ancestry matching our ML phylogenetic analysis, and further supports that a low level of genetic exchange between the four S. spontaneum groups due to their geographic isolation. Only weak gene flow was detected between Group I and Group II (P-value = 2.2e-308; F_statistic = 0.397) (Fig. 5e). Further, population structure analysis showed that 12 accessions in Group I had some Group II ancestry and that 11 accessions in Group II had some Group I ancestry (Fig. 5d), supporting the limited gene flow between the two groups. All of these findings indicate that the four S. spontaneum groups evolved relatively independently after they originated on the Indian subcontinent (Group I) and spread step by step to the regions where Group II, Group III, and Group IV are now distributed.
Consistent with the results of a previous study10, our phylogeny showed that S. spontaneum accessions of diverse ploidy are clustered together within each group, confirming that the different ploidy levels of S. spontaneum were originated and diverged independently in each of these four groups.
The evolution of basic chromosome numbers in S. spontaneum
With the availability of the S. spontaneum Np-X genome (x = 10), basic chromosome numbers can be determined based on the coverage of mapping reads on the restructured chromosomes (NpChr5 and NpChr8), specifically by looking at read coverage across breakpoints for evidence of restructuring. Examples of x=9 typically has signs of restructuring around NpChr5, and not NpChr8 (Fig. 6a).
Totals of 4, 7, and 91 of the S. spontaneum accessions have basic chromosome numbers of x = 10, x = 9 and x = 8, respectively. It is noteworthy that the S. spontaneum accessions with x = 8 were distributed across all four groups, while the accessions with x = 10 and x = 9 formed a clade that was nested within Group I and were mainly located in Pakistan, northern India, and Tibet China (Fig. 5c). As accessions with basic chromosome numbers of x = 9 and x = 10 are only found in Group I, while those with basic chromosome numbers of x = 8 appear in all four groups of S. spontaneum accessions, we assumed that the fluid ploidy levels have evolved independently from ancestral progenitors with basic chromosome numbers of x = 8 in three groups: Group II, Group III, and Group IV. In addition, gene flow is undetected between any pairs of these three groups of accessions (Fig. 5e).
The population of accessions with a basic chromosome number of x = 8 shows much lower LD decays (an average of 0.043) than do those with basic chromosome numbers of x = 9 and x = 10 (an average of 0.127 and 0.256, respectively), supporting the notion that they had not undergone artificial selection. Tajima's D test suggested that the population of accessions with a basic chromosome number of x = 8 (d = 1.225) were mainly under balancing selection and a recent population contraction, while both those with basic chromosome numbers of x = 9 (d = 0.191) and x = 10 (d =0.17) were under neutral selection (Supplementary Fig. S31). Moreover, accessions with a basic chromosome number of x = 8 exhibit a higher degree of nucleotide diversity (π=3.80 ×10-4) than the other two forms (π = 3.26 × 10-4 for accessions with basic chromosome number x = 9, and π = 3.02 × 10-4 for accessions with basic chromosome number x = 10) (Supplementary Fig. S32), indicating that balancing selection in the population of accessions with a basic chromosome number of x = 8 contributed to maintenance of high levels of polymorphism in those populations, perhaps also related to a large overall population size of x=8 species.
We explored the population demographic history of S. spontaneum using SNP data with a PMSC (Pairwise Sequentially Markovian Coalescence) model31, which revealed that the S. spontaneum population experienced a prominent effective population (Ne) expansion (Ne ~ 500,000) around 12 to 14 Kya (thousand years ago) and a subsequent Ne contraction (Ne ~ 10,000–60,000) around 8 to 1.4 Kya (Fig. 6d). Interestingly, the time of the Ne expansion corresponded to that of the Marine Isotope Stage (MIS) 5e interglacial period, the warmest interval of a warming phase during which the earth emerged from an extreme glacial phase36-40, according to benthic oxygen isotope data. The time of the Ne contraction corresponded to the Younger Dryas cold event that occurred ~1.1–1.2 Kya during which the global climate changed dramatically41-43. The S. spontaneum population of accessions with a basic chromosome number of x = 10 diverged demographically from the S. spontaneum populations with basic chromosome numbers of x = 9 and x = 8 with a much smaller Ne in recent history.
The accessions of S. spontaneum with a basic chromosome number of x = 8 exhibit a high level of genetic diversity and are distributed over a broader geographical range than the other two forms of accessions with basic chromosome numbers of x = 9 and x = 10, indicating that the genome re-structuring likely contributed to the vigor and adaptation of S. spontaneum with a basic chromosome number of x = 8.
To investigate the genetic basis of likely fitness advantage of x = 8 in S. spontaneum, XP-CLR was used to detect natural selective sweeps. We identified selective sweep signatures for 25.7 Mb of the genome sequence with 338 candidate genes putative involved in the reduction of basic chromosome number from x = 10 to x = 8 (Supplementary Fig. S33 and Extended Data 5). The identified genes located within selective sweep regions are mainly enriched in the molecular function ‘polysaccharide binding’, suggesting divergence in the genetic control of related processes after the genome re-structuring in S. spontaneum. It is noteworthy that 13 of the genes found in the selective sweeps putatively encode glycosyltransferase, glycosyl hydrolases, STARCH SYNTHASE 1 (Npp.10B005850.1), sucrose-phosphate synthase (Npp.03C039020.1), cellulose synthase (Npp.01B008750.1), and SWEET (Npp.04A016530.1), indicating that divergence of genes involved in carbohydrate metabolism pathways likely occurred after the S. spontaneum genome re-structuring. Interestingly, four tandem FUTs (Xyloglucan fucosyltransferase) (Npp.04A015560.1, Npp.04A015590.1, Npp.04A015600.1, and Npp.04A015610.1) putatively involved in plant cell wall biosynthesis processes were also identified in the selective sweeps44,45, which could provide a foundation for the morphological differentiation of the three populations. In addition, some gene functions related to stress responses were detected, e.g., LOT1(Npp.05D013900.1) which enhances ABA response and plant drought tolerance46; two genes (Npp.10B011680.1 and Npp.10B011690.1) putatively encoding PER3 and Npp.01B018240.1, which have been implicated in responses to low temperatures in plants47,48; two genes (Npp.06C016710.1 and Npp.06C016720.1) with functions related to cold tolerance and that are induced by drought and ABA; two genes (Npp.01B030420.1 and Npp.05D025320.1) encoding a putative ethylene response factor; and 16 genes, such as CYP81F (Npp.01B031390.1 and Npp.05D025440.1), which likely encode enzymes putatively involved in oxidation-reduction processes and might be related to tolerance of drought stress49,50. These results indicated that the natural genomic sweep regions were likely involved in polysaccharide metabolism and stress tolerance during the chromosome reduction process.