Analysis of transgressive segregation for hyper-salinity tolerance in the IR29 (indica) x Pokkali (aus) recombinant inbred population (F9) under greenhouse conditions was described earlier [28, 32, 33]. Phenotyping of 139 recombinant inbred lines revealed major phenotypic categories and their contributing morpho-physiological traits. FL478 is equally tolerant as its parent Pokkali, which is the origin of the SalTol QTL. FL454 is equally sensitive as its parent IR29. FL510 is superior to both FL478 and Pokkali, and while all three genotypes contain the SalTol allele from Pokkali, FL510 is transgressive super-tolerant. FL499 is inferior to both IR29 and FL454 hence transgressive super-sensitive.
Agronomic superiority of the transgressive progeny
To confirm the transgressive nature of FL510 as established under EC = 9 to 12 dSm− 1 in the greenhouse, we tested the population in the field at the International Rice Research Institute (IRRI), Philippines during the wet season of 2023–2024. Tests were performed under Artificial Salinity Field at IRRI headquarters in Los Banos, Laguna, where field plots were irrigated with artificial saltwater, and under Natural Salinity Field in the coastal town of Infanta, Quezon (Philippine Sea) with seawater irrigation. The population was subjected to moderate (EC = 9 dSm− 1), severe (EC = 10 to 12 dSm− 1) and lethal (EC = 14 to 16 dSm− 1) levels of hyper-salinity. Results reiterated the transgressive nature of FL510, which is the only progeny that survived under lethal hyper-salinity (Fig. 1). Agronomic performance indicated the outlier nature of FL510.
Non-parental genomic novelties created by recombination
IR29 is a typical short-duration indica released as a high-yielding variety from multi-parent hybrid selections during the Green Revolution. Pokkali is a photoperiod-sensitive landrace historically grown in the marshlands and river basins of South Asia affected by brackish water runoffs [34]. Pan-genome analysis showed that the typical indica subspecies has a generally larger genome ranging from 376Mb to 380Mb compared to the typical japonica subspecies which range from 386Mb to 393Mb. The aus subgroup where Pokkali is classified under is a clade between indica and japonica with a typically smaller genome than indica, e.g., 383Mb for cv Natel Boro [35] (Additional Files 1 to 5).
Genome size prediction based on k-mer analysis showed that IR29 is similar to the indica reference from 425 to 445Mbp, and Pokkali is comparable to the japonica reference at from 415 to 425Mbp (Fig. 2A). While the overall gene content were similar between IR29 and Pokkali, the two genomes were widely contrasting due to the contraction-expansion of repetitive sequences and transposon loci (Fig. 2B). These differences are suggestive of variation in chromatin length between the indica and aus. Chromatin length variation due to expansion-contraction of non-coding regions contributes to imperfect synapsis of homologous chromosomes during prophase-1, an event that is often associated with genomic shock [24, 30, 31].
We observed a general trend of genome downsizing in the recombinants of IR29 x Pokkali characterized by the shrinkage of repetitive and transposon loci (Figs. 2A, 2B). Most notably, the transgressive super-tolerant FL510 had the smallest genome and highest magnitude of downsizing. K-mer analysis also confirmed the homozygosity of recombinants, consistent with the expectation of fixed and stable recombinant genomes after eight generations of selfing (Fig. 2C). Homozygosity eliminates the possibility that the outlier nature of FL510 could be due to overdominance effects.
We assembled the IR29 and Pokkali genomes using the most closely related genotypes IR8 and N22 as respective references [36] (Additional Files 6 to 8). IR29 and Pokkali genomes were compared to each other to establish locus synteny across the entire genome. From this comparison, we observed that the differences were not only due to the contraction-expansion of repetitive and transposon loci but also because of microsynteny disruptions driven by rearrangements and expansion-contraction (Additional Files 9 to 11).
We performed a locus-by-locus genotyping-by-sequencing of the recombinant genomes by sequence identity to assign the parental origins of genomic loci based on sequence homology and synteny. Genotypic profiles indicate that the two tolerant recombinants (FL478, FL510) are more similar to each other than to their sensitive siblings (FL499, FL454) as shown by similarity dendgrams in each chromosome (Fig. 2B). Notably, FL510 had the highest allelic contribution from Pokkali at 26,083 loci in comparison to only 23,910 loci from IR29 (Fig. 3A). FL499 and FL454 had the highest contributions from IR29 at 25,808 and 27987 loci, respectively.
IR29 is a longer genome due to expanded repeat and retrotransposon loci, while Pokkali is a shorter genome due to contracted repeat and retrotransposon loci (Fig. 2A). While considerable genome downsizing was evident in all recombinants, there were also significant rearrangements in many loci across all chromosomes. A locus was considered ‘conserved’ when allele-called at high certainty (threshold of ≥ 90% recovery of parental loci) to the parents. A locus was considered ‘IR29-type’ or ‘Pokkali-type’ when the similarity to one parent is 5% higher than the other. ‘Non-parental’ and ‘Unmapped’ loci are attributed largely to size contraction and structural rearrangements that caused extensive disruptions in microsynteny to the point that the loci can no longer be reliably allele-called to either parent (Fig. 3A). It appears that the difference in chromatin sizes between indica and aus may have caused a mild genomic shock that led to downsizing and microsynteny disruption in the recombinant progenies.
We observed a bias in the transmission of shorter loci from either parent to the recombinants. There is a high propensity for the contracted loci from either indica (IR29) or aus (Pokkali) genome to be preferentially passed on to recombinants. This bias is most pronounced in FL510, having the smallest genome of all the recombinants (Fig. 3B). The highest frequency of recombination is also evident in FL510 (Fig. 3C; Additional File 12). The highest magnitude of genome downsizing in FL510 due to repeat and transposable element loci contraction and microsynteny disruption is an outcome of higher recombination frequency and genomic shock effects.
Shedding from the repetitive and retrotransposon loci in recombinants
We scrutinized the loci that were most frequently shed in the recombinant genomes by examining the types of transposons and other sequence motifs that were affected by locus contraction (Figs. 2A, 2C). We mapped the transposons from the IR29 and Pokkali genomes into the recombinant genomes to predict genome exchanges by recombination (Figs. 2B, 3A). Results showed the predominance of copia and gypsy LTR-retrotransposons along with other unclassified subclasses of LTRs and nested retrotransposons in the genomes of both parents and recombinants (Additional File 13). It is also evident that copia and gypsy heavily populated the shed regions. The shedding of copia and gypsy loci are most pronounced in the most downsized genome of transgressive FL510, which is about 10Mb smaller than the genome of Pokkali. Taking that into account, shedding of retrotransposon regions explains about 50% of the downsizing of recombinant genomes, with FL510 having the highest cumulative shedding (Fig. 4B).
In-depth scrutiny of the enrichment of repetitive elements in the shed regions by permutation test showed different magnitudes of losses and retention of parental elements in the recombinant genomes (Fig. 4C). LINEs appeared to be retained across all recombinants while copia from IR29 appeared to be shed at the highest magnitude in FL510. Shedding of gypsy from Pokkali contributed largely to the downsized genome of FL510. Unclassified LTR-type retrotransposons (RLX) had parent-specific pattern of retention, while SINEs are very well retained regardless of parental origin. For DNA transposons, cacta is shed heavily while harbinger/mutator have higher propensity to be retained.
By permutation test, we also examined the enrichment of repetitive elements within the loci that are unique to the recombinants as consequences of new insertion and duplication events. In Fig. 4D, positive values indicate the association with the regions that are unique to the recombinant genomes while negative values indicate that these loci are not contributing to the disturbance of parental substrates in recombinants. LINEs have the propensity to be retained in the recombinant genomes. Interestingly, copia appeared to have high retention in the transgressive FL510, and this is largely contributing to its overall uniqueness relative to its other recombinant siblings. Gypsy is very similar in all recombinants except FL510.
Shedding from transposon loci led to genome-wide hypomethylation
Methylation of cytosine residues is an evolutionarily conserved property of eukaryotic genomes as the primary mechanism for taming the resident transposons. On average, indica genome is about 38% transposons while aus genome is about 46% transposons [36, 37]. DNA methylation is fluid in nature where C-methylation from non-coding and repetitive regions could spread to the periphery or into the body of genic spaces [38, 39]. Cytosine methylation in plants occur in three context, i.e., CG, CHG, CHH, where CG and CHG methylation are predominant in the repetitive/transposon, intergenic, centromeric, and telomeric regions, while CHH is more predominant in the facultatively heterochromatic genic spaces [38]. Having established that recombination between indica and aus led to the shrinkage of non-coding and transposon loci, we addressed the question of whether genome downsizing has significant impacts to DNA methylation.
Bisulfite-seq assembly indicated that the indica-type genome of IR29 and the aus-type genome of Pokkali have widely divergent methylome landscapes in all contexts. Differences in the overall sizes of genomes and their transposon contents translate to significant variation in DNA methylation (Fig. 3A; Additional File 13). There is a general trend of significant hypomethylation in the recombinant genomes in all contexts, with the transgressive FL510 exhibiting the highest magnitude of overall hypomethylation (Fig. 5A). Each recombinant genome has a unique pattern of hypomethylation with FL510 being the most extreme. FL499 had the most similar hypomethylation to FL510, particularly in CG and CHG contexts as indicated by the similarity dendrograms across many chromosomes (Fig. 5A). Notably, FL510 is a clade of its own in CHH context, indicating its unique CG and CHG hypomethylation profiles (Fig. 5B). The loci affected by CHH-hypomethylation were distinct between FL510 and FL499. The unique CHH-hypomethylation profile of FL510 is highly correlated with its most highly recombinant nature. Much of CHH-hypomethylated loci in FL510 also co-localized with non-parental variant loci (Figs. 5B, 5C).
Variation in hypomethylation across the sibling recombinants was lowest in CG context and highest in CHH context. This implies that the hypomethylation signatures of each distinct recombinant have important implications to gene functions by virtue of CHH-methylation polymorphism in genic regions. For instance, when the genic space methylation profiles were compared with respect to all protein-coding genes, transcription factor-encoding genes, and genes in the SalTol QTL, the unique CHH-hypomethylation of FL510 stands out across the upstream regions, gene bodies and downstream regions (Fig. 5D). The most CHH-hypomethylated group of genes in FL510 was enriched with biological processes associated with the regulation of growth and development.
We also compared the small RNA (sRNA) profiles of the leaf transcriptomes to understand the potential significance of sRNA expression to the observed hypomethylation of recombinant epigenomes. Normalized abundances indicate that 24-nt sRNAs are the most predominantly expressed class (Fig. 6A, top panel). This class plays a key role in RNA-directed DNA methylation (RdDM) of the non-coding and retrotransposon-rich regions [39, 40]. Evidently, expression of 24-nt sRNAs was significantly higher in the parents compared to their hypomethylated recombinant progenies. Notably, the transgressive super-sensitive FL499 exhibited the lowest expression. Based on occurrence of CG/CHG/CHH motif in the sRNA sequences, it appears that the 24-nt sRNAs that were most abundantly expressed in the parents are being targeted to genomic regions where CHH-methylation sites are also very high (Fig. 6A, bottom panel). This trend is further reiterated by the lower range of mapped 24-nt sRNA on the CHH-hypomethylated and CHH-hypermethylated loci of FL510. The proportion of genomic loci with reduced 24-nt sRNA and CHH-hypomethylation was highest in FL510 (23%) and most consistent in the most CHH-hypomethylated loci in FL510, indicative of uncoupling to RdDM (Fig. 6B).
Novel chromatin landscapes predicted from Topologically Associating Domains
Methylation and demethylation of cytosine residues have a profound influence on localized interactions along the DNA. The Topologically Associating Domains (TADs) formed by localized chromatin folding are important components of the third dimension of the genome. DNA sequences within a TAD interact more frequently with each other than with other sequences outside of TAD boundaries. TADs also tend to overlap with the units of recombination. Formation of chromatin loops affect the accessibility of cis-regulatory components of transcriptional initiation (enhancers, silencers, insulators) to trans-acting factors (activators, repressors) in both positive and negative manner [41, 42]. In vertebrates, TAD variation has been implicated with developmental innovations [43, 44].
With the observed genome-wide extreme hypomethylation in in the transgressive hyper-salinity super-tolerant FL510, we investigated its potential higher-order consequences by modeling the chromatin configuration with respect to TAD distribution across the twelve chromosomes. We examined the TAD profiles using normalized Hi-C linkages with a common threshold to identify the relative saturation of domains. Results indicate that the TAD profiles of recombinants cannot be attributed directly to the contributions of parental chromatin profiles. Rather, the recombinant epigenomes exhibited various types of uniquely non-parental TAD profiles (Fig. 7A, top panel). The most notable was the extreme and distinctively uniform segmentation of TADs in the transgressive FL510. This distinctive TAD characteristic of FL510 is evident across all twelve chromosomes (Additional Figures Files 14 to 37). Despite the differences in the degree of segmentation, the overall structure of chromatin as measured by the profiles of A/B compartments were quite conserved, implying that TAD segmentation is site-specific (Fig. 7A, bottom panel).
To ensure an effective noise reduction in Hi-C-based TAD prediction, we carried out our inter-genotypic comparison through a ‘four-reference-approach’ instead of just using either IR29 or Pokkali genome as direct references. We used IR29 genome to detect TADs in IR29 and in the other five genotypes, i.e., IR8-parent. In parallel, we used Pokkali to detect TADs in Pokkali and in the other five genotypes, i.e., N22-parent. Moreover, we assembled working references for each of the four recombinants relative to IR29, i.e., IR29-native and Pokkali, i.e., Pokkali-native, and then used them as references to detect TADs in individual genotypes to account for the Hi-C mapping in non-parental segments (Fig. 7B).
Results of this prediction further reiterated the most uniquely segmented nature of TAD profile in FL510, both in terms of relative number and size of TADs (Fig. 7B). More segmentation translates to smaller TADs while less segmentation translates to larger TADs. Higher segmentation of TADs brings higher possibility for uncoupling with insulators, which could mean positive effects to transcription initiation. The two sensitive recombinants FL454 and FL499 had significantly lower levels of segmentation while the tolerant recombinant FL478 is in between (Figs. 7A, 7B). To address the possibility that the extreme TAD segmentation in FL510 was a consequence of the global loss of DNA methylation, we investigated the degree of methylated and unmethylated cytosines around 500-kb windows surrounding the TAD triangles in all of CG, CHG, and CHH contexts (Fig. 7A). Results indicate that the extreme CHH-methylation in FL510 contributed to its highest degree of TAD segmentation (Fig. 7B).
Studies in vertebrates have shown that CTCF-binding sites are highly enriched in TAD borders, and differential binding of trans-acting factors to these sites are methylation-sensitive [45]. Although the CTCF ortholog in plant is not established, CTCF-binding motifs functioning as insulators are predicted to be enriched with gypsy retrotransposons in Arabidopsis [46]. In this context, we examined if the characteristic shedding of gypsy may have contributed to the extreme hypomethylation and TAD segmentation of the FL510 epigenome. Figure 7D shows the propensity for gypsy elements to be enriched in the TAD borders and this trend is consistent with the unique TAD profiles. For example, TAD borders of FL510 are enriched in O. rufipogon-type gypsy from IR29 and O. nivara-type gypsy from Pokkali. TAD borders of FL499 are enriched in O. punctata-type gypsy from IR29 and O. barthii-type gypsy from Pokkali, all of which suggest a direct link of TAD formation to the shedding of retrotransposons in recombinant epigenomes.
Impact of novel TAD signatures to the transcriptome
We classified the different categories of parent-to-progeny modifications in TAD profile. We identified the most predominant TAD categories using both the indica (IR29) and aus (Pokkali) references. The major categories of TAD included: a) ‘Retained’ in which > 90% of parental TAD is identified in the same locus as in the recombinants, which is most enriched in the transgressive FL510; b) ‘Segmented’ in which one parental TAD is associated with multiple TADs in recombinants, which is most enriched in FL510; c) ‘Merged’ is the opposite of segmented and most enriched in FL499; d) ‘Lost’ is a parental TAD that is no longer traceable in recombinants, which is most enriched in FL499; and e) ‘New’ in which the inter-TAD regions in parent are associated with a new TAD in recombinants, most enriched in FL510 (Fig. 8A).
Chromatin state has profound effects to the plasticity and stochasticity of gene expression by virtue of the openness or availability of transcriptional landing pads to trans-acting factors influencing enhancer-gene interactions. While rapid changes in gene activity may contribute to adaptation to various conditions, instability due to hyper-induction could compromise the modulation and economy of steady-state transcription. Under hyper-induction, a significant proportion of the transcriptome is dedicated towards non-coding RNA for post-transcriptional and translational regulation, which is critical to the maintenance of steady-state and homeostatic conditions [47–49].
To understand the impact of TAD modification to the relative stability of transcript abundances in the recombinants, we first identified the gene loci under each TAD category (Additional Files 38, 39). We examined the temporal fluctuation in transcript abundances from one time-point to the next across a series as a measure of relative stability of expression (stochasticity) during a period of 144 hours before and during exposure to hyper-salinity stress (0, 24, 48, 72, 144 hr). FL510 along with FL499 had the most stable and robust transcription profiles (i.e., least fluctuation from one time-point to the next, hence lowest stochastic variability). This is shown by the low relative coefficient of variation for temporal transcript abundance fluctuation for a specific TAD category (Fig. 8B).
Venn diagrams show the shared and unique genes with low stochasticity across recombinants for each TAD category (Fig. 8C). FL510 and FL499 had the highest number of genes with uniquely low stochastic variability. FL510 exhibited the highest proportion of genes with uniquely low stochastic variability (i.e., 76–78% of stable genes are unique to FL510) for the ‘Segmented’ TAD category, which represents its most defining chromatin state at 21% overall enrichment relative to the 16%, 3%, and 3% enrichment in its siblings FL478, FL499, and FL454, respectively (Fig. 8A, 8B).
The other TAD categories that were most predominant in FL510 include the ‘Retained’ and ‘New’ categories at 7% and 4%, respectively (Fig. 8A). The total number of uniquely stable genes was also highest in FL510 for these TAD categories at 76–78% and 44–65%, respectively (Fig. 8C). In contrast to FL510, the ‘Merged’ and ‘Lost’ categories defined FL499 the most at 56% and 7% enrichment, respectively (Fig. 8A). FL499 is the second genotype with the least stochastic variability after FL510 (Fig. 8B). Consistent with these trends, the ‘Merged’ and ‘Lost’ TADs also had the highest percentages of stable genes that are unique to FL499 at 54–55% and 44–74%, respectively (Fig. 8C). Integrated analysis of TAD profile and transcriptome datasets showed a strong positive correlation between TAD enrichment across categories, low stochastic variability of gene expression, and genotype-dependence of the number of uniquely stable genes under the TAD.
Physiological implications of TAD-effects to the transcriptome
Robust gene expression associated by the signature TAD modification in FL510 are generally enriched with functions associated with the regulation of stress-related and growth and development-related responses (Fig. 8D; Additional File 39). These robustly expressed genes are consistent with the observed phenotypic novelty of FL510 [28, 32]. For instance, gene ontologies associated with the regulation of cell cycle, mitosis, and cell wall organization, and the establishment of building blocks for secondary cell walls are significantly enriched in the genomic regions where the signature segmented TAD of FL510 chromatin are also most enriched. Examples include attributes related to plant architecture such as cyclin-related genes (GO:0000079 and GO:0016592), expansins (GO:0009664) and xyloglucan fucosyltransferases (GO:0008107). It appears that the stable expression of these genes under hyper-salinity stress are important contributors to the novel ideotype of FL510, which imparted its novel stress-adaptive morphology and resilience of growth and development under prolonged periods of hyper-salinity stress [28, 32]. The characteristic segmented TAD of FL510 is also enriched with gene ontologies associated with stomatal regulation (GO:0010119; plectin-related genes). We have reported earlier that low stomatal conductance was a defining characteristic of FL510 thereby allowing its ability to efficiently regulate water loss and transpiration [32].
Gene ontologies associated with processes involved in the mitigation of cellular ionic toxicity effects are also significantly enriched in the genomic regions under the segmented TADs of FL510 chromatin (Fig. 8D). These include regulatory genes involved in calcium signaling (GO:0005516; calmodulins), ethylene-mediated stress signaling (GO:0051740), and stress-related kinases (GO:0048544; S-domain receptor kinases). Furthermore, the enrichment of genes associated with urea transport (GO:0071705 and GO:0015204) are also relevant to cellular damage repair mechanisms by virtue of their direct roles in polyamine and proline biosynthesis and maintenance of foliar chlorophyll stability [50, 51]. As expected, genes involved in potassium transport (GO:0015079, high-affinity K+ transporters) are also among those that were enriched in the segmented TAD regions of FL510 chromatin. Taken together, the enrichment of relevant gene ontologies under the characteristic segmented TAD of FL510 chromatin are consistent with the ideotype and physiological novelties of FL510 as previously reported [28, 32, 33].
The transgressive super-sensitive FL499 also exhibited a drastic TAD modification relative to its parents with significant enrichment for the ‘Merged’ category, which is the opposite of the signature ‘Segmented’ TAD of FL510. Genomic regions associated with the ‘Merged’ TADs of FL499 were enriched with gene ontologies associated with trehalose biosynthesis (GO:0005992), response to DNA damage (GO:0006974), response to photo-oxidative damage (GO:0080183), and genes associated with cellular dehydration response including dehydrins (GO:0009627 and GO:0009415). Genes involved in Na+ toxicity response such as K+ transporters (GO:0006813; shaker/shaker-like K+ channels; total of 13 genes) were also enriched (Fig. 8D). While the expression of these genes may imply significance to defense, components related to growth and development that define the adaptive morphology of FL510 appeared to be uncoupled to the defense response components in FL499.