eccDNAs are derived from diverse region across the rice genome
Rice (Oryza sativa ssp. japonica cv. Nipponbare) shoot tissues were collected from eight treatments with two biological replicates each to isolate eccDNAs. Optimal nutrient growth stages were set for 1 (Ctrl_D1), 3 (Ctrl_D3), 7 (Ctrl_D7), and 14 days (Ctrl_D14). Low N treatments were set for 3 (LN_D3) and 7 days (LN_D7), and low P treatments for 7 (LP_D7) and 14 days (LP_D14) (Supplementary Fig. 1a-b; see METHODS). After removing contaminant chromosomal DNA, extracted eccDNAs were purified and amplified by random Rolling Circle Amplification (rRCA) to generate high-quality libraries for Oxford Nanopore long-read sequencing (Supplementary Fig. 1c, see METHODS). Arbitrarily primed-alike PCR (apLike PCR) to confirm the total removal of linear DNA digestion and EcoR1 digestion of rRCA products to confirm the isolation and amplification of eccDNAs (Supplementary Fig. 2a-b). Samples of circular DNA lacking linear DNA were used forOxford Nanopore sequencing libraries. Nanopore sequencing data was quality controlled (Supplementary Data 1) and analyzed as described in the METHODS section and Supplementary Fig. 3a. After evaluating the bioinformatic tools CIDER-Seq224, ECCsplorer25, ecc_caller26 and ecc_finder27 for eccDNA detection, we found that ecc_finder provides the best bioinformatic framework for the precise identification of eccDNAs from long-read Nanopore sequencing data. Long reads facilitate eccDNA characterization compared to Illumina short-read sequencing, which would need substantial improvement to identify eccDNAs from repeated loci of large genomes (Supplementary Fig. 3b).
Analysis with ecc_finder showed that the size distribution eccDNAs ranged from 200 bp to 37 kbp, with all eccDNAs having a minimum size of 200 bp due to ecc_finder's identification criteria (Fig. 1a; Supplementary Fig. 3b). Although a few eccDNAs were over 37 kb, the average size across all the treatments was approximately 500 bp. This size distribution was similar to that observed in other organisms, such as human cells11, Arabidopsis15, and the short-read sequencing analysis from rice17 (Supplementary Fig. 4a). It is worth mentioning that there were no significant statistical differences in size distribution between the treatments (Supplementary Fig. 4b). Based on chromosomal distribution (Fig. 1b), eccDNAs were widely distributed across the rice genome, with the highest eccDNAs density in pericentromeric and centromeric regions (Fig. 1c). Furthermore, these high-density eccDNA regions also overlap with areas of high transposon elements (TEs) and DNA methylation density.
To assess the location of eccDNAs in the genome, we merged eccDNAs with overlaps of over 200 bp from each sample and assigned them a single ID number (Supplementary Fig. 5a). From the initial 140619 eccDNAs, we obtained 96757 merged eccDNAs with unique IDs (Supplementary Fig. 5b). This ID-eccDNAs were located into different genomic features, namely genes (5'UTR, CDS, 3'UTR), 2 kb upstream of the genes (up2kb), 2 kb downstream of the genes (down2kb) and intergenic regions. 30% of ID-eccDNAs overlapped gene regions (ecGenes) (Fig. 1d), of which 267 covered full-length genes. 15.8% ID-eccDNAs mapped to up2kb, 8.8% to down2kb, and 45.1% to intergenic regions. A more detailed analysis of genes showed that 40.1% overlapped to CDS regions (Fig. 1g), 18.9% to 3'UTRs, and 8% to 5'UTRs.
When mapping to transposon elements or non-coding RNAs, 51.2% ID-eccDNAs mapped to TEs (ecTEs) (Fig. 1e) and 40.3% to non-coding RNAs (ecNon-codingRNAs) (Fig. 1f). Although most eccDNAs mapped to single genomic features, 3504 originated from regions encompassing genes, TEs, and non-coding RNAs (Fig. 1h) and 5668 to regions covering up2kb, TEs, and non-coding RNAs (Fig. 1i). Additionally, we identified 2843 eccDNAs covering a region spanning from the down2kb, TEs and non-coding RNA regions (Fig. 1j).
All the data in Fig. 1 demonstrate that the rice eccDNAs landscape is widely spread across the rice genome, with ecTEs mainly originating from centromeric regions.
Gene-overlapped eccDNAs show dynamic changes during rice growth
To explore the dynamic changes of eccDNA production during rice growth, we investigated gene-overlapped eccDNAs (ecGenes) at three stages of rice development in optimal growth conditions: short-term (Ctrl_D1 vs. Ctrl_D3), mid-term (Ctrl_D1 vs. Ctrl_D7) and long-term (Ctrl_D1 vs. Ctrl_D14). After performing GMPR normalization and EMDomics differential analysis, we identified ecGenes as differentially enriched if their q-value < 0.05 (see METHODS). We further classified differential ecGenes as exclusive if they were present in both biological replicates of one treatment and absent in the two replicates of the other and as differential if the number of reads for one ecGene was higher in one treatment than the other. We classified differential and exclusive ecGenes based on analysis of Gene Ontology (GO) enrichment categories and summarized using REVIGO (see METHODS). When comparing GO results of differential + exclusive ecGenes with exclusive ecGenes alone, we found no substantial differences in the total counts or the GO enrichment categories (Supplementary Fig. 6; Supplementary Data 2), reflecting that exclusive ecGenes were the most statistically significant.
In the short-term optimal growth stage, we identified 863 exclusive ecGenes, of which 230 were unique to Ctrl_D1 and 633 to Ctrl_D3 (Fig. 2a). We performed GO enrichment of exclusive ecGenes from both samples to capture eccDNAs with positive and negative effects on specific processes. GO enrichment analysis on these 863 ecGenes (Fig. 2d) revealed enriched categories included reproductive system development (2.86-fold) and developmental process involved in reproduction (2.83-fold), as well as categories associated with metabolic processes, among which N compound metabolic process was significantly represented with 177 enriched ecGenes (Supplementary Data 3.2). Notably, there is significant enrichment in the regulation of short-day photoperiodism/flowering category, exhibiting the highest fold enrichment value of 20.19 (Supplementary Data 3.1). This category includes ecGenes mapping to OsCO3, OsMADS51, and SDG718, all of which play important roles in regulating short-day flowering in rice28,29,30.
In the mid-term optimal growth phase, we identified 331 exclusive ecGenes, with 210 unique to Ctrl_D1 and 121 unique to Ctrl_D7 (Fig. 2b). GO enrichment analysis of these 331 ecGenes revealed four enriched categories (Fig. 2d). Cellular process category, was observed in both the short-term and mid-term growth phases. The lysyl-tRNA aminoacylation category had the highest fold enrichment value (> 100), which may indicate the role of eccDNAs in regulating the mechanisms of transferring activated amino acids to the 3'-OH group of lysine-accepting tRNA in rice (Supplementary Data 4). It is worth mentioning that unlike the developmental pathways enriched in the short-term growth phase, the flower development category was uniquely identified in the mid-term phase, suggesting that ecGenes dynamics respond to specific developmental processes along initial growth.
In the long-term optimal growth stage, we identified 404 exclusive ecGenes, with 182 unique to Ctrl_D1 and 222 to Ctrl_D14 (Fig. 2c). GO enrichment analysis of these 404 exclusive ecGenes (Fig. 2d) revealed that the N compound metabolic process category was enriched at this stage. However, it had fewer enriched ecGenes (81) than in the short-term growth stage (177). The DNA damage response pathway had the highest fold enrichment value (4.44), suggesting a potential link between the mechanisms by which eccDNAs originate and DNA damage during long-term growth phase31.
The number of exclusive ecGenes in different growth stages showed a significant fluctuation, decreasing sharply from the short-term to mid-term growth stage but then increasing from the mid-term to the long-term (Fig. 2a-c). Compared to the short-term growth stage, the number of significantly enriched pathways was less in the mid-term and long-term growth. However, specific pathways showed a higher fold enrichment value in the longer growth phases. Among the GO enrichment on cellular components, the cellular anatomical entity category was identified in all growth stages with similar fold enrichment values (1.31 in short-term growth, 1.36 in both mid-term and long-term growth) (Supplementary Fig. 7b). Moreover, in the long-term growth stage, transferase complex was observed as the category with the highest fold enrichment (3). Thus, our GO results showed that ecGenes are enriched specifically in different growth stages, suggesting a role of ecGenes on rice development.
We validated one ecGene detected in all growth stages by inverse PCR and SANGER sequencing (SANGERseq) in both 7-day and 14-day control samples (Fig. 2f-g). This ecGene originated from the Os06g0651100 locus region and covered most of the first intron (Fig. 2e).
Gene-overlapped eccDNAs respond to N stress
N deficiency is the most prevalent environmental stress affecting plants in natural ecosystems32. According to our previous research33, N concentration in shoot tissue usually starts to decrease rapidly on the third day after LN treatment, and a stronger transcriptional response to N starvation is activated after seven days of treatment. To investigate the role of eccDNAs under N limitation, we conducted low nitrogen (LN) treatments on rice plants in a short treatment of 3 days (Ctrl_D3 vs. LN_D3) and a long-term of 7 days (Ctrl_D7 vs. LN_D7). As for the analysis of developmental stages, no statistical differences in GO-enriched categories were found when differential eccDNAS were added to exclusive ecGenes for LN treatments (Supplementary Fig. 8; Supplementary Data 5), suggesting that de novo formation rather than an increase in the abundance of ecGenes differentiate growth stages or responses to stress treatment.
During short-term LN treatment, we identified 2424 exclusive ecGenes, among which 639 were only detected in Ctrl_D3, and 1785 were specific to LN_D3 (Fig. 3a). GO enrichment analysis of these ecGenes showed that N compounds metabolic process category was significantly enriched with 443 ecGenes (Supplementary Data 6.1; Fig. 3c). When compared to the same category during short-term optimal growth, the count of enriched ecGenes increased from 177 in optimal growth to 443 in LN treatment. Furthermore, among these 443 enriched ecGenes, we highlight several with known functions related specifically to N stress (Supplementary Data 6.1). Interestingly, we also identified the N compound transport category (Supplementary Data 6.2). These GO enrichment results indicate that in response to LN stress, ecGenes abundance shifted from a potential developmental role to a more specific N starvation response. In addition to pathways related to N stress, we also found enriched categories in the phosphate-containing compound metabolic process, which is related to phosphorus metabolism and “response to stimulus” and “response to stress”, which suggested a relationship between ecGenes and a general response to abiotic stress.
In the long-term LN treatment, we identified 1033 exclusive ecGenes, among which 101 were specific to Ctrl_D7, and 932 for LN_D7 (Fig. 3b). GO enrichment analysis of these 1033 exclusive ecGenes showed that N compound transport category was still enriched (Fig. 3d; Supplementary Data 7), suggesting the roles of ecGenes in the long-term N stress. Interestingly, the chloroplast protein-transporting ATPase activity category was extremely enriched in molecular function with the highest fold enrichment value of 45.34 (Supplementary Fig. 9a). This indicated that ecGenes were involved in chloroplast-related molecular functions to respond to the relative long-term N stress.
Compared to the short-term LN treatment, the number of exclusive ecGenes in long-term samples decreased significantly. However, the proportion of exclusive ecGenes detected only in LN_D7 (932/1033) is much higher than in LN_D3 (1785/2424). Even though fewer categories from biological process were enriched during the long-term N treatment, the N compound transport was still statistically enriched, revealing that specific ecGenes are produced in response to LN along different growth phases.
We validated one ecGene that shows a significant induction in the long-term LN treatment, overlapping with OsNPF2.4 (Os03g0687000) (Fig. 3e), known of whose expression level in old leaves is induced by N starvation and plays an important role in low-affinity nitrate acquisition and long-distance transport34. Inverse PCR and SANGERseq analyses showed that this ecGene is clearly detected in 7-day LN sample and had the same sequence as that originally obtained from our eccDNA sequencing (Fig. 3f).
Gene-overlapped eccDNAs respond to P stress
Besides N deficiency, plants commonly face P deficiency in most soils due to low P availability35. Considering the lower demand of P than N at early tillering stages, rice takes more days to respond to low phosphorus (LP) treatment at the global transcription and phenotypic levels36. According to transcriptome analysis, the initial significant response of major P starvation-related genes in rice was observed 7 days after treatment36. Thus, the short-term LP treatment was set at 7 days (Ctrl_D7 vs. LP_D7), and the long-term LP treatment at 14 days (Ctrl_D14 vs. LP_D14).
In the short-term LP treatment, we identified 556 exclusive ecGenes, among which 91 were specific to Ctrl_D7 and 465 to LP_D7 (Fig. 4a). GO enrichment analysis (Fig. 4c) identified three highly enriched categories among those categorized as biological process: tissue development (fold enrichment value = 5.87), meristem maintenance (fold enrichment value = 8.52), and meristem development (fold enrichment value = 10.6). These categories highlight the potential role of ecGenes in the plant´s developmental response to P deficiency, which is known to alter root and shoot architecture36. Additionally, we identified the N compound metabolic process category, encompassing 100 ecGenes (Supplementary Data 9.1). Interestingly, despite the absence of P-related categories in biological process, we highlight the monoatomic cation transmembrane transporter activity category, enriched in molecular function (Supplementary Fig. 11a). This finding suggests that eccDNAs may play a role in modulating membrane transporter activity in response to the short-term LP treatment (Supplementary Data 9.2).
In the long-term LP treatment, we identified 348 exclusive ecGenes, of which 206 were only detected in Ctrl_D14 and 142 in LP_D14 (Fig. 4b). GO enrichment analysis (Fig. 4c) identified again the N compound metabolic process category, encompassing 72 ecGenes (Supplementary Data 10.1). We identified the response to stimulus category as the general response to abiotic stress. Worth noticing is the highest fold enrichment for the regulation of macroautophagy category with a value over 100 (Supplementary Data 10.2). This category suggests an important role of genes in modulating the frequency, rate, or extent of macroautophagy during long-term LP treatment. Within the enrichment of cellular component, we identified a high fold enrichment (43.97) for phosphatidylinositol 3-kinase complex (Supplementary Fig. 11b). This indicates that during long-term LP treatment, ecGenes actively participate in modulating the expression of the phosphatidylinositol 3-kinase (PI3K) complex (Supplementary Data 10.3).
Comparing long-term LP treatment to short-term treatments, we found a decrease in the count and proportion of exclusive ecGenes detected only in LP samples. However, the number of categories in GO enrichment of biological process and cellular component did not change significantly. Most categories related to metabolic process were enriched in both short-term and long-term LP treatment. However, specific pathways were specifically enriched for each of the two low Pi treatments (Fig. 4c). Generally, the variation and abundance of ecGenes in response to LP treatment were not as pronounced as in LN, regardless of duration of the P-starvation treatments. Nevertheless, there is still a clear and specific ecGene response to limited P supply in rice. Moreover, based on the dynamics observed in GO enrichment results, we suggest that extending the duration of the stress treatment could alter the set of ecGenes produced as a response.
We performed inverse PCR, SANGER sequencing, and KBseq (based on Illumina platform) on one of the differential ecGenes identified in the long-term LP treatment spanning nearly the entire length of OsACP1 (Os01g0720400) (Fig. 4d), which has been previously reported to play a role in rice P starvation37. Our results validated the presence of this ecGene in 14-day LP sample (Fig. 4e).
Nutritional stresses lead to variation in TE-overlapped eccDNAs in rice
The relationship between eccDNAs and transposon elements (TEs) has been recently studied in several organisms, especially plants, such as Arabidopsis and Potato14,22,23. We quantitatively analyzed eccDNAs overlapped to transposable elements (ecTEs) and full-length repeat units (full-length ecRepeatUnits) across all treatments. We identified 35,485 ecTEs and 6,866 full-length ecRepeatUnits from the 16 samples spanning 8 treatments (Supplementary Fig. 12a). Among the ecTEs, 611 were common to all treatments (Supplementary Fig. 12b).
Over 70% of ecTEs were DNA transposons, with more than 60% belonging to the Zator and hAT families (Fig. 5a, b). The proportion of DNA transposons in ecTEs did not change significantly during short-term and mid-term optimal growth (Fig. 5b). However, under LN treatment, we highlight a substantial change in the ratio of DNA transposons to retrotransposons (Fig. 5b, d). The ratio of retrotransposons increased from 12% in unstressed controls to 21.27% in the short-term LN treatment and 26.72% in the long-term LN treatment. This increase was more prevalent for the Gypsy family of LTRs (Fig. 5c, d). A similar trend was also observed for the LP treatments, in which the proportion of retrotransposons among ecTCs increased from 12–17.47% in LP_D7 and 20.94% in LP_D14 (Fig. 5d). These suggest that both LN and LP treatments induce an increase in the ratio of retrotransposons in ecTEs, with greater induction in longer treatments.
Among the full-length ecRepeatUnits, 68 were common to all treatments (Supplementary Fig. 12c). Then we focused on the main DNA transposon families, especially Stowaway, which makes up about 2% of the rice genome38,39. We characterized 364 full-length ecRepeatUnits from Stowaway family. The count of Stowaway- full-length ecRepeatUnits was higher in all nutrient-stressed samples (226) than in all controls (138) (Fig. 5e-g). However, prolonged nutritional stresses decreased the count of Stowaway units (from 134 in LN_D3 to 19 in LN_D7, and from 47 in LP_D7 to 4 in LP_D14). We identified 26 out of the 36 Stowaway subfamilies in the ecRepeatUnits from all treatments (Supplementary Fig. 12d). Both LN and LP treatments affected the number of subfamilies and the average reads per subfamily. Samples at 3 days had the highest count of subfamilies (20 in Ctrl_D3, 24 in LN_D3), followed by samples at 7 days (12 in Ctrl_D7, 13 in LN_D7, 16 in LP_D7), and 14 days (5 in Ctrl_D14, 3 in LP_D14), revealing a negative correlation between prolonged treatment or developmental stage with the number of Stowaway subfamilies (Fig. 5e-g).
In addition to Stowaway, we identified another DNA transposon family, Kiddo, which appeared exclusively in both LN and LP treatment samples at 3 and 7 days (Fig. 5e-g). Previous studies suggest Kiddo might still be active in rice genome40. These results highlight the dynamic nature of full-length ecRepeatUnits linked to DNA transposon families in response to nutritional stresses and treatment duration, particularly emphasizing the prevalence and potential activity of the Stowaway and Kiddo elements in eccDNAs.
Identification and validation of multiple-fragment eccDNAs in rice
Single fragment is typically the predominant eccDNAs class found in all organisms examined to date11 and in this study (Fig. 6a). However, a fraction of eccDNAs have sequences derived from different genome regions, either with two or more non-contiguous fragments from the same chromosome or two or more sequences from different chromosomes. Here, we named this kind of eccDNAs multiple-fragment eccDNAs (MF-eccDNAs) as previously defined11. To complement the comprehensive understanding of rice eccDNAs (Supplementary Fig. 3a, c), eccDNA_RCA_Nanopore41, a tool reported with good performance in MF-eccDNA detection in human cells11, was used for the identification of rice MF-eccDNAs. eccDNA_RCA_Nanopore was specifically designed to detect cross-chromosomal or far-distance junctional reads from long-read sequencing, unlike other tools41. When comparing ecGenes identified by ecc_finder and eccDNA_RCA_Nanopore, 91.4% were detected by both pipelines, which validated the reliability of our eccDNA identification. Based on eccDNA_RNA_Nanopore analysis, we observed that 91% of eccDNAs consisted of only one fragment (Nfragment = 1) (Fig. 6a), while the remaining 9% were composed of two to ten non-contiguous DNA regions (Nfragment ≥ 2).
We mapped all MF-eccDNA regions to the rice genome (Fig. 6a and see METHODS) and grouped the results by core gene IDs and the number of fragments (Nfragment). The origin of sequences present in MF-eccDNA was widespread in the rice genome. One notable example is a set of MF-eccDNAs ranging from 3 to 5 fragments containing the common core gene ID Os04g034305 (cytochrome c oxidase subunit 2). This set of MF-eccDNAs had a high load in LN_D7 (Fig. 6c-d, h). The 5-fragment eccDNAs of this set were validated by inverse PCR and Whole Plasmid Sequencing for both 7-day LN and 14-day LP samples (Fig. 6f-g).
We also identified another set of MF-eccDNAs with a putative cytochrome P450 as the core gene (Os05g0372300), composed of 2 to 7 fragments, with 5-fragment being the most common variant (Fig. 6e), which was more prevalent in both LN_D7 and LP_D14 (Fig. 6c). It is also important to note that we did not identify MF-eccDNAs with Os04g0343050 in any Ctrl_D3 or Ctrl_D7. Similarly, MF-eccDNAs with Os05g0372300 were absent in our Ctrl_D1 sample. Interestingly, these two sets of MF-eccDNAs always had an LTR element on one side and a DNA transposon element on the other (Supplementary Data 11.1, 11.2), suggesting possible mechanisms for their origin based on the function of TEs in rice.
On the other hand, two sets of MF-eccDNAs with Os01g0791033 or Os12g0423313 as core genes appeared in all samples, predominantly composed of 2-fragment (Supplementary Fig. 13a-c). As shown in the IGV snapshots (Supplementary Fig. 13f-g), some of the two fragments were next to each other and surrounded by several LTR elements from the same subfamilies (Supplementary Data 11.3, 11.4), suggesting an LTR-related homologous recombination (HR) mechanism in MF-eccDNA formation13 (Supplementary Fig. 13d-e). An increased count of these MF-eccDNAs during longer growth (Supplementary Fig. 13a; Fig. 6c) suggests that an extended optimal growth phase leads to the accumulation of specific MF-eccDNAs. WE also observed an increase in the number of reads of these two MF-eccDNA sets in LN_D3 and LN_D7 (Supplementary Fig. 13a; Fig. 6c), being much higher in LN_D7. While not as pronounced as in LN stress, we observed that MF-eccDNAs repertoire and abundance are modulated by P status in rice (Fig. 6c). These results reveal nutrient stress leads to the accumulation of MF-eccDNAs, highlighting the dynamic rice genomic responses to nutritional signals.
ATAC-seq effectively validates high-density regions of rice eccDNAs
To further validate the high eccDNA density regions in rice, we performed the Transposase-Accessible Chromatin with high-throughput sequencing (ATACseq) on Ctrl_D7, LN_D7 and LP_D7. As reported in other organisms, libraries prepared for classic ATACseq may also cover circular DNA molecules42. However, due to the lack of enrichment for circular DNA molecules, ATACseq only allows the detection high-density regions of eccDNAs. Although ATACseq has been previously utilized for eccDNA detection in animal cells43,44, its application in plants has not been explored.
After applying the short-read mode from ecc_finder on ATAC-seq data, 14 regions were identified in our analysis (Fig. 7a). These 14 regions were characterized as high-density regions of single-fragment eccDNAs (ecc_finder) or MF-eccDNAs (eccDNA_RCA_Nanopore). Several mapped inside or near centromeres, which have been according to our results are high-density eccDNA regions in rice (Fig. 6i; Fig. 1c). Seven regions did not exhibit significant differences between treatments (Fig. 6j). However, the read number of eccDNAs such as those overlapping with chr01:39110652–39128615 or chr11:11895489–11900602, increased substantially in LN_D7 when compared to Ctrl_D7 or LP_D7 (Fig. 7b).
Since Kumar et al.42 and Kang et al.44 reported the potential of ATACseq to identify eccDNAs, this technique has been validated for human cancer cell lines and rice in our work. Thus, this innovative approach provides valuable insights into eccDNA dynamics in plants and opens new avenues for future research.