Genome-wide identification of NATs in C. sativa
We sequenced a total of nine samples from three different tissue types using an Illumina Hiseq 4000 platform. In total, 120.31 Gb raw data were produced. The raw data size of each sample ranged from 78.43 Gb to 99.97 Gb, respectively (Table S2). After filtering, a total of 118.39 Gb clean reads were obtained, ranging from 76.89 Gb to 98.47 Gb for each sample.
Then, we optimized a pipeline to predict NATs in C. sativa (Fig. 1), which included three modules. Firstly, in step 1, the clean reads of nine ssRNA-seq libraries were mapped to the C. sativa reference genome using Bowtie2 and StringTie. The mapping rate ranged from 41.4–68.6%. Then, these sequences were assembled into 26,446 to 34,536 transcripts, respectively. Lastly, we merged the transcripts identified from the nine libraries, resulting in 64,272 transcripts.
In step 2, 295 cis- and 1,431 trans- candidate NATs were identified from the merged 64,272 transcripts.
In step 3, these candidate NATs were further evaluated by the following criteria to identify non-coding RNAs from these candidate NATs. These criteria included: the sequence length > 150 bp, length of encoded peptide fragments < 100 amino acids, and the protein-coding potential < 0. A total of 92 cis-NATs and 168 trans-NATs were obtained after this step.
Characterization of NATs in C. Sativa
We found the NATs and their corresponding STs exhibited multiple mapping ratios. In other words, one NAT could match one or more STs and vice versa. In total, 897 SAT pairs were obtained. 92 cis-NATs and 86 STs formed 92 cis-NAT/ST pairs, and 168 trans-NATs and 588 STs formed 805 trans-NAT-ST pairs (Table S3).
We further analyzed the length distribution of cis-NATs and trans-NATs. As shown in Fig. 2A, the length of 92 cis-NATs ranged from191 to 5,069 bp, with an average length of 1,245 bp. In contrast, the length of 168 trans-NATs ranged from 201 to 3,482 bp, with an average length of 999 bp. The most abundant group of cis-NATs and trans-NATs had lengths ranging from 200 to 300 bp, followed by those with more than 1000 bp long.
Next, we analyzed the ratio of the length for overlap regions between the NATs and STs to the total length of NATs and STs. As shown in Fig. 2B, more than half of the NAT sequences were in the overlapping area, and the average ratio of the total length of NATs in the overlapping area sequence was 51.07%. The average ratio of the total length of the STs is 45.37%, indicating the high homology between sequences of the NATs and the STs.
According to the mapping region between cis-NATs and STs, the cis-SAT pairs can be divided into four subtypes (Fig. 2C): divergent, convergent, S > N (ST contains NAT), and N > S (NAT contains ST). Results showed that S > N accounted for the main type, including 35 pairs, which was in accord with previous studies in Arabidopsis thaliana [1], Oryza sativa [13], and Salvia miltiorrhiza [12]. The second most abundant type was divergent, having 32 pairs. For the rest, there were 21 pairs of the convergent and four pairs of N > S, respectively.
Based on expression profiles of NATs in leaves, roots, and stems, the distributions of NATs among different tissues were also analyzed (Fig. 2D). Results showed that 364 (40.6%) NATs were common in the three tissues; 406 (45.3%) NATs, (0.9%) NATs only existed in the roots and stems, respectively.
Functional enrichment analysis of C. sativa NATs
It is generally believed that NATs could play roles by regulating their corresponding STs. To decipher the potential biological functions of the NATs C. sativa, we carried out GO and KEGG functional enrichment analysis for the STs related to the 260 NATs. Results showed that these STs were annotated with GO terms of three categories: biological processes, cell composition, and molecular functions, of which 17 Go terms were significantly enriched (q ≤ 0.05) (Table 1). GO Terms of STs were significantly enriched in the growth and developmental processes for multiple plants, including the auxin metabolic process (GO: 0009850), positive regulation of post-embryonic development (GO: 0048582), lysine biosynthetic process via diaminopimelate (GO: 0009089), the heterocycle catabolic process (GO: 0046700) and defense response to a bacterium (GO: 0042742). It should be noted that GO: 0042742 included 7 STs, ST048, ST072, ST084, ST266, ST345, ST370, and ST598. In addition, STs were also significantly enriched in enzyme activity-related GO Terms, including transferase activity (GO: 0016740), diaminopimelate decarboxylase activity (GO: 0008836), and phosphatidylinositol N-acetylglucosaminyltransferase activity (GO: 0017176), indicating the regulatory effect on the activity of enzymes and the involvement in the biosynthesis of compounds in C. sativa.
Table 1
Functional enrichment analysis of STs corresponding to NATs in C. sativa
Category ID | Subcategory | Term | Number of STs in the category | q value |
GO:0009850 | Biological Processes | auxin metabolic process | 3 | 7.21E-3 |
GO:0042742 | Biological Processes | defense response to bacterium | 7 | 9.53E-3 |
GO:0048582 | Biological Processes | positive regulation of post-embryonic development | 3 | 0.014 |
GO:0009089 | Biological Processes | lysine biosynthetic process via diaminopimelate | 3 | 0.030 |
GO:0046700 | Biological Processes | heterocycle catabolic process | 4 | 0.036 |
GO:0006888 | Biological Processes | ER to Golgi vesicle-mediated transport | 3 | 0.040 |
GO:0006417 | Biological Processes | regulation of translation | 3 | 0.048 |
GO:0016021 | Cellular Components | integral component of membrane | 9 | 2.78E-3 |
GO:0005730 | Cellular Components | nucleolus | 7 | 0.034 |
GO:0005737 | Cellular Components | cytoplasm | 34 | 0.045 |
GO:0003964 | Molecular Functions | RNA-directed DNA polymerase activity | 8 | 8.0E-11 |
GO:0004521 | Molecular Functions | endoribonuclease activity | 9 | 2.58E-8 |
GO:0003676 | Molecular Functions | nucleic acid binding | 13 | 1.6E-4 |
GO:0016740 | Molecular Functions | transferase activity | 8 | 9.4E-4 |
GO:0008836 | Molecular Functions | diaminopimelate decarboxylase activity | 2 | 0.011 |
GO:0017176 | Molecular Functions | phosphatidylinositol N-acetylglucosaminyltransferase activity | 2 | 0.027 |
GO:0015145 | Molecular Functions | monosaccharide transmembrane transporter activity | 2 | 0.038 |
ath00300 | KEGG_Pathway | Lysine biosynthesis | 3 | 0.040 |
ath00563 | KEGG_Pathway | The glycosylphosphatidylinositol (GPI)-anchor biosynthesis | 2 | 0.049 |
KEGG enrichment analysis showed that STs were significantly enriched in the pathways: Lysine biosynthesis (ath00300) and Glycosylphosphatidylinositol (GPI)-anchor biosynthesis (ath00563) (Table 1). As an important signaling molecule, lysine plays a central role in regulating plant growth and responses to the environment (Galili et al., 2001). The glycosylphosphatidylinositol (GPI) is a category of sugar-lipid compounds that enable proteins to be attached to cell membranes. GPI plays an essential role in the fertility of the male and female gametophytes (Desnoyer and Palanivelu, 2020). These results indicated that C. sativa NATs participated in the regulation of biosynthetic pathways of multiple compounds.
Validation of the expression profiles of cis-SAT pairs
To verify the expression profiles of cis-SAT pairs in C. sativa, we use the ssRT-qPCR approach (Kawakami et al., 2011), which could distinguish NATs from STs by adding sequence labels for reverse transcription (Kawakami et al., 2011; Tercero et al., 2019). The 25 cis-SATpairs were selected for verification based on the two following criteria. Firstly, the ST sequence was annotated by Go Terms or enriched in the KEGG pathway. Secondly, the correlation of ssRNA-seq expression profiles between NATs and their corresponding STs was significantly positively related (r ≥ 0.8). Results showed that 46 of 50 (92%) ssRT-qPCR profiles were positively related to ssRNA-Seq (r ≥ 0.8). In particular, 27 of 50 (54%) transcripts had r ≥ 0.9 (Fig. 3, 4). These results showed that the NATs found in this study were reliable and can be used for further analysis.
Significantly differentially expressed NATs among tissues
To explore the potential regulatory roles of NATs, we analyzed the expression levels of NATs in different tissues. In total, 17 DENATs were detected using the threshold of |log2(fold change) | ≥ 1 and q ≤ 0.05 (Table S4). Among them, NAT066, NAT089, NAT139, NAT154, NAT375, NAT573, NAT791, NTA842, and NAT843 expressed higher in the roots than in the leaves. The expression level of NAT897 in the root was higher than in the stem. In contrast, the expression levels of NAT623 and NAT673 in the stems were higher than in the roots. And NAT327, NAT375, NAT578, NAT724, NAT743, NAT783, and NAT893 had higher expression levels in the stems than in the leaves.
Co-expression between NATs and their corresponding STs
To explore whether NATs have a regulatory effect on STs, we further analyzed and verified the correlation between NATs and their corresponding STs expression based on the ssRT-qPCR experiments (Fig. 5). Results showed that in the 25 SAT pairs verified. The expression profiles of two NATs were significantly negatively correlated with the corresponding STs (r≤-0.9). In contrast,13 NATs were significantly positively correlated with the corresponding STs (r ≥ 0.9). The r of 5 NAT-ST pairs ranged from 0 to 0.9, indicating the positive regulation of these NATs.
Identification of miRNA targeting the overlapping regions in the SAT pairs
Previous studies showed that NATs could stabilize the STs by forming an RNA duplex to reduce the degradation of STs by miRNA (Wang et al., 2016). To explore the crosstalk between miRNA, NATs, and their corresponding STs in C. sativa, we predicted the target sites of known miRNAs in the overlapping regions of the SAT pairs were predicted using the psRNATarget server. The overlapping regions of seven SAT pairs were identified to be targeted by miRNAs (Table 2). Moreover, the relative expression levels of these seven NATs, including NAT002, NAT006, NAT020, NAT024, NAT033, NAT037, NAT061, NAT076, and their corresponding STs were significantly positively correlated in the three tissues with r ≥ 0.9. Taking it together, we proposed that these NATs might interact with their STs to form RNA duplex, inhibit miRNAs to cleavage the STs, and enhance the stability of STs.
Table 2
miRNA targeting the overlapping regions of SAT pairs
ST ID | MiRNA | Start | End | Aligned MiRNA Fragment | Alignment a | Aligned ST Fragment | Inhibition |
ST0002 | gra-miR8639a | 24 | 52 | UAAACAUAAGUAGAAUUAAACAAG | .::: :: : :.::.:::::::: | UUUGAUUUACUUUAUUUAUGUUUU | Cleavage |
ST0006 | CPA-miR8154 | 1 | 22 | CAGAGGAGGAGAUGAAGAGGGA | :.:.:::::.:::.:: .:.:: | UUCUUCUUCGUCUUCUGUUUUG | Cleavage |
ST0020 | hme-miR-14 | 1 | 22 | UCAGUCUUUUUCUCUCUCCUAU | ::::::::.:.::. ::: | GAGAGAGAGAGAGAGAGGGUGA | Cleavage |
ST0024 | gma-miR1526 | 1 | 22 | CCGGAAGAGGAAAAUUAAGCAA | .::::::::..:.::..:: | UAAUUUAAUUUUUUUUUUUUGG | Cleavage |
ST0033 | gra-miR7504i | 1 | 21 | AUAUGAAACUGAGAUACCAUG | : : ::::.::: :::::::: | CCUCGUAUUUCACUUUCAUAU | Cleavage |
ST0037 | osa-miR2919 | 1 | 19 | AAGGGGGGGGGGGGAAAGA | :::::: .:.::.:.:.:: | UCUUUCGUCUCCUCUCUUU | Cleavage |
ST0061 | mtr-miR2597 | 1 | 21 | UUUGGUACUUCGUCGAUUUGA | ::::::::.::.:::: :.: | ACAAAUCGAUGAGGUACAAGA | Cleavage |
Conservation of C. sativa NATs
To reveal the conservation of C. sativa NATs, the 260 NATs were compared with the plant lncRNA database (Jin et al., 2013), the Green Non-Coding Database (Di Marsico et al., 2022), and lncRNAs reported in hemp (Wu et al., 2021) using BLAST with e-value 1e-5 and alignment length > 50% of the total length. Results showed that one NAT (TCONS_00048050) was homologous to Tdicoccoides_TRIDC3AG050040.2 and Tdicoccoides_TRIDC3AG050040.2 in the public database (Table S5). Moreover, 15 NATs were aligned with previous reported 24 lncRNAs in C. sativa variety ‘Yunma No.1’ (Wu et al., 2021), which indicated these NATs might play similar roles among these two cultivars of C. sativa (Table S5).
Identification of nat-siRNAs
SAT pairs usually form dsRNA in vivo, which could lead to dsRNA-dependent siRNA-mediated gene silencing in some cases (Zhang et al., 2012). In this study, we found that 12 cis- and 278 trans- SAT pairs could produce siRNAs, respectively. In total, 476 cis- and 2342 trans- nat-siRNAs were identified (Table S6). We then determined the functions of these siRNA-producing STs. In total, 233 STs were annotated as being potentially involved in metabolites transfer or transport, growth and development, DNA repair, regulating gene expression, disease resistance, signal transduction, flavanone biosynthesis, etc. (Table S6). For example, some STs (RNA-XM_030625573.1) encode 2-oxoglutarate-dependent dioxygenase (DAO), which functions in the flavonoid pathway by catalyzing hydroxylation and desaturation reactions (Wang et al., 2021). Some other STs (RNA-XM_030622242.1, RNA-XM_030631242.1, RNA-XM_030646986.1) encode FAR1-related sequence proteins and play crucial roles in the growth and development of Arabidopsis (Ma and Li, 2018).
Prediction and validation of nat-siRNAs targets
Target prediction showed that a total of 9,653 transcripts were found potentially to be targeted by 2,422 nat-siRNAs (Table S7). KEGG pathway analysis exhibited that these targets were involved in multiple metabolisms, including cysteine and methionine metabolism (ko00270), purine metabolism (ko00230), and pyruvate metabolism (ko00620); genetic information processing, including ribosome (03010), spliceosome (ko03040), and protein processing in the endoplasmic reticulum (ko04141); signaling and cellular processes including MAPK signaling pathway - plant (ko04016), plant hormone signal transduction (ko04075), and plant-pathogen interaction (ko04626) (Table S8). Next, we are concerned with the CBD-, FA- and cellulose- related Nat-siRNAs. 15 Nat-siRNAs targeted genes encoding the acyl-activating enzyme, 1-deoxy-D-xylulose-5-phosphate synthase, dehydrodolichyl diphosphate synthase, or geranylgeranyl diphosphate reductase. These enzymes are involved in CBD biosynthesis. Two Nat-siRNAs targeted four acetyl-CoA carboxylase genes of the Fatty Acids biosynthetic pathway. 120 Nat-siRNAs targeted cellulose biosynthesis-related genes, including cellulose synthase A, cellulose synthase-like protein, kinesin-like protein, mitogen-activated protein kinase kinase kinase, inositol polyphosphate 5-phosphatase or UDP-galactose transporter (Table S9).
It is known that miRNA/siRNA usually cleaves target mRNA at the 9-11th nucleotide from its 5′ ends, occasionally in other site (Wu et al., 2021. Degradome sequencing, a modified approach of 5’-Rapid Amplification of cDNA Ends using high-throughput sequencing technology, is a practical approach to detecting cleaved miRNA/siRNA targets (Addo-Quaye et al., 2009). In this study, a total of 718 targets were found to be targeted by 543 nat-siRNAs based on the degradome sequencing data (Table S10).