Identification of novel phasiRNA biogenesis pathways in Oryza sativa
The sRNA HTS datasets from different rice samples were used as input, with rice cDNA sequences as the alignment reference, to search for PHAS loci capable of producing 21-nt or 24-nt phasiRNAs. Fourteen 21-nt and nineteen 24-nt candidate PHAS loci passed the filtering procedures and the subsequent search for sRNA triggers of phasiRNA production (Additional file 1: Table S1, Additional file 2: Table S2). Recent reports showed that in rice the processing of 21-nt phasiRNAs depends mainly on OsDCL4, whereas OsDCL3 is necessary for the biogenesis of 24-nt phasiRNAs [20]. Therefore, we evaluated the abundance of 21-nt and 24-nt phasiRNAs generated from the candidate PHAS loci by comparing the wild type with the osdcl4 knockdown mutant (osdcl4-1) [26] for 21-nt phasiRNAs and with the osdcl3 knockdown mutant (osdcl3-1) [20] for 24-nt phasiRNAs.
As a result, five novel 21-nt PHAS loci and five novel 24-nt PHAS loci were identified, along with their corresponding miRNA/sRNA triggers. Table 1 lists, for each PHAS locus, the gene ID, the miRNA/sRNA trigger sequence, the trigger binding site, and the trigger-mediated cleavage site identified from degradome-based HTS data (see details in “methods”). The cleavage signatures detected by degradome sequencing and the abundance of phasiRNAs generated from each phasing register of the corresponding PHAS loci are profiled in Figure 1 and Figure 2. The abundance of phasiRNAs generated from the newly found 21-nt and 24-nt PHAS loci was higher in the wild type than in the osdcl4-1 and osdcl3-1 mutants, respectively (Additional file 3: Figure S1 and Additional file 4: Figure S2).
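The wild-type versus mutant comparison described above can be illustrated with a minimal sketch. It assumes read counts are normalized to reads per million (RPM) before comparison and that a pseudocount guards against division by zero; neither choice is stated in the text, and the function names and example counts are hypothetical.

```python
def rpm(count: int, library_size: int) -> float:
    """Normalize a raw read count to reads per million mapped reads."""
    return count * 1_000_000 / library_size


def wt_to_mutant_ratio(wt_count: int, wt_library: int,
                       mut_count: int, mut_library: int,
                       pseudocount: float = 1.0) -> float:
    """Ratio of normalized phasiRNA abundance, wild type over mutant.

    The pseudocount (an assumption of this sketch) keeps the ratio finite
    when no reads are detected in the mutant library.
    """
    wt = rpm(wt_count, wt_library) + pseudocount
    mut = rpm(mut_count, mut_library) + pseudocount
    return wt / mut


# Hypothetical counts for one PHAS locus, not taken from the study.
ratio = wt_to_mutant_ratio(wt_count=50, wt_library=5_000_000,
                           mut_count=8, mut_library=4_000_000)
print(f"wild type / mutant abundance ratio: {ratio:.2f}")
```

A ratio above 1 corresponds to the pattern reported for the novel loci, namely reduced phasiRNA abundance when the relevant Dicer-like protein is knocked down.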
Previously, we found that many sRNAs are generated in three-week-old seedlings. Here, we used our mining method to test whether these sRNAs are phasiRNAs. As expected, phasiRNAs generated from two novel 21-nt PHAS loci (LOC_Os01g57968.1 and LOC_Os05g43650.1, both annotated as expressed proteins) and four novel 24-nt PHAS loci (LOC_Os02g20200.1, a retrotransposon protein; LOC_Os02g55550.1, F-box/LRR-repeat protein 14; LOC_Os04g45834.2, a DUF584 domain containing protein; and LOC_Os09g14490.1, a TIR-NBS type disease resistance protein) were identified in three-week-old seedling tissues (Table 1).
Because the triggering of phasiRNA production is sometimes tissue specific and stress dependent, a series of sRNA HTS datasets from different rice samples was used to mine novel PHAS loci in different tissues and stress conditions. As shown in Figure 1, transcripts of the 21-nt PHAS loci LOC_Os02g18750.1 and LOC_Os04g25740.1 (both annotated as expressed proteins) produced 21-nt phasiRNAs in panicle under normal culture conditions. LOC_Os06g30680.1 (WD domain, G-beta repeat domain containing protein)-derived 21-nt phasiRNAs and LOC_Os01g37325.1 (retrotransposon protein)-derived 24-nt phasiRNAs were detected in panicle under both normal and drought stress conditions.
Two known 21-nt PHAS loci, LOC_Os12g42380.1 (expressed protein) and LOC_Os12g42390.1 (hypothetical protein), which have been identified as two parts of a long non-coding RNA (lncRNA) [27], were also recovered by our screening procedure (Additional file 1: Table S1). LOC_Os12g42380.1-derived phasiRNAs were detected in both seedling and panicle under normal, drought and salinity stress conditions, whereas in shoot they were detected only under salinity stress. LOC_Os12g42390.1-derived phasiRNAs were detected in shoot under normal culture conditions and in panicle under drought conditions. These results imply that there are three alternative phasiRNA production regions within the lncRNA PHAS loci and that their capacity for phasiRNA production varies across developmental stages and stress conditions.
Notably, to our knowledge, among all these newly found PHAS loci only the biogenesis of LOC_Os04g25740.1-derived phasiRNAs is triggered by a known miRNA, miR2118f. The remaining phasiRNAs were generated from PHAS loci reported here for the first time and were triggered by novel sRNAs (Table 1), suggesting that these phasiRNA biogenesis pathways do not belong to the miR2118- or miR2275-mediated regulatory networks.
Table 1 Novel PHAS loci in Oryza sativa
| Gene ID of the PHAS loci | Gene annotation | PhasiRNA production region | sRNA trigger ID | sRNA trigger sequence | Binding sites of sRNA trigger on gene of PHAS loci | Cleavage sites discovered by degradome on gene of PHAS loci |
| 21-nt PHAS loci |
| LOC_Os01g57968.1 | expressed protein | 361-1765 | OSsRNA-1 | GCUUUUUUGAACUUUUUCAUU | 424-444 | 435 |
| LOC_Os02g18750.1 | expressed protein | 188-920 | OSsRNA-2 | UUUUUUGGCAUUCUGUAACUUG | 176-197 | 188 |
| LOC_Os04g25740.1 | expressed protein | 1908-2159 | osa-miR2118f | UUCCUGAUGCCUCCCAUUCCUA | 1875-1896 | 1887 |
| LOC_Os05g43650.1 | expressed protein | 1494-1620 | OSsRNA-3 | GAUUCAUUAACUUCAAUAUGAA | 1528-1549 | 1540 |
| LOC_Os06g30680.1 | WD domain, G-beta repeat domain containing protein | 62-208 | OSsRNA-4 | UUCCUGGAGCCGCUCAUUCCAU | 50-71 | 62 |
| 24-nt PHAS loci |
| LOC_Os01g37325.1 | retrotransposon protein | 1565-1760 | OSsRNA-14 | AAAAGUAGAUGGAUGCGGAGAC | 1676-1697 | 1688 |
| LOC_Os02g20200.1 | retrotransposon protein | 4856-5052 | OSsRNA-15 | UAGAUGCUGUCCUGAAAAGGUG | 4873-4894 | 4885 |
| | | | OSsRNA-16 | AGCCAUGCUAGUCUAAGAGGG | 5007-5027 | 5018 |
| LOC_Os02g55550.1 | F-box/LRR-repeat protein 14 | 905-1101 | OSsRNA-17 | UAGAUGCUGUCCUGAAAAGGUG | 922-943 | 934 |
| LOC_Os04g45834.2 | DUF584 domain containing protein | 1051-1307 | OSsRNA-18/OSsRNA-19 | UUAAUAUUUAUAAUUAGUGUCU/UUAAUAUUUAUAAUUAAUGUCC | 1103-1124 | 1115 |
| LOC_Os09g14490.1 | TIR-NBS type disease resistance protein | 4585-4757 | OSsRNA-20 | UAGAUGCUGUCCUGAAAAGGUG | 4578-4599 | 4590 |
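The binding-site and cleavage-site columns of Table 1 can be cross-checked with a short sketch. It assumes the binding-site coordinates run 5' to 3' along the PHAS transcript, so that the 5' end of the trigger pairs with the binding-site end and position 10 of the trigger pairs with the nucleotide nine positions upstream of that end. The variable and function names are illustrative, not part of the original pipeline.

```python
# (PHAS locus, trigger, binding start, binding end, cleavage site), from Table 1
TABLE1 = [
    ("LOC_Os01g57968.1", "OSsRNA-1", 424, 444, 435),
    ("LOC_Os02g18750.1", "OSsRNA-2", 176, 197, 188),
    ("LOC_Os04g25740.1", "osa-miR2118f", 1875, 1896, 1887),
    ("LOC_Os05g43650.1", "OSsRNA-3", 1528, 1549, 1540),
    ("LOC_Os06g30680.1", "OSsRNA-4", 50, 71, 62),
    ("LOC_Os01g37325.1", "OSsRNA-14", 1676, 1697, 1688),
    ("LOC_Os02g20200.1", "OSsRNA-15", 4873, 4894, 4885),
    ("LOC_Os02g20200.1", "OSsRNA-16", 5007, 5027, 5018),
    ("LOC_Os02g55550.1", "OSsRNA-17", 922, 943, 934),
    ("LOC_Os04g45834.2", "OSsRNA-18/OSsRNA-19", 1103, 1124, 1115),
    ("LOC_Os09g14490.1", "OSsRNA-20", 4578, 4599, 4590),
]


def paired_trigger_position(binding_end: int, cleavage_site: int) -> int:
    """Trigger position (1-based, from its 5' end) paired with the cleavage site.

    Trigger position k pairs with transcript coordinate binding_end - (k - 1).
    """
    return binding_end - cleavage_site + 1


for locus, trigger, start, end, cut in TABLE1:
    span = end - start + 1  # length of the binding site in nucleotides
    pos = paired_trigger_position(end, cut)
    print(f"{locus}\t{trigger}\tsite length {span}\ttrigger position {pos}")
```

For every row the reported cleavage site pairs with position 10 of the trigger, whether the binding site spans 21 or 22 nucleotides, which is the position expected for small RNA-guided slicing between positions 10 and 11.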
Analysis of the regulatory function of novel phasiRNAs generated from 21-nt PHAS loci
The tasiRNAs are 21-nt phasiRNAs that regulate target genes in trans by cleaving their mRNAs in plants. To identify novel tasiRNAs generated from the newly found 21-nt PHAS loci, all 21-nt phasiRNAs were systematically “predicted” based on the modified model of tasiRNA biogenesis [28]. All detectable phasiRNAs were then used for target prediction with the miRU algorithm, and the predictions were verified with degradome-based HTS data (see details in “methods”). The results indicated that ten novel tasiRNAs were generated from three newly found 21-nt PHAS loci (LOC_Os02g18750.1, LOC_Os05g43650.1 and LOC_Os06g30680.1). These tasiRNAs mediated forty sRNA-target interactions (Table 2, Figure 3, Additional file 5: Figure S3). Among these targets, LOC_Os02g39380.1 plays important roles in plant cellular signaling cascades [29]; LOC_Os01g34620.8, LOC_Os02g52900.2, LOC_Os04g39600.1, LOC_Os08g40440.1, LOC_Os06g23274.1, LOC_Os06g47850.1, LOC_Os11g41860.1, LOC_Os11g41860.2 and LOC_Os05g46580.1 are involved in plant growth and development [30-35]; and LOC_Os09g12230.1, LOC_Os04g38450.1 and LOC_Os04g49160.1 are related to plant defense and stress response [36-38].
Although the transcript of LOC_Os12g42380.1 has been identified as part of an lncRNA phasiRNA precursor [27], one novel LOC_Os12g42380.1-derived tasiRNA was found based on our revised tasiRNA biogenesis model [28]. LOC_Os12g42380.1(414)21 5'D7(+) targets a NAD-dependent epimerase/dehydratase gene (LOC_Os07g47700.1) (Table 2, Additional file 5: Figure S3), suggesting that it might be involved in the regulation of plant growth, development and environmental stress responses [39, 40].
Taken together, these results suggest that the OSsRNA-2-LOC_Os02g18750.1-phasiRNA, OSsRNA-3-LOC_Os05g43650.1-phasiRNA, OSsRNA-4-LOC_Os06g30680.1-phasiRNA and OSsRNA-5-LOC_Os12g42380.1-phasiRNA pathways might play crucial regulatory roles in rice growth, development and stress response. In addition, regulatory networks of these phasiRNA pathways were constructed based on the target information (Figure 4).
Table 2 Targets of novel tasiRNAs in Oryza sativa
| tasiRNA ID | tasiRNA sequence | Targets | Target annotation | miRU start-ending | tasiRNA-mediated cut sites |
| LOC_Os02g18750.1(189)21 3'D26(+) | UGUGCCACGUCAACACCACCA | LOC_Os03g40260.1 | Regulator of chromosome condensation domain containing protein | 1676-1696 | 1687 |
| LOC_Os02g18750.1(192)21 3'D25(+) | GCGCCACUGCCGUCGACGUGU | LOC_Os02g39380.1 | OsCML17 - Calmodulin-related calcium sensor protein | 343-363 | 354 |
| LOC_Os02g18750.1(204)21 3'D13(+) | UCGACUUCGCCGCCUCGGCGC | LOC_Os02g39090.1 | expressed protein | 802-823 | 814 |
| LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os01g15520.1 | expressed protein | 1248-1268 | 1259 |
| | | LOC_Os01g34620.8 | OsGrx_S15.1 - glutaredoxin subgroup II | 500-520 | 511 |
| | | LOC_Os03g50070.1 | DUF1295 domain containing protein | 1195-1215 | 1206 |
| | | LOC_Os04g38450.1 | gamma-glutamyltranspeptidase 1 precursor | 2137-2157 | 2148 |
| | | LOC_Os04g49160.1 | zinc finger, C3HC4 type domain containing protein | 1093-1113 | 1104 |
| | | LOC_Os05g03574.1 | expressed protein | 648-668 | 659 |
| | | LOC_Os06g23274.1 | zinc finger, C3HC4 type, domain containing protein | 4632-4652 | 4643 |
| | | LOC_Os06g47850.1 | zinc finger family protein | 97-117 | 108 |
| | | LOC_Os08g19114.1 | expressed protein | 2050-2070 | 2061 |
| | | LOC_Os08g40440.1 | dihydroflavonol-4-reductase | 1315-1335 | 1326 |
| | | LOC_Os09g12230.1 | ubiquitin-conjugating enzyme | 1021-1041 | 1032 |
| | | LOC_Os09g27500.1 | cytochrome P450 | 1714-1734 | 1725 |
| | | LOC_Os11g41860.1 | OsFBX429 - F-box domain containing protein | 1030-1050 | 1041 |
| | | LOC_Os11g41860.2 | OsFBX429 - F-box domain containing protein | 973-993 | 984 |
| | | LOC_Os12g12950.1 | expressed protein | 1071-1091 | 1082 |
| LOC_Os05g43650.1(1540)21 3'D2(-) | UUUUCCACAUUCAUAUUGAUG | LOC_Os02g45650.1 | peptidase | 1760-1780 | 1771 |
| LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os02g05810.1 | expressed protein | 1330-1350 | 1341 |
| | | LOC_Os02g05810.2 | expressed protein | 1324-1344 | 1335 |
| | | LOC_Os02g52900.2 | glutaredoxin 2 | 2034-2054 | 2045 |
| | | LOC_Os02g53000.2 | lysM domain-containing GPI-anchored protein precursor | 1340-1360 | 1351 |
| | | LOC_Os04g44590.1 | expressed protein | 651-671 | 662 |
| | | LOC_Os04g44590.5 | expressed protein | 445-465 | 456 |
| | | LOC_Os05g41190.1 | expressed protein | 1026-1046 | 1037 |
| | | LOC_Os05g41190.2 | expressed protein | 1082-1102 | 1093 |
| | | LOC_Os05g51140.1 | expressed protein | 929-949 | 940 |
| | | LOC_Os05g51140.2 | expressed protein | 1586-1606 | 1597 |
| | | LOC_Os09g33930.1 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1457-1477 | 1468 |
| | | LOC_Os09g33930.2 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1454-1474 | 1465 |
| | | LOC_Os09g33930.3 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1740-1760 | 1751 |
| | | LOC_Os09g33930.4 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1453-1473 | 1464 |
| | | LOC_Os09g33930.5 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1375-1395 | 1386 |
| | | LOC_Os12g37510.1 | UDP-glucoronosyl and UDP-glucosyl transferase domain containing | 1584-1604 | 1595 |
| LOC_Os05g43650.1(1543)21 3'D2(-) | GCAUUUUCCACAUUCAUAUUG | LOC_Os02g48390.1 | phosphoribosyl transferase | 1758-1778 | 1769 |
| LOC_Os05g43650.1(1543)21 3'D3(-) | UUCACAAUGUAAGUCAUUUUA | LOC_Os04g39600.1 | fasciclin domain containing protein | 1020-1040 | 1031 |
| | | LOC_Os07g01130.1 | pentatricopeptide containing protein | 4240-4260 | 4251 |
| LOC_Os05g43650.1(1543)21 3'D1(+) | AUGAAUCUAGACAUAUAUAUC | LOC_Os12g40920.1 | bZIP transcription factor domain containing protein | 1312-1332 | 1323 |
| LOC_Os06g30680.1(62)21 3'D2(+) | CAUGGACAACUUCCUGCACAG | LOC_Os05g46580.1 | polyprenyl synthetase | 1365-1385 | 1376 |
| LOC_Os12g42380.1(414)21 5'D7(+) | UUUCUUCCAAGAGAGAGUAAG | LOC_Os07g47700.1 | NAD dependent epimerase/dehydratase family domain containing protein | 1753-1773 | 1764 |
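The totals quoted in the text (ten novel tasiRNAs and forty sRNA-target interactions from the three newly found 21-nt PHAS loci) can be re-derived from Table 2. The sketch below simply tallies the target rows listed under each tasiRNA; the dictionary and function names are illustrative.

```python
# Number of target rows listed under each tasiRNA in Table 2
TARGETS_PER_TASIRNA = {
    "LOC_Os02g18750.1(189)21 3'D26(+)": 1,
    "LOC_Os02g18750.1(192)21 3'D25(+)": 1,
    "LOC_Os02g18750.1(204)21 3'D13(+)": 1,
    "LOC_Os05g43650.1(1540)21 3'D2(+)": 15,
    "LOC_Os05g43650.1(1540)21 3'D2(-)": 1,
    "LOC_Os05g43650.1(1542)21 3'D1(+)": 16,
    "LOC_Os05g43650.1(1543)21 3'D2(-)": 1,
    "LOC_Os05g43650.1(1543)21 3'D3(-)": 2,
    "LOC_Os05g43650.1(1543)21 3'D1(+)": 1,
    "LOC_Os06g30680.1(62)21 3'D2(+)": 1,
    "LOC_Os12g42380.1(414)21 5'D7(+)": 1,  # from the previously known locus
}

NOVEL_LOCI = ("LOC_Os02g18750.1", "LOC_Os05g43650.1", "LOC_Os06g30680.1")


def summarize(counts: dict, loci: tuple) -> tuple:
    """Return (number of tasiRNAs, number of interactions) for the given loci."""
    # The PHAS locus is the part of the tasiRNA ID before the first "(".
    selected = {k: v for k, v in counts.items() if k.split("(")[0] in loci}
    return len(selected), sum(selected.values())


n_tasirnas, n_interactions = summarize(TARGETS_PER_TASIRNA, NOVEL_LOCI)
print(n_tasirnas, n_interactions)
```

The remaining row of Table 2 is the single interaction mediated by the LOC_Os12g42380.1-derived tasiRNA, which is discussed separately in the text.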
Analysis of the RNA-directed DNA methylation (RdDM)-regulated promoters of novel 24-nt phasiRNAs
RdDM is an important regulatory event that establishes repressive epigenetic modifications and triggers transcriptional gene silencing. To analyze novel 24-nt phasiRNA-mediated RdDM in rice, we scanned all known promoter sequences for target sites of the phasiRNAs generated from the five newly found 24-nt PHAS loci. The results indicated that the promoter of LOC_Os02g40860.1 was targeted by five LOC_Os01g37325.1-derived phasiRNAs (Table 3). Because LOC_Os01g37325.1-derived phasiRNAs were detected in panicle but not in root tissue (Figure 2), we used bisulfite-seq and RNA-seq datasets [41] from rice panicle and root to examine, respectively, the DNA methylation mediated by LOC_Os01g37325.1-derived phasiRNAs on the target promoter and its role in transcriptional repression of the target gene (LOC_Os02g40860.1). CG and CHG methylation are reported to be maintained by DNA methyltransferases and histone modifications, whereas CHH methylation is associated with 24-nt siRNA-guided RdDM [16]. We found that CHH methylation of the promoter was higher in panicle than in root (Figure 5). In addition, the expression level of LOC_Os02g40860.1 was lower in panicle than in root. These results imply methylation-mediated transcriptional silencing of the LOC_Os02g40860.1 promoter.
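The three methylation contexts mentioned above are defined by the nucleotides that follow a cytosine, where H stands for A, C or T. The following minimal sketch classifies forward-strand cytosines and computes a pooled methylation level for one context; the function names and the example read counts are hypothetical, and cytosines on the reverse strand would have to be classified on the reverse complement.

```python
from typing import Iterable, Optional, Tuple


def cytosine_context(seq: str, i: int) -> Optional[str]:
    """Return 'CG', 'CHG' or 'CHH' for the cytosine at index i, else None.

    None is returned when seq[i] is not a cytosine, when the sequence ends
    before the context can be resolved, or when an ambiguous base is met.
    """
    seq = seq.upper()
    if seq[i] != "C" or i + 1 >= len(seq):
        return None
    if seq[i + 1] == "G":
        return "CG"
    if seq[i + 1] not in "ACT" or i + 2 >= len(seq):
        return None
    if seq[i + 2] == "G":
        return "CHG"
    return "CHH" if seq[i + 2] in "ACT" else None


def methylation_level(sites: Iterable[Tuple[str, int, int]],
                      context: str = "CHH") -> float:
    """Pooled methylation level: methylated reads / total reads in a context.

    Each site is (context, methylated_reads, total_reads).
    """
    sites = list(sites)
    methylated = sum(m for c, m, _ in sites if c == context)
    total = sum(t for c, _, t in sites if c == context)
    return methylated / total if total else 0.0


# Hypothetical bisulfite-seq calls for a promoter, not data from the study.
example_sites = [("CHH", 3, 10), ("CG", 9, 10), ("CHH", 1, 10)]
print(cytosine_context("ACGTCAGCTTC", 7), methylation_level(example_sites))
```

Comparing the pooled CHH level of the same promoter between two tissues is one simple way to express the panicle versus root difference described in the text.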
LOC_Os02g40860.1 encodes casein kinase I1 (OsCKI1), a member of the CKI protein family. CKIs are highly conserved in eukaryotes and are involved in a variety of important biological events, since they have a wide substrate specificity in vitro [42]. Taken together, we speculate that the OSsRNA-14-LOC_Os01g37325.1-phasiRNA pathway might play crucial roles in rice seedling and panicle development.
Table 3 The target promoter of LOC_Os01g37325.1-derived phasiRNAs
| 24-nt phasiRNAs_ID | PhasiRNAs_sequences | Binding_sites_on_promoter | Promoter_location | Target_genes | Target annotation |
| LOC_Os01g37325.1(1684)24 5'D12(+) | AUCAUGACUUGGGUAUUACGUUUC | 111-134 | chr2_24766608-24766807 | LOC_Os02g40860.1 | Casein kinase I1 (CKI1) |
| LOC_Os01g37325.1(1684)24 5'D10(+) | AGUCCUGGUUUGAUAAGAUUGUAA | 63-86 | | | |
| LOC_Os01g37325.1(1684)24 5'D9(+) | AGUAGAUUUAGGAAACCGAUACCG | 39-62 | | | |
| LOC_Os01g37325.1(1665)24 5'D13(+) | ACUAGUUAUAGGGGAUAACUUAUA | 154-177 | | | |
| LOC_Os01g37325.1(1665)24 5'D11(+) | GACUUGGGUAUUACGUUUCCCUGU | 106-129 | | | |