Identification of novel phasiRNA biogenesis pathways in Oryza sativa
sRNA HTS datasets from different rice samples were used as input, with rice cDNA sequences as the alignment reference, to search for PHAS loci capable of producing 21-nt or 24-nt phasiRNAs. Fourteen 21-nt and nineteen 24-nt candidate PHAS loci passed the filtering procedures and the corresponding search for sRNA triggers of phasiRNA production (Additional file 1: Table S1, Additional file 2: Table S2). Recent reports showed that processing of 21-nt phasiRNAs mainly depends on OsDCL4, whereas OsDCL3 is required for biogenesis of 24-nt phasiRNAs in rice [20]. Therefore, we evaluated the abundance of 21-nt and 24-nt phasiRNAs generated from the candidate PHAS loci by comparing the wild type with the osdcl4 knockdown mutant (osdcl4-1) [26] for 21-nt phasiRNAs and with the osdcl3 knockdown mutant (osdcl3-1) [20] for 24-nt phasiRNAs.
As a result, five novel 21-nt PHAS loci and five novel 24-nt PHAS loci, along with their corresponding miRNA/sRNA triggers, were identified (Table 1). As shown in Figure 1 and Figure 2, the miRNA/sRNA trigger-mediated cleavages on the target PHAS loci were detected in at least one degradome sequencing dataset. Each cleavage site was close to the flank of the phasiRNA production region, as indicated by the relative abundances of phasiRNAs (middle panel), which suggests that these sites set the primary registers for the phasing process. Additionally, the abundance of phasiRNAs generated from the newly found 21-nt and 24-nt PHAS loci was higher in the wild type than in the osdcl4-1 and osdcl3-1 mutants, respectively, indicating that production of the 21-nt and 24-nt phasiRNAs is OsDCL4- and OsDCL3-dependent, respectively (Additional file 3: Figure S1 and Additional file 4: Figure S2). Taken together, these results show that the newly found PHAS loci fit the profiles of canonical phasiRNA precursors [8,20].
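The idea of a phasing register set by a cleavage site can be illustrated with a minimal sketch. It assumes 1-based transcript coordinates and that the first phased sRNA starts at the cleavage position itself; the exact offset convention is not stated in this section, and the function names are hypothetical. The example values are the cleavage site (188) reported for LOC_Os02g18750.1 in Table 1 and a 21-nt phase length.

```python
def phased_starts(register, phase_len, cycles):
    """Start positions of successive phased sRNAs downstream of the register."""
    return [register + k * phase_len for k in range(cycles)]


def in_phase(position, register, phase_len):
    """True if an sRNA starting at `position` falls on the phasing register."""
    return (position - register) % phase_len == 0


# Assumption: the register begins at the cleavage site (188, LOC_Os02g18750.1).
starts = phased_starts(188, 21, 4)
print(starts)
print(in_phase(230, 188, 21), in_phase(231, 188, 21))
```

A read whose start position is not a multiple of the phase length away from the register would be counted as out of phase under this simplified definition.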
Previously, we found many sRNAs generated in three-week-old seedling tissues (see “Methods” for the GEO accession number of the dataset and the plant culture conditions). Here, we tested whether these sRNAs are phasiRNAs by using our mining method. As expected, phasiRNAs generated from two novel 21-nt PHAS loci (LOC_Os01g57968.1 and LOC_Os05g43650.1) and four novel 24-nt PHAS loci (LOC_Os02g20200.1, LOC_Os02g55550.1, LOC_Os04g45834.2 and LOC_Os09g14490.1) were identified in three-week-old seedling tissues (Table 1).
Since the triggering of phasiRNAs is sometimes tissue-specific and stress-dependent, a series of sRNA HTS datasets from different rice samples was used to mine novel PHAS loci in different tissues and stress conditions (see “Methods” for the GEO accession numbers of the datasets and the culture and treatment conditions of the plants). As shown in Figure 1, transcripts of the 21-nt PHAS loci LOC_Os02g18750.1 and LOC_Os04g25740.1 produced 21-nt phasiRNAs in panicle under normal conditions. LOC_Os06g30680.1-derived 21-nt phasiRNAs and LOC_Os01g37325.1-derived 24-nt phasiRNAs were detected in panicle under both normal and drought stress conditions.
According to the gene annotation, LOC_Os01g57968.1, LOC_Os02g18750.1, LOC_Os04g25740.1 and LOC_Os05g43650.1 encode proteins of unknown function, and LOC_Os06g30680.1 encodes a WD domain, G-beta repeat domain containing protein. LOC_Os01g37325.1 and LOC_Os02g20200.1 encode retrotransposon proteins, LOC_Os02g55550.1 encodes an F-box/LRR-repeat protein, LOC_Os04g45834.2 encodes a protein with a DUF584 domain, and LOC_Os09g14490.1 encodes a TIR-NBS type disease resistance protein.
Taken together, the finding that these protein-coding genes act as PHAS loci in different tissues and under different stress conditions suggests that these coding sequences are regulated at the post-transcriptional level in response to growth stage and stress.
Two known 21-nt PHAS loci, LOC_Os12g42380.1 (gene ID of an expressed protein) and LOC_Os12g42390.1 (gene ID of a hypothetical protein), which have been identified as two parts of a long non-coding RNA [27], were also uncovered by our screening procedure (Additional file 1: Table S1). LOC_Os12g42380.1-derived phasiRNAs were detected in both seedling and panicle under normal, drought and salinity stress conditions, whereas in shoot they were detected only under salinity stress. LOC_Os12g42390.1-derived phasiRNAs were detected in shoot under normal conditions and in panicle under drought. These results imply that there are three alternative phasiRNA production regions within these lncRNA PHAS loci and that their capacity for phasiRNA production varies across developmental stages and stress conditions.
Notably, to our knowledge, among all these newly found PHAS loci, only the biogenesis of LOC_Os04g25740.1-derived phasiRNAs was triggered by a known miRNA, miR2118f. The remaining phasiRNAs were generated from PHAS loci discovered here for the first time and were triggered by novel sRNAs (Table 1), which suggests that these phasiRNA biogenesis pathways do not belong to the miR2118- or miR2275-mediated regulatory networks.
Table 1 Novel PHAS loci in Oryza sativa
| Gene ID of the PHAS locus | Gene annotation | PhasiRNA production region | sRNA trigger ID | sRNA trigger sequence | Binding site of sRNA trigger on PHAS gene | Cleavage site detected by degradome on PHAS gene |
|---|---|---|---|---|---|---|
| 21-nt PHAS loci | | | | | | |
| LOC_Os01g57968.1 | expressed protein | 361-1765 | OSsRNA-1 | GCUUUUUUGAACUUUUUCAUU | 424-444 | 435 |
| LOC_Os02g18750.1 | expressed protein | 188-920 | OSsRNA-2 | UUUUUUGGCAUUCUGUAACUUG | 176-197 | 188 |
| LOC_Os04g25740.1 | expressed protein | 1908-2159 | osa-miR2118f | UUCCUGAUGCCUCCCAUUCCUA | 1875-1896 | 1887 |
| LOC_Os05g43650.1 | expressed protein | 1494-1620 | OSsRNA-3 | GAUUCAUUAACUUCAAUAUGAA | 1528-1549 | 1540 |
| LOC_Os06g30680.1 | WD domain, G-beta repeat domain containing protein | 62-208 | OSsRNA-4 | UUCCUGGAGCCGCUCAUUCCAU | 50-71 | 62 |
| 24-nt PHAS loci | | | | | | |
| LOC_Os01g37325.1 | retrotransposon protein | 1565-1760 | OSsRNA-14 | AAAAGUAGAUGGAUGCGGAGAC | 1676-1697 | 1688 |
| LOC_Os02g20200.1 | retrotransposon protein | 4856-5052 | OSsRNA-15 | UAGAUGCUGUCCUGAAAAGGUG | 4873-4894 | 4885 |
| | | | OSsRNA-16 | AGCCAUGCUAGUCUAAGAGGG | 5007-5027 | 5018 |
| LOC_Os02g55550.1 | F-box/LRR-repeat protein 14 | 905-1101 | OSsRNA-17 | UAGAUGCUGUCCUGAAAAGGUG | 922-943 | 934 |
| LOC_Os04g45834.2 | DUF584 domain containing protein | 1051-1307 | OSsRNA-18/OSsRNA-19 | UUAAUAUUUAUAAUUAGUGUCU/UUAAUAUUUAUAAUUAAUGUCC | 1103-1124 | 1115 |
| LOC_Os09g14490.1 | TIR-NBS type disease resistance protein | 4585-4757 | OSsRNA-20 | UAGAUGCUGUCCUGAAAAGGUG | 4578-4599 | 4590 |
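In every row of Table 1, the reported cleavage site equals the end coordinate of the trigger binding site minus 9. The sketch below simply re-checks that arithmetic on the values transcribed from the table. Reading this offset as the tenth trigger nucleotide counted from the trigger's 5' end (assuming the trigger pairs antisense to the transcript) is our interpretation and is not stated in the table; the function name and the default offset are hypothetical.

```python
# (locus, trigger, site_start, site_end, cleavage) transcribed from Table 1
TABLE1 = [
    ("LOC_Os01g57968.1", "OSsRNA-1", 424, 444, 435),
    ("LOC_Os02g18750.1", "OSsRNA-2", 176, 197, 188),
    ("LOC_Os04g25740.1", "osa-miR2118f", 1875, 1896, 1887),
    ("LOC_Os05g43650.1", "OSsRNA-3", 1528, 1549, 1540),
    ("LOC_Os06g30680.1", "OSsRNA-4", 50, 71, 62),
    ("LOC_Os01g37325.1", "OSsRNA-14", 1676, 1697, 1688),
    ("LOC_Os02g20200.1", "OSsRNA-15", 4873, 4894, 4885),
    ("LOC_Os02g20200.1", "OSsRNA-16", 5007, 5027, 5018),
    ("LOC_Os02g55550.1", "OSsRNA-17", 922, 943, 934),
    ("LOC_Os04g45834.2", "OSsRNA-18/OSsRNA-19", 1103, 1124, 1115),
    ("LOC_Os09g14490.1", "OSsRNA-20", 4578, 4599, 4590),
]


def expected_cleavage(site_end, offset=9):
    """Cleavage coordinate implied by the binding-site end (assumed offset)."""
    return site_end - offset


# Collect rows whose reported cleavage site deviates from the assumed rule.
deviating = [row for row in TABLE1 if expected_cleavage(row[3]) != row[4]]
print(len(TABLE1), "rows checked;", len(deviating), "deviate")
```

The same offset holds for 21-nt and 22-nt triggers because it is measured from the site end rather than the site start.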
Analysis of the regulatory function of novel phasiRNAs generated from 21-nt PHAS loci
tasiRNAs are 21-nt phasiRNAs that regulate target genes in trans by guiding cleavage of their mRNAs in plants. To identify novel tasiRNAs generated from the newly found 21-nt PHAS loci, all 21-nt phasiRNAs were systematically predicted based on the modified model of tasiRNA biogenesis [28]. All detectable phasiRNAs were then used for target prediction with the miRU algorithm, and the predictions were verified with degradome-based HTS data (see “Methods” for details). The results indicated that ten novel tasiRNAs were generated from three newly found 21-nt PHAS loci (LOC_Os02g18750.1, LOC_Os05g43650.1 and LOC_Os06g30680.1). These tasiRNAs mediated forty sRNA-target interactions (Table 2, Figure 3, Additional file 5: Figure S3). Among these targets, LOC_Os02g39380.1 plays important roles in plant cellular signaling cascades [29]; LOC_Os01g34620.8, LOC_Os02g52900.2, LOC_Os04g39600.1, LOC_Os08g40440.1, LOC_Os06g23274.1, LOC_Os06g47850.1, LOC_Os11g41860.1, LOC_Os11g41860.2 and LOC_Os05g46580.1 are involved in plant growth and development [30-35]; and LOC_Os09g12230.1, LOC_Os04g38450.1 and LOC_Os04g49160.1 are related to plant defense and stress response [36-38].
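The target search can be pictured with a deliberately simplified sketch: scan a transcript for windows complementary to the tasiRNA and count mismatches. This is plain mismatch counting, not the miRU scoring scheme used in the study, and it ignores G:U pairs, gaps and position-specific penalties. The function names and the toy transcript are hypothetical; only the tasiRNA sequence is taken from Table 2.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string written with A, C, G, U."""
    return rna.translate(COMPLEMENT)[::-1]


def find_sites(srna, transcript, max_mismatches=0):
    """Return (start, end, mismatches) for complementary sites, 1-based."""
    # Normalise both inputs to upper-case RNA so cDNA (T) input also works.
    probe = reverse_complement(srna.upper().replace("T", "U"))
    target = transcript.upper().replace("T", "U")
    n = len(probe)
    hits = []
    for i in range(len(target) - n + 1):
        mismatches = sum(a != b for a, b in zip(target[i:i + n], probe))
        if mismatches <= max_mismatches:
            hits.append((i + 1, i + n, mismatches))
    return hits


tasirna = "UGUGCCACGUCAACACCACCA"  # first tasiRNA sequence listed in Table 2
# Synthetic transcript built for illustration; not a rice sequence.
toy_transcript = "AAAAA" + reverse_complement(tasirna) + "CCCCC"
print(find_sites(tasirna, toy_transcript))
```

Candidate sites found this way would still need support from degradome reads at the expected cut position before being treated as targets.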
Although the transcript of LOC_Os12g42380.1 has been identified as part of an lncRNA phasiRNA precursor [27], one novel LOC_Os12g42380.1-derived tasiRNA was found based on our revised tasiRNA biogenesis model [28]. LOC_Os12g42380.1(414)21 5'D7(+) targets a NAD-dependent epimerase/dehydratase gene (LOC_Os07g47700.1) (Table 2, Additional file 5: Figure S3), suggesting that it might be involved in the regulation of plant growth, development and environmental stress responses [39, 40].
Taken together, these results suggest that the OSsRNA-2-LOC_Os02g18750.1-phasiRNA, OSsRNA-3-LOC_Os05g43650.1-phasiRNA, OSsRNA-4-LOC_Os06g30680.1-phasiRNA and OSsRNA-5-LOC_Os12g42380.1-phasiRNA pathways might play crucial regulatory roles in rice growth, development and stress response. In addition, regulatory networks of these phasiRNA pathways were constructed based on the target information (Figure 4).
Table 2 Targets of novel tasiRNAs in Oryza sativa
| tasiRNA ID | tasiRNA sequence | Target | Target annotation | miRU start-end | tasiRNA-mediated cut site |
|---|---|---|---|---|---|
| LOC_Os02g18750.1(189)21 3'D26(+) | UGUGCCACGUCAACACCACCA | LOC_Os03g40260.1 | Regulator of chromosome condensation domain containing protein | 1676-1696 | 1687 |
| LOC_Os02g18750.1(192)21 3'D25(+) | GCGCCACUGCCGUCGACGUGU | LOC_Os02g39380.1 | OsCML17 - Calmodulin-related calcium sensor protein | 343-363 | 354 |
| LOC_Os02g18750.1(204)21 3'D13(+) | UCGACUUCGCCGCCUCGGCGC | LOC_Os02g39090.1 | expressed protein | 802-823 | 814 |
| LOC_Os05g43650.1(1540)21 3'D2(+) | UCAAUAUGAAUGUGGAAAAUG | LOC_Os01g15520.1 | expressed protein | 1248-1268 | 1259 |
| | | LOC_Os01g34620.8 | OsGrx_S15.1 - glutaredoxin subgroup II | 500-520 | 511 |
| | | LOC_Os03g50070.1 | DUF1295 domain containing protein | 1195-1215 | 1206 |
| | | LOC_Os04g38450.1 | gamma-glutamyltranspeptidase 1 precursor | 2137-2157 | 2148 |
| | | LOC_Os04g49160.1 | zinc finger, C3HC4 type domain containing protein | 1093-1113 | 1104 |
| | | LOC_Os05g03574.1 | expressed protein | 648-668 | 659 |
| | | LOC_Os06g23274.1 | zinc finger, C3HC4 type, domain containing protein | 4632-4652 | 4643 |
| | | LOC_Os06g47850.1 | zinc finger family protein | 97-117 | 108 |
| | | LOC_Os08g19114.1 | expressed protein | 2050-2070 | 2061 |
| | | LOC_Os08g40440.1 | dihydroflavonol-4-reductase | 1315-1335 | 1326 |
| | | LOC_Os09g12230.1 | ubiquitin-conjugating enzyme | 1021-1041 | 1032 |
| | | LOC_Os09g27500.1 | cytochrome P450 | 1714-1734 | 1725 |
| | | LOC_Os11g41860.1 | OsFBX429 - F-box domain containing protein | 1030-1050 | 1041 |
| | | LOC_Os11g41860.2 | OsFBX429 - F-box domain containing protein | 973-993 | 984 |
| | | LOC_Os12g12950.1 | expressed protein | 1071-1091 | 1082 |
| LOC_Os05g43650.1(1540)21 3'D2(-) | UUUUCCACAUUCAUAUUGAUG | LOC_Os02g45650.1 | peptidase | 1760-1780 | 1771 |
| LOC_Os05g43650.1(1542)21 3'D1(+) | AAUGAAUCUAGACAUAUAUAU | LOC_Os02g05810.1 | expressed protein | 1330-1350 | 1341 |
| | | LOC_Os02g05810.2 | expressed protein | 1324-1344 | 1335 |
| | | LOC_Os02g52900.2 | glutaredoxin 2 | 2034-2054 | 2045 |
| | | LOC_Os02g53000.2 | lysM domain-containing GPI-anchored protein precursor | 1340-1360 | 1351 |
| | | LOC_Os04g44590.1 | expressed protein | 651-671 | 662 |
| | | LOC_Os04g44590.5 | expressed protein | 445-465 | 456 |
| | | LOC_Os05g41190.1 | expressed protein | 1026-1046 | 1037 |
| | | LOC_Os05g41190.2 | expressed protein | 1082-1102 | 1093 |
| | | LOC_Os05g51140.1 | expressed protein | 929-949 | 940 |
| | | LOC_Os05g51140.2 | expressed protein | 1586-1606 | 1597 |
| | | LOC_Os09g33930.1 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1457-1477 | 1468 |
| | | LOC_Os09g33930.2 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1454-1474 | 1465 |
| | | LOC_Os09g33930.3 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1740-1760 | 1751 |
| | | LOC_Os09g33930.4 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1453-1473 | 1464 |
| | | LOC_Os09g33930.5 | farnesyltransferase/geranylgeranyltransferase type-1 subunitalph | 1375-1395 | 1386 |
| | | LOC_Os12g37510.1 | UDP-glucoronosyl and UDP-glucosyltransferase domain containing | 1584-1604 | 1595 |
| LOC_Os05g43650.1(1543)21 3'D2(-) | GCAUUUUCCACAUUCAUAUUG | LOC_Os02g48390.1 | phosphoribosyltransferase | 1758-1778 | 1769 |
| LOC_Os05g43650.1(1543)21 3'D3(-) | UUCACAAUGUAAGUCAUUUUA | LOC_Os04g39600.1 | fasciclin domain containing protein | 1020-1040 | 1031 |
| | | LOC_Os07g01130.1 | pentatricopeptide containing protein | 4240-4260 | 4251 |
| LOC_Os05g43650.1(1543)21 3'D1(+) | AUGAAUCUAGACAUAUAUAUC | LOC_Os12g40920.1 | bZIP transcription factor domain containing protein | 1312-1332 | 1323 |
| LOC_Os06g30680.1(62)21 3'D2(+) | CAUGGACAACUUCCUGCACAG | LOC_Os05g46580.1 | polyprenylsynthetase | 1365-1385 | 1376 |
| LOC_Os12g42380.1(414)21 5'D7(+) | UUUCUUCCAAGAGAGAGUAAG | LOC_Os07g47700.1 | NAD dependent epimerase/dehydratase family domain containing protein | 1753-1773 | 1764 |
|
Analysis of the RNA-directed DNA methylation (RdDM)-regulated promoters of novel 24-nt phasiRNAs
RdDM is an important regulatory mechanism that establishes repressive epigenetic modifications and triggers transcriptional gene silencing. To analyze novel 24-nt phasiRNA-mediated RdDM in rice, we scanned all known promoter sequences for target sites of the novel phasiRNAs generated from the five newly found 24-nt PHAS loci. The results indicated that the promoter of LOC_Os02g40860.1 was targeted by five LOC_Os01g37325.1-derived phasiRNAs (Table 3). Since LOC_Os01g37325.1-derived phasiRNAs were detected in panicle rather than in root tissue (Figure 2), we used bisulfite-seq and RNA-seq datasets [41] from rice panicle and root to assess, respectively, the DNA methylation mediated by LOC_Os01g37325.1-derived phasiRNAs on the target promoter and its role in transcriptional repression of the target gene (LOC_Os02g40860.1). CG and CHG methylation are reported to be maintained by DNA methyltransferases and histone modifications, whereas CHH methylation is associated with 24-nt siRNA-guided RdDM [16]. We found that the CHH methylation level of the promoter was relatively higher in panicle than in root (Figure 5). In addition, the expression level of LOC_Os02g40860.1 was relatively lower in panicle than in root. These results imply methylation-mediated transcriptional silencing at the promoter of LOC_Os02g40860.1.
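The three methylation contexts mentioned above are defined by the bases following a cytosine (H stands for A, C or T). The sketch below classifies forward-strand cytosines and pools a CHH methylation level from per-site read counts. It is an illustration only: the sequence, the read counts and the function names are made up, the reverse strand is ignored, and this is not the pipeline used to produce Figure 5.

```python
def cytosine_context(seq, i):
    """Classify the forward-strand cytosine at 0-based index i.

    Returns "CG", "CHG" or "CHH", or None when the downstream
    bases are missing or ambiguous (e.g. N).
    """
    seq = seq.upper()
    if seq[i] != "C":
        raise ValueError("position is not a cytosine")
    nxt = seq[i + 1:i + 3]
    if nxt[:1] == "G":
        return "CG"
    if len(nxt) < 2 or nxt[0] not in "ACT":
        return None
    if nxt[1] == "G":
        return "CHG"
    if nxt[1] in "ACT":
        return "CHH"
    return None


def chh_level(seq, counts):
    """Pooled CHH level from {index: (methylated_reads, total_reads)}."""
    methylated = total = 0
    for i, (m, t) in counts.items():
        if cytosine_context(seq, i) == "CHH":
            methylated += m
            total += t
    return methylated / total if total else None


toy_promoter = "CGCAGCATCC"                        # made-up sequence
toy_counts = {0: (9, 10), 2: (5, 10), 5: (3, 10)}  # made-up read counts
print(chh_level(toy_promoter, toy_counts))
```

Computing this level separately for panicle and root reads over the same promoter interval would give the kind of tissue comparison described in the text.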
LOC_Os02g40860.1 encodes a casein kinase I1 (OsCKI1) protein, which belongs to the CKI protein family. CKIs are highly conserved in eukaryotes and are involved in a variety of important biological events, since they show wide substrate specificity in vitro [42]. Taken together, we speculate that the OSsRNA-14-LOC_Os01g37325.1-phasiRNA pathway might play crucial roles in rice seedling and panicle development.
Table 3 The target promoter of LOC_Os01g37325.1-derived phasiRNAs
| 24-nt phasiRNA ID | PhasiRNA sequence | Binding site on promoter | Promoter location | Target gene | Target annotation |
|---|---|---|---|---|---|
| LOC_Os01g37325.1(1684) 24 5'D12(+) | AUCAUGACUUGGGUAUUACGUUUC | 111-134 | chr2_24766608-24766807 | LOC_Os02g40860.1 | Casein kinase I1 (CKI1) |
| LOC_Os01g37325.1(1684) 24 5'D10(+) | AGUCCUGGUUUGAUAAGAUUGUAA | 63-86 | | | |
| LOC_Os01g37325.1(1684) 24 5'D9(+) | AGUAGAUUUAGGAAACCGAUACCG | 39-62 | | | |
| LOC_Os01g37325.1(1665) 24 5'D13(+) | ACUAGUUAUAGGGGAUAACUUAUA | 154-177 | | | |
| LOC_Os01g37325.1(1665) 24 5'D11(+) | GACUUGGGUAUUACGUUUCCCUGU | 106-129 | | | |