Chlorophytes possess the highest prion density in the functionally diverse plant prionome
Figure 1A depicts the density distribution of PrLPs across the 39 plant species used in the analysis, comprising four chlorophytes, two bryophytes (Marchantia and Physcomitrella), one gymnosperm (pine), an ancient angiosperm (Amborella), six monocots (grass family members), and 25 dicots representing 13 taxonomic families. Algae possess the highest prion densities, even at a threshold prion selection score of 25, with the exception of Micromonas pusilla CCMP, which was found to have the lowest density (0.0076) across all phyla. Overall, lower plant taxa possess higher PrLP densities, with Oryza sativa and Arabidopsis thaliana showing the greatest prion density among the higher plants.
In all, we identified 4479 PrLPs (above the threshold core score of 25), which could be classified into the ten functional categories depicted in Figure 1B. Representation was highest for the RNA binding/regulation/transcription category (22.5%), followed by DNA binding/replication/TF (17.9%) and transport (7.1%) (Supplemental Table 1). Notably, one of the top ten roles represented by PrLPs was transposon-related (Ts/RTRs) function, as can be seen in Figure 1B. These ten categories were further supported by gene ontology enrichment of plant PrLPs (Supplemental Table 2), with the ‘nucleic acid binding’ function (GO:0003676) universally enriched across different phyla (Supplemental Figure 1A). Chlorophytes were specifically enriched in various functions related to DNA binding. Further, DNA/RNA-related functions such as transcription co-regulator activity (GO:0003712) and RNA binding (GO:0003723) were found to be co-enriched in A. thaliana, P. abies and P. patens. Flowering and development were overrepresented in various plant prionomes, along with proteins involved in the regulation of reproductive processes (GO:2000241). Plant prionomes were also enriched in other functions, such as aromatic/cyclic compound metabolic processes (GO:1901362, GO:0046483, GO:0019438 and GO:0006725), in most species (Supplemental Figure 1A).
In the context of biological processes, a similar overrepresentation of DNA-related functions was seen, with DNA biosynthesis, metabolism and DNA regulation at the transcriptional level, or gene expression, being the predominant biological PrLP functions across all phyla (Supplemental Figure 1B). Nitrogen metabolism-related processes were found to be enriched in higher angiosperms (GO:0051171, GO:0034641 and GO:0044271).
Analysis of cellular location revealed enrichment of various nucleus-related roles of PrLPs (Supplemental Figure 1C). For example, PrLPs form part of the transcription factor TFIID complex (GO:0005669) in many prionomes, while RNA polymerase II holoenzyme (GO:0016591) and complex (GO:0030880) components could be associated with P. deltoides, M. domestica, B. rapa and S. italica prionomes.
The Rice Prionome: Enrichment of Ts/RTRs
We identified 228 PrLPs (encoded by 201 genes) in the rice proteome (listed in Supplemental Table 3). The well-annotated rice genome and the availability of several gene expression datasets offer a unique opportunity to delineate the precise functional roles of PrLPs and to explore their involvement in stress, or in acclimation to stress memory.
Functional classification and GO enrichment of the rice prionome revealed numerous unique functional features (Figure 2 & Supplemental Figure 2). In terms of the cellular distribution of rice PrLPs, we observed a preference for mitochondrial localisation, followed closely by nuclear and cytosol-based localisation, with instances of localisation in more than one organelle (Figure 2C & D). Nuclear rice PrLPs were associated with DNA-related roles such as transcription factors (TCP, auxin related), transcriptional activators (RSG and SWIRM domains) and flowering regulation (LEUNIG and FCA proteins), as well as the RNA-binding FUS proteins and KH domain proteins (Figure 2E; Supplemental Figure 2). In addition, transport proteins containing ANTH/ENTH domains, as well as VHS/GAT domain-containing PrLPs, were also predicted to be nuclear localised (Supplemental Table 3). In terms of biological processes, rice PrLPs were found to be enriched in regulation of flower development (GO:0009909), auxin activation pathways (GO:0009734) and G-protein coupled receptor signalling (GO:0007186) (Supplemental Figure 2). Molecular function enrichment highlighted ATP-dependent helicases (GO:0008026), lipid binding (GO:0008289) and involvement in the nutrient reservoirs of the cell (GO:0045735) (Supplemental Figure 2).
Twelve members of the rice prionome were identified as transcription factors, representing the ARF, bZIP, C3H and NF-YB families; these are explored in more detail in later sections. Comparison of the rice prionome with the ten major functional categories described above revealed an exceptionally large number of Ts/RTRs (Figure 2A & B, Supplemental Table 3). More than half (62.5%) of the identified PrLPs belong to this category (131 PrLPs). Interestingly, more than 80% of the cytosolic and mitochondrial rice prionome consists of Ts/RTRs (Figure 2F). RTRs were also detected in other plant prionomes, but to nowhere near the same extent as in rice, indicating a unique case for Oryza sativa. Among other plants, the chlorophyte D. salina (0.3%); the monocots B. distachyon (1.2%), S. italica (1.3%) and Z. mays (3.9%); and the dicots A. thaliana (3.8%), G. raimondii (1.3%) and M. domestica (3.9%) were found to harbour Ts/RTRs in their prionomes (Supplemental Table 1). We believe these numbers may not truly represent an absence of RTRs in other plants but rather a lack of sufficient or complete annotation of this important yet under-explored domain, and that future studies may reveal the existence of RTRs in other plant prionomes as well. For rice, the available annotation (RGAP version 7) has enabled a thorough investigation of the rice prionome, especially in terms of its unique enrichment for RTR/Ts domains.
Rice Prionome: Gene Expression Profiles
To gain deeper insights into the potential prion-like roles of PrLPs, we analysed the developmental and tissue-specific expression profiles of 62 rice PrLPs, as described in Methods (Figure 3). Expression levels of a stress-responsive N-rich protein and of DAG protein-2 were found to be the highest among all PrLPs across all stages of development (Supplemental Table 3). In contrast, RBD-FUS2, EXP3, EXP8 and EXP9 exhibited the lowest expression among all rice PrLPs. However, a few PrLPs showed significant variation during development, such as the RNA-binding RBD-FUS1 protein, which was more abundant at the seedling and tillering stages and significantly downregulated during the flowering stage. Further, transcript levels of ANTH/ENTH and RPA1C protein were increased at the flowering stage, while those of BRO1, RSG activator, and VHS and GAT1 were markedly downregulated during flowering. The TCP domain-containing protein, SHR transcription factor, RRM1, ZFP1 and the floral homeotic gene LEUNIG2 were upregulated in the stem elongation stage and downregulated at the heading stage. Consistent with its flowering and developmental functions, FCA was differentially expressed between vegetative and flowering stages (Figure 3A).
Figure 3B depicts rice prionome expression by tissue type. The most highly expressed group of PrLPs comprised N-rich protein and DAG protein2, while SHR, RBD-FUS2, ANTH/ENTH and EXP3/8/9 constituted the least expressed group across all tissues, as also observed across developmental stages. Differential expression of PrLPs was observed between male and female reproductive tissues, where ZFP2, ARF5, SWIRM domain, and SAC3/GANP1 showed much lower expression levels in male tissues (stamen, anther and pollen) than in female tissues (pistil, stigma and ovary). Also, bZIP TF, CDK and ARF genes, which were otherwise highly abundant in different organs, showed lower expression in pollen, whereas RPA1C showed the opposite pattern of expression. Interestingly, expression of NA-binding1 and RSG activator was upregulated in stamen. RBD-FUS1, however, showed greater expression in roots than in shoots and inflorescence. In contrast, RRM2, NA-binding1, HMA1 and HMA2 were expressed more in the inflorescence. Further, VHS and GAT2 showed the lowest expression in roots, particularly in the primary root tip, and the highest expression in the leaf blades. Importantly, root tips were particularly seen to express RBD-FUS3 and NA-binding proteins.
Overall, we noted that the plant prionome is extensively expressed in diverse tissues across developmental stages, and we then explored interaction between members of the prionome, as addressed in the next section.
Rice Prionome Interaction network highlights essential cellular processes
Figure 4 depicts the protein-protein interaction network for the rice prionome, constructed as described in Methods. Of the 201 PrLPs, 37 exhibit binary interactions with more than 1000 other members of the rice proteome, as evident from the core proteins (indicated by black circles) of the network. To functionally characterize the PrLP interactome, all 1263 interacting proteins were analyzed for pathway enrichment using the KEGG database. The enrichment data were consistent with previous observations, showing DNA and RNA binding processes to be the main biological roles of the rice PrLP interactome. In particular, functional clusters involved in ribosome and protein biogenesis, transcriptional machinery and its regulation, DNA replication/repair proteins and RNA surveillance proteins were predominant in the interactome (Figure 4). Since processing, transport and proteolysis of proteins involve a larger number of interactions, this was reflected in the interactome, with the largest cluster related to ribosome and protein biogenesis, represented by 886 proteins including 16 rice PrLPs. Similarly, flowering and development was also represented as a significant functional category in the interactome, as observed in the rice prionome dataset alone. Mitochondrial biogenesis and autophagy, the plant MAPK signaling pathway, and nucleotide and amino acid metabolism-related functions were noted as other sub-networks.
Role of PrLPs in Stress and Memory
We also investigated the response of the rice prionome to cold, heat, drought, salinity and biotic stresses, as shown in Figure 5. Assessment of the stress transcriptome map of the rice prionome identified specific PrLPs whose expression was significantly altered in response to stress conditions. For example, cold stress led to downregulation of RBD-FUS3, NA-binding1, DAG protein1, HMA1/2 and ZFP2, concomitant with upregulation of RBD-FUS1 and of N-rich protein, which was found to be the most stress-responsive PrLP. Heat stress resulted in a nine-fold decrease in transcript levels of N-rich protein, whereas drought, salinity and M. oryzae infection, like cold, resulted in its marked upregulation.
Transcripts of the PrLPs transcriptional corepressor LEUNIG1, auxin response factor ARF5, DAG protein1, VHS and GAT1, and SHR protein are specifically increased in response to heat treatment, while the expression of TCP and ANTH genes is more than 4-fold downregulated under high temperature. Further, ANTH transcript levels are also drought-inducible. Notably, only two PrLPs, namely N-rich protein and VHS and GAT protein1, appear to be salt-inducible. Biotic stress responsiveness of the rice prionome was observed only for N-rich protein and ANTH. Overall, stress profiling of the rice prionome suggested differential regulation, catering to diverse processes from development to stress.
Interestingly, we found PrLPs to be involved in abiotic stress memory and stress recovery in several plant species, as depicted in Supplemental Figure 3. Examples include cold stress in Arabidopsis [18] and hormonal stress priming in M. domestica (Supplemental Figure 3B). PrLP expression profiles in memory responses pertaining to the recovery phase have been observed in Populus spp., in response to periodic and successively increasing drought, or to a chronic phase of combined drought-heat stress followed by a one-week recovery phase (Supplemental Figure 3C). Likewise, heat stress elicited a memory response among PrLPs in Chlamydomonas and Arabidopsis (Supplemental Figure 3C & D). Interestingly, we could detect homologs for ten of these genes within the rice prionome, supporting a role for rice PrLPs in memory signals. These observations, when combined with the large number of rice PrLPs found to be affected by heat stress as mentioned in earlier sections, suggest an important role for the prionome in heat stress and memory. Notably, very recent reports of heat shock proteins being important epigenetic mediators of transient as well as trans-generational memory led us to explore cross-talk between reported epigenetic signals of memory and signals mediated by PrLPs. A full list of genes reported to be involved in plant stress or memory is provided in Supplemental Table 4.
Transcriptional regulatory inferences from gene co-expression data
Gene co-expression analysis was performed on rice PrLPs using the publicly available diurnal gene expression dataset, as described in Methods. Interestingly, the 66 PrLPs for which we found diurnal expression profiles (listed in Supplemental Table 5) included 11 of the 12 transcription factors in the rice prionome, as well as five Ts/RTRs, enabling a thorough investigation of the regulatory role of both TFs and RTRs in the prionome. The genes found to be significantly correlated with PrLPs were used to (a) identify correlated clusters of genes, if any, among PrLPs, (b) capture the intersection between PrLPs, especially the Ts/RTRs, and genes known to be involved in stress or memory, and (c) identify transcription factors defining or influencing the rice prionome for insights into the PrLP master regulatory network.
Figure 6 depicts the correlation plot of the rice prionome, highlighting the significantly correlated PrLPs (at a correlation coefficient cutoff of ±0.8 and P < 0.01). As can be seen in this figure, two clusters of PrLPs are discernible, with about 15 genes in each cluster; all 91 interactions are listed in Supplemental Table 6. The two clusters are significantly negatively correlated with each other, suggesting an antagonism in their roles/involvement, with one cluster including four TFs and two Ts/RTRs, whereas the other has one TF and one Ts/RTR. Interestingly, the first cluster has several RNA helicases and exonucleases, as well as some of the PrLPs noted in earlier sections to be among the most highly expressed (DAG proteins) and those upregulated in floral tissue, both male (RSG) and female (ARF5 and ZFP2). In contrast, the second cluster, showing negative correlation with the first one, contains the BRO1 and RSG activators, whose expression was earlier observed to be downregulated during flowering stages. Notably, this cluster also has the two LEUNIG proteins, well known to repress several floral homeotic genes in the floral meristems, as required for proper differentiation of stamen and carpel structures in the flower [19, 20]. These patterns suggest a role for PrLP-mediated regulation among flowering genes. As can be seen in the inset, the first cluster also has several genes that were observed to be downregulated in cold stress (RBD-FUS3, DAG, ZFP2) and upregulated during heat stress (ARF5, DAG1).
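The thresholding step behind a correlation plot of this kind can be illustrated with a minimal sketch. It assumes Pearson correlation (the text does not name the coefficient used) and applies only the ±0.8 cutoff; the P < 0.01 filter is deliberately omitted here. The gene names and expression values are hypothetical and are not taken from the diurnal dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlated_pairs(profiles, cutoff=0.8):
    """Return (gene_a, gene_b, r) for every gene pair with |r| >= cutoff."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if abs(r) >= cutoff:
            pairs.append((a, b, r))
    return pairs


# Hypothetical expression values over six time points (not real data).
toy_profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],  # rises with geneA
    "geneC": [6.0, 5.1, 3.9, 3.0, 2.1, 0.9],   # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],   # no clear trend
}

pairs = correlated_pairs(toy_profiles)
for gene_a, gene_b, r in pairs:
    sign = "positive" if r > 0 else "negative"
    print(f"{gene_a} - {gene_b}: r = {r:.3f} ({sign})")
```

In this toy input, geneA and geneB would fall into one positively correlated group, while geneC is anti-correlated with both, loosely mirroring the two antagonistic clusters described above; geneD passes the cutoff with none of them.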
The composition of the above two PrLP clusters in diurnal co-expression data and the pattern of distribution of their respective transcription factors, corroborated by observations from condition-specific, tissue-based and developmental gene expression profiles, motivated us to derive regulatory inferences from co-expression data, for all TFs in the rice prionome.
As stated earlier, eleven of the 12 TFs in the rice prionome were found to have diurnal expression profiles. These 11 genes showed a significant positive correlation with 100 other TFs in rice over the entire day-night cycle, as well as a significant inverse correlation (cor value < -0.8 at P < 0.01) with another 101 TFs, suggesting a master regulatory role for these 11 members of the rice prionome. To further ascertain a master regulatory role, we checked the upstream regions of all the positively and negatively correlated TFs for the presence of cis-binding elements for the eleven PrLPs. This resulted in the identification of 40 high-fidelity rice TFs that had a strong positive or negative correlation with the rice prionome, in addition to containing the respective TF binding sites in their promoter sequences; these have been added to Supplemental Table 6. These high-fidelity TFs belong to the WRKY, Dof, C2H2, Myb-related, LBD, TCP, GATA, G2-like, TriHelix, SBP, RAV and NAC families, while eight of the 40 are PrLPs themselves, further supporting internal cross-talk and diverse regulatory roles of the rice prionome.
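The two-criterion filter described above (strong co-expression plus a matching cis-element in the upstream region) can be sketched as follows. This is an illustrative simplification, assuming exact string matching of a single motif on either strand; the TF identifiers, promoter sequences and the motif are hypothetical placeholders, and the actual scanning procedure is the one described in Methods.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string of A, C, G, T."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def has_motif(promoter, motif):
    """True if the motif occurs on either strand of the promoter."""
    promoter = promoter.upper()
    return motif in promoter or reverse_complement(motif) in promoter


def high_fidelity_tfs(coexpressed, promoters, motif):
    """Keep co-expressed TFs whose upstream region carries the motif."""
    return sorted(
        tf for tf in coexpressed
        if has_motif(promoters.get(tf, ""), motif)
    )


# Hypothetical inputs for illustration only.
motif = "TGTCTC"                           # placeholder binding element
coexpressed = {"TF_1", "TF_2", "TF_3"}     # passed the correlation cutoff
promoters = {
    "TF_1": "AATTGTCTCGGA",  # motif on the forward strand
    "TF_2": "CCGAGACATTAA",  # motif on the reverse strand (GAGACA)
    "TF_3": "ACGTACGTACGT",  # no motif
    "TF_4": "TTTGTCTCAAA",   # motif present, but not co-expressed
}

kept = high_fidelity_tfs(coexpressed, promoters, motif)
print(kept)
```

Only candidates satisfying both criteria are retained, so TF_3 (co-expressed, no motif) and TF_4 (motif, not co-expressed) are both excluded.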
Master Regulatory network reveals PrLP clusters in Memory acclimation
To visualise the patterns of interaction and regulation within and between the rice prionome and sets of other rice TFs or stress or memory genes (for which we had diurnal expression profiles), we constructed a transcriptional regulatory network from the gene co-expression data, generated in a stepwise manner as described in Methods. This network enabled us to explore cross-talk between TFs and RTRs in the prionome, and the extent to which they may regulate or be controlled by other activators or repressors, especially in stress or memory acclimation.
The five Ts/RTRs in the rice prionome were found to be positively co-expressed with 91 TFs and negatively co-expressed with 77 distinct TFs. To these co-expressing TF partners, we applied the same filtering as for the 11 TFs above, retaining only high-fidelity TFs that have a known binding site on the respective Ts/RTR promoters (Supplemental Table 6). This was achieved by scanning the upstream regions of all co-expressing partners for the presence of cis-elements, leading to the retention of 22 true-positive TFs, which were added to the rice prionome co-expression data to generate a regulatory network. In the next step, the network was expanded by adding the 32 additional TFs that were identified earlier as high-fidelity co-expressing partners of the eleven TFs in the prionome. This network was then superimposed with available information on genes involved in stress or memory acclimation, by adding co-expressing partners of Ts/RTRs and TFs in the rice prionome that were (a) present in the rice stress interactome [21] or (b) implicated in memory acclimation (Supplemental Table 4). The resulting master regulatory network is depicted in Figure 7, and the corresponding annotated edge list is provided as Supplemental Table 7.
As can be seen in Figure 7, the network has 208 edges and 139 nodes, representing all 66 PrLPs, including five Ts/RTRs and 11 TFs, along with other significantly correlated TFs, as well as rice genes directly or indirectly implicated in memory and stress events. Most importantly, this network has two large disconnected components, each highlighting the role of rice prionome members as hubs for the currently known data on memory acclimation. Four top-ranking MCODE clusters have been highlighted on the network, and each is composed of distinct but tightly inter-connected PrLP genes. Interestingly, only two non-PrLP genes are hubs in these four clusters, and these are homologs of the Arabidopsis ELF3 and FORGETTER1 (FGT1) genes, which have very recently been associated with heat stress memory via prion-like domains [6] and a chromatin remodelling mechanism [3], respectively. To date, there has been no report connecting these two genes, or the two above-mentioned memory mechanisms, whereas the gene regulatory network (GRN) in Figure 7 clearly depicts how pervasively PrLPs act as bridges between the various clusters. For instance, FGT1 lies in the first PrLP cluster, where it interacts closely with RTR1, LEUNIG2, RTR3, RSG activator and RRM1 in the rice prionome, while ELF3 forms part of the third cluster in the GRN of Figure 7, interacting with PrLPs involved in clathrin assembly, NA-binding and RNA binding (RBD-FUS), implying cross-talk between the prion-mediated and epigenetic memory pathways. Other prominent hubs in the PrLP regulatory network are the transcription factors ARF5 and RF2a, as well as the Ts/RTRs CACTA and RTR4, earlier observed in the PrLP correlation plot (Figure 6), upregulated in heat stress and female flowers. The four clusters form distinct yet synergistic sub-networks of stress memory within the master regulatory GRN of the prionome, with Transposon CACTA, ARF and RBD-FUS3 having a predominantly antagonistic effect on most memory-related genes.
For example, transcription factor ARF5 is positively correlated with two rice homologs of the MSH family, which has very recently been implicated in trans-generational heat shock memory [21], while it also negatively regulates a homolog of the heat shock protein HSA32 [22, 23]. The Ts/RTR family member RTR1 is strongly correlated with a rice homolog of a Chlamydomonas gene shown to be involved in stress recovery [24], whereas RTR4 is positively correlated with the rice homolog of MSH1 and inversely correlated with HSA32, closely mimicking ARF5. Similarly, RF2a expression is strongly positively correlated with multiple rice homologs of Hsp17 and inversely correlated with the rice ERD9 gene, reported to be involved in heat stress memory in Arabidopsis [25]. In stark contrast, the transposon CACTA is negatively correlated with HSFA3 and MSH family genes, and several heat shock promoter elements, while being positively correlated with HSA32.
Overall, the gene regulatory network of PrLPs reveals a strongly inter-connected pattern of interaction between TFs, RTRs and genes involved in stress and memory processes, and identifies clusters and hubs for future investigation of cross-talk between these molecular factors.
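The summary statistics quoted for the network in Figure 7 (node and edge counts, disconnected components, hub nodes) can be derived from any edge list. The sketch below shows one way to do so with a breadth-first search, assuming an undirected graph and ranking hubs by degree alone; it does not reproduce MCODE clustering. The node names and edges are hypothetical and do not come from Supplemental Table 7.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected adjacency map built from (node_a, node_b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def connected_components(graph):
    """List of node sets, largest first, found by breadth-first search."""
    seen, components = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def top_hubs(graph, top=2):
    """Nodes ranked by degree (ties broken alphabetically)."""
    return sorted(graph, key=lambda n: (-len(graph[n]), n))[:top]


# Hypothetical edge list: P = PrLP, T = other TF, M = memory gene.
toy_edges = [
    ("P1", "T1"), ("P1", "T2"), ("P1", "M1"), ("T1", "T2"),
    ("P2", "T3"), ("P2", "M2"),
]

graph = build_graph(toy_edges)
n_nodes = len(graph)
n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
components = connected_components(graph)
hubs = top_hubs(graph)
print(n_nodes, n_edges, [len(c) for c in components], hubs)
```

In this toy graph the two PrLP nodes are the highest-degree nodes of their respective components, which is the kind of pattern the text describes when it refers to prionome members acting as hubs.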