3.1 Identification and basic information of the TCS genes in switchgrass
Using the Arabidopsis TCS proteins as query sequences, local BLASTP searches were conducted with the BLAST+ program. After eliminating redundant sequences and sequences lacking conserved domains, 87 TCS members, comprising 20 HK(L)s, 10 HPs, and 57 RRs, were identified in the switchgrass genome (Supplementary Table S2). The histidine kinase (HK) proteins range in length from 188 to 1202 amino acids (Table 1). Among them, the AHK subfamily comprises 8 members (PvHK1-8), with lengths ranging from 757 aa to 1202 aa and isoelectric points (pI) ranging from 5.93 to 8.59. Their molecular weights (MWs) vary considerably, from 83.93 kDa (PvHK2) to 133.14 kDa (PvHK6), and subcellular localization predictions place them mainly in the plasma membrane and cytoplasm. The ETR subfamily comprises 6 members (PvETR1-6), with lengths ranging from 602 aa to 769 aa and pI ranging from 6.52 to 8.05. Their MWs also vary considerably, from 67.00 kDa (PvETR3) to 85.14 kDa (PvETR2), and subcellular localization predictions place them mainly in the plasma membrane and chloroplast. The PHY subfamily comprises 6 members (PvPHYA-F), with lengths between 1038 aa and 1165 aa, except for PvPHYE (188 aa). The pI values of this subfamily range from 4.75 (PvPHYE) to 5.88. Similarly, except for PvPHYE, which has an MW of 20.80 kDa, the MWs of the other members range from 115.01 kDa to 127.65 kDa. Subcellular localization predictions place all members of this subfamily in chloroplasts, consistent with their being part of the plant's phytochrome signaling system. In addition, the 20 HK genes were unevenly distributed on 9 chromosomes (Chr01K, Chr01N, Chr03K, Chr03N, Chr05K, Chr05N, Chr08N, Chr09K, and Chr09N), while the remaining 9 chromosomes carried no HK genes.
Analysis of the physicochemical properties of the 10 HP proteins revealed that their lengths were mainly distributed between 143 aa and 196 aa, with pI ranging from 4.55 to 9.13 and MWs ranging from 16.10 kDa to 23.10 kDa. Subcellular localization predictions placed these proteins mainly in chloroplasts and nuclei, whereas PvHP9 and PvHP10 were predicted to be extracellular. The 10 PvHP genes were distributed fairly evenly on 8 chromosomes (Chr02K, Chr02N, Chr03K, Chr03N, Chr05K, Chr05N, Chr06K, and Chr06N), while the remaining 10 chromosomes carried no PvHP genes (Table 2).
Table 1
Basic information on PvHKs members
Gene name | ID | Chromosome | AA | pI | MW (kDa) | Location (bp) | Subcellular localization |
PvHK1 | Pavir.1KG503600.1 | Chr01K | 974 | 5.94 | 108.48 | 50664607–50671095 | Plasma membrane |
PvHK2 | Pavir.1NG411200.1 | Chr01N | 757 | 5.96 | 83.93 | 60372825–60379286 | Cytoplasmic |
PvHK3 | Pavir.5NG264417.1 | Chr05N | 1045 | 8.38 | 117.29 | 69006473–69014462 | Cytoplasmic |
PvHK4 | Pavir.5KG741600.2 | Chr05K | 1003 | 8.59 | 112.74 | 59582697–59588917 | Plasma membrane |
PvHK5 | Pavir.9KG032338.1 | Chr09K | 998 | 6.15 | 109.55 | 9520177–9526697 | Chloroplast |
PvHK6 | Pavir.9KG357000.1 | Chr09K | 1202 | 6.51 | 133.14 | 31317233–31328475 | Plasma membrane |
PvHK7 | Pavir.9NG129673.1 | Chr09N | 994 | 5.93 | 109.02 | 10433732–10440687 | Plasma membrane |
PvHK8 | Pavir.9NG401900.1 | Chr09N | 1201 | 6.09 | 132.61 | 39786999–39798389 | Plasma membrane |
PvETR1 | Pavir.1KG547200.1 | Chr01K | 769 | 6.52 | 85.11 | 56003616–56008119 | Chloroplast |
PvETR2 | Pavir.1NG554100.1 | Chr01N | 768 | 6.67 | 85.14 | 66091514–66095116 | Chloroplast |
PvETR3 | Pavir.3KG172400.1 | Chr03K | 602 | 8.05 | 67.00 | 10608391–10618346 | Endoplasmic reticulum |
PvETR4 | Pavir.3NG214791.1 | Chr03N | 632 | 7.03 | 70.08 | 10277643–10282647 | Plasma membrane |
PvETR5 | Pavir.9KG036200.1 | Chr09K | 634 | 6.88 | 70.40 | 10837255–10841895 | Plasma membrane |
PvETR6 | Pavir.9NG148800.1 | Chr09N | 641 | 6.88 | 71.26 | 11609919–11614364 | Plasma membrane |
PvPHYA | Pavir.8NG150600.2 | Chr08N | 1038 | 5.43 | 115.01 | 13262068–13272029 | Chloroplast |
PvPHYB | Pavir.9KG029983.1 | Chr09K | 1130 | 5.75 | 125.18 | 9387722–9393185 | Chloroplast |
PvPHYC | Pavir.9NG128200.1 | Chr09N | 1131 | 5.81 | 125.33 | 10238172–10245639 | Chloroplast |
PvPHYD | Pavir.9NG674300.1 | Chr09N | 1165 | 5.88 | 127.65 | 66732027–66739482 | Chloroplast |
PvPHYE | Pavir.9KG432932.1 | Chr09K | 188 | 4.75 | 20.80 | 57020777–57021852 | Chloroplast |
PvPHYF | Pavir.9NG097800.1 | Chr09N | 1135 | 5.85 | 126.23 | 7708806–7715067 | Chloroplast |
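The chromosome tally reported for the HK genes can be checked directly against Table 1. The following is a minimal Python sketch, assuming the gene-to-chromosome assignments are transcribed from Table 1. The variable and function names are illustrative, and the list of 18 chromosome labels (Chr01 to Chr09 in the K and N subgenomes) is an assumption based on the chromosome names used in this section, not part of the original analysis.

```python
from collections import Counter

# Gene -> chromosome assignments, transcribed from Table 1.
HK_CHROMOSOMES = {
    "PvHK1": "Chr01K", "PvHK2": "Chr01N", "PvHK3": "Chr05N", "PvHK4": "Chr05K",
    "PvHK5": "Chr09K", "PvHK6": "Chr09K", "PvHK7": "Chr09N", "PvHK8": "Chr09N",
    "PvETR1": "Chr01K", "PvETR2": "Chr01N", "PvETR3": "Chr03K",
    "PvETR4": "Chr03N", "PvETR5": "Chr09K", "PvETR6": "Chr09N",
    "PvPHYA": "Chr08N", "PvPHYB": "Chr09K", "PvPHYC": "Chr09N",
    "PvPHYD": "Chr09N", "PvPHYE": "Chr09K", "PvPHYF": "Chr09N",
}

# Assumed labels for the 18 chromosomes: Chr01..Chr09 in subgenomes K and N.
ALL_CHROMOSOMES = [f"Chr{i:02d}{sub}" for i in range(1, 10) for sub in "KN"]


def tally(gene_to_chrom):
    """Count how many genes fall on each chromosome."""
    return Counter(gene_to_chrom.values())


counts = tally(HK_CHROMOSOMES)
# Chromosomes from the assumed list that carry no HK gene.
empty = [c for c in ALL_CHROMOSOMES if c not in counts]

print(len(counts), "chromosomes carry HK genes;", len(empty), "carry none")
for chrom, n in sorted(counts.items()):
    print(chrom, n)
```

The same tally can be repeated for Tables 2 and 3 by swapping in the corresponding gene-to-chromosome mapping.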
We examined the physicochemical characteristics of the 57 response regulator (RR) proteins and noted differences among the three subfamilies (Table 3). In the type-A RR subfamily, protein length ranged from 121 to 248 amino acids (aa), corresponding to relatively small molecular weights (13.58–26.77 kDa). Most of these proteins were predicted to be located in the cytoplasm, with only a few in the nucleus. In the type-B RR subfamily, protein length ranged from 330 to 718 aa, molecular weight from 37.96 kDa to 78.60 kDa, and isoelectric point (pI) from 5.16 to 6.26. Except for PvRR38 and PvRR42, all type-B RRs were predicted to localize in the nucleus. In the PRR subfamily, protein length ranged from 134 to 789 aa. The molecular weights varied widely, with PvPRR7 and PvPRR8 at 14.53 kDa and 85.51 kDa, respectively, and pI ranged from 5.89 to 8.91. PvPRR6 and PvPRR7 were predicted to be located in the cytoplasm, while the rest were predicted to be in the nucleus (Table 3). We also visualized the chromosomal localization of all switchgrass TCS members, which gives a more intuitive representation of the position of each gene (Fig. S1); the majority of the TCS genes were situated in high-density regions close to the telomeres. The 87 members were spread unevenly among the 18 chromosomes, with the most on Chr01K, Chr01N, Chr09K, and Chr09N, and the fewest on Chr04N, where only PvRR40 was present.
Table 2
Basic information on PvHPs members
Gene name | ID | Chromosome | AA | pI | MW (kDa) | Location (bp) | Subcellular localization |
PvHP1 | Pavir.2KG217732.1 | Chr02K | 145 | 4.55 | 16.10 | 20762489–20764700 | Chloroplast |
PvHP2 | Pavir.2NG243700.1 | Chr02N | 145 | 4.55 | 16.17 | 20504633–20508784 | Chloroplast |
PvHP3 | Pavir.3KG164227.1 | Chr03K | 196 | 9.13 | 23.10 | 11996850–12001496 | Chloroplast |
PvHP4 | Pavir.3KG340100.1 | Chr03K | 152 | 6.16 | 17.63 | 18606164–18608602 | Nuclear |
PvHP5 | Pavir.3NG177203.1 | Chr03N | 151 | 8.99 | 17.81 | 12160085–12162597 | Nuclear |
PvHP6 | Pavir.3NG258154.1 | Chr03N | 152 | 5.02 | 17.39 | 19177857–19180611 | Chloroplast |
PvHP7 | Pavir.5KG546300.1 | Chr05K | 151 | 5.87 | 17.14 | 48066554–48069147 | Chloroplast |
PvHP8 | Pavir.5NG491700.1 | Chr05N | 151 | 6.59 | 17.26 | 56413903–56416272 | Chloroplast |
PvHP9 | Pavir.6KG407400.1 | Chr06K | 143 | 6.72 | 16.23 | 47708419–47712493 | Extracellular |
PvHP10 | Pavir.6NG369800.1 | Chr06N | 143 | 5.65 | 16.25 | 52302989–52306023 | Extracellular |
Table 3
Basic information on PvRRs members
Gene name | ID | Chromosome | AA | pI | MW (kDa) | Location (bp) | Subcellular localization |
A-type response regulator | | | | | |
PvRR1 | Pavir.1KG338200.1 | Chr01K | 241 | 8.95 | 26.15 | 35984744–35987607 | Endoplasmic reticulum |
PvRR2 | Pavir.1KG392600.1 | Chr01K | 144 | 8.82 | 15.35 | 42736046–42737235 | Cytoplasmic |
PvRR3 | Pavir.1KG554300.2 | Chr01K | 127 | 6.73 | 14.38 | 56477176–56477792 | Nuclear |
PvRR4 | Pavir.1NG291000.1 | Chr01N | 248 | 8.5 | 26.77 | 45506257–45509060 | Chloroplast |
PvRR5 | Pavir.1NG358300.1 | Chr01N | 142 | 8.82 | 15.27 | 51868134–51869783 | Cytoplasmic |
PvRR6 | Pavir.1NG546300.1 | Chr01N | 132 | 6.3 | 14.91 | 66559760–66560374 | Cytoplasmic |
PvRR7 | Pavir.2NG278900.2 | Chr02N | 179 | 5.17 | 19.37 | 18467277–18468982 | Chloroplast |
PvRR8 | Pavir.3KG016200.1 | Chr03K | 200 | 5.71 | 22.33 | 1743605–1745736 | Nuclear |
PvRR9 | Pavir.3NG010100.1 | Chr03N | 196 | 6.39 | 22.14 | 1332042–1334707 | Nuclear |
PvRR10 | Pavir.5NG632500.1 | Chr05N | 232 | 5.37 | 25.31 | 70764613–70768207 | Nuclear |
PvRR11 | Pavir.6KG234700.1 | Chr06K | 123 | 5.48 | 13.87 | 31928649–31929933 | Cytoplasmic |
PvRR12 | Pavir.6KG234800.1 | Chr06K | 123 | 6.27 | 14.00 | 31882269–31883864 | Cytoplasmic |
PvRR13 | Pavir.6KG235000.1 | Chr06K | 123 | 5.55 | 13.95 | 31857300–31858901 | Cytoplasmic |
PvRR14 | Pavir.6KG299100.1 | Chr06K | 122 | 5.54 | 13.77 | 39653675–39654994 | Cytoplasmic |
PvRR15 | Pavir.6KG299200.1 | Chr06K | 122 | 5.88 | 13.77 | 39660020–39661391 | Cytoplasmic |
PvRR16 | Pavir.6KG305100.2 | Chr06K | 122 | 5.54 | 13.75 | 39857811–39859179 | Cytoplasmic |
PvRR17 | Pavir.6NG213221.1 | Chr06N | 123 | 5.76 | 13.88 | 33354744–33356231 | Cytoplasmic |
PvRR18 | Pavir.6NG243100.1 | Chr06N | 121 | 4.9 | 13.63 | 43326692–43328034 | Cytoplasmic |
PvRR19 | Pavir.6NG243200.1 | Chr06N | 121 | 4.9 | 13.63 | 43361982–43363319 | Cytoplasmic |
PvRR20 | Pavir.6NG243400.2 | Chr06N | 121 | 4.9 | 13.63 | 43373513–43374838 | Cytoplasmic |
PvRR21 | Pavir.6NG246800.1 | Chr06N | 121 | 5.28 | 13.58 | 43182915–43183924 | Cytoplasmic |
PvRR22 | Pavir.7KG168600.1 | Chr07K | 239 | 8.58 | 25.46 | 33485405–33489551 | Endoplasmic reticulum |
PvRR23 | Pavir.7KG253900.1 | Chr07K | 139 | 8.52 | 15.08 | 40071483–40073013 | Cytoplasmic |
PvRR24 | Pavir.7KG378200.1 | Chr07K | 166 | 6.74 | 17.87 | 51421063–51422332 | Chloroplast |
PvRR25 | Pavir.7NG088270.1 | Chr07N | 123 | 5.55 | 13.92 | 879792–881191 | Cytoplasmic |
PvRR26 | Pavir.7NG096700.1 | Chr07N | 123 | 5.55 | 13.95 | 286850–291495 | Cytoplasmic |
PvRR27 | Pavir.7NG190900.1 | Chr07N | 230 | 8.84 | 24.47 | 31254128–31256852 | Chloroplast |
PvRR28 | Pavir.7NG193417.1 | Chr07N | 238 | 8.74 | 25.25 | 31224958–31227926 | Cytoplasmic |
PvRR29 | Pavir.7NG340100.1 | Chr07N | 139 | 7.71 | 15.00 | 38049481–38051159 | Cytoplasmic |
PvRR30 | Pavir.7NG435700.1 | Chr07N | 168 | 6.15 | 18.18 | 49462085–49462709 | Cytoplasmic |
PvRR31 | Pavir.8KG033800.2 | Chr08K | 203 | 5.25 | 22.71 | 2776315–2778225 | Nuclear |
PvRR32 | Pavir.8NG029800.1 | Chr08N | 201 | 5.76 | 22.57 | 1621900–1623780 | Nuclear |
B-type response regulator | | | | | |
PvRR33 | Pavir.1KG088000.1 | Chr01K | 636 | 6.26 | 69.20 | 6612450–6617580 | Nuclear |
PvRR34 | Pavir.1KG089500.1 | Chr01K | 636 | 6.2 | 69.16 | 6528098–6532853 | Nuclear |
PvRR35 | Pavir.1KG518100.1 | Chr01K | 677 | 5.97 | 73.66 | 4551136–54557045 | Nuclear |
PvRR36 | Pavir.1NG079300.2 | Chr01N | 635 | 6.16 | 68.90 | 6549521–6554631 | Nuclear |
PvRR37 | Pavir.4KG074200.2 | Chr04K | 679 | 6.2 | 74.55 | 6522275–6526885 | Nuclear |
PvRR38 | Pavir.4KG310700.1 | Chr04K | 718 | 5.91 | 78.60 | 38944024–38949697 | Cytoplasmic |
PvRR39 | Pavir.4KG346800.1 | Chr04K | 691 | 6.06 | 74.97 | 41103674–41115149 | Nuclear |
PvRR40 | Pavir.4NG179900.1 | Chr04N | 682 | 5.57 | 73.76 | 41246435–41252355 | Nuclear |
PvRR41 | Pavir.5KG703400.1 | Chr05K | 575 | 5.24 | 64.28 | 58192270–58206483 | Nuclear |
PvRR42 | Pavir.7KG006300.1 | Chr07K | 368 | 5.8 | 41.73 | 24921421–24923835 | Cytoplasmic |
PvRR43 | Pavir.7KG006400.1 | Chr07K | 380 | 5.16 | 42.84 | 24908380–24910902 | Nuclear |
PvRR44 | Pavir.9KG213000.1 | Chr09K | 330 | 5.35 | 37.96 | 23382295–23412337 | Nuclear |
PvRR45 | Pavir.9KG540900.5 | Chr09K | 687 | 6.26 | 73.87 | 62548307–62554216 | Nuclear |
PvRR46 | Pavir.9NG196400.7 | Chr09N | 331 | 5.29 | 38.05 | 25130131–25134127 | Nuclear |
PvRR47 | Pavir.9NG840500.7 | Chr09N | 685 | 6.1 | 73.77 | 73610285–73615723 | Nuclear |
Pseudo-response regulators (PRRs) | | | | | |
PvPRR1 | Pavir.1KG385300.1 | Chr01K | 521 | 6.01 | 57.98 | 41744859–41747964 | Nuclear |
PvPRR2 | Pavir.1NG350900.1 | Chr01N | 522 | 5.89 | 58.02 | 50799219–50802447 | Nuclear |
PvPRR3 | Pavir.2KG379300.1 | Chr02K | 625 | 6.63 | 69.71 | 53418020–53423227 | Nuclear |
PvPRR4 | Pavir.2NG448600.1 | Chr02N | 623 | 6.42 | 69.53 | 54763663–54768456 | Nuclear |
PvPRR5 | Pavir.2NG610075.1 | Chr02N | 746 | 6.41 | 80.40 | 70094034–70106996 | Nuclear |
PvPRR6 | Pavir.5KG544714.1 | Chr05K | 168 | 7.59 | 18.69 | 48851150–48852705 | Cytoplasmic |
PvPRR7 | Pavir.5NG470586.1 | Chr05N | 134 | 8.91 | 14.53 | 57338487–57339126 | Cytoplasmic |
PvPRR8 | Pavir.8KG072800.4 | Chr08K | 789 | 8.87 | 85.51 | 4588507–4593979 | Nuclear |
PvPRR9 | Pavir.9KG482200.1 | Chr09K | 757 | 6.21 | 82.89 | 58530952–58541644 | Nuclear |
PvPRR10 | Pavir.9NG690300.1 | Chr09N | 757 | 7.01 | 82.66 | 68849641–68857961 | Nuclear |
3.2 Phylogenetic analysis of TCS proteins
To analyse the phylogenetic relationships of the PvHK genes, an unrooted NJ tree was constructed from HK proteins of monocotyledonous and dicotyledonous plants: 20 from switchgrass, 16 from Arabidopsis, 14 from rice, 17 from maize, and 18 from tomato (Fig. 1, Supplementary Table 2). The 85 HK(L) proteins from the five land plant species were clearly divided into six distinct subfamilies, namely cytokinin receptors, CKI1, AHK1, AHK5, ethylene receptors, and phytochromes. Interestingly, the HK members of switchgrass and maize fall only within the cytokinin receptor, ethylene receptor, and phytochrome subfamilies, which is similar to previous research results.
The authentic and pseudo-HP proteins from these five species fell mainly into two subfamilies, which are marked with red and blue lines in Fig. 2. The HPs of Arabidopsis and tomato, both dicotyledonous plants, are more closely related to each other, while the HPs of rice, maize, and switchgrass, all monocotyledonous plants, are more similar to one another. The evolutionary relationships of AHP proteins differ between dicotyledonous and monocotyledonous plants, while the relationships within each plant group are comparatively stable. In the dicotyledonous group, SlHP4 and SlPHP1 are grouped with AHP4, while the other tomato members are grouped separately with AHP1, AHP2, AHP3, and AHP5, and may also function in cytokinin signaling [56]. It is speculated that PvHPs are also related to abiotic stresses such as salinity and drought, because switchgrass HPs cluster with OsHP1/2 on the same branches of the tree [30]. Because these members lack homologous counterparts in Arabidopsis, further research is needed on their evolutionary mechanisms and functional roles.
The RR proteins of the five species were divided into 5 subfamilies: type-A RR, type-B RR, type-C RR, type-B PRR, and Clock PRR (Fig. 3). The A-type RR proteins occupy two separate branches. One contains only five members from Arabidopsis and tomato; the other, larger main branch includes all 5 species, and the A-type RRs of the three monocotyledonous plants (switchgrass, maize, and rice) are found only in this branch. All B-type RR proteins occupy one large branch with several sub-branches. Within it, members from monocotyledonous species and members from dicotyledonous species each tend to cluster with their own group, indicating that these sub-branches may have arisen after the divergence of monocotyledonous and dicotyledonous plants. C-type RRs and A-type RRs have similar structures, but they are not closely related in the tree. Studies have indicated that C-type RR is likely the oldest RR, while A-type RR could have been derived from C-type RR through mutations in its promoter [57]. We divided the PRRs into two subfamilies, type-B PRR and Clock PRR, both of which comprise members from all five species. The Clock PRR subfamily has approximately twice as many members as the B-type PRR subfamily, and there is no obvious difference between monocotyledonous and dicotyledonous plants, implying that PRR proteins have remained fairly conserved during the evolution of monocotyledonous and dicotyledonous plants.
3.3 Conserved motifs and gene structure analysis
Differences in UTR-CDS structure often reflect evolutionary information within gene families, while protein function and evolutionary relationships are typically reflected in the type and composition of conserved motifs. Using Evolview, the evolutionary relationships and gene structures of the three TCS gene families (HKs, AHP, and RRs) in the switchgrass genome were visualized. The MEME online program was then employed to analyze the conserved motifs among the members of these gene families (10 motifs for HKs, 10 for AHP, and 20 for RRs).
First, members of the HK, ETR, and PHY subfamilies of the HKs family cluster in three different branches (Fig. 4A), and the HK subfamily is more closely related to the ETR subfamily. Second, members within the same subgroup share conserved gene structures, while members of different subgroups exhibit some differences (Fig. 4B). Interestingly, although PvPHYE belongs to the PHY subfamily, it has the shortest gene length and the simplest gene structure. These results indicate that there is rich diversity among members of the HKs family and relatively conserved evolutionary relationships within subfamilies, which may enable them to perform different functions during plant growth and development. Conserved motif analysis of the HKs indicates that PvHK proteins typically contain 4–6 motifs, with motif 1, motif 2, motif 3, and motif 4 being highly conserved in all PvHK protein sequences except for PvPHY. The HK subfamily contains all 10 motifs, while the ETR subfamily contains the same motifs except for motif 6 and motif 9 found in PvETR1 and PvETR2. There is no significant difference in motif content among the members of the PHY subfamily other than PvPHYE. In addition, motifs unique to certain subgroups were identified, such as motif 10, which appeared only in the HK subgroup (Fig. 4C, Fig. S2). Visualizing the protein domains of the HKs family members revealed significant differences between the PHY subfamily and the other two subfamilies. The PAS domain is unique to the PHY subfamily. Most members of the ETR and HK subfamilies have the PPK1107 domain, which explains the close evolutionary relationship between the two, whereas the CHASE domain appears only in the HK subfamily and the GAF domain appears only in the ETR subfamily. In addition, some minor domains appear only in specific members, such as the REC domain and the ETR-ERS-EIN4 domain in PvETR1 and PvETR2 (Fig. S3).
On the basis of the phylogenetic tree of the AHP gene family of switchgrass, a comparative analysis was conducted on the UTR-CDS structure, motifs, and domains of this family. The gene structures of the 10 HP members were strikingly alike: PvHP3 had an extra exon, whereas all other members had 6 exons, suggesting that gene structure in this family is highly conserved (Fig. 5A, B). Conserved motif analysis shows that PvHP proteins typically contain 3–4 motifs, with motif 1, motif 2, and motif 3 being highly conserved among all PvHP protein sequences; motif 4 and motif 5 are also relatively conserved in PvHP3-8 (Fig. 5C, Fig. S4). In addition, a conserved His-containing phosphotransfer (HPt) domain was identified in the AHP family, corresponding to motif 1 (Fig. S5).
Evolutionary analysis of all members of the RRs family in switchgrass showed that, as in the earlier phylogenetic analysis (Fig. 3), A-type RRs, B-type RRs, and PRRs each clustered together. The gene structures of members of the same subfamily are also comparable (Fig. 6A, B). Because the RRs family contains multiple subfamilies, we selected 20 motifs to analyze their protein structures. RRs family proteins typically contain 4–5 conserved motifs, among which motif 1, motif 2, motif 4, and motif 5 are highly conserved in almost all RR sequences. In addition, motif 3 and motif 11 are present only in the B-type RR subfamily, motif 6, motif 12, and motif 17 are present only in the PRRs subfamily, and motif 9 is present only in the A-type RR subfamily (Fig. 6C, Fig. S6). We also analyzed the structural domains of the RRs family proteins. Of the 32 type-A RRs, all except PvRR8, PvRR9, PvRR10, PvRR31, and PvRR32 had a REC domain. The type-B RRs were distinguished by one REC domain at the N-terminal end and one Myb domain at the C-terminal end, and all 15 type-B RRs met this criterion. In addition, the pseudo RRs (PRRs) can be further divided into two subgroups: clock type and B-type. Compared with authentic RRs, PRRs have a pseudo-REC domain. The CCT domain of Clock PRRs (Fig. S7) is of great significance in controlling circadian rhythms and flowering time [58].
3.4 Synteny analysis of PvTCS genes in switchgrass
Switchgrass is a tetraploid with two subgenomes (K and N). We used data downloaded from the Joint Genome Institute (JGI, Walnut Creek, California, USA) to determine the positions and chromosome lengths for the members of the three TCS gene families, PvHKs, PvHPs, and PvRRs. All 87 PvTCS genes were assigned to 18 chromosomes, and their distribution is uneven, with significant differences in chromosome distribution among families (Fig. 7). Moreover, collinearity analysis of the switchgrass TCS genes identified a total of 36 segmental duplications and one tandem duplication, involving 44 PvTCS genes (Fig. S1). Members of the HKs family are mainly distributed on Chr09K and Chr09N. PvHK1 and PvHK2, located on Chr01K and Chr01N, respectively, are homologous genes, and the same relationship holds for PvHK3 and PvHK4 as well as for PvETR3 and PvETR4. A-type RRs are mainly distributed on Chr06K, Chr06N, Chr07K, and Chr07N, and segmental and tandem duplications are the modes of gene duplication in this subfamily. B-type RRs are distributed relatively uniformly on 8 chromosomes, with segmental duplication being the main mode. Similarly, PRRs arose mainly by segmental duplication and are distributed relatively uniformly on 9 chromosomes of switchgrass. The synteny analysis shows that certain PvTCS pairs are close to each other on the same chromosome, with similar structures and evolutionary relationships.
The synonymous substitution rate (Ks), non-synonymous substitution rate (Ka), and Ka/Ks ratio of the duplicated pairs were calculated, and divergence times were estimated from the Ks values (Supplementary Table S3). The Ks values ranged from 0.04219 to 1.590162, corresponding to divergence times from 0.402 to 53.005 MYA (million years ago). This long time span suggests that TCS gene expansion and evolution occurred over millions of years, and not only at specific time points. Furthermore, the Ka/Ks values of the segmental duplications were all less than 1, suggesting that they all underwent purifying selection. To further explore the evolutionary mechanism of PvTCS family members across species, TCS genes from switchgrass, Arabidopsis, and rice were selected for synteny analysis (Fig. 8). Switchgrass shares only 7 collinear gene pairs with Arabidopsis but 64 with rice, indicating that the TCS gene family is relatively conserved within Poaceae and monocotyledonous plants.
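The conversion from Ks to divergence time and the reading of Ka/Ks can be sketched as follows. This is a minimal Python illustration, not the authors' pipeline: it assumes the commonly used relation T = Ks / (2λ), and the substitution rate λ = 1.5 × 10⁻⁸ substitutions per synonymous site per year is an assumption, since the rate is not stated in this section. The function names and the Ka value in the usage line are hypothetical.

```python
def divergence_time_mya(ks, rate=1.5e-8):
    """Estimate divergence time in million years ago (MYA) as T = Ks / (2 * rate).

    `rate` is the assumed number of substitutions per synonymous site per year.
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    years = ks / (2.0 * rate)
    return years / 1e6


def selection_mode(ka, ks, tol=1e-9):
    """Classify a duplicated pair by its Ka/Ks ratio."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "purifying" if ratio < 1.0 else "positive"


# Largest Ks value reported in the text, under the assumed rate.
print(round(divergence_time_mya(1.590162), 3))  # 53.005

# Hypothetical Ka value, for illustration only.
print(selection_mode(ka=0.2, ks=1.590162))  # purifying
```

Under the assumed rate, the largest reported Ks value reproduces the upper bound of the reported divergence-time range; a different rate would rescale every estimate proportionally.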
3.5 Analysis of cis-elements in the putative promoter regions of TCS genes in switchgrass
The 2000 bp regions upstream of the transcriptional start sites were selected to identify cis-elements, which act as TF binding sites and can help determine tissue-specific or stress-responsive gene expression patterns [59]. These upstream cis-regulatory elements were analyzed to further investigate the transcriptional regulation and potential functions of the TCS genes. The predicted cis-acting regulatory elements, mainly related to plant abiotic stress and hormones (Figs. 8–12), include light responsive elements such as AE box, G-box, and Sp1, as well as cis-elements associated with cold stress (LTR), drought response (MBS), heat stress (STRE), anaerobic stress (ARE, GC motif), wound stress (WUN motif), and more. Hormone-responsive cis-acting elements, including those for abscisic acid (ABRE), auxin (IAA) (AuxRR core, TGA element), jasmonic acid (JA) (CGTCA motif, TGACG motif), gibberellin (GA) (GARE motif, P-box, TATC box), salicylic acid (SA) (TCA element), and ethylene (ethylene responsive element), are commonly found in the promoters of PvTCS genes. Cis-elements for seed-specific expression (RY-element), cell cycle regulation (MSA-like), and meristem expression (CAT box) were also present, suggesting that the TCS genes of switchgrass may be involved in responses to various abiotic stresses, plant hormone responses, and cell growth and development. Moreover, cis-elements related to circadian rhythms (circadian) were also identified. In summary, these genes may be involved in the regulation of growth and development.
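Selecting the 2000 bp upstream region depends on the strand of each gene. The sketch below shows one way to compute the window in 1-based inclusive coordinates. It is an illustration under stated assumptions, not the procedure used in this study: the function name, the coordinate convention, and the example positions are all hypothetical.

```python
def upstream_window(tss, strand, length=2000, chrom_length=None):
    """Return (start, end) of the region upstream of a transcription start site.

    Coordinates are 1-based and inclusive. On the '+' strand the window lies
    before the TSS; on the '-' strand it lies after the TSS on the reference.
    The window is clipped to the chromosome; None is returned if nothing is left.
    """
    if strand == "+":
        start, end = tss - length, tss - 1
    elif strand == "-":
        start, end = tss + 1, tss + length
    else:
        raise ValueError("strand must be '+' or '-'")

    start = max(start, 1)
    if chrom_length is not None:
        end = min(end, chrom_length)
    if end < start:
        return None
    return start, end


# Hypothetical TSS positions, for illustration only.
print(upstream_window(5000, "+"))  # (3000, 4999)
print(upstream_window(5000, "-"))  # (5001, 7000)
print(upstream_window(500, "+"))   # (1, 499), clipped at the chromosome start
```

For genes on the '-' strand, the extracted sequence would additionally need to be reverse-complemented before scanning for cis-elements.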
3.6 Expression analysis of TCSs in different tissues and growth stages
Figure 9 Expression profiles of TCSs in different tissues and growth stages. Note: IN3 vas bundle. E4 represents the vascular bundle isolated from the 1/5 fragment of internode 3. The tissues pool leaf blade. E4, pool leaf sheath. E4, pool nodes. E4, pool whole crown. E4 and whole root sys were collected from the leaf blade, leaf sheath, nodes, and root system in elongation growth stage 4 (4 nodes). DAP0, DAP5, DAP10, DAP20, DAP25 and DAP30 represent the whole floret at 0, 5, 10, 20, 25, and 30 days post fertilization, and V1, V3, and V5 represent vegetative growth stage 1 (1 leaf), vegetative growth stage 3 (3 leaves), and vegetative growth stage 5 (5 leaves)
The expression patterns of TCS genes in 26 tissues, obtained from the JGI database, were visualized with R to investigate the possible roles of TCS genes in plant growth and development (Fig. 9, Supplementary Table 4). Most of the 87 TCS genes in switchgrass are expressed most highly in the roots, followed by stems and leaves. Some members also showed higher expression at the early flowering stage, while members with lower expression levels mainly appeared in the tissues between seeds and stems. Further analysis indicates significant tissue specificity of expression among the HKs, AHP, and RRs families. First, apart from PvPHYA and PvPHYC, all HKs were expressed at considerable levels in vegetative organs, such as roots and stems, and during vegetative growth stages, but showed relatively low expression in reproductive organs and reproductive growth stages. Second, AHP family members resemble HKs family members in their relative expression levels, with the slight difference that their relative expression in leaves is also fairly high. In addition, because the RRs family has many members and numerous subgroups, expression levels differ significantly among members. For example, PRR members have relatively high expression not only in roots and stems but also in stem tips and inflorescences. Some RR members (PvRR12-26, PvRR39-40, 42, 43) are highly expressed at various stages of inflorescence development and after flowering, with expression first increasing and then decreasing during flower development, indicating that A-type RRs and B-type RRs also play an important role in the development of switchgrass flowers. These results suggest that TCS members may have different functions at different stages of plant growth and development.
3.7 Functional enrichment analysis of TCS genes
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were conducted for the 87 TCS genes identified in this study. GO enrichment analysis revealed that the TCS genes were primarily involved in hormone signal transduction (e.g. cytokinin and ethylene), histidine phosphorylation, phosphate transduction, meristem development, embryonic tissue development, root development, light signal response, nitrogen compound metabolism, leaf senescence, and circadian rhythm. Meanwhile, the molecular functions of the TCS genes fall mainly into four categories, namely protein kinase activity, phosphotransferase activity, molecular transducer activity, and hormone binding and receptor-related activity. Notably, these genes are mainly located in the nucleus, the intracellular space, the endoplasmic reticulum and endomembrane systems, and organelles associated with the cell membrane (Fig. 10A). In addition, KEGG enrichment analysis showed that the main enriched pathways for all TCS genes included plant hormone signal transduction (ko: 04075), circadian rhythm (ko: 04712), and MAPK signaling pathway (ko: 04016) (Fig. 10B). These results suggest that TCS genes are primarily involved in plant responses to environmental stimuli, hormone signal transduction, and circadian rhythms.
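Enrichment of a GO term or KEGG pathway in a gene set is commonly assessed with a one-sided hypergeometric test. The sketch below illustrates that calculation in plain Python. It is an assumption for illustration: this section does not state which statistical test or tool was used, and the function name and all counts in the usage example are hypothetical.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated to the term,
    n: genes in the query set, k: query genes annotated to the term.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts are inconsistent")
    upper = min(n, K)
    total = comb(N, n)
    # Sum the probability of observing k, k+1, ..., upper annotated genes.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts: 40 of 87 query genes annotated to a term that covers
# 500 of 30000 background genes.
p = enrichment_p_value(k=40, n=87, K=500, N=30000)
print(f"p = {p:.3e}")
```

In practice the p-values for all tested terms would also be corrected for multiple testing before terms are reported as enriched.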
3.8 Prediction of protein-protein interaction and miRNAs targeting of TCS members
Because the two-component system plays a critical part in plant responses to abiotic stress and hormone signal transduction, this study used the rice protein database as a reference to predict protein interactions among the TCS members of switchgrass. The results indicated a variety of protein interactions between HKs and AHPs, particularly between the histidine kinases (PvHKs) that serve as cytokinin signal receptors and the phosphotransfer proteins (PvHPs) (purple line connection). Additionally, there are diverse interactions between PvHPs and response regulators (PvRRs), with the interactions between PvHP1/2/3/5 and PvRRs being notably complex. It is likely that PvHPs are primarily linked to A-type ARRs, as these are thought to be the primary response genes of cytokinins [60, 61]. Furthermore, there are various interactions among PvRRs (Fig. 11A). Therefore, it is speculated that complex interactions exist within the TCS of switchgrass, which offers a reference for further research on the molecular mechanisms of TCS in regulating plant growth, development, and responses to abiotic stress.
MicroRNAs (miRNAs) strongly influence plant growth, development, and stress responses. During growth and development, they have been shown to be involved in the timing of flowering, the formation of flower organs, the formation of the apical meristem [62], and lateral root development [63]. Under abiotic stress, miRNAs are additionally involved in the regulation of drought, salt, and temperature stress responses [63–65]. In this study, using the mRNA sequences of switchgrass TCS members as input, miRNAs targeting switchgrass TCS members were retrieved from the rice miRNA database. The prediction results showed that a total of 133 miRNAs targeted 63 TCS members (Fig. 11B, Supplementary Table 5). Among them, seven PvHKs are mainly targeted by miRNAs such as miR2919, miR2871, and miR2873. PvETRs are mainly regulated by miRNAs such as miR444 and miR2100. PvHPs are regulated by 11 miRNAs, such as miR1425-5p, miR5148, and miR5512. A-type ARRs are regulated by a variety of miRNAs. For example, PvRR26 is mainly regulated by miR164 and miR5836, and PvRR22 is a target gene of miR169. Some miRNAs can regulate multiple target genes simultaneously, such as miR1858, which regulates PvRR4 and PvRR27, and miR2919, which regulates the expression of PvRR10/22/28. Similarly, B-type ARRs are also regulated by a variety of miRNAs, such as miR168, miR397, and miR319. PRRs are the target genes of miR1871, miR5075, and miR5501. The prediction of miRNAs targeting TCS members is of great significance for future study of the post-transcriptional regulation of switchgrass TCS members and establishes a basis for further exploration of the underlying molecular mechanisms.
3.9 Expression profiles of TCS genes in response to various abiotic stresses
A growing number of studies have shown that TCS genes are involved in plant responses to abiotic stress. Based on differences in gene structure and in the cis-acting elements of their promoter regions, we selected 31 of the 87 PvTCS genes and examined their expression in switchgrass leaves under drought, low-temperature, salt, and high-temperature stress and under ABA treatment. Two genes, PvRR42 and PvRR43, were not expressed; the expression patterns of the remaining 29 genes under the different treatments are shown in the heatmap (Fig. 12). Under drought stress, most TCS genes showed increased expression as treatment time elapsed. Thirteen of them (PvHK1/2/3, PvETR1/3, PvHP1, PvRR1/9/38/40/45, and PvPRR6/7) reached peak expression after 12 hours of treatment and had declined by 24 hours. The relative expression levels of nine TCS genes (PvHK4, PvPHYB/E/F, PvHP3, PvRR7/33/41, and PvPRR3) showed a weak upward trend within 0–12 hours and reached their maximum after 24 hours of drought treatment. In contrast, three clustered genes, PvPHYA and PvHP4/9, reached their expression peak after 6 hours of drought treatment and then decreased (Fig. 12A). After 12 hours of low-temperature treatment, the expression of most TCS genes was significantly increased, and expression declined as treatment continued. Eight genes (PvHK2, PvPHYA, PvHP1/4, and PvRR9/37/38/41) still maintained high expression levels after 24 hours of treatment (Fig. 12B). Unlike low-temperature treatment, high-temperature treatment at 40°C resulted in four expression patterns of TCS genes. Seven genes (PvPHYB/E/F, PvRR40, and PvPRR3/7/9) showed gradually decreasing expression as the duration of high-temperature stress increased. 
After 6 hours of high-temperature treatment, PvETR3 and PvPRR1/6 were markedly upregulated, but this upregulation diminished as the stress period progressed. Five genes (PvETR1, PvHK2/4, and PvRR38/45) showed a gradual increase in expression within 12 hours of treatment and a decrease thereafter, but their relative expression levels at 24 hours were still significantly higher than those in the control group (0 h). The expression of 12 genes (PvHK1/3, PvHP1/3/4/9, and PvRR1/7/9/33/37/44) gradually increased throughout the treatment and peaked at 24 hours (Fig. 12C). The expression of TCS genes under salt stress can be divided into four distinct patterns. The expression of PvRR1 and PvPHYA/E was notably higher in the initial stage of stress treatment than after 12 hours. The expression levels of PvHK4, PvHP4, and PvRR7/37/40 remained almost unchanged in the first 12 hours of treatment but were significantly upregulated at 24 hours. PvRR33/41, PvPHYB, and PvHP1/3/9 showed a decreasing and then increasing expression trend. The relative expression levels of PvRR9/39/45, PvETR1, PvPHYF, and PvPRR1/3/9 rose gradually over the course of 24 hours and peaked at 24 hours, while the other members first decreased and then increased. Expression levels increased gradually within the first 12 hours of treatment and decreased thereafter, but at 24 hours remained significantly higher than in the control group (0 h) (Fig. 12D). ABA is an important plant hormone that participates in regulating various environmental stress responses in plants. Studies have shown that cytokinins and ABA interact closely and jointly regulate plant responses to environmental signals as well as growth and development [66]. We therefore examined the influence of exogenous ABA on the expression of TCS genes in switchgrass. Most of the detected TCS genes were upregulated after ABA treatment (Fig. 12E), which is consistent with the widespread presence of ABA-responsive elements (ABREs) in the promoter regions of TCS genes in switchgrass.
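The grouping of genes by the time point at which their expression peaks, as used in the descriptions above, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the sampling points (0, 6, 12, and 24 h) are inferred from the time points mentioned in the text, the gene names and expression values are hypothetical placeholders rather than measured data, and the function names are not taken from the original analysis.

```python
# Assumed sampling scheme in hours, inferred from the time points in the text.
TIME_POINTS = [0, 6, 12, 24]


def peak_time(profile, time_points=TIME_POINTS):
    """Return the time point at which relative expression is highest.

    If several time points tie for the maximum, the earliest one is returned.
    """
    if len(profile) != len(time_points):
        raise ValueError("profile and time_points must have the same length")
    idx = max(range(len(profile)), key=lambda i: profile[i])
    return time_points[idx]


def group_by_peak(profiles):
    """Group gene names by the time point of their expression peak."""
    groups = {}
    for gene, profile in profiles.items():
        groups.setdefault(peak_time(profile), []).append(gene)
    return {t: sorted(genes) for t, genes in sorted(groups.items())}


# Hypothetical relative expression values (control at 0 h set to 1.0).
example = {
    "geneA": [1.0, 3.2, 1.8, 1.1],  # early peak followed by a decline
    "geneB": [1.0, 1.9, 4.5, 2.7],  # peak at 12 h, still above control at 24 h
    "geneC": [1.0, 1.2, 1.5, 5.0],  # weak rise followed by a late maximum
}

groups = group_by_peak(example)
for t, genes in groups.items():
    print(f"peak at {t} h: {', '.join(genes)}")
```

Grouping by peak time captures only one aspect of a profile; patterns such as a decrease followed by an increase would need an additional rule, for example comparing the minimum with both end points.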
3.10 Subcellular localization analysis
To analyze the subcellular localization of TCS proteins, enhanced green fluorescent protein (EGFP) fusion proteins were transiently expressed in Nicotiana benthamiana epidermal cells, and fluorescence signals were detected by confocal laser microscopy. The histidine kinase PvHK2 was mainly detected in the cytoplasm and at the cell membrane, suggesting that PvHK2 may be a membrane-bound receptor protein (Fig. 13A-D). PvETR3 has been identified as an ethylene receptor; the fluorescence signal of PvETR3-EGFP was mainly detected at the cell membrane, with weak fluorescence also present in the cytoplasm. This may reflect the role of ETR proteins in sensing ethylene signals at the cell membrane and the endoplasmic reticulum membrane (Fig. 13E-H). The fluorescence signal of the light receptor fusion protein PvPHYE-EGFP was detected primarily at the cell membrane and in the cytoplasm, with chloroplasts also displaying fluorescence (Fig. 13I-L). The histidine phosphotransfer protein fusion PvHP1-EGFP was mainly localized at the cell membrane and in the nucleus, although weak fluorescence signals were also detected in the cytoplasm (Fig. 13M-P). The fluorescence signal of PvRR9-EGFP was detected only in the nucleus (Fig. 13Q-T); likewise, the signal of the B-type response regulator PvRR45 was seen only in the nucleus (Fig. 13U-X), consistent with a role as a transcription factor. To confirm this, we fused the reported nuclear marker gene OsSK2 with ERFP (OsSK2-ERFP) and co-expressed it with TCS-EGFP in tobacco epidermal cells. The overlap of green and red fluorescence signals suggested that PvRR9 and PvRR45 are both nuclear-localized proteins. In Arabidopsis, all RR proteins were localized exclusively to the nucleus, apart from ARR3 and ARR16, which are also located in the cytoplasm [67]. Similar results were obtained for the subcellular localization of tomato TCS proteins [16]. 
The subcellular localization of the six switchgrass TCS proteins examined in this study is therefore broadly consistent with previous reports. The proteins also differ in their subcellular targeting, implying that they serve distinct roles in the two-component system.