Identification of TaHMT Genes from the Whole Wheat Genome
Hmmsearch and the BLASTP program obtained 152 HKMT (histone lysine methyltransferase) genes and 23 PRMT (histone arginine methyltransferase) genes, which were named TaHMT1–TaHMT152, coinciding with TaHKMT1–TaHKMT152, TaHMT153–TaHMT175, and TaPRMT1-23, following the nomenclature 1A to 1D, 2A to 2D, up to 7A to 7D, based on their position on the chromosome. In total, we identified 175 genes that catalyze two kinds of histone methyltransferases (Fig. 1, Table 1, Table S1 and Table S2). Chromosome distribution showed that the HKMT gene family of wheat was distributed across all chromosomes and showed a trend of cluster distribution on chr2 and chr3. The small number of PRMT genes entails that they are orphans that exist on some chromosomes, such as chr4A (Figure S1).
Table 1
Numbers of HMT genes in wheat and related species
species
|
Number of HMTs
|
Number of HKMTs
|
Ratio to wheat
|
Number of PRMTs
|
Ratio to wheat
|
Wheat
|
175
|
152
|
--
|
23
|
--
|
Arabidopsis
|
57
|
48
|
3.17
|
9
|
2.56
|
Rice
|
47
|
39
|
3.90
|
8
|
2.88
|
Brachypodium
|
56
|
48
|
3.17
|
8
|
2.88
|
Sorghum
|
54
|
46
|
3.30
|
8
|
2.88
|
Hordeum
|
49
|
42
|
3.62
|
7
|
3.29
|
HMT Gene Family of Wheat Has Expanded with More In-paralogs Genes
Sorghum bicolor, Hordeum vulgare, and Brachypodium distachyon are usually considered to be relatives of wheat due to the whole-genome duplication events that occurred at about 45–60 million years ago [38]. In this study, the HMT gene families in Arabidopsis thaliana, Oryza sativa, Hordeum vulgare, Sorghum bicolor, and Brachypodium distachyon were also comprehensively identified in Table 1 using the same method as that employed for wheat. According to previous results, the number of HMT genes family in diploid species is between 40 and 60. We found that there were 47 and 57 HMT genes in rice (Os) and Arabidopsis (At), respectively, which is close to the results previously obtained. In addition, 54, 56, and 49 HMT genes were obtained from the Sorghum bicolor (Sb), Brachypodium distachyon (Bd), and Hordeum vulgare (Hv) genome data, respectively. The proportion of gene quantity between wheat and other species was 3.07 (At), 3.72 (Os), 3.24 (Sb), 3.13 (Bd), and 3.57 (Hv), close to a ratio of 3:1, as expected (χ2 test, P > 0.05).
To further explore the evolutionary relationship between wheat and the other five species, the orthologs and in-paralogs between each pair of these species were calculated. The number of orthologs in each species with one other genome is shown in Table 2. The result shows that wheat had a high average ortholog group size of 2.32, which also meant that every HMT gene had an average of 1.32 paralogs. The group sizes of other five species were significantly smaller than those of wheat, indicating that the genome ploidy of wheat enriches the number of paralogous genes for the HMT gene family. Significantly, the fact that wheat and Brachypodium distachyon clearly have more orthologous genes proves a later divergence between the two species, which is also a reflection of the previous conclusion that Brachypodium distachyon has a common ancestor with wheat.
Table 2
Total number of orthologs identified by In-paranoid 4. 1
species
|
Wheat
|
Arabidopsis
|
Rice
|
Sorghum
|
Brachypodium
|
Hordeum
|
Average in-paralogs
|
Wheat
|
--
|
137/401
|
121/43
|
132/46
|
141/50
|
111/44
|
2.321
|
Arabidopsis
|
48/40
|
--
|
42/34
|
42/34
|
45/36
|
42/34
|
1.23
|
Rice
|
44/43
|
38/34
|
--
|
40/39
|
38/37
|
36/36
|
1.04
|
Sorghum
|
47/46
|
39/34
|
40/39
|
--
|
44/44
|
40/39
|
1.04
|
Brachypodium
|
50/50
|
44/36
|
38/37
|
47/44
|
--
|
40/40
|
1.06
|
Hordeum
|
44/44
|
38/34
|
37/36
|
40/39
|
40/40
|
--
|
1.03
|
1 The number of orthologs in an organism when clustered with another genome was shown on the left of the slash while the number on the right of the slash referred to the number of ortholog groups between two species. For wheat, average size of ortholog group: 2.32 = (137/40 + 121/43 + 132/46 + 141/50 + 111/44)/5. Notably, Table 2 was not a symmetrical table, since gene duplication frequency in organism ‘A’ generally differed from that in organism ‘B’ since speciation of organism ‘A’ and ‘B’ and thus the number of ‘A’ genes having orthologs in organism ‘B’ was unequal to the number of ‘B’ genes having orthologs in organism ‘A’. |
According to the published wheat genome data, there are about in homologous groups of three, termed triads (1:1:1; 35.8% of genes) (IWGSC, 2018) [39]. Our results revealed a higher proportion 92.76% of HKMTs were in triads. In PRMTs, the ratio is 91.3% (Table 3). The high homology retention rate indicates that the HMT gene of three sets of wheat chromosome donors exhibited a certain conservation; in other words, there was almost no lack of homologous genes. With the improvement of accuracy in the knowledge of the wheat genome, it might be a higher proportion. This kind of conservative combination of triplets and high homology retention rate can also explain the richness of some gene families in wheat, such as the HMT genes. In addition to this 1:1:1 correspondence, some genes have expanded in breadth along the chromosomes. These members do not strictly follow the three homologies, like HKMT1 and PRMT1. In addition to a correspondence to genes on chromosomes with A/B/D homology, they also show homology with members of other chromosomes to a certain extent. The homology of these genes may have been established before the formation of wheat polyploidy.
Table 3
Groups of homoeologous HMT genes in wheat
Homoeologous group
(A:B:D)
HKMT
|
Number of groups
|
Number in total
|
Ratio in the gene family
|
1:1:1
|
47
|
141
|
92.76%
|
Paired
|
3
|
6
|
3.94%
|
Orphans/singletons
|
5
|
5
|
3.30%
|
PRMT
|
|
|
|
1:1:1
|
7
|
21
|
91.30%
|
Paired
|
1
|
2
|
8.70%
|
Orphans/singletons
|
0
|
0
|
0.00%
|
Structure, Phylogeny, and Domain Analysis of HMT Genes in Wheat
The analysis of the physical and chemical properties of the protein sequences encoded by 152 TaHKMT genes revealed that the amino acid content of different HKMT proteins varied significantly, and the molecular weight of the encoded protein was significantly different from 24.76 kD (TaHMT145) to 221.14 kD (TaHMT118). Compared to the HKMT gene family, the protein encoded by the PRMT gene family seems not to show a significant difference in physicochemical properties. In the HKMT protein, the differences in the physicochemical properties of the protein were significantly higher than those of the histone arginine methyltransferase. With the exception of TaHMT154, the isoelectric point (pI) of the proteins encoded by the histone arginine methyltransferase gene was less than 7.0. This shows a more consistent acidity that tends to play a role in the acidic subcellular environment, which may be due to the coding of weakly acidic protein by the PRMT genes. Only 5.26% (8 /152) of HKMT proteins showed high stability, while 34.78% (8/23) of PRMT members were proteins with high stability. These findings suggest that HKMT may have wider catalytic functions and may be involved in more biological processes.
Interestingly, the subcellular localization prediction analysis told us that most HKMT proteins are located in the nucleus, while most PRMT proteins are located in the cytoplasm (Table S1). This difference in localization also implies that they may have different physiological functions. The structural analysis of all TaHMT genes indicated that all of the members contained exons, but the number of them was quite different. The TaHMT gene family has a large number of introns and exons, most of which have more than four exons, and their genetic structure is complex and diverse. Interestingly, the number of exons in the HKMT gene family presents a huge difference from 25 (TaHMT113) to 1 (TaHMT13.etc). A group of 11 genes, including TaHMT13 and TaHMT29, contain only one exon, which may have been produced late (Fig. 2, Table S1). We presume that their functions are relatively specific, and that TaHMT113 has the largest number of exons (25), leading to the speculation that intron loss may have occurred in the evolution of the TaHMT gene family.
All 152 TaHKMT proteins were divided into seven classes according to the classification of the SET gene family in Arabidopsis [40]. The seven classes have fully different domain architectures and motif compositions (Fig. 2 and Fig. 3). Class I, namely, Enhancer of zeste [E(Z)] homologs (H3K27), has unique domains, such as SWI3, ADA2, N-CoR, and TFIIIB (SANT) DNA-binding domains and DNA binding protein CXC domain. These domains may facilitate the ability of the SET domain to modify histones. Class II consists of 16 SET proteins, which are homologous with Drosophila ortholog ASH1 and may lead to H3K4 and H3K36 modifications. This classification includes the typical associated with SET (AWS) domain, followed by a cysteine-rich post-SET domain, which may play roles in the stability of SET protein. Class III is believed to be the SET protein most responsible for the active mark H3K4me1/2/3, similar to the Drosophila ortholog TRX, and there are various protein domains in Class III, including conserved protein domains PWWP, FYRN/C, and the PHD domain.
Class IV is the least of all of the classifications, which is highly conservative, and it is mainly characterized by a PHD finger and a C-terminally located SET domain. It is suggested that the members of this group may be involved in the methylation of H3K4. Class V is the largest population, consisting of 63 SET genes homologous to Drosophila SU(VAR)3–9, which is divided into two different subgroups. Subgroup I contains the WIYLD domain, pre-SET domains, SET domains, and post-SET domains, while subgroup II contains a typical SRA domain at the N-terminus, playing an essential role in the establishment of heterochromatic mark H3K9me1/2/3, as well as H4K20me and H3K27me2. Classes VI and VII contain members with a SET domain whose functions are to be determined. Classes VI and VII contain 44 members, and they lack proteins specific to other classifications. Among these, class VI contains an interrupted SET domain, and class VII mostly contains Rubis-subs-bind.
The existence of SET protein contains the activity of histone methylation, and the existence of Rubis-subunit may grant it the ability to modify other proteins. The 23 TaPRMT proteins were divided into four classes, following previous opinion (Bedford 2007). Classes I-1 to I-3 contain 20 members. Most of them include AdoMet-MTases with arginine methyl-transfer activity, which catalyzes the formation of mono-methylarginine and asymmetric dimethylarginine. TaPRMT18 to 20 belong to class II, which owns the core domain PRMT5, which catalyzes mono-methylarginine and symmetric dimethylarginine (Fig. 2C and Fig. 3B). Ultimately, we counted the number of genes in each subfamily. The main reason that the HMT gene family has a large number of members is that the class I/II/V HKMT subfamily is expanding its members during evolution. The ratio of their number in wheat to that in rice and Arabidopsis is higher than the average of 3 (χ2 test, P < 0.05) (Fig. 4, Table S3).
Expression characterization and transcription regulation analysis of TaHMTs
At present, the expression level of HMTs in various wheat tissues without any stress pressure is still unknown. In our study, the expression of TaHMT genes in the root, young leaf/shoot, and the young spikes during the vegetative growth stage were significantly higher than those of other tissues at other stages, and they may be widely involved in the physiological activities of wheat growth and development. Spike, as the reproductive organ, exhibits all-round expression for the genes during vegetative and reproductive stages. This result also indicates that the gene expression levels were significantly different in different organs and tissues, which may be biased. For example, TaHMT6 and TaHMT18 showed obvious tissue specificity, which are characterized by high expression only in the leaves/shoots during the seedling stage, and a relatively low relative expression in young spikes (Fig. 5), suggesting that they have a special role to play in the development and morphogenesis for the plants. There are many inert genes whose expression level is always very low, such as TaHMT15 and TaHMT41.
Cis-regulatory elements analysis showed that light responsiveness related elements were the most extensive-elements in the promoter regions of TaHMT genes. Except for TaHMT151, all genes contained light responsiveness related elements SP1 (GGGCGG) or GT1 motif (GGTTAA), of which 29 TaHMTs also contained circadian elements. These elements provided candidate binding sites for light responsive factors indicating that TaHMT genes were related to some photoperiod biological processes, such as flowering, photosynthesis, nutrient absorption and so on. In addition to light response elements, most TaHMT genes (173/175) contained stress resistance related elements, such as MBS motif related to drought-inducibility, ABRE motif to abscisic acid responsive, LTR motif to low-temperature responsiveness. Some of TaHMT members (142/175) also contained plant hormone responsive elements, such as TGA element and AuxRR core in response to auxin, P-box and GARE motif in response to gibberellin. These findings indicated that TaHMT genes might have a wide range of functions in wheat (Figure S2).
MicroRNAs (miRNAs) are a class of endogenous small noncoding RNAs which are involved in the regulation of gene expression at the posttranscriptional level by degrading their target mRNAs and/or inhibiting their translation. To further explore the relationship of miRNAs-regulated functions and TaHMT genes expression, we used psRNATarget server to predict HMT genes targeting annotated microRNAs in wheat. We found that 11 different miRNAs had 15 targeting sites in body parts of 11 TaHMT genes (Table S4, Figure S6). Some miRNAs can act on multiple TaHMT genes, and one TaHMT gene can also interact with multiple miRNAs. The expression level of most TaHMTs regulated by miRNAs were apparently inhibited which may regulated by post-transcriptional cleavage manners, except the two miRNAs, which were found to may inhibit expression of the target genes at the translational level: tae-miR5049-3p inhibiting gene TaHMT40 and tae-miR1134 inhibiting gene TaHMT172.
Expression Analysis of HMT Genes between Parents and Hybrids
Based on our transcriptome sequencing results of JM8 (BS366×TY806), we obtained four genes whose relative expression level was significantly different (|fold change| > 2, p < 0.05) from that of their parents in spikes (Fig. 6B). These parts are TaHMT39, TaHMT49, TaHMT149, and TaHMT152, which may be involved in heterosis during the period of wheat reproductive growth in spikes. However, there are no significant DEGs screened in the tillering tissue.
We conducted a profound analysis of the transcriptome sequencing data and the expression data downloaded from wheat-expression.com. We defined the transcriptome sequencing data as shoot-TR and spike-TR, respectively, and the data obtained from the website database was defined as shoot-WE and spike-WE, representing two organizations in two periods. The two parts showed the matching degrees of 56.67% in shoot tissue, with 43.33% in spikes (Fig. 6A, Table S5). Our expectation of seven active genes at high levels of expression in the four groups was met. Finally, among them, 12 genes were screened in shoots, and 15 were screened in spikes for further analysis.
TaHMT Genes Showed Multiple Gene Expression Patterns in Hybrid Combinations
Together with the 1000 seed weight (TSW) and effective tillers per plant in wheat, four TaHMT DEGs and 27 genes obtained by the previous methods were analyzed by qRT-PCR. WGCNA was performed on the expression results. The relative expression level of each maternal plant was standardized as relative 1, and all of the expression results were shown in Figure S3.
Based on the trait data of TSW, the expression of 19 genes in spikes showed low correlations on the whole. The results were divided into six modules, in which three DEGs (green modules), TaHMT49, TaHMT149, and TaHMT152 showed a certain correlation (R = 0.32, p = 0.04) (Fig. 6C). In general, their expression levels showed a positive correlation with the TSW in the samples. These three genes, on the whole, showed under-dominant expression patterns. In addition, their expression in transgressive materials was somehow higher than that of the low-parent materials corresponding to transcriptome sequencing data (Fig. 7). In the HP combination, the gene expression in F1 were up-regulated by about 20 times at most (TaHMT49 in Combi201), while in some LP combinations, it was down regulated by more than 30 times (TaHMT49 in Combi199), compared with one parent. It is worth noting that this expression trend is not consistent in all materials. For example, TaHMT149 and TaHMT152 in Combi7 has low parental expression. This seems to be a manifestation of the complexity of heterosis. Interestingly, another DEG, TaHMT39, showed a low-parent dominant expression pattern in the transgressive materials (HP) and high-parent dominant expression patterns in the low-parent materials (LP) (Fig. 8A). For example, in the Combi157 (LP), the TSW of F1 was 0.8 times than that of the maternal parent, while the expression of the gene in F1 was over six times that of the female parent (Fig. 8A).
The high-expression genes in spikes showed multiple expression patterns in all combinations. However, no consistent rule indicated that we could not obtain a more consistent conclusion. TaHMT71 in the M5 combination, TaHMT118 in the M6 combination, and TaHMT162 in the M1 combination had negative correlation expression patterns. In HP, the expression in F1 was up-regulated by about 6 times, while in LP, the expression was down regulated by over 25 times at most (Fig. 8B). This means that in the HP materials, the genes presented a trend of low expression and a trend of high expression in the LP material. The expression patterns of TaHMT83, TaHMT84, and TaHMT148 were positively correlated in the M4 combination, and their expression trends were consistent with the TSW, showing an over-dominant expression pattern (Fig. 8C). Conversely, TaHMT86, TaHMT89, and TaHMT93 showed under-dominant patterns in some combinations. Their expression in F1 was down regulated about 5 times compared with the parents (Fig. 8D).
The expression of the 12 genes in the tillers was analyzed as follows: as a whole, the expression of the 12 genes had a negative correlation with the number of tillers, but the trend was nevertheless not very significant. No significant correlation was seen between the first two modules and the number of tillers. Therefore, the gene may have no relation to the number of tillers. The expression of the six genes in the third modules (green part) was negatively correlated with the number of tillers (R = -0.34, p = 0.03). From the analysis of the expression of each combination, the TaHMT genes mainly showed a negative correlation (Fig. 9A). That is, the expression of F1 was significantly down-regulated in HP and significantly up-regulated in LP. For example, TaHMT50 was down-regulated by 3–5 times in HP in M6 and M4, and up-regulated by 3 times in LP in M4. In particular, the expression of TaHMT122 in maternal M4 and TaHMT50 in M6 had a strong negative correlation (R > -0.6, p < 0.05) with the tiller number (Fig. 9B).