Identification and annotation of wheat PG gene family
The Hidden Markov Model (HMM) file of Glycoside hydrolase family 28 (GH28, EC 3.2.1.) and the amino acid sequences of PGs of Arabidopsis were aligned with the whole amino acid sequences of wheat, and a total of 113 wheat PGs were finally obtained after strict screening and identification (Additional file 1: Table S1). These 113 wheat PGs were unevenly distributed on 21 chromosomes, and each chromosome had at least one PG. However, there were five PG genes distributed on the A, B and D of chromosomes 2 and 5 respectively, as well as chromosomes of 1D, 4A, and 4B. (Additional file 2: Figure S1). According to their chromosomal location, these 113 wheat PG genes were renamed as TaPG01-TaPG113. The 113 TaPGs have a CDS length of 762-1617 bp, and an exon number of 2-10. The molecular weight of wheat PG proteins ranged from 26.69-57.99 KDa, with a theoretical isoelectric point of 4.8-10.45, and encoding 253-538 amino acids (Table 1). Accurate identification and reasonable naming are essential for further research in this gene family.
Evolutionary analysis of wheat PG gene family
To understand the evolution pattern of wheat PG, a total of 238 PG proteins from wheat, Arabidopsis and rice were used to construct a phylogenetic tree (Fig. 1). The results showed that 238 PG proteins of these three species were clustered into six branches (A-F). Among these six classes, class A had 12 TaPGs, and class B to class F had 16, 13, 28, 30, and 14 TaPGs in sequence. In addition, there were 39 pairs of paralogous genes in 113 TaPGs, of which A class has four pairs of paralogous genes, class B, C and F all had five pairs, while class D and E has nine and eleven pairs of paralogous genes respectively. However, there was only one pair of PG orthologous of rice and wheat (LOC_Os01g33300/TaPG44) in the D class. Therefore, it can be speculated that the expansion of the wheat PG family mainly occured after species differentiation from ancestor.
The duplications of wheat PGs
In order to find out the expansion dynamics and evolution direction of the wheat PG gene family, we investigated its gene duplication events. The duplications of the wheat genome indicated that there were eight pairs of tandem duplications and 87 pairs of segmental duplications between TaPGs. The KA/KS values of these eight pairs of tandem duplication TaPGs were all less than 1, indicating that they were in the purify selection (Table 2). 93 TaPGs involved in segmental duplications, accounting for 82.30% (93/113) of all TaPGs and it is 34.58% higher than the probability of segmental duplications of whole wheat genome (number of all genes: 120744, number of segmental duplication genes: 57613, percentage: 47.72%) (Fig. 2, Additional file 3: Table S2). This is a good evidence that the segmental duplications promote the expansion of wheat PG gene family.
Gene structure and conserved motif analysis of wheat PGs
The alignment of amino acid sequences of 113 wheat PGs indicated that most of them contained four intact or tend to be intact conserved domains (Ⅰ ‘SPNTDGI’, Ⅱ ‘GDDC’, Ⅲ ‘CGPGHGISIGSLG’, Ⅳ ‘RIK’) (Fig. 3). However, some PGs lack a certain domain, for example, in the B class, the amino acid sequences of the conserved domains of TaPG07 and TaPG81 were missing and incomplete. The most obvious deletions were shown in class E PGs, just as in the case of apple [15], the III conserved domains of all members of wheat class E PGs were missing. In addition, some of the conserved domains of PGs were complete, but individual amino acids were mutated. Most of the PGs in class E mutated the serine (S) to alanine (A) or threonine (T) in the conserved motif 'SPNTDGIH'. The domain IV was the most conserved region, except for the deletion in TaPG07 and the mutation of class E PGs, which completely conserved in the remaining TaPGs. In summary, the PGs in E class were the most variable members of the family.
MEME tool was used to search for the conserved motifs of 113 wheat PGs, and ten conserved motifs were obtained (Fig. 4, Additional file 4: Figure S2). No PG had all 10 motifs. Wheat PGs of class A and class C had the same five motifs, while all class D wheat PG members had the same nine motifs except for TaPG92. It was found that motif 1 and motif 3 both contained conserved domain I 'SPNTDGI' and II 'GDDC' of PGs at the same time, and motif 1 also contained the portion of the conserved domain III 'CGPGHGISIGSLG', while motif 2 and motif 6 contained a conserved domain IV 'RIK'. In addition to the above four conserved motifs, motif 5 and motif 9 were the most conservative because they existed in almost all PGs including group E PGs. The structure of TaPGs showed that most of the PG genes had 1-9 introns, among which classes of C and D PGs belonging to exo-PG had relatively shorter gene sequences and fewer introns, and E class PGs belonging to oligo-PG generally had longer intron sequences, while class F PGs generally had more intron numbers (7-9). From the analysis of motifs and gene structures, not only the structural characteristics of each class of PGs can be obtained, but also provided a basis for classification of PG functions.
GO annotation and cis-elements prediction of wheat PGs
It is a very effective means to obtain corresponding gene functions by searching for annotation information in kind of databases and predicting upstream regulatory factors. We obtained the molecular function, biological process and cellular component of 113 TaPGs by searching for GO (Gene ontology) annotations. The GO annotations showed that most of the 113 TaPGs (68) were active in extracellular region, and all 113 TaPGs exerted the hydrolysis and catalytic activity and involved in the primary metabolic and organic substance metabolic process (Fig. 5, Additional file 5: Table S3). Cis-acting elements of TaPGs were predicted, and ten of them were analyzed here. They included cis-elements responded for the anaerobic induction (ARE), light (G-box), low-temperature (LTR), also included cis-elements regulated by hormones and transcription factors, such as cis-acting elements involved in the abscisic acid (ABRE), gibberellin (P-Box), auxin (TGA-element), methyl jasmonate (CGTCA-motif, TGACG-motif), and MYB binding site involved in drought-inducibility (MBS), as well as cis-acting regulatory element related to meristem expression(CAT-box). As can be seen from Figure 6, most TaPGs (85) may respond to light, 38 TaPGs respond to low temperature, 83 TaPGs respond to abscisic acid, 20 TaPGs respond to gibberellin, and 32 TaPGs participate in meristem expression.
Analysis of the expression patterns of TaPGs
To predict the relevant functions based on the expression specificity of PG, a heatmap were generated using RNA-seq data from Chinese spring wheat in 15 different developmental stages and tissues (Fig. 7). Obviously, 21, 7, 11 TaPGs were significantly up-regulated in the anthesis spikes (spike, reproductive, anthesis), spikes wrapped in flag leaves (spike, reproductive, flag leaf stage), and grains two days post anthesis (grain, reproductive, 2 dpa).These 21 up-regulated TaPGs in the anthesis spikes belonged to the D class, except for three from the B class. The 11 up-regulated TaPGs in spikes of the flag leaf stage were derived from the B, C, and D classes, and the up-regulated TaPGs of the grain belonged to the F class. The results of these preference expressions strongly demonstrated that TaPGs of different classifications have a clear division of labor in different tissues. In order to explore TaPGS that play an important role in wheat pollen development, we focused on twelve completely spike-specific expressed TaPGs (TaPG09, 14, 18, 28, 40, 41, 49, 50, 54, 87, 93, 95).
Pollen-specific TaPGs identified
Real-time quantitative PCR (qRT-PCR) was performed on eight of spike-specific expressed TaPGs by sterile and fertile anthers of thermo-sensitive male sterile wheat KTM3315A to understand their expression modes in anthers. The results showed that three TaPGs (TaPG14, 40, 41) that specifically expressed in the spikes of the flag leaf stage showed a decreasing trend with the development of anthers, while the anthesis spike-specific TaPGs (TaPG18, 28, 87, 93, 95) were up-regulated during the late anther development (Fig. 8). Just as confirmed by RT-PCR (Additional file 6: Figure S3), these differentially expressed data during anther development supported the fact that these spike-specific TaPGs were actually expressed in anthers. Although TaPG14, TaPG40 and TaPG41 were up-regulated at the uninucleate stage, their role in fertility conversion could be ruled out for they were not significantly differentially expressed between sterile and fertile anthers. TaPG18, TaPG28, TaPG87, TaPG93, and TaPG95 all had the highest expression levels at the trinucleate or binucleate stage in fertile anthers and showed significant differences in expression level with sterile anthers. However, the expression level of TaPG87 was significantly down-regulated in fertile anthers compared with sterile anthers, while the other four TaPGs were opposite. Their non-uniform expression patterns suggested that they had different roles in the development of anthers, and their functions were not redundant even if TaPG87 and TaPG95 all belong to class B in this gene family.
By performing an alignment analysis on the above five differentially expressed TaPGs, we found that their protein sequences were highly similar to four Arabidopsis proteins (AT4G18180, PGA4, ADPG2, ADPG1), respectively. Like TaPG18 (401aa), the similarity between TaPG28 (401aa) and AT4G18180 (422a) was also bit-score 364.8, the bit-score of TaPG93 (294aa) and PGA4 (431aa, AT1G02790) was 263, TaPG87 (402aa) and ADPG2 (AT2G41850, 433aa) was 399.4, and TaPG95 (420aa) and ADPG1 (AT3G57510, 431aa) was 401. AT4G18180 belonged to the pectin lyase-like superfamily protein and its function was described as polygalacturonase activity, but its role in anthers has not been determined. However, PGA4 was identified as exo-polygalacturonase and played a role in the depolymerization of pectin during pollen development, pollen grain germination, and pollen tube growth. Its interaction protein included five pectin lyase-like superfamily proteins, two beta-D-xylosidase, two alpha-L-arabinofuranosidase, and a glycosyl hydrolase 9A1 involved in cellulose biosynthesis. ADPG2 was identified as a polygalacturonase involved in cell separation in the final stages of pod shatter, in anther dehiscence and in floral organ abscission, and part of its interaction proteins have the function of organ shedding. ADPG1 was a polygalacturonase protein involved in silique and anther dehiscence, and most of its interaction proteins belonged to pectin lyase-like superfamily protein (Additional file 7: Table S4). Therefore, by aligning with the more well-studied model plant Arabidopsis to obtain sequence-similar Arabidopsis PG genes and predicting functions of the corresponding TaPGs, three TaPGs worth studying have broken into our field of vision. TaPG93 was significantly up-regulated during the binucleate stage and was supposed to be involved in pollen development and pollen tube growth, while TaPG87 and TaPG95, which were up-regulated in the trinucleate stage, played an essential role in pollen separation and anther dehiscence.