Prolamin gene families play important roles in flour viscoelasticity, nutritional quality, and CD epitope content. For research on these gene subfamilies, searching known genome-wide sequences is regarded as the most comprehensive method. Genome-wide identification of prolamin gene families has been widely carried out in T. aestivum, T. urartu, and Ae. tauschii, but systematic analysis is lacking [14–16, 37]. Knowledge of prolamin gene families in Th. elongatum is still limited. In this study, we identified 19 α-gliadins, 9 γ-gliadins, 19 ω-gliadins, 2 HMW-GSs, and 5 LMW-GSs from the Th. elongatum genome. Genes from the genomes of the related species above were also summarized. Although genome assembly has limitations, these results at least provide a reference for the study of prolamin genes in a single germplasm [37].
Transcriptome data from different stages of grain development revealed complex differential expression of prolamin genes. In hexaploid wheat, the expression of LMW-GS genes has been reported to peak at 10 days after anthesis and then decline as the seeds mature [28]. Another analysis gave a similar result: LMW-GS genes began to be expressed on day 5 after flowering, reached their highest level on day 14, and then declined gradually as the seeds ripened [17]. In Th. elongatum, the expression of three LMW-GS genes followed a trend consistent with these studies (Fig. 2A). By contrast, the expression of α/β-gliadin genes differs among subgenomes: genes of the B and D genomes are early-expressed (highest level at 10 days after flowering), similar to LMW-GS genes, whereas those of the A genome are late-expressed (highest level at 20 days after flowering) [28]. In Th. elongatum, the expression trend of α-gliadin genes is synchronous with that of LMW-GS genes, so the α-gliadin genes of the E genome are likely early-expressed genes (Fig. 2A). In CS, although the expression levels of all genes decreased to a very low level at 23–25 days post-anthesis (DPA), the two types of γ-gliadins showed different expression patterns: the expression of type I decreased rapidly at 10–15 DPA, while that of type II decreased slowly and gradually after reaching its highest level at 10 DPA [25]. In this study, the expression of γ-gliadin genes increased from the half grain to the grain stage and peaked later than that of α-gliadin and LMW-GS genes (Fig. 2A). The expression levels of different LMW-GS genes have been reported to vary greatly (almost ten-fold) [17]. In Th. elongatum, however, the largest difference, found between two developmental stages in the γ-gliadin gene family, was only 5–6-fold (Fig. 2A).
Gene collinearity at different loci has previously been studied in genome fragments of rice, maize, sorghum, barley, and wheat [38]. By overcoming the limitations of DNA markers based on genetic maps, such comparisons of smaller regions provide preliminary insights into the detailed composition and organization of many plant genomes [19]. Prolamin genes are concentrated in clusters on chromosomes, which facilitates the comparison of homologous regions among species to elucidate their evolutionary characteristics. A 307 kb physical contig was compared among the A and B genomes of durum wheat and the D genome of Ae. tauschii. This comparison showed that, although gene collinearity appears to be retained, four of six genes, including the two paralogous HMW-GS genes, are disrupted in the orthologous region of the A genome [39]. Another study inferred that considerable sequence changes caused rearrangements of prolamin genes in these genomic regions after the split of the two homoeologous wheat genomes [10]. In this study, homology across the whole Gli-1 and Glu-3 intervals was shown between Th. elongatum and the other selected species (Fig. 4). The order of ω-gliadin, γ-gliadin, and LMW-GS genes was maintained in the E genome of Th. elongatum, the A and D subgenomes of common wheat, the A genome of T. urartu, and the D genome of Ae. tauschii. However, an inversion occurred in the interval between the Gli-1 and Glu-3 loci on chromosome 1B of common wheat, revealing a dynamic change in this region (Fig. 4). As previously reported, the homoeologous genomes of wheat are not as well conserved as once thought, owing largely to the differential insertion of transposable elements. In addition, a locus homologous to an ω-gliadin locus of Th. elongatum was detected only on chromosomes 1A and 1B of bread wheat (Fig. 4). We speculate that this locus was lost from chromosome 1D of bread wheat and Ae. tauschii and from chromosome 1A of T. urartu during evolution. These results lay the foundation for further study of the prolamin genes and flanking regions of Th. elongatum.
An investigation of the phylogenetic relationships of Hystrix, Leymus, and their relatives using the Acc1 gene gave results consistent with morphological and cytological studies, indicating that the Acc1 gene is a potentially valuable source for phylogenetic analysis in Triticeae [57]. The Acc1 gene has also been successfully applied to the study of the evolution of Triticum/Aegilops, as well as that of switchgrass (Panicum virgatum L.) [58,59]. However, resolving the evolutionary relationships of species usually requires multiple lines of evidence. Prolamin genes are also considered a resource for studying the evolutionary relationships of Gramineae species. For example, the evolution of LMW-m genes indicated a close relationship between the B genome and the Ss genome, which supported the view that the B genome originated from Ae. apetala [25]. Recently, a study based on the analysis of single-copy genes indicated that the E genome of the Elytrigia genus was more closely related to the B genome of common wheat and that the A genome was more closely related to the D genome of common wheat [13]. In our study, this result is supported by the phylogenetic trees of α-gliadin genes and x-type HMW-GS genes, respectively (Fig. 5 and S2C). In the α-gliadin subfamily, the genes on chromosome 7E were shown to diverge earlier than other genes (Fig. 5).
Based on their N-terminal amino acid sequences, six subgroups of LMW-GSs can be recognized: LMW-s (SHIPGL-), LMW-m (METSHIPGL-, METSRIPGL-, METSCIPGL-), LMW-i (ISQQQQ-), α-LMW (VRCPCP-), β-LMW (NMQVDP-), and ω-LMW (KELQSP-/ARQLNP-) [34–36]. The α-LMW, β-LMW, and ω-LMW types are generally found in fewer numbers than the others. In this study, multiple sequence alignment showed that LMW-m is no longer limited to three types; six other types (METSHNPGL, METSHIPSL, METSRVPGL, METSCISGL, MDTSYIPGL, and METRCIPGL) have evolved (Additional file 4). Because only the M, T, and L sites among the first nine N-terminal amino acid residues were conserved, we speculate that more LMW-m types will arise with species diversification (Additional file 4). We also found that the clustering of LMW-GSs was related to the positions of the first and the penultimate cysteines. Therefore, we propose an improved method that divides LMW-m into four types (a detailed analysis is given in the evolutionary analysis).
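The conservation statement above can be checked directly from the nine LMW-m N-terminal motifs listed in the text. The following is a minimal Python sketch, not the alignment procedure used in this study: it assumes the motifs are already aligned position by position, and the function name `conserved_positions` is hypothetical.

```python
from typing import List, Tuple

# The three canonical LMW-m N-termini plus the six additional types
# listed in the text (copied as written; assumed to be pre-aligned).
LMW_M_MOTIFS = [
    "METSHIPGL", "METSRIPGL", "METSCIPGL",
    "METSHNPGL", "METSHIPSL", "METSRVPGL",
    "METSCISGL", "MDTSYIPGL", "METRCIPGL",
]


def conserved_positions(motifs: List[str]) -> List[Tuple[int, str]]:
    """Return (1-based position, residue) for every column that is
    identical across all motifs. Hypothetical helper for illustration."""
    length = len(motifs[0])
    if any(len(m) != length for m in motifs):
        raise ValueError("all motifs must have the same length")
    conserved = []
    # zip(*motifs) yields one tuple of residues per alignment column.
    for position, column in enumerate(zip(*motifs), start=1):
        if len(set(column)) == 1:
            conserved.append((position, column[0]))
    return conserved


if __name__ == "__main__":
    for position, residue in conserved_positions(LMW_M_MOTIFS):
        print(f"position {position}: {residue}")
```

Run on these nine motifs, only positions 1 (M), 3 (T), and 9 (L) are invariant, in agreement with the M, T, and L sites named in the text.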
Based on the putative functional α-gliadin sequences of four species, our results showed that CD peptides were genome-specific, which is consistent with the results of a previous study (Additional file 1: Table S6) [1]. Importantly, α-gliadins of the E genome contain only one type of CD peptide, which is beneficial for low-CD breeding. Although many germplasm resources of Triticeae have been tested for CD content based on protein sequences, they have not yet been applied to the breeding of wheat varieties. DS6E(6D), which replaces the D-genome chromosome carrying multiple CD peptides with the E-genome chromosome carrying few and has been verified through cytological study (results not shown), will be used in future work to test the low-CD effect.