Phylogenetic analysis of the AT-hook motif gene family in soybean
We predicted a total of 63 AHL proteins containing the AT-hook motif and PPC domain in soybean (Fig. 1, Table 1). To infer the evolutionary relationships among soybean AHL proteins, we performed a phylogenetic analysis of the full-length protein sequences. The results show that soybean AHL proteins fall into two clades, Clade-A (34 proteins) and Clade-B (29 proteins), as previously described in other land plants [1]. Multiple sequence alignment allowed Clade-A and Clade-B to be further divided into Type-I (54%), Type-II (27%) and Type-III (19%). The predominance of Type-I in soybean is also consistent with observations in other land plants [1] and suggests that AHL proteins have been conserved during evolution.
We found that Clade-A, whose PPC domains contain the conserved sequences Leu-Arg-Ser-His and Leu-Arg-Ala-His, was more variable than Clade-B, whose PPC domain contains Phe-Thr-Pro-His. We also observed that the PPC domain of soybean AHL proteins is more variable than that of maize [19]. This increased PPC domain variability may extend the range of biological functions of AHL proteins.
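As a rough illustration, the PPC core signatures above can be turned into a quick screen that flags the likely clade of a protein. This is a minimal sketch, assuming the cores are written in one-letter code (Leu-Arg-Ser-His = LRSH, Leu-Arg-Ala-His = LRAH, Phe-Thr-Pro-His = FTPH) and that a plain substring match is enough. The helper name `guess_clade_from_ppc` and the toy sequences are hypothetical, and clade membership in this study came from the phylogeny, not from a screen like this.

```python
# PPC-domain core signatures from the text, in one-letter amino acid code.
CLADE_A_CORES = ("LRSH", "LRAH")  # Leu-Arg-Ser-His, Leu-Arg-Ala-His
CLADE_B_CORES = ("FTPH",)         # Phe-Thr-Pro-His


def guess_clade_from_ppc(seq: str) -> str:
    """Return 'A', 'B' or 'unknown' depending on which PPC core the sequence carries.

    Hypothetical helper: a substring search only, not a substitute for the
    phylogenetic assignment used in the analysis.
    """
    seq = seq.upper()
    has_a = any(core in seq for core in CLADE_A_CORES)
    has_b = any(core in seq for core in CLADE_B_CORES)
    if has_a and not has_b:
        return "A"
    if has_b and not has_a:
        return "B"
    return "unknown"  # neither core found, or conflicting cores


# Toy sequences, not real AHL proteins.
print(guess_clade_from_ppc("MAGRFEILSLRSHGG"))
print(guess_clade_from_ppc("MKFTPHVV"))
```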
The Type-I AT-hook motif contains four conserved amino acid residues at the N-terminus (Arg-Gly-Arg-Pro) and eight at the C-terminus (Gly-Ser-Lys-Asn-Lys-Pro-Lys-Pro). By contrast, Type-II has seven and ten conserved residues at the N-terminal and C-terminal ends, respectively. Type-III and Type-II share the same PPC domain and the same conserved N-terminal AT-hook structure, but Type-III lacks the conserved C-terminal residues of the AT-hook motif. The diversity of the AT-hook motif and PPC domain across soybean AHL proteins is likely to give rise to diverse biological functions.
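Once the three-letter names are converted to one-letter code, the conserved Type-I AT-hook residues listed above can be located in a protein sequence with a regular expression. The sketch below is hypothetical: the function names, the toy sequence and the `max_gap` limit on residues between the N-terminal and C-terminal cores are assumptions, not parameters from the analysis.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(motif: str) -> str:
    """Convert a hyphenated motif such as 'Arg-Gly-Arg-Pro' to 'RGRP'."""
    return "".join(THREE_TO_ONE[res] for res in motif.split("-"))


# Conserved Type-I AT-hook residues named in the text.
N_CORE = to_one_letter("Arg-Gly-Arg-Pro")                  # RGRP
C_CORE = to_one_letter("Gly-Ser-Lys-Asn-Lys-Pro-Lys-Pro")  # GSKNKPKP


def find_type1_hooks(seq: str, max_gap: int = 10) -> list[tuple[int, int]]:
    """Return 0-based (start, end) spans where N_CORE is followed by C_CORE
    with at most max_gap residues in between (max_gap is an assumption)."""
    pattern = re.compile(f"{N_CORE}[A-Z]{{0,{max_gap}}}?{C_CORE}")
    return [m.span() for m in pattern.finditer(seq.upper())]


# Toy sequence, not a real AHL protein.
print(find_type1_hooks("MSRGRPAGSKNKPKPWW"))
```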
Gene structure and motif prediction analysis in the AT-hook motif gene family in soybean
We analyzed gene structure to estimate the length of AHL genes and the variability in the number of CDS and UTRs (Fig. 2, Table 1). AHL gene length ranges from 585 bp to 7968 bp. A total of 12 genes (mostly from Clade-A) lack UTRs, and the number of introns and exons varies among genes, with Types II and III usually having more introns. Type-I genes were the shortest and contained the fewest CDS, with the number of CDS increasing from Glyma.20G202300 onward. Type-II and Type-III genes have two or more introns, clearly more than Type-I genes. We therefore propose that Type-II and Type-III evolved from Type-I, consistent with the report on the maize AHL gene family [19]. Eukaryotic genes consist of alternating exons and introns. In plants, up to 60% of genes undergo alternative splicing, most of which involves introns [28]. When intron-mediated enhancement (IME) was introduced into Arabidopsis, mRNA accumulation increased 24-fold and reporter enzyme activity increased 40-fold, indicating that introns have an important influence on the regulation of gene expression in plants [29]. Similarly, in maize, introns increased the expression of the genes Zm00001d018515 and Zm00001d051861 [19]. Alternative splicing of introns produces a diverse range of encoded proteins and thus a broad range of biological functions. The larger number of introns in soybean AHLs may therefore increase the diversity of AHL proteins. In maize, only one Type-I gene has a UTR [19], whereas most soybean genes have UTRs, indicating that AHL gene structure differs among species. In summary, we speculate that the introns of Type-II and Type-III genes enable plants to acquire more complex and diverse biological functions and lay the foundation for the further expansion of intron-containing AHLs.
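Intron number and genomic length follow directly from exon coordinates, which is how gene-structure plots such as Fig. 2 are usually derived from an annotation file. A minimal sketch, assuming 1-based inclusive exon coordinates for a single transcript (as in GFF3); the helper name and the coordinates in the example are hypothetical. Passing CDS coordinates instead of exon coordinates would give the CDS count rather than the exon count.

```python
def gene_structure(exons: list[tuple[int, int]]) -> dict:
    """Summarise one transcript from 1-based, inclusive (start, end) exon coordinates."""
    if not exons:
        raise ValueError("at least one exon is required")
    exons = sorted(exons)
    # Genomic length spans the first exon start to the last exon end.
    length_bp = exons[-1][1] - exons[0][0] + 1
    # Each intron lies between the end of one exon and the start of the next.
    intron_lengths = [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]
    return {
        "length_bp": length_bp,
        "exons": len(exons),
        "introns": len(intron_lengths),
        "intron_lengths": intron_lengths,
    }


# Hypothetical three-exon gene.
print(gene_structure([(1, 300), (501, 700), (1001, 1585)]))
```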
Next, we used the MEME web server to predict protein motifs (Fig. 3). A total of ten conserved motifs were identified in the AHL proteins (Table 3); motif length ranges from 8 to 32 amino acids, and the number of sites per motif ranges from 8 to 62.
Motifs 3 and 6 share a conserved Arg-Gly-Arg core and therefore likely belong to the AT-hook motif family. Motif 3 is defined as the Type-I AT-hook motif and motif 6 as the Type-II AT-hook motif. Type-I AHL proteins contain a Type-I AT-hook motif, Type-II proteins contain both Type-I and Type-II AT-hook motifs, and Type-III proteins contain only a Type-II AT-hook motif. The sequences downstream of the Arg-Gly-Arg core share conserved residues that play an important role in AHL proteins [1]. Interestingly, the PPC domain also contains a conserved Gly-Arg-Phe-Glu-Ile-Leu sequence (motif 2). This motif is found not only in soybean but also in other land plants, and a previous study has shown that it has an important influence on the PPC domain [1]. Notably, all AHL proteins contain motifs 1, 4 and 5, indicating that AHL protein sequences are conserved.
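The Type assignment rule stated above reduces to two presence checks on the motif hits. In this sketch the motif hits are a hypothetical dictionary mapping protein IDs to the motif numbers found in them; parsing real MEME output is omitted. The final check mirrors the observation that motifs 1, 4 and 5 occur in every AHL protein.

```python
def classify_ahl_type(has_type1_hook: bool, has_type2_hook: bool) -> str:
    """Assign an AHL Type from AT-hook motif presence, following the rule in the text:
    Type-I has only the Type-I hook, Type-II has both, Type-III has only the Type-II hook."""
    if has_type1_hook and has_type2_hook:
        return "Type-II"
    if has_type1_hook:
        return "Type-I"
    if has_type2_hook:
        return "Type-III"
    return "unclassified"


TYPE1_HOOK_MOTIF = 3       # motif 3 = Type-I AT-hook motif
TYPE2_HOOK_MOTIF = 6       # motif 6 = Type-II AT-hook motif
SHARED_MOTIFS = {1, 4, 5}  # reported in every AHL protein

# Hypothetical motif hits: protein ID -> motif numbers detected in it.
hits = {
    "protA": {1, 2, 3, 4, 5},
    "protB": {1, 2, 3, 4, 5, 6},
    "protC": {1, 2, 4, 5, 6},
}

types = {pid: classify_ahl_type(TYPE1_HOOK_MOTIF in m, TYPE2_HOOK_MOTIF in m)
         for pid, m in hits.items()}
all_share_core = all(SHARED_MOTIFS <= m for m in hits.values())
print(types, all_share_core)
```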
In summary, our gene structure and motif prediction analyses indicate that the AHL gene family shows both conservation and evolutionary diversity in soybean and other land plants [1], including maize [19] and cotton [20].
Evolutionary relationships of the AT-hook motif gene family in different species
To further explore the evolutionary relationships among AHLs in different species, we constructed a phylogenetic tree of AHL proteins from Arabidopsis thaliana, sorghum (Sorghum bicolor L.) and soybean using MEGA7 (Fig. 4) [1], with different colors representing different species. The phylogeny includes 29, 63 and 25 full-length AHL proteins from Arabidopsis, soybean and sorghum, respectively. The AHL genes of these species can be divided into two distinct clades, A and B. In Arabidopsis and sorghum, 15 and 14 proteins, respectively, belong to Clade-A, and 14 and 11 belong to Clade-B (Table 2). Type-I was the most conserved of all types, and the absence of a new subgroup between Types II and III in Clade-B suggests that these proteins diverged relatively late. In summary, the phylogenetic tree highlights the consistent evolution of AHLs across species, and the homology relationships it reveals provide a basis for future analysis of the biological functions of these proteins.
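The text does not state which distance model or tree-building method was chosen in MEGA7. As one illustration only, the sketch below computes the uncorrected p-distance (the proportion of differing sites), one of the distance options MEGA offers, from aligned sequences with pairwise deletion of gap columns. The sequence names and the alignment are hypothetical toy data.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping any
    column where either sequence has a gap '-' (pairwise deletion)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    diffs = sum(1 for x, y in sites if x != y)
    return diffs / len(sites)


def distance_matrix(aln: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of sequences in the alignment."""
    return {(i, j): p_distance(aln[i], aln[j]) for i, j in combinations(sorted(aln), 2)}


# Hypothetical aligned fragments.
aln = {"At_AHL": "RGRPAG-SK", "Gm_AHL": "RGRPSGKSK", "Sb_AHL": "RGRPSG-TK"}
for pair, d in distance_matrix(aln).items():
    print(pair, d)
```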
Chromosome location, duplication, GO annotations and collinearity analysis of the AT-hook motif gene family in soybean
We mapped the 63 AHL genes onto the 20 chromosomes of the soybean genome (Fig. 5A); gene locations are listed in Table 1. Chromosome 20 carries 9 AHLs and chromosome 19 carries 1, while chromosomes 12 and 15 carry none. The distribution of these genes across chromosomes was independent of chromosome length.
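Independence between gene number and chromosome length can be checked with a correlation coefficient. The text does not say which test was used; this sketch assumes a Pearson correlation, and the per-chromosome lengths and counts in the example are hypothetical placeholders, not values from Table 1. A coefficient near zero is consistent with independence; a significance test is omitted here.

```python
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return sxy / sqrt(sxx * syy)


# Hypothetical placeholder values, one entry per chromosome.
chrom_length_mb = [50.0, 40.0, 30.0]
ahl_count = [1, 3, 2]
print(round(pearson_r(chrom_length_mb, ahl_count), 3))
```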
We then used GO enrichment analysis to predict the potential biological functions of AHLs. As shown in Fig. 5B and Table 4, AHLs are annotated with terms from all three GO categories: biological process (BP), molecular function (MF) and cellular component (CC). Among the enriched BP terms, several are related to flower development, suggesting that the AHL gene family participates in the growth and development of floral organs in soybean, consistent with data published for Arabidopsis [17]. In the CC category, the nucleus is the most abundant component. In the MF category, we identified DNA binding (GO:0003677), sequence-specific DNA binding transcription factor activity (GO:0003700) and protein binding (GO:0005515). Most AHL proteins bind DNA and can target specific DNA sequences to perform different biological processes, suggesting that AHLs can regulate the expression of other genes.
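GO enrichment for a gene set is commonly assessed with a one-sided hypergeometric test; the text does not name the tool or test used, so this is an assumed, minimal version. Here `N` is the number of annotated background genes, `K` the number of background genes carrying the term, `n` the size of the study set (for example the 63 AHLs) and `k` the number of study genes carrying the term. The example counts are hypothetical, and in practice p-values across many terms would also be corrected for multiple testing (for example with Benjamini-Hochberg).

```python
from math import comb


def enrichment_p(k: int, N: int, K: int, n: int) -> float:
    """One-sided hypergeometric P(X >= k): the probability of drawing at least k
    term-annotated genes when sampling n genes from N, of which K carry the term."""
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("invalid population sizes")
    upper = min(K, n)
    # math.comb returns 0 when the lower argument exceeds the upper one.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical counts: 63 study genes, 20 of which carry a term that is
# annotated on 500 of 50,000 background genes.
p = enrichment_p(k=20, N=50_000, K=500, n=63)
print(f"{p:.3e}")
```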
Gene duplication is a common process in plant evolution that leads to the expansion of gene families; tandem and segmental duplication events are the most common in angiosperms [30–33]. To further examine the evolution of AHLs in soybean, we analyzed gene duplication events in the AT-hook motif gene family (Fig. 5C, Table 6). Of the AHL genes, 84% result from segmental duplication, 13% from tandem duplication and the remaining 3% from proximal duplication. These results suggest that segmental duplication may be the main driver of AHL gene family evolution.
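The duplication classes above can be assigned per gene pair with simple positional rules. The sketch below is loosely modeled on the class priority used by tools such as MCScanX (segmental before tandem before proximal), but the rules are simplified and hypothetical: collinear-block membership is taken as an input rather than computed, gene positions are integer ranks along the chromosome, and the `max_proximal` window is an assumed value. The last helper converts class labels into percentages of the kind reported above.

```python
from collections import Counter


def classify_pair(chrom_a: str, pos_a: int, chrom_b: str, pos_b: int,
                  in_collinear_block: bool, max_proximal: int = 10) -> str:
    """Classify a duplicated gene pair (simplified rules; pos_* are gene ranks).
    Priority: segmental > tandem > proximal > dispersed."""
    if in_collinear_block:
        return "segmental"
    if chrom_a == chrom_b:
        gap = abs(pos_a - pos_b)
        if gap == 1:
            return "tandem"    # adjacent genes
        if gap <= max_proximal:
            return "proximal"  # nearby but not adjacent
    return "dispersed"


def percentages(labels: list[str]) -> dict[str, float]:
    """Share of each duplication class, in percent, rounded to one decimal."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


# Hypothetical pairs: (chrom_a, pos_a, chrom_b, pos_b, in_collinear_block)
pairs = [
    ("Gm01", 120, "Gm11", 340, True),
    ("Gm20", 55, "Gm20", 56, False),
    ("Gm20", 55, "Gm20", 60, False),
    ("Gm03", 10, "Gm07", 900, True),
]
labels = [classify_pair(*p) for p in pairs]
print(labels, percentages(labels))
```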
To explore potential evolutionary relationships, we investigated the collinearity of soybean AHLs with those of two dicots (poplar and Medicago) and two monocots (rice and maize) (Fig. 6). Soybean AHLs showed higher homology with Medicago and poplar than with rice and maize, and more AHL homologous genes were found in dicots than in monocots. Some soybean AHL genes are collinear with AHL genes in other plants, particularly poplar and Medicago, suggesting that these genes may play important roles in plant evolution. These results may be useful for subsequent comparative studies of AHL genes with known functions.
Promoter sequence analysis of the AT-hook motif gene family in soybean
The promoter region lies upstream of a gene and contains cis-regulatory elements, sequences bound by transcription factors that play an important role in regulating gene expression under stress [34]. We identified cis-regulatory elements for light responsiveness, anaerobic induction, MYB binding and gibberellin responsiveness in the 2100 bp region upstream of the AHL genes (Fig. 7). Approximately 43.5% of the selected genes contained a MYB binding site, and previous studies have shown that the MYB gene family can regulate anther development and function [35, 36]. In addition, more than 198 and 183 MYB members directly or indirectly involved in drought stress responses have been described in Arabidopsis and rice, respectively [37], and an AHL gene has been implicated in the drought response of rice [22]. It is therefore possible that the AHL gene family also mediates drought stress responses in soybean. All selected AHL promoters contain the light responsiveness element, suggesting that AHL genes participate in photomorphogenesis in soybean. Approximately 91.3% of the selected AHLs contain the anaerobic induction element. Under anaerobic conditions, plant disease resistance is reduced, root morphogenesis is impaired, and root tip epidermal cells are damaged or killed, facilitating pathogen invasion [38]. Hemoglobin acts as an intracellular signal of hypoxia in plants, and legumes contain relatively high amounts of symbiotic hemoglobin [39]. Higher plants perceive O2 through hemoglobin under anaerobic conditions, and changes in hemoglobin concentration are regulated by the partial pressure of O2 [39]. Our results suggest that AHLs may play significant roles in the response of soybean to anaerobic conditions. Gibberellin plays an important role throughout the plant growth cycle, promoting cell division and elongation [40] and controlling seed germination and root formation [41, 42]. Of the selected AHLs, 17.4% contain the gibberellin-responsiveness element, suggesting that AHLs participate in the regulation of growth and development in soybean and underscoring the variety of functions played by AHLs in soybean growth.
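Scanning a promoter for a cis-element involves taking the region upstream of the transcription start site on the gene's own strand and searching both strands for the element. The text does not name the tool used (databases such as PlantCARE are common). In this minimal sketch, the 2100 bp window comes from the text, while the helper names, the 0-based TSS convention and the hexamer `CAACTG` are illustrative placeholders, not the element definitions behind Fig. 7.

```python
# Complement table for DNA; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, tss: int, strand: str, length: int = 2100) -> str:
    """Return up to `length` bp upstream of a 0-based TSS, oriented 5'->3' on the gene strand."""
    if strand == "+":
        return chrom_seq[max(0, tss - length):tss]
    if strand == "-":
        # On the minus strand, "upstream" lies at higher reference coordinates.
        return revcomp(chrom_seq[tss + 1:tss + 1 + length])
    raise ValueError("strand must be '+' or '-'")


def has_element(promoter: str, motif: str) -> bool:
    """True if the motif occurs on either strand of the promoter."""
    p, m = promoter.upper(), motif.upper()
    return m in p or revcomp(m) in p


def fraction_with_element(promoters: dict[str, str], motif: str) -> float:
    """Fraction of promoters that carry the motif on either strand."""
    if not promoters:
        raise ValueError("no promoters given")
    hits = sum(has_element(seq, motif) for seq in promoters.values())
    return hits / len(promoters)


PLACEHOLDER_MOTIF = "CAACTG"  # illustrative only

# Hypothetical promoter fragments.
promoters = {"geneA": "TTTCAGTTGAA", "geneB": "GGGGGGGGGG"}
print(fraction_with_element(promoters, PLACEHOLDER_MOTIF))
```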
Co-expression network analysis of the AT-hook motif gene family in soybean
We used a co-expression network to represent the upstream and downstream genes that interact with AHLs of the three Types (Fig. 8). Representative genes were selected from the network, and their functional annotations are given in Supplementary Table 5. Some AHLs are associated with genes related to energy binding, such as Glyma.11G179200 and Glyma.09G196600, which might be involved in energy transduction in soybean. The network also indicates that, in addition to interacting with other genes, AT-hook motif genes interact to some extent with each other. For example, the Type-II gene Glyma.20G212200 interacted with four AT-hook motif genes, which may jointly regulate the expression of other genes. We also found that AT-hook motif genes are associated with histone binding and ATP binding in soybean, and the same gene is involved in histone modification in Arabidopsis thaliana [17]. We speculate that some AHL genes, mainly of Type-II, are related to nuclear import signals and may regulate the nuclear import of other proteins in soybean. The DELLA gene LeGAI is expressed in both vegetative and reproductive tissues of tomato, and this gene family is also involved in GA signal transduction [43]. In our network, the AHL gene Glyma.20G212200 was co-expressed with two DELLA genes, Glyma.05G140400 and Glyma.08G095800. Similarly, Glyma.16G204400 interacts with the DELLA genes Glyma.08G095800 and Glyma.05G140400, possibly regulating the gibberellin signal transduction pathway in soybean. We therefore propose that the AT-hook motif gene family is involved in the gibberellin signal transduction pathway in soybean. Together, our results suggest that the AHL gene family is involved in regulating biological processes such as energy transduction, the gibberellin pathway and the nuclear import signal pathway in soybean.
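The text does not describe how the co-expression network in Fig. 8 was built (tools such as WGCNA are common). As a minimal, assumed alternative, the sketch below links two genes when the absolute Pearson correlation of their expression across samples reaches a threshold, then counts each gene's connections, the quantity behind statements such as a gene interacting with four AT-hook motif genes. Gene names, expression values and the 0.8 threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation; returns 0.0 if either profile has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def coexpression_edges(expr: dict[str, list[float]], threshold: float = 0.8):
    """List (gene_a, gene_b, r) for every gene pair with |r| >= threshold."""
    edges = []
    for a, b in combinations(sorted(expr), 2):
        r = pearson_r(expr[a], expr[b])
        if abs(r) >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges


def degrees(edges) -> Counter:
    """Number of edges touching each gene (genes without edges are absent)."""
    deg = Counter()
    for a, b, _ in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


expr = {  # hypothetical expression values across four samples
    "geneA": [1, 2, 3, 4],
    "geneB": [2, 4, 6, 8],
    "geneC": [4, 3, 2, 1],
    "geneD": [1, 3, 2, 3],
}
edges = coexpression_edges(expr)
print(edges)
print(degrees(edges))
```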
Expression profiles of the AT-hook motif gene family in soybean
To examine the expression patterns of the AT-hook motif gene family, we analyzed different tissues of two representative soybean cultivars, Jack and Williams82 (W82), at the VC stage. The transcriptome data are available from NCBI (accession number: SRP285849) [44]. W82 and Jack were used to investigate whether the expression profiles of the AT-hook motif gene family differ between soybean varieties (Fig. 9A and Fig. 9B). AHLs were mostly expressed in roots and meristems, and these patterns were similar in W82 and Jack. In roots, 35 and 31 genes were highly expressed in Jack and W82, respectively. Of the 35 genes highly expressed in Jack roots, 22 were also highly expressed in W82. Of the remaining 13 genes with inconsistent expression, 9 had high expression in Jack. In the meristem, 26 and 24 genes are highly expressed in Jack and 21 in W82, respectively. The expression of the same gene can also differ between varieties; for example, Glyma.09G260600 is expressed at a higher level in Jack than in W82. Expression in the leaves of both Jack and W82 is very low, except for 5 genes in Jack and 4 genes in W82, which corroborates previous results in maize [19]. In the epicotyl of Jack, we found 5 highly expressed genes, similar to W82. In the hypocotyl, Glyma.04G091600 and Glyma.06G093400 are highly expressed in both cultivars, whereas Glyma.18G036200 is expressed at a higher level in the W82 hypocotyl than in that of Jack. Interestingly, genes highly expressed in meristematic tissues mainly belong to Type-II, while those highly expressed in roots mainly belong to Type-I. These results indicate that although AHL genes in Jack and W82 have similar expression patterns across tissues, individual genes are expressed differently between the two varieties.
Hence, different AHL genes may have different functions in the two varieties and may play important roles in plant development. To validate the RNA-seq data, we performed RT-qPCR on three genes in the roots, leaves, meristem, epicotyl and hypocotyl of W82 (Fig. 9C). Genes with high expression in one tissue had low expression in the other tissues, indicating that AHL gene expression is tissue-specific in soybean.
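Relative expression from RT-qPCR is often computed with the 2^-ΔΔCt method. The text does not state which quantification method or reference gene was used, so the sketch below assumes the Livak 2^-ΔΔCt approach with roughly 100% amplification efficiency for both the target and the reference gene. The Ct values and the choice of calibrator tissue are hypothetical.

```python
def relative_expression(ct_target: float, ct_ref: float,
                        ct_target_cal: float, ct_ref_cal: float) -> float:
    """Fold change of a target gene in a sample relative to a calibrator sample
    (2^-delta-delta-Ct). Assumes near-100% PCR efficiency for both genes."""
    delta_ct_sample = ct_target - ct_ref        # normalise the sample to the reference gene
    delta_ct_cal = ct_target_cal - ct_ref_cal   # same normalisation for the calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_cal
    return 2 ** (-delta_delta_ct)


# Hypothetical Ct values: a root sample compared with a leaf calibrator.
fold = relative_expression(ct_target=22.0, ct_ref=18.0,
                           ct_target_cal=25.0, ct_ref_cal=18.0)
print(fold)
```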
The expression of the AT-hook motif gene family under drought and submergence
Both drought and submergence adversely affect plant growth, and a previous study has shown that AHLs mediate plant responses to drought stress [22]. Because the cis-element analysis showed that some AHL promoters contain a MYB element, we hypothesise that AHLs may also affect drought stress responses in soybean. We therefore examined AHL expression in the leaves and roots of W82 at the V1 stage under submergence and drought, using RNA-seq data from NCBI (PRJNA574626) (Fig. 10A and Fig. 10B). In both control and treated plants, more AHLs were expressed in roots than in leaves, consistent with the results in Fig. 9A and B. After 5–6 days of drought treatment, the expression of highly expressed genes such as Glyma.02G285500 decreased considerably. However, the expression of Glyma.14G181200 increased, especially in leaves after 6 days of drought treatment. In the roots, drought treatment significantly reduced gene expression compared with the control group. Similar patterns were observed under submergence, where some genes, such as Glyma.14G066800, showed significantly higher expression in leaves than in controls. Overall, the expression of most genes in roots decreased after submergence.
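Statements that a gene's expression increased or decreased after treatment usually rest on a log2 fold change relative to the control. The sketch below assumes normalized expression values (for example TPM or FPKM), a pseudocount of 1 to avoid division by zero, and a |log2FC| cutoff of 1; all three are assumptions. It gives only a descriptive ratio, not a replacement for a statistical test such as those in DESeq2 or edgeR.

```python
from math import log2


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """log2((treated + pc) / (control + pc)) for non-negative expression values."""
    if treated < 0 or control < 0:
        raise ValueError("expression values must be non-negative")
    return log2((treated + pseudocount) / (control + pseudocount))


def direction(treated: float, control: float, min_abs_lfc: float = 1.0) -> str:
    """Label a change as 'up', 'down' or 'unchanged' using a log2FC cutoff."""
    lfc = log2_fold_change(treated, control)
    if lfc >= min_abs_lfc:
        return "up"
    if lfc <= -min_abs_lfc:
        return "down"
    return "unchanged"


# Hypothetical TPM values for one gene: drought-treated vs control leaves.
print(direction(treated=7.0, control=3.0))
```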
We used roots and leaves of W82 at the V1 stage to verify AHL expression under drought and submergence stress (Fig. 10D). After one day of submergence, AHL expression in leaves increased significantly, and it decreased significantly after three days of submergence. After one day of recovery, AHL expression returned to the level of the control. In roots, expression decreased after submergence stress. In leaves, AHL expression increased significantly after one day of drought stress and decreased after six days of drought. In roots, expression under drought decreased relative to the control as the stress duration increased. We also recorded the phenotype of soybean under submergence and drought stress (Fig. 10C). With increasing stress duration, stressed plants were shorter and more wilted than the controls, although the phenotypic difference was not particularly obvious.
These results suggest that under stress, gene expression generally increases in the leaves and decreases in the roots. We also found that after 1 day of recovery, gene expression levels were restored and were sometimes even higher than those of the control group. Together, these expression patterns indicate that AHLs are more highly expressed in the roots and are involved in responses to drought and submergence stress.