Isolation and identification of the NF-Y family members in E. grandis
To identify the NF-Y genes of E. grandis, we used Arabidopsis NF-Y protein sequences as queries to search the E. grandis genome in Phytozome v13 [4]. After comprehensive screening, which removed redundant sequences and sequences with improper domains, a total of 31 EgrNF-Y sequences were identified in the E. grandis genome, comprising 7 NF-YA, 16 NF-YB and 8 NF-YC genes. The physicochemical properties of all members were estimated using the ExPASy server (http://WWW.expasy.org/), and the characteristics of the EgrNF-Y sequences are listed in Table 1. The identified EgrNF-Y genes encode peptides ranging from 94 to 339 aa in length. The molecular weights (MWs) of these proteins ranged from 10.64 kDa to 37.21 kDa, and the isoelectric point (pI) values ranged from 4.49 to 9.48 (Table 1).
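The molecular weights reported here come from the ExPASy server. As a rough illustration of what such an estimate involves, the following minimal sketch sums average residue masses and adds one water molecule for the free termini. The function name `estimate_mw_kda` is hypothetical, the mass table holds standard average residue masses, and the result should not be expected to match the server output to the last digit.

```python
# Average residue masses (Da) of the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01528  # one H2O is added for the free N- and C-termini


def estimate_mw_kda(sequence: str) -> float:
    """Estimate the average molecular weight (kDa) of a peptide sequence."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    total_da = sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
    return total_da / 1000.0


if __name__ == "__main__":
    # Hypothetical short peptide, used only to show the call.
    print(round(estimate_mw_kda("MKRSG"), 3))
```

Ambiguous or non-standard residue codes (for example X or U) are rejected rather than guessed, because a silent substitution would bias the estimate.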
Gene structure and conserved motifs analysis of the EgrNF-Y genes
Gene structure analysis can provide insight into the evolution of gene families. To investigate the evolutionary conservation and divergence of NF-Ys between E. grandis and Arabidopsis, we analyzed the exon-intron structure of the 31 identified EgrNF-Ys using the GSDS website. Phylogenetic analysis revealed that the EgrNF-Y genes were divided into three groups: EgrNF-YA, EgrNF-YB and EgrNF-YC. Most of the EgrNF-YAs (except EgrNF-YA5) had five or six exons with a similar distribution. More than two-thirds of the EgrNF-YBs had no introns, and members with similar numbers of exons and introns were grouped in the same clade. In the EgrNF-YC subfamily, EgrNF-YC1 and EgrNF-YC4 had two exons, whereas EgrNF-YC5 and EgrNF-YC7 had six and five exons, respectively. In general, the gene structure of the NF-Y members was correlated with their phylogenetic relationships (Fig. 1). Furthermore, the distribution of conserved motifs was assessed using MEME software. All of the genes except EgrNF-YA6 contained motif 1. Interestingly, each of the three EgrNF-Y subunit types had a unique motif distribution (Fig. 2). For example, motif 8 was present only in EgrNF-YA, motif 2 was observed only in EgrNF-YB, and motif 7 was unique to EgrNF-YC.
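The exon-intron structures above were drawn with the GSDS website. For readers who want to tabulate exon numbers directly from an annotation file, a minimal sketch is given below. It assumes the annotation is in standard nine-column GFF3 with `exon` features carrying a `Parent` attribute; the function names and the example records are hypothetical and are not taken from the E. grandis annotation.

```python
from collections import Counter


def count_exons(gff3_text: str) -> dict:
    """Count exon features per parent transcript in GFF3-formatted text."""
    counts = Counter()
    for line in gff3_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue
        # Column 9 holds semicolon-separated key=value attributes.
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        # An exon may list several parents separated by commas.
        for parent in attributes.get("Parent", "").split(","):
            if parent:
                counts[parent] += 1
    return dict(counts)


def count_introns(exon_count: int) -> int:
    """A transcript with n exons has n - 1 introns (0 for a single exon)."""
    return max(exon_count - 1, 0)


# Hypothetical records: tx1 has two exons, tx2 is intronless.
EXAMPLE_GFF3 = "\n".join([
    "##gff-version 3",
    "\t".join(["Chr01", "demo", "mRNA", "100", "900", ".", "+", ".", "ID=tx1"]),
    "\t".join(["Chr01", "demo", "exon", "100", "300", ".", "+", ".", "Parent=tx1"]),
    "\t".join(["Chr01", "demo", "exon", "600", "900", ".", "+", ".", "Parent=tx1"]),
    "\t".join(["Chr02", "demo", "mRNA", "50", "450", ".", "-", ".", "ID=tx2"]),
    "\t".join(["Chr02", "demo", "exon", "50", "450", ".", "-", ".", "Parent=tx2"]),
])

if __name__ == "__main__":
    for transcript, n_exons in count_exons(EXAMPLE_GFF3).items():
        print(transcript, n_exons, count_introns(n_exons))
```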
Chromosomal localization and collinearity analysis of EgrNF-Ys
The 31 EgrNF-Y genes were widely distributed across the genome according to their annotated chromosomal positions in the E. grandis genome database (Table 1). To further analyze the distribution of EgrNF-Y family members on each chromosome, we constructed a location map using MapInspect software. The results showed that the NF-Y gene family was unevenly distributed on the chromosomes of E. grandis. Chromosome 2 carried the largest number of members, whereas only one member was located on each of chromosomes 3 and 9. Moreover, the number of genes was not positively correlated with chromosome length. For example, chromosome 3 was the longest chromosome but carried only one member (Fig. 3).
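The relationship between gene number and chromosome length can be checked with a simple count and a correlation coefficient. The sketch below is only an illustration under stated assumptions: the gene-to-chromosome mapping and the chromosome lengths are made-up toy values, not the data in Table 1 or Fig. 3, and the helper names are hypothetical.

```python
from collections import Counter
from math import sqrt


def genes_per_chromosome(gene_to_chrom: dict) -> Counter:
    """Count how many genes map to each chromosome."""
    return Counter(gene_to_chrom.values())


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Toy values for illustration only (arbitrary length units).
TOY_GENE_TO_CHROM = {
    "g1": "Chr02", "g2": "Chr02", "g3": "Chr02",
    "g4": "Chr03", "g5": "Chr09",
}
TOY_CHROM_LENGTH = {"Chr02": 50, "Chr03": 80, "Chr09": 40}

if __name__ == "__main__":
    counts = genes_per_chromosome(TOY_GENE_TO_CHROM)
    chroms = sorted(TOY_CHROM_LENGTH)
    r = pearson_r(
        [counts[c] for c in chroms],           # a Counter returns 0 if absent
        [TOY_CHROM_LENGTH[c] for c in chroms],
    )
    print(dict(counts), round(r, 3))
```

A coefficient at or below zero on the real data would be consistent with the statement that gene number does not increase with chromosome length.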
To further investigate the homologous genes and their evolutionary relationships among E. grandis, the model plant Arabidopsis and the woody model plant poplar, a comparative synteny map of the three species was generated. The 31 EgrNF-Ys were located on 11 scaffolds, the 46 PtNF-Ys were distributed on 19 chromosomes, and the 36 AtNF-Ys were distributed on 5 chromosomes; genes from both species showed collinear relationships with the EgrNF-Ys. Of the two comparisons, the relationship between E. grandis and poplar was closer, which indicates that the NF-Y families of woody plants are closely related (Fig. 4).
Conserved regions and phylogenetic relationships of EgrNF-Ys
To further investigate the conserved regions of the EgrNF-Ys, the protein sequences of the 31 members were analyzed using ClustalW 2.1 and Genedoc software. The multiple sequence alignment suggested that each EgrNF-Y family member contains a heterodimerization domain and a DNA-binding domain that recognizes the CCAAT site. The core conserved region of the EgrNF-YA proteins was 53 AAs long and comprised two highly conserved subdomains, the NF-YB/C subdomain and the DNA-binding subdomain, which were separated by a conserved linker of 21 AAs. As shown in Fig. 5 B/C, the central domain of the EgrNF-YBs was 91 AAs long. Among the EgrNF-YBs, EgrNF-YB2/4/16 had less conserved domains. Fig. 5C shows that the EgrNF-YC subunits also consisted of a core histone-like sequence, with a central domain about 79 AAs in length. EgrNF-YC5 and EgrNF-YC7 differed slightly from the other NF-YCs. This analysis also suggested that the EgrNF-YAs are the most evolutionarily conserved of the three subfamilies (Fig. 5).
To reveal the evolutionary relationships and potential functions of the EgrNF-Ys, a phylogenetic tree was constructed from the NF-Y protein sequences of E. grandis, A. thaliana and P. trichocarpa using MEGA7 software with the neighbor-joining (NJ) method. The phylogenetic analysis revealed that the 107 NF-Y proteins clustered into three groups: NF-YA (yellow), NF-YB (green), and NF-YC (pinkish red). This is consistent with our subfamily classification of the EgrNF-Ys (Table 1). Based on the phylogenetic relationships in the tree and the reported functions of the AtNF-Ys, the functions of the EgrNF-Y members can be further predicted. In each group, we found pairs of orthologous NF-Y proteins composed of one EgrNF-Y and one AtNF-Y, such as EgrNF-YA1 and AtNF-YA6, and EgrNF-Y5 and AtNF-YA11; such a close evolutionary relationship generally suggests similar biological functions. We also found that EgrNF-YB9, AtNF-YB6/9 and PtNF-YB3/5 clustered in one subgroup, which corresponds to LEC1 and its homolog LEC1-like. In addition, we identified three pairs of paralogues: EgrNF-YB4 and EgrNF-YB5, EgrNF-YB7 and EgrNF-YB14, and EgrNF-YC2 and EgrNF-YC3. Most EgrNF-Ys shared low homology with the other members, suggesting that they have diverged during evolution.
Analysis of cis-elements in the promoter regions of EgrNF-Y genes
To further explore the potential functions of the 31 EgrNF-Ys at the transcriptional level, the cis-elements in the EgrNF-Y promoter regions (2000 bp) were scanned using the PlantCARE software. A total of 19 types of cis-elements were identified in the 31 EgrNF-Y promoters, including light-responsive, hormone-responsive, stress-responsive and growth-regulation elements, as well as common core cis-elements such as the TATA-box and CAAT-box. The detailed classification and sequence information of all the cis-elements are listed in Table S2. The positions of the transcriptional regulatory cis-elements, excluding the common core elements, are shown in Fig. 7. The analysis showed that the promoter regions of the EgrNF-Ys contain several phytohormone-responsive cis-elements, including gibberellin-, salicylic acid-, MeJA-, auxin-, ethylene-, and abscisic acid-responsive elements. These results indicate that the EgrNF-Ys may play an important role in stress responses and growth regulation.
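Extracting a 2000 bp promoter region requires strand-aware coordinates, which is a common source of off-by-one errors. The minimal sketch below computes the upstream window for a gene on either strand. It assumes 1-based inclusive coordinates and that the window is measured from the annotated gene start (the text does not specify whether the transcription start site or the start codon was used); the function name `promoter_window` and the example coordinates are hypothetical.

```python
def promoter_window(start, end, strand, upstream=2000, chrom_len=None):
    """Return the (start, end) of the upstream window, 1-based inclusive.

    `start` and `end` are the gene coordinates with start <= end.
    For '+' genes the window lies before `start`; for '-' genes it lies
    after `end`, and the extracted sequence must then be reverse-complemented.
    Returns None when no upstream sequence is available.
    """
    if start < 1 or end < start:
        raise ValueError("expected 1 <= start <= end")
    if strand == "+":
        win_start, win_end = max(1, start - upstream), start - 1
    elif strand == "-":
        win_start, win_end = end + 1, end + upstream
        if chrom_len is not None:
            win_end = min(win_end, chrom_len)  # clip at the chromosome end
    else:
        raise ValueError("strand must be '+' or '-'")
    if win_end < win_start:
        return None
    return win_start, win_end


if __name__ == "__main__":
    # Hypothetical gene at 5000..7000 on each strand.
    print(promoter_window(5000, 7000, "+"))
    print(promoter_window(5000, 7000, "-", chrom_len=8000))
```

Genes close to a chromosome or scaffold end yield a window shorter than 2000 bp, so the window length is worth checking before scanning for cis-elements.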
Expression profiles of the EgrNF-Ys in different tissues
To further investigate the possible functions of the EgrNF-Y genes in the developmental processes of E. grandis, we examined their expression profiles by quantitative real-time PCR in six different tissues and organs (root, stem, young leaf, mature leaf, xylem and flower). All member genes except EgrNF-YA5 were expressed. Therefore, the tissue-specific expression patterns of 30 EgrNF-Y genes were analyzed. The results demonstrated that the 30 genes were widely expressed in various tissues and organs of E. grandis but exhibited different spatial and temporal expression patterns. In the EgrNF-YA subfamily, EgrNF-YA1 and EgrNF-YA7 were significantly expressed in flowers and roots, respectively, and expression of EgrNF-YA4 and EgrNF-YA6 was strongest in the leaves. In the EgrNF-YB subfamily, EgrNF-YB1 and EgrNF-YB11 were most highly expressed in the flowers, whereas more than 90% of the other EgrNF-YB members were highly expressed in young leaves. In the EgrNF-YC subfamily, all members were expressed in almost all tissues, but their expression levels were highest in young leaves (Fig. 8). The diversity of expression patterns indicates that the EgrNF-Y genes have different biological functions during the growth and development of eucalyptus and may have a wide range of biological applications.
Expression profiles of the EgrNF-Ys under a low-phosphorus environment
E. grandis is mainly distributed in southwestern and southern China, where the soil is generally deficient in phosphorus. Studies have shown that NF-Y is an important transcriptional regulator that plays an important role in plant stress responses, growth and development. To investigate the response of EgrNF-Y gene expression to phosphate starvation, E. grandis seedlings of uniform growth were cultured in normal or phosphate-free nutrient solution for two weeks, and their relative expression levels under the two growth conditions were compared by qRT-PCR. The analysis showed that 12 genes (EgrNF-YB3/B8/B11/B12/B13/B14/B15 and EgrNF-YC1/C2/C3/C5/C7) were upregulated more than 2-fold in the leaves after 14 days of phosphate starvation. The other genes remained stable or were downregulated. In the roots, only EgrNF-YB6/B11/B13 were upregulated more than 2-fold under low-phosphate treatment, while most genes remained stable (Fig. 9). These results suggest that the upregulated genes may be involved in phosphate uptake when phosphate is limited in the soil.
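The 2-fold threshold used above can be made concrete with a small worked calculation. The text does not state how relative expression was computed from the qRT-PCR data, so the sketch below assumes the widely used 2^(-ΔΔCt) method with a single reference gene; the function names and the Ct values are hypothetical and serve only to illustrate the arithmetic.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method (assumed here).

    Each dCt normalizes the target gene to the reference gene within one
    condition; ddCt then compares treated with control.
    """
    dct_treated = ct_target_treated - ct_ref_treated
    dct_control = ct_target_control - ct_ref_control
    ddct = dct_treated - dct_control
    return 2.0 ** (-ddct)


def is_upregulated(fold_change, threshold=2.0):
    """True when the fold change exceeds the threshold (more than 2-fold)."""
    return fold_change > threshold


if __name__ == "__main__":
    # Hypothetical Ct values: the target amplifies two cycles earlier under
    # phosphate starvation while the reference gene is unchanged.
    fold = relative_expression(22.0, 18.0, 24.0, 18.0)
    print(fold, is_upregulated(fold))
```

With these values ΔΔCt is -2, which corresponds to a 4-fold increase and therefore passes the 2-fold criterion. The method assumes roughly equal amplification efficiencies for the target and reference genes.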