Identification and classification of AP2/ERF genes
A total of 218 complete AP2/ERF genes were identified in the sugarcane genome database. The identified genes ranged from 416 to 22,786 bp and were predicted to encode proteins having between 127 to 876 amino acids (aa) (Additional file 1). Based on sequence similarities and the number of conserved AP2 domains (Additional file 4), the AP2/ERF genes were divided into four families: AP2, ERF, RAV, and Soloist. The AP2 family had 43 genes, of which 36 have two identical conserved AP2 domains, and 7 have only one conserved AP2 domain (SsAP2-37 to SsAP2-43). Arabidopsis has 4 AP2/ERF genes that contain only one AP2 domain, but these differ from the ERF type and are closer to the AP2 type. These four genes are, therefore, classified as the AP2 family. The phylogenetic tree shows that the branches of these 7 SsAP2/ERF genes aggregate with the AP2 family instead of the ERF family, so they are also classified as AP2 families according to the classification of Arabidopsis. The ERF family had 160 genes that have only one AP2 conserved domain. Among them, 59 and 101 were assigned to DREB (SsDREB1 to SsDREB59) and ERF (SsERF1 to SsERF101) subfamily, respectively. The RAV family comprises 11 members (SsRAV1 to SsRAV11) that contain conserved AP2 and B3 domains. Another four genes (SsSoloist1 to SsSoloist4) also had only one conserved AP2 domain but had less similarity to ERF. Instead, these four genes had higher homology to Arabidopsis At4g13040 and thus were classified into the Soloist. Because no reliable naming method was defined in previous research on the AP2/ERF family, our naming convention is based on the grouping of 218 sequences and their chromosomal positions. The total number of AP2/ERF superfamily candidate genes in sugarcane is higher than that for Arabidopsis (147) and rice (181) [9]. The number of genes in the AP2, RAV, and Soloist subfamilies was double that for both Arabidopsis and rice. However, the number of ERF family members was similar but slightly higher (122 and 145 for Arabidopsis and rice, respectively).
Phylogenetic Analysis
To analyze the evolutionary relationship of the SsAP2/ERF genes, we constructed phylogenetic trees based on the amino acid sequences encoded by these genes (Fig. 1). The phylogenetic tree clusters all of the SsAP2/ERF genes and had four main branches: AP2, ERF, RAV, and Soloist, which are consistent with the results for examination of conserved amino acid sequences encoded by the candidate genes. According to the classification criteria for Arabidopsis and rice [9], the ERF family was divided into two subfamilies: the DREB subfamily (59 members, divided into groups I, II, III and IV, containing 19, 11, 19 and 10 members, respectively); and the ERF subfamily (101 members divided into groups V, VI, VII, VIII, IX, X, and XIV containing 12, 11, 12, 18, 24, 16 and 8 members, respectively (Fig. 2). Notably, most species lack a XIV group, although Os08g41030 in rice has a conserved domain similar to these eight members of sugarcane and can be classified into the XIV group. Arabidopsis has no genes that are consistent with the XIV group. Thus additional examination is needed to determine whether functional differentiation occurred and the significance of such differentiation.
Gene structure and conserved motif analysis
To characterize the structural diversity of SsAP2/ERF genes, we analyzed the number of introns and exons and the distribution of conserved domains in the coding sequence of a single SsAP2/ERF gene (Additional file 5: Fig. S6). Through gene structure analysis, differences in the AP2, ERF, and RAV subfamilies can be observed, and the results of the conserved domain analysis were consistent with those of previous studies. The number of introns among the different family genes varied markedly (Fig. 3B) and 41, 40, 1, and 9 members of the SsDREB, SsERF, SsAP2, and SsRAV subfamilies, respectively, have no introns. For the SsERF family members, 50.6% have no introns, whereas 97.6% of SsAP2 family members do. Meanwhile, 81.8% of SsRAVs have no introns, but all 4 SsSoloist members do.
We used MEME to investigate SsAP2/ERF gene diversity further and predicted the presence of 25 conserved motifs in the AP2/ERF family (Fig. 3C). Motif-1 and − 2 were present in all SsAP2/ERF protein sequences; Both motif-3 and − 4 were available in most members of the AP2, ERF, DREB, and RAV subfamily; Motif-6, -7, -8, -20, and − 22 were found only in the AP2 family; Motif-10, -12 and − 15 appeared in Group I; Motif-13 appeared in Group XIV, and Motif-9 and − 11 were unique to the RAV family. We also found that Motif-1, -2, and − 7 existed in the AP2 conserved domain, whereas Motif-9, -11, and − 16 were located in the B3 domain (Fig. 3C). As expected, most close relatives among subfamily members had similar motif compositions, suggesting that AP2/ERF proteins within the same subfamily are functionally identical.
Chromosome distribution
The 218 SsAP2/ERF genes we identified were mapped to 32 chromosomes, and the distribution across the chromosomes varied widely (Fig. 4). Chr2A had the most with 12 genes, whereas Chr6C and Chr8C each had only three genes (Additional file 7). At least one of the 59 SsDREB and 101 SsERF genes could be found on each of the 32 chromosomes, and the four Soloist genes were distributed on four homologous chromosomes, Chr4A, Chr4B, Chr4C, and Chr4D. Members of the SsRAV family were present only on chromosomes 3 and 7. Surprisingly, 50% of the SsAP2/ERF genes localized to one of the four rearrangement chromosomes (SsChr02, SsChr05, SsChr06 and SsChr07) and 9 SsAP2/ERF genes were present in the rearranged region that includes SsChr5 (Sb05S) 57.6–89.1 Mbp, SsChr6 (Sb05L) 54.6–90.6 Mbp, SsChr7 (Sb08S) 62.0–83.3 Mbp, SsChr2 (Sb08L) 98.5–125.9 Mbp [45].
Gene duplication analysis
Multiple studies indicated that gene families evolved due to genome-wide duplication, segmental duplication or tandem duplication, and gene diversification occurred after these duplication events. To explore SsAP2/ERF genes evolution, we studied tandem and segmental repetitive events of these genes using chromosomal information for S. spontaneum (Additional file 8). A total of 8 pairs of tandem duplication genes were detected, and of these two pairs were RAV genes, and six ones were ERF genes (Fig. 4). In particular, SRAV4 and SsRAV5, SsRAV5, and SsRAV6 are two tandem duplication gene pairs, and three genes are located on the same chromosome and are adjacent. In addition, 70 pairs of 103 SsAP2/ERF segmental duplication genes were found (Fig. 5). Among these, 53 occurred between alleles, and 17 were between non-alleles. There were four pairs of SsAP2/ERF segmental duplication gene pairs distributed on Chr4 and Chr5. This distribution may be because both Chr4 and Chr5 chromosomes include segments from the ancestral A4 chromosome [45].
We then estimated the time for the divergence of the SsAP2/ERF genes that were predicted to have undergone tandem and segmental duplication based on Ks values (Additional file 8). The divergence time for the 8 SsAP2/ERF tandem duplication pairs ranged from 14.2 to 104.2 million years ago (mya), illustrating that these SsAP2/ERFs arose from recent gene duplication events in S. spontaneum. Meanwhile, 64 segmental duplication pairs arose earlier, based on a divergence time that ranged from 4.9 to 89.9 mya, whereas the other 4 segmental duplication pairs were ancient, diverging between 164.7 and 212.1 mya.
Non-synonymous (Ka) and synonymous (Ks) mutation frequencies can be used to assess selection pressure after duplication events, where Ka/Ks = 1 indicateed neutral selection, Ka/Ks < 1 showed purification selection, and Ka/Ks > 1 indicated positive selection of evolution. We calculated Ka/Ks values for SsAP2/ERF genes in tandem and segmental duplications to detect which selection type promoted AP2/ERF gene family evolution (Additional file 8). The Ka/Ks ratio of tandem duplication gene pairs among AP2/ERF genes ranged from 0.17 to 1.24, with an average of 0.57. Tandem duplication gene pairs having a Ka/Ks ratio < 1 accounted for 89% of the genes tested. The Ka/Ks ratio for segmental duplication gene pairs ranged from 0 to 2.6, with an average of 0.62, and 92% of pairs had Ka/Ks < 1. These results indicate that repetitive SsAP2/ERF genes were under intense purification selection pressure, and the duplication-producing gene had strongly evolved and maintained its functional stability. Meanwhile, nine replicate gene pairs had Ka/Ks > 1, indicating that these genes were subject to positive selection after duplication differentiation.
Evolutionary analysis of SsAP2/ERFs and other plant AP2/ERFs
A syntenic map of five representative species was constructed to examine the evolutionary origin of the AP2/ERF genes in sugarcane (Fig. 6 and Additional file 9). A total of 168 SsAP2/ERF genes were synonymous with genes in sorghum, followed by rice (151), wheat (145), corn (143), and Arabidopsis (19). Orthologous gene pairs were distributed across all SsChrs. Some SsAP2/ERF genes were associated with at least three pairs of corresponding genes (particularly AP2/ERF genes in sugarcane and wheat), suggesting that these genes played an essential role in SsAP2/ERF superfamily evolution. Interestingly, some collinear gene pairs (90 SsAP2/ERF genes) between sugarcane and rice/sorghum/wheat/maize were identified but were absent between sugarcane and Arabidopsis, indicating that these orthologous pairs formed after the divergence of dicotyledonous and monocotyledonous plants.
Analysis of putative cis-regulatory elements (CREs)
CRE is a non-coding DNA sequence that regulates transcription in a defined temporal/spatial expression pattern. CREs are important for understanding expression differences and mutations. We used PlantCare to identify putative CREs of 2,000 bp located on SsAP2/ERF genes (Fig. 7 and Additional file 10). These CREs were classified according to their function, and the number of CRE in each sequence was determined. Many CREs were predicted to respond to different hormone elicitors, such as abscisic acid (ABA), ethylene, salicylic acid (SA), jasmonic acid, methyl jasmonate, gibberellin, cytokinins, and auxins. Of these, ABA-reactive elements were the most prevalent. There were also CREs in genes that respond to various stress stimuli, such as dehydration, salt stress, injury, pathogens, oxidative stress, hypoxia, high osmotic pressure, heat, and cold stress; many were related to dehydration elements. A total of 64 SsAP2/ERF genes were found to contain cis-acting regulatory elements involved in the ethylene-responsive element (ERE), while 182 contained cis-acting elements involved in the abscisic acid response (ABRE). Another 108 SsAP2/ERF genes contained cis-acting elements involved in low-temperature response (LTR) or salicylic acid responsiveness (TCA), whereas 155 contained photoresponsive elements. The various stimuli response elements were unevenly distributed among the families. The promoter region in some AP2/ERF genes contained cis-acting elements for various transcription factors such as MYB, MYC, and ERE and DRE elements, indicating that there may be a mutual regulatory relationship between transcription factors.
Expression pattern of AP2/ERF genes during sugarcane development
To understand specific spatiotemporal expression patterns of SsAP2/ERF genes, we analyzed the expression profiles of the identified genes in 10 different tissues and at different developmental stages using publicly available gene expression data (Fig. 8,9 and Additional file 11). Among the SsAP2/ERF genes examined, 39 were not expressed in all tissue samples, 58 were expressed in all ten tissue types (FPKM > 0), and 27 were constitutively expressed (FPKM > 2). The 19 genes in DREB group I were all expressed in at least one sample, among which SsDREB17, SsDREB27, and SsDREB28 were highly expressed in all samples, indicating that these group I members are likely essential functional genes. The expression levels of 26 SsAP2/ERF genes increased during leaf maturation, suggesting that they may play an important role in leaf growth and development. The expression of SsDREB, SsERF91, SsERF98, SsERF39, and SsERF99 in stems was much higher than that in leaves in the three developmental stages. Among them, four ERF genes belonged to ERF group VII, indicating that they play an important role in cane stems. Expression of 11 genes, including SsDREB56, SsDREB48, SsERF15, and SsRAV3, as well as SsSoloist1 and SsSoloist2, in the three stages of leaf development, was higher than that in the stems, indicating that these genes play an important role in leaf development. The expression of 10 genes, including SsDREB55, SsERF62, and SsERF54 in different tissues in the pre-mature and mature stages, was higher than that seen in various tissues in the seedling stage. Among these genes, many were in the ERF subfamily, indicating that they play an essential role after the sugarcane plant enters the mature stage.
Expression analyses of SsAP2/ERF genes in response to abiotic stress and hormone treatments
To further confirm whether SsAP2/ERF gene expression was affected by different abiotic stresses and hormone treatments, we examined 12 genes from 5 families and analyzed their expression patterns following different treatments (Fig. 10, 11). Expression of all 12 genes was induced by NaCl, PEG6000, and ABA treatments, and the magnitude of the up-regulation was most significant at 3 hours after treatment. Different treatments had varying degrees of influence on gene expression. For example, except for SsSoloist4, which was down-regulated after 72 hours of NaCl treatment, the expression of the other genes examined was induced by NaCl, and the difference was significantly higher than that seen for the other three treatments. Some genes had opposite expression patterns under different treatments. For example, SsERF52 was down-regulated at one h, six h, and 12 h after PEG6000 treatment but was up-regulated at three h, and the expression levels were significantly lower than that for other genes. Meanwhile, SsERF52 was induced by salt stress at six treatment times, and the increases in expression were considerably higher than that for other genes, indicating that SsERF52 responded differently to salt stress and drought stress. SsAP2-13 was inhibited by GA treatment but induced by the other three treatments.