Identification and characterization of GRAS genes in six Cucurbitaceae crops
A total of 237 GRAS genes were identified in the genomes of six Cucurbitaceae crops. The number of GRAS genes was relatively consistent among five of these species (37 in C. sativus, 36 in C. melo, 35 in B. hispida, 37 in C. lanatus and 37 in L. siceraria), whereas C. moschata had considerably more, with 55 (Additional file 1: Table S1; Additional file 2: Table S2). Cucurbitaceae GRAS genes contained 1–12 exons, and most (160, 67.51%) contained only one exon. The 237 putative GRAS proteins varied in length from 258 (Bhi11P000300) to 1,466 (Lsi06P016090.1) amino acids, the predicted molecular weight (MW) ranged from 29.83 kDa (Bhi11P000300) to 164.43 kDa (Lsi06P016090.1), and the isoelectric point (pI) varied from 4.70 (Lsi01P011330.1, CmoCh01G003940) to 9.67 (Cla004408) with a mean of 5.66, indicating that most of the proteins are weakly acidic. These GRAS genes (except for BhiUN508M4, BhiUN508P6 and CmoCh00G001590.1, which are not anchored to chromosomes) were unevenly distributed across the genomes of the six Cucurbitaceae species (Additional file 3: Fig. S1a–f). For example, among the seven chromosomes of C. sativus, chromosome 3 is the longest but carried only 7 GRAS genes, compared with 10 on chromosome 6 (Additional file 4: Table S3).
Phylogenetic relationships, conserved motifs and gene structures
To examine the phylogenetic relationships among the GRAS proteins of the Cucurbitaceae species, a phylogenetic tree was constructed using the complete set of 237 Cucurbitaceae GRAS proteins, together with 165 GRAS proteins selected from five other representative angiosperms: A. thaliana (Arabidopsis thaliana L.) (33), S. lycopersicum (Solanum lycopersicum L.) (49), O. sativa (Oryza sativa L.) (24), V. vinifera (Vitis vinifera L.) (29) and A. trichopoda (Amborella trichopoda L.) (30). All 237 GRAS proteins were classified into 16 subfamilies (Fig. 1; Additional file 5: Table S4). The PAT1 subfamily contained the largest number of GRAS genes, followed by the DELLA (27), HAM (26) and LISCL (21) subfamilies. Orthologous genes of the Scarecrow-Like A (SCLA) and Required for Arbuscule Development 2 (RAD2) subfamilies, and of the OG-NSP2-Amb and OG-NSP2-3 orthologous groups, were not identified in the Cucurbitaceae species, suggesting that these genes may be dispensable and were lost during evolution (Additional file 6: Fig. S2).
The GRAS proteins in Cucurbitaceae species shared a highly conserved C-terminus, which contained five distinct conserved motifs in the following order: LHR I, VHIID, LHR II, PFYRE and SAW (Additional file 7: Fig. S3). We identified an additional 20 conserved motifs among the 237 GRAS proteins (Additional files 8 and 9: Figs. S4 and S5), named motifs 1–20. Most of the motifs had a similar distribution pattern within the same subfamily (Additional file 8: Fig. S4; Additional file 10: Table S5). Some motifs were present only in specific subfamilies; for example, motif 20 was present only in the LISCL and HAM subfamilies (Additional file 8: Fig. S4b). A comparison of members from different GRAS subfamilies showed that the most closely related subfamilies contained similar motifs. For example, members of the SHR group contained motifs 9, 16, 8, 2, 1, 13, 5, 14, 7, 6, 12, 3, 10, 19 and 4, whereas those in the NSP1 group contained motifs 9, 16, 8, 2, 1, 13, 5, 14, 7, 10, 19 and 4, differing only in the absence of motifs 6, 12 and 3 (Additional file 8: Fig. S4b). In addition, we analyzed the gene structures of the 237 GRAS genes by comparing the patterns of exon-intron architecture among the six Cucurbitaceae species (Additional file 8: Fig. S4c). The majority of GRAS genes within the same subgroup or orthologous group shared a similar gene structure. For instance, all genes in the OG-SHR-2, OG-SCL32-2, RAM1 and Ls subfamilies contained only one intron and two CDS.
Further, we compared the predicted motifs in the six cucurbits with those in other species, including A. thaliana and S. lycopersicum (Additional file 11: Fig. S6; Additional file 13: Table S6). Some motifs were specific to Cucurbitaceae species; for example, motif 7 (EFGDFNFPSANQSGFYQQDISKIGDQTNYQQPNSDCLIFDELLFGNDFTI) in the SCLB clade (Fig. 2a) and motifs 9 (LDDTTAASRWVISFSDEFRHK) and 10 (MALDGDGGSFFSTDFTSVGKEDEDTVGD) in the RAD1 clade (Fig. 2b). These lineage-specific motifs may contribute to processes or traits that are particular to Cucurbitaceae species.
Duplication and synteny analysis of GRAS genes among six Cucurbitaceae species
Gene duplication events were surveyed to examine the expansion of the GRAS gene family in the six Cucurbitaceae genomes. Preliminary results showed that four types of duplicated GRAS genes (156 dispersed, 7 proximal, 8 tandem and 66 segmental genes) were present in the six Cucurbitaceae species (Additional file 14: Table S7). Among these duplicated GRAS genes, homologous gene pairs with a protein similarity greater than 80% and coverage greater than 80% (2 dispersed, 4 proximal, 8 tandem and 18 segmental genes) were analyzed further (Additional file 14: Table S7). In total, 18 segmentally duplicated genes were identified in the C. moschata genome and two tandemly duplicated genes were located in each of C. sativus, C. melo, C. lanatus and L. siceraria, whereas neither tandem nor segmental duplications were detected among the B. hispida GRAS genes. Additionally, all duplicated gene pairs had Ka/Ks ratios lower than 0.5, suggesting that these genes underwent strong purifying selection during genome evolution (Additional files 15 and 16: Tables S8 and S9). Intra-genome synteny analysis for C. moschata showed that the duplicated GRAS genes were distributed equally between subgenomes A and B (Fig. 3).
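The filtering and selection criteria described above can be illustrated with a minimal Python sketch. It assumes each duplicated pair already has a similarity, coverage, Ka and Ks value computed by upstream tools. The class and function names, the gene labels and the numeric values are all hypothetical and are not taken from Tables S7–S9. The thresholds of 80% come from the text, and the reading of Ka/Ks (below 1 as purifying selection, above 1 as positive selection) is the conventional one.

```python
from dataclasses import dataclass


@dataclass
class DuplicatePair:
    """One duplicated gene pair (hypothetical record layout)."""
    gene_a: str
    gene_b: str
    similarity: float  # protein similarity, percent
    coverage: float    # alignment coverage, percent
    ka: float          # nonsynonymous substitution rate
    ks: float          # synonymous substitution rate


def passes_homology_filter(pair, min_similarity=80.0, min_coverage=80.0):
    """Keep pairs whose similarity and coverage both exceed the cutoffs."""
    return pair.similarity > min_similarity and pair.coverage > min_coverage


def ka_ks_ratio(pair):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if pair.ks == 0:
        return None
    return pair.ka / pair.ks


def selection_mode(pair):
    """Conventional reading: <1 purifying, =1 neutral, >1 positive."""
    ratio = ka_ks_ratio(pair)
    if ratio is None:
        return "undefined"
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


# Made-up example values for illustration only.
pairs = [
    DuplicatePair("geneA1", "geneA2", similarity=91.0, coverage=95.0, ka=0.05, ks=0.25),
    DuplicatePair("geneB1", "geneB2", similarity=72.0, coverage=88.0, ka=0.30, ks=0.40),
]

kept = [p for p in pairs if passes_homology_filter(p)]
for p in kept:
    print(p.gene_a, p.gene_b, round(ka_ks_ratio(p), 2), selection_mode(p))
```

In this sketch the second pair is discarded because its similarity does not exceed 80%, and only the retained pair is assigned a selection mode.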
To infer the evolution of GRAS genes, synteny analysis was carried out among the six Cucurbitaceae species (Fig. 4; Additional file 17: Table S10). A total of 225 GRAS genes (37 in C. sativus, 36 in C. melo, 35 in B. hispida, 36 in C. lanatus, 37 in L. siceraria and 44 in C. moschata) were located within synteny blocks of the six Cucurbitaceae genomes. Some orthologous gene pairs showed a two-to-one syntenic relationship between C. moschata and the five other Cucurbitaceae species. We found six orthologous genes that are present in all five other species and have two counterparts in C. moschata. For example, Csa4P196810.1, MELO3C025282T1, Bhi03P001806, Cla012151 and Lsi01G016490.1 are orthologs and have two collinear counterparts (CmoCh03G002750.1 and CmoCh07G012270.1) in C. moschata (Fig. 4; Additional file 17: Table S10; Additional file 18: Fig. S8). These results indicate that these GRAS genes were conserved during the evolution of all six cucurbit species and that the copies amplified in the C. moschata genome by a recent whole-genome duplication (WGD) event were well preserved, suggesting conserved roles and possible contributions to specific traits in C. moschata. In addition, some GRAS genes were found to have been lost in C. moschata. For example, the collinear genes Csa7P322070.1, MELO3C023684T1, Bhi09P001228, Cla015025 and Lsi02G020110.1 were present in C. sativus, C. melo, B. hispida, C. lanatus and L. siceraria, respectively, but had no counterpart in C. moschata (Fig. 4; Additional file 17: Table S10; Additional file 18: Fig. S8).
Expression analysis of GRAS genes in six Cucurbitaceae species
To investigate the potential function of Cucurbitaceae GRAS genes, we profiled their expression in different tissues, including roots, stems, leaves, flowers and fruits. Among the 237 GRAS genes, 191 were expressed in all analyzed tissues and 60 were relatively highly expressed (FPKM > 10) (Additional file 19: Table S11). Almost all the GRAS genes in the HAM, DELLA, LISCL, SCL4/7 and PAT1 subfamilies were expressed in all five tissues analyzed (Additional files 20 and 21: Figs. S9 and S10). Transcripts of three GRAS genes (MELO3C008036T1, MELO3C008170T1 and Lsi03G004130.1) could barely be detected in any of the five tissues (FPKM < 0.001) (Additional file 19: Table S11). Expression of some GRAS genes was barely detected in specific tissues (Additional file 19: Table S11; Additional file 21: Fig. S10). For example, MELO3C020907T1 (SHR subfamily) was not expressed in stems, and transcripts of CmoCh00G001590.1 and CmoCh17G005960.1 (SHR subfamily) were nearly absent in leaves.
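The FPKM cutoffs mentioned above can be applied per gene as in the following Python sketch. The text gives the thresholds (FPKM > 10 and FPKM < 0.001) but not the exact rules, so two assumptions are made here and flagged in the comments: a gene counts as "expressed" in a tissue when its FPKM is at or above the 0.001 floor, and as "highly expressed" when its maximum FPKM across tissues exceeds 10. The function name and the example values are hypothetical.

```python
TISSUES = ("root", "stem", "leaf", "flower", "fruit")


def expression_flags(fpkm, high=10.0, floor=0.001):
    """Summarize one gene's expression across the five tissues.

    fpkm maps tissue name -> FPKM value.
    ASSUMPTION: 'expressed' means FPKM >= floor in a tissue.
    ASSUMPTION: 'highly expressed' means the maximum FPKM exceeds `high`.
    """
    values = [fpkm[t] for t in TISSUES]
    return {
        "expressed_in_all": all(v >= floor for v in values),
        "highly_expressed": max(values) > high,
        "barely_detected": all(v < floor for v in values),
    }


# Made-up FPKM values for illustration only.
example_gene = {"root": 25.0, "stem": 3.0, "leaf": 1.2, "flower": 0.8, "fruit": 4.1}
silent_gene = {t: 0.0 for t in TISSUES}

print(expression_flags(example_gene))
print(expression_flags(silent_gene))
```

Other definitions, such as requiring FPKM > 10 in every tissue, would change the counts, so the rule should be matched to the one actually used for Table S11.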
The expression profiles of GRAS genes from some subfamilies or orthologous groups varied greatly. For example, most OG-SCR-3 members were hardly expressed in any analyzed tissue, whereas the expression levels of most OG-SCR-1 and OG-SCR-2 members were high in all tissues (Fig. 5a). In the NSP2 subfamily, OG-NSP2-2 members were barely expressed in any tested tissue, in contrast to all Cucurbitaceae OG-NSP2-1 genes, which showed a low level of expression in leaves (Fig. 5b).
We found 38 GRAS genes showing tissue-specific expression (tissue-specificity index, τ > 0.85), including 19 in roots, 1 in stems, 9 in leaves, 2 in flowers and 7 in fruits (Additional file 22: Table S12). For instance, Cla014025, Cla020203 and Lsi10G005540.1 (SHR subfamily), Cla014874, Bhi09P001022 and Lsi02G021610.1 (SCR subfamily) (Fig. 5a), and Lsi01G009100.1, Lsi10G005540.1 and Cla014014 (SCL32 subfamily) (Fig. 5c) were highly expressed in roots, whereas most genes in the SCL32 subfamily (Bhi03P000775, MELO3C007619T1, Cla007701, Csa6P495010.1, Csa7P044940.1) (Fig. 5c) were highly expressed in leaves.
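For readers unfamiliar with the tissue-specificity index, the following Python sketch shows one common formulation of τ: the sum of (1 − x_i / x_max) over all tissues, divided by (n − 1), where x_i is the expression in tissue i and n is the number of tissues. The text does not state which expression scale was used (raw or log-transformed FPKM), so raw values are assumed here. The function names and example values are hypothetical. Only the τ > 0.85 cutoff comes from the text.

```python
def tau_index(expression):
    """Tissue-specificity index for one gene.

    expression: list of non-negative expression values, one per tissue.
    Returns a value between 0 (uniform expression) and 1 (single tissue).
    ASSUMPTION: raw expression values are used without log transformation.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs at least two tissues")
    x_max = max(expression)
    if x_max <= 0:
        return 0.0  # gene not expressed anywhere; convention chosen here
    return sum(1 - x / x_max for x in expression) / (n - 1)


def specific_tissue(fpkm_by_tissue, cutoff=0.85):
    """Return the tissue of peak expression if tau exceeds the cutoff, else None."""
    tau = tau_index(list(fpkm_by_tissue.values()))
    if tau > cutoff:
        return max(fpkm_by_tissue, key=fpkm_by_tissue.get)
    return None


# Made-up FPKM values for illustration only.
root_biased = {"root": 100.0, "stem": 1.0, "leaf": 1.0, "flower": 1.0, "fruit": 1.0}
uniform = {"root": 5.0, "stem": 5.0, "leaf": 5.0, "flower": 5.0, "fruit": 5.0}

print(tau_index(list(root_biased.values())), specific_tissue(root_biased))
print(tau_index(list(uniform.values())), specific_tissue(uniform))
```

A gene expressed almost exclusively in one tissue gives τ close to 1, whereas uniform expression gives τ of 0, which is why a cutoff such as 0.85 selects strongly tissue-biased genes.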
Besides tissue-specific genes, genes in some subfamilies showed expression patterns that were specific to Cucurbitaceae species. For example, all Cucurbitaceae genes in the SCLB clade were relatively highly expressed in fruits, whereas clade members from species outside the Cucurbitaceae were hardly detectable (Fig. 5d); the same was true for members of the RAD1 clade (Fig. 5e).
In most cases, duplicated gene pairs showed similar expression patterns. However, a few exceptions were observed (Additional file 23: Fig. S11; Additional file 24: Table S13). For example, in the Cla_TanDup_1.1/1.2 duplicated gene pair, Cla_TanDup_1.2 was highly expressed in roots, flowers and fruits, whereas Cla_TanDup_1.1 was weakly expressed in the same tissues; in the Cmo_SegDup_6.1/6.2 duplicated gene pair, Cmo_SegDup_6.2 was highly expressed in roots and fruits, whereas Cmo_SegDup_6.1 was hardly detected in these tissues. These differences in expression suggest that duplicated GRAS genes may have acquired divergent functions.