2.1 Identification and characterization of tomato SlbHLH genes
In this study, the known bHLH proteins in rice and Arabidopsis were used as inquire sequences to screen all the possible bHLH TFs via the BLASTP program. Finally, a total of 195 putative non-redundant bHLH family members were identified from the known tomato genome database (SL4.0). Subsequently, these genes were named bHLH001-bHLH195 following their position on tomato 12 chromosomes (Fig. 1). After that, the molecular weight, protein size (aa), phosphorylation site and theoretical isoelectric point (pI) of 195 SlbHLH proteins were studied. The detailed results were listed in Table S1. The molecular weight and length of SlbHLHs varied widely, with molecular weight ranging from 9.80008 KDa (SlbHLH091) to 102.05852 KDa (SlbHLH182), and length varied from 86 aa (SlbHLH091) to 930 aa (SlbHLH182). The theoretical PI ranged from 4.65 (SlbHLH079) to 11.45 (SlbLHL007). The subcellular location prediction displayed that most of SlbLHLs were located in nucleus, and very few SlbLHLs were located in chloroplast and/or golgi apparatus. The prediction of phosphorylation site of SlbHLHs showed significant variations from 13 (SlbLHL127) to 190 (SlbLHL182), and most SlbHLHs contain more number of Ser sites than that of Thr and Tyr sites. In addition, over 90.8% of the SlbHLHs contain at least 30 phosphorylation sites.
2.2 Chromosome distribution of tomato SlbHLH genes
The chromosome localization investigation exhibited that the 195 SlbHLHs were mapped on all 12 chromosomes of tomato. Among these chromosomes, the Chr 1 contain the greatest number of SlbHLHs (29 genes), and the Chr 2 containd the second number of SlbHLHs (20 genes), the third (Chr 3, Chr 4, Chr 6) and the fourth (Chr 9, Chr 10) also cantian more SlbHLHs (19 and 18 genes, respectively). However, Chr 11 only contain four SlbHLHs, and the rest of chromosomes contain 11–14 SlbHLHs (Fig. 1). The interesting thing was that the chromosomes Chr 6 contain 19 SlbHLHs, while part of larger chromosomes (Chr 5, Chr8, Chr 11) only contain fewer SlbHLHs (11, 11, 4, respectively). In addition, most of the SlbHLHs were located at the ends of tomato chromosomes (Fig. 1). These date indicated that the distribution of SlbHLHs was relatively clustered and uneven among tomato chromosomes and was also disproportionate to chromosome size.
2.3 Phylogenetic relationship of SlbHLH proteins in tomato
To study the phylogenetic relationships of tomato SlbHLHs, the complete protein sequences of 169 Arabidopsis AtbHLHs were tested together with identified 195 tomato SlbHLHs in this investigation (Supplementary file 1, Supplementary file 2, Table S2). Then the MEGA-X software using the Maximum Likelihood method was performed to construct the unrooted phylogenetic tree. According to the topology of the tree, clades support values, and the classification of Arabidopsis AtbHLHs in previous study [6], the phylogenetic analysis exhibited that these 195 SlbHLH proteins were divided into 26 subfamilies (Fig. 2). The number of SlbHLH members included in the 26 subfamilies ranged from 1 to 20, with subfamily XII containing the most SlbHLHs (20) and subfamilies VIIIc (1) with only one SlbHLH. As to certain subfamilies, the bHLH TFs family displayed obvious species bias in the time of evolution, suggesting specific gene duplication happened after species differentiation [55]. Simultaneously, some genes exhibited one-to-one homology, for instance, the subfamily IVb (AtbHLH011 and SlbHLH129, AtbHLH047 and SlbHLH064, AtbHLH121 and SlbHLH028) and subfamily III (a + c) (AtbHLH029 and SlbHLH104), and subfamily IIIf (AtbHLH042 and SlbHLH151), and so on (Fig. 2). The member differences between AtbHLHs and SlbHLHs clustered in the same subfamily suggested that the bHLH gene family had apparent interspecific divergence between Arabidopsis and tomato.
2.4 Collinearity analysis of tomato SlbHLH genes
Genome duplication play significant roles in the process of evolutions and expansions of plant gene families [56]. To investigate the potential gene duplications amone the 195 SlbHLHs, collinear analysis of SlbHLHs was carried out. The results displayed that 13 tandem duplication among 195 SlbHLHs were found, including SlbHLH004-SlbHLH005, SlbHLH011-SlbHLH012, SlbHLH024-SlbHLH025, SlbHLH025-SlbHLH026, SlbHLH046-SlbHLH047, SlbHLH047-SlbHLH048, SlbHLH059-SlbHLH060, SlbHLH060-SlbHLH061, SlbHLH106-SlbHLH107, SlbHLH163-SlbHLH164, SlbHLH164-SlbHLH165, SlbHLH165-SlbHLH166, SlbHLH176-SlbHLH177 (Fig. 1, Table S3-1). And these SlbHLH genes displaying tandem repeats come from the same subfamily. Moreover, the MCScanX programs and BlastP were performed to find the segmental duplications and twenty-one gene pairs were found on 10 of 12 chromosomes as follows: SlbHLH016/168, SlbHLH030/054, SlbHLH032/042, SlbHLH040/087, SlbHLH045/081, SlbHLH049/085, SlbHLH057/114, SlbHLH062/111, SlbHLH063/112, SlbHLH065/110, SlbHLH068/105, SlbHLH078/189, SlbHLH079/190, SlbHLH081/191, SlbHLH083/192, SlbHLH086/193, SlbHLH115/182, SlbHLH130/162, SlbHLH131/161, SlbHLH133/142, SlbHLH132/139 (Fig. 3, Table S3-2). These results exhibited that some chromosomes (Chr02, Chr04, Chr06 and Chr12) harbored more linkage groups than others. Analogously, these linked SlbHLHs were linked inside their subfamilies. However, no fragment duplication of SlbHLH gene pairs were detected on Chr05 and Chr09 (Fig. 3). These results indicated that gene duplication events contributed to the SlbHLHs expansion.
2.5 Collinearity analysis of SlbHLHs between tomato and other plants
To further deduce the evolutionary relationships and origin of tomato SlbHLHs, the collinearity relationships between SlbHLHs and the orthologous genes of ten representative plants was investigated, including two model plants (Oryza sativa and Arabidopsis), two Solanaceae plants (Capsicum annuum and Solanum tuberosum), two Brassica plants (Brassica oleracea and Brassica rape), two Cucurbitaceous plants (Cucumis sativus and Citrullus lanatus), and two cereal plants (Zea mays and Triticum aestivum). The results showed that a total of 141 (72.3%) SlbHLHs shared collinearity relations with those in Solanum tuberosum, following by Capsicum annuum (68), Cucumis sativus (63), Citrullus lanatus (61), Arabidopsis (20), Brassica rapa (12), Brassica oleracea (11), Zea mays (1), Triticum aestivum (1), Oryza sativa (1) (Fig. 4). Therefore, the most collinearity relationship exists between tomato SlbHLHs and Solanum tuberosum genes, indicating a closer relationship between tomato and potato. Moreover, our results also exhibited that many SlbHLH genes had syntenic relationships with 2 to 3 genes of other plants, especially with Solanum tuberosum (Table S4). Analogously, the gene pairs where 2 to 3 SlbHLHs had the collinearity with the same one orthologous gene of tested twelve plants were also observed (Table S4). These results suggested that some orthologous genes may come from a common ancestor of these plants.
2.6 Motif composition analysis of SlbHLH proteins in tomato
To further evaluate the sequence diversities of SlbHLHs, we investigated the motif compositions. A total of 30 conserved motifs were identified from 195 SlbHLHs using the MEME tool on the basis of settings in Arabidopsis (motifs 1–30; Figure S1). In general, the SlbHLHs in the same subfamily exhibited similar motif compositions, which further consolidates the subfamily classification. The results exhibited that the motif number of each SlbHLH ranged from 1–9, which also showed differences in the composition of motifs. Motifs 1 and 2 were widely distributed in the most SlbHLHs, and other motifs were only exist in certain members. Universally, members in the same subfamily harbored similar motif compositions. For instance, all members in subfamilies VIIIc, XI and IVc cantain motifs 1, 2, 3, motifs 1, 2, 3, 9 and motifs 1, 2, 15, 19, respectively. And, some members in the same subfamily also had specific motifs in addition to their common motifs. Moreover, family members in different subfamilies also shared different specific motifs in number and compositions (Figure S1). These data indicated that the number and compositions vary significantly different bHLH subfamilies, and the specific motifs may indicate diverse and distinct functions of SlbHLHs in tomato.
2.7 Conserved domain and gene structure analyses of SlbHLHs
To analyze the sequence diversity of tomato SlbHLHs, the conserved domains and exon-intron structures of each SlbHLH were detected. Conserved domain detection by Batch CD-Search displayed that 20 different forms of bHLH domains with the higher diversity were detected and all SlbHLHs share one or two conserved bHLH domains. In the same subfamilies, the conserved bHLH domain containing in most SlbHLHs had a similar location distribution and gene members. And these SlbHLHs harbored the similar bHLH domain types with few exceptions. This results suggested that only bHLH domain can distinctly construct the phylogenetic relations among bHLH subfamilies. In addition, part of members of Vb, IIIb and Ib (1) contain additional ATG domain (Fig. 5A).
Gene structure analysis by Tbtools software showed that the numbers of SlbHLHs exon markedly varied from 1 to 13, with 25 SlbHLHs having no introns and 25 SlbHLHs only containing 1 intron. SlbHLH177 contained the biggest exon number (13), followed by SlbHLH177 with 12 exons. Furthermore, our results exhibited that the majority of SlbHLHs in the same subfamily generally had similar gene structures and displayed similar exon length and intron number. For example, most SlbHLHs had three or four exons, five or six exons and four exons in subgroups VIIIc (2), XI and IVc, respectively. However, some SlbHLHs exhibited exceptions with differential gene structures in the same subfamilies, such as SlbHLH177 in subfamily Ib (2) and SlbHLH136 in subfamily VIIIa (Fig. 5B). These results suggested that phylogenetic relationships of gene family members were mainly associated with conserved bHLH domains and gene structure.
2.8 Cis-element analysis in the SlbHLHs promoter regions
The cis-elements which are the non-coding sequences in the promoter region of genes are crucial for gene transcription and are universally participated in regulating multiple biological processes [57]. To study the potential regulatory mechanism of SlbHLHs responding to development, abiotic stresses and hormones, the cis-elements within the 2000bp upstream promoter regions of each SlbHLH gene were detected. The data displayed that a total of 4799 CREs in the pormoter sequence of SlbHLHs (Table S5-1) were predicted and 21 types of cis-elements were found (Fig. 6A, Table S5-1). Among them, SlbHLH096 which harbored a total of 44 cis-elements was the gene with the largest number of elements. The number of cis-elements varied in diverse subfamilies (Fig. 6B) and all cis-elements can be categorized as three broad categories (Fig. 6C) :
The first category of cis-elements were relevant to hormone response (1554), including abscisic acid responsive element (361, 7.52%), auxin responsive element (93, 1.95%), gibberellin responsive element (168, 3.52%), MeJA responsive element (420, 8.79%), ethylene responsive element (406, 8.50%) and salicylic acid responsive element (106, 2.22%). The promoter sequences of all subfamilies of SlbHLHs abundantly harbored the elements that in connection with ethylene responsive (ERE), salicylic acid responsive (TCA-element) and abscisic acid responsive (ABRE). Cis-elements (P-box, TATC-box and GARE-motif) associated with gibberellin response were nearly enriched in all subfamilies but absent from subfamily VIIIc (1) and III (a + c). Cis-elements associated with auxin response (TGA-element, AuxRR-core, AuxRE and TGA-box) nearly enriched in all subfamilies but absent from subfamily VIIIc (1) and VIIIa. Cis-elements associated with MeJA response (CGTCA-motif and TGACG-motif) early enriched in all subfamilies expect for the subfamily VIIIc (1) and IIIf. In addition, in the promoter regions of some SlbHLHs, such as SlbHLH015/017/023/029/070/096/110/147/148/175/195 shared multiple identical hormone responsive elements, indicating the possible that they had more intense and rapid responses to specific hormones. In the meantime, in the promoter regions of some SlbHLHs, such as SlbHLH010/027/028/030/073/110/186/ shared diverse hormone response elements indicating their potential functions in numerous hormone regulatory networks (Fig. 6B, Table S5-2).
The second category of cis-elements were associated with growth and development, containing light responsive element (2589, 53.95%), circadian control element (98, 2.04%), seed specific regulatory element (7, 0.15%), meristem expression related element (73, 1.52%), endosperm expression related element (49, 1.02%), cell cycle regulation element (3, 0.06%), root specific regulatory element (2, 0.04%), palisade mesophyll cell differentiation related element (25, 0.52%) and endosperm expression related element (49, 1.02%). Cis-elements related to light responsiveness (such as GA-motif, Box 4, G-box, chs-CMA1a, AE-box, TCT-motif, AT1-motif, GATA-motif et al.) were presented in all SlbHLHs promoters. Analogously, the cis-element related to meristem expression element (CAT-box) also existed in nearly all SlbHLHs except for 4 genes in subfamily IIIf and II. While other cis-elements involved in controlling growth and development were relatively insufficient, especially those elements associated with seed specific regulation, palisade mesophyll cell differentiation, cell cycle regulation, root specific regulation and endosperm specific negative expression. They were absent from numerous subfamilies such as RY-element for seed, MSA-like for cell cycle, motif I for Root specific regulation, HD-Zip 1 for Palisade mesophyll cell differentiation and AACA_motif for endosperm specific negative expression(Fig. 6B, Table S5-2).
The third category of cis-elements were related to stress responsiveness, including wound responsive element (151, 3.15%), drought responsive element (112, 2.33%), low-temperature responsive element (87, 1.81%), defense and stress responsive element (79, 1.65%), dehydration, low-temp, salt stresses related element (1, 0.02%) and flavonoid biosynthetic genes regulation element (18, 0.38%). Among them, cis-elements associated with stress and defense response (TC-rich repeats), low-temperature (LTR), wound response (WUN-motif) and drought response (MBS) presented in almost all subfamilies, while other two elements (dehydration and flavonoid biosynthetic genes regulation) were only existed in a few subfamilies (Fig. 6B, Table S5-2).
2.9 Protein interaction network analysis of SlbHLHs in tomato
Investigating the functional relations of SlbHLH proteins is conducive to revealing their regulatory network. So the STRING software was employed to investigate the portein interacting network of tomato SlbHLHs protein based on the orthologous analysis of Arabidopsis AtbHLHs (Figure S2). In general, multiple members such as PRE1 (SlbHLH041), BIM2 (SlbHLH071/095), PYE (SlbHLH064), bHLH071 (SlbHLH138), bHLH093 (SlbHLH031/045/081/191) and AT5G50915 (SlbHLH057/114) had more interaction relationships than other members did. Among these proteins, GL3 (SlbHLH140), RHD6 (SlbHLH040), AT1G61660 (AtbHLH112, SlbHLH037/053), RSL2 (SlbHLH083/192), LHW (SlbHLH115/141/182), LRL1 (SlbHLH118/184) and UPB1 (SlbHLH009/100) participated in the regulation of root development via the pathway of brassinolide or jasmonic acid or auxin. BIM1 (SlbHLH062/111), BIM2 (SlbHLH071/095), PAR1 (SlbHLH127), MYC2 (SlbHLH132/139) and RGE1 (SlbHLH002/003/006/024/146) play significant roles in controlling embryonic development and/or photomorphogenesis. AT2G41130 (AtbHLH106, SlbHLH080/173), AT3G20640 (AtbHLH123, SlbHLH130/162), ICE1 (SlbHLH065/110), PIF4 (SlbHLH1063/123), and TT8 (SlbHLH151) were participate in controlling low temperture and/or salt stress responses. AT4G29100 (AtbHLH68, SlbHLH067/109/121) and MYC2 (SlbHLH132/139) could enhance the drought stress tolerance through regulating ABA-induced genes under the drought stress. Therefore, the examination of interactions of SlbHLHs indicated that many SlbHLHs tend to form complexes via protein interactions to play significant roles in controlling development and/or stress resistance of plants. These serviceable information will be helpful to investigate the regulatory mechanism of SlbHLHs in regulating stress tolerance and plant growth and development.
2.10 RNA-seq of tomato roots in response to salt stress
Numerous members of various transcription factor families have been reported to be involved in regulating various abiotic stresses. To explore the members of different transcription factor families in response to salt stress, the 8-week-old uniform WT tomato seedlings were treated using 300 mM NaCl for 24h. RNA-seq of salt treated and untreated tomato roots was carried out. The data exhibited that four cDNA libraries of each line were generated to be sequenced and more than 31 million sequence reads of each cDNA library were yielded, representing > 9 Gb sequence data of each sample. Finally, a total of 25274 genes were identified from the salt-treated and untreated tomato roots using FPKM>1 as the standard for gene expression (Fig. 7A). The RNA-seq data of two biological replicates exhibited good correlations and can be used for further investigation (Figure S3A). A summary of RNA-seq, assembly, annotation, and mapping was provided in Table S6-1. Finally, a total of 5548 genes with 2552 genes down-regulated and 2996 genes up-regulated were indentified as the differentially expressed genes (DEGs) between salt-treated and untreated tomato roots using the padj < 0.05 (a adjusted p-value) as the significance cut-off in DESeq2 (Fig. 7A-C, Table S6-2). Using the H-cluster method, these 5548 DEGs also were divided into 4 clusters (sub-clusters 1–4, Figure S3B) with the similar expression trends of these DEGs of each cluster, which was consistent with the hierarchical clustering analysis. Among these DEGs, 2089 DEGs were indentified as the members of different transcription factors families, such as the AP2/ERF, B3, bZIP, DEAD, GRAS, bHLH, Homeobox, MYB, NAC, WRKY, MADS-box, Dof family, and so on (Table S6-3).
To detect the potential functions of these DEGs in response to salt stress, GO (gene ontology) functional enrichment analysis was carried out to categorize them on the strength of their functions in cellular component (CC), biological process (BP), and molecular function (MF). The data displayed that these DEGs were enriched in 888 GO terms (114 CC, 459 BP, 315 MF; padj < 0.05 ). For MF, the DEGs were principally enriched in enzyme activity, transcription factor activity and DNA binding. For CC, the majority of DEGs were associated with cell wall components, protein complex, chromosome, cytoplasmic part, extracellular components and cell nucleus. For BP, most of DEGs were connected to cell wall modification and metabolic, response to stress, transmembrane transport and carbohydrate metabolic process (Fig. 7D, Table S6-4). KEGG analysis exhibited that 106 pathwas were screened and most of the DEGs were mainly enriched in multiple pathways associated with biosynthesis, metabolism and plant hormone signal transduction of numerous amino acids (Fig. 7E, Table S6-5). These data manifested that these DEGs were participate in response to salt stress may via regulating the expression of genes related to cell component modification, amino acid metabolism, enzyme activity, transcription and transmembrane transport.
2.11 Transcriptome‑wide identification of salt-responsive SlbHLHs and their expression patterns under multiple abiotic stresses
The expression level changes of SlbHLHs in WT and salt treated tomato seedlings were analysed based on our transcriptome data in this study. The results showed that almost one-sixth of the SlbHLHs were identified as the DEGs in response to salt stress (Fig. 8, Table S6-6). To verify the RNA-seq results, the expression of 12 SlbHLH genes that displayed obviously different expression change in RNA-seq data were further detected under four kinds of abiotic stresses: drought, salt, cold and heat by qRT-PCR (Fig. 9). The data displayed that 12 SlbHLHs exhibited varied and remarkable expression levels under four abiotic stresses particularly in response to NaCl and PEG6000 stress (Fig. 9). Notably, SlbHLH083 and SlbHLH191 showed the highest induced expression levels under NaCl stress with about 67.4-105.3-fold changes. Higher and similar expression fold changes (8.8–20.3) were examined in the transcript levels of SlbHLH031/053/123/147, while while low induction levels (2.3-4.8-fold) were detected in SlbHLH017/018/089/160 expression under salt stress. The qRT-PCR data displayed good agreement with our RNA-seq data, but no remarkable induction expression of SlbHLH078 was observed since the its induced level showed no significant expression change compared with the 0h data (Fig. 9).
For drought (PEG6000) stress, the expression levels of SlbHLH083/191 were singnificantly upregulated with about 109.1-142.8-fold changes, following by the increased transcripts of SlbHLH031/123/147 with 12.8-35.8-fold. The expression of SlbHLH017/018/053/078/089/160 were increased with about 2.2–4.4 fold, while the SlbHLH010 transcripts showed no obvious changes. For heat stress, the expression levels of SlbHLH017/053/191 were dramatically upregulated with about 7.4-10.4-fold changes, followed by the increased transcripts of SlbHLH010/018/123/147 with 2.6-3.4-fold changes, and SlbHLH031/078/083/089/160 transcripts showed no obvious changes (Fig. 9). For the cold stress, the transcripts of SlbHLH017/018/053/191 were markedly increased with about 4.3-6.7-fold changes, followed by a little increased expression of SlbHLH078/123/160 with 1.2-2.0-fold changes. Two bHLH genes (SlbHLH083/089/134) showed no obvious expression changes, while the expression levels of SlbHLH010/031/147 were downregulated by cold stress (Fig. 9). In conclusion, the diverse and significant transcript levels of SlbHLHs may function as differential and important regulatory genes responding to multiple abiotic stress.
2.12 Expression profiles of SlbHLHs under different hormone treatments
The expression pattern detections of 12 SlbHLHs under various hormone treatments including ABA, IAA, ACC, MeJA and SA were further performed by qRT-PCR. These hormone had been shown that they play critical roles in response to various environmental conditions during plant development [58, 59]. These results displayed that the transcriptional accumulations of these SlbHLHs were increased or reduced to varying degrees after hormone treatments (Fig. 10). Wherein, the expression levels of SlbHLH017/031/053/078/089 could be induced to various degrees by all the hormones treatements (ABA, ACC, IAA, MeJA and SA). Contrarily, the trancscripts of SlbHLH010/018/123/147/160/191 were decreased under several hormone treatments. As a kind of stress hormone, ABA can induce the transcripts of almost all detected SlbHLHs to varying degrees except SlbHLH147. The transcripts of SlbHLH017/018/078/089 exhibited higher fold changes with 4.1-6.5-fold compared with the 0h data under ABA treatment. Notably, the transcripts of SlbHLH017/053/089/191 were significantly upregulated with about 12.01-70.7-fold changes, following by the induced expression of SlbHLH010/031/078/160 with 2.1-4.7-fold changes. The transcripts of SlbHLH017/089 were slightly induced by IAA (3.1-3.6-fold) and the accumulations of SlbHLH017/018/078/083/089/191 were also increased to a similar degree (2.1-3.2-fold) after treated by MeJA. Furthermore, SA treatment could markedly induce the transcripts of SlbHLH078/089/123 to 12.8-45.1-fold, followed by the upregulated rexpression of SlbHLH010/017/018/031/147/160 with 2.35-7.5-fold changes. In addition, the transcript accumulations of multiple SlbHLHs were decreased after treated by different hormone at all or some time points (Fig. 10). The data indicated that many SlbHLH genes may play crucial functions in response to multiple hormone (especially ACC and SA) and/or signal transduction.