Identification and analysis of Hsf and Hsp genes in the cucumber genome
After HMM analysis, BLASTP, and keyword searches against the Cucumber Genomic Database (Version 3.0), a total of 23 Hsf and 72 Hsp genes were identified in cucumber, of which 33, 15, 12, 6, and 6 genes belonged to the Hsp20, Hsp60, Hsp70, Hsp90, and Hsp100 families, respectively. These Hsf and Hsp genes from Cucumis sativus were abbreviated as CsHsfs and CsHsps and were numbered according to their order on the chromosomes. Detailed information on each CsHsf and CsHsp gene is given in Table S1, including the gene name, gene ID, chromosome location, length of the open reading frame, number of exons, number of amino acids, molecular weight, protein isoelectric point, and predicted subcellular localization.
Structure and conserved motifs of CsHsf and CsHsp
The structural diversity of the CsHsf and CsHsp genes was compared on the basis of the exon-intron arrangement of their coding sequences. In terms of intron number and of intron and exon length, the most closely related members of the same Hsf or Hsp subfamily shared a similar gene structure (Fig. 1). Most CsHsfs have only one intron; CsHsf-11, CsHsf-4, and CsHsf-18 have two introns, while CsHsf-17 has 11 introns, which is quite different from the other CsHsfs (Fig. 1A). In the CsHsp20 family, 13 (39%) CsHsp20s are intronless, 17 (52%) have only one intron, 2 (6%) have two introns, and only CsHsp20-32 has five introns (Fig. 1B). The number of introns in the CsHsp60 family varied widely (0 to 17 introns); CsHsp60-8, CsHsp60-14, and CsHsp60-15, which are most likely located in the mitochondria, have the largest numbers of introns (16 or 17). Notably, CsHsp60-1 has no introns, unlike the other CsHsp60 members (Fig. 1C). Except for CsHsp70-3, all CsHsp70s contain introns (Fig. 1D). The group I CsHsp70s (CsHsp70-9, CsHsp70-6, CsHsp70-10, and CsHsp70-1) are mainly located in the cytoplasm and have only one intron. CsHsp70-7 has a relatively complex gene structure with 13 introns. CsHsp90s predicted at different subcellular locations have different intron numbers (Fig. 1E): CsHsp90-1, CsHsp90-2, and CsHsp90-4, which are mainly located in the cytoplasm, have fewer introns (2 to 3), while CsHsp90-6 and CsHsp90-5, which are mainly located in the mitochondria or chloroplast, have more introns (18 to 19). Except for CsHsp100-3, which has 16 introns, the CsHsp100s have fewer than 10 introns, and CsHsp100-3 is the only member mainly located in the nucleus (Fig. 1F). Taken together, the number of introns in CsHsps appears to be closely related to their subcellular localization and evolutionary relationships.
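The intron counts compared above follow directly from the exon coordinates of each gene model. The snippet below is a minimal sketch of that bookkeeping, not the pipeline used in this study: the function name `intron_count`, the gene labels, and all coordinates are hypothetical and chosen only for illustration.

```python
from collections import Counter

def intron_count(exons):
    """Return the number of introns implied by a list of (start, end) exon intervals."""
    if not exons:
        return 0
    # Sort so the result does not depend on strand or input order.
    ordered = sorted(exons)
    # Merge touching or overlapping intervals; only true gaps count as introns.
    merged = [ordered[0]]
    for start, end in ordered[1:]:
        if start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return len(merged) - 1

# Hypothetical gene models (coordinates invented for illustration).
gene_models = {
    "geneA": [(100, 400)],                          # intronless
    "geneB": [(100, 250), (400, 900)],              # one intron
    "geneC": [(500, 650), (100, 200), (300, 400)],  # two introns, unsorted input
}

intron_counts = {gene: intron_count(exons) for gene, exons in gene_models.items()}
# How many genes fall into each intron-number class.
distribution = Counter(intron_counts.values())
print(intron_counts)
print(dict(distribution))
```

Applied to real annotation, a tally of this kind gives the per-family distributions (for example, the share of intronless genes) that are summarised in the text.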
We then predicted conserved motifs shared among related proteins in each family using Multiple Expectation Maximization for Motif Elicitation (MEME) and identified 10 putative motifs in each family. In general, the most closely related members in the phylogenetic tree of each subfamily had similar motif compositions (Fig. 1). Details of these motifs are given in Table S2.
Like Hsfs in other plants, the protein structure of CsHsfs is highly conserved. Based on Pfam, CDD, and SMART analyses, all CsHsf proteins contain the DNA-binding domain (DBD), which is composed of motif 1 and motif 2. Multiple alignment of the CsHsf protein sequences confirmed that a highly conserved DBD is present in all CsHsfs (Fig. 2A). The HR-A/B region is an essential domain in Hsfs and is characterized by a predicted coiled-coil structure (Guo et al. 2016). In CsHsfs, the HR-A/B domain is composed of two typical motifs (motif 3 and motif 4) (Fig. 1). In addition, 21 amino acids are inserted between the HR-A and HR-B regions in class A CsHsf proteins and 7 amino acids in class C CsHsf proteins (Fig. 2B), whereas class B CsHsf proteins have no insertion between the two regions. The AHA motif (motif 6) was characteristic of class A CsHsf proteins and was present in 8 (66.7%) of the CsHsfA proteins. The nuclear localization signal (NLS) and nuclear export signal (NES) are of vital importance for the intracellular distribution of Hsf proteins between the nucleus and the cytoplasm (Heerklotz et al. 2001). The NLS of each CsHsf was predicted with cNLS Mapper and the NES with NetNES. Of the 23 CsHsfs, 17 (74%) contained an NLS and 12 (52%) contained an NES (Table S1-1).
All CsHsp20s have a highly conserved α-crystallin domain (ACD) at the C-terminus. Multiple sequence alignment and sequence logo analysis showed that the ACD consists of two conserved regions, consensus region I and consensus region II (Fig. 3). Consensus region I contains motif 3 and consensus region II consists of motif 1, while motif 6 and motif 4 lie between the two consensus regions of the ACD (Fig. 1B).
Phylogenetic analysis of CsHsf and CsHsp
We further investigated the evolutionary relationships of Hsfs and Hsps in cucumber, Arabidopsis, tomato, and rice. Based on the full-length amino acid sequences, unrooted phylogenetic trees were generated in MEGA 7.0 using the neighbor-joining method. The CsHsf gene family could be divided into three classes, class A (12 genes), class B (9 genes), and class C (2 genes), and each class could be further subdivided into subclasses according to the branches (Fig. S1). The subfamilies of each CsHsp family could be assigned according to the predicted subcellular location of the proteins. For example, 130 Hsp20s from cucumber, Arabidopsis, rice, and tomato were divided into 17 subfamilies: CI (cytosol I) (42 genes), CII (7 genes), CIII (3 genes), CIV (3 genes), CV (4 genes), CVI (4 genes), CVII (3 genes), CVIII (10 genes), CIX (5 genes), CX (9 genes), CXI (5 genes), MI (mitochondria I) (6 genes), MII (7 genes), MIII (4 genes), P (plastids) (7 genes), Po (peroxisome) (3 genes), and ER (endoplasmic reticulum) (8 genes) (Fig. S2). The Hsp60 family consisted of four subfamilies. According to the prediction, the 24 and 6 Hsp60s of subfamilies I and IV were mainly located in the cytoplasm, the 9 Hsp60s of subfamily II in the mitochondria, and the 15 Hsp60s of subfamily III in the chloroplast (Fig. S3). The 81 Hsp70s were divided into seven groups (I to VII), of which groups I to V clustered in the DnaK subfamily and groups VI and VII in the HSP110/SSE subfamily (Fig. S4). Group I, the largest group, contained four CsHsp70s, which were mainly located in the cytoplasm. Only CsHsp70-8 belonged to group II, which is associated with metabolism in the endoplasmic reticulum. Group III members (CsHsp70-2 and CsHsp70-11) were predicted to function mainly in the chloroplast, and only CsHsp70-5 belonged to group IV, which is associated with metabolism in the mitochondrion. CsHsp70-3 belonged to group V and was predicted in various subcellular regions (cytoplasm, chloroplast, mitochondria, plastid, and unknown regions). The Hsp90 family could be divided into five classes, of which the members of class I (CsHsp90-1 and CsHsp90-2) and class II (CsHsp90-4) were mainly located in the cytoplasm (Fig. S5). According to the subcellular localization prediction, the members of class III (CsHsp90-3), class IV (CsHsp90-6), and class V (CsHsp90-5) were located in the endoplasmic reticulum, mitochondria, and chloroplast, respectively. The Hsp100 family consisted of four groups: members of groups I (CsHsp100-5) and IV (CsHsp100-1 and CsHsp100-4) were predicted in the chloroplast, and members of groups II (CsHsp100-6) and III (CsHsp100-2 and CsHsp100-3) in the mitochondria and cytoplasm, respectively (Fig. S6). All phylogenetic trees indicated that these families shared a common ancestor before the divergence of monocotyledons and dicotyledons. Although the roles of most heat shock genes in cucumber are yet to be elucidated, Hsf and Hsp genes with conserved functions in different plants tend to cluster in the same subgroup and may share a recent common evolutionary origin.
Chromosomal location and synteny analysis of CsHsf and CsHsp genes
According to the cucumber genome database, the 23 CsHsf and 72 CsHsp genes were located on the 7 chromosomes (Chr) (Fig. 4). Although every chromosome carried CsHsf and CsHsp genes, their distribution was uneven. Fewer CsHsf and CsHsp genes were found on chromosomes 4 (10 genes), 6 (9 genes), and 7 (6 genes), while more were located on chromosomes 1 (22 genes), 2 (12 genes), 3 (18 genes), and 5 (18 genes). Most CsHsps were distributed toward the two ends of the chromosomes.
During plant evolution, gene duplication, especially tandem and segmental duplication, has been the main mechanism of gene family expansion and has contributed greatly to the diversity of gene families (Kotak et al. 2004; Liu et al. 2012). Among the CsHsf and CsHsp genes, tandem duplication was identified only in the CsHsp20 family. Of the 33 CsHsp20 genes, 16 (48%) were involved in tandem duplication events, forming 6 tandem duplication clusters (Fig. 4). Three clusters, located on chromosomes 1 and 3, each consisted of a pair of genes (CsHsp20-7/CsHsp20-8, CsHsp20-9/CsHsp20-10, and CsHsp20-14/CsHsp20-15). Two clusters, located on chromosomes 1 and 5, each included three genes (CsHsp20-3/CsHsp20-4/CsHsp20-5 and CsHsp20-21/CsHsp20-22/CsHsp20-23). Only one cluster, on chromosome 5, was composed of four genes (CsHsp20-24/CsHsp20-25/CsHsp20-26/CsHsp20-27). These results indicate that tandem duplication contributed substantially to the expansion of the CsHsp20 gene family. In addition, using BlastP and MCScanX, we identified 13 segmental duplication events involving 10 CsHsf and 14 CsHsp genes, which further increased the diversity of heat shock genes in cucumber (Fig. 4).
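The tandem duplication tally above can be checked with a few lines of code. The following sketch simply encodes the six clusters as listed in the text and counts the genes involved; the variable names are illustrative, and the snippet does not reproduce the detection step itself.

```python
# Tandem duplication clusters of CsHsp20 genes, as listed in the text.
tandem_clusters = [
    ["CsHsp20-7", "CsHsp20-8"],
    ["CsHsp20-9", "CsHsp20-10"],
    ["CsHsp20-14", "CsHsp20-15"],
    ["CsHsp20-3", "CsHsp20-4", "CsHsp20-5"],
    ["CsHsp20-21", "CsHsp20-22", "CsHsp20-23"],
    ["CsHsp20-24", "CsHsp20-25", "CsHsp20-26", "CsHsp20-27"],
]
TOTAL_CSHSP20 = 33  # size of the CsHsp20 family reported in the text

n_clusters = len(tandem_clusters)
# A set guards against counting a gene twice if it appeared in two clusters.
tandem_genes = {gene for cluster in tandem_clusters for gene in cluster}
n_tandem_genes = len(tandem_genes)
fraction = n_tandem_genes / TOTAL_CSHSP20

print(n_clusters, n_tandem_genes, f"{fraction:.1%}")
```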
Furthermore, to analyze the selection pressure on the duplicated gene pairs described above, the ratios of non-synonymous to synonymous substitution rates (Ka/Ks) were calculated (Table 1). Ka and Ks values can be used to assess the selective pressure on a protein-coding gene and to estimate the approximate date of a duplication event. A Ka/Ks ratio of 1 indicates neutral evolution (no selection), Ka/Ks > 1 indicates positive selection, and Ka/Ks < 1 indicates negative (purifying) selection. In this study, 22 (96%) of the duplicated gene pairs had Ka/Ks < 1, indicating that they had experienced purifying selection, and only one pair of tandem duplicated genes had Ka/Ks > 1, suggesting that this pair may have experienced positive selection. The Ks values of the duplicated gene pairs ranged from 0.2153 to 6.1528, corresponding to divergence times of 16.41 to 468.96 Mya (Table 1).
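The interpretation of Ka/Ks and the conversion of Ks into a divergence time can be made concrete with a short sketch. It uses the standard relation T = Ks / (2λ). The substitution rate λ is not stated in this section; the value 6.56 × 10^-9 substitutions per synonymous site per year is an assumption, chosen because it reproduces the reported range of 16.41 to 468.96 Mya. The function names and the example Ka values are hypothetical.

```python
# Assumed synonymous substitution rate (per site per year). Not stated in the text;
# inferred as the value that reproduces the reported Ks-to-Mya conversion.
ASSUMED_RATE = 6.56e-9

def selection_mode(ka, ks):
    """Classify selective pressure from Ka and Ks."""
    if ks <= 0:
        raise ValueError("Ks must be positive to compute Ka/Ks")
    ratio = ka / ks
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"

def divergence_time_mya(ks, rate=ASSUMED_RATE):
    """Estimate divergence time in million years ago: T = Ks / (2 * rate)."""
    return ks / (2 * rate) / 1e6

# The two extreme Ks values reported in the text.
for ks in (0.2153, 6.1528):
    print(ks, round(divergence_time_mya(ks), 2))

# Hypothetical Ka values paired with a Ks of 0.5, for illustration only.
print(selection_mode(0.1, 0.5), selection_mode(0.6, 0.5))
```

Because the estimate scales inversely with λ, a different assumed rate shifts all dates proportionally without changing their order.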
To further explore the evolutionary history of the CsHsf and CsHsp families, we performed a synteny analysis of the heat shock genes in cucumber, Arabidopsis, and rice (Fig. 5). A total of 52 pairs of heat shock genes showed a syntenic relationship between cucumber and Arabidopsis, and 33 pairs between cucumber and rice. Many cucumber heat shock genes were homologous to genes in both Arabidopsis and rice, and most of the homologous gene pairs had Ka/Ks < 1 (Table S3), suggesting that these genes have been conserved during plant evolution, which has helped to maintain the function of heat shock genes.
Analysis of putative cis-acting elements in the promoters of CsHsf and CsHsp genes
To identify potential cis-acting elements in the promoter regions of CsHsf and CsHsp genes, the 1500 bp sequences upstream of the translational start sites were extracted from the cucumber genome database and submitted to the PlantCARE database. As shown in Table 2, we analyzed 12 hormone-responsive elements and 9 stress-responsive elements. Among the 95 genes, 60, 34, 55, 36, 50, and 38 genes (63%, 36%, 58%, 38%, 53%, and 40%) had at least one abscisic acid (ABA)-responsive, auxin (IAA)-responsive, ethylene (ER)-responsive, gibberellin (GA)-responsive, methyl jasmonate (MeJA)-responsive, and salicylic acid (SA)-responsive element, respectively (Fig. 6). Among the stress-responsive elements, one or more AREs, which are involved in the hypoxia-inducible response, were present in 14 (61%) CsHsf and 61 (87%) CsHsp genes. Heat shock elements (HSEs) were found in 12 (52%) CsHsf and 40 (56%) CsHsp genes, and the WUN-motif, which participates in the wound response, was present in 9 (39%) CsHsf and 31 (43%) CsHsp genes. Other predicted stress-responsive elements, such as MBS, LTR, TC-rich repeats, and W-box, occurred at variable positions in CsHsf and CsHsp promoters and are associated with responses to drought, low temperature, and biotic stress, while the GC-motif, which is involved in anoxic-specific inducibility, was found in only 4 CsHsp genes (Fig. 6). This analysis of cis-elements suggests that CsHsf and CsHsp genes may respond to a range of stress conditions, helping to maintain physiological and metabolic balance, reduce damage caused by unfavorable environments, and support the normal growth of cucumber.
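The promoter extraction step described above (1500 bp upstream of the translational start site) can be sketched as follows. This is an illustration under stated assumptions rather than the script used in this study: the function names are hypothetical, coordinates are assumed to be 1-based positions of the first base of the start codon on the forward-strand coordinate system, and the toy sequences are invented.

```python
# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def upstream_region(chrom_seq, start_pos, strand, length=1500):
    """Return up to `length` bp upstream of the translational start site.

    start_pos is the 1-based forward-strand coordinate of the first base of the
    start codon (assumed convention). The result is given in gene orientation
    and is truncated if the chromosome end is reached.
    """
    if strand == "+":
        begin = max(0, start_pos - 1 - length)
        return chrom_seq[begin:start_pos - 1]
    if strand == "-":
        end = min(len(chrom_seq), start_pos + length)
        return reverse_complement(chrom_seq[start_pos:end])
    raise ValueError("strand must be '+' or '-'")

# Toy forward-strand gene: ATG begins at position 9.
plus_chrom = "AAAACCCCATGGGG"
print(upstream_region(plus_chrom, 9, "+", length=4))

# Toy reverse-strand gene: the forward strand shows CAT at positions 5 to 7,
# so the first base of the start codon sits at position 7.
minus_chrom = "TTTTCATGGAC"
print(upstream_region(minus_chrom, 7, "-", length=4))
```

Handling the reverse strand explicitly matters here, because taking the sequence to the left of a minus-strand gene would return its 3' flank rather than its promoter.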
Expression patterns of CsHsf and CsHsp genes in different tissues
Cucumber Illumina RNA-seq data were obtained from the Cucurbit Genomics Database (http://cucurbitgenomics.org/rnaseq/home). Using these data, we analyzed the expression patterns of CsHsf and CsHsp genes in different tissues, including root, stem, leaf, male flower, female flower, ovary (unexpanded), expanded ovary (fertilized), expanded ovary (unfertilized), and tendril (Fig. 7). Although CsHsp20-6, CsHsp20-24, CsHsp20-26, and CsHsp20-30 were barely expressed in any tissue or organ, most of the CsHsf and CsHsp genes were expressed in at least one tissue. Almost all of the CsHsp60s had low transcript levels in male flowers. Some genes, such as CsHsf-23, CsHsp20-4, CsHsp20-5, and CsHsp70-10, had high transcript levels in tendril. The broad expression of CsHsf and CsHsp genes across tissues may help maintain the normal function of cucumber tissues under stress.
Expression profiles of CsHsf and CsHsp genes in response to abiotic and biotic stresses
To explore the responses of CsHsf and CsHsp genes to biotic and abiotic stresses, their expression patterns under stress were investigated using RNA-seq data. Our own RNA-seq data were used to analyze the response to heat stress, and the RNA-seq data for the responses to NaCl and downy mildew were obtained from the Cucurbit Genomics Database (http://cucurbitgenomics.org/rnaseq/home). According to the log2(FPKM + 1) values, the expression of many CsHsf and CsHsp genes increased after heat treatment (Fig. 8). Interestingly, CsHsf and CsHsp genes that were up-regulated under heat stress were usually also up-regulated under NaCl treatment, indicating that the responses of cucumber to heat and salt stress share common features. Compared with the control, the expression of CsHsf-7, which belongs to HsfA2, increased sharply after 3 hours of heat treatment, indicating that it responds very sensitively to heat stress in cucumber. Previous studies have shown that HsfA1 is a master regulator that triggers the heat response, leading to acquired thermotolerance in tomato and Arabidopsis (Mishra 2002; Yoshida et al. 2011), but the expression of CsHsf-9, which belongs to HsfA1, did not change significantly under heat stress in this study (Fig. 8A). Compared with the control, the expression level of most CsHsp20s increased significantly in the first 3 hours of heat treatment and then decreased gradually over the next 3 hours (Fig. 8B). After heat or NaCl treatment, the expression level of many CsHsp60s decreased, and their expression changes were less pronounced than those of the other heat shock proteins (Fig. 8C). Downy mildew infection reduced the expression level of most heat shock genes in cucumber, which may be related to the immune response of cucumber to pathogens.
In conclusion, when cucumber is subjected to biotic and abiotic stresses, CsHsf and CsHsp genes respond rapidly, forming a complex regulatory network that contributes to stress resistance.
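The log2(FPKM + 1) scale used for the expression profiles above can be illustrated with a minimal sketch. The FPKM values and condition labels below are hypothetical and were chosen only so that the arithmetic is easy to follow; they are not taken from the data of this study.

```python
import math

def log2_fpkm(fpkm):
    """Transform an FPKM value to log2(FPKM + 1); the pseudocount keeps zero defined."""
    return math.log2(fpkm + 1.0)

# Hypothetical FPKM values for one gene under control and heat treatment.
profile = {"control": 3.0, "heat_3h": 63.0, "heat_6h": 15.0}

transformed = {condition: log2_fpkm(value) for condition, value in profile.items()}
# Differences on the log2 scale approximate log2 fold changes relative to the control.
change_3h = transformed["heat_3h"] - transformed["control"]
change_6h = transformed["heat_6h"] - transformed["control"]

print(transformed)
print(change_3h, change_6h)
```

On this scale an unexpressed gene maps to 0, and a difference of 1 corresponds to roughly a two-fold change, which makes heat-induced peaks followed by partial decline easy to read from a heat map.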
Potential protein-protein interaction between CsHsf and CsHsp
To further explore the possible protein-protein interactions between CsHsfs and CsHsps, an interaction network was constructed using the STRING program. As one of the most important heat shock transcription factors, CsHsf-7 had potential interactions with CsHsp20s, CsHsp70s, CsHsp90s, and CsHsp100s, implying that this HsfA2 member is an activator of the downstream CsHsps and acts as a key regulator in improving the adaptability of cucumber to various stresses (Fig. 9). According to the predictions, there is a strong interaction between CsHsf-9 and several CsHsp70s (CsHsp70-6, CsHsp70-9, and CsHsp70-10), consistent with previous reports that Hsp70 and Hsp90 interact with HsfA1 under normal conditions and inhibit HsfA1 activity (Hahn et al. 2011; Ohama et al. 2017). In addition to the potential regulatory relationships between CsHsfs and CsHsps, we also detected many interactions among CsHsps of different subfamilies, such as interactions among CsHsp60, CsHsp70, and CsHsp90, suggesting that CsHsps of different subfamilies might also be activated or inhibited through interactions with each other when cucumber is subjected to various stresses.