Identification and classification of CgbHLH members in pummelo
Based on the methods in “Materials and methods,” we finally identified 128 bHLH proteins from pummelo (Table 1 and Additional file 2: Table S1). These bHLHs showed 21.48% to 73.16% sequence identity with bHLHs of A. thaliana (AtbHLHs), and they were named CgbHLHs according to the highest sequence identity of AtbHLH (Table 1). If more than one CgbHLHs corresponded to the same AtbHLH, they were presented with an extra decimal point (e.g. CgbHLH29.1–CgbHLH29.11). The ORF length of the CgbHLHs ranged from 210 bp (CgbHLH29.1) to 4401 bp (CgbHLH73.2), encoding 69-1466 amino acids (Table 1 and Additional file 2: Table S1). Although these CgbHLHs showed large differences in length, 74.2% of them (95/128) were in the range of 700–2000 bp. The predicted MW and pI of CgbHLHs ranged from 7.56 to 163.79 kDa and 4.55 to 10.41, respectively (Additional file 2: Table S1). To explore the evolutionary relationship and the classification of CgbHLHs, a neighbor-joining phylogenetic tree was constructed with conserved sequences of 128 CgbHLHs and 136 AtbHLHs. Based on this phylogenetic analysis and the previous reported classification of the AtbHLHs, we classified 128 CgbHLHs into 18 subfamilies using Arabic numerals 1–18 (Fig. 1). Group 1 was the largest subfamily with 17 CgbHLH members, while the smallest, group 3, contained only 2 members.
Multiple sequence alignment and analysis of the gene structure, conserved motif and domain of CgbHLHs
As shown in Additional file 1: Figure S1, multiple sequence alignment of 128 CgbHLHs showed that most of them were highly conserved in their bHLH domains, except that CgbLH37, CgbLH87.1, CgbLH88, CgbLH29.1, CgbLH45 and CgbLH162.9 were absent in the basic region, and CgbLH96.2 and CgbLH25.1 were short of the second helix. There were 19 conserved amino acid residues with a consensus ratio higher than 50%, including the basic region of Glu-13, Arg-14, Arg-16 and Arg-17, the first helix region of Lys-19, Arg-23, Leu-27, Leu-30, Val-31 and Pro-32, the loop region of Lys-39 and Asp-41, and the second helix region of Ala-43, Leu-46, Ala-49, Ile-50, Tyr-52, Lys-54 and Leu-56 (Fig. 2 and Additional file 1: Fig. S1). Among these, Arg-16, Arg-17, Leu-27 and Leu-56 showed extreme conservation, with a consensus ratio higher than 80%. The basic region of CgbHLHs consisted of a maximum of 17 amino acid residues, which determined the DNA binding ability of CgbHLH proteins. Based on the rule developed by Toledo-Ortiz et al. (2003) [6], we identified 104 DNA binding CgbHLHs (more than 5 basic residues existing in their basic region) and 24 non-DNA binding CgbHLHs (less than 6 basic residues existing in their basic region). The DNA binding CgbHLHs were further subdivided into 98 E-box binders (containing Glu-13 and Arg-16) and 6 non-E-box binders (lacking Glu-13 or Arg-16) based on the presence or absence of Glu-13 and Arg-16. Most of non-DNA binding and non-E-box CgbHLHs were distributed in subfamilies 11, 12, 14 and 15 (Fig. 1). Among 98 E-box CgbHLHs, 58 proteins (containing His/Lys-9, Glu-13 and Arg-17) were predicted to bind the G-box motif (CACGTG), while 40 proteins were predicted to recognize other types of E-boxes (CANNTG) and were defined as non-G-box binders (Fig. 1).
Gene structure analysis showed that 127 CgbHLHs contained 1 to 14 exons, and one CgbHLH (CgbHLH73.2) contained 21 exons (Additional file 2: Table S1). A total of 6 members (CgbHLH37/43/87.2/88 in subfamily 11 and CgbHLH6/14.1 in subfamily 18) were intronless, and 51 members had no untranslated region (UTR). Conserved motif prediction showed the constitutions of motif 1 to motif 20 in each CgbHLH protein (other possible motifs are not shown). Details of the 20 motifs are shown in Additional file 2: Table S1 and Figure 3. Motifs 1, 2 and 3, located in bHLH domains, were found in almost all CgbHLHs, while the other motifs existed only in certain members. The members that phylogenetically clustered together or in the same subfamily often showed a similar gene structure and motif pattern. For example, the closely clustered pair CgbHLH93.1 and CgbHLH93.2 had identical motifs 1, 2, 3, 4 and 15. Most of the members of subfamily 1 contained motifs 1, 2, 3 and 4, and those of subfamily 2 contained motifs 4, 7 and 15. In addition, we found that motifs 9 and 10 were identified only in subfamily 5 and subfamily 6, respectively. Five types of motifs (motif 1, 3, 6, 18 and 19) were repeated twice or thrice in certain members, such as twice for motif 3 in CgbHLH48/50/57/137.1/137.2, twice for motif 6 in CgbHLH75.2, and thrice for motif 1 in CgbHLH155. As shown in Figure 3, conserved domain analysis showed that all 128 CgbHLHs had an HLH domain, but there were also 36 CgbHLH genes with other domains, which might be fusion genes.
Chromosomal distribution and gene duplication of CgbHLHs
According to the genome annotation information, we found that 125 CgbHLH genes were distributed on nine chromosomes, while the chromosomal location of three CgbHLHs (CgbHLH6/110.2/121) could not be determined (Fig. 4). Chromosome 5 (chr5) contained the most CgbHLHs (25 genes), followed by chr2 (22 genes), chr8 (18 genes) and chr1 (14 genes). Although chr4 was longer than chr7 and chr8, it contained the fewest CgbHLHs (only 4 genes). The gene duplication and collinear correlations analysis showed that a total of 12 CgbHLHs were identified to be tandem duplicated genes, distributed on chr1, chr2, chr8 and chr9 (Fig. 4). In addition, 47 CgbHLHs were predicted to be segmental duplicated genes, accounting for about 37% of all CgbHLH genes, and the identified collinear genes showed great overlapping with these segmental duplicated genes (Fig. 4). These results suggested that the tandem and segmental duplication events that occurred during citrus evolution might have played an essential role in CgbHLH family expansion.
GO annotation and protein interactions of CgbHLHs
To determine potential functions of each CgbHLH gene, GO annotation was performed for all CgbHLHs. As shown in figure 5A, except for two genes that were not annotated a GO term, the other 126 CgbHLHs were annotated in three functional categories: biological process (BP), cellular component (CC) and molecular function (MF). In the BP category, these CgbHLHs were further annotated to respond to abiotic stimulus (86 genes), reproduction (88 genes), post-embryonic development (84 genes), flower development (62 genes) and photosynthesis (93 genes). Under CC and MF categories, we found that all 126 CgbHLHs were annotated to the nucleus, protein or DNA binding, and transcription factor activity, which agreed well with TF property of these CgbHLHs.
Proteins in the bHLH family often interact with each other by forming homodimers or heterodimers, which are essential for binding and regulating downstream target gene expression. Thus, prediction of protein interactions of CgbHLHs was performed by using orthologous bHLHs of A. thaliana. The result showed that a total of 27 CgbHLH proteins were predicted to have protein interaction relationships (Fig. 6). For example, CgbHLH29 (FIT ortholog), CgbHLH39, CgbHLH47 (PYE ortholog), CgbHLH104, CgbHLH105 (ILR3 ortholog) and CgbHLH110 were predicted to have direct or indirect interactions with each other. CgbHLH33 (ICE2 ortholog) was predicted to interact with CgbHLH45 (MUTE ortholog), CgbHLH97 (FMA ortholog) and CgbHLH98 (SPCH) directly. Moreover, interactions among several phytochrome interacting factor (PIF) orthologous proteins (CgbHLH8, CgbHLH9, CgbHLH15 and CgbHLH72) were predicted. Overall, the result provides an important reference for identifying true interactions of CgbHLHs with biochemical experiments.
Candidate CgbHLH members in response to Fe deficiency
As described in the “Introduction,” bHLH genes play an essential regulatory role in plant Fe homeostasis. To identify candidate CgbHLH members in response to Fe deficiency in pummelo, their expression levels in the root were evaluated by analyzing the previous RNAseq data and performing qRT-PCR confirmation. Based on analysis of previous RNAseq data published by Guo et al. (2017) [39], a total of 95 CgbHLH genes were determined to be transcribed in the root of normal cultured pummelo (Fig. 5B). Their transcription levels (expressed as the TPM value) ranged from 0.05 (CgbHLH85.2) to 281.08 (CgbHLH105.1). These root-expressed genes are preliminarily considered to respond to Fe deficiency, as the root is directly responsible for ion uptake. To further narrow the range of candidates, we searched for root-expressed genes among the GO annotated abiotic stimulus responsive genes; as a result, 66 CgbHLHs were found to overlap (Fig. 5C). Due to lower sensitivity of qRT-PCR than RNAseq in determining gene expression, we finally selected 39 CgbHLHs with TPM values higher than 3.00 from these 66 genes for qRT-PCR confirmation. In addition, three genes (CgbHLH104, CgbHLH105.1 and CgbHLH105.2) that are orthologous to known Fe-related bHLHs but that were not included in the 39 CgbHLHs were also determined by qRT-PCR in this study.
Except for eight genes were still undetectable, the other 34 genes were determined to be up- or down-regulated in roots of pummelo after 0.5 d, 1.5 d, 2 d, 7 d, and 12 d of Fe-deficient (-Fe) treatments when comparing with those at corresponding time points of CK (Fig. 7). In general, the tested genes showed three types of expression patterns. The first type, such as CgbHLH3, CgbHLH6, CgbHLH13, CgbHLH14.2, CgbHLH16, CgbHLH29.4, CgbHLH30, CgbHLH39, CgbHLH48, CgbHLH63, CgbHLH68, CgbHLH73.1, CgbHLH79, CgbHLH80, CgbHLH102.1, CgbHLH104, CgbHLH105.1, CgbHLH107.2, CgbHLH122, CgbHLH123 and CgbHLH128, was up-regulated from an early period (0.5 d), which indicated an early response of these genes to Fe deficiency. The second type, such as CgbHLH14.1, CgbHLH33.1, CgbHLH49, CgbHLH69.1, CgbHLH91, CgbHLH93.1, CgbHLH96.1, CgbHLH105.2 and CgbHLH153, was up-regulated at medium periods but was down-regulated during early and late periods. The third type was only up-regulated during late periods, such as CgbHLH35, CgbHLH62, CgbHLH77 and CgbHLH130.2. Statistical analysis showed that most of the tested genes were significantly differentially expressed at multiple time points. In particular, CgbHLH6, CgbHLH14.2, CgbHLH16, CgbHLH48, CgbHLH63, CgbHLH79, CgbHLH80, CgbHLH104, CgbHLH105.1, CgbHLH123, CgbHLH128, CgbHLH93.1, CgbHLH62 and CgbHLH130.2 were significantly up-regulated while CgbHLH3, CgbHLH29.4, CgbHLH14.1, CgbHLH69.1, CgbHLH153 and CgbHLH77 were significantly down-regulated at least three time points. Moreover, CgbHLH16 and CgbHLH63 showed continuous up-regulation at all tested times. We speculate that these significantly differentially expressed CgbHLH genes are possibly the key candidates that respond to Fe deficiency in pummelo.