Identification and genomic distribution of B3 superfamily in citrus
A total of 72 (CsB3) and 69 (CgB3) B3 superfamily TFs were identified in the sweet orange and pummelo genomes, respectively (Additional file 1). As reported in Arabidopsis [1], the B3 superfamily members could be classified into the LAV, RAV, ARF and REM families according to their sequence similarity, and we named the genes according to these family categories. In the present study, REM was the largest B3 family, accounting for 52.8% (38 CsREMs) and 55.1% (38 CgREMs) of the total B3 genes identified in sweet orange and pummelo, respectively (Additional file 1). The ARF family constituted the second largest group of the B3 superfamily, comprising 26.4% (19 CsARFs) and 24.6% (17 CgARFs) of the total B3 genes of sweet orange and pummelo, respectively. In contrast, LAV and RAV were two relatively small families, with 11.1% (8 CsLAVs) and 9.7% (7 CsRAVs) of the B3 genes identified in sweet orange, and 11.6% (8 CgLAVs) and 8.7% (6 CgRAVs) of those identified in pummelo.
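The family proportions quoted above follow directly from the gene counts. The short Python sketch below recomputes them as a consistency check; the dictionary layout and the function name `family_shares` are illustrative choices, not part of the original analysis.

```python
# Gene counts per B3 family, taken from the text above.
counts = {
    "sweet orange": {"REM": 38, "ARF": 19, "LAV": 8, "RAV": 7},
    "pummelo": {"REM": 38, "ARF": 17, "LAV": 8, "RAV": 6},
}


def family_shares(family_counts):
    """Return each family's share of the superfamily, in percent (one decimal)."""
    total = sum(family_counts.values())
    return {fam: round(100 * n / total, 1) for fam, n in family_counts.items()}


for species, fam_counts in counts.items():
    # Print the superfamily size and the per-family percentages.
    print(species, sum(fam_counts.values()), family_shares(fam_counts))
```

The per-species totals (72 and 69) and the rounded shares agree with the values reported in the paragraph.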
CsB3 TFs were distributed over eight of the nine sweet orange chromosomes; none of the CsB3 genes was located on chromosome 9 (Fig. 1A). The CsB3 gene density per chromosome was uneven, with only three genes (4.2%), i.e. CsRAV5, CsARF11 and CsARF17, on chromosome 4, but up to 17 (23.6%) of the 72 members on chromosome 5. Relatively high densities of CsB3 genes were observed at the chromosome ends, the highest of which was at the bottom of chromosome 5. However, the chromosomal locations of 10 CsB3 genes could not be defined because the sweet orange physical genome map is incomplete. The distribution and density of the CgB3 TFs were likewise not uniform across the nine chromosomes of pummelo (Fig. 1B). Chromosome 8 carried the largest number of CgB3 genes (19; 27.5%), whereas chromosome 1 carried only three (4.3%). The orthologous B3 superfamily genes of sweet orange and pummelo were not consistently located on the same chromosomes. For example, CsLAV7 was on chromosome 1 of sweet orange (Fig. 1A), whereas its ortholog CgLAV7 was on chromosome 2 of pummelo (Fig. 1B). These differences in the chromosomal locations of B3 TFs between citrus species indicated that genetic recombination has occurred extensively in citrus varieties. Among the identified CsB3 genes, a total of 10 segmental duplication events and 4 tandem duplication events were identified in the sweet orange genome, whereas in the pummelo genome the corresponding numbers were 11 and 9, respectively (Fig. 1 and Additional file 2), indicating that segmental and tandem duplications may have contributed to the expansion of the citrus B3 superfamily. Segmentally duplicated gene pairs (average Ka/Ks = 0.22; Ka/Ks is the ratio of non-synonymous to synonymous substitution rates) appeared to have undergone more intense purifying selection than tandemly duplicated gene pairs (average Ka/Ks = 0.52).
The Ka/Ks ratios of the majority (82.4%) of duplicated pairs were less than 0.5, suggesting that the citrus B3 superfamily has evolved mainly under purifying selection. However, two tandemly duplicated gene pairs (CgREM28–1/CgREM28–2 and CgREM6–1/CgREM29–2) seemed to be under neutral selection, as their Ka/Ks ratios were close to 1.0.
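The interpretation of Ka/Ks used above follows the usual convention: values below 1 point to purifying selection, values near 1 to neutral evolution, and values above 1 to positive selection. The minimal Python sketch below encodes that rule. The function name `selection_regime` and the tolerance `neutral_tol` (how close to 1.0 counts as neutral) are assumptions made for illustration, since the text does not state a numerical cut-off.

```python
def selection_regime(ka_ks, neutral_tol=0.1):
    """Classify a Ka/Ks ratio by the conventional interpretation.

    neutral_tol is an illustrative assumption: ratios within this distance
    of 1.0 are treated as "close to 1.0", i.e. neutral.
    """
    if abs(ka_ks - 1.0) <= neutral_tol:
        return "neutral"
    return "purifying" if ka_ks < 1.0 else "positive"


# Average ratios reported in the text for the two duplication modes.
print("segmental:", selection_regime(0.22))
print("tandem:", selection_regime(0.52))
```

Both reported averages fall in the purifying range under this rule, with the segmental average further from 1 than the tandem one.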
To further explore the phylogenetic relationships of B3 superfamily genes between citrus and other plant species, comparative syntenic analyses were conducted in a pairwise manner (Fig. 2). In total, 37 and 24 collinear B3 gene pairs were identified in the sweet orange/Arabidopsis and sweet orange/rice comparisons, respectively (Additional file 3); for the pummelo/Arabidopsis and pummelo/rice comparisons the corresponding numbers were 39 and 24. The number of orthologous events of CsB3/CgB3-AtB3 was higher than that of CsB3/CgB3-OsB3, indicating that citrus and Arabidopsis diverged after rice had diverged from the common ancestor of the dicotyledons. Notably, some collinear B3 gene pairs of citrus/Arabidopsis were anchored to highly conserved syntenic blocks containing up to 246 syntenic gene pairs, whereas none of the syntenic blocks of the citrus/Oryza sativa comparison contained more than 20 genes (Additional file 3). The high level of syntenic conservation between citrus and Arabidopsis indicated that B3 TFs in citrus might share similar structures and functions with their orthologs in Arabidopsis.
Characterization of B3 proteins in citrus
To understand the molecular characteristics of B3 proteins in citrus, their physicochemical properties were analyzed. The length of the putative citrus B3 proteins varied widely, from 93 to 1134 amino acids. The molecular weights and theoretical isoelectric points were also diverse (Additional file 1).
The majority of B3 TFs contained only one B3 domain, except for some REM family members in citrus (Fig. 3D and 4D). The seven β-strands and two short α-helices of the known core structure were present in the B3 domains (Additional file 4 and 5). Amino acid sequence alignments showed that the B3 domain sequences were highly conserved in the LAV, RAV and ARF families (Additional file 4), whereas the B3 domains of the REM family exhibited a higher degree of divergence (Additional file 5). A total of 20, 38 and 24 highly conserved amino acid residues were identical among the B3 domains of all the LAV, RAV and ARF family members, respectively (Additional file 4). For REM family members, only a few conserved amino acid residues were observed in the B3 domains, including one proline (position 27, P), two tryptophans (positions 52 and 69, W), three glycines (positions 49, 68 and 81, G) and three phenylalanines (positions 30, 72 and 86, F) (Additional file 5), which indicated that the B3 domain might have evolved independently in the REM family.
In total, five conserved motifs, viz. B3, AP2, AUX/IAA, ARF and CW-type zinc finger, were identified in the B3 members (Fig. 3D and 4D). The number of conserved motifs in each B3 protein varied from one to three. In addition to the B3 domain, each family of B3 proteins shared some family-specific conserved motifs. For example, the ARF and AUX/IAA motifs were specific to the ARF family, and the CW-type zinc finger and AP2 motifs appeared exclusively in the LAV and RAV families, respectively. Although most of these conserved motifs remain to be functionally elucidated, it is likely that they were evolutionarily conserved and functionally diversified within the specific families.
Phylogenetic analyses of B3 genes
To explore the phylogenetic relationships within the B3 superfamily, an unrooted phylogenetic tree was constructed from the B3 genes of citrus (sweet orange and pummelo) and the model plant Arabidopsis (Additional file 6). Following the classification criteria used in Arabidopsis, we further divided the members of the four families into fourteen major classes (Fig. 3A and 4A).
In detail, the LAV family could be subdivided into two classes, i.e. the LEC2-ABI3 class (I) and the VAL class (II). Four CsLAVs in sweet orange (CsLAV1, CsLAV2, CsLAV6 and CsLAV8) and their counterparts in pummelo (CgLAV1, CgLAV2, CgLAV6 and CgLAV8) clustered with the Arabidopsis LEC2-ABI3 subgroup. The four citrus LAV gene pairs of the VAL subgroup (CsLAV3/CgLAV3, CsLAV4/CgLAV4, CsLAV5/CgLAV5 and CsLAV7/CgLAV7), which had a conserved B3 domain and a CW-type zinc finger, clustered with the three Arabidopsis VAL proteins (Fig. 3 and Additional file 6).
The RAV family was grouped into two main classes based on phylogenetic relationships. Class I comprised three citrus RAV gene pairs (CsRAV1/CgRAV1, CsRAV2/CgRAV2 and CsRAV4/CgRAV4) that clustered with four AtNGA genes and three AtRAV-like genes from the same branch (Fig. 3A and Additional file 6). These genes commonly had the conserved B3 domain and contained no more than one intron (Fig. 3C and 3D). Class II comprised four CsRAV genes (CsRAV3, CsRAV5, CsRAV6 and CsRAV7) and three CgRAV genes (CgRAV3, CgRAV5 and CgRAV6), which featured a B3 domain with an upstream AP2 domain (Fig. 4D) and had no introns, except for CgRAV5 (Fig. 3C).
All citrus ARF genes were classified into four major classes. Classes I and II belonged to the same branch and contained six members (CsARF1/CgARF1, CsARF3/CgARF3, CsARF5/CgARF5, CsARF11/CgARF11, CsARF17/CgARF17 and CsARF18) and five members (CsARF2/CgARF2, CsARF7/CgARF7, CsARF8/CgARF8, CsARF15/CgARF15 and CsARF16/CgARF16), respectively (Fig. 3A and Additional file 6). Most of them were characterized by the B3 DNA-binding domain, the ARF domain and the AUX/IAA domain (Fig. 3D). Class III (CsARF4/CgARF4, CsARF6/CgARF6, CsARF10/CgARF10 and CsARF19) and Class IV (CsARF9/CgARF9 and CsARF12/CgARF12 to CsARF14/CgARF14) had only the B3 and ARF domains. The coding sequences of all ARF genes were interrupted by introns, the number of which ranged from 2 to 15 (Fig. 3C).
As most of the REMs in citrus possessed multiple B3 domains and had low sequence similarity with each other (Fig. 4D and Additional file 5), we performed the phylogenetic analysis within each class of the REM family. The first step was to compare the AtREM sequences with the CsREM/CgREM sequences, following the previous study [4] (Additional file 6). This initial analysis identified six REM types common to citrus and Arabidopsis (REM I and REM VI to REM X), whereas the REM V type (AtREM5) was identified exclusively in Arabidopsis. The vast majority of class I and class II genes contained one B3 domain and shared homology with the AtREM I and VII type genes, respectively (Fig. 4 and Additional file 6). The class III and class IV genes belonged to the AtREM IX and X types, respectively; they possessed only one B3 domain and showed relatively low expression in most of the tissues examined. Class V (AtREM VI) and class VI (AtREM VIII) contained several members, the majority of which had more than one B3 domain.
Expression profiles of B3 genes in different tissues and during somatic embryogenesis
To understand the tissue expression profiles of the B3 genes in citrus, we compared their transcript abundance based on previously published RNA-seq data from different tissues, including leaf, fruit, callus, flower, ovule and seed (Fig. 3B and 4B). Hierarchical cluster analysis showed that many citrus B3 genes exhibited high transcript abundance in all the tissues. However, the LEC2-ABI3 subgroup and two REM classes (REM IX type and REM X type) exhibited relatively low expression compared with the other CsB3 genes. In addition, some of the B3 TFs exhibited tissue-specific expression. For example, CsLAV1/2/6/7, CsARF9/19 and CsREM3/4/6/7/9/13/14/17/27/28/29 showed the highest transcript abundance in the embryogenic callus (EC), whereas CsREM24 was expressed predominantly in fruit. These genes may be involved in biological processes that occur in the corresponding tissues. Some duplicated gene pairs also showed divergent expression profiles. For example, CgARF13 showed a low expression level in fruit (RPKM = 2.76; RPKM: reads per kilobase per million mapped reads), whereas its duplicate, CgARF14, was highly expressed in fruit (RPKM = 56.13). These results suggest that duplicated genes may have evolved diverse functions. Some clustered citrus B3 genes that were considered orthologous between sweet orange and pummelo also showed different expression profiles. For example, CgARF17 was mainly expressed in leaf (RPKM = 59.06) and ovule (RPKM = 57.40) of pummelo, whereas its ortholog in sweet orange (CsARF17) showed relatively low expression in all the tissues examined, with RPKM values ranging from 4.16 to 7.57. These species-specific expression differences suggest that novel functional roles of B3 genes might have arisen during citrus domestication.
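For readers unfamiliar with the RPKM unit used above, the following Python sketch shows the standard definition (reads mapped to a gene, scaled per kilobase of gene length and per million mapped reads) and the fold difference implied by the two fruit values reported for CgARF13 and CgARF14. The function name `rpkm` and the read counts in the first call are hypothetical illustration values, not data from this study.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000-bp gene in a library
# of 20 million mapped reads.
example = rpkm(500, 2_000, 20_000_000)
print(f"example RPKM = {example:.2f}")

# Fold difference between the duplicated pair in fruit, using the
# RPKM values reported in the text (CgARF14 = 56.13, CgARF13 = 2.76).
fold = 56.13 / 2.76
print(f"CgARF14/CgARF13 in fruit = {fold:.1f}-fold")
```

Because RPKM normalizes for both gene length and sequencing depth, ratios such as the one above can be compared between genes within the same dataset.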
To explore the possible involvement of CsB3 genes in citrus somatic embryogenesis (SE), the expression profiles of 23 CsB3 genes were investigated by qRT-PCR across six SE stages of ‘Valencia’ orange, a citrus variety with strong SE capability. These genes were selected because of their relatively high transcript abundance, or specifically higher expression in EC, according to the RNA-seq data. Based on their expression profiles, the genes could be classified into four types (Fig. 5). The expression of Type I genes was up-regulated during differentiation and showed a relatively high peak at the E2 stage (embryogenic callus induced for somatic embryos for 2 weeks; CsARF1, CsARF14, CsREM17 and CsREM18) or the E4 stage (embryogenic callus induced for somatic embryos for 4 weeks; CsLAV1, CsREM4, CsREM5, CsREM13 and CsREM29); it was then down-regulated at the early embryo morphogenesis stage (GE, globular embryos) and showed another high peak at the late embryo morphogenesis stage (CE, cotyledon embryos). Type II genes comprised five CsLAVs (CsLAV2, CsLAV3, CsLAV5, CsLAV6 and CsLAV7), one CsRAV (CsRAV3), two CsARFs (CsARF5 and CsARF19) and one CsREM (CsREM27); they were highly expressed specifically at the CE stage, and some also showed high transcript abundance at one other stage. For Type III genes (CsLAV4, CsARF12 and CsREM6), mRNA abundance was down-regulated during the differentiation stages (E0 to E4, embryogenic callus induced for somatic embryos for 0–4 weeks) but was higher at the subsequent stages of embryo morphogenesis (GE or CE). In contrast, the expression of Type IV genes (CsARF7 and CsREM9) increased progressively throughout the whole SE process.
Candidate B3 TFs potentially involved in embryogenesis and callus initiation
To identify the B3 regulatory factors potentially involved in embryogenesis and callus initiation, protein sequences and expression patterns were compared among the B3 genes of sweet orange and pummelo (Fig. 3 and 4). A total of 15 CsB3 genes whose transcripts accumulated specifically in EC were retrieved from the RNA-seq data, including five CsLAVs (CsLAV1 to CsLAV4 and CsLAV7), two CsARFs (CsARF12 and CsARF19) and eight CsREMs (CsREM4 to CsREM7, CsREM9, CsREM13, CsREM27 and CsREM29) (Fig. 3B and 4B). Among their orthologs, eight (five CgLAVs, CgREM13, CgREM27 and CgREM29–1) were preferentially expressed in the seeds of pummelo, suggesting that these genes may be associated with embryogenesis in vivo and in vitro. Meanwhile, eight B3 genes were identified in the genome of sweet orange but not in that of pummelo, namely CsRAV7, CsARF18, CsARF19, CsREM24, CsREM25, CsREM33, CsREM37 and CsREM38. Among them, CsARF19 (Cs7g02210) showed markedly higher expression (≥6-fold) in EC than in the other tissues (Fig. 3B), indicating its potential association with callus initiation, because empirically EC can only be induced from the seeds of polyembryonic citrus genotypes. With the availability of the citrus genome sequences [43–47], two orthologs of CsARF19, MSYJ162170.1 (amino acid sequence identity of 99.36%) and Ciclev10030751m (amino acid sequence identity of 99.87%), were identified in Mangshan mandarin (Citrus reticulata, a wild mandarin) and Clementine mandarin (C. clementina, which is believed to be a chance hybrid of mandarin and sweet orange) [45, 47, 48], respectively, but not in atalantia (Atalantia buxifolia, a primitive citrus), Ichang papeda (C. ichangensis, a wild citrus) or three related genera of citrus, viz. Hongkong kumquat (Fortunella hindsii), trifoliate orange (Poncirus trifoliata) and citron (C. medica).