Identification and characterization of XTHs
Previous work reported 48 and 27 XTHs in B. rapa and B. oleracea, respectively [9]; we identified 53 and 38 XTHs, including some novel members of the family, while Bol012212 was filtered out in this study because it lacks the XET_C domain. These genes were designated according to their orthologous XTH genes in Arabidopsis (AtXTH) (Table 1). The identity between BraXTHs and their Arabidopsis orthologs ranged from 61% to 96%, while the identity between BolXTHs and their Arabidopsis orthologs ranged from 57% to 95% (Additional file 1). Where the final lowercase letter in the gene name is "a", this indicates the highest homology with Arabidopsis, "b" indicates the next highest homology, and so on. The capital letter A or C in the name indicates, respectively, the B. rapa Ar genome or the B. oleracea Co genome. A comparison of the BraXTHs reported in this paper with those reported by Behar et al. [9] is shown in Additional file 2.
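The naming rule described above can be illustrated with a short sketch. It is not the pipeline used in this study: the function name, the gene identifiers in the usage example, and the rule that a single copy carries no letter suffix (inferred from names such as BraA.XTH3) are assumptions made for illustration.

```python
from string import ascii_lowercase


def designate(prefix, xth_number, identities):
    """Name the Brassica copies matched to one Arabidopsis XTH gene.

    prefix      -- "BraA" (B. rapa, Ar genome) or "BolC" (B. oleracea, Co genome)
    xth_number  -- number of the Arabidopsis ortholog, e.g. 22 for AtXTH22
    identities  -- {gene_id: percent identity to the Arabidopsis ortholog}
    Returns {gene_id: designated name}.
    """
    # Rank copies from highest to lowest identity with the Arabidopsis gene.
    ranked = sorted(identities, key=identities.get, reverse=True)
    base = f"{prefix}.XTH{xth_number}"
    # Assumption: a single copy gets no letter suffix (e.g. BraA.XTH3).
    if len(ranked) == 1:
        return {ranked[0]: base}
    # Otherwise "a" marks the highest identity, "b" the next highest, and so on.
    return {gene: f"{base}.{ascii_lowercase[i]}" for i, gene in enumerate(ranked)}
```

For example, `designate("BolC", 29, {"g1": 70.0, "g2": 88.5})` assigns the `.a` suffix to the hypothetical gene `g2` and `.b` to `g1`.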
No orthologs of AtXTH1, AtXTH2, AtXTH6, AtXTH10, AtXTH14, AtXTH18 or AtXTH19 were found in the B. oleracea genome, while the genome of B. rapa lacked orthologs of AtXTH1, AtXTH3, AtXTH19 and AtXTH20. Thus more XTH genes have been lost from B. oleracea than from B. rapa.
The lengths of BraXTHs ranged from 212 (BraA.XTH24.c) to 473 (BraA.XTH3) amino acids, with molecular weights between 24.37 kDa and 55.10 kDa, while the lengths of BolXTHs ranged from 163 (BolC.XTH29.b) to 346 (BolC.XTH27.a) amino acids, with molecular weights between 18.67 kDa and 39.87 kDa. BraA.XTH3 was the largest XTH protein in this study. Unlike the other identified XTHs, it possesses an ER lumen protein retaining receptor (ER_lumen_recept: InterPro IPR000133, Pfam PF00810) domain at the N-terminus.
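As a rough illustration of how a protein's molecular weight follows from its sequence, the sketch below sums standard average residue masses and adds one water molecule for the free termini. This is a generic calculation, not necessarily the tool used in this study, and the function name is hypothetical.

```python
# Average masses (Da) of amino acid residues, i.e. amino acid minus water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # Da, accounts for the N-terminal H and C-terminal OH


def molecular_weight_kda(sequence):
    """Return the average molecular weight of a protein sequence in kDa.

    Raises KeyError for any character outside the 20 standard amino acids.
    """
    residues = sequence.upper()
    mass_da = sum(AVERAGE_RESIDUE_MASS[aa] for aa in residues) + WATER_MASS
    return mass_da / 1000.0
```

Applied to a full-length protein sequence, the function returns a value directly comparable to the kDa figures quoted above.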
The theoretical pI values for XTHs ranged from 5.06 to 9.58 in B. rapa and from 4.96 to 9.75 in B. oleracea, reflecting differences in the polarities of the amino acids making up these proteins. The numbers of introns in XTH genes were relatively similar in the two species; 86.8% of BraXTH genes and 89.5% of BolXTH genes had 2-3 introns, of which 24 BraXTHs and 19 BolXTHs had 3 introns, and 22 BraXTHs and 15 BolXTHs had 2 introns. BraA.XTH3 had the largest number of introns (7), while BolC.XTH29.b lacked introns.
The Plant-mPLoc server (http://www.csbio.sjtu.edu.cn/bioinf/plant-multi/) was used to predict the subcellular location of BraXTH and BolXTH proteins. The results showed that all XTH proteins were located in the cell wall. In addition to the cell wall, 20 BraXTHs and 12 BolXTHs were also predicted to localize in the cytoplasm. BraA.XTH3 was predicted to be located in both the cell wall and the endoplasmic reticulum. These XTH localizations are bioinformatic predictions only, and the actual localizations remain to be confirmed by experimental evidence. The signal peptide prediction results indicated that 46 BraXTHs and 33 BolXTHs had signal peptides.
Phylogenetic analysis of XTH proteins
To investigate the evolutionary relationships among different XTH gene family members, we used the full-length XTH protein sequences from B. rapa, B. oleracea and A. thaliana to generate a phylogenetic tree based on the Maximum Likelihood method, using a structurally characterized bacterial lichenase (1GBG, EC 3.2.1.73) as an outgroup (Fig. 1, Additional file 3). Three groups (Early diverging group, Group I/II and Group III) were identified based on clade support values, the topology of the phylogenetic tree, and the previous classification of XTH families in Arabidopsis [6,13]. So far, XEH activity has only been reported in clade IIIA [9,42]. The Early diverging group, close to the root, was the smallest group, containing 12 members. There were 11 XTHs in Group IIIA and 20 in Group IIIB. The rest of the XTHs belonged to Group I/II, which included 22 AtXTHs, 35 BraXTHs and 23 BolXTHs. As Fig. 1 shows, XTHs from B. rapa and B. oleracea clustered with their A. thaliana homologs. There were 41 sister pairs at the termini of phylogenetic tree branches that showed close relationships, and 30 of these were orthologous pairs between the B. rapa genome and the B. oleracea genome.
Structure of XTH genes, pattern of motifs and structure-based sequence alignment in XTH proteins
Previous studies showed that the exon organization in Arabidopsis XTH genes is well conserved within each group [13,43]. To better characterize the structural conservation and diversification of XTH genes during their evolution, the exon-intron organization of the coding sequence of each XTH gene was obtained for members of each group. Each XTH protein in the two species had a Glyco_hydro_16 domain and an XET_C domain. As shown in Fig. 2B, 2C and Fig. 3B, 3C, the Glyco_hydro_16 domain spanned the motif sequence 6-4-3-1-2-8, though some proteins lacked one or more of these motifs. Four BraXTHs and 9 BolXTHs, including 7 newly identified XTHs, are shorter than 250 amino acids owing to the deletion of 1 to 4 motifs from the Glyco_hydro_16 domain (Fig. 2, 3). The XET_C domain mainly covered motifs 5 and 9. Fifteen BraXTHs and 10 BolXTHs also shared motif 10, forming the block 10-5-9. In six BraXTHs and 7 BolXTHs, motif 9 was replaced by motif 7, forming a new tandem motif pattern (motifs 5-7 in tandem). Overall, motifs had a similar distribution within the same group.
With the exception of XTH26, all genes in Group I contained 1-2 introns. Apart from XTH8, all genes in Group II contained 3 introns. All Group III genes in the two species had 3 introns, with the exceptions of BolC.XTH29.a and BolC.XTH29.b. Generally, the motif patterns in different XTH proteins showed only small differences, and the genes that clustered in the same group showed similar patterns of gene structure.
The alignments of the XTHs together with PttXET16A (PDB id: 1UN1), a xyloglucan endotransglycosylase with known protein structure [44,45], were used to predict the secondary structures of the BraXTH and BolXTH proteins using ESPript (http://espript.ibcp.fr/ESPript/ESPript/) (Additional files 4 and 5). The position of the N-glycosylation site is conserved between PttXET16A and BobXET16A, both of which have known protein structures [44-46]. The site was also conserved in 46 BraXTHs and 28 BolXTHs, but it was not found in 7 BraXTHs and 10 BolXTHs: BolC.XTH31, BolC.XTH32.a, BolC.XTH33, the 7 BolXTHs that lacked the EXDXE conserved active-site motif (BolC.XTH8, BolC.XTH20, BolC.XTH27.b, BolC.XTH29.a/b and BolC.XTH32.b/c), BraA.XTH2.a, BraA.XTH31.a/b, BraA.XTH32.a/b/c and BraA.XTH33 (Additional files 4 and 5). Alterations of amino acid residues were found within this catalytic region in AtXTH11 and its homologs. In AtXTH11, EXDXE was replaced by ELCFQ, while it was replaced by GLCFQ in BraA.XTH11.b and BolC.XTH11.a/b, and by QLCFQ in BraA.XTH11.a. Although a Pfam database search showed that the XTH proteins identified in this study contained the two characteristic conserved domains (Glyco_hydro_16 and XET_C), some XTHs lacked one or several α-helices and/or β-strands compared with PttXET16A. Comparative analysis showed that motif 6 covered α1-helices and the β1-β2 strands; motif 4 covered β4 and parts of β3 and β5; motif 3 covered β6 and part of β5; motif 1 covered β7-8; motif 2 covered β9-12; motif 8 covered β13-14; and motif 5 covered α1 and β15. There is no uniform correspondence between motifs and α-helices and/or β-strands.
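Screening a sequence for the EXDXE active-site motif can be sketched with a regular expression in which X stands for any residue. This is a minimal illustration rather than the alignment-based procedure used here; the function name is hypothetical and the short fragments in the usage note are toy sequences, not the study's data.

```python
import re

# EXDXE: Glu, any residue, Asp, any residue, Glu.
CATALYTIC_MOTIF = re.compile(r"E.D.E")


def find_catalytic_motif(sequence):
    """Return (0-based start, matched residues) of the first EXDXE match, or None."""
    match = CATALYTIC_MOTIF.search(sequence.upper())
    if match is None:
        return None
    return match.start(), match.group()
```

A toy fragment such as `"HDEIDFEFLG"` matches at position 2 with `EIDFE`, whereas fragments carrying the substitutions described above (ELCFQ, GLCFQ or QLCFQ) return `None`.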
Chromosomal distribution and duplication analysis of XTH genes
The chromosomal locations of all XTH genes in both Brassica species were investigated based on their physical positions and are shown in Fig. 4. Excluding BraA.XTH10, which was positioned on a scaffold, the remaining fifty-two BraXTH genes had definite chromosomal locations; mapping onto the different chromosomes was uneven. Chromosome Ar03 in B. rapa carried the greatest number of genes (13), while Ar04 carried only one XTH gene. In B. oleracea, there were 34 XTH genes with definite locations and they were distributed among all chromosomes excluding chromosome Co06. Chromosome Co01 was a “hot region”, carrying the greatest number of genes (8); in contrast Co04 and Co05 each contained only one XTH gene. Incomplete genome assembly meant that definite chromosomal locations were not available for five XTHs: BraA.XTH10, BolC.XTH2, BolC.XTH27.b, BolC.XTH30.a and BolC.XTH32.c.
TD events contribute to the expansion of gene families and can produce tandemly repeated genes in clusters [47]. We obtained putative tandemly duplicated XTH genes of the two Brassica species from PTGBase. As a result, 15 BraXTH genes and 8 BolXTH genes were found to be present in tandem arrays, representing 28.3% and 21.1% of the total XTH genes in B. rapa and B. oleracea respectively. These tandemly repeated genes were clustered, which was consistent with their chromosomal locations (Fig. 4). Seven tandem arrays were identified on chromosomes Ar01, Ar02, Ar03, Ar08 and Ar10 in B. rapa. Protein BLAST analysis revealed that BraA.XTH17.a is 93% identical to BraA.XTH17.b, BraA.XTH22C is 99% identical to BraA.XTH22.c or BraA.XTH22.d, and BraA.XTH22.a is 100% identical to BraA.XTH22.d over 75% coverage. The identity of the other tandem gene pairs varied from 55% to 68%. Four tandem arrays occurred on Co01, Co02, Co03 and Co07 in B. oleracea, with 58% to 84% identity between tandem gene pairs.
In A. thaliana, four tandemly duplicated gene arrays composed of nine AtXTHs were found (Fig.4). Tandem arrays including AtXTH1/2, AtXTH23/14 and AtXTH24/18/19 were located on chromosome At04 while AtXTH12/13/25/22 was on chromosome At05. It is worth mentioning that some genes that bear syntenic relationships to these tandem genes, though not AtXTH1/2, have a conserved tandem repeat pattern in both the B. rapa genome and the B. oleracea genome, suggesting that these tandem arrays arose before the divergence of A. thaliana and the Brassica ancestor.
Syntenic analyses of XTH genes
The ancestor of diploid Brassica species experienced a WGT event after its divergence from the Arabidopsis lineage. Syntenic genes are orthologous genes located in fragments that are syntenic between different species and derive from a shared ancestor, and synteny analysis can be used to transfer gene annotations and investigate genomic evolution in related species [48]. We obtained the genes syntenic with the XTH genes of Arabidopsis for the two Brassica species by searching for ‘syntenic gene’ in BRAD (Additional file 6). According to comparative genomic analysis, gene density and expression level differ between regions of the B. rapa and B. oleracea genomes, each of which can be divided into three fractionated subgenomes, denoted LF (Least-fractionated), MF1 (Medium-fractionated), and MF2 (Most-fractionated) according to the extent of gene retention [41,49]. Statistical analysis indicated that there were 13, 13, and 6 BraXTH genes and 9, 10, and 5 BolXTH genes located in the LF, MF1 and MF2 subgenomes respectively (Additional file 6). In summary, 60.4% and 63.2% of the total XTH genes in, respectively, B. rapa and B. oleracea were located in syntenic blocks. WGD events are therefore likely to have played a major role in the expansion of XTH genes in the two Brassica species. The identities of 75% (24 out of 32) of BraXTHs and 62.5% (15 out of 24) of BolXTHs with their Arabidopsis syntenic orthologs exceeded 80% (Additional file 6).
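The percentages quoted above follow directly from the subgenome counts and the family sizes of 53 BraXTHs and 38 BolXTHs. The sketch below reproduces that arithmetic; the function and variable names are illustrative only.

```python
def syntenic_fraction(counts_by_subgenome, total_genes):
    """Return (number of syntenic genes, percentage of the family, 1 decimal)."""
    syntenic = sum(counts_by_subgenome.values())
    return syntenic, round(100.0 * syntenic / total_genes, 1)


# Counts per subgenome as reported in the text.
bra_counts = {"LF": 13, "MF1": 13, "MF2": 6}
bol_counts = {"LF": 9, "MF1": 10, "MF2": 5}

bra_syntenic, bra_percent = syntenic_fraction(bra_counts, total_genes=53)
bol_syntenic, bol_percent = syntenic_fraction(bol_counts, total_genes=38)
```

The same helper applies to the identity comparison: 24 of 32 and 15 of 24 syntenic genes give 75% and 62.5%.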
A total of 23 AtXTH genes had corresponding syntenic genes in the two Brassica species. The copy numbers of syntenic genes in the genomes of the two Brassica species differed. In the first case, genes syntenic with AtXTH genes were completely preserved in the same syntenic block in both the Ar and Co genomes; 8 genes were of this type. In the second case, AtXTH genes were retained in the Ar genome but lost from the Co genome; this applied to AtXTH3 and AtXTH5. In the third case, AtXTH genes had more than one syntenic gene in B. rapa or B. oleracea. For example, 8 and 1 AtXTH genes had 3 syntenic genes in B. rapa and B. oleracea respectively. Each AtXTH should theoretically correspond to 3 syntenic genes; fewer than 3 may be the result of gene loss after genome triplication.
Selection forces acting on XTH duplicated pairs
To assess whether XTH duplicated pairs in Brassica species experienced different selective forces, Ka/Ks values were calculated (Additional file 7). A Ka/Ks ratio > 1 represents positive selection, Ka/Ks = 1 represents neutral evolution and a Ka/Ks ratio < 1 represents purifying selection [50]. We found 33 and 18 segmentally duplicated XTH gene pairs in B. rapa and B. oleracea respectively. All segmentally duplicated XTH gene pairs had Ka/Ks < 1, while two tandemly duplicated gene pairs in B. rapa (BraA.XTH22.a-BraA.XTH22.d and BraA.XTH22.c-BraA.XTH22.d) had no Ka/Ks value because they shared the same sequence.
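The interpretation rule for Ka/Ks can be written as a small helper. This is a sketch of the thresholds stated above, not of the software used to estimate Ka and Ks; the function name and the tolerance parameter are assumptions.

```python
def classify_selection(ka, ks, tol=1e-9):
    """Interpret a Ka/Ks ratio for one duplicated gene pair.

    Returns "positive", "neutral" or "purifying", or None when Ks is zero
    and the ratio is undefined (for example, for identical sequences).
    """
    if ks == 0:
        return None
    ratio = ka / ks
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "purifying"
```

Returning `None` for Ks = 0 mirrors the two tandem pairs above that share the same sequence and therefore have no Ka/Ks value.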
The segmental duplications of the XTH genes in B. rapa originated between 0.34 Mya (Ks = 0.0103) and 28.80 Mya (Ks = 0.8640), with a mean of 12.88 Mya (Ks = 0.1436). The segmental duplications of the BolXTH genes originated between 5.37 Mya (Ks = 0.1612) and 32.12 Mya (Ks = 0.9637), with a mean of 13.20 Mya (Ks = 0.3960). Overall, the Ka/Ks ratios for the segmental duplication pairs BolC.XTH11.b and BolC.XTH11.a, BraA.XTH2.b and BraA.XTH2.a, and BraA.XTH23.a and BraA.XTH23.b were >0.3, while the ratios for the other segmental duplication pairs were all <0.3, suggesting that significant functional divergence of some XTH genes might have occurred after the duplication events.
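Duplication dates are commonly derived from Ks as T = Ks / (2λ). The sketch below assumes λ = 1.5 × 10^-8 synonymous substitutions per site per year, a rate often used for Brassicaceae; this value is not stated in this section, but it is consistent with the minimum and maximum dates quoted above. The function name is hypothetical.

```python
# Assumed rate: synonymous substitutions per site per year (not given in the text).
SUBSTITUTION_RATE = 1.5e-8


def divergence_time_mya(ks, rate=SUBSTITUTION_RATE):
    """Estimate the age of a duplication in million years ago: T = Ks / (2 * rate)."""
    years = ks / (2.0 * rate)
    return years / 1e6
```

With this rate, Ks = 0.8640 corresponds to about 28.80 Mya and Ks = 0.9637 to about 32.12 Mya.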
Expression patterns of XTH genes in different tissues of B. rapa and B. oleracea
To understand the variation in expression patterns of XTH genes, we analyzed XTH gene expression across different tissues in the two Brassica species based on RNA-Seq data retrieved from the GEO database (Additional file 8). A gene with an FPKM of less than 1 was considered unexpressed in this study; such genes included BraA.XTH2.a/b, BraA.XTH5.b, BraA.XTH11.a, BraA.XTH12.a/b/c, BraA.XTH25.a/b, BolC.XTH5, BolC.XTH11.a, BolC.XTH20, BolC.XTH21, BolC.XTH22.b, BolC.XTH24.c, BolC.XTH25 and BolC.XTH26. In addition, BolC.XTH12 and BolC.XTH13 lacked FPKM values. On this basis, 44 BraXTH genes and 28 BolXTH genes were expressed in at least one tissue, while the remaining genes lacked expression data or were unexpressed in all the tissues tested, indicating that they might be non-functional or have specific temporal and spatial expression patterns that were not detected in this study. A total of 23 out of 53 (approximately 43.4%) BraXTH genes and 14 out of 38 (approximately 36.8%) BolXTH genes were expressed in all the tissues tested (root, stem, leaf, flower, silique and callus of B. rapa; root, stem, leaf, flower, silique, callus and bud of B. oleracea). The remaining 21 BraXTH genes and 14 BolXTH genes were expressed in at least one but not all of the tested tissues. For example, BraA.XTH29.a and BraA.XTH29.b were expressed specifically in the flower; BraA.XTH10, BraA.XTH17.c, BraA.XTH17.d and BraA.XTH32.b were expressed in all tissues except callus. BolC.XTH2 was expressed solely in the silique and BolC.XTH29.b was expressed only in buds, at low levels.
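The classification used above (FPKM below 1 counted as unexpressed, and genes split into those expressed in all tissues versus only some) can be sketched as follows. The function name, category labels and the FPKM values in the usage note are illustrative assumptions, not data from Additional file 8.

```python
def classify_expression(fpkm_by_tissue, threshold=1.0):
    """Classify one gene from its per-tissue FPKM values.

    fpkm_by_tissue -- {tissue name: FPKM}; an empty dict means no data.
    A tissue counts as expressing the gene when FPKM >= threshold.
    """
    if not fpkm_by_tissue:
        return "no data"
    expressed = [t for t, value in fpkm_by_tissue.items() if value >= threshold]
    if not expressed:
        return "unexpressed"
    if len(expressed) == len(fpkm_by_tissue):
        return "all tissues"
    return "subset of tissues"
```

For instance, a gene with made-up values `{"root": 0.2, "flower": 14.0}` is classified as `"subset of tissues"`.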
Clustering analysis of expression values showed that both the B. rapa and the B. oleracea XTH genes can be divided into four clusters (Fig. 5). In B. rapa, XTH genes in cluster 1 were more highly expressed in the leaf than in the other tissues examined, while those in cluster 2 were expressed mainly in the root, apart from BraA.XTH32.b and BraA.XTH9.b. Cluster 3 showed higher expression in callus and cluster 4 was expressed mainly in flower, silique or callus. In B. oleracea, XTH genes in cluster 1 were highly expressed in the root, whereas those in cluster 2 were expressed mainly in the flower. Four genes in cluster 3 were expressed mainly in the stem or leaf and genes in cluster 4 were expressed mainly in the leaf, silique or callus. XTH genes in the same phylogenetic group did not show the same expression patterns.
Some tandemly repeated family members, such as BraA.XTH22.a and BraA.XTH22.c in cluster 1, showed similar expression patterns across the tissues tested, indicating possible redundancy (Fig. 5A). However, most tandemly repeated members displayed distinct expression patterns. For example, BolC.XTH24.a and BolC.XTH24.b showed higher expression levels in the flower than in the other tissues, whereas their tandem repeats, BolC.XTH24.c and BolC.XTH20, were not expressed in these tissues. BolC.XTH17.a showed high expression in the root and low expression in the bud, leaf and silique, while BolC.XTH24.d showed high expression in the flower and low expression in the leaf (Fig. 5, Additional file 8). All XTH tandem genes in the seven arrays in B. rapa were also analyzed and compared. Two pairs of tandem genes (BraA.22b/e and BraA.14b/23b) showed different abundances but the same trend in expression pattern, whereas the two members of each of the other pairs of tandem genes differed in both abundance and tissue specificity of expression. In general, XTH genes in the two Brassica species exhibit differential patterns of expression across different tissues, leading to different functional clusters and suggesting functional divergence.