Identification, physicochemical properties and chromosomal locations of GhMCTPs
AtFTIP1 is one of the well-researched MCTPs in Arabidopsis [25]. Its protein sequence was used as the query to search against the protein database of G. hirsutum for putative GhMCTPs. After confirming the protein domains of the BLAST hits in SMART database, we identified 31 GhMCTPs, each of which contained three to four C2 domains in their N-terminus and one to four transmembrane regions in their C-terminus. The putative GhMCTPs were numbered from 1 to 18 according to their sequence similarity to AtFTIP1 with syntenic GhMCTPs given the same number and a distinct subgenome letter (A or D) (Figure 5). These GhMCTPs were classified into five subfamilies based on phylogenetic analysis and previous classification of AtMCTPs [7], while both subfamily III and subfamily V were divided into a and b subclades (Figure 1A). The lengths of GhMCTPs protein sequences ranged from 730 (GhMCTP11_D10) to 1059 (GhMCTP14_D07) amino acids (aa). Correspondingly, GhMCTP11_D10 and GhMCTP14_D07 had the minimum and maximum molecular weight, respectively. The pI and Grand average of hydropathicity (GRAVY) of GhMCTPs ranged from 5.81 to 9.38 and -0.445 to -0.075, respectively (Figure 1A). All the GhMCTPs within the same subfamilies showed distinct GRAVYs, especially GhMCTPs within subfamily III, IV and V. GhMCTPs from subfamily V showed the lowest pIs that were less than 7, indicating their acidic nature and distinct molecular roles from other GhMCTPs. Notably, GhMCTPs within subfamily I and II showed similar pIs, whereas GhMCTPs within subfamily III, IV and V showed different pIs (Figure 1A), suggesting that GhMCTPs within different subfamilies had experienced different divergences during their evolution.
Thirty one GhMCTPs were unevenly distributed on 18 chromosomes, while the other 8 chromosomes didn’t contain any GhMCTPs. Most of the chromosomes contained 1-2 GhMCTPs, while both A08 and D08 contained 4 GhMCTPs. In addition, A subgenome contained more GhMCTPs than D subgenome (Figure 1B).
Phylogenetic analysis of MCTPs in twenty seven plant species
To understand the evolutionary relationships among MCTPs in plants, MCTP homologs in D. carota (15), C. canephora (13), S. lycopersicum (15), M. guttatus (14), V. vinifera (3), M. truncatula (17), G. max (27), P. persica (14), C. sativus (11), P. trichocarpa (21), G. arboreum (16), G. barbadense (29), G. raimondii (17), T. cacao (12), C. papaya (9), A. thaliana (16), B. rapa (18), O. sativa (11), S. bicolor (13), Z. mays (17), Z. marina (9), A. trichopoda (6), P. abies (4), S. moellendorffii (4), P. patens (6), C. reinhardtii (0) were identified with the same method used in GhMCTPs’ identification (Figure 2). AtMCTPs identified in our study were identical to those identified in the previous study [7]. There was no MCTP identified in chlorophytes (C. reinhardtii), suggesting that MCTPs began to form and evolve in terrestrial bryophytes, pteridophytes, gymnosperms and angiosperms (Figure 2). Different angiosperms had experienced different rounds of whole genome duplications (WGD) [36]. However, MCTP numbers in species that had experienced more WGDs didn’t increase correspondingly compared with MCTP numbers in their close relatives, such as 16 MCTPs in G. arboreum, 17 MCTPs in G. raimondii compared with 12 MCTPs in T. cacao and 18 MCTPs in B. rapa compared with 16 MCTPs in A. thaliana (Figure 2). In addition, MCTPs in two AtDt allotetraploids, G. hirsutum and G. barbadense, were less than the sum of MCTPs in D-genome G. raimondii and MCTPs in A-genome G. arboreum (Figure 2). These results suggested that MCTPs experienced gene loss after whole genome duplications. Phylogenetic analysis of 368 MCTPs in 26 plant species classified them into subfamily I-V and one outgroup with 53, 58, 123, 44, 80 and 10 members, respectively. Both subfamily III and subfamily V were divided into a and b subclades. MCTPs within subfamily III, IV and V were more divergent than those within subfamily I and II, which was consistent with different pIs and GRAVYs of GhMCTPs within subfamily III, IV and V (Figure 1A, 2 and Additional file 1: Figure S1). Six MCTPs in bryophytes (P. patens) and four MCTPs in pteridophytes (S. moellendorffii) were classified into outgroup, which was consistent with the previous classification [7]. It was noteworthy that MCTPs from subfamily V, III and I, II began to evolve in gymnosperms (P. abies) and early angiosperms (A. trichopoda), respectively, while MCTPs from subfamily IV began to evolve in dicots (Figure 2). Unexpectedly, there were only 2 MCTPs from subfamily V and 1 MCTP from subfamily III identified in V. vinifera (a dicot). These results indicated that the chronological order of MCTPs’ evolution might be outgroup, subfamily V, III, I, II and IV.
Evolution of intron numbers in MCTPs
To better understand the evolution of MCTPs in plant species, the intron numbers of 368 MCTPs identified in 26 plant species were comparatively analyzed. In bryophytes (P. patens), all the MCTPs (6) contained more than 10 introns. In pteridophytes (S. moellendorffii), two MCTPs contained 1-3 introns, while another two MCTPs were intronless. In gymnosperms and angiosperms, except that all the MCTPs (3) in V. vinifera contained 1-3 introns, ratios of intronless MCTPs in different species diverged significantly, ranging from 0.64 to 1.00 (Figure 3). These results suggested that MCTPs had experienced drastic intron loss during the speciation of early spermatophytes and the genesis of intron-containing and intronless MCTPs were species-specific during the evolution of spermatophytes. Noteworthily, higher ratios of MCTPs from subfamily III (0.19), IV (0.20) and V (0.19) contained introns than MCTPs from subfamily I (0.17) and II (0.07) (Figure 3), suggesting that not only the protein sequences but also the gene structures of MCTPs within subfamily III, IV and V were more divergent than those of MCTPs within subfamily I and II.
Domain architectures and conserved motifs of GhMCTPs
The conserved domains of GhMCTPs were obtained by searching against the SMART database (Additional file 2: Table S1) and six conserved motifs of GhMCTPs were found using MEME. To further investigate the conservation and diversification of GhMCTPs, the featured domains, 3-4 N-terminal C2 domains and 1-4 C-terminal transmembrane regions, and conserved motifs of GhMCTPs were demonstrated on the phylogenetic tree. All the GhMCTPs from subfamily I, II and IV contained 3 N-terminal C2 domains, whereas most members from subfamily III and V contained 4 N-terminal C2 domains, except GhMCTP7_A08, GhMCTP10_A07, GhMCTP16_D11 and GhMCTP17_D13. Members from subfamily I, II, IV (except GhMCTP13_A01) and V contained 4, 3, 2 and 2 C-terminal transmembrane regions, respectively, whereas members from subfamily III contained 1-4 C-terminal transmembrane regions. (Figure 4B). The transmembrane regions of GhMCTPs were confirmed by TMHMM program (Additional file 3: Figure S2). The different domain architectures of GhMCTPs from different subfamilies hinted their divergent roles in cotton growth and development. However, GhMCTPs within subfamily I and II had similar domain architectures, indicating their functional similarity, while GhMCTPs within subfamily III, IV and V showed relatively divergent domain architectures, which was consistent with their divergent pIs and GRAVYs.
Six conserved motifs were detected in most GhMCTPs, while GhMCTP8_D11 and GhMCTP11_D10 contained five conserved motifs. For most GhMCTPs, motif 1, 2 and partial motif 6 were detected in the end of N-terminus which was the corresponding region of the last C2 domain, while motif 3, 4, 5 and partial motif 6 were detected in the C-terminus. However, no conserved motifs were detected in the most regions of N-terminus (Figure 4C), suggesting that the last C2 domain and transmembrane regions were more conserved than the other C2 domains, whose divergence might contribute to the structural and functional diversification of GhMCTPs.
Orthologous GhMCTPs between A and D subgenome of G. hirsutum
To determine whether GhMCTPs from A and D subgenome exhibited functional divergence, we identified 13 syntenic pairs of homologous GhMCTPs between A and D subgenome of G. hirsutum and all these syntenic pairs were located on the similar positions of homologous chromosomes between A and D subgenome, except that GhMCTP12_A03 and GhMCTP12_D02 were located on the A03 and D02, respectively (Figure 5), which might be due to the large reciprocal translocation between A02 and A03 [37]. The synonymous distances (Ks values) between these detected syntenic pairs, partially representing sequence divergence between the two progenitor genomes (A genome and D genome) that formed G. hirsutum, ranged from 0.032 to 0.119. According to the Ks values, the divergence times of these syntenic GhMCTPs were estimated to be 6.20-22.84 million years ago (MYA), with an average of 12.6 MYA (Table 1). In addition, 13 and 14 syntenic pairs of homologous MCTPs found in G. barbadense, G. raimondii and G. arboreum showed similar ranges of Ks values and divergence times to those in G. hirsutum (Additional file 4: Figure S3 and Additional file 5: Table S2), which were wider than the previously estimated divergence time (6.2-7.1 MYA) of A and D progenitor genomes [34]. The Ka/Ks ratios between all the syntenic MCTPs were less than 1.0, implying that these syntenic MCTPs experienced purifying selection during the divergence of the two progenitor genomes and might perform similar functions.
Spatiotemporal expression patterns of GhMCTPs
The previously published transcriptome datasets of G. hirsutum (TM-1) were used to profile the expression of GhMCTPs in various tissues, including anther, filament, pistil, bract, sepal, petal, torus, root, leaf, stem, fibers and ovules at different developmental stages [34]. GhMCTPs from subfamily II were highly expressed in most tissues, especially in ovules at different developmental stages. GhMCTP7_A08, GhMCTP7_D08 from subclade IIIb and all the members from subclade Vb also showed high expression levels in most tissues (Figure 6), suggesting their constitutive roles in the development of various tissues. However, GhMCTPs from other subfamilies were only highly expressed in specific tissues. GhMCTP5_D08, GhMCTP5_A08, GhMCTP10_D07 and GhMCTP10_A07 from subfamily III were highly expressed in early developmental fibers and ovules at different developmental stages. GhMCTP11_A10, GhMCTP11_D10 from subfamily IV also showed specific expression in early developmental fibers and ovules at different developmental stages (Figure 6), suggesting their important roles in ovule and fiber development. These results revealed that GhMCTPs from different subfamilies had different expression patterns and might be involved in different biological processes.
Physicochemically different N-terminus and C-terminus of GhMCTPs
Since the N-terminus and C-terminus of GhMCTPs contained structurally and functionally different domains, which might be reflected by their physicochemical properties, we further analyzed the pIs and GRAVYs of the N-terminus and C-terminus of GhMCTPs (Additional file 6: Table S3). Both pIs and GRAVYs of full-length GhMCTPs were between those of N-terminus and C-terminus, and the C-terminus possessed higher pIs and GRAVYs than the N-terminus. Interestingly, the pIs of the C-terminus were almost invariable among all the GhMCTPs, while the pIs of the N-terminus varied significantly among GhMCTPs from different subfamilies and GhMCTPs within subfamily III and IV (Figure 7), suggesting that the N-terminus was more variable than the C-terminus and might be the main source of functional divergence of GhMCTPs. However, both the N-terminus and the C-terminus showed significantly different GRAVYs among GhMCTPs within the same subfamilies (Figure 7).
Evolutionary divergence of multiple C2 domains in the N-terminus of GhMCTPs
Since 3-4 C2 domains were contained in the N-terminus of GhMCTPs and showed great difference in protein sequences and physicochemical properties among GhMCTPs, we queried whether the 3-4 C2 domains contained in each of the GhMCTPs had different evolutionary histories or molecular roles and which C2 domain was more divergent among GhMCTPs than the other C2 domains. Four C2 domains of 4-C2-containing GhMCTPs and three C2 domains of 3-C2-containing GhMCTPs were designated as 4aC2, 4bC2, 4cC2, 4dC2 and 3aC2, 3bC2, 3cC2, respectively. The protein sequences of 107 C2 domains contained in 31 GhMCTPs (Additional file 2: Table S1) were used to construct the phylogenetic tree, which classified these C2 domains into subclade I-IV. Consistent with the multiple sequence alignment of the full-length GhMCTPs, in which the 4bC2, 4cC2 and 4dC2 of 4-C2-containing GhMCTPs were aligned with the 3aC2, 3bC2, 3cC2 of 3-C2-containing GhMCTPs, respectively (Additional file 7: Figure S4), the corresponding C2 domains of 4-C2-containing and 3-C2-containing GhMCTPs were classified into the same subclades. In addition, the C2 domains within subclade II and III exhibited larger sequence divergence than those within subclade I and IV (Figure 8). These results suggested that the 3-4 C2 domains contained in the GhMCTPs began to diverge before the formation of GhMCTPs probably through module exchange and fulfilled different functions in the multidomain GhMCTPs. Moreover, the more divergent 4bC2, 3aC2 and 4cC2, 3bC2 within subclade II and III might be the main source of GhMCTPs’ functional diversification.
The N-terminus of GhMCTP2_A08 and GhMCTP3_A09 interacted with GhFT
The most widely researched MCTPs were FTIP1s [7, 25, 29, 30], which interacted with FTs to mediate their transport from leaves to SAM. We chose three GhMCTPs with different evolutionary distances, including GhMCTP2_A08 (subfamily I), GhMCTP3_A09 (subfamily II) and GhMCTP16_A11 (subfamily V), to detect their interactions with GhFT via yeast two-hybrid assay. All the three full-length GhMCTPs couldn’t interact with GhFT, whereas the N-terminus of GhMCTP2_A08, GhMCTP3_A09 and GhMCTP16_A11 showed strong, weak, and no interaction with GhFT, respectively (Figure 9). This was consistent with the interaction between FT and N-terminal C2 domains of FTIP1 in Arabidopsis, rice and orchid [7, 25, 29, 30]. The transmembrane regions in the C-terminus might hinder GhFT’s interaction with GhMCTP2_A08 and GhMCTP3_A09 in yeast cells. The results suggested that the N-terminal C2 domains of GhMCTPs played key roles in transporting other regulators by direct interaction.