Identification of SAC domain-containing proteins
An HMMER search was performed against the T. cacao, V. vinifera, G. hirsutum, G. raimondii and G. arboreum protein databases using the SAC domain (PF02383) as the query, which initially identified 6, 6, 29, 17 and 13 putative SAC genes, respectively. In parallel, all Arabidopsis SAC protein sequences were used as queries for TBLASTN searches. All candidate sequences were then checked for the presence of a SAC domain with the InterPro online tool. Ultimately, 6, 6, 24, 12 and 12 SAC domain-containing proteins were identified in the five genomes, respectively. All SAC genes in G. hirsutum were designated GhSAC and named according to their closest orthologues in Arabidopsis [44]. The accession numbers, chromosomal locations, gene and coding sequence lengths, and protein lengths of the GhSAC genes are listed in Table 1. Compared with the three closely related species, the SAC gene family in G. hirsutum showed an obvious expansion in gene number.
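The search-and-filter steps above can be sketched in Python. This is an illustrative sketch only, not the authors' pipeline: it assumes the HMMER hits come from `hmmsearch --tblout` output (whitespace-separated columns, `#` comment lines, full-sequence E-value in the fifth column), that an E-value cutoff of 1e-5 is applied (no threshold is stated in the text), and that HMMER and TBLASTN candidates are pooled before the InterPro domain check. The helper names and protein IDs are hypothetical.

```python
# Hypothetical helpers; the 1e-5 cutoff is an assumed value, not taken from the paper.
def parse_tblout(text: str, max_evalue: float = 1e-5) -> set[str]:
    """Return target names from `hmmsearch --tblout` text that pass the E-value cutoff."""
    hits = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the commented header
        fields = line.split()
        target, full_seq_evalue = fields[0], float(fields[4])  # column 5 = full-sequence E-value
        if full_seq_evalue <= max_evalue:
            hits.add(target)
    return hits


def pool_candidates(hmm_hits: set[str], tblastn_hits: set[str]) -> set[str]:
    """Pool both searches; every pooled candidate still needs a SAC-domain check (e.g. InterPro)."""
    return hmm_hits | tblastn_hits


toy_tblout = (
    "# target name  accession  query name  accession  E-value  score  bias\n"
    "protA  -  SAC  PF02383  1.2e-80  270.1  0.0\n"
    "protB  -  SAC  PF02383  0.5  10.0  0.1\n"
)
print(sorted(pool_candidates(parse_tblout(toy_tblout), {"protC"})))
```

Pooling with a set union keeps a candidate found by either search, so the domain check, rather than either search alone, makes the final call.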
Table 1
Nomenclature of SAC genes
Gene name | Chromosome | Start | End | Gene length (bp) | Gene ID | Protein length (aa) | CDS length (bp) | Strand |
GhSAC1.1A | A02 | 34096061 | 34112341 | 16280 | GH_A02G1007.1 | 908 | 2727 | + |
GhSAC1.1D | D02 | 24480105 | 24496398 | 16293 | GH_D02G1055.1 | 908 | 2727 | + |
GhSAC2.1A | A05 | 12818773 | 12826690 | 7917 | GH_A05G1394.1 | 812 | 2439 | + |
GhSAC2.2A | A06 | 125121473 | 125129160 | 7687 | GH_A06G2227.1 | 799 | 2400 | - |
GhSAC2.3A | A10 | 5084206 | 5091417 | 7211 | GH_A10G0522.1 | 807 | 2424 | + |
GhSAC2.1D | D05 | 11741598 | 11749565 | 7967 | GH_D05G1409.1 | 812 | 2439 | + |
GhSAC2.2D | D06 | 64016931 | 64024566 | 7635 | GH_D06G2261.1 | 799 | 2400 | - |
GhSAC2.3D | D10 | 4755877 | 4763002 | 7125 | GH_D10G0550.1 | 807 | 2424 | + |
GhSAC3.1A | A06 | 22413672 | 22420686 | 7014 | GH_A06G0875.1 | 827 | 2484 | - |
GhSAC3.1D | D06 | 15193386 | 15200340 | 6954 | GH_D06G0859.1 | 827 | 2484 | - |
GhSAC4.1A | A07 | 1784997 | 1791282 | 6285 | GH_A07G0178.1 | 834 | 2505 | - |
GhSAC4.2A | A13 | 105624342 | 105631137 | 6795 | GH_A13G2182.1 | 828 | 2487 | + |
GhSAC4.1D | D07 | 1777332 | 1783572 | 6240 | GH_D07G0189.1 | 834 | 2505 | - |
GhSAC4.2D | D13 | 59684021 | 59690829 | 6808 | GH_D13G2164.1 | 828 | 2487 | + |
GhSAC6.1A | A10 | 19377649 | 19381844 | 4195 | GH_A10G0992.1 | 444 | 1335 | + |
GhSAC6.1D | D10 | 11132719 | 11138798 | 6079 | GH_D10G0964.1 | 599 | 1800 | - |
GhSAC7.1A | A02 | 108027042 | 108033476 | 6434 | GH_A02G2039.1 | 596 | 1791 | + |
GhSAC7.1D | D03 | 171041 | 177362 | 6321 | GH_D03G0023.1 | 596 | 1791 | - |
GhSAC8.1A | A04 | 76778183 | 76783215 | 5032 | GH_A04G1114.1 | 602 | 1809 | - |
GhSAC8.1D | D04 | 47908493 | 47913530 | 5037 | GH_D04G1457.1 | 628 | 1887 | - |
GhSAC9.1A | A02 | 642432 | 657397 | 14965 | GH_A02G0080.1 | 1930 | 5793 | + |
GhSAC9.2A | A09 | 79901119 | 79918983 | 17864 | GH_A09G2310.1 | 1630 | 4893 | + |
GhSAC9.1D | D02 | 688844 | 703749 | 14905 | GH_D02G0086.1 | 1927 | 5784 | + |
GhSAC9.2D | D09 | 49049613 | 49062591 | 12978 | GH_D09G2248.1 | 1630 | 4893 | + |
GaSAC1 | chr03 | 39716438 | 39732638 | 16200 | Ga03G1088.1 | 908 | 2727 | + |
GaSAC2.1 | chr05 | 12981653 | 12989776 | 8123 | Ga05G1465.1 | 812 | 2439 | + |
GaSAC2.2 | chr06 | 130666881 | 130674547 | 7666 | Ga06G2488.1 | 799 | 2400 | + |
GaSAC2.3 | chr10 | 124288924 | 124296135 | 7211 | Ga10G2534.1 | 809 | 2430 | - |
GaSAC3 | chr06 | 20549396 | 20556110 | 6714 | Ga06G0884.1 | 827 | 2484 | + |
GaSAC4.1 | chr07 | 2003435 | 2009716 | 6281 | Ga07G0186.1 | 834 | 2505 | - |
GaSAC4.2 | chr13 | 118864840 | 118871627 | 6787 | Ga13G2361.1 | 828 | 2487 | + |
GaSAC6 | chr10 | 108262526 | 108268588 | 6062 | Ga10G1985.1 | 599 | 1800 | - |
GaSAC7 | chr02 | 305956 | 312394 | 6438 | Ga02G0025.1 | 596 | 1791 | - |
GaSAC8 | chr04 | 11351705 | 11356764 | 5059 | Ga04G0609.1 | 622 | 1869 | + |
GaSAC9.1 | chr03 | 639823 | 657965 | 18142 | Ga03G0085.1 | 1939 | 5820 | + |
GaSAC9.2 | chr09 | 81728068 | 81745949 | 17881 | Ga09G2424.1 | 1630 | 4893 | + |
GrSAC1 | chr05 | 22714765 | 22731541 | 16776 | Gorai.005G115800.1 | 908 | 2727 | + |
GrSAC2.1 | chr09 | 10929781 | 10938648 | 8867 | Gorai.009G144100.1 | 883 | 2652 | + |
GrSAC2.2 | chr10 | 60633465 | 60641801 | 8336 | Gorai.010G235900.1 | 799 | 2400 | - |
GrSAC2.3 | chr11 | 4476638 | 4484614 | 7976 | Gorai.011G056600.1 | 811 | 2436 | + |
GrSAC3 | chr10 | 14796621 | 14804827 | 8206 | Gorai.010G092400.1 | 827 | 2484 | - |
GrSAC4.1 | chr01 | 1641103 | 1648424 | 7321 | Gorai.001G017800.1 | 834 | 2505 | - |
GrSAC4.2 | chr13 | 54202495 | 54209306 | 6811 | Gorai.013G222100.1 | 828 | 2487 | + |
GrSAC6 | chr11 | 10841283 | 10848063 | 6780 | Gorai.011G097800.1 | 599 | 1800 | - |
GrSAC7 | chr03 | 159037 | 166166 | 7129 | Gorai.003G002700.1 | 596 | 1791 | - |
GrSAC8 | chr12 | 26630443 | 26635722 | 5279 | Gorai.012G115300.1 | 605 | 1818 | - |
GrSAC9.1 | chr05 | 705299 | 715171 | 9872 | Gorai.005G010100.1 | 1611 | 4836 | + |
GrSAC9.2 | chr06 | 48157290 | 48171812 | 14522 | Gorai.006G232600.1 | 1630 | 4893 | + |
VviSAC1 | chr14 | 8378602 | 8409165 | 30563 | VIT_214s0081g00460.1 | 614 | 1845 | - |
VviSAC3 | chr09 | 414338 | 425305 | 10967 | VIT_209s0002g00590.1 | 850 | 2553 | + |
VviSAC4 | chr11 | 427858 | 442383 | 14525 | VIT_211s0016g00440.1 | 835 | 2508 | + |
VviSAC7 | chr04 | 20669057 | 20711649 | 42592 | VIT_204s0044g00030.1 | 599 | 1800 | + |
VviSAC8 | chr08 | 7811659 | 7821276 | 9617 | VIT_208s0105g00480.1 | 608 | 1827 | - |
VviSAC9 | chr05 | 24160942 | 24192208 | 31266 | VIT_205s0094g00850.1 | 1644 | 4935 | + |
TcSAC1 | Chr08 | 19053519 | 19066755 | 13236 | Thecc.08G188300.1 | 913 | 2739 | + |
TcSAC2 | Chr06 | 21734116 | 21743385 | 9269 | Thecc.06G127600.1 | 814 | 2442 | - |
TcSAC4 | Chr09 | 3580273 | 3588581 | 8308 | Thecc.09G070200.1 | 844 | 2532 | + |
TcSAC6 | Chr01 | 304575 | 312871 | 8296 | Thecc.01G006400.1 | 598 | 1794 | - |
TcSAC8 | Chr05 | 667593 | 673314 | 5721 | Thecc.05G013700.1 | 590 | 1770 | + |
TcSAC9 | Chr04 | 26260218 | 26276209 | 15991 | Thecc.04G163000.1 | 1672 | 5016 | - |
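The length columns of Table 1 can be cross-checked against each other. For the rows checked here, Gene length equals End − Start, and the CDS length equals (protein length + 1) × 3, which is consistent with the CDS including the stop codon. The sketch below checks a few rows copied from the table; the function name is illustrative.

```python
# A few rows copied from Table 1: (gene, start, end, gene_length_bp, protein_aa, cds_bp)
ROWS = [
    ("GhSAC1.1A", 34096061, 34112341, 16280, 908, 2727),
    ("GhSAC6.1A", 19377649, 19381844, 4195, 444, 1335),
    ("GhSAC9.2A", 79901119, 79918983, 17864, 1630, 4893),
    ("GrSAC9.1", 705299, 715171, 9872, 1611, 4836),
    ("VviSAC7", 20669057, 20711649, 42592, 599, 1800),
]


def check_row(row):
    """Return a list of inconsistencies for one Table 1 row (empty if consistent)."""
    gene, start, end, length, protein, cds = row
    problems = []
    if end - start != length:
        problems.append(f"{gene}: End - Start = {end - start}, table gives {length}")
    if (protein + 1) * 3 != cds:  # one extra codon for the stop codon
        problems.append(f"{gene}: (Protein + 1) x 3 = {(protein + 1) * 3}, table gives {cds}")
    return problems


all_problems = [p for row in ROWS for p in check_row(row)]
print(all_problems or "all checked rows consistent")
```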
Table 2
Cis-element analysis of GhSAC promoters
Gene | A | B | C | D | E | F | G | H | I | J |
GhSAC1.1A | 5 | 1 | 1 | 0 | 2 | 2 | 1 | 1 | 0 | 0 |
GhSAC1.1D | 4 | 3 | 1 | 0 | 1 | 2 | 1 | 1 | 1 | 0 |
GhSAC2.1A | 7 | 4 | 2 | 1 | 0 | 4 | 3 | 3 | 2 | 0 |
GhSAC2.2A | 4 | 0 | 0 | 0 | 0 | 0 | 5 | 5 | 0 | 0 |
GhSAC2.3A | 2 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 2 |
GhSAC2.1D | 6 | 4 | 0 | 1 | 0 | 5 | 3 | 3 | 1 | 2 |
GhSAC2.2D | 6 | 6 | 2 | 1 | 0 | 4 | 4 | 4 | 0 | 0 |
GhSAC2.3D | 4 | 1 | 0 | 1 | 1 | 1 | 2 | 2 | 0 | 2 |
GhSAC3.1A | 2 | 1 | 2 | 2 | 0 | 0 | 0 | 0 | 2 | 1 |
GhSAC3.1D | 3 | 4 | 5 | 2 | 1 | 2 | 1 | 1 | 2 | 0 |
GhSAC4.1A | 4 | 1 | 5 | 2 | 0 | 2 | 1 | 1 | 0 | 2 |
GhSAC4.2A | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 |
GhSAC4.1D | 6 | 4 | 5 | 1 | 0 | 3 | 1 | 1 | 0 | 1 |
GhSAC4.2D | 1 | 6 | 0 | 0 | 0 | 4 | 0 | 0 | 1 | 1 |
GhSAC6.1A | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 0 |
GhSAC6.1D | 2 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
GhSAC7.1A | 2 | 1 | 2 | 0 | 0 | 2 | 0 | 0 | 3 | 0 |
GhSAC7.1D | 2 | 2 | 2 | 0 | 0 | 3 | 0 | 0 | 2 | 0 |
GhSAC8.1A | 3 | 0 | 3 | 2 | 1 | 0 | 0 | 0 | 1 | 0 |
GhSAC8.1D | 4 | 0 | 3 | 3 | 2 | 0 | 0 | 0 | 1 | 0 |
GhSAC9.1A | 4 | 0 | 2 | 1 | 2 | 0 | 4 | 4 | 0 | 0 |
GhSAC9.2A | 3 | 3 | 4 | 0 | 0 | 2 | 1 | 1 | 0 | 0 |
GhSAC9.1D | 7 | 4 | 1 | 1 | 1 | 3 | 1 | 1 | 0 | 0 |
GhSAC9.2D | 3 | 2 | 2 | 1 | 0 | 2 | 2 | 2 | 0 | 0 |
Phylogenetic analysis of the GhSACs
We constructed a phylogenetic tree from a multiple alignment of SAC protein sequences comprising 6 TcSACs from T. cacao, 6 VviSACs from V. vinifera, 12 GaSACs from G. arboreum and 9 AtSACs from Arabidopsis. The phylogenetic analysis revealed the evolutionary origin of these genes as well as more recent duplications. As previously suggested [42], the SAC proteins clustered into three groups (Fig. 1). Genes from every species were found in all three groups, suggesting that higher plant species have at least one gene in each group.
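This paragraph does not name the tree-building algorithm. Purely as an illustration, the sketch below implements textbook neighbor-joining (NJ), one of the distance methods available in MEGA X, on a classic five-taxon example matrix; it is not the pipeline behind Fig. 1, and in a real analysis the distances would be estimated from the multiple alignment.

```python
def neighbor_joining(names: list[str], dist: list[list[float]]) -> str:
    """Build an unrooted tree by neighbor-joining; return a Newick string with branch lengths."""
    nodes = list(names)
    d = [list(map(float, row)) for row in dist]  # working copy of the distance matrix
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in d]  # net divergence of each node
        # Q-criterion: join the pair (i, j) minimising (n - 2) * d_ij - r_i - r_j
        _, i, j = min(
            ((n - 2) * d[a][b] - r[a] - r[b], a, b)
            for a in range(n)
            for b in range(a + 1, n)
        )
        # Branch lengths from nodes i and j to the new internal node
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        joined = f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from the new internal node to every remaining node k
        new_row = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] + [new_row[x]] for x, a in enumerate(keep)]
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [joined]
    # Connect the last two nodes with the remaining distance
    return f"({nodes[0]}:{d[0][1]:g},{nodes[1]}:0);"


taxa = ["a", "b", "c", "d", "e"]
D = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(taxa, D))
```

NJ does not assume a molecular clock, which is one reason it is a common default for large gene-family trees; ties in the Q-criterion are broken here by the lowest node indices.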
Our phylogenetic reconstruction showed that the SAC family in cotton diversified after cotton and Arabidopsis diverged from their common ancestor, because G. arboreum has clearly more group I and group III SAC genes than Arabidopsis. Moreover, most SAC proteins from the diploids had orthologs in the allotetraploid G. hirsutum, which arose from hybridization between A- and D-genome ancestors (Additional file 1). The short branches separating the paralogs suggest that the hybridization event occurred relatively recently [45].
Chromosome localization and synteny analysis of SAC genes
To determine the chromosomal distribution and duplication of the SAC genes, all SAC genes in G. hirsutum were mapped to their approximate chromosomal positions (Fig. 2). The twenty-four GhSAC genes were unevenly distributed across 17 chromosomes; every chromosome except A1, A3, A8, A11, A12, D1, D8, D11 and D12 harbors at least one SAC gene. Twelve SAC genes were located in the A subgenome and twelve in the D subgenome.
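The chromosome and subgenome counts can be reproduced directly from the Chromosome column of Table 1. The sketch below uses the table's two-digit chromosome names (A01 to A13, D01 to D13, the 13 chromosomes of each G. hirsutum subgenome); the variable names are illustrative.

```python
from collections import Counter

# Gene -> chromosome, copied from Table 1 for the 24 GhSAC genes
GHSAC_CHROMOSOMES = {
    "GhSAC1.1A": "A02", "GhSAC1.1D": "D02",
    "GhSAC2.1A": "A05", "GhSAC2.2A": "A06", "GhSAC2.3A": "A10",
    "GhSAC2.1D": "D05", "GhSAC2.2D": "D06", "GhSAC2.3D": "D10",
    "GhSAC3.1A": "A06", "GhSAC3.1D": "D06",
    "GhSAC4.1A": "A07", "GhSAC4.2A": "A13", "GhSAC4.1D": "D07", "GhSAC4.2D": "D13",
    "GhSAC6.1A": "A10", "GhSAC6.1D": "D10",
    "GhSAC7.1A": "A02", "GhSAC7.1D": "D03",
    "GhSAC8.1A": "A04", "GhSAC8.1D": "D04",
    "GhSAC9.1A": "A02", "GhSAC9.2A": "A09", "GhSAC9.1D": "D02", "GhSAC9.2D": "D09",
}

per_chromosome = Counter(GHSAC_CHROMOSOMES.values())
per_subgenome = Counter(chrom[0] for chrom in GHSAC_CHROMOSOMES.values())
all_chromosomes = {f"{sub}{i:02d}" for sub in "AD" for i in range(1, 14)}
empty = sorted(all_chromosomes - set(per_chromosome))

print(len(per_chromosome), dict(per_subgenome))
print(empty)
```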
To further infer the evolutionary mechanisms of the SAC family, we constructed syntenic maps of T. cacao with G. raimondii and with V. vinifera (Fig. 3). In total, seven GrSAC genes and four TcSAC genes showed syntenic relationships with genes in T. cacao and V. vinifera, respectively. TcSAC2 and TcSAC4 were each associated with more than one syntenic gene pair between G. raimondii and T. cacao, suggesting that these genes may have played an important role in the evolution of the SAC gene family. In addition, the VviSAC9/TcSAC9 gene pair identified between T. cacao and V. vinifera had no counterpart between G. raimondii and T. cacao, which may indicate that this orthologous relationship was lost after G. raimondii and T. cacao diverged from their common ancestor.
Gene structures and conserved domains of GhSACs
Gene structure analysis is important for studying genetic evolution. First, we mapped the domain structure using IBS software (version 1.0) (Fig. 4). Then, to understand the evolutionary relationships of the SAC proteins in G. hirsutum, we constructed an unrooted tree from alignments of the full-length SAC protein sequences using the neighbor-joining (NJ) method in MEGA X. The 24 SAC proteins in G. hirsutum were divided into three distinct groups (I to III). Group I contains the most members (14 GhSACs), whereas group III contains only four. The genomic sequences of the GhSAC genes ranged from 4195 bp to about 17.9 kb. To obtain further information on gene structure, we compared the coding sequence with the genomic sequence of each GhSAC gene (Fig. 5). The number of introns ranged from 6 to 19, and the genes with the most introns belonged to group II. GhSAC genes in the same group exhibited similar structures. We used MEME to detect conserved motifs in the GhSAC family and identified 20 conserved motifs distributed among its members, with some differences between groups (Fig. 5). All GhSAC proteins shared three motifs, M1, M2 and M3, which together make up the SAC domain characteristic of all GhSAC family members.
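The group sizes can also be read off the gene names if one assumes the Arabidopsis classification, in which SAC1 to SAC5 form group I, SAC6 to SAC8 group II and SAC9 group III. That mapping is an assumption here, since the grouping in this section comes from the tree itself; under it, the 24 names in Table 1 give 14, 6 and 4 members, matching the group I and group III sizes stated above.

```python
import re
from collections import Counter

# GhSAC names copied from Table 1
GHSAC_NAMES = [
    "GhSAC1.1A", "GhSAC1.1D",
    "GhSAC2.1A", "GhSAC2.2A", "GhSAC2.3A", "GhSAC2.1D", "GhSAC2.2D", "GhSAC2.3D",
    "GhSAC3.1A", "GhSAC3.1D",
    "GhSAC4.1A", "GhSAC4.2A", "GhSAC4.1D", "GhSAC4.2D",
    "GhSAC6.1A", "GhSAC6.1D", "GhSAC7.1A", "GhSAC7.1D", "GhSAC8.1A", "GhSAC8.1D",
    "GhSAC9.1A", "GhSAC9.2A", "GhSAC9.1D", "GhSAC9.2D",
]


def group_of(name: str) -> str:
    """Assumed mapping from the Arabidopsis orthologue number in the name to a SAC group."""
    number = int(re.match(r"GhSAC(\d+)\.", name).group(1))
    if number <= 5:
        return "I"
    if number <= 8:
        return "II"
    return "III"


group_sizes = Counter(group_of(n) for n in GHSAC_NAMES)
print(dict(group_sizes))
```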
The SAC domains of yeast and animal SAC proteins are approximately 400 amino acids long and consist of seven highly conserved motifs that appear to be important for phosphatase activity [1]. To examine the motif organization of the SAC domains of the GhSAC proteins in detail, we compared the SAC domain sequences of Sac1p and the GhSAC proteins and generated sequence logos of the seven conserved motifs with the WebLogo online tool (Fig. 6A). Sequence logos were also generated for the characteristic transmembrane motifs that follow the SAC domain in the group II GhSAC proteins, except GhSAC6.1A (Fig. 6B).
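The height of each stack in a sequence logo reflects the information content of that alignment column. The sketch below computes the simplified quantity, log2(20) minus the Shannon entropy of the residues in a protein column; it omits the small-sample correction that tools such as WebLogo apply, ignores gaps, and runs on made-up motif instances rather than GhSAC data. Function names are illustrative.

```python
import math
from collections import Counter


def information_content(column: str, alphabet_size: int = 20) -> float:
    """Bits of information in one alignment column: log2(alphabet size) minus Shannon entropy.

    Gap characters ('-') are ignored and no small-sample correction is applied.
    """
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        return 0.0  # an all-gap column carries no information
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def column_profile(aligned_motifs: list[str]) -> list[float]:
    """Information content of each column in equal-length, pre-aligned motif strings."""
    return [information_content("".join(col)) for col in zip(*aligned_motifs)]


profile = column_profile(["RANCL", "RSNCL", "RTNCI"])  # toy motif instances
print([round(x, 2) for x in profile])
```

A fully conserved column reaches the maximum of about 4.32 bits, so the conserved motif positions described above would appear as the tallest stacks in a logo.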
Sequence analysis showed that all GhSAC proteins except those in group III contain all seven conserved motifs found in Sac1p (Additional file 3). The sixth conserved region contains a highly conserved CX5R(T/S) motif, which previous reports identified as the catalytic motif of many metal-independent protein phosphatases and inositol polyphosphate phosphatases. Moreover, the putative catalytic core sequence RXNCXDCLDRTN in motif VI is completely conserved among the GhSAC proteins (except those in group III). These results suggest that GhSAC proteins may have SAC domain functions similar to those in yeast and animals.
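Both motif VI patterns quoted above translate directly into regular expressions, with X read as any residue. The sketch scans a made-up sequence; note that the core RXNCXDCLDRTN itself contains a CX5R(T/S) match (C, then XDCLD, then RT), and that `finditer` reports non-overlapping matches only. The function name and test sequence are hypothetical.

```python
import re

# Motif VI patterns from the text as regular expressions (X = any residue)
CX5R_TS = re.compile(r"C.{5}R[TS]")           # CX5R(T/S)
CATALYTIC_CORE = re.compile(r"R.NC.DCLDRTN")  # RXNCXDCLDRTN


def find_motifs(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif in a protein sequence."""
    seq = seq.upper()
    return {
        "CX5R(T/S)": [m.start() for m in CX5R_TS.finditer(seq)],
        "RXNCXDCLDRTN": [m.start() for m in CATALYTIC_CORE.finditer(seq)],
    }


toy_protein = "MKLRANCVDCLDRTNQW"  # hypothetical sequence, not a GhSAC
print(find_motifs(toy_protein))
```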
In addition, SAC proteins in group III appear to lack motif VII; in its place is a putative WW domain. WW domains have been shown to mediate protein-protein interactions by recognizing Pro-containing ligands [46], and they are considered the smallest protein domains involved in such interactions. The WW domain is a short conserved region found in a number of unrelated proteins, and it folds as a stable, triple-stranded beta-sheet. This short domain of approximately 40 amino acids may be repeated up to four times in some proteins [47–49]. The name WW or WWP derives from two signature tryptophan residues, which are spaced 20–23 amino acids apart and present in most WW domains known to date, as well as from a conserved Pro. The WW domain is frequently associated with other domains typical of signal-transduction proteins. The putative WW domains of the group III GhSAC proteins contain all the typical features of known WW domains, such as two Trp residues separated by 22 residues and other conserved residues, including the essential aromatic doublet and Pro. None of the other GhSAC proteins contains a putative WW domain. The functional significance of the putative WW domain in group III GhSACs remains to be investigated.
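The Trp spacing rule quoted above can be turned into a crude sequence scan, reading "spaced 20–23 amino acids apart" as 20 to 23 residues between the two Trp. This is a sketch under that assumption only: it does not check the aromatic doublet or the conserved Pro, so actual WW-domain calls should rely on profile models such as those in Pfam or InterPro. The function name and toy sequences are hypothetical.

```python
import re

# Two Trp residues with 20-23 residues between them; the lookahead lets
# candidates starting at different Trp positions overlap.
WW_PAIR = re.compile(r"(?=(W.{20,23}W))")


def ww_candidates(seq: str) -> list[tuple[int, int]]:
    """Return 0-based (start, end) spans of W...W pairs matching the spacing rule."""
    return [(m.start(1), m.end(1)) for m in WW_PAIR.finditer(seq.upper())]


toy = "A" + "W" + "G" * 22 + "W" + "A"  # two Trp separated by 22 residues
print(ww_candidates(toy))
```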
Cis-element analysis in the promoter regions of GhSAC genes
To identify putative cis-acting regulatory elements, the 2000-bp sequence upstream of the start codon of each GhSAC gene was extracted. In total, we identified 44 different regulatory elements in the GhSAC promoter regions, which fall into two main types: light-responsive elements and hormone-responsive elements (Table 2).
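Extracting an upstream region depends on strand: on the plus strand the promoter lies before the gene start, while on the minus strand it lies after the gene end and must be reverse-complemented. A minimal sketch, assuming 1-based inclusive coordinates whose 5' end is the first base of the start codon (gene coordinates in Table 1 may include UTRs, so this is an assumption); the function names and toy chromosome are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (ambiguity codes other than N are not handled)."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_region(chrom_seq: str, start: int, end: int, strand: str, length: int = 2000) -> str:
    """Return up to `length` bp upstream of a gene, oriented 5'->3' on the gene's strand.

    Assumes 1-based, inclusive start/end whose 5' end (start on '+', end on '-')
    is the first base of the start codon. Regions are truncated at sequence ends.
    """
    if strand == "+":
        left = max(0, start - 1 - length)  # 0-based index where the region begins
        return chrom_seq[left:start - 1]
    if strand == "-":
        return reverse_complement(chrom_seq[end:end + length])
    raise ValueError(f"unknown strand: {strand!r}")


toy_chrom = "AAAACCCCATGGGGTTTT"
print(upstream_region(toy_chrom, 9, 14, "+", length=4))
```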
Light-responsive elements, including Box 4, G-Box, GT1-motif, GATA-motif and MRE, were enriched in the promoter regions of GhSAC genes. Box 4, part of a conserved DNA module involved in light responsiveness, was the most abundant light-responsive element; every GhSAC gene except GhSAC6.1A contained at least one Box 4 element. In addition, 19 members contained a G-Box element, 17 a GT1-motif element and 15 a GATA-motif element. We therefore hypothesize that light may induce the expression of GhSAC genes through these cis-acting elements, thereby helping to regulate the balance between reproductive and vegetative growth.
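Statements such as "N members contained element X" are per-column tallies of Table 2. The column labels A to J are not defined in this section, so the sketch below tallies columns by label only. In the copied data, column A is zero only for GhSAC6.1A and has the largest total, which is consistent with, but does not prove, column A being Box 4. Variable names are illustrative.

```python
COLUMNS = "ABCDEFGHIJ"
# Per-gene element counts copied from Table 2 (columns A-J)
TABLE2 = {
    "GhSAC1.1A": [5, 1, 1, 0, 2, 2, 1, 1, 0, 0],
    "GhSAC1.1D": [4, 3, 1, 0, 1, 2, 1, 1, 1, 0],
    "GhSAC2.1A": [7, 4, 2, 1, 0, 4, 3, 3, 2, 0],
    "GhSAC2.2A": [4, 0, 0, 0, 0, 0, 5, 5, 0, 0],
    "GhSAC2.3A": [2, 0, 0, 0, 0, 0, 1, 1, 0, 2],
    "GhSAC2.1D": [6, 4, 0, 1, 0, 5, 3, 3, 1, 2],
    "GhSAC2.2D": [6, 6, 2, 1, 0, 4, 4, 4, 0, 0],
    "GhSAC2.3D": [4, 1, 0, 1, 1, 1, 2, 2, 0, 2],
    "GhSAC3.1A": [2, 1, 2, 2, 0, 0, 0, 0, 2, 1],
    "GhSAC3.1D": [3, 4, 5, 2, 1, 2, 1, 1, 2, 0],
    "GhSAC4.1A": [4, 1, 5, 2, 0, 2, 1, 1, 0, 2],
    "GhSAC4.2A": [1, 1, 0, 0, 0, 1, 0, 0, 1, 1],
    "GhSAC4.1D": [6, 4, 5, 1, 0, 3, 1, 1, 0, 1],
    "GhSAC4.2D": [1, 6, 0, 0, 0, 4, 0, 0, 1, 1],
    "GhSAC6.1A": [0, 0, 0, 0, 1, 0, 1, 1, 1, 0],
    "GhSAC6.1D": [2, 1, 0, 1, 1, 1, 1, 1, 0, 0],
    "GhSAC7.1A": [2, 1, 2, 0, 0, 2, 0, 0, 3, 0],
    "GhSAC7.1D": [2, 2, 2, 0, 0, 3, 0, 0, 2, 0],
    "GhSAC8.1A": [3, 0, 3, 2, 1, 0, 0, 0, 1, 0],
    "GhSAC8.1D": [4, 0, 3, 3, 2, 0, 0, 0, 1, 0],
    "GhSAC9.1A": [4, 0, 2, 1, 2, 0, 4, 4, 0, 0],
    "GhSAC9.2A": [3, 3, 4, 0, 0, 2, 1, 1, 0, 0],
    "GhSAC9.1D": [7, 4, 1, 1, 1, 3, 1, 1, 0, 0],
    "GhSAC9.2D": [3, 2, 2, 1, 0, 2, 2, 2, 0, 0],
}


def genes_with(column: str) -> list[str]:
    """Genes whose promoter has at least one element counted in `column`."""
    idx = COLUMNS.index(column)
    return [gene for gene, counts in TABLE2.items() if counts[idx] > 0]


column_totals = {c: sum(counts[i] for counts in TABLE2.values()) for i, c in enumerate(COLUMNS)}
genes_per_column = {c: len(genes_with(c)) for c in COLUMNS}
print(column_totals)
print(genes_per_column)
```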
The other important type of cis-acting element in the upstream regions of GhSAC genes is the plant hormone-responsive element. In total, nine types of elements responding to five kinds of plant hormones were found. These included ABA-responsive elements (ABREs), MeJA-responsive elements (TGACG-motifs and CGTCA-motifs), salicylic acid-responsive elements (TCA-elements) and auxin-responsive elements (TGA-elements). This suggests that GhSAC genes may respond to hormones including ABA, JA, SA and auxin.
Expression profile of GhSACs
To understand the expression patterns of the 24 GhSAC genes in G. hirsutum, we used publicly available transcriptome data to assess their expression in different tissues and organs. The analysis (Fig. 7) revealed that four GhSAC genes (GhSAC2.1A/GhSAC2.1D/GhSAC4.2A/GhSAC4.2D) were predominantly expressed in flowers, two genes (GhSAC2.2A/GhSAC2.2D) were not expressed in any of the tissues and organs examined, and the expression of the remaining genes did not differ significantly among tissues. In addition, the expression of GhSAC genes was not significantly altered under abiotic stress conditions, i.e. cold, heat, salt and drought (Additional file 6). We also performed RT-PCR to confirm the expression levels of four GhSACs in different tissues, including roots, stems, leaves, bracts, sepals, receptacles, petals, pistils and anthers. Because the CDSs of the A- and D-subgenome copies of these GhSACs are highly similar, the primers were designed to detect the combined transcripts of both copies. As shown in Fig. 8, GhSAC2.1 and GhSAC4.2 were predominantly expressed in stigmas and stamens, with little expression in other organs, whereas GhSAC7.1 and GhSAC9.2 were expressed in all organs examined. All four genes showed relatively low expression in roots, stems and leaves. These results suggest that GhSAC genes have diverse expression patterns and that some may play dominant roles in particular organs.
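The contrast drawn above between flower-predominant genes and genes with uniform expression can be quantified with a tissue-specificity index. The text does not say that such an index was used; as a swapped-in illustration, the sketch below computes the widely used tau index, assuming non-negative, already-normalized values (for example log2(FPKM + 1)), one per tissue. The function name and input values are hypothetical.

```python
from typing import Optional, Sequence


def tau(expression: Sequence[float]) -> Optional[float]:
    """Tissue-specificity index tau: 0 = uniform across tissues, 1 = restricted to one tissue.

    Expects non-negative, already-normalized values, one per tissue.
    Returns None for a gene with no expression in any tissue, where tau is undefined.
    """
    values = list(expression)
    if len(values) < 2:
        raise ValueError("tau needs values for at least two tissues")
    peak = max(values)
    if peak <= 0:
        return None
    return sum(1 - v / peak for v in values) / (len(values) - 1)


# Hypothetical profiles: one flower-biased gene, one broadly expressed gene
print(tau([0.2, 0.1, 0.3, 6.5]), tau([2.0, 2.1, 1.9, 2.0]))
```

Genes with no detectable expression, like the GhSAC2.2 pair described above, fall outside the index and are returned as None rather than forced to 0 or 1.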