Genome-wide identification and phylogenetic analysis
We used the Hidden Markov Model (HMM) profile of HMAD domain (PF00403.26) from Pfam (http://www.pfam.sanger.ac.uk/) database as queries to search the HMAD members in G. hirsutum, G. arboreum, G. raimondii and G. barbadense by Hmmer software with default parameters. A total of 169 proteins were identified belonging to the HMAD gene family in G. hirsutum with the number of amino acids ranged from 56 to 1011. Furthermore, we identified 84, 76 and 159 HMAD proteins in each G. raimondii, G. arboreum and G. barbadense, respectively (Table S1).
In order to explore the evolutionary relationships of the HMAD gene family, an unrooted phylogenetic tree was constructed using the full length HMAD protein sequences from G. arboreum, G. barbadense,G. raimondii, G. hirsutum (Fig. 1). The HMAD proteins in the three Gossypium species were divided into five groups (I, II, III, IV, Va, Vb, Vc), which the HMA5-8 belongs to IV group (Table S2). Additionally, 87 orthologs of HMAD genes (Table 2) were identified in four Gossypium species (I account for 18.39%, II account for 18.39%, III account for 1.15%, IV account for 10.34%, Va account for 1.15%, Vb account for 20.69%, Vc account for 29.89%) (Figure 1), such as genes Gh_D08G1950 and Gh_A08G2387 of G. hirsutum are orthologs of the Gorai.004G210800.1 and Cotton_A_25987 gene in G. raimondii and G. arboreum, respectively. In addition, 15 genes were lost during evolution, among which 4 in A genome (Cotton_A_04626, Cotton_A_25931, Cotton_A_00150, Cotton_A_35231), 11 in D genome (Gorai.001G250300.1, Gorai.005G218500.1, Gorai.005G220100.1, Gorai.007G134300.1, Gorai.007G295300.1, Gorai.008G005700.1, Gorai.009G162900.1, Gorai.009G199900.1, Gorai.009G414800.1, Gorai.012G027800.1, Gorai.008G245900.1).
Chromosomal distribution and syntenic analysis
Physical mapping of the 169 G. hirsutumHMAD genes were showed that 79 and 77 HMA genes were variably distributed on 26 chromosomes of the A and D sub-genomes, respectively (Fig. 2), among which 13 genes localized in scaffold. Additionally, a maximum of 17 and 16 genes were located on the paralogous chromosome 12 of the A sub-genomes and D sub-genomes. Moreover, there were nine pairs and two gene clusters were marked as tandemly duplicated based on the criteria of less than five intervening genes. Among these tandemly duplicated genes, five pairs and two clusters belonged to group Vb except of Gh_D05G1684 - Gh_D05G1685 and Gh_A05G1510 - Gh_A05G1511pairs, which belonged to group III. To study the locus relationship of orthologs between the A and D sub-genomes, we also performed synteny analysis. The result of synteny analysis indicated that most of the HMAD gene loci were highly conserved between the A and D sub-genomes (Fig. 3) respectively. We also found that the HMAD genes located on A02 and A03 chromosomes while their corresponding orthologs were located on D03 and D02 (Table 2), respectively. These results are consistant with the previous research [53], which might be due to the chromosomal translocation between Chr02 and Chr03 before cotton polyploidization forming an allotetraploid [53].
Analysis of gene structure and conserved motifs
Gene structure is important to determine its role in showing the phylogenetic relation between the HMAD genes. A NJ tree was generated with MEGA using all the HMAD protein sequences from G. hirsutum and gene structure were determined (Fig. 4). Though the number of genes used for generating this phylogenetic tree was different from the phylogenetic tree shown in Figure 1, the gene members within the subclades were nearly same. As shown in the Fig. 4, HMAD genes from G. hirsutum were divided into five subclades (group I, group II, group III, group IV, group Va and group Vb, among which, group I contained 13 genes while group II to group Va and group Vb contained 66, 29, 14, 22 and 25 genes, respectively. Furthermore, the analysis of gene structure showed that just 4 genes (Gh_D01G1640, Gh_Sca011408G01, Gh_A05G3385 of group I and Gh_A08G0990 of group Vb) contained no intron.
To investigate the presence of domain sequence and the degree of conservation of the HMAD domain in G. hirsutum, we performed multiple sequence alignment. The result of different HMAD protein groups indicated that a highly conserved motif presence in the HMAD domain, as the metal binding, with two conserved cysteines. Consistent with previous studies [54, 55], an anion binding box (CXXC) was found in the catalytic centre (Fig. 5a).
Based on the Ka/Ks ratio, it can be assumed that Darwinian positive selection was linked with the HMAD gene divergence after duplication [56, 57]. In our study, we found that 79 genes pairs had low Ka/Ks ratios (smaller than 0.5) and 24 gene pairs had the Ka/Ks ratios between 0.5 and 1.0. And 13 genes pairs had Ka/Ks larger than 1, might be due to relatively rapid evolution following duplication (Table 2). As most of the Ka/Ks ratios were smaller than 1.0, we presumed that the cotton HMAD gene family had undergone strong purifying selection pressure with limited functional divergence that occurred after segmental duplications and whole genome duplication (WGD).
Expression profile of HMAD genes across different tissues and different stress condition in TM-1
To understand the temporal and spatial expression patterns of different HMAD genes, a publicly deposited RNA-seq data was used to assess the expression profile across different tissues and developmental stages. Results showed that HMAD genes were not widely expressed in tissues as well as under stress condition (cold, salt, PEG), indicating their critical role in different tissues and stress condition. Gh_D03G0414 highly expressed in root and 1DPA, and Gh_D04G0001 and Gh_Sca013298G01 highly expressed in the petals and stamens. Gh_D09G0521 highly expressed in the ovule and in pistil, especially in 3DPA, 5DPA and 35DPA. Gh_D05G1684 highly expressed in the 10 DPA in fiber. Gh_A01G1399, Gh_D01G1640, Gh_Sca011408G01 highly expressed in the ovule after -1DPA.
Interestingly, we found that some HMAD genes highly expressed under stress condition. For example, Gh_D08G0132 and Gh_A05G1510 highly expressed after 12 hours of the salt stress condition, while Gh_A01G1576 highly expressed after 1 hours of the stress condition (cold, salt, PEG). Gh_A09G1374, Gh_D09G1375, Gh_D10G0078 expression level increased under stress condition (cold, salt, PEG). Even Gh_A08G1780, Gh_D08G2126 highly expressed not only in torus, ovule and fiber after 25DPA, but also after 12 hours under stress condition.
Core promoter element analysis
To further explore why HMAD gene family highly expressed under biotic stress condition except heavy metal, the core promoter element of HMAD genes from G. hirsutum were divided into four types (hormone, stress, tissue and others) (Fig. 8), among which, element involved in hormone-responsiveness mainly contained ABA (abscisic acid), GA (gibberellins), IAA/auxin, SA (salicylic acid), MeJA (Methyl jasmonate). Element involved in defense and stress responsiveness mainly contained drought, low-temperature,dehydration, salt stress, anaerobic, among which, 72 genes involved in drought, 51 genes involved in low-temperature responsiveness, 55 genes involved in defense and stress responsiveness with TC-rich repeats element, and 1 gene (Gh_D04G1066) both involved in salt and low-temperature responsiveness. In total, there were 111 genes of 169 HMAD genes with core promoter element responding to stress. As described above in TM-1 RNA-seq data, 12 of the 18 genes were highly expressed with at least one abiotic stress-related promoter element (Table S1). Element involved in tissues including the palisade mesophyll cells, meristem, endosperm, seed-specific. And element involved in other’s function, such ascircadian control, cell cycle, flavonoid biosynthetic. It’s interesting that 9 of 12 genes with element of flavonoid biosynthetic were along with other’s stress element. In previous study, anthocyanins, as secondary metabolites, may respond to stress resistance through osmotic equilibrium [58, 59, 60]. For example, Gh_A01G1576 highly expressed after 1 hours of the stress condition (cold, salt, PEG), whose core promoter element contained drought-inducibility, low-temperature responsiveness and MYB binding site involved in flavonoid biosynthetic genes regulation with MBSI promoter element.
The expression level of HMAD gene in different tissues under Na2SO4 stress
To further identify the function of HMAD genes under other abiotic stress condition, we take advantage of the material Zhong 9835 which was more resistant to Na2SO4 than TM-1. Based on the HMAD gene family of RNA-seq data (Fig. 8) in Zhong 9835 (Table S5), 14 genes significantly expressed differentially in roots, stems and leaves between control and treatment with 300mM Na2SO4 (Fig. S1), in which 10 genes with at least one core promoter element about stress (Table S1). It is interesting to note that 3 of 4 flavonoid biosynthetic element were along with the stress element. More important, some genes highly expressed in both TM-1 and Zhong 9835 under stress condition, such as Gh_D04G0145, Gh_D10G0078, Gh_Sca011408G01, Gh_A01G1576 and so on.