Identification of GhIQD gene family members in G. hirsutum L.
In order to identify IQD gene family members in G. hirsutum L., 36 Arabidopsis IQD protein sequences were used as queries to search the allotetraploid cotton genome database. After the further selection of conserved domains, a total of 102 IQD genes from the G. hirsutum L. genome were identified as members of the GhIQD gene family. The chromosomal locations of the GhIQD genes were then determined using the allotetraploid cotton genome information [25]. As a result, all of the GhIQD genes were mapped to the 26 chromosomes and named GhIQDA01.1 to GhIQDD13.12 based on their relative positions on the chromosomes (Table 1, Fig. 1). The number of amino acids (aa) in the predicted GhIQD protein sequences ranged from 120 (GhIQDA02.3) to 900 aa (GhIQDA05.6) with an average length of 458 aa, and the open reading frames (ORFs) ranged from 363 base pairs (bp) to 2703 bp with an average length of 1377 bp. The molecular weights (MWs) of the proteins encoded by these proteins varied from 13,689.84 Daltons (Da) (GhIQDA02.3) to 99,360.68 Da (GhIQDD05.6), with an average MW of 51,151.02 Da. Based on isoelectric point (pI) analysis, the calculated pIs of the 96 GhIQD genes were >7.0 (with an average of 10.27), whereas six GhIQD genes were predicted to encode proteins with pIs <7.0 (average of 6.05), including GhIQDD06.1, GhIQDA01.2, GhIQDA06.1, GhIQDD01.2, GhIQDD05.6, and GhIQDA05.6. The predicted subcellular localizations showed that 82 GhIQD proteins localize to the nucleus, nine GhIQD proteins localize to the mitochondria, nine GhIQD proteins localize to the chloroplasts, and two GhIQD proteins were found to localize to the endoplasmic reticulum (ER) (Table 1).
IQD proteins are reported to specifically bind to calcium via CaM-binding sites[26]. To better explore the biological functions of GhIQD proteins, the CaM-binding sites of the GhIQD proteins were predicted using the online Calmodulin Target Database software. As a result, GhIQD proteins are predicted to contain CaM-binding sites. Multiple consecutive strings of amino acid residues with scores >7 are given in Additional file 1. This result suggests that all GhIQD proteins contain CaM-binding sites with 1-3 strings of high-scoring amino acid residues. Based on the whole-genome duplication analysis in G. hirsutum L., 5926 (8.14%) and 55707 (76.56%) genes originated from tandem and segmental duplication, respectively. Therefore, we investigated the role of duplication events in the evolution of GhIQD genes. As shown in Fig. 2, among the 102 GhIQD genes identified in G. hirsutum L., 100 (98.04%) were derived from segmental duplication events, and only two genes (GhIQDA13.3 and GhIQDD13.3) resulted from proximal duplication. In contrast, none of the GhIQD genes was found to have arisen from tandem duplication events (Additional files 2 and 3). These results indicate that segmental duplication is the main driving force in the expansion of the GhIQD genes.
Phylogenetic analysis of GhIQD proteins
To examine the molecular evolutionary relationships among plant IQD proteins, the amino acid sequences of the IQD proteins from Arabidopsis, tomato, soybean, and G. hirsutum L. were used in a phylogenetic analysis. As shown in Fig. 3, a phylogenetic tree was constructed with the Neighbor-Joining (NJ) method from an alignment of all complete IQD protein sequences. The NJ tree showed that the IQD proteins group into seven clusters (Ia, Ib, Ic, II, IIIa, IIIb and IV).
A Ka/Ks ratio 1> indicates that paralogous gene pairs were produced by positive selection, a ratio <1 indicates that paralogous gene pairs were under purifying selection and a ratio equal to 1 indicates that paralogous gene pairs were not subjected to selection pressure [27]. To explore the type selection pressure experienced by the duplicated GhIQD genes, paralogous GhIQD gene pairs were used to calculate synonymous (Ks) and non-synonymous (Ka) substitution rates to assess the ratio of non-snonymous to synonymous substitutions. As shown in Fig. 3, 50 paralogous gene pairs were identified. The Ka/Ks ratios of 48 members were <1.0, and the Ka/Ks ratios for the remaning two paralogous gene pairs were >1 (Additional file 4), suggesting that the GhIQD paralogous gene pairs were mainly produced by purifying selection.
Expression profiling of GhIQD genes during fiber development
Gene expression profiling can provide us with clues about the possible biological functions of genes. Therefore, we analyzed the gene expression profiles of GhIQD genes using the transcriptome data downloaded from the publicly available CottonFGD database. As shown in Fig. 4, a total of 89 GhIQD genes are expressed during the developmental process in fiber cells. Based on the heatmap, six clusters of GhIQD genes are predominately expressed in cotton fiber cells (Fig. 4a). In detail, the 13 GhIQD genes in cluster 2 were highly expressed in fiber cells at 5 days post anthesis (dpa); the expression levels of 17 GhIQD family members in cluster 5 were up-regulated in the 10 dpa samples; 21 genes in cluster 3 were significantly expressed in fibers at 20 dpa; genes in cluster 1 with 21 members were highly expressed in 25 dpa fiber cells; in cluster 4, the transcripts of seven genes were abundant in fibers at 20 and 25 dpa; and the remaning 10 GhIQDs in cluster 6 were highly expressed in the 5 dpa and 25 dpa samples (Fig. 4b). These results imply that GhIQD genes may function in fiber cell development in cotton.
Tissue-specific expression analysis of GhIQD genes by quantitative real time-PCR (qRT-PCR)
The GhIQD gene family has 102 members; of these, 20 genes were randomly selected to investigate their expression patterns in different tissues, including the calyx, leaf, stigma, stem, root, petal, pollen, and hypocotyl. As shown in Fig. 5, GhIQDD12.1, GhIQDA13.1, and GhIQDD13.1 were predominantly expressed in pollen (Fig. 5d), indicating that these genes may play pivotal roles in pollen development. GhIQDD01.3. GhIQDD01.2, and GhIQDD05.2 showed stem-specific expression (Fig. 5b), and GhIQDA01.1, GhIQDA05.2, and GhIQDA08.1 were expressed preferentially in leaves (Fig. 5a). The GhIQDA06.1, GhIQDD06.1, and GhIQDD09.1 genes showed higher expression levels in leaves and stems (Fig. 5c). Most genes investigated were abundantly expressed in different tissues (Fig. 5e), as observed for GhIQDA02.1 and GhIQDD02.1 that were highly expressed in all tissues with similar expression patterns. The cluster II genes, GhIQDA12.3 and GhIQDD12.4, were highly expressed in the leaf, petal, and hypocotyl (Fig. 5e).
Expression profiling of GhIQD genes in response to MeJA treatment
According to previous studies, the expression of most IQD genes can be induced by MeJA stress in plants [22] In this study, the expression patterns of GhIQD genes in plants exposed to MeJA treatment were examined in an qRT-PCR experiment. The results showed that the expression levels of the 20 randomly-selected GhIQD genes were significantly increased by MeJA treatment (Fig. 6). As the time of treatment increased, the transcript levels for most genes increased significantly. In detail, the expression levels of GhIQDD01.3, GhIQDA09.3, GhIQDD02.1. GhIQDD05.2, GhIQDD09.1, GhIQDD13.1 and GhIQDA13.1 were induced from 0 h, with the highest expression levels detected at 12 and 24 h after the MeJA treatment. The GhIQDA06.1, GhIQDD06.1, and GhIQDD09.4 genes also exhibited the highest expression levels at 24 h after the MeJA treatment. The expression levels of GhIQDD01.2, GhIQDA13.4, GhIQD12.4, GhIQDD12.1, GhIQDA12.3, GhIQDA08.1, GhIQDA01.1 and GhIQDA02.1 peaked at 6 h after the treatment. Compared with the other genes, the maximum expression of GhIQDA05.5 occurred at 72 h. These results showed that the GhIQD genes are widely induced by MeJA treatment.