Identification of CBF family genes in the cotton genome
The availability of whole-genome sequences for seven cotton species enabled us to identify the CBF proteins encoded in their genomes. Using the Pfam domain PF00847 as the query, we identified 29 CBF genes in G. herbaceum, 28 in G. arboreum, 25 in G. thurberi, 21 in G. raimondii, 30 in G. turneri, 26 in G. longicalyx and 15 in G. australe. Three representative species, G. herbaceum, G. thurberi and G. australe, were then chosen for detailed analysis. CBF CDS lengths ranged from 306 bp to 1,230 bp in G. herbaceum and from 429 bp to 1,077 bp in G. australe. The physicochemical properties of the CBF proteins varied widely. In G. herbaceum, molecular weights (MW) ranged from 11.24 kDa to 39.02 kDa and isoelectric points (pI) from 5.26 to 10.27. In G. thurberi, MW ranged from 16.18 kDa to 45.59 kDa and pI from 5.11 to 10.83, and in G. australe, MW ranged from 15.68 kDa to 40.87 kDa. The CBF genes thus differed substantially in encoded protein size and biophysical properties (Table S1).
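The search tool is not named above. A minimal sketch of the identification step, assuming the PF00847 (AP2) profile was searched against each predicted proteome with HMMER's `hmmsearch --domtblout` and that a protein was kept when at least one of its domains had an independent E-value below a cutoff. The 1e-5 cutoff, the function names and the sample rows are hypothetical, not the authors' settings or data.

```python
# Hypothetical sketch: collect PF00847 (AP2) hits from HMMER --domtblout output.
# Domain-table columns used: col 1 = target name, col 5 = query accession,
# col 13 = independent E-value (i-Evalue). The cutoff is an assumption.

def cbf_hits(domtbl_lines, max_ievalue=1e-5):
    """Return IDs of proteins with at least one PF00847 domain below the cutoff."""
    hits = set()
    for line in domtbl_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, query_acc, i_evalue = fields[0], fields[4], float(fields[12])
        if query_acc.split(".")[0] == "PF00847" and i_evalue < max_ievalue:
            hits.add(target)  # a set counts multi-domain proteins only once
    return hits


def count_per_species(tables, max_ievalue=1e-5):
    """tables: {species: domtblout lines}. Returns {species: number of CBF candidates}."""
    return {sp: len(cbf_hits(lines, max_ievalue)) for sp, lines in tables.items()}


# Hypothetical rows (23 whitespace-separated columns each), not real search output.
sample = [
    "# target name  accession  tlen  query name  accession ...",
    "Ghe01G0001.1 - 220 AP2 PF00847.23 56 1e-20 70.1 0.1 1 2 2e-23 3e-20 69.0 0.1 1 56 60 115 58 118 0.95 -",
    "Ghe01G0001.1 - 220 AP2 PF00847.23 56 1e-20 70.1 0.1 2 2 2e-03 3e-01 9.0 0.1 1 56 160 200 158 205 0.70 -",
    "Ghe02G0002.1 - 180 AP2 PF00847.23 56 1e-03 12.0 0.1 1 1 1e-04 2e-02 10.0 0.1 1 56 20 70 18 72 0.80 -",
]
print(count_per_species({"G. herbaceum": sample}))  # {'G. herbaceum': 1}
```

Counting distinct target IDs rather than domain rows avoids counting a protein twice when it carries more than one AP2 domain.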
Phylogenetic analysis of cotton CBF gene family
To determine the phylogenetic relationships among the CBF proteins, we constructed a phylogenetic tree in MEGA7.0 using the neighbor-joining (NJ) method, together with the minimum evolution and maximum parsimony methods. The CBF proteins clustered into seven clades, designated clades 1 to 7 (Fig. 1). Clade 1 was the largest, with 61 CBF protein sequences, whereas clade 3 was the smallest, with only 7. Consistent with previous classifications, all Arabidopsis CBFs clustered in clade 6 [2,24,29]. Every cotton species was represented in all seven clades, except G. australe, which had no members in clade 3.
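For readers unfamiliar with neighbor-joining, the sketch below shows only the core NJ algorithm applied to a small toy distance matrix. It is not MEGA7's implementation: it assumes a precomputed symmetric distance matrix and omits distance estimation from the alignment, bootstrap support and rooting. The matrix and the internal-node labels (`node0`, `node1`, ...) are illustrative assumptions, not cotton data.

```python
def neighbor_joining(names, dist):
    """Return the edges (parent, child, branch_length) of an unrooted NJ tree.

    names: taxon labels (at least three); dist: symmetric dict-of-dicts,
    dist[x][y] = distance. Internal nodes are labelled node0, node1, ...
    (assumed not to clash with taxon names).
    """
    d = {x: dict(row) for x, row in dist.items()}  # work on a copy
    nodes, edges, counter = list(names), [], 0
    while len(nodes) > 3:
        n = len(nodes)
        r = {x: sum(d[x][y] for y in nodes if y != x) for x in nodes}  # row sums
        # Choose the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j).
        i, j = min(
            ((p, q) for k, p in enumerate(nodes) for q in nodes[k + 1:]),
            key=lambda pq: (n - 2) * d[pq[0]][pq[1]] - r[pq[0]] - r[pq[1]],
        )
        u = f"node{counter}"
        counter += 1
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))  # branch length i -> u
        edges += [(u, i, li), (u, j, d[i][j] - li)]
        d[u] = {}
        for k in nodes:
            if k not in (i, j):  # distance from the new node to each remaining node
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b, c = nodes  # join the last three nodes to one central node
    v = f"node{counter}"
    edges += [
        (v, a, (d[a][b] + d[a][c] - d[b][c]) / 2),
        (v, b, (d[a][b] + d[b][c] - d[a][c]) / 2),
        (v, c, (d[a][c] + d[b][c] - d[a][b]) / 2),
    ]
    return edges


labels = ["a", "b", "c", "d", "e"]
matrix = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
          [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
D = {x: {y: matrix[i][j] for j, y in enumerate(labels)} for i, x in enumerate(labels)}
edges = neighbor_joining(labels, D)
print(edges[:2])  # [('node0', 'a', 2.0), ('node0', 'b', 3.0)]
```

Because this toy matrix is additive, NJ recovers its tree exactly (total branch length 17). Real protein distances are not additive, which is one reason the topology is usually checked against other methods and bootstrap replicates.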
Chromosomal mapping, gene structure and C-terminal conserved motif analysis
The CBF genes in the three cotton genomes were named according to their positions on the chromosomes. In G. herbaceum, a diploid A-genome species, CBF genes were mapped to every chromosome except Chr01. Chr05 and Chr07 carried the most genes (5 and 6, respectively), whereas Chr04, Chr06, Chr08 and Chr09 each carried a single gene (Figure 2A). In G. thurberi, a diploid D-genome species, chromosomes Gthu_1, Gthu_8 and Gthu_9 harbored no CBF genes. Gthu_5, Gthu_7 and Gthu_12 carried the most, with 4, 6 and 5 CBF genes, respectively, whereas Gthu_3, Gthu_6 and Gthu_13 each carried a single gene (Fig. 2B). Finally, in G. australe, a diploid G-genome species, no CBF genes were found on chromosomes G6, G9, G11 and G12, and the largest number, 5 genes, was found on chromosome G7 (Figure 2C).
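The naming rule is only described as position-based. The gene name GthCBF12.5 used later suggests a `<prefix><chromosome>.<order along chromosome>` scheme, which the sketch below assumes. It also tallies loci per chromosome, the counts a chromosome map displays. The gene IDs, coordinates and function names are hypothetical.

```python
from collections import Counter, defaultdict


def name_by_position(genes, prefix="GthCBF"):
    """genes: iterable of (gene_id, chromosome_number, start_bp).

    Returns {gene_id: name}, numbering genes 1, 2, ... from the start of each
    chromosome (assumed scheme inferred from names such as GthCBF12.5).
    """
    by_chrom = defaultdict(list)
    for gene_id, chrom, start in genes:
        by_chrom[chrom].append((start, gene_id))
    names = {}
    for chrom in sorted(by_chrom):
        for order, (_, gene_id) in enumerate(sorted(by_chrom[chrom]), start=1):
            names[gene_id] = f"{prefix}{chrom}.{order}"
    return names


def genes_per_chromosome(genes):
    """Tally CBF loci per chromosome."""
    return Counter(chrom for _, chrom, _ in genes)


# Hypothetical gene IDs and coordinates, for illustration only.
genes = [("geneA", 12, 5_400_000), ("geneB", 12, 1_200_000), ("geneC", 5, 800_000)]
print(name_by_position(genes))
# {'geneC': 'GthCBF5.1', 'geneB': 'GthCBF12.1', 'geneA': 'GthCBF12.2'}
print(genes_per_chromosome(genes))
```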
To analyze conserved motifs, we used MEME, which identified 10 conserved motifs across the CBF proteins (Figure 3A). Almost all CBF proteins contained the same number of motifs, and motifs 1, 2 and 4 were the most prominent. Analyzing the arrangement of exons and introns can provide important insights into the evolution of gene families [13]. To study the exon/intron structure of the CBF genes, the CDS and genomic sequences were compared. Most CBF genes in G. herbaceum and G. australe contained no introns, suggesting that these genes are highly conserved, whereas in G. thurberi every gene contained at least two exons and some contained more (Figure 3B-E). Members of the same phylogenetic clade generally had similar exon-intron structures. Stress regulates the expression of primary and specialized metabolism genes at the transcriptional level through transcription factors that bind specific cis-elements. A number of cis-regulatory elements were found to be associated with the cotton CBF genes, including MYB binding sites, the ABA-responsive element (ABRE) and the low-temperature responsive (LTR) element (Figure 3F); MYB and ABRE are among the top-ranked stress-responsive cis-regulatory elements [33]. The presence of these diverse cis-regulatory elements suggests that cotton CBF genes play a significant role in stress tolerance, particularly cold tolerance.
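The section does not say how cis-elements were detected (PlantCARE is commonly used for this). As a rough illustration of what such a scan does, the sketch below counts exact matches of two simplified core sequences, assumed here to be ACGTG for ABRE and CCGAAA for LTR, on both strands of a promoter. Real tools also handle degenerate variants and report positions; the cores, function names and example sequence are assumptions.

```python
# Simplified core sequences (assumption); databases list several variants of each.
ELEMENTS = {"ABRE": "ACGTG", "LTR": "CCGAAA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_overlapping(seq, motif):
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)


def scan_elements(promoter, elements=ELEMENTS):
    """Count each element's core on both strands of a promoter sequence.

    A palindromic core would be counted twice per site; neither core used here
    is palindromic.
    """
    fwd = promoter.upper()
    rev = reverse_complement(fwd)
    return {name: count_overlapping(fwd, core) + count_overlapping(rev, core)
            for name, core in elements.items()}


print(scan_elements("ttacgtgaatttcggcc"))  # {'ABRE': 1, 'LTR': 1}
```

In the example, the ABRE core is found on the forward strand and the LTR core only on the reverse strand, which is why both strands are scanned.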
To predict the subcellular localization of the proteins encoded by the CBF genes, we used the online tool WoLF PSORT. In all three cotton species, the largest proportion of CBF proteins was predicted to localize to the nucleus, as expected for transcription factors, since the nucleus is the central regulator of cellular activities. Some proteins also received signals for other compartments, such as the chloroplast, mitochondria, plasma membrane and vacuole membrane (Table S2).
RNA-seq analysis and RT-qPCR validation of the CBF genes under cold stress conditions
G. thurberi transcriptome data were used to analyze the expression patterns of 24 CBF genes under cold stress (Figure 4A), of which only 17 were differentially expressed. Based on the G. thurberi transcriptome data (previous work by the research team, yet to be published), 12 differentially expressed CBF genes were selected, and 15 and 13 genes were selected from G. herbaceum and G. australe, respectively, based on their RNA-seq data. RT-qPCR analysis showed that most of the selected genes were up-regulated in all three cotton species (Figure 4B). In G. thurberi, 10 of the 12 genes were significantly up-regulated and only 2 were down-regulated, a trend consistent with the transcriptome data. In G. herbaceum, 9 genes were up-regulated and 6 were down-regulated; in G. australe, 8 were up-regulated and 5 were down-regulated. The predominance of up-regulation among the CBF genes across the three species suggests that the proteins they encode could play an integral role in cold acclimation in cotton. By integrating the transcriptome and RT-qPCR results, we selected one of the most highly up-regulated CBF genes, GthCBF12.5 (GthCBF4), for further validation.
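The section does not state how relative expression was quantified. The widely used 2^-ΔΔCt method (Livak and Schmittgen) is sketched below as an assumption, with a single reference gene. The two-fold threshold for calling a gene up- or down-regulated, the function names and the Ct values are hypothetical; the significance calls in the text would additionally require replicate-based statistics.

```python
def fold_change(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression 2^-ΔΔCt of a target gene, normalised to a reference gene."""
    d_ct_treated = ct_target_treated - ct_ref_treated  # ΔCt in the cold-treated sample
    d_ct_control = ct_target_control - ct_ref_control  # ΔCt in the untreated control
    dd_ct = d_ct_treated - d_ct_control                # ΔΔCt
    return 2.0 ** (-dd_ct)


def call_regulation(fc, min_fold=2.0):
    """Hypothetical rule: 'up' if fc >= min_fold, 'down' if fc <= 1/min_fold."""
    if fc >= min_fold:
        return "up"
    if fc <= 1.0 / min_fold:
        return "down"
    return "unchanged"


# Hypothetical Ct values (target gene vs. reference gene).
fc = fold_change(22.0, 18.0, 25.0, 18.0)  # ΔΔCt = 4 - 7 = -3
print(fc, call_regulation(fc))  # 8.0 up
```

A lower Ct means earlier amplification, so a negative ΔΔCt corresponds to higher expression in the treated sample; each cycle of difference is treated as a two-fold change, which assumes near-100% amplification efficiency.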
Experimental validation of subcellular localization of cotton CBF proteins
In plants infiltrated with the empty pCAMBIA2300-eGFP-Flag vector (control), green fluorescence was observed in both the nucleus and the cell membrane, whereas the pCAMBIA2300-eGFP-Flag-GthCBF4 fusion protein produced green fluorescence only in the nucleus (Figure 5), indicating that the protein encoded by GthCBF4 localizes to the nucleus. These results agree with the bioinformatic prediction of the subcellular localization of the CBF proteins.
Phenotype and cell damage identification of GthCBF4-overexpressing Arabidopsis under low temperature
Two high-expression lines, OE-1 and OE-3, were selected for phenotypic evaluation (Figure 6A). Under normal conditions, the GthCBF4-overexpressing lines showed no significant phenotypic difference from the wild type (WT). Under cold stress, however, WT survival was markedly reduced, whereas OE-1 and OE-3 survived at a much higher rate (Figure 6B): approximately 60% for the OE plants compared with only 2% for the WT (Figure 6C). In the expression analysis, the WT showed no expression at 0 h, and CBF expression was only weakly and non-significantly induced at 1 h and 3 h of cold exposure, with relative expression slightly below 1. In contrast, GthCBF4 expression in the OE plants was significantly higher, close to three-fold at 0 h and above four-fold at 1 h and 3 h (Figure 6D).
Under normal conditions, the blue-stained area on leaves of both the transgenic overexpression plants and the WT plants was very small. Under cold treatment, the blue-stained areas on transgenic leaves were significantly smaller and lighter than those on WT leaves. DAB staining was further used to assess H2O2 accumulation in Arabidopsis leaves. Under normal conditions, H2O2 accumulation was very low in both transgenic and WT leaves, and brown precipitate was hardly visible. After cold treatment, however, the brown-stained area on WT leaves was clearly larger and darker than on the transgenic leaves (Figure 6E).
These results suggest that overexpression of GthCBF4 in the transgenic lines improved the plants' ability to scavenge reactive oxygen species such as H2O2, thereby reducing oxidative damage.
Evaluation of physio-morphological traits in GthCBF4-overexpressing and wild-type plants under low temperature
Germination of the OE and WT lines did not differ significantly under normal (control) conditions. Under cold stress, however, none of the WT seeds germinated, whereas the OE lines showed some germination (Figure 7A-B), indicating that the OE lines were better able to adapt to cold stress. Root length likewise did not differ significantly under control conditions, but under cold stress the OE lines developed longer roots than the WT (Figure 7C-D), suggesting that GthCBF4 overexpression may influence cell division and/or cell elongation. We further profiled the known stress-responsive genes COR15A, RD29A, KIN1 and COR47 [17]. The OE lines showed significantly higher expression of all four genes (Figure 7E). This strong induction indicates that the overexpressed gene promotes, rather than suppresses, the expression of these stress-responsive genes, and suggests that it could play a vital role in enhancing cold stress tolerance in plants.