Identification of the Cotton LHC proteins
A total of 109 proteins encoded by LHC genes were identified in the three sequenced cotton genomes, with 55, 27, and 27 proteins in G. hirsutum (AD), G. raimondii (D) and G. arboreum (A), respectively (Table S2). The combined number of LHC proteins in the two diploid species, G. raimondii and G. arboreum (54), was one fewer than the number in G. hirsutum, which may be because the AD genome emerged through whole-genome duplication involving the A and D genomes.
Evaluation of the physicochemical properties of the G. hirsutum Chloro a/b binding proteins showed that protein lengths ranged from 62 aa (Gh_Sca017783G01) to 644 aa (Gh_A02G1068), and molecular weights from 6.88 kDa to 72.66 kDa for the same two proteins. The charge ranged from −8.5 (Gh_A01G0519) to 9 (Gh_D04G1505), the isoelectric point (pI) from 4.701 (Gh_D06G2350) to 10.228 (Gh_D04G1505), and the grand average of hydropathy (GRAVY) from −0.529 (Gh_A01G0519) to 0.233 (Gh_D06G2350) (Table 1).
Table 1
Physicochemical properties of LHC proteins in G. hirsutum, G. arboreum and G. raimondii
Gene ID | PL (aa) | MW (kDa) | Charge | IEP (pI) | GRAVY
Gh_D11G1504 | 272 | 29.46 | 4.5 | 9.273 | -0.089
Gh_A05G2108 | 291 | 30.991 | -2.5 | 5.416 | -0.005
Gh_A06G1447 | 279 | 30.514 | -1.5 | 5.793 | -0.002
Gh_A01G0976 | 268 | 29.312 | 5 | 7.942 | 0.051
Gh_A10G0361 | 264 | 28.182 | -4.5 | 4.904 | -0.004
Gh_A04G0961 | 260 | 27.85 | 4 | 9.316 | 0.053
Gh_D07G1663 | 265 | 28.609 | -3.5 | 5.084 | -0.001
Gh_D07G1659 | 265 | 28.609 | -3.5 | 5.084 | -0.001
Gh_A07G2182 | 265 | 28.608 | -2.5 | 5.335 | -0.001
Gh_D06G1791 | 282 | 30.791 | -1.5 | 5.793 | -0.02
Gh_A04G0218 | 262 | 28.483 | -4.5 | 4.89 | 0.023
Gh_D01G1508 | 291 | 31.443 | 0 | 6.504 | 0.11
Gh_D05G1429 | 265 | 28.101 | -2.5 | 5.329 | 0.027
Gh_D02G1996 | 259 | 27.33 | 2 | 8.425 | 0.126
Gh_D05G3484 | 262 | 28.449 | -4.5 | 4.89 | 0.026
Gh_D01G1028 | 268 | 29.391 | 6 | 8.203 | 0.01
Gh_A01G1349 | 297 | 32.139 | 1.5 | 7.004 | 0.097
Gh_A11G2259 | 166 | 18.205 | 2 | 8.741 | 0.07
Gh_A07G2184 | 265 | 28.764 | -1.5 | 5.756 | -0.036
Gh_D01G2232 | 285 | 31.101 | -3 | 5.378 | -0.048
Gh_D04G1505 | 240 | 25.65 | 9 | 10.228 | 0.037
Gh_A07G1725 | 265 | 28.407 | -3.5 | 5.071 | 0.036
Gh_D01G0531 | 261 | 28.405 | -4.5 | 4.89 | 0.011
Gh_D12G1495 | 304 | 33.898 | 7 | 8.922 | 0.037
Gh_A10G0616 | 285 | 30.716 | -1.5 | 5.805 | 0.021
Gh_D07G0661 | 252 | 27.857 | 2.5 | 7.551 | -0.128
Gh_D05G2361 | 291 | 31.003 | -2.5 | 5.416 | 0.013
Gh_A10G2108 | 273 | 29.857 | -1.5 | 5.793 | -0.059
Gh_A07G0594 | 252 | 27.841 | 2.5 | 7.551 | -0.118
Gh_A12G1617 | 252 | 27.946 | 3.5 | 7.968 | -0.117
Gh_D10G0369 | 264 | 28.21 | -3.5 | 5.085 | -0.008
Gh_D07G1929 | 265 | 28.407 | -3.5 | 5.071 | 0.036
Gh_D05G0860 | 247 | 26.797 | 0.5 | 6.675 | -0.116
Gh_A13G0222 | 246 | 26.76 | 0.5 | 6.675 | -0.164
Gh_A05G1261 | 265 | 28.101 | -2.5 | 5.329 | 0.027
Gh_D12G1757 | 252 | 27.953 | 2.5 | 7.5 | -0.124
Gh_D10G0784 | 288 | 31.051 | -0.5 | 6.294 | -0.022
Gh_A01G1972 | 285 | 31.107 | -3 | 5.378 | -0.071
Gh_D13G0236 | 246 | 26.744 | 0.5 | 6.675 | -0.153
Gh_A05G0725 | 247 | 26.797 | 0.5 | 6.675 | -0.116
Gh_D03G0610 | 349 | 38.293 | 2.5 | 7.543 | 0.084
Gh_D10G2385 | 273 | 29.883 | -1.5 | 5.789 | -0.073
Gh_Sca123119G01 | 90 | 9.881 | -0.5 | 5.795 | -0.397
Gh_Sca053293G01 | 143 | 15.811 | -3 | 5.138 | 0.158
Gh_Sca017783G01 | 62 | 6.884 | -1 | 4.879 | 0.105
Gh_D06G2350 | 192 | 20.154 | -4 | 4.701 | 0.233
Gh_D06G2351 | 265 | 28.165 | -2.5 | 5.329 | 0.014
Gh_A07G2366 | 281 | 31.181 | -1 | 6.128 | 0.064
Gh_A03G2154 | 259 | 27.423 | 2 | 8.428 | 0.107
Gh_D07G0125 | 261 | 28.763 | -2 | 5.7 | -0.029
Gh_A11G1357 | 272 | 29.45 | 4.5 | 9.273 | -0.086
Gh_A13G1282 | 166 | 18.169 | 2.5 | 8.417 | 0.051
Gh_D06G2120 | 151 | 16.501 | 0.5 | 7.256 | -0.041
Gh_A01G0519 | 466 | 51.35 | -8.5 | 5.178 | -0.529
Gh_A02G1068 | 644 | 72.664 | 7 | 7.727 | -0.2
Ga01G0731 | 482 | 53.372 | -6 | 5.503 | -0.377
Ga01G1437 | 265 | 28.978 | 2.5 | 7.705 | 0.035
Ga02G0756 | 610 | 68.741 | 9 | 8.188 | -0.241
Ga02G1050 | 270 | 29.006 | 2 | 7.45 | 0.008
Ga03G2284 | 228 | 24.599 | -1.5 | 5.716 | 0.084
Ga04G1033 | 114 | 12.823 | 5.5 | 9.897 | -0.255
Ga05G0924 | 247 | 26.797 | 0.5 | 6.675 | -0.116
Ga05G1596 | 265 | 28.101 | -2.5 | 5.329 | 0.027
Ga05G2647 | 291 | 30.991 | -2.5 | 5.416 | -0.005
Ga05G4015 | 262 | 28.449 | -4.5 | 4.89 | 0.026
Ga05G4018 | 262 | 28.495 | -4.5 | 4.89 | 0.042
Ga06G2006 | 282 | 30.819 | -1.5 | 5.793 | -0.011
Ga06G2455 | 167 | 18.453 | 4 | 8.634 | -0.057
Ga07G0172 | 261 | 28.86 | 1.5 | 7.045 | -0.048
Ga07G0768 | 252 | 27.841 | 2.5 | 7.551 | -0.118
Ga07G1916 | 265 | 28.665 | -2.5 | 5.335 | -0.02
Ga07G1918 | 265 | 28.607 | -0.5 | 6.271 | -0.002
Ga07G2205 | 265 | 28.407 | -3.5 | 5.071 | 0.036
Ga10G0035 | 273 | 29.897 | -1.5 | 5.793 | -0.073
Ga10G2236 | 271 | 29.226 | -1.5 | 5.791 | 0.043
Ga10G2674 | 264 | 28.182 | -4.5 | 4.904 | -0.004
Ga11G2486 | 272 | 29.477 | 4.5 | 9.273 | -0.096
Ga12G1052 | 252 | 27.946 | 3.5 | 7.968 | -0.117
Ga12G1366 | 300 | 32.84 | 6.5 | 8.712 | -0.079
Ga13G0254 | 200 | 21.273 | 3.5 | 9.544 | 0.167
Ga13G0268 | 246 | 26.76 | 0.5 | 6.675 | -0.164
Ga14G0061 | 285 | 31.03 | -2 | 5.724 | -0.083
Gorai.001G016400 | 261 | 28.749 | -2 | 5.696 | -0.029
Gorai.001G074300 | 252 | 27.857 | 2.5 | 7.551 | -0.128
Gorai.001G192000 | 265 | 28.575 | -3.5 | 5.084 | 0.003
Gorai.001G192300 | 265 | 28.609 | -3.5 | 5.084 | -0.001
Gorai.001G220900 | 265 | 28.393 | -3.5 | 5.071 | 0.035
Gorai.002G076400 | 262 | 28.505 | -4.5 | 4.89 | 0.027
Gorai.002G132100 | 268 | 29.357 | 6 | 8.203 | 0.016
Gorai.002G183400 | 292 | 31.461 | -1 | 6.114 | 0.12
Gorai.002G263900 | 285 | 31.144 | -2 | 5.73 | -0.077
Gorai.003G092700 | 349 | 38.267 | 2.5 | 7.493 | 0.081
Gorai.005G219000 | 259 | 27.33 | 2 | 8.425 | 0.126
Gorai.007G163200 | 285 | 31.695 | 7.5 | 9.296 | -0.249
Gorai.008G165200 | 313 | 34.49 | 6 | 8.855 | 0.071
Gorai.008G194000 | 252 | 27.983 | 2.5 | 7.5 | -0.114
Gorai.009G090600 | 247 | 26.821 | 1 | 6.79 | -0.155
Gorai.009G156900 | 265 | 28.101 | -2.5 | 5.329 | 0.027
Gorai.009G262000 | 291 | 30.991 | -2.5 | 5.416 | -0.005
Gorai.009G430800 | 262 | 28.479 | -4.5 | 4.897 | 0.016
Gorai.010G165000 | 192 | 20.226 | -4 | 4.701 | 0.244
Gorai.010G165100 | 265 | 28.165 | -2.5 | 5.329 | 0.014
Gorai.010G198600 | 282 | 30.764 | -2.5 | 5.391 | -0.008
Gorai.010G239300 | 151 | 16.55 | 0.5 | 7.254 | -0.026
Gorai.011G041600 | 264 | 28.21 | -3.5 | 5.085 | -0.008
Gorai.011G089000 | 217 | 23.895 | 0.5 | 7.235 | 0.048
Gorai.011G285900 | 273 | 29.883 | -1.5 | 5.789 | -0.073
Gorai.012G141200 | 260 | 27.805 | 3 | 8.943 | 0.055
Gorai.013G026000 | 246 | 26.744 | 0.5 | 6.675 | -0.153
In the two diploid cotton species, G. arboreum and G. raimondii, the physicochemical properties of the LHC proteins differed slightly in molecular weight, protein length, pI, molecular charge, and GRAVY. Protein length ranged from 114 aa to 610 aa and from 151 aa to 349 aa, molecular weight from 12.823 to 68.741 kDa and from 16.55 to 38.267 kDa, and charge from −6 to 9 and from −4.5 to 7.5 in G. arboreum and G. raimondii, respectively (Table 1).
On the other hand, the pI and GRAVY ranges were almost the same in the two species: pI ranged from 4.89 to 9.897 and from 4.701 to 9.296, and GRAVY from −0.377 to 0.167 and from −0.249 to 0.244, in G. arboreum and G. raimondii, respectively. In all three cotton species, GRAVY values were low in magnitude, whether positive or negative, which may indicate an enhanced interaction with water and a largely hydrophilic nature.
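As an illustration of how a GRAVY value of the kind listed in Table 1 is obtained, the following minimal sketch averages the Kyte-Doolittle hydropathy index over all residues of a sequence. The tool actually used for Table 1 is not stated in this section, so the function name `gravy` and the toy peptide are assumptions made only for this example; the peptide is not a cotton LHC sequence.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("empty sequence")
    # A KeyError here signals a non-standard residue code in the input.
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


# Hypothetical peptide used only to exercise the function.
toy_peptide = "MKDERS"
print(f"length = {len(toy_peptide)} aa, GRAVY = {gravy(toy_peptide):.3f}")
```

More negative values point to a more hydrophilic sequence, and more positive values to a more hydrophobic one.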
Phylogenetic Tree and Synteny block Analysis of the Cotton LHC Proteins
The phylogenetic tree grouped the cotton Light-Harvesting Chloro a/b binding proteins, together with those of other plants, into 12 clades. Numerous homologous gene pairs were formed among the proteins encoded by the cotton Light-Harvesting Chloro a/b binding genes (Fig. 1A).
Collinearity among the three cotton species was analyzed using the Circle gene viewer in TBtools (Chen et al., 2018) to identify collinear gene pairs. The analysis compared the A genome (G. arboreum), the D genome (G. raimondii), and the At and Dt subgenomes of G. hirsutum in three pairings: A vs D, A vs At, and D vs Dt. We found good collinearity in each pairing, with 23 genes for A vs D, 20 genes for A vs At, and 23 genes for D vs Dt (Fig. 1B).
Gene Ontology Analysis
Gene Ontology (GO) has a structure that allows powerful comparisons and inferences about gene functions at the biological, cellular, and molecular levels (Gene & Consortium, 2000). Putative functions of the 109 genes in the Gossypium Light-Harvesting Chloro a/b-bind gene family, covering biological processes (BP), molecular functions (MF), and cellular components (CC), were identified using the agriGO online analysis tool.
In G. hirsutum, the biological process (GO:0008150) terms included cellular and metabolic processes, and the cellular component (GO:0005575) terms were cell and cell part. Similarly, in G. arboreum, the biological process (GO:0008150) terms covered response to stimuli as well as cellular and metabolic processes. The cellular component (GO:0005575) terms focused on cell, macromolecular complex (protein), and membrane, whereas the molecular function (GO:0003674) terms were related to binding. In G. raimondii, the biological process (GO:0008150) terms were cellular and metabolic processes, as in G. hirsutum, whereas the cellular component (GO:0005575) term was related to membrane. In both G. hirsutum and G. raimondii, no GO term was significant for molecular function (Fig. 2).
Gene Structure and Motif Identification of Chloro a/b-bind Proteins
Gene structural diversity is regarded as a possible indicator of the evolution of multigene families. To gain further insight into the structural diversity of cotton Light-Harvesting Chloro a/b-bind genes, the exon/intron organization of the full-length cDNAs was compared with the corresponding genomic DNA sequences of individual genes in G. hirsutum. A large proportion of the Light-Harvesting Chloro a/b-bind genes and their exons were highly conserved within the group.
In the gene structure analysis, some of the Light-Harvesting Chloro a/b-bind genes were interrupted by introns. The maximum number of introns was 11 (Gh_A02G1068), 11 (Ga02G0756), and 5 (Gorai.003G092700) in G. hirsutum, G. arboreum and G. raimondii, respectively. In G. hirsutum, the highest numbers of exons and introns were found in Gh_A02G1068 (12 exons, 11 introns) and Gh_A01G0519 (10 exons, 9 introns). Notably, exons and introns differed in length among the Light-Harvesting Chloro a/b-bind genes. The most common structure was two exons and one intron: 18 genes had two exons and one intron, seven genes had three exons and two introns, and seven genes had one exon and no intron (Fig. 3).
In the diploid species, the highest numbers of exons and introns were 12 exons and 11 introns (Ga02G0756) and 11 exons and 10 introns (Ga01G0731) in G. arboreum, and 6 exons and 5 introns in both Gorai.003G092700 and Gorai.009G262000 in G. raimondii. Seven genes in G. arboreum and ten in G. raimondii had two exons and one intron. In both species, five genes had three exons and two introns, and three genes had a single exon and no intron. To explore the structural evolution of LHC proteins, motif patterns were analyzed. A total of 20 different motifs were detected by MEME analysis (http://meme-suite.org/) in the three Gossypium species (Fig. 4). Among the identified motifs, motifs 3, 4 and 12 were conserved in G. hirsutum, motifs 2 and 8 in G. arboreum, and motifs 11 and 4 in G. raimondii.
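Exon/intron tallies such as those reported above can be derived from a genome annotation by counting exon features per gene. The sketch below assumes GFF3-style, tab-separated input in which each exon carries a `Parent` attribute; the annotation lines, gene names and coordinates are invented for illustration and do not come from the cotton annotations.

```python
from collections import Counter, defaultdict


def exon_counts(gff_lines):
    """Count exon features per parent ID in GFF3-formatted lines."""
    counts = defaultdict(int)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "exon":
            continue  # keep only well-formed exon records
        attributes = dict(
            item.split("=", 1) for item in fields[8].split(";") if "=" in item
        )
        parent = attributes.get("Parent")
        if parent:
            counts[parent] += 1
    return dict(counts)


def structure_tally(counts):
    """Tally genes by (exons, introns), with introns = exons - 1."""
    return Counter((n, n - 1) for n in counts.values())


# Hypothetical annotation lines; coordinates are invented for illustration.
example_gff = [
    "chr1\tsketch\texon\t100\t300\t.\t+\t.\tID=e1;Parent=geneA",
    "chr1\tsketch\texon\t450\t800\t.\t+\t.\tID=e2;Parent=geneA",
    "chr2\tsketch\texon\t50\t900\t.\t-\t.\tID=e3;Parent=geneB",
]
counts = exon_counts(example_gff)
for (exons, introns), n_genes in sorted(structure_tally(counts).items()):
    print(f"{n_genes} gene(s) with {exons} exon(s) and {introns} intron(s)")
```

The relation introns = exons − 1 holds for a single transcript per gene, which is the simplification made here.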
Chromosomal Mapping Analysis of the Light-Harvesting Chloro a/b binding Genes
The LHC genes were evenly distributed across the various chromosomes of the A2, D5, and (AD)1 cotton genomes. In the At subgenome of the tetraploid (AD)1 genome, the highest numbers of gene loci were found on chromosomes At01, At05, and At10, with 3 genes each, while chromosomes At03, At08, and At09 harbored none. Similarly, in the Dt subgenome, the highest numbers of gene loci were found on Dt07, Dt01, and Dt05, with 5, 4, and 4 genes, respectively, whereas At03, At08, and At09 had zero genes. The remaining chromosomes harbored 1 to 3 genes (Fig. 5A and B). In the two diploid cotton species, with the A2 and D5 genomes, the distribution pattern was different. In G. arboreum, the highest numbers of gene loci were observed on chromosomes A205 and A207, with 4 genes each, while in G. raimondii, chromosomes D501, D509, and D510 carried the highest numbers of gene loci, with 4 genes each; chromosomes A204 and D506 harbored none (Fig. 5C and D).
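A per-chromosome tally of this kind can be sketched directly from gene identifiers, under the assumption (not stated in the text) that a G. hirsutum ID such as Gh_D07G1663 encodes the subgenome letter and chromosome number after the `Gh_` prefix. The helper `chromosome_of` is hypothetical, and the list below contains only the Table 1 genes whose IDs place them on D01, D05 and D07, plus one scaffold gene.

```python
import re
from collections import Counter

# Assumed ID layout: Gh_<A|D><two-digit chromosome>G<gene number>.
GH_ID = re.compile(r"^Gh_([AD])(\d{2})G\d+$")


def chromosome_of(gene_id):
    """Return a label such as 'Dt07', or None for scaffold or unparsed IDs."""
    match = GH_ID.match(gene_id)
    if match is None:
        return None
    subgenome, number = match.groups()
    return f"{subgenome}t{number}"


# Subset of gene IDs taken from Table 1.
gene_ids = [
    "Gh_D07G1663", "Gh_D07G1659", "Gh_D07G0661", "Gh_D07G1929", "Gh_D07G0125",
    "Gh_D01G1508", "Gh_D01G1028", "Gh_D01G2232", "Gh_D01G0531",
    "Gh_D05G1429", "Gh_D05G3484", "Gh_D05G2361", "Gh_D05G0860",
    "Gh_Sca017783G01",  # scaffold gene, not assigned to a chromosome
]

labels = [chromosome_of(gene_id) for gene_id in gene_ids]
tally = Counter(label for label in labels if label is not None)
for chromosome, n_genes in tally.most_common():
    print(chromosome, n_genes)
```

Under this assumption, the tally for these genes gives 5, 4, and 4 loci on Dt07, Dt01, and Dt05, the same counts as reported for the Dt subgenome.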
Identification of Cis-regulatory elements
Cis-acting regulatory elements are important molecular switches involved in the transcriptional regulation of a dynamic network of gene activities controlling various biological processes, including abiotic stress responses, hormone responses, and developmental processes. They encode the genomic blueprints for coordinating the spatiotemporal gene expression programs underlying highly specialized cell functions (Mao et al., 2020). In the PlantCARE analysis, the cis-regulatory elements ABRE, ARE, MRE, MYB, AT-rich elements, DRE, MBS, Box-4, and ACE, which are related to drought stress, were found in the three cotton species (Fig. 6). The major cis-acting elements, such as the ABA-responsive element (ABRE) and the dehydration-responsive element/C-repeat (DRE/CRT), are a vital part of ABA-dependent and ABA-independent gene expression in osmotic and cold stress responses (Yamaguchi-Shinozaki & Shinozaki, 2005).
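To make the idea of a cis-element search concrete, the sketch below scans a promoter sequence on both strands for two core motifs: ACGTG for ABRE and A/GCCGAC for DRE/CRT. This is a simplified stand-in and not the PlantCARE procedure, which relies on its own motif database; the core patterns chosen, the function names and the toy promoter are assumptions for illustration.

```python
import re

# Simplified core motifs (assumed for this sketch, not PlantCARE entries).
MOTIFS = {
    "ABRE": "ACGTG",         # core of the ABA-responsive element
    "DRE/CRT": "[AG]CCGAC",  # core of the dehydration-responsive element
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan_promoter(seq):
    """Return 0-based motif start positions on the '+' and '-' strands.

    Positions on '-' refer to the reverse-complemented sequence.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead reports overlapping occurrences as well.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = {
            "+": [m.start() for m in regex.finditer(seq)],
            "-": [m.start() for m in regex.finditer(rc)],
        }
    return hits


# Hypothetical promoter fragment, not a cotton sequence.
toy_promoter = "TTACGTGGCAAAGCCGACTT"
print(scan_promoter(toy_promoter))
```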
Evolution of LHC genes in Gossypium species
In gene evolution, Ks is generally not affected by natural selection, whereas Ka is. A Ka/Ks value > 1, = 1, or < 1 indicates positive, neutral, or negative selection, respectively (Zhao et al., 2020b). The distributions of Ka, Ks, and Ka/Ks among homologous pairs of the Gossypium species were similar (Fig. 7, Table S3). The Ka/Ks of GhAt-Ga ranged from 0 to 0.949034416, and that of GhDt-Gr from 0 to 0.838286204. The Ka/Ks of GhAt-GhDt ranged from 0 to 0.523637063, whereas that of Ga-Gr ranged from 0 to 0.755930549. In all pairs, the Ka/Ks value was < 1, which indicates that the gene family was subjected to negative selection. This result suggests that the LHC genes of G. hirsutum, derived from G. raimondii and G. arboreum, experienced negative selection pressure throughout their evolution.
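The decision rule described above can be written as a short function. The sketch below applies it to the upper bounds of the Ka/Ks ranges reported for each comparison; the function names `selection_mode` and `ka_ks` are illustrative, and the handling of Ks = 0 is an assumption, since the ratio is undefined in that case.

```python
def selection_mode(ratio, tol=1e-9):
    """Classify a Ka/Ks ratio as positive, neutral, or negative selection."""
    if abs(ratio - 1.0) <= tol:
        return "neutral"
    return "positive" if ratio > 1.0 else "negative"


def ka_ks(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    return None if ks == 0 else ka / ks


# Upper bounds of the Ka/Ks ranges reported in the text.
reported_upper_bounds = {
    "GhAt-Ga": 0.949034416,
    "GhDt-Gr": 0.838286204,
    "GhAt-GhDt": 0.523637063,
    "Ga-Gr": 0.755930549,
}

modes = {pair: selection_mode(r) for pair, r in reported_upper_bounds.items()}
for pair, mode in modes.items():
    print(f"{pair}: max Ka/Ks = {reported_upper_bounds[pair]:.3f} -> {mode}")
```

Because even the largest reported ratio in each comparison is below 1, every pair within those ranges falls under negative selection by this rule.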
RT-qPCR Validation of Light-Harvesting Chloro a/b binding genes under Water Deficit Conditions
The expression profiles of twenty-seven LHC genes were analyzed under drought stress conditions in different tissues and at varying time intervals. The genes showed differential expression patterns across the tissues analyzed. In root tissues, the highly upregulated genes were Gh_D10G2385, Gh_A13G0222, Gh_A05G0725, Gh_D05G0860, Gh_D07G0661, Gh_D01G1508, Gh_D12G1495, Gh_A07G2182, and Gh_A10G2108, while in leaf tissues, Gh_A07G2184, Gh_D10G2385, Gh_D05G0860, Gh_D02G1996, Gh_A13G0222, and Gh_A05G0725 showed higher upregulation after 12 h of stress exposure. Similarly, Gh_A13G0222, Gh_D06G1791, and Gh_A06G1447 were upregulated in stem tissues from 6 h up to 24 h (Fig. 8).
Most genes were downregulated, mainly in leaf tissue followed by stem. Genes such as Gh_A10G0361, Gh_D10G0369, Gh_A03G2154, and Gh_D03G0610 were downregulated in all three tissues at almost all time points. In general, many genes were upregulated in root tissue. Gh_A13G0222 (CAB6A) was upregulated in all tissue samples, and Gh_D10G2385 (LHCB4), Gh_D05G0860 (CAB6A), and Gh_A05G0725 (CAB6A) were also upregulated in leaf and root tissues under drought stress. A detailed exploration of these genes will offer useful information for understanding LHC genes in cotton (Gossypium) and their role in drought stress tolerance. The effect of drought is first felt in the root zone, and the higher upregulation of various genes in root tissues is in line with earlier results in which most LEA genes were upregulated in root tissues relative to leaf and stem tissues under drought stress (Magwanga et al., 2018).
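Up- and downregulation in RT-qPCR experiments is commonly expressed as a fold change computed with the 2^(−ΔΔCt) method. This section does not state which quantification method was used, so the sketch below is an assumption shown only to illustrate how such calls are typically derived; the Ct values are hypothetical and are not measurements from this study.

```python
def relative_expression(ct_target_treated, ct_ref_treated,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene by the 2^(-ddCt) method.

    Each dCt normalizes the target gene to a reference gene; ddCt then
    compares the stressed sample with the unstressed control.
    """
    delta_treated = ct_target_treated - ct_ref_treated
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** -(delta_treated - delta_control)


def regulation_call(fold_change):
    """Label a fold change relative to the control level of 1."""
    if fold_change > 1.0:
        return "upregulated"
    if fold_change < 1.0:
        return "downregulated"
    return "unchanged"


# Hypothetical Ct values for one gene in one tissue at one time point.
fold = relative_expression(
    ct_target_treated=22.0, ct_ref_treated=18.0,
    ct_target_control=24.0, ct_ref_control=18.0,
)
print(fold, regulation_call(fold))
```

In practice a fold-change threshold and replicate statistics would be applied before calling a gene up- or downregulated; the simple comparison with 1 is a simplification.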