Genome-wide Identification and Characterization of CCCH Proteins in Barley
The most recent barley Morex assembly was used to identify barley CCCH genes. Several complementary approaches were combined to obtain the most comprehensive results. First, the AtC3H and OsC3H proteins, together with the HMM profile of the CCCH domain in Pfam (PF00642), were used as queries to search against the barley proteins. A total of 52 putative CCCH-encoding genes were acquired from the barley genome. The protein sequences of the HvC3H candidates were then submitted to the Pfam, NCBI-CDD, and SMART online databases to confirm that the putative genes contained the CCCH domain. Only proteins that were verified by at least two databases were retained (Supplementary Table S4). A total of 17 genes were excluded from our dataset because no complete CCCH domain was verified by these online databases. The remaining 35 high-confidence HvC3H genes with complete open reading frames accounted for 1.07% of the total annotated protein-coding genes in barley (Table 1). Because there is no standard nomenclature for barley CCCH genes, the candidate HvC3Hs were designated HvC3H1 to HvC3H35 according to their chromosome number and location. A BLAST search against the barley ESTs indicated that 32 of the 35 HvC3Hs had EST records, which supported the existence of these genes. Analysis of the physicochemical properties of the HvC3Hs showed that protein length varied from 211 amino acids (HvC3H25) to 1,456 amino acids (HvC3H4), with an average length of 457.2 amino acids. The pI varied from 5.13 to 10.15, and the MW ranged from 4.501 kDa to 160.371 kDa. All of these CCCH proteins had negative GRAVY values (average value: -0.7715), indicating that the HvC3Hs are hydrophilic. Subcellular localization prediction showed that most of these proteins were located in the nucleus (29 HvC3Hs; 82.86%). In addition, three HvC3Hs were located in the chloroplast, and the other three were located in both the nucleus and chloroplast, which was consistent with the localization of CCCH proteins in Arabidopsis, rice, and wheat [11, 44].
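The GRAVY values reported here are presumably the standard grand average of hydropathicity, that is, the mean Kyte-Doolittle hydropathy score over all residues of a protein; the text does not name the tool that was used. A minimal Python sketch under that assumption is given below. The function names and the toy peptide are hypothetical and are not taken from the study.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Return the mean Kyte-Doolittle score over the standard residues."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence.upper() if aa in KYTE_DOOLITTLE]
    if not scores:
        raise ValueError("sequence contains no standard amino acids")
    return sum(scores) / len(scores)


def is_hydrophilic(sequence):
    """A negative GRAVY value is conventionally read as hydrophilic."""
    return gravy(sequence) < 0


# Hypothetical toy peptide (not an HvC3H sequence), rich in charged residues.
toy_peptide = "MKRSDEEKCH"
print(round(gravy(toy_peptide), 3), is_hydrophilic(toy_peptide))
```

Applied to the full-length HvC3H proteins, a calculation of this kind would be expected to return negative values for every protein, in line with Table 1.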
Table 1
Characteristics of CCCH transcription factor gene family in barley.
Gene Name | Gene ID | Protein Length (aa) | Isoelectric Point | Molecular Weight (kDa) | Subcellular Location | Grand Average of Hydropathicity | ESTs Hit |
HvC3H1 | HORVU.MOREX.r2.1HG0072390 | 304 | 9.15 | 34.276 | Nucleus | -1.163 | 11 |
HvC3H2 | HORVU.MOREX.r2.1HG0074950 | 402 | 9.57 | 42.999 | Nucleus | -0.305 | 2 |
HvC3H3 | HORVU.MOREX.r2.1HG0074970 | 327 | 9.42 | 35.232 | Nucleus | -0.389 | 6 |
HvC3H4 | HORVU.MOREX.r2.1HG0078680 | 1456 | 5.13 | 160.373 | Nucleus | -0.84 | 14 |
HvC3H5 | HORVU.MOREX.r2.2HG0091490 | 697 | 6.26 | 73.624 | Chloroplast | -0.384 | 25 |
HvC3H6 | HORVU.MOREX.r2.2HG0140180 | 341 | 10.15 | 38.974 | Nucleus | -1.009 | 22 |
HvC3H7 | HORVU.MOREX.r2.2HG0166340 | 304 | 9.38 | 31.487 | Nucleus | -0.623 | 33 |
HvC3H8 | HORVU.MOREX.r2.2HG0176080 | 695 | 5.49 | 77.804 | Nucleus | -1.149 | 11 |
HvC3H9 | HORVU.MOREX.r2.3HG0196310 | 379 | 7.51 | 40.767 | Nucleus | -0.53 | 57 |
HvC3H10 | HORVU.MOREX.r2.3HG0200320 | 224 | 9.13 | 24.591 | Nucleus | -0.81 | 0 |
HvC3H11 | HORVU.MOREX.r2.3HG0200330 | 232 | 6.17 | 24.956 | Nucleus | -0.616 | 1 |
HvC3H12 | HORVU.MOREX.r2.3HG0210880 | 467 | 7.85 | 49.838 | Nucleus | -0.493 | 53 |
HvC3H13 | HORVU.MOREX.r2.3HG0210900 | 426 | 8.07 | 4.502 | Nucleus | -0.658 | 16 |
HvC3H14 | HORVU.MOREX.r2.3HG0225190 | 500 | 8.67 | 54.722 | Nucleus | -0.648 | 32 |
HvC3H15 | HORVU.MOREX.r2.3HG0228250 | 384 | 8.6 | 42.042 | Nucleus | -0.419 | 9 |
HvC3H16 | HORVU.MOREX.r2.3HG0230880 | 281 | 9.59 | 32.735 | Chloroplast Nucleus | -1.177 | 64 |
HvC3H17 | HORVU.MOREX.r2.3HG0258540 | 435 | 8.82 | 47.4 | Nucleus | -0.564 | 61 |
HvC3H18 | HORVU.MOREX.r2.4HG0318770 | 299 | 8.3 | 32.522 | Nucleus | -0.595 | 2 |
HvC3H19 | HORVU.MOREX.r2.4HG0325540 | 326 | 7.14 | 36.231 | Nucleus | -1.003 | 11 |
HvC3H20 | HORVU.MOREX.r2.5HG0362710 | 691 | 9.39 | 78.976 | Nucleus | -1.218 | 5 |
HvC3H21 | HORVU.MOREX.r2.5HG0374920 | 509 | 6.39 | 55.923 | Nucleus | -0.702 | 23 |
HvC3H22 | HORVU.MOREX.r2.5HG0377520 | 442 | 8.78 | 47.516 | Nucleus | -0.504 | 20 |
HvC3H23 | HORVU.MOREX.r2.5HG0407060 | 314 | 9.25 | 36.792 | Chloroplast Nucleus | -1.241 | 36 |
HvC3H24 | HORVU.MOREX.r2.5HG0429150 | 752 | 7.39 | 85.417 | Nucleus | -1.256 | 5 |
HvC3H25 | HORVU.MOREX.r2.6HG0475520 | 211 | 9.17 | 23.087 | Nucleus | -0.813 | 0 |
HvC3H26 | HORVU.MOREX.r2.6HG0475540 | 358 | 8.63 | 38.145 | Chloroplast | -0.34 | 0 |
HvC3H27 | HORVU.MOREX.r2.6HG0475570 | 308 | 9.49 | 31.997 | Chloroplast | -0.541 | 39 |
HvC3H28 | HORVU.MOREX.r2.6HG0505660 | 359 | 6.66 | 40.211 | Nucleus | -0.46 | 75 |
HvC3H29 | HORVU.MOREX.r2.6HG0515160 | 1001 | 8.81 | 110.273 | Nucleus | -1.121 | 16 |
HvC3H30 | HORVU.MOREX.r2.6HG0526270 | 647 | 5.23 | 71.397 | Nucleus | -1.069 | 45 |
HvC3H31 | HORVU.MOREX.r2.7HG0560290 | 489 | 9.46 | 55.275 | Nucleus | -0.79 | 59 |
HvC3H32 | HORVU.MOREX.r2.7HG0579580 | 363 | 6.85 | 38.791 | Nucleus | -0.706 | 20 |
HvC3H33 | HORVU.MOREX.r2.7HG0600900 | 297 | 9.64 | 30.833 | Chloroplast Nucleus | -0.541 | 46 |
HvC3H34 | HORVU.MOREX.r2.7HG0602740 | 407 | 8.56 | 44.684 | Nucleus | -0.954 | 26 |
HvC3H35 | HORVU.MOREX.r2.7HG0607870 | 375 | 9.32 | 42.524 | Nucleus | -1.372 | 6 |
CCCH Domain Structure Analysis of HvC3Hs
To evaluate the degree to which CCCH proteins in barley are evolutionarily conserved, the full-length proteins were used to characterize the domain organization of HvC3Hs. Significant differences in the domain organization of HvC3Hs were observed. A total of 77 CCCH motifs (C-X7-10-C-X4-5-C-X3-H) in five domain organizations were identified, with an average of 2.2 CCCH motifs per protein. CCCH proteins have been shown to have one to six CCCH motifs (Figure 1) [11, 22, 49, 50], and the same range was observed in our study, except that no HvC3H protein contained four CCCH motifs. Approximately 77.14% (27) of the HvC3H proteins had one or two CCCH motifs, 14.29% (5) contained five copies, and 5.71% (2) had six copies, which leaves one protein (2.86%) with three copies (Supplementary Table S5); similar patterns have also been observed in rice and Arabidopsis [51]. Although the frequencies of the individual CCCH motif types differed among barley CCCH genes, the two conventional motifs C-X8-C-X5-C-X3-H and C-X7-C-X5-C-X3-H were the most common, suggesting that C-X7-8-C-X5-C-X3-H might be ancestral to the other CCCH motifs (Supplementary Figure S1). Additionally, four non-conventional CCCH zinc finger motifs were observed (two C-X7-C-X4-C-X3-H, one C-X9-C-X5-C-X3-H, and one C-X10-C-X5-C-X3-H), all of which were previously identified as abundant non-conventional CCCH motifs in Arabidopsis and rice. In contrast to other plants, the uncommon CCCH motifs C-X4-6,11-15-C-X4-5-C-X3-H were not detected, and no novel motif types were found in barley CCCH proteins, suggesting that the CCCHs in barley are highly conserved. Aside from the CCCH zinc finger motifs, some HvC3H proteins also contained other known functional domains, such as KH, RRM, and RING. Four HvC3H members (HvC3H7, -26, -27, and -33) possessed the KH domain, and five (HvC3H1, -5, -16, -23, and -24) possessed the RRM domain.
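The motif pattern C-X7-10-C-X4-5-C-X3-H maps directly onto a regular expression, so a simple scan can illustrate how motif types such as C-X8-C-X5-C-X3-H are tallied. The Python sketch below is an illustration only: the study relied on Pfam, NCBI-CDD, and SMART for domain verification, and the function name and toy sequence are hypothetical.

```python
import re
from collections import Counter

# C-X7-10-C-X4-5-C-X3-H as a regular expression. The lookahead lets matches
# start at every cysteine, so overlapping candidates are not skipped.
# Quantifiers are greedy, so the longest admissible spacer is tried first.
CCCH_PATTERN = re.compile(r"(?=C(.{7,10})C(.{4,5})C.{3}H)")


def find_ccch_motifs(sequence):
    """Return (start_index, motif_type) for each CCCH-like match."""
    hits = []
    for match in CCCH_PATTERN.finditer(sequence.upper()):
        first_spacer, second_spacer = match.group(1), match.group(2)
        motif_type = f"C-X{len(first_spacer)}-C-X{len(second_spacer)}-C-X3-H"
        hits.append((match.start(), motif_type))
    return hits


# Hypothetical toy sequence containing one conventional C-X8-C-X5-C-X3-H motif.
toy_sequence = "MA" + "C" + "A" * 8 + "C" + "A" * 5 + "C" + "A" * 3 + "H" + "GG"
hits = find_ccch_motifs(toy_sequence)
print(hits)
print(Counter(motif_type for _, motif_type in hits))
```

A pattern scan of this kind only flags candidate spacings; it does not replace profile-based domain verification.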
Phylogenetic Relationships, Gene Structure, and Conserved Domain Organization of HvC3H Genes
To determine the evolutionary relationships among HvC3Hs, a neighbor-joining (NJ) phylogenetic tree was constructed based on the alignment of the full-length CCCH protein sequences of barley. According to the criteria proposed by Wang and Peng et al. with slight modifications [11, 13], the 35 HvC3H family members within the phylogenetic tree were classified into seven subfamilies (group I to group VII) (bootstrap values >60%) (Figure 2A). Sixteen HvC3Hs formed eight sister gene pairs, seven of which possessed high bootstrap support (98%). Most of the phylogenetic clades had high bootstrap support; the relationships of some CCCH proteins (eight HvC3Hs) remained ambiguous because of their low bootstrap values at deep nodes. The number of CCCH proteins varied greatly for different subfamilies; group I and group II had seven and five members, respectively, whereas groups V and VI had only two members. We also constructed another phylogenetic tree based on the alignment of 170 CCCH proteins from Arabidopsis (68), rice (67), and barley (35) (Supplementary Figure S2). The phylogenetic tree revealed an alternating distribution of monocot and eudicot CCCH genes in most of the subfamilies. Within each subfamily, the HvC3H genes tended to cluster with their Arabidopsis and/or rice orthologs, suggesting that the orthologous genes might have evolved from a common ancestor when duplication events took place prior to species divergence. However, some groups consist of Arabidopsis and rice CCCH genes but lack barley genes. This indicates a presumed barley-specific loss of CCCH genes following the divergence of monocots and dicots.
The intron–exon gene structure provides important insights into not only the functional diversification of genes but also the evolutionary relationships within a gene family [52]. Unlike other TF family genes, which tend to lack introns, the HvC3Hs had an average of 3.97 introns (ranging from 0 to 12) (Figure 2B). In general, genes within the same subfamily had a similar intron–exon structure. For example, genes from subfamily VII tended to be intron-less, and genes within subfamilies II, IV, and VI were nearly identical in their intron/exon lengths and structural organization. By contrast, the intron–exon structure of subfamily I was highly variable. For example, HvC3H14 possessed 12 introns, whereas HvC3H12, HvC3H17, and HvC3H22 had six introns. HvC3H genes within subfamily I also had variable intron lengths.
To gain additional insights into the functional regions of HvC3Hs and their evolutionary relationships, the distribution of conserved motifs of HvC3Hs was visualized (Figure 2C). Consistent with the patterns in intron–exon gene structure, HvC3H proteins within the same subfamily tended to have a similar organization of motifs, and the patterns were highly variable among different phylogenetic clades. For example, the HvC3Hs in subfamily VII possessed two CCCH motifs and one RRM motif. Subfamilies IV, V, and VI had only one CCCH motif. Although the gene structure varied greatly among subfamily I members, the protein motif composition was conserved; there were five to seven CCCH motifs for each protein. The variation in gene structure and motif composition among subfamilies suggests prior sub-functionalization or neofunctionalization of these HvC3Hs.
Chromosomal Distribution and Gene Duplication
Chromosome location analysis revealed that the 35 HvC3H genes were unevenly distributed across the seven barley chromosomes, with chromosome 3H carrying the most CCCH genes (nine HvC3Hs) (Supplementary Figure S3). By contrast, chromosomes 1H, 2H, and 4H had four, four, and two CCCH genes, respectively. Chromosome 6H contained six CCCH genes, and chromosomes 5H and 7H each had five. There was no significant correlation between the number of HvC3H genes and chromosome length (Pearson correlation r = 0.0278, p-value = 0.9528), indicating that longer chromosomes do not carry more HvC3H genes.
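The correlation test between gene number and chromosome length can be sketched as follows. The per-chromosome gene counts are those stated in the text; the chromosome lengths are hypothetical placeholders, because the lengths used in the study are not given here, so the printed coefficient is not expected to reproduce r = 0.0278. The p-value, which requires a t-distribution, is omitted.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return covariance / (spread_x * spread_y)


# HvC3H gene counts for chromosomes 1H to 7H, as stated in the text.
gene_counts = [4, 4, 9, 2, 5, 6, 5]

# HYPOTHETICAL placeholder lengths in Mb, for illustration only.
placeholder_lengths_mb = [500, 650, 620, 640, 590, 560, 630]

print(round(pearson_r(gene_counts, placeholder_lengths_mb), 4))
```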
Gene duplication is considered one of the primary drivers of gene family expansion in plants and plays an important role in the evolution of new gene functions and adaptation [53, 54]. A total of three duplicated gene pairs were identified (Figure 3). Two gene pairs (HvC3H25/HvC3H33 and HvC3H28/HvC3H31) were segmentally duplicated; in both pairs, one member was located on chromosome 6H and the other on chromosome 7H. HvC3H10/HvC3H11 were tandemly duplicated genes located on chromosome 3H. The two members of each duplicated pair clustered in the same clade. Specifically, HvC3H10/HvC3H11 clustered in subfamily III, HvC3H25/HvC3H33 were assigned to subfamily II, and HvC3H28/HvC3H31 belonged to subfamily VI. These results indicate that segmental and tandem duplication events might have played a role in the expansion of the HvC3H gene family.
To further evaluate the evolutionary constraints acting on HvC3Hs, the Ka/Ks ratio of each duplicated gene pair was calculated. The Ka/Ks ratios of the segmentally duplicated genes HvC3H28/HvC3H31 and HvC3H25/HvC3H33 and of the tandemly duplicated genes HvC3H10/HvC3H11 were all lower than 1, indicating purifying selection (Supplementary Table S6) [55]. The duplication events were estimated to have occurred approximately 22.83–808.04 million years ago (MYA).
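The interpretation of Ka/Ks and the dating of duplication events can be illustrated with a short Python sketch. It assumes the commonly used relation T = Ks / (2λ) and a substitution rate of λ = 6.5 × 10^-9 substitutions per synonymous site per year, a value often applied to grasses; the rate actually used in the study is not stated in this section, so both the rate and the example Ka and Ks values are assumptions.

```python
# ASSUMPTION: synonymous substitution rate often used for grasses.
ASSUMED_RATE = 6.5e-9  # substitutions per synonymous site per year


def selection_mode(ka, ks):
    """Classify selection from the Ka/Ks ratio of a gene pair."""
    if ks <= 0:
        raise ValueError("Ks must be positive")
    ratio = ka / ks
    if ratio < 1:
        return "purifying"
    if ratio > 1:
        return "positive"
    return "neutral"


def divergence_time_mya(ks, rate=ASSUMED_RATE):
    """Estimate divergence time in million years as Ks / (2 * rate)."""
    return ks / (2 * rate) / 1e6


# Hypothetical Ka and Ks values for one duplicated pair.
example_ka, example_ks = 0.05, 0.30
print(selection_mode(example_ka, example_ks))
print(round(divergence_time_mya(example_ks), 2))
```

Under these assumptions, the younger bound of 22.83 MYA would correspond to a Ks of roughly 0.3.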
Syntenic relationships with six other representative species, including three monocots (Zea mays, Oryza sativa, and Triticum aestivum) and three dicots (Brassica rapa, Solanum lycopersicum, and Glycine max), were analyzed to determine the mechanisms underlying the evolution of HvC3Hs (Figure 4). A total of 63, 24, and 16 orthologous gene pairs were identified between barley and Triticum aestivum, Zea mays, and Oryza sativa, respectively. A total of 16 of 24 (66.7%) of the HvC3H genes were orthologous to three copies of CCCH genes in wheat, which might stem from the fact that allohexaploid wheat contains three distinct ancestral genomes, namely A, B, and D [56]. HvC3H11 on chromosome 3H was orthologous to four CCCH genes in wheat, which were located on chromosome 3A (one gene), chromosome 3B (one gene), and chromosome 3D (two genes). In addition, six HvC3Hs had two orthologous genes, and two HvC3Hs had one orthologous gene in wheat. This inconsistency might stem from gene loss and duplication events during wheat polyploidization. By contrast, the numbers of orthologous gene pairs between barley and the three dicots (Glycine max, Brassica rapa, and Solanum lycopersicum) were ten, five, and one, respectively, which were much lower than those between barley and the three monocots. This finding is consistent with the phylogenetic relationships between barley and these species: HvC3Hs are phylogenetically closer to CCCHs in Triticum aestivum, Zea mays, and Oryza sativa than to CCCHs in Glycine max, Brassica rapa, and Solanum lycopersicum. Furthermore, we found that most of the HvC3H genes showed syntenic bias towards particular chromosomes of these plants. For example, the orthologous genes in Oryza sativa were on chromosome 1, whereas those in Zea mays tended to be located on chromosomes 3 and 8, suggesting that chromosomal rearrangements such as inversions or duplications might have shaped the organization and distribution of CCCH genes in these genomes. A total of seven HvC3H genes (HvC3H7, -11, -14, -15, -17, -18, and -31) were syntenic only with Triticum aestivum, Zea mays, and Oryza sativa, suggesting that these orthologous pairs may have formed after the divergence of monocotyledonous and dicotyledonous plants.
The Ka/Ks ratios were calculated for each orthologous gene pair to characterize selection on orthologous gene pairs. The overall Ka/Ks ratios (Triticum aestivum: 0.2733, Oryza sativa: 0.1682, Zea mays: 0.1873, Brassica rapa: 0.0149, Solanum lycopersicum: 0.0166, and Glycine max: 0.0295) of all the orthologous gene pairs were less than 1, suggesting that these HvC3Hs might have experienced strong purifying selection (Supplementary Table S7).
Cis-element Analysis of HvC3H Genes
Cis-elements play important roles in the transcriptional regulation of genes throughout the life cycle of plants. To obtain preliminary insights into the potential function and transcriptional regulation of HvC3H genes, the cis-regulatory elements in their promoter regions were analyzed. A total of 52 functional cis-elements were identified and grouped into five categories. A large number of light-responsive elements were detected in the promoter regions of HvC3Hs, and these accounted for most of the putative cis-elements (Supplementary Table S8, Supplementary Figure S4). We also identified 11 types of hormone-responsive regulatory elements, namely auxin-responsive elements (AuxRR-core, TGA-box, and TGA-element), gibberellin-responsive elements (P-box, GARE-motif, and TATC-box), a salicylic acid-responsive element (TCA-element), MeJA-responsive elements (CGTCA-motif and TGACG-motif), an ABA-responsive element (ABRE), and an ethylene-responsive element (ERE). Several types of biotic and abiotic stress-related regulatory elements were observed in HvC3H promoters. Anaerobic induction elements (44 ARE and 30 GC-motif) were detected in 29 HvC3Hs and were the most abundant cis-regulatory elements involved in the response to environmental stress. A total of 23 low temperature-responsive elements (LTR) and 28 drought-responsive elements (MBS, MYB (myeloblastosis) binding site) were detected in 15 and 17 HvC3Hs, respectively. A total of six HvC3H genes possessed wound-responsive elements (WUN-motif), suggesting that these genes might play essential roles in the response to biotic stress in plants. Additionally, eight types of plant organogenesis-related cis-elements were identified, such as the meristem expression-related element CAT-box (11 genes), the zein metabolism regulation-related element O2-site (seven genes), the endosperm expression-related element GCN4-motif (six genes), and the seed-specific regulation element RY-element (four genes). These findings suggest that HvC3Hs might play an important role in barley growth and development, hormone signal transduction, and the response to biotic and abiotic stress.
Temporal-spatial and Stress-induced Expression Pattern Analysis
Analysis of tissue-/stage-specific expression profiles provides valuable insights into the potential functions of genes in plant species. To investigate the expression patterns of HvC3Hs in different tissues and developmental stages, publicly available RNA-seq data from 16 different tissues were used to estimate the expression levels of HvC3Hs, and the HvC3Hs were clustered into three major groups according to their expression levels (Figure 5). Distinct expression patterns were observed for the HvC3Hs, suggesting that these genes might have undergone significant differentiation and play various roles at particular stages of barley growth and development. The expression levels of HvC3Hs in group I were lower than those of genes in the other groups; eight genes were not expressed in most of the tissues/stages. By contrast, eight genes in group II were highly expressed in all studied tissues/stages. HvC3H17 was predominantly expressed in LOD, CAR15, and EPI, whereas HvC3H1 and HvC3H12 showed high expression in INF2 and LOD, respectively. Genes in group III showed a medium expression level. Within this group, HvC3H5, -7, -9, -27, and -35 tended to be expressed in INF1 and INF2. These findings indicate that these HvC3Hs might be associated with the development of these tissues in barley and might thus be interesting targets for barley breeding. The subfamilies observed in the phylogenetic tree were not consistent with the expression clusters, indicating that sequence similarity was not a strong predictor of expression patterns and functional similarity [57].
We analyzed the expression of HvC3Hs in response to different types of environmental stress. Under cold treatment, six HvC3Hs (HvC3H2, -3, -10, -11, -15, and -26) with FPKM values equal to 0 were excluded from the subsequent analysis (Figure 6A). The rest of the HvC3Hs were clustered into two major groups. Five HvC3H genes displayed increased expression (>2.0-fold change). Among these genes, HvC3H18, HvC3H9, and HvC3H20 showed the strongest induction under cold treatment, with fold changes of 9.21, 3.01, and 2.88, respectively. In addition, the expression of six genes was slightly down-regulated under cold treatment. Specifically, the expression of HvC3H4 and HvC3H17 fell to 0.65-fold and 0.74-fold, respectively, of that in the control, which was not exposed to low temperature. Salt stress induced differential expression patterns of HvC3H genes in the three root regions (Figure 6B). Compared with the unstressed control, a total of four, five, and three HvC3H genes were highly expressed in the meristematic, elongation, and maturation zones, respectively. The most striking example was HvC3H15, whose expression increased 23.16- and 32.50-fold in the meristematic and elongation zones, respectively, relative to the unstressed control. HvC3H17 was up-regulated in all three zones; its expression increased 8.18-, 8.41-, and 2.11-fold in the meristematic, elongation, and maturation zones, respectively. Similarly, the expression of HvC3H22 and HvC3H28 was up-regulated in both the meristematic and elongation zones, and HvC3H9 was significantly up-regulated in the elongation and maturation zones. Under metal ion stress, the expression of HvC3H4, HvC3H9, and HvC3H18 was significantly up-regulated, and the up-regulation of HvC3H9 was induced by copper treatment (Figure 6C). Under zinc stress, HvC3H3 was up-regulated 2.67-fold.
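The fold-change screening described above can be sketched in Python as follows. The sketch assumes that fold change is the simple ratio of treated to control FPKM and that genes with an FPKM of 0 in either condition are dropped, which is one reading of the exclusion rule in the text; the gene names and FPKM values are hypothetical.

```python
def classify_fold_changes(control_fpkm, treated_fpkm, up_threshold=2.0):
    """Map each gene to (fold_change, label), skipping genes with FPKM = 0."""
    results = {}
    for gene, control_value in control_fpkm.items():
        treated_value = treated_fpkm.get(gene, 0.0)
        if control_value == 0 or treated_value == 0:
            continue  # the ratio is undefined or uninformative
        fold_change = treated_value / control_value
        if fold_change > up_threshold:
            label = "up"
        elif fold_change < 1.0:
            label = "down"
        else:
            label = "unchanged"
        results[gene] = (fold_change, label)
    return results


# HYPOTHETICAL FPKM values, for illustration only.
control = {"geneA": 2.0, "geneB": 10.0, "geneC": 0.0, "geneD": 4.0}
cold = {"geneA": 18.0, "geneB": 6.5, "geneC": 0.0, "geneD": 5.0}

for gene, (fold_change, label) in classify_fold_changes(control, cold).items():
    print(gene, round(fold_change, 2), label)
```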
Expression of HvC3Hs under Drought, Salt, Cold, and ABA Treatment by qRT-PCR Analysis
To investigate the expression of HvC3H genes in response to multiple treatments, 26 randomly selected HvC3Hs were subjected to qRT-PCR analysis. Under drought treatment, nine HvC3Hs were up-regulated at all time points (Supplementary Figure S5), and the expression of six of the nine HvC3Hs (HvC3H1, -5, -6, -12, -24, and -33) peaked at 24 h. The expression of HvC3H1 was approximately 54-fold higher than that of the control at 24 h. Several MBS cis-acting elements associated with drought inducibility were also identified within their promoter regions. However, no MBS cis-acting elements were detected in some genes, such as HvC3H5 and HvC3H6, suggesting that unknown regulatory elements related to drought stress might exist for these genes.
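Relative expression from qRT-PCR is commonly computed with the 2^-ΔΔCt method; this section does not state which method was used, so the Python sketch below is an assumption rather than the study's procedure. The Ct values and the use of a single reference gene are hypothetical.

```python
def relative_expression(ct_target_treated, ct_reference_treated,
                        ct_target_control, ct_reference_control):
    """Relative expression (treated vs. control) by the 2^-ddCt method."""
    delta_ct_treated = ct_target_treated - ct_reference_treated
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_treated - delta_ct_control
    return 2 ** (-delta_delta_ct)


# HYPOTHETICAL Ct values: the target amplifies three cycles earlier after
# treatment while the reference gene is unchanged.
fold = relative_expression(
    ct_target_treated=20.0, ct_reference_treated=18.0,
    ct_target_control=23.0, ct_reference_control=18.0,
)
print(fold)
```

Under this method, a 54-fold induction corresponds to a ΔΔCt of about -5.75 cycles.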
After salt treatment, the expression of HvC3H13 was suppressed compared with the control at all time points; the expression of 21 genes was significantly up-regulated, and the expression of these genes peaked at different times (Supplementary Figure S6). For example, the expression of HvC3H1 peaked at 3 h and was up-regulated 43-fold, whereas the expression of HvC3H5, -6, and -12 was initially slightly up-regulated and peaked at 24 h.
After cold treatment, the expression of six genes (HvC3H4, -5, -7, -20, -25, and -27) was inhibited compared with the control, whereas the expression of the other HvC3Hs was up-regulated at specific time points (Supplementary Figure S7). Three HvC3Hs each were up-regulated at 1 h (HvC3H23, -30, and -32), 3 h (HvC3H6, -28, and -33), and 6 h (HvC3H1, -17, and -22), suggesting that these HvC3Hs might primarily function in the initial stage of the response to cold injury. The expression of the other HvC3H genes peaked at 12 h or 24 h. The promoters of these genes were found to possess abundant LTR cis-elements, which might be responsible for their expression profiles under cold stress.
Plant CCCH proteins have been shown to be effective regulators of ABA-mediated stress responses [58]. qRT-PCR analysis showed that ABA treatment had a pronounced effect on the expression patterns of HvC3Hs, and a complex expression profile was observed (Supplementary Figure S8). For example, the expression of HvC3H1 was significantly up-regulated at 1 h and 3 h but down-regulated thereafter. By contrast, the expression of HvC3H24 was down-regulated before 12 h but significantly up-regulated at 24 h. With the exception of HvC3H5, -13, -20, -21, and -30, whose expression was suppressed relative to the control, the expression of the HvC3Hs peaked at different time points. Abundant ABRE cis-elements, which are important cis-acting elements associated with ABA responsiveness, were detected in the promoter regions, including five ABRE cis-elements for HvC3H23, three for HvC3H33, and two for HvC3H17. Overall, our results suggest that ABA responsiveness might be a common feature of HvC3H genes.
Genetic Variation of CCCH Genes
We analyzed the sequence diversity of HvC3H genes at the population level based on exome-capture sequencing datasets. The single nucleotide polymorphism (SNP)-calling pipeline generated 331 high-confidence SNPs, 172 of which were in HvC3H21, followed by HvC3H14 (42) and HvC3H34 (23) (Supplementary Table S9; Supplementary Table S10). Most HvC3H-related SNPs were located within intron regions (318); the rest were non-synonymous (8) and synonymous (5) SNPs, giving a non-synonymous to synonymous ratio of 1.6. There were 277 InDels ranging from 1 to 55 bp in length, and short InDels of 1 to 4 bp (76.90%) were more common than longer InDels (Supplementary Table S11). Similarly, most HvC3H-related InDels were located in introns, which might be explained by variants that do not alter the reading frame being under weaker negative selection than frameshift variants.
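The SNP summary statistics follow directly from the counts given above, as this short Python check shows. Only the variable names are introduced here; the counts are those reported in the text.

```python
# SNP counts by annotation class, as reported in the text.
snp_counts = {"intronic": 318, "non-synonymous": 8, "synonymous": 5}

total_snps = sum(snp_counts.values())
share_percent = {
    category: 100 * count / total_snps for category, count in snp_counts.items()
}
nonsyn_to_syn_ratio = snp_counts["non-synonymous"] / snp_counts["synonymous"]

print(total_snps)
print({category: round(share, 2) for category, share in share_percent.items()})
print(nonsyn_to_syn_ratio)
```

The three classes add up to the 331 reported SNPs, and intronic SNPs make up about 96% of the total.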
To investigate the relatedness among landraces and wild barley accessions worldwide, we carried out principal component analysis using HvC3H-related SNPs (Figure 7A,B; Supplementary Table S12). The first principal component was correlated with the differentiation between landraces and wild barley and explained 25.16% of the total genetic variance; the second and third principal components captured 5.09% and 5.00% of the genetic variance, respectively, and revealed geographical differentiation among barley accessions. These patterns were consistent with the topology of the NJ tree (Figure 7C). The phylogenetic tree revealed genetically divergent clusters that reflected the contrast between wild barley accessions and landraces rather than their geographical origins. ADMIXTURE analysis confirmed these findings (Figure 7D). When K=2, two groups coinciding with landraces and wild barley were observed. Increasing K to 4 provided additional insights. Within barley landraces, we detected two geographically distributed components from Europe and Africa, whereas the rest of the landraces from Mediterranean areas displayed signs of genetic admixture. Within wild barley, accessions from the Southern and Northern Levant regions formed two distinct groups.
Genetic Diversity and Haplotypes of HvC3Hs in Wild and Domesticated Barley Populations
Population-based nucleotide diversity was calculated to assess the occurrence of prior genetic bottlenecks of HvC3H genes during barley domestication. The genetic diversity of HvC3Hs was further analyzed using exome-captured resequencing datasets; a total of 51 barley landraces and 95 wild barley accessions were included in the analysis. The total genetic diversity of HvC3H genes decreased by ~34.94% from the wild (π = 0.07622) to domesticated (π = 0.04959) barley population (Supplementary Figure S9).
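The reported loss of diversity can be reproduced from the two π values, and π itself can be illustrated on a toy alignment. The Python sketch below assumes the usual definition of π as the average number of pairwise differences per site; the toy sequences and function names are hypothetical, and the study's own pipeline is not described in this section.

```python
from itertools import combinations


def nucleotide_diversity(aligned_sequences):
    """Average pairwise differences per site for equal-length sequences."""
    if len(aligned_sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_sequences[0])
    if length == 0 or any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("sequences must be aligned, non-empty, and equal in length")
    pair_differences = [
        sum(a != b for a, b in zip(first, second))
        for first, second in combinations(aligned_sequences, 2)
    ]
    return sum(pair_differences) / len(pair_differences) / length


def percent_reduction(pi_wild, pi_domesticated):
    """Relative loss of diversity from wild to domesticated, in percent."""
    return (pi_wild - pi_domesticated) / pi_wild * 100


# HYPOTHETICAL toy alignment of three sequences.
toy_alignment = ["AAAA", "AAAT", "AAAA"]
print(round(nucleotide_diversity(toy_alignment), 4))

# Values reported in the text for wild and domesticated barley.
print(round(percent_reduction(0.07622, 0.04959), 2))
```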
A haplotype is a set of closely linked SNP alleles in a specific chromosomal region that tend to be inherited together. We constructed a haplotype network for each HvC3H gene using its SNPs. A total of 529 haplotypes belonging to 22 HvC3H genes were observed, with an average of 24.05 haplotypes per gene (Figure 8). A haplotype present in more than half of the wild or landrace population was defined as a dominant haplotype. Four HvC3H genes (HvC3H14, -21, -22, and -30) had no dominant haplotype, whereas 13 HvC3H genes had the same dominant haplotype in both wild and landrace populations. Nevertheless, clear genetic differentiation in haplotypes between wild and domesticated barley accessions was observed. HvC3H20 in wild barley mainly had the AAAAGGGGGGTTTTGGCC haplotype, but domesticated barley mainly had the AAAAGGGGAATTTTGGCC haplotype. The dominant haplotype in wild barley was AAGTTTTCCCTTGGGGAA, but haplotype AAGTTTTCTTTTGGGGTT was the most common in domesticated barley. Some rare haplotypes were also observed for specific HvC3H genes, such as HvC3H32, HvC3H34, and HvC3H35. The appearance of novel allelic variants greatly increased the degree of haplotype polymorphism of HvC3Hs. The rare haplotypes were mainly observed in the wild barley group. This was consistent with the results of the genetic diversity analysis, which indicated that these genes experienced a severe genetic bottleneck during barley domestication and that haplotype diversity decreased in domesticated barley relative to the wild population. By contrast, the haplotype polymorphism of HvC3H14 and HvC3H20 in domesticated barley was higher than that in wild barley. We speculate that artificial breeding has introduced various novel alleles and increased the degree of haplotype polymorphism.
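The dominant-haplotype definition used above (a haplotype carried by more than half of a population) can be expressed as a short Python sketch. The two haplotype strings are the HvC3H20 haplotypes quoted in the text, but the accession counts are hypothetical and chosen only to illustrate the rule.

```python
from collections import Counter


def dominant_haplotype(haplotypes):
    """Return the haplotype carried by more than half of the accessions, or None."""
    if not haplotypes:
        return None
    haplotype, count = Counter(haplotypes).most_common(1)[0]
    return haplotype if count > len(haplotypes) / 2 else None


WILD_TYPE = "AAAAGGGGGGTTTTGGCC"
DOMESTICATED_TYPE = "AAAAGGGGAATTTTGGCC"

# HYPOTHETICAL accession counts, for illustration only.
wild_population = [WILD_TYPE] * 7 + [DOMESTICATED_TYPE] * 3
landrace_population = [WILD_TYPE] * 2 + [DOMESTICATED_TYPE] * 8
mixed_population = [WILD_TYPE] * 5 + [DOMESTICATED_TYPE] * 5

print(dominant_haplotype(wild_population))
print(dominant_haplotype(landrace_population))
print(dominant_haplotype(mixed_population))
```

A population split evenly between two haplotypes has no dominant haplotype under this rule, which is the situation described for genes such as HvC3H14 and HvC3H21.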