Phenotypic variation of GPC shows positive impact on grain weight in RIL population
The study used a recombinant inbred line (RIL) population derived from a WH1105/TAC75 cross, grown in three geographically distinct regions over two consecutive years, with two replications per environment. Six agronomic traits were recorded for the parents and 325 RIL (F7-9) individuals: grain protein content (GPC), days to heading (HD), flowering time (FT), thousand grain weight (TGW), plant height (PH) and tiller number per plant (TN).
GPC was normally distributed (Fig. 1) with a broad-sense heritability (H²) of 0.72, indicating substantial genetic control. The parental line TAC75 had a GPC of 14.61%, while WH1105 had 12.56%. The genotype 'AM274' exhibited the lowest GPC (9.4% in E1), whereas 'AM6' had the highest (18.99% in E3). Genotypes 'AM53' (~16.63%) and 'AM368' (~10.35%) showed the most stable GPC values across environments. The population's GPC ranged from 9.4% to 18.99%, with a mean of 13.4%. The genetic advance as a percentage of mean (GAM) for GPC was 16.56% (Table 1).
Table 1
Statistical analysis of grain protein content (GPC) and other agronomically important traits in WH1105/TAC75 population
| GPC | TGW* | Heading | Flowering | Height | Tiller |
WH1105 | 12.56 | 37.81 | 96 | 102 | 89 | 8 |
TAC75 | 14.42 | 42.74 | 87 | 93 | 112 | 5 |
Maximum | 18.89 | 57.55 | 117 | 124 | 134.42 | 13 |
Minimum | 9.4 | 11.11 | 77 | 81 | 62.33 | 3 |
Grand Mean | 13.4 | 37.37 | 95.84 | 100.95 | 101.95 | 7.62 |
Standard Error of Mean (SEm) | 0.34 | 1.62 | 1.18 | 0.88 | 2.67 | 0.53 |
Critical Difference (CD) 5% | 0.94 | 4.5 | 3.29 | 2.46 | 7.42 | 1.48 |
Critical Difference (CD) 1% | 1.24 | 5.92 | 4.33 | 3.24 | 9.76 | 1.95 |
Environmental Variance | 0.58 | 15.83 | 8.47 | 4.74 | 42.99 | 1.72 |
Genotypic Variance | 1.52 | 25.24 | 28.59 | 25.87 | 114.08 | 1.19 |
Phenotypic Variance | 2.10 | 41.07 | 37.06 | 30.61 | 157.08 | 2.91 |
Environmental Coefficient of Variance | 5.72 | 10.64 | 3.03 | 2.15 | 6.43 | 17.21 |
Genotypic Coefficient of Variance | 9.25 | 13.44 | 5.57 | 5.03 | 10.48 | 14.36 |
Phenotypic Coefficient of Variance | 10.88 | 17.15 | 6.35 | 5.48 | 12.29 | 22.42 |
Heritability (Broad Sense) | 0.72 | 0.61 | 0.77 | 0.84 | 0.726 | 0.41 |
Genetic Advance | 12.16 | 8.11 | 9.67 | 9.63 | 18.75 | 1.44 |
Genetic Advance as percentage of mean | 16.56 | 21.71 | 10.09 | 9.541 | 18.39 | 18.96 |
*TGW = Thousand grain weight |
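The derived statistics in Table 1 follow from the variance components. The sketch below is illustrative only: it assumes the conventional formulas (H² = Vg/Vp, GCV = 100·√Vg/mean, PCV = 100·√Vp/mean, GA = k·H²·√Vp) and a selection intensity of k = 2.06 (5% selection), none of which is stated in the text, and the function name is hypothetical. Applied to the TGW column, these assumptions reproduce the tabulated values to rounding.

```python
import math

def genetic_parameters(v_g, v_e, mean, k=2.06):
    """Derive summary statistics from variance components.

    k = 2.06 (5% selection intensity) is an assumed default, not taken
    from the source.
    """
    v_p = v_g + v_e                      # phenotypic variance
    h2 = v_g / v_p                       # broad-sense heritability
    gcv = 100 * math.sqrt(v_g) / mean    # genotypic coefficient of variation
    pcv = 100 * math.sqrt(v_p) / mean    # phenotypic coefficient of variation
    ga = k * h2 * math.sqrt(v_p)         # genetic advance
    gam = 100 * ga / mean                # genetic advance as % of mean
    return {"Vp": v_p, "H2": h2, "GCV": gcv, "PCV": pcv, "GA": ga, "GAM": gam}

# TGW column of Table 1: genotypic variance, environmental variance, grand mean
tgw = genetic_parameters(v_g=25.24, v_e=15.83, mean=37.37)
for name, value in tgw.items():
    print(f"{name}: {value:.2f}")
```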
ANOVA revealed significant variation due to genotype (G), environment (E), and G × E interaction for GPC (P < 0.001) (Table 2). GPC values were correlated among all environments except Environment-2, which was therefore excluded from further analysis (Figure-S1). The other agronomic traits were also normally distributed and were tested for correlation with GPC (Figure-S2). Pearson’s correlation coefficients indicated that GPC was positively, though weakly, correlated with TGW (r = 0.143, P < 0.001), so higher GPC was not accompanied by lower grain weight. HD and FT were negatively correlated with GPC (r = −0.121 and −0.103, respectively; P < 0.001). PH showed a weak positive correlation (r = 0.041, P < 0.05), and TN showed no significant correlation (r = −0.009) (Table 3, Figure S-3).
Table 2
ANOVA for grain protein content to determine the variance across environments
Source of variation | Df ** | Mean Sq | F-value | P-value |
Environment | 4 | 61.516 | 116.18 | < 0.001 |
Genotype | 324 | 7.068 | 13.35 | < 0.001 |
Genotype x Environment | 1296 | 0.908 | 1.71 | < 0.001 |
Replicates x Environment | 5 | 9.36 | 19.74 | 0.285 |
Residuals | 985 | 0.529 | | |
**Df = Degrees of freedom |
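As a quick check on Table 2, the F-values for the main effects and the interaction can be approximated as the ratio of each mean square to the residual mean square. This is a sketch that assumes every term was tested against the residual; the text does not state the error term. Small differences from the tabulated F-values may reflect rounding of the reported mean squares.

```python
# Mean squares copied from Table 2
mean_squares = {
    "Environment": 61.516,
    "Genotype": 7.068,
    "Genotype x Environment": 0.908,
}
residual_ms = 0.529

# Assumed test: each effect's mean square divided by the residual mean square
f_values = {source: ms / residual_ms for source, ms in mean_squares.items()}

for source, f in f_values.items():
    print(f"{source}: F = {f:.2f}")
```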
Table 3
Pearson’s correlations between the grain protein content and other agronomic traits in the mapping population
Trait | Correlation coefficient |
Days to heading (HD) | -0.121*** |
Flowering time (FT) | -0.103*** |
Plant height (PH) | 0.041* |
Tiller number per plant (TN) | -0.009 |
Thousand grain weight (TGW) | 0.143*** |
*Significance level at P < 0.05; ***significance level at P < 0.001 |
Genotyping by sequencing (GBS) and SNP identification
GBS was performed by the double-digest restriction-site associated DNA (ddRAD) method with the PstI and MspI restriction enzymes (Poland et al. 2012). Approximately 1200 million clean reads were generated for 200 individuals, including the parents, averaging 6 million reads per individual. Reads were filtered at a quality score threshold of 30 and mapped to the wheat genome (IWGSC RefSeq V2.1). A total of 1,882,167 SNPs with a minimum sequencing depth of 25 and present in ≥ 90% of the samples were retained. Of these, 1,870,849 SNPs mapped to the 21 chromosomes, while 11,318 uncharacterized SNPs were excluded from further analysis. The filtered SNPs were distributed almost evenly across the three subgenomes: 34.25% (644,617) in the B genome, 32.5% (613,025) in the A genome, and 32.58% (613,207) in the D genome. Twelve individuals with a high SNP missing rate (> 15%) were excluded, leaving 188 individuals for linkage analysis. Further screening for loci that were homozygous and polymorphic between the parents identified 40,023 SNPs, which were used for linkage mapping. The highest number of homozygous polymorphic SNPs was recorded on chromosome 2B (3,181) and the lowest on chromosome 4D (230).
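The two sample- and locus-level filters described above (individuals with more than 15% missing calls removed; loci kept only when both parents are homozygous for different alleles) can be sketched as follows. The data layout (two-character genotype strings, `None` for a missing call), the function names and the toy genotypes are all hypothetical and are not taken from the pipeline actually used.

```python
MISSING = None  # placeholder for a missing genotype call (assumed encoding)

def sample_missing_rate(calls):
    """Fraction of loci with no genotype call for one individual."""
    return sum(call is MISSING for call in calls) / len(calls)

def is_homozygous_polymorphic(p1, p2):
    """True if both parents are homozygous and carry different alleles."""
    if p1 is MISSING or p2 is MISSING:
        return False
    return p1[0] == p1[1] and p2[0] == p2[1] and p1 != p2

# Hypothetical toy data: one genotype string per locus
parent1 = ["AA", "GG", "CT", "TT"]
parent2 = ["AA", "TT", "CC", None]
informative = [
    locus for locus, (a, b) in enumerate(zip(parent1, parent2))
    if is_homozygous_polymorphic(a, b)
]

sample = ["AA", None, "CC", "TT"]            # 1 of 4 calls missing
keep_sample = sample_missing_rate(sample) <= 0.15

print(informative, keep_sample)
```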
Linkage mapping and QTL identification
For linkage mapping, the BIN function of IciMapping v4.2 was used to remove redundant SNPs, and 2,418 BIN IDs were shortlisted based on segregation distortion and low missing rate. These comprised 963 BIN IDs from genome A, 1,071 from genome B, and 384 from genome D, and were ordered with the MAP function of IciMapping v4.2. The linkage analysis produced a high-density genetic map spanning 2,947.34 cM; genome A had the longest map (1,260.68 cM), followed by genome B (1,139.25 cM) and genome D (547.41 cM). A total of 34 linkage groups were formed: 11 each for genomes A and B, and 12 for genome D. The overall marker density was 1.22 cM per BIN marker (Table 4, Fig. 2). A total of ten QTLs for GPC were identified in the present study, located on chromosomes 2B (3 QTLs), 5B (3 QTLs), 5A (2 QTLs), and one each on 4B and 1D (Fig. 3).
Table 4
Linkage map developed for WH1105/TAC75 RILs using 2,418 BIN IDs representing 40,023 SNPs.
Chromosome | Mapped SNP | BIN markers | linkage group | Map length (cM)* | cM per bin marker |
1A | 2761 | 164 | 1 | 191.79 | 1.17 |
2A | 2537 | 138 | 1 | 151.75 | 1.1 |
3A | 2177 | 131 | 2 | 199.53 | 1.52 |
4A | 1847 | 134 | 2 | 128.16 | 0.96 |
5A | 1993 | 103 | 2 | 189.35 | 1.84 |
6A | 1793 | 154 | 1 | 174.88 | 1.14 |
7A | 2636 | 139 | 2 | 225.22 | 1.62 |
A genome | 15744 | 963 | 11 | 1260.68 | 1.31 |
1B | 3059 | 179 | 2 | 187.13 | 1.05 |
2B | 3181 | 224 | 1 | 197.4 | 0.88 |
3B | 3337 | 106 | 3 | 148.75 | 1.4 |
4B | 888 | 40 | 1 | 132.32 | 3.31 |
5B | 1988 | 114 | 1 | 95.05 | 0.83 |
6B | 2953 | 195 | 1 | 187.41 | 0.96 |
7B | 3040 | 213 | 2 | 191.19 | 0.9 |
B genome | 18446 | 1071 | 11 | 1139.25 | 1.06 |
1D | 975 | 70 | 2 | 98.91 | 1.41 |
2D | 1435 | 123 | 2 | 52.32 | 0.43 |
3D | 811 | 58 | 2 | 64.98 | 1.12 |
4D | 230 | 12 | 1 | 78.51 | 6.54 |
5D | 989 | 66 | 1 | 72.22 | 1.09 |
6D | 562 | 25 | 2 | 129.9 | 5.2 |
7D | 831 | 30 | 2 | 50.57 | 1.69 |
D genome | 5833 | 384 | 12 | 547.41 | 1.42 |
Total | 40023 | 2418 | 34 | 2947.34 | 1.22 |
*cM = centimorgan |
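The genome-level totals and the overall marker density in Table 4 can be recomputed from the subgenome rows; density is simply map length divided by the number of BIN markers. The snippet is a worked arithmetic check, with variable names chosen only for illustration.

```python
# (BIN markers, map length in cM) per subgenome, from Table 4
genomes = {
    "A": (963, 1260.68),
    "B": (1071, 1139.25),
    "D": (384, 547.41),
}

total_bins = sum(bins for bins, _ in genomes.values())
total_cm = sum(length for _, length in genomes.values())

# Marker density: cM per BIN marker
density = {name: length / bins for name, (bins, length) in genomes.items()}
overall_density = total_cm / total_bins

print(total_bins, round(total_cm, 2), round(overall_density, 2))
```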
The LOD scores ranged from 2.59 to 5.49 (Table 5). Among the ten QTLs, QGPC.nabi-2B.2 and QGPC.nabi-5B.1 were identified in six and four datasets, respectively, including the BLUP dataset. QGPC.nabi-2B.2 explained 5.19%–6.97% of the phenotypic variance (PVE) with LOD values ranging from 2.59 to 3.60, while QGPC.nabi-5B.1 explained 6.83%–10.28% of the PVE with LOD values ranging from 3.57 to 3.96. These two QTLs were therefore considered major and stable for GPC. The remaining eight QTLs were each detected in one or two datasets, explained 5.1%–15.42% of the PVE with LOD values ranging from 2.61 to 5.49, and were designated as minor QTLs. Additive effect values suggest that the positive alleles of QGPC.nabi-2B.2 and QGPC.nabi-5B.1 were derived from the parent WH1105. The major QTL QGPC.nabi-2B.2 covered a genetic distance of 9 cM and a physical distance of 5.8 Mb, while QGPC.nabi-5B.1 covered 10 cM genetically and 31 Mb physically. Because QGPC.nabi-2B.2 was detected in all five environments as well as the BLUP dataset, it was selected for further analysis.
Table 5
Quantitative trait loci (QTL) analysis for grain protein content across five environments and the BLUP dataset.
Sr no | QTL | Env (a) | Marker | Position (cM) | Left marker | Right marker | Physical interval (Mb) | LOD (b) | Add (c) | PVE (%) (d) |
1 | QGPC.nabi-1D.1 | E3 | M5959 | 32.03 | M5892 | M5959 | 8.45 | 2.61 | 0.25 | 6.46 |
| | E6 | M5960 | 32.03 | M5892 | M5959 | 8.45 | 3.35 | 0.31 | 6.88 |
2 | QGPC.nabi-2B.1 | E1 | M9585 | 12.71 | M9533 | M9585 | 4.44 | 2.96 | 0.71 | 6.48 |
| | E6 | M9586 | 11.71 | M9554 | M9593 | 3.67 | 3.33 | 0.84 | 6.8 |
3 | QGPC.nabi-2B.2 | E1 | M9629 | 17.91 | M9618 | M9650 | 2.22 | 3.45 | 0.72 | 6.64 |
| | E3 | M9629 | 17.91 | M9618 | M9650 | 2.22 | 2.79 | 0.59 | 5.24 |
| | E4 | M9618 | 20.01 | M9650 | M9662 | 3.57 | 2.59 | 0.74 | 5.19 |
| | E5 | M9618 | 16 | M9593 | M9618 | 5.8 | 2.78 | 0.3 | 6.95 |
| | E6 | M9629 | 17.91 | M9618 | M9650 | 2.22 | 3.6 | 0.85 | 6.97 |
| | BLUP | M9629 | 17.91 | M9618 | M9650 | 2.22 | 2.78 | 0.58 | 5.26 |
4 | QGPC.nabi-2B.3 | E5 | M12257 | 195 | M12257 | M12340 | 4.51 | 5.49 | -0.36 | 15.42 |
5 | QGPC.nabi-4B.1 | E5 | M22376 | 84 | M22376 | M22357 | 6.98 | 5.01 | -0.34 | 13.7 |
6 | QGPC.nabi-5A.1 | E6 | M24671 | 22.65 | M24671 | M24615 | 4.39 | 3.29 | 0.31 | 6.86 |
7 | QGPC.nabi-5A.2 | E3 | M23555 | 114.1 | M23555 | M23561 | 0.54 | 2.91 | 0.26 | 7.19 |
8 | QGPC.nabi-5B.1 | E1 | M25710 | 59 | M25742 | M25723 | 6.32 | 3.95 | 0.32 | 8.7 |
| | E4 | M25629 | 64 | M25669 | M25629 | 2.8 | 3.96 | 0.37 | 10.28 |
| | E3 | M25571 | 68.51 | M25607 | M25542 | 11.86 | 3.57 | 0.69 | 6.83 |
| | BLUP | M25571 | 68.51 | M25607 | M25542 | 11.86 | 3.89 | 0.71 | 7.47 |
9 | QGPC.nabi-5B.2 | E1 | M25536 | 72.51 | M25542 | M25511 | 4.25 | 3.61 | 0.75 | 7.15 |
| | E6 | M25454 | 82.21 | M25471 | M25428 | 1.92 | 3.95 | 0.93 | 8 |
10 | QGPC.nabi-5B.3 | E3 | M25927 | 53.11 | M25909 | M25764 | 249.06 | 2.63 | 0.6 | 5.1 |
| | BLUP | M25927 | 53.11 | M25909 | M25764 | 249.06 | 2.79 | 0.61 | 5.45 |
(a) Environment; (b) logarithm of odds; (c) additive effect; (d) phenotypic variance explained |
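The distinction between repeatedly detected and single-environment QTLs in Table 5 can be tabulated by counting the datasets in which each QTL appears. In this sketch the dataset lists are transcribed from Table 5, whereas the cut-off of four datasets for flagging a QTL as repeatedly detected is an assumption chosen for illustration; the text does not state a numeric criterion.

```python
# Datasets in which each QTL was detected, transcribed from Table 5
detections = {
    "QGPC.nabi-1D.1": ["E3", "E6"],
    "QGPC.nabi-2B.1": ["E1", "E6"],
    "QGPC.nabi-2B.2": ["E1", "E3", "E4", "E5", "E6", "BLUP"],
    "QGPC.nabi-2B.3": ["E5"],
    "QGPC.nabi-4B.1": ["E5"],
    "QGPC.nabi-5A.1": ["E6"],
    "QGPC.nabi-5A.2": ["E3"],
    "QGPC.nabi-5B.1": ["E1", "E4", "E3", "BLUP"],
    "QGPC.nabi-5B.2": ["E1", "E6"],
    "QGPC.nabi-5B.3": ["E3", "BLUP"],
}

MIN_DATASETS = 4  # assumed threshold, not taken from the source

counts = {qtl: len(datasets) for qtl, datasets in detections.items()}
repeated = sorted(qtl for qtl, n in counts.items() if n >= MIN_DATASETS)

print(repeated)
```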
Precise mapping of QGPC.nabi-2B.2 identifies NB-ARC domain containing proteins as candidate genes
QGPC.nabi-2B.2 covered a physical distance of 5.8 Mb. Using the physical positions of the flanking markers, SSRs were screened for polymorphism between the parents (Fandade et al. 2024). A total of forty SSRs were synthesized, and eight polymorphic SSRs were used to narrow down the genomic region (Table-S4). These polymorphic SSRs were used to screen the extreme GPC bulks (n = 16 × 2) from the RIL population. The recombination frequency ranged from 0.31 to 0.63. SSR_23 showed the minimum recombination frequency (0.31) in the high-GPC lines, and SSR_34 showed the same minimum in the low-GPC lines (Table 6). The region spanning SSR_23 and SSR_34, which covered 2.3 Mb on chromosome 2B, was considered the minimum QTL region (Fig. 4).
Table 6
Recombination frequency estimates for precise QTL mapping using eight polymorphic SSR markers in high-GPC (n = 16) and low-GPC (n = 16) genotypes.
| | High GPC | Low GPC |
Sr no | SSR | P1* | P2** | H*** | Recombination frequency (C) | Genetic distance cM | P1 | P2 | H | Recombination frequency (C) | Genetic distance cM |
1 | SSR_4 | 9 | 7 | 0 | 0.44 | 0.68 | 7 | 8 | 1 | NL | - |
2 | SSR_9 | 10 | 6 | 0 | 0.38 | 0.49 | 7 | 9 | 0 | NL | - |
3 | SSR_17 | 10 | 6 | 0 | 0.38 | 0.49 | 7 | 9 | 0 | NL | - |
4 | SSR_18 | 8 | 8 | 0 | 0.5 | 0 | 10 | 5 | 1 | 0.34 | 0 |
5 | SSR_22 | 5 | 9 | 2 | 0.63 | 0.55 | 9 | 6 | 2 | 0.41 | 0.58 |
6 | SSR_23 | 11 | 5 | 0 | 0.31 | 0.37 | 6 | 10 | 0 | NL | - |
7 | SSR_34 | 10 | 5 | 0 | 0.33 | 0.4 | 11 | 5 | 0 | 0.31 | 0.37 |
8 | SSR_39 | 6 | 10 | 0 | 0.63 | 0.55 | 10 | 6 | 0 | 0.38 | 0.49 |
P1 = parent 1, P2 = parent 2, H = heterozygotes, SSR = simple sequence repeats, NL = not linked |
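The relationship between marker counts, recombination frequency and genetic distance in Table 6 can be illustrated with a short sketch. Both the estimator (P2-type calls plus half the heterozygotes, divided by all calls) and the Kosambi mapping function are assumptions here, since the text names neither; under these assumptions the SSR_9 row of the high-GPC bulk gives r = 0.375 and a distance of about 0.49. The function returns Morgans; multiply by 100 for cM.

```python
import math

def recombination_fraction(p1, p2, het):
    """Assumed estimator: P2-type calls plus half the heterozygotes, over all calls."""
    return (p2 + 0.5 * het) / (p1 + p2 + het)

def kosambi(r):
    """Kosambi map distance in Morgans, defined here for 0 <= r < 0.5."""
    if not 0 <= r < 0.5:
        raise ValueError("Kosambi distance is undefined for r outside [0, 0.5)")
    return 0.25 * math.log((1 + 2 * r) / (1 - 2 * r))

# SSR_9, high-GPC bulk in Table 6: P1 = 10, P2 = 6, H = 0
r = recombination_fraction(p1=10, p2=6, het=0)
d = kosambi(r)

print(f"r = {r:.3f}, distance = {d:.2f} Morgans")
```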
This region contained 30 high-confidence genes identified from the genome browser (Zhu et al. 2021). The most abundant protein class was NB-ARC domain-containing protein (seven genes), followed by membrane traffic proteins (exocyst) (PC00150) (six genes). Other proteins included glycosyltransferase (PC00111) (two genes), ATP-binding cassette (ABC) transporter (PC00003) (one gene), dehydrogenase (PC00092) (one gene), transferase (PC00220) (four genes), and winged helix/fork-head transcription factor (PC00246) (one gene). Eight genes were uncharacterized (Table-S1). Expression analysis using expVIP and WheatOmics showed that only a few genes were expressed above 2 TPM; most of these belonged to the NB-ARC domain-containing protein family, marking them as important candidates (Fig. 5).
QGPC.nabi-2B.2 shows positive effect on GPC in different genetic populations
Three SNPs, (A) 2B: 21949674 (G→T) (SNP1), (B) 2B: 21991706 (T→G) (SNP2) and (C) 2B: 22514403 (C→T) (SNP3), were identified in the 2.3 Mb region based on their co-segregation with GPC (Figure-S4). Among the RILs with extreme GPC phenotypes (25 from each end), the identified SNPs were associated with GPC in 67%–80% of the genotypes (Fig. 6 (a)), indicating the dominance of the alleles. These SNPs were subsequently converted to tetra ARMS-PCR primers for PCR validation (Table 7, Fig. 6 (b)). The three SNP markers were validated by PCR on Indian wheat varieties phenotyped for 2 consecutive years (Fig. 7 (a), (b)). A genetically different population, EH-RIL, was similarly validated with two markers, SNP2 and SNP3, because no polymorphism for SNP1 was detected in that population. Each population was divided into two groups based on the genotyping results. For every SNP, the group carrying the positive allele showed significantly (P < 0.01) higher GPC than the group carrying the negative allele (Fig. 7 (c)). The phenotypic variation explained by each SNP was calculated from the group means and expressed as a percentage increase in GPC; it ranged from 2.85% to 6.04%. Notably, SNP2 and SNP3 differentiated between alleles in the EH-RIL population even though that population showed lower variation in GPC (Fig. 7 (d)).
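The percentage increase in GPC attributed to a marker can be computed from the two allele-group means. The sketch below assumes the increase is expressed relative to the mean of the negative-allele group, which the text does not specify, and the GPC values are hypothetical and serve only to show the arithmetic.

```python
from statistics import mean

def percent_increase(positive, negative):
    """Percentage increase of the positive-allele group mean over the
    negative-allele group mean (assumed reference group)."""
    pos_mean = mean(positive)
    neg_mean = mean(negative)
    return 100 * (pos_mean - neg_mean) / neg_mean

# Hypothetical GPC values (%) for lines grouped by marker genotype
positive_allele = [13.8, 14.1, 13.5, 14.0]
negative_allele = [13.2, 13.4, 13.0, 13.6]

gain = percent_increase(positive_allele, negative_allele)
print(f"GPC increase: {gain:.2f}%")
```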
Table 7
Tetra ARMS-PCR primers designed for screening natural accessions and RIL populations from different genetic backgrounds
SNP name | Primer type | Primer sequence (5'→3') | Tm (°C)* | Product size (bp) |
2B: 21949674 (G→T) (SNP1) | Forward inner primer (G allele) | AGCAAGGGTGGAGGAGACAAACATAATTGG | 71 | 163 |
| Reverse inner primer (T allele) | GATGAATGTGCTCGTGAATCACGCAA | 71 | 197 |
| Forward outer primer | GACCAAGATGTGGATGGCTCTTGGTTTC | 71 | 304 |
| Reverse outer primer | CTGGTAACTGACAAGTGGCGAACCCTTG | 71 | |
2B: 21991706 (T→G) (SNP2) | Forward inner primer (T allele) | ATGTGCGAGACGCAGGGTGTGCCCTAT | 76 | 227 |
| Reverse inner primer (G allele) | CTTCTAGTGAGTGTCTCTTATGTGGTGTAC | 61 | 184 |
| Forward outer primer | AAGGAGAAACCAGATCTAGCAGCTGCAG | 68 | 354 |
| Reverse outer primer | ACTTGTTCATTTTTTCCCGAATAACCGG | 68 | |
2B: 22514403 (C→T) (SNP3) | Forward inner primer (C allele) | GTGCACGCACCTACGGGAGACAATGAT | 73 | 134 |
| Reverse inner primer (T allele) | GAGAGACGTACGTGGGTTAGGTCGGATTGG | 73 | 110 |
| Forward outer primer | AATTCGTTACACGTCACCGGCCTGTGAA | 73 | 187 |
| Reverse outer primer | ATAATGTTGCATCTGCAGGCAGCGGATT | 73 | |
*Melting temperature |
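In a tetra ARMS-PCR assay, the outer primer pair yields a control amplicon and each inner primer yields an allele-specific amplicon, so a genotype can be read from the set of bands on the gel. The sketch below assumes that the product sizes listed on the inner-primer rows of Table 7 are the allele-specific amplicons and that the size on the forward outer primer row is the control amplicon; the function name and the 5 bp size tolerance are illustrative choices, not part of the reported protocol.

```python
def call_genotype(bands, allele_sizes, outer_size, tol=5):
    """Call a genotype from observed band sizes (bp).

    allele_sizes maps allele -> expected allele-specific product size.
    tol is an assumed tolerance for reading band sizes off a gel.
    """
    def seen(size):
        return any(abs(band - size) <= tol for band in bands)

    if not seen(outer_size):
        return "failed"                      # control band missing
    present = sorted(a for a, size in allele_sizes.items() if seen(size))
    if not present:
        return "failed"                      # no allele-specific band
    if len(present) == 1:
        return f"{present[0]}/{present[0]}"  # homozygous
    return "/".join(present)                 # heterozygous

# SNP1 product sizes from Table 7
SNP1_ALLELES = {"G": 163, "T": 197}
SNP1_OUTER = 304

calls = [
    call_genotype(bands, SNP1_ALLELES, SNP1_OUTER)
    for bands in ([304, 163], [304, 197], [304, 197, 163], [163])
]
print(calls)
```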