Soil test and phenotypic variation of yield and yield-components
The soil test comparison of sodic and non-sodic sites revealed higher levels of exchangeable cations on sodic site including toxic levels of boron (B), manganese (Mn) and sodium (Na) and favourable levels of nutrients – potassium (K), calcium (Ca) and magnesium (Mg) (Table 1). However, the two sites had similar levels copper (Cu) and zinc (Zn) and toxic elements – iron (Fe) and aluminium (Al). Dispersion index was 12 times higher in sodic compared to non-sodic site and pH was above 9.0 at 30-70 cm depth. The phenotypic data for YD, PM, N, HM, GH, TGW, and HI followed normal distribution for both years, indicating multigenic inheritance of the traits. The genetic variance and heritability (H2) from the mixed models are presented in Table S2. Data for HM, GH and TGW for 2018 trial are not available. Comparison of the magnitudes of genetic variances at the two soil types (S2non-sodic vs. S2sodic) revealed a greater variation in YD, TGW and HI at the sodic site. Within-year variance comparison for PM and HM showed opposite trends in both years albeit in opposite order; the variance of PM was less than HM on non-sodic than sodic site in 2018 and the vice versa in 2019. Correlation coefficients between trial sites were high and positive for all traits in both years. Moderate to high broad-sense H2 was estimated for YD, N, and PM for both the years. However, H2 was low for HM, GM, TGW and HI. The plot of sodic blups against non-sodic blups was used to determine ICT for each trait. The slope b in Table S2 was used to calculate ICTs for each trait.
Genetic diversity of FIGS wheat panel
The genetic diversity based in 206 FIGS wheat lines from 24 countries is shown in radial dendrogram (Fig. 1a). The wheat lines from different countries were randomly distributed across eight major clusters, highlighted in eight colours (red, blue, purple, green, black, yellow, pink, and orange), with some of them clustering together, indicating intra- and inter-continental variation. The control check cultivars from Australia were found to be highly related and clustered together. The kinship matrix (Fig. 1b) represented the degree of allele sharing between the FIGS lines. The matrix showed large proportion of yellow, representing low genetic correlation, greater diversity, and an absence of strong population structure among genotypes.
Linkage disequilibrium, model and significant associations
The analysis of linkage disequilibrium (LD) showed that the LD decayed with the genetic distance in A, B and D genomes [RB1] [RB2] (Fig. 2). The average of LD in each genome identified the pattern of LD in the three genomes. Based on the average of chromosomes at the genome level, the highest LD was found in the B genome (R2 = 1.0), followed by genomes D (R2 = 0.96) and A (R2 = 0.78). In the present t-GBS assay, D genome (17%) had the lowest number of SNPs followed by genomes A (38.1%) and B (44.8%) (Table S3). The LD decay[RB3] was rapid in A genome, followed by B and D genome.
The MLM analysis generated a significance value (P < 0.005) for each SNP, using F test for the null hypothesis that a SNP has no effect on the measured trait (Zhang et al., 2010). Plots of the negative logarithmic estimates of the observed against expected p values (Q-Q plots) showed that the MLM with (Q+K) matrix represented a good model fit and accounted for relatedness between wheat lines, minimising false positives and improving the marker-trait associations estimates (Fig. S1). The scree plot generated by plotting the percentage of variances explained by optimal number of PCs from GAPIT (Fig. 1c) was visualised to select optimal PC number. The elbow point occurred approximately from 5 and therefore, PCs between 5 and 8 were adjusted based on trait variance in Q model. [RB4] [RB5] After applying a mixed linear model, we found – 927, 1097, 159, 158, 114, 107, and 163 QTNs significantly (P ≤ 0.05) associated with ICTs of YD, PM, N, HM, GH, TGW, and HI, respectively. The significant threshold was then adjusted by setting the P ≤ 0.005 and q < 0.10. The cut-off threshold p values for the traits with FDR correction were YD – 2.9E-03; PM – 1.1E-03; N – 2.2E-03; HM – 5.2E-04; GH – 1.3E-03; TGW – 8.4E-04; HI – 2.6E-03. This reduced the number of significantly associated SNPs to 21 for YD ICT, 18 for PM ICT, 5 for N ICT, 5 for HM ICT, 7 for GH ICT, 5 for TGW ICT, and 12 for HI ICT (Table S4).
GWAS and potential candidate genes for tolerance to sodic-dispersive soils in wheat
A total of 36,644 exome markers were genotyped, of these 14,925 are co-dominant and 21,719 are dominant. The significant quantitative trait (QTSs), Indel, favourable alleles, and their effect identified for two years are presented in Table 2. Targeted SNP and de novo SNP within 180 bp sequence around target SNP[RB6] [RB7] were defined as haplotypes. QTSs, minor allele frequency and phenotypic variance explained by favourable allele is presented in Table 2. The genome-wide distribution of significant and pleiotropic QTSs and haplotypes across 21 wheat chromosomes are shown in Fig. 3. The TRAES genes, length, and location on forward and reverse strand along with its variant consequence, splice variants, GO term, and highest gene expression state at various wheat developmental stage are presented in Table 3. The listed genes are detected either at the exact location of QTS (overlapping) or within 200 Kb from SNP location. The IWGSC low confidence protein coding and uncharacterised genes were not considered for further analysis.
The association analysis for YD ICT revealed strong associations (Table 2) on chromosomes 1A, 1B, 2B, 3B, 5A, and 7B with −log10(P) value above 2.5 and PVE% in range of 17.8–4.1. Five haplotypes, having more than one SNP within 200 bp on respective chromosome were detected for YD4 (1B): 94828187–94828277, YD5 (1B): 411775338–411775423, YD7 (3B): 817580012–817580066, YD9 (5A): 613704427–614358933, and YD10 (7B): 148015309–148015372. The QTSs for YD1 and YD7 are homozygous biallelic and the genotypic effects on yield ICT are presented in Fig. 4. The yield ICT increased by 11.3 % in the presence of beneficial allele C with respect to allele T at YD1 (365186774) on chromosome 1A. Similarly, for haplotype YD7, the beneficial allelic combination CC, CC, and AA [RB8] at locations 817580012, 817580019 and 817580066, improved average yield ICT by 79.5 % with respect to the allele TT, TT and GG, respectively. In general, most of the overlapping genes for YD ICT have activities for metal ion binding, DNA and ATP binding, proteolytic, and electron transfer. These genes have highest expression during grain development, anthesis, stem elongation, photosynthesis, and germination.
The association analysis for PM ICT, revealed ten QTSs on chromosomes 1A, 1B, 1D, 2A, 3A, 3B, 6B, 7B and 7D, with −log10(P) value above 3.0 and PVE% in the range of 39.1–8.0. Four haplotypes were detected on QTSs PM2 (1B): 53737652–53737716, PM4 (2A): 63307274–63307383, PM5 (3A): 502921633–502921779, and PM10 (7D): 566251264–566251365. Two QTSs PM4 (2A) and PM7 (6B) had significant impact on wheat establishment on sodic-dispersive soil with PVE% of 24.2 and 37.6, respectively. The QTSs PM1, PM3 and PM7 are homozygous biallelic having higher additive effect for wheat establishment (Fig. 4). The presence of the allele GG (232828996), TT (420202667) and TT (486517837) at PM1, PM3 and PM7, improved average wheat establishments by 85 %, 99 % and 43.7 % with respect to alleles AA, CC and AA, respectively. Most of the overlapping genes identified for PM ICT have protein kinase activity and binding activities for ATP, chitin, GTP, DNA, and protein dimerization. These genes have highest expression during germination, internodes visible stage, and anthesis. The association analysis for N ICT identified four QTSs on chromosomes 2B, 6B and 7A with −log10(P) value above 2.5 and PVE% in the range of 9.1–7.2. A [RB9] [RB10] haplotype was detected on chromosome 7A at QTS N4 (7A): 696574393–696574479. The QTS N3 on chromosome 6B [RB11] [RB12] at 694681316 had minor allele frequency of 0.47 with PVE% of 8.5 and this QTS corresponds to a gene TraesCS6B02G426300, having photoreceptor activity, highly expressed during stem elongation and flowering stage.
The association analysis for yield-components ICT of HM, GH, TGW, and HI identified 1, 3, 4, and 7 QTSs, respectively. All QTSs had −log10(P) value above 2.5 and PVE% in the range of 9.6–3.1. A haplotype was identified for HM ICT on chromosome 1D (HM1). This haplotype has five SNPs between 356142294 and 356142422, which is within 128 bp with a minor allele frequency [RB13] [RB14] of 0.45 and PVE% of 9.6. This genomic region overlaps with the gene TraesCS1D02G262200, involved in metal ion binding and transcription regulation, and is highly expressed during germination, flower opening stage and maximum stem elongation. Three haplotypes[RB15] [RB16] were identified for GH ICT on chromosome 2B, 2D and 5D. The haplotype on chromosome 2B, overlaps with the gene TraesCS2B02G067500, which is pleiotropic for YD and HI and encodes for dirigent protein, highly expressed during stem elongation. Likewise, four QTSs were identified for TGW ICT on chromosome 1A, 3B and 6B. The QTS TGW3 on chromosome 6B at 638828274, corresponds to a gene TraesCS6B02G366100 which has two splice variants and is involved in RNA polymerase II cis-regulatory sequence specific binding. This gene is highly regulated 14 days after anthesis. The haplotype QTSs TGW4: 640352471–640352516, is 1.52 Mb apart from TGW3: 638828274 on same chromosome 6B. Seven QTSs were identified for HI ICT on chromosome 1A, 2B, 5D, 6B, and 7A. The haplotypes were identified on chromosome 2B (34083794–34083886), 5D (437358882–437358910), 6B (69712612–69712756), and 7A (177135543–177135615). The overlapping genes on chromosome 1A, 6B, and 7A have metal ion and nucleic acid binding and translation activity and these genes are highly expressed during anthesis, post-anthesis and maximum stem length. The gene TraesCS7A02G213400 caused missense mutation with a deleterious SIFT score of 0.01, inducing an amino acid change from valine to methionine. Most of the QTSs identified for TGW and HI ICT had rare variants MAF < 0.05 (Table 2).
Pleiotropy of Quantitative trait SNPs
Genome-wide analysis for tolerance to sodic-dispersive soil revealed that three regions contained QTSs for more than one trait ICT (Fig. 3 and Table 2) at same genomic region representing pleiotropy. Chromosome 1A: 588006280 harboured QTS for TGW (TGW1) and HI (HI2) at same genomic location. Chromosome 2B: 34083794 harboured QTS for YD (YD6), GH (GH1), and HI (HI3). QTSs with haplotypes on chromosome 2B, GH1 and HI3 had two SNPs located within 92 bp (34083794–34083886) and are pleiotropic. A haplotype in chromosome 5D: 437358882–437358910, harboured two QTSs detected within 28 bp interval, having common associations for GH (GH3) and HI (HI5), indicating pleiotropism.
Additionally, the closely related association for traits can be visualised in Fig. 3. The QTS for HI on chromosome 1A, HI1 (555622815) is found close to HI2 (588006280) and TGW1 (588006280), within 32.38 Mb. The QTS for YD on chromosome 1B, YD3 (55014354) is found close to haplotype PM2 (53737652–53737716) within 1.27 Mb. The QTS for HI on chromosome 2B, HI4 (41415831) is found close to YD6 (34083794), GH1 haplotype (34083794–34083886) and HI3 (34083794–34083886), within 7.33 Mb. The QTS for TGW on chromosome 3B, TGW2 (658270393) is found close to PM6 (681510374), within 23.23 Mb.