Phenotyping of the population for antioxidant content in rice
Six antioxidant traits, comprising one antioxidant enzyme, superoxide dismutase (SOD), together with flavonoids (TFC), anthocyanins (TAC), carotenoids, γ-oryzanol (GO) and ABTS, were estimated from 270 genotypes during the wet season of 2019 (Supplementary Table 1). Wide variation existed among the germplasm lines for all 6 antioxidant traits. The germplasm lines were classified into 5 groups based on the phenotyping results for each compound (Fig. 1). The frequency distribution of the germplasm lines showed distinct groups for each compound and for the enzyme (Fig. 1). A panel population was prepared by selecting 120 genotypes representing each group and trait from the original population of 270 germplasm lines (Table 1; Fig. 2). The mean estimates of the 6 antioxidants obtained from the representative panel population showed wide variation among the genotypes for each trait (Table 1). Very high carotenoid content was found in germplasm lines Ac. 44598, Ac. 44597 and Ac. 9005. Very high TAC was estimated from Ac. 43670, Ac. 43660 and Ac. 43675. Germplasm lines Ac. 9063, Ac. 20371 and Ac. 20627 showed very high levels of γ-oryzanol in the seed. Good donor lines with very high TFC, namely Ac. 43670, Ac. 43660, Ac. 44646, Ac. 44592, Ac. 44595, Ac. 43737, Ac. 43738 and Ac. 43676, were identified. The SOD level was very high in the seeds of genotypes such as Ac. 20317, Palinadhan-1, Ac. 20362, Ac. 20328, Gochi, Chatuimuchi, Ac. 20770, Ac. 20920, Ac. 20907, Magra and Chinamal. The potential donors identified from the population for very high levels of ABTS were Ac. 44592, Ac. 43670, Ac. 4460, Ac. 44595, Ac. 44588, Ac. 43660, Ac. 43738 and Ac. 43732. In addition, high levels of more than 3 antioxidant compounds were identified in Ac. 44592, Ac. 44646, Ac. 44595, Ac. 43660, Ac. 43738 and Ac. 43669.
Relatedness among germplasm lines for antioxidant compounds through genotype-by-trait biplot analysis
A genotype-by-trait biplot was generated by plotting the first two principal components for the 6 antioxidants estimated from the 120 genotypes in the panel (Fig. 3). The first and second principal components explained 68.3% and 19.8% of the total variability, with eigenvalues of 8064 and 2342, respectively (Supplementary Fig. 1). γ-oryzanol contributed the most to diversity among the 6 antioxidant compounds in the panel, followed by TFC and ABTS (Fig. 3). The scatter of genotypes across the 4 quadrants indicated that genotypes with high antioxidant estimates were placed in quadrant I (top right) and quadrant II (bottom right), which together accommodated the majority of such genotypes. Genotypes with high estimates for multiple antioxidant compounds are encircled in the figure (Fig. 3). Quadrant III (bottom left) held most of the germplasm lines with moderate antioxidant content, while quadrant IV (top left) accommodated the majority of the germplasm lines poor in antioxidant compounds (Fig. 3).
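The reported percentages can be cross-checked against the two eigenvalues. The sketch below is only an illustration: the remaining eigenvalues are not given in the text, so the total variance is back-calculated from PC1, which is an assumption rather than a reported figure. The function name `explained_variance` is hypothetical.

```python
def explained_variance(eigenvalues, total_variance=None):
    """Return the percentage of total variance carried by each eigenvalue."""
    total = sum(eigenvalues) if total_variance is None else total_variance
    return [100.0 * ev / total for ev in eigenvalues]


# Eigenvalues of PC1 and PC2 as reported in the text.
pc1, pc2 = 8064.0, 2342.0

# Assumption: total variance inferred from PC1 explaining 68.3 %,
# because the other eigenvalues are not reported here.
total = pc1 / 0.683

shares = explained_variance([pc1, pc2], total_variance=total)
print([round(s, 1) for s in shares])
```

Under this assumption the share of PC2 comes out close to the reported 19.8%, so the two eigenvalues and the two percentages are mutually consistent.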
Nature of association among antioxidant traits
The association among the 6 antioxidant traits revealed a strong positive correlation (r ≥ 0.7) of TAC with TFC and of TFC with ABTS. A moderate positive correlation (0.5 ≤ r < 0.7) was observed between TAC and ABTS, and a weak positive correlation (r < 0.5) between carotenoids and GO (Fig. 4). Antioxidant traits that are correlated, whether positively or negatively, may be controlled by closely linked genes, or the compounds may be structurally related. Therefore, a variety that accumulates a high concentration of one antioxidant may also contain higher quantities of the other positively correlated antioxidants.
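A minimal sketch of how a pairwise correlation can be computed and binned into the classes used above. The thresholds (0.7 and 0.5) come from the text; applying them to the absolute value of r, so that negative correlations are binned the same way, is an assumption. Function names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length trait vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_strength(r):
    """Bin r using the thresholds from the text (assumption: applied to |r|)."""
    magnitude = abs(r)
    if magnitude >= 0.7:
        return "strong"
    if magnitude >= 0.5:
        return "moderate"
    return "weak"


# Made-up values for two traits, for illustration only.
trait_a = [1.0, 2.0, 3.0, 4.0]
trait_b = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(trait_a, trait_b)
print(correlation_strength(r))
```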
Genetic diversity parameters analysis
The panel population of 120 germplasm lines, which exhibited wide variation for the antioxidants, was genotyped using 136 SSR markers. The genetic diversity parameters estimated from the panel population are presented in Table 2. Genotyping detected a total of 508 marker alleles in the population, with a mean of 3.74 alleles per locus. The number of alleles per locus ranged from 2 to 7. The highest number of alleles in the studied panel was produced by the marker RM493. The variation captured by a marker in the population was assessed through the major allele frequency. The average major allele frequency of the polymorphic markers was 0.561, ranging from 0.279 (RM8044) to 0.925 (RM6054) (Table 2). The informativeness of a genetic marker is estimated by its PIC value, which ranged from 0.137 (RM6045 and RM6054) to 0.787 (RM493) with an average of 0.496. The mean observed heterozygosity (Ho) in the population was 0.116, varying from 0.00 to 0.958 (RM3735). Twenty marker loci showed an Ho value of 0.00 in the panel population. The gene diversity (He), a measure of genetic diversity in the panel population, ranged from 0.142 (RM6054) to 0.813 (RM493) with a mean value of 0.555.
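For reference, the per-locus parameters discussed above can be computed from genotype calls as in the following sketch. It uses the standard uncorrected definitions (He = 1 - Σp², and the usual PIC formula); whether the software used in this study applies a sample-size correction is not stated here, so small differences from Table 2 would be expected. The function name and the example genotypes are hypothetical.

```python
from collections import Counter


def diversity_stats(genotypes):
    """Summarise one marker locus from diploid genotypes given as (allele1, allele2) pairs."""
    alleles = [allele for pair in genotypes for allele in pair]
    counts = Counter(alleles)
    freqs = [count / len(alleles) for count in counts.values()]

    # Gene diversity (expected heterozygosity), uncorrected form.
    gene_diversity = 1.0 - sum(p * p for p in freqs)

    # PIC = He - sum over allele pairs of 2 * p_i^2 * p_j^2.
    cross_term = sum(
        2.0 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )

    heterozygous = sum(1 for a, b in genotypes if a != b)
    return {
        "n_alleles": len(freqs),
        "major_allele_freq": max(freqs),
        "gene_diversity": gene_diversity,
        "pic": gene_diversity - cross_term,
        "observed_het": heterozygous / len(genotypes),
    }


# Made-up locus with two alleles, for illustration only.
example = [("A", "A"), ("A", "A"), ("B", "B"), ("B", "B"), ("A", "B")]
stats = diversity_stats(example)

# Mean alleles per locus from the totals reported in the text (about 3.74).
mean_alleles = 508 / 136
```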
Population genetic structure analysis
The antioxidant-diverse population was analysed for genetic structure using the STRUCTURE 2.3.6 software, assuming a range of probable sub-populations (K) and selecting the K with the highest delta K (∆K) value. The ∆K value used in the analysis is the rate of change in the log probability of the data between successive K values. The panel population was categorized into two sub-populations, with a high ∆K peak of 264.2 at K = 2 among the assumed K values (Supplementary Table 2; Supplementary Fig. 2). The two subpopulations were in the proportion of 0.277 and 0.723 for population 1 and population 2, respectively. However, these subpopulations showed poor correspondence with the antioxidant content of the genotypes in the studied population. Therefore, the next ∆K peak at K = 3 was examined, which classified the population into 3 subpopulations. The 3 subpopulations contained genotypes in the proportion of 0.208, 0.689 and 0.103 in the inferred clusters for sub-populations 1, 2 and 3, respectively. The Fst1, Fst2 and Fst3 values were 0.3392, 0.1664 and 0.3701 for sub-populations 1, 2 and 3, respectively (Supplementary Table 2; Supplementary Fig. 3). A genotype was assigned to a particular subpopulation when its ancestry value for that subpopulation was ≥80%.
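The two decision rules described above can be sketched as follows. The ∆K function uses a commonly applied simplified form of the Evanno statistic (absolute second-order difference of the mean log probabilities, divided by the standard deviation across replicate runs at that K); the exact implementation used in this study is not stated, so this is an assumption. The log-probability values and ancestry vectors are made up, and labelling genotypes below the 80% threshold as "admixture" follows the usage in the later sections.

```python
from statistics import mean, stdev


def delta_k(lnp_runs):
    """lnp_runs maps K -> list of Ln P(D) values from replicate runs.

    Returns K -> delta K for every K that has both neighbours K-1 and K+1.
    """
    means = {k: mean(values) for k, values in lnp_runs.items()}
    result = {}
    for k in sorted(lnp_runs):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2.0 * means[k] + means[k - 1])
            result[k] = second_diff / stdev(lnp_runs[k])
    return result


def assign_subpopulation(q_values, threshold=0.80):
    """Assign a genotype to the subpopulation whose ancestry share is >= threshold."""
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    if q_values[best] >= threshold:
        return f"SP{best + 1}"
    return "admixture"


# Hypothetical replicate log probabilities for K = 1..4 (not from the study).
lnp = {
    1: [-100.0, -102.0],
    2: [-80.0, -81.0],
    3: [-78.0, -79.0],
    4: [-77.0, -78.0],
}
dk = delta_k(lnp)
best_k = max(dk, key=dk.get)

print(best_k, assign_subpopulation([0.05, 0.10, 0.02, 0.83]))
```

As in the text, the K with the highest ∆K is only a starting point; lower peaks may be inspected when the top-ranked K does not correspond to the phenotypic grouping of interest.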
The assumed subpopulations at K = 3 differentiated the germplasm lines for antioxidant compounds but did not clearly separate the SP2 and SP3 subpopulations. Hence, the next ∆K peak at K = 4 was considered, which classified the population into 4 genetic groups. The antioxidant compounds present in the studied population showed a fair correspondence with the inferred structure values of the subpopulations at K = 4. The majority of the germplasm lines with high to very high antioxidant content were present in subpopulation 4. The germplasm lines with moderate antioxidant content were present in subpopulation 2. Germplasm lines with poor and moderate antioxidant content were in subpopulation 1, while the very poor to poor types were in subpopulation 3 (Table 3; Fig. 5). The alpha value of the panel estimated by the structure analysis at K = 4 was low (α = 0.0578). Positively skewed leptokurtic distributions were observed for the mean alpha value, while normally skewed leptokurtic distributions were detected for all 4 Fst values of the panel population, showing a distinct variation in distribution among the Fst values (Supplementary Fig. 4).
Molecular variance (AMOVA) and LD decay plot analysis
Plants in a population are grouped into subpopulations according to how closely they are related. The genetic variation within and between the subpopulations at K = 4 was estimated by the analysis of molecular variance (AMOVA) (Table 4). The genetic variation at K = 4 was partitioned as 6% among populations, nil among individuals and 94% within individuals of the panel population. Wright's F statistics were used to assess the deviation from Hardy-Weinberg expectations. FIS was used to assess the uniformity of individuals within subpopulations, and FIT that of individuals within the total population. Based on the 136 marker loci, FIT for the total population and FIS within populations were -0.148 and -0.235, respectively, whereas the FST value between the 4 subpopulations was 0.071. Fst measures the differentiation of subpopulations within the total population. The 4 subpopulations were clearly differentiated from each other in the distribution pattern of their Fst values (Supplementary Fig. 4).
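The three F statistics are linked by the standard identity (1 - FIT) = (1 - FIS)(1 - FST). The short check below, offered only as an illustration with a hypothetical helper name, applies this identity to the FIS and FST values reported above and compares the result with the reported FIT.

```python
def fit_from_fis_fst(fis, fst):
    """Wright's identity: (1 - FIT) = (1 - FIS) * (1 - FST)."""
    return 1.0 - (1.0 - fis) * (1.0 - fst)


# Values reported in the text for the panel at K = 4.
fis_reported = -0.235
fst_reported = 0.071
fit_reported = -0.148

fit_expected = fit_from_fis_fst(fis_reported, fst_reported)  # about -0.147
print(round(fit_expected, 3))
```

The computed value agrees with the reported FIT to within rounding, so the three estimates are internally consistent.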
The non-random association of alleles at different loci is utilized in marker-trait association analysis. The existence of a marker-trait association depends on the rate of LD decay in a population over time. The LD decay rate indicates the possibility of detecting new genes or allelic variants controlling the antioxidant compounds through molecular markers associated with these traits. The syntenic r² value was plotted against the physical distance in million base pairs to show the linkage disequilibrium decay of the population (Fig. 6A). Tightly linked markers had higher r² values, and the average r² decreased rapidly with increasing linkage distance. The LD plot showed that LD decay was initially delayed in the studied panel population. However, a decline was noticed in the curve for the associated markers at about 1-2 megabase pairs, and thereafter the decay was gradual and very slow. The graph clearly indicated the continuance of linkage disequilibrium decay in the rice population for the studied antioxidant compounds. LD decay is constrained by non-random mating, mutation, selection, migration or admixture, and genetic drift, all of which influence the estimates of LD. The LD decay plot also provided a clue to the formation of genetic admixture groups for the various antioxidant compounds in the population. A similar trend was also noticed in the curve of marker 'P' versus marker 'F' and marker R² (Fig. 6B). The markers detected in this study indicated the strength of the markers for the studied antioxidant compounds.
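As a reminder of what the r² statistic measures, the sketch below computes it for a pair of biallelic loci from phased haplotypes. This is a simplification: SSR markers are multi-allelic and the software used in this study handles that case with its own procedure, which is not reproduced here. The function name and the example haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci.

    haplotypes: list of (a, b) pairs, each coded 0 or 1 for locus A and locus B.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denominator


# Made-up examples: complete LD and complete linkage equilibrium.
complete_ld = [(1, 1)] * 5 + [(0, 0)] * 5
equilibrium = [(1, 1), (1, 0), (0, 1), (0, 0)]
print(ld_r2(complete_ld), ld_r2(equilibrium))
```

Averaging such pairwise values within distance bins and plotting them against physical distance gives a decay curve of the kind shown in Fig. 6A.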
Principal coordinates and cluster analyses for genetic relatedness among the germplasm lines
A two-dimensional principal coordinate analysis (PCoA) plot was constructed from the genotyping results of the 136 SSR markers, classifying the 120 germplasm lines according to their genetic relatedness (Fig. 7). Component 1 accounted for 11.73% of the inertia and component 2 for 7.49%. The germplasm lines were distributed across the four quadrants, forming 3 major groups (Fig. 7). The biggest group accommodated all the germplasm lines of subpopulations 2 and 3 together, clustered in the 2nd (bottom right) quadrant. The genotypes in the 1st quadrant were divided into 2 groups, of which the group at the top of the 1st quadrant formed the SP3 subpopulation, with low to very low antioxidant content in the seeds. The other group, near axis 1, consisted entirely of admix-type germplasm lines. A few germplasm lines in quadrant 2 close to axis 1 were also admix genotypes. The admix genotypes present on both sides of axis 1 are depicted in black colour (Fig. 7).
The germplasm lines with high to very high mean values of antioxidants grouped together, forming subpopulation 4. This subpopulation lies in quadrants 3 (top left) and 4 (bottom left), on both sides of axis 1, and is encircled in red (Fig. 7). The PCoA distributed all the germplasm lines over the four quadrants, classifying them into 4 clusters and a separate admixture group. The subpopulations clustered by PCoA corresponded with the population structure (Fig. 7). Germplasm lines Ac. 44594, Ac. 43669, Ac. 44597, Ac. 44588, Ac. 43737, Ac. 44595, Ac. 43676, Ac. 44592, Ac. 43738 and Ac. 44646 were placed together in one structure group in quadrants III and IV and are rich in antioxidants. The germplasm lines placed by the PCoA in quadrant II were mostly average in antioxidant content. This quadrant formed a group containing all the germplasm lines of subpopulations 1 and 2.
Ward's clustering broadly grouped all the genotypes into two major clusters. The largest, cluster I, accommodated 111 germplasm lines, comprising all those with poor to average antioxidant content. Cluster II had only 9 germplasm lines, all of which were rich in antioxidant content. Cluster II was subdivided into 2 sub-groups, which were further divided into 6 sub-sub-clusters. Cluster I was divided into two main sub-clusters, which were finally divided into 32 small groups. All the clusters and small groups formed by Ward's clustering approach were based on the antioxidant compound content of the germplasm lines (Fig. 8A).
The cluster analysis discriminated the germplasm lines on the basis of the genotyping results of the 136 SSR markers and placed the genotypes into different clusters, which corresponded with the studied antioxidant traits. The unweighted neighbour-joining tree differentiated the genotypes into 4 clusters (Fig. 8B). The cluster for subpopulation 4 was differentiated from SP2 by the presence of germplasm lines with high antioxidant content, while genotypes with moderate to high content were in subpopulation 2. The green-coloured portion of the tree is designated SP4 and the blue portion SP2. The germplasm lines with very poor antioxidant content were in subpopulation 3, depicted in red in the tree. The majority of the germplasm lines in subpopulation 1 were poor to medium in antioxidant content and are shown in pink. The admix-type germplasm lines are depicted in black in the neighbour-joining tree (Fig. 8B).
Marker-trait association for antioxidant compounds in rice
Marker-trait associations were computed for the 6 antioxidant compounds using the Generalized Linear Model (GLM) and the Mixed Linear Model (MLM; K+Q model) in the TASSEL 5 software. The marker-trait associations were compared at less than 1% error, i.e. 99% confidence (p<0.01). A total of 57 and 23 significant marker-trait associations were detected for 5 antioxidant compounds by GLM and MLM, respectively, at p<0.01. The marker R² values ranged from 0.0477 to 0.159 by GLM and from 0.0607 to 0.1169 by MLM (Supplementary Table 3; Supplementary Table 4). A total of 15 significant marker-trait associations were detected by both models for 5 antioxidant compounds present in the seed at p<0.01 (Fig. 9A). Significant associations were detected for 5 SSR markers with TAC, 3 each with SOD and TFC, and 2 each with GO and ABTS. Among the five antioxidant compounds, associations with higher marker R² (>0.1) and low p-values (<0.01) included SOD with RM405 and GO with RM3701 (Table 5; Fig. 9A). The Q-Q plot also confirmed the association of these markers with the associated antioxidant traits in rice (Fig. 9B).
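To illustrate what a marker R² represents, the sketch below computes the share of phenotypic variance explained by the allele classes at a single marker, which is the R² of a one-way fixed-effect model. It is a deliberately reduced stand-in: the models used in this study additionally account for population structure (Q) and, for MLM, kinship (K), neither of which is included here. The function name and data are hypothetical.

```python
def marker_r2(trait_values, marker_alleles):
    """Proportion of trait variance explained by allele class at one marker."""
    n = len(trait_values)
    grand_mean = sum(trait_values) / n

    # Group trait values by the allele carried at the marker.
    groups = {}
    for value, allele in zip(trait_values, marker_alleles):
        groups.setdefault(allele, []).append(value)

    ss_total = sum((value - grand_mean) ** 2 for value in trait_values)
    ss_between = sum(
        len(values) * (sum(values) / len(values) - grand_mean) ** 2
        for values in groups.values()
    )
    return ss_between / ss_total


# Made-up data: the marker fully explains the trait in the first case
# and explains none of it in the second.
r2_full = marker_r2([1.0, 1.0, 3.0, 3.0], ["a", "a", "b", "b"])
r2_none = marker_r2([1.0, 3.0, 1.0, 3.0], ["a", "a", "b", "b"])
print(r2_full, r2_none)
```

On this scale, the marker R² values reported above (roughly 0.05 to 0.16) correspond to individual markers explaining about 5 to 16% of the variation in a trait.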
Four markers, namely RM440, RM5638, RM253 and RM5626, showed significant associations with TAC by both the GLM and MLM models at p<0.01, with marker R² values >0.05. The QTLs controlling anthocyanin content in these genotypes were located near markers RM440, RM5638, RM253 and RM5626 at 92.7, 86, 37 and 99 cM on chromosomes 5, 1, 6 and 3, respectively. Three markers, namely RM582, RM467 and RM405, located at 66.4, 46.8 and 28.6 cM on chromosomes 1, 10 and 5, respectively, were associated with SOD. TFC was associated with markers RM3701, RM235 and RM494, present at 45.3, 101.8 and 124.4 cM on chromosomes 1, 11 and 12, respectively. The QTLs for ABTS showed significant associations with RM3701 and RM235 on chromosomes 1 and 11, respectively. The marker RM216 showed association with SOD at a very low p-value and a high marker R² value (>0.10618) by GLM only. The QTLs for γ-oryzanol (GO) showed significant associations with RM3701 and RM502 on chromosomes 1 and 8, respectively (Table 5; Fig. 9A). The Q-Q plot also confirmed the associations of these markers with the estimated antioxidant compounds in rice (Fig. 9).
The association mapping study identified co-localization of QTLs controlling antioxidant compounds in rice seeds. The same marker showed significant associations with different antioxidant compounds by both models (Table…). Significant associations of marker RM3701 with the antioxidant compounds GO, TFC and ABTS were detected in the germplasm lines. In addition, RM235 was associated with TAC, TFC and ABTS by both models at p<0.01 (<1% error) (Table 5). When analysed by GLM, the marker RM494 showed association with both carotenoids and TFC. In addition, RM494 was associated with both SOD and TFC when analysed by MLM.
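Co-localized markers can be picked out by grouping the trait-marker pairs by marker. The sketch below transcribes the associations detected by both models from the text of this section (the RM235 and TAC pair is taken from the co-localization paragraph); Table 5 remains the authoritative list, and the function name is hypothetical.

```python
from collections import defaultdict

# (trait, marker) pairs detected by both GLM and MLM, transcribed from the text.
associations = [
    ("TAC", "RM440"), ("TAC", "RM5638"), ("TAC", "RM253"),
    ("TAC", "RM5626"), ("TAC", "RM235"),
    ("SOD", "RM582"), ("SOD", "RM467"), ("SOD", "RM405"),
    ("TFC", "RM3701"), ("TFC", "RM235"), ("TFC", "RM494"),
    ("GO", "RM3701"), ("GO", "RM502"),
    ("ABTS", "RM3701"), ("ABTS", "RM235"),
]


def colocalized_markers(pairs, min_traits=2):
    """Return markers associated with at least min_traits different traits."""
    traits_by_marker = defaultdict(set)
    for trait, marker in pairs:
        traits_by_marker[marker].add(trait)
    return {
        marker: sorted(traits)
        for marker, traits in traits_by_marker.items()
        if len(traits) >= min_traits
    }


shared = colocalized_markers(associations)
print(len(associations), shared)
```

The transcribed list contains 15 pairs, matching the number of associations reported as common to both models, and the grouping returns RM3701 and RM235 as the markers shared across three traits each.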