Statistical analysis
-
Phenotypic analysis
All statistical analyses were performed in R 4.1.2 software 51. Descriptive statistics (mean, standard error, range, and coefficient of variation) were computed for each yield-related trait. Traits were subsequently subjected to a linear mixed model analysis in the sommer package 52 using the Direct-inversion Newton-Raphson (NR) restricted maximum likelihood (REML) algorithm 53. The model structure is given by Eq. (1):
$$\:\varvec{y}=X\varvec{\beta\:}+V\varvec{r}+T\varvec{g}+\varvec{\epsilon\:}$$
1
where y is the vector of phenotypic observations; β is the vector of the fixed effects of replication added to the overall mean; \(\:\varvec{r}\) is the effect of blocks nested within replications and was considered as random [\(\:\varvec{r}\:\sim\:\:N\left(0,\:I{\sigma\:}_{b\left(r\right)}^{2}\right)\)]; \(\:\varvec{g}\) is the vector of accession effects and was regarded as random [\(\:\varvec{g}\sim\:\:N\left(0,\:I{\sigma\:}_{g}^{2}\right)\)]; \(\:\varvec{\epsilon\:}\) is the vector for error [\(\:\varvec{\epsilon\:}\sim\:\:N\left(0,\:I{\sigma\:}_{e}^{2}\right)\)]; and \(\:X\), \(\:V\), and \(\:T\) are the incidence matrices that relate the independent vectors to the response variable \(\:\varvec{y}\). Wald’s and likelihood ratio tests were performed to assess the significance of fixed and random factors, respectively.
Based on the observation that when molecular markers are available, no specific mating design is required to dissect the genetic variance into its additive and non-additive (i.e., dominance and epistatic) components 52, the model in Eq. (1) was extended to the following variant (Eq. (2), which enabled to capture the additive (\(\:{\sigma\:}_{A}^{2}\)) and dominance (\(\:{\sigma\:}_{D}^{2}\)) genetic variances:
$$\:\varvec{y}=X\varvec{\beta\:}+\text{V}\varvec{r}+{Z}_{add}\varvec{a}+{Z}_{dom}\varvec{d}+\varvec{\epsilon\:}$$
2
where \(\:\varvec{y}\) is raw phenotypic observations; \(\:\varvec{\beta\:}\) is the vector of the fixed effects of replication; \(\:\varvec{r}\) is the random effect of blocks nested within replications [\(\:\varvec{r}\:\sim\:\:N\left(0,\:I{\sigma\:}_{b\left(r\right)}^{2}\right)\)] ; \(\:\varvec{a}\) is the additive effect [\(\:\varvec{a}\sim\:\:N\left(0,\:{G}_{a}{\sigma\:}_{A}^{2}\right)\)], where \(\:{G}_{a}\) is the additive genetic relationship matrix; \(\:\varvec{d}\) is the dominance effect [\(\:\varvec{d}\sim\:\:N\left(0,\:{G}_{d}{\sigma\:}_{D}^{2}\right)\)], where \(\:{G}_{d}\) is the dominance genetic relationship matrix; \(\:\varvec{\epsilon\:}\) is the vector for error [\(\:\varvec{\epsilon\:}\sim\:\:N\left(0,\:I{\sigma\:}_{e}^{2}\right)\)]; and \(\:V\), \(\:{Z}_{add}\), and \(\:{Z}_{dom}\) are the incidence matrices relating observations to the levels of each factor. The \(\:{G}_{a}\) and \(\:{G}_{d}\) matrices were obtained from SNP data respectively using the functions A.mat and D.mat available in the sommer package.
Broad-sense (H2) and narrow-sense (h2) heritabilities were estimated from the aforementioned model to capture the fraction of total phenotypic variance due to genotypic and additive variation, respectively. H2 and h2 were estimated respectively using the following equations (3) and (4) 54:
$$\:{H}^{2}=\frac{{\sigma\:}_{A}^{2}+{\sigma\:}_{D}^{2}}{{\sigma\:}_{A}^{2}+{\sigma\:}_{D}^{2}+{\sigma\:}_{b\left(r\right)}^{2}+{\sigma\:}_{e}^{2}}$$
3
$$\:{h}^{2}=\frac{{\sigma\:}_{A}^{2}}{{\sigma\:}_{A}^{2}+{\sigma\:}_{D}^{2}+{\sigma\:}_{b\left(r\right)}^{2}+{\sigma\:}_{e}^{2}}$$
4
where \(\:{\sigma\:}_{A}^{2}\) is the additive variance component; \(\:{\sigma\:}_{D}^{2}\) is the dominance variance component; \(\:{\sigma\:}_{b\left(r\right)}^{2}\) is the variance component related to the effect of incomplete blocks nested within replications; and \(\:{\sigma\:}_{e}^{2}\) is the residual variance component.
The expected genetic gain was also estimated from the same model in Eq. (2) as the percentage of genetic advance over the population mean following Johnson, et al. 55 formula (Eq. (5):
$$\:EGG=\:\frac{GA}{\mu\:}\times\:100$$
5
Where GA is the genetic advance expressed as \(\:GA=\:\frac{K{\sigma\:}_{g}^{2}}{\sqrt{{\sigma\:}_{p}^{2}}}\), with \(\:{\sigma\:}_{g}^{2}\) = genotypic variance, \(\:{\sigma\:}_{p}^{2}\) = phenotypic variance and K = Selection differential at 5% selection pressure i.e., 2.063. µ is the population mean.
Finally, to assess data quality, the experimental accuracy was computed using the Eq. (6) below:
$$\:{r}_{gg}={(1-\frac{1}{1+{rCV}_{R}^{2}})}^{1/2}$$
6
Where r is the number of replicates, \(\:{CV}_{R}={CV}_{g}/{CV}_{e}\), with \(\:{CV}_{g}=\left(\frac{\sqrt{{\sigma\:}_{g}^{2}}}{\mu\:}\right)\bullet\:100\) and \(\:{CV}_{e}=\left(\frac{\sqrt{{\sigma\:}_{e}^{2}}}{\mu\:}\right)\bullet\:100\), and µ is the grand mean of the trait 56.
-
Structural equation modeling
A structural equation model (SEM) was performed in the lavan R package 57, to quantify the multivariate causal relationships network among grain yield and fourteen potential yield components. SEM requires to specify an initial model that hypothesizes the causal relationships among the variables at hand 58. Therefore, an initial model was specified including relationships confirmed in previous studies on legume crops such as chickpea 59, cowpea 60 and kersting’s groundnut 15. Four yield component traits viz. PHE, HSW, YPP, and GY were considered as endogenous variables while the remaining 11 traits were considered as exogenous variables in the initial model. The model was subsequently modified using the modification indices until an acceptable model fit was achieved. Modification indices were a standard metric for improving SEM model fit 32. The comparative fit index (CFI), root mean square error of approximation (RMSEA), and Chi-square p-value were used to assess the model goodness of fit. Models with a CFI > 0.95, RMSEA < 0.05, and p-value > 0.05 were considered to fit well. Standardized partial regression coefficients were reported to permit direct comparisons across paths 61. Standardized coefficients with absolute values below 0.10 are often considered to have a small effect, while values around 0.30 show a medium effect and values above 0.50 indicate a large effect 61.
-
Multi-trait genotype selection
To reveal the associations between traits and trait profiles (i.e., strengths and weaknesses) of accessions, genotype by trait (GT) 62, and genotype by Yield*Trait (GYT) 63, biplots were generated. For this purpose, a two-way genotype by trait matrix consisting of the 81 accessions (in rows) and the 15 traits (in columns) was constructed by extracting the best linear unbiased predictions (BLUPs) of all traits from the previous mixed-effect models. BLUPs were extracted using the emmeans R package 64. The GYT table was subsequently obtained by multiplying the grain yield (GY) value with each of the other trait values, for each accession. For traits such as days to 50% flowering (DTF) and grain filling duration (GFD) for which lower values are desirable, the values for the yield-trait combinations were obtained by dividing the GY value with the trait value 63. The datasets were then standardized to a mean of 0 and a unit variance and the GT and GYT biplots were constructed in the metan package 65 based on the first two principal components.
The multi-trait genotype-ideotype distance index (MGIDI) was used to rank the accessions based on desired values of multiple traits as proposed by Olivoto and Nardino 45. The steps to compute the MGIDI index were fourfold.
-
In the first step, the traits were rescaled to a 0–100 range such a way that 0 and 100 represent the minimum and maximum of the traits in which positive gains are desired, and the inverse for traits in which negative gains are desired.
-
In the second step, an exploratory factor analysis was performed to group correlated traits into factors and estimate factorial scores for each genotype.
-
The third step was ideotype planning, where the ideotype had the highest rescaled value (100) for all considered traits and was defined by a [1×p] vector.
-
In the fourth step, the Euclidean distance between the scores of the genotypes and the ideotype was computed as the MGIDI index using the following Eq. (7) 45:
$$\:{MGIDI}_{i}={\left[\sum\:_{j=1}^{f}{({\gamma\:}_{ij}-{\gamma\:}_{j})}^{2}\right]}^{0.5}$$
7
Where γij is the score of the ith genotype in the jth factor (i = 1, 2, ..., g; j = 1, 2, ..., f), where g and f are the number of genotypes and factors, respectively; and γj is the jth score of the ideotype.
The genotype with the lowest MGIDI is closer to the ideotype representing desired values for all the studied traits. The selection differential was calculated for all traits considering a selection intensity of 15%. Data manipulation and computation of the MGIDI index were performed in the metan R package 65.