Prediction of genetic values according to the dimensionality reduction of SNP's markers in complex models

doi:10.21203/rs.3.rs-2331100/v1

Research Article

https://doi.org/10.21203/rs.3.rs-2331100/v1

This work is licensed under a CC BY 4.0 License

Version 1

The presence of non-informative markers in Genome-Wide Selection (GWS) needs to be evaluated so that genomic prediction becomes more efficient in a breeding program. This study evaluates the efficiency of RR-BLUP after reducing the dimensionality of SNP markers in the presence of different levels of dominance, heritability, and epistatic interactions, in order to demonstrate that the results obtained with reduced information improve prediction and preserve the same biological conclusions as those obtained with a larger data set. Ten F₂ populations of a diploid species (2n = 2x = 20), each with an effective size of 1000 individuals, were simulated through the random combination of 2000 gametes generated from contrasting homozygous parents. The genome consisted of 10 linkage groups (LG) of 100 cM each and comprised 2010 equally spaced bi-allelic SNPs. Nine traits were simulated with different degrees of dominance, heritability, and epistatic interactions. Dimensionality reduction was performed randomly in the simulated population, and the efficiency of RR-BLUP was then tested in two different studies. The squared correlation (r²), the root mean square error (RMSE), and the Akaike Information Criterion (AIC) were used to evaluate the efficiency of the RR-BLUP model. The results obtained by RR-BLUP with reduced information improved prediction and preserved the same biological conclusions as those obtained with a larger data set. Non-informative or small-effect markers can be removed from the original data set. The inclusion of dominance effects was an efficient strategy to improve predictive capacity.

Genome-Wide Selection

quantitative genetics

RR-BLUP

Genome-Wide Selection (GWS), proposed by Meuwissen et al. (2001), aims to reduce the time required for selection and to increase selection accuracy by incorporating molecular information directly into the prediction of the genomic estimated breeding value (GEBV) of an individual, a measure used to select the best individuals according to their merit within the population. GWS stands out for promoting high selective accuracy and for not requiring prior knowledge of the location of QTLs in the chromosomes (de Almeida et al. 2010; Almeida et al. 2017; Alkimim et al. 2020). One of the challenges of GWS is high dimensionality, since the number of markers is greater than the number of genotyped and phenotyped individuals (de Almeida Filho et al. 2019; Lima et al. 2019). Under high dimensionality, some traits may be controlled by a small number of loci and, in some cases, whole linkage groups contribute no loci that determine the variation of these traits. Loci that are not genetically related to the target traits and are not in linkage disequilibrium with QTL may be excluded from the analysis because they are not informative. It is therefore necessary to evaluate the reduction in accuracy caused by these unnecessary markers, so that the resulting model is more useful to the breeder (Sousa et al. 2019). Once the undesirable effects of non-informative markers are verified, research needs to focus on how to identify and remove them. The RR-BLUP method (Random Regression Best Linear Unbiased Predictor) estimates all allelic effects simultaneously. It assumes the infinitesimal model with many loci of small effect, that is, it considers that QTL effects are normally distributed with constant variance across chromosomal segments (Resende et al. 2010). In addition, it is similar to the traditional BLUP; however, the kinship matrix is not needed to predict the random effects (Schaeffer 2006).

Phenotypic variance combines genotypic variance and environmental variance. Genetic variance has three major components: additive variance, dominance variance, and epistatic variance. In general, GWS models neglect the influence of dominance and epistasis and consider only the additive effects of the traits (Costa et al. 2020). When the phenotypic value is expressed solely through additive effects, the genetic architecture of quantitative traits may not be represented realistically. The inclusion of dominance effects and epistatic interactions may therefore increase prediction accuracy. The role of epistasis in the genetic architecture of complex traits has been discussed since the emergence of quantitative genetics and, although it is seen from different perspectives, recognition of its importance is growing. Currently, most published genetic models include only additive genetic effects (Nishio et al. 2014; Martini et al. 2017; Islam et al. 2020). It is therefore increasingly necessary to explore models that predict genetic merit while accounting for the genetic effects of dominance and epistasis.

In this context, the present study evaluates the efficiency of RR-BLUP after reducing the dimensionality of SNP markers in the presence of different levels of dominance, heritability, and epistatic interactions, to demonstrate that the results obtained with reduced information improve prediction and preserve the same biological conclusions as those obtained with a larger data set.

Genome And Simulated Populations

Ten F₂ populations of a diploid species (2n = 2x = 20) with an effective size of 1000 individuals were simulated. The simulation involved the random combination of 2000 gametes generated from contrasting homozygous parents (dominant P1 and recessive P2). The simulated genome was composed of 10 linkage groups (LG) of 100 cM each and comprised 2010 bi-allelic single nucleotide polymorphisms (SNPs), evenly distributed and equidistant (201 per LG). For the generation of gametes, the recombination frequency corresponded to the distance of 0.5 cM between adjacent loci, which produced linkage disequilibrium. The chi-square (χ²) segregation test was performed on all 10 simulated F₂ populations to identify the population with the lowest rate of distortion of Mendelian segregation caused by sampling or gametic randomization. The simulated populations with the highest numbers of distortions were eliminated.
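The simulation and the segregation test described above can be sketched as follows. This is only an illustrative sketch, not the procedure implemented in GENES: the function names are hypothetical, the recombination fraction of 0.005 is an assumed approximation for loci 0.5 cM apart, crossover interference is ignored, and only one linkage group is simulated.

```python
import numpy as np

def simulate_f2_linkage_group(n_individuals, n_loci, recomb_frac, rng):
    """Simulate F2 genotypes for one linkage group (hypothetical helper).

    Gametes come from an F1 that is heterozygous at every locus (cross of
    contrasting homozygous parents). Genotypes: 1 (AA), 0 (Aa), -1 (aa).
    """
    def gametes(n):
        # The first locus takes a random parental strand; between adjacent
        # loci the strand switches with probability recomb_frac.
        start = rng.integers(0, 2, size=(n, 1))
        switches = rng.random((n, n_loci - 1)) < recomb_frac
        return np.concatenate([start, switches], axis=1).cumsum(axis=1) % 2

    # Each individual is the union of two random gametes (allele A counted as 1).
    return gametes(n_individuals) + gametes(n_individuals) - 1

def segregation_chi2(genotypes):
    """Chi-square test of the 1:2:1 (AA:Aa:aa) ratio for every locus."""
    n = genotypes.shape[0]
    observed = np.stack([(genotypes == c).sum(axis=0) for c in (1, 0, -1)])
    expected = n * np.array([0.25, 0.5, 0.25])[:, None]
    chi2 = ((observed - expected) ** 2 / expected).sum(axis=0)
    # With 2 degrees of freedom the chi-square survival function is exp(-x/2).
    p_values = np.exp(-chi2 / 2.0)
    return chi2, p_values

rng = np.random.default_rng(1)
geno = simulate_f2_linkage_group(1000, 201, 0.005, rng)  # 0.005 is an assumption
chi2, p_values = segregation_chi2(geno)
n_distorted = int((p_values < 0.05).sum())
print("loci with distorted segregation at 5%:", n_distorted)
```

Repeating this for each simulated population and keeping the one with the smallest `n_distorted` would mirror the selection rule described in the text.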

Simulation Of Traits

Nine traits were simulated, representing different scenarios, with heritabilities of 0.25, 0.5, and 0.75 and three degrees of dominance (dd), 0, 0.5, and 1.0 (Table 1), and the epistatic model, according to Eq. 1.0:

$Y_{i}=\mu +\sum_{j}\alpha_{j}+\sum_{j}\sum_{j'}\alpha_{j}\alpha_{j'}+\epsilon_{i}$

(1.0)

where µ + $a_{i}$, µ + $d_{i}$ and µ − $a_{i}$ are the genotypic values associated with classes AA, Aa and aa, which were coded as 1, 0 and −1, respectively.
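As a brief illustration of how the degree of dominance relates to these genotypic values (assuming the usual definition of dd as the ratio between the heterozygote deviation and the additive effect, which the text does not state explicitly):

$$dd=\frac{d}{a},\qquad dd=0\Rightarrow d=0,\qquad dd=0.5\Rightarrow d=0.5a,\qquad dd=1.0\Rightarrow d=a$$

Under this reading, dd = 0 places the heterozygote at the midpoint µ of the two homozygotes (no dominance), dd = 0.5 corresponds to partial dominance, and dd = 1.0 makes the heterozygote equal to the AA homozygote (complete dominance).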

Table 1

Heritability values (h²) and degree of dominance (dd), of nine simulated traits
Traits	h²	dd
T1	0.25	0
T2	0.25	0.5
T3	0.25	1.0
T4	0.5	0
T5	0.5	0.5
T6	0.5	1.0
T7	0.75	0
T8	0.75	0.5
T9	0.75	1.0

The individuals' phenotypes were generated according to the model $P_{i}=G_{i}+E_{i}$, in which $G_{i}$ is the genotypic effect, given by the sum of the genotypic effects of each locus plus, when epistasis exists, the multiplicative effect between pairs of loci, and $E_{i}$ is the environmental effect, generated from a normal distribution with zero mean and variance compatible with the heritability of the simulated trait.
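One way to read "variance compatible with the heritability" is to choose the environmental variance so that h² = Var(G) / (Var(G) + Var(E)), which gives Var(E) = Var(G)(1 − h²)/h². The sketch below follows that reading. It is an assumption-laden illustration: the function names, the equal additive effects, the 40 loci and the omission of epistasis are hypothetical choices, and heritability is taken in the broad sense.

```python
import numpy as np

def genotypic_values(geno, a, dd, mu=0.0):
    """Sum over loci of +a (AA), d = dd * a (Aa) or -a (aa); epistasis omitted."""
    a = np.asarray(a, dtype=float)
    d = dd * a
    return mu + (geno * a + (geno == 0) * d).sum(axis=1)

def add_environment(g, h2, rng):
    """Return P = G + E with Var(E) chosen so that h2 = Var(G) / Var(P)."""
    var_e = np.var(g) * (1.0 - h2) / h2
    e = rng.normal(0.0, np.sqrt(var_e), size=g.shape)
    return g + e, var_e

rng = np.random.default_rng(42)
# Hypothetical F2-like genotypes at 40 loci, coded 1 (AA), 0 (Aa), -1 (aa).
geno = rng.choice([-1, 0, 1], size=(1000, 40), p=[0.25, 0.5, 0.25])
g = genotypic_values(geno, a=np.ones(40), dd=0.5, mu=100.0)
p, var_e = add_environment(g, h2=0.25, rng=rng)
print("realized Var(G)/Var(P):", np.var(g) / np.var(p))
```

With h² = 0.25 the environmental variance is three times the genotypic variance, so the realized ratio printed above should be close to, but not exactly, 0.25 because of sampling.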

Dimensionality Reduction

Dimensionality reduction was performed randomly in the simulated population and then the efficiency of RR-BLUP was tested in two different studies, considering each of the letters below as a scenario, as described:

Study 1: A) 40 markers associated with the traits; B) 442 markers (the 402 markers of LG9 and LG10 plus the 40 associated markers); C) 1608 markers (LG1 to LG8, including the 40 associated markers); D) 2010 total markers (LG1 to LG10); E) 402 markers (LG9 and LG10).

Study 2: F) 40 markers associated with the traits; G) 80 markers (distributed in LG1 to LG8 - including 40 associated markers); H) 120 markers (distributed in LG1 to LG8 - including 40 associated markers); I) 160 markers (distributed in LG1 to LG8 - including 40 associated markers); J) 200 markers (distributed in LG1 to LG8 - including 40 associated markers); K) 1608 markers (LG1 to LG8 - including 40 associated markers).
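The marker counts of the scenarios can be reproduced with simple index sets, as sketched below. Two points are assumptions: the positions of the 40 associated markers are not given in the text, so they are drawn at random from LG1 to LG8 here, and scenario B is read as the 402 markers of LG9 and LG10 plus the 40 associated markers, which matches the reported count of 442. All variable and function names are hypothetical.

```python
import numpy as np

N_LG, PER_LG = 10, 201  # 2010 markers in total, 201 per linkage group
lg_of_marker = np.repeat(np.arange(1, N_LG + 1), PER_LG)

rng = np.random.default_rng(0)
lg1_8 = np.flatnonzero(lg_of_marker <= 8)
lg9_10 = np.flatnonzero(lg_of_marker >= 9)

# Hypothetical positions: the true associated markers are not listed in the text.
associated = np.sort(rng.choice(lg1_8, size=40, replace=False))
others_lg1_8 = np.setdiff1d(lg1_8, associated)

def with_random_extras(total):
    """Associated markers plus randomly chosen LG1-LG8 markers up to `total`."""
    extras = rng.choice(others_lg1_8, size=total - associated.size, replace=False)
    return np.sort(np.concatenate([associated, extras]))

scenarios = {
    "A": associated,
    "B": np.sort(np.concatenate([associated, lg9_10])),
    "C": lg1_8,
    "D": np.arange(N_LG * PER_LG),
    "E": lg9_10,
    "F": associated,
    "G": with_random_extras(80),
    "H": with_random_extras(120),
    "I": with_random_extras(160),
    "J": with_random_extras(200),
    "K": lg1_8,
}
sizes = {name: idx.size for name, idx in scenarios.items()}
print(sizes)
```

Each index array can then be used to slice the columns of the full marker matrix before fitting the model for that scenario.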

Prediction RR-BLUP

The RR-BLUP method (Random Regression Best Linear Unbiased Predictor) uses BLUP predictors and treats the effects of SNP markers as random-effect covariates. Prediction with the RR-BLUP method is based on the mixed linear model (Resende 2007; Resende et al. 2008):

$y=Wb+Xm+e$

(2.0)

where $y$ is the vector of phenotypic observations; $b$ is the vector of fixed effects with incidence matrix $W$; $m$ is the vector of random marker effects with incidence matrix $X$, with $m \sim N(0, I\sigma_{g}^{2})$; and $e$ is the vector of random errors, with $e \sim N(0, I\sigma_{e}^{2})$, where $\sigma_{g}^{2}$ and $\sigma_{e}^{2}$ are the variances of the marker effects and of the error, respectively. $X$ is composed of the values 1, 0 and −1 according to the number of marker alleles of the MM, Mm and mm genotypes, respectively.
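For a fixed variance ratio, the model in Eq. 2.0 can be solved through Henderson's mixed model equations, as in the minimal sketch below. It assumes that the ratio lam = σ²_e / σ²_g is already known, whereas in practice the variance components would be estimated (for example by REML); the function name `rr_blup`, the value lam = 5.0 and the toy data are hypothetical.

```python
import numpy as np

def rr_blup(y, W, X, lam):
    """Solve the mixed model equations for y = Wb + Xm + e.

    lam = sigma_e^2 / sigma_g^2 is treated as known (an assumption).
    Returns the fixed-effect estimates b_hat and the marker effects m_hat.
    """
    p, k = W.shape[1], X.shape[1]
    lhs = np.block([[W.T @ W, W.T @ X],
                    [X.T @ W, X.T @ X + lam * np.eye(k)]])
    rhs = np.concatenate([W.T @ y, X.T @ y])
    solution = np.linalg.solve(lhs, rhs)
    return solution[:p], solution[p:]

# Toy data: 50 individuals, 20 markers coded -1, 0, 1, overall mean as fixed effect.
rng = np.random.default_rng(0)
X = rng.integers(-1, 2, size=(50, 20)).astype(float)
W = np.ones((50, 1))
y = 10.0 + X @ rng.normal(size=20) + rng.normal(size=50)

b_hat, m_hat = rr_blup(y, W, X, lam=5.0)
print("estimated mean:", b_hat[0])
```

When $W$ contains only the overall mean, this solution coincides with ridge regression on the centered marker matrix, which is why the number of markers can exceed the number of individuals without making the system singular.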

For the training and validation of the model, k-fold cross-validation was performed with k = 5 partitions (Bengio et al. 2004). In each of the five rounds, four of the subsets constituted the training population (80% of the individuals) and the remaining subset constituted the validation population (20% of the individuals).

To calculate the accuracy, the genomic estimated breeding values (GEBV) were obtained using the expression:

$GEBV=\widehat{y}_{j}=\sum_{i}^{n}x_{ij}\widehat{m}_{i}$

(3.0)

where $x_{ij}$ equals −1, 0 or 1 for the mm, Mm and MM genotypes, respectively, for marker $i$. The component $x_{ij}$ is the element of row $j$ and column $i$ of matrix $X$, referring to individual $j$.
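The five-fold scheme and the GEBV of Eq. 3.0 can be combined as in the sketch below: marker effects are estimated in the training folds and the GEBV of each validation individual is the product of its marker row by those effects. The ridge solver, the shrinkage value `lam`, the function names and the toy data are assumptions made for illustration, not the settings used in the study.

```python
import numpy as np

def fit_marker_effects(X, y, lam):
    """Ridge solution with an intercept; lam is an assumed shrinkage value."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    m_hat = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]),
                            Xc.T @ (y - y_mean))
    return y_mean - x_mean @ m_hat, m_hat

def cross_validated_gebv(X, y, lam, k=5, seed=0):
    """Predict the GEBV of every individual while it is in the validation fold."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    gebv = np.full(n, np.nan)
    for val_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), val_idx)
        _, m_hat = fit_marker_effects(X[train_idx], y[train_idx], lam)
        gebv[val_idx] = X[val_idx] @ m_hat  # Eq. 3.0: sum over markers of x_ij * m_hat_i
    return gebv, folds

# Toy data: 100 individuals and 30 markers coded -1, 0, 1.
rng = np.random.default_rng(3)
X = rng.integers(-1, 2, size=(100, 30)).astype(float)
y = X @ rng.normal(size=30) + rng.normal(size=100)
gebv, folds = cross_validated_gebv(X, y, lam=10.0)
print("fold sizes:", [len(f) for f in folds])
```

Each individual is used for validation exactly once, so the vector `gebv` holds one out-of-sample prediction per individual.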

To evaluate the efficiency of the RR-BLUP model after the dimensionality reduction, the squared correlation ($r^{2}$), the root mean square error (RMSE), and the Akaike Information Criterion (AIC) were used.

The selective accuracy was measured by the squared correlation ($r^{2}$) between the estimated values ($\widehat{y}$) and the true values ($y$), that is, it measures how closely the estimate is related to the real value of the parameter, which, in quantitative genetics, expresses the heritability of the traits (Schaeffer 2006). Accuracy was given by the following equation:

$$r^{2}=\left(cor\left(\widehat{y}, y\right)\right)^{2}$$

(4.0)

The root mean square error (RMSE) was used to express the predictive accuracy of the models, as it has the advantage of presenting the error on the same scale as the variable of interest, and is described as follows:

$$RMSE=\sqrt{\frac{\sum \left(\widehat{y}-y\right)^{2}}{n}}$$

(5.0)
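The two measures defined in Eqs. 4.0 and 5.0 translate directly into code. The short sketch below uses hypothetical function names and made-up numbers purely to show the calculation.

```python
import numpy as np

def squared_correlation(y_hat, y):
    """Eq. 4.0: squared Pearson correlation between predictions and true values."""
    return float(np.corrcoef(y_hat, y)[0, 1] ** 2)

def rmse(y_hat, y):
    """Eq. 5.0: root mean square error, on the same scale as y."""
    diff = np.asarray(y_hat, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

# Made-up values: predictions are the true values shifted by one unit.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([2.0, 3.0, 4.0, 5.0])
r2_example = squared_correlation(y_pred, y_true)
rmse_example = rmse(y_pred, y_true)
print(r2_example, rmse_example)
```

In this example the correlation is perfect while the RMSE equals one unit, which shows why the two measures are complementary: r² is insensitive to a constant bias, whereas the RMSE is not.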

The Akaike Information Criterion (AIC), proposed by Akaike (1974), is a model selection criterion that uses the Kullback-Leibler (KL) information (Kullback and Leibler 1951) to test whether a given model is suitable. The criterion chooses the model that minimizes the KL divergence, so that lower AIC values indicate a better fit (Akaike 1974). The AIC is defined as:

$$AIC=-2\text{log}\left(L\left(\widehat{\theta }\right)\right)+2p$$

(6.0)

where: $L\left(\widehat{\theta }\right)$ is the maximum of the likelihood function of the model considered; $p$ is the number of parameters to be estimated in the model.
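A minimal sketch of Eq. 6.0 is given below, assuming independent Gaussian residuals with the maximum likelihood variance estimate. The text does not state which likelihood or parameter count was used, so both are assumptions, and the function name is hypothetical. The last lines compare the size of the penalty term with values reported in Table 7.

```python
import numpy as np

def gaussian_aic(residuals, n_params):
    """AIC = -2 log L + 2p under an assumed i.i.d. Gaussian error model."""
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    sigma2 = np.mean(residuals ** 2)  # maximum likelihood estimate of the variance
    loglik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return float(-2.0 * loglik + 2.0 * n_params)

res = np.array([1.0, -1.0, 1.0, -1.0])
aic_small = gaussian_aic(res, n_params=3)
aic_large = gaussian_aic(res, n_params=10)
print(aic_large - aic_small)  # same fit, so only the penalty differs: 2 * (10 - 3)

# Penalty gap between 1608 and 40 parameters versus the AIC gap reported
# for trait 1 between scenarios K and F (Table 7).
penalty_gap = 2 * (1608 - 40)
observed_gap = 9934.34 - 6797.47
print(penalty_gap, round(observed_gap, 2))
```

The penalty gap of 3136 is very close to the reported difference of 3136.87, which suggests, although the text does not say so, that p was counted as the number of markers and that the likelihood term changed little between these two scenarios.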

Statistical Test

Each of the five simulated populations was considered a replicate. An analysis of variance was performed, and the mean squares of the factors were tested by the F test. The ${r}^{2}$, RMSE and AIC values resulting from each study were subjected to the Tukey test at 5% probability.

Computational Analysis

Population simulations were performed using the GENES software (Cruz 2016). The other analyses were also performed in GENES, integrated with the R software (R Core Team 2019).

Selective Accuracy

Figure 1 shows the variation in selective accuracy (r²) obtained for study 1, according to the degree of dominance and heritability. The r² values were similar across scenarios A to D; the exception was scenario E, which contains only the markers belonging to LG9 and LG10 (Figure 1).

For scenarios A, B, C and D, the highest value of r² was for trait 3 (h²: 0.25 and dd: 1.0), followed by trait 6 (h²: 0.5 and dd: 1.0) and trait 9 (h²: 0.75 and dd: 1.0) (Table 2). In contrast, the lowest value of r² was for trait 1 (h²: 0.25 and dd: 0.0), followed by trait 4 (h²: 0.5 and dd: 0.0) and trait 7 (h²: 0.75 and dd: 0.0) (Table 2). Thus, the presence of dominance improved the selective accuracy, since higher degrees of dominance increased the value of r². The sharp reduction in r² in scenario E, in turn, can be explained by the absence of markers associated with the evaluated traits (Table 2).

Table 2

Prediction values of selective accuracy (r²) according to the RR-BLUP methodology (study 1) for all evaluated traits, according to the variation in the number of markers
	r²	Study 1
Traits		A		B		C		D		E
1	h²_0.25/dd_0	0.24	Ac	0.17	Ad	0.21	Ad	0.20	Ae	0.00	Ba
2	h²_0.25/dd_0.5	0.41	Ab	0.38	Ac	0.39	Abc	0.39	Acd	0.00	Ba
3	h²_0.25/dd_1.0	0.64	Aa	0.59	Aa	0.59	Aa	0.58	Aa	0.00	Ba
4	h²_0.5/dd_0.0	0.25	Ac	0.22	Ad	0.26	Acd	0.25	Ae	0.00	Ba
5	h²_0.5/dd_0.5	0.40	Ab	0.38	Ac	0.40	Ab	0.40	Abcd	0.00	Ba
6	h²_0.5/dd_1.0	0.58	Aa	0.53	Aad	0.53	Aa	0.53	Aab	0.00	Ba
7	h²_0.75/dd_0.0	0.18	Ac	0.18	Ad	0.21	Ad	0.28	Ade	0.00	Ba
8	h²_0.75/dd_0.5	0.42	Ab	0.41	Abc	0.40	Ab	0.38	Acd	0.00	Ba
9	h²_0.75/dd_1.0	0.53	Aab	0.51	Aabc	0.51	Aab	0.51	Aabc	0.00	Ba

Means followed by the same capital letters horizontally and lower case letters in the vertical do not differ statistically by the Tukey test at 5% probability. h²: heritability; dd: degree of dominance. A) 40 markers associated with the traits; B) 442 markers (LG9 and LG10 - including 40 associated markers); C) 1608 markers (LG1 to LG8 - including 40 associated markers); D) 2010 total markers (LG1 to LG10); E) 402 markers (LG 9 and LG 10).

Dominance was formulated by Mendel as one of the first concepts of genetics (Wilkie 1994). In quantitative genetics, dominance is measured by the position of the heterozygote relative to the average of the homozygotes (Falconer and Mackay 1996). A dominance effect exists when the effects of the alleles are not totally additive, that is, when the genetic value of the heterozygote differs from the average of the genetic values of the homozygotes. Because dominance is only partially heritable, additive effects have traditionally been of greater interest in studies, but the understanding of dominance effects has been increasing and has demonstrated their influence on the phenotype (Muñoz et al. 2014; Nishio et al. 2014; Varona et al. 2018; Lyra et al. 2019; Amadeu et al. 2020), which leads to more accurate analyses in genomic selection. Thus, the contribution of dominance to the inheritance of quantitative traits of interest is especially important for perennial species and clones, in which the whole genotypic value can be perpetuated, and also for annual species in which there is commercial interest in hybrids and in the exploitation of heterosis (de Almeida Filho et al. 2016).

In addition to the effects of dominance, the applicability of genomic selection can be influenced by many factors, such as heritability, marker density, gene interaction and population structure (Guo et al. 2014; Crossa et al. 2017). Numerous studies show that adding multiple intra- and interallelic interactions, even those of small effect, is important to explain as much as possible of the heritability of quantitative traits in prediction studies (McKinney and Pajewski 2012). Heritability (h²) is the genetic parameter that expresses the genetic variability existing in the population, that is, the intensity with which the phenotype expresses the genotype (Falconer and Mackay 1996). Although studies have shown that the prediction of genomic values is effective for traits of low heritability (Resende et al. 2014), in this work prediction performed well under both low and high heritability (0.25, 0.5 and 0.75), as shown in Fig. 1.

Epistasis is a genetic interaction that occurs when two or more genes act on the same trait (Mathew et al. 2018), and it has great importance in quantitative traits, as suggested by classical studies of quantitative genetics in plants (Holland 2006). However, including epistatic interactions is difficult because of the multiplicative effects between the alleles involved in trait determination, which lead to overparameterization of the models and to statistical and data-processing problems, since the number of markers is usually higher than the number of individuals (Gianola et al. 2006). In most models applied in the context of GWS, selection focuses exclusively on the main effects, but including epistasis can improve prediction efficiency within populations. However, the information needed to use epistasis in GWS is still very limited, since good prior knowledge is required about the relative importance of the variance due to main versus epistatic effects, and understanding the role of epistasis is complicated by its own effect (Melchinger et al. 2007). In addition, the population size needed to obtain robust estimates of SNP effects is much larger for epistatic effects than for main effects (Carlborg and Haley 2004).
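To give a sense of scale for this overparameterization, a rough count (considering only pairwise interactions among the 2010 markers simulated here, an assumption made for illustration) is:

$$\binom{2010}{2}=\frac{2010\times 2009}{2}=2{,}019{,}045$$

That is about two million pairwise terms to be estimated from 1000 individuals, in addition to the 2010 main effects.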

In study 2, the values of selective accuracy (r²) tended to stabilize whether 1608, 200, 160, 120 or 80 markers were used (LG1 to LG8, including the 40 associated markers) or only the 40 markers associated with the traits (Fig. 2). The highest r² value was observed for trait 3 (h²: 0.25 and dd: 1.0), followed by trait 6 (h²: 0.5 and dd: 1.0) and trait 9 (h²: 0.75 and dd: 1.0). In contrast, the lowest value of r² was observed for trait 7 (h²: 0.75 and dd: 0.0), followed by trait 1 (h²: 0.25 and dd: 0.0) and trait 4 (h²: 0.5 and dd: 0.0). This result was similar to that found in study 1 and showed the strong influence of dominance on the values of r² (Table 3).

There was no significant difference between the mean selective accuracy values of the scenarios in study 2, that is, there was no difference among the different numbers of molecular markers distributed in LG1 to LG8 (including the 40 associated markers), or when only the 40 markers associated with the traits were evaluated (Table 3).

Table 3

Prediction values of selective accuracy (r²) according to the RR-BLUP methodology (study 2) for all evaluated traits, according to the variation in the number of markers
r²		Study 2
Traits		F		G		H		I		J		K
1	h²_0.25/dd_0	0.24	Ac	0.22	Ac	0.22	Ad	0.22	Ad	0.21	Ad	0.22	Ae
2	h²_0.25/dd_0.5	0.41	Ab	0.40	Ab	0.39	Abc	0.39	Abc	0.39	Abc	0.39	Acd
3	h²_0.25/dd_1.0	0.64	Aa	0.61	Aa	0.60	Aa	0.60	Aa	0.60	Aa	0.60	Aa
4	h²_0.5/dd_0.0	0.25	Ac	0.27	Ac	0.26	Acd	0.27	Acd	0.26	Acd	0.26	Ade
5	h²_0.5/dd_0.5	0.41	Ab	0.40	Ab	0.40	Ab	0.40	Abc	0.40	Ab	0.40	Abc
6	h²_0.5/dd_1.0	0.59	Aa	0.56	Aa	0.54	Aa	0.54	Aa	0.54	Aa	0.53	Aab
7	h²_0.75/dd_0.0	0.18	Ac	0.21	Ac	0.21	Ad	0.20	Ad	0.21	Ad	0.21	Ae
8	h²_0.75/dd_0.5	0.42	Ab	0.42	Ab	0.41	Ab	0.40	Ab	0.40	Ab	0.40	Ac
9	h²_0.75/dd_1.0	0.54	Aab	0.53	Aab	0.52	Aab	0.52	Aab	0.52	Aab	0.52	Aabc

Means followed by the same capital letters horizontally and lower case letters in the vertical do not differ statistically by the Tukey test at 5% probability. h²: heritability; dd: degree of dominance. F) 40 markers associated with the traits; G) 80 markers (distributed in LG1 to LG8 - including 40 associated markers); H) 120 markers (distributed in LG1 to LG8 - including 40 associated markers); I) 160 markers (distributed in LG1 to LG8 - including 40 associated markers); J) 200 markers (distributed in LG1 to LG8 - including 40 associated markers); K) 1608 markers (LG1 to LG8 - including 40 associated markers).

Some authors reported that the inclusion of dominance (and epistasis in some cases) was advantageous for breeding programs when compared with models containing only additive effects (Nishio et al. 2014; de Almeida Filho et al. 2016; dos Santos et al. 2016; Lyra et al. 2018; Amadeu et al. 2020). A similar result was also observed for simulated populations [37–39]. [36] report that, when evaluating GBLUP (Genomic Best Linear Unbiased Prediction) models, the inclusion of dominance effects was an efficient strategy to improve the predictive capacity and the quality of estimation of the variance components. Amadeu et al. (2020) investigated the effects of dominance in predicting genotypic values for complex traits in autotetraploid species using a Bayesian framework and found that dominance effects resulted in better prediction values. Lyra et al. (2018) used the GBLUP model in corn hybrids and also observed that models including dominance were important for predicting grain yield. de Almeida Filho et al. (2016) studied the contribution of dominance to phenotypic prediction in simulated populations and concluded that, as dominance increases, the predictive accuracy of GWS becomes less adequate, since the model used by this methodology does not include the matrix of effects due to dominance deviations.

Predictive Accuracy

The predictive accuracy (RMSE) for study 1 is shown in Fig. 3. The lowest RMSE was observed for trait 3 (h²: 0.25 and dd: 1.0), followed by trait 6 (h²: 0.5 and dd: 1.0) and trait 9 (h²: 0.75 and dd: 1.0). In contrast, the highest RMSE values were observed for trait 7 (h²: 0.75 and dd: 0.0), trait 1 (h²: 0.25 and dd: 0.0) and trait 4 (h²: 0.5 and dd: 0.0) in all scenarios.

In Table 4 it was observed that most of the significant differences for the RMSE values occurred in scenario E, that is, for the 402 markers not related to the evaluated traits. As observed for the selective accuracy values, the presence of dominance improved the predictive accuracy, since, with the inclusion of higher levels of dominance, there was a decrease in the value of the studied parameter (RMSE).

Table 4

Prediction values of predictive accuracy (RMSE) according to the RR-BLUP methodology (study 1) for all evaluated traits, according to the variation in the number of markers
	RMSE	Study 1
	Traits	A		B		C		D		E
1	h²_0.25/dd_0	14.20	Ab	14.72	Ab	14.35	Ab	14.49	Ab	16.19	Ab
2	h²_0.25/dd_0.5	9.55	Acd	9.66	Ad	9.56	Ad	9.59	Ade	12.24	Acd
3	h²_0.25/dd_1.0	5.80	Be	6.17	Be	6.12	Be	6.21	Be	9.64	Ad
4	h²_0.5/dd_0.0	14.64	Ab	14.11	Abc	13.79	Abc	13.88	Abc	16.00	Ab
5	h²_0.5/dd_0.5	10.10	Bcd	10.06	Bd	9.92	Bd	9.93	Bd	13.29	Abc
6	h²_0.5/dd_1.0	7.05	Bde	7.69	Bde	7.68	Bde	7.72	Bde	11.24	Acd
7	h²_0.75/dd_0.0	20.16	Aa	19.81	Aa	19.50	Aa	19.61	Aa	21.79	Aa
8	h²_0.75/dd_0.5	10.75	Bc	10.77	Bcd	10.84	Bcd	10.94	Bcd	13.97	Abc
9	h²_0.75/dd_1.0	8.33	Bcde	8.59	Bde	8.42	Bde	8.53	Bde	12.17	Acd

Regarding study 2, the predictive accuracy values are represented graphically in Fig. 4. As with selective accuracy, the RMSE values were almost constant whether 1608, 200, 160, 120 or 80 markers were used (LG1 to LG8, including the 40 associated markers) or only the 40 markers associated with the traits. Similar to the result found in study 1, the lowest RMSE values were observed for traits 3, 6 and 9, and the highest RMSE values for traits 7, 1 and 4 (Table 5). These results again demonstrated the effect of dominance on the predicted values.

It was observed in Table 5 that there was no significant difference between the different scenarios, except for some comparisons made for the scenario with the 40 associated markers, which presented some variations. This result reflects that it is possible to use a reduced number of markers in prediction analysis using the RR-BLUP method.

Table 5

Prediction values of predictive accuracy (RMSE) according to the RR-BLUP methodology (study 2) for all evaluated traits, according to the variation in the number of markers
	RMSE	Study 2
	Traits	F		G		H		I		J		K
1	h²_0.25/dd_0	14.20	Abc	14.29	Ab	14.34	Ab	14.36	Ab	14.38	Ab	14.35	Ab
2	h²_0.25/dd_0.5	9.55	Ade	9.50	Ade	9.59	Ade	9.56	Ade	9.58	Ade	9.56	Ade
3	h²_0.25/dd_1.0	5.80	Af	6.01	Ae	6.10	Ae	6.08	Ae	6.10	Ae	6.12	Ae
4	h²_0.5/dd_0.0	14.64	Ab	13.72	Abc	13.79	Abc	13.76	Abc	13.79	Abc	13.79	Abc
5	h²_0.5/dd_0.5	10.10	Ade	9.93	Ad	9.97	Ad	9.97	Ad	9.93	Ad	9.92	Ad
6	h²_0.5/dd_1.0	7.05	Aef	7.52	Ade	7.63	Ade	7.63	Ade	7.66	Ade	7.68	Ade
7	h²_0.75/dd_0.0	20.16	Aa	19.48	Aa	19.52	Aa	19.47	Aa	19.53	Aa	19.50	Aa
8	h²_0.75/dd_0.5	10.75	Acd	10.66	Acd	10.81	Acd	10.78	Acd	10.83	Acd	10.84	Acd
9	h²_0.75/dd_1.0	8.33	Adef	8.36	Ade	8.42	Ade	8.39	Ade	8.43	Ade	8.42	Ade

Means followed by the same capital letters horizontally and lower case letters in the vertical do not differ statistically by the Tukey test at 5% probability. h²: heritability; dd: degree of dominance. F) 40 markers associated with the traits; G) 80 markers (distributed in LG1 to LG8 - including 40 associated markers); H) 120 markers (distributed in LG1 to LG8 - including 40 associated markers); I) 160 markers (distributed in LG1 to LG8 - including 40 associated markers); J) 200 markers (distributed in LG1 to LG8 - including 40 associated markers); K) 1608 markers (LG1 to LG8 - including 40 associated markers).

Akaike Information Criterion (AIC)

For study 1, scenarios A, B and E had the lowest AIC values (Fig. 5). Within each scenario, traits 7, 1 and 4, that is, the traits without dominance, presented the highest AIC values. In contrast, traits 3, 6 and 9 had the lowest AIC values. Thus, dominance had a great influence on the values of AIC.

Comparing the scenarios in study 1, there was a significant difference between almost all of them; the only exceptions were scenarios B and E, which did not differ significantly for traits 1, 4 and 7 (Table 6).

Table 6

Akaike Information Criterion (AIC) according to the RR-BLUP methodology (study 1) for all evaluated traits, according to the variation in the number of markers
	AIC	Study 1
	Trait	A		B		C		D		E
1	h²_0.25/dd_0	6797.47	Db	7650.95	Cb	9934.34	Bf	10740.93	Ab	7652.62	Cb
2	h²_0.25/dd_0.5	6046.56	Ee	6942.64	De	9206.18	Bf	10021.49	Af	7107.56	Cd
3	h²_0.25/dd_1.0	5332.11	Eh	6257.95	Dh	8513.22	Bi	9340.33	Ai	6722.78	Cf
4	h²_0.5/dd_0.0	6634.52	Dc	7514.04	Cc	9797.10	Bc	10606.84	Ac	7553.03	Cb
5	h²_0.5/dd_0.5	6161.44	Ed	7051.25	Dd	9318.15	Be	10131.54	Ae	7207.36	Cd
6	h²_0.5/dd_1.0	5669.75	Eg	6566.46	Dg	8815.96	Bh	9639.14	Ah	6951.68	Ce
7	h²_0.75/dd_0.0	7138.29	Da	8015.07	Ca	10298.22	Ba	11111.15	Aa	8018.42	Ca
8	h²_0.75/dd_0.5	6248.43	Ed	7145.87	Dd	9426.63	Bd	10241.13	Ad	7341.25	Cc
9	h²_0.75/dd_1.0	5849.95	Ef	6771.07	Df	9036.62	Bg	9857.45	Ag	7106.99	Cd

In study 2, the AIC indicated that the models with the lowest numbers of markers are preferable to the model that includes 1608 markers (Fig. 6). Lower AIC values indicate a preferable model; in addition, the AIC is related to cross-validation criteria (McQuarrie and Tsai 1998; Piepho and Gauch 2001). As in study 1, traits 3, 6 and 9 had the lowest AIC values and traits 7, 1 and 4 the highest, demonstrating once again the importance of dominance effects. Dias et al. (2018), when evaluating the effectiveness of various GWS models in predicting traits of wheat genotypes, demonstrated that including dominance effects in the models generally improved the AIC estimates. Dias et al. (2018), when evaluating the accuracy of GWS to predict the performance of single-cross maize hybrids in multi-environment trials for drought tolerance, also concluded that the presence of dominance in the tested models improved the prediction.

According to Table 7, scenario F differed significantly from the other scenarios, except from scenario G for traits 1, 2 and 6. Scenarios H, I and J did not differ significantly from one another, and scenario K, with 1608 markers, had the highest AIC values for all traits.

Table 7

Akaike Information Criterion (AIC) according to the RR-BLUP methodology for study 2 for all evaluated traits, according to the variation in the number of markers
AIC		Study 2
	Trait	F		G		H		I		J		K
1	h²_0.25/dd_0	6797.47	Cb	6880.51	Cb	7090.53	Bb	7041.88	Bb	7122.18	Bb	9934.34	Ab
2	h²_0.25/dd_0.5	6046.56	Ce	6141.46	Ce	6361.13	Bf	6312.54	Be	6394.72	Bf	9206.18	Af
3	h²_0.25/dd_1.0	5332.11	Dh	5440.16	Ch	5660.04	Bi	5611.21	Bh	5693.21	Bi	8513.22	Ai
4	h²_0.5/dd_0.0	6634.52	Dc	6737.83	Cc	6949.35	Bc	6901.03	Bc	6982.64	Bc	9797.10	Ac
5	h²_0.5/dd_0.5	6161.44	Dd	6261.77	Cd	6472.43	Be	6428.50	Bd	6505.06	Be	9318.15	Ae
6	h²_0.5/dd_1.0	5669.75	Cg	5744.68	Cg	5968.53	Bh	5916.15	Bg	6001.22	Bh	8815.96	Ah
7	h²_0.75/dd_0.0	7138.29	Da	7240.35	Ca	7452.73	Ba	7405.23	Ba	7484.59	Ba	10298.22	Aa
8	h²_0.75/dd_0.5	6248.43	Dd	6357.85	Cd	6582.14	Bd	6531.85	Bd	6614.86	Bd	9426.63	Ad
9	h²_0.75/dd_1.0	5849.95	Df	5964.37	Cf	6180.08	Bg	6128.78	Bf	6213.93	Bg	9036.62	Ag

Means followed by the same capital letters horizontally and lower case letters in the vertical do not differ statistically by the Tukey test at 5% probability. h²: heritability; dd: degree of dominance. F) 40 markers associated with the traits; G) 80 markers (distributed in LG1 to LG8 - including 40 associated markers); H) 120 markers (distributed in LG1 to LG8 - including 40 associated markers); I) 160 markers (distributed in LG1 to LG8 - including 40 associated markers); J) 200 markers (distributed in LG1 to LG8 - including 40 associated markers); K) 1608 markers (LG1 to LG8 - including 40 associated markers).

The incorporation of markers covering the whole genome represents an advance in the selection of superior genotypes and has made genomic selection superior in many respects to marker-assisted selection (Dekkers and Hospital 2002). Many authors (Long et al. 2007; Habier et al. 2009; Usai et al. 2009; Macciotta et al. 2009; Weigel et al. 2010) have found that using a SNP subset, carefully chosen for the trait of interest, can result in high reliability in the prediction of genomic values. Although we chose to reduce the number of markers at random, some methodologies have been proposed in the literature to solve the problems generated by the dimensionality of the matrix (Azevedo et al. 2013; James et al. 2013; Azevedo et al. 2014). Penalized regression methods (RR-BLUP and G-BLUP) and Bayesian methods (Bayes A, Bayes B and Lasso) have been the most used; however, other methods have been applied to GWS, such as Partial Least Squares (PLS) and Principal Components Regression (PCR) (Resende et al. 2010; Azevedo et al. 2015), but all of them are computationally demanding. James et al. (2013) proposed three methodological lines of dimensionality reduction, based on the selection of subsets of variables, on penalization methods and on reduction methods proper via uncorrelated linear combinations, and highlighted that high dimensionality can result in problems of overfitting, multicollinearity and high bias of variance. Weigel et al. (2009) observed that, using the 300 SNPs with the highest estimated effects out of a total of 32518 SNPs, it was possible to obtain half of the prediction accuracy of the complete model using the Bayesian Lasso methodology. Usai et al. (2009), in their simulations, reported higher accuracy when the prediction was made using 169 markers selected from 6000 SNPs using Lasso.

Several authors have investigated the number of SNP markers used in GWS, and divergent results have been reported on the use of higher- or lower-density SNP sets (Habier et al. 2009; Weigel et al. 2009; Moser et al. 2010; Crossa et al. 2013; Perez-Rodriguez et al. 2013; Tayeh et al. 2015; Sousa et al. 2019). A higher genotyping density does not always offer the best accuracy, and subsets of markers can sometimes outperform the complete data set (Zhang et al. 2010; Ma et al. 2016). Sousa et al. (2019) concluded that genotyping costs can be reduced without decreasing the accuracy of GWS by applying methodologies for obtaining subsets of markers, and demonstrated that it is possible to select the most informative markers and create a low-cost SNP chip for genomic selection in breeding programs. Thus, the use of subsets of markers selected based on their effects or positions provides an efficient strategy to improve accuracy (Vazquez et al. 2010; Resende Jr et al. 2012; Zhang et al. 2015). Tayeh et al. (2015) decreased the number of markers from 9824 to 2945 by keeping a single marker per unique map position, and did not observe a reduction in accuracy for any of the evaluated traits. Ma et al. (2016) concluded that the pre-selection of markers based on the analysis of haplotype blocks is an interesting option to reduce the cost of implementing genomic selection.

In this study, the reduction in dimensionality proved to be effective: the values of r², RMSE, and AIC for the models with a reduced number of markers indicated an improvement in the quality and accuracy of prediction. Thus, it was possible to verify that non-informative or small-effect markers can be removed from the data set and that predicting genetic values with reduced information preserved the same biological conclusions as those obtained with the larger data set.
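For readers who want to reproduce this kind of comparison, the three statistics can be computed from observed and predicted values as in the minimal Python sketch below. It is an illustration under stated assumptions, not the procedure used in the study: r² is taken as the squared Pearson correlation, and AIC uses the common Gaussian form n·ln(RSS/n) + 2k with additive constants dropped. The function names, the toy values, and the choice of k as a plain parameter count are assumptions.

```python
import math


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def rmse(observed, predicted):
    """Root mean squared error of the predictions."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def aic_gaussian(observed, predicted, n_params):
    """AIC under Gaussian errors: n * ln(RSS / n) + 2k (constants dropped)."""
    n = len(observed)
    rss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return n * math.log(rss / n) + 2 * n_params


# Toy values (hypothetical), with the same predictions for both models so that
# only the parameter penalty differs.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]

fit_r2 = r_squared(observed, predicted)
fit_rmse = rmse(observed, predicted)
aic_reduced = aic_gaussian(observed, predicted, n_params=2)  # fewer markers
aic_full = aic_gaussian(observed, predicted, n_params=5)     # more markers
```

Lower RMSE and AIC and higher r² indicate a better model. With equal residual error, the model with fewer parameters has the lower AIC, which is why removing non-informative markers can improve this criterion.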

Predictions obtained by RR-BLUP from the reduced marker information improved prediction and preserved the same biological conclusions as those obtained with the larger data set.

Non-informative or small-effect markers can be removed from the original data set.

The inclusion of dominance effects was an efficient strategy to improve predictive capacity.

Funding

This study was funded by CAPES (Coordination for the Improvement of Higher Education Personnel) and CNPq (National Council for Scientific and Technological Development).

Conflicts of interest

The authors declare that there is no conflict of interest.

Availability of data and material

The datasets generated and/or analyzed during the current study are available from the corresponding author on reasonable request. If the article is accepted, the data can be deposited in a repository.

Authors' contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Michele Jorge da Silva. The first draft of the manuscript was written by Michele Jorge da Silva and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Compliance with Ethical Standards

This manuscript is not under consideration for publication elsewhere and has not been made publicly available online. All authors have approved this submission, and all copyright holders are listed.

Alkimim ER, Caixeta ET, Sousa TV, Resende MDV, da Silva FL, Sakiyama NS, Zambolim L. Selective efficiency of genome wide selection in Coffea canephora breeding. Tree Genetics and Genomes. 2020; 16(3). https://doi.org/10.1007/s11295-020-01433-3.
Amadeu R, Ferrao F, de Bem Oliveira I, Benevenuto J, Endelman J, Muñoz P. Impact of dominance effects on autotetraploid genomic prediction. Crop Science. 2020; 60. https://doi.org/10.1002/csc2.20075.
Azevedo CF, de Resende MDV, Fonseca F, Lopes OS, Guimarães SEF. Regressão via componentes independentes aplicada à seleção genômica para características de carcaça em suínos. Pesquisa Agropecuária Brasileira. 2013; 48(6): 619-626.
Azevedo CF, Silva FF, Resende MD, Lopes MS, Duijvesteijn N, Guimarães SEF, Lopes PS, Kelly MJ, Viana JMS, Knol EF. Supervised independent component analysis as an alternative method for genomic selection in pigs. Journal of Animal Breeding and Genetics. 2014; 131(6): 452-461.
Azevedo CF, Nascimento M, Silva FF, Resende MDV, Lopes PS, Guimarães SEF, Glória LS. Comparison of dimensionality reduction methods to predict genomic breeding values for carcass traits in pigs. Genet Mol Res. 2015; 14:12217-12227.
Bajgain P, Zhang X, Anderson JA. Dominance and G×E interaction effects improve genomic prediction and genetic gain in intermediate wheatgrass (Thinopyrum intermedium). Plant Genome. 2020; 1-13.
Bengio Y, Grandvalet Y. No Unbiased Estimator of the Variance of K-Fold Cross-Validation. J Mach Learn Res. 2004; 5:1089-1105.
Carlborg O and Haley CS. Epistasis: too often neglected in complex trait studies? Nature Rev. Genet. 2004; 5, 618-625.
Costa JAD, Azevedo CF, Nascimento M, Resende MDV, Nascimento ACC. Genomic prediction with the additive-dominant model by dimensionality reduction methods. Pesquisa Agropecuária Brasileira. 2020; 55: e01713.
Crossa J, Pérez P, Hickey J et al. Genomic prediction in CIMMYT maize and wheat breeding programs. Heredity (Edinb) 2013; 112:48-60. https://doi.org/10.1038/hdy.2013.16.
Crossa J, Pérez-Rodríguez P, Cuevas J et al. Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci. 2017; 22:961-975.
Cruz CD. Genes Software – extended and integrated with the R, Matlab and Selegen. Acta Scientiarum. Agronomy. 2016; 38:547-552.
de Almeida Filho JE, Guimarães JFR, Silva FF, Resende MDV, Muñoz P, Kirst M, Resende Jr MFR. The contribution of dominance to phenotype prediction in a pine breeding and simulated population. Heredity. 2016; 117:33-41. https://doi.org/10.1038/hdy.2016.23.
de Almeida Filho JE, Guimarães JFR, Fonseca e Silva F, Vilela de Resende MD, Muñoz P, Kirst M, Resende Jr MFR. Genomic prediction of additive and non-additive effects using genetic markers and pedigrees. G3. 2019; 9:2739-2748. https://doi.org/10.1534/g3.119.201004.
Dekkers JCM and Hospital F. The use of molecular genetics in the improvement of agricultural populations. Nature Reviews Genetics. 2002; 3:22-32.
Denis M and Bouvet JM. Efficiency of genomic selection with models including dominance effect in the context of Eucalyptus breeding. Tree Genet Genomes. 2012; 9: 37-51.
Dias KO das G, Gezan SA, Guimarães CT, Nazarian A, da Costa e Silva L, Parentoni SN, et al. Improving accuracies of genomic predictions for drought tolerance in maize by joint modeling of additive and dominance effects in multi-environment trials. Heredity. 2018; 121:24-37. https://doi.org/10.1038/s41437-018-0053-6.
dos Santos JPR, Vasconcellos RCC, Pires LPM, Balestre M, Von Pinho RG. Inclusion of dominance effects in the multivariate GBLUP model. PLoS One. 2016; 11(4):1-21. https://doi.org/10.1371/journal.pone.0152045.
Falconer DS and Mackay TFC. Introduction to quantitative genetics. 4.ed. Edinburgh: Longman Group Limited, 1996. 464p.
Gianola D, Fernando RL, Stella A. Genomic assisted prediction of genetic value with semiparametric procedures. Genetics. 2006; 173:1761-1776. https://doi.org/10.1534/genetics.105.049510.
Guo Z, Tucker DM, Basten CJ et al. The impact of population structure on genomic prediction in stratified populations. Theor Appl Genet. 2014; 127:749-762.
Habier D, Fernando RL, Dekkers JCM. Genomic selection using low-density marker panels. Genetics. 2009; 182:343-353. https://doi.org/10.1534/genetics.108.100289.
Holland JB. Estimating genotypic correlations and their standard errors using multivariate restricted maximum likelihood estimation with SAS Proc MIXED. Crop Sci. 2006; 46:642-654. https://doi.org/10.2135/cropsci2005.0191
Islam MS, Fang DD, Jenkins JN, Guo J, McCarty JC, Jones DC. Evaluation of genomic selection methods for predicting fiber quality traits in Upland cotton. Mol Genet Genom. 2020; 295:67-79.
James G, Witten D, Hastie T, Tibshirani R. An introduction to statistical learning (Vol. 112). 2013; New York: Springer.
Jannink JL, Lorenz AJ, Iwata H. Genomic selection in plant breeding: from theory to practice. Brief Funct Genomics. 2010; 9:166-177. https://doi.org/10.1093/bfgp/elq001.
Kullback S and Leibler RA. On information and sufficiency. Ann. Math. Statist. 1951; 22: 79-86.
Lima LP, Azevedo CF, Resende MDV de, Silva FF, Viana JMS, de Oliveira EJ. Triple categorical regression for genomic selection: application to cassava breeding. Scientia Agricola. 2019; 76:368-375. https://doi.org/10.1590/1678-992x-2017-0369.
Long N, Gianola D, Rosa GJ, Weigel KA, Avendano S. Machine learning classification procedure for selecting SNPs in genomic selection: application to early mortality in broilers. Journal of animal breeding and genetics. 2007; 124(6): 377-389.
Lyra DH, Galli G, Alves FC, Granato ÍSC, Vidotti MS, Bandeira e Sousa M. Modeling copy number variation in the genomic prediction of maize hybrids. Theor Appl Genet. 2019; 132(1):273. https://doi.org/10.1007/s00122-018-3215-2.
Lyra DH, Granato ISC, Morais PPP, Alves FC, dos Santos ARM, Yu X, Guo T, Yu J, Fritsche-Neto R. Controlling population structure in the genomic prediction of tropical maize hybrids. Mol. Breeding. 2018; 38:126.
Ma Y, Reif JC, Jiang Y et al. Potential of marker selection to increase prediction accuracy of genomic selection in soybean (Glycine max L.). Mol Breed. 2016; 36:1-10. doi: 10.1007/s11032-016-0504-9.
Macciotta NPP, Gaspa G, Steri R, Pieramati C, Carnier P, Dimauro C. Preselection of most significant SNPS for the estimation of genomic breeding values. BMC Proc. 2009, 3.
Martini JWR, Gao N, Cardoso DF, Wimmer V, Erbe M, Cantet RJC, Simianer H. Genomic prediction with epistasis models: on the marker-coding-dependent performance of the extended GBLUP and properties of the categorical epistasis model (CE). BMC Bioinform. 2017; 18:3. https://doi.org/10.1186/s12859-016-1439-1.
Mathew B, Léon J, Sannemann W, Sillanpää MJ. Detection of epistasis for flowering time using bayesian multilocus estimation in a Barley MAGIC population. Genetics. 2018; 208, 525-536.
McKinney BA and Pajewski NM. Six degrees of epistasis: Statistical network models for GWAS. Front. Genet. 2012; 2, 109.
McQuarrie ADR and Tsai CL. Regression and time series model selection. World Scientific, 1998. Singapore.
Melchinger AE, Utz HF, Piepho HP, Zeng ZB, Schon CC. The role of epistasis in the manifestation of heterosis: a systems-oriented approach. Genetics. 2007; 177: 1815-1825.
Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome wide dense marker maps. Genetics. 2001;157:1819-1829.
Moser G, Khatkar MS, Hayes BJ, Raadsma HW. Accuracy of direct genomic values in Holstein bulls and cows using subsets of SNP markers. Genet Sel Evol. 2010; 42:37. doi: 10.1186/1297-9686-42-37.
Muñoz PR, Resende MFR, Gezan SA, Resende MDV, de los Campos G, Kirst M. Unraveling additive from nonadditive effects using genomic relationship matrices. Genetics 2014; 198: 1759-1768
Nishio M and Satoh M. Including dominance effects in the genomic BLUP method for genomic evaluation. PLoS ONE. 2014; 9:e85792.
Perez-Rodriguez P, Gianola D, Gonzalez-Camacho JM et al. Comparison between linear and non-parametric regression models for genome-enabled prediction in wheat. G3 Genes Genomes Genet. 2013; 2:1595-1605. https://doi.org/10.1534/g3.112.003665.
Piepho HP and Gauch HG. Marker pair selection for QTL detection. Genetics. 2001; 157:433-444.
R Core Team. R: A language and environment for statistical computing. 2019; 3. Available: https://www.r-project.org/.
Resende Jr MFR, Muñoz P, Resende MDV, Garrick DJ, Fernando RL, Davis JM, Jokela EJ, Martin TA, Peter GF, Kirst M. Accuracy of Genomic Selection Methods in a Standard Data Set of Loblolly Pine (Pinus taeda L.). Genetics. 2012; 190:1503-1510.
Resende MDV, Lopes OS, Silva RL, Pires IE. Seleção genômica ampla (GWS) e maximização da eficiência do melhoramento genético. Pesquisa Florestal Brasileira. Colombo. 2008; 63-77.
Resende MDV, Silva FF, Azevedo CF. Estatística matemática, biométrica e computacional: Modelos Mistos, Multivariados, Categóricos e Generalizados (REML/BLUP), Inferência Bayesiana, Regressão Aleatória, Seleção Genômica, QTL-GWAS, Estatística Espacial e Temporal, Competição Sobrevivência. Viçosa: Suprema, 2014. 881p.
Resende MDV. Matemática e estatística na análise de experimentos e no melhoramento genético. Colombo: Embrapa Florestas. 2007; 561p.
Resende MDV, Aguiar AM, Abad JIM, Missiaggia AA, Sansaloni C, Petroli C, Grattapaglia D, Resende Júnior MFR. Computação da Seleção Genômica Ampla (GWS). 2010. Colombo: Embrapa Florestas, p.79.
Schaeffer LR. Strategy for applying genome-wide selection in dairy cattle. Journal of animal breeding and genetics. 2006; 123:218-223.
Sousa MBE, Galli G, Lyra DH, Granato ISC, Matias FI, Alves FC, Fritsche-Neto R. Increasing accuracy and reducing costs of genomic prediction by marker selection. Euphytica. 2019; 215:18. https://doi.org/10.1007/s10681-019-2339-z.
Sousa TV, Caixeta ET, Alkimim ER et al. Early selection enabled by the implementation of genomic selection in Coffea arabica breeding. Front Plant Sci. 2019; 9. https://doi.org/10.3389/fpls.2018.01934.
Tayeh N, Klein A, Le Paslier MC et al. Genomic prediction in pea: effect of marker density and training population size and composition on prediction accuracy. Front Plant Sci. 2015; 6:1–11. https://doi.org/10.3389/fpls.2015.00941
Toro MA and Varona L. A note on mate allocation for dominance handling in genomic selection. Genet Sel Evol. 2010; 42: 33.
Usai MG, Goddard ME, Hayes BJ. LASSO with cross-validation for genomic selection. Genetics Research. 2009; 91, 427-436.
Varona L, Legarra A, Toro MA, Vitezica ZG. Non-additive effects in genomic selection. Front. Genet. 2018; 9:78. https://doi.org/10.3389/fgene.2018.00078.
Vazquez AI, Rosa GJM, Weigel KA et al. Predictive ability of subsets of single nucleotide polymorphisms with and without parent average in US Holsteins. J Dairy Sci. 2010; 93:5942-5949.
Weigel K, de los Campos G, Vazquez A, Rosa G, Gianola D, Tassell CV. Accuracy of direct genomic values derived from imputed single nucleotide polymorphism genotypes in Jersey cattle. Journal of Dairy Science. 2010; 93: 5423-5435.
Weigel KA, de los Campos G, González-Recio O et al. Predictive ability of direct genomic values for lifetime net merit of Holstein sires using selected subsets of single nucleotide polymorphism markers. J Dairy Sci. 2009; 92:5248-5257.
Wilkie AOM. The molecular basis of genetic dominance. J Med Genet. 1994; 31: 89-98.
Zeng J, Toosi A, Fernando RL, Dekkers JCM, Garrick DJ. Genomic selection of purebred animals for crossbred performance in the presence of dominant gene action. Genet Sel Evol. 2013; 45: 11.
Zhang X, Pérez-Rodríguez P, Semagn K, Beyene Y, Babu R, López-Cruz MA et al. Genomic prediction in biparental tropical maize populations in water-stressed and well-watered environments using low-density and GBS SNPs. Heredity. 2015; 114:291-299. https://doi.org/10.1038/hdy.2014.99.
Zhang Z, Liu J, Ding X et al. Best linear unbiased prediction of genomic breeding values using a trait-specific marker-derived relationship matrix. PLoS ONE. 2010; 5:1-8. https://doi.org/10.1371/journal.pone.0012648.
