Selective Accuracy
Figure 1 shows how selective accuracy (r²) in study 1 varied with the degree of dominance and heritability. The r² values were similar across scenarios A to D; the exception was scenario E, which contains only the 402 markers on LG9 and LG10 that are not associated with the traits (Figure 1).
In scenarios A, B, C and D, the highest r² was obtained for trait 3 (h2: 0.25 and dd: 1.0), followed by trait 6 (h2: 0.5 and dd: 1.0) and trait 9 (h2: 0.75 and dd: 1.0) (Table 2). In contrast, the lowest r² values were obtained for the traits without dominance: trait 1 (h2: 0.25 and dd: 0.0), trait 4 (h2: 0.5 and dd: 0.0) and trait 7 (h2: 0.75 and dd: 0.0) (Table 2). Thus, the presence of dominance improved selective accuracy, since r² increased as the degree of dominance increased. The sharp drop in r² in scenario E can be explained by the absence of markers associated with the evaluated traits (Table 2).
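As an illustration of the statistic discussed above, the following minimal sketch computes r² as the squared Pearson correlation between observed and predicted values. This definition is an assumption made for illustration; the section does not state the exact formula used, and the function name `selective_accuracy` is hypothetical.

```python
def selective_accuracy(observed, predicted):
    """Squared Pearson correlation between observed and predicted values.

    Assumption: r2 is taken as the squared correlation; the source text
    does not spell out the formula in this section.
    """
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Sums of cross-products and squares around the means
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_o == 0 or var_p == 0:
        return 0.0  # a constant vector has no linear association with anything
    return (cov * cov) / (var_o * var_p)
```

Under this definition, predictions that are a perfect linear function of the observations give r² = 1, and predictions that do not vary at all give r² = 0.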
Table 2
Prediction values of selective accuracy (r²) according to the RR-BLUP methodology (study 1) for all evaluated traits, according to the variation in the number of markers
r² (study 1)

| Trait | h2/dd | A | B | C | D | E |
|---|---|---|---|---|---|---|
| 1 | h2_0.25/dd_0 | 0.24 Ac | 0.17 Ad | 0.21 Ad | 0.20 Ae | 0.00 Ba |
| 2 | h2_0.25/dd_0.5 | 0.41 Ab | 0.38 Ac | 0.39 Abc | 0.39 Acd | 0.00 Ba |
| 3 | h2_0.25/dd_1.0 | 0.64 Aa | 0.59 Aa | 0.59 Aa | 0.58 Aa | 0.00 Ba |
| 4 | h2_0.5/dd_0.0 | 0.25 Ac | 0.22 Ad | 0.26 Acd | 0.25 Ae | 0.00 Ba |
| 5 | h2_0.5/dd_0.5 | 0.40 Ab | 0.38 Ac | 0.40 Ab | 0.40 Abcd | 0.00 Ba |
| 6 | h2_0.5/dd_1.0 | 0.58 Aa | 0.53 Aad | 0.53 Aa | 0.53 Aab | 0.00 Ba |
| 7 | h2_0.75/dd_0.0 | 0.18 Ac | 0.18 Ad | 0.21 Ad | 0.28 Ade | 0.00 Ba |
| 8 | h2_0.75/dd_0.5 | 0.42 Ab | 0.41 Abc | 0.4 Ab | 0.38 Acd | 0.00 Ba |
| 9 | h2_0.75/dd_1.0 | 0.53 Aab | 0.51 Aabc | 0.51 Aab | 0.51 Aabc | 0.00 Ba |
Means followed by the same capital letters horizontally and lower case letters in the vertical do not differ statistically by the Tukey test at 5% probability. h2: heritability; dd: degree of dominance. A) 40 markers associated with the traits; B) 442 markers (LG9 and LG10 - including 40 associated markers); C) 1608 markers (LG1 to LG8 - including 40 associated markers); D) 2010 total markers (LG1 to LG10); E) 402 markers (LG 9 and LG 10).
Dominance was formulated by Mendel as one of the first concepts of genetics (Wilkie 1994). In quantitative genetics, dominance is measured by the position of the heterozygote relative to the mean of the two homozygotes (Falconer and Mackay 1996). A dominance effect exists when the effects of the alleles are not fully additive, that is, when the alleles interact so that the genetic value of the heterozygote differs from the mean of the genetic values of the homozygotes. Because dominance is only partially heritable, additive effects have traditionally been of greater interest in studies, but understanding of dominance effects has been increasing and has demonstrated their influence on the phenotype (Muñoz et al. 2014; Nishio et al. 2014; Varona et al. 2018; Lyra et al. 2019; Amadeu et al. 2020), which leads to more accurate analyses in genomic selection. The contribution of dominance to the inheritance of quantitative traits is therefore especially important for perennial species and clones, in which the whole genotypic value can be perpetuated, and for annual species in which there is commercial interest in hybrids and in the exploitation of heterosis (de Almeida Filho et al. 2016).
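The definition of dominance given above can be made concrete with a small sketch. It uses the standard single-locus parameterization (additive value a as half the difference between the homozygotes, dominance deviation d as the heterozygote's distance from their midpoint). The function name and the genotypic values in the usage lines are hypothetical and chosen only to reproduce the three degrees of dominance considered here (0, 0.5 and 1.0).

```python
def degree_of_dominance(g_low_homozygote, g_heterozygote, g_high_homozygote):
    """Return d/a for one locus from the three genotypic values.

    a: half the difference between the two homozygotes (additive value)
    d: deviation of the heterozygote from the homozygote midpoint
    """
    a = (g_high_homozygote - g_low_homozygote) / 2
    midpoint = (g_high_homozygote + g_low_homozygote) / 2
    d = g_heterozygote - midpoint
    if a == 0:
        raise ValueError("homozygotes have equal values; d/a is undefined")
    return d / a


# Hypothetical genotypic values for the homozygotes: 10 and 20
print(degree_of_dominance(10, 15.0, 20))  # 0.0, no dominance
print(degree_of_dominance(10, 17.5, 20))  # 0.5, partial dominance
print(degree_of_dominance(10, 20.0, 20))  # 1.0, complete dominance
```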
In addition to dominance, the applicability of genomic selection can be influenced by many factors, such as heritability, marker density, gene interaction and population structure (Guo et al. 2014; Crossa et al. 2017). Numerous studies show that including multiple intra- and interallelic interactions, even those of small effect, is important to explain as much as possible of the heritability of quantitative traits in prediction studies (McKinney and Pajewski 2012). Heritability (h2) is the genetic parameter that expresses the genetic variability existing in the population, that is, the degree to which the phenotype expresses the genotype (Falconer and Mackay 1996). Studies have shown that the prediction of genomic values is effective for traits of low heritability (Resende et al. 2014); in this work, prediction performed well at low, intermediate and high heritability (0.25, 0.5 and 0.75), as shown in Fig. 1.
Epistasis is a genetic interaction that occurs when two or more genes act on the same trait (Mathew et al. 2018), and it is of great importance for quantitative traits, as suggested by classical studies of quantitative genetics in plants (Holland 2006). However, including epistatic interactions is difficult because of the multiplicative effects between the alleles involved in determining the trait. These lead to overparameterized models and to statistical and data-processing problems, because the number of markers usually exceeds the number of individuals (Gianola et al. 2006). In most models applied in the context of GWS, selection focuses exclusively on the main effects, but including epistasis can improve prediction efficiency within populations. The information needed to use epistasis in GWS is nevertheless still very limited, since good prior knowledge is required about the importance of the variance due to main versus epistatic effects, and understanding the role of epistasis is complicated by its own effect (Melchinger et al. 2007). In addition, the population size needed to obtain robust estimates of SNP effects is much larger for epistatic effects than for main effects (Carlborg and Haley 2004).
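To give a sense of the overparameterization mentioned above, the sketch below counts the pairwise (two-locus) interaction terms that would be added for the marker set sizes used in study 1. It considers only one interaction term per pair of markers, which is a simplifying assumption; the function name is hypothetical.

```python
import math


def n_pairwise_interactions(n_markers):
    """Number of distinct marker pairs, i.e. n choose 2."""
    return math.comb(n_markers, 2)


# Marker set sizes taken from the scenarios of study 1
for n_markers in (40, 442, 1608, 2010):
    print(n_markers, n_pairwise_interactions(n_markers))
```

Even the 40 associated markers give 780 pairs, and the full set of 2010 markers gives more than two million, which illustrates why the number of effects to estimate quickly exceeds any realistic number of individuals.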
In study 2, selective accuracy (r²) tended to stabilize whether 1608, 200, 160, 120 or 80 markers were used (LG1 to LG8, including the 40 associated markers) or only the 40 markers associated with the traits (Fig. 2). The highest r² value was observed for trait 3 (h2: 0.25 and dd: 1.0), followed by trait 6 (h2: 0.5 and dd: 1.0) and trait 9 (h2: 0.75 and dd: 1.0). In contrast, the lowest r² value was observed for trait 7 (h2: 0.75 and dd: 0.0), followed by trait 1 (h2: 0.25 and dd: 0.0) and trait 4 (h2: 0.5 and dd: 0.0). This result was similar to that of study 1 and again showed the strong influence of dominance on r² (Table 3).
No significant difference in mean selective accuracy was observed among the scenarios of study 2. That is, r² did not differ among the different numbers of molecular markers distributed over LG1 to LG8 (including the 40 associated markers), nor when only the 40 markers associated with the traits were evaluated (Table 3).
Table 3
Prediction values of selective accuracy (r²) according to the RR-BLUP methodology (study 2) for all evaluated traits, according to the variation in the number of markers
r² (study 2)

| Trait | h2/dd | F | G | H | I | J | K |
|---|---|---|---|---|---|---|---|
| 1 | h2_0.25/dd_0 | 0.24 Ac | 0.22 Ac | 0.22 Ad | 0.22 Ad | 0.21 Ad | 0.22 Ae |
| 2 | h2_0.25/dd_0.5 | 0.41 Ab | 0.40 Ab | 0.39 Abc | 0.39 Abc | 0.39 Abc | 0.39 Acd |
| 3 | h2_0.25/dd_1.0 | 0.64 Aa | 0.61 Aa | 0.60 Aa | 0.60 Aa | 0.60 Aa | 0.60 Aa |
| 4 | h2_0.5/dd_0.0 | 0.25 Ac | 0.27 Ac | 0.26 Acd | 0.27 Acd | 0.26 Acd | 0.26 Ade |
| 5 | h2_0.5/dd_0.5 | 0.41 Ab | 0.40 Ab | 0.40 Ab | 0.40 Abc | 0.40 Ab | 0.40 Abc |
| 6 | h2_0.5/dd_1.0 | 0.59 Aa | 0.56 Aa | 0.54 Aa | 0.54 Aa | 0.54 Aa | 0.53 Aab |
| 7 | h2_0.75/dd_0.0 | 0.18 Ac | 0.21 Ac | 0.21 Ad | 0.20 Ad | 0.21 Ad | 0.21 Ae |
| 8 | h2_0.75/dd_0.5 | 0.42 Ab | 0.42 Ab | 0.41 Ab | 0.40 Ab | 0.40 Ab | 0.40 Ac |
| 9 | h2_0.75/dd_1.0 | 0.54 Aab | 0.53 Aab | 0.52 Aab | 0.52 Aab | 0.52 Aab | 0.52 Aabc |
Means followed by the same capital letters horizontally and lower case letters in the vertical do not differ statistically by the Tukey test at 5% probability. h2: heritability; dd: degree of dominance. F) 40 markers associated with the traits; G) 80 markers (distributed in LG1 to LG8 - including 40 associated markers); H) 120 markers (distributed in LG1 to LG8 - including 40 associated markers); I) 160 markers (distributed in LG1 to LG8 - including 40 associated markers); J) 200 markers (distributed in LG1 to LG8 - including 40 associated markers); K) 1608 markers (LG1 to LG8 - including 40 associated markers).
Some authors have reported that including dominance (and, in some cases, epistasis) was advantageous for breeding programs compared with models containing only additive effects (Nishio et al. 2014; de Almeida Filho et al. 2016; dos Santos et al. 2016; Lyra et al. 2018; Amadeu et al. 2020). A similar result was also observed for simulated populations [37–39]. [36] report that, when evaluating GBLUP (Genomic Best Linear Unbiased Prediction) models, including dominance effects was an efficient strategy to improve predictive capacity and the quality of the variance component estimates. Amadeu et al. (2020) investigated the effects of dominance on the prediction of genotypic values for complex traits in autotetraploid species using a Bayesian framework and found that including dominance effects resulted in better predictions. Lyra et al. (2018) used the GBLUP model in corn hybrids and also observed that models including dominance were important for predicting grain yield. de Almeida Filho et al. (2016) studied the contribution of dominance to phenotypic prediction in simulated populations and concluded that, as dominance increases, the predictive accuracy of GWS becomes less adequate, because the model used by this methodology does not include the matrix of dominance deviation effects.
Predictive Accuracy
The predictive accuracy (RMSE) for study 1 is shown in Fig. 3. The lowest RMSE value was observed for trait 3 (h2: 0.25 and dd: 1.0), followed by trait 6 (h2: 0.5 and dd: 1.0) and trait 9 (h2: 0.75 and dd: 1.0). In contrast, the highest RMSE values were observed for trait 7 (h2: 0.75 and dd: 0.0), trait 1 (h2: 0.25 and dd: 0.0) and trait 4 (h2: 0.5 and dd: 0.0) in all scenarios.
Table 4 shows that most of the significant differences in RMSE occurred in scenario E, that is, for the 402 markers not related to the evaluated traits. As observed for selective accuracy, the presence of dominance improved predictive accuracy, since RMSE decreased as the degree of dominance increased.
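For reference, the root mean squared error discussed above can be computed as in the minimal sketch below. The function name is hypothetical, and the sketch assumes that RMSE is taken between observed and predicted values on the original scale of the trait.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = ((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sum(squared_errors) / n)
```

Because RMSE measures prediction error, lower values indicate better predictive accuracy, which is why the traits with the highest r² are also those with the lowest RMSE.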
Table 4
Prediction values of predictive accuracy (RMSE) according to the RR-BLUP methodology (study 1) for all evaluated traits, according to the variation in the number of markers
RMSE (study 1)

| Trait | h2/dd | A | B | C | D | E |
|---|---|---|---|---|---|---|
| 1 | h2_0.25/dd_0 | 14.20 Ab | 14.72 Ab | 14.35 Ab | 14.49 Ab | 16.19 Ab |
| 2 | h2_0.25/dd_0.5 | 9.55 Acd | 9.66 Ad | 9.56 Ad | 9.59 Ade | 12.24 Acd |
| 3 | h2_0.25/dd_1.0 | 5.80 Be | 6.17 Be | 6.12 Be | 6.21 Be | 9.64 Ad |
| 4 | h2_0.5/dd_0.0 | 14.64 Ab | 14.11 Abc | 13.79 Abc | 13.88 Abc | 16.00 Ab |
| 5 | h2_0.5/dd_0.5 | 10.10 Bcd | 10.06 Bd | 9.92 Bd | 9.93 Bd | 13.29 Abc |
| 6 | h2_0.5/dd_1.0 | 7.05 Bde | 7.69 Bde | 7.68 Bde | 7.72 Bde | 11.24 Acd |
| 7 | h2_0.75/dd_0.0 | 20.16 Aa | 19.81 Aa | 19.50 Aa | 19.61 Aa | 21.79 Aa |
| 8 | h2_0.75/dd_0.5 | 10.75 Bc | 10.77 Bcd | 10.84 Bcd | 10.94 Bcd | 13.97 Abc |
| 9 | h2_0.75/dd_1.0 | 8.33 Bcde | 8.59 Bde | 8.42 Bde | 8.53 Bde | 12.17 Acd |
Means followed by the same capital letters horizontally and lower case letters in the vertical do not differ statistically by the Tukey test at 5% probability. h2: heritability; dd: degree of dominance. A) 40 markers associated with the traits; B) 442 markers (LG9 and LG10 - including 40 associated markers); C) 1608 markers (LG1 to LG8 - including 40 associated markers); D) 2010 total markers (LG1 to LG10); E) 402 markers (LG 9 and LG 10).
For study 2, the predictive accuracy values are shown graphically in Fig. 4. As with selective accuracy, the RMSE values were almost constant whether 1608, 200, 160, 120 or 80 markers were used (LG1 to LG8, including the 40 associated markers) or only the 40 markers associated with the traits. As in study 1, the lowest RMSE values were observed for traits 3, 6 and 9, and the highest for traits 7, 1 and 4 (Table 5). These results again demonstrate the effect of dominance on the predicted values.
Table 5 shows no significant difference among the scenarios, except for some comparisons involving the scenario with the 40 associated markers, which showed some variation. This result indicates that a reduced number of markers can be used in prediction analyses with the RR-BLUP method.
Table 5
Prediction values of predictive accuracy (RMSE) according to the RR-BLUP methodology (study 2) for all evaluated traits, according to the variation in the number of markers
RMSE (study 2)

| Trait | h2/dd | F | G | H | I | J | K |
|---|---|---|---|---|---|---|---|
| 1 | h2_0.25/dd_0 | 14.20 Abc | 14.29 Ab | 14.34 Ab | 14.36 Ab | 14.38 Ab | 14.35 Ab |
| 2 | h2_0.25/dd_0.5 | 9.55 Ade | 9.50 Ade | 9.59 Ade | 9.56 Ade | 9.58 Ade | 9.56 Ade |
| 3 | h2_0.25/dd_1.0 | 5.80 Af | 6.01 Ae | 6.10 Ae | 6.08 Ae | 6.10 Ae | 6.12 Ae |
| 4 | h2_0.5/dd_0.0 | 14.64 Ab | 13.72 Abc | 13.79 Abc | 13.76 Abc | 13.79 Abc | 13.79 Abc |
| 5 | h2_0.5/dd_0.5 | 10.10 Ade | 9.93 Ad | 9.97 Ad | 9.97 Ad | 9.93 Ad | 9.92 Ad |
| 6 | h2_0.5/dd_1.0 | 7.05 Aef | 7.52 Ade | 7.63 Ade | 7.63 Ade | 7.66 Ade | 7.68 Ade |
| 7 | h2_0.75/dd_0.0 | 20.16 Aa | 19.48 Aa | 19.52 Aa | 19.47 Aa | 19.53 Aa | 19.50 Aa |
| 8 | h2_0.75/dd_0.5 | 10.75 Acd | 10.66 Acd | 10.81 Acd | 10.78 Acd | 10.83 Acd | 10.84 Acd |
| 9 | h2_0.75/dd_1.0 | 8.33 Adef | 8.36 Ade | 8.42 Ade | 8.39 Ade | 8.43 Ade | 8.42 Ade |
Means followed by the same capital letters horizontally and lower case letters in the vertical do not differ statistically by the Tukey test at 5% probability. h2: heritability; dd: degree of dominance. F) 40 markers associated with the traits; G) 80 markers (distributed in LG1 to LG8 - including 40 associated markers); H) 120 markers (distributed in LG1 to LG8 - including 40 associated markers); I) 160 markers (distributed in LG1 to LG8 - including 40 associated markers); J) 200 markers (distributed in LG1 to LG8 - including 40 associated markers); K) 1608 markers (LG1 to LG8 - including 40 associated markers).
Akaike Information Criterion (AIC)
For study 1, scenario A had the lowest AIC values and scenarios C and D the highest (Fig. 5, Table 6). Within each scenario, traits 7, 1 and 4, that is, the traits without dominance, had the highest AIC values. In contrast, traits 3, 6 and 9 had the lowest AIC values. Thus, dominance had a strong influence on the AIC values.
Comparing the scenarios of study 1 for each trait, significant differences were found between almost all scenarios. The only exceptions were scenarios B and E, which did not differ significantly from each other for traits 1, 4 and 7 (Table 6).
Table 6
Akaike Information Criterion (AIC) according to the RR-BLUP methodology (study 1) for all evaluated traits, according to the variation in the number of markers
AIC (study 1)

| Trait | h2/dd | A | B | C | D | E |
|---|---|---|---|---|---|---|
| 1 | h2_0.25/dd_0 | 6797.47 Db | 7650.95 Cb | 9934.34 Bf | 10740.93 Ab | 7652.62 Cb |
| 2 | h2_0.25/dd_0.5 | 6046.56 Ee | 6942.64 De | 9206.18 Bf | 10021.49 Af | 7107.56 Cd |
| 3 | h2_0.25/dd_1.0 | 5332.11 Eh | 6257.95 Dh | 8513.22 Bi | 9340.33 Ai | 6722.78 Cf |
| 4 | h2_0.5/dd_0.0 | 6634.52 Dc | 7514.04 Cc | 9797.10 Bc | 10606.84 Ac | 7553.03 Cb |
| 5 | h2_0.5/dd_0.5 | 6161.44 Ed | 7051.25 Dd | 9318.15 Be | 10131.54 Ae | 7207.36 Cd |
| 6 | h2_0.5/dd_1.0 | 5669.75 Eg | 6566.46 Dg | 8815.96 Bh | 9639.14 Ah | 6951.68 Ce |
| 7 | h2_0.75/dd_0.0 | 7138.29 Da | 8015.07 Ca | 10298.22 Ba | 11111.15 Aa | 8018.42 Ca |
| 8 | h2_0.75/dd_0.5 | 6248.43 Ed | 7145.87 Dd | 9426.63 Bd | 10241.13 Ad | 7341.25 Cc |
| 9 | h2_0.75/dd_1.0 | 5849.95 Ef | 6771.07 Df | 9036.62 Bg | 9857.45 Ag | 7106.99 Cd |
Means followed by the same capital letters horizontally and lower case letters in the vertical do not differ statistically by the Tukey test at 5% probability. h2: heritability; dd: degree of dominance. A) 40 markers associated with the traits; B) 442 markers (LG9 and LG10 - including 40 associated markers); C) 1608 markers (LG1 to LG8 - including 40 associated markers); D) 2010 total markers (LG1 to LG10); E) 402 markers (LG 9 and LG 10).
In study 2, the AIC results indicated that the model with the lowest number of markers predicts better than the model that includes 1608 markers (Fig. 6). Lower AIC values indicate a preferable model; in addition, the AIC is related to cross-validation criteria (McQuarrie and Tsai 1998; Piepho and Gauch 2001). As in study 1, traits 3, 6 and 9 had the lowest AIC values and traits 7, 1 and 4 the highest, demonstrating once again the importance of dominance effects. Dias et al. (2018), when evaluating the effectiveness of various GWS models in predicting traits of wheat genotypes, demonstrated that including dominance effects in the models generally improved the AIC estimates. Dias et al. (2018), when evaluating the accuracy of GWS in predicting the performance of single-cross maize hybrids in multi-environment trials for drought tolerance, also concluded that including dominance in the tested models improved prediction.
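The rule that lower AIC values indicate a preferable model can be illustrated with the general definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L the maximized likelihood. The sketch below is generic: how k and the likelihood were obtained for the RR-BLUP models is not described in this section, and the log-likelihoods and parameter counts in the example are hypothetical numbers, not results of this work.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    if n_params < 0:
        raise ValueError("n_params must be non-negative")
    return 2 * n_params - 2 * log_likelihood


def preferred_model(models):
    """Return the name of the model with the lowest AIC.

    `models` maps a model name to a (log_likelihood, n_params) tuple.
    """
    return min(models, key=lambda name: aic(*models[name]))


# Hypothetical values for illustration only (not taken from the study)
candidates = {
    "reduced": (-3000.0, 40),   # slightly worse fit, few parameters
    "full": (-2990.0, 200),     # slightly better fit, many parameters
}
print(preferred_model(candidates))  # reduced
```

In this toy comparison the small gain in fit of the larger model does not compensate for the penalty on its additional parameters, so the reduced model is preferred.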
According to Table 7, scenario K differed significantly from all other scenarios. Scenarios H, I and J did not differ significantly from each other, and scenarios F and G did not differ significantly for traits 1, 2 and 6.
Table 7
Akaike Information Criterion (AIC) according to the RR-BLUP methodology for study 2 for all evaluated traits, according to the variation in the number of markers
AIC (study 2)

| Trait | h2/dd | F | G | H | I | J | K |
|---|---|---|---|---|---|---|---|
| 1 | h2_0.25/dd_0 | 6797.47 Cb | 6880.51 Cb | 7090.53 Bb | 7041.88 Bb | 7122.18 Bb | 9934.34 Ab |
| 2 | h2_0.25/dd_0.5 | 6046.56 Ce | 6141.46 Ce | 6361.13 Bf | 6312.54 Be | 6394.72 Bf | 9206.18 Af |
| 3 | h2_0.25/dd_1.0 | 5332.11 Dh | 5440.16 Ch | 5660.04 Bi | 5611.21 Bh | 5693.21 Bi | 8513.22 Ai |
| 4 | h2_0.5/dd_0.0 | 6634.52 Dc | 6737.83 Cc | 6949.35 Bc | 6901.03 Bc | 6982.64 Bc | 9797.10 Ac |
| 5 | h2_0.5/dd_0.5 | 6161.44 Dd | 6261.77 Cd | 6472.43 Be | 6428.50 Bd | 6505.06 Be | 9318.15 Ae |
| 6 | h2_0.5/dd_1.0 | 5669.75 Cg | 5744.68 Cg | 5968.53 Bh | 5916.15 Bg | 6001.22 Bh | 8815.96 Ah |
| 7 | h2_0.75/dd_0.0 | 7138.29 Da | 7240.35 Ca | 7452.73 Ba | 7405.23 Ba | 7484.59 Ba | 10298.22 Aa |
| 8 | h2_0.75/dd_0.5 | 6248.43 Dd | 6357.85 Cd | 6582.14 Bd | 6531.85 Bd | 6614.86 Bd | 9426.63 Ad |
| 9 | h2_0.75/dd_1.0 | 5849.95 Df | 5964.37 Cf | 6180.08 Bg | 6128.78 Bf | 6213.93 Bg | 9036.62 Ag |
Means followed by the same capital letters horizontally and lower case letters in the vertical do not differ statistically by the Tukey test at 5% probability. h2: heritability; dd: degree of dominance. F) 40 markers associated with the traits; G) 80 markers (distributed in LG1 to LG8 - including 40 associated markers); H) 120 markers (distributed in LG1 to LG8 - including 40 associated markers); I) 160 markers (distributed in LG1 to LG8 - including 40 associated markers); J) 200 markers (distributed in LG1 to LG8 - including 40 associated markers); K) 1608 markers (LG1 to LG8 - including 40 associated markers).
The use of markers covering the whole genome represents an advance in the selection of superior genotypes and has made genomic selection superior in many respects to marker-assisted selection (Dekkers and Hospital 2002). Many authors (Long et al. 2007; Habier et al. 2009; Usai et al. 2009; Macciotta et al. 2009; Weigel et al. 2010) have found that a subset of SNPs carefully chosen for the trait of interest can give high reliability in the prediction of genomic values. Although we chose to reduce the number of markers at random, several methodologies have been proposed in the literature to solve the problems caused by the dimensionality of the marker matrix (Azevedo et al. 2013; James et al. 2013; Azevedo et al. 2014). Penalized regression methods (RR-BLUP and G-BLUP) and Bayesian methods (Bayes A, Bayes B and Lasso) have been the most widely used. Other methods have also been applied to GWS, such as Partial Least Squares (PLS) and Principal Components Regression (PCR) (Resende et al. 2010; Azevedo et al. 2015), but all of them are computationally demanding. James et al. (2013) proposed three methodological lines of dimensionality reduction, based on the selection of subsets of variables, on penalty methods and on reduction methods proper via non-correlated linear combinations, and highlighted that high dimensionality can result in problems of overfitting, multicollinearity and high bias of variance. Weigel et al. (2009) observed that a set of the 300 SNPs with the highest estimated effects, out of a total of 32518 SNPs, achieved half of the prediction accuracy of the complete model using the Bayesian Lasso methodology. Usai et al. (2009), in their simulations, reported higher accuracy when prediction was based on 169 markers selected from 6000 SNPs using Lasso.
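The random reduction in the number of markers mentioned above can be sketched as follows: every associated marker is kept and random background markers are added until the target size is reached, as in scenarios G to J of study 2. The function name, the marker identifiers and the seed are hypothetical, and the sketch draws each subset independently, which is an assumption; the text does not say whether the smaller subsets were nested within the larger ones.

```python
import random


def random_marker_subset(associated, background, total, seed=None):
    """Return `total` marker IDs: all associated markers plus random extras.

    associated: IDs of the markers linked to the traits (always kept)
    background: IDs of the remaining candidate markers
    total: desired size of the final subset
    """
    associated = list(associated)
    keep = set(associated)
    n_extra = total - len(associated)
    if n_extra < 0:
        raise ValueError("total is smaller than the number of associated markers")
    # Candidate pool excludes anything already kept
    pool = [m for m in background if m not in keep]
    rng = random.Random(seed)  # local generator so results are reproducible
    extras = rng.sample(pool, n_extra)  # raises ValueError if pool is too small
    return associated + extras


# Hypothetical marker IDs: 40 associated and 1568 background (1608 in total)
associated_ids = [f"assoc_{i}" for i in range(40)]
background_ids = [f"bg_{i}" for i in range(1568)]
subsets = {
    size: random_marker_subset(associated_ids, background_ids, size, seed=1)
    for size in (80, 120, 160, 200)
}
```

Each subset could then be passed to the prediction model in place of the full marker matrix, so that r², RMSE and AIC can be compared across marker densities.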
Some authors have investigated the number of SNP markers used in GWS and have found divergent results on the use of higher or lower SNP densities (Habier et al. 2009; Weigel et al. 2009; Moser et al. 2010; Crossa et al. 2013; Perez-Rodriguez et al. 2013; Tayeh et al. 2015; Sousa et al. 2019). A higher genotyping density does not always give the best accuracy, and subsets of markers can sometimes outperform the full data set (Zhang et al. 2010; Ma et al. 2016). Sousa et al. (2019) concluded that genotyping costs can be reduced without decreasing the accuracy of GWS by applying methodologies for obtaining subsets of markers, and demonstrated that it is possible to select the most informative markers and create a low-cost SNP chip for genomic selection in breeding programs. Thus, the use of subsets of markers selected on the basis of their effects or positions provides an efficient strategy to improve accuracy (Vazquez et al. 2010; Resende Jr et al. 2012; Zhang et al. 2015). Tayeh et al. (2015) decreased the number of markers from 9824 to 2945 by keeping a single marker per unique map position and did not observe a reduction in accuracy for any of the evaluated traits. Ma et al. (2016) concluded that the pre-selection of markers based on the analysis of haplotype blocks is an interesting option to reduce the cost of implementing genomic selection.
In this study, the reduction in dimensionality proved to be effective: the r², RMSE and AIC values of the models with a reduced number of markers indicated an improvement in the quality and accuracy of prediction. Thus, non-informative or small-effect markers can be removed from the data set, and the prediction of genetic values from this reduced information preserved the same biological conclusions as those obtained with the larger data set.