In the present study we chose SNPs at random to be part of reduced subsets to investigate the genomic prediction of carcass and organ traits in broiler chickens. Although the markers were not equally spaced during the selection process, they were present in at least one chromosome across all genome.
MAF distribution
The distribution of MAF for the HD array was uniform while the MAF distribution for WGS variants retained for further analyses was not (Figure 1). Unlike other studies [10, 12], the variants used in the current study did not show a U-shaped MAF distribution for WGS data. In accordance with Ni et al. [27], which also found a non-U-shaped MAF distribution for sequence data in layer chickens, this distribution in WGS data may have occurred due to two possible reasons. First, some of the rare SNPs in the sequenced animals were removed during the imputation process as a result of poor imputation accuracy of SNPs with low MAF. Second, these same rare SNPs were not available in all animals in the population.
Heritability for pedigree-based and genomic models
Estimates of variance components and heritability for carcass and organ traits obtained through PBLUP and ssGBLUP are provided in Table 2. The heritability estimates varied from low (LUN) to moderate (HRT, LIV and THG) and high (GIZ, BRST and DRM) and the standard errors associated with those estimates were low.
Pedigree-based heritability estimates have been reported in the literature for most of the traits used in this study. Using the same population (Embrapa TT), Venturini et al. [14] reported similar pedigree-based heritability estimates for LIV (0.33±0.07), GIZ (0.44±0.08) and BRST (0.37±0.07) to those reported herein. However, the heritability estimate found in this study for DRM (0.43±0.08) and THG (0.39±0.07) was than the estimate in Venturini et al. [14] (0.35±0.07 for DRM and 0.29±0.06 for THG). THG and DRM are commonly analyzed together as a leg trait, so heritability estimates for those traits are scarce in the literature. Heritability estimates for leg in chicken were reported by Argentão et al. [28] (0.34), Rance et al. [29] (0.48±0.07) and Gaya et al. [30] (0.33±0.03). In a study with a male broiler line, Gaya et al. [30] reported heritability estimates for HRT (0.38±0.04), LIV (0.25±0.03), GIZ (0.39±0.04) and BRST (0.33±0.03). Rance et al. [29] reported heritability estimates for HRT (0.30±0.08), LIV (0.08±0.06), GIZ (0.52±0.10) and BRST (0.59±0.08).
The heritability estimates for LUN in broiler chickens are not common in the literature. Using an F2 experimental population, Ledur et al. [31] reported similar pedigree-based heritability estimates for LUN (0.10) to the result reported herein. Although LUN is not considered an economically important trait, it has been related to pulmonary hypertension (e.g. ascites). Heritability estimates for ascites have been reported by several authors [32–36]. The use of a multi-trait model may be responsible for higher estimates of heritability reported in this study compared to the literature since multi-trait models use additional genetic information from links with other traits [37].
Heritability was also estimated using the H matrix instead of the numerator relationship (A) matrix which resulted in relatively small differences between the estimates (Table 2). In general, the genomic heritability is smaller than heritability estimated using only the pedigree and phenotypic information [38].
Correlation between EBV and GEBV
Across all traits, EBV estimated with at least 0.5% of SNP were highly correlated with EBV estimated from the complete HD (minimum correlation of 0.94) and the minimum average correlation between 0.5% of SNP and PBLUP was 0.89. Indeed, lower correlations were observed when a smaller number of SNP sets were used, but correlations between predicted breeding values were higher when the genomic matrix was incorporated in the analyses, regardless the SNP set selected. A similar pattern was observed with WGS subsets.
Comparing the correlations between EBV using different SNP subsets showed no differences when 10% or 100% of SNPs (mean correlation 0.99) were used in the analyses which suggests that the use of an evenly-spaced lower-density panel could provide a very similar ranking of EBV at a potentially lower cost, as proposed by Habier et al. [39]. When applied to Japanese black cattle, Ogawa et al. [40] suggested that using at least 4,000 equally spaced SNPs was sufficient for genetic prediction for carcass weight and marbling score. Using a 50K chip, Rolf et al. [41] reported that between 2,500 and 10,000 SNPs distributed throughout the genome could be used to form a G matrix to accurately predict EBV for feed efficiency in Angus cattle. The current study supports the use of reduced subsets of SNP, demonstrated in other species, for genomic prediction in broiler chickens.
Regression coefficients
The regression coefficients of EBV on GEBV quantifies the bias in the variance of the estimated breeding value for each SNP subset (Tables 3 and 4). Regression coefficients were similar for both the HD and WGS sets of SNPs, ranging from 0.82 to 0.99 across subsets. The regression coefficients increased as the proportion of SNP increased reaching a plateau between 5 and 10% of SNPs. Overall, all the regression coefficient values were the same when HD or WGS was used. In practice, regression coefficient equal to one indicates no bias. Except for LUN, our results showed regression coefficients lower than one for both HD and WGS sets, which means the variance of breeding values were overestimated. However, according to Tsuruta et al. [42] deviations of ± 15% from unity are acceptable.
Using a pure layer line, Yan et al. [43] used bias to compare PBLUP and ssGBLUP prediction methods. These authors used the regression coefficients of phenotypes corrected for fixed effects on predicted (G)EBV and reported that EBVs from ssGBLUP were less biased than those from PBLUP. Also, in layer lines, Heidaritabar et al. [12] reported high regression coefficients (greater than 1) when PBLUP or GBLUP methods were used. According to these authors, regardless the incorporation of genotypes from a 60K SNP panel and sequence data the regression coefficients remained greater than 1, indicating an underestimation of the breeding value variance.
Bias differences among the methods may be explained by directional selection [44]. In the present study, multi-trait selection has been applied in this line, mainly focused on traits such as body weight, feed conversion, carcass weights and yield, fertility, hatchability, and to reduce abdominal fat and metabolic syndromes [13, 14].
Predictive ability
Predictive abilities across all traits are reported in Table 5. Compared to PBLUP, the predictive ability assessed by ssGBLUP was higher for most of the traits when at least 5% of SNPs were used. The incorporation of sequencing variants is generally thought to have the potential to improve predictive abilities, since it is expected that a high proportion of genetic variation may be explained when a high-density panel or even sequencing data are used. Although WGS increases the number of markers, most of them are in incomplete LD with causal mutations. Variants in incomplete LD with causal mutations limited the increase of prediction abilities, thus the use of variants in strong LD with causal mutations could be useful in improving the accuracy of genomic prediction [45].
The concept of reducing SNP density as a solution for genotyping costs has been widely reported [39–41, 46, 47]. While accuracy estimates using a genomic relationship matrix appear to be better than those from a pedigree relationship matrix, our results show no difference in genomic prediction accuracy when a reduced number of SNPs were used to fit the genomic relationship matrix, indicating that at least 10% of SNP from the HD panel (~37,000 SNPs) can be used in genomic evaluation. A previous study with dairy cattle [48] has shown small genomic prediction gains with the increased marker density from medium density (~54,000K) to HD (~777,000K). Agreeing with our findings, similar results were also obtained in pigs by Zhang et al. [37] which increasing marker density (80K, 650K and WGS) had a little or no advantage in genomic prediction for feed intake.
Simulated data has shown an increase in genomic prediction accuracy when the causal mutations were included in the analyses [7, 8, 11]. Contrary to those findings, our study showed no significant increase in prediction accuracy when using WGS variants as opposed to SNP from the HD. Other authors have also observed lower or no significant benefits in predictive ability gain using sequence data comparing with SNP arrays [6, 9, 10, 12, 45, 49, 50].
The infinitesimal model used herein (ssGBLUP), whereby all markers are assumed to have an effect and a common variance, showed no significant increase in prediction accuracy using WGS variants as compared to the HD markers. In a simulated study, Clark et al. [51] suggested that the increase of genomic prediction accuracy would be small when the trait is highly polygenic especially in a small reference population. Furthermore, false positives, including sequencing, alignment and calling errors, which are not included in simulated analysis but are present in real data, can also be responsible for these results [49].
Another possible reason is related to the population structure. When a small effective population size undergoes selection for an extended period of time no significant gains in prediction accuracy are obtained regardless of using HD panel or WGS dataset [11]. Thus, in highly selected population, almost all of the genetic variance can be explained by the SNPs genetic variance as result of the relationship between individuals [52].
Although imputation accuracy was not the main objective of the present study, it can help to explain why sequence data did not show a superior predictive ability compared with the HD panel. In our study, the average of imputation accuracy assessed using the validation subset approach was 0.84. Although this value is suitable, possible errors in the genomic map might be responsible for reducing the imputation accuracy since those errors may decrease the accuracy of prediction and interfere in the detection of causal mutations [53].