There is a large variability in the literature on the parameters used to identify ROH, which makes it difficult to compare results from different studies, as reinforced by Peripolli et al. [19] and shown in Table 4. Besides the ROH parameters, we also evaluated the impact of the SNP panel density on ROH detection.
As shown in Table 4, the diversity of parameters chosen among studies leads to difficulty in comparison of studies and/or breeds [19]. As mentioned by Biscarini et al. [16], few studies aimed to evaluate the impact of different parameters on ROH detection, with such parameters still not well established. This definition is of crucial importance, as it has a direct effect on the results obtained [23]; not only in ROH detection, but also in the secondary analyses (e.g., inbreeding coefficient based on ROH).
Our work proposed to evaluate 16 populations of cattle selected for different breeding goals to compare ROHs and HERs. The majority of ROHs were defined as being 2-4 Mb and taking into consideration that the recombination events that happen in each generation break the homozygous segments into smaller haploblocks, these runs were likely formed in more ancient generations. Howrigan et al. [24] estimated that the ROH lengths of 10 Mb, 5 Mb, and 2.5 Mb would be correlated with 5, 10, and 20 generations ago, respectively. Moreover, Cardoso et al. [25] estimated that the ROH length higher than 16 Mb was formed less than three generations ago and the ROH less than 8 Mb was formed more than six generations ago.
SEN showed the highest ROH number and the largest number of ROH segments greater than 16 Mb. This suggests that recent inbreeding events have occurred in this population, as expected the crossbred or composite populations, such as ANGSIM and MON, presented the smallest amount of ROHs. The verification of ROH incidence in composite breeds and crossbreeds is an important factor, as a high ROH incidence can indicate a decrease of heterosis [26].
The populations with high selection pressure, such as ANG, NEL50, HOL, and JER, showed higher numbers of ROHs. It is well known that selection increase autozygosity, although there is still a lack of information on selection effects regarding ROH distribution along the genome [7]. The ROH prevalence is more common in regions with higher linkage disequilibrium and low recombination, or regions of low genetic diversity [27]. However, it is also known that selection can cause a substantial pressure in specific genomic regions, resulting in an increase of ROH numbers and determining an appearance pattern in the population [28, 29].
Another interesting fact of analysis is that the difference in density panels affects the number of ROHs detected. For NEL35 and NEL50 (same breed and different SNP panels), there was a considerable discrepancy between the amount of ROH detected (Figure 1 and Figure S1). As discussed by Hillested et al. [30], denser SNP panels tend to result in the identification of a larger number of ROH segments. This happens because an increasing density of markers allows the detection of heterozygote markers that are not presented on the lower density. However, this might not be the only reason in this context, since the SNP panel used for NEL35 was created specifically for the Bos taurus indicus breeds (GGP indicus 35K), with the selection of SNPs with higher MAF and designed to optimize equidistant spacing of markers [31]. Such particularity could affect the ROH detection of ROHs based on the parameters used.
Some studies reported that the density of SNP panels affect ROH detection [8, 31, 32]. According to Ceballos et al. [32], the detection and the ROH quality are affected by the density of markers and their distribution along the genome. Rebelato et al. [10] reported that ROHs are better identified in SNP panels with a density higher than 50,000 SNPs. In the present study, the 35K panel was not capable of showing sensibility to detect ROHs compared to the higher density panels. One possible alternative would be the imputation to higher-density SNP panels once the accuracy of such procedure was elevated in this population [33].
Genomic inbreeding coefficients and effective population size
In general, the inbreeding estimates of the first four methodologies showed similar results. Each one of the methodologies used in the present study has their particularities regarding inbreeding estimation and are dependent on some factors. For example, the first three methods (FHOM1, FHOM2, FGRM) are dependent on genotype allele frequency different to the methodology by uniting gamete [34]. These particularities must be taken into account when defining inbreeding concepts, for example, the UNI method that takes into account the Wright [35] and Malécot [36] definition or the HOM methods that take into consideration the heterozygosity reduction. The HOM and ROH methodologies weigh every allele equally, while the UNI and GRM methods give more weight to rare alleles [37].
The majority of molecular information measured on inbreeding coefficients estimation is based on marker allele identity and does not directly separate the regions that are identical-by-descendent and those that are identical-by-state [38]. The advantage of the ROH inbreeding estimate, besides not being dependent on allele frequency, is the ability to differentiate recent and ancient inbreeding [39]. However, the ROH estimate method is highly dependent on the detection parameters used in analyses.
Low to moderate correlations were observed between the first four methods based on the ROHs analyses (Figure 2). The correlation among the four methodology and the ROH length groups decreases for shorter ROH segments, reinforcing that some inbreeding estimation methods have lower power of detection of more ancient inbreeding [40]. These results corroborate with previous findings in cattle studies such as Gurgul et al. [41] that evaluated the correlation of GRM and ROH methodologies, or Zhang et al. [42] which evaluated the correlation of the HOM and UNI methodologies with the ROH method.
Regarding Ne, there was a reduction, over the generations, for all populations. However, the highest reduction of Ne was detected for GIR and LMS populations. The probable effect of a Ne decrease is the genetic diversity reduction, affecting the homozygous and heterozygous rates in these populations. This Ne reduction might be a consequence of intense selection practices and use of the reproductive technologies [8, 10, 19]. Meuwissen et al. [43] mentioned that attention is required in the management of genetic diversity of populations with Ne lower than 100. The decrease of Ne values can increase the linkage disequilibrium and increasing the fixed allele rates, causing a reduction in the variability of certain genomic regions. All populations in the present study had recent Ne values higher than 100, which demonstrates a reasonable control of inbreeding. However, it is recommended, especially for GIR, HFD, SGT, HOL, and JER that the Ne rates should be constantly monitored to avoid a loss of diversity in the next generations. Edea et al. [44] working with some of the same breeds (ANG, CHL, HFD, HOL, and JER) found the same Ne pattern, also demonstrating a decrease of this parameter in recent generations.
Homozygosity islands
The homozygosity islands are the regions with the incidence of ROHs in at least 50% plus one individual of the population. This occurs due to increase of allele frequency in certain regions as a response to adaptive processes or intense selection of traits with high economic interest [10]. Many islands were present in both GIR and BRM animals. Interestingly, GIR contributed to the formation of BRM [45], suggesting that these islands might have been maintained over generations and were kept in BRM. One of the metabolic pathways found with the genes into these islands is the melanogenesis pathway, which is responsible for determining the coat color pattern in each breed [46], sustaining the balance between the brown-black eumelanin and reddish-yellow phaeomelanin [47], and is also associated with thermoregulation, resulting in better adaptation to certain environmental conditions [48, 49].
Another pathway found to be in common in four populations that presented the homozygosity islands was the prolactin signaling pathway, which is responsible for many biological processes. Some genes in this pathway were previously associated with some traits in cattle. For example, the STAT gene family (signal transducer and transcription activator) is responsible for the development and differentiation of mammary gland cells in the different life phases [50]. The MAPK10 gene (mitogen-activated protein kinase 10) is linked with the inflammatory response and immunity of epithelial cells in the mammary gland. According to Silva et al. [51], MAPK10 is a candidate gene to somatic cell score (SCS). The PRLR gene (prolactin receptor), also known as the SLICK gene, was associated with heat tolerance [52], because results in short and slick hair that help in adaptation to high temperatures [53].
The calcium signaling pathway found in the common islands was observed in ANGSIM and JER. In the ANGSIM breed, the gene related to this pathway is CACNA1E (calcium voltage-gated channel subunit alpha1 E). Hay and Roberts [54] reported CACNA1E as a candidate gene for growth and carcass-related traits. In JER breed, the DRD5 gene (dopamine receptor D5) was associated with the calcium signaling pathway, which has been associated with feed intake regulation and energy homeostasis [55]. Other pathways and genes associated with the homozygosity island found in this study can be accessed in the Supplementary Material section.
Heterozygosity-enriched regions
Different from ROHs, it is expected that the HER occur in regions with high recombination rate, as low linkage disequilibrium leads to high region diversity. Normally, HER are concentrated in genomic regions associated with disease resistance, where higher levels of diversity can help the population to deal with novel (and changing) potential infirmity issues [16]. Despite being associated with interesting regions, HER are not as widely studied as ROHs [56].
The population that showed the highest amount of HER was MON (a composite population), which represented more than 47% of HER found in this study. This result is expected, as the MON population is a composite breed that combines different biological groups, including breeds from Bos taurus indicus and Bos taurus taurus [26].
Another interesting result was the difference found between NEL35 and NEL50 groups for HER. The length classification of HER was different between SNP panels: the NEL50 group presented a smaller length of HER compared to the NEL35 group. As the same regions were detected in both panels, this result suggests that many of the long HER regions are composed of small HER, broken by regions with homozygous SNPs.
Only one gene was found in the smaller HER – KHDRBS2 (KH RNA Binding Domain Containing, Signal Transduction Associated 2). This gene has been associated with fertility and reproductive performance in Sanmartinero cattle [57]. However, most genes found in the HER islands showed some relation with animal immunity, demonstrating a relation of the biological responses to environmental challenges. Another important gene is IFNG (interferon-gamma), which was found in the island for MON breed. Some studies indicate an association of IFNG with tick resistance in taurine and zebu cattle [58]. The ANXA1 gene (Annexin A1), a protein regulated by glucocorticoids, which was identified for SEN, plays an inhibitory role in the synthesis of arachidonic acid, a precursor of inflammatory processes.
These high heterozygosity concentrations in the region of immune response control are an interesting point to be investigated in these populations, as they can contribute to disease resilience [15]. Additionally, the genomic regions where there is higher variability in the livestock-interesting traits is an important point to check in studies to estimate the genetic diversity of the populations [16], as sustaining diversity can be an advantage for adaptive traits and selective breeding [59].
SNP panels and WGS comparison
Currently, some WGS studies have been used to detect population structure and identify the regions influencing economic traits in livestock [60]. The present study did not find similarities between SNP panels and WGS results in any of the evaluated populations. The majority of ROHs with long-homozygous sequences are actually many short ROHs distributed side by side. In addition, as in the SNP panels, the loci between two homozygous SNPs are assumed to also be in homozygosity, as the ROH tends to be long, overestimating the lengths of some ROHs.
One interesting point found in the comparison between breeds and SNP panels that was not previously identified with the analysis only with SNP panels, was a small ROH concentration observed in the WGS analysis for Holstein breed close to an island region found in SNP panel analysis of the SEN population. This region contains the SLICK gene, already discussed previously and originally found in SEN animals. Recent studies have already shown that the SLICK gene has been introduced into some HOL populations in a selection process to control heat stress [53].
As found in our study, the number of ROH and HER regions differ among populations and provide insights on their differences in selective breeding and evolutionary histories. These differences are expected once the events that acted in each population are different or show different intensity [44]. Some particularities of this study must be taken into account, as the unbalanced number of animals among populations. This could affect the total number of ROH and HER identified for each population and the comparison of the results obtained. Furthermore, the WGS datasets were not obtained in the same animals genotyped for SNP panels and hence, the lack of common regions cannot be attributed solely to the genotyping platform.
It is noteworthy that our work report substantial information about the genetic diversity in different cattle breeds, presenting new genomic regions with homozygosity islands and heterozygous-enriched regions, where a great number of genes are situated (Table S1 to Table S12). One of the main challenges of managing genetic diversity of livestock populations is to know which genomic regions are in homozygosity/heterozygosity, since they are highly heterogeneous over the genome. This management and characterization of the genetic structure of a population is essential to access the diversity and help to understand the time action over specific breeds. With the advancement of molecular technologies, novel insights into the animal genome can be accessed and populations compared, to verify which regions are in high or low diversity in each population and thus better manage future generations.