Directional selection can leave a set of signatures in the genomic regions under its influence, such as rapid divergence of functional sites among populations, reduced polymorphism within populations and extensive linkage disequilibrium (LD) over an extended region harboring the selected allele [1, 21, 30]. For animal breeders, these patterns could be useful for identifying genomic regions or genes underlying traits of interest, since regions under selection in livestock populations are expected to coincide with QTL affecting production traits [5, 14]. The “fat tail” is a phenotype that divides domesticated sheep into two major groups. In the past, this trait was desirable both because it met human needs for the product and because it served as an energy store for the animals during periods of feed scarcity, which led to selection for heavier fat tails [31]. In recent times, however, improved forage availability and a decreased price for the product have made a large fat tail much less important in genetic improvement programs [3, 32]. An increasing number of studies have searched for signals of recent positive selection on a genome-wide scale in different domestic animals [e.g. 34–37]; however, relatively few genomic regions have been identified as having been subject to selection for specific mutations underlying evolutionary shifts in a trait. In this paper, to fine map the genomic regions potentially affecting fat deposition in thin- and fat-tailed breeds and to gain more insight into its genomic basis, we investigated three candidate regions using a fine-mapping approach based on multiple selective sweep statistics.
These candidate regions were analyzed using a variety of statistics to clarify the signals of selection observed in them. Our analyses focused on two allele-frequency-based statistics (FST and median homozygosity) and two haplotype-based statistics (|iHS| and XP-EHH). These tests were chosen because previous power analyses suggested that they are largely complementary [27]. FST and median homozygosity tests detect alleles that are highly differentiated between populations, because positive selection in one population produces larger allele frequency differences than those of neutrally evolving alleles. Akey et al. 2002 [37] suggested treating loci in the tails of the empirical FST distribution as candidate targets of selection. iHS, a measure of within-breed evidence for selection, has good power to detect partial selective sweeps. Voight et al. 2006 [23] demonstrated that the iHS method was more powerful than Tajima’s D [38] or Fay and Wu’s H [39] tests. A disadvantage of this approach is that it loses power when the beneficial allele is close to fixation, because fixation eliminates variation at and near the selected site. To overcome this problem, we also applied XP-EHH, which is more powerful than iHS for detecting selected alleles that have risen to near fixation in one population but not in another. iHS and XP-EHH tests have been combined to search for recent positive selection in humans [24, 40, 41] and in other species such as Plasmodium falciparum [42]. Grossman et al. 2010 [43] showed that the FST and XP-EHH signals peak more narrowly around the causal variant, making them useful for spatial localization, whereas iHS is better at distinguishing causal variants but contributes little to spatial resolution. These tests were relatively uncorrelated in neutral regions, and only weakly correlated for neutral variants within selected regions [43].
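As a rough illustration of the allele-frequency side of this analysis, the sketch below computes a per-SNP FST from the allele frequencies of two breeds and then flags SNPs in the upper tail of the empirical distribution, in the spirit of Akey et al. [37]. It assumes the simple Nei-style estimator FST = (H_T - H_S) / H_T, which ignores sample-size corrections and is not necessarily the estimator used in this study; the function names and the toy frequencies are hypothetical.

```python
import math


def fst_nei(p1: float, p2: float) -> float:
    """F_ST for one biallelic SNP from allele frequencies p1 and p2 in two breeds.

    Assumption: simple Nei-style estimator F_ST = (H_T - H_S) / H_T with no
    sample-size correction (not necessarily the estimator used in the study).
    """
    p_bar = (p1 + p2) / 2.0
    h_t = 2.0 * p_bar * (1.0 - p_bar)  # expected heterozygosity of the pooled population
    h_s = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0  # mean within-breed heterozygosity
    if h_t == 0.0:
        return 0.0  # SNP fixed for the same allele in both breeds: no differentiation
    return (h_t - h_s) / h_t


def top_tail_indices(values: list[float], top_fraction: float = 0.01) -> list[int]:
    """Indices of SNPs in the upper `top_fraction` of the empirical distribution."""
    n_keep = max(1, math.ceil(top_fraction * len(values)))
    by_value = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    return sorted(by_value[:n_keep])


# Hypothetical allele frequencies (thin-tailed breed, fat-tailed breed) at four SNPs.
freqs = [(0.50, 0.48), (0.30, 0.35), (0.90, 0.10), (0.60, 0.55)]
fst_values = [fst_nei(p1, p2) for p1, p2 in freqs]
print(top_tail_indices(fst_values, top_fraction=0.25))  # [2]
```

With only four toy SNPs the tail is set at the top 25%; on real data the threshold would be taken from a much larger empirical distribution (for example its top 1%).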
Study of the candidate regions on chromosomes 5 and X revealed clear evidence of selection with the FST, median homozygosity and XP-EHH tests in a relatively narrow region, whereas no particular |iHS| peak was identified in these regions. Under the hypothesis that different selection pressures historically operated in thin- and fat-tailed breeds, and that selection acted on a variant that was advantageous in only one breed, these results suggest that selection in these regions targeted mutations affecting fat tail size, with the beneficial mutations having risen to near fixation in the fat-tailed breeds. This interpretation is also supported by the core haplotype frequencies observed in the candidate regions on these chromosomes, as the common core haplotypes in the fat-tailed breeds were near fixation (Table 2).
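Because this argument rests on core haplotype frequencies and on how far those haplotypes extend, a small sketch of the two quantities may help. It computes core haplotype frequencies and the extended haplotype homozygosity (EHH) of one core haplotype, defined as the probability that two randomly chosen chromosomes carrying the core haplotype are identical from the core out to a given marker; EHH is the quantity on which iHS and XP-EHH are built. The haplotypes are toy 0/1 strings, the extension runs to the right of the core only, and the function names are hypothetical.

```python
from collections import Counter
from math import comb


def core_haplotype_frequencies(haplotypes: list[str], core: tuple[int, ...]) -> dict[str, float]:
    """Frequency of each core haplotype, i.e. the alleles at the `core` marker indices."""
    counts = Counter("".join(h[i] for i in core) for h in haplotypes)
    return {c: n / len(haplotypes) for c, n in counts.items()}


def ehh(haplotypes: list[str], core: tuple[int, ...], core_hap: str, stop: int) -> float:
    """EHH of `core_hap` from the first core marker to marker index `stop` (inclusive).

    EHH = sum_h C(n_h, 2) / C(n_c, 2), where n_c is the number of chromosomes carrying
    the core haplotype and n_h counts each distinct extended haplotype among them.
    """
    carriers = [h for h in haplotypes if "".join(h[i] for i in core) == core_hap]
    n_c = len(carriers)
    if n_c < 2:
        raise ValueError("need at least two chromosomes carrying the core haplotype")
    extended = Counter(h[min(core):stop + 1] for h in carriers)
    return sum(comb(n_h, 2) for n_h in extended.values()) / comb(n_c, 2)


# Toy phased haplotypes (one string per chromosome, one character per marker).
haps = ["0000", "0000", "0001", "1011"]
print(core_haplotype_frequencies(haps, core=(0,)))  # {'0': 0.75, '1': 0.25}
print([ehh(haps, (0,), "0", stop) for stop in range(4)])  # [1.0, 1.0, 1.0, 0.3333333333333333]
```

A core haplotype near fixation whose EHH stays high over long distances is the pattern XP-EHH is designed to pick up when it is contrasted with a population in which the same region is still diverse.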
Analysis of the region on chromosome 7 indicated strong evidence of selection with all selective sweep statistics. The FST and median homozygosity results suggested that selection has differentiated the beneficial alleles between breeds and that homozygosity has increased in the thin-tailed breed in this region. However, |iHS| and XP-EHH provided additional information: |iHS| gave evidence of partial sweeps in both breeds, while the XP-EHH results showed that the selected alleles have approached fixation in the thin-tailed breed.
As discussed earlier, the power of the |iHS| statistic to detect selective sweeps is greatest at moderate allele frequencies (~40–60%), whereas the XP-EHH test is more powerful for detecting sweeps close to fixation (>80%) [23, 40]. However, both methods retain some power outside their optimal ranges: Sabeti et al. 2007 [24] demonstrated that iHS can detect signals over a frequency range of 20–80%, and XP-EHH between 60% and 100%. Taken together, these results suggest that although the selected allele has approached fixation in the thin-tailed breed, its frequency is probably ~60–80%, the range in which both methods have power. At the same time, the favorable allele has probably increased to an intermediate frequency in the fat-tailed breed. The core haplotype frequencies in this region (Table 2-b) are consistent with this interpretation: the common haplotype in the thin-tailed breed has a frequency of 80%, whereas all haplotypes observed in this region appear polymorphic in the fat-tailed breed. One inference is that there has been an ongoing infusion of fat-tailed haplotypes into this breed, but also selection for the thin-tailed phenotype.
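The ~60–80% window in this argument is simply the intersection of the two frequency ranges quoted from Sabeti et al. [24]. The short sketch below makes that arithmetic explicit and checks the 80% frequency of the common thin-tailed core haplotype (Table 2-b) against it, treating the haplotype frequency as a proxy for the selected allele's frequency, which is an assumption. The ranges are approximate power boundaries rather than sharp cutoffs, and the names are hypothetical.

```python
# Approximate allele-frequency ranges (as fractions) over which each test retains
# power, as quoted in the text from Sabeti et al. 2007 [24]; not sharp cutoffs.
IHS_RANGE = (0.20, 0.80)
XPEHH_RANGE = (0.60, 1.00)


def overlap(a: tuple[float, float], b: tuple[float, float]):
    """Intersection of two closed intervals, or None if they are disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def detectable_by_both(freq: float) -> bool:
    """True if `freq` lies where both iHS and XP-EHH are expected to have some power."""
    window = overlap(IHS_RANGE, XPEHH_RANGE)
    return window is not None and window[0] <= freq <= window[1]


print(overlap(IHS_RANGE, XPEHH_RANGE))  # (0.6, 0.8)
# Common thin-tailed core haplotype (Table 2-b), used here as a proxy for allele frequency.
print(detectable_by_both(0.80))  # True
```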
To further test this hypothesis, we constructed the haplotype block structure in this region and visualized the decay of LD (pairwise r²) using Haploview [44]. The effect of a selective sweep on patterns of variation is expected to decline over time because of recombination, and if a sweep is still ongoing in a subpopulation, the hitchhiking haplotype is expected to be relatively long [27, 30]. Our results (Additional file 5: Figure S5) revealed that although haplotype blocks were present in both breeds, they extended over longer distances in the fat-tailed than in the thin-tailed breed. This suggests that as the frequency of the selected alleles increased in the thin-tailed breed, LD around these variants decayed through recombination, whereas selection in the fat-tailed breed is more recent (longer haplotype blocks). These results are consistent with the evidence of selection in this region from both the |iHS| and XP-EHH tests.
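For reference, pairwise r² between two biallelic SNPs is D² / [pA(1 - pA) pB(1 - pB)], where D = pAB - pA pB. The sketch below computes it from phased 0/1 haplotypes and tabulates r² against inter-marker distance, the raw material of an LD decay plot. It assumes the haplotypes are already phased; tools such as Haploview can estimate two-marker haplotype frequencies from genotype data, so this illustrates the statistic rather than reimplementing that tool. Function names and toy data are hypothetical.

```python
def r_squared(snp_a: list[int], snp_b: list[int]) -> float:
    """Pairwise LD r^2 between two biallelic SNPs from phased haplotypes.

    snp_a[i] and snp_b[i] are the 0/1 alleles carried by haplotype i.
    """
    if len(snp_a) != len(snp_b) or not snp_a:
        raise ValueError("SNP vectors must be non-empty and of equal length")
    n = len(snp_a)
    p_a = sum(snp_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(snp_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(1 for x, y in zip(snp_a, snp_b) if x == 1 and y == 1) / n  # haplotype "11"
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return 0.0 if denom == 0 else d * d / denom  # monomorphic SNP: r^2 undefined, report 0


def ld_decay(positions: list[int], snps: list[list[int]]) -> list[tuple[int, float]]:
    """(distance in bp, r^2) for every SNP pair, sorted by distance."""
    pairs = []
    for i in range(len(snps)):
        for j in range(i + 1, len(snps)):
            pairs.append((abs(positions[j] - positions[i]), r_squared(snps[i], snps[j])))
    return sorted(pairs)


# Hypothetical toy data: three SNPs typed on four phased haplotypes.
positions = [1_000, 6_000, 40_000]
snps = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 1]]
print(ld_decay(positions, snps))  # [(5000, 1.0), (34000, 0.0), (39000, 0.0)]
```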
The earliest known depiction of a fat-tailed sheep is on an Uruk III stone vessel from about 5000 years before present, approximately 4000 years after initial domestication [45]. Because fat-tailed breeds are now prevalent in the Fertile Crescent, where sheep were originally domesticated, thin-tailed breeds predominate in peripheral areas, and the wild ancestor of sheep is thin-tailed, it has been assumed that the first domesticated sheep were thin-tailed and that the fat tail developed later [45, 46]. We investigated this hypothesis by classifying the selected alleles in the core haplotypes of our regions of interest as ancestral or derived. Our results provide preliminary molecular evidence supporting this assumption: in almost all cases, derived alleles have been under selection in the fat-tailed breed, which is consistent with selection on new mutations in these breeds (Table 2).
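The ancestral/derived classification amounts to comparing each selected allele with the ancestral allele at that SNP. The sketch below assumes that ancestral states have already been inferred (for example from an outgroup species; this is an assumption, since the inference method is not described in this section) and tallies how many selected alleles are derived. Function names and the example alleles are hypothetical.

```python
from collections import Counter
from typing import Optional


def classify_allele(selected: str, ancestral: Optional[str]) -> str:
    """Label a selected allele as 'ancestral' or 'derived' relative to the ancestral state."""
    if ancestral is None:
        return "unresolved"  # ancestral state could not be inferred for this SNP
    return "ancestral" if selected == ancestral else "derived"


def tally(pairs: list[tuple[str, Optional[str]]]) -> Counter:
    """Count classifications over (selected allele, ancestral allele) pairs."""
    return Counter(classify_allele(sel, anc) for sel, anc in pairs)


# Hypothetical core-haplotype SNPs: (allele favoured in the fat-tailed breed, ancestral allele).
core_snps = [("A", "G"), ("T", "T"), ("C", "G"), ("G", None)]
counts = tally(core_snps)
print(counts["derived"], counts["ancestral"], counts["unresolved"])  # 2 1 1
```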
Population demographic history can also produce similar patterns of DNA sequence variation and could therefore be a source of error when inferring genomic targets of selection. This caveat can be reduced by screening a large number of markers spaced across the whole genome (as was previously done in this research) [18], because selection produces regional patterns, whereas population history and demographic events affect the genome as a whole [10, 47]. Similarly, conducting this type of fine-scale analysis of candidate genomic regions with a dense set of markers and multiple statistical tests reduces the chance that a signature of selection is a false positive, provided that it is detected at more than one marker locus and by more than one statistical test [43].
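One simple way to operationalize the requirement that a signal be supported by more than one statistic and more than one marker is sketched below: count, for each SNP, how many statistics flag it as an outlier, keep SNPs flagged by at least two, and then require several such SNPs within a candidate interval. The thresholds (two tests, two SNPs), SNP IDs and positions are illustrative assumptions, not the criteria used in this study.

```python
from collections import Counter


def supported_snps(outliers: dict[str, set[str]], min_tests: int = 2) -> list[str]:
    """SNP IDs flagged as outliers by at least `min_tests` different statistics."""
    counts = Counter(snp for flagged in outliers.values() for snp in flagged)
    return sorted(snp for snp, n in counts.items() if n >= min_tests)


def region_supported(positions: dict[str, int], snps: list[str],
                     start: int, end: int, min_snps: int = 2) -> bool:
    """True if at least `min_snps` of the supported SNPs fall inside [start, end]."""
    return sum(1 for s in snps if start <= positions[s] <= end) >= min_snps


# Hypothetical outlier sets per statistic and SNP positions (bp).
outliers = {
    "FST": {"s1", "s2"},
    "median_homozygosity": {"s2"},
    "XP-EHH": {"s2", "s3"},
    "iHS": {"s3"},
}
positions = {"s1": 100, "s2": 5_000, "s3": 9_000}
hits = supported_snps(outliers)
print(hits)  # ['s2', 's3']
print(region_supported(positions, hits, start=0, end=10_000))  # True
```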
The candidate regions of interest had previously been studied using OAR v1.0 in O. aries and the corresponding regions of B. taurus, and no particular candidate genes associated with fat deposition were identified [18]. In this study, reinvestigation of these regions using the newly available sheep genome assembly OAR v3.1 [48] identified several genes associated with fat metabolism in O. aries or in their orthologous regions of B. taurus (Table 3). Protein phosphatase 2 (formerly 2A), catalytic subunit, alpha isoform (PPP2CA) has a variety of roles in biological processes such as cellular lipid metabolic, membrane lipid metabolic and sphingolipid metabolic processes, while the hydroxysteroid (17-beta) dehydrogenase 10 (HSD17B10) and emopamil binding protein (EBP) genes play roles in lipid metabolic processes, and the androgen receptor (AR) and synaptophysin (SYP) genes participate in lipid binding [49]. Interestingly, most of the genes identified here have recently been reported as candidate genes associated with lipid metabolism in sheep using various molecular techniques. Yuan et al. 2019 [28] performed differential expression analysis using RNA-seq in longissimus dorsi muscle tissue (MUT), perirenal adipose tissue (PAT) and tail adipose tissue (TAT) of different Chinese short- and fat-tailed sheep breeds and revealed that PPP1CA is highly expressed in TAT. PPP1CA was also identified as a plausible gene associated with the fat-tailed or fat-rumped phenotype by comparing copy number variations (CNVs) among sheep with different tail types [50]. Moreover, protein phosphatase 1, catalytic subunit, gamma isozyme (PPP1CC) has shown a selection signature in Chinese thin- and fat-tailed sheep breeds and has been reported to be associated with tail type [51]. Kang et al. 2017 [19] profiled transcriptomes from Tan sheep, a Chinese indigenous breed with a notable fat tail, using RNA-seq to unravel the mechanisms underlying adipose tissue accumulation in various regions of the body, including subcutaneous, visceral and tail depots. Their results showed marked differences in AR expression among the three adipose depots. Differences in hormone receptor density among adipose tissues influence regional fat distribution; for example, AR contributes to the control of adipocyte development by interacting with its ligand, androgen [19]. HSD17B12 has also been reported to be associated with fat tail metabolism in thin- and fat-tailed sheep breeds in a deep transcriptome analysis of RNA-seq data [52]. HSD17B12, which participates in fatty acid elongation, has been reported to be important for controlling the overall balance of fatty acid composition [52].
To validate the results of our study, exon 1 of the PPP2CA gene was amplified and sequenced in an independent study of Zel and Lori-Bakhtiari sheep breeds [53]. Two variant patterns were identified. In Lori-Bakhtiari, the Del/Del genotype was associated with a heavier fat tail than the T/T genotype (5.20 ± 0.21 kg vs 3.28 ± 0.12 kg; P < 0.05), while in Zel the effect of genotype on carcass fat percentage and triglyceride level was significant, with the T/T genotype having a higher carcass fat percentage than the Del/Del genotype (P < 0.05). Overall, as the annotation of the ovine genome becomes more complete, all genes located in the candidate regions should be identified, and promising targets can then be verified by further experimentation.
Although not directly related to the inheritance of the trait, the results of Gökdal et al. [29], who examined the effects of docking in fat-tailed breeds, provide insight into a possible mechanism of fat deposition in this organ. The carcasses of the docked group contained more kidney, pelvic and internal fat than those of the intact lambs, as well as a higher percentage of subcutaneous and intramuscular fat. The weights of the different carcass cuts of the docked lambs were also heavier than those of the intact group. However, there was little change in overall carcass composition, suggesting that the genes affecting the fat tail phenotype are associated with the localization of fat stores to a regional depot rather than with control of the overall level of fat deposition. This observation may also support the suggestion that some genes selected in fat-tailed sheep breeds in these regions are likely also associated with developmental defects or ectopic expression of organs. Our results revealed that these regions contain many genes with known biological functions associated with developmental processes (Table 4 and Additional file 4: Table S4). Several earlier studies provide evidence on this issue. For example, transcription factor (TCF) genes have been among the most highly differentially expressed genes in perirenal adipose tissue (PAT) and were identified as the most likely to account for the fat-tailed phenotype of sheep [28]. TCF7 is involved in the Wnt/β-catenin signaling pathway, which plays a critical role in regulating the expression of adipogenesis genes in sheep [52] and pigs [54]. Zhu et al. 2019 [55] studied CNVs and selection signatures on the X chromosome of Chinese indigenous sheep with different tail types and found that the regions harboring CNVs and selective sweeps in different sheep breeds overlapped the calcium channel, voltage-dependent, L type, alpha 1F subunit (CACNA1F) gene, which could be associated with tail type in these breeds. In addition, HSD15B [52], SLC35A2 [51], AR and TIMP1 [19] have been identified as candidate genes affecting fat tail development. Moreover, our regions of interest overlap some genes, for example BMP15, WDR13 and RBM3, belonging to gene families whose closely related members, including BMP2 [21, 51, 56–58], WDR92 [20, 51] and RBM11 [5], have recently been reported to be associated with fat tail formation and adipose tissue gene expression in sheep.
Finally, fine mapping of the candidate regions with different sweep statistics has enabled us to confirm the signatures of selection in these chromosomal regions and to refine the critical regions from 113 kb (47,149,400–47,263,230) to 28 kb (47,146,931–47,175,489) on chromosome 5, from 201 kb (46,642,359–46,843,356) to 142 kb (comprising two shorter intervals: 46,587,943–46,642,359 and 46,765,080–46,852,870) on chromosome 7, and from 2,831 kb (58,621,412–61,452,816) to 1,006 kb (59,257,971–60,264,325) on chromosome X. These regions were refined by considering the results of all statistical tests (Additional file 6: Figure S6). Retrieving the genes located within the refined regions revealed several, including TCF7 on chromosome 5, PTGDR and NID2 on chromosome 7 and AR on chromosome X, that simultaneously have effects on lipid metabolism, macromolecule metabolic processes, organ/gland development or ectopic expression of organs.
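The interval sizes quoted above can be recomputed directly from the coordinates. The sketch below sums the lengths (in bp) of the original and refined intervals on each chromosome and prints them in kb to one decimal place, so the printed values differ slightly from the rounded figures in the text. The helper and variable names are hypothetical.

```python
def total_length_bp(intervals: list[tuple[int, int]]) -> int:
    """Summed length of (start, end) intervals, in base pairs."""
    return sum(end - start for start, end in intervals)


# (original interval(s), refined interval(s)) per chromosome, coordinates as given in the text.
regions = {
    "OAR5": ([(47_149_400, 47_263_230)], [(47_146_931, 47_175_489)]),
    "OAR7": ([(46_642_359, 46_843_356)], [(46_587_943, 46_642_359), (46_765_080, 46_852_870)]),
    "OARX": ([(58_621_412, 61_452_816)], [(59_257_971, 60_264_325)]),
}

for chrom, (original, refined) in regions.items():
    before_kb = total_length_bp(original) / 1000
    after_kb = total_length_bp(refined) / 1000
    print(f"{chrom}: {before_kb:.1f} kb -> {after_kb:.1f} kb")
# OAR5: 113.8 kb -> 28.6 kb
# OAR7: 201.0 kb -> 142.2 kb
# OARX: 2831.4 kb -> 1006.4 kb
```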
A recently published study on the origin of European sheep, as revealed by the diversity of Balkan breeds and by optimized population-genetic analysis tools [59], used samples from Southwest-Asian, Mediterranean, Central-European and North-European breeds and showed that the thin-tailed Zel sheep falls in the same genetic cluster as the fat-tailed Iranian sheep, whereas the fat-tailed Italian Laticauda is related to other breeds of central Italy. This suggests that the tail phenotype is encoded by a limited number of genes. By combining the information from the present study with previously reported and functionally annotated genes, we suggest PPP2CA and TCF7 (OAR5), PTGDR and NID2 (OAR7), and AR, EBP, CACNA1F, HSD15B, SLC35A2, BMP15, WDR13 and RBM3 (OAR X) as the most promising candidate genes for tail type traits. Clearly, the mechanisms that underlie fat tail inheritance in sheep cannot be established by selective sweep profiling alone. Therefore, further work building on the results of this study is required to uncover the exact genetic mechanisms of fat deposition in the sheep tail. In addition, these regions may still be too large for the efficient implementation of technologies such as marker-assisted selection or positional cloning. More detailed and larger-scale experiments on these and other thin- and fat-tailed breeds may allow us to refine the location of the causal mutations. Likewise, our results do not exclude contemporaneous selection for other traits, and any regions identified still need to be tested for a functional genetic relationship via trait measurement in contrasting genotypes or phenotypes. Specifically, future studies should use reciprocal F2 crosses to provide independent and causal evidence and to verify the mode of inheritance. If these regions are shown to have a significant effect, further sequencing on either side of them can help the search for causal variants.