Genome-wide patterns of signatures of selection. We investigated candidate regions under selection using allele frequencies estimated within (iHS) and between (di and Rsb) Egyptian and European breed-groups of goats (Supplementary Table S1). The di identified four candidate regions showing significant differentiation between the two breed-groups on four chromosomes (CHI5, CHI6, CHI12 and CHI17; Figure 1a; Supplementary Table S2). CHI6 and CHI12 had the strongest signatures of differential selection spanning 5 and 20 significant SNPs, respectively. The Rsb revealed 19 candidate regions spread across 12 chromosomes (Figure 1b; Supplementary Table S2). The strongest and most significant regions occurred on CHI3, CHI6 and CHI16 and spanned 15, 11 and 81 significant SNPs, respectively. Thirteen candidate regions found across eight chromosomes were revealed by iHS (Figure 1c; Supplementary Table S2). The strongest of these regions were found on CHI6 and CHI12 and were defined by 19 and 18 significant SNPs, respectively.
Overlap between candidate regions. All the three approaches identified candidate regions under selection on CHI6 and CHI12 (Supplementary Table S2). On CHI6, one region was identified by di (68,327,191-68,495,816 bp), four by Rsb (75,072,253-76,237,641 bp; 78,852,104-79,060,435 bp; 82,782,072-82,791,242 bp; 89,340,232-89,433,167 bp) and two by iHS (30,246,187-47,044,407 bp; 82,852,913-82,907,013 bp). One of the regions identified by Rsb (82,782,072-82,791,242 bp) overlapped with one of the two iHS regions (82,852,913-82,907,013 bp). On CHI12, one region was revealed by di (34,478,328-35,402,998 bp), and two each by Rsb (6,942,093-7,015,304 bp; 34,091,464-34,143,365 bp) and iHS (34,295,846-36,783,669 bp; 41,241,216-41,433,623 bp) (Supplementary Table S2), respectively. The candidate region identified by di, overlapped with one of the two iHS regions (34,295,846-36,783,669 bp). In addition to the regions identified on CHI6 and CHI12 by iHS and Rsb, respectively, the two tests also identified candidate regions on CHI11 and CHI16 (Supplementary Table S2). On CHI11, the single candidate region (68,478,589-69,137,181 bp) identified by Rsb overlapped with the one revealed by iHS (68,478,589-69,208,064 bp). On CHI16, none of the two regions identified by Rsb (6,511,375-13,049,713 bp; 46,186,034-46,690,120 bp) overlapped with the one identified with iHS (53,386,905-53,459,228 bp).
Gene content and functional annotation of the strongest candidate regions. There were five (Rsb = 3, iHS = 2) candidate regions that spanned no annotated genes (Supplementary Table S2). Such regions have been designated as “gene deserts” and have also been reported by other investigators [12, 13, 14, 15]. Genome-wide, we identified 258 genes (di = 13, Rsb = 115, iHS = 130) mapping to 36 (di = 4, Rsb = 19, iHS = 13) candidate selection regions that were defined by 287 significant SNPs (Supplementary Table S2). Amongst the 36 candidate regions, there were seven that were the strongest (di = 2, Rsb = 3, iHS = 2) and spanned 169 significant SNPs (Table 1). In total, 134 genes mapped to the nine strongest candidate regions (Table 1). We regard the positive selection acting on these nine strongest candidate regions to be the primary ones driving the evolutionary divergence between the two groups of goats analysed.
The two strongest signals of selection revealed by di were respectively, on CHI6 and CHI12. The region on CHI6 spanned four genes and the one on CHI12 extended across two genes (Table 1). The strongest signals revealed by Rsb were on CHI3, CHI6 and CHI16, respectively. The region on CHI3 spanned four genes, the one on CHI6 five genes and the one on CHI16, 29 genes (Table 1). Chromosomes CHI6 and CHI12 also had the strongest iHS candidate regions. The one on CHI6 spanned 80 genes, while 10 genes occurred in the region on CHI12 (Table 1).
Enrichment analysis was performed with the 134 genes present in the nine strongest candidate regions. We limited the functional annotation to B. taurus and used it as the background species because, in both cases, using C. hircus returned no results, most likely due to either the incomplete annotation of its genome or its exclusion in the DAVID database. Based on the bovine RefSeq gene annotation, of the 134 genes, 68 were significantly enriched. From DAVID analysis, we found seven functional annotation clusters that were significantly enriched (Supplementary Table S3); the top three were regulator of G protein-coupled receptor (GPCR) protein signaling pathway (GO:0008277; Enrichment score = 4.14), protein ubiquitination involved in ubiquitin-dependent protein catabolic process (GO:0042787; Enrichment score = 2.02) and Leucine-rich repeats (LRR) typical subfamily (IPR003591; Enrichment score = 1.42).
The functions of the 68 genes were further investigated using the text-mining tool in STRING. We detected several genes that have been reported before to be strongly linked to positive selection in other domestic livestock. These genes included LAP3, IBSP, MED28, MEPE, ABCG2, CCSER1, SPP1, that have been associated with milk production related traits [8, 12, 16, 17, 18, 19, 20] and PLAG1, NCAPG, IBSP, DCAF16, CCSER1, FAM184B, CCKAR and LCORL which are body size-related genes [12, 21, 22] in mammals. There were also genes associated with male fertility and spermatogenesis (ANAPC5, GPR125, SMARCAD1, SPATA18) and prolificacy (BMPR1B). Other genes were involved in fatty acid synthesis (adipogenesis) and metabolic pathways (PI4K2B, PPARGC1A, RGS1, RGS2, RGS13), and with pro-inflammation and immune response (SPP1, CXCL8/IL8, HERC5, HERC3) [23]. There were several genes that have been reported to be of functional significance in other species. SLIT2, PACRGL, GPR125, DHX15, SOD3 and KCNIP4 were identified in a mice QTL interval linked to thermal nociception [24, 25]. TBC1D14 has a critical development role for RAB-GAP-mediated protein transport in early embryogenesis in mouse, while its paralog, TBC1D12, has been implicated as a candidate gene for environmental genetic adaptation [26]. TROVE2, UCHL5 and RGS1 were identified as autoimmune disease response candidate genes in activated T cells in human subjects and in interactions with ubiquitin and proteasome pathways [27] while RGS proteins play an essential role in B-lymphocytes and thus in immune response in mice [28].
Protein–protein interactions (PPI), as well as GO enrichment, were also investigated for the 68 most plausible candidate genes using STRING. The network proteins encoded by the 68 genes had significantly more interactions amongst themselves than was expected for a random set of proteins of similar size, drawn from the genome (85 edges identified; PPI enrichment P-value = 1.0 × 10−16; Figure 2). STRING also revealed two GO terms that were the most significantly enriched (FDR < 0.05) viz: negative regulation of G protein-coupled receptor (GPCR) signaling (GO: GO:0045744) and cellular response to stress (GO:0033554).
GPCR’s represent the largest family of transmembrane receptors in the mammalian genome and occur in nearly all multicellular organisms [29, 30]. They play a major role in signal transduction and perception of external environments in mammalian cells. In vertebrates, GPCR signaling networks have been associated with neurotransmission, cellular metabolism, secretion, cellular differentiation and growth, inflammatory and immune responses, smell, taste and vision [31]. Cellular stress response encompasses a wide range of molecular changes in response to environmental stressors, including extremes of temperature, exposure to toxins, and mechanical damage. The various processes involved in cellular stress response serve the adaptive function of protection against unfavourable environmental conditions, through either short-term mechanisms that minimize acute damage to cell integrity, and/or long-term mechanisms which provide a measure of resiliency against adverse conditions.