To the best of our knowledge, this is the first study to provide a whole scan for signatures of selection in the MM genome. Our findings shed light on the possible candidate genes/gene groups involved in the regions undergoing selection in this breed.
Overall, there are many footfall patterns that quadrupeds use during locomotion. The gaits are generally considered to be discrete patterns of footfalls and are divided into symmetrical and asymmetrical [27]. The DMRT3 gene has been associated with gait type in MM; however, some studies have shown gait ability (lateral gait pattern) to be under the influence of a set of genes [5, 28]. Considering that fact, there is still a lack of information regarding the genetic architecture behind the batida and picada gaits.
Selection in the MM breed is based exclusively on competitions where gaited performance records are evaluated relative to that of competitors, often being an empirical selection. Thus, we presumed that time under strong artificial selection is necessary to identify a possible gait type segregation to well-defined lineages. In this regard, it is important to understand which genes in the MM population are most relevant to accomplish such goals. According to Arnason et al. [29], the thoroughbred carried out a long history of artificial selection for galloping speed while being ridden by a jockey, and it might be the same for MM. A well-defined breeding selection can shift the allele frequencies of the desirable phenotype and well-defined lineages could be achieved by selection. For this reason, we next focused on exploring the signatures of selection found in the MM population and in the understanding of genes and pathways associated with these regions.
At first glance, DMRT3 was defined as the only gene capable of determining the phenotypic variation of gait [4]. Nevertheless, other discoveries reported that alleles related to the type of gait were differently fixed within each gait [5]. In the MM population, no significant region could be detected harboring the DMRT3 gene. This is not the first study to discuss the genetic complexity of locomotion in horses. It was verified in Icelandic horses that no SNP demonstrated genome-wide significance, implying that the ability to pace goes beyond the presence of a single genetic variant [30].
The identification of genomic regions modified by positive selection has provided discoveries of the adaptive directions of species, today being one of the main theoretical and applied evolutionary studies [7]. In this study, we covered three distinct methods to scan for signatures of selection which diverge according to their methodology, capturing different patterns of genetic variation in different time scales. Due to the density and complexity of biological information, these methods still present pitfalls and are unable to exploit the genetic variation seen within the entire genome. To overcome this limitation, it is important to use multiple methods in a complementary way [31].
Common regions of significance were found between iHS and ROH. Eight common candidate genes (C9orf24, CNTFR, DNAI1, ENHO, DCTN3, FAM219A, RPP25L, and SIGMAR1) were located on ECA23, and one common gene (RASGRP1) was located on ECA1. As a sense of curiosity, the above genes are ~ 28 Mb away from DMRT3. The existence of some relationships between these regions is possible, and determining the exact gene(s) under selection can be a challenge. We performed network analysis including DMRT3; only one instance of low co-expression was found between DCTN3 and DMRT3. Thus, a possible physiological relationship of the eight genes with DMRT3 was excluded. However, the limitations of using non-model species may have interfered with our presumptions. Besides that, according to Ma et al. [32] and Ablondi et al. [33], during evolution, a series of unknown demographic events further increased the difficulty in detecting modified genomic regions due to different selective pressures. The use of next-generation sequencing (NGS) technologies can be promising not only for elucidating the relationships between loci in ECA23, but because sequencing offers a higher resolution in the localization of selection signatures and an increase in detection power [34]. Few common candidate regions were found among the iHS and ROH methods. As mentioned before, this can be attributed to specific features of each method. Their results should be treated as complementary in that the methods capture different types of selection signals.
The TD results suggest that the MM population is under strong balancing selection; however, many hitchhiking effects were highlighted in other statistics based on the extended haplotype homozygosity and footprints on homozygous regions. The pronounced balancing status in the studied population supported by the TD results was an interesting consequence, possibly explained by the nonexistence of any breeding program in the breed during the past years.
In ROH, the longest shared ROH was identified on ECA7 (Fig. 4). On the other hand, the number of short intervals was more abundant, possibly due to the recombination that has already caused its reduction [26, 35] or due to the limitation of using SNP chips, thereby overestimating the number of short ROH [36]. Again, sequencing data can add value to ROH studies as it covers more genetic variability [37]. However, one of the strengths of ROH analysis is that long homozygous segments can be reliably identified, even from relatively modest marker densities [37]. The two main pieces of evidence for this long ROH segment on ECA7 have already been described in the literature [33, 38]. The first has been considered the quest for high sport trait performance in horses, thus being a recent positive selection based on the directional selection. A second view suggested that strong bottlenecks occurred in this region during the formation of the breed. Ablondi et al. [33] and Capdevila & Belmonte [38] found similar results for ECA7 in Swedish Warmblood horses and Exmoor ponies, respectively. Thus, due to the reproducibility of similar results in this region, we speculate that it is possibly the consequence of a previous bottleneck and not recent positive selection. Our findings corroborate the argument raised by Ablondi et al. [33] about Exmoor ponies in a possible intense bottleneck, but generalizing the evolutionary process of horses as a common moment.
Four genes (TRIR, TNPO2, WDR83 and WDR83OS) were highlighted within the longest shared homozygosity segment. These genes were identified under biological functions for localization (GO:0051179) and metabolic processes (GO:0008152). It has been shown that the TRIR gene has a significant role in cellular functions [39]. Other genes, TNPO2 and WDR83, were related to tumor development. One region on ECA1 harbored the gene RASGRP1, which was found in common between ROH and iHS and plays a key role in the development of T and B cells [40]. Studies have associated RASGRP with disease phenotypes in bovine animals [41, 42] and dogs [43].
In general, the genetic signals for the three statistics were most enriched in ontologies corresponding to “biological regulation”, “metabolic process”, and “cellular process”. In the Panther results for iHS and ROH candidates, the ontology “localization” was also very representative. Some highlighted candidate genes were associated with aspects of gait and locomotor system, with eight of them regulating anterior/posterior pattern specification (Fig. 6).
Figure 6 Functional annotation for top five significant biological functions for candidates related to gait, and locomotor system (P < 0.05)
The HOX genes encode homeodomain transcription factors in the development of many embryonic structures in vertebrates and invertebrates [44]. According to Capdevila et al. [38] and Pineault & Wellik [45], as development progresses, tight spatial and temporal control of gene expression and cellular behavior sculpts the developing embryo, adding specific morphological and functional characteristics that determine the adult animal's lifestyle and functionality.
The GLI3 gene was identified under the same HOX gene group to the regulation anterior/posterior pattern specification. Exploring this information, we find that GLI3 is a transcriptional activator and a repressor of the sonic hedgehog pathway, and also plays an important role in limb development, being described in the literature as an embryonic patterning of human limbs and other structures [46]. In addition, the relationship between the HOX genes and limb musculoskeletal development has been well described. Pineault & Wellik [45] suggested that the integration of the musculoskeletal system is regulated in part by HOX function in the stromal connective tissue, and plays critical roles in skeletal patterning throughout the axial and appendicular skeleton. Evidence to support these genes as possibly regulating limb formation and other processes associated with the locomotor system was reported by Grilz-Seger et al. [47] and Beeson et al. [48], who found several GO terms shared by more than one breed when studying a set of European and Near Eastern horse breeds; high significance levels were reached for the GO terms “anterior/posterior pattern specification” (GO:0009952), “embryonic skeletal system morphogenesis” (GO:0048704), and “sequence-specific DNA binding” (GO:0043565), mainly based upon the HOXB-cluster in the breeds Gidran, Lipizzan, Posavina, and Noriker.
Other major signals in the present study were found for the CCL19 and MAP3K6 genes enriched for the activation of JUN kinase (JNK) activity. Exercise stimulates c-Jun NH2 Kinase Activity and c-Jun transcriptional activity in human skeletal muscle, showing that the JNK pathway may serve as a link between contractile activity and transcriptional responses in skeletal muscle [49]. Exercise causes selective changes over gene expression, leading to differentiation in skeletal muscle structure and function, which provides strong evidence that this regulation may be associated with gait type segregation in the skeletal muscle on limbs. The effect of activity during exercise in c-jun mRNA expression is via the phosphorylation of two serine residues through the JNKs in the c-Jun transactivation domain, leading to an increase in transcriptional activity [49].
As is well known in the modern horse, athletic performance has been the target of selection in recent years for many breeds. Increasingly, a perfect horse is being idealized in countless sporting modalities. Indeed, candidate genes were highlighted under important biological functions related to exercise physiology, energy mechanisms, catabolic processes, morphogenesis (bone, skeletal system and cartilage), and fertility. However, these genes/functions do not act alone in the MM performance. As observed in the network analysis, gene functions are dependent, with the major part of them being regulated in sets.
The interpretation of the network analysis is that most candidate genes, either core gene or peripheral genes, are interconnected. Any peripheral gene is likely to affect the regulation or function of a hub gene. An explanation for the high interconnection in networks is that kinds of networks have structures consisting of distinct modules of connected nodes, but also frequent long-range connections. Under these conditions, any two nodes in the graph are usually connected by just a few steps [50].
Overall, the application of classical and recent techniques in genomics has successfully permitted the identification of several putative selection signatures in the MM population. Based on our discussions, gait may have a polygenic basis and is influenced by many genetic components. Further exploration would be strengthened by searching for signatures of selection by comparing the MM to a non-gaited breed. This method could then be compared to the regions found within the breed and would clarify whether these signatures are unique to the breed (or the gait) rather than being general signatures of selection in horses, or if they could potentially detect new genetic bases of gait in the MM. Among the biological processes, genes of biological interest such as the HOX gene family were enriched in the ontology corresponding to “anterior/posterior pattern specification”. Biological processes related to limb morphogenesis, the skeletal system, proximal/distal pattern formation, JUN kinase activity (CCL19 and MAP3K6), and muscle stretch response (MAPK14), among others, were reported. Finally, identifying genes and pathways that drive phenotypes is still a challenge; here, we pinpoint some important genes and gene pathways involved in complex selective processes that could be useful in other studies and for the genetic improvement of this breed.