The Min is a native Chinese pig breed with an average IMF of > 4.0%, and the Large White is a commercial pig breed with an average IMF of ≤ 2.00%. The Large White × Min F2 segregating population is therefore well suited to investigating candidate genes or QTLs for IMF. In this study, we first used NGS data from the F2 resource population for CNV calling and genotyping, and then performed a CNV-based GWAS to identify candidate CNVs. CNV calling from NGS data relies mainly on four approaches: paired-end mapping, split reads, sequence assembly, and read depth (RD) [16]. We chose CNVcaller [17], which uses the RD method. CNVcaller mitigates the influence of the high proportions of gaps and misassembled duplications in nonhuman reference genome assemblies on CNV calling and genotyping, and was therefore suitable for our population. A total of 1185 CNVRs were identified, fewer than in some other pig CNVR studies. For example, Zheng et al. detected 12,668 CNVRs in 32 Meishan pigs and 29 Duroc pigs [18], and Wu et al. identified 18,687 CNVs in Tongcheng and Large White pigs [19]. The smaller number in our study may be due to the strict CNV definition we used (silhouette coefficient > 0.6). qPCR validation confirmed all of the selected CNVRs.
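To make the RD principle and the silhouette threshold concrete, the following is a minimal sketch and not CNVcaller's implementation; the actual software performs additional steps that are not reproduced here. The function names, the diploid baseline, and the toy depth values are assumptions for illustration only; the 0.6 cutoff is the one stated above.

```python
from statistics import mean, median

def copy_number_from_depth(window_depths, region_depth, ploidy=2):
    """Estimate copy number as ploidy * (region RD / median RD of all windows).

    Hypothetical helper: assumes the median window represents the normal
    diploid state and ignores GC or mappability corrections.
    """
    baseline = median(window_depths)
    return ploidy * region_depth / baseline

def silhouette_1d(values, labels):
    """Mean silhouette coefficient for one-dimensional values and cluster labels."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's distance to itself is 0, so dividing by n - 1 excludes it)
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            mean([abs(v - u) for u in other])
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)

# Toy per-individual copy numbers for one candidate CNVR (made-up values).
copy_numbers = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9, 3.0, 3.1, 2.9]
genotype_clusters = [1, 1, 1, 2, 2, 2, 3, 3, 3]
keep_cnvr = silhouette_1d(copy_numbers, genotype_clusters) > 0.6
```

A region whose per-individual copy numbers fall into well-separated clusters scores close to 1 and is retained, whereas overlapping clusters score near or below 0 and are discarded, which is one reason a strict cutoff yields fewer CNVRs.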
In the GWAS analysis, a total of 19 genome-wide significant CNVRs were identified as being related to IMF. Among the known genes in which significant CNVRs were located, PELP1 is an ESR coregulator protein [20] and has been reported to be associated with sperm morphology abnormalities in pigs [21]. Duplication CNVs in the PELP1 gene have also been reported in humans and great apes [22, 23]. Spermatogenesis associated 22 (SPATA22) is a sex-related gene, and homozygous variants of it have been associated with infertility and related traits [24]. SH3 domain-binding protein 4 (SH3BP4) is a negative regulator of amino acid-Rag GTPase-mTORC1 signaling and is related to diabetic retinopathy [25, 26]. The FCH and mu domain containing endocytic adaptor 2 (FCHO2) protein participates in the early step of clathrin-mediated endocytosis and has lipid-binding activity [27]. The other known genes, namely LOC100524322, LOC100524156, and R-SSC-381753, encode olfactory receptors (ORs), and ORs have been reported to be related to IMF or insulin resistance in previous research [28–31].
To study the function of the CNVRs located in intergenic regions or in novel genes, we performed QTL co-location analysis and found that 3 of the 19 CNVRs were located within reported IMF-associated QTL regions. We infer that these QTLs may affect IMF through structural variation.
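At its core, a co-location analysis of this kind is an interval-overlap test on the same chromosome. The sketch below illustrates that idea under stated assumptions: the record layout, the function name, and every coordinate and identifier in the example are hypothetical and are not taken from the study.

```python
def colocated_cnvrs(cnvrs, qtls):
    """Return {cnvr_name: [qtl_name, ...]} for CNVRs overlapping at least one QTL.

    Both inputs are lists of (name, chromosome, start, end) tuples with
    inclusive coordinates on the same reference assembly.
    """
    hits = {}
    for cnvr_name, chrom, start, end in cnvrs:
        matched = [
            qtl_name
            for qtl_name, qtl_chrom, qtl_start, qtl_end in qtls
            # two closed intervals overlap when each starts before the other ends
            if qtl_chrom == chrom and start <= qtl_end and qtl_start <= end
        ]
        if matched:
            hits[cnvr_name] = matched
    return hits

# Hypothetical records for illustration only.
example_cnvrs = [
    ("CNVR_A", "1", 1_000_000, 1_050_000),
    ("CNVR_B", "2", 5_000_000, 5_020_000),
    ("CNVR_C", "1", 9_000_000, 9_010_000),
]
example_qtls = [
    ("IMF_QTL_1", "1", 900_000, 1_200_000),
    ("IMF_QTL_2", "2", 6_000_000, 7_000_000),
]
example_hits = colocated_cnvrs(example_cnvrs, example_qtls)
```

Only CNVR_A is reported here, because CNVR_B lies outside the QTL on its chromosome and CNVR_C shares a chromosome with IMF_QTL_1 but not a position. For genome-scale inputs an interval tree or a sorted sweep would scale better than this pairwise comparison.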
As CNVRs usually act through regulatory or dosage effects, we analyzed the RNA expression profiles of individuals with different CNVR dosages. Interestingly, we found that one of the PELP1 alternative splicing transcripts (ASs), ENSSSCT00000019597, was significantly differentially expressed in individuals carrying the CNV150 variant. We then validated this differential expression using qPCR, and the results confirmed it. Hence, we inferred that CNV150 may affect PELP1 alternative splicing.
To confirm the function of CNV150, we analyzed the read depth of CNV150 in F0-generation individuals and found that its copy number was normal in Min pigs and duplicated in Large White pigs. As shown in Table 1, this CNVR has a negative effect on IMF, and Min pigs and Large White pigs are high- and low-IMF breeds, respectively. These observations are consistent with one another: the duplication occurs in the low-IMF breed. We then further studied PELP1 and its ASs.
First, we studied whether PELP1 directly or indirectly affects IMF. In the PPI networks, about half of the proteins had been reported as related to IMF or insulin resistance. Among these genes or proteins, AR and ESR1 can regulate leptin transcript accumulation and protein secretion in adipocytes [32]. The NR3C1 transcription factor has been identified as a potential regulator co-localizing with QTLs for fatness and growth traits [33]. NR4A1 can affect insulin resistance and downregulate intramuscular lipid content [34]. RB1 is directly involved in interactions related to adipogenesis [35]. RPL11 has been shown to play a role in fat storage [36]. SRC and STAT3 can respond to adipogenesis through the TNF-α pathway [37]. We inferred that PELP1 may influence IMF by interacting with other proteins.
In previous research, the regions of PELP1 that interact with ESR, AR, GR, RB, and STAT3 were reported as amino acids 1–400, the LXXLL motifs, amino acids 1–600, or amino acids 1–330 [38]. Hence, we then studied whether the ASs alter the 3D structure of the PELP1 protein and thereby the interaction between PELP1 and its interacting proteins. The structures of the two proteins encoded by the PELP1 ASs were predicted using AlphaFold2, which has been used to predict the structures of many difficult protein targets at or near experimental resolution [22], so our predictions are likely to be reliable. The results indicated that, in the variant region at amino acids 83–105, a helix was unfolded in F1RFT3 (encoded by ENSSSCT00000019597). In A0A5G2R420 (encoded by ENSSSCT00000075280), the structure in the variant region was predicted with very high confidence (predicted LDDT (pLDDT) > 90). Moreover, this helix lies between two LXXLL motifs (located at amino acids 69–74 and 111–116). We inferred that the structural changes potentially caused by CNV150 may affect the interaction of PELP1 with its interacting proteins. However, the function of CNV150 and the molecular regulatory mechanisms linking it to IMF content require further experimental research, such as gene knockdown or editing and co-immunoprecipitation.
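Locating LXXLL motifs and checking whether a variant region lies between two of them can be sketched as follows. This is only an illustration of the reasoning above, not the procedure used in the study: the function names are hypothetical and the toy sequence is invented, while the coordinates in the final check are the ones reported above.

```python
import re

def find_lxxll_motifs(sequence):
    """Return (start, end, motif) for every LXXLL match, using 1-based inclusive positions.

    The lookahead lets overlapping motifs be reported as well.
    """
    pattern = re.compile(r"(?=(L..LL))")
    return [
        (m.start() + 1, m.start() + 5, m.group(1))
        for m in pattern.finditer(sequence.upper())
    ]

def region_between(first_motif, second_motif, region):
    """True if region (start, end) lies strictly between two motif intervals."""
    return first_motif[1] < region[0] and region[1] < second_motif[0]

# Invented toy peptide, not the PELP1 sequence.
toy_motifs = find_lxxll_motifs("MAALSALLGGLKKLLA")

# Coordinates as reported in the text: motifs at 69-74 and 111-116, variant region 83-105.
helix_is_flanked = region_between((69, 74), (111, 116), (83, 105))
```

Running the same scan on the two isoform sequences would show whether the alternative splicing event itself adds or removes a motif, as opposed to changing only the structure between existing motifs.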