Ethics statement
The animals and experimental methods used in this study are following the guidelines of the Ministry of Agriculture of China and Use Committee of South China Agricultural University (SCAU). The ethics committee of SCAU (Guangzhou, China) approved all of the animal experiments.
Samples and phenotype data
Our sample includes American (n=3,770) and Canadian (n=2,857) origin Duroc pigs collected from the Wens Foodstuff Group Co., Ltd. (Guangdong, China). In addition, data such as ADG (average daily gain at 100 kg), AGE (days to 100 kg), and BTF (backfat thickness at 100kg) were obtained from each population. During the experiment, all the pigs were raised under normal management conditions and the same feeding standards. The ADG and AGE of each pig were measured from 30 to 115 kg and then adjusted to 100 kg. The adjusted formula of AGE are as follows:
where correction factors one is different between sire and dam, and the formulas are as follows:
The following formula for adjusted ADG:
The phenotype of BTF was measured from the 10th-rib to 11th-rib at approximately 100 kg using an Aloka 500V SSD B ultrasound (Corometrics Medical Systems, USA) [25], and the adjusted formula of BFT is as follows:
SNP genotyping and quality control
Genomic DNA was extracted from ear tissue with the traditional phenol/chloroform method, the DNA quality of samples was assessed by light absorption ratio (A260/280 and A260/230) and gel electrophoresis, the concentration was controlled to be 50 ng/μL [26]. Samples were genotyped using Illumina GeneSeek 50K SNP array (Neogen, Lincoln, NE, United States) with 50,703 SNP markers across the entire genome. The quality control (QC) were performed with PLINK v1.90 software [27], the parameters of QC include: animal call rates > 0.9, SNP call rates > 0.9, minor allele frequencies > 0.005 and P > 10-6 for Hardy-Weinberg equilibrium test. The SNPs located in sex chromosomes or without positional information were also discarded. Finally, a set of 39,600 informative SNPs from 3,770 American origin Duroc pigs and 36,700 informative SNPs from 2,857 Canadian origin Duroc pigs were used for CNVs detection, respectively.
CNVs detection
PennCNV software [28] was used to identify individual-baed CNV by combining the SNP signal data of log R Ratio (LRR) and B allele frequency (BAF) as well as the population frequency of B allele (PFB). Among them, the LRR and BAF values were computed by GenomeStudio software v2.0 (Illumina, Inc., USA) for each SNP. The perl comppile_pfb.pl command was used to calculate the PFB files based on the BAF of each SNP. Moreover, the wave adjustment procedure was conducted using the -gcmodel option in PennCNV software to reduce the impact of genomic waves [29]. We calculated the GC content of the 500kb genomic region around each SNP according to the Sscrofa 11.1 version of the pig reference genome (http://ensemble.org/Sus_scrofa/Info/Index). The final CNVs were generated by retaining high-quality samples according to the following criteria: LRR < 0.3, BAF drift < 0.01, and GC wave factor of LRR < 0.05. Meanwhile, to further decrease false-positive CNVs, CNVs with consecutive SNPs ≥ 3 and CNV length ≥ 10 kb were remained. We also used the bedtools software v2.26.0 [30, 31] to merge CNVs with at least 1bp overlap in all samples to define CNV region (CNVR) [17], and CNVRuler software [32] was used to define three types of CNVR: loss, gain and mixed (gain and loss occur in the same region). Besides, we matched the CNVs of respective populations with the corresponding CNVR to obtain the CNVRs of each Duroc pig population. In other words, the CNVRs with full coverage CNV sequences were considered as the population-based CNVRs. Finally, a set of 682 CNVRs mapped in American Duroc pigs and 424 CNVRs mapped in Canadian Duroc pigs were used for subsequent analyses.
Quantitative PCR validation
We chose Quantitative PCR (qPCR) to validate the CNVRs detected by PennCNV software. A total of 9 CNVRs was randomly selected according to the CNVR type (loss, gain, and mixed) and frequency in the population. Due to the uncertain boundary of the identified CNVRs, we used Oligo 7 software [33] to design primers for the specific region of ADGRA1, PUSL1, MAPRE2, SGMS2, PCID2, DSCAM, GATD3A, ELFN1 and LIFR genes (see Additional file 4: Table S4). We also selected the GCG gene as the reference locus since this gene was highly conserved among pigs and existed as a single copy in animals [17, 34, 35]. Normal samples identified with no copy number change in the test region were used as a reference. qPCR was conducted on a LightCycler 480 Real-Time PCR System (Roche, Basel, Switzerland). The PCR reaction was performed with a total volume of 10 μl consisting of the following reagents: 1 μl DNA (50 ng/μl), 0.3 μl of both forward primer and reverse primer, 5 μl Blue-SYBR-Green mix (2×) and 3.4 μl water. The PCR conditions were as follows: 10 min at 95 ℃ followed by 40 cycles for 10 s, 60 ℃ for 15 s and 72 ℃ for 20 s. All reactions were carried out on 384-well clear reaction plates and each of sample was performed in triplicate, with average Ct values calculated for further copy number determination. The relative copy number difference of the test region was determined by 2 × 2-ΔΔCt. Where ΔΔCt = [(mean Ct of the target gene in test sample) - (mean Ct of GCG in test sample)] - [(mean Ct of the target gene in reference sample) - (mean Ct of GCG in reference sample)] [36]. The values around 2 were considered normal. A value of 3 or more and a value of 1 or less represent the status of gain and loss, respectively.
CNVR genotyping and association analysis
In order to provide the required input for association analysis, a specific genotyping was necessary. We genotyped CNVR type according to the CN value. For instance, A/A for copy deletion (CN = 0,1), A/G for normal copy (CN=2) and G/G for copy increase (CN=3,4).
In this study, a linear model was used to conduct association analysis between CNVR with frequencies large than 0.5% and single trait in each population, including fixed effects (overall mean, covariate) and random effects (residual effect). The analysis was completed by GEMMA software [37]. The linear model used was as follows:
where Yijkl is the corrected phenotypic value for each population; μ is the overall mean; Si is the fixed effects of gender; Pj is the fixed effects of parity; Wk is the fixed effects of birth weight; Gl is the CNVR genotypic effect; eijkl is the random residuals corresponding to the observed values of the trait. We applied the false discovery rate (FDR) to determine the significance threshold according to the previous studies [38, 39]. P-value was defined as follows:
where FDR was set as 0.05, N represents the CNVRs number of P < 0.05 in the association results, and M is the total number of CNVRs.
CNVR overlapped with relevant QTLs
All QTL data based on Sscrofa 11.1 version were downloaded from the pig QTL database [40, 41]. We retrieved autosomal QTL regions associated with the ADG, AGE, and BFT traits from pigQTLdb, respectively. Significant CNVRs assessed in the corresponding traits were used to map QTLs.
Functional Genes annotation
The physical position information was obtained from the Sscrofa 11.1 version of the pig reference genome. Genes that overlapped or adjacent to the significant CNVRs were selected for the KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways and GO (Gene Ontology) analysis using KOBAS v3.0 [42]. The statistical method used in the enrichment analysis was Fisher’s exact test with a significant threshold of P < 0.05. Also, GeneCards (http://www.genecards.org/) and Ensembl (www.ensembl.org/biomart/martview) were used to query gene functions.