Animals and data
Animals were selected using two-tailed strategy proposed by Jiménez-Montero et al. [33] based on the estimated breeding values (EBVs) for milk yield because there is a high negative genetic correlation (-0.68) between milk yield EBVs and methane intensity [e.g. 31], which was almost confirmed by our data set in which the phenotype correlation between milk yield EBVs and predicted methane emission was estimated to be -0.47. Hair and rumen digesta samples were collected from 150 Iranian Holstein cattle in a breeding population (Ferdous Pars Dairy Farm, Isfahan, Iran) in May 2016. Estimation of breeding values were performed by National Animal Breeding Centre of Iran (Karaj, Iran) using lactation model following the Equation 1 [34]:
yij = 𝜇 + hysi + aij + eij Equation 1,
where yij is milk yield (adjusted to 305 days and twice a day milking); μ is the population mean; hysi is the effect of herd-year-season group i; aij is the animal breeding for jth animal and ith herd-year-season group, and eij is the random residual effects. The mean of the accuracy of estimated breeding values for milk yield was 0.61 [34].
The sampled animals were progeny of 42 sires and 150 dams and were born from 2011 to 2013. All animals were fed the same diet of 57.1% concentrate (barley, corn, soybean meal, lime, fish meal, meat meal, salt, fat meal, bentonite, soybean, biosaf, mineral supplement and magnesium oxide) and 42.9% forage (alfalfa, straw and corn silage) for at least two months before the sample collection. Animals were fed four times per day (morning, noon, evening and night) and had ad libitum access to water. The diet was a total mixed ration with 49% of dry matter (DM), 16.5% crude protein (60% rumen degradable protein and 40% rumen undegradable protein), 29% neutral detergent fiber, 21% acid detergent fiber, 4%lignin, 7% ash and net energy for lactation of 1.77 Mcal per kg of DM. The average amount of dry matter intake was 27.3 kg per cow per day.
The rumen digesta samples were shipped to the University of Tehran’s Animal Nutrition Laboratory (Karaj, Iran) and VFA measurements were performed on these samples using the method proposed by Ottenstein and Bartley [35].
The methane emission was predicted using the Equation 2 [36]:
CH4(ml) = 22.4×(0.5×Ac−0.25×Pr+0.5×Bu) Equation 2,
where CH4 is the PME, and Ac, Pr, and Bu are the concentration of acetic acid (mM), propionic acid (mM), and butyric acid (mM), respectively.
The traits include PME (ml), PME per kg milk (ml), PME per kg fat (ml), acetic acid (%), propionic acid (%), butyric acid (%), valeric acid (%) and isovaleric acid (%). The ratio of PME per cow per day to the milk or fat products per cow per day was used to estimate PME per kg product. After removing the outlier values (observations more than 3 standard deviations away from the mean), 147 samples for valeric acid and isovaleric acid, and 146 samples for the rest of the traits were remained for GWAS analyses. In our study, acetic acid (%), propionic acid (%) and butyric acid (%) had high correlation with PME (0.85, -0.89 and -0.14, respectively) and PME per kg of milk (0.60, -0.62 and -0.09, respectively) and low correlation with PME per kg of fat (0.17, -0.17 and -0.11, respectively).
Genotypic data
A total of 150 cows were genotyped using the GGP-LD v4 SNP panel consists of 30,108 SNPs following the manufacturer’s protocol (GeneSeek, Nebraska, United States). The SNPs were removed from the genotyping data for call rate < 95%, minor allele frequency (MAF) < 0.05, and Chi-square < 10-6 of Hardy-Weinberg equilibrium test. A total of 23,835 SNPs were retained in the analysis after the filtering. Given the markers density largely affects the power of QTL detection, imputation from BovineSNP30 panel to sequence data was carried out using stepwise imputation from the BovineSNP30 to the BovineHD beadchip (578,505 SNPs) and to sequence data (12,063,146 SNPs) [37]. The R2 was between 0.85 and 0.93 for imputation of high density to sequence, and between 0.65 and 0.84 for imputation of 54k to sequence [38]. Furthermore, Van Binsbergen et al. [37] could improve the mean accuracy of imputation from 0.37 to 0.65 using the stepwise imputation (BovineSNP50 (3,132 SNPs) to BovineHD (40,492 SNPs) and then to sequence data (1,737,471 SNPs) using chromosome 1 data). In the present study, stepwise imputation was carried out using Minimac3 software [39].
The 1000 Bull Genomes Project database including 129, 43, 15, and 47 key ancestors from global Holstein-Friesian, Fleckvieh, Jersey, and Angus breeds, respectively was used for imputation [10]. The same quality control was performed on sequence data, and a total of 12,063,146 SNPs were retained for the analysis after filtering. Moreover, Eagle (version 2.3) was used to phase genotypes before using Minimac3 for reference and target populations separately [40]. Finally, the imputed genotypes with accuracy lower than 0.30 were removed [41] and a total of 6,583,595 SNPs were retained for further analysis.
Analysis of fixed effects
Data were analyzed by the least squares analysis of variance using the general linear model procedure of the SAS 9.0 (SAS, Institute Inc., Cary, NC). For analyzing PME, PME per kg milk, PME per kg fat, acetic acid, propionic acid, butyric acid, valeric acid and isovaleric acid, the contemporary group (11 levels) and age were fitted to the model as fixed effect and covariate factor, respectively.
Genome-wide association studies
Association studies between the imputed genotypes and the phenotypes—PME (ml); PME (ml) per kg milk; PME (ml) per kg fat; and VFAs—were performed using EMMAX [42]. EMMAX simultaneously adjusted the tests for both population stratification and relatedness in the association study. The model used for GWAS is shown in equation 3:
y = Xβ + Zu + e Equation 3,
where y is the vector of phenotypic values; X is the design matrix relating phenotypes to their corresponding fixed effects; β is the vector of fixed effects (contemporary group, age, and one SNP at a time); Z is the design matrix relating the phenotypes to their corresponding random polygenic effects; u is the vector of random polygenic effects; and e is the vector of random residual effects. Further, we assumed that u ~ N(0, and e~ N(0, , where I is an identity matrix, and K is the kinship matrix constructed from sequence data.
Significance levels
The following methods were used to determine the significance threshold of SNPs: 1) significance threshold of p < 5 × 10−8 as proposed by Reed et al. [43]; 2) genome-wide Bonferroni correction with significance threshold of p < 7.59 ×10-9 = p < 0.05 / total number of SNPs [44]; and 3) chromosome-wide Bonferroni correction with significance threshold of p < 0.05 / total number of SNPs on each chromosome [45]. The latter was used because Bonferroni correction would ignore the linkage between SNPs leading to conservative correction and high false-negative rate [15].
Post-GWAS bioinformatics analysis
Ensembl annotation of UMD3.1 genome version (http://www.ensembl.org/biomart/martview) was used to identify the candidate genes and then human genes orthologous in Ensemble BioMart were identified using the gene identifiers. The DAVID Bioinformatics Resources version 6.7 (http://david.abcc.ncifcrf.gov) was used to carry out gene ontology (GO) analysis. Finally, the gene networks were drawn by GeneMANIA webserver (http://genemania.org/).
Annotating the discovered QTL with previously reported QTL for other traits
The cattle QTLdb (https://www.animalgenome.org/cgi-bin/QTLdb/BT/index) was used to annotate QTLs within 1 Mb of the significant variants associated with the studied traits.