2.1 Study Design
This study used a two-sample Mendelian randomization (MR) approach to investigate the causal relationship between vitamin E and degenerative musculoskeletal diseases (osteoarthritis, osteoporosis, muscle loss, and intervertebral disc degeneration). Vitamin E was the exposure, and the four degenerative musculoskeletal diseases were the outcomes. Figure 1 outlines the study design and the hypotheses of the MR study [24].
2.2 Data Sources
The bioinformatics databases used in this study were the IEU OpenGWAS database (which hosts UK Biobank-derived summary statistics), the Finnish FinnGen Biobank, and the PhenoScanner database. Exposure data for vitamin E were obtained from the IEU OpenGWAS database (dataset ID ukb-b-12506), comprising 13,548 participants and 9,851,867 SNPs. Outcome data for the degenerative musculoskeletal diseases (osteoarthritis, osteoporosis, muscle loss, and intervertebral disc degeneration) were obtained from the IEU OpenGWAS database and the FinnGen database; see Table 1 for details. All study populations were of European ancestry, which reduces bias from ancestry-related confounding (population stratification).
Table 1 A detailed description of the GWAS data involved in this study.
2.3 Instrumental Variable Selection
Valid instrumental variables (IVs) must satisfy three core assumptions: 1) relevance: the IV is strongly associated with the exposure; 2) independence: the IV is not associated with confounders of the exposure-outcome relationship; and 3) exclusion restriction: the IV affects the outcome only through the exposure. Using a significance threshold of P < 1 × 10⁻⁶, statistically significant SNP loci were selected as initial IVs from the genetic data on vitamin E. SNPs were clumped with a linkage disequilibrium threshold of r² = 0.001 within a 10,000 kb window to ensure that the selected SNPs were independent and that linkage disequilibrium did not bias the results. The PhenoScanner V2 database was used to identify and remove SNPs associated with potential confounders (e.g., smoking, alcohol consumption, obesity) to meet the second assumption. Additionally, SNPs directly associated with the outcome (P < 1 × 10⁻⁶) were excluded. Finally, the strength of each instrumental variable was quantified using the F-statistic, calculated as (beta/se)². An F-statistic greater than 10 indicates a low likelihood of weak instrumental variable bias [26].
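The threshold and F-statistic steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study (the analyses were run in R), and it omits linkage disequilibrium clumping and the PhenoScanner lookup. The `Snp` class, the `select_instruments` helper, and all rsIDs and values are hypothetical and serve only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    rsid: str
    beta: float  # SNP-exposure effect estimate
    se: float    # standard error of beta
    pval: float  # P value of the SNP-exposure association


def f_statistic(snp: Snp) -> float:
    """Per-SNP instrument strength, F = (beta / se)^2."""
    return (snp.beta / snp.se) ** 2


def select_instruments(snps, p_threshold=1e-6, f_min=10.0):
    """Keep SNPs that pass the P-value threshold and have F > f_min."""
    return [s for s in snps if s.pval < p_threshold and f_statistic(s) > f_min]


# Hypothetical summary statistics, for illustration only.
candidates = [
    Snp("rs_example_1", beta=0.050, se=0.010, pval=5.7e-7),  # passes both filters
    Snp("rs_example_2", beta=0.036, se=0.010, pval=3.2e-4),  # fails the P threshold
    Snp("rs_example_3", beta=0.030, se=0.010, pval=2.7e-3),  # fails both filters
]

selected = select_instruments(candidates)
for snp in selected:
    print(snp.rsid, round(f_statistic(snp), 1))
```

Only the first hypothetical SNP survives both filters in this toy input.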
2.4 MR Analysis
The inverse variance weighted fixed-effects model (IVW-FE) was used as the main MR analysis method. IVW-FE is the most statistically efficient method when there is no genetic pleiotropy, that is, when the SNPs selected as instrumental variables do not affect the outcome through any pathway other than the exposure. Additionally, MR-Egger regression, weighted median, weighted mode, and simple mode were used to validate the results and test their stability. Of the median-based estimators, the simple median requires at least 50% of the genetic variants to be valid instrumental variables, while the weighted median requires at least 50% of the weight to come from valid instrumental variables. MR-Egger regression relaxes the IVW requirement of no genetic pleiotropy: it allows the instrumental variables to affect the outcome partly or wholly through pathways other than the exposure [25,27].
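As an illustration of the main estimator, the following Python sketch computes a fixed-effects IVW estimate as the inverse-variance weighted mean of per-SNP Wald ratios, using first-order weights. It is a simplified stand-in for the R implementation used in the study, not a reproduction of it; the function name and the input values are hypothetical.

```python
import math


def ivw_fixed_effects(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effects IVW estimate from summary statistics.

    Each SNP j contributes a Wald ratio by_j / bx_j with first-order
    weight w_j = bx_j^2 / se_j^2 (se_j is the SNP-outcome standard error).
    Returns (estimate, standard_error).
    """
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total_weight = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total_weight
    standard_error = math.sqrt(1.0 / total_weight)
    return estimate, standard_error


# Hypothetical SNP-exposure and SNP-outcome effects, for illustration only.
bx = [0.10, 0.20, 0.15]
by = [0.02, 0.04, 0.03]
se_y = [0.01, 0.01, 0.01]

estimate, standard_error = ivw_fixed_effects(bx, by, se_y)
print(estimate, standard_error)
```

In this toy input every Wald ratio is identical, so the pooled estimate equals that common ratio; real data would show scatter around the pooled value.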
2.5 Sensitivity Analysis
Sensitivity analysis included calculating the F-statistic for each SNP to assess the strength of the association between the instrumental variables and the exposure, where F = (beta/se)², with beta being the allele effect estimate and se its standard error. An F-value less than 10 indicates potential weak instrumental variable bias, and the corresponding SNP should be removed. Cochran's Q test was used to assess heterogeneity among the instrumental variables, with a P-value > 0.05 indicating no significant heterogeneity. The MR-Egger intercept test was used to assess horizontal pleiotropy, with a statistically significant intercept indicating the presence of horizontal pleiotropy. The Mendelian randomization pleiotropy residual sum and outlier (MR-PRESSO) method was used to detect outliers and, where present, correct for them. The "leave-one-out" sensitivity analysis was conducted by removing one SNP at a time to assess the impact of each genetic variant on the overall causal effect [25,28].
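Two of these checks, Cochran's Q and the leave-one-out analysis, can be sketched in Python as follows. This is a simplified illustration under the assumption of first-order IVW weights, not the R code used in the study; it returns Q and its degrees of freedom without the chi-square P value, and it does not cover the MR-Egger intercept test or MR-PRESSO. All function names and values are hypothetical.

```python
def ivw_estimate(bx, by, se_y):
    """Fixed-effects IVW estimate with first-order weights bx^2 / se_y^2."""
    weights = [x ** 2 / s ** 2 for x, s in zip(bx, se_y)]
    ratios = [y / x for x, y in zip(bx, by)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)


def cochran_q(bx, by, se_y):
    """Cochran's Q: weighted squared deviations of Wald ratios from the IVW estimate.

    Returns (Q, degrees_of_freedom); under homogeneity Q follows a
    chi-square distribution with (number of SNPs - 1) degrees of freedom.
    """
    pooled = ivw_estimate(bx, by, se_y)
    q = sum(
        (x ** 2 / s ** 2) * (y / x - pooled) ** 2
        for x, y, s in zip(bx, by, se_y)
    )
    return q, len(bx) - 1


def leave_one_out(bx, by, se_y):
    """Recompute the IVW estimate with each SNP removed in turn."""
    estimates = []
    for i in range(len(bx)):
        keep = [j for j in range(len(bx)) if j != i]
        estimates.append(
            ivw_estimate(
                [bx[j] for j in keep],
                [by[j] for j in keep],
                [se_y[j] for j in keep],
            )
        )
    return estimates


# Hypothetical data in which the third SNP has a discordant Wald ratio.
bx = [0.10, 0.20, 0.15]
by = [0.02, 0.04, 0.06]
se_y = [0.01, 0.01, 0.01]

q, df = cochran_q(bx, by, se_y)
loo = leave_one_out(bx, by, se_y)
print(q, df)
print(loo)
```

A leave-one-out estimate that shifts markedly when one SNP is dropped, as happens for the third SNP in this toy input, flags that variant as influential.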
2.6 Statistical Methods
All statistical analyses were performed using the "TwoSampleMR" package in R (version 4.2.2). Results were presented as odds ratios (OR) with 95% confidence intervals (CI). The significance level was set at α = 0.05.
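For a binary outcome, an MR estimate on the log-odds scale is converted to an OR by exponentiation. The Python sketch below shows that conversion together with a normal-approximation confidence interval and two-sided P value. It assumes the estimate and its standard error are on the log-odds scale; the function name and the example values are hypothetical and are not results from this study.

```python
import math
from statistics import NormalDist


def odds_ratio_summary(beta, se, alpha=0.05):
    """Convert a log-odds estimate to an OR with a (1 - alpha) confidence interval."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z = beta / se
    p_value = 2 * (1 - normal.cdf(abs(z)))  # two-sided, normal approximation
    return {
        "or": math.exp(beta),
        "ci_low": math.exp(beta - z_crit * se),
        "ci_high": math.exp(beta + z_crit * se),
        "p": p_value,
        "significant": p_value < alpha,
    }


# Hypothetical log-odds estimate and standard error, for illustration only.
summary = odds_ratio_summary(beta=-0.105, se=0.050)
print(summary)
```

With this normal approximation, the confidence interval excludes 1 exactly when the two-sided P value falls below α.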