Observational research design and data sources
Data from the NHANES collected from 1999-2004 and 2009-2010 were used for this cross-sectional research. The primary aim of the NHANES initiative is to evaluate the health and nutritional status of noninstitutionalized Americans via stratified multistage probability surveys [30]. Data for this study were accessed from the NHANES website (http://www.cdc.gov/nchs/nhanes.htm) on August 10, 2022. Data from four NHANES cycles were analysed on the basis of LBP and statin use. Initially, 41,663 participants aged 20–80 years were considered for enrolment. Pregnant women (n=2,882), participants without statin information (n=32,894), and those without LBP information (n=1,532) were excluded, resulting in a final sample of 4,343 participants for follow-up analyses. The detailed exclusion process is illustrated in Figure 1.
Fig. 1 NHANES 1999-2004 and 2009-2010 sample selection flowchart. Abbreviations: NHANES, National Health and Nutrition Examination Survey
Low back pain and statin usage status
LBP was categorized as a binary variable. Participants were considered to have LBP if they answered "yes" to the question "Have you experienced LBP in the past three months?" Statin usage was categorized into "taking statins" (yes) and "not taking statins" (no).
Covariables of interest
In addition to basic demographic data such as age, sex, and race, the analysis included clinical covariates related to LBP, such as body mass index (BMI), TG levels, and cholesterol levels. The racial categories in this cross-sectional study included "Mexican American," "Non-Hispanic Black," "Non-Hispanic White," "Other Hispanic," "Non-Hispanic," and "Other race - including Multi-Racial." BMI was divided into four categories: "underweight" (< 18 kg/m²), "normal weight" (18–24 kg/m²), "overweight" (24–28 kg/m²), and "obese" (> 28 kg/m²). Both univariate and multivariate analyses were performed to assess the impact of statin usage on the incidence of LBP. The univariate analysis offered an initial understanding of the relationship between statin use and LBP incidence. The multivariate analysis included covariates that significantly differed between individuals with and without LBP to mitigate potential biases affecting the final results.
Statistical analysis
In the NHANES study, interview weights were applied to appropriately weight the data. Categorical variables are presented as proportions (%), whereas continuous variables are summarized as the means ± standard deviations (SDs) or medians with interquartile ranges (IQRs). The Wilcoxon rank-sum test was used to compare differences among groups in the complex survey sample for continuous variables, and the chi-square test with Rao‒Scott second-order correction was used for categorical variables. Logistic regression, adjusted for the complex survey design, was used to calculate ORs and 95% confidence intervals (95% CIs) to assess the associations between covariates and LBP. The univariate analysis compared the LBP and non-LBP groups via the chi-square test, with "taking statins" as the reference group. The multivariate analysis employed weighted logistic regression, adjusting for covariates to evaluate the impact of statin use on LBP risk. Model 1 was the univariate analysis model. Model 2 was adjusted for BMI and HDL-C, TC, and LDL-C concentrations. Model 3 included additional adjustments for sex and age. All data processing and analyses were performed via R version 4.2.3.
Mendelian randomization studies
Data sources
Aggregated data for TC, LDL-C, and non-HDL-C were obtained from a comprehensive genome-wide genetic meta-analysis involving approximately 1.65 million individuals, predominantly of European descent [31]. To avoid sample overlap, datasets excluding FinnGen participants were used, with sample sizes for these cholesterol traits ranging from 525,239 to 930,637 individuals. CHD data were used as a positive control dataset. Genetic variant information for IVDD, sciatica, LBP, and CHD was sourced from the FinnGen consortium (https://www.finngen.fi) with the following accession numbers: finn-b-M13_SCIATICA, which included 9,917 European-descent patients and 134,889 control participants; finn-b-M13_INTERVERTEB, with 6,827 European-descent patients and 134,889 control participants; and finn-b-M13_LOWBACKPAIN, comprising 15,565 European-descent patients and 134,889 control participants (Supplementary Table 1).
Selection of instrumental variables
As depicted in Fig. 2, the robustness of the two-sample MR study was ensured by strictly adhering to three core assumptions: 1) the selected single-nucleotide polymorphisms (SNPs) must be highly correlated with the exposure; 2) the selected SNPs must not have a direct relationship with the outcome; and 3) the selected SNPs must not be correlated with other confounding factors that could affect the exposure or outcome. The selection of instrumental variables for predicting gene-mediated TC, LDL-C, and non-HDL-C was conducted meticulously using the aggregated data mentioned above, following stringent criteria. These variables were required to meet several conditions: a prevalence above 1% (minor allele frequency), a significant correlation with the respective cholesterol trait (P < 5.0 × 10^-8), and the exclusion of weak instrumental variables on the basis of the F statistic (F statistic > 10). Additionally, these variables needed to be minimally affected by horizontal pleiotropy, as indicated by an R² value of less than 0.30, and located within a 100 kb proximity of the gene on the chromosome. This rigorous approach ensured the reliability and validity of the instrumental variables used in the analysis.
Fig.2 Overview of the research and design of drug target MR analysis.
The expression quantitative trait loci (eQTL) summary data for the HGMCR gene were sourced from the eQTLGen Consortium, which included 31,684 individuals. In contrast, the eQTL data for the PCSK9 and NPC1L1 genes were obtained from the GTEx Portal and included 755 and 663 participants, respectively [32, 33]. Most of these participants were of European ancestry. From these datasets, suitable eQTL instrumental variables for the genes were identified on the basis of criteria including minor allele frequency exceeding 1%, genome-wide correlation significance (P < 5.0 × 10^-8), and an F-statistic greater than 10. Additionally, the eQTLs needed to be within 1 Mb of the respective genes to ensure robustness and validity in the analysis.
Statistical analysis
SNPs predictive of exposures were extracted from the outcome dataset, and their orientations were harmonized through allele correction. The inverse-variance weighting (IVW) method was used to determine the associations of gene-mediated cholesterol traits with the risk of developing IVDD, sciatica, and LBP. The IVW method is robust for causal inference, especially in the absence of horizontal pleiotropy [29]. Beta values, 95% CIs, and P values are provided. A P value less than 0.05 was considered to indicate nominal significance, whereas for multiple comparisons, Bonferroni correction was applied, and a P value of less than 0.004 (12 exposures and one outcome) was considered to indicate statistical significance. To mitigate horizontal pleiotropy (R² < 0.30), colocalization analyses validated associations between gene-mediated cholesterol traits and the conditions studied, following the IVW method [34]. Several sensitivity tests were performed after the IVW analysis:
- The primary analysis was repeated with a stricter clumping threshold (R² < 0.10) to assess linkage disequilibrium.
- The Cochran's Q test was used to identify heterogeneity.
- MR‒Egger regression and MR‒Pleiotropy RESidual Sum and Outlier (MR‒PRESSO) were used to detect horizontal pleiotropy.
- A positive control analysis was performed to verify the validity of the SNPs, and CHD was used as the outcome.
The IVW analysis, colocalization analysis, multivariate MR, and sensitivity analyses were conducted via the TwoSampleMR and coloc packages in R software (version 4.3.0).