2.1 Study design
Figure 1 gives an overview of the study design, which comprised several components, including observational and causal association (Mendelian randomization, MR) analyses, to comprehensively assess the relationship between urate and cancer in European individuals. We first performed an observational association analysis combining data from two population-based cohorts from the southernmost part of Sweden, the Malmö Diet and Cancer Study (MDCS) [39] and the Malmö Preventive Project (MPP) [40]. The combined dataset is referred to as the MDCS-MPP cohort in this study.
To ensure the robustness of the causal associations, we implemented a range of unidirectional MR analyses, both primary and sensitivity, to determine the causal effect of urate on the risk of cancer. For this purpose, we employed both one-sample and two-sample MR settings: the one-sample setting used individual-level data from MDCS-MPP, and the two-sample setting used summary statistics from the Global Urate Genetic Consortium (GUGC) [19] and the UK Biobank cohort [41].
2.2 Study participants and data sources
The observational association between urate and cancer risk was investigated using data from 17,597 individuals in the MDCS-MPP cohort (Table S1). MDCS and MPP are population-based cohorts of 30,447 and 33,346 participants of European ancestry (mostly born in Sweden), respectively. At baseline, individuals in MDCS underwent a health examination between 1991 and 1996, while those in MPP were examined between 1974 and 1992. The individuals from these cohorts who were included in the study (n = 17,597) were middle-aged (mean age = 46.5 years, Table S2). The follow-up data and other details for MDCS and MPP have been described elsewhere [40, 42]. For this study, SU measurements (mean SU = 4.96 mg/dL; Table S2) were taken from the health surveys, and cancer endpoints (mean follow-up = 21.2 years; Table S1) were obtained from the Swedish national cancer register, which has been shown to have high validity [43-45]. In total, data for 13 common site-specific cancers were used: bladder, breast, colorectal, gastric, hepatic, lung, pancreatic, prostate, renal, skin, lymphatic and hematopoietic cancers, gynecological cancers (ovarian, cervical and uterine), and brain tumor. The variable all-cause cancer was created by combining the cases from all 13 cancer types (n = 5,659) (Table S1). Cancer diagnoses were defined using International Classification of Diseases (ICD) 9 and 10 codes (Table S3). To increase comparability, all cancer cases were excluded from the control group (n = 11,938).
For one-sample MR analysis, data on SU and on the 14 cancer outcomes were obtained from the MDCS-MPP cohort for all 17,597 participants. For two-sample MR, we retrieved publicly available GWAS summary statistics for variant-urate associations in 110,347 European individuals from GUGC [19]. The variant-cancer association data for the 13 site-specific cancers and all-cause cancer were obtained from 367,570 European individuals (cases = 36,815, controls = 330,755) in the UK Biobank. Details of the genotyping in MDCS-MPP and the UK Biobank are provided in the supplementary material (Text S1). The ICD-9 and ICD-10 code-based definitions of several cancer outcomes were similar but not identical between the MDCS-MPP and UK Biobank cohorts (Table S3).
2.3 Statistical analyses
2.3.1 Instrument selection
We used a set of 26 variants (single nucleotide polymorphisms, SNPs) associated with SU at genome-wide significance (p < 5 × 10⁻⁸) in GUGC [19] as the exposure instrument for the MR analyses in this study (Table S4).
2.3.2 Observational association analysis
For the observational association analysis, multivariable Cox proportional hazards regression models were used to evaluate the association of SU with the risk of each cancer type separately. The models were adjusted for age and sex, except for breast, prostate, and gynecological cancers, where adjustment was made for age only.
2.3.3 One-sample MR
We tested the association of all 26 SNPs with SU in the individual-level data from the MDCS-MPP cohort, using linear regression models in controls only (Table S6 and S7). The F-statistic from this regression was used to assess the strength of the SU instrument. An F-statistic > 10 is conventionally taken to indicate an instrument strong enough to avoid substantial weak-instrument bias [46]. SNP-cancer association estimates were obtained using logistic regression models (Table S8 to S21). All analyses were adjusted for age and sex, except for breast, prostate, and gynecological cancers, where adjustment was made for age only.
To assess the robustness of the causal associations, we performed a range of one-sample MR analyses using the summary statistics from the above variant-urate and variant-cancer association analyses (wherever applicable) in MDCS-MPP. These analyses primarily used the two-stage least squares (2SLS) and inverse variance weighted (IVW) methods. Cochran’s Q test was used to assess heterogeneity across the estimates for the 26 SNPs. We further applied MR analysis in the MDCS-MPP data using a weighted genetic risk score (GRS), calculated from the number of risk alleles at each SNP and the SNP's effect size on SU levels in the MDCS-MPP cohort.
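As an illustration of the instrument-strength check, the following minimal sketch computes the F-statistic from the proportion of exposure variance explained by the instruments. It assumes an unadjusted linear regression with k instruments and n individuals; the function name and the input values are hypothetical and are not taken from the study, which additionally adjusted for age and sex.

```python
def f_statistic(r_squared: float, n: int, k: int = 1) -> float:
    """F-statistic for the joint test of k instruments in a linear
    regression of the exposure on the instruments, fitted on n individuals.

    Assumption: no other covariates are in the model.
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must lie in [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return (r_squared / k) * ((n - k - 1) / (1.0 - r_squared))


# Hypothetical inputs for illustration only (not values from the study).
f_value = f_statistic(r_squared=0.01, n=1002, k=1)
is_strong = f_value > 10  # conventional rule-of-thumb threshold
```

With covariates in the model, the same idea applies to the partial F-statistic for the instruments, with the residual degrees of freedom reduced accordingly.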
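The weighted GRS and its use as a single instrument can be sketched as follows. This is a simplified illustration, not the study's implementation: it assumes a continuous outcome and no covariates, in which case the 2SLS estimate with one instrument reduces to a ratio of covariances. The study's cancer outcomes are binary and were modelled with adjustment for age and sex. All names and toy values are hypothetical.

```python
from statistics import mean


def weighted_grs(allele_counts, weights):
    """Weighted genetic risk score for one individual: the sum over SNPs of
    (number of risk alleles) x (per-allele effect on the exposure)."""
    if len(allele_counts) != len(weights):
        raise ValueError("one weight is required per SNP")
    return sum(g * w for g, w in zip(allele_counts, weights))


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def single_instrument_iv(z, x, y):
    """2SLS estimate with one instrument z, exposure x and continuous
    outcome y, which equals cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


# Toy data (hypothetical): 4 individuals, 2 SNPs.
snp_weights = [0.1, 0.2]
genotypes = [[0, 0], [1, 0], [0, 1], [2, 2]]
scores = [weighted_grs(g, snp_weights) for g in genotypes]
exposure = [5.0 + 2.0 * s for s in scores]   # exposure driven by the score
outcome = [0.5 * e for e in exposure]        # true causal effect = 0.5
estimate = single_instrument_iv(scores, exposure, outcome)
```

In this noise-free toy example the estimate recovers the simulated effect of 0.5, which is a property of the construction rather than of real data.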
2.3.4 Two-sample MR
For two-sample MR, the summary statistics for SNP-urate associations were obtained from GUGC (Table S4), and those for SNP-cancer associations were obtained from the UK Biobank (Table S22 to S35), for all 26 SNPs. The individual estimate for each SNP was generated using the Wald ratio estimator, which divides the SNP-outcome estimate by the SNP-exposure estimate, and its standard error (SE) was calculated using the Delta method [47, 48]. The individual estimates were then pooled for each cancer type using the IVW method with a random-effects model as the primary MR analysis. All effects were adjusted for age and sex (wherever applicable) before running the MR analysis. Cochran’s Q test was used to assess heterogeneity across the estimates for the 26 SNPs.
Because the IVW method is vulnerable to horizontal pleiotropy, in which variants affect the outcome through pathways independent of the exposure (SU in this case), a range of sensitivity analyses were carried out. Each of these sensitivity analyses assesses the causal effect under different, less stringent assumptions in order to address possible pleiotropy. The methods applied were the weighted median and MR-Egger methods, which are described in detail in the supplementary material (Text S2). In addition, the Mendelian Randomization Pleiotropy Residual Sum and Outlier (MR-PRESSO) method was applied [49]. MR-PRESSO identifies potential outliers from the squared residuals of the regression of SNP-outcome on SNP-exposure estimates and removes them to calculate an outlier-corrected effect estimate. This test is more sensitive than MR-Egger and can detect outlier SNPs that could bias the results. We also performed leave-one-out analyses to detect whether any individual SNP had a disproportionate influence on the MR estimates. Power calculations were done using the mRnd power calculator [50].
To further address possible horizontal pleiotropy, an additional sensitivity analysis was performed. The 26 SNPs used as the urate instrument were examined for associations with traits other than SU levels using PhenoScanner (http://www.phenoscanner.medschl.cam.ac.uk/), an online platform that provides public access to a wide range of GWAS summary results. We identified seven SNPs (Table S5) that were exclusively associated with SU levels (and/or gout) and repeated all MR analyses using these seven SNPs as the urate instrument.
We applied Bonferroni correction to account for multiple testing across the 14 cancer outcomes. Associations with p-values < 3.5E-03 (0.05/14) were deemed strong evidence for causal associations, while associations with p-values < 0.05 but > 3.5E-03 were considered suggestive evidence of associations. All analyses were done in R (v4.3.2) using the MendelianRandomization [51] and MR-PRESSO [49] packages.
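The per-SNP Wald ratio, its Delta-method standard error, and the IVW pooling with Cochran's Q can be sketched as below. This is a minimal illustration under stated assumptions, not the code used in the study. It uses the second-order Delta-method SE with the covariance between the two estimates set to zero, which is reasonable when the exposure and outcome samples do not overlap. It implements the random-effects model as a multiplicative model that inflates the fixed-effect SE when Q exceeds its degrees of freedom. The function names and input values are hypothetical.

```python
import math


def wald_ratio(bx, bx_se, by, by_se):
    """Wald ratio for one SNP (SNP-outcome / SNP-exposure estimate) with a
    second-order Delta-method standard error (covariance term assumed 0)."""
    ratio = by / bx
    se = math.sqrt(by_se**2 / bx**2 + (by**2 * bx_se**2) / bx**4)
    return ratio, se


def ivw_random_effects(ratios, ses):
    """Inverse-variance weighted pooling of per-SNP ratio estimates.

    Returns (pooled estimate, SE, Cochran's Q). The SE is scaled up by
    sqrt(Q / df) when that factor exceeds 1 (multiplicative random effects).
    """
    weights = [1.0 / se**2 for se in ses]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    scale = max(1.0, math.sqrt(q / df)) if df > 0 else 1.0
    return estimate, scale * math.sqrt(1.0 / total), q


# Hypothetical summary statistics for three SNPs (not study data):
# (SNP-exposure beta, its SE, SNP-outcome beta, its SE)
snps = [(0.10, 0.01, 0.050, 0.02),
        (0.20, 0.02, 0.090, 0.03),
        (0.15, 0.01, 0.080, 0.02)]
per_snp = [wald_ratio(*s) for s in snps]
pooled, pooled_se, q_stat = ivw_random_effects(
    [r for r, _ in per_snp], [se for _, se in per_snp]
)
```

Under the null of no heterogeneity, Q is compared against a chi-squared distribution with (number of SNPs - 1) degrees of freedom.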
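The evidence grading described above can be written as a small helper. This is a sketch only, and the function name and labels are hypothetical. It uses the reported threshold of 3.5E-03 as the default; the exact quotient 0.05/14 is approximately 3.57E-03, so the threshold is exposed as a parameter. A p-value exactly equal to the threshold is not covered by the text and is treated here as suggestive.

```python
def classify_evidence(p_value, strong_threshold=3.5e-3, nominal_alpha=0.05):
    """Grade an association by its p-value.

    strong_threshold: Bonferroni-corrected level as reported (0.05 / 14,
    written as 3.5E-03 in the text). nominal_alpha: uncorrected level.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if p_value < strong_threshold:
        return "strong"
    if p_value < nominal_alpha:
        return "suggestive"
    return "none"


# Hypothetical p-values for illustration.
labels = [classify_evidence(p) for p in (0.001, 0.01, 0.2)]
```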
2.4 Ethics approval and consent to participate
The Regional Ethics Committee at Lund University (Dnr LU 90-51, 85/2004 and 2009/633) provided ethical approval for this study in the MDCS-MPP cohort. The UK Biobank has Research Tissue Bank (RTB) approval from the North West Multi-centre Research Ethics Committee (MREC), renewed in 2021 (REC reference: 21/NW/0157).