Comparison of the performance of different MR methods
To compare the performance of different MR methods and better understand the differences in real data analysis results, we conducted extensive simulations under a range of scenarios (Methods and Supplementary Note 1), with a specific focus on the effect of horizontal pleiotropy. In our previous study22, we observed that strong confounders could distort the true genetic correlation and causal association, even reversing their direction, and a large proportion of IVs showed strong directional pleiotropic effects. Thus, simulation settings need to mimic such extreme scenarios to test the limits of MR methods. We included in the benchmark analysis a set of commonly used MR methods, namely IVW23, weighted median24, mode19, MR-Egger18, Robust25, MR-Lasso26, RAPS27, MR-PRESSO20, MRMix28 and Con-Mix29. We also included an upgraded version of the Generalised Summary-data-based Mendelian Randomisation30 (named GSMR2), incorporating a new global heterogeneity test with improved robustness to detect and remove pleiotropic IVs (Supplementary Note 1 and URLs).
The simulation results showed that under the null model (i.e., no causal effect between exposure and outcome), when the proportion of invalid IVs was small and the invalid IVs explained a small fraction of heritability of the exposure, almost all the methods showed a well-controlled false-positive rate (FPR) (Supplementary Figs. 1–2). However, when the proportion of invalid IVs was large (e.g., half of the IVs were invalid), most methods had an inflated FPR under the null. Under a causal model (i.e., the alternative model), the estimates of causal effects could be biased by directional pleiotropy (the effects of pleiotropic IVs are correlated between exposure and outcome), and the inflation was proportional to the strength of the directional pleiotropy. Under the alternative model, most methods attained high statistical power when the level of directional pleiotropy was modest, but the power for several methods decreased substantially when the level of directional pleiotropy was strong (Supplementary Fig. 3). The simulation results suggest that in the presence of strong directional pleiotropy, no MR method can attain both low FPR under the null model and high statistical power under a causal model. The consistency in result between the MR methods reduced with the increased level of directional pleiotropy because of the differences in robustness to pleiotropy between the methods. Thus, it is essential to compare results from different MR methods before making definitive inferences about causality, and such triangulation framework has proven effective in improving the robustness of causal inference26,31,32.
To estimate the causal effects of SUB on common diseases, we carried out MR analyses between 7 SUB traits (e.g., smoking initiation, alcohol consumption) and 18 common diseases (e.g., asthma, type 2 diabetes, psychiatric disorders) plus disease count (i.e., a sum of diseases carried) using the 11 MR methods that were calibrated as mentioned above (Methods and Supplementary Tables 1–2). For each SUB trait, we used a local false discovery rate (FDR) of 0.01 to define significant associations between the exposure and outcomes and a local FDR of 0.05 to define suggestive associations (Methods).
Widespread risk effects of smoking on common diseases
Results from nearly all the methods consistently showed that smoking initiation (SI) had significant risk effects on 13 diseases and protective effects against 1 disease (Fig. 1 and Supplementary Table 3), consistent with recent MR studies for smoking traits33–35. In total, there were 100 significant associations out of 209 tests (11 methods multiplied by 19 outcomes). MR-Egger and MRMix were the only two methods that did not show any significant association, consistent with the simulation evidence that they had the lowest statistical power in most scenarios among the MR methods tested (Supplementary Fig. 3). The only protective effect of SI was against allergic rhinitis, which was significant in seven methods, and the estimates were largely consistent across methods (Fig. 1). Former smoking (FS) and current smoking (CS) both showed consistent results with SI, and all these three smoking-related traits showed consistent risk effects on disease count (Supplementary Figs. 4 and Supplementary Tables 4–5). On the contrary, for smoking cessation (SC), only the GSMR2 method showed a significant protective effect against cardiovascular disease and a suggestive protective effect against disease count (Supplementary Fig. 4 and Supplementary Table 6). The small number of significant associations for SC was likely due to the lack of power because only 8 index SNPs were included in the MR analysis (see below for the results from a more powerful analysis).
The observed beneficial health effects of moderate drinking are likely non-causal
The health consequences of alcohol consumption (AC) have been under debate for decades. Several genetic analyses showed negative estimates of genetic correlation between AC and common diseases36,37. However, recent MR analyses failed to find any significant cardioprotective of alcohol drinking35,38,39. In addition, observational studies showed a non-linear relationship of AC with common diseases10,40, e.g., a J-shaped relationship with cardiovascular disease41,42. Our previous study showed that the negative estimates of genetic correlation and J-shaped relationship between AC and disease could be largely driven by misreports and longitudinal changes due to disease ascertainment22. In this study, results from different MR methods showed consistently that AC had risk effects on cardiovascular disease, dyslipidemia, and hypertensive disease (Fig. 2 and Supplementary Table 7). To further investigate the health effects of moderate drinking, we derived two additional phenotypes: moderate alcohol consumption (MAC) and heavy alcohol consumption (HAC) and re-ran the MR analysis (Methods). We found that MAC did not show any significant protective effects in any methods, while HAC still showed significant risk effects on dyslipidemia and hypertensive disease (Supplementary Figs. 5 and Supplementary Table 8), implying that the protective effects of moderate drinking observed from observational studies are likely non-causal.
Coffee and tea intake exerted complicated effects on common diseases
Coffee intake (CI) showed significant risk effects on five diseases (Fig. 3). For asthma, cardiovascular disease, dyslipidemia, and iron deficiency anemias, only one method was significant although the estimates from all the other methods showed a consistent direction. For osteoarthritis, all the methods showed significant results except for MR-Egger, and the direction of the estimates were all consistent (Supplementary Table 9 and Fig. 3). The mean OR from all methods was 1.52, which is interpreted as a 1 s.d. increase in CI (equivalent to 2.10 cups per day) leading to a 1.52-fold increase in the risk of osteoarthritis. CI also showed a protective effect against irritable bowel syndrome, osteoporosis, and varicose veins of lower extremities (VVLE), but the evidence is considered as modest since only a few methods provided significant estimates (Fig. 3). Tea intake (TI) showed a significant risk effect on osteoarthritis in Median and Mode methods, a significant protective effect against osteoporosis in GSMR2, and a suggestive protective effect against type 2 diabetes (T2D) and VVLE in GSMR2 and Mode methods (Fig. 4), indicating possible confounding effects dealt with differently by different methods so that these results should be interpreted with great caution. Neither CI nor TI had a significant effect on disease count, suggesting that the overall health effects of these two behaviours are mild (Figs. 3–4 and Supplementary Tables 9–10). Alternatively, the effect may be dosage-dependent, and thus underestimated if we assume a linear relationship (see below).
The relationship between CI and common diseases is complicated and controversial. For example, CI has previously been associated with lower T2D risk43,44. However, recent evidence argues that high coffee consumption increases the T2D risk compared to low consumption45. We attempted to investigate this potential dosage-dependent relationship via a stratified analysis by performing logistic regression of 18 common diseases on 10 different dose groups against non-drinkers (Supplementary Note 3). The results showed that for T2D, coffee intake of less than five cups per day had beneficial effects, but when the intake was more than six cups per day, the protective effects turned to risk effects (Supplementary Fig. 6). TI also showed dosage-dependent patterns for cardiovascular disease and osteoarthritis (Supplementary Fig. 7) as well as several other diseases, suggesting that the health effects of both coffee and tea intake might be dosage-dependent.
If there is a J-shaped, dosage-dependent relationship between CI/TI and a disease, the genetic correlation (rg) between moderate CI/TI and the disease could potentially be in the opposite direction compared to that between high CI/TI and the disease. To verify this hypothesis, we derived four new traits, heavy/moderate coffee intake (HCI/MCI) and heavy/moderate tea intake (HTI/MTI), i.e., contrasting people with a daily intake of ≥ 5 or < 5 cups against those with zero intake, and assessed the associations of the original and new tea/coffee intake phenotypes with the 18 common diseases by genetic correlation, stratified regression, and MR analyses (Methods). HCI showed a significant (local FDR < 0.01) positive \({\widehat{r}}_{g}\) with 3 diseases and no significant negative rg (Supplementary Fig. 8), consistent with the results for CI. In contrast, MCI showed a significant negative \({\widehat{r}}_{g}\) with 9 diseases and no significant positive \({\widehat{r}}_{g}\) (Supplementary Fig. 8 and Supplementary Table 11). For example, MCI showed a negative \({\widehat{r}}_{g}\) (\(-0.22, s.e.=0.03, \text{q} \text{v}\text{a}\text{l}\text{u}\text{e}=2.65\times {10}^{-10}\)) with cardiovascular disease, whereas the estimate for HCI was in the opposite direction (\({\widehat{r}}_{g} = 0.16, s.e.=0.04, \text{q} \text{v}\text{a}\text{l}\text{u}\text{e} = 1.07\times {10}^{-4}\)), consistent with the results from the dosage-dependent regression analysis (Supplementary Fig. 6). However, in the MR analysis, the significant estimates of causal effects (\({\widehat{b}}_{xy}\)) of MCI on common diseases were mostly in consistent direction with those for HCI (Supplementary Figs. 9), suggesting that the difference in the direction of \({\widehat{r}}_{g}\) with common diseases between MCI and HCI might be due to pleiotropic effects and/or confounders (see below for more discussion). For tea intake, the rg estimates between the MTI-disease pairs were broadly consistent with those between the HTI-disease pairs, e.g., both MTI and HTI showed significant negative genetic correlation with T2D (Supplementary Fig. 8) and protective effects against T2D as suggested by three MR methods (Supplementary Figs. 10). The only robust risk causal effect of HTI was found for osteoarthritis (significant in MR-Median and MR-Mode methods with the estimates from the other MR methods in a consistent direction). We further demonstrated by simulation that observing opposing directions in the estimates of rg between exposure and outcome across different stratified exposure groups is indicative of dosage-dependent effects (Methods and Supplementary Fig. 11).
To better understand the dosage-dependent associations and the discrepancies between the genetic correlation and MR results shown above, we focused specifically on the association between MTI/HTI and osteoarthritis because of a discernible dosage-dependent effect shown consistently in the stratified regression, genetic correlation, and MR analyses (Supplementary Figs. 7–8, 10). We visualized the relationship between the effects of IVs on the exposure and outcome (Supplementary Fig. 12) and found that the two most significant IVs (rs1264377 and rs977474) for MTI were distinct from those for HTI (rs4410790 and rs2472297), causing the estimates of bxy to be in opposite directions for the two sub-phenotypes (Table 1). For example, the T allele of rs4410790 (top IV for HTI) had a negative effect on HTI (\({\widehat{b}}_{GWAS}=-0.091, s.e.=0.007, P=4.90\times {10}^{-35}\)) but a positive effect on MTI (\({\widehat{b}}_{GWAS}=0.034,s.e.=0.006,P=3.40\times {10}^{-8}\)). All these observations above indicate a substantial genetic heterogeneity between MTI and HTI, which was further supported by the evidence that the genetic correlation between them was significantly different from unity (\({\widehat{r}}_{g} = 0.651, s.e. = 0.028\)). For coffee intake, the top two IVs for HCI and MCI were the same, and their bxy estimates were in the same direction (Table 1). Also, HCI showed more significant bxy estimates with common diseases (risk effects on T2D, dyslipidemia, and osteoarthritis, and protective effects against osteoporosis across different methods) than MCI did (only risk effect on T2D and osteoarthritis in a single method, Supplementary Fig. 9). In addition to the disease outcomes, we utilised PolyMR to directly assess the non-linearity of the effects of CI and TI on seven common biomarkers (Methods). We found significant non-linear effects of CI on total cholesterol (P = 9.50 × 10−7) and low-density lipoprotein levels (P = 2.46 × 10−3), even after applying the Bonferroni correction (Supplementary Fig. 13).
Table 1
Dosage-dependent effects of the top four GWAS signals for coffee and tea intake
SNP
|
Nearest Gene
|
A1/A2
|
Trait
|
N
|
Freq_A1
|
beta
|
s.e.
|
P
|
rs4410790
|
AHR
|
T/C
|
CI
|
421947
|
0.366
|
-0.062
|
0.002
|
1.1E-166
|
MCI
|
393101
|
0.369
|
-0.067
|
0.005
|
5.0E-36
|
HCI
|
124890
|
0.369
|
-0.2
|
0.01
|
5.3E-92
|
TI
|
440094
|
0.367
|
-0.045
|
0.002
|
4.1E-92
|
MTI
|
351508
|
0.373
|
0.034
|
0.006
|
3.4E-08
|
HTI
|
155936
|
0.354
|
-0.091
|
0.007
|
4.9E-35
|
rs2472297
|
CYP1A1
|
T/C
|
CI
|
421947
|
0.265
|
0.074
|
0.002
|
2.6E-198
|
MCI
|
393101
|
0.263
|
0.08
|
0.006
|
1.2E-40
|
HCI
|
124890
|
0.264
|
0.25
|
0.01
|
4.8E-130
|
TI
|
440094
|
0.264
|
0.06
|
0.002
|
1.9E-134
|
MTI
|
351508
|
0.258
|
-0.027
|
0.007
|
9.2E-05
|
HTI
|
155936
|
0.278
|
0.124
|
0.008
|
1.3E-54
|
rs1264377
|
PSORS1C3, MIR877
|
A/G
|
CI
|
421947
|
0.182
|
0.015
|
0.0028
|
4.0E-08
|
MCI
|
393101
|
0.181
|
-0.004
|
0.0067
|
0.51
|
HCI
|
124890
|
0.185
|
0.043
|
0.0119
|
3.5E-04
|
TI
|
440094
|
0.182
|
-0.002
|
0.0028
|
0.37
|
MTI
|
351508
|
0.181
|
-0.054
|
0.0077
|
2.8E-12
|
HTI
|
155936
|
0.185
|
-0.028
|
0.0091
|
1.8E-03
|
rs977474
|
PRH1/ PPR4/TAS2R14
|
C/T
|
CI
|
419168
|
0.166
|
0.022
|
0.0029
|
1.8E-13
|
MCI
|
390497
|
0.165
|
0.044
|
0.0071
|
4.6E-10
|
HCI
|
124071
|
0.163
|
0.077
|
0.0125
|
7.1E-10
|
TI
|
437186
|
0.165
|
-0.021
|
0.0029
|
4.0E-13
|
MTI
|
349186
|
0.166
|
-0.058
|
0.0080
|
6.3E-13
|
HTI
|
154901
|
0.167
|
-0.067
|
0.0095
|
1.2E-12
|
Note: rs4410790 and rs2472297 are the top two GWAS signals in all GWAS except for MTI, and rs1264377 and rs977474 are the top two GWAS signals in MTI GWAS. A1/A2: minor allele/major allele. Freq_A1: allele frequency of A1. N: sample size. beta: estimate of SNP effect. s.e.: standard error of the beta estimate. |
Taken together, our results demonstrate the complexity of the health consequences of coffee/tea intake. The results also suggest that the overall health effects of CI and TI are mild and need to be interpreted with caution, especially when a dosage-dependent relationship is observed.
Validating the causal estimates by data from published studies
To validate our causal estimates above, we first re-ran the MR analysis with the disease GWAS summary statistics replaced by those from published studies (Methods and Supplementary Table 12). We identified 86 significant associations (local FDR < 0.01) between the 7 SUB traits and 12 common diseases (Supplementary Fig. 14). The causal effects estimated using the published disease GWAS data were highly correlated with those estimated using the UKB disease data, despite the phenotypic definitions of the diseases being slightly different between studies. The Pearson’s correlation r of the bxy estimates across 7 SUB traits was 0.86 between cardiovascular disease and coronary artery disease (CAD), and 0.77 between psychiatric disorder and schizophrenia (SCZ) (note: the reported r is the median of the estimates across 11 MR methods). We also re-ran the MR analysis using summary data for SI, SC, and AC from a recent GWAS meta-analysis by the GSCAN consortium37 (Supplementary Fig. 15). The bxy estimates using the UKB SUB data were generally consistent with those using the GSCAN SUB data (Pearson’s correlation r = 0.55–0.81 across different MR methods). Of the 100 significant associations between SI and common diseases discovered in the UKB, 94 remained significant when using SI from the GSCAN data. Notably, smoking cessation from GSCAN showed several significant protective effects with consistent estimates from multiple methods, validating the beneficial effects of SC, as indicated by the GSMR2 analysis above with the SC data from the UKB. The gain of power is likely to be due to the increased number of IVs (from 8 to 18). These results also demonstrate the power of GSMR2 when the number of IVs is limited. On the other hand, the replication rate of AC was low (4/20), probably because the GSCAN dataset has not corrected for misreports and longitudinal changes as noted previously22.
Causal estimates are largely robust to the confounding of socioeconomic status
Considering that the estimates of causal associations between SUB and common diseases might be confounded by SES, we estimated the causal effects of SUB on the diseases adjusting for educational attainment (EA) and household income (HI). To achieve this, we applied mtCOJO30 which only requires summary statistics to conduct a conditional GWAS analysis for each SUB or disease trait conditioning on EA and HI simultaneously (Methods). We then re-ran the MR analysis using the SES-adjusted SUB and disease GWAS summary statistics. The causal estimates after the SES adjustment were largely consistent with those without adjustment (Fig. 5), indicating that the causal estimates between SUB and common diseases were generally robust to the confounding of the SES analyzed in this study (except for MRMix which showed several extreme bxy estimates for AC and TI after the SES adjustment). In terms of the robustness for each specific exposure, most of them showed consistent results before and after the SES adjustment except for tea/coffee intake. The results for smoking-related traits were highly robust even for the results from MRMix. These observations indicate that tea/coffee intake is more likely to be confounded by SES compared to smoking and drinking. We further validated the mtCOJO adjustment by conducting BOLT-LMM46 analysis on both SUB and common diseases fitting EA and HI as covariates and re-ran the MR analysis (Methods). The individual-level data-based conditional GWAS analysis results were consistent with those from mtCOJO (Supplementary Fig. 16), and the Pearson’s correlation r of the bxy estimates between mtCOJO adjustment and individual-level data-based adjustment ranged from 0.61 to 0.97 across different exposures (excluding the estimates from MRMix). We also adjusted the SUB and disease GWAS data for two physical activity traits (i.e., leisure screen time and moderate-to-vigorous intensity physical activity during leisure time)47 using mtCOJO, and the causal estimates remain largely unchanged (Supplementary Fig. 18). To further assess the robustness of our analysis to potential collider bias48, we employed Slope-Hunter49 to correct the seven SUB traits for SES, specifically educational attainment, and re-estimated their effects on diseases using all 11 MR methods. The bxy estimates after Slope-Hunter adjustment were largely consistent with those after mtCOJO adjustment (Supplementary Fig. 17).
Bi-directional effects are rare
To investigate whether there are reverse causal associations between SUB and complex diseases (i.e., disease status leads to behavioural change), we performed a reverse MR analysis, designating a disease as the exposure and an SUB as the outcome (Methods). The number of diseases that showed significant effects on SUB at local FDR < 0.01 was small (Supplementary Table 13). For the diseases available from the UKB, only asthma showed significant negative effects on current smoking in the GSMR2 and Lasso analyses. For the diseases available from the published studies, there was a strong positive effect of major depressive disorder (MDD) on smoking initiation (\({\widehat{b}}_{xy}=0.19\sim0.28\)) significant in 9 out of the 11 MR analyses, consistent with the previous findings50,51. Schizophrenia also showed a positive effect on smoking in multiple MR analyses, but the effect size was much smaller than that for MDD (Supplementary Table 13).