2.1 Assumptions of MR study
MR research must satisfy three core assumptions: the relevance assumption, the independence assumption, and the exclusion restriction assumption[10]. First, instrumental variables must be strongly associated with the exposure at the genome-wide significance level (p < 5 × 10⁻⁸) (relevance assumption). Second, instrumental variables must not be associated with any confounders of the exposure-outcome relationship (independence assumption). Third, instrumental variables may affect the outcome only through the exposure, not through other pathways (p > 5 × 10⁻⁵) (exclusion restriction assumption). We designed an MR study to explore the causal association between thyroid disease and breast neoplasm (Fig. 1).
2.2 Study population and data sources
Exposure data for FT4, TSH, and hyperthyroidism were obtained from the ThyroidOmics Consortium, comprising a total of 72,167 individuals [11]. In addition, the MR study used summary-level data from the IEU Open GWAS database: thyroid cancer, with 649 cases and 431 controls (GWAS ID: ieu-a-1082); hypothyroidism, with 405,357 individuals (GWAS ID: ebi-a-GCST90013893); and benign neoplasm of the thyroid, with 2,079 cases and 101,074 controls (GWAS ID: finn-b-CD2_BENIGN_BREAST_EXALLC). Data on BC, HER2-positive BC, and HER2-negative BC were acquired from FinnGen release R9[12]. The phenotypes used in this MR analysis were “Malignant neoplasm of breast (controls excluding all cancers)”, “Malignant neoplasm of breast, Her-negative (controls excluding all cancers)”, and “Malignant neoplasm of breast, Her-positive (controls excluding all cancers)”. The malignant neoplasm of breast GWAS included 182,869 Finnish adults (15,680 cases and 167,189 controls). The HER2-positive BC GWAS included 176,715 Finnish adults (9,698 cases and 167,017 controls), and the HER2-negative BC GWAS included 172,982 Finnish adults (5,965 cases and 167,017 controls). All data came from populations of European ancestry.
2.3 Selection of Genetic Instrumental Variables
Following previous MR studies, we implemented a rigorous selection process[13]. The selection of instrumental variables (IVs) for thyroid cancer is described here as an example; the same criteria were applied to the other exposures. To satisfy the first MR assumption, that single nucleotide polymorphisms (SNPs) must be strongly associated with thyroid cancer, we selected SNPs associated with thyroid cancer at the genome-wide significance level (p < 5 × 10⁻⁸). To ensure the independence of the instrumental variables, we applied strict clumping criteria (r² ≤ 0.001; clumping window, 10,000 kb), which yielded 347 SNPs associated with thyroid cancer. These SNPs were then aligned with the BC outcome GWAS summary results. One palindromic SNP (rs4142879) was identified and removed, leaving 346 SNPs. We then looked up these SNPs in PhenoScanner (www.phenoscanner.medschl.cam.ac.uk)[14, 15], a website that facilitates the identification of genetic variants associated with various phenotypes or traits, to identify SNPs associated with risk factors for the outcome, including exogenous female hormone use, age, dense breast tissue, early onset of menstruation, radiation exposure, alcohol, high body mass index, and late onset of menopause[2].
We found 5 SNPs associated with potential risk factors for malignant neoplasm of breast: rs461599 was associated with alcohol intake frequency, rs10031777 with age at menopause, and rs10493096, rs1577026, and rs4567782 with BMI. After their removal, 341 SNPs remained. After matching against the outcome GWAS database, 335 SNPs (Supplementary Table 1) were retained as instrumental variables. After this screening, SNPs related to FT4 (n = 14), TSH (n = 41), hyperthyroidism (n = 15), hypothyroidism (n = 99), and thyroid cancer (n = 341) were identified using the above criteria. The characteristics of the SNPs used as instruments are presented in Supplementary Tables S1-S4. The F statistic[16] of each SNP, calculated with \(R^{2}=2\times \mathrm{MAF}\times (1-\mathrm{MAF})\times \beta^{2}\), was larger than 10, indicating that there were no weak instrumental variables[13].
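The weak-instrument check can be illustrated with a short sketch. It is written in Python purely for illustration (the analyses in this study were run in R), and it assumes the commonly used single-SNP form F = R²(N − k − 1) / (k(1 − R²)) with k = 1, which is not spelled out in the text. The function names and the example SNP values are hypothetical.

```python
def r_squared(maf, beta):
    """Variance in the exposure explained by one SNP:
    R^2 = 2 * MAF * (1 - MAF) * beta^2 (beta in standard-deviation units)."""
    return 2.0 * maf * (1.0 - maf) * beta ** 2


def f_statistic(r2, n, k=1):
    """Assumed form of the F statistic: R^2 * (N - k - 1) / (k * (1 - R^2)),
    where N is the exposure GWAS sample size and k the number of SNPs."""
    return r2 * (n - k - 1) / (k * (1.0 - r2))


# Hypothetical SNP: MAF = 0.30, beta = 0.05, exposure sample of 72,167 people.
r2 = r_squared(0.30, 0.05)
f = f_statistic(r2, 72167)
is_weak = f < 10  # conventional threshold for a weak instrument
```

Under these assumed inputs the SNP explains about 0.1% of the exposure variance and clears the F > 10 threshold comfortably, because the sample size is large.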
2.4 Statistical analysis
In this MR study, we used five methods: the inverse variance weighted (IVW) random effects model[17], the IVW fixed effects model[17], the MR-Egger method[18], the weighted median method[19], and the weighted mode method[20].
The IVW method is commonly used to combine the causal estimates obtained from multiple genetic instruments. It calculates a weighted average of these estimates, with weights proportional to the inverse of the variance of each estimate. It assumes that all genetic instruments are valid and have a consistent causal effect.
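As a minimal sketch of the weighted average described above, the following Python snippet computes a fixed-effect IVW estimate from per-SNP summary statistics. It assumes each SNP's causal estimate is the Wald ratio (outcome effect divided by exposure effect) with a first-order standard error; it is not the TwoSampleMR implementation, and the function name and input values are hypothetical.

```python
import math


def ivw_fixed(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate from per-SNP summary statistics."""
    # Wald ratio for each SNP: outcome effect per unit of exposure effect.
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    # Inverse-variance weights, using se(ratio) ~= se_outcome / |beta_exposure|.
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    estimate = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    std_error = math.sqrt(1.0 / sum(weights))
    return estimate, std_error


# Hypothetical summary statistics for three SNPs.
est, se = ivw_fixed([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], [0.01, 0.01, 0.01])
```

SNPs with larger exposure effects or more precise outcome estimates receive more weight, which is why a single strong but invalid instrument can dominate the result.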
The MR-Egger regression method allows for the possibility of unbalanced pleiotropy (when genetic instruments affect the outcome through pathways other than the exposure of interest). It estimates the causal effect and tests for directional pleiotropy by assessing the intercept of the regression.
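The regression behind MR-Egger can be sketched as a weighted least-squares fit of the SNP-outcome effects on the SNP-exposure effects with an unconstrained intercept. The Python snippet below is an illustrative simplification: it returns point estimates only (no standard errors or p values), and the function name and data are hypothetical.

```python
def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Return (intercept, slope) of the MR-Egger weighted regression."""
    # Orient every SNP so that its exposure effect is positive.
    signs = [1.0 if bx >= 0 else -1.0 for bx in beta_exposure]
    x = [s * bx for s, bx in zip(signs, beta_exposure)]
    y = [s * by for s, by in zip(signs, beta_outcome)]
    w = [1.0 / se ** 2 for se in se_outcome]

    # Closed-form weighted least squares with an intercept.
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx                   # causal effect estimate
    intercept = y_bar - slope * x_bar   # average directional pleiotropy
    return intercept, slope


# Hypothetical data with a true slope of 0.5 and a pleiotropic offset of 0.02;
# the first SNP is coded on the opposite allele.
bx = [-0.1, 0.2, 0.3, 0.4]
by = [-0.07, 0.12, 0.17, 0.22]
intercept, slope = mr_egger(bx, by, [0.01, 0.01, 0.01, 0.01])
```

An intercept that differs from zero suggests directional pleiotropy, while the slope is the pleiotropy-adjusted causal estimate.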
The weighted median method is robust to the presence of invalid instruments. It calculates the median of the causal estimates from individual instruments, with weights assigned based on the inverse of their variances. It provides valid causal estimates as long as at least 50% of the weight comes from valid instruments.
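A minimal Python sketch of a weighted median follows. It assumes the usual construction in which each ordered estimate is placed at the midpoint of its share of the cumulative weight and the 50% point is found by linear interpolation. The function name and inputs are hypothetical, and standard errors (typically obtained by bootstrapping) are omitted.

```python
def weighted_median(ratios, weights):
    """Weighted median of per-SNP ratio estimates."""
    order = sorted(range(len(ratios)), key=lambda j: ratios[j])
    b = [ratios[j] for j in order]
    w = [weights[j] for j in order]
    total = sum(w)

    # Percentile of each ordered estimate: cumulative weight up to and
    # including the estimate, minus half of its own weight.
    percentiles = []
    cumulative = 0.0
    for wj in w:
        cumulative += wj
        percentiles.append((cumulative - 0.5 * wj) / total)

    # First estimate at or above the 50th percentile (one always exists).
    k = next(i for i, p in enumerate(percentiles) if p >= 0.5)
    if k == 0:
        return b[0]
    lo, hi = percentiles[k - 1], percentiles[k]
    return b[k - 1] + (b[k] - b[k - 1]) * (0.5 - lo) / (hi - lo)


# Three consistent instruments and one outlying (possibly invalid) instrument.
wm = weighted_median([0.5, 0.5, 0.5, 5.0], [1.0, 1.0, 1.0, 1.0])
```

In this hypothetical example the outlying estimate of 5.0 carries 25% of the weight and leaves the weighted median unchanged, whereas it would pull an IVW average upward.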
The weighted mode method combines information from multiple genetic instruments and aims to improve precision in estimating causal effects by considering the mode (most frequent value) of the individual causal estimates.
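The idea of taking the mode of the individual estimates can be sketched with a weighted kernel density evaluated on a grid. This Python snippet is a simplified illustration: the bandwidth is supplied by the caller rather than derived from a data-driven rule, and the function name, grid size, and inputs are all assumptions made for the example.

```python
import math


def weighted_mode(ratios, weights, bandwidth, grid_size=2001):
    """Location of the peak of a weighted Gaussian kernel density."""
    lo, hi = min(ratios), max(ratios)
    best_x, best_density = lo, -1.0
    for i in range(grid_size):
        x = lo + (hi - lo) * i / (grid_size - 1)
        # Unnormalised density: each estimate contributes a Gaussian bump
        # scaled by its weight (normalisation does not move the peak).
        density = sum(
            w * math.exp(-0.5 * ((x - r) / bandwidth) ** 2)
            for r, w in zip(ratios, weights)
        )
        if density > best_density:
            best_x, best_density = x, density
    return best_x


# Three estimates cluster near 0.5; two are scattered far away.
mode = weighted_mode([0.48, 0.50, 0.52, 2.0, -1.0], [1.0] * 5, bandwidth=0.05)
```

Because the estimate follows the largest cluster rather than the average, it can remain close to the clustered value even when fewer than half of the instruments agree, provided the invalid instruments do not cluster among themselves.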
To assess the reliability of this study, we performed horizontal pleiotropy, heterogeneity, and sensitivity analyses. Horizontal pleiotropy was assessed using the MR-PRESSO test[21] and the MR-Egger intercept test[22]. Cochran's Q test was used to detect heterogeneity among the SNP-specific causal estimates[17]. When the p value of Cochran's Q test was less than 0.05, we used the IVW random effects model; otherwise, we used the IVW fixed effects model.
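The heterogeneity-based choice between the two IVW models can be sketched as follows. The Python snippet computes Cochran's Q around the IVW estimate and obtains a p value from a chi-square distribution with (number of SNPs − 1) degrees of freedom, using a small series expansion so that no external library is needed. The helper names and data are hypothetical, and the series is only intended for moderate values of Q.

```python
import math


def chi2_sf(x, df):
    """Upper tail of the chi-square distribution via the series for the
    regularised lower incomplete gamma function (adequate for moderate x;
    very small p values lose precision but remain below any usual cutoff)."""
    a, z = df / 2.0, x / 2.0
    if z <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def cochran_q(beta_exposure, beta_outcome, se_outcome):
    """Return (Q, degrees of freedom, p value) around the IVW estimate."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [(bx / se) ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    return q, df, chi2_sf(q, df)


def choose_ivw_model(p_value, alpha=0.05):
    """Decision rule from the text: random effects if heterogeneity is detected."""
    return "random" if p_value < alpha else "fixed"


# Hypothetical SNP sets: one homogeneous, one with a discordant third SNP.
se = [0.01, 0.01, 0.01]
_, _, p_homogeneous = cochran_q([0.1, 0.2, 0.3], [0.05, 0.10, 0.15], se)
q_het, df_het, p_heterogeneous = cochran_q([0.1, 0.2, 0.3], [0.05, 0.10, 0.30], se)
```

With these inputs the first set leads to the fixed effects model and the second to the random effects model.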
Moreover, we performed a leave-one-out analysis to evaluate whether any single SNP drove a significant association. In addition, we used forest plots, funnel plots, and scatter plots for an initial assessment of bias and publication bias in the study results. The results of these tests are shown in Supplementary Figures S1-S20.
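The leave-one-out procedure amounts to re-estimating the causal effect once per SNP with that SNP removed. The Python sketch below uses a fixed-effect IVW estimate as the re-estimation step, which is an assumption made for illustration; the function names and data are hypothetical.

```python
def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW point estimate from summary statistics."""
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den


def leave_one_out(beta_exposure, beta_outcome, se_outcome):
    """IVW estimate recomputed with each SNP omitted in turn."""
    n = len(beta_exposure)
    estimates = []
    for omitted in range(n):
        keep = [j for j in range(n) if j != omitted]
        estimates.append(ivw_estimate(
            [beta_exposure[j] for j in keep],
            [beta_outcome[j] for j in keep],
            [se_outcome[j] for j in keep],
        ))
    return estimates


# Hypothetical data: the fourth SNP has a much larger ratio (2.0) than the rest (0.5).
bx = [0.1, 0.2, 0.3, 0.2]
by = [0.05, 0.10, 0.15, 0.40]
se = [0.01, 0.01, 0.01, 0.01]
full = ivw_estimate(bx, by, se)
loo = leave_one_out(bx, by, se)
```

If omitting one SNP shifts the estimate markedly, as happens for the fourth SNP in this example, that SNP is likely driving the overall association.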
All analyses were carried out in R (version 4.3.2; https://www.R-project.org) with the TwoSampleMR (version 0.5.8), MendelianRandomization (version 0.8.0), and MRPRESSO (version 1.0) packages.