3.1 Mendelian randomized design
In our study, we utilized a two-sample Mendelian randomization (MR) analysis to evaluate the impact of 731 immune cell characteristics (categorized into seven groups) on colorectal cancer (CRC). The MR analysis employed genetic variants as instrumental variables, and the validity of our study rested on three core assumptions. Firstly, the correlation assumption posits that genetic variants strongly correlate with the exposure being studied. Secondly, the independence assumption suggests that genetic variants are not influenced by confounding factors that may mediate the relationship between exposure and outcome. Lastly, the exclusion-restriction belief maintains that genetic variants only impact the result through the exposure being studied [13].
3.2 GWAS data on the characterization of 731 immune cells
The GWAS Public Catalog now possesses genetic data associated with 731 immune cell traits. It's worth noting that the most comprehensive reports available to date are numbered GCST0001391 through GCST0002121. These reports provide detailed information on genetic loci related to resistant cell characteristics, including absolute cell (AC) counts (n = 118), median fluorescence intensity (MFI) indicating surface antigen levels (n = 389), morphometric parameters [MPs] (n = 32), and relative cell (RC) counts (n = 192). The B cell, CDC, T cell maturation stage, monocyte, myeloid, TBNK (T cell, B cell, natural killer cell), and Treg panels contain MFI, AC, and RC features, while the CDC and TBNK panels have the MP feature[14]. The first Genome-Wide Association Study (GWAS) on immune characterization used data from 3757 European individuals. This dataset was unique and did not overlap with other cohorts. To create a reference panel, Sardinian sequences were used. High-density arrays were used to genotype roughly 22 million SNPs. After adjusting for factors like sex, age, and age2, tests for correlation were conducted[15].
3.3 GWAS data on colorectal cancer
GWAS data for colorectal cancer were obtained from the FinnGen database (finn-b-C3_COLORECTAL). Binary et al. analyzed approximately 16 million variants after quality control and estimation by performing a GWAS of 3022 European cases and 215770 European controls. The above GWAS data were used for causal analysis.
3.4 Selection of Genetic Instrumental Variables (IVs)
The GWAS analysis produced 731 IVs for immune cell phenotypic characterization. To ensure the quality of the data, we implemented various measures to screen for IV-type genes that met the MR hypothesis. We aimed to achieve genome-wide significance for the immune cell phenotype GWAS data for the screened IVs while obtaining sufficient IVs to improve statistical power. Therefore, we set the p-value threshold for immune cell-characterized IVs at p < 5E-6. We established a significance threshold of p < 5E-8 for the GWAS data of CRC. We also set p-value thresholds for the screened IVs at p < 5E-8 for both immune cell characterization and CRC. We used a window size of 10,000 kb and a threshold of 0.001 for the LD clustering algorithm. We harmonized the exposure and outcome datasets to ensure accuracy, removing SNPs with intermediate allele frequencies and ambiguous SNPs with incongruent alleles. To assess the strength of the IV, we approximated the F-statistic for each SNP using a specific equation.
$$F=\frac{N-K-1}{K}\times \frac{{R}^{2}}{1-{R}^{2}}$$
In the dataset for exposure, N represents the sample size, K represents the number of SNPs, and R^2 indicates the proportion of variation explained by iv [16]. Any iv with F-statistics below ten will be excluded from MR analysis as they are considered weak instruments. Only SNPs passing stringent screening will be selected as IV for further studies.
3.5 Mendelian randomization analysis
For statistical analysis and data visualization, we used R programming software (version R4.2.3). To ensure the robustness of the study, we used three complementary methods for MR analysis: the Inverse variance weighted (IVW) method, the MR-Egger method, and the Weighted-median method. The IVW method is the primary causal estimation method, which provides the most accurate results when all selected SNPs are valid IVs[17]. The MR-Egger method is a causal inference technique that can produce reliable estimates, assuming that the instrumental variables (IVs) are independent of the direct effects. This method is robust to invalid genetic IVs but has lower precision and can be affected by peripheral genetic variation[18]. In comparison to the MR-Egger method, the weighted median method is a more effective way to estimate causality. This is because it calculates the weighted median of the Wald ratio estimates, and is able to handle horizontal pleiotropic bias without relying on the InSIDE assumption. Additionally, it has a lower type I error rate[19]. We determined that there is a nominal causal relationship between exposure and outcome if the IVW analysis result shows a P-value of less than 0.05, and the direction of the OR for the three MR analysis methods is consistent. To establish a significant causal relationship, we considered a P-value of less than 6.84E-5 (0.05/731) for the IVW method using Bonferroni correction[20].
3.6 Multivariate MR analysis and inverse MR analysis
Our team conducted both multivariate MR analysis [21] and univariate reverse MR analysis to examine the causal effects of immunophenotypes on CRC. Multivariate MR analysis allows us to simultaneously identify the causal effects of multiple risk factors [22] while accounting for the relationships between different immune cell phenotypic characteristics to ensure reliable results. We utilized the regression-based IVW method for our multivariate MR analysis, using combinations of IVs from various exposures[21]. To further validate our findings, we performed reverse MR analyses on immunophenotypes with significant causal associations to confirm the directionality of causality.
3.7 Sensitivity analysis
Conducting sensitivity analysis is crucial in evaluating heterogeneity and horizontal pleiotropy, which can potentially compromise the integrity of MR analysis. Horizontal pleiotropy occurs when the instrumental variables (IVs) affect the outcome through pathways other than the exposure. To ensure the accuracy of our results, we employed various methods in this study. These included the Cochran Q test, the MR-Egger intercept test, and the MR-Pleiotropy RESidual Sum and Outliers (MR-PRESSO) test, all used to detect and correct for heterogeneity and horizontal pleiotropy. Heterogeneity was established when the Cochran Q test yielded a result of p < 0.05 [23]. We also calculated the MR-Egger intercept to detect any bias resulting from IV nullification [24], and finally, we used MR-PRESSO to verify the study for horizontal pleiotropy [25]. Additionally, we performed leave-one-out (LOO) analyses to assess the robustness of the results by discarding each SNP in turn and then performing MR analyses[24]. We also refined the Steiger test to determine the directionality of causality.
The methods described above were supported by various R packages, including "TwoSampleMR" (Version 0.5.7), "MendelianRandomization" (Version 0.9.0), and "MRPRESSO" (Version 0.9.0) (PMID: 24114802). Forest plots were generated using the "forestploter" R package (Version 2.0.1). The "forestploter" R package (version 1.1.1) generates a forest plot. Use the "circlize" R package to create a Ring heatmap.