Study Design
MR analysis should satisfy the following three assumptions [20]: (1) the selected instrumental variables are strongly associated with the exposure (pQTL and eQTL); (2) instrumental variables are not related to any potential confounding factors; (3) instrumental variables only influence the outcome through the exposure and not through any other pathways, there is no genetic pleiotropy. The overall study design is illustrated in Fig. 1.
Data Source
eQTL and pQTL data were obtained from the study by Zheng et al. and selected as instrumental variables [21]. The eQTL data were sourced from the GTEx V8 and eQTLGen consortium [22, 23]. The GTEx V8 database comprises 838 donors and 15,201 samples, covering 49 tissues or cell types. The eQTLGen consortium collected blood samples from 31,684 individuals across 37 datasets. The pQTL dataset was derived from six aggregated datasets [24, 25], specifically including 3,301 healthy participants from the INTERVAL study, involving 50,000 blood donors from 25 centers in England[26]; 1,000 individuals from Germany[27]; 3,394 European participants [28]; 7,333 participants from the Birmingham Cardiovascular Risk Study[29]; 5,457 individuals aged 65 and older from Iceland [30]; and 7,213 European American participants from the Atherosclerosis Risk in Communities study[25].
Pancreatic cancer GWAS summary statistics data were obtained from the FinnGen consortium R10 version (https://r10.finngen.fi/) and used as the outcome instrument. In this study, "Malignant neoplasm of pancreas (excluding all tumors)" was employed as the phenotype. The GWAS data sample consisted of 315,819 individuals, including 1,626 cases of pancreatic cancer and 314,193 controls. The exposed and outcome groups were predominantly composed of individuals of European descent to avoid potential biases due to ethnic differences, with little to no apparent sample overlap between the exposed and outcome groups.
This study involves reanalyzing previously collected and published data, so institutional review board ethical approval is not required.
Instrumental Variable SNP Selection
By filtering SNPs in the pQTL and eQTL data with P < 5x10− 8 and calculating F values > 10, SNPs were selected to meet the strong correlation with pancreatic cancer. The instrumental variables in the eQTL data exhibit minimal linkage disequilibrium (LD r2 < 0.001). For the pQTL instrumental variables, out of 5,326 instrumental variables, 272 (5.1%) show low to moderate linkage disequilibrium (r2 between 0.2 and 0.6).
Mendelian Randomization
This study primarily employed inverse variance weighted (IVW) and Wald ratio to evaluate the relationship between plasma proteins, mRNA, and pancreatic cancer. MR-Egger and Weighted Median methods were used to complement and test the robustness of MR results because these three calculation methods have different assumptions about the effectiveness of instrumental variables, with IVW demonstrating higher efficiency than the other two MR methods. IVW is a weighted linear regression model that assumes all genetic variants are valid. It calculates the causal effect estimate of individual instrumental variables using the ratio method and combines each estimate through weighted linear regression to obtain the overall causal effect estimate[31]. In cases where only a single instrumental variable is available, the Wald ratio is used to calculate the MR estimate for each SNP. MR-Egger regression analysis can identify and control for bias caused by horizontal pleiotropy. Even if all SNPs are invalid, as long as the association of a single SNP with the exposure is independent of the corresponding horizontal pleiotropy, MR-Egger regression analysis can provide an effective estimate where the slope represents the causal effect estimate of exposure on the outcome[32]. The main difference between MR-Egger and IVW is considering the intercept term in regression. The Weighted Median method utilizes the intermediate effects of all available genetic variants, obtaining estimates by inversely weighting each SNP's correlation with the outcome and requiring at least 50% of the weight to be contributed by effective genetic variants[33]. If the estimates from these three methods are consistent in direction, it suggests the reliability of the MR analysis. In inconsistent directions, a reassessment of the MR analysis is needed after narrowing down significant P-values[34].
Sensitivity Analysis
Sensitivity analysis is one of the methods used to assess the reliability of MR study results. This study utilized Cochran’s Q test and MR-Egger regression analysis. Cochran’s Q test was employed to quantify the heterogeneity in the magnitude of effects generated by the selected genetic instrumental variables, with P < 0.05 indicating the presence of heterogeneity. When the intercept of the MR-Egger regression is non-zero and P < 0.05, it suggests the presence of instrumental variable pleiotropy[35].
Enrichment Analysis
In order to investigate the functional characteristics and biological relevance of genes, we conducted Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. GO analysis compares gene sets with predefined functional categories, such as cellular components (CC), molecular functions (MF), and biological processes (BP), to identify enriched functional categories, thus aiding in understanding the main functional modules and biological processes involved in the gene set[36]. KEGG enrichment analysis compares gene sets with metabolic pathways, signaling pathways, and other pathways in the KEGG database to determine whether specific pathways are enriched in certain diseases or biological processes. This helps reveal the association between gene sets and specific pathways, furthering our understanding of their roles in disease development or biological processes[37]. Systematic analysis of the functional and pathway enrichment information of gene sets can provide insights into the functions and interactions of gene sets in biological processes and diseases, offering important clues and guidance for subsequent experimental design and mechanistic studies. The Database for Annotation, Visualization, and Integrated Discovery (DAVID) is a powerful online tool for functional annotation and analysis of gene and protein data [38]. In this study, we utilized DAVID for GO and KEGG enrichment analysis.
Protein-Protein Interaction Network Construction
We used the online tool STRING to construct a Protein-Protein Interaction (PPI) network analysis of significant MR results to explore potential interactions among the identified genes. We imported the STRING result file into Cytoscape for visualization. Subsequently, we used the cytoHubba plugin to identify key genes in the network further to investigate the importance of these genes in biological processes.
Statistical Analysis
All MR analysis and enrichment analysis visualizations were performed using R software (version 4.2.1). The MR analysis utilized the TwoSampleMR package (version 0.5.6). Regarding statistical significance, a two-sided p-value less than 0.05 was considered significant.
Instrument Variable Selection
A total of 39,630 eQTL instrumental variables from 16,059 transcriptomes and 5,326 pQTL instrumental variables from 1,608 plasma proteins were included in the MR analysis. For the pQTL instrumental variables, the selected SNP numbers ranged from 1 to 3, with F values exceeding 10 (ranging from 10 to 8146). In the eQTL instrumental variables, the selected SNP numbers ranged from 1 to 581, with F statistics greater than 10 (ranging from 29 to 850), indicating an effective reduction of bias caused by weak instrumental variables. Please refer to Supplementary Table S1-2 for detailed information on the instrumental variables.
MR Results
Using volcano plots, we visualized significant MR results for the causal relationships between pQTL, eQTL, and pancreatic cancer (Fig. 2, Supplementary Table S3-4). IVW and Wald ratio in MR analysis identified 88 proteins causally associated with pancreatic cancer. Among these, an increase in the abundance of 44 proteins was significantly associated with increased pancreatic cancer risk. In comparison, an increase in the abundance of another 44 proteins was significantly associated with decreased pancreatic cancer risk (Fig. 2A).
Through MR analysis of tissue or cell mRNA expression eQTLs, we determined causal relationships between 811 genes and pancreatic cancer risk, with 407 showing positive causal relationships and 404 showing negative causal relationships (Fig. 2B).
Comparing the MR results in forest plot for overlapping genes (Fig. 3), we found that genes such as IRF3, HAGH, FGF2, PILRA, DTD2, IDUA, CD248, and AMY2B exhibit significant causal effects on pancreatic cancer at both transcriptional and protein levels in blood, tissue, and cells. Specifically (Fig. 4), HAGH, FGF2, DTD2, IDUA, and CD248 show consistent directional effects on pancreatic cancer between protein and mRNA levels.
However, IRF3, PILRA, and AMY2B show some differences in their effects on pancreatic cancer between protein and mRNA levels. For instance, an increase in protein levels of PILRA and AMY2B is associated with increased pancreatic cancer risk. In contrast, a corresponding increase in mRNA levels is associated with decreased pancreatic cancer risk. This inconsistency may reflect the complexity of the regulatory mechanisms of genes at transcriptional and protein levels, requiring further research to elucidate their roles in the pathogenesis of pancreatic cancer.
Sensitivity Analysis
Cochran's Q test P-values quantify the degree of heterogeneity between genes. If the P-value is < 0.05, it indicates the presence of heterogeneity. According to Cochran's Q test results, the MR analysis of the 88 genes in pQTL did not show heterogeneity (all P-values > 0.05). In the eQTL analysis, except for EPHB4 showing heterogeneity (P = 0.013), heterogeneity did not affect the rest of the genes. Because the random-effects IVW method was used in this experiment, heterogeneity could be accepted.
MR-Egger regression test is used to detect horizontal pleiotropy, as the presence of horizontal pleiotropy may make MR results less robust, thus affecting conclusions. If the test for horizontal pleiotropy is significant, it indicates that the instrumental variable is related to the exposure and other confounding factors. However, in this study, MR-Egger regression indicates the absence of horizontal pleiotropy (P > 0.05).
Tables 1 and 2 show the sensitivity results of pQTL and eQTL intersection genes. Supplementary Table S5-8 shows detailed sensitivity analysis results.
Table 1
Sensitivity analysis of MR analysis on the causal effects of intersecting genes of pQTL on pancreatic cancer.
Intersection genes | Cochran’s Q test | MR-egger |
Q | P -value | intercept | P- value |
IRF3 | 0.167 | 0.920 | 0.037 | 0.789 |
HAGH | 1.957 | 0.581 | -0.486 | 0.375 |
FGF2 | / | / | / | / |
PILRA | 0.002 | 0.999 | 0.002 | 0.983 |
DTD2 | 1.370 | 0.504 | 0.112 | 0.481 |
IDUA | 10.716 | 0.635 | -0.018 | 0.636 |
CD248 | / | / | / | / |
AMY2B | 10.996 | 0.529 | -0.066 | 0.169 |
Table 2
Sensitivity analysis of MR analysis on the causal effects of intersecting genes of eQTL on pancreatic cancer.
Intersection genes | Cochran Q test | MR-egger |
Q | P value | intercept | P value |
IRF3 | 0.391 | 0.532 | / | / |
HAGH | 5.164 | 0.523 | -0.005 | 0.866 |
FGF2 | / | / | / | / |
PILRA | 1.464 | 0.691 | 0.048 | 0.444 |
DTD2 | / | / | / | / |
IDUA | 0.061 | 0.970 | 0.004 | 0.963 |
CD248 | 2.107 | 0.349 | -0.093 | 0.531 |
AMY2B | 0.098 | 0.992 | 0.016 | 0.910 |
GO and KEGG Enrichment Analysis
Based on the results of GO and KEGG enrichment analysis using DAVID software for the 88 genes with causal pQTL and 811 genes with eQTL, it was found that the pQTL were enriched in 296 biological processes (BP), 34 cellular components (CC), 43 molecular functions (MF), and 14 pathways. Similarly, the eQTL was enriched in 184 BP, 50 CC, 37 MF, and 13 pathways (Supplementary Table S9-10).
As shown in Fig. 5A, the 88 genes of pQTL are mainly enriched in BP, such as cell adhesion, cellular metabolism, and response to stimuli in the BP category. In the CC category, they are mainly enriched in the extracellular matrix, endoplasmic reticulum, Golgi apparatus, and cytoplasmic vesicles. In the MF category, they are mainly enriched in binding to various substances, such as receptors, calcium ions, metal ions, protein complexes, macromolecular complexes, and cell adhesion molecules. Four significantly enriched (P < 0.05) KEGG pathways were identified, involving metabolic pathways, nucleotide metabolism, pancreatic secretion, purine metabolism, and the PI3K-Akt signaling pathway.
Figure 5B displays that the 811 genes of eQTL are primarily involved in BP, such as intracellular and extracellular protein transport, localization, and cellular metabolism. The CC category mainly involves the Golgi apparatus, endoplasmic reticulum, mitochondria, and cellular vesicles. Regarding MF, these genes are mainly associated with binding substances such as protein kinases, ions, phospholipids, nucleotides, and enzymes. The KEGG enrichment analysis significantly enriched metabolism and signaling pathways, including metabolic pathways, amino sugar and nucleotide sugar metabolism, ketone body metabolism, MAPK signaling, and Wnt signaling pathways.
PPI Network Analysis
The genes corresponding to the 88 significant proteins and the 811 significant genes were separately input into the STRING database to construct PPI networks, and the PPI network results were saved in TSV text format. Subsequently, Cytoscape was used to visualize these networks and identify the top 8 core genes.
In Figs. 6A and 6B, the PPI network constructed from the 88 genes of pQTL is displayed, comprising 60 nodes and 80 edges. Further analysis using the cytoHubba plugin in Cytoscape software identified eight core genes with strong interactions, namely FN1, FGF2, KITLG, CBL, PRKCA, ASPN, COL6A1, and NT5E.
Additionally, the PPI network constructed from the 811 genes of pQTL consists of 501 nodes and 1282 edges, as shown in Figs. 6C and 6D. After analysis, the top 8 core genes with strong interactions were determined to be RPS19, RPS18, EIF4A1, PABPC1, EIF4G1, EIF3C, HSPA4, and H4C6.