BioWinfordMR: An Online Platform for Comprehensive Mendelian Randomization Analysis

doi:10.21203/rs.3.rs-4609267/v1





This work is licensed under a CC BY 4.0 License


Background

Mendelian randomization (MR) has become a widely used approach for inferring causal relationships between phenotypes from genetic data. It has played an important role in identifying disease risk factors, drug targets, and other applications. However, GWAS data often come from different platforms, with inconsistent formats, missing fields, and large files that are difficult to download. We therefore developed BioWinfordMR, a Shiny Server-based platform that integrates curated GWAS data from multiple categories and automates Mendelian randomization analysis.

Results

We used the BioWinfordMR platform to infer causal relationships among the gut microbiota, immune cells, and sepsis. Through systematic analysis, we found that CD62L- CD86+ myeloid DCs are a key intermediate factor through which the gut microbiota (Enterobacteriaceae) increases the risk of sepsis. We further identified two risk genes associated with sepsis, ENTPD5 and MANEA.

Conclusions

We developed BioWinfordMR to facilitate various types of Mendelian randomization analyses. BioWinfordMR currently comprises 3792 curated GWASs and is updated regularly. It supports accurate and reproducible Mendelian randomization analysis and facilitates the discovery of potential causal relationships.

Keywords: Mendelian randomization; Sepsis; Gut microbiota; Immune cells; Causal inference; GWAS

Inferring causal relationships between phenotypes is a significant challenge with important implications for understanding the genetic basis of disease. With the development of large-scale genome-wide association studies (GWASs), phenome-wide causal inference has improved substantially over the last decade. Millions of individuals with different phenotypes have been genotyped or sequenced, and associations between genetic variants and phenotypes have been tested statistically in large populations. For a single phenotype, significant single nucleotide polymorphisms (SNPs) can be selected directly from association statistics for drug target research or for interpreting disease mechanisms. Across phenotypes, SNPs can also be used as instruments to identify causal relationships[1]. To facilitate such causal inference, many statistical models based on Mendelian randomization (MR) principles have been developed for GWAS summary data[2–5].

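MR mimics the design of randomized controlled trials (RCTs) by using SNPs as proxies for an exposure[2, 5]: because alleles are randomly allocated at conception, genotype groups are expected to be largely free of the confounding that affects observational studies. Each SNP serves as an instrumental variable (IV). Typically, a disease phenotype is treated as the outcome and the putative causal factor as the exposure. The causal effect of the exposure on the outcome is then estimated from the relationship between the SNP-exposure and SNP-outcome effect sizes of the instruments, taken from two GWASs.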
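The estimation described above can be made concrete with the per-SNP Wald ratio (SNP-outcome effect divided by SNP-exposure effect) and the inverse-variance-weighted (IVW) estimate that pools these ratios across instruments. The Python sketch below is a minimal, first-order illustration with made-up numbers and hypothetical function names; it is not BioWinfordMR's implementation, which relies on the TwoSampleMR R package, and it reports only the fixed-effect standard error.

```python
import math

def wald_ratio(beta_exp, beta_out, se_out):
    """Per-SNP causal estimate (Wald ratio) with a first-order standard error."""
    ratio = beta_out / beta_exp
    se = se_out / abs(beta_exp)
    return ratio, se

def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate across SNPs.

    Equivalent to a weighted regression of beta_out on beta_exp through the
    origin with weights 1 / se_out**2 (effects assumed already harmonized).
    """
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out))
    return num / den, math.sqrt(1.0 / den)

# Made-up summary statistics for three instruments (each Wald ratio is 0.5).
bx = [0.10, 0.08, 0.12]
by = [0.05, 0.04, 0.06]
sy = [0.01, 0.01, 0.02]
est, se = ivw(bx, by, sy)
print(f"IVW beta = {est:.3f}, OR = {math.exp(est):.3f}, SE = {se:.3f}")
```

For a binary outcome such as sepsis, the IVW beta is on the log-odds scale, so exponentiating it gives the odds ratios reported in the tables below.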

However, accessing GWAS data for MR analysis can be challenging, and MR methods can be difficult for nonspecialists. Because many researchers lack statistical and programming expertise, MR analysis poses three main difficulties. 1) Acquisition of GWAS data. Although some databases provide APIs that allow users to load data online through accession IDs, unstable network connections can cause incomplete downloads and erroneous results that may bias the conclusions of an entire study. 2) Data preprocessing. The largest public GWAS resources currently include OpenGWAS[6], the GWAS Catalog[7], UK Biobank[8], and FinnGen[9]. However, GWAS data from these resources do not share a consistent format, so preprocessing data from different sources is a considerable challenge for analysts. 3) Inadequate computing resources. A typical GWAS contains approximately ten million SNP records, which conventional laptops can barely handle, and MR analyses involving multiple GWASs require even more computing resources. This is impractical for teams and individuals without access to local or cloud servers.

To meet the growing need for systematic curation and application of GWAS summary data and MR methods, we introduce BioWinfordMR (https://biowinford.site:3838/BioWinfordMR), a platform that integrates almost seven thousand GWAS datasets with a user-friendly web interface for interactive visualization and automated causal inference through MR. All analyses are implemented in R. Users upload data, set parameters through the interactive interface, and then submit the task.

Using the gut microbiota and sepsis as a practical example, we show how the functional modules of BioWinfordMR support an analysis. Many studies have shown that the occurrence and development of sepsis are closely related to imbalances in the gut microbiota[10, 11]. Gut microbiota imbalance can induce sepsis through disruption of the intestinal mucosal barrier and mucosal immune function and through bacterial translocation[12]. Moreover, it can trigger a strong inflammatory immune response, disrupting the body's immune environment and leading to multiorgan dysfunction[13]. We therefore used the TwoSampleMR module to identify gut microbiota and immune cell traits causally associated with sepsis, the Mediator MR module to screen for significant pathways from the gut microbiota to sepsis mediated by immune cells, and the SMR and colocalization modules to probe potential drug targets via expression and protein quantitative trait loci (eQTLs and pQTLs). This example illustrates how the platform simplifies GWAS data preprocessing and analysis, enabling insights that were previously technically and practically difficult to obtain.

Data Acquisition

We collected GWAS data from recent high-profile studies, including data on the gut microbiota, skin microbiota, oral microbiota, cytokines, immune cells, metabolites, blood cells, mitochondria, and the lipidome. Detailed information on each category of GWAS dataset is shown in Table 1.

Table 1

Description of collected GWAS data
GWAS category	number of files	reference	cohort
gut211	211	PMID: 33462485	18,340 individuals
gut412	412	PMID: 35115690	7,738 Dutch people
gut418	418	/	/
cytokine41	41	PMID: 33491305	8,293 Finnish individuals
cytokine91	91	PMID: 37563310	14,824 participants
blood cell	15	PMID: 37578112	562,243 participants
lipidome	179	PMID: 37907536	7,174 Finnish individuals
immune cell	731	PMID: 32929287	3,757 Sardinians
serum metabolites	1400	PMID: 36635386	8,299 Canadians
skin_Popgen	147	PMID: 36261456	273 Popgen individuals
skin_KORA	147	PMID: 36261456	324 KORA individuals
oral microbiome	3117	PMID: 34873157	over 1,915 individuals

The 211 gut microbiota GWASs were retrieved from the MiBioGen consortium (accessions GCST90016908 to GCST90017118)[14]. The 412 gut microbiota GWASs were obtained from a study by Esteban et al. and include 207 taxa and 205 pathways representing microbial composition and function (GCST90027446-GCST90027857)[15]. We also defined a new category, gut418, consisting of the 211 MiBioGen GWASs and the 207 taxa GWASs from Esteban et al.; gut418 can therefore be considered a more comprehensive collection of gut microbiota traits. The 41 cytokine GWASs were obtained from 8,293 Finnish individuals[16]; this study combined results from the Cardiovascular Risk in Young Finns Study (YFS) and the FINRISK surveys. The GWAS data for 91 plasma cytokines were obtained from a recent study by Zhao et al. (GCST90274758-GCST90274848)[17]. Fifteen blood cell GWASs were obtained from the Blood Cell Consortium (BCX) meta-analysis, which included more than 560,000 participants[18, 19]. Of these 15 GWASs, 7 relate to red blood cells, 6 to white blood cells, and 2 to platelets. We collected 179 lipidome GWASs in 7,174 Finnish individuals from Ottensmann et al. (GCST90277238-GCST90277416)[20]. The 1,400 serum metabolite GWASs comprise genome-wide association studies of 1,091 blood metabolites and 309 metabolite ratios[21]. A total of 731 immune cell GWASs were retrieved from a study by Orrù et al. (GCST90001391-GCST90002121)[22]. The skin microbiota GWASs were retrieved from Lucas et al. (GCST90133165-GCST90133310); because that study was performed in two population-based German cohorts, we separated the data into two categories[23]. The oral metagenome GWASs, covering the tongue dorsum (n = 2017) and saliva (n = 1915), were obtained from Liu et al.[24].

All GWAS data underwent a data-cleaning procedure and were converted into the same input format. Data cleaning has two steps. First, we check the data format to ensure that essential variables are present, such as SNP, effect_allele, other_allele, beta, standard error (SE), P-value, sample size, effect allele frequency (EAF), trait, and accession ID. If the data already include SNP rsIDs, we rearrange and rename columns to generate a standard input file. Second, if the data lack rsIDs, the BioWinfordMR platform maps the chromosome (chr) and position coordinates to the dbSNP database, using the matching reference genome build, to obtain rsIDs. No records are filtered out during data cleaning; instead, users can set criteria in the parameter interface to filter SNPs for downstream analysis by P-value, LD clumping, and F statistic.
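A minimal sketch of the two cleaning steps, assuming each GWAS row arrives as a Python dict: column names are mapped onto one standard set, required columns are checked once per file, and missing rsIDs are looked up from (chromosome, position). The alias table, the function names, and the `dbsnp_lookup` dictionary (with its placeholder rsID) are all hypothetical; a real pipeline would build the lookup from a dbSNP release matching the GWAS reference build. As in the text, no rows are dropped: a SNP without a match simply keeps an empty rsID.

```python
# Hypothetical alias table mapping column names seen in different GWAS
# releases onto one standard set; real sources vary, so extend as needed.
COLUMN_ALIASES = {
    "rsid": "SNP", "variant_id": "SNP", "snp": "SNP",
    "effect_allele": "effect_allele", "ea": "effect_allele", "a1": "effect_allele",
    "other_allele": "other_allele", "nea": "other_allele", "a2": "other_allele",
    "beta": "beta", "standard_error": "se", "se": "se",
    "p_value": "pval", "pval": "pval", "p": "pval",
    "effect_allele_frequency": "eaf", "eaf": "eaf",
    "n": "samplesize", "sample_size": "samplesize",
    "chromosome": "chr", "chr": "chr", "base_pair_location": "pos", "pos": "pos",
}

def check_columns(columns):
    """Raise if a file lacks the core columns (rsID may be replaced by chr + pos)."""
    cols = {COLUMN_ALIASES.get(c.lower(), c) for c in columns}
    missing = {"effect_allele", "other_allele", "beta", "se", "pval"} - cols
    if "SNP" not in cols and not {"chr", "pos"} <= cols:
        missing.add("SNP (or chr + pos)")
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")

def standardize(record):
    """Rename the keys of one row to the standard column names."""
    return {COLUMN_ALIASES.get(k.lower(), k): v for k, v in record.items()}

def fill_rsid(record, dbsnp_lookup):
    """Fill a missing rsID from (chr, pos); unmatched SNPs keep None."""
    if not record.get("SNP") and "chr" in record and "pos" in record:
        record["SNP"] = dbsnp_lookup.get((str(record["chr"]), int(record["pos"])))
    return record

def clean(rows, dbsnp_lookup):
    check_columns(rows[0].keys())
    return [fill_rsid(standardize(r), dbsnp_lookup) for r in rows]

rows = [{"CHROMOSOME": "1", "BASE_PAIR_LOCATION": "12345", "EA": "A", "NEA": "G",
         "BETA": 0.1, "SE": 0.02, "P": 1e-9}]
lookup = {("1", 12345): "rs0000001"}  # placeholder entry, not a real dbSNP record
print(clean(rows, lookup))
```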

TwoSampleMR

The core step of MR analysis is estimating the causal effect between two phenotypes from SNP instruments. The causal inference procedure is implemented with the TwoSampleMR R package[25]. Users first submit GWAS files or accessions for the exposure and outcome to the TwoSampleMR module and then set parameters such as the P-value threshold, clump_kb, and clump_r2, which default to 5e-8, 10000, and 0.001, respectively. With clump_kb and clump_r2 set, PLINK is used to clump SNPs, removing those in linkage disequilibrium with a more significant SNP[26]. The F statistic of each SNP is then calculated with Eq. 1, where N is the number of participants, k is the number of IVs, and \(R^{2}\) is the proportion of variance in the exposure explained by the IVs.

\(F = \frac{N-k-1}{k} \times \frac{R^{2}}{1-R^{2}}\)   (Eq. 1)
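Eq. 1 can be evaluated directly once \(R^{2}\) is known. The sketch below assumes a common single-SNP approximation, R² ≈ 2·EAF·(1 - EAF)·β², which holds when β is expressed per standard deviation of the exposure; the SNP names, betas, and frequencies are made up, and N is the gut211 cohort size from Table 1 used only as an example. With k = 1, Eq. 1 gives the per-SNP F statistic that is compared with the threshold of 10. Many MR pipelines instead use the simpler approximation F ≈ β²/SE².

```python
def f_statistic(n, k, r2):
    """Eq. 1: F = (N - k - 1) / k * R^2 / (1 - R^2)."""
    return (n - k - 1) / k * r2 / (1 - r2)

def r2_single_snp(beta, eaf):
    """Approximate exposure variance explained by one SNP (assumes beta is
    per SD of the exposure): R^2 ~= 2 * EAF * (1 - EAF) * beta^2."""
    return 2 * eaf * (1 - eaf) * beta ** 2

# Made-up instruments: (id, beta, effect allele frequency).
snps = [("snp_a", 0.05, 0.30), ("snp_b", 0.02, 0.10)]
N = 18340  # example sample size (gut211 cohort in Table 1)

strong = [sid for sid, beta, eaf in snps
          if f_statistic(N, 1, r2_single_snp(beta, eaf)) > 10]
print(strong)
```

Here snp_a has F of roughly 19 and is kept, whereas snp_b has F of roughly 1.3 and is discarded as a weak instrument.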

Instrument SNPs with F statistics greater than 10 are retained for harmonization. The exposure and outcome data are then harmonized: effect alleles are aligned between the two datasets, and ambiguous palindromic SNPs are removed. Finally, heterogeneity, pleiotropy, and directionality are estimated with the TwoSampleMR package, and uncorrelated and correlated horizontal pleiotropy are further assessed with the MR-PRESSO[27] and CAUSE[28] algorithms, respectively. Because many GWAS datasets are stored locally, bidirectional MR analysis can also be run to detect potential reverse causality.
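The allele-alignment logic of harmonization can be sketched for a single SNP as follows. This is a simplified illustration with a hypothetical function name: it keeps the outcome effect when alleles match, flips its sign when the effect and other alleles are swapped, also tries the complementary strand, and drops every palindromic (A/T or C/G) SNP outright. Dropping all palindromes corresponds to the most conservative option of TwoSampleMR's harmonise_data (action = 3); its default instead tries to resolve palindromic SNPs using allele frequencies, which is not shown here.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_palindromic(a1, a2):
    """A/T and C/G SNPs look identical on both strands."""
    return COMPLEMENT.get(a1) == a2

def harmonise_snp(ea_x, oa_x, ea_y, oa_y, beta_y):
    """Return the outcome beta expressed for the exposure effect allele,
    or None if the SNP should be dropped."""
    ea_x, oa_x, ea_y, oa_y = (a.upper() for a in (ea_x, oa_x, ea_y, oa_y))
    if is_palindromic(ea_x, oa_x):
        return None  # strand cannot be resolved from alleles alone
    # Try the outcome alleles as given, then on the opposite strand.
    for y1, y2 in ((ea_y, oa_y), (COMPLEMENT.get(ea_y), COMPLEMENT.get(oa_y))):
        if (y1, y2) == (ea_x, oa_x):
            return beta_y   # same effect allele
        if (y1, y2) == (oa_x, ea_x):
            return -beta_y  # effect allele swapped: flip the sign
    return None  # incompatible alleles

print(harmonise_snp("A", "G", "G", "A", 0.2))  # swapped alleles
print(harmonise_snp("A", "T", "A", "T", 0.2))  # palindromic, dropped
```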

MVMR

The TwoSampleMR module estimates the total effect of a single exposure on an outcome. To separate effects that act directly on the outcome from those that act through other (mediating) variables, we provide a multivariable MR (MVMR) module[29]. MVMR uses instrumental variables associated with multiple exposures to estimate the effect of each exposure on a single outcome[30]. Users can enter multiple GWAS IDs to query the IEU OpenGWAS (ieugwas) API, but because connections to the ieugwas server can be unstable, we recommend uploading local files. The platform supports vcf.gz, tsv.gz, csv.gz, and other plain-text input formats. Multiple exposure datasets are harmonized with the outcome using the mv_harmonise_data function, and the mv_multiple function then performs the MVMR analysis. Because multiple exposures may be collinear, we also provide an optional feature selection step: when it is enabled, the module automatically applies the mv_lasso_feature_selection function to screen exposures[31].
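Conceptually, the multivariable IVW estimate is a weighted regression, without intercept, of the SNP-outcome effects on all SNP-exposure effects at once, so each coefficient is the direct effect of one exposure conditional on the others. The NumPy sketch below illustrates that regression under this assumption; it is not a reimplementation of mv_multiple, omits standard errors and lasso feature selection, and uses made-up data generated with direct effects of 0.3 and -0.2.

```python
import numpy as np

def mvmr_ivw(beta_exp, beta_out, se_out):
    """Multivariable IVW: regress outcome betas on all exposure betas jointly
    (no intercept), weighting each SNP by 1 / se_out**2.

    beta_exp: shape (n_snps, n_exposures); beta_out, se_out: shape (n_snps,).
    Returns one direct-effect estimate per exposure.
    """
    X = np.asarray(beta_exp, dtype=float)
    y = np.asarray(beta_out, dtype=float)
    sqrt_w = 1.0 / np.asarray(se_out, dtype=float)  # square root of 1 / se^2
    # Weighted least squares equals ordinary least squares on rows scaled by sqrt(w).
    coef, *_ = np.linalg.lstsq(X * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef

# Made-up instruments for two exposures (columns) across four SNPs (rows).
B_exp = np.array([[0.10, 0.02], [0.05, 0.08], [0.12, 0.01], [0.03, 0.09]])
b_out = B_exp @ np.array([0.3, -0.2])
se_out = np.array([0.01, 0.02, 0.01, 0.02])
print(mvmr_ivw(B_exp, b_out, se_out))
```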

Mediator MR

Some exposures affect an outcome not directly but through mediators. Two mainstream methods are currently used to analyze mediating effects. The first is based on MVMR[29]: the mediator and exposure are analyzed together to estimate their separate (direct) effects on the outcome, and the TwoSampleMR method is used to estimate the total effect of the exposure on the outcome; comparing the two quantifies the mediating effect. The second is the two-step method. It first estimates the total effect of the exposure on the outcome, then separately estimates the effect of the exposure on the mediator and the effect of the mediator on the outcome, and finally calculates the mediating (indirect) effect from these estimates[32]. Both approaches are supported on the BioWinfordMR platform.
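For the two-step method, the indirect effect is commonly computed as the product of the exposure-to-mediator and mediator-to-outcome estimates. The sketch below assumes this product-of-coefficients approach with a delta-method standard error and obtains the direct effect by the difference method (total minus indirect); the function name and input values are hypothetical, and the platform's exact formulas may differ.

```python
import math

def two_step_mediation(beta_xm, se_xm, beta_my, se_my, beta_total):
    """Product-of-coefficients mediation estimate for two-step MR.

    beta_xm: exposure -> mediator effect; beta_my: mediator -> outcome effect;
    beta_total: univariable exposure -> outcome effect. Effects are on the
    beta or log-odds scale.
    """
    indirect = beta_xm * beta_my
    # Delta-method standard error of a product of two independent estimates.
    se_ind = math.sqrt(beta_my ** 2 * se_xm ** 2 + beta_xm ** 2 * se_my ** 2)
    z = indirect / se_ind
    return {
        "indirect": indirect,
        "se_indirect": se_ind,
        "p_indirect": math.erfc(abs(z) / math.sqrt(2)),  # two-sided normal P
        "direct": beta_total - indirect,                 # difference method
        "proportion_mediated": indirect / beta_total,
    }

res = two_step_mediation(beta_xm=0.2, se_xm=0.05, beta_my=0.5, se_my=0.1, beta_total=0.25)
print(res)
```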

Colocalization

Bayesian colocalization analysis estimates the probability that two phenotypes share the same causal variant[33]. Colocalization is performed with the coloc R package (https://github.com/chr1swallace/coloc), which gives posterior probabilities for five hypotheses: H0, neither phenotype is associated with the region; H1 and H2, only the first or only the second phenotype is associated; H3, both phenotypes are associated through different causal variants; and H4, both phenotypes are associated through a shared causal variant. PPH3 and PPH4 denote the posterior probabilities of H3 and H4. Generally, a PPH4 greater than 0.65 is taken to indicate a potential shared causal mechanism between the two phenotypes[34]. LocusCompare is used to visualize the colocalization results.
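The posterior probabilities can be sketched from per-SNP summary statistics with Wakefield approximate Bayes factors and the single-causal-variant model used by coloc's coloc.abf. The Python version below is an illustration, not the coloc code: it assumes both traits are measured on the same, identically ordered SNPs (at least two), uses coloc's default priors (p1 = p2 = 1e-4, p12 = 1e-5) and a prior effect SD of 0.15, and runs on made-up data with one strong signal per trait.

```python
import math

def log_abf(beta, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for one SNP; prior_sd = 0.15
    mirrors coloc's default for quantitative traits (0.2 for case-control)."""
    v, w = se ** 2, prior_sd ** 2
    r = w / (v + w)
    z = beta / se
    return 0.5 * (math.log(1 - r) + r * z ** 2)

def logsum(xs):
    """log(sum(exp(x))) computed stably."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def logdiff(a, b):
    """log(exp(a) - exp(b)); -inf if the difference underflows to <= 0."""
    d = -math.expm1(b - a)
    return a + math.log(d) if d > 0 else -math.inf

def coloc_pp(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log ABFs of two traits."""
    s1, s2 = logsum(l1), logsum(l2)
    s12 = logsum([a + b for a, b in zip(l1, l2)])
    lh = [
        0.0,                                                  # H0: no association
        math.log(p1) + s1,                                    # H1: trait 1 only
        math.log(p2) + s2,                                    # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiff(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                  # H4: shared variant
    ]
    total = logsum(lh)
    return [math.exp(x - total) for x in lh]

def labf(betas, se=0.03):
    return [log_abf(b, se) for b in betas]

shared = coloc_pp(labf([0.3, 0.0, 0.0]), labf([0.3, 0.0, 0.0]))
distinct = coloc_pp(labf([0.3, 0.0, 0.0]), labf([0.0, 0.0, 0.3]))
print(f"shared: PP.H4 = {shared[4]:.3f}; distinct: PP.H3 = {distinct[3]:.3f}")
```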

To help users run the Coloc module, we provide three options. Option 1) Users provide one GWAS and choose the other from our built-in QTL data, which comprise eQTLs, pQTLs, and mQTLs from the MRInstruments R package (https://github.com/MRCIEU/MRInstruments). Option 2) Users enter a gene name, and the module automatically retrieves the gene's genomic position from the hg19 version of the GENCODE[35] database and performs colocalization within that region. Option 3) Users provide two GWAS files and specify a genomic region, and the module performs colocalization of the two GWAS datasets within that region.
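Option 2 amounts to turning a gene name into a genomic window and keeping only the SNPs inside it before colocalization. The sketch below assumes a hypothetical in-memory annotation table (GENE_COORDS, holding placeholder coordinates rather than real GENCODE values) and a hypothetical flank of 100 kb on each side; the window size the platform actually uses is not stated here.

```python
# Hypothetical annotation: gene -> (chromosome, start, end) on hg19.
# Placeholder values only; real coordinates would be parsed from a GENCODE GTF.
GENE_COORDS = {"GENE_X": ("3", 1_000_000, 1_040_000)}

def gene_region(gene, flank=100_000):
    """Return (chrom, start, end) of the gene padded by `flank` bp on each side."""
    chrom, start, end = GENE_COORDS[gene]
    return chrom, max(1, start - flank), end + flank

def snps_in_region(snps, region):
    """Keep SNP records (dicts with 'chr' and 'pos') that fall inside the region."""
    chrom, lo, hi = region
    return [s for s in snps if str(s["chr"]) == chrom and lo <= s["pos"] <= hi]

snps = [
    {"SNP": "snp1", "chr": "3", "pos": 950_000},
    {"SNP": "snp2", "chr": "3", "pos": 1_200_000},
    {"SNP": "snp3", "chr": "5", "pos": 1_000_500},
]
region = gene_region("GENE_X")
kept = [s["SNP"] for s in snps_in_region(snps, region)]
print(region, kept)
```

Both GWAS datasets would be subset to the same window in this way, and only SNPs present in both would be passed to the colocalization step.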

SMR

In the SMR module, eQTL data from the version 8 release of the Genotype-Tissue Expression (GTEx) project are used as IVs for gene expression[36]. The eQTL data cover 49 tissues, including brain, heart, and whole blood. The mQTL data were retrieved from McRae et al.[37], and the pQTL data from deCODE Genetics[38].

The SMR module tests whether the effect of SNPs on the outcome phenotype is mediated by gene expression, protein abundance, or DNA methylation. The heterogeneity in dependent instruments (HEIDI) test is applied to distinguish pleiotropy from linkage; a HEIDI P-value greater than 0.05 is taken to indicate no significant heterogeneity[39]. P-values are adjusted with the Benjamini-Hochberg procedure to control the false discovery rate[40].
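The SMR statistic for a top cis-QTL combines the z-scores of the SNP-exposure (QTL) and SNP-outcome (GWAS) associations and, under the null, is approximately chi-square with one degree of freedom. The sketch below follows the approximate test statistic described by Zhu et al. (2016) for the SMR method and pairs it with a plain Benjamini-Hochberg adjustment. The HEIDI test, which needs LD information across many SNPs, is not reproduced, and all names and values are illustrative.

```python
import math

def smr_test(b_zx, se_zx, b_zy, se_zy):
    """SMR estimate and approximate test for one top cis-QTL SNP.

    b_zx/se_zx: SNP effect on expression or protein (QTL data);
    b_zy/se_zy: SNP effect on the trait (GWAS data).
    """
    z_zx, z_zy = b_zx / se_zx, b_zy / se_zy
    b_xy = b_zy / b_zx  # effect of expression on the trait
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    p_smr = math.erfc(math.sqrt(t_smr / 2))  # upper tail of chi-square(1)
    return b_xy, t_smr, p_smr

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

b_xy, t_smr, p_smr = smr_test(0.5, 0.05, 0.1, 0.02)
print(b_xy, t_smr, p_smr)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```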

Outcome acquisition

In this study, we focused on sepsis as the outcome phenotype, studying the causal effects of the gut microbiota and immune cells on sepsis to identify potential drug targets. Sepsis outcome data were obtained from the GWAS Catalog (accession GCST90044692)[41]. This dataset comprises 1,573 cases and 454,775 controls of European ancestry. After preprocessing, the sepsis outcome data included 11,831,932 SNPs.

Gut microbiota-related exposures via bidirectional MR

In this study, we selected 418 gut microbiota traits from the MR Omics module as exposures, with sepsis as the outcome. Default parameters (P-value < 5e-8, clump_r2 = 0.001, clump_kb = 10000) were used to select SNPs as instrumental variables. The F statistic of each SNP was estimated, and SNPs with F statistics greater than 10 were retained as strong instruments. All exposure-related SNP information is provided in Supplementary Material Table S1.

We applied five models for the TwoSampleMR analysis: MR-IVW, MR-Egger, weighted median, simple median, and weighted mode. The significance of the association between each gut microbiota trait and sepsis risk was evaluated from the MR-IVW P-value, and MR-Egger was used to assess pleiotropy. A positive result was considered significant only if the effect direction was consistent across at least three methods. The positive findings are shown in Fig. 1A, and the distribution of beta values and P-values for all microbial exposures is shown in Fig. 1B.
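Two of the five models and the direction rule can be sketched in a few lines. The weighted median below follows the interpolation scheme of Bowden et al. (2016), with first-order weights equal to the inverse variance of each Wald ratio; the simple median is the same function with equal weights. The weighted mode and MR-Egger are not shown. The direction check encodes one reading of the "consistent for at least three methods" rule (IVW counted as one of the three), which is an assumption; it is applied here to the Enterobacteriaceae odds ratios from Table 2. All function names are hypothetical.

```python
import math

def weighted_median(ratios, weights):
    """Weighted median of per-SNP ratio estimates, interpolating at the 50th
    percentile of the mid-point cumulative normalized weights."""
    pairs = sorted(zip(ratios, weights))
    b = [r for r, _ in pairs]
    w = [wt for _, wt in pairs]
    total = sum(w)
    cum, running = [], 0.0
    for wt in w:
        cum.append((running + 0.5 * wt) / total)
        running += wt
    if cum[0] >= 0.5:  # one SNP carries at least half of the weight
        return b[0]
    below = max(i for i, c in enumerate(cum) if c < 0.5)
    frac = (0.5 - cum[below]) / (cum[below + 1] - cum[below])
    return b[below] + frac * (b[below + 1] - b[below])

def median_estimates(beta_exp, beta_out, se_out):
    """(weighted median, simple median) from harmonized summary statistics."""
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    weights = [(bx / s) ** 2 for bx, s in zip(beta_exp, se_out)]  # 1 / var(ratio)
    return weighted_median(ratios, weights), weighted_median(ratios, [1.0] * len(ratios))

def direction_consistent(odds_ratios, min_agree=3):
    """True if at least `min_agree` methods (IVW included) share the IVW direction."""
    ivw_up = math.log(odds_ratios["IVW"]) > 0
    return sum((math.log(v) > 0) == ivw_up for v in odds_ratios.values()) >= min_agree

# Odds ratios for family Enterobacteriaceae from Table 2.
entero = {"IVW": 1.917, "Simple median": 1.626, "Weighted median": 1.603,
          "Weighted mode": 1.504, "MR Egger": 0.812}
print(direction_consistent(entero))
```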

Based on an MR-IVW P-value less than 0.05 and consistent results across at least three models, we identified five gut microbiota traits associated with sepsis. No heterogeneity was found among these five exposures (Q_pval > 0.05; Supplementary Material Table S2). Pleiotropy analysis with the MR-Egger and MR-PRESSO algorithms indicated no horizontal pleiotropy (Supplementary Material Tables S3 and S4). The MR results for the five significant gut microbiota traits are shown in Table 2, and the complete MR results are in Supplementary Material Table S5.
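The heterogeneity check (Q_pval) is typically based on Cochran's Q statistic of the per-SNP Wald ratios around the IVW estimate, compared against a chi-square distribution with (number of SNPs - 1) degrees of freedom. The sketch below is a standard-library illustration with hypothetical names and made-up inputs; the chi-square tail is computed in closed form for integer degrees of freedom, so no statistics package is needed, and TwoSampleMR's mr_heterogeneity may differ in details.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_{i=1}^{(df-1)/2} (x/2)^(i-1/2) / Gamma(i+1/2)
    tail = sum(half ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * tail

def cochran_q(beta_exp, beta_out, se_out):
    """Cochran's Q of per-SNP Wald ratios around the fixed-effect IVW estimate."""
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    weights = [(bx / s) ** 2 for bx, s in zip(beta_exp, se_out)]  # 1 / var(ratio)
    ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - ivw) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    return q, df, chi2_sf(q, df)

q, df, q_pval = cochran_q([0.10, 0.08, 0.12, 0.09], [0.05, 0.03, 0.07, 0.04],
                          [0.01, 0.01, 0.02, 0.01])
print(f"Q = {q:.2f}, df = {df}, Q_pval = {q_pval:.3f}")
```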

Table 2

MR results for the gut microbiota traits with IVW P < 0.05
exposure	method	nsnp	OR	pval
family.Enterobacteriaceae.id.3469	IVW	4	1.917	0.037
family.Enterobacteriaceae.id.3469	Simple median	4	1.626	0.196
family.Enterobacteriaceae.id.3469	Weighted median	4	1.603	0.25
family.Enterobacteriaceae.id.3469	Weighted mode	4	1.504	0.47
family.Enterobacteriaceae.id.3469	MR Egger	4	0.812	0.898
genus.Veillonella.id.2198	IVW	6	0.498	0.001
genus.Veillonella.id.2198	Weighted median	6	0.445	0.002
genus.Veillonella.id.2198	Simple median	6	0.44	0.002
genus.Veillonella.id.2198	Weighted mode	6	0.422	0.082
genus.Veillonella.id.2198	MR Egger	6	0.017	0.415
order.Enterobacteriales.id.3468	IVW	4	1.917	0.037
order.Enterobacteriales.id.3468	Simple median	4	1.626	0.212
order.Enterobacteriales.id.3468	Weighted median	4	1.603	0.224
order.Enterobacteriales.id.3468	Weighted mode	4	1.504	0.456
order.Enterobacteriales.id.3468	MR Egger	4	0.812	0.898
phylum.Verrucomicrobia.id.3982	IVW	8	1.601	0.028
phylum.Verrucomicrobia.id.3982	Simple median	8	1.567	0.105
phylum.Verrucomicrobia.id.3982	Weighted median	8	1.54	0.141
phylum.Verrucomicrobia.id.3982	Weighted mode	8	1.5	0.408
phylum.Verrucomicrobia.id.3982	MR Egger	8	0.948	0.932
Lachnospiraceae_bacterium_7_1_58FAA	IVW	8	0.571	<0.001
Lachnospiraceae_bacterium_7_1_58FAA	Simple median	8	0.534	0.002
Lachnospiraceae_bacterium_7_1_58FAA	Weighted median	8	0.544	0.002
Lachnospiraceae_bacterium_7_1_58FAA	Weighted mode	8	0.524	0.104
Lachnospiraceae_bacterium_7_1_58FAA	MR Egger	8	0.524	0.617

To further examine the causal relationship between sepsis and the gut microbiota, we conducted a reverse analysis with sepsis as the exposure and the gut microbiota as the outcome. The results are depicted in Fig. 2C. In the forward analysis (Table 2), Veillonella and Lachnospiraceae were negatively associated with sepsis, whereas Enterobacteriaceae, Enterobacteriales, and Verrucomicrobia were positively associated. In the reverse analysis, sepsis was associated with 13 gut microbiota traits (IVW P < 0.05; Table 3), none of which overlaps with these five exposures.

Table 3

Results of reverse MR analysis using sepsis as an exposure
outcome	method	nsnp	OR	pval
genus.Eubacteriumbrachygroup.id.11296	IVW	5	0.862	0.008
genus.Parabacteroides.id.954	IVW	5	1.074	0.008
Eubacterium_ventriosum	IVW	4	1.197	0.018
Prevotella_copri	IVW	4	0.889	0.021
genus.CandidatusSoleaferrea.id.11350	IVW	5	0.894	0.023
Alistipes_sp_AP11	IVW	4	0.853	0.025
Desulfovibrio_piger	IVW	4	1.177	0.03
family.Porphyromonadaceae.id.943	IVW	5	1.064	0.037
genus.Oscillospira.id.2064	IVW	5	0.922	0.04
genus.Flavonifractor.id.2059	IVW	5	0.933	0.041
Eggerthella	IVW	4	1.19	0.044
Alistipes_senegalensis	IVW	3	0.894	0.046
genus.unknowngenus.id.2071	IVW	5	1.064	0.047

Integrating the results of the bidirectional analysis, we did not find any gut microbiota exposures that exhibited a reciprocal causal relationship. Thus, we ultimately identified 5 gut microbiota exposures with causal relationships with sepsis.

Immune-related exposures via bidirectional MR

In this study, we selected 731 immune-related traits from the MR Omics module as exposures, with sepsis as the outcome. The selection criteria for IVs and the analysis process were consistent with those for the gut microbiota. The IV information can be found in Supplementary Material Table S6. A positive result was considered significant only if the effect direction was consistent in at least three models. Subsequently, the analysis yielded a forest plot of the positive findings, as shown in Fig. 2A. The distributions of beta values and P-value for all immune cell exposures are illustrated in Fig. 2B.

Based on an MR-IVW P-value less than 0.05 and consistent OR directions in at least three models, we identified 38 immune traits associated with sepsis. MR heterogeneity testing showed no heterogeneity in any of the 38 exposures (Q_pval > 0.05; Supplementary Material Table S7). Pleiotropy analysis with the MR-Egger and MR-PRESSO algorithms indicated no horizontal pleiotropy (Supplementary Material Tables S8 and S9).

Results with P-values less than 0.01 are presented in Table 4, and the complete MR results are in Supplementary Material Table S10. Traits involving CD19 and CD8, as well as lymphocyte %leukocyte, were positively associated with sepsis, whereas traits involving CD62L, CD45, and HLA-DR were negatively associated with sepsis.

Table 4

MR results of the top immune traits with P values < 0.01
exposure	method	nsnp	OR	pval
CD19 on switched memory B-cell	MR IVW	16	1.23	0.001
CD62L on CD62L+ plasmacytoid Dendritic Cell	MR IVW	10	0.844	0.002
Lymphocyte %leukocyte	MR IVW	12	1.208	0.003
CD45 on lymphocyte	MR IVW	12	0.853	0.004
CD19 on IgD- CD38dim B-cell	MR IVW	14	1.202	0.006
HLA DR on HLA DR+ T-cell	MR IVW	6	0.731	0.007
CD8dim Natural Killer T Absolute Count	MR IVW	18	1.206	0.008
Effector Memory CD8+ T-cell Absolute Count	MR IVW	14	1.141	0.008

To further clarify the causal relationship between sepsis and immune cells, we conducted a reverse analysis with sepsis as the exposure and immune cells as the outcome. The results are depicted in Fig. 2C. Reverse MR analysis indicated that sepsis is negatively associated with CD11c and CD45 traits and positively associated with CD4, CD19, NK cell, and HLA-DR traits (Table 5).

Table 5

Results of reverse MR analysis using sepsis as an exposure
outcome	method	nsnp	OR	pval
CD45 on HLA DR+ CD4+	MR IVW	8	0.884	0.01
CD11c on myeloid DC	MR IVW	8	0.899	0.032
CD19 on PB/PC	MR IVW	8	1.097	0.035
CD45 on HLA DR+ T-cell	MR IVW	8	0.904	0.036
CD19 on CD20-	MR IVW	8	1.095	0.038
CD4 on naive CD4+	MR IVW	8	1.117	0.039
NK %lymphocyte	MR IVW	8	1.089	0.045
HLA DR on myeloid DC	MR IVW	8	1.103	0.049

According to the results of the bidirectional analysis, we did not find any reciprocal causal relationships between immune cells and sepsis. Therefore, we ultimately identified 38 immune cell exposures with a causal relationship with sepsis.

Mediator analysis

To further elucidate how gut microbiota disruption affects the immune system and exacerbates sepsis, we conducted a two-step mediator analysis. The gut microbiota, immune cells, and sepsis status were treated as the exposure, mediator, and outcome, respectively. The Mediator_MR module on the BioWinfordMR platform was utilized with default parameters (P-value < 5e-8, clump_kb = 10000, clump_r2 = 0.001), and the analysis results are illustrated in Fig. 3.

Figure 3 shows the results of the mediator analysis, with the gut microbiota as the exposure, immune cells as the mediator, and sepsis as the outcome. The two-step procedure identified a pathway in which the effect of Enterobacteriaceae on sepsis is mediated by CD62L- CD86+ myeloid DCs. The figure shows the effects from the exposure to the mediator and from the mediator to the outcome, together with the total and direct effects of the exposure on the outcome. Both the total effect and the indirect effect are statistically significant (P < 0.05), whereas the direct effect becomes nonsignificant once the mediating effect is removed, indicating complete mediation.

To validate the role of CD62L- CD86+ myeloid DCs in sepsis, we used flow cytometry to measure the infiltration of CD62L- CD86+ DCs in the spleen and lung tissues of mice in the control and sepsis groups. The proportion of CD62L- CD86+ DCs was significantly higher in the sepsis group than in the control group (Supplementary Material Figure S1).

Disease-related genes via SMR

We further used the SMR module to identify genes potentially associated with sepsis, specifying whole blood as the tissue and using eQTL data as the exposure. Based on the criteria P_SMR < 0.01 and P_HEIDI > 0.05, we identified 40 genes significantly associated with sepsis (Supplementary Material Table S11). Using pQTL data, we identified 18 sepsis-related proteins (Supplementary Material Table S12). Two genes, ENTPD5 and MANEA, were significant in both the eQTL and pQTL analyses (Fig. 4).

Figures 4A-B display the SMRplot results for ENTPD5 and MANEA at the eQTL level, while Figs. 4C-D present the SMRplot results for ENTPD5 and MANEA at the pQTL level. These findings indicate that ENTPD5 and MANEA are significantly associated with sepsis at both the gene expression and protein expression levels.

Candidate drug target screening via colocalization

To further investigate whether sepsis shares a causal variant with the candidate genes discovered by the SMR algorithm, we conducted colocalization analysis using the coloc module on the BioWinfordMR platform. The eQTL GWAS data for the two candidate genes were retrieved from OpenGWAS. The colocalization region was specified using option 2, which automates target gene location from GENCODE. Subsequently, within the target gene region, we conducted a colocalization analysis between sepsis and the eQTL GWAS data of each gene. The results are illustrated in Fig. 5.

Figure 5 Comparative analysis between sepsis and candidate genes. (A) Locus plot for sepsis and ENTPD5. The lead variant is colored purple, and all other variants are colored according to their r2 value. Recombination rate peaks are plotted in blue, and all genes in the flanking region of ENTPD5 are plotted at the bottom. (B) Locus comparison plot for ENTPD5. The y-axis represents -log10 P-values, and the x-axis represents the genomic region. Variants are colored by their r2 value, and the risk variant is labeled and colored purple. (C) Locus plot for sepsis and MANEA. (D) Locus comparison plot for MANEA.

Figure 5A-B presents the colocalization and locus comparison results for sepsis and ENTPD5, and Figure 5C-D presents those for sepsis and MANEA. The PP.H4.abf values for colocalization with sepsis were 0.67 for ENTPD5 and 0.81 for MANEA. Because a PP.H4.abf value greater than 0.65 is generally taken to indicate colocalization between two GWASs[34], both ENTPD5 and MANEA are likely to share causal variants with sepsis.

To validate the analysis results of the BioWinfordMR platform experimentally, we measured the expression of the two candidate target genes, ENTPD5 and MANEA, by real-time PCR and found that both were significantly upregulated in the sepsis group. The real-time PCR results are included in Supplementary Material Figure S1.

Genetic variants not only underlie differences between individuals but have also been identified as risk factors for many disease phenotypes[42]. As public GWAS summary data grow, more and more genetic relationships between disease phenotypes and genetic variants are being revealed. MR is an effective method for inferring causality between phenotypes. However, while GWAS data offer many analytical opportunities, their analysis poses significant challenges in three main respects.

First, public GWAS data come from diverse resources, such as OpenGWAS, the UK Biobank, the GWAS Catalog, and FinnGen, whose data formats differ substantially. Although OpenGWAS allows users to fetch some data through an API, network congestion or large requests can lead to data loss. Preprocessing GWAS data also demands considerable proficiency in statistical programming to reformat text data, and the sheer volume of GWAS data requires substantial computational and storage resources. Second, the MR analysis process involves numerous parameters and processing steps, which makes it difficult to ensure the reliability and reproducibility of causal inference. There are no standardized criteria for filtering SNPs by allele frequency or F statistic, which can lead to inconsistent conclusions across analyses, even of the same data. Third, uncompressed text-format GWAS files typically range from hundreds of megabytes to several gigabytes. MR analyses require at least two GWAS datasets (exposure and outcome), and some, such as MVMR, require several GWAS files simultaneously, demanding more computational resources than a regular laptop can provide. MR also places significant demands on storage; for instance, a bidirectional analysis of 731 immune cell traits requires 131 GB of storage.

We developed a platform containing over 6000 preprocessed GWAS summary datasets and numerous statistical modules to facilitate systematic causal inference. It supports causal inference from GWAS summary data in three ways. First, BioWinfordMR incorporates extensive preprocessed GWAS data that users can access directly, and it offers a data cleaning module that automatically preprocesses data from various sources. Second, BioWinfordMR automates cutting-edge MR methods with user-customized parameters, improving the reliability and reproducibility of causal inference. Third, BioWinfordMR runs on a server with 16 CPU cores, 64 GB of memory, and 8 TB of storage to meet the substantial computational and storage demands of GWAS data.

In an applied example, we used the BioWinfordMR platform to explore whether immune cells mediate the effect of the gut microbiota on sepsis. Through batch analysis using the MRomics module, we identified five gut microbiota and 38 immune cell exposures significantly associated with sepsis. A subsequent two-step analysis revealed two complete mediation pathways. Our analysis indicated that Enterobacteriaceae increase the abundance of CD62L- CD86+ myeloid DCs, which in turn increases the risk of sepsis. Previous studies have suggested a potential connection between Enterobacteriaceae, myeloid DCs, and sepsis, but none has established the genetic pathway linking them through MR mediator analysis[43, 44]. Our MR mediator analysis provides genetic evidence that Enterobacteriaceae may activate myeloid DCs and thereby increase the risk of sepsis.

Furthermore, we used the SMR algorithm to select candidate genes whose expression (eQTL) or protein levels (pQTL) were significantly associated with sepsis. We then validated these candidates through colocalization analysis, ultimately identifying causal variants shared between sepsis and two candidate genes, ENTPD5 and MANEA. ENTPD5 encodes an enzyme involved in purinergic signaling and metabolism that hydrolyzes nucleoside triphosphates and diphosphates, thereby affecting cellular processes such as proliferation, differentiation, and survival. Mutations in ENTPD5 have been linked to certain cancers and infectious diseases [45, 46]. Recent work has shown that ENTPD5 promotes renal injury in both human patients and mouse models: Xu et al. reported that ENTPD5 is expressed mainly in the renal tubules and that its expression is altered in mice and patients with kidney injury [47]. MANEA, in turn, encodes the Golgi endo-α-1,2-mannosidase, which plays a key role in N-glycan processing. The structure of MANEA has inspired the development of new inhibitors that disrupt the N-glycan processing of viral proteins and reduce viral infectivity in cellular models [48].
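For readers less familiar with SMR, its core test [39] fits in a few lines of Python. With the top cis-QTL SNP as the instrument, the gene's effect on the trait is the Wald ratio b_xy = b_zy / b_zx, and the statistic T_SMR = z_zx² z_zy² / (z_zx² + z_zy²) approximately follows a χ² distribution with one degree of freedom. This sketch omits the HEIDI test that SMR uses to distinguish pleiotropy from linkage, and the summary statistics in the usage lines are made up.

```python
import math


def smr_test(b_zx, se_zx, b_zy, se_zy):
    """SMR estimate and test for a single top cis-QTL instrument.

    b_zx, se_zx: SNP effect on gene expression or protein level (eQTL/pQTL data)
    b_zy, se_zy: SNP effect on the trait (GWAS data)
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx  # Wald ratio: effect of the gene on the trait
    t_smr = (z_zx**2 * z_zy**2) / (z_zx**2 + z_zy**2)  # approx. chi-square, 1 df
    p = math.erfc(math.sqrt(t_smr / 2))  # upper tail of chi-square(1)
    se_xy = abs(b_xy) / math.sqrt(t_smr)  # SE implied by T_SMR
    return b_xy, se_xy, t_smr, p


# Made-up summary statistics: eQTL z = 10, GWAS z = 4.
b_xy, se_xy, t_smr, p = smr_test(b_zx=0.5, se_zx=0.05, b_zy=0.04, se_zy=0.01)
print(f"b_xy={b_xy:.3f}, se={se_xy:.4f}, T_SMR={t_smr:.2f}, p={p:.2e}")
```

Because T_SMR can never exceed the smaller of z_zx² and z_zy², a weak QTL or a weak GWAS signal caps the power of the test.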

As Mendelian randomization methods have advanced, many packages for MR analysis have become available. However, most of these publicly available R packages address specific tasks, and online platforms such as BioWinfordMR that integrate rich MR functionality with data remain scarce. The most widely used MR platform is currently MR-Base (https://app.mrbase.org/) [25], so we compared our platform with it. First, MR-Base supports only data from the OpenGWAS database. In contrast, BioWinfordMR not only retrieves data from OpenGWAS through an API but also includes over 7000 locally processed omics GWAS datasets, a number that continues to grow as the platform evolves. Second, although MR-Base allows users to upload local files, it accepts only preprocessed plain text files. In addition to plain text files, BioWinfordMR accepts VCF files as direct input. MR-Base also cannot analyze GWAS data with missing rsIDs, whereas BioWinfordMR can convert genomic coordinates into rsIDs. These functions expand the range of supported input files and make it easier for users to run MR analyses on local files. Finally, MR-Base can conduct only two-sample MR analysis and lacks more complex analyses such as multivariable MR, mediation analysis, SMR, and colocalization, so users must perform these analyses step by step with the corresponding R packages. BioWinfordMR incorporates the major functions of MR-Base and extends them with more diverse functionality, allowing it to better meet users' analytical needs.
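Converting coordinates into rsIDs is essentially a lookup keyed on chromosome, position, and alleles against a reference panel built on the same genome assembly as the GWAS file. The Python sketch below assumes such a reference is available as an in-memory dictionary (a stand-in for, say, a dbSNP-derived table); the `fill_rsids` helper, its keys, and the `rs111` entry are hypothetical, and the sketch is not BioWinfordMR's implementation. Matching on an unordered allele pair tolerates swapped effect and other alleles but not strand flips. It requires Python 3.9 or later for `str.removeprefix`.

```python
def fill_rsids(rows, reference):
    """Fill in missing rsIDs by looking up (chromosome, position, allele pair).

    rows:      dicts with keys chrom, pos, ea, oa and optionally rsid
    reference: dict mapping (chrom, pos, frozenset of two alleles) -> rsid;
               it must use the same genome build as the GWAS file.
    Returns (matched, unmatched) so that unmatched variants can be reported.
    """
    matched, unmatched = [], []
    for row in rows:
        if row.get("rsid"):
            matched.append(row)
            continue
        chrom = str(row["chrom"]).removeprefix("chr")  # accept "chr1" or "1"
        alleles = frozenset({row["ea"].upper(), row["oa"].upper()})
        rsid = reference.get((chrom, int(row["pos"]), alleles))
        if rsid is None:
            unmatched.append(row)
        else:
            matched.append({**row, "rsid": rsid})
    return matched, unmatched


# Hypothetical reference entry; real tables hold millions of variants.
reference = {("1", 12345, frozenset({"A", "G"})): "rs111"}
rows = [
    {"chrom": "chr1", "pos": 12345, "ea": "g", "oa": "a"},
    {"chrom": 2, "pos": 999, "ea": "C", "oa": "T"},
]
matched, unmatched = fill_rsids(rows, reference)
print(matched[0]["rsid"], len(unmatched))  # rs111 1
```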

The BioWinfordMR platform has several strengths, outlined below.

1. Extensive Repository of Preprocessed GWAS Data

BioWinfordMR currently hosts nearly 7000 locally stored, preprocessed GWAS datasets, which users can access directly by ID without re-downloading them from the original resource. For raw data from other sources, the platform offers an automated formatting module that standardizes GWAS data into a unified format.
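An automated formatting step of this kind essentially maps source-specific column names onto one schema and coerces types. The Python sketch below shows the idea with the standard `csv` module; the alias table, the target column names (`SNP`, `effect_allele`, `beta`, and so on), and the FinnGen-style sample header are illustrative assumptions, not the platform's actual unified format.

```python
import csv
import io

# Hypothetical header aliases mapped onto an illustrative unified schema.
# Which allele a header such as "a1" denotes differs between tools, so a
# real mapping has to be checked for each source.
ALIASES = {
    "SNP": ["snp", "rsid", "rsids", "variant_id", "markername"],
    "effect_allele": ["effect_allele", "ea", "alt", "a1"],
    "other_allele": ["other_allele", "oa", "ref", "a2"],
    "beta": ["beta", "b", "effect"],
    "se": ["se", "sebeta", "standard_error", "stderr"],
    "pval": ["pval", "p", "p_value", "pvalue"],
    "eaf": ["eaf", "af_alt", "effect_allele_freq", "freq"],
}
REQUIRED = ["SNP", "effect_allele", "other_allele", "beta", "se"]
NUMERIC = {"beta", "se", "pval", "eaf"}
MISSING_TOKENS = {"", "NA", "NaN", "nan", "."}


def convert(column, value):
    """Coerce one field to the type expected by the unified schema."""
    if value is None:  # short row: csv.DictReader fills absent fields with None
        return None
    if column in NUMERIC:
        return None if value in MISSING_TOKENS else float(value)
    if column in ("effect_allele", "other_allele"):
        return value.upper()
    return value


def standardize(text, delimiter="\t"):
    """Parse delimited GWAS text and return rows keyed by unified column names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    if reader.fieldnames is None:
        raise ValueError("empty input")
    by_lower = {name.strip().lower(): name for name in reader.fieldnames}
    mapping = {}
    for column, candidates in ALIASES.items():
        for candidate in candidates:
            if candidate in by_lower:
                mapping[column] = by_lower[candidate]
                break
    absent = [c for c in REQUIRED if c not in mapping]
    if absent:
        raise ValueError(f"required columns not found: {absent}")
    return [
        {column: convert(column, record[source]) for column, source in mapping.items()}
        for record in reader
    ]


# A FinnGen-style header (column names assumed for illustration).
raw = "rsids\tref\talt\tbeta\tsebeta\tpval\taf_alt\nrs123\tA\tg\t0.1\t0.02\t1e-6\t0.3\n"
print(standardize(raw))
```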

2. Efficient Execution of MR Analysis

Running on a large server with substantial computational resources, BioWinfordMR can process data in parallel across multiple threads, greatly improving efficiency and reducing processing time. With the default setting of four threads, an MR analysis involving 731 immune cell traits was completed in approximately 17 minutes, whereas a single-threaded analysis on a laptop took approximately 1.5 hours. The platform presents results interactively as graphs and tables, and users can adjust parameters and see the results update in real time within the interface.
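Per-exposure analyses in a batch run are independent of one another, which is what makes them easy to parallelize. The Python sketch below distributes hypothetical exposures over a pool of four workers with `concurrent.futures.ThreadPoolExecutor`; the per-exposure function is a simplified fixed-effect inverse-variance weighted (IVW) estimate standing in for a full MR run, and none of this is the platform's own code.

```python
from concurrent.futures import ThreadPoolExecutor


def ivw_estimate(instruments):
    """Fixed-effect IVW estimate from (beta_exposure, beta_outcome, se_outcome) tuples."""
    numerator = sum(bx * by / sy**2 for bx, by, sy in instruments)
    denominator = sum(bx**2 / sy**2 for bx, _, sy in instruments)
    return numerator / denominator


def run_batch(exposures, workers=4):
    """Estimate one causal effect per exposure in parallel, keeping input order."""
    names = list(exposures)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        estimates = pool.map(ivw_estimate, (exposures[name] for name in names))
        return dict(zip(names, estimates))


# Made-up instruments for two hypothetical immune cell traits.
exposures = {
    "trait_A": [(0.1, 0.02, 0.01), (0.2, 0.05, 0.01)],
    "trait_B": [(0.3, -0.03, 0.02)],
}
print(run_batch(exposures))
```

Threads keep the example portable; for CPU-bound pure-Python work, `ProcessPoolExecutor` (with the usual `if __name__ == "__main__":` guard) would typically be preferred because of the global interpreter lock.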

3. Generation of Reliable MR Estimates

BioWinfordMR improves reliability by providing multiple approaches for assessing pleiotropy, heterogeneity, and confounding. For example, MR-PRESSO assesses uncorrelated horizontal pleiotropy, the CAUSE module evaluates correlated horizontal pleiotropy, and the PhenoScanner module checks whether instrument SNPs are associated with potential confounders.
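MR-PRESSO and CAUSE are too involved for a short example, but the heterogeneity side can be illustrated with Cochran's Q for the inverse-variance weighted (IVW) estimate, a standard two-sample MR diagnostic in which large heterogeneity among per-SNP Wald ratios is one warning sign of pleiotropy. The Python sketch below uses first-order weights and a closed-form χ² tail for integer degrees of freedom so that it needs only the standard library; it is an illustration rather than the platform's implementation, and the inputs are made up.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df (closed form)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in sqrt(x).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def cochran_q(bx, by, sy):
    """IVW estimate plus Cochran's Q heterogeneity test across SNP Wald ratios.

    bx: SNP effects on the exposure; by, sy: SNP effects on the outcome and their SEs.
    Uses first-order weights w_j = bx_j^2 / sy_j^2.
    """
    ratios = [b_y / b_x for b_x, b_y in zip(bx, by)]
    weights = [b_x**2 / s**2 for b_x, s in zip(bx, sy)]
    beta_ivw = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    q = sum(w * (r - beta_ivw) ** 2 for w, r in zip(weights, ratios))
    df = len(bx) - 1
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {"beta_ivw": beta_ivw, "Q": q, "df": df, "p": chi2_sf(q, df), "I2": i_squared}


# Made-up SNPs whose Wald ratios disagree strongly (1, -1, 1).
print(cochran_q(bx=[0.1, 0.1, 0.1], by=[0.1, -0.1, 0.1], sy=[0.01, 0.01, 0.01]))
```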

4. Reproducibility of MR Findings

By consolidating data and analytical modules within a unified platform, BioWinfordMR allows other analysts to reproduce results using identical parameters. The platform also provides R code so that users can reproduce their results on other devices.

5. Diverse MR Analysis Modules

BioWinfordMR offers comprehensive MR analysis functionalities tailored to meet user requirements for systematic and in-depth analysis. In addition to common TwoSampleMR capabilities, the platform includes functional modules such as MVMR, Mediator MR, LDSC, SMR, Coloc, MR Meta-Analysis, and various interactive visualization analysis modules.

Our platform, while offering numerous advantages, also has certain limitations. First, GWASs of different traits may derive from the same cohort or from overlapping cohorts. Such sample overlap can bias effect estimates toward the confounded observational association [49]. Although we attempted to mitigate this bias by selecting SNPs with F statistics exceeding 10, it cannot be eliminated completely. Second, our platform features multiomics MR modules covering the gut microbiota, cytokines, immune cells, and other areas. To control false positives in multiple testing, false discovery rate (FDR) correction is typically recommended [50]. However, because true causal relationships in MR analyses are often sparse, FDR correction risks obscuring genuine positive findings.
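To make this trade-off concrete, the Benjamini-Hochberg step-up adjustment [40], the classic FDR procedure, takes only a few lines of Python. With m tests, the smallest p-value is scaled by a factor of up to m, so a sparse true signal among many exposures can fall above the threshold after correction. The p-values below are made up for five hypothetical exposures.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up nominal p-values for five hypothetical exposures.
pvals = [0.01, 0.045, 0.02, 0.005, 0.60]
qvals = benjamini_hochberg(pvals)
significant = [i for i, q in enumerate(qvals) if q < 0.05]
print(qvals, significant)
```

Here four exposures are nominally significant (p < 0.05), but only three remain after adjustment, because the exposure with p = 0.045 is adjusted to about 0.056.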

We developed a free web-based platform that supports comprehensive Mendelian randomization (MR) analysis through an intuitive user interface. Its extensive collection of preprocessed GWAS data and diverse analytical modules help users draw meaningful biological insights. By consolidating these analytical tools within a unified platform, we enhance the reliability and reproducibility of the analysis outcomes. In a practical demonstration, our analysis yielded compelling evidence supporting CD62L- CD86+ myeloid DCs as an intermediary linking Enterobacteriaceae and sepsis.

GWAS: genome-wide association study

MR: Mendelian randomization

SNP: single nucleotide polymorphism

RCT: randomized controlled trial

IV: instrumental variable

QTL: quantitative trait locus

Ethics approval and consent to participate

Not applicable

Consent for publication

Not applicable

Availability of data and materials

The BioWinfordMR platform can be accessed through the following link: http://biowinford.site:3838/BioWinfordMR/

The code is available from the GitHub repository: https://github.com/yunfengwang0317/BioWinfordMR.git

Competing interests

The authors declare no competing interests.

Funding

Beijing Economic and Technological Development Zone Postdoctoral Work Allowance

Authors' contributions

YF.W. and T.W. constructed the platform. XL.L. was in charge of writing the manuscript. DK.Y. and WH.X. revised the manuscript. All the authors read and approved the final manuscript.

Acknowledgments

The authors thank all the team members for their assistance.

Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. American Journal of Human Genetics. 2017.
Smith GD, Ebrahim S. “Mendelian randomization”: Can genetic epidemiology contribute to understanding environmental determinants of disease? International Journal of Epidemiology. 2003.
Zhu C, Chen Q, Si W, Li Y, Chen G, Zhao Q. Alcohol Use and Depression: A Mendelian Randomization Study From China. Front Genet. 2020.
Pierce BL, Burgess S. Efficient design for mendelian randomization studies: Subsample and 2-sample instrumental variable estimators. Am J Epidemiol. 2013.
Smith GD, Hemani G. Mendelian randomization: Genetic anchors for causal inference in epidemiological studies. Hum Mol Genet. 2014.
Elsworth B, Lyon M, Alexander T, Liu Y, Matthews P, Hallett J, et al. The MRC IEU OpenGWAS data infrastructure. bioRxiv. 2020.
Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 2014.
Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, et al. The UK Biobank resource with deep phenotyping and genomic data. Nature. 2018.
Kurki MI, Karjalainen J, Palta P, Sipilä TP, Kristiansson K, Donner KM, et al. FinnGen provides genetic insights from a well-phenotyped isolated population. Nature. 2023.
Ojima M, Motooka D, Shimizu K, Gotoh K, Shintani A, Yoshiya K, et al. Metagenomic Analysis Reveals Dynamic Changes of Whole Gut Microbiota in the Acute Phase of Intensive Care Unit Patients. Dig Dis Sci. 2016.
Zaborin A, Smith D, Garfield K, Quensen J, Shakhsheer B, Kade M, et al. Membership and behavior of ultra-low-diversity pathogen communities present in the gut of humans during prolonged critical illness. MBio. 2014.
Potruch A, Schwartz A, Ilan Y. The role of bacterial translocation in sepsis: a new target for therapy. Therapeutic Advances in Gastroenterology. 2022.
Chen L, Deng H, Cui H, Fang J, Zuo Z, Deng J, et al. Inflammatory responses and inflammation-associated diseases in organs. Oncotarget. 2018.
Kurilshikov A, Medina-Gomez C, Bacigalupe R, Radjabzadeh D, Wang J, Demirkan A, et al. Large-scale association analyses identify host factors influencing human gut microbiome composition. Nat Genet. 2021.
Lopera-Maya EA, Kurilshikov A, van der Graaf A, Hu S, Andreu-Sánchez S, Chen L, et al. Effect of host genetics on the gut microbiome in 7,738 participants of the Dutch Microbiome Project. Nat Genet. 2022.
Ahola-Olli AV, Würtz P, Havulinna AS, Aalto K, Pitkänen N, Lehtimäki T, et al. Genome-wide Association Study Identifies 27 Loci Influencing Concentrations of Circulating Cytokines and Growth Factors. Am J Hum Genet. 2017.
Zhao JH, Stacey D, Eriksson N, Macdonald-Dunlop E, Hedman ÅK, Kalnapenkis A, et al. Genetics of circulating inflammatory proteins identifies drivers of immune-mediated disease risk and therapeutic targets. Nat Immunol. 2023.
Constantinescu AE, Bull CJ, Jones N, Mitchell R, Burrows K, Dimou N, et al. Circulating white blood cell traits and colorectal cancer risk: A Mendelian randomisation study. Int J Cancer. 2024.
Chen MH, Raffield LM, Mousas A, Sakaue S, Huffman JE, Moscati A, et al. Trans-ethnic and Ancestry-Specific Blood-Cell Genetics in 746,667 Individuals from 5 Global Populations. Cell. 2020.
Ottensmann L, Tabassum R, Ruotsalainen SE, Gerl MJ, Klose C, Widén E, et al. Genome-wide association analysis of plasma lipidome identifies 495 genetic associations. Nat Commun. 2023.
Chen Y, Lu T, Pettersson-Kymmer U, Stewart ID, Butler-Laporte G, Nakanishi T, et al. Genomic atlas of the plasma metabolome prioritizes metabolites implicated in human diseases. Nat Genet. 2023.
Orrù V, Steri M, Sidore C, Marongiu M, Serra V, Olla S, et al. Complex genetic signatures in immune cells underlie autoimmunity and inform therapy. Nat Genet. 2020.
Moitinho-Silva L, Degenhardt F, Rodriguez E, Emmert H, Juzenas S, Möbus L, et al. Host genetic factors related to innate immunity, environmental sensing and cellular functions are associated with human skin microbiota. Nat Commun. 2022.
Liu X, Tong X, Zhu J, Tian L, Jie Z, Zou Y, et al. Metagenome-genome-wide association studies reveal human genetic impact on the oral microbiome. Cell Discov. 2021.
Hemani G, Zheng J, Elsworth B, Wade KH, Haberland V, Baird D, et al. The MR-base platform supports systematic causal inference across the human phenome. Elife. 2018.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007.
Verbanck M, Chen CY, Neale B, Do R. Detection of widespread horizontal pleiotropy in causal relationships inferred from Mendelian randomization between complex traits and diseases. Nat Genet. 2018.
Morrison J, Knoblauch N, Marcus JH, Stephens M, He X. Mendelian randomization accounting for correlated and uncorrelated pleiotropic effects using genome-wide summary statistics. Nat Genet. 2020.
Sanderson E. Multivariable mendelian randomization and mediation. Cold Spring Harb Perspect Med. 2021.
Burgess S, Thompson DJ, Rees JMB, Day FR, Perry JR, Ong KK. Dissecting causal pathways using mendelian randomization with summarized genetic data: Application to age at menarche and risk of breast cancer. Genetics. 2017.
Grant AJ, Burgess S. Pleiotropy robust methods for multivariable Mendelian randomization. Stat Med. 2021.
Carter AR, Sanderson E, Hammerton G, Richmond RC, Davey Smith G, Heron J, et al. Mendelian randomisation for mediation analysis: current methods and challenges for implementation. Eur J Epidemiol. 2021.
Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, et al. Bayesian Test for Colocalisation between Pairs of Genetic Association Studies Using Summary Statistics. PLoS Genet. 2014.
Dobbyn A, Huckins LM, Boocock J, Sloofman LG, Glicksberg BS, Giambartolomei C, et al. Landscape of Conditional eQTL in Dorsolateral Prefrontal Cortex and Co-localization with Schizophrenia GWAS. Am J Hum Genet. 2018.
Frankish A, Diekhans M, Ferreira AM, Johnson R, Jungreis I, Loveland J, et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 2019.
Aguet F, Barbeira AN, Bonazzola R, Brown A, Castel SE, Jo B, et al. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science. 2020.
McRae AF, Marioni RE, Shah S, Yang J, Powell JE, Harris SE, et al. Identification of 55,000 Replicated DNA Methylation QTL. Sci Rep. 2018.
Ferkingstad E, Sulem P, Atlason BA, Sveinbjornsson G, Magnusson MI, Styrmisdottir EL, et al. Large-scale integration of the plasma proteome with genetics and disease. Nat Genet. 2021.
Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016.
Benjamini Y, Hochberg Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J R Stat Soc Ser B. 1995.
Jiang L, Zheng Z, Fang H, Yang J. A generalized linear mixed model association tool for biobank-scale data. Nat Genet. 2021.
Chiarella P, Capone P, Sisto R. Contribution of Genetic Polymorphisms in Human Health. International Journal of Environmental Research and Public Health. 2023.
Alhashem F, Tiren-Verbeet NL, Alp E, Doganay M. Treatment of sepsis: What is the antibiotic choice in bacteremia due to carbapenem resistant Enterobacteriaceae? World J Clin Cases. 2017.
Schrijver IT, Théroude C, Roger T. Myeloid-derived suppressor cells in sepsis. Frontiers in Immunology. 2019.
Haas CB, Lovászi M, Braganhol E, Pacher P, Haskó G. Ectonucleotidases in Inflammation, Immunity, and Cancer. J Immunol. 2021.
Belinky F, Nativ N, Stelzer G, Zimmerman S, Stein TI, Safran M, et al. PathCards: Multi-source consolidation of human biological pathways. Database. 2015.
Xu L, Zhou Y, Wang G, Bo L, Jin B, Dai L, et al. The UDPase ENTPD5 regulates ER stress-associated renal injury by mediating protein N-glycosylation. Cell Death Dis. 2023.
Sobala LF, Fernandes PZ, Hakki Z, Thompson AJ, Howe JD, Hill M, et al. Structure of human endo-α-1,2-mannosidase (MANEA), an antiviral host-glycosylation target. Proc Natl Acad Sci U S A. 2020.
Burgess S, Davies NM, Thompson SG. Bias due to participant overlap in two-sample Mendelian randomization. Genet Epidemiol. 2016.
Storey JD. A direct approach to false discovery rates. J R Stat Soc Ser B Stat Methodol. 2002.

No competing interests reported.


Reviewers invited by journal: 18 Aug 2024
Editor assigned by journal: 18 Aug 2024
Editor invited by journal: 05 Aug 2024
Submission checks completed at journal: 31 Jul 2024
First submitted to journal: 20 Jun 2024


Status:

Version 1
