Study design
In this MR analysis, we selected cis-expression quantitative trait loci (cis-eQTL) of druggable genes as the exposure variable and employed genome-wide association study (GWAS) data for CKD from the FinnGen database as the outcome variable. This approach was designed to elucidate the complex relationship between the expression of druggable genes and CKD. To identify the most suitable SNPs as instrumental variables, we conducted association analyses of the exposure factors and established stringent inclusion and exclusion criteria. A series of sensitivity analyses was implemented in order to ensure the robustness and reliability of the MR analysis. Furthermore, to elucidate the functional roles of the significant genes identified through MR analysis in terms of cellular composition, biological processes, and molecular functions, as well as to interpret the biological pathways they are involved in according to the Kyoto Encyclopedia of Genes and Genomes (KEGG), we performed comprehensive functional annotations. We conducted Gene Ontology (GO) and KEGG enrichment analyses on these significant genes, and provided detailed information regarding their chromosomal positions. Additionally, we performed colocalization analysis to evaluate whether cis-eQTL and CKD share the same causal variants. The overall design idea is detailed in Figure 2. (Figure 2)
Data sources
By comparing the sequences and structures of proteins present in human blood with those of currently identified drug target proteins, scientists identified several proteins that exhibit structural similarity to known drug targets and possess the potential to be modulated by drug-like small molecules. Consequently, scientists identified a specific group of genes for further investigation(12). Finan identified 1,427 genes encoding drug targets that have been approved or are in the clinical phase, 682 genes encoding proteins that bind to known drug molecules or are similar to approved drug targets, and 2,370 genes that are members of a family of key druggable genes or encode proteins that are distantly similar to approved drug targets(12). These 4,479 druggable genes have also been utilized by researchers to investigate potential therapeutic targets for pulmonary fibrosis (Supplementary Material: Table S1)(5). (Supplementary Material: Table S1). We queried the eQTLGen Consortium database for these 4,479 druggable genes and retrieved blood cis-eQTL data for 2,888 of them(13). The eQTL data facilitated the identification of genetic variants associated with gene expression levels in blood samples, specifically those variants located within 1 Mb of the central location of each gene.
FinnGen is a well-known and large database, and FinnGen provides genetic insights from a phenotypically well-defined and isolated population(14). The CKD outcome data were derived from the FinnGen database, which collected information on 447,473 individuals, including 11,265 in the case group and 436,208 in the control group.
Data availability statement:
The cis-eQTL data were obtained from the eQTLGen Consortium. (https://www.eqtlgen.org/cis-eqtls.html).
The CKD data were obtained from the FinnGen. (https://www.finngen.fi/en/access_results).
The study protocol received approval from the Medical Ethics Committee of the Affiliated Hospital of Guangdong Medical University. All data used in this study are publicly available and listed in Supplementary Material.
Selection of instrumental variables
Reliable and accurate instrumental variables are characterized by three distinctive features: (1) a very strong association with the exposure, (2) no relationship with confounders other than the exposure, and (3) the absence of horizontal pleiotropy. It is essential to note that horizontal pleiotropy should be entirely absent in MR analyses. To identify IVs meeting these criteria, we conducted a rigorous screening of each druggable gene. Initially, we performed association analyses to select SNPs from the cis-eQTL data, ensuring that only SNPs with P-values below the genome-wide significance threshold (5.0 × 10-8) were considered. Secondly, to mitigate the knock-on disequilibrium effects of these SNPs, we utilized the "TwoSampleMR" package in the R programming language to set the linkage disequilibrium (LD) threshold to R² < 0.1, using data from the 1000 Genomes Project (EUR population) with an aggregation distance of 10,000 kb(15). In addition, we calculated the proportion of variance explained (R2) and the F statistic to quantify the strength of the tool with the following equations: R2 = 2 × MAF × (1 - MAF) × β2, and F = R2(n-k-1)/ k(1-R2), where "MAF" is the SNP used as the IV secondary allele frequency, "n" is the sample size, "k" is the number of IVs used(16). To avoid weak IV bias, we selected SNPs with F > 10 as the IVs we analysed.
Mendelian randomization analysis
MR analyses were conducted using the "TwoSampleMR" package in R (version 4.4.1). For the MR analysis, the Wald ratio method was employed when a single SNP served as the IV. When the IV comprised two or more SNPs, we applied five different methods: inverse variance weighting (IVW), MR-Egger, weighted median, simple mode, and weighted mode. We used the IVW method as the primary analysis method because previous studies have shown that the IVW method is more conservative but more robust(17). To account for multiple testing, we applied the False Discovery Rate (FDR) correction to identify significant MR analysis results. Additionally, to ensure the robustness of our findings, we employed several methods for sensitivity analyses. Potential heterogeneity of IVs was assessed using Cochrane's Q test(18). Heterogeneity was considered absent if the p-value of the Cochrane Q test exceeded 0.05. Potential pleiotropy between exposure and outcome was examined using MR-Egger regression. Pleiotropy was considered absent if the p-value for the MR-Egger regression intercept was greater than 0.05(18).
Colocalization
For druggable genes showing significant MR results, colocalization analysis was performed using the R package "coloc"(19). The default a priori probabilities were P1 = 1.0 × 10-4, P2 = 1.0 × 10-4, and P12 = 1.0 × 10-5, indicating that the SNPs were associated with the expression of the druggable genes, the outcome, or both, respectively. Posterior probabilities for the following five hypotheses were generated from the colocalization analysis: PPH 0, not associated with the expression or outcome of a druggable gene; PPH 1, associated with the expression of a druggable gene but not with the outcome; PPH 2, associated with the outcome but not with the expression of a druggable gene; PPH 3, associated with the expression and outcome of a druggable gene with a different causal variant; and PPH 4, association with expression of the druggable genes and outcome, with a shared causal variant. PPH 4 > 0.80 was considered strong.
Gene function and pathway enrichment analysis
GO is the world's largest source of information on gene function. This knowledge is both human-readable and machine-readable, and is the basis for large-scale computational analyses(20, 21). KEGG is a database for understanding high-level functions and utilities of biological systems from molecular-level information. For biological process and pathway enrichment analyses, KEGG and GO analyses were performed using the R "clusterProfiler" software package. Finally, the R "circlize" package was used(22) to display information on the location of these genes on the corresponding chromosomes.