2.1 Research exposure
In this study, we analyzed genetically predicted circulating proteins. The Mendelian randomization (MR) instruments for circulating proteins were derived from nine proteomic genome-wide association studies (GWAS), selected on the criteria of a sample size greater than 500 and more than 50 proteins measured.7, 9–16 Additional information on these nine studies is available online (Supplementary Table 1). Using serum protein quantitative trait loci (pQTLs), we identified candidate instrumental variables as single-nucleotide polymorphisms (SNPs) associated with any protein at the p-value threshold applied in the respective study (Supplementary Table 1). SNPs located in the MHC region of chromosome 6 (26–34 Mb) were excluded because of the complex linkage disequilibrium (LD) pattern in this region. LD clumping was performed with an r2 threshold of 0.01 and a window of 5000 kb to identify independent pQTLs for each protein. Instruments associated with five or more proteins were excluded because of their high pleiotropy. We then divided the instruments into cis- and trans-pQTLs. Cis-pQTLs were defined as pQTLs located within 500 kb of the gene encoding the associated protein, and trans-pQTLs as pQTLs located outside this 500 kb region.
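The instrument filters described above can be sketched in a few lines of Python. This is only an illustration, not the pipeline used in the study: the record layout (`snp`, `protein`, `chrom`, `pos`, `gene_chrom`, `gene_start`, `gene_end`) and all function names are hypothetical, the cis window is assumed to extend 500 kb on either side of the gene boundaries, and LD clumping is assumed to have been done beforehand.

```python
from collections import defaultdict

# Thresholds taken from the text; the variable names are illustrative.
MHC_CHROM, MHC_START, MHC_END = "6", 26_000_000, 34_000_000
CIS_WINDOW = 500_000        # assumed: 500 kb on either side of the gene
MAX_PROTEINS_PER_SNP = 4    # instruments linked to five or more proteins are dropped


def in_mhc(chrom, pos):
    """True if the SNP lies in the excluded MHC region (chr6: 26-34 Mb)."""
    return chrom == MHC_CHROM and MHC_START <= pos <= MHC_END


def classify_pqtl(snp_chrom, snp_pos, gene_chrom, gene_start, gene_end):
    """Label a pQTL as 'cis' or 'trans' relative to the protein-coding gene."""
    if snp_chrom != gene_chrom:
        return "trans"
    if gene_start - CIS_WINDOW <= snp_pos <= gene_end + CIS_WINDOW:
        return "cis"
    return "trans"


def filter_instruments(pqtls):
    """Drop MHC SNPs and highly pleiotropic SNPs, then tag each pQTL as cis/trans.

    `pqtls` is a list of dicts, one per SNP-protein association.
    """
    kept = [q for q in pqtls if not in_mhc(q["chrom"], q["pos"])]

    # Count how many distinct proteins each SNP is associated with.
    proteins_per_snp = defaultdict(set)
    for q in kept:
        proteins_per_snp[q["snp"]].add(q["protein"])

    result = []
    for q in kept:
        if len(proteins_per_snp[q["snp"]]) > MAX_PROTEINS_PER_SNP:
            continue
        pqtl_type = classify_pqtl(
            q["chrom"], q["pos"], q["gene_chrom"], q["gene_start"], q["gene_end"]
        )
        result.append({**q, "type": pqtl_type})
    return result
```

A SNP on a different chromosome from the gene is always labelled trans, so the distance check only applies to same-chromosome pairs.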
2.2 Study outcome
In the discovery phase, we used the genome-wide association analysis conducted by Sun et al. in a large UK Biobank cohort. In that study, participants with alcohol-related disorders, harmful alcohol use, confirmed hemochromatosis, viral hepatitis, or Wilson's disease, and those using liver-damaging medications, were excluded from the analysis.17 In the replication phase, GWAS summary data for NAFLD (2,568 cases and 409,613 controls) were obtained from the FinnGen R10 consortium.18 Because this study relied on previously published work and public databases, approval by the appropriate institutional review boards and informed consent had been obtained in the original studies.
2.3 MR analysis
In the MR analyses, genetically predicted proteins were treated as exposures and NAFLD-related traits as outcomes. Instruments for the MR analysis were constructed in two ways: from cis-pQTLs only, and from all pQTLs (both cis and trans). The MR effect was estimated using the Wald ratio when only one pQTL was available, and the inverse variance weighted (IVW) method when two or more pQTLs were available. The instrumental variables retained by our screening strategy may still be pleiotropic, as they can be associated with up to four proteins. To address this, where a protein with MR evidence had an instrument associated with several proteins, we examined the associations among those proteins. MR analyses were then performed with the remaining instruments (excluding those that might be pleiotropic). Sensitivity analyses were conducted to assess the robustness of the findings and to account for heterogeneity and potential horizontal pleiotropy of the instruments: Cochran's Q was used to test for heterogeneity among instruments, and the MR-Egger method was used to detect horizontal pleiotropy. Causal proteins were identified in the discovery phase, and replication analysis was conducted using GWAS summary data from the FinnGen R10 consortium. All analyses were conducted with the 'TwoSampleMR' R package. Statistical significance was defined as a Bonferroni-corrected threshold of 0.05 divided by the total number of proteins analyzed (Supplementary Table 2).
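To make the estimators concrete, the following Python sketch shows the Wald ratio for a single instrument, a fixed-effect IVW estimate for several instruments, Cochran's Q, and the Bonferroni threshold. It is a simplified illustration under stated assumptions and not the 'TwoSampleMR' implementation: the function names are hypothetical, the Wald ratio standard error uses the first-order approximation se_outcome / |beta_exposure|, and the IVW standard error is the fixed-effect one without any overdispersion correction.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Wald ratio for one instrument, with a first-order standard error."""
    return beta_out / beta_exp, se_out / abs(beta_exp)


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect inverse variance weighted estimate across instruments."""
    num = sum(bx * by / s**2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx**2 / s**2 for bx, s in zip(beta_exp, se_out))
    return num / den, math.sqrt(1.0 / den)


def cochran_q(beta_exp, beta_out, se_out):
    """Cochran's Q: weighted squared deviation of per-SNP ratios from the IVW estimate."""
    estimate, _ = ivw(beta_exp, beta_out, se_out)
    return sum(
        (bx**2 / s**2) * (by / bx - estimate) ** 2
        for bx, by, s in zip(beta_exp, beta_out, se_out)
    )


def mr_estimate(beta_exp, beta_out, se_out):
    """Wald ratio with one instrument, IVW with two or more."""
    if len(beta_exp) == 1:
        return wald_ratio(beta_exp[0], beta_out[0], se_out[0])
    return ivw(beta_exp, beta_out, se_out)


def bonferroni_threshold(n_proteins, alpha=0.05):
    """Significance threshold: alpha divided by the number of proteins tested."""
    return alpha / n_proteins
```

Under the null of no heterogeneity, Q is compared against a chi-squared distribution with (number of instruments - 1) degrees of freedom.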
2.4 Co-localization analysis
MR results that survived the above filters were assessed using Bayesian colocalization analysis to estimate the probability that each genomic region harbors a single variant influencing both the protein and the NAFLD-related trait, as opposed to distinct variants that appear to share an association only because they are in LD with each other. We performed colocalization using the R package 'coloc'19. The package computes posterior probabilities for five hypotheses (H0, H1, H2, H3, and H4) about whether the two traits share a causal variant, with hypothesis 4 indicating that both traits are associated with the region through a single shared variant. A posterior probability exceeding 0.8 for hypothesis 4 was taken as strong evidence of colocalization. A limitation of Bayesian colocalization is its assumption of one causal SNP per genetic locus, whereas a locus may contain several causal SNPs.
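For readers unfamiliar with how the five posterior probabilities arise, the sketch below combines per-SNP log approximate Bayes factors for the two traits into posteriors for H0 to H4, following the published single-causal-variant formulation. It is a Python illustration rather than the 'coloc' package itself: the function names are hypothetical, the per-SNP log Bayes factors are assumed to have been computed already, and the prior probabilities (1e-4, 1e-4, 1e-5) are an assumption, since the text does not report the priors used.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of log values."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def logdiffexp(a, b):
    """log(exp(a) - exp(b)) for a >= b; returns -inf when the difference is zero."""
    if b >= a:
        return float("-inf")
    return a + math.log(-math.expm1(b - a))


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one region.

    labf1, labf2: per-SNP log approximate Bayes factors for trait 1 and trait 2,
    in the same SNP order. p1, p2, p12 are assumed prior probabilities.
    """
    both = [x + y for x, y in zip(labf1, labf2)]
    s1, s2, s12 = logsumexp(labf1), logsumexp(labf2), logsumexp(both)
    log_h = [
        0.0,                                                     # H0: no association
        math.log(p1) + s1,                                       # H1: trait 1 only
        math.log(p2) + s2,                                       # H2: trait 2 only
        math.log(p1) + math.log(p2) + logdiffexp(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                     # H4: one shared variant
    ]
    total = logsumexp(log_h)
    return [math.exp(v - total) for v in log_h]


def strong_colocalization(posteriors, threshold=0.8):
    """Apply the H4 > 0.8 rule from the text."""
    return posteriors[4] > threshold
```

When the strongest signal for both traits sits on the same SNP, the H4 term dominates; when the peaks fall on different SNPs, the mass shifts towards H3.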
2.5 Steiger filter analysis
Steiger filtering, as implemented in the 'TwoSampleMR' R package, was applied to MR associations that passed the testing thresholds to investigate whether reverse causality could have affected the results. For ease of interpretation, the results were also reported as a categorical variable: 'true' when the effect direction was from exposure to outcome at p < 0.05, 'false' when the direction was reversed at p < 0.05, and 'uncertain' when p ≥ 0.05.
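The three-way labelling can be written as a small decision rule. The Python sketch below assumes the variance explained by the instruments in the exposure and in the outcome, together with the Steiger test p-value, have already been computed (for example by 'TwoSampleMR'); the function and argument names are hypothetical.

```python
def steiger_category(r2_exposure, r2_outcome, p_value, alpha=0.05):
    """Label the inferred causal direction as 'true', 'false', or 'uncertain'.

    r2_exposure / r2_outcome: variance explained by the instruments in each trait.
    p_value: p-value of the Steiger test comparing the two.
    """
    if p_value >= alpha:
        return "uncertain"   # direction cannot be resolved at the chosen level
    if r2_exposure > r2_outcome:
        return "true"        # effect runs from exposure to outcome
    return "false"           # direction is reversed
```

For example, `steiger_category(0.05, 0.001, 1e-6)` is labelled 'true' because the instruments explain more variance in the exposure than in the outcome.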
2.6 Annotation of protein altered variants (PAVs) of cis pQTLs
Affinity-based proteomic methods depend on preserved binding sites. PAVs, which are genetic variants that change the structure of a protein, can alter aptamer binding and thereby produce artifactual cis-pQTLs. To determine whether the instruments might reflect such aptamer-binding artifacts, we annotated variants with the Ensembl Variant Effect Predictor (VEP)20. For each cis-pQTL with MR evidence, we assessed whether it was itself a PAV or was in LD with a PAV (r2 ≥ 0.8). PAVs were defined as variants annotated by VEP as coding sequence variants, frameshift variants, in-frame deletions, in-frame insertions, missense variants, protein-altering variants, splice acceptor variants, splice donor variants, splice region variants, gain of function, loss of function, stop gained, or stop lost.
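As an illustration of the flagging rule, the Python sketch below checks a cis-pQTL and its LD proxies against a set of consequence terms. The input layout and function names are hypothetical, and the consequence strings are the corresponding VEP Sequence Ontology terms as far as they map directly; "gain of function" and "loss of function" from the list above are left out of this sketch because they are not matched here to a single VEP consequence term.

```python
# Assumed mapping of the listed categories to VEP consequence terms.
PAV_CONSEQUENCES = {
    "coding_sequence_variant",
    "frameshift_variant",
    "inframe_deletion",
    "inframe_insertion",
    "missense_variant",
    "protein_altering_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
    "splice_region_variant",
    "stop_gained",
    "stop_lost",
}
LD_R2_THRESHOLD = 0.8


def is_pav(consequences):
    """True if any annotated consequence is protein-altering."""
    return any(c in PAV_CONSEQUENCES for c in consequences)


def pav_status(consequences, ld_proxies):
    """Return 'PAV', 'LD with PAV', or 'none' for one cis-pQTL.

    consequences: consequence terms annotated for the pQTL itself.
    ld_proxies: list of (r2, consequences) pairs for variants in LD with it.
    """
    if is_pav(consequences):
        return "PAV"
    for r2, proxy_consequences in ld_proxies:
        if r2 >= LD_R2_THRESHOLD and is_pav(proxy_consequences):
            return "LD with PAV"
    return "none"
```

A pQTL flagged as 'PAV' or 'LD with PAV' is a candidate aptamer-binding artifact and would merit closer inspection before interpretation.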
2.7 Assessment of overlap between pQTL and eQTL loci
Genetic variation can quantitatively affect both transcript and protein levels. Overlap between eQTLs and pQTLs suggests that the genetic influence on protein levels acts through the regulation of mRNA transcription, which can help in interpreting pQTLs in a biological context. To investigate how pQTLs affect plasma protein levels, we examined the overlap between pQTLs and eQTLs through a direct gene lookup. For the pQTLs supported by MR evidence, the lookup was performed in the Genotype-Tissue Expression (GTEx) portal (V8, http://www.gtexportal.org).
2.8 Protein-protein interaction (PPI) and functional enrichment analysis
A PPI network was constructed using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING V11.5)21 to examine the connections among the MR-prioritized proteins. Additionally, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were conducted using the 'clusterProfiler'22 R package to investigate potential enrichment pathways linked to these proteins.
2.9 Druggability assessment of MR-prioritized plasma proteins
Our goal was to determine whether the MR-prioritized proteins overlapped with the list of druggable genes compiled by Finan et al. Finan et al. systematically identified 4479 genes as drugged or druggable and organized them into three tiers based on their stage in the drug development process. Tier 1 (1427 genes) included targets of approved small molecules and biotherapeutics as well as clinical-stage drug candidates; Tier 2 (682 genes) comprised genes encoding targets with known bioactive drug-like small-molecule binding partners and genes with ≥ 50% identity to approved drug targets; Tier 3 (2370 genes) comprised genes encoding secreted or extracellular proteins, proteins with distant similarity to approved drug targets, and members of key druggable gene families not in Tier 1 or 2. Tier 3 was further divided into Tier 3A, genes close (± 50 kbp) to GWAS SNPs and with an extracellular location, and Tier 3B, the remaining genes. For each MR-prioritized protein, we recorded the tier of the druggable gene, whether the protein product is currently or could potentially be targeted by small molecules, and whether monoclonal antibodies or enzymes could be used as biotherapeutics. We also annotated the MR-prioritized proteins with therapeutic target information from the Therapeutic Target Database (http://db.idrblab.net/ttd/). The database contains a total of 3578 drug targets: 498 successful targets, 1342 clinical trial targets, 185 targets in the preclinical or patented stage, and 1553 research targets.23 In our analysis, we focused on the type of target, the drug that targets it, and the disease that the drug treats.