Unveiling Promising Drug Targets for NAFLD through Mendelian Randomization

doi:10.21203/rs.3.rs-4647999/v1

Research Article

This work is licensed under a CC BY 4.0 License

Objectives

Non-alcoholic fatty liver disease (NAFLD) is a common disorder with a genetic component and few available treatments. Identifying new drug targets for NAFLD prevention remains a critical need.

Methods

In this study, we used Mendelian randomization (MR) analysis with NAFLD summary statistics to investigate potential therapeutic targets. Genetic instruments for plasma proteins were derived from nine proteomic genome-wide association studies. To strengthen the MR findings, we applied two-sample MR analysis, Bayesian colocalization, Steiger filtering, protein-altering variant assessment, and mapping of expression quantitative trait loci to protein quantitative trait loci. We also examined protein-protein interactions, pathway enrichment, and druggability to improve our understanding of NAFLD and identify potential opportunities for its treatment.

Results

Genetically predicted levels of 13 proteins were associated with the risk of NAFLD. Specifically, elevated levels of nine proteins (ADH1B, TOM1L1, MMP3, GALE, RAB14, SNRPF, ADH1B, SPATA9) and decreased levels of five proteins were associated with increased susceptibility to NAFLD.

Conclusions

Our comprehensive analysis indicated that genetically determined levels of several circulating proteins are associated with susceptibility to NAFLD. These results suggest that targeting these proteins may hold promise as a therapeutic approach for NAFLD and warrant further clinical investigation.

NAFLD

Mendelian randomization

drug target

Due to the rising rates of obesity and metabolic syndrome, non-alcoholic fatty liver disease (NAFLD) is now the leading cause of chronic liver disease globally and is expected to become a major contributor to cirrhosis.¹ Cirrhosis is a severe form of liver disease that impairs liver function and reduces quality of life.² Currently, histopathological evaluation of liver biopsies is the primary endpoint for conditional drug approval. This requirement is a significant obstacle to drug development because invasive histopathological evaluation is highly variable, leading to exceptionally high screening failure rates in clinical trials.³ Because there is currently no effective treatment,⁴ it is of great value to explore the underlying mechanisms and drug targets of NAFLD.

The human proteome provides a significant number of potential targets for therapy.⁵ Proteomics technology can rapidly quantify thousands of proteins in genetically diverse samples. Combining these data with systems genetics analysis is an efficient approach for identifying novel regulators of economically important or disease-related phenotypes in different populations, and for identifying candidate regulators and drug targets in large human cohort studies.⁶ Protein quantitative trait loci (pQTLs) for circulating proteins can be identified through genome-wide association studies (GWAS).⁷ Through Mendelian randomization (MR), pQTLs can be combined with disease-associated variants to assess whether a protein has a causal effect on disease.⁷ Because MR uses genetic variation as an instrument, it can strengthen causal inference about exposure-outcome relationships by minimizing confounding and reducing the risk of reverse causality.⁸

Numerous observational studies have explored the link between circulating proteins and NAFLD, but such studies are vulnerable to confounding bias and reverse causation. In addition, some animal experiments have identified circulating proteins as biomarkers, but a causal relationship could not be established, and these proteins are not necessarily causal factors for NAFLD. Hence, more conclusive evidence is required to understand the impact of serum proteins on NAFLD. The goal of this study was to use an MR analysis framework to identify circulating proteins that may cause NAFLD and new drug targets, which could offer insights for the prevention and treatment of NAFLD.

2.1 Research exposure

In this study, the exposures were genetically predicted levels of circulating proteins. The MR instruments for circulating proteins were derived from nine proteomic genome-wide association studies, each selected for having a sample size greater than 500 and measuring more than 50 proteins.⁷,⁹⁻¹⁶ Additional information on these nine studies is available online (Supplementary Table 1). Using serum pQTLs, we identified candidate instrumental variables as single-nucleotide polymorphisms (SNPs) associated with any protein at the p-value threshold used in each study (Supplementary Table 1). SNPs located in the MHC region of chromosome 6 (26 to 34 Mb) were excluded because of the complex linkage disequilibrium (LD) pattern in this region. LD clumping was then performed to obtain independent pQTLs for each protein, treating SNPs with r² greater than 0.01 within 5,000 kb as correlated. Instruments associated with five or more proteins were excluded because of their high pleiotropy. The remaining instruments were divided into cis and trans pQTLs: cis pQTLs were defined as pQTLs located within 500 kb of the associated protein-coding gene, and trans pQTLs as those located outside this 500 kb region.
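The filtering and labeling rules above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `PQTL` record and its field names are hypothetical, the cis window is assumed to extend 500 kb on either side of the gene boundaries, and LD clumping is left out because it requires a reference panel.

```python
from collections import Counter
from dataclasses import dataclass

MHC_CHROM, MHC_START, MHC_END = "6", 26_000_000, 34_000_000  # excluded region
CIS_WINDOW = 500_000        # assumed: 500 kb on either side of the gene
MAX_PROTEINS_PER_SNP = 4    # SNPs linked to five or more proteins are dropped


@dataclass(frozen=True)
class PQTL:
    """Hypothetical record for one SNP-protein association."""
    snp: str
    chrom: str
    pos: int
    protein: str
    gene_chrom: str
    gene_start: int
    gene_end: int


def in_mhc(q: PQTL) -> bool:
    return q.chrom == MHC_CHROM and MHC_START <= q.pos <= MHC_END


def is_cis(q: PQTL) -> bool:
    return (q.chrom == q.gene_chrom
            and q.gene_start - CIS_WINDOW <= q.pos <= q.gene_end + CIS_WINDOW)


def select_instruments(pqtls):
    """Drop MHC and highly pleiotropic SNPs, then label each pQTL cis or trans."""
    kept = [q for q in pqtls if not in_mhc(q)]
    # Count the distinct proteins associated with each SNP.
    n_proteins = Counter(snp for snp, _ in {(q.snp, q.protein) for q in kept})
    kept = [q for q in kept if n_proteins[q.snp] <= MAX_PROTEINS_PER_SNP]
    return [(q, "cis" if is_cis(q) else "trans") for q in kept]
```

Counting proteins per SNP after the MHC exclusion is a design choice of this sketch; the text does not specify the order of these two filters.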

2.2 Study outcome

In the discovery phase, we used the genome-wide association analysis conducted by Sun et al. in a large UK Biobank cohort. In that study, participants with alcohol use disorders, harmful alcohol use, confirmed hemochromatosis, viral hepatitis, or Wilson's disease, as well as users of liver-damaging medications, were excluded from the analysis.¹⁷ In the replication phase, GWAS data for NAFLD (2,568 cases and 409,613 controls) were obtained from the FinnGen R10 consortium.¹⁸ Because this study relied on previously published work and public databases, institutional review board approval and informed consent had been obtained in the original studies.

2.3 MR analysis

In the MR analyses, genetically predicted protein levels were treated as exposures and NAFLD-related traits as outcomes. Cis pQTLs alone, and all pQTLs (cis and trans), were used to create instruments for MR analysis. The MR effect was estimated with the Wald ratio when only one pQTL was available and with the inverse-variance weighted (IVW) method when two or more pQTLs were available. Because our screening strategy retained instruments associated with up to four proteins, some instruments may be pleiotropic. To address this, when an instrument was associated with several proteins, the associations among those proteins were examined using MR evidence. MR results obtained with the remaining instruments (excluding those that might be pleiotropic) are reported in the Results. Sensitivity analyses were conducted to examine the robustness of the findings with respect to heterogeneity and potential horizontal pleiotropy: Cochran's Q was used to test for heterogeneity among instruments, and the MR-Egger method was used to detect horizontal pleiotropy. Causal proteins were identified in the discovery phase, and replication analysis was conducted using GWAS summary data from the FinnGen R10 consortium. Analyses were performed with the 'TwoSampleMR' R package. Associations were considered significant at a Bonferroni-corrected threshold of 0.05 divided by the total number of proteins analyzed (Supplementary Table 2).
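As an illustration of the estimators named above, the following Python sketch computes a Wald ratio, a fixed-effect IVW estimate with Cochran's Q, and the Bonferroni threshold from summary statistics. It is a simplified stand-in, not the 'TwoSampleMR' implementation: the function names are hypothetical, the Wald ratio standard error uses a first-order approximation, and the IVW estimate assumes independent instruments with no random-effects adjustment.

```python
import math


def wald_ratio(beta_exp, beta_out, se_out):
    """Single-instrument estimate; SE from a first-order (delta) approximation."""
    return beta_out / beta_exp, se_out / abs(beta_exp)


def ivw(beta_exp, beta_out, se_out):
    """Fixed-effect IVW estimate, its SE, and Cochran's Q across instruments."""
    weights = [bx ** 2 / sy ** 2 for bx, sy in zip(beta_exp, se_out)]
    ratios = [by / bx for bx, by in zip(beta_exp, beta_out)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    se = math.sqrt(1.0 / total)
    # Cochran's Q: weighted squared deviation of each ratio from the pooled value.
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    return estimate, se, q


def mr_estimate(beta_exp, beta_out, se_out):
    """Wald ratio for one pQTL, IVW for two or more (inputs are lists)."""
    if len(beta_exp) == 1:
        return wald_ratio(beta_exp[0], beta_out[0], se_out[0])
    return ivw(beta_exp, beta_out, se_out)[:2]


def two_sided_p(estimate, se):
    """Two-sided p-value under a normal approximation."""
    return math.erfc(abs(estimate / se) / math.sqrt(2))


def bonferroni_threshold(n_proteins, alpha=0.05):
    return alpha / n_proteins
```

A protein would then be called significant when `two_sided_p(*mr_estimate(...))` falls below `bonferroni_threshold(n)`, where `n` is the number of proteins tested.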

2.4 Co-localization analysis

MR results that passed the significance threshold were assessed using Bayesian colocalization analysis to estimate the probability that each genomic region harbors a single variant influencing both the protein and the NAFLD-related trait, as opposed to distinct variants that appear shared only because they are in LD. We performed colocalization with the R package 'coloc'.¹⁹ The package generates posterior probabilities for five hypotheses (H0, H1, H2, H3, and H4) about whether a variant is shared between two traits, with H4 indicating that both traits are associated through the same causal variant. A posterior probability above 0.8 for H4 was taken as strong evidence of colocalization. A limitation of Bayesian colocalization is its assumption of one causal SNP per locus, whereas a locus may contain several causal SNPs.
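To make the five hypotheses concrete, the sketch below shows how posterior probabilities can be obtained from per-SNP log approximate Bayes factors under the single-causal-variant assumption. It is an illustrative reimplementation of the general approach, not the 'coloc' package itself; the function names are hypothetical, and the prior values (p1 = p2 = 1e-4, p12 = 1e-5) are assumptions, since the priors used in the study are not stated.

```python
import math


def log_sum(xs):
    """log(sum(exp(x))) computed stably."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log1p(-math.exp(b - a))


def wakefield_labf(beta, se, prior_sd):
    """Log approximate Bayes factor for one SNP (Wakefield's approximation)."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z ** 2)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for two traits over the same SNPs."""
    s1, s2 = log_sum(labf1), log_sum(labf2)
    s12 = log_sum([a + b for a, b in zip(labf1, labf2)])
    log_h = [
        0.0,                                                   # H0: no association
        math.log(p1) + s1,                                     # H1: trait 1 only
        math.log(p2) + s2,                                     # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),  # H3: distinct variants
        math.log(p12) + s12,                                   # H4: shared variant
    ]
    norm = log_sum(log_h)
    return {f"H{i}": math.exp(lh - norm) for i, lh in enumerate(log_h)}
```

With this formulation, a region would be flagged as colocalized when `coloc_posteriors(...)["H4"]` exceeds 0.8, matching the threshold described above.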

2.5 Steiger filter analysis

Steiger filtering, implemented in the 'TwoSampleMR' R package, was applied to MR associations that passed the significance thresholds to investigate whether reverse causality could explain the results. For ease of interpretation, each association was labeled 'true' when the effect ran from exposure to outcome at p < 0.05, 'false' when the direction was reversed at p < 0.05, and 'uncertain' when p ≥ 0.05.
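The three-way labeling can be sketched in Python as follows. This is a simplified illustration rather than the 'TwoSampleMR' code: it assumes that the variance explained by the instrument in the exposure and in the outcome has already been computed, and it compares the two correlations with a Fisher z-test for independent samples. The function name and arguments are hypothetical.

```python
import math


def steiger_label(r2_exposure, r2_outcome, n_exposure, n_outcome, alpha=0.05):
    """Label the causal direction as 'true', 'false', or 'uncertain'."""
    # Fisher z-transform of the instrument-trait correlations.
    z_exp = math.atanh(math.sqrt(r2_exposure))
    z_out = math.atanh(math.sqrt(r2_outcome))
    se = math.sqrt(1.0 / (n_exposure - 3) + 1.0 / (n_outcome - 3))
    p = math.erfc(abs(z_exp - z_out) / se / math.sqrt(2))
    if p >= alpha:
        return "uncertain"
    # Significant difference: direction follows the larger variance explained.
    return "true" if r2_exposure > r2_outcome else "false"
```

An instrument that explains more variance in the protein than in NAFLD is labeled 'true', which is the pattern expected when the protein lies upstream of the disease.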

2.6 Annotation of protein-altering variants (PAVs) among cis pQTLs

Affinity-based proteomic methods depend on preserved binding sites. PAVs, which are genetic variants that change the structure of proteins, can alter aptamer binding and thereby produce artifactual cis pQTLs. We used the Ensembl Variant Effect Predictor (VEP)²⁰ to determine whether cis pQTLs might reflect aptamer-binding artifacts. For each cis pQTL with MR evidence, we assessed whether it was itself a PAV or was in LD with a PAV (r² ≥ 0.8). Variants annotated by VEP as coding sequence variants, frameshift variants, in-frame deletions, in-frame insertions, missense variants, protein-altering variants, splice acceptor variants, splice donor variants, splice region variants, gain of function, loss of function, stop gained, or stop lost were classified as PAVs.
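The PAV rule above amounts to a simple set-membership check, sketched here in Python. The consequence strings follow the Sequence Ontology terms that VEP reports; the "gain of function" and "loss of function" categories are omitted because their mapping to VEP consequence terms is not specified in the text. The function names and the input layout are hypothetical.

```python
PAV_CONSEQUENCES = frozenset({
    "coding_sequence_variant", "frameshift_variant", "inframe_deletion",
    "inframe_insertion", "missense_variant", "protein_altering_variant",
    "splice_acceptor_variant", "splice_donor_variant", "splice_region_variant",
    "stop_gained", "stop_lost",
})
LD_THRESHOLD = 0.8  # r² at or above which a pQTL is considered linked to a PAV


def is_pav(consequences):
    """True if any annotated consequence is protein-altering."""
    return bool(PAV_CONSEQUENCES & set(consequences))


def possible_binding_artifact(pqtl_consequences, ld_partners):
    """Flag a cis pQTL that is a PAV or is in LD with one.

    ld_partners: iterable of (r2, consequences) for nearby variants.
    """
    if is_pav(pqtl_consequences):
        return True
    return any(r2 >= LD_THRESHOLD and is_pav(cons) for r2, cons in ld_partners)
```

A flagged pQTL is not necessarily an artifact; the flag only marks associations that deserve a closer look before being interpreted as changes in protein abundance.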

2.7 Assessment of overlap between pQTLs and eQTLs

Genetic variation can quantitatively affect both transcript and protein levels. Overlap between eQTLs and pQTLs suggests that the genetic influence on protein levels acts through the regulation of mRNA transcription, which helps place pQTLs in a biological context. To investigate how pQTLs affect plasma protein levels, we examined the overlap between pQTLs supported by MR evidence and eQTLs by searching directly for the corresponding genes in the Genotype-Tissue Expression (GTEx) portal (V8, http://www.gtexportal.org).

2.8 Protein-protein interaction (PPI) and functional enrichment analysis

A PPI network was created using the Search Tool for the Retrieval of Interacting Genes/Proteins (STRING v11.5²¹) to examine the connections among the MR-prioritized proteins. Additionally, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses were conducted with the 'clusterProfiler'²² R package to investigate potential enrichment pathways linked to these proteins.

2.9 Druggability assessment of MR-prioritized plasma proteins

We assessed whether MR-prioritized proteins overlapped with the druggable genes listed by Finan et al., who systematically classified 4479 genes as drugged or druggable and organized them into three tiers according to their stage in the drug development process. Tier 1 (1427 genes) includes targets of approved small molecules and biotherapeutics and of clinical-stage drug candidates; Tier 2 (682 genes) comprises genes encoding targets with known bioactive drug-like small-molecule binding partners and genes with ≥ 50% identity to approved drug targets; Tier 3 (2370 genes) consists of genes encoding secreted or extracellular proteins, proteins with distant similarity to approved drug targets, and members of key druggable gene families not in Tier 1 or 2. Tier 3 was further divided into Tier 3A, containing genes close (± 50 kbp) to GWAS SNPs and with an extracellular location, and Tier 3B, containing the remaining genes. For each MR-prioritized protein, we recorded the tier of the druggable gene, whether the protein product is or could be targeted by small molecules, and whether monoclonal antibodies or enzymes could be used as biotherapeutics. We also annotated MR-prioritized proteins using the Therapeutic Target Database (http://db.idrblab.net/ttd/), which contains a total of 3578 drug targets: 498 successful targets, 1342 clinical trial targets, 185 preclinical or patented targets, and 1553 literature-reported targets.²³ From this database, we extracted the type of target, the drugs that target it, and the diseases that those drugs treat.

3.1 Genetic instruments for plasma proteins

Genetic instruments were constructed by screening pQTLs from nine proteomic GWAS, with the selection process summarized in Fig. 1. After screening, we retained 8285 pQTLs associated with 4421 proteins (2518 of which are unique) as instruments for MR analysis (Supplementary Table 3). We categorized the instruments into cis and trans pQTLs, with 3811 cis pQTLs corresponding to 2958 proteins (1558 distinct proteins) and 4474 trans pQTLs corresponding to 2374 proteins (1763 distinct proteins). Of the 4421 proteins analyzed, 911 had both cis and trans pQTLs, 2047 had only cis pQTLs, and 1463 had only trans pQTLs.

3.2 MR analysis of the effects of plasma proteins on NAFLD

Because cis pQTLs are considered more likely than trans pQTLs to have specific biological effects, we first used cis pQTLs as genetic instruments in MR analysis to assess the evidence for causal effects of plasma proteins on NAFLD-related traits. In total, four protein-phenotype associations were identified at the Bonferroni-adjusted level (Fig. 2A, Table 1, and Supplementary Table 3). In the replication stage, three proteins were validated in the FinnGen dataset (P < 0.05) using the Wald ratio or IVW method (Table 1, Supplementary Table 4). Most of these significant and nominally significant associations for specific proteins showed consistent directions of effect, indicating that these NAFLD characteristics may share a common cause. Including trans pQTLs in MR analyses could strengthen the evidence for protein-phenotype associations, so we extended the MR analysis to all (cis + trans) pQTLs. This analysis identified 11 protein-trait associations with MR evidence, most of which were not found in the cis pQTL analysis. Replication analysis in the FinnGen cohort yielded similar results for nine plasma proteins (Fig. 2B, Table 2, Supplementary Tables 5 and 6).

3.3 Colocalization of pQTL with NAFLD risk loci

Colocalization analyses were conducted to assess whether the MR associations could be explained by linkage disequilibrium, that is, whether the genetic associations with proteins and with NAFLD-related traits are driven by the same underlying variant. Colocalization was performed for proteins with single-variant instruments and accessible GWAS summary data. Of the 4 associations prioritized by cis-MR that were examined, only one showed strong evidence of colocalization (Supplementary Table 7). Of the 8 associations prioritized by MR with all pQTLs that were examined, 2 showed strong evidence of colocalization (Supplementary Table 8).

3.4 Testing causal direction using Steiger filter analysis

To determine whether the MR associations between proteins and NAFLD-related traits could be due to reverse causation, a directionality test known as Steiger filtering was conducted. For all associations identified by MR, including those from sensitivity analyses, the inferred causal direction ran from the protein to the NAFLD-related trait (Supplementary Tables 9–10).

3.5 PAVs assessment of cis pQTLs

Changes in the genetic code can affect the amino acid sequence and shape of a protein, which can in turn affect aptamer binding and result in inaccurate measurements. Therefore, pQTLs with cis-MR evidence were evaluated for PAVs. For the 4 proteins prioritized by cis-MR, 3 pQTLs were in linkage disequilibrium (r² > 0.8) with PAVs in the corresponding genes, potentially indicating epitope-binding artifacts (Supplementary Table 11).

3.6 Determine overlap between pQTLs and eQTLs

We examined whether the associations between pQTLs and plasma protein levels are mediated by effects on transcription by analyzing the overlap between pQTLs and eQTLs. In the MR analysis using only cis pQTLs, 5 cis pQTLs served as instruments for proteins with MR support; 1 of them was also a significant eQTL in at least one GTEx tissue, with a consistent direction of effect (Supplementary Table 12). In the main MR analysis using all pQTLs (cis + trans), 11 pQTLs served as instruments for MR-prioritized proteins, of which 2 were cis pQTLs and 9 were trans pQTLs. Of the two cis pQTLs, one overlapped the corresponding eQTL in at least one tissue with the same direction of effect (Supplementary Table 14). In contrast, none of the nine trans pQTLs or their proxies overlapped with eQTLs (Supplementary Table 13).

3.7 PPI and pathway enrichment analysis of MR-prioritized proteins

To better understand the pathogenesis of NAFLD, we analyzed PPI networks and enriched pathways for the MR-prioritized proteins. At medium confidence (interaction score of 0.4), the PPI network for proteins prioritized by cis-MR contained 4 nodes and 2 edges, compared with 1 expected edge. This suggests somewhat more interactions than expected for a random set of proteins of similar size drawn from the genome, although the enrichment did not reach significance at the 0.05 level (enrichment p-value of 0.0924) (Supplementary Fig. 1). The PPI network for proteins prioritized by cis + trans MR contained 11 nodes and 1 edge, with 1 expected edge and an enrichment p-value of 0.733 (Supplementary Fig. 3).

GO enrichment analysis revealed that multiple terms relevant to NAFLD biology were enriched. Proteins prioritized by cis-MR were enriched for negative regulation of cholesterol biosynthetic process, AMPA glutamate receptor clustering, glutamate receptor clustering, negative regulation of sterol biosynthetic process, regulation of fear response, regulation of CoA-transferase activity, alcohol metabolic process, and negative regulation of cholesterol metabolic process (Supplementary Table 14, Supplementary Fig. 2A). Proteins prioritized by cis + trans MR were enriched in Golgi stack, Golgi apparatus subcompartment, clathrin-coated vesicle, low-density lipoprotein particle, chylomicron, U4 snRNP, RNA polymerase I complex, methylosome, and other terms (Supplementary Table 15, Supplementary Fig. 2B). In the KEGG analysis, proteins prioritized by cis-MR were enriched in pathways related to tyrosine metabolism, fatty acid degradation, pyruvate metabolism, cholesterol metabolism, glycolysis/gluconeogenesis, retinol metabolism, drug metabolism by cytochrome P450, metabolism of xenobiotics by cytochrome P450, alcoholic liver disease, and Alzheimer disease (Supplementary Table 16 and Supplementary Fig. 3A). KEGG pathway analysis did not reveal significant enrichment for proteins prioritized by cis + trans MR (Supplementary Table 17, Supplementary Fig. 3B).

3.8 Evaluating MR-Prioritized Proteins as Drug Targets

We assessed the likelihood that proteins with MR evidence could be targeted by drugs, since human proteins are major therapeutic targets. As a starting point, we compared MR-prioritized proteins with the druggable genes identified by Finan et al. Of the 13 proteins tested for druggability, eight were druggable, including three Tier 1, one Tier 2, and three Tier 3A proteins (Table 3, Supplementary Table 18). The Therapeutic Target Database identified four proteins as targets of current or potential medications: one clinical trial target, two patented targets, and one literature-reported target (Table 3, Supplementary Table 18). Most proteins annotated as drug targets in the Therapeutic Target Database overlapped with the druggable genes identified by Finan et al., whereas none of the proteins without such annotation overlapped with druggable genes (0/6).

Our research used a comprehensive approach that combined MR, colocalization, Steiger filtering, PAV evaluation, eQTL overlap assessment, PPI analysis, pathway enrichment, and drug target assessment to analyze data from nine large proteomic GWAS. The results suggest that several plasma proteins play a causal role in the development of NAFLD-related traits. Overall, 4 distinct proteins were linked to NAFLD characteristics through cis pQTLs in the MR analysis, whereas 11 distinct proteins were identified using all pQTLs. The replication MR analysis validated 10 of these 13 candidate proteins. Several additional statistical analyses were conducted to confirm these conclusions and to investigate possible regulatory mechanisms and drug targets for the identified proteins.

Serum biomarkers of NAFLD and fibrosis stage have become a focus of current clinical research. However, diagnostic performance still differs across studies. For example, most non-invasive serum biomarkers can only identify ≥ F3 liver fibrosis.²⁴ Furthermore, proteins whose levels vary with NAFLD may be markers of the disease without directly contributing to its development, which highlights the limitations of observational research. Another study used two-sample MR to explore the causal impact of the plasma proteome on metabolic dysfunction-associated fatty liver disease.²⁵ In contrast to our research, that study used pQTL data from a single proteomic GWAS and did not distinguish between cis and trans effects.

Some of the proteins identified by cis-pQTL MR analysis in our study have previously been linked to NAFLD. For example, apolipoprotein E (ApoE) was negatively associated with NAFLD in our study, and ApoE deficiency can regulate the AMPK/mTOR pathway, which is likely to cause NAFLD by altering liver mitochondrial function.²⁶ Investigating ApoE genetic variation is also important for gaining deeper insight into the underlying causes of liver conditions. One study found that cirrhotic patients carrying the rs429358 C (APOE) allele have a lower likelihood of developing hepatocellular carcinoma.²⁷

Chen et al. recently showed that alcohol dehydrogenase 1B (ADH1B) diverts triglycerides (TG) from serum and increases TG levels, the fatty acid content of phospholipids, and other substrates in the liver, thereby contributing to NAFLD.⁴ In our study, ADH1B was positively associated with NAFLD, supporting ADH1B as a potential therapeutic target for NAFLD. Some MR-prioritized proteins not previously linked to NAFLD may nevertheless have evidence suggesting a role in its development. The Neurocan (NCAN)-Cartilage intermediate layer protein 2 (CILP2) region forms a strong LD block and is associated with plasma lipid levels and NAFLD in individuals of European descent, including decreased plasma lipid levels.²⁸ Certain genetic variations are associated with an increased risk of developing NAFLD.

Our MR analysis using all pQTLs prioritized some proteins that had limited evidence in the cis-MR analysis. Some of these proteins have previously been linked to NAFLD, and earlier studies provide supporting evidence for our findings. Metalloproteinases are involved in various liver functions. Matrix metallopeptidase 3 (MMP3) is crucial for the remodeling of connective tissue and the breakdown of proteoglycans, fibronectin, laminin, and elastin. Elevated MMP3 levels are associated with various clinical and immunological factors, and positive correlations with biological parameters and advanced liver fibrosis have been reported. The present study revealed an inverse relationship between serum MMP3 and NAFLD, consistent with a protective effect of MMP3 on NAFLD.²⁹ The roles and mechanisms in NAFLD of the other proteins prioritized by MR analysis with all pQTLs warrant further exploration. For example, our study found a positive association between serum levels of small nuclear ribonucleoprotein F (SNRPF) and NAFLD. SNRPF may play a crucial role in coordinating the inflammatory immune response.³⁰ Certain MR-prioritized proteins that have not been linked to NAFLD may still have evidence suggesting a role in its development. UDP-galactose-4-epimerase (GALE) plays a crucial role in the metabolism of nucleotide sugars in human cells. Knocking out GALE affects glycoconjugate biosynthesis and receptor signaling, thereby increasing the risk of diseases such as metabolic syndrome, thrombocytopenia, and galactosemia.³¹ POLR3-related leukodystrophy is reported to be a type of hypomyelinating leukodystrophy with specific characteristics on brain MRI. Variants in DNA-directed RNA polymerases I and III subunit RPAC1 (POLR1C) and POLR3 have been reported to be related to endocrine abnormalities. POLR1C may affect the incidence of NAFLD and deserves further study.³²

A series of follow-up analyses were conducted based on the MR results. Using Bayesian colocalization, we assessed whether the associations could be confounded by linkage disequilibrium between the pQTLs and NAFLD-associated SNPs. Only a minority of the associations examined showed convincing evidence of colocalization, which supports a causal relationship for those associations. However, the assumption of a single causal signal per region in Bayesian colocalization methods may not hold, potentially resulting in an underestimation of colocalization.²⁵ Steiger filtering was conducted to examine reverse causality and indicated that, for all associations, the causal direction ran from the proteins to the NAFLD-related traits. PAVs can change the structure of proteins, which in turn can affect aptamer affinity and introduce measurement error.³³ In our study, fewer than half of the pQTLs for cis-MR-prioritized proteins were PAVs or in LD with PAVs, so aptamer-binding artifacts could contribute to only some of the associations between these proteins and their phenotypes. Comparing pQTLs and eQTLs can indicate whether the effects of genetic variants on protein levels are mediated by gene expression. The cis pQTLs of MR-prioritized proteins showed some overlap with the corresponding eQTLs, whereas trans pQTLs did not overlap with eQTLs. Trans pQTLs have more complex regulatory mechanisms, whereas cis pQTLs have higher biological interpretability. Pathway enrichment and PPI analyses were performed on the identified proteins to gain a deeper understanding of their functions and interactions. The PPI network of proteins prioritized by cis-MR contained more edges than expected, suggesting that these proteins may be functionally related, although the enrichment was not statistically significant. Furthermore, our pathway enrichment analysis identified multiple pathways that might be associated with NAFLD, supporting the biological plausibility of our results.

Since no disease-modifying drugs for NAFLD and NAFLD-related cirrhosis are currently approved, there is an urgent need to develop drugs.³ The use of MR is becoming more common as a standard method for evaluating potential new drug targets.⁵ Hence, proteins supported by MR evidence were assessed as potential drug targets in order to aid in prioritizing drug development efforts and repurposing current medications for NAFLD.

This study had several limitations. First, we tested the effects of circulating proteins, which include proteins that are actively secreted as well as proteins that leak into the circulation. Circulating levels of these proteins may not reflect their levels within cells and tissues, so the potential effects of cell- or tissue-specific protein levels were not investigated. Second, because individual-level data were not available for the NAFLD GWAS, we could not control the ancestry composition of the study population, which was predominantly of European descent; this may limit the applicability of the results to other populations. Third, coding variants among cis pQTLs that alter amino acid sequences may affect only the measurement of a protein, without affecting its function or level, which may lead to incorrect conclusions; the PAV assessment of cis pQTLs partly addresses this concern. Fourth, because thousands of pQTLs were included, the multiple-testing correction burden was substantial, which may have produced false negatives, so some true associations may have been missed. Fifth, some loci identified using all pQTLs have not been linked to NAFLD in previous studies; these may represent novel discoveries that require further research. Finally, this study used publicly available datasets, which may limit novelty because no new data source was generated.

Overall, using two-sample MR analysis, we identified several circulating proteins that may play a causal role in NAFLD-related traits. Furthermore, our analytical framework offers insights into the validity of the MR results and into potential therapeutic targets for NAFLD. Further research is needed to clarify the disease-causing functions and fundamental biological processes of the proteins associated with NAFLD.

CONFLICT OF INTEREST STATEMENT

The authors have no conflict of interest.

ETHICS STATEMENT

Because this study is based on existing publications and public databases, ethical approval and informed consent were obtained in the original studies from the relevant institutional review committees.

FUNDING INFORMATION

This study was supported by the Scientific Research Project of the Hubei Provincial Health Commission, China (WJ2021F060).

Author Contribution

Gang Lei: Project administration (equal); acquisition, analysis, and interpretation of research data; writing – original draft (lead). Dai Chibing: Project administration (equal); writing, review, and editing (equal).

Acknowledgement

The authors thank the participants and investigators for providing publicly available summary statistics.

DATA AVAILABILITY STATEMENT

All raw data and code are available upon request.

References

Li B, Zhang C, Zhan YT: Nonalcoholic Fatty Liver Disease Cirrhosis: A Review of Its Epidemiology, Risk Factors, Clinical Presentation, Diagnosis, Management, and Prognosis. Canadian journal of gastroenterology & hepatology 2018, 2018: 2784537.
Nascimento JCR, Matos GA, Pereira LC, Mourão A, Sampaio AM, Oriá RB, Toniutto P: Impact of apolipoprotein E genetic polymorphisms on liver disease: An essential review. Annals of hepatology 2020, 19: 24–30.
Harrison SA, Allen AM, Dubourg J, Noureddin M, Alkhouri N: Challenges and opportunities in NASH drug development. Nature medicine 2023, 29: 562–573.
Chen Y, Du X, Kuppa A, Feitosa MF, Bielak LF, O'Connell JR, Musani SK, Guo X, Kahali B, Chen VL, et al: Genome-wide association meta-analysis identifies 17 loci associated with nonalcoholic fatty liver disease. Nature genetics 2023, 55: 1640–1650.
Zheng J, Haberland V, Baird D, Walker V, Haycock PC, Hurle MR, Gutteridge A, Erola P, Liu Y, Luo S, et al: Phenome-wide Mendelian randomization mapping the influence of the plasma proteome on complex diseases. Nature genetics 2020, 52: 1122–1131.
Molendijk J, Parker BL: Proteome-wide Systems Genetics to Identify Functional Regulators of Complex Traits. Cell systems 2021, 12: 5–22.
Ferkingstad E, Sulem P, Atlason BA, Sveinbjornsson G, Magnusson MI, Styrmisdottir EL, Gunnarsdottir K, Helgason A, Oddsson A, Halldorsson BV, et al: Large-scale integration of the plasma proteome with genetics and disease. Nature genetics 2021, 53: 1712–1721.
Sheehan NA, Didelez V, Burton PR, Tobin MD: Mendelian randomisation and causal inference in observational epidemiology. PLoS medicine 2008, 5: e177.
Folkersen L, Fauman E, Sabater-Lleal M, Strawbridge RJ, Frånberg M, Sennblad B, Baldassarre D, Veglia F, Humphries SE, Rauramaa R, et al: Mapping of 79 loci for 83 plasma protein biomarkers in cardiovascular disease. PLoS genetics 2017, 13: e1006706.
Gilly A, Park YC, Png G, Barysenka A, Fischer I, Bjørnland T, Southam L, Suveges D, Neumeyer S, Rayner NW, et al: Whole-genome sequencing analysis of the cardiometabolic proteome. Nature communications 2020, 11: 6336.
Gudjonsson A, Gudmundsdottir V, Axelsson GT, Gudmundsson EF, Jonsson BG, Launer LJ, Lamb JR, Jennings LL, Aspelund T, Emilsson V, et al: A genome-wide association study of serum proteins reveals shared loci with common diseases. Nature communications 2022, 13: 480.
Hillary RF, McCartney DL, Harris SE, Stevenson AJ, Seeboth A, Zhang Q, Liewald DC, Evans KL, Ritchie CW, Tucker-Drob EM, et al: Genome and epigenome wide studies of neurological protein biomarkers in the Lothian Birth Cohort 1936. Nature communications 2019, 10: 3160.
Pietzner M, Wheeler E, Carrasco-Zanini J, Raffler J, Kerrison ND, Oerton E, Auyeung VPW, Luan J, Finan C, Casas JP, et al: Genetic architecture of host proteins involved in SARS-CoV-2 infection. Nature communications 2020, 11: 6397.
Yao C, Chen G, Song C, Keefe J, Mendelson M, Huan T, Sun BB, Laser A, Maranville JC, Wu H, et al: Genome-wide mapping of plasma protein QTLs identifies putatively causal genes and pathways for cardiovascular disease. Nature communications 2018, 9: 3268.
Suhre K, Arnold M, Bhagwat AM, Cotton RJ, Engelke R, Raffler J, Sarwath H, Thareja G, Wahl A, DeLisle RK, et al: Connecting genetic risk to disease end points through the human blood plasma proteome. Nature communications 2017, 8: 14357.
Sun BB, Maranville JC, Peters JE, Stacey D, Staley JR, Blackshaw J, Burgess S, Jiang T, Paige E, Surendran P, et al: Genomic atlas of the human plasma proteome. Nature 2018, 558: 73–79.
Sun Z, Pan X, Tian A, Surakka I, Wang T, Jiao X, He S, Song J, Tian X, Tong D, et al: Genetic variants in HFE are associated with non-alcoholic fatty liver disease in lean individuals. JHEP reports : innovation in hepatology 2023, 5: 100744.
Kurki MI, Karjalainen J, Palta P, Sipilä TP, Kristiansson K, Donner KM, Reeve MP, Laivuori H, Aavikko M, Kaunisto MA, et al: FinnGen provides genetic insights from a well-phenotyped isolated population. Nature 2023, 613: 508–518.
Wallace C: A more accurate method for colocalisation analysis allowing for multiple causal variants. PLoS genetics 2021, 17: e1009440.
McLaren W, Gil L, Hunt SE, Riat HS, Ritchie GR, Thormann A, Flicek P, Cunningham F: The Ensembl Variant Effect Predictor. Genome biology 2016, 17: 122.
Szklarczyk D, Kirsch R, Koutrouli M, Nastou K, Mehryary F, Hachilif R, Gable AL, Fang T, Doncheva NT, Pyysalo S, et al: The STRING database in 2023: protein-protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic acids research 2023, 51: D638–D646.
Wu T, Hu E, Xu S, Chen M, Guo P, Dai Z, Feng T, Zhou L, Tang W, Zhan L, et al: clusterProfiler 4.0: A universal enrichment tool for interpreting omics data. Innovation (Cambridge (Mass.)) 2021, 2: 100141.
Zhou Y, Zhang Y, Lian X, Li F, Wang C, Zhu F, Qiu Y, Chen Y: Therapeutic target database update 2022: facilitating drug discovery with enriched comparative data of targeted agents. Nucleic acids research 2022, 50: D1398–D1407.
Reinson T, Buchanan RM, Byrne CD: Noninvasive serum biomarkers for liver fibrosis in NAFLD: current and future. Clinical and molecular hepatology 2023, 29: S157–S170.
Liu J, Hu S, Chen L, Daly C, Prada Medina CA, Richardson TG, Traylor M, Dempster NJ, Mbasu R, Monfeuga T, et al: Profiling the genome and proteome of metabolic dysfunction-associated steatotic liver disease identifies potential therapeutic targets. medRxiv : the preprint server for health sciences 2023.
Lu W, Mei J, Yang J, Wu Z, Liu J, Miao P, Chen Y, Wen Z, Zhao Z, Kong H, et al: ApoE deficiency promotes non-alcoholic fatty liver disease in mice via impeding AMPK/mTOR mediated autophagy. Life sciences 2020, 252: 117601.
Innes H, Nischalke HD, Guha IN, Weiss KH, Irving W, Gotthardt D, Barnes E, Fischer J, Ansari MA, Rosendahl J, et al: The rs429358 Locus in Apolipoprotein E Is Associated With Hepatocellular Carcinoma in Patients With Cirrhosis. Hepatology communications 2022, 6: 1213–1226.
Boonvisut S, Nakayama K, Makishima S, Watanabe K, Miyashita H, Lkhagvasuren M, Kagawa Y, Iwamoto S: Replication analysis of genetic association of the NCAN-CILP2 region with plasma lipid levels and non-alcoholic fatty liver disease in Asian and Pacific ethnic groups. Lipids in health and disease 2016, 15: 8.
Bauer A, Habior A: Concentration of Serum Matrix Metalloproteinase-3 in Patients With Primary Biliary Cholangitis. Frontiers in immunology 2022, 13: 885229.
Bandesh K, Bharadwaj D: Genetic variants entail type 2 diabetes as an innate immune disorder. Biochimica et biophysica acta. Proteins and proteomics 2020, 1868: 140458.
Broussard A, Florwick A, Desbiens C, Nischan N, Robertson C, Guan Z, Kohler JJ, Wells L, Boyce M: Human UDP-galactose 4'-epimerase (GALE) is required for cell-surface glycome structure and function. The Journal of biological chemistry 2020, 295: 1225–1239.
Bernard G, Vanderver A: POLR3-Related Leukodystrophy. In GeneReviews®. Edited by Adam MP, Feldman J, Mirzaa GM, Pagon RA, Wallace SE, Bean LJH, Gripp KW, Amemiya A. Seattle (WA): University of Washington, Seattle; 1993.
Muñoz VR, Gaspar RC, Kuga GK, Nakandakari S, Baptista IL, Mekary RA, da Silva ASR, de Moura LP, Ropelle ER, Cintra DE, et al: Exercise decreases CLK2 in the liver of obese mice and prevents hepatic fat accumulation. Journal of cellular biochemistry 2018, 119: 5885–5892.

Table 1

Identification of cis-pQTLs that explain protein-phenotype associations
Uniprot	Protein	Discovery (UKB): OR (95% CI)	Discovery (UKB): P	Replication (FinnGen): OR (95% CI)	Replication (FinnGen): P	Direction
O14594	Neurocan (NCAN)‡	0.839 (0.812, 0.867)	2.13E-26	0.303 (0.226, 0.405)	1.00E-15	Negative
P00325	Alcohol dehydrogenase 1B (ADH1B)‡	1.102 (1.056, 1.149)	6.46E-06	1.373 (0.548, 3.444)	0.499	Positive
Q8IUL8	Cartilage intermediate layer protein 2 (CILP2)‡	0.689 (0.620, 0.765)	3.79E-12	0.160 (0.058, 0.446)	5.00E-04	Negative
P02649	Apolipoprotein E (APOE)§	0.948 (0.932, 0.964)	8.91E-10	0.793 (0.683, 0.921)	0.002	Negative

‡ Ferkingstad et al.

§ Suhre et al.

pQTLs, protein quantitative trait loci; OR, odds ratio; CI, confidence interval.
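As a reading aid for the table above, the following minimal sketch back-calculates the log-odds effect, its standard error, the z statistic, and a two-sided P value from a reported OR and 95% confidence interval. It assumes a symmetric Wald interval on the log-odds scale and a normal approximation; the function name `or_ci_to_stats` is hypothetical and this is not the authors' analysis code. Because the tabulated values are rounded, the recomputed P value is expected to differ somewhat from the published one.

```python
import math


def or_ci_to_stats(or_value, ci_low, ci_high, z_crit=1.959964):
    """Recover beta, SE, z and a two-sided P value from an OR and its 95% CI.

    Assumes the interval is a symmetric Wald interval on the log-odds scale.
    """
    beta = math.log(or_value)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = beta / se
    # Two-sided normal P value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# NCAN discovery estimate as reported in Table 1: OR 0.839 (0.812, 0.867).
beta, se, z, p = or_ci_to_stats(0.839, 0.812, 0.867)
print(f"beta={beta:.3f}, SE={se:.4f}, z={z:.2f}, p={p:.2e}")
```

An OR below 1 gives a negative beta, which corresponds to the "Negative" direction reported in the table.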

Table 2

Identification of all the pQTLs that explain protein-phenotype associations
Uniprot	Protein	Discovery (UKB): OR (95% CI)	Discovery (UKB): P	Replication (FinnGen): OR (95% CI)	Replication (FinnGen): P	Direction
O15160	DNA-directed RNA polymerases I and III subunit RPAC1 (POLR1C)*	0.873 (0.840, 0.908)	8.54E-12	0.529 (0.370, 0.757)	5.00E-04	Negative
O75674	TOM1-like protein 1 (TOM1L1)*	1.119 (1.085, 1.154)	1.71E-12	1.843 (1.350, 2.517)	1.00E-04	Positive
P00325	Alcohol dehydrogenase 1B (ADH1B)‡	1.102 (1.056, 1.149)	6.46E-06	1.373 (0.548, 3.444)	0.499	Positive
P02649	Apolipoprotein E (APOE)§	0.948 (0.932, 0.964)	8.91E-10	0.793 (0.683, 0.921)	0.002	Negative
P08254	Stromelysin-1 (MMP3)‡	1.136 (1.091, 1.183)	8.35E-10	2.016 (1.347, 3.017)	0.001	Positive
P49760	Dual specificity protein kinase CLK2 (CLK2)*	0.860 (0.824, 0.898)	8.54E-12	0.494 (0.333, 0.734)	5.00E-04	Negative
P61106	Ras-related protein Rab-14 (RAB14)†	1.070 (1.047, 1.094)	8.35E-10	1.451 (1.172, 1.798)	0.001	Positive
P62306	Small nuclear ribonucleoprotein F (SNRPF)§	1.053 (1.036, 1.070)	8.91E-10	1.250 (1.082, 1.443)	0.002	Positive
Q14376	UDP-galactose-4-epimerase (GALE)‡	1.218 (1.144, 1.297)	8.35E-10	2.955 (1.585, 5.509)	0.001	Positive
Q9BWV2	Spermatogenesis-associated protein 9 (SPATA9)‡	1.131 (1.071, 1.194)	8.14E-06	1.483 (0.755, 2.914)	0.253	Positive
Q9H3Q3	Galactose-3-O-sulfotransferase 2 (GAL3ST2)*	0.861 (0.826, 0.897)	1.71E-12	0.442 (0.292, 0.670)	1.00E-04	Negative

* Gudjonsson et al.

† Sun et al.

‡ Ferkingstad et al.

§ Suhre et al.

pQTLs, protein quantitative trait loci; OR, odds ratio; CI, confidence interval.
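One way to read the replication columns of Table 2 is to check, for each protein, whether the FinnGen estimate points in the same direction as the UKB estimate and reaches nominal significance. The sketch below applies that rule to the tabulated values. The criterion used here (same direction and replication P < 0.05) is an assumption for illustration and may differ from the criterion applied in the study.

```python
# (protein, discovery OR, replication OR, replication P), copied from Table 2.
rows = [
    ("POLR1C", 0.873, 0.529, 5.00e-4),
    ("TOM1L1", 1.119, 1.843, 1.00e-4),
    ("ADH1B", 1.102, 1.373, 0.499),
    ("APOE", 0.948, 0.793, 0.002),
    ("MMP3", 1.136, 2.016, 0.001),
    ("CLK2", 0.860, 0.494, 5.00e-4),
    ("RAB14", 1.070, 1.451, 0.001),
    ("SNRPF", 1.053, 1.250, 0.002),
    ("GALE", 1.218, 2.955, 0.001),
    ("SPATA9", 1.131, 1.483, 0.253),
    ("GAL3ST2", 0.861, 0.442, 1.00e-4),
]


def replicates(disc_or, rep_or, rep_p, alpha=0.05):
    """Assumed rule: same effect direction and nominal significance."""
    same_direction = (disc_or > 1) == (rep_or > 1)
    return same_direction and rep_p < alpha


not_replicated = [name for name, d, r, p in rows if not replicates(d, r, p)]
print("Not replicated under the assumed rule:", not_replicated)
```

Under this assumed rule, proteins whose replication confidence interval spans 1 are the ones flagged, even when the direction of effect agrees with the discovery estimate.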

Table 3

Proteins that MR prioritized as potential drug targets
Uniprot	Protein	Druggability tier*	Target type†
Q9H3Q3	Galactose-3-O-sulfotransferase 2 (GAL3ST2)	/	/
Q9BWV2	Spermatogenesis-associated protein 9 (SPATA9)	/	/
Q8IUL8	Cartilage intermediate layer protein 2 (CILP2)‡	Tier 3A	/
Q14376	UDP-galactose-4-epimerase (GALE)	Tier 2	Literature-reported Target
P62306	Small nuclear ribonucleoprotein F (SNRPF)	/	/
P61106	Ras-related protein Rab-14 (RAB14)	/	/
P49760	Dual specificity protein kinase CLK2 (CLK2)	Tier 1	Patented-recorded Target
P08254	Stromelysin-1 (MMP3)	Tier 1	Patented-recorded Target
P02649	Apolipoprotein E (APOE)§	Tier 3A	Clinical trial Target
P00325	Alcohol dehydrogenase 1B (ADH1B)‡	Tier 1	/
O75674	TOM1-like protein 1 (TOM1L1)	/	/
O15160	DNA-directed RNA polymerases I and III subunit RPAC1 (POLR1C)	/	/
O14594	Neurocan (NCAN)‡	Tier 3A	/

* Based on the druggable genes of Finan et al.

† Based on the Therapeutic Target Database.

/, no entry available.


First submitted to journal: 27 Jun, 2024
Submission checks completed at journal: 28 Jun, 2024
Editor assigned by journal: 28 Jun, 2024
