The 25-gene signature is diagnostic of NASH and liver fibrosis in the discovery, validation, and Chinese cohorts
As described in the Methods, eight datasets were identified and divided into four discovery and validation cohorts (Fig. 1). Based on the multi-cohort analysis, 25 genes (CCL23, RPL19, KRT10, IQCH-AS1, NDUFA2, CBS, IFIT1, RPS7, DNAJC12, STXBP6, SULT1A2, ANKRD37, GNAO1, KRT8, PKMYT1, ARHGAP9, ANXA5, DECR1, TNFRSF14, CDCP1, KIAA1522, STRADA, LGALS3, WARS2, and NPM2) were significantly differentially expressed between patients who progressed to NASH and CON/NAFLD in the discovery and validation cohorts (false discovery rate [FDR] < 40%). We calculated the liver fibrosis score for each sample and the meta-scores for each cohort according to the criteria in the Methods (Fig. 2A). In the discovery cohorts, the liver fibrosis scores differentiated NASH from CON/NAFLD, with a summary area under the receiver operating characteristic (ROC) curve (AUC) of 0.95 (95% confidence interval [CI] [0.79−0.99]) (Fig. 2B). We then validated the liver fibrosis scores in the validation cohorts; the liver fibrosis scores of representative genes are shown in Fig. 3A, and the others are shown in Supplementary Fig. S1. In the validation cohorts, the liver fibrosis scores differentiated NASH from NAFLD with a summary AUC of 0.9 (95% CI [0.86−0.94]) (Fig. 3B).
The liver fibrosis scores were further validated in plasma collected from Chinese patients with NASH or liver fibrosis (Fig. 3A). In validation 2 (NASH and NAFLD group), the liver fibrosis scores differentiated NASH from NAFLD with an AUC of 0.94 (95% CI [0.89−0.98]). Next, we evaluated the predictive power of the liver fibrosis scores in independent validation 1 (liver fibrosis cohort), which yielded an AUC of 0.84 (95% CI [0.77−0.91]) (Fig. 3B).
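For readers less familiar with the AUC values reported above, the following minimal Python sketch shows one standard way to compute an AUC from per-sample scores: the fraction of (case, control) pairs in which the case has the higher score, with ties counted as one half. The function name `roc_auc` and the example scores are hypothetical and are not taken from the study data; the scoring criteria themselves are those given in the Methods and are not reproduced here.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    scores: per-sample scores (e.g. a signature score); labels: 1 = case, 0 = control.
    Ties between a case and a control contribute 0.5.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical scores for three cases and three controls (illustration only).
example_scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.1]
example_labels = [1, 1, 1, 0, 0, 0]
example_auc = roc_auc(example_scores, example_labels)  # 8 of 9 pairs are ordered correctly
```

In this toy example one case (0.35) scores below one control (0.6), so eight of the nine case-control pairs are correctly ordered.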
Pathway analysis of the 25-gene signature predictive of liver fibrosis
To investigate the functions of these genes, we performed Gene Ontology (GO) term enrichment analysis and DisGeNET enrichment analysis (Fig. 4A). In the DisGeNET enrichment analysis, the most significantly enriched term was Liver Failure. The most significant GO terms were NADP binding (Molecular Function, MF) and monocyte chemotaxis (Biological Process, BP). NAD(P) binding was the most enriched pathway associated with early liver fibrosis diagnosis. Other enriched terms included oxidoreductase activity acting on other nitrogenous compounds as donors, nitrite reductase (NO-forming) activity, carbon monoxide binding, cystathionine beta-synthase activity, and 2,4-dienoyl-CoA reductase (NADPH) activity. Tryptophan-tRNA ligase activity, mitochondrial tryptophanyl-tRNA aminoacylation, and tryptophanyl-tRNA aminoacylation were also identified in the GO enrichment. As shown in our GO-KEGG results (Table 1), CBS was annotated with oxidoreductase activity, acting on other nitrogenous compounds as donors, cytochrome as acceptor; nitrite reductase (NO-forming) activity; carbon monoxide binding; nitrite reductase activity; and cystathionine beta-synthase activity. NADP binding; 2,4-dienoyl-CoA reductase (NADPH) activity; and oxidoreductase activity, acting on the CH-CH group of donors, NAD or NADP as acceptor, were attributed to DECR1.
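Term enrichment of this kind is commonly assessed with a one-sided hypergeometric (over-representation) test. The text above does not state which test the enrichment tools applied, so the Python sketch below is an assumption about the general technique, not a description of the analysis performed. The function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided over-representation p-value, P(X >= k).

    N: genes in the background; K: background genes annotated with the term;
    n: genes in the signature; k: signature genes annotated with the term.
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probability mass from k up to the largest possible overlap.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# Hypothetical counts: 2 of 25 signature genes carry a term that annotates
# 50 of 20,000 background genes (illustration only, not study data).
example_p = hypergeom_enrichment_p(k=2, n=25, K=50, N=20000)
```

In practice, such p-values would be computed for every term and then adjusted for multiple testing.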
Differentially expressed metabolites associated with liver fibrosis and network analysis
A total of 63 metabolites that were reported at least twice across the five datasets were used for differential metabolite identification (Supplementary Table S2). Among them, several frequently reported metabolites, most of which are amino acids, were significantly differentially regulated between the liver fibrosis and control groups (Fig. 4C). Patients were characterized by uniformly higher or lower levels of several of these metabolites. For instance, tryptophan, asparagine, methionine, phenylalanine, tyrosine, and threonine were consistently elevated across the first three studies examined [11-13]. This consistent upregulation suggests that these amino acids play an important role in liver fibrosis.
In contrast, valine was markedly downregulated in the first three studies, suggesting a reduced role or its consumption under the studied conditions, whereas the other two branched-chain amino acids (BCAAs), isoleucine and leucine, showed inconsistent changes. Other metabolites also showed inconsistent trends across studies. BCAAs have been associated with the improvement of protein malnutrition and have shown potential to reduce the risk of hepatocellular carcinoma in patients with cirrhosis [29].
The network analysis revealed associations between key genes and many amino acids: DECR1 was linked to lysine, glutamate, methionine, arginine, and others, and CBS was associated with serine, taurine, and methionine (Fig. 4C, Supplementary Fig. S2).
Mendelian randomization (MR) analysis
To explore the potential causal links of the selected genes and metabolites to NASH, we further conducted Mendelian randomization (MR) analysis using single nucleotide polymorphism (SNP) data from four well-known GWAS databases. The analysis comprised three parts: 1) the causal link between genes and metabolites, with each of the 25 target genes as the exposure and each of the selected 49 metabolites as the outcome (Supplementary Table S6); 2) the causal link between genes and NASH, with genes as the exposure and NASH/liver fibrosis as the outcome (Table 2); and 3) the causal link between metabolites and NASH, with the 49 metabolites as the exposure and NASH/liver fibrosis as the outcome (Supplementary Table S7). Eventually, 21 metabolites and 12 genes were identified as having a causal association with NASH/liver fibrosis and are shown in a directed network diagram (Fig. 4B).
We found that three genes, KIAA1522 (OR: 0.90, 95% CI: 0.83~0.98, p=0.01), KRT8 (OR: 5.52, 95% CI: 2.12~14.38, p=0.02), and PKMYT1 (OR: 1.00, 95% CI: 1.00~1.00, p=0.04), have direct causal relationships with NASH and liver fibrosis. Eleven of the 12 genes (all except KRT8) show causal relationships with NASH and liver fibrosis through the mediation of metabolites. Notably, WARS2 has a negative causal relationship with glutamine (β=-0.1±0.08, p=0.04), which has an inhibitory effect on liver fibrosis (OR: 0.87, 95% CI: 0.79~0.97, p<0.01). The CBS gene may lead to the occurrence of liver fibrosis by upregulating margarate (β=0.13±0.06, p=0.04) and downregulating leucine (β=-0.15±0.06, p=0.02). Interestingly, the DECR1 gene may have dual effects on NASH or liver fibrosis. It may affect the levels of linolenate (β=0.06±0.03, p=0.02) and alanine (β=-0.06±0.03, p=0.03), which are potential promoters of NASH (OR: 3.35, 95% CI: 1.09~10.29, p=0.03 for linolenate) and liver fibrosis (OR: 1.62, 95% CI: 1.09~2.42, p=0.03 for alanine). Meanwhile, it may lead to an increase in the levels of methionine (β=-0.07±0.03, p<0.01), which has an inhibitory effect on liver fibrosis (OR: 0.05, 95% CI: 0.01~0.35, p<0.01).
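As background for the effect sizes above, the Python sketch below illustrates two textbook MR calculations. The first is the Wald ratio estimate for a single SNP, with a first-order standard error. The second converts a log-odds estimate and its standard error into an odds ratio with a 95% confidence interval. The MR estimator actually used is not specified in this section, so the choice of the Wald ratio is an assumption; the function names and input values are hypothetical.

```python
from math import exp


def wald_ratio(beta_snp_exposure, beta_snp_outcome, se_snp_outcome):
    """Single-SNP causal estimate: SNP-outcome effect divided by SNP-exposure effect.

    The standard error uses the first-order approximation that ignores
    uncertainty in the SNP-exposure effect.
    """
    if beta_snp_exposure == 0:
        raise ValueError("SNP-exposure effect must be non-zero")
    estimate = beta_snp_outcome / beta_snp_exposure
    se = se_snp_outcome / abs(beta_snp_exposure)
    return estimate, se


def odds_ratio_with_ci(log_odds, se, z=1.96):
    """Convert a log-odds estimate and its SE to (OR, lower, upper) at ~95% confidence."""
    return exp(log_odds), exp(log_odds - z * se), exp(log_odds + z * se)


# Hypothetical summary statistics for one instrument (illustration only).
est, est_se = wald_ratio(beta_snp_exposure=0.5, beta_snp_outcome=0.1, se_snp_outcome=0.05)
or_point, or_low, or_high = odds_ratio_with_ci(est, est_se)
```

An interval that excludes 1 on the odds ratio scale corresponds to a log-odds estimate whose 95% interval excludes 0.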
Network analysis of key genes and metabolites
Network analysis using MetaboAnalyst showed that the genes DECR1 and CBS had the highest degree and betweenness centrality, respectively. DECR1 interacted with 10 metabolites, while CBS interacted with 3 metabolites. Methionine was associated with both genes (Fig. 4C, Supplementary Fig. S2).
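To make the two centrality measures concrete, the Python sketch below computes degree and unnormalized betweenness centrality (Brandes' algorithm) on a small gene-metabolite graph. It is a stand-alone illustration and does not reproduce the MetaboAnalyst computation. The edge list contains only the gene-metabolite links named explicitly in the text, so it is a partial network and its values are not expected to match the reported results.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (node, node) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree(adj):
    """Number of neighbours per node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness (Brandes) for an undirected, unweighted graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return {v: b / 2 for v, b in bc.items()}  # each unordered pair was counted twice


# Partial edge list: only the links named in the text (not the full network).
edges = [
    ("DECR1", "lysine"), ("DECR1", "glutamate"),
    ("DECR1", "methionine"), ("DECR1", "arginine"),
    ("CBS", "serine"), ("CBS", "taurine"), ("CBS", "methionine"),
]
adj = build_adjacency(edges)
deg = degree(adj)
btw = betweenness(adj)
```

In this partial graph, methionine is the only node connecting the DECR1 and CBS neighbourhoods, which is why it receives a high betweenness score despite having a degree of only two.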
The 12-gene signature predictive of NASH and liver fibrosis with pathway analysis
The 12 genes (KRT10, NDUFA2, CBS, IFIT1, GNAO1, KRT8, PKMYT1, DECR1, TNFRSF14, KIAA1522, STRADA, and WARS2) were significantly differentially expressed between patients who progressed to NASH and CON/NAFLD in all the discovery and validation cohorts (Fig. 5A). The liver fibrosis scores differentiated NASH from NAFLD with a summary AUC of 0.8 (95% CI [0.75−0.85]) in the four SRP cohorts. Next, we evaluated the predictive power of the liver fibrosis scores in the two independent Chinese liver fibrosis cohorts (validation 2: AUC=0.8, 95% CI [0.73−0.87]; validation 1: AUC=0.75, 95% CI [0.67−0.83]), with a summary AUC of 0.76 (95% CI [0.63−0.86]) (Fig. 5B).