Collection of AH-related transcriptome datasets
To obtain the meta-analyzed AH-related signatures, we collected six liver AH transcriptome datasets (GSE28619, GSE103580, GSE143318, GSE142530, GSE167308, and GSE155907, Figure S1 – S6), along with two blood AH dataset (GSE135285 and GSE171809). Additionally, we compiled two liver gene expression datasets (GSE94397 and GSE94399) to identify pooled signatures associated with the poor prognosis of AH.
To systematically compare the compiled gene expression datasets, we unified the transcript IDs in each dataset using Entrez Gene [19]. Chip-specific transcripts, probes, and probe-set IDs were contained in the microarray datasets, while Ensembl IDs were implemented for all genes in the RNA-seq datasets. We systematically annotated them into gene symbol approved by the HUGO Gene Nomenclature Committee [20] using the org.Hs.eg.db, org.Mm.eg.db, and Orthology.eg.db packages to perform the meta-analysis.
Differential expression analysis
Differential expression (DE) analyses of microarrays and RNA-seq were performed to compare specific phenotypes (e.g., AH and death) with their matched controls. The limma and DESeq2 methods were employed for microarray and RNA-seq data, respectively. Both DE tools provided gene-specific fold changes (FC) and matched p-values for comparing disease and control samples. Genes with a false discovery rate (FDR)-adjusted p-value of less than 0.5 were identified as differentially expressed genes (DEGs) between the two conditions.
A hypergeometric test was applied to identify enriched biological pathways in the candidate gene sets. Enrichment analyses were carried out using the Gene Ontology (GO) [21] and Kyoto Encyclopedia of Genes and Genomes (KEGG) [22] databases. Biological pathways were considered enriched when they met the criteria of a 0.5 FDR-adjusted p-value determined by the hypergeometric test.
Meta-analysis
We conducted a meta-analysis to identify biomarkers commonly associated with diseases in the collected candidate gene expression datasets. The meta-analysis utilized a fixed-effect inverse-variance-weighted (IVW) method adopted in METAL software [23]. The IVW method requires an effect similar to the FC between the two conditions and a matched standard error. We selected gene-specific FC between the two conditions for each transcriptomic dataset and matched the standard errors obtained from summary statistics calculated using limma or DESeq2 in the previous step. The significance level for associations calculated through meta-analysis was determined by FDR adjustment for multiple comparison test.
Bayesian approach for Identifying Upstream Biomarkers
Recent studies implemented multiple lines of evidence as prior knowledge to prioritize the candidate upstream genes [24–29]. Hägg et al.[24] curated gene sets involved in transcription activities such as transcription activator activity, transcription coactivator activity, and TF binding from the GO database [21] to identify upstream genes related to the coronary artery. Lee and Lee [25] used a TF-related gene set obtained from a TF database to identify Alzheimer’s disease (AD)-related genes. Recently, Lambert et al.[14] proposed a TF catalog integrating prominent TF databases, and several studies [26, 27] used it to narrow down the potential gene sets correlated to disease and comorbidity. Motivated by these studies, we implemented the TF catalog manually updated by Lambert et al.[14] to identify the upstream genes related to AH (Fig. 1).
We implemented the DigSee database by curating the relationships of approximately 4500 disease types with about 13000 genes by integrating text mining and machine learning methods [15, 30]. Using a “Liver Diseases, Alcoholic” query, approximately 500 ALD-related genes were compiled from the DigSee.
A GWAS was conducted on alcohol-related cirrhosis in separate German and UK cohorts (712 cases and 1,426 controls) [16]. Full summary statistics of the alcohol-related cirrhosis GWAS were downloaded from the supplementary website (http://gengastro.med.tu-dresden.de/suppl/alc_cirrhosis/) [16]. Among the 6,502,449 SNPs, approximately 5500 variants with uncorrected p-values < 0.001 for the association between genetic variants and alcohol-related cirrhosis were selected. The 5500 SNPs were assigned to their corresponding genes using the Gsnpense function in the gprofiler2 package [31]. Among the candidate AH genes, those with evidence from GWAS were selected as potential biomarkers of AH (Fig. 1).
Liver cis-eQTL data were obtained from the Genotype-Tissue Expression (GTEx) project [17]. In detail, full summary statistics of the GTEx liver eQTL were obtained with an access ID (study ID: QTS000015; dataset ID: QTD000266) from the eQTL catalog database [32]. Cis-associations between gene expression and variant types with an uncorrected p-value < 10–5 were selected, yielding 317,324 gene-SNP pairs that accounted for 2829 genes (annotated by symbols assigned by HGNC). Among the candidates the meta-analysis chose, genes showing evidence of liver cis-eQTL were designated candidate upstream genes (Fig. 1).
A PPI network was obtained from the STRING database, which collected the interactome from various sources, including automated text mining of scientific and medical literature, computational predictions of co-expression and co-occurrence across genomes, PPI experiment databases, and known biological pathways [18]. STRING consists of approximately 12 million edges of 40,000 proteins (based on the Ensembl Protein (ESPN)). The PPI network introduced by the STRING database included protein-protein pairs and their matched scores. The distribution of PPI scores did not follow a Gaussian distribution; therefore, PPIs with the top 90 percent of the interactome scores were selected. Among significant genes in meta-analysis, genes with ≥ 200 edges were chosen as the candidate upstream genes (Fig. 1).
Severe acute AH mouse model
Mice (C57BL/6, male, seven weeks) were randomly assigned to AH and control groups. For the AH group, mice were administered oral ethanol (5 g/kg/day) and intraperitoneal thioacetic acid (TAA, 200mg/kg, twice week/4 weeks), and the body weight was measured every week, and biochemistry analyses were performed by collecting blood before sacrifice. For the control group, mice were injected with normal saline in peritoneal space.
After four weeks of the mouse experiments, mice were anesthetized with isoflurane inhalation. Liver were extracted and fixed in 4% paraformaldehyde, imbedded in paraffin, followed by dehydration in graded alcohol. Tissue slices of 5 µm thickness were prepared and stained with hematoxylin-eosin (Sigma-Aldrich, St. Louis, MO, USA).
RNA isolation and real-time polymerase chain reaction
Total RNA was extracted from frozen whole livers using TRIzol reagent (Invitrogen, Carlsbad, CA) or Qiagen mini columns (Qiagen Inc. Valencia, CA) according to the manufacturer’s protocol. RNA concentrations were quantified by spectrophotometry. cDNA was synthesized form total RNA (1ug) by using the RT Premix kit (TOYOBO, Japan). Real-time PCR was performed with the QuantStudioTM6 Flex instrument (Applied Biosystems) using SYBR Green PCR master mix (Applied Biosystems) and the respective primer pairs for each gene. Data were analyzed by using QuantStudio 6 and 7 Flex Software (Applied Biosystems). The cycle threshold (Ct) values of the target genes were normalized to those of endogenous control gene (beta-actin or GAPDH). Relative expression of mRNAs for various genes is presented as fold change using the 2^ (-ΔΔCT) method.