Acquisition of apoptosis-related genes set and GSE analysis
Following the search in the Molecular Signatures Database v7.1 in GSEA, 161 ARGs were identified from the gene set “HALLMARK_APOPTOSIS”. Thereafter, the 161 ARGs in this gene set were further assessed through GSEA analysis to ascertain their biological significance in HCC. Fig. 1a shows that ARG set was significantly differentially expressed between HCC and normal samples.
Identification of differentially expressed ARGs
The mRNA sequence data for 374 HCC tissue samples and 50 samples of normal tissue were obtained from the TCGA database. The limma package and the Wilcoxon signed-rank test in R (∣log2FC∣ > 1, FDR < 0.05) were used to screen differentially expressed ARGs in HCC and non-tumor samples. In this study, 43 and 8 ARGs were found to be significantly upregulated and downregulated, respectively. These findings are presented in a volcano plot, heatmap, and box plot (Fig. 1(b-d)).
Functional enrichment and PPI network analysis of differentially expressed ARGs
To establish the biological functions and significant pathways of the 51 identified genes, GO enrichment and KEGG pathway enrichment analysis were performed (Fig. 2). The 51 identified ARGs were found to be involved in the pathways associated with cellular apoptosis (Fig. 2a and 2b). Furthermore, KEGG enrichment analysis revealed that these ARGs are involved in platinum drug resistance, transcriptional dysregulation in cancer and some oncogenic pathways such as MAPK, P53, TNF, and PI3K-AKT signaling pathways (Fig. 2c). The STRING online tool was used to establish a PPI network to evaluate the interactions among ARGs coded proteins. Results were observed using the Cytoscape software (Fig. 2d).
Association between ARGs and HCC patient’s survival and prognosis
Univariate Cox regression analyses were performed on both mRNA expression and the corresponding clinical data for the 51 selected ARGs to identify their prognosis-associated ARGs in HCC (Fig. 3a). There were 20 upregulated ARGs and 1 downregulated ARG which were statistically significant. These 21 genes were subsequently analyzed by multivariate Cox regression (p < 0.05) to determine their association with the prognosis of HCC patients and to acquire the corresponding regression coefficients. Five prognosis-related ARGs: Protein phosphatase 2 regulatory subunit B'beta (PPP2R5B), Sequestosome 1 (SQSTM1), DNA topoisomerase II alpha (TOP2A), Bcl2 modifying factor (BMF), and Galectin 3 (LGALS3) were screened. Expression levels in normal and HCC samples were further compared to establish a prognostic value involving these 5 genes. Compared to normal samples, the 5 identified ARGs in HCC samples exhibited significantly elevated expression levels than normal specimens as indicated by heatmap and boxplot (Fig. 3b and 3c). Besides, the K-M curve was constructed by utilizing the survival rate differences in the high- and low-expressed groups of the identified ARGs. Interestingly, the high expression levels of the 5 identified ARGs indicated a low survival rate (Fig. 3d).
Furthermore, mutations in these 5 HCC genes were analyzed through the cBioPortal database (http://cbioportal.org). Data from 353 HCC patients in this database revealed that 22 patients (6.3%) had mutations. Among the 22 patients with the mutation, 0.85% had missense mutations, 0.56% had amplifications, while 0.56% had deep deletions in the PPP2R5B gene; 0.28% had missense mutations, 0.85% had amplifications while 0.28% had deep deletions in the SQSTM1 gene; while 0.85% had missense mutations, 0.28% had truncating mutations, 0.56% had amplifications and 0.28% had deep deletions in the TOP2A gene. Moreover, 0.28% had missense mutations, and 0.28% had deep deletions in the BMF gene whereas 0.28% had amplifications in the LGALS3 gene (Fig. 3e).
Immunohistochemical analysis of the Human Protein Atlas (HPA) database revealed that these 5 genes were significantly upregulated in HCC (Fig. 4).
Establishment of a prognostic risk signature based on ARGs
We combined the expression levels of ARGs and the regression coefficients of multiple Cox regression analyses to establish a risk scoring formula. Risk score = (0.385327 * expression value of PPP2R5B) + (0.2787 * expression value of SQSYM1) + (0.152062 * expression value of TOP2A) + (0.172177 * expression value of BMF) + (0.110211 * expression value of LGALS3). All genes had a positive coefficient, implying that elevated expression levels of the identified genes were negatively correlated with prognosis. After calculating the risk score in HCC patients, the median risk score was used as a cutoff, and these patients were assigned into high- and low-risk groups (Fig. 5). The plotted heatmap of the 5 ARGs expression levels showed that patients in the same group had distinct expression levels (Fig. 5a). Patient's scores were ranked in ascending order (Fig. 5b) and their survival time presented as a scatterplot (Fig. 5c). Low-risk patients had a longer survival time and higher survival rates than high-risk patients.
Survival analyses of high- and low-risk groups were performed to show the correlations between risk scores and the prognosis of patients. First, we evaluated the prognostic significance of the risk scores using K-M curves and the log-rank method as shown in (Fig. 6). Compared to the high-risk group, patients in the low-risk group had a significantly high two-year or five-year survival probability (p < 0.0001; Fig. 6) revealing a worse prognosis for the high-risk group.
Independent prediction of HCC prognosis by risk score
Both univariate and multivariate Cox regression analyses were used to compare the predictive value of the prognostic model and other clinical prognostic variables. The risk score was established as an independent prognostic predictor for HCC patients (Fig. 7a and 7b). However, clinical variables were found not to be independent prognostic predictors (p > 0.05; Fig. 7b). Moreover, the ROC curve was plotted to validate the predictive value of the risk score (Fig. 7c). The AUC of the risk score was 0.741 as the highest value, compared to that of the clinical variables, implying that the risk score had a higher predictive value than the clinical variables.
KEGG enrichment analysis between different risk patients
KEGG analysis was used to evaluate to explore the potential biological pathways that were associated with 5 ARGs. The 5 majorly enriched pathways were apoptosis, cell cycle, mechanistic target of rapamycin kinase (mTOR) signaling, WNT signaling, and phosphatidylinositol signaling systems (Fig. 7d).
Construction of a nomogram to predict the prognosis of HCC patients
To apply the risk score in predicting the prognosis of HCC patients, we combined the risk score with the corresponding clinical variables to construct a nomogram to predict the OS of patients at 1, 2, and 3 years. Based on the risk scores and clinical variables, an average point for a patient can be established and extrapolated to determine the 1-, 2- and 3-year OS (Fig. 8).
The correlations between risk scores/ 5 ARGs and clinical variables
Based on the gene expression and corresponding clinical data obtained from the TCGA database, we analyzed the correlations between these clinical variables and risk scores of the 5 ARGs. Risk scores were correlated with tumor stage; BMF was associated with age, gender, grade, and stage; PPP2R5B and LGALS3 were correlated with stage; SQSTM1 was associated with age and gender; while TOP2A was correlated with grade and stage (Fig. 9).