The overall data processing flow is shown in Figure 1. In our study, 365 HCC patients were provided by the TCGA-LIHC cohort and 231 HCC patients were provided by the ICGC (LIRI-JP) cohort. Clinical data for these patients are presented in Table 1.
Identification of pyroptosis-related prognostic DEGs in the TCGA cohort
More than half of the pyroptosis-related genes (47/69, 68.1%) were expressed differently between tumor and adjacent non-tumorous tissues. Among them, twenty-one genes were associated with OS in univariate Cox regression analysis (Figure 2A). A total of 21 pyroptosis-related DEGs were preserved in our study (all FDR < 0.05) (Figure 2B-C). The interaction network and correlation among these genes are presented in Figure 2D-E.
Construction of a four-gene signature in the TCGA cohort
A total of 21 survival-related DEGs were mentioned above, then LASSO Cox regression analysis identified four genes to construct the prognostic model (Supplementary Figure 1). The four genes included in the model were IRAK1, NOD2, POP1 and YWHAB. The risk score = 0.101 * expression level of IRAK1 + 0.003 * expression level of NOD2 + 0.164 * expression level of POP1 + 0.438 * expression level of YWHAB. The patients were stratified into a high-risk group and a low-risk group according to the median cut-off value (Figure 3A). Based on PCA and t-SNE analyses, patients in different risk groups were distributed in two directions (Figure 3E-F). In the TCGA cohort, we found that the risk scores were increased with the increase of tumor grade and TNM stage (Table 2). Patients with high risk were more likely to die earlier than those with low risk (Figure 3B). Conformably, the Kaplan-Meier curve revealed that the OS was significantly poorer in the high-risk group than in the low-risk group (P < 0.001) (Figure 3I). Time-dependent ROC analysis was used to evaluate the predictive power of the four-gene signature and the AUC for 1-, 2-, and 3-year OS were 0.735, 0.666, and 0.663, respectively (Figure 3J). Survival analysis confirmed that the high expression of IRAK1, NOD2, POP1 and YWHAB was correlated with the poor prognosis of HCC (P < 0.05) (Supplementary Figure 2). Furthermore, the expression of each prognostic gene was higher in HCC tissues compared with adjacent non-tumorous tissues (Supplementary Figure 3).
Validation of the four-gene signature in the ICGC cohort
The prognostic model was validated in the ICGC cohort. Patients were divided into a high-risk group and a low-risk group based on the median value from the TCGA cohort (Figure 3C). Similar to the above results, the patients in different risk groups were distributed in discrete directions based on PCA and t-SNE analyses (Figure 3G-H). Besides, the risk scores were increased with the increase of the TNM stage (Table 2). Equally, compared with the low-risk group, patients in the high-risk group had a shorter survival time (Figure 3D) and poorer OS (P < 0.05) (Figure 3K). Finally, the AUC of the four-gene signature for 1-, 2-, and 3-year OS were 0.638, 0.636, and 0.638, respectively (Figure 3L).
Independent prognostic value of the four-gene signature
The risk score and other clinical features, including age, gender, tumor grade and tumor stage, were evaluated in TCGA and ICGC cohorts for OS by the univariate and multivariate Cox regression analyses. According to the results of univariate Cox regression analysis, the risk score (TCGA cohort: HR = 2.136, 95% CI = 1.459-3.128, P < 0.001; ICGC cohort: HR = 2.145, 95% CI = 1.128-4.078, P = 0.020, respectively) and tumor stage (TCGA cohort: HR = 2.500, 95% CI = 1.721-3.632, P < 0.001; ICGC cohort: HR = 2.492, 95% CI = 1.351-4.599, P = 0.003, respectively) were associated with OS in both the TCGA and ICGC cohorts (Figure 4). In multivariate Cox regression analysis, the risk score (TCGA cohort: HR = 1.790, 95% CI = 1.207-2.656, P = 0.004; ICGC cohort: HR = 1.996, 95% CI = 1.044-3.817, P = 0.037) and tumor stage (TCGA cohort: HR = 2.139, 95% CI = 1.453-3.148, P < 0.001; ICGC cohort: HR = 2.766, 95% CI = 1.470-5.203, P = 0.002, respectively) were associated with OS in both the TCGA and ICGC cohorts (Figure 4). Furthermore, the combination of the risk score with the tumor stage could provide a more accurate prediction of 3-year OS for HCC (TCGA cohort: AUC = 0.718; ICGC cohort: AUC =0.766) (Figure 5).
Risk scores and prognostic genes in different clinical characteristics groups
According to the risk score and clinical characteristics of HCC patients in the TCGA cohort, we discovered that the risk score was higher in the advanced tumor grade and advanced tumor stage (Figure 6C-D). The results obtained from the ICGC cohort were similar to the TCGA cohort, but there was no data about the grade of LICH in the ICGC cohort (Figure 6E-G). Combining the expression levels of prognostic genes with the clinical characteristics of HCC revealed that the expression levels of NOD2, POP1 and YWHAB were different in tumor grade (P < 0.05), and the expression levels of POP1 as well as YWHAB were different in the tumor stage (P < 0.05) in the TCGA cohort (Supplementary Figure 4C-D). Moreover, in the ICGC cohort, IRAK1 was distinguishingly expressed in the tumor stage (Supplementary Figure 4G).
Relationship between risk score and immune status
To determine the immune status in different risk groups, ssGSEA was performed to quantify the enrichment scores of diverse immune cell subpopulations and related functionsIn both the TCGA and the ICGC cohorts, the immune cell subpopulations including aDCs, iDCs, macrophages, Tfh as well as Treg showed high infiltration in the high-risk group, but NK cells were opposite (P < 0.05) (Figure 7A-B). Moreover, the scores of immune-related functions including APC co-stimulation, CCR, Check point, HLA, MHC class I and parainflammation were significantly elevated in the high-risk group, but the score of type Ⅱ Interferon (IFN) response was opposite (P < 0.05) (Figure 7C-D).
To further understand the role of the four-gene signature in HCC, we explored the expression levels of immune checkpoint-related genes (PD-L1, PD-L2 and Galectin-9) in different risk groups. We found that the expression levels of PD-L1, PD-L2 and Galectin-9 were significantly higher in the high-risk group than in the low-risk group and they were all positively correlated with the risk score (P < 0.05) (Figure 8A-B).
The correlation between risk score and different immune infiltrate subtypes in HCC patients was conducted by the Analysis of Variance. Six types of immune infiltrates were found in human tumors. They are C1 (wound healing), C2 (IFN-r dominant), C3 (inflammatory), C4 (lymphocyte depleted), C5 (immunologically quiet) and C6 (TGF-β dominant) [22]. No patient sample belonged to the C5 immune subtype and only one patient sample belonged to the C6 immune subtype, so we excluded them. From C1 to C4 subtype, the value of the risk score showed a downward trend, and risk scores between C1 and C3, C4, also between C2 and C3, C4 were significantly different (P < 0.05) (Figure 9). Besides, the expression levels of prognostic genes were significantly different in immune infiltrate subtypes. Most of the prognostic genes (IRAK1, POP1 and YWHAB) were highly expressed in the C1 subtype and NOD2 was highly expressed in the C2 subtype (P < 0.05) (Supplementary Figure5).
Differential expression of cell cycle-related genes in risk groups
To understand the correlations between the risk score and the expression levels of cell cycle-related genes, Spearman correlation was carried out. CDK2, CDK4, CDC25A and Cyclin E were overexpressed in the high-risk group compared with the low-risk group. Furthermore, the expression levels of CDK2, CDK4, CDC25A and Cyclin E were all positively correlated with the risk score (P < 0.001) (Figure 10A-B).
Analysis of the correlation between the risk model and multidrug resistance-related genes and chemotherapeutics
To assess the impact of the four-gene signature on HCC treatment, we analyzed the correlations of the risk score with the expression levels of multidrug resistance-related genes. The results indicated that the expression levels of MRP1, MRP4 and MRP5 were higher in the high-risk group than in the low-risk group. Also, the expression levels of MRP1, MRP4 and MRP5 were positively correlated with the risk score (Figure 11A-B). Furthermore, the correlation analysis was conducted to explore the correlation between the risk score and chemotherapeutics. We found that the expression levels of prognostic genes were negatively correlated with the sensitivity of some chemotherapy drugs (Figure 12). For example, the sensitivity of Bosutinib was negatively correlated with the expression of IRAK. More details are listed in supplementary table 5.
KEGG functional enrichment analysis
The GSEA was used to perform KEGG analysis in different risk groups to explore the potential molecular mechanisms of the four-gene signature. According to the results, we found that some pathways were enriched in the high-risk group in both the TCGA and ICGC cohorts (Figure 13). Of note, some pathways were related to cancer processes such as cell cycle, MAPK signaling pathway, mismatch repair, mTOR signaling pathway, Notch signaling pathway, pathways in cancer, VEGF signaling pathway and Wnt signaling pathway. The detailed information is displayed in supplementary figure 6-7.
Verification of prognostic gene expression levels between HCC and adjacent non-tumorous tissues by qRT-PCR and IHC
We analyzed mRNA and protein expression levels by qRT-PCR and IHC to validate the different expression levels of four prognostic genes (IRAK1, NOD2, POP1 and YWHAB) in HCC and adjacent non-tumorous tissues. The qRT-PCR results indicated that the expression levels of IRAK1, NOD2, POP1 and YWHAB in HCC tissues were higher than those in adjacent non-tumorous tissues (Figure 14A). The same results were shown by IHC (Figure 14B). Furthermore, the results were consistent with RNA sequencing results of the four prognostic genes in the TCGA cohort (Supplementary Figure 3). Consistently, we compared the expression levels of four prognostic genes in normal and tumor samples across different TCGA cancer types (Figure 15). The result indicated that prognostic genes in tumor samples were highly expressed in most cancer types compared with normal samples.