3.1 Identify immune-related genes with prognostic value in HCC
In total, 329 IRGs (including 62 downregulated and 267 upregulated genes) were identified as differentially expressed genes in HCC samples compared with normal samples. Both heatmap and volcano shows the distribution of differentially expressed immune-related genes between HCC and normal tissues (supplementary Fig. a, c). Then, the 329 IRGs were evaluated with univariate Cox regression analysis to determine prognostic characteristics. Finally, 62 genes were identified as candidate prognostic IRGs, which were shown in Fig. 1.
3.2 Construction of TF-immunogene regulatory network
To discover underlying molecular mechanisms of these candidate prognostic IRGs, we explored the interaction between each gene. A total of 118 TFs were differentially expressed between HCC and normal tissues (supplementary Fig. b, d). A regulatory network was built upon these 118 TFs and 62 prognostic IRGs. Correlation score more of >0.6 was set as the cut-off values. The constructed TF-immunogene regulatory network revealed the interaction among these IRGs (Fig.2).
3.3 Establishment of the IRPS for HCC prognosis
According to the univariate Cox regression results of TCGA cohort, LASSO-penalized Cox analysis identified eight genes to establish the IRPS, including retinol binding protein 2 (RBP2), microtubule associated protein tau (MAPT), baculoviral IAP repeat containing 5 (BIRC5), roundabout guidance receptor 1 (ROBO1), fidgetin like 2 (FIGNL2), interleukin 17D (IL17D), secreted phosphoprotein 1 (SPP1), stanniocalcin 2 (STC2). The risk scores were calculated as follows: 0.0132×expression of RBP2+0.1397×expression of MAPT+0.0202×expression of BIRC5+0.0512×expression of ROBO1+0.2357×expression of FIGNL2+0.0560×expression of IL17D+0.0001×expression of SPP1+0.2095×expression of STC2. Based on the median cut-off risk score (0.387082), 343 HCC samples were assigned to the high- (n=171) and low-risk (n=172) groups. K-M survival analysis demonstrated that HCC patients with high risk scores exhibited remarkably worse OS compared to low-risk group (p-value=7.731e-08; Fig. 3a). The area under the ROC curve (AUC) for OS was determined to be 0.825 (Fig. 3c), suggesting that this prognostic signature exhibits outstanding sensitivity and specificity. The gene expression heatmap and survival overview are illustrated in Fig. 3e.
The prognostic signature was validated in ICGC cohort. A total of 232 HCC patients were assigned to the high- (n=197) and low-risk groups (n=35) according to the same cut-off value of 0.387082 in TCGA cohort. Notably, the OS of HCC patients was longer in low-risk group than in high-risk group (p-value=3.3632e-02; Fig.3b). The AUC for OS was determined to be 0.668 (Fig. 3d), indicating the high sensitivity and specificity of this prognostic signature. Moreover, the gene expression heatmap and survival overview are illustrated in Fig. 3f.
3.4 Independence evaluation of the IRPS
For TCGA cohort, univariate analysis demonstrated that tumor stage, T stage, M stage, and risk score were independent prognostic factors for OS (Fig. 4a,). Furthermore, multivariate Cox regression analysis showed that only risk score remained an independent prognostic factor of the survival outcome of HCC patients after adjusted by age, gender, tumor grade and TNM stage (Fig. 4b). For ICGC cohort, both univariate and multivariate Cox regression analyses revealed that tumor stage and IRPS were the independent prognostic factors for OS (Fig. 4c, d), which were supported by the findings of TCGA cohort. Taken together, these results confirmed that IRPS could serve as a biomarker for predicting survival outcome of HCC patients.
3.5 Relationships between the IRPS and clinical-related factors
To further determine the predictive values of the IRPS, we analyzed the correlation between the IRPS and clinicopathological characteristics. In TCGA cohort, this prognostic signature was significantly correlated with pathological poor differentiation (p-value=0.003; Fig. 5a), more advanced tumor stage (p-value=0.004; Fig. 5b), and advanced T stage (Fig. 5c). Similarly, higher risk scores was associated with more advanced tumor stage in ICGC cohort (Fig. 5d). The clinical significance of each identified IRG is shown in Tables 1 and 2.
To analyze whether the IRPS is related to tumor-infiltrating immune cells, the association between the signature and immune cell infiltration were investigated in TCGA cohort. The results indicated that CD4+ T cells and CD8+ T cells were negatively correlated with the signature, and no correlations were observed between the signature and B cells, dendritic cells, macrophage cells and neutrophil cells. These results are shown in Fig. 6.
3.6 External validation of the expression of the immune-related genes involved in IRPS
Compared to the adjacent non-tumor liver tissues, the expression of the eight immune-related genes involved in IRPS was significantly increased in HCC in TCGA cohort. These results were validated in ICGC cohort (Table. 3). Moreover, we confirmed the expression patterns of these eight genes in HCC samples through the Oncomine database. Except for FIGNL2, the mRNA expression levels of RBP2, MAPT, BIRC5, ROBO1, IL17D, SPP1 and STC2 were also increased in HCC tissues by using Wurmbach liver cohort [15] (Fig.7a). The protein expression of these eight genes was confirmed in HPA database. Compared to their expression in normal liver tissues, MAPT and ROBO1 were strongly positive, while BIRC5 and SPP1were moderately positively, and STC2 were weakly positively in HCC tissues (Fig. 7b, c). However, the expression of RBP2 were not detected in both HCC and normal liver tissues (Fig. 7b, c), the expression of FIGNL2 and IL17D were not found on the website.
3.7 GSEA
GSEA was preformed and identified 65 significantly enriched KEGG pathways in TCGA cohort. The top 5 KEGG pathways in high-risk group were “KEGG_PYRIMIDINE_METABOLISM”, “KEGG_PURINE_METABOLISM”, “KEGG_RNA_DEGRADATION”, “KEGG_BASE_EXCISION_REPAIR”, “KEGG_ CELL_CYCLE”. The top 5 KEGG pathways in low-risk group were “KEGG_COMPLEMENT_AND_COAGULATION_CASCADES”, “KEGG_FATTY _ACID_METABOLISM”, “KEGG_DRUG_METABOLISM_CYTOCHROME_ P450”, “KEGG_PRIMARY_BILE_ACID_BIOSYNTHESIS”, “KEGG_RETINOL _METABOLISM” (Fig.8a).
3.8 Establishment of a predictive nomogram
A predictive nomogram was established to systematically validate the predictive value of the prognostic signature in HCC patients (Fig. 8b). All independent clinical risk factors, including age, gender, grade, stage, T, N, and M combined with the prognostic signature, were involved. Each factor was assigned a score point according to its risk to survival. In the nomogram plot, the influence of different factors on the survival were presented in the length of the line. The prognostic signature had the longest line indicated that it had the greatest influence on the predication of survival probability. Compared to other clinical characteristics, our prognostic signature contributed the highest number of risk points (from 0 to 100) in the nomogram, which was consistent with the results of independence evaluation analyses.