Differential expressions of TF genes and IRGs in CCA
We used RNA-seq data and clinical follow-up information collected from 45 samples, including 36 CCA and 9 non-tumor tissues in TCGA. Wilcoxon rank test in R was performed to preprocess and analyze the gene differential expression of microarray data. Totally, 2657 DEGs were identified, which consist of 1203 significantly down-regulated DEGs and 1454 significantly up-regulated DEGs, for the subsequent bioinformatics analysis (Fig. 1a, d). To explore potential function and pathway of these DEGs identified above, GO and KEGG pathway enrichment analysis were performed (Additional file: Fig S1).
The mRNA levels of 2498 IRGs (ImmPort database) and 318 TF genes (Cistrome Cancer database) were examined. This analysis eventually revealed 223 immune-related genes (130 downregulated and 93 upregulated) and 32 TFs (13 downregulated and 19 upregulated) with the thresholds of |log2 fold change| >1 and adjusted P < 0.05, which were identified as differentially expressed in CCA tissues compared with normal tissues. Then, heatmaps and volcano plots were visualized to show the expression pattern of differentially expressed IRGs and TFs between CCA and non-tumor tissues (Fig. 1b, e, c, f).
Screening of prognostic immune-related genes and construction of TF regulatory network
Considering the association IR-DEGs with OS, we further conducted a Cox regression hazards analysis to screen the IR-DEGs. To prevent overfitting of the model, TCGA-CHOL dataset was utilized as training group. We found 11 prognostic IRGs (RAET1E, CST4, CCL24, CCK, CGB, GUCA2A, TDGF1, TDGF3, THPO, AVPR1B and IL9R) that were significantly related to the survival of CCA patients in the training group (Fig. 2a)
To further explore potential regulatory mechanisms behind deregulation of IR-DEGs expression in CCA, we analyzed the correlation between TF genes and IRGs expression. We investigated the correlations between the TF-DEGs and prognostic IR-DEGs by the method of Pearson correlation coefficient. Based on the threshold criteria, 24 TF genes were prominently associated with the PIR-DEGs (P < 0.05). We utilized Cytoscape software to build and visualize the TF-based regulatory network, as showed in Fig. 2b. This TF regulatory network revealed relationships among these immune-related genes. Furthermore, GO annotation analyses of IR-DEGs led to the identification of top significantly GO terms, and it indicated that the differentially expressed immune-related genes were mainly involved in the regulation of immune effector process and cell killing (Fig. 2c).
Construction of the prognostic model in the training group
To investigate significance of risk genes in estimating prognosis, the 11 survival-related IRGs were further submitted to a multivariate Cox proportional hazards regression analysis (with forward selection and backward selection). Finally, five candidate PIR-DEGs (RAET1E, CST4, TDGF1, AVPR1B and IL9R) which may serve as significant predictors of the prognosis were obtained for inclusion in the risk model. Among these genes, RAET1E, CST4 and TDGF1 were identified as high-risk genes (serving as risk factors), while AVPR1B and IL9R were identified as low-risk genes (serving as protective factors) (Table 1). In the TCGA training cohort, the expression distribution of five IRGs and Kaplan-Meier curves for OS were shown in Additional file: Fig S2.
A prognostic index was established based on the mRNA levels and estimated regression coefficients of the risk genes. The formula of prognostic model containing five PIR-DEGs is described as follows:
Risk score= (0.5631 × expression of RAET1E) + (0.4147 × expression of CST4) + (0.3941 × expression of TDGF1) + (-1.2478 × expression of AVPR1B) + (-1.4883 × expression of IL9R).
According to median risk score, patients were divided into high-risk (n = 18) and low-risk group (n = 18) in the training cohort. We ranked the risk scores of patients and analyzed their distribution. The survival status of each patient was marked on dot plot, which showed that mortality rate in the high-risk group was significantly higher than that of the low-risk group (Fig. 3a, b). Heatmap revealed that the expression level of five PIR-DEGs varied with risk scores in two groups (Fig. 3c). Among patients with higher risk scores in the training cohort, three risk genes (RAET1E, CST4 and TDGF1) were upregulated, while two protective genes (AVPR1B and IL9R) were downregulated. In patients with lower risk scores, these genes displayed an opposite expression pattern. Kaplan-Meier plot and ROC curve were used to evaluate the performance of five PIR-DEGs signature in predicting the outcome of patients. The prognosis was significantly worse in high-risk group than low-risk group (P < 0.001, Fig. 3d). The OS rates at three years and five years for high-risk group in the training cohort were 19.9% and 9.9%, respectively, while the corresponding rates for low-risk group were 83.9% and 69.9%, respectively. Then we used ROC curves to examine predictive accuracy of the prognostic model at three years. The AUC value of ROC was 0.898 (Fig. 3e).
Validation of the prognostic model by GEO cohort
The GSE107943 cohort including 30 CCA samples was used for validation of the prognostic model. The risk score distribution, survival status and risk gene expression in the GEO cohort are displayed in Fig. 4a-c. In agreement with results of TCGA cohort, the Kaplan-Meier survival curves demonstrated that the OS was significantly poorer in high-risk group than low-risk group (P < 0.05; Fig. 4d), and the survival rates at three and five years in high-risk group were 30.9% and 0.0%, respectively, while the corresponding rates in low-risk group were 75.0% and 40.0%, respectively. The AUC for three years was 0.846 (Fig. 4e). Taking together, these results indicated a moderate sensitivity and specificity of the prognostic model. We also analyzed the expression distribution of five IRGs and Kaplan-Meier curves for OS, the results were shown in Additional file: Fig S3.
Independent prognostic value of the risk model in the TCGA cohort
Next, we performed univariate and multivariate Cox regression analyses to assess the performance of model with the clinicopathological factors of CCA, such as age, gender, histological grade, T stage, N stage, M stage, AJCC stage and margin status. The univariate analysis indicated that N stage (HR = 4.148, 95%CI = 1.517–11.336, P = 0.006), margin status (HR = 2.754, 95%CI = 1.013–7.491, P = 0.047) and risk score (HR = 2.255, 95%CI = 1.409–3.611, P < 0.001) were associated with the prognosis of CCA patients (Fig. 5a). Meanwhile, the multivariate analysis revealed that risk score was an independently risk factor in the TCGA cohort (HR = 3.009, 95%CI = 1.540–5.880, P = 0.001) (Fig. 5b). These results demonstrated that the prognostic model showed a good performance to predict prognosis.
Nomogram for the prediction of prognosis in the training cohort.
To better estimate the prognosis of CCA patients, we established a nomogram that integrated prognostic risk score and clinical variables (age, gender, N stage, M stage, AJCC stage, margin status and risk score) (Fig. 5c). The nomogram could accurately predict OS, and demonstrated that risk score contributed much more risk points than other variables. C-index for OS was 0.819 (95% CI, 0.727-0.911), and calibration plot for probability of survival at 3- or 5-year showed satisfactory agreements between the predicted and actual values (Fig. 5d, e).
The correlation between the risk score and tumor-infiltrating immune cells (TIICs) in the TCGA cohort
To determine whether our model could reflect the status of the tumor immune microenvironment in CCA, we investigated the correlation between risk score and immune cell infiltration in the TCGA cohort. The tumor-infiltrating immune cells were estimated by CIBERSORT algorithm. We plotted bar plots to demonstrate the proportion of immune cells of CCA and non-tumor tissues. As shown in Fig. 6a-b, the 22 TIIC proportions between two groups were significantly different. Heatmap was used to show the correlation between immune cells subpopulations, which indicated the fractions of different immune cells were weakly to moderately correlated in tumor tissues in the TCGA cohort. The three immune cells showed strong and moderate positive correlation with T gamma delta cells, CD8 + T cells and resting memory CD4 + T cells (Fig. 6c).
To analyze whether infiltration of immune cells might contribute to risk score, we analyzed correlation between risk score and tumor-immune cell proportions. As the risk score increased, the content of CD8 + T cells (P < 0.05), neurophils (P < 0.05) and T gamma delta cells (P < 0.001) in CCA tissues also increased, while the content of resting memory CD4 + T cells decreased (P < 0.05) (Fig. 6d).