Immunogenomic profiling identifies two subtypes of GC
To comprehensively evaluate the immunological characteristics in GC, we analyzed 375 tumor samples from the TCGA-STAD cohort using the ssGSEA algorithm along with 29 immune gene sets. Based on the single sample Gene Sets Enrichment Analysis (ssGSEA) scores and hierarchical clustering algorithm, all samples were clearly clustered into two categories: Immunity_H (High) and Immunity_L (Low) (Fig. 1a and 1b). According to the results of estimate, the tumor microenvironment characteristics between these two subtypes were identified. We found that the Immunity_H had higher levels of EstimateScore, ImmuneScore, and StromalScore, whereas Immunity_L had the lower levels of these scores (Wilcox test, P < 0.001) (Fig. 1c).
The degrees of immune cell infiltration were determined using the CIBERSORT algorithm. We found that the infiltration degrees of CD8 T cells, activated memory CD4 T cells, resting memory CD4 T cells, gamma delta T cells, memory B cells, activated NK cells, resting NK cells, M0 macrophages, M1 macrophages, activated mast cells, resting dendritic cells, and plasma cells were significantly different between Immunity_H and Immunity_L groups (Fig. 1d). We further used the tSNE algorithm to confirm the immune level clustering of the GC patients, and the same classification results were obtained (Fig. 1e). We also examined the expression of HLA genes in these two subtypes and found that most HLA genes showed significantly higher expression levels in Immunity_H but significantly lower expression levels in Immunity_L (Wilcox test, P < 0.05) (Fig. 1f). Taken together, these results indicated that the immunophenotyping of GC patients was successfully performed.
Molecular characteristics and multidimensional functional annotation describing the tumor-immune interactions and their potential use for prognostic prediction
We further investigated the molecular characteristics of tumor-immune interactions and their prognostic potential in GC patients stratified by immunophenotype. After the preliminary screening, a total of 1,086 genes were identified as DEGs. Of these, 1,036 genes were upregulated and 50 genes were downregulated in the Immunity_H group. All DEG expression levels in GC patients were shown in Fig. 2a. Subsequently, 608 genes were identified as DEIGs based on the ImmPort database, of which 603 genes were upregulated and five genes were downregulated (Fig. 2b). Finally, 16 PIGs were identified using univariable Cox proportional hazards regression analysis, and the hazard ratio (HR) was calculated (Fig. 2c). The log2 fold change values in the DEGs, DEIGs, PIGs, and their q values in both Immunity_H and Immunity_L groups were shown in Fig. 2d.
To gain insight into the global patterns of the effects of immune gene expression on the biological processes in GC, the GSEA was performed to identify the pathways involved in the DEGs. We obtained several biologically sensible themes enriched in Immunity_H, which proved that the DEGs were mainly involved in the immune-related biological processes or signaling pathways. For the Hallmark gene sets (Fig. 3a), typically, the first group relates to the immune response, including complement, IL2-STAT5 signaling, allograft rejection, inflammatory responses, IL6-JAK-STAT3 signaling, interferon gamma response, TNF-α signaling via NF-κB, and interferon alpha response. The second group relates to cell growth and death, including apoptosis, p53 pathway, and transduction pathways such as KRAS signaling and PI3K-Akt signaling.
As for the Hallmark and KEGG pathways (Fig. 3b), the immune-associated pathways were highly active in Immunity_H and included antigen processing and presentation pathways, Th1 and Th2 cell differentiation, Th17 cell differentiation, and hematopoietic cell lineage. Besides, we identified various immune disease-associated pathways that were hyperactivated in Immunity_H, including asthma, systemic lupus erythematosus, graft-versus-host disease, allograft rejection, autoimmune thyroid disease, inflammatory bowel disease, and rheumatoid arthritis. Several pathways related to infectious diseases such as Staphylococcus aureus infection, leishmaniasis, tuberculosis, and human T-cell leukemia virus 1 infection were also enriched in Immunity_H. In summary, the GSEA result confirmed the elevated immune activity in Immunity_H and showed that the activation of various cancer-associated pathways positively correlated with immune activation.
To elucidate the role of the multidimensional regulatory network of immune molecules in the process of tumorigenesis and development of GC, we first explored the upstream mechanisms of the PIGs. The transcription factors related to the development of GC were identified by combining differential expression analysis and data from the CISTROME database. A total of 23 upregulated transcription factors in the Immunity_H group were identified. Then, the regulatory relationships of the GCTF-PIGs were revealed based on the correlation analysis. The regulatory network of the GCTF-PIGs was shown in Fig. 3c.
Construction and validation of the IGPM-based risk signature
To develop a gene signature based on GC immunophenotype, the PIGs were submitted to LASSO Cox regression analysis for dimension reduction. Consequently, an IGPM-based risk signature including six genes to predict OS in the TCGA-STAD cohort was constructed. The formula was determined as follows: 0.142 * expression of IL2RG + 0.104 * expression of IGLV5-48 + 0.147 * expression of BMP7 + 0.110 * expression of IL33 + 0.010 * expression of GKN1 + 0.045 * expression of CCL25 (Fig. 4a).
We calculated the risk score of each GC patient and divided all patients into the high- and low‐risk groups according to a median risk score (Fig. 4b). Compared with the low‐risk group, the survival time of patients in the high‐risk group was shorter and the incidence of death was higher (Fig. 4c). The correlation analysis showed that the risk score had a significant negative correlation with the survival time, indicating that the survival times of GC patients gradually decrease with an increasing risk score (Fig. 4d). The Kaplan-Meier curve was employed for the survival analysis of GC patients and the results showed that the high-risk group was significantly associated with poorer OS, while the low-risk group was associated with better OS (Log-rank test, P < 0.0001, Fig. 4e). To assess the predictive ability and accuracy of the IGPM-based risk signature, we performed a time-dependent ROC curve analysis, and the AUCs of the 1‐, 3‐, and 5‐year predictions were 0.63, 0.7, and 0.794, respectively (Fig. 4f). The discriminative ability of the IGPM‐based risk signature was measured by determining the calibration plot, which showed that the predicted value of the IGPM‐based risk signature was in a good agreement with the actual value (Fig. 4g).
We further measured the robustness of the IGPM-based risk signature by validating its performance in other independent cohorts. We accessed the GEO database recently and did not find a data set that contained both clinical data and all kIGs. To verify the signature more accurately, a bootstrap method was used to obtain the validation cohort with a sample size of 500 from the TCGA-STAD cohort. Survival analysis based on the Kaplan-Meier method showed that there was a significant difference between the high‐risk group and the low‐risk group, suggesting that the IGPM‐based risk signature worked well in the validation cohort (Fig. 4h). The survival analyses of the kIGs in the ACRG and GSE29272 cohorts were used to validate the predictive ability of the IGPM‐based risk signature once again. The high and low expression groups of IL2RG, CCL25, or IL33 had significantly different survival probabilities (Fig. 4i-l), and the results and trends were consistent with the outcomes of OS prediction for the TCGA-STAD cohort.
Interrelation of the risk score, clinical features, immune cell infiltration, immune checkpoint molecules, and TMB
We examined the pairwise correlations of the risk score with clinical features, immune cell infiltration, immune checkpoint molecules, and tumor mutation burden (TMB), respectively (Fig. 5a). Our results showed that GKN1 had a significant negative correlation with American Joint Committee on Cancer (AJCC) Stage, which indicated that the kIGs might affect the prognosis of GC patients. In terms of the immune microenvironment, BMP7 had a significant negative correlation with the resting NK cells and monocytes, whereas CCL25 had a significant positive correlation with monocytes. Besides, GKNI had a positive correlation with M1 macrophages. IGLV5-48 had a positive correlation with naïve B cells and activated CD4 memory T cells, but it showed a significant negative correlation with M2 macrophages.
Next, we aimed to explore whether the risk score could predict immunotherapeutic benefits in GC patients. We selected 14 immune checkpoint-related candidate genes (BTLA, GITR, TNFRSF14, IDO, LAG-3, PD-1, PD-L1, PD-L2, CD28, CD40, CD80, CD137, CD27, and CTLA-4) to assess their relationships with the risk scores. The results showed that the risk scores were negatively associated with the expression of the critical immune checkpoints (TNFRSF14 and CD40) (Fig. 5b and c), indicating that the poor prognosis of high IGPM patients might be due to the tumor immunosuppressive microenvironment.
We further found that there was no significant correlation between the risk score and TMB representing nonsynonymous variants (Fig. 5d). Also, there was no significant difference in TMB between patients with high IGPM and low IGPM (Fig. 5e). However, we found that high TMB was associated with good OS (Log-rank test, P < 0.05, Fig. 5f). Considering that the TMB predictive value was not very high in the TCGA-STAD cohort, we explored whether the combination of IGPM and TMB could be a more powerful predictive biomarker for the prognosis. We integrated IGPM and TMB to stratify all the samples into high TMB/low IGPM, low TMB/low IGPM, high TMB/high IGPM, and low TMB/high IGPM groups. As shown in Fig. 5g, significant differences were found among all groups (Log-rank test, P < 0.0001), and patients in the high TMB/low IGPM group exhibited the best OS. These results together strongly demonstrated that the risk score was positively correlated with tumor malignancy.
Establishment and validation of a nomogram combined with clinical characteristics
As the risk score was significantly correlated with high malignancy, we examined whether the IGPM-based risk signature could serve as an independent prognostic factor for GC patients through univariate and multivariate Cox regression analysis. The IGPM‐based risk signature, together with age, gender, AJCC stage, and Lauren classification were enrolled as covariates to perform the analysis. The results demonstrated that the p value of the IGPM‐based risk signature was less than 0.0001, confirming that the IGPM‐based risk signature could be utilized to predict the prognosis of GC patients (Table 1).
Table 1
Univariable and multivariable Cox analyses for clinical characteristics and IGPM in the TCGA-STAD cohort
| Univariable Cox analysis | Multivariable Cox analysis |
Variables | Hazard ratio (95% CI) | p value | Hazard ratio (95% CI) | p value |
Age | 1 (1–1) | 0.01 | 1.04 (1.01–1.06) | 0.002 |
Gender | | | | |
Female | 1 | | | |
Male | 1.3 (0.91–1.8) | 0.15 | 1.2 (0.76–1.89) | 0.431 |
AJCC Stage | | | | |
Stage I | 1 | | | |
Stage II | 1.55 (0.78–3.08) | 3e-04 | 1.98 (0.77–5.08) | 0.157 |
Stage III | 2.39 (1.26–4.52) | 5e-04 | 2.88 (1.23–6.75) | 0.015 |
Stage IV | 3.83 (1.85–7.90) | 3e-04 | 8.86 (3.44–22.79) | 6.07e-06 |
Lauren Classification | | | |
Diffuse | 1 | | | |
Intestinal | 0.84 (0.55–1.3) | 0.43 | 0.68 (0.43–1.07) | 0.093 |
IGPM | | | | |
High | 1 | | | |
Low | 0.43 (0.31–0.61) | 1.6e-06 | 0.39 (0.25–0.61) | 3.13e-05 |
Combining the above factors, we constructed a nomogram to broaden the clinical application and usability of the IGPM-based risk signature (Fig. 6a). Every patient was assigned with a total point value by adding the point for each prognostic parameter. Higher total points corresponded to a worse clinical outcome of patients. The ROC curve indicated that the nomogram had good predictive ability and accuracy for survival status (Fig. 6b). Furthermore, the calibration plot showed that the nomogram had similar performance to that of an ideal model (Fig. 6c).