GO and KEGG enrichment analyses of prognostic genes
A total of 948 low-risk and 1636 high-risk genes were identified in the training cohort. The KEGG and GO enrichment analyses for the two sets of genes revealed that the low-risk genes were enriched in cell proliferation and DNA metabolism-related processes, such as DNA replication, cell apoptosis, and nuclear division (Fig. 1a–b). In contrast, the high-risk genes were enriched for cell-matrix adhesion-related processes, including PI3K-Akt signaling pathway, focal adhesion, and containing collagen (Fig. 1c–d).
Identification of GC subtypes with prognostic significance
Patient samples in the training cohort were divided into high-score (n = 233) and low-score (n = 517) groups based on the ssGSEA score. The Kaplan-Meier survival curves showed that patients in the high-score group have, as could be expected, a higher survival probability than those in the low-score group (p < 0.01, Fig. 2a–c). Essentially the same results were obtained in the validation set. The high-score patients in GSE26942 showed a higher progression-free survival (PFS) probability (Fig. 2d) compared with the low-score group. Likewise, the disease-free survival (DFS) probability (Fig. 2e) and overall survival probability (Fig. 2f) of the high-score group were higher than that of low-score group in GSE66229.
The relationship between ssGSEA score and clinical features
Although there was no significant difference in gender (Fig. 3b, g), higher ssGSEA scores were correlated with higher age (Fig. 3a, f), surviving cases (Fig. 3c, h), lower AJCC-T (American Joint Committee on Cancer-tumor invasion) stage (Fig. 3d, i), and AJCC-N (lymphoid metastasis) stage (Fig. 3e, j). Furthermore, clinical T (p < 0.001) and N (p < 0.01) stages, surviving cases (p < 0.001), and age (p < 0.05) were correlated with the ssGSEA scores (Fig. 3k).
Construction of a nomogram to predicting OS
A nomogram with age, gender, T and N stages, and ssGSEA score was constructed to predict the OS of the GC samples. Meanwhile, the multivariate Cox regression analysis results revealed that the ssGSEA score (p < 0.001), age (p < 0.001), T (p < 0.05) and N (p < 0.01) stages were independent prognostic indicators (Fig. 4a). The calibration curves on 1 year, 3 years, and 5 years confirmed the accuracy of the nomogram, as shown in Fig. 4b.
Relationship between two GC subtypes and TMB, MSI
The TMB values (Fig. 5a) and MSI status (Fig. 5b) were significantly higher in the high-score group versus the low-score group (Wilcoxon Rank Sum Test). Spearman correlation analyses revealed that the TMB (R = 0.55, p < 0.001, Fig. 5c) and MSI (R = 0.38, p < 0.001, Fig. 5d) had a strong correlation with the ssGSEA score.The TMB value and ssGSEA score of each sample were merged, and then, based on their best cut-off values, patients were divided into four subgroups, respectively. The high-score, high-TMB subgroup had the highest survival rate followed by the low-score, high-TMB subgroup, and the low- score, low-TMB subgroup had the lowest survival rate (Fig. 5e). Similarly, the high score, high MSL group also showed the longest overall survival (Fig. 5f). We then defined 20 frequently mutated genes to explore the landscape of somatic mutations in two GC subtypes. Patients in the high-score group (Fig. 5g) had a higher probability of gene mutations compared with the low- score group (Fig. 5h).
Identification of key module and hub gene
Fifteen modules were identified based on the soft thresholding power (Fig. 6a) and the average linkage hierarchical clustering (Fig. 6b). The blue module showed the highest correlation with the high ssGSEA score (Fig. 6c, d). We obtained 16 genes that are the intersections of genes in the blue module and tumor-related transcription factors (TFs), including BCL6, EBF1, EPAS1, FOXP2, LEF1, MEF2C, MEIS1, NR2F1, NR2F2, PBX3, RUNX1T1, SOX17, TCF21, TCF7L1, VEZF1, and WWTR1. Next, through consulting literature in PubMed, we found that RUNX1T1 was closely related to the progress of cancer[9, 15, 22]. Therefore, RUNX1T1 was selected as the hub gene for further analysis.
Prognostic signature of RUNX1T1
We performed differential expression analyses of RUNX1T1 for two subtypes in multiple cohorts (Wilson Rank Sum Test). The RUNX1T1 expression was associated with a high score (Fig. 7a, b) and a poor OS (Fig. 7c-f). In addition, samples from probe ID 205528_s_at were divided into high and low RUNX1T1 expression groups according to the best cut-off value. The Kaplan-Meier plotter database showed that high mRNA expression of RUNX1T1 were significantly related to shorter OS, first progression (FP), and post-progression survival (PPS) in GC (Supplementary figure. 1).
Function enrichment analysis of RUNX1T1
The GSEA plot showed that the high expression of RUNX1T1correlated positively with the focal adhesion pathway (Fig. 8a) and lowly expressed RUNX1T1 associated with the MAPK signaling pathway (Fig. 8b)
Predictive pembrolizumab response of ssGSEA score and RUNX1T1
We cited the clinical data and gene expression data of 61 patients with advanced metastatic gastric cancer from South Korea[6]. Next, we calculated the ssGSEA score for these samples. The ssGSEA score is related to tumor mutation burden, so we analyzed the relationship between the ssGSEA score and pembrolizumab treatment in response to this data (Fig. 9a). It is worth mentioning that patients with a high RUNX1T1 expression showed a worse response to pembrolizumab treatment on these patients (Fig. 9b). The ROC (AUC = 0.742) assessed the accuracy of this result (Fig. 9c).
The relationship between the RUNX1T1 expression and immune infiltration
Tumor-infiltrating lymphocytes are not only an independent factor predicting cancer survival[23], but are also highly correlated to the response to immunotherapy[24]. Therefore, we tried to find the relationship between the expression of RUNX1T1 and immune infiltration in GC. CIBERSORT was used to assess the state of immune infiltration. Eight hundred and four cancer samples in the training set were divided into two parts according to the median values of the RUNX1T1 expression that the high- and low-expression groups showed in differential immune cells infiltration. (Fig. 10a). We also evaluated the correlations between RUNX1T1 expression and immune infiltration levels (Fig. 10b). The result showed that seven kinds of TICs were positively correlated with RUNX1T1 expression, including memory B cells, naive B cells, M2 macrophages, resting mast cells, monocytes, resting memory T cells CD4, and gamma delta T cells; eight kinds of TICs were negatively correlated with RUNX1T1 expression, including activated dendritic cells, M0 macrophages, activate mast cells, neutrophils, resting NK cells, plasma cells, activated memory T cells CD4, and follicular helper T cells.