3.1 Two DCs differed between GC and normal controls
A total of 64,775 cells were retained in GSE183904 after data preprocessing (Fig. S1, Fig. 1a). We selected 2,000 highly variable genes for PCA dimensionality reduction (Fig. 1b) and then used the top 30 PCs for UMAP clustering (Fig. 1c-d), which yielded 21 clusters (Fig. 1e). After annotation, these clusters were assigned to 10 cell subgroups: B cells, Bone marrow (BM) cells, Endothelial cells, Epithelial cells, Fibroblasts, Monocytes, Natural killer (NK) cells, Smooth muscle cells, T cells, and Tissue stem cells (Fig. 1f-g). The fraction of T cells was higher in the GC group (38.38%) than in the controls (19.2%) (Fig. 1h), whereas the proportions of B cells and Epithelial cells in the GC group were 11.53% and 13.91% lower, respectively, than in the control group. Only the proportions of Epithelial cells and Tissue stem cells differed significantly between GC and normal controls, and these two cell types were therefore defined as DCs (Fig. 1i). Finally, a total of 1,592 DC-DEGs were identified in the DCs (Fig. S2, S3).
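As a minimal sketch of how the group-wise cell-type fractions above can be tabulated, the Python below counts annotated cells per group and reports case-minus-control differences in percentage points. The input format (a list of `(group, cell_type)` pairs), the group labels `"GC"` and `"Control"`, the function names, and the toy data are all hypothetical; the statistical test used to call DCs is not reproduced here.

```python
from collections import Counter


def cell_type_fractions(cells):
    """Return {group: {cell_type: percent of that group's cells}} from (group, cell_type) pairs."""
    per_group = {}
    for group, cell_type in cells:
        per_group.setdefault(group, Counter())[cell_type] += 1
    fractions = {}
    for group, counts in per_group.items():
        total = sum(counts.values())
        fractions[group] = {ct: 100.0 * n / total for ct, n in counts.items()}
    return fractions


def fraction_differences(fractions, case="GC", control="Control"):
    """Case-minus-control difference in percentage points for every cell type in either group."""
    cell_types = set(fractions[case]) | set(fractions[control])
    return {
        ct: fractions[case].get(ct, 0.0) - fractions[control].get(ct, 0.0)
        for ct in sorted(cell_types)
    }


# Hypothetical toy annotations (not GSE183904 data): 10 cells per group.
cells = (
    [("GC", "T cell")] * 4 + [("GC", "B cell")] * 1 + [("GC", "Epithelial cell")] * 5
    + [("Control", "T cell")] * 2 + [("Control", "B cell")] * 3
    + [("Control", "Epithelial cell")] * 5
)

for ct, d in fraction_differences(cell_type_fractions(cells)).items():
    print(f"{ct}: {d:+.2f} percentage points")
```

Working from raw counts rather than pre-computed percentages keeps the denominators explicit, which matters when the groups contain very different numbers of cells.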
3.2 A total of 112 candidate genes were identified by integrating multiple analyses
In TCGA-STAD, 4,181 STAD-DEGs were identified between STAD and normal controls, of which 1,934 were up-regulated and 2,238 were down-regulated (Fig. 2a-b). WGCNA was then performed, and no outlier samples were detected in STAD (Fig. S4). With an optimal soft-thresholding power of 5 (Fig. 2c), 10 modules were obtained by dynamic tree cutting (Fig. 2d). Among them, the MEmagenta module was significantly positively correlated with MPTDN-RGs (cor = 0.57, p < 0.001), and the MEturquoise module was significantly negatively correlated with MPTDN-RGs (cor = -0.41, p < 0.001) (Fig. 2e). A total of 3,832 module genes from the MEmagenta and MEturquoise modules were retained for further analysis. Finally, 112 candidate genes were determined by overlapping the DC-DEGs, STAD-DEGs, and module genes (Fig. 2f, Table S3). GO enrichment analysis indicated that the candidate genes were associated with collagen fibril organization, extracellular matrix organization, and extracellular structure organization, among other terms (Fig. 2g, Table S4). KEGG analysis suggested that the candidate genes were associated with protein digestion and absorption, the AGE-RAGE signaling pathway in diabetic complications, and amoebiasis, among other pathways (Fig. 2h, Table S5).
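The candidate-gene step can be sketched as plain set operations, assuming module-trait correlations and module memberships have already been computed (for example with WGCNA in R). The function names, the |cor| >= 0.3 and p < 0.05 module-selection cut-offs, the placeholder p-values, and the gene labels `G1` to `G6` are hypothetical; the exact criteria used to retain MEmagenta and MEturquoise are not stated in this section.

```python
def select_modules(module_trait, min_abs_cor=0.3, alpha=0.05):
    """Keep modules whose eigengene-trait correlation passes both (hypothetical) cut-offs.

    module_trait maps module name -> (correlation, p-value).
    """
    return [
        name for name, (cor, p) in module_trait.items()
        if abs(cor) >= min_abs_cor and p < alpha
    ]


def candidate_genes(dc_degs, stad_degs, module_genes, modules):
    """Intersect DC-DEGs, STAD-DEGs and the union of genes in the selected modules."""
    selected = set().union(*(module_genes[m] for m in modules))
    return set(dc_degs) & set(stad_degs) & selected


# Toy inputs: correlations echo Fig. 2e; p-values and gene labels are placeholders.
module_trait = {"MEmagenta": (0.57, 1e-4), "MEturquoise": (-0.41, 1e-4), "MEgrey": (0.05, 0.40)}
module_genes = {"MEmagenta": {"G1", "G3"}, "MEturquoise": {"G2", "G5"}, "MEgrey": {"G4"}}
dc_degs = {"G1", "G2", "G4", "G6"}
stad_degs = {"G1", "G2", "G3", "G4"}

modules = select_modules(module_trait)
print(modules, sorted(candidate_genes(dc_degs, stad_degs, module_genes, modules)))
```

Using the absolute correlation lets positively and negatively correlated modules (here MEmagenta and MEturquoise) contribute genes on an equal footing.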
3.3 The risk model was constructed and validated
Univariate Cox analysis identified 6 signature genes (Fig. 3a), and LASSO regression then narrowed these to 4 prognostic genes (GPX3, CD36, VCAN, and SERPINE1) at lambda.min = 0.0346 (Fig. 3b-c). GSEA indicated that GPX3, CD36, and VCAN were significantly co-enriched in DNA replication, whereas SERPINE1 was associated with the ribosome and aminoacyl-tRNA biosynthesis (Fig. 3d-g). A risk model was constructed from these prognostic genes, and the risk coefficient of each gene is shown in Table 1. Furthermore, STAD patients in TCGA-STAD were assigned to high-risk (n = 175) and low-risk (n = 175) groups based on the median risk score (Fig. 3h). The high-risk group had a worse prognosis than the low-risk group (Fig. 3i). The areas under the curve (AUCs) for 1-, 3-, and 5-year OS were all above 0.6, supporting the predictive ability of the risk model (Fig. 3j). In GSE62254, STAD patients were likewise assigned to high-risk (n = 150) and low-risk (n = 150) groups (Fig. 3k). Kaplan-Meier (KM) curves showed that OS was longer in the low-risk group than in the high-risk group (Fig. 3l), and ROC curves indicated that the risk model could effectively predict the prognosis of STAD patients (Fig. 3m).
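For readers less familiar with the KM curves used to compare the risk groups, the product-limit estimator fits in a few lines of Python. This is a generic, stdlib-only sketch of the estimator, not the authors' code; the log-rank test that normally accompanies such curves is omitted, and the toy follow-up times and event flags are made up.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t) at each distinct event time.

    times: follow-up times; events: 1 = death observed, 0 = censored.
    Returns a list of (time, survival probability) pairs.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool all subjects (deaths and censorings) that share this time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        # Everyone observed at time t leaves the risk set after the deaths are counted.
        n_at_risk -= removed
    return curve


# Toy follow-up data (months) for one risk group; values are made up.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Subjects censored at the same time as a death are kept in the risk set for that death, which is the usual convention for the product-limit estimator.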
Table 1
Risk coefficients of the prognostic genes.
Gene | Coefficient |
GPX3 | 0.08911124 |
CD36 | 0.04142701 |
VCAN | 0.01263749 |
SERPINE1 | 0.13868956 |
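Assuming the usual form of a LASSO-Cox risk model, in which the risk score is the sum of each gene's coefficient multiplied by its expression, the coefficients in Table 1 can be applied as sketched below. The expression scale (e.g., log-transformed values), the patient IDs and expression values, and the rule that a score exactly equal to the median goes to the low-risk group are assumptions for illustration.

```python
from statistics import median

# Coefficients from Table 1.
COEFFICIENTS = {
    "GPX3": 0.08911124,
    "CD36": 0.04142701,
    "VCAN": 0.01263749,
    "SERPINE1": 0.13868956,
}


def risk_score(expression):
    """Assumed model form: sum of coefficient x expression over the 4 prognostic genes."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def split_by_median(scores):
    """Label patients 'high' if score > median, else 'low' (tie rule is an assumption)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values for two patients (not TCGA-STAD data).
patients = {
    "P1": {"GPX3": 2.0, "CD36": 1.5, "VCAN": 6.0, "SERPINE1": 3.0},
    "P2": {"GPX3": 5.0, "CD36": 4.0, "VCAN": 4.0, "SERPINE1": 7.0},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
print(scores, split_by_median(scores))
```

With an even cohort and no tied scores, a strict median split yields equal-sized groups, consistent with the 175/175 and 150/150 splits reported above.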
3.4 The nomogram model effectively predicted the prognosis of STAD patients
To further explore the effect of clinical indicators (age, gender, and TNM categories) on STAD prognosis, we performed univariate Cox analysis, the proportional hazards (PH) assumption test, and multivariate Cox analysis on these clinical indicators and the risk score to identify independent prognostic factors (Fig. 4a-b). Ultimately, the risk score, age, and N category were identified as independent prognostic factors and used to construct a nomogram predicting the 1-, 3-, and 5-year OS of STAD patients (Fig. 4c). In addition, the 1-, 3-, and 5-year calibration curves and ROC curves indicated that the nomogram had good predictive performance (Fig. 4d-g).
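A Cox-based nomogram assigns each covariate points that are a linear rescaling of its term beta_j * x_j; the total corresponds to the linear predictor, and the predicted t-year OS is S(t | x) = S0(t) ** exp(linear predictor). The sketch below shows only that final step. All coefficients and baseline survival values are hypothetical placeholders, not the fitted values behind Fig. 4c, and both the baseline being defined at all covariates equal to zero and the integer coding of the N category are assumptions.

```python
import math

# Hypothetical placeholders, NOT the fitted values behind Fig. 4c.
HYPOTHETICAL_COEFS = {"risk_score": 1.0, "age": 0.02, "N": 0.3}
HYPOTHETICAL_BASELINE = {1: 0.9, 3: 0.7, 5: 0.6}  # S0(t) with all covariates = 0


def predicted_survival(coefs, covariates, baseline_survival):
    """Cox prediction S(t | x) = S0(t) ** exp(sum_j beta_j * x_j) for each horizon t."""
    linear_predictor = sum(coefs[name] * covariates[name] for name in coefs)
    hazard_ratio = math.exp(linear_predictor)
    return {t: s0 ** hazard_ratio for t, s0 in baseline_survival.items()}


patient = {"risk_score": 0.8, "age": 65, "N": 2}  # N category coded as an integer (assumption)
print(predicted_survival(HYPOTHETICAL_COEFS, patient, HYPOTHETICAL_BASELINE))
```

Because exp(linear predictor) is positive and S0(t) lies between 0 and 1, a larger linear predictor always lowers the predicted survival at every horizon.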
3.5 Immune mechanisms were explored in TCGA-STAD
First, we analyzed the infiltration of 6 immune cell types (B cells, CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells) in TCGA-STAD tumors. The risk score was significantly positively correlated with CD4+ T cells, CD8+ T cells, neutrophils, macrophages, and dendritic cells (p < 0.001), with the strongest correlation for macrophages (cor = 0.52, p < 0.001) (Fig. S5). A total of 23 immune cell types differed significantly between the two risk groups (p < 0.05) (Fig. 5a). All immune-related pathways except MHC class I also differed significantly between the high- and low-risk groups (Fig. 5b). Next, the Dysfunction, Exclusion, and TIDE scores were significantly lower in the low-risk group than in the high-risk group (p < 0.001) (Fig. 5c), suggesting that immune escape was more likely to occur in the high-risk group. Consistently, the immune response rate was significantly lower in the high-risk group than in the low-risk group (p < 0.001) (Fig. 5d). In addition, survival probability differed significantly among the 4 TMB/risk groups, with the 'L_TMB-H_risk' group having the worst prognosis (Fig. 5e). The IPS scores for the CTLA4-/PD-1-, CTLA4+/PD-1-, and CTLA4-/PD-1+ subgroups were significantly higher in the low-risk group than in the high-risk group, suggesting that STAD patients in the low-risk group may benefit more from immune checkpoint inhibitors (ICIs) (Fig. 5f-i).
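The correlation method behind "cor" is not named in this section. Assuming Spearman's rank correlation, a common choice for infiltration scores, it equals the Pearson correlation of average ranks, as in this stdlib-only sketch; the function names are hypothetical and the p-value calculation is omitted.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy risk scores and macrophage infiltration estimates (made up).
print(spearman([0.2, 0.5, 0.9, 1.3], [0.10, 0.12, 0.30, 0.25]))
```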
3.6 Sensitivity to 93 drugs differed significantly between the two risk groups in TCGA-STAD
A total of 93 drugs showed significant differences in IC50 between the two risk groups (p < 0.05): 15 drugs had significantly higher IC50 values in the high-risk group than in the low-risk group, whereas the remaining 78 showed the opposite pattern (Fig. S6). Next, the 4 prognostic genes were submitted to the DGIdb database, and 22 drugs were predicted, such as ABT-510, CYCLOSPORINE, and DEFIBROTIDE (Fig. 6, Table S6). Finally, the molecular structures of 19 drugs were downloaded from the PubChem database (Fig. S7).
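The test used to compare IC50 values is not named in this section. Assuming a two-sided Wilcoxon rank-sum (Mann-Whitney U) test, the per-drug comparison and the direction call (higher IC50 in the high-risk or the low-risk group) could look like this sketch. It uses a normal approximation without tie or continuity correction, so its p-values will differ slightly from exact or corrected implementations; the function names and label strings are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test via a normal approximation (no tie correction)."""
    n1, n2 = len(x), len(y)
    # U counts pairs in which the x value exceeds the y value; ties count one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p


def compare_drug(ic50_high, ic50_low, alpha=0.05):
    """Classify one drug by significance and by which risk group has higher IC50."""
    u, p = mann_whitney_u(ic50_high, ic50_low)
    if p >= alpha:
        return "ns"
    # U above its null mean means high-risk IC50 values tend to be larger.
    return "higher_in_high_risk" if u > len(ic50_high) * len(ic50_low) / 2 else "higher_in_low_risk"


# Made-up predicted IC50 values for one drug.
print(compare_drug([5, 6, 7, 8, 9, 10, 11, 12], [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4]))
```

A higher IC50 means lower predicted sensitivity, so "higher_in_high_risk" drugs would be those to which high-risk patients are predicted to respond less.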
3.7 Fewer Endothelial cells and Tissue stem cells differentiated in normal controls than in GC
Based on the expression levels of the prognostic genes, Endothelial cells and Tissue stem cells were selected for further analysis (Fig. 7a, Fig. S8-S9). Before pseudotime analysis, Endothelial cells and Tissue stem cells were divided into 14 and 8 clusters, respectively (Fig. 7b-c). Pseudotime analysis showed that Endothelial cells differentiated from left to right along pseudotime (Fig. 7d), whereas Tissue stem cells followed the opposite direction (Fig. 7e). Endothelial cells passed through 8 distinct states during differentiation (Fig. 7f). State 1, containing multiple clusters, was located at the starting point of the trajectory, and State 6, containing cluster 6, was located at the terminal point (Fig. 7h). Tissue stem cells also comprised 8 states (Fig. 7g), with State 7, containing clusters 0, 4, 5, and others, located at the starting point of the trajectory (Fig. 7i). In the normal controls, fewer Endothelial cells and Tissue stem cells differentiated (Fig. 7j-k). We then identified marker genes for the different states of Endothelial cells and Tissue stem cells (Fig. S10); these genes showed significant expression changes along pseudotime (Fig. 7l-m). Finally, the expression of the prognostic genes in the different states is displayed in Fig. S11.
3.8 Expression levels of prognostic genes were compared between STAD and normal controls
In TCGA-STAD, the expression levels of GPX3 and CD36 were significantly lower in STAD than in normal controls (p < 0.05), whereas VCAN and SERPINE1 showed the opposite pattern (Fig. 8a). Subsequently, we collected tumor tissues (n = 5) and para-carcinoma tissues (n = 5) from GC patients to verify the expression of the prognostic genes by qPCR. Importantly, the qPCR results were consistent with the bioinformatic results (p < 0.05) (Fig. 8b-e), suggesting that these genes may offer a new perspective on the prognosis of GC.
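The relative-quantification method for the qPCR data is not stated in this section. Assuming the widely used 2^-ddCt (Livak) method with a single reference gene, tumor fold changes relative to the mean para-carcinoma delta-Ct could be computed as below. The reference gene, the choice of calibrator, the function names, and the Ct values are assumptions; the method itself also assumes near-100% amplification efficiency for both the target and reference genes.

```python
from statistics import mean


def relative_expression(ct_target, ct_reference, calibrator_delta_ct):
    """2^-ddCt fold change of one sample versus a calibrator delta-Ct (Livak method)."""
    delta_ct = ct_target - ct_reference
    return 2 ** -(delta_ct - calibrator_delta_ct)


def fold_changes(tumor, normal):
    """tumor/normal: lists of (Ct_target, Ct_reference); calibrator = mean normal delta-Ct."""
    calibrator = mean(t - r for t, r in normal)
    return [relative_expression(t, r, calibrator) for t, r in tumor]


# Made-up Ct values for one target gene; the reference gene is unspecified.
print(fold_changes(tumor=[(24.0, 20.0), (26.0, 20.0)], normal=[(25.0, 20.0), (27.0, 20.0)]))
```

A fold change above 1 indicates higher expression in tumor than in para-carcinoma tissue, matching the direction expected for VCAN and SERPINE1, while values below 1 correspond to the GPX3 and CD36 pattern.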