3.1 Enrichment analysis of 10 DE NMRGs
A total of 5018 DEGs between GC and paraneoplastic tissues were obtained, with 3964 up-regulated genes and 1054 down-regulated genes (Fig. 1A-B, Table S1). The 10 DE NMRGs were obtained (Fig. 1C), and they were enriched to 152 GO Biological Process (BPs), 38 GO Molecular Function (MFs), and 13 KEGG pathways, including NAD biosynthetic process, nicotinamide nucleotide biosynthetic process, positive regulation of immune response to tumor cell, positive regulation of immune effector process, JAK-STAT signaling pathway, nicotinate and nicotinamide metabolism and nucleotide metabolism, etc. (Fig. 1D-E, Table S2-3).
3.2 Correlation analysis of GSVA
The 354 GC patients were divided into the high scoring group (n = 177) and low scoring group (n = 177). There were 2956 DEGs between high and low scoring groups (Table S4), and a total of 1001 GC-NM DEGs were obtained (Fig. 2A). Figure 2B showed that the patients in the high scoring group had a lower survival rate compared to the low scoring group. GSEA showed 1501 GO items and 80 KEGG pathways were enriched (Table S5-6). The top5 GO and KEGG items enriched by high-score (NES > 0) and low-score (NES < 0) patients were showcased in Fig. 2C-D, including arrhythmogenic right ventricular cardiomyopathy, cell adhesion molecules, circadian entrainment, cell–cell adhesion via plasma-membraneadhesion molecules, intrinsic component of synaptic membrane etc. The CIBERSORT algorithm showed that there were native B cells, memory B cells, regulatory T cells (Tregs), marcophage M0, resting mast cells, activated mast cells, neutrophils with differences, in which B cell memory was close to zero in most samples.
3.3 Construction and validation of a prognostic model
To mine the NMRGs relevant to the overall survival (OS) of GC patients, we incorporated the 1001 GC-NM DEGs obtained above into a univariate Cox analysis in the training set, resulting in 387 genes associated with prognosis (Fig. 3A, Table S7). The above 387 genes were further integrated into the LASSO analysis, and 16 characteristic genes were screened out, namely NPY1R, NPR3, CCNA1, AFF2, DNAJB13, SNCG, OLFM3, CFHR3, CST2, CLDN10, DAO, THPO, CIDEA, NCCRP1, ONECUT1, UPK1B (Fig. 3B). Multivariate COX analysis was performed on the 16 prognostic-related genes obtained from the previous LASSO regression analysis, and 7 genes were finally determined as prognostic biomarkers, namely DNAJB13, CST2, THPO, CIDEA, ONECUT1, UPK1B, SNCG (Fig. 3C). Next, we computed the risk score for each GC patient in the training set and classified them into two risk subgroups (high risk and low risk) relying on the median value. Figure 3D revealed the distribution of ranked risk score and survival status for each patient in the training set. The expression heat map showed that DNAJB13, CST2, THPO, CIDEA, ONECUT1, UPK1B, and SNCG were highly expressed in patients with higher risk scores (Fig. 3E). The K-M curve illustrated that patients with higher risk had noteworthy poorer survival than those with lower risk (Fig. 3F). We graphed ROC curves to check the predictive efficiency of the signature. The area under the curve (AUC) values for OS in the training set were 0.699 (1 year), 0.728 (3 year), and 0.712 (5 year), indicating a decent accuracy (Fig. 3G). In addition, a comparable trend was verified in the external validation set (Fig. 4A-D). Above results revealed that the risk signature was a valid survival predictor for GC patients.
3.4 The risk score and gender were independent prognostic predictors
Via Cox analysis, we detected that risk score and gender were independent predictors of prognosis for patients with GC (P < 0.05) (Fig. 5A-B). The nomogram containing independent prognostic predictors was generated (Fig. 5C). The calibration curves proved that the capability of the nomogram in predicting survival at 1, 3, and 5 years in GC patients was satisfactory (Fig. 5D).
3.5 Functional enrichment analysis of high and low risk groups
872 GO items and 53 KEGG pathways were enriched (TableS8-9). The top5 GO and KEGG items enriched by high risk (NES > 0) and low risk (NES < 0) patients were shown in Fig. 6A-B. Homophilic cell adhesion via plasma membrane adhesion molecules, multicellular organismal signaling, and postsynaptic membrane et were activated in the patients with higher NMRGs-relevant risk scores (NES > 0). CMG complex, DNA strand elongation involved in DNA replication, PPAR signaling pathway, and complement and proteasome assembly et were activated in the patients with lower NMRGs-relevant risk scores.
3.6 Relationships between risk score and immune cell
There were differences between the high and low risk groups for 11 immune cells, with activated CD4 T cells, activated CD8 T cells, type 17 T helper cells highly expressed in the high risk group, and central memory CD4 T cells, T follicular helper cells, activated B cells, immature B cells, natural killer cells, plasmacytoid dendritic cells, immature dendritic cells, and mast cell were low expressed in the high risk group (Fig. 7A). In addition, 7 immune cells (activated CD4 T cell, activited CD8 T cell, activated dendritic cell, CD56 bright natural killer cell, gamma delta T cell, memory B cell, plasmacytoid dendritic cell) were significantly correlated with the risk score (Fig. 7B), suggesting that the risk score may be involved in the development of GC either directly or indirectly through the influence of immune cells.
3.7 Expression of prognostic genes
In order to further verify the expression of the prognostic gene, we studied the relative expression of its transcripts in paraneoplastic tissues and GC tissues by QRT-PCR. The results showed that the expressions of DNAJB13, CST2, THPO, CIDEA, ONECUT1, UPK1B, and SNCG were significantly up-regulated compared with those of paraneoplastic tissues (Fig. 8).