Screening of DEGs between STAD and normal stomach tissues
The DEGs were visualized with a volcano map. Finally, a total of 1867 DEGs were identified, including 642 downregulated and 1225 upregulated genes (Figure 1A). The top ten pathways in terms of cellular component (CC), biological process (BP), and molecular function (MF) from the GO analysis were visualized. The results showed that the most enriched CC, BP and MF terms were concentrated in the extracellular matrix, extracellular structure organization, extracellular matrix organization, axon development, and receptor ligand activity-related pathways. The extracellular matrix, extracellular structure organization, extracellular matrix organization, axon development, and receptor ligand activity have been widely reported to be correlated with the tumorigenesis and progression of cancer (Figure 1B) [7].
PPI network construction
The top 74 and top 5 nodes were visualized by PPI analysis (Figure 1C). The top 5 nodes were COL3A1, FN1, COL1A2, POSTN and IL-6. Thus, these five genes might play key roles in the DEG protein interaction network (Figure 1D). In fact, the top 5 node genes are oncogenes that are known to play important roles in breast cancer, colorectal cancer, non-small-cell lung cancer, pancreatic cancer and GC [8-12].
WGCNA
Advanced GC refers to GC in which the tumor cells have infiltrated the submucosa and the muscle layer or have passed through the muscle layer to the serosa, regardless of the presence of lymph node metastasis. Due to the poor prognosis of advanced GC, it is of grate necessity to identify the protein coding genes (PCGs) related to the pathological stage of GC. To determine the PCGs associated with the pathological stage of GC, WGCNA was performed. Average linkage clustering and Pearson correlation analysis were used to cluster the 375 STAD samples (Figure 2A). Next, network topology analysis was conducted with various soft-threshold powers. The WGCNA was set to have relatively balanced scale independence and average connectivity. In this research, the power of β=4 (R2 without scale) was selected as the soft threshold parameter to ensure that the network was scale-free (Figure 2A). After merging modules with a difference of less than 25%, nine different PCG modules were identified (Figure 2B). Then, correlation analysis of the modular characteristic genes and T stage of each PCG module was assessed. Subsequently, black module (Cor=0.89, P<0.001) was identified as a PCG module that was highly correlated with T stage (Figure 2C). The ClueGO plug-in of Cytoscape was used to perform enrichment analysis of the black module. The results showed that it was enriched in DEGs related to the extracellular matrix, the complex of collagen trimers, the extracellular space, etc (Figure 2D).
Prognostic model and nomogram construction
Genes from the black module identified by WGCNA were used to construct the prognostic model. Then, Cox regression method was used to establish a prognostic model based on six genes to predict the OS of STAD patients (Figure 3A). The receiver operating characteristic (ROC) curve indicated that our model had great predictive efficiency (1-year survival AUC=0.76; 2-year survival AUC=0.7; 3-year survival AUC=0.66) (Figure 3B). Kaplan-Meier survival curves showed that the prognosis of patients with high expression of COL3A1, ADAMTS12, BGN, FNDC1, AEBP1 and HTRA3 was poorer than that of patients with low expression of these genes (Figure 3C). A nomogram was established by combining the prognostic model with clinical parameters such as age, sex, and T stage to better predict the 1-, 2-, and 3-year survival rates of STAD patients (Figure 3D). The decision curve indicated that the survival rates predicted by the nomogram were highly consistent with the actual OS values (Figure 3E).
GSEA and GSVA
Correlation analysis showed that the expression levels of the 6 model genes were highly correlated with a high T stage (Figure 4). GSEA showed that in samples with high expression of ADAMTS12, AEBP1 and FNDC1, the mitogen-activated protein kinase (MAPK) signaling was overactivated. In samples with high expression of BGN, COL3A1 and HTRA3, the MAPK and PI3K/AKT pathways were overactivated. GSVA showed that collagen fiber organization and extracellular matrix structural constituents conferring tensile strength were activated with the increase in ADAMTS12, AEBP1, BGN, FNDC1, COL3A1 and HTRA3 expression.
Mutation analysis and immune infiltration analysis
The mutation analysis showed that in high-risk group, the mutation frequencies of SYNE1, AHNAK2, PCDH15, RYR2 and DNAH5 were significantly increased and the non-missense mutation rates of MUC16, TP53, and CSMD3 were lower than those in low-risk group (Figure 5A-B). The mutation analysis also showed that in the low-risk group, the MUC16, TP53, OBSCN, CSMD3, SYNE1, MDN1, PLEC, and NEB mutation rates were significantly increased. CIBERSORT was used to quantify the expression of markers of 21 immune cells in STAD (Figure 6A). The abundance of M0 macrophages, M1 macrophages, activated dendritic cells, activated mast cells, and neutrophils was different between the high- and low-risk groups (Figure 6B). The heat map showed the distribution of 21 types of immune cells in STAD tissues grouped according to T stage (Figure 6C). Under the background of STAD, the expression correlation of various immune cells was shown in the heat map (Figure 6D). The correlation between the 6 model genes and the infiltration of specific immune cell types was shown in Figure 6E-I. The tumor microenvironment (TME) has important clinicopathological significance in predicting outcome and treatment efficacy. For example, according to the "macrophage balance hypothesis", tumor-associated macrophages (TAMs) have dual effects: killing tumors and promoting tumor growth. Low purity and B cell infiltration were positively correlated with the occurrence of tumors. A decreased ratio of CD4+/CD8+ T cells and an increase in CD8+ T cells, neutrophils, macrophages and dendritic cells were both positively correlated with malignant tumors.
Expression of the 6 model genes in tumor and paired nontumor tissues
The box diagram showed that the 6 model genes were significantly expressed higher in cancer tissues than in normal tissues (Figure 7A-F). According to the Venn diagram, COL3A1 was identified among both the top 5 nodes of the PPI network and the 6 model genes (Figure 7G).
The expression level of COL3A1 in STAD
Since COL3A1 was the only screened gene that was common to the top 5 nodes of the PPI network and the 6 model genes, it might have a greater potential to play a crucial role in STAD than other model genes. To verify the expression of COL3A1 in STAD tissues, RT-qPCR and WB were performed. The results showed that COL3A1 expression in STAD tissues was notably upregulated compared to that in normal tissues (Figure 8A-B).