Identification of DEGs in STAD
Gene expression profiles of cancer and adjacent normal tissues were compared in two GEO datasets (GSE54129, GSE79973) and in the TCGA database. With thresholds of |logFC (fold change)| > 2 and adj. P < 0.01, 79 up-regulated and 10 down-regulated overlapping genes were identified, as shown in the volcano plots (Fig. 2A, B, C) and Venn diagrams (Fig. 2D, E). In addition, the expression of these 89 DEGs in the TCGA-STAD data is shown as a heatmap (Fig. 2F). The 79 up-regulated and 10 down-regulated DEGs are listed in Table 1.
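The thresholding and overlap step described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the gene names and values are toy data, the function names are hypothetical, and the input is assumed to be a per-dataset table of logFC and adjusted P-values produced by some upstream differential expression tool.

```python
from typing import Dict, List, Set, Tuple

# Assumed input shape: gene -> (logFC, adjusted P-value), one dict per dataset.
Result = Dict[str, Tuple[float, float]]


def select_degs(result: Result, lfc_cutoff: float = 2.0,
                padj_cutoff: float = 0.01) -> Tuple[Set[str], Set[str]]:
    """Split one dataset into up- and down-regulated genes using the
    |logFC| > 2 and adj. P < 0.01 thresholds stated in the text."""
    up: Set[str] = set()
    down: Set[str] = set()
    for gene, (lfc, padj) in result.items():
        if padj >= padj_cutoff:
            continue  # not significant after multiple-testing correction
        if lfc > lfc_cutoff:
            up.add(gene)
        elif lfc < -lfc_cutoff:
            down.add(gene)
    return up, down


def overlapping_degs(results: List[Result]) -> Tuple[Set[str], Set[str]]:
    """Keep only genes called in the same direction in every dataset."""
    ups, downs = zip(*(select_degs(r) for r in results))
    return set.intersection(*ups), set.intersection(*downs)


# Toy data (hypothetical genes and values, for illustration only).
ds1 = {"GENE_A": (3.1, 0.001), "GENE_B": (-2.5, 0.002),
       "GENE_C": (2.4, 0.2), "GENE_D": (1.5, 0.0001)}
ds2 = {"GENE_A": (2.2, 0.005), "GENE_B": (-3.0, 0.0005),
       "GENE_C": (2.9, 0.001), "GENE_D": (2.6, 0.001)}

up_genes, down_genes = overlapping_degs([ds1, ds2])
print(sorted(up_genes), sorted(down_genes))
```

Taking the intersection separately for each direction avoids counting a gene that is up-regulated in one dataset and down-regulated in another as an overlapping DEG.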
GO and KEGG pathway analysis
The R package "clusterProfiler" was used to perform GO and KEGG analyses of the potential functions of the DEGs. The GO analysis revealed that, for biological processes (BP), the DEGs were mainly enriched in "extracellular structure organization", "extracellular matrix organization", "collagen fibril organization" and "collagen metabolic process" (Fig. 3A). For cellular components (CC), the DEGs were markedly enriched in "extracellular matrix", "collagen-containing extracellular matrix", "endoplasmic reticulum lumen" and "collagen trimer" (Fig. 3A). For molecular function (MF), the DEGs were primarily enriched in "extracellular matrix structural constituent", "extracellular matrix structural constituent conferring tensile strength", "glycosaminoglycan binding" and "platelet-derived growth factor binding" (Fig. 3A). The KEGG analysis showed that the DEGs were strongly enriched in "protein digestion and absorption", "ECM-receptor interaction", "focal adhesion" and "malaria" (Fig. 3B).
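Over-representation analyses of this kind are commonly based on a hypergeometric test. The sketch below shows that test for a single term in plain Python. It is an illustration of the statistic only, not a reimplementation of clusterProfiler; the function name and the toy counts are assumptions.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Upper-tail hypergeometric P-value for one term.

    N: number of background genes
    K: background genes annotated to the term
    n: number of DEGs tested
    k: DEGs annotated to the term
    Returns P(X >= k), the chance of seeing at least k annotated genes
    when n genes are drawn at random from the background.
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy counts (hypothetical): 10 background genes, 4 annotated to the term,
# 3 DEGs, 2 of which carry the annotation.
p_toy = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
print(p_toy)
```

In practice the P-values of all tested terms would then be adjusted for multiple testing before terms are reported.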
PPI network construction and module analysis
Based on the MCC score from the cytoHubba analysis, 70 genes were retained in the PPI network. The network, built with the STRING database and visualized in Cytoscape, comprised 70 nodes and 281 edges, with node color grading from light to dark according to the significance of each gene. COL1A1, COL1A2 and COL4A1 were the three top-ranked genes (Fig. 3C).
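For readers unfamiliar with the MCC (Maximal Clique Centrality) score, the following sketch computes it on a toy graph, assuming the usual definition in which a node's score is the sum of (|C| - 1)! over all maximal cliques C that contain it. The edge list and node names are hypothetical, and the clique enumeration is a basic Bron-Kerbosch recursion rather than the cytoHubba implementation.

```python
from collections import defaultdict
from math import factorial
from typing import Dict, Iterable, List, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)  # r cannot be extended: it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def mcc_scores(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """MCC(v) = sum of (|C| - 1)! over maximal cliques C containing v."""
    adj = build_adjacency(edges)
    scores: Dict[str, int] = {v: 0 for v in adj}
    for clique in maximal_cliques(adj):
        for v in clique:
            scores[v] += factorial(len(clique) - 1)
    return scores


# Toy network (hypothetical nodes): a triangle n1-n2-n3 plus a pendant n4.
toy_edges = [("n1", "n2"), ("n2", "n3"), ("n1", "n3"), ("n3", "n4")]
toy_mcc = mcc_scores(toy_edges)
print(toy_mcc)
```

Under this definition a node whose neighbors are not connected to each other simply receives its degree as its score, while nodes inside dense clusters are weighted much more heavily.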
Construction of the prognosis model
We merged the gene expression matrix with the survival data of the TCGA patients and then identified 31 prognosis-related genes by FM test and univariate Cox analysis (Table 2). After optimization with the "survival" package in R, 9 genes were chosen to establish a prognostic model (Table 3). Each patient was assigned to the high- or low-risk group according to a risk score computed by the '' '' formula (Fig. 4A, B). Figure 4C shows the expression of these 9 genes in the high- and low-risk groups. Furthermore, the P-values in both the univariate (Fig. 5A; HR = 1.647, 95% CI: 1.459–1.859) and multivariate (Fig. 5B; HR = 1.644, 95% CI: 1.451–1.863) analyses were less than 0.001, indicating that the risk score was a prognostic factor independent of other clinical factors (e.g., TNM stage, grade, age and sex) and was significantly associated with patient prognosis. In addition, the area under the ROC curve (AUC) of the risk score was 0.759 (> 0.7), which supports the predictive performance of our prognostic model (Fig. 5C).
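The risk-score formula itself is not reproduced in the text. A form commonly used for Cox-based gene signatures is a weighted sum of expression values, with the regression coefficients as weights and the cohort median as the cutoff; the sketch below assumes that form and that cutoff. The coefficients, gene names and patient values are toy data and do not come from Table 3.

```python
from statistics import median
from typing import Dict


def risk_score(expression: Dict[str, float], coefs: Dict[str, float]) -> float:
    """Assumed form: sum over model genes of coefficient * expression."""
    return sum(coefs[gene] * expression[gene] for gene in coefs)


def split_by_median(scores: Dict[str, float]) -> Dict[str, str]:
    """Label each patient 'high' or 'low' relative to the median score
    (the median cutoff is an assumption, not stated in the text)."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy model and cohort (hypothetical genes, coefficients and values).
toy_coefs = {"GENE_X": 0.5, "GENE_Y": -0.25}
toy_cohort = {
    "P1": {"GENE_X": 2.0, "GENE_Y": 4.0},
    "P2": {"GENE_X": 4.0, "GENE_Y": 0.0},
    "P3": {"GENE_X": 3.0, "GENE_Y": 2.0},
    "P4": {"GENE_X": 0.0, "GENE_Y": 4.0},
}

toy_scores = {pid: risk_score(expr, toy_coefs) for pid, expr in toy_cohort.items()}
toy_groups = split_by_median(toy_scores)
print(toy_scores, toy_groups)
```

A positive coefficient means that higher expression raises the score, and a negative coefficient means that higher expression lowers it.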
Survival and clinical correlation analysis
The prognostic value of all 89 DEGs was explored with Kaplan-Meier curves. Twelve genes, ADAMTS2, ALDH3A2, BDH2, CTHRC1, FNDC1, HOXA10, NOX4, OLFML2B, TEAD4, WISP1, SULF1 and INHBA, were significantly associated with survival (P < 0.05) (Fig. 6A). The results for these genes in the Kaplan-Meier plotter were consistent, which further supports the validity of our analysis (Fig. 6B). These 12 genes were then analyzed by univariate Cox regression, which showed that BDH2, CTHRC1, FNDC1, NOX4 and OLFML2B might predict shorter OS, whereas ALDH3A2 showed the opposite association (Fig. 6C). Moreover, ALDH3A2, BDH2, CTHRC1 and OLFML2B are also included in our prognostic model. Because ALDH3A2 has not previously been reported in GC, it was chosen for further analysis.
To learn more about the impact of ALDH3A2 on GC, we assessed its association with clinical characteristics in the TCGA patients (Figure S1); a significant association was found only with tumor grade (P < 0.05).
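The Kaplan-Meier curves used in this section rest on the product-limit estimator, sketched below in plain Python. The function name and the follow-up data are hypothetical, and the log-rank test used to compare groups is not included.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of the survival function.

    times:  follow-up time of each patient
    events: 1 if the event (death) was observed, 0 if censored
    Returns (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving  # censored patients leave the risk set too
    return curve


# Toy follow-up data (hypothetical): five patients, two of them censored.
toy_curve = kaplan_meier(times=[1, 2, 2, 3, 4], events=[1, 1, 0, 0, 1])
print(toy_curve)
```

Censored patients contribute to the number at risk up to their last follow-up time but never lower the survival estimate themselves.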
GSEA enrichment analysis
To further explore the biological functions of ALDH3A2 in GC, we performed GSEA on the high and low ALDH3A2 expression groups. As shown in Fig. 7, the β-alanine metabolism, butanoate metabolism, fatty acid metabolism, propanoate metabolism, and valine, leucine and isoleucine degradation pathways were enriched in the ALDH3A2 high-expression phenotype (FDR < 0.25 and NOM P-value < 0.05).
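The enrichment score behind a GSEA result can be illustrated with the weighted running-sum statistic below. This is a minimal sketch of that statistic only: the gene names and ranking values are toy data, the function name is hypothetical, and the permutation step that yields the nominal P-value and FDR is omitted.

```python
from typing import Sequence, Set


def enrichment_score(ranked_genes: Sequence[str],
                     rank_values: Sequence[float],
                     gene_set: Set[str],
                     weight: float = 1.0) -> float:
    """Running-sum enrichment score over a ranked gene list.

    ranked_genes: genes ordered from most up- to most down-regulated
    rank_values:  ranking metric of each gene, in the same order
    gene_set:     members of the pathway being tested
    Assumes the set has at least one hit with a non-zero metric and that
    at least one ranked gene lies outside the set.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_total = sum(abs(v) ** weight for v, h in zip(rank_values, hits) if h)

    running = 0.0
    best = 0.0
    for value, is_hit in zip(rank_values, hits):
        if is_hit:
            running += abs(value) ** weight / hit_total  # step up on a hit
        else:
            running -= 1.0 / n_miss                      # step down on a miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


# Toy ranking (hypothetical genes): the set sits at the top of the list.
toy_genes = ["g1", "g2", "g3", "g4", "g5", "g6"]
toy_values = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
es_top = enrichment_score(toy_genes, toy_values, {"g1", "g2"})
es_bottom = enrichment_score(toy_genes, toy_values, {"g5", "g6"})
print(es_top, es_bottom)
```

A positive score means that the pathway genes cluster toward the top of the ranking (here, the high-expression phenotype), and a negative score means that they cluster toward the bottom.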
ALDH3A2 acts as an immune-related gene in STAD
Using the "CIBERSORT" package in R and the TIMER website, we investigated the relationship between ALDH3A2 and tumor immunity (Fig. 8). In total, 178 samples satisfied the criteria for immune infiltration analysis (Fig. 8A) and were split into high and low ALDH3A2 expression groups (Fig. 8B, C). Compared with the low expression group, M1 macrophages were more abundant in the high ALDH3A2 expression group (Fig. 8B). The co-expression heatmap of the different immune cell types (Fig. 8D) showed that resting memory CD4 T cells might be negatively associated with CD8 T cells, and that neutrophils might be positively correlated with activated mast cells in STAD. We also examined the relationship between immune cell infiltration and survival and found that higher macrophage infiltration may predict a worse prognosis in STAD (P = 0.004; Fig. 8E). Because immune checkpoints such as TOX, CD274, PDCD1LG2, CTLA4 and PDCD1 play a pivotal role in immunotherapy, the associations between ALDH3A2 and these checkpoint-related genes were analyzed (Fig. 8F). Interestingly, ALDH3A2 expression might be negatively correlated with PDCD1, PDCD1LG2 and CTLA4 expression and positively associated with tumor purity. Furthermore, Figures S2 and S3 suggest that ALDH3A2 copy number alterations might have an appreciable impact on the level of immune infiltration and on mRNA expression. This suggests that ALDH3A2 might influence the level of immune infiltration through copy number alterations and thereby affect the prognosis of STAD. In conclusion, ALDH3A2 showed potential value for STAD remission and immunotherapy.
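The co-expression relationships reported above are correlation estimates. The text does not state which coefficient was used; the sketch below assumes Spearman's rank correlation, which is a common choice for expression data, and does not include any adjustment for tumor purity. The expression values are toy data and the function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy expression values (hypothetical) for a gene and a checkpoint gene.
gene_expr = [1.0, 2.0, 3.0, 4.0, 5.0]
checkpoint_expr = [10.0, 8.0, 6.0, 4.0, 2.0]
rho_toy = spearman(gene_expr, checkpoint_expr)
print(rho_toy)
```

Because the coefficient depends only on ranks, it captures any monotonic relationship and is less sensitive to outlying expression values than a Pearson correlation on the raw data.
<test>
assert abs(rho_toy - (-1.0)) < 1e-12
assert abs(spearman([1.0, 2.0, 3.0, 4.0], [1.0, 4.0, 9.0, 100.0]) - 1.0) < 1e-12
assert average_ranks([5.0, 1.0, 5.0]) == [2.5, 1.0, 2.5]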