3.1 Analysis of differential gene expression profiles of immune and stromal scores in CRC
The immune scores and stromal scores of all CRC samples were calculated using the ESTIMATE algorithm and stratified into high and low groups according to their median levels. The gene expression profiles of the high- and low-score groups were compared. A total of 2033 DEGs were obtained based on immune scores (high vs. low), consisting of 1997 up-regulated and 36 down-regulated (Fig 1A, C, E, F). Similarly, 2521up-regulated and 23 down-regulated genes were identified from stromal scores (high vs. low) (Fig 1B, D, E, F). The Venn plots showed that 1314 up-regulated genes overlapped in the high immune/stromal score groups and 4 down-regulated genes overlapped in the low immune/stromal score groups.
3.2 Functional enrichment analyses of DEGs in the intersection
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses were performed to evaluate the biological functions and pathways involved in these DEGs in the intersection. The results of GO enrichment analysis showed that these DEGs were almost enriched in immune-related terms. lymphocyte mediated immunity, immunoglobulin complex, and antigen binding were the most significant enrichment terms among biological processes, cellular components, and molecular functions (Fig 2A). KEGG analysis also displayed that these DEGs were mainly enriched in cytokine-cytokine receptor interaction, chemokine signaling pathway, and neuroactive ligand-receptor interaction (Fig 2B). The enrichment results suggested that these DEGs participate in immune-related activities and might regulate TME in CRC.
3.3 Construction of co-expression network and identification of hub genes
To further narrow candidate genes and screen hub genes related to clinicopathological factors, a total of 1318 DEGs in the intersection were included in the co-expression network analysis. 511 CRC patients with complete information including survival, stage, and T/M/N stage were selected.
The soft threshold was set to 9, with the lowest mean connectivity (Fig 3A). As shown in Fig 3B, the logarithm of the node with connectivity k log(k) was negatively correlated with the logarithm of the node probability log(p(k)), and the correlation coefficient reached 0.94. The MEDissThres was set as 0.25 to merge similar modules, and five modules were eventually generated (Fig 3C). Among them, there were 240 genes in the blue module, 509 genes in the brown module, 85 genes in the green module, 321 genes in the grey module, and 163 genes in the yellow module. The grey module is a group of genes that could not fit into other modules. Correlation analysis indicated that the blue module was significantly associated with the clinical features of CRC (Fig 3D). Furthermore, the module membership in the blue module also has a significant correlation with clinical features such as survival time, clinical stage, T stage, and N stage (Fig 3E-H). We calculated the connectivity between genes in the blue module, and the top 30 genes with the highest connectivity were screened as hub genes for subsequent analysis.
3.4 Selection and validation of key genes associated with survival
Univariate Cox regression analysis was used to evaluate the prognostic values of DEGs in the intersection. Of the 1318 DEGs, 150 were identified to be significantly associated with the prognosis of CRC (p < 0.01). Venn analysis was used to obtain the intersection of 30 hub genes in the co-expression analysis and significant genes in univariate COX analysis (Fig 4A). As a result, SPOCK1 and POSTN were identified as key genes associated with survival. As shown in the forest plot, SPOCK1 (Hazard Ratio = 1.0861, 95%CI of ratio: 1.0279-1.1476, p = 0.0033), POSTN (Hazard Ratio = 1.0049, 95%CI of ratio: 1.0015-1.0084, p = 0.0053), age (Hazard Ratio = 1.0391, 95%CI of ratio: 1.0179-1.0608, p = 0.0003), stage (Hazard Ratio = 2.3940, 95%CI of ratio: 1.8654-3.0724, p < 0.0001), T stage (Hazard Ratio = 2.7387, 95%CI of ratio: 1.7842-4.2037, p < 0.0001), M stage (Hazard Ratio = 1.3957, 95%CI of ratio: 1.1575-1.6830, p = 0.0005), N stage (Hazard Ratio = 1.9759, 95%CI of ratio: 1.5428-2.5308, p < 0.0001) were relevant to OS among patients with CRC (Fig 4B).
The survival analysis of GEPIA data revealed that CRC patients with high expression of SPOCK1 (Hazard Ratio = 2, Logrank p = 0.0015, Fig 4C) and POSTN (Hazard Ratio = 1.7, Logrank p = 0.017, Fig 4E) had poor OS. Meanwhile, high expression of SPOCK1 (Hazard Ratio = 2, Logrank p = 0.0016, Fig 4D) and POSTN (Hazard Ratio = 1.6, Logrank p = 0.022, Fig 4F) were also significantly correlated with DFS. To validate the prognostic value of hub genes, we performed survival analyses based on an independent subset GSE17536. Consistent with the previous results, high expressions of SPOCK1 and POSTN were significantly associated with worse OS (p < 0.01, Fig 4G and 4I) and DFS (p < 0.01, Fig 4H and 4J). Summary above, SPOCK1 and POSTN have a greater prognostic value in CRC.
3.5 Association of SPOCK1 and POSTN expression with clinicopathological factors
To investigate the relationships between key prognostic genes and the clinical characteristics of CRC patients in TCGA, we compared the expression levels of SPOCK1 and POSTN in different clinical stage, T stage, M stage, and N stage. The results showed that differential expression of SPOCK1 was significantly associated with clinical stage (Fig 5A), T stage (Fig 5B), and N stage (Fig 5C). Moreover, differential expression of POSTN was also related to stage (Fig 5E), T stage (Fig 5F), and N stage (Fig 5G). However, the expression of SPOCK1 and POSTN had no significant influence on M stage (Fig 5D and 5H).
3.6 Gene set enrichment analysis of key genes
To further explore the underlying mechanism, we performed GSEA between high-expression groups and low-expression groups based on the median levels of SPOCK1 and POSTN expression, respectively. As shown in Fig 6, the genes in SPOCK1 high-expression group were mainly enriched in immune-related pathways like chemokine signaling, cytokine-cytokine receptor interaction, and ECM-receptor interaction (Fig 6A). However, the genes in the SPOCK1 low expression group were more significantly enriched in metabolic-related pathways, such as aminoacyl tRNA biosynthesis, butanoate metabolism, and citrate cycle TCA cycle (Fig 6B). In the POSTN high expression group, genes were also mainly enriched in B cell receptor signaling pathway, chemokine signaling pathway, cytokine-cytokine receptor interaction, and other immune-related pathways (Fig 6C). While, the genes in POSTN low-expression group were enriched significantly in oxidative phosphorylation, parkinsons disease, pentose phosphate pathway (Fig 6D). These results suggested that SPOCK1 and POSTN might have potential regulatory effects on the immune microenvironment of CRC.
3.7 Correlation of SPOCK1 and POSTN expression with immune infiltration levels
To confirm the relationships between SPOCK1 and POSTN expression with the immune microenvironment. We calculated the relative abundances of the 22 immune cells in all CRC samples using the CIBERSORT algorithm. The violin diagram provided visualization of the differences in 22 immune cells between the high and low expression groups of SPOCK1 and POSTN. The results showed that plasma cells (p < 0.001), resting CD4 memory T cells (p = 0.003), monocytes (p = 0.005), and activated dendritic cells (p = 0.023) were in higher proportions in the SPOCK1 low expression group, whereas macrophages M0 (p < 0.001), macrophages M1 (p = 0.046), and neutrophils (p = 0.005) were in higher proportions in the SPOCK1 high expression group (Fig 7A). The POSTN low expression group also had a higher proportion of plasma cells (p < 0.001), resting CD4 memory CD4 T cells (p = 0.046), and monocytes (p = 0.002), whereas the POSTN high expression group had a higher proportion of macrophages M0 (p = 0.001), macrophages M2 (p = 0.003), and neutrophils (p < 0.001) (Fig 7B). Additionally, we used the TIMER algorithm to determine the correlations between SPOCK1 and POSTN expression with 6 kinds of infiltration immune cells. The analysis showed that SPOCK1 and POSTN expression had significantly positive correlations with infiltrating levels of B cells, CD4+ T cells, CD8+ T cells, dendritic cells, macrophages and neutrophils (Fig 7C and 7D). From the above results, both SPOCK1 and POSTN were related to immune cell infiltration in CRC, especially macrophages and neutrophils.
3.8 Correlation of SPOCK1 and POSTN expression with immune checkpoint genes expression
Immune checkpoint molecules have been identified to suppress antitumor immune responses in solid tumors. Therefore, we analyzed the correlation between SPOCK1 and POSTN expression with immune checkpoint gene expression such as PD-L1, PD-1, PD-L2, CTLA4, TIM-3, and B7-H3. The results showed that SPOCK1 expression was significantly positively correlated with PD-1 (R = 0.23, p = 5.9e-08, Fig 8A), PD-L1 (R = 0.33, p = 8.7e-16, Fig 8B), PD-L2 (R=0.59, p < 2.2e-16, Fig 8C), CTLA4 (R = 0.34, p = 5.8e-16, Fig 8D), TIM-3 (R = 0.62, p < 2.2e-16, Fig 8E), and B7-H3 (R = 0.33, p = 4.6e-15, Fig 8F). Similarly, POSTN was also positively correlated with PD-1 (R = 0.24, p = 1.1e-08, Fig 8G), PD-L1 (R = 0.45, p < 2.2e-16, Fig 8H), PD-L2 (R = 0.67, p < 2.2e-16, Fig 8I), CTLA4 (R = 0.37, p < 2.2e-16, Fig 8J), TIM-3 (R = 0.67, p < 2.2e-16, Fig 8K), and B7-H3 (R = 0.33, p = 3.6e-15, Fig 8L). These suggested that high expression of SPOCK1 and POSTN might potentially regulate tumor immune evasion in CRC.