As shown in Fig. 1, the flowchart presented the process. Firstly, we analyzed the GEO dataset GSE13911. Then, we found macrophage infiltration was different in GC and normal tissues. To verify this phenomenon, we downloaded related data from TCGA. In this section, we identified 3 hub genes relevant to macrophages by using WGCNA. Finally, the former results were verified by various websites and IHC.
GEO Data processing
Fig. 2A (barplot) showed the ratio of immune cells infiltration in 69 samples. 31 samples on the left are normal tissues, 38 samples on the right are tumor tissues. A heat map (Fig. 2B) of the cells infiltration about 69 samples presented that there was a significant difference in the distribution of some cells in normal and tumor tissues. For example, resting memory CD4+ T cells were high in the tumor group, while plasma cells and CD8+ T cells have higher proportion in the normal group. CorHeatmap (Fig. 2C) was used to observe the co-expression (red) or co-inhibition (blue) of various cells in tumor tissues. We could summarize from the picture that neutrophils and activated dendritic cells had the highest correlation and the correlation coefficient was 0.65. In contrast, resting and activated mast cells were most negatively correlated and the coefficient was -0.59. Then, 31 pairs of samples could be matched in the GSE13911 data set for pairwise comparison. The violin plot was a variation of the box plot, which combines the distribution kernel density estimation curve with the box plot. The outermost shape showed the density of the location so the distribution of the data could be found. The middle white dot represented the median, the thick black bar (black box) represented the quartile range (25% quantile and 75% quantile), and the thin black line extending from it represented the 95% confidence interval. Follicular helper T cells and M0 macrophages, activated dendritic cells, and activated mast cells were high in the tumor group (Fig. 2D). The difference was statistically significant. In the normal group, plasma cells and CD8+ T cells and resting mast cells were highly expressed. The most important finding was that M0 macrophages in the tumor were significantly higher than the normal group. Besides, the trends of M1 and M2 macrophages were also consistent with the general knowledge in immunology. Besides, the paired plot result was unanimous in the result of the violin plot but merely looked at a certain cell more directly (Additional file 1: Figure S1). The M0 macrophages were exclusively selected. In the 31 pairs of matched samples, it could be seen that except for individual cases, most of the M0 macrophage infiltration was higher in the tumor group than the normal group.
Immune infiltration of GC in TCGA
Combined with TCGA, Xcell website was used to analyze the infiltration of immune cells. The changes of macrophages in different clinical stages were observed.
In KS multi-group test, the p-value indicated the overall situation. The significant discrepancy stated the difference between each group. There was an upward trend in Fig. 3, which meant the macrophage fraction was increasing as the clinical-grade increased.
Differential analysis of genes in TCGA samples
In TCGA, 32 samples were in the normal group and 381 samples were in the tumor group. The following parameters (FDR-Filter=0.01, logFC-filter=1) were used to analyze the differences between the genes in TCGA samples. A total of 6227 DEGs were picked out.
WGCNA co-expression network analysis
WGCNA achieved the goal of quickly locking core genes by grouping genes (modules) and associating gene modules with phenotypes. Next, WGCNA was utilized to divide these genes into various modules. Macrophages and neutrophils were used as independent variables to calculate the modules related to macrophages. Finally, Module Membership (MM) > 0.8 and the Gene-Significance (GS ) > 0.5 were used to screen out 18 genes that were positively associated with macrophages.
The first step of sample clustering in WGCNA was to screen out the samples with outlier expression and clusters those with a similar expression. It could be delineated from Fig. 4A that the TCGA-BR-6710-01A-11R-1884-13 sample was distinct from other samples. The filtering principle of the soft threshold is to make the network more scale-free. Filtered soft threshold, undirected networks with power less than 15 or directed networks with power less than 30 could not lead the scale-free network structure R^2 to reach to 0.8 or the average connectivity to drop to below 100. This may result from batch effects, sample heterogeneity, and complicated experimental conditions on expression, which needed to be removed.
The second step of the WGCNA gene clustering was to cluster genes with similar expression trends. By using a dynamic tree cut[19], a hierarchical clustering tree was formed. Its rationale was a module (pathway) based analysis. On the tree, each leaf represented a gene. The genes with similar expression data were tightly connected, formed a branch of the tree, and represented a gene module. After that, several modules were generated (Fig. 4B).
The third step was to choose a suitable cutting position. We calculated all integers from 1 to 20 as thresholds to test the optimal threshold. Among them, power Estimate was the best power value, fit Indices saved the characteristics of the network corresponding to each power and k is the connection degree value. The average value of the connection was visualized and then generated Fig. 4C, D. When the y-axis was equal to 0.9, the intersection with the curve was exactly equal to 4 so we chose 4 as the power value.
The fourth step was to merge the clustered gene modules and make the modules with similar expressions into a large module. We used 4 as a beta value to build a gene module. In Fig. 5A, red indicated a positive correlation and green indicated a negative correlation, the darker the color, the stronger the correlation.
The key step was to associate the genes of different modules with the content of macrophages and pick out the modules that were positively correlated with the content of macrophages. The MEpink module in Fig. 5B was selected as most positively correlated with macrophages and the MEbrown module was negatively correlated with macrophages. This research focused on the modules that had a poor survival expectation and the MEpink module was positively correlated with the content of macrophages based on this module.
The sixth step could be realized in Fig. 5C that the MEpink module had a strong positive correlation with macrophages. Among them, there were 149 genes in the pink module. In case of MM > 0.8 and GS > 0.5, 18 genes most relevant to macrophage infiltration were selected for following research. Table 1 exhibited the function of 18 genes. A barplot was drawn (Additional file 2: Figure S2), and these genes were highly expressed in tumors. A total of 18 genes were all up-regulated in tumor tissues in Heat map (Additional file 3: Figure S3). From the correlation heat map (Fig. 5D), we could know the co-expression relation between these genes. All the values were greater than 0.5, indicating that these genes were correlated with each pair, which was also confirmed by WGCNA. After finding the modules associated with the phenotypes, the PPI bar chart (Fig. 5E) was obtained. Genes were at the core position, and the genes could be screened according to different conditions. Cytoscape software was used to form Fig. 5F. CD86, CYBB, and C3AR1 were identified as hub genes.
Verifying the tissue expression level in THPA and GEPIA database
We obtained the transcription level of real hub genes from THPA. As Additional file 4: Figure S4A exhibited, the expression level of three hub genes was remarkably high in cancer tissues, which also supported our suppose. CYBB, C3AR1, and CD86 were highly correlated, and had high expression in GC (Additional file 4: Figure S4C-E). There was a rising trend in Additional file 4: Figure S4B, but taking out any of them had little effect on survival (p>0.05), indicating that, they might work together.
Verifying the protein expression level by IHC
After IHC staining, it was mostly intuitive that CD86 was highly up-regulated in GC tissues (Fig. 6A). Fig. 6B showed a rise of CYBB expression in tumor tissues compared to adjacent non-tumorous tissues. The positive cell density of C3AR1 was also higher in GC tissues than that in normal tissues (Fig. 6C). These results were consistent with THPA.
Verifying immune infiltration with TIMER
Using the TIMER website, we found the expression level of CD86, CYBB, and C3AR1 had a positive correlation with CD8+T Cells, CD4+T Cells, Macrophages, Neutrophils, and Dendritic cells (Fig. 7B). The degree of macrophage infiltration significantly affected the prognosis. Hence it is worthy of further research and exploration. These genes affected macrophages and had an impact on survival (5-year survival rate, 10-year survival rate and long term survival rate ) (Fig. 7A). In order to explore the relationship between gene copy number variation and immune infiltration abundance, SCNA module was used to analyze the effect of different somatic copy number alterations of CD86, CYBB and C3AR1 on the immune cell infiltration in GC. It showed that these genes had a great influence on immune cell infiltration (Fig. 7C).
Gene function analysis
From the GO enrichment pathway map, we acquainted these 18 genes that were mainly enriched in several pathways (Fig. 8A). These genes mainly regulated the production of tumor necrosis superfamily cytokines. Table 2 exhibited the details. In GC, another GO-Biological Process enrichment analysis of DEGs was displayed in Additional file 5: Table S1. It was delineated that these 18 genes were mainly enriched in the phagosome pathway in the KEGG enrichment pathway map (Fig. 8B). GO-Cellular Components, GO-Molecular Function, and KEGG pathway enrichment analysis of DEGs in GC samples were showed in Table 3.
ClueGO as a Cytoscape plugin that integrated gene ontology (GO) terminology and KEGG/BioCarta pathway were used to create a functionally ordered GO/pathway network[20]. After screening (p <0.05), C3AR1 participated in the regulation of mononuclear cell migration and monocyte chemotaxis. CD86 and CYBB involved in the regulation of cytokine biosynthetic process and interleukin-2 production (Fig. 8C).