The flow chart of the experimental protocol is shown in Figure 1.
CD8+ T lymphocyte, tumor purity, and tumor mutation burden evaluation
We obtained the tumor purity, stromal score, immune score, and tumor mutation burden (TMB) for each sample. Applying a screening threshold of p < 0.05, we retained 860 breast cancer samples in which the CD8+ T lymphocyte proportion was reliably estimated (Figure 2A). By merging the immune microenvironment scoring file with the CD8+ T lymphocyte proportions of these samples, we generated the phenotype input file for WGCNA.
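The screening and merging step described above can be sketched as follows. This is a minimal illustration only: the field names (`cd8_fraction`, `p_value`, `immune_score`, `tumor_purity`), the sample IDs, and all values are hypothetical, and the actual file formats used in the study are not specified here.

```python
# Hypothetical per-sample deconvolution output; values are illustrative only.
samples = [
    {"sample": "S1", "cd8_fraction": 0.12, "p_value": 0.010},
    {"sample": "S2", "cd8_fraction": 0.05, "p_value": 0.200},
    {"sample": "S3", "cd8_fraction": 0.08, "p_value": 0.049},
]

# Hypothetical microenvironment scores keyed by sample ID.
scores = {
    "S1": {"immune_score": 1200.0, "tumor_purity": 0.61},
    "S2": {"immune_score": 300.0, "tumor_purity": 0.85},
    "S3": {"immune_score": 950.0, "tumor_purity": 0.70},
}


def filter_reliable(rows, alpha=0.05):
    """Keep samples whose deconvolution p-value is strictly below alpha."""
    return [row for row in rows if row["p_value"] < alpha]


def build_trait_table(reliable_rows, score_by_sample):
    """Join CD8+ estimates with microenvironment scores on the sample ID."""
    table = []
    for row in reliable_rows:
        extra = score_by_sample.get(row["sample"])
        if extra is None:
            continue  # skip samples that have no score record
        table.append({**row, **extra})
    return table


reliable = filter_reliable(samples)
trait_table = build_trait_table(reliable, scores)
kept_ids = [row["sample"] for row in trait_table]
```

The resulting table has one row per retained sample, which is the shape a trait (phenotype) input for a co-expression analysis typically takes.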
CD8+ T lymphocyte co-expression network construction in TCGA
Weighted gene co-expression network analysis (WGCNA) was performed on the TCGA-BRCA cohort. A hierarchical clustering tree was built using the dynamic hybrid cutting method (Figure 2B), and 22 co-expression modules were identified (Figure 2C). The correlation coefficients among CD8+ T lymphocyte proportion, tumor purity, TMB, and the co-expression modules are shown in Figure 2C. The yellow module had the strongest correlation with CD8+ T lymphocyte proportion in the TCGA-BRCA cohort (Cor = -0.41; P = 1e-28) (Figure 2C). Based on these findings, we further examined the correlations between the yellow module and individual traits (Figure 2D-G). The yellow module showed a significant correlation with CD8+ T cells (Cor = 0.78, p = 9.7e-59), tumor purity (Cor = 0.86, p = 1.7e-83), immune score (Cor = 0.98, p = 1.2e-197), and stromal score (Cor = 0.28, p = 1.9e-06).
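To illustrate how a module-trait correlation of this kind can be computed, the sketch below correlates a per-sample module profile with a trait using Pearson's coefficient. It is a simplification under stated assumptions: WGCNA summarizes a module by its eigengene (the first principal component), whereas this example substitutes the mean of z-scored gene expression, and the gene names and values are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to mean 0 and population SD 1 (SD must be non-zero)."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def pearson(x, y):
    """Pearson correlation computed as the mean product of z-scores."""
    zx, zy = zscore(x), zscore(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def module_profile(expression_by_gene):
    """Per-sample module summary: mean of z-scored expression across genes.

    This stands in for the WGCNA module eigengene, which is the first
    principal component rather than a simple mean.
    """
    standardized = [zscore(values) for values in expression_by_gene.values()]
    return [mean(column) for column in zip(*standardized)]


# Hypothetical expression of two module genes across five samples.
module_expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [2.0, 4.0, 6.0, 8.0, 10.0],
}
# Hypothetical CD8+ T lymphocyte proportions for the same five samples.
cd8_proportion = [0.10, 0.20, 0.30, 0.40, 0.50]

profile = module_profile(module_expression)
module_trait_cor = pearson(profile, cd8_proportion)
```

Repeating the same calculation for every module and every trait yields the module-trait matrix that is usually displayed as a heat map.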
CD8+ T lymphocyte co-expression module functional enrichment
We identified 28 mRNAs in the TCGA-BRCA yellow module that were positively co-expressed with CD8+ T lymphocyte proportion (coefficient > 0.4) (Table 1). These 28 mRNAs were most significantly enriched in antigen processing and presentation and in response to interferon-gamma, suggesting that these biological processes might promote CD8+ T lymphocyte infiltration in the breast cancer microenvironment (Figure 3A). The module negatively co-expressed with CD8+ T lymphocytes was most significantly enriched in extracellular matrix organization (Figure 3B). The protein-protein interaction networks of the yellow module and the green module are shown in Figure 3.
Clinical outcome of CD8+ T lymphocyte infiltration-related genes
To assess the clinical significance of these genes, we performed survival analysis. Patients in the low expression groups for GZMA (TCGA: P < 0.001), CD74 (TCGA: P < 0.001), IL2RG (TCGA: P = 0.009), CD3E (TCGA: P < 0.001), CCL5 (TCGA: P < 0.001), CD3D (TCGA: P < 0.001), CORO1A (TCGA: P < 0.001), HLA-DMA (TCGA: P = 0.003), SELPLG (TCGA: P = 0.002), HCST (TCGA: P < 0.001), HLA-DPB (TCGA: P = 0.001), GZMK (TCGA: P = 0.001), CD48 (TCGA: P < 0.001), PSMB9 (TCGA: P = 0.005), CD2 (TCGA: P = 0.003), CD27 (TCGA: P = 0.003), IRF1 (TCGA: P = 0.003), CD8A (TCGA: P = 0.005), GBP4 (TCGA: P = 0.048), TNFRSF1B (TCGA: P = 0.011), GMFG (TCGA: P = 0.006), CST7 (TCGA: P = 0.001), GZMB (TCGA: P = 0.049), PSMB10 (TCGA: P = 0.002), and HLA-E (TCGA: P = 0.046) had poorer survival than those in the corresponding high expression groups (Figure 4). These results suggest that these CD8+ T lymphocyte infiltration-related genes play protective roles in breast cancer.
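A comparison of this kind is commonly made with a log-rank test between expression groups. The sketch below is one possible implementation under stated assumptions: patients are split at the median expression (the cutoff used in the study is not stated here), the test is the standard two-group log-rank test with one degree of freedom, and all patient values are hypothetical.

```python
import math
from statistics import median


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p-value).

    events_* use 1 for an observed death and 0 for censoring.
    Assumes both groups contain patients, so the variance is non-zero.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for tt, _, g in data if g == 0 and tt >= t)  # at risk, group A
        n_b = sum(1 for tt, _, g in data if g == 1 and tt >= t)  # at risk, group B
        d_a = sum(1 for tt, e, g in data if g == 0 and tt == t and e)
        d_b = sum(1 for tt, e, g in data if g == 1 and tt == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square tail, 1 degree of freedom
    return chi2, p_value


# Hypothetical patients: (gene expression, follow-up time in months, event flag).
patients = [
    (1.0, 5, 1), (2.0, 6, 1), (3.0, 7, 1), (4.0, 8, 1),
    (6.0, 20, 1), (7.0, 22, 0), (8.0, 25, 0), (9.0, 30, 0),
]

cutoff = median(expr for expr, _, _ in patients)  # assumed median split
low = [(t, e) for expr, t, e in patients if expr <= cutoff]
high = [(t, e) for expr, t, e in patients if expr > cutoff]

chi2, p_value = logrank_test(
    [t for t, _ in low], [e for _, e in low],
    [t for t, _ in high], [e for _, e in high],
)
```

In this toy data set all early deaths fall in the low expression group, so the test reports a small p-value; with real cohorts the same function would be applied once per gene.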
Cox regression hazard model of CD8+ T lymphocyte co-expression genes
A Cox regression hazard model of CD8+ T lymphocyte co-expression genes was constructed from these protective prognostic factors in breast cancer. The resulting risk score was:
Risk = -0.003 * CD74 + 0.045 * HLA-DMA - 0.107 * HCST + 0.032 * GIMAP4
Breast cancer patients in the high-risk group had poorer survival than those in the low-risk group (TCGA: P < 0.001; HR = 2.75), with an area under the curve (AUC) of 0.66 (Figure 5). The risk score was also evaluated in subgroups defined by age, gender, stage, tumor purity, tumor mutation burden, metastasis status, Ki-67, and EGFR, and the results were significant in these subgroups.
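As a worked illustration of the risk score, the sketch below applies the four coefficients reported above to hypothetical expression values, assigns risk groups, and computes a rank-based AUC. Several parts are assumptions rather than the method used in the study: the median cutoff for high and low risk, the treatment of the outcome as a simple binary label for the AUC (a time-dependent ROC may have been used), and all patient values.

```python
from statistics import median

# Coefficients taken from the risk formula reported in the text.
COEFFICIENTS = {"CD74": -0.003, "HLA-DMA": 0.045, "HCST": -0.107, "GIMAP4": 0.032}


def risk_score(expression):
    """Weighted sum of the four gene expression values."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def assign_risk_groups(scores):
    """Label each score 'high' or 'low' relative to the median (assumed cutoff)."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


def auc(scores, labels):
    """Rank-based AUC: probability that an event case outscores a non-event case."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(positives) * len(negatives))


# Hypothetical expression values for one patient.
example_patient = {"CD74": 100.0, "HLA-DMA": 10.0, "HCST": 5.0, "GIMAP4": 10.0}
example_score = risk_score(example_patient)

# Hypothetical risk scores and binary outcomes (1 = death) for four patients.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
toy_groups = assign_risk_groups(toy_scores)
toy_auc = auc(toy_scores, toy_labels)
```

For the example patient the terms are -0.3 + 0.45 - 0.535 + 0.32, giving a score of -0.065; because two coefficients are negative, higher CD74 or HCST expression lowers the score.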
Clinical phenotype and immunophenotype
Having defined a weighted clinical prognostic risk score consisting of four factors, we found that these factors were co-expressed with one another, were closely related to the level of CD8+ T lymphocyte infiltration, and affected outcomes. To examine the relationship between these factors and the clinical phenotype and immunophenotype more specifically, we drew several sets of box plots. CD8+ T lymphocyte infiltration was higher in the high expression group of each of the four factors, suggesting that these factors and their related biological processes promote the infiltration of CD8+ T lymphocytes into tumor tissues (Figure 6A). The expression levels of these genes were lower in the 5-year mortality group than in the 5-year survival group, suggesting a protective effect on outcomes; CD8+ T lymphocytes showed the same trend (Figure 6B). We also found that the expression levels of these factors were low in the high tumor purity group and low in the high immune score group (Figure 6C, D). Taken together, these findings indicate, directly or indirectly, that the four factors promote CD8+ T lymphocyte infiltration. We also drew scatter plots of their correlations with clinical stage (Figure 7A), CD8+ T lymphocytes (Figure 7B), and M2 macrophages (Figure 7C) to further illustrate the clinical phenotypic correlations of these factors.
GSEA and HPA
Antigen processing and presentation, the chemokine signaling pathway, the B cell receptor signaling pathway, and the T cell receptor signaling pathway were enriched in the high expression groups of CD74, GIMAP4, HCST, and HLA-DMA (Figure 8).
We compared the expression levels of these genes between normal and tumor tissues. Staining with HPA010592, an antibody against CD74, showed higher intensity in tumor tissue than in normal tissue (Figure 9).