The single-cell expression atlas of CRC patients
In order to investigate the single-cell sequencing atlas of CRC patients, we performed a reintegration and reanalysis of samples from 10 CRC patients and their paired healthy colorectal tissue samples obtained from GSE132465. After removing batch effects and performing quality control, we eliminated low-quality single cells, including dead cells and doublets, and finally obtained 20439 cells for subsequent analysis (Figure S1A-F). Subsequently, we conducted PCA and KNN analyses, resulting in the identification of 23 distinct single-cell clusters from both normal colorectal and tumor tissues (Fig. 1A-B). In order to investigate the association between different single-cell clusters, we employed Pearson correlation analysis. The results revealed a significant correlation in the gene expression patterns between clusters 1, 0, and 14, while clusters 11, 4, 10, 9, and 21 exhibited another similar gene expression pattern (Fig. 1C). We further explored the marker genes of these single-cell clusters based on widely reported cell markers. The results revealed that clusters 0, 1, and 14 exhibited high expression of PTPRC, CD3D, and CD3E genes, thereby annotating as T cells. Conversely, clusters 4, 9, 10, 11, and 21 demonstrated high expression of EPCAM, KRT19, and CD24 genes, indicating their annotation as epithelial cells (Fig. 1D-E). Further analysis of cell proportions revealed a significant increase in the proportion of epithelial cells and a relative decrease in plasma cells in tumor tissues compared to normal colorectal tissues (Fig. 1F-G) [28–29]. This suggests a suppression of immune cell function in tumor tissues, which is consistent with existing reports. As the immune microenvironment plays a crucial role in tumor initiation and malignant progression, we focused on the distribution of immune cells that showed high expression of PTPRC, which is consistent with the previous cell annotation results (Fig. 1H). In order to further supplement robust cell markers required in the cell annotation process, we investigated the highly expressed genes in different cell types. The results revealed that KRT18, KRT8, and FABP1 were highly expressed in epithelial cells, while CCL5 and GZMA were highly expressed in T cells, providing additional cell markers to the existing repertoire (Fig. 1I).
Epithelial cells were highly heterogeneous in CRC patients
In order to investigate the differences in epithelial cells between normal and tumor groups, we performed tSNE dimensionality reduction and cell clustering on 5952 cells from single-cell sequencing data. The results showed that epithelial cells from 10 normal colorectal tissues and 10 CRC tissues exhibited high heterogeneity and could be clustered into 10 distinct cell subtypes (Fig. 2A-B). Additionally, the results of the cell proportion analysis revealed that epithelial cells from clusters 4, 7, and 9 primarily originated from normal colorectal tissue, whereas epithelial cells from clusters 0, 1, 2, 3, 5, 6, and 8 mainly originated from distinct CRC tumor tissues, providing further validation of the heterogeneity of tumor cells (Fig. 2C-D). Building upon the proven association between IFITM3 and immune suppression in CRC [30], we conducted a detailed examination of the expression levels of IFITM3 in single-cell RNA sequencing data. The findings revealed a significant upregulation of IFITM3 in malignant tumor cells (Fig. 2E). The results from the copy number variation inference analysis yielded consistent findings, demonstrating a significantly lower CNV score in clusters 4, 7, and 9 compared to clusters 0, 1, 2, 3, 5, 6, and 8 (Fig. 2F). We further examined the highly expressed genes in different subpopulations of epithelial cells. The results revealed a significant upregulation of genes related to benign development and immune activation, such as ZG16 and CCL5, in clusters 4, 7, and 9 [31–32]. Conversely, clusters 0, 1, and 2 exhibited a significant upregulation of genes associated with malignant proliferation and immune suppression, including APOA1BP, CXCL2, and CCL20 (Fig. 2G) [33–34]. The results of GO and KEGG enrichment analysis showed a significant enrichment of ATP synthesis and transport-related pathways, as well as ribosome synthesis-related signaling pathways, in clusters 0 and 2 compared to clusters 4, 7, and 9. This indicates the activation of metabolic pathways in malignant epithelial cells (Figure S2A-B). Furthermore, we conducted differential expression analysis between normal colorectal tissue and CRC tissue. The results revealed a significant upregulation of genes such as DPEP1 and MMP7 in tumor tissue, consistent with the existing literature (Fig. 2H) [35–36]. The GO enrichment analysis revealed a significant activation of pathways related to ribosome and mitochondrial function in epithelial cells derived from CRC tissue (Fig. 2I).
The heterogeneity of macrophages and plasma cells in CRC patients
Considering the significantly increased macrophages and decreased plasma cells in CRC tissue compared to normal colorectal tissue (Fig. 1F), we further analyzed the heterogeneity of macrophages and plasma cells in CRC. The results demonstrated that macrophages could be classified into 4 distinct clusters (Figure S3A), with cluster 1 mainly derived from normal colorectal tissue and clusters 0, 2, and 3 predominantly derived from CRC tissue (Figure S3B-C). The results of the differential gene expression analysis revealed that APOC1, STMN1, and IL32 were significantly upregulated in macrophages from clusters 0, 2, and 3 compared to cluster 1, and these genes have been proven to be associated with immune suppression in tumors (Figure S3D) [37–39]. The results of pathway enrichment analysis for the highly expressed genes revealed significant activation of pathways associated with cell proliferation and immune suppression in clusters 0, 2, and 3, including cell cycle and PD-1 checkpoint pathway (Figure S3E). Furthermore, plasma cells exhibited a high degree of heterogeneity and were categorized into 6 distinct clusters (Figure S4A). The results of cell proportion analysis showed that cluster 1 primarily originated from CRC tissue (Figure S4B-C). Differential gene expression analysis revealed high expression of IGHG1, IGHG4, and IGHG3 in cluster 1 plasma cells, consistent with existing reports (Figure S4D) [40–42].
Epithelial cells inhibited activation of T cells via MIF/CD74 signaling pathway
To investigate intercellular interactions, we utilized the CellChat algorithm to infer changes in cell communication based on the gene expression levels of receptors and ligands (Fig. 3A). The results indicated a significant increase in both the quantity and intensity of intercellular interactions in CRC tissue (Fig. 3B). The results of differential analysis of intercellular interactions demonstrated a significant increase in the regulatory intensity of epithelial cells towards stromal cells in CRC tissue compared to normal colorectal tissue (Fig. 3C). In terms of molecular mechanisms, CRC tissue shows a significant increase in relative information flow in pathways including SPP1, CCL, and MIF, while normal colorectal tissue exhibits enhanced relative information flow in pathways such as IL1 and PTN (Fig. 3D). Furthermore, in comparison to normal colorectal tissue, a significant increase in the release of ligands, including MIF, SPP1, and CCL, by epithelial cells was observed in CRC tissue (Fig. 3E). Correspondingly, there was a notable increase in the receptors for MIF, SPP1, and CCL in T cells and macrophages in CRC tissue (Fig. 3F). Further differential analysis revealed that in CRC tissue, epithelial cells regulate the function of T cells through the MIF/CD74 pathway compared to the control group (Fig. 3G-H).
Immune-related pathways were inactivated in tumor-infiltrating T cells
Given the significant role of T cells in tumor immunity in CRC, we conducted a further investigation on the functional alterations of T cells in CRC tissues. The results demonstrate that T cells derived from normal colorectal tissue and CRC tissue are classified into 6 clearly distinct clusters (Fig. 4A). Based on universal cell markers, we annotated the 0 cluster as T helper cells due to their high expression of CCR6 and RORA; annotated the 1 cluster as NK T cells due to their high expression of NKG7 and ITGB2; annotated the 2 cluster as cytotoxic T cells due to their high expression of CD8A, CD8B, and CCL5; annotated the 3 cluster as exhausted T cells due to their high expression of LAG3 and CTSW; annotated the 4 cluster as Treg cells due to their high expression of CD4, TNFRSF4, and CORO1B; annotated the 5 cluster as naive T cells due to their high expression of CCR7 and SELL (Fig. 4B-D). The analysis of cell proportions revealed a significant reduction in the proportion of cytotoxic T cells and a significant increase in the proportion of exhausted T cells in CRC tissues compared to normal colorectal tissue, consistent with previous findings (Fig. 4E) [43–44]. The results of pathway enrichment analysis for highly expressed genes showed a significant enrichment of ATP-related pathways and a significant downregulation of immune-related pathways in exhausted T cells compared to cytotoxic T cells, indicating a dysregulation of both the metabolic pathways and immune function in exhausted T cells (Fig. 4F-G). We conducted further analysis on the differential gene expression of cytotoxic T cells derived from CRC tissues and normal colorectal tissues. The results revealed a high expression of DUSP4, BATF, and TNFRSF18 in cytotoxic T cells derived from CRC tissues, all of which have been reported to be associated with malignant progression of tumors (Fig. 4H) [45–47]. The results of pathway enrichment analysis further supported the significant activation of energy metabolism-related pathways and the significant inactivation of immune-related signaling pathways in cytotoxic T cells derived from CRC tissues (Fig. 4I-G).
Pseudotime and trajectory analyses revealed the different cell fates of T cells
To investigate functional plasticity among T cell subpopulations, we analyzed the evolutionary trajectories of different T cell subpopulations via the CytoTRACE and monocle2 algorithms. The analysis results revealed that Treg cells and naive T cells exhibited higher CytoTRACE scores, indicating their strong cell proliferative capacity and differentiation potential (Fig. 5A-B). The pseudotime analysis results based on monocle2 showed that naive T cells were located at the beginning of the trajectory, while Treg cells, exhausted T cells, and cytotoxic T cells were located at the endpoint, suggesting differentiation of T cells from naive T cells into distinct subpopulations (Fig. 5C-D). Expression analysis of marker genes revealed that the expression levels of CCR7, SELL, and IL7R, which are highly expressed in naive T cells, were decreased during pseudotime. In contrast, the expression levels of CARD16, IFNG, and CXCL13, which are highly expressed in Treg cells, exhausted T cells, and cytotoxic T cells, were increased during pseudotime (Fig. 5E). Gene regulatory transcription pattern analysis revealed that with increasing pseudotime, the expression level of gene cluster 1 was significantly decreased, while the expression levels of gene clusters 2, 3, and 4 were significantly increased (Fig. 5F). The KEGG clustering analysis results indicated a significant enrichment of ribosome-related pathways in gene cluster 1, suggesting an active protein synthesis process in naive T cells. Genes in cluster 2 are significantly enriched in protein folding and cellular stress-related pathways, indicating the loss of immune-related functions in exhausted T cells. Gene cluster 3 is predominantly enriched in pathways associated with cell killing and NK cells, indicating the primary roles of cytotoxic T cells and NK cells in exerting cytotoxic effects. Gene cluster 4 is mainly enriched in pathways associated with antigen processing and presentation, indicating that Treg cells and T helper cells predominantly exert antigen presentation functions in CRC tissue (Fig. 5G).
Establishment and validation of 5 gene prognostic signature based on T cell-related markers
To investigate the association between T cell-related genes and the survival time of CRC patients, we performed univariate Cox regression analysis to calculate the hazard-associated genes in 5 independent cohorts, followed by cross-analysis with T cell-related genes. The results indicated that SLC2A3, GADD45B, TERF2IP, SLC20A1, and MRPL22 were identified as risk genes in all 5 independent CRC cohorts (Fig. 6A). Subsequently, we employed multivariable Cox regression analysis to construct a prognostic prediction model based on the 5 T cell-related risk genes in the TCGA cohort. The results indicated that the risk score could effectively distinguish the prognostic survival outcomes of CRC patients, with low-risk scores patients having a more favorable prognosis, while high-risk scores patients had a poorer prognosis (Fig. 6B). The results of clinical characteristics and gene expression analysis revealed that with the patient’s risk score increasing, the mortality rate significantly increased, accompanied by increased expression levels of SLC2A3, GADD45B, TERF2IP, and SLC20A1, and decreased expression level of MRPL22 (Fig. 6C). The results of the ROC analysis showed that the 5 T cell-related genes model had predictive AUCs of 0.60, 0.60, and 0.73 for 1 year, 2 years, and 3 years, respectively, indicating that the model exhibited good predictive performance (Fig. 6D). We further included two independent CRC cohorts for validation of the model’s predictive performance. The results demonstrated that the 5 T cell-related genes model could significantly distinguish the survival outcomes of patients in the GSE39582 cohort and exhibited good predictive performance (Fig. 6E-G). The validation results in the GSE41258 cohort also supported the same conclusion, confirming the robustness of the 5 T cell-related genes model in predicting CRC patient prognosis (Fig. 6H-J).