Immune-related differential expression genes in pancreatic cancer
PC is known as the “immune desert” due to its unique TIME characteristics. The abundant bone marrow-derived cells and Treg cells in PC can mediate tumor immune escape and cause different levels of immunotherapy resistance through different mechanisms. To further explore the characteristic of TIME in PC, based on meta-cohort, ESTIMATE algorithm was conducted to calculate the immune and stromal scores of each PC patient (Figure 1A), and the result revealed a high level of infiltrating stromal cells in PC. Subsequently, Differential gene expression analysis in the immune high/low group, stromal high/low group and tumor/normal group suggested (Figure 1B) that 1238 up-regulated and 1824 down-regulated genes were screened compared to the immune low group (Table S3). Meanwhile, there were 1919 upregulated and 2324 downregulated genes compared to the stromal-low group (Table S4). Additionally, we detected 3731 upregulated and 3463 downregulated genes between pancreatic cancer samples and normal pancreatic samples (Table S5). After converging all these three types of DEGs, 1612 IRGs were identified for the following study (Table S6). Functional enrichment analysis suggested the function of IRG with potential tumor regulatory mechanisms, and the results suggested IRGs mainly enriched in adaptive immune response and regulation of T cell activation of GO terms (Figure 1C), immune cell receptor signalling and antigen binding pathways of KEGG terms (Figure 1D). All these results suggest that PC progression may be related to immune response in the tumor microenvironment to varying degrees.
Generation of TIME subtype
Emerging evidence demonstrates that specific expression patterns of TIME could influence the clinical treatment strategies for PC. Hence, we separated PC patients into two clusters (Figure 2A, Figure S2A), namely Cluster_1 (n=145, 51.06%) and Cluster_2 (n=139, 48.94%), by an unsupervised consensus clustering algorithm and according to the expression level of 1612 IRGs (optimal cutoff k=2). Interestingly, PCA analysis showed that there are significant differences between these two clusters (Figure S2B).
To further identify the correlation between the regulation of immune cells and clusters, TIMER algorithm was applied to evaluate the abundance of immune cells. As illustrated in Figure 2B, Cluster_2 displayed significantly higher infiltration of immune cells (B cell, CD4+ T cell, CD8+ T cell, neutrophil and myeloid dendritic cell) compared with Cluster_1. Moreover, we performed TIDE algorithm to predict the sensitivity of response to immune checkpoint blockade, including anti-PD1 and anti-CTLA4. Patients in Cluster_2 tend to obtain lower TIDE scores, which means patients in Cluster_2 were sensitive to anti-ICB therapy (Figure 2C). Similarly, we also accessed the diversity in the expression of ICB between Cluster_1 and Cluster_2. Results showed that the expression level of ICB (PDCD1, CD274, HAVCR2, LAG3, TIGIT and GTLA4) in Cluster_1 was obviously upregulated compared to Cluster_2, suggesting that patients in Cluster_2 were more likely to be targeted (Figure 2D). Therefore, regarding the characteristics between those two clusters mentioned above, we manually defined the Cluster_1 as Immune_desert subtype, and Cluster_2 as Immune_rich subtype. ssGSEA analysis also confirmed that the Immune_rich subtype possessed a significant level of innate and adaptive immune cells, including natural killer cells, immature B cells and T cells (all p < 0.0001, Figure 2E). Of note, tumor-suppressing Th1 cells were considerably enriched in the Immune_rich subtype (p = 2.96e-32) compared to tumor-promoting Th2 cells (p = 0.256).
Then, we compared the identified TIME subtype with classical molecular classifications in PC. The marker of Bailey’s classification, Collisson’s classification, Moffitt’s tumor classification, Moffitt’s stromal classification and Li’s classification were utilized to cluster PC patients in the meta-cohort (Figure S3A-E, Table S7), and Puleo’s classification was predicted followed the pipeline in Materials and Methods (Table S7). Results illustrated that there was no significant difference between Moffitt’s tumor classification and TIME subtype (p = 0.70, Table S8), while Bailey’s classification (p < 0.0001), Collisson’s classification (p = 0.0006), Moffitt’s stromal classification (p = 0.0101), Puleo’s classification (p < 0.0001) and Li’s classification (p < 0.0001) obtained significant similarity (Table S8). For the comparison of Bailey’s classification, results showed that the proportion of immunogenic subtype was higher and the percentage of progenitor subtypes was lower in Immune_rich subtype versus Immune_desert subtype (35.25% vs 3.45%, 1.44% vs 40.69%, p < 0.0001, Table S8). Interestingly, Collisson’s classification, we observed that Immune_rich subtype was composed of a more exocrine-like subtype and a less classical subtype compared to Immune_desert subtype (56.12% vs 41.38%, 14.39% vs 33.79%, p < 0.01, Table S8) of Puleo's classification. For Moffitt’s stromal classification, results demonstrated that Immune_rich subtype possessed a more normal subtype and less activated subtype than Immune_desert subtype (51.80% vs 35.17%, 36.69% vs 44.14%, p < 0.05, Table S8). With respect to Puleo’s classification, the frequency of desmoplastic and immune classical was higher within Immune_rich subtype (30.94% vs 0.69%, 23.74% vs 2.76%, p < 0.0001, Table S8). On the contrary, we also found a lower frequency of pure basal-like and pure classical subtypes in Immune_rich subtype versus Immune_desert subtype (7.19% vs 18.62%, 10.79% vs 52.41%, p < 0.0001, Table S8). In terms of the integration of Li’s classification, we observed that Immune_rich subtype had a positive tendency to enrich in immune class and a negative correlation with nonimmune class, compared to Immune_desert subtype (68.35% vs 33.1%, 31.65% vs 66.90%, p < 0.0001, Table S8). Moreover, the correlation between TIME subtype and other published molecualr subtypes was quantified by Cramer’s V (Figure 2F-G). Results revealed that TIME subtype had the highest correlation with Puleo’s classification (Cramer’s V value = 0.63) and the lowest relationship with Moffitt’s tumor classification (Cramer’s V value = 0.03), probably owing to the deconvolution algorithm applied on tumor cells by Moffitt et al. Additionally, after integrating the TIME subtype and Puleo’s classification, we found that patients with Immune_rich and immune classical subtypes obtained the best survival, while the patients with Immune_desert and pure basal-like subtype had the worst survival (only one patient with Immune_desert and desmoplastic subtype was excluded) (p < 0.0001, Figure S4), implying that combination of TIME subtype and Puleo’s classification may guide the prognostic prediction of PC.
Recognization of key IRGs and construction of IRS for the prognostic prediction of PC
In order to quantize the distinct characteristic among Immune_desert and Immune_rich subtype, we applied multiple machine-learning algorithms to construct the prognostic signature based on 1612 IRGs. Before proceeding, a filtering procedure was applied to remove genes with low variability and the mean and variance of each gene were standardized to zero and one, respectively. A total of 284 patients in meta-cohort were divided into training set (n=200) and testing set (n=84) at the ratio of 7:3. Robust prognostic IRGs in PC samples were identified using multi-step processes. First, preliminary screening was performed to include 337 prognosis-related IRGs in meta-cohort via univariate Cox regression analysis. Next, bootstrapping method was used to test the genes which passed initial filtering for robustness. We extracted 70% of samples randomly from the training set and performed univariate Cox regression analysis on these samples to assess the correlation between the gene expression and prognosis. This procedure was repeated 1000 times and the 52 IRGs that were incorporated in 90% of resample runs (achieved P < 0.05 in robustness testing) were kept for next step analysis. Then, the RSF analysis was independently repeated 1000 times, and 8 IRGs with the largest C-index were considered IRS, namely SYT12, TNNT1, TRIM46, SMPD3, ANLN, AFF3, CXCL9 and RP1L1 (Figure 3A-B). A risk prediction score model was then developed by these 8 genes using multivariate Cox regression, and the IRS score for each patient was determined by taking the sum of the regression coefficient for each gene multiplied by its corresponding expression value. The IRS score was then normalized from 0 to 1. According to the optimal cutoff value, PC patients were divided into IRS_high and IRS_low subgroup.
To validate the prognostic efficiency of IRS, the survival analysis was performed on the training set, testing set, meta-cohort, ICGC, GEO and TCGA datasets, respectively. In the 6 internal and external datasets, KM curves revealed that the IRS performed well in distinguishing patients with different prognostic statuses (Figure 3C-H). Also, the univariate Cox analysis showed that SYT12, TNNT1, ANLN, CXCL9 and RP1L1 were risk factors with HR > 1, while TRIM46, SMPD3 and AFF3 were protective factors with HR < 1 (Figure S5A), meanwhile, the survival analysis also demonstrated this result (Figure S5B-I). In addition, the receiver operating characteristic (ROC) curve was utilized to verify the prediction ability of the IRS. As shown in Figure 3I-J, the IRS model was confirmed effective in predicting the survival of PC patients in 1 year (training set, AUC = 0.832; testing set, AUC = 0.617), 2 years (training set, AUC = 0.804; testing set, AUC = 0.697) and 3 years (training set, AUC = 0.865; testing set, AUC = 0.834).
Previous studies have established several prognostic signatures for PC patients, including Wang’s signature, Tao’s signature, Dai’s signature, PAMG signature and PurIST signature. ROC analysis was performed to confirm whether IRS possessed superior survival prediction ability in PC compared to the five signatures mentioned above. The AUC of the IRS were higher than those of the other three prognostic models in the training set (Figure 3K). Notably, in the testing set, the predictive efficiency was far from satisfactory in 3-year survival, possibly due to the limited number of patients (Figure 3L). To further compare IRS with PAMG and PurIST classification, the PAMG score and PurIST were computed. Results illustrated that the IRS_high subgroup possessed a lower PAMG score and higher PurIST score than the IRS_low subgroup (Figure 3M-N). Coincidentally, the Sankey diagram (Figure 3O) and distribution plot (Figure 3P) revealed that the percentage of the Basal-like subtype was significantly lower, and the proportion of Classical subtype was higher within IRS_low subgroup versus IRS_high subgroup (8.44% vs 37.69%, 91.56% vs 62.31%, p < 0.001). The above results fully verified the robustness and predictive effectiveness of our IRS.
Analysis and validation of differential expression for IRS
As mentioned above, a total of eight genes were selected to construct the IRS based on machine-learning algorithm. Then, we distinguished the aberrant expression of these IRGs in PC and normal pancreatic samples. As illustrated in Figure 4A, all of these IRGs were upregulated in PC samples. qRT-PCR was also performed to validate the differential expression patterns of IRGs between normal pancreatic cell line (hTERT-HPNE) and 4 PC cell lines (AsPC-1, BxPC-3, PANC-1 and PaTu 8988t) (Figure 4B), results suggested that the expression of these eight genes was higher in all four types of pancreatic cancer cells than in normal pancreatic cells. Owing to the significant upregulation of these hub genes, they may be served as potential targets of PC which suggested further research.
Exploration of IRS-based chemotherapy prediction and potential immunotherapeutic response
As the IRS was established based on prognostic IRGs, we first analyzed the relationship between the IRS score and the infiltration of immune cells. The IRS score was positively correlated with neutrophils, myeloid-derived suppressor cells (MDSCs) and M2 macrophages. On the contrary, CD8+ T cells and CD4+ T cells displayed a negative relationship with the IRS score (Figure 5A-B). Moreover, we suggested that the IRS may predict the sensitivity of chemotherapy by comparing the IC50 of multiple chemical compounds between IRS_high and _low groups. As shown in Figure 5C, patients who gained a lower IRS score tended to be more sensitive to chemotherapy. In terms of the prediction value of IRS in the treatment of ICB, We calculated the IPS, IPS-CTLA4, IPS-PD1 and IPS-PD1-CTLA4 scores, which are quantitative indexes to access the treatment of ICBs, were higher in the IRS_high group (Figure 5D). Furthermore, we compared the distribution of TME and TMB scores in IRS_high and _low subgroups to evaluate whether the IRS could predict the clinical response to ICB therapy. Results exhibited that the TME and TMB scores were higher in the IRS_high subgroup and both had a positive correlation with the IRS score (Figure 5E-F). Since the IRS had a remarkable correlation with the TIME of PC mentioned above, we further determined whether the IRS could predict immunotherapeutic response in PC via SubMap analysis. We evaluated the similarity of the expression module of immune-related gene expression profiles between our cohorts and a cohort of 32 melanoma patients receiving ICB therapy[44]. Results illustrated the similarity between patients in the IRS_low group and patients who responded to anti-PD-1 and anti-CTLA4 immunotherapy (Figure 5G). All these results implied that the IRS_low group may have better feedback in immunotherapy compared to IRS_high group, which needs to be further validated in immunotherapy cohorts of PC.
Single-cell sequencing reveals potential mechanism of TIME regulation by IRS
To further recognize the TIME personalized features of pancreatic cancer, scRNA-seq data from 24 PC samples were utilized to reveal the potential mechanism of IRS-promoted PC progression. 22910 cells were screened after quality checks according to the aforementioned research methods. According to the marker genes extracted from the literature, 9 clusters were determined and then annotated to 9 cell types (Figure 6A-C): malignant, fibroblast, stellate cell, T cell, endothelial cell, macrophage, ductal, B cell and endocrine cell. To unveil the mechanism of the IRS, we evaluated the distribution of IRS scores in 9 cell types. As illustrated in Figure 6D, higher IRS scores were mainly congregated in malignant cells, which could explain the poor prognosis of high IRS PCs. We also checked the expression of IRS in 9 cell clusters. SMPD3 and TNNT1 were significantly expressed by malignant cells, while AFF3 and ANLN were mainly expressed by B cells (Figure 6E). Therefore, we suggested that malignant cells may contribute to the specific TIME in IRS. Cell-cell communication demonstrated that fibroblast had a significant influence on malignant cells via MIF-(CD74+/CD44) interactions, and ductal cells affected malignant cells through SPP1-CD44 interactions (Figure 6F). Meyer-Siegler KL[45] found that blocking MIF-CD74 interactions may provide new targeted specific therapies for androgen-independent prostate cancer. SPP1-CD44 axis was reported to promote the interplay between CAF and enrichment of stemness population in PC[46]. These results demonstrated that fibroblast cells and ductal cells might promote the progression of cancer via MIF-CD74 and SPP1-CD44 axis, respectively. In addition, CTGF-LRP1 interactions between fibroblast cells and malignant cells could also cause the development of cancer (Figure 6G-H). In fact, the expression of HAVCR2 and ITGB2 was higher in macrophage and B cells, while the expression of LGALS9 was upregulated in fibroblast cells and malignant cells (Figure 6I). These results revealed that fibroblast cells might prohibit the activation of immune cells, leading to the “immune dessert” status of PC. Hence, IRS may promote the progression of tumor and suppress the immune system in TIME.
Identification of IRS-related biological processes and drug targets
In order to investigate which biological process plays a critical role in poor prognostic of PC patients who gained high IRS scores, pathifier and GSEA analyses were performed to elucidate the potential mechanisms involved in the regulation of PC progression by IRS. Based on gene expression data from both pancreatic cancer and normal pancreatic samples, pathway deregulation score (PDS) was computed via “Pathifier” R package. The correlation between PDS scores and IRS scores helps to evaluate whether a pathway (biological process) may be responsible for the poor prognosis of patients with high IRS scores. “Apoptosis”, “TNFA signaling via NFKB”, “G2M checkpoint” and “DNA repair” pathways ranked top, which means these three pathways may contribute to the malignant phenotype in patients with high PPS scores (Figure 7A). Next, we performed GSEA analysis to validate the above conclusion. Enrichment score of each gene set was calculated and adjusted P-value less than 0.05 was considered significantly enriched. As expected, genes with positive correlation coefficients were also enriched in those four pathways (Figure 7B). Taken together, the dysregulation of apoptosis and cell cycle-related process might play a vital role in the poor prognosis of high IRS patients.
In high IRS patients, Genes significantly positively correlated with IRS may be potential targets for pancreatic cancer precision therapy. To identify targetable proteins (genes) with potential therapeutic implications in high IRS score PC patients, we conducted Spearman correlation analysis between the protein abundance of targetable genes and PPS. A protein with a correlation coefficient more than 0.3 (with P < 0.05) was considered as a poor prognosis-related drug target. Next, we calculated the IRS score for each PC cell line from the CCLE project, and performed the correlation analysis between the CERES score and PPS score based on these cell lines. A lower CERES score of a gene indicates a higher likelihood that this gene is dependent on a given CCL. Therefore, we considered a gene with a correlation coefficient less than -0.3 (with P < 0.05) as a poor prognosis-dependent drug target. Potential therapeutic drug targets in high IRS score PCs were then considered as targets identified by both analyses above. Finally, 8 potential targets (CCNA2, EPHB4, INCENP, NCF2, PLOD1, PLK1, PANX1 and CCNB1) were screened out (Figure 7C-D) and the correlated target drugs were also identified (Figure 7E)., which meant that targeting these genes may facilitate the treatment of high IRS PCs.
Identification of potential agents for high IRS score PCs
In the past decade, high-throughput sequencing analysis of large samples has greatly advanced the molecular biology of PC. Hence, we try to detect the potential small molecular compounds for high IRS PCs. The information on compounds in the CTRP and PRISM database were selected for subsequent analysis after removing the duplicated compound information in the two databases (excluding hematopoietic and lymphoid tissue-derived CCLs) (Figure 8A).
For drug response prediction, many machine learning (ML) methods have been reported, ranging from multivariate linear regression and support vector machine (SVM) to RF and k-nearest neighbours (KNN). Among ML methods, linear regression methods, such as ridge regression and elastic net, tend to exhibit good and robust performance in different settings[47]. Therefore, ridge regression model located in the “oncoPredict” package, which has been applied to multiple studies and proven to be reliable, was applied to estimate drug response of clinical samples in this study[33]. Before selecting the compounds, we further validated the predicted drug sensitivity (AUC) in our cohort. Selumetinib, a PI3K pathway inhibitor, was reported to improve the prognosis in the treatment of KRAS-mutant patients compared to those without KRAS mutations[48]. Thus, we classified PC patients into KRAS altered and KRAS unaltered subgroups. The AUC of PC patients in the KRAS altered group was significantly decreased (Figure 8B, P = 2.4e-05), which was consistent with the clinical findings of Simertinib above. Finally, 1 compound from the CTRP database (Canertinib) and 6 compounds from the PRISM database (PP-1, YM-976, CHIR-98014, GW-788388, Brigatinib and Vincamine) were obtained following the protocol described in Materials and Methods (Figure 8C).
Although these 7 compounds had lower predictive AUC values in the samples with higher PPS scores and their predictive AUC values were significantly negatively correlated with IRS scores, the above analysis alone could not support the conclusion that these compounds had therapeutic effects on PCs. Hence, CMap analysis was utilized to find the most reliable compounds. Among the 7 candidate compounds identified before, Canertinib and PP-1 showed relatively low CMap scores (Canertinib, -80.94; PP-1, -64.5), indicating its therapeutic potential (Figure 8D, Table S9). To further test the efficiency of these candidates, two PC cell lines (Capan-2 and Panc 08.13) in CTRP and PRISM have been extracted for the following analysis. We first calculated the IRS score of these two cell lines, and the Capan-2 possessed a relatively higher IRS score than Panc 08.13 (Figure 8E, Table S10). Secondly, the AUC value of these candidates between Capan-2 and Panc 08.13 were compared. The results indicated that only the AUC value of Canertinib was significantly lower in the Capan-2 compared to Panc 08.13, implying that Canertinib might be the promising potential treatment compound targeted high IRS score PCs (Figure 8F, Table S10).