PPS20 can predict clinical outcome in pancreatic cancer
To identify individual genes related to prognosis in pancreatic adenocarcinoma we performed cox regression analyses with overall survival as an end-point measure in three discovery datasets: TCGA, PACA-CA and PACA-AU. Based on resulting HR and p values, we ranked the genes in all three datasets (see Methods). Among the top 500 genes of each dataset, 85 genes that were common in at least 2 datasets were retested utilizing the GSE21501 dataset in order to restrict the number of genes as well as to further eliminate cohort specific effects. Thus, we identified 11 and 9 genes that were significantly associated with shorter and longer overall survival, respectively (Additional File 1 Supp. Table 1). A prognostic scoring system was generated based on log expression of these 20 genes in pancreatic cancer primary tumors (see Methods for details), that we designate “Pancreatic cancer prognostic score 20 – PPS20”. As can be seen in Figure 1, PPS20 can stratify the patients into prognostically distinct subgroups with HR (95% CI) and p values of, 2.016 (1.362-3.088) and 0.007, 2.040 (1.498-3.089) and <0.0001, 2.416 (1.696-4.635) and <0.0001, 2.492 (1.551-5.044) and 0.0009 , in TCGA, PACA-CA, GSE21501 and PACA-AU., respectivelly. PPS20 was then validated in GSE62452 and GSE79668 datasets, with HR (95% CI) and p values of 2.099 (1.335-4.241) and 0.0051, 2.438 (1.486-5.106) and 0.0017, respectivelly (Figure 2). A MVA that included patients without residual disease after operation and patients who did not receive targeted molecular therapy using the TCGA cohort, showed that PPS20 can predict overall survival independent of confounding factors (Additional File 2 Supp. Table 2). We then asked if PPS20 could also stratify patients when event-free survival (EFS) is used as an end-point. Indeed in the TCGA cohort, PPS20 is associated with event-free survival with an HR (95% CI) of 2.312 (1.393-3.982), p value of 0.0015 (Figure 3). Overall, we conclude that, PPS20 is a robust independent prognostic signature for pancreatic cancer.
PPS20 compared to other prognostic signatures
We aimed to assess how our scoring performs when compared to previously published prognostic classifiers, Chen et. al.[11], Yan. et. al [16], and Shi et. al. [17]. A multivariate cox regression analysis which included parameters significant by univariate analyses (PPS20, Shi signature and tumor subtype) identified PPS20 as the only independent prognostic factor in GSE71729 (Table 1). Similarly, PPS20 in GSE62452, PPS20 and Shi signature in GSE79668 were the only independent prognostic indicators among the parameters significantly associated with survival in univariate analyses (Additional File 3 Supp. Table 3, Additional File 4 Supp. Table 4). Therefore, PPS20 remains significant in all three analyses performed in separate cohorts; indicating it is a superior and independent prognostic classifier of PDAC.
Molecular characteristics of PPS20-identified PDAC sub-groups
In order to understand the biological mechanisms underlying the differences in outcome in PPS20 identified PDAC sub-groups, we performed GSEA (Gene set enrichment analysis) between the high PPS20 and low PPS20 tumors in TCGA, PACA-CA, PACA-AU and GSE71729 datasets. The gene sets with nominal p values below 0.01 and FDR q-values below 0.25 were considered enriched. Among the enriched gene sets in high PPS20 and low PPS20 groups, those common to all four datasets are listed in (Additional File 5 Supp. Tables 5&6), representative enrichment plots are shown in (Additional File 6 Supp. Figure 1).
Enrichment of digestion and potassium channel related gene sets in low PPS20 group supports a 'normal like' phenotype, as pancreas is a ductal organ where digestive enzymes are secreted. Secretion of insulin from pancreatic beta cells is regulated by ATP-sensitive K(+) (K(ATP)) channel dependent pathways[23]. In addition HCO3- secreted by pancreatic ductal epithelial cells to duodenum neutrilizing chyme acidity is transported by multiple ion exchangers including Na+–K+–Cl– co-transporter (NKCC1) and Na+–K+-pump on the basolateral membrane [24]. Therefore the GSEA results suggest that tumors with a more favorable outcome are more differentiated, compared to those with worse outcome. In this line, we observe that the PPS20 score is lowest in normal pancreas when compared to primary tumors, and most elevated in metastatic tissues (Figure 4). The enrichment of protein activation cascade gene set, which mainly includes complement system proteins, might suggest a relatively higher immune activity in these tumors.
Keratinocyte differentiation, skin development and epidermis development gene sets enriched in the high PPS20 group include many genes belonging to the keratin family, among which KRT16 has been used as a basal cell marker [25, 26]; KRT17 has been shown to induce cancer stem cell-like properties in cervical cancer [27] and tumor growth, motility and invasion in gastric cancer [28], which is also associated with poor prognosis in breast cancer [29]. Formation of primary germ layer and endoderm gene sets were also enriched in tumors with high PPS20. They include many genes related to extracellular matrix, collagens, laminins, integrins, fibronectin which are known mesenchymal markers [30] and matrix metalloproteinases which are involved in tumor growth, invasion and metastasis [31], as well as HMGA2 which is known to maintain oncogenic RAS-induced epithelial-mesenchymal transition (EMT) in pancreatic cancer [32]. These results suggest that high PPS20 tumors have relatively more invasive and mesenchymal properties which is consistent with shorter event-free survival times (Figure 3). The same group has a lower E-cadherin/Fibronectin ratio, in line with the fact that downregulation of E-cadherin and upregulation of Fibronectin are two indicators of EMT [33] (Additional File 7 Supp. Figure 2).
In order to understand the immune involvement in the prognostic sub-groups, we analyzed TCGA PAAD tumor RNAseq data using CIBERSORT Absolute mode which enables us to assess involvement of 22 immune cell types in absolute fraction scores [34]. We found dramatically higher scores of “CD8 T cell – T regulatory cell” difference in low PPS20 tumors which suggests an active anti-tumor response in these patients, which in turn could be an indicator of eligibility for immunotherapy [35]. Significantly higher scores of “Plasma cell-B cell naïve” in high PPS20 tumors (Additional File 8 Supp. Figure 3), and a higher M1-M2 difference in high PPS20 tumors (Additional File 8 Supp. Figure 3) supports the same conclusion. The overall immune cell content of the tumor-absolute score- was elevated in low PPS20 group (Additional File 8 Supp. Figure 3). We observed that the CD8 T cell – Treg difference as well as the absolute score were linearly correlated with PDCD1 gene expression in PDAC (Additional File 9 Supp. Figure 4). PD-1 gene expression is significantly higher in the low PPS20 group, whereas levels of PD-L1 and CTLA-4 are not altered significantly among groups in the TCGA dataset (Additional File 10 Supp. Figure 5), cumulatively suggesting low PPS20 tumors are targets of an autologous anti-tumor response.
PPS20 as a predictor of response to targeted therapy
When we stratified TCGA PDAC patient data by PPS20, we observed that high PPS20 patients who received molecular targeted therapy had significantly better prognosis compared to patients who did not receive the same therapy (Additional File 11 Supp. Figure 6) while no significant difference was observed in the low PPS20 group, or in case of radiation therapy response (Additional File 11 Supp. Figure 6). The specific drug information used in molecular targeted therapy of these patients is not given in TCGA, but according to “National Comprehensive Cancer Network Guidelines Version 3.2019 Pancreatic Adenocarcinoma”; category 1 molecular targeted therapy recommendations for metastatic disease is combination of either erlotinib or paclitaxel with gemcitabine [36], which are EGFR and beta-tubulin inhibitors, respectively. We also noted that EGFR, RAD51, Cyclin B1 protein level expressions are significantly higher in high PPS20 patients (Additional File 7 Supp. Figure 2), indicating a proliferative activity in this group. Overall, these results show that the PPS20 score can be an identifier of response to molecular targeted therapy in PDAC, especially for high PPS20 patients.
Identification of compounds selectively targeting individual risk groups
In order to identify compounds targeting low PPS20 and high PPS20 groups, the score was applied to CCLE pancreatic cancer cell lines. Pearson correlation analyses between the score and AUC of drugs in CTRP database resulted in the discovery of 40 drugs (Additional File 12 Supp. Table 7, most significant 5 drugs are shown for each group). The most effective drug for the high PPS20 group was BIRB-796, which is a p38 MAPK inhibitor (Additional File 13 Supp. Figure 7A). Among the drugs that are effective on low PPS20 group, Ouabain was the most significant which is the inhibitor of the Na+/K+-ATPase (Additional File 13 Supp. Figure 7B). This observation is also in line with our GSEA results showing enrichment for potassium ion transport and potassium channel activity.
Disscussion
In contrast to other cancer types like colon and breast for which multiple molecular tests are available for risk prediction and/or molecular subtyping, such as PAM50 [37], MammaPrint [38], Oncotype Dx Breast [39], Oncotype Dx Colon [40],there are limited number of molecular signatures defined for prognostication of pancreatic cancer in the literature and none are available to guide clinical therapeutic decisions in practice. A major reason for this is the lack of validation of the present signatures in multiple patient cohorts. As the number of publically available high-throughput transcriptomic data has increased over time, it became possible to include higher number of patient samples/cohorts into biomarker discovery methodologies that enabled identification of more robust biomarkers which are not cohort specific. Using this strategy, we previously identified two mRNA based biomarkers, ULBP2 and SEMA5A, for the prognostication of colon cancer [41]; and identified an independent gene panel for prediction of prognosis in both diffuse and intestinal type gastric cancer (unpublished data); which shows that the growing transcriptomic data enables discovery of such biomarkers which could have been missed when less patients studied.
Therefore, in this study, we aimed to enlarge the number of in silico cohorts and used 4 discovery and 2 validation PDAC gene expression datasets, including data from RNA sequencing and multiple microarray platforms. Indeed, eighteen of 20 genes identified in this study were not included in previously published prognostic gene signatures; two, ARNTL2 and SLC20A1, were used in Shi et al.[17], and Haider et al.[10], respectively. Our results show the robustness of PPS20 even when different assay platforms are used. Although tumor stage (I-IV), differentiation status (poor-moderate-well) and clinical characteristics as such as age and gender varied highly among the datasets analyzed, PPS20 could stratify prognostic subgroups independent of clinical confounding parameters in all cohorts. Indeed these results clearly will be clinically more relevant when validated ex vivo in a large patient cohort.
We compared three different prognostic signatures generated for PDAC to PPS20. We applied these predictive signatures as they were described in their respective publications with modifications as described in the methods section. Two of these (Chen et al (Moffit)[11] & Yan et al[16]) were predictors of overall survival. The third signature that was compared to PPS20 was Shi et al’s signature [17] which predicts recurrence free survival. Although the coefficients that were determined from the original study were utilized in determining the cutoff values for different datasets, the expression values of each gene might not be identical to those used in the original publication and this in turn might have caused some genes that would normally be assigned a value of 1 to have a value of 0. Therefore our application of the Shi signature should be considered only an approximation. In summary, although the aforementioned signatures are, in our analyses, not superior to PPS20, they would need to be further validated utilizing identical tumor samples for a conclusive analysis.
The risk groups identified in this study have distinct molecular features. In high PPS20 group, we found increased protein level expression of Cyclin B1, which is a marker of cell proliferation and as well as DNA repair [42] which is consistent with increased RAD51 which is involved in double stranded break repair, tumor progression and resistance to anti-cancer treatments [43]. TRIO, SLC20A1, MAP4K4 and ERRF1 genes which are upregulated in high PPS20 patients, were also shown to be involved in cellular proliferation and/or tumor growth [44-47]. EGFR, which is one of the major drivers of cell proliferation [48], was also higher in the high PPS20 group at the protein level. EGFR expression has been detected in 25-90% of the pancreatic adenocarcinomas in different studies and is associated with stage, metastasis, poor differentiation and survival [49]. Consistently, we find a differential response to molecular targeted therapy, in high PPS20 group. Despite being one of the category 1 recommendations of NCCN for pancreatic adenocarcinoma, a randomized phase 3 clinical trial showed an improvement of median survival in the gemcitabine+erlotinib versus gemcitabine for only 0.33 months (6.24 versus 5.91 months) [50]. Similarly, another large meta-analyses including 1308 patients concluded that the addition of erlotinib to gemcitabine resulted in a mild but clinically meaningful additive effect [51]. When the patients are first stratified by PPS20, we have showed that high risk patients who were treated with targeted therapy have significantly longer overall survival (Log rank HR: 5.104, p value<0.0001) compared to untreated group, with median survivals of 23 and 6.1 months, respectively. Therefore PPS20 stratification can enable a striking improvement of targeted therapy success, from an overall survival benefit of 0.33 months to 16.9 months. Overall, a better prediction of responders to molecular targeted therapy can be achieved when ex vivo validations are performed for PPS20.
E-cadherin/Fibronectin ratio is slightly lower in high risk group, indicating mesenchymal properties, in line with the association of high PPS20 with shorter event-free survival. Consistently, EPS8, one of the genes related to worse prognosis in PPS20 which functions as part of the EGFR pathway also regulates actin cytoskeleton and promotes EMT [52, 53]. Among the genes associated with shorter survival in PPS20, TRIO, LDHA, MAP4K4 and ARNTL2 contribute to cellular motility/invasiveness/tumor aggressiveness and thus can also be contributing to shorter event-free survival [44, 47, 54, 55]. Similarly, LDHA, EPS8, SLC20A1 and ARNTL2 expression are also related to metastasis or shorter survival in pancreatic cancer [10, 17, 56, 57].
Low PPS20 tumors have higher overall immune cell infiltration, CD8 T cell/Treg ratio, Plasma cell/naïve B cell ratio and PD-1 expression, but not PD-L1 and CTLA4 expression when analyzed by CYBERSORT. It is known that PD1 is expressed on activated T cells and [35, 58], and active T cells can be suppressed by their interaction with PD-L1 [59], and this could be the reason why there are more active CD8 T cells in low PPS20 tumors. GSEA results support these findings as an enrichment of complement system proteins might be due to T cells and antigen presenting cells producing these proteins, and the expression and activation of complement components, in turn, promoting T cell activation and maturation [60]. Our GSEA analysis also indicated a “normal like”, more differentiated phenotype of low PPS20 tumors with an enrichment of digestion and ion channel transport related gene sets. Among genes upregulated in this group, CBX7, MIA3 and KANK1 have been shown to have tumor suppressive activities including roles in inhibition of cellular motility/migration and/or proliferation, and induction of cell cycle arrest in various malignancies [61-68]. PITPNA which we found as a good prognostic indicator, is suggested as favorable prognostic marker in pancreatic, endometrial and renal cancers in the Human Protein Atlas (www.proteinatlas.org) [69], and its overexpression was associated with longer survival in PDAC [70]. Human protein atlas also suggests POL3HR and C2ORF42 are favorable prognostic markers in pancreatic cancer.
Via analyzing cell line drug cytotoxicity data we identified a list of compounds which are selectively effective on either low PPS20 or high PPS20 tumors. The most significant associations were with BIRB-796, a p38 MAPK inhibitor for high PPS20 group and quabain which is a Na+/K+ATPase inhibitor for low PPS20 group. Activation of p38 results in proliferation, invasion, and metastasis of pancreatic cancer cells leading to worse prognosis and its suppression prevents the progression of pancreatic cancer [71, 72]. In another study, the activation of proautophagic/apoptotic pathways was shown to result in downregulation in expression and phosphorylation of MAPK14 (p38 MAPK) both in vivo and in vitro [73]. Increased p38-MAPK has been related to chemotherapy resistance in human gastric cancer cells [74]. The compound identified by our analysis has previously been shown to inhibit all 4 types of p38 MAPK isoforms both in vivo and in vitro [75] and has been shown to increase the efficiency of other chemotherapeutic agents in drug resistant models [76]. In esophageal squamous cell carcinoma, overexpression of Na+/K+ ATPase is associated with severity of the disease and is also reported in medulloblastoma, glioblastoma, melanoma, hepatomas, and non-small-cell lung cancer [77], and in breast cancer, it is reported to increase invasion of endocrine resistant cancer cells [78]. Although Qubain causes Na+/K+ATPase to interact with Src and EGFR, and can actıvate ERK1/2, it also results in growth arrest in human breast cancer cells, possible by increasing the expression of p53 and p21 [79]. Its antiproliferative effects have been also shown in prostate cancer cells, and in pig kidney epithelial cells [80-82]. Since our GSEA results showed enrichment of potassium ion transport and potassium channel activity in low PPS20 group, an inhibitor of a Na+/K+ pump can be a potential drug for this group of patients subsequent to in vitro and in vivo validation.