Identification of Differentially Expressed ERGs
RNA-seq and clinical follow-up data from TCGA-UCEC datasets was downloaded, which contains 543 cancer samples and 35 normal samples. 1184 EMT-related genes (ERGs) are acquired from Human EMT Database. Then, the transcriptional expression of these 1184 ERGs in TCGA-UCEC RNA-seq data was extracted and compared between normal samples and UCEC samples through “limma” package in R software (∣logFC∣ > 1, FDR < 0.05). The results indicated that there are 236 genes significantly upregulated and 169 genes significantly downregulated in UCEC (Figures 1(a) and 1(b)).
Biological functions pathways analysis
Biological functions pathways analysis was executed of these 405 differentially expressed ERGs. Gene ontology (GO) enrichment terms are shown in Figure 2. The result of GO enrichment shows that the screened ERGs positively regulate the DNA-binding transcription activation in UCEC. KEGG pathway enrichment of these genes are shown in Figure 2b. These results indicate that the differentially expressed ERGs can promote cancer-related pathways, including Focal adhesion, EGFR tyrosine kinase inhibitor pathways, and may lead to transcriptional deregulation in cancer.
Screened for the ERGs with significant prognosis
The differentially expressed EMT-related genes (ERGs) in mRNA expression data of UCEC cohort were identified by the “limma” package in R software (FDR < 0.05, ∣logFC∣> 1). After combining clinical data and mRNA expression data of ERGs, univariate Cox regression analyses were used to screen the differentially expressed ERGs with remarkable prognostic value(p<0.05). And 101 genes were screened with significant prognostic value in UCEC (Fig 3).
Establishment and validation of the prognostic EMT‐related gene signature
In order to identify the EMT associated genes, a machine learning algorithm, LASSO algorithms, was performed and eventually 9 genes (EPHB2, TUFT1, CDKN2A, ONECUT2, RBP2, KLF8, E2F1, SIX1, ERBB2) were identified. We utilized 10,000-fold cross validation to select penalty value, lambda(λ). In our study, the optimal computed lambda ranged from 0.0413-0.0809(Fig4a). We set the λ at 0.0809 as it can generate a stricter penalty model which contain much less variates. And at last, a 9-gene LASSO regression model was constructed (Fig4b). The AUC of the constructed LASSO regression model was 0.78 when overall 5-year survival was applied and 0.78 when 2-year survival was applied. These 9 genes were selected for the next step analysis. Multivariate Cox regression analysis performed by “survival” R package was subsequently used to construct a predictive signature. The risk score of the prognostic signature was calculated and EC patients were divided into high-risk group and low-risk group. We verified this formula with overall survival data in both TCGA-UCEC cohort and GEO dataset (GSE21882). And we found that this 9-gene signature was significantly associated with OS (Figure 5a). In GSE21882, as only survival status after 5 years could be acquired, we divided the patients into non-survivors and survivors and the RS were compared between these two groups. Results showed that RS was significant higher in the non-survivors group (Fig5b). In the TCGA-UCEC dataset, our analysis results showed that there was a significant statistical difference in the DFS between the low-risk and high-risk group (Figure5c).
Recently, the endometrial cancer molecular classification was introduced by TCGA and promoted by NCCN which initiated a transition toward molecular-based classification with significant prognostic value and a potential impact on the treatment of EC. There are 4 molecular subgroups: p53-abnormal (p53abn), POLE-ultramutated (POLEmut), mismatch repair–deficient (MMRd), and no specific molecular profile (NSMP). In our study, we explored the above molecular profiling of the high- and low-risk group. In our study, there were higher copy number variations of BRAF, KRAS, PIK3CA, PTEN and TP53 in the high-risk group EC patients by the means of Chi-square test. Also, mutations rate of P53, PTEN and KRAS were higher in the high-risk group EC patients (Fig5d). Coordinated to the TCGA pathology and molecular analysis, high risk group EC patients had higher P53 mutation rate and poorer prognosis,
The clinical pathological characteristics were compared between the two groups, including patients’ age, BMI, pre-menopause/ menopause status, number of pregnancies, different pathological grades, clinical-pathological staging, clinical outcome after the initial treatment, metastasis/recurrence status. The cut off p-value was set as 0.05. And the result showed that patients in high risk group have a higher percentage in menopause status, pregnancy no less than 2 times, G3 pathological differentiation, advanced clinical-pathological staging, and recurrence or metastasis status (Figure6).
Nomogram construction and validation
After univariate and multivariate survival Cox analysis, 4 significant prognostic parameters were enrolled in the construction of nomogram (p<0.1 in multivariate cox regression), detailed result was showed in table1. Respective points of age at diagnosis, differentiation grade, FIGO stage and risk score can be acquired through the point scale axis (Figure 7a). The calculated C-index was 0.796(95%CI: 0.700-0.893). Calibration cures were presented in Figure7b, which showed that the predictive probability was in good accordance with the actual survival proportion.
Immune microenvironment analysis
Based on the immune and stromal score which were acquired through ESTIMATE, we analyzed their relation with the prognosis data in TCGA-UCEC cohort. Patients in TCGA cohort were classified into high tumor purity and low tumor purity, significant prognostic difference of 5-year overall survival were identified, with p=0.012(Fig8a). Wilcoxon rank sum test was used to compared the immune score, stromal score and tumor purity between the high risk and low risk group in the TCGA-UCEC cohort. All p values were lower than 0.05, demonstrating a significant difference in immune score, stromal score and tumor purity between the two groups (Fig8b). CTLA4 was known to negatively regulate T cell function through relatively unique and potentially non-overlapping molecular mechanisms[14].Based on the risk score acquired through TCGA-UCEC cohort, CTLA4 was significantly up regulated in the high-risk score group(Fig8c). The proportions of 22 tumor-immune cells were displayed in Fig8d, and macrophage M0 and macrophage M2 were the top 2 components in EC tumor immune microenvironment. There were significantly difference between the high-risk and low-risk group regrading to the immune cells proportion, and the result were show in Figure8e. And higher proportion of Macrophage M1 and lower Macrophage M2 were found in high risk group, which indicate a potential benefice from the immune check point blockades therapy,
Immune checkpoint blockades therapy and drug sensitivity prediction
Nowadays immune check point blockades targeting CTLA-4 and PD-1 has emerged as a promising strategy in various malignant tumor [10]. Clinical response to immune check point blockades were estimated, and results show that T cells dysfunction was lower in the high-risk group, which indicate a more promising response to immune checkpoint blockades targeting CTLA-4 and PD-1(Fig9a), To obtain a better comprehensive analysis of chemotherapy response, drug sensitivity data were acquire from the GDSC database and the IC50 were compared. Pearson correlation between the risk score and IC50 of different chemical components were conducted, and 8 chemical components were identified for a negative correlation with the risk score (r<-0.3, p<0.05). Results were displayed in Fig9b. Among the 8 components, A.443654 is a selective Akt inhibitor. ABT.263 and BI 2536 both can induce apoptosis. Lower IC50 of these drugs indicates that high-risk EC patients were more sensitive to the treatment compared to the low-risk EC patients.