Identification of DEGs with both EMT and DDR functions in BRCA
To obtain DEDGs (genes with both DDR and EMT functions), we took an intersection for EMT-related genes and DDR-related genes (Fig. 2A). Meanwhile, 9538 DEGs between BRCA and normal samples were identified. Subsequently, we took an intersection for DEDGs and 9538 DEGs to obtain 23 genes (DEDGs) (Fig. 2B). 16 of them (AURKA, FOXM1, CCNA2, TACC3, E2F1, CDK5, TYMS, PTPA, CDKN2A, FOXK2, TRIM28, YWHAZ, BRCA1, CCND1, PIN1 and SKP2) were up-regulated, while 7 of them (VIM, TP63, CCND2, YWHAG, ATM, HIPK2 and JUNB) were down-regulated in BRCA. The expression levels of 23 DEDGs were shown in Fig. 2C. In addition, we analyzed the protein-protein interactions (PPI) for 23 DEDGs (Fig. 2D), where 0.4 (medium confidence) was set as the interaction scoring criterion to explore the relevance of DEDGs. CCND1, BRCA1, ATM, CDKN2A, CCNA2, FOXM1, AURKA, E2F1, CCND2 and SKP2 were the hub genes based on the PPI network (Fig. 2E).
Tcga-brca Classification Based On The Dedgs
We conducted the consensus clustering analysis to further investigate the relationship between the 23 DEDGs and BRCA subtypes. The highest intragroup correlation was observed at k = 2 (the clustering variable (k) was increased from 2 to 5). Based on the clustering results, 23 DEDGs could divide 1050 TCGA-BRCA patients into two clusters (Fig. 3A). The overall survival rate of patients in cluster1 was lower than in cluster2 (P < 0.01, Fig. 3B). Furthermore, the heatmap indicated a significant difference in 23 DEDGs expression levels, age, and T-stage between the two clusters (P < 0.05) (Fig. 3C).
Establishment Of Dedgs Prognostic Model Based On The Tcga Cohort
To explore the relationship between 23 DEDGs and the BRCA patients’ prognosis, 6 DEDGs, including TP63, YWHAZ, BRCA1, CCND2, YWHAG, and HIPK2, were screened by the univariate cox regression (Fig. 4A). Subsequently, we generated a 6-DEDGs prognosis model using the least absolute shrinkage and selection operator (LASSO) Cox regression (Fig. 4B, C). The risk score was calculated as following: Risk score = (− 0.154 * TP63 exp.) + (0.973 * YWHAZ exp.) + (0.219 * BRCA1 exp.) + (− 0.227 * CCND2 exp.) + (0.416 * YWHAG exp.) + (− 0.471 * HIPK2 exp.). Based on the median risk score, TCGA-BRCA patients were divided into the high-risk group (n = 525) and the low-risk group (n = 525) (Fig. 4D). Principal component analysis (PCA) revealed a significant difference between the two risk groups (Fig. 4E). In addition, survival analysis demonstrated that patients in the high-risk group had poorer survival than the low-risk group (Fig. 4F, G). At last, we constructed the receiver operator characteristic (ROC) curves for 5, 7, and 10 years to evaluate the model’s predictive capacity. The areas under the ROC curve (AUCs) at 5, 7, and 10 years were 0.701, 0.639, and 0.616, respectively (Fig. 4H).
Validation Of The 6-dedgs Signature Prognostic Model
To further validate the reliability of the prognostic model, two independent GEO-BRCA cohorts, GSE20685 (n = 327) and GSE88770 (n = 117), were used to validate the predictive model. Patients of two GEO cohorts were divided into the high- and low-risk groups using the same median risk score in the TCGA cohort (Fig. 5A, F). Principal component analysis (PCA) showed substantial differences between the two risk groups of validation cohorts (Fig. 5B, G). Consistent with the results of the TCGA cohort, in the GEO cohorts, patients in the high-risk group had poorer overall survival than the low-risk group (Fig. 5C, D, H, I). Eventually, the prognostic model based on 6-DEDGs also had the excellent predictive capability in the two validation cohorts. The AUCs of GSE20685 at 5, 7, and 10 years were 0.700, 0.640, and 0.621 (Fig. 5E), and the AUCs of GSE88770 at 5, 7, and 10 years were 0.736, 0.684 and 0.601 (Fig. 5J).
Evaluation Of The Independent Prognostic Value Of The Prognosis Model
To illuminate whether the risk score based on the prognosis model was an independent prognostic factor for BRCA patients, the risk score and clinical characteristics (age, sex, TNM and stage) were incorporated into the univariate Cox regression analysis (Fig. 6A). Subsequently, variables with P < 0.05 (stage, age and risk score) were included in the multivariable Cox regression analysis. As expected, the risk score was identified as an independent prognostic factor affecting the prognosis of BRCA. In addition, Stage and age were also ascertained as independent prognostic factors (Fig. 6B). Heatmap based on the TCGA cohort and clinicopathological information demonstrated the differential expression of DEDGs in the two groups and differences in gender, age, and survival status (P < 0.05) (Fig. 6C). Based on these outcomes, we constructed a clinical prognostic nomogram by combining the risk score with three clinical characteristics to predict patients’ survival rates (Fig. 6D). To confirm the predictive performance of the prognostic nomogram, the prognostic calibration curve was depicted and showed good predictive accuracy for survival at 5, 7, and 10 years (Fig. 6E)
Functional Enrichment Analysis Based On The Prognosis Model
Enrichment analyses of GO and KEGG functions were made to explore the potential biological functions and pathways based on our risk classification. A total of 3732 DEGs were screened between the two risk groups. GO enrichment results showed that DEGs were enriched intensively in various immune-related molecular functions (Fig. 7A-C). KEGG enrichment also revealed that DEGs were considerably enriched in immune-related pathways, including viral protein interaction with cytokine and cytokine receptor, Intestinal immune network for IgA production and primary immunodeficiency (Fig. 7D).
Comparison Of Immune Activity Between Two Risk Groups
To obtain the potential correlation between the risk groups and immune activity, we estimated the difference in immune status between the high- and low-risk groups. Firstly, we calculated the ESTIMATE score, immune score, and stromal score for each BRCA patient based on the ESTIMATE algorithm and visualized them in a heat map (Fig. 8A). The high-risk group had lower immune scores, stromal scores, and ESTIMATE scores than the low-risk group (P < 0.05) (Fig. 8B). Subsequently, we compared the enrichment scores of 16 immune cells and 13 immune-related pathways among different risk groups. The results indicated that the high-risk group had lower levels of immune cell infiltration compared to the low-risk group (P < 0.05), such as B cells, CD8+ T cells, dendritic cells (DCs), tumor-infiltrating lymphocytes (TILs) and helper T cells (T helper cells) (Fig. 8C). Meanwhile, 10 immune-related pathways, including antigen-presenting cell (APC) co-inhibitory and co-stimulatory pathway, chemokine receptor (CCR), checkpoint, cytolytic activity, human leukocyte antigen (HLA), inflammation promotion, parainflammation, T cell co-stimulation, type II interferon (IFN) response, were less active in the high-risk group (Fig. 8D). According to the above results, the poor prognosis in high-risk patients may be attributed to the immunosuppressive tumor microenvironment. The results of two GEO validation sets were presented in Fig S1 and S2 in Supplementary file 7.
Drug Sensitivity Analysis Between Two Risk Groups
The sensitivities of 4 common BRCA chemotherapy agents (Olaparib, Niraparib, Cyclophosphamide and Talazoparib) and 4 other tumor chemotherapy agents (Erlotinib, Crizotinib, Axitinib, and Cisplatin) in two risk groups were analyzed using the R package "pRRophetic". The results showed that the IC50 sensitivity scores for these drugs were significantly higher in the high-risk group (Fig. 9A-H). which meant that BRCA patients in the high-risk group may be insensitive to chemotherapy treatment. The results of two GEO validation sets were shown in Fig S3 and S4 in Supplementary file 7.