TMB calculation and correlation with clinical parameters
Breast cancer genomic and transcriptomic data were obtained from a total of 960 tumor samples. The TMB value of each sample was calculated to evaluate the correlation between TMB and the clinical parameters. The survival rate of the high-TMB group was significantly lower than that of the low-TMB group (P = 6.734e-04) (Figure 1A). As shown in Figures 1B and 1C, TMB was significantly higher in older patients than in younger ones (P = 0.0033); however, TMB was not significantly different in terms of each clinical stage between older and younger patients (P > 0.05). TMB was significantly different between the T1 and T2 groups, the T3 and T4 groups (P < 0.05) (Figure 1D), the N0 and N1 groups, and the N1 and N2 groups (P < 0.05) (Figure 1E). However, TMB was not significantly different between the M0 and M1 groups (P = 0.12) (Figure 1F).
DDR gene set and mutational landscape
DDR gene sets were sorted out from 10 DDR-related signaling pathways, and a total of 193 DDR gene sets were obtained (7). Gene information is shown in Supplementary Table 3. The landscape map of the DDR gene mutations in the high- and low-TMB group was visualized using the “maftools” R package (Figure 2). The top five high-frequency mutant genes in the high-TMB group were TP53 (51%), PRKDC (4%), BRCA2 (4%), BRCA1 (4%), and ATM (3%). The top five high-frequency mutant genes in the low-TMB group were TP53 (18%), ATM (1%), PRKDC (1%), BRCA2 (1%), and CDKN1B (1%). It is important to note that 51% of the samples in the high-TMB group had a TP53 mutant, while only 18% of the samples in the low-TMB group had a TP53 mutant (Figures 2A, B). Overall, the high-TMB group had a higher frequency of gene mutations than the low-TMB group. In the high- and low-TMB groups, the median number of mutations was 51.5% and 19%, respectively. In addition, 90% of the mutations were point mutations (Figures 2C, E). Through the co-occurrence and exclusive analyses of these mutant genes, a total of 72 mutant gene pairs were obtained in the high-TMB group, and only one significant mutant gene pair was obtained in the low-TMB group (Figures 2D, F).
Screening of differentially expressed DDR genes in the high- and low-TMB groups.
A volcano plot of 6,424 differentially expressed genes was obtained between the two TMB groups (Figure 3A). A total of 67 differentially expressed genes were obtained by intersecting the DDR genes. A series of oncogenes, including EXO1, CCNE1, CCNE2, POLR2F, TP53BP1, and PSMA8, was shown to be activated or inactivated (Figure 3B). Information on the 67 differentially expressed genes is listed in Supplementary Table 2. To further understand the molecular function of the differentially expressed genes in breast cancer, we performed Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis and Gene Ontology (GO) enrichment analysis. KEGG analysis revealed that the differentially expressed DDR genes were mainly enriched in terms of the proteasome, mismatch repair, and homologous recombination (Figure 3C). The GO enrichment results revealed that the top five enriched signaling pathways of the 67 differentially expressed genes were the Nuclear factor kappa B (NF-κB) inducing kinase (NIK)/NF-κB signaling pathway, DNA repair pathway, Wnt signaling pathway, proteasome complex pathway, and damaged DNA binding pathway (Figure 3D). Furthermore, Gene Set Enrichment Analysis (GSEA) revealed that in the high-TMB group, the differentially expressed genes were mainly enriched in terms of the cell cycle, DNA replication, proteasome, and oocyte meiosis (Figure 3E).
Screening of prognostic factors by univariate Cox regression analysis
To screen the genes related to prognosis, we performed univariate Cox regression analysis using the 67 differentially expressed DDR genes. Ten prognosis-related genes were selected: MDC1, PARP3, POLR2K, PSMB1, PSMB9, PSMD2, PSMD7, PSMD14, RFC3, and UBE2T. Information on these 10 genes is listed in Supplementary Table 4. We divided the 960 breast cancer samples into high- and low-expression groups according to the median expression of the prognosis-related genes in the samples. We also performed Kaplan-Meier survival analysis using the 10 DDR genes in the high- and low-expression groups. The survival rates of the two groups were analyzed using the log-rank test. There was a significant difference in terms of the overall survival rate between the high- and low-expression groups. The high expression of PARP3 and the low expression of POLR2K, PSMB1, PSMD2, and PSMD14 significantly prolonged the survival time of patients, improving outcomes and reducing recurrence rates (P < 0.05) (Figures 4A). In addition, we explored the expression and related pathways of the 10 genes using the Gene Set Cancer Analysis (GSCALite) database. As shown in Figure 4B, the expression levels of the genes were partially enhanced in breast cancer and lung adenocarcinoma. MDC1, POLR2K, PSMB1, PSMB9, PSMD2, PSMD7, PSMD14, RFC3, and UBE2T were highly activated in the apoptotic and cell cycle pathways and reserved in the Ras/mitogen-activated protein kinase (MAPK) pathway (Figure 4C). To validate the expression of MDC1, PARP3, POLR2K, PSMB1, PSMB9, PSMD2, PSMD7, PSMD14, RFC3, and UBE2T in breast cancer cells, we achieved real-time quantitative polymerase chain reaction (RT-qPCR) to identify the mRNA levels of these 10 genes in T-47D, MCF-7, and MDA-MB-231 human breast cancer cells and MCF-10A mammary epithelial cells (normal cells) (Figure 4D). The expression levels of MDC1, POLR2K, PSMB1, PSMD2, PSMD7, PSMD14, RFC3, and UBE2T were higher in breast cancer cells than in normal cells. In contrast, the expression levels of PARP3 and PSMB9 were lower in breast cancer cells than in normal cells. These results are consistent with those of the survival analysis, indicating the suitability of the prediction model based on the prognosis-related genes.
Analysis and evaluation of the risk diagnostic model for breast cancer
We implemented univariate Cox regression analysis to screen 10 prognosis-related genes (Figure 5A). Then, we used the least absolute shrinkage and selection operator (LASSO)-Cox regression model to reduce the dimensionality. The parameter lambda.min was selected as the critical point for the linear risk assessment model composed of seven genes (MDC1, PARP3, PSMB1, PSMB9, PSMD2, PSMD7, and PSMD14) (Figures 5B, C). The gene descriptions and biological processes are shown in Supplementary Table 5. The median risk score of all patients was used as the cutoff value. The patients were divided into the high-risk (n = 480) and low-risk (n = 480) groups using the risk diagnostic model to calculate the risk score of each patient in the training cohort. By analyzing the time-dependent receiver operating characteristic (ROC) curves, we discovered that the model exhibited a clinical significance in terms of the 5-year and 10-year survival rates in patients with breast cancer (area under the curve [AUC] = 0.632 and 0.645, respectively), indicating the good predictive ability of the model in breast cancer (Figure 5D). Additionally, Kaplan-Meier analysis displayed that the overall survival rate of patients in the high-risk group was significantly lower than that of patients in the low-risk group (P = 1.708e-04), indicating the suitability of the risk diagnostic model for predicting the prognosis of patients with breast cancer (Figure 5F). After successfully establishing the risk diagnostic model, we used the GSE26085 dataset as the validation cohort to analyze the overall survival rates and ROC curves. We found that the overall survival rate of patients in the high-risk group was significantly lower than that of patients in the low-risk group (P = 1.694e-02). We also found that the model exhibited a clinical significance in terms of the 5-year and 10-year survival rates (area under the curve [AUC] = 0.641 and 0.647, respectively) in the validation cohort (Figures 5E, G). Subsequently, we explored the protein levels of MDC1, PARP3, PSMB1, PSMB9, PSMD2, PSMD7, and PSMD14 in normal breast tissues and breast cancer tissues. As shown in Figure 5H, the protein levels of MDC1, PSMB1, PSMB9, PSMD2, PSMD7, and PSMD14 were increased, while that of PARP3 was decreased in breast cancer tissues compared with normal tissues.
Tumor-infiltrating immune cells in the risk diagnostic model
Then, we investigated 22 tumor-infiltrating immune cell subtypes in the high- and low-risk groups in the breast cancer training cohort. Of the subtypes, 11 varied significantly between the high- and low-risk groups. Furthermore, the CD8+ T cells, activated NK cells, M0 macrophages, M2 macrophages, resting dendritic cells, and resting mast cells showed significant differences in terms of expression between the high- and low-risk groups (P < 0.0001) (Figure 6A). Furthermore, we investigated the effects of the seven DDR genes on immune cell infiltration in the training cohort using the TIMER database (https://cistrome.shinyapps.io/timer/). Different types of somatic copy number alterations (SCNAs), including those with two copies missing, one copy missing, normal copy number, or multiple copy numbers, in the seven genes were shown to significantly regulate immune cell infiltration in the breast cancer microenvironment (Figure 6B-H). Only the abundance of CD8+ T cells showed significant differences among all seven genes, indicating that CD8+ T cells may be a potential biomarker for distinguishing patients with favorable responses to immunotherapy based on our risk diagnostic model.
Functional identification of risk diagnostic model in breast cancer
We first collected 6 paired breast cancer tissues and adjacent normal tissues to verify the mRNA expression of these 7 genes in vivo. Consistent with the IHC results (Figure 5H), the expression of MDC1, PSMB1, PSMD2, PSMD7 and PSMD14 were significant augmented in breast cancer patients (P < 0.05). In contrast, the expression of PARP3 were significant declined in breast cancer patients (P < 0.05) (Figure 7A). After that, to test the function of risk diagnostic model in breast cancer, we chose MDC1 as candidate gene to detect cell proliferation, cell growth, cell invasion in breast cancer cells. We achieved a growth assay in MDA-MB-231 harboring loss of function of MDC1. Figure 7B indicated that MDC1 depletion significant decreased the proliferation rate (P < 0.05). The results of EdU assays further shown that MDC1 depletion could inhibit cell proliferation (Figure 7C). We next assessed the invasive capabilities by the transwell invasion assays and found that MDC1 knockdown led to significant decline in the invasive capacity of MDA-MB-231 cells (P < 0.05) (Figure 7D). To further investigate the inhibition of invasive capacity of MDA-MB-231 cells with MDC1 depletion, we detected the epithelial markers and mesenchymal markers expression in breast cancer cells. Results showed that MDC1 depletion augmented E-cadherin, α-catenin and γ-catenin at both the mRNA and protein levels, whereas the mRNA and protein levels for mesenchymal markers, including N-cadherin, vimentin and fibronectin were decreased (Figure 7E, F). These results suggested that MDC1 in risk diagnostic model could promote the proliferation, EMT, and invasion potential of breast cancer cells.