3.1 Identification of differentially expressed genes (DEGs) in BC datasets:
Firstly, three gene expression programs (GSE5364, GSE29044, and GSE42568) from the GEO database were included in our study. We used the website analysis tool GEO2R from GEO database to analyze the transcriptome expression profile data of three datasets to screen DEGs (Log Fold-Change ≥ 1.5 or Log Fold-Change ≤ -1.5 and adj.P < 0.05). Among them, in GSE5364, we obtained 687 DEDs, including 401 upregulated sites and 286 downregulated sites; In GSE29044, we obtained 723 DEDs, including 232 upregulated sites and 491 downregulated sites; In GSE42568, we obtained 2049 DEDs, including 966 upregulated sites and 1083 downregulated sites. The DEGs of the three datasets are shown in Fig. 1A, B, and C by volcanic maps, respectively; The data availability and standardization results are shown in the box plot in Fig. 1D, E, and F. Subsequently, we intersected three sets of DEGs and obtained 177 common differentially expressed genes (Fig. 1G). Finally, we downloaded mitophagy-related sites from the GeneCards database, with the inclusion criteria of reliability score > 1. And by intersecting 177 common differentially expressed genes with all obtained mitophagy-related sites, 70 mitophagy related sites were ultimately obtained (Fig. 1H).
3.2 Functional cluster analysis for DEGs:
In order to understand the potential biological effects and site interactions of DEGs. We used the Metascape database to conduct pathway enrichment and correlation analysis on DEGs. The pathway enrichment analysis in the Metascape database shows that these DEGs are mainly enriched in the following pathways: circulatory system process, negative regulation of cell population proliferation, negative regulation of cell differentiation, cell population proliferation, positive regulation of transferase activity, regulation of epithelial cell proliferation, response to growth factor, positive regulation of locomotion, positive regulation of locomotion, enzyme-linked receptor protein signaling pathway, response to hormone, tube morphogenesis, actin filament-based process from GO terms; and Signaling by Rho GTPases, Signaling by Nuclear Receptors, Signaling by Receptor Tyrosine Kinases, Extracellular matrix organization from Reactome dataset (Fig. 2A, B). Afterwards, we defined the interaction and similarity relationship of DEGs using the Overlap diagram (Fig. 2C, D), and the results showed a significant correlation between the significant sites in the three BC datasets. Finally, we constructed a DEGs interaction network through three dimensions (enrichment pathway level, dataset level, and P-value level) to better understand the potential interaction relationships of DEGs (Fig. 2E, F, G).
3.3 Construction of PPI Network for Mitophagy-related sites:
To further elucidate the regulation and interaction relationships of 70 mitophagy-related sites, we imported the obtained 70 mitophagy-related sites into the STRING database for protein-protein interaction analysis, and visualized the results in Cytoscape (version 3.8.2). The STRING database identified 70 protein products and 376 edges PPI networks (Fig. 3A). After that, we used the MCODE plugin in Cytoscape to identify two important interaction modules in the interaction network: MCODE1 and MCODE2 (Fig. 3B, C). Among them, MCODE1 has obvious interactions, and through the prediction of interaction networks and subnetworks, MCODE1 is believed to play an important role in the development of BC. In addition, we used the cytohubba plugin. The top ten hub genes obtained through five algorithms of MCC, DMNC, MNC, Degree and EPC were shown in Table 1, and the overlapping hub genes in the five algorithms are verified through the Venn diagram (Fig. 3D).
Table 1
Hub genes among the five algorithms in the Cytohubba plugin.
MCC | DMNC | MNC | Degree | EPC |
TPX2 | CDC20 | FN1 | MMP9 | MMP9 |
BIRC5 | CCNB2 | MMP9 | FN1 | FGF2 |
CDK1 | TOP2A | FGF2 | FGF2 | FN1 |
CCNB2 | TTK | IGF1 | IGF1 | FOXM1 |
FOXM1 | MELK | FOXM1 | TPX2 | CXCL12 |
AURKA | CEP55 | BIRC5 | BIRC5 | IGF1 |
CDC20 | PBK | CDK1 | CCNB2 | PBK |
TOP2A | FOXM1 | PPARG | CDK1 | MELK |
TTK | RRM2 | CDC20 | FOXM1 | TK1 |
MELK | CKS2 | CCNB2 | TK1 | BIRC5 |
3.4 Somatic mutation profiles for 70 mitophagy-related sites
We included 70 mitophagy related sites in the study of somatic mutation. The results showed that in the mutation spectrum data of BC, the mutation degree of 70 mitophagy related sites varied from 9.5–0.6% (Fig. 4), which may be a potential factor leading to the dual role of mitophagy in the prognosis of BC.
3.5 Identification of prognostic related sites on survival analysis:
The PPI network shows that the interaction between MCODE1 is significant; And through the prediction of MCODE interaction networks and subnetworks, MCODE1 is believed to play an important role in the development of BC. To verify the impact of MCODE1 sites on the prognosis of BC patients and extract potential prognostic biomarkers, we extracted 12 sites from MCODE1 (CDC20, TPX2, TTK, PBK, TOP2A, TK1, NEK2, FOXM1, CEP55, CDK1, CCNB2, BIRC5), and analyzed the prognostic role of each locus in BC based on the Kaplan Meier Plotter database; The inclusion criteria for effective effects are: confidence interval (CI) excluding 1 and logrank-P < 0.05. The results showed that 12 sites met the inclusion criteria and were considered valuable for further ROC validation (Fig. 5A-L).
Note
TTK is also known as CT96; TPX2 is also known as P100; PBK is also known as CT84; CDK1 is also known as CDC2; FOXM1 is also known as TGT3.
3.6 Confirmation of prognostic sites based on diagnostic and therapeutic efficacy:
For the reason to further confirm the diagnostic and therapeutic efficacy of the 12 sites (CDC20, TPX2, TTK, PBK, TOP2A, TK1, NEK2, FOXM1, CEP55, CDK1, CCNB2, BIRC5) selected in “3.5“ and build a disease diagnosis model based on mitophagy related biomarkers in BC, we use the transcriptome data of BC in the ROC Plotter database, at the same time link gene expression levels with response to therapy, and analyze the prognostic effects of diagnosis and treatment at each site to ultimately determine prognostic biomarkers. The inclusion criteria we adopted were: AUC > 0.5, P value < 0.05. The final analysis results indicate that 9 sites (CDC20, TPX2, PBK, TOP2A, NEK2, FOXM1, CDK1, CCNB2, CEP55) have potential roles in the diagnosis and treatment of BC, and have the value of becoming biomarkers for BC prognosis (Fig. 6A, B, C, D, E, F, G, H, I, J, K, L). Finally, we analyzed the expression of 9 prognostic related sites in control tissue, BC tissue, and BC metastatic tissue, and found that their expression levels in BC tissue were significantly higher than those in control tissue (Fig. 6M, N).
3.7 Consistency Cluster Analysis identified two mitophagy-BC-related subtypes:
Firstly, we analyzed the expression differences between BC and control groups for 70 mitophagy related sites based on the transcriptome gene expression profile of TCGA-BRCA project, and the difference results were shown in the form of heatmap (Fig. 7A). After analyzing the complete body differences, we used Consistency clustering analysis to determine the mitophagy-related clusters of BC samples. After conducting k-means clustering analysis, two subgroups (C1, C2) in the TCGA queue were identified as having different mitophagy gene expression patterns (Fig. 7B, C, D, E). Subsequently, in order to verify the survival status of patients in the C1 and C2 subgroups, we downloaded survival data from the TCGA database for patients in two subgroups and conducted K-M survival analysis on both subgroups. The survival curve showed that the survival level of the C2 subgroup was significantly higher than that of the C1 subgroup (Fig. 7G). Therefore, we define the C1 subgroup as a high-risk group and the C2 subgroup as a low-risk group. Finally, we analyzed the expression of 70 mitophagy-related sites in the high-risk and low-risk groups, and found significant differences in expression between the two groups (Fig. 7F). There was a significant block clustering in the heatmap.
3.8 GSEA analysis between different subgroups:
To further determine the potential functional differences caused by differential expression sites in high-risk groups and low-risk groups, we performed a GSEA analysis based on different pathway sets (GO and KEGG) for the transcriptome expression data of the two subgroups. The analysis results show that in GO terms, the condensed chromosome, the condensed chromosome centromeric region, the meiosis i cell cycle process, the chromosome centromeric region, nuclear chromosome segregation are the five major enrichment pathways for high-risk groups (Fig. 8A); endocrine process, the regulation of osteoblast differentiation, artery development, the positive regulation of osteoblast differentiation, the renal system process are the five major enrichment pathways in the low-risk group (Fig. 8B). In KEGG terms, spliceosome, cell cycle, proteasome, RNA degradation, oocyte meiosis are the five major enrichment pathways of high-risk groups (Fig. 8C); focal adhesion, ecm receptor interaction, complement and coagulation cascades, hypertrophic cardiomyopathy hcm, dilated cardiomyopathy are the five major enrichment pathways in the low-risk group (Fig. 8D).
3.9 Immune infiltration analysis and tumor microenvironment recognition:
We further analyzed the tumor microenvironment and immune invasion level of different subgroups. Firstly, we used the ESTIMATE tool to score the immune levels of the high-risk and low-risk groups. The results showed that the Stromal Score, Immune Score, and ESTIMATE-Score of the low-risk group were higher than those of the high-risk group, and the results were statistically significant (Fig. 9A). Subsequently, we used the CIBERSORT tool to analyze and validate the infiltration levels of 22 immune cells between the high-risk and low-risk groups. The results showed that, the infiltration level of: cells CD4 memory activated, T cells follicular helper, T cells regulatory (Tregs), NK cells resting, Macrophages M0, Macrophages M1, Dendritic cells increased in the high-risk group; The infiltration level of: B cells native, T cells CD4 memory resting, Monocytes, Dendritic cells resting, Mast cells resting increased in the low-risk group; And the results are all statistically significant. The analysis results are presented in stacked bar charts and box plots (Fig. 9B, C). Finally, we also used MCPCounter tool for immune infiltration analysis between high-risk and low-risk groups, and the analysis results showed that, the infiltration level of Monocytic lineage increased in the high-risk group; The infiltration level of: B lineage, Myeloid dendritic cells, Neutrophils, Endothelial cells and Fibroblasts increased in the low-risk group; and the results are statistically significant (Fig. 9D).
3.10 Immunohistochemical (IHC) protein expression validation:
Finally, we obtained IHC images of the 9 prognostic related sites (CDC20, TPX2, PBK, TOP2A, NEK2, FOXM1, CDK1, CCNB2, CEP55) screened in the Human Protein Atlas database, and further investigated their protein expression differences in normal and BC tissues (Fig. 10). The results showed that the protein expression of 9 prognostic related sites in BC tissues was higher than that in normal tissues, which was consistent with the above analysis results of transcriptome data.