Differentially expressed genes related to drug resistance and mitochondrial energy metabolism
In this study, a total of 1118 BRCA samples from TCGA and 477 BRCA samples from the GEO database (GSE42568, GSE86374, GSE10886) were analyzed. The samples were divided into Tumor group and Normal group. To evaluate the differential gene expression between the Tumor and Normal groups, we utilized the R package DESeq2 for differential analysis. Within the dataset, our analysis uncovered 3492 DEGs, among which 2125 were upregulated and 1367 downregulated, in adherence to the criteria of an absolute log fold change (|logFC|) greater than 0.5 and an adjusted p-value (padj) less than 0.05. Subsequently, volcano maps were constructed to visually represent the outcomes of the differential expression analysis. To enhance clarity and facilitate reference, the key genes identified during the subsequent screening process were annotated on the volcano plot (Fig. 1A). Intersect the DEGs with DRGs and MRGs to obtain the DMRDESGs. 15 genes were identified, and a Venn diagram was generated (Fig. 1B). These 15 genes are: ATP7B, SIRT6, FUS, UCP2, AIFM1, PFKL, IL1B, PTEN, IRS1, PKD2, PTGS2, ALDH1A3, FOXO1, PFKFB3, PPARG. A heatmap was generated to compare the expression levels of DMRDEGs among different groups within the TCGA-BRCA dataset and the Combined dataset (Fig. 1C-D). A correlation heatmap was then drawn to depict the relationships among the DMRDEGs within the TCGA-BRCA dataset (Fig. 1E). Finally, the location of 15 DMRDEGs on the human chromosome was analyzed, and the chromosome localization map was drawn (Fig. 1F). The chromosome mapping showed that more DMRDEGs were located in chromosomes 2, 10 and 13, respectively: IL1B, IRS1 in chromosome 2, PFKFB3, PTEN in chromosome 10, FOXO1 and ATP7B in chromosome 13.
Mutation analysis and copy number variation analysis
To investigate the somatic mutation profiles of 15 DMRDEGs (ATP7B, SIRT6, FUS, UCP2, AIFM1, PFKL, IL1B, PTEN, IRS1, PKD2, PTGS2, ALDH1A3, FOXO1, PFKFB3, PPARG) within the TCGA-BRCA dataset, a mutation analysis was conducted on breast cancer patient samples, and the results were compiled and analyzed. The results indicate that the primary somatic mutations in TCGA-BRCA were: Missense Mutation, Nonsense Mutation, and Frame Shift Del, along with splice site mutations, Frame Shift Ins mutations, In Frame Del mutations, etc. Missense mutations are the most common among them. Moreover, the mutation types of DMRDEGs in BRCA patients primarily include single nucleotide polymorphisms (SNPs), as well as some deletions (DELs) and insertions (INS). C>T is the most common single nucleotide variant (SNV) in breast cancer patients, followed by C>G and C>A(Fig. 2A).All 15 DMRDEGs in TCGA-BRCA exhibited somatic mutations. Out of the 991 somatic mutation samples, 96 samples were implicated in the mutation, representing 9.69% of the total. Among them, the PTEN gene has the highest mutation rate, representing 5% of all mutation samples(Fig. 2B). Visualize the mutation status of TCGA-BRCA using stacked bar charts(Fig. 2C).Among the 15 DMRDEGs, there is a correlation between FOXO1 and PPARG, PFKFB3 and PPARG, as well as PFKFB3 and FUS(p-value < 0.05)(Fig. 2D).Based on the KM curves of 15 DMRDEGs in TCGA-BRCA, it can be inferred that samples with mutations have a worse DSS prognosis(Fig. 2E).
GISTIC2.0 analysis was utilized to examine the CNV in TCGA-BRCA, where 15 DMRDEGs all show CNVs . The values range from greatest GAIN to greatest LOSS: PTGS2, FUS, PFKFB3, PFKL, UCP2, PPARG, ALDH1A3, AIFM1, SIRT6, PKD2, IL1B, FOXO1, ATP7B, PTEN, IRS1, respectively(Fig. 2F).
GO and KEGG analysis
Through enrichment analysis of GO and KEGG, we delve deeper into the connection between BP, CC, MF, and KEGG pathway of 15 DMRDEGs and breast cancer. Then, a combined logFC GO and KEGG enrichment analysis was conducted, which was based on the GO and KEGG enrichment analysis. By providing the logFC values of DMRDEGs in the differential analysis results, the z-score corresponding to each GO and KEGG entry was calculated. The specific results are shown in Table S3. The results showed that these 15 DMRDEGs were mainly enriched in BP such as negative regulation of transport,regulation of hormone secretion,negative regulation of protein transport,regulation of protein secretion,negative regulation of establishment of protein localization in breast cancer; in CC, such as caveola, plasma membrane raft,Schmidt-Lanterman incisure,compact myelin,cytoplasmic side of membrane; in MF, such as NAD+ binding, carbohydrate kinase activity, alpha-actinin binding, actinin binding, NAD binding. Additionally, they were enriched in pathways such as AMPK signaling pathway, Central carbon metabolism in cancer, Longevity regulating pathway,Insulin resistance, and FoxO signaling pathway. The results of the GO and KEGG enrichment analysis were visualized through bar charts and bubble charts (Fig. 3A-B), and the GO and KEGG enrichment analysis results of the joint logFC were displayed through chord and circle plots (Fig. 3C-D).
Consensus Cluster Analysis of DMRDEGs
To ascertain the association between breast cancer subtypes and DMRDEGs, we conducted a consensus cluster analysis using 15 genes. As a result, two distinct BRCA subtypes were identified: Subtype 1 (Cluster 1) and Subtype 2 (Cluster 2) (Fig. 4A-C). Among them, subtype 1 (Cluster1) contained 865 samples and subtype 2 (Cluster2) contained 253 samples. To further verify the expression differences of DMRDEGs in BRCA disease subtypes, The boxplot (Fig. 4D) was used to display the expression levels of DMRDEGs and the differences between the two BRCA subtypes. Group comparison plots indicated that the expression levels of SIRT6, FUS, PTEN, and PFKFB3 were significantly different between the disease subtypes (p-value < 0.001), and the expression levels of PKD2 were significantly different between the disease subtypes (p-value < 0.01). ATP7B, PTEN, and PFKFB3 exhibited significant differences between the disease subtypes (p-value < 0.001). The differences in the expression of IL1B, IRS1, and FOXO1 among the disease subtypes were statistically significant (p-value < 0.05).
GSEA enrichment analysis
To assess the impact of gene expression levels on breast cancer subtypes in TCGA-BRCA, we initially compared differences among disease subtypes in breast cancer samples and acquired logFC values for each gene. By using GSEA, the expression of all genes in TCGA-BRCA was examined, including BP, CC, and MF. The detailed findings are shown in Table S4. The results showed that all genes in TCGA-BRCA were significantly enriched in NIKOLSKY BREAST CANCER 7Q21 Q22 AMPLICON (Fig. 5A), DACOSTA UV RESPONSE VIA ERCC3 DN (Fig. 5B), WP GPCRS CLASS A RHODOPSINLIKE (Fig. 5C), NIKOLSKY BREAST CANCER 8Q23 Q24 AMPLICON (Fig. 5D) and other biologically related functions and signaling pathways. Then, the above pathways were visualized by drawing a mountain map (Fig. 5E).The findings indicated that enrichment was associated with immune-related functions and pathways.
GSVA enrichment analysis
To explore the differences in the h.all.v2023.2. Hs.symbols.gmt gene set among breast cancer subtypes, GSVA was conducted on all genes in the TCGA-BRCA dataset. Refer to Table S5 for detailed information. The top 10 logFC positive pathways with padj < 0.05 and the top 10 logFC negative pathways with PADJ < 0.05 were screened respectively, and the enrichment scores of each sample (Fig. 6A) were shown. The boxplots of Combined Datasets for group comparison (Fig. 6B) were also displayed. The results of GSVA showed that the top 10 pathways with positive logFC values were: HALLMARK E2F TARGETS, HALLMARK G2M CHECKPOINT, HALLMARK IL6 JAK STAT3 SIGNALING, HALLMARK MYC TARGETS V2, HALLMARK SPERMATOGENESIS, HALLMARK ALLOGRAFT REJECTION, HALLMARK MYC TARGETS V1, HALLMARK TNFA SIGNALING VIA NFKB, HALLMARK MTORC1 SIGNALING, HALLMARK NOTCH SIGNALING; The top 10 pathways with negative logFC values are: HALLMARK TGF BETA SIGNALING, HALLMARK FATTY ACID METABOLISM, HALLMARK HEME METABOLISM, HALLMARK ESTROGEN RESPONSE LATE, HALLMARK PANCREAS BETA CELLS, HALLMARK PEROXISOME, HALLMARK PROTEIN SECRETION, HALLMARK UV RESPONSE DN, HALLMARK BILE ACID METABOLISM, HALLMARK ESTROGEN RESPONSE EARLY.
CIBERSORT immune infiltration analysis
The CIBERSORT algorithm was used to calculate the correlation between 22 immune cells and breast cancer subtypes 1 (Cluster1) and 2 (Cluster2). According to the results of immune infiltration analysis, the superimposed bar graph of immune cells in TCGA-BRCA was drawn (Fig. 7A). Then, the expression difference of immune cell infiltration abundance in subtype 1 (Cluster1) and subtype 2 (Cluster2) of TCGA-BRCA was shown by grouping comparison boxplot (Fig. 7B). The results showed that B cells naive, Plasma cells, T cells CD4 memory resting, T cells CD4 memory activated, T cells follicular helper, T cells regulatory (Tregs), Monocytes, Macrophages M0, Macrophages M1, Macrophages M2, Dendritic cells resting, Dendritic cells activated, Mast cells resting, Mast cells activated, The abundance of Neutrophils in subtype 1 (Cluster1) and subtype 2 (Cluster2) in TCGA-BRCA showed statistically significant differences (p-value < 0.05). Then, the correlation results of immune cell infiltration abundance were shown by correlation heat map (Fig. 7C). The results showed that T cells CD8 and T cells regulatory (Tregs) showed the greatest positive correlation (cor = 0.385, p-value < 0.001). There was the greatest negative correlation between NK cells resting and NK cells activated (cor = -0.730, p-value < 0.001). The correlation between differentially expressed genes related to drug resistance and mitochondrial energy metabolism (DMRDEGs) and the abundance of immune cell infiltration in TCGA-BRCA was shown by correlation dot plot (Fig. 7D). The results showed that IL1B showed the greatest positive correlation with Mast cells activated (cor = 0.502, p-value < 0.001), and ATP7B showed the greatest negative correlation with Macrophages M1 (cor = -0.318, p-value < 0.001), and the relationship between genes and the abundance of immune cell infiltration was shown by correlation scatter plot (Fig. 7E-F).
Analysis of immunotherapy responses
The ESTIMATE package leverages the distinct characteristics of cancer sample transcriptome to estimate the composition of tumor cells and various infiltrating normal cells. It primarily computes the immune and stromal scores based on RNA seq data of the sample, and subsequently assesses the tumor purity. The expression profile data of different disease subtypes in TCGA-BRCA were analyzed with the R package ESTIMATE, leading to varied immune and matrix scores, Stromal Score, Immune Score, ESTIMATE Score, and Tumor Purity were obtained for the different disease subtypes of TCGA-BRCA samples. Display the scoring results of Stromal Score, Immune Score, ESTIMATE Score, and Tumor Purity through a group comparison chart(Fig. 8A-D). The results indicated that Stromal Score and Immune Score show significant statistical differences among various disease subtypes(p-value < 0.05). However, There is no significant statistical difference between ESTIMATE Score and Tumor Purity in different disease subtypes(p-value > 0.05).
IPS, TMB, and TIDE analyze
To analyze how well breast cancer subtypes predict immunotherapy, TCGA-BRCA-associated IPS were downloaded from the TCIA database, The R package ggplot2 was used to draw the group comparison map of different IPS in breast cancer among different disease subtypes (Fig. 9A-D). The results showed that CTLA4(-)PD1(+), CTLA4(+)PD1(-) of breast cancer subtypes in IPS class, CTLA4(+) pd1 (-), CTLA4(+) pd1 (-), CTLA4(+)PD1(+) showed a highly statistically significant difference (p-value < 0.001).
The difference of tumor mutation burden (TMB) score in different disease subtypes of breast cancer was analyzed and the group comparison diagram was drawn (Fig. 9E). The results showed that the TMB score in breast cancer was highly statistically significant among different disease subtypes (p-value < 0.001). Then, the sensitivity of breast cancer patients to immunotherapy was evaluated by the TIDE algorithm, and the specific analysis results are shown in the group comparison plot (Fig. 9F). The results showed that there were no significant differences in TIDE immunotherapy scores between breast cancer subtypes (p-value > 0.05).
Drug sensitivity analysis
To explore therapeutic strategies suitable for mRNA vaccination in breast cancer patients, drug sensitivity data from the GDSC database were used as a training set to predict the sensitivity of breast cancer patients to common anticancer drugs. Subsequently, the sensitivity difference of two subtypes of breast cancer to different anticancer drugs was evaluated. Here we list the top 20 drugs showing significant variations among different disease subtypes: MK.2206,Lapatinib,AZD8055,WO2009093972,GDC0941,Temsirolimus,EHT.1864,GW.441756,CCT007093,FH535,PF.4708671,PD.0332991,Elesclomol,AKT.inhibitor.VIII,Pazopanib,IPA.3,Axitinib,Metformin,NVP.BEZ235,AMG.706 (Fig. 10A-T).These 20 drugs show statistically significant differences between two subtype groups(p-value < 0.001).
Protein interaction network
A protein-protein interaction Network (PPI Network) of 15 drug resistance and mitochondrial energy metabolism-related differential genes (DMRDEGs) was constructed using STRING database (Fig. 11A). The results of protein-protein interaction Network (PPI Network) showed that 14 of the 15 drug resistance and mitochondrial energy metabolism-related differential genes (DMRDEGs) had interaction relationships, which were: ATP7B, SIRT6, FUS, UCP2, AIFM1, PFKL, IL1B, PTEN, IRS1, PTGS2, ALDH1A3, FOXO1, PFKFB3, PPARG. Through GeneMANIA database, the genes associated with 15 drug resistance and mitochondrial energy metabolism-related differential genes (DMRDEGs) were obtained, which were: ELF5, HTR1B, KLF15, KMT5A, BAK1, GSTA2, PTGS1, SLC25A1, RELA, ATOX1, ATP7A, HAX1, TNNI3, BCL1L11, FABP4, NR2F2, NOXA1, TRIM63, AKT1, PFKFB2 (Fig. 11B).
LASSO regression analysis and screening and validation of key genes
Using clinical data from TCGA-BRCA along with DMRDEGs for LASSO regression analysis, we will build a LASSO regression model and develop a prognostic risk model. Visualize LASSO variable trajectory maps (Fig. 12A) and LASSO regression model maps (Fig. 12B). The results indicated that the LASSO regression model included four genes: ATP7B, FUS, AIFM1, and PPARG, which are considered key genes. The risk score calculation formula for each sample is as follows:
Risk Score = (-0.04787) * ATP7B + (-0.1344) * FUS + 0.2892 * AIFM1 + (-0.02903) * PPARG
The sample is then divided into high and low-risk groups based on the median Risk Score. Next, comparing the expression levels of key genes between TCGA-BRCA and Combined Dataset by high and low risk grouping(Fig. 12C-D). The results show that nearly all key genes display significant expression differences in TCGA-BRCA and Combined Dataset(p-value < 0.001). Among them, ATP7B, FUS, and PPARG show higher expression in low-risk samples, whereas AIFM1 exhibits higher expression in high-risk samples. This suggests that AIFM1 could be a prognostic risk factor in breast cancer, while ATP7B, FUS, and PPARG may serve as protective factors(Fig. 12E). Finally, draw a risk factor map for the LASSO regression results of TCGA-BRCA(Fig. 12F). The results indicate that the LASSO model for breast cancer exhibits high accuracy at one year(AUC > 0.7) but lower accuracy at three and five years(AUC > 0.5).
Survival analysis
Grouping TCGA-BRCA according to the expression levels and Risk Score of four key genes, and plotting the survival prognosis KM curve(Fig. 13A-E). The results showed a significant difference in survival prognosis between the high and low-risk score groups (p-value < 0.001), with the low-risk score group demonstrating a better prognosis than the high-risk score group. There was a highly significant difference in survival prognosis between the high and low expression groups of AIFM1 (p-value<0.01), with the low expression group having a better prognosis compared to the high expression group; There was a significant contrast in survival prognosis between the high and low expression groups of ATP7B (p-value<0.05), with the high expression group showing better prognosis than the low expression group.
Clinical Prognosis and its Corresponding Factors
To assess the correlation between risk score and patient prognosis, we evaluated the relationship between risk score and clinical pathological features predictive of outcome. Evaluate the influence of high or low Risk Score on clinical staging, T staging, M staging, ER/PR immunohistochemistry, and other tumor pathological features on prognosis. As we are well aware, there is a significant difference in the Risk Score under these clinical pathological feature groups (p-value<0.05) (Fig.S1). Visualize the relationships among various clinical pathological characteristics through Sankey diagrams(Fig. 14A). Subsequently, univariate Cox regression analysis was conducted between these clinical pathological characteristics and four key genes, with the results plotted in a forest plot (Fig. 14B). Factors with p < 0.05 were then included in a multivariate Cox regression analysis. Due to the presence of zero and infinite values in the multi-factor HR for the M stage, this clinical pathological characteristic was also excluded. Therefore, the factors included in the multivariate Cox regression include: Pathlogic stage, ER immunohistochemistry, PR immunohistochemistry, FUS gene, and AIFM1 gene. Draw a column chart to show the contribution of these factors to the prognostic model based on the results of multiple factor regression (Fig. 14C). Subsequently, prognostic calibration curves were plotted to assess the accuracy and precision of the Cox regression model (Fig. 14D). Additionally, decision curve analysis (DCA) was conducted to determine the model's prognostic utility (Fig. 14E). The results showed that the 1-year, 3-year, and 5-year prognostic calibration curves were close to the diagonal of the ideal model, indicating a very high accuracy of the model. In decision curve analysis (DCA), a model demonstrates superior performance when its line consistently surpasses both the all-positive and all-negative thresholds within a given range. The larger this range, the greater the net benefit achieved by the model. The results show that the line of the 1-year, 3-year, and 5-year models is consistently higher than that of All positive and All negative models within a certain range, and the model has a higher net profit, indicating better performance. The model reveals that the AIFM1 gene is the most significant contributor, indicating poor prognosis for samples with high AIFM1 expression (HR>1). Finally, a baseline data table was drawn to examine the relationship between the high and low expression of the AIFM1 gene and clinical variables, as shown in Table S6.
AIFM1 knockdown inhibited cellular proliferation, migration, and invasion in breast cancer within the HCC1806 and MDA-MB-231 cell lines.
Western blot analysis corroborated the effective silencing of AIFM1(Fig. 15A). The suppression of AIFM1 markedly curtailed the proliferation of MDA-MB-231 and HCC1806 cells, as evidenced by colony formation assay(Fig. 15B-C). Furthermore, scratch assays and Transwell experiments have yielded results indicating that the migratory and invasive abilities of tumor cells exhibited a notable decline subsequent to the knockdown of AIFM1(Fig. 15D-G). Overall, our findings indicated that AIFM1 may promote the proliferation, migration, and invasion of breast cancer.