Identification of DEAAMRGs Between Normal and Tumor Tissues
A total of 372 genes associated with amino acid metabolism were obtained from KEGG, and 5971 DEGs were identified. ssGSEA analysis showed large differences in amino acid metabolism in TNBC and normal tissues (Figure S1A). Then 143 DEAAMRGs were identified through Venn online tool (Figure2A). Among them, 72 AAMRGs were expressed up-regulated and 71 AAMRGs were expressed down-regulated in TNBC (Figure 2B). Uploading 143 DEAAMRGs to STRING online database constructed protein-protein interaction (PPI) network (Figure 2C). Core modules in PPI were identified using the MCODE plugin in cytoscape (Figure 2D). GO enrichment analysis showed that DEAAMRGs were enriched in biological processes such as amino acid metabolic process, alpha-amino acid metabolic process, small molecule catabolic process, cellular components such as mitochondrial matrix, peroxisome, heterochromatin and molecular function such as vitamin binding, pyridoxal phosphate binding, vitamin B6 binding (Figure 2E). KEGG enrichment analysis showed that DEAAMRGs were enriched in Tryptophan metabolism, Tyrosine metabolism, Arginine and proline metabolism pathways (Figure 2F).
Construction and validation of the risk signature in the TCGA cohort
In the TCGA cohort, univariate Cox analysis screened out thirteen genes related to the prognosis of TNBC (p < 0.05) (Figure 3A). We further conducted a LASSO analysis to construct the risk signature (Figure 3B,C). The risk signature consists of twelve genes and the risk score per each patient was calculated as: risk score = (0.5316 * expression value of SDS) − (1.9759 * expression value of SMYD3) + (0.7044 * expression value of PRDM16) − (0.1056 * expression value of PSAT1) + (0.1282 * expression value of ALDH4A1) − (0.1695 * expression value of GLDC) − (0.0329 * expression value of MAT1A) + (0.5444 * expression value of TH) + (0.3893 * expression value of TDO2) + (0.3195 * expression value of GLYCTK) − (0.3035 * expression value of GPT) − (0.5323 * expression value of AMD1).SMYD3, PSAT1, GLDC, MAT1A,GPT and AMD1 were protective factors. SDS, PRDM16, ALDH4A1, TH, TDO2 and GLYCTK were risk factors. Depending on upper tertile risk scores, we assigned TNBC patients to high-risk and low-risk subgroups (Figure 3D,E,F). A total of 107 TNBC samples were obtained from the GEO database (GSE103091) as the validation cohort. The GEO cohort was divided into high-risk and low-risk groups by the upper tertile risk scores (Figure 3G,H,I). ssGSEA results show differences in Lysine degradation, Phosphonate and phosphinate metabolism and Selenocompound metabolism between high and low risk groups (Figure S1B) In addition, the expression of 10 genes in the risk signature were validated in the HPA database (Figure S2).
In the TCGA cohort, KM analysis showed that the high-risk group had a significantly worse prognosis than the low-risk group (Figure 4A). The accuracy of the risk signature was investigated by calculating the AUCs for 1, 3, and 5 years. And the AUCs for 1, 3, and 5 years were 0.934, 0.970, and 0.953, separately (Figure 4B). The AUC of the ROC curve for risk signature in the TCGA cohort was 0.893 (Figure 4C). In the GEO cohort, KM analysis showed that patients in the high-risk group had worse survival (Figure 4D). The AUCs for 1, 3, and 5 years were 0.692,0.636, and 0.604, respectively (Figure 4E). We analyzed the clinical correlation of risk signature and showed that the risk signature created were significantly correlated with pathological stage and pathological T-stage (Figure 4F-I).
Independent prognostic analysis and construction of prognostic nomogram
The results of the univariate Cox analysis combined with clinical factors showed that risk score, T, N, M, and stage were independent predictors (Figure 5A). In addition, multivariate Cox analysis showed that risk score were independent predictors (Figure 5B). Nomograms predicting 1-, 3-, and 5-year survival were created by incorporating stage and risk score based on the results of stepwise regression analysis (Figure 5C) and calibration graphs were drawn for 1-year, 3-year, and 5-year OS probabilities (Figure 5D). The C-index of the nomogram was 0.962. Based on the calibration curves, it is known that the predicted values of 1-year, 3-year and 5-year survival times are similar to the corresponding true survival times. These results indicate the good predictive power of this nomogram.
Tumor mutation burden analyses
The top ten genes in terms of mutation rate also differed greatly between the two groups (Figure 6A-D). Despite this, there is no significant correlation between TMB and subline scores (Figure 6E). To see if the risk signature or TMB was better at predicting survival, we divided the sample into high and low mutation subgroups based on the median TMB. There was no statistically significant difference between these two groups (Figure 6F). While the high-mutation-low-risk group had the highest survival rate, the low-mutation-high-risk group had the lowest survival rate (Figure 6G). This indicates that the predictive power of our model is stronger compared to TMB.
Tumor microenvironment analysis
The correlation between immune cell infiltration and immune function in the tumor microenvironment of two groups was assessed using the ssGSEA method. Immune function in the high-risk group Type II IFN response, Parainflammation, Type I IFN reponses, APC co stimulation, CCR, APC co inhibition, Checkpoint, T cell co -inhibition, HLA, T cell co stimulation, Cytolytic activity obtained higher ssGSEA scores (Figure 7A). Meanwhile, Macrophage, MDSC, Monocyte, Natural killer T cell, Neutrophil, Plasmacytoid dendritic cell, Regulatory T cell, T follicular helper cell, Type 1 T helper cell, among others obtained higher ssGSEA scores in the higher group (Figure 7B).
In addition, we analyzed the expression of HLA genes and immune checkpoint genes in the high-risk and low-risk groups. The expression of both HLA and classical immune checkpoint genes was slightly higher in the high-risk group than in the low-risk group (Figure 7C,D), especially CTLA4, LAG3, LGALS9, SIRPA, TDO2, TIGIT, HLA-E, HLA-DRB5, HLA-DRB1, HLA-DRA, HLA-DQB1, HLA-DQA1, HLA-DPB1, HLA-DMB, and HLA-DMA were significantly higher than those in the low risk group, showing that the high-risk group showed a stronger immunosuppressive state. Therefore, TNBC patients in the high-risk group benefited more from immunotherapy.
Prediction of drug sensitivity
The results showed that TNBC patients in the low-risk group were more sensitive to Docetaxel, Epirubicin, Oxaliplatin, Paclitaxel and Cisplatin (Figure 8A-E).
Validation of the role of ALDH4A1 in TNBC
The intersection of prognosis-related genes (Figure 9A) and different gene sets screened for signature genes by LASSO analysis (Figure 9B) in the TCGA and GSE103091 datasets were all associated with one gene, ALDH4A1, implying its potentially important role in TNBC. KM curve confirmed shorter survival in the ALDH4A1 high expression group in TNBC (Figure 9C). It even plays a role in the development of breast cancer. Next, we used the GSE180286 dataset in analyzed the differences in ALDH4A1 at the single cell level. The results showed that ALDH4A1 was mainly expressed in epithelial cells. (Figure 9D-F). qPCR results showed that in our collected clinical samples, the expression level of ALDH4A1 in TNBC tissues was significantly lower than that in normal tissues adjacent to the cancer (Figure 9G). In addition, the expression level of ALDH4A1 in the TNBC cell lines MDA-MB-231 and BT-549 was also lower than that in the normal breast epithelial cell line MCF-10A (Figure 9H). To investigate the role of ALDH4A1 in TNBC, we inhibited the expression of ALDH4A1 in TNBC cell lines MDA-MB-231 and BT546 by transfection with siRNA. PCR confirmed that ALDH4A1 was successfully knocked down (Figure 9I-J). Migration results showed that inhibition of ALDH4A1 expression significantly inhibited the invasive ability of both TNBC cells (Figure 9K). These results indicated that ALDH4A1 plays an important role in TNBC.