Identification of BRCA-specific APM-related genes
Based on a previously reported literature [22], we selected the following APM genes for quantification: B2M, CALR, CANX, ERAP1, ERAP2, HLA-A, HLA-B, HLA-C, PDIA3, PSMB5, PSMB6, PSMB7, PSMB8, PSMB9, PSMB10, TAP1, TAP2 and TAPBP. The ssGSEA approach was applied to quantify the APM score for each BRCA tumor tissue [24]. We observed that patients with lower APM exhibited worse OS in the TCGA-BRCA cohort (HR = 1.993, 95% CI = 1.408 – 2.820, p < 0.001; Figure S1), and the similar results were observed in progression-free survival (PFS) (HR = 1.736, 95% CI = 1.218 – 2.475, p = 0.007) and cancer-specific survival (CSS) (HR = 2.188, 95% CI = 1.373 – 3.488, p = 0.0061). Then, we performed WGCNA [37] on the transcriptome profiling data and APM ssGSEA scores to construct a scale-free co-expression network in the TCGA-BRCA cohort. A total of 36 gene modules were generated with a power of 7 as the optimal soft threshold (Figure 1A & Figure S2). The correlations between each gene module and the APM ssGSEA scores were analyzed by Pearson’s coefficient, demonstrating the brown and darkgreen modules exhibited the highest correlation (r = 0.66, p = 2e-135 and r = 0.69, p = 4e-151) with APM (Figure 1A). A total of 1665 genes involved in the two modules were considered as “APM-related module”, in which the module membership (MM) and gene significance (GS) revealed a highly positive correlation (r = 0.839, p < 0.001) (Figure 1B). Subsequently, we submitted all the 1665 genes involved in the APM-related module to Gene Ontology for enrichment analysis. The results indicated that five most significant processes were annotated as leukocyte cell-cell adhesion, mononuclear cell differentiation, leukocyte cell-cell adhesion, regulation of T cell activation and T cell activation (Figure 1C). NMF consensus clustering [38] was applied to classify the TCGA-BRCA cohort into two subgroups (cluster 1 (C1) and cluster 2 (C2), Figure 1D) based on the expression matrix of the established 1665 APM related genes with the optimal NMF k value of 2 (Figure S3). The boxplot showed that APM ssGSEA scores were significantly increased in C1 compared to C2 (p < 0.0001, Figure 1E). To confirm the biological functions of these genes are related to APM in BRCA, we used all GOBP gene sets with a fgsea algorithm between the two distinct subgroups [39]. The top ten pathways enriched in the C1 with higher APM scores displayed dramatically higher activity of various immune processes, particularly T cell activation. Therefore, these results confirmed that the identified genes involved in APM-related modules can represent the characteristics of APM.
Establishment of an APM-related gene signature for prognostic evaluation
Subsequently, the 1665 genes from the APM-related module were submitted to the univariate Cox proportional-hazards model. The volcano plot showed that 186 promising candidates were filtered out when the threshold of p value for Cox regression analysis was less than 0.01 (Figure 2A). For further screening, the 186 genes were analyzed by the random survival forest (RSF) algorithm [40]. As the number of trees in the RSF model increases, the prediction error rate gets smaller, suggesting that the model becomes more robust as the forest structure becomes more complex (Figure 2B). We selected genes with the criterion of relative importance greater than 0.4 as the optimal candidates. Ultimately, a total of 11 genes were identified and the important order for them is displayed in Figure 2C. Subsequently, the most robust prognostic genes for OS were identified among the 11 candidate genes using the Lasso Cox regression algorithm [41]. 10-fold cross-validation was used to avoid overfitting, and an optimal λ value of 0.00566 was determined (Figure 2D and Figure 2E). A collection of 9 genes (DLG3, MRO, NFKBIA, TAPBPL, PARP12, FAM159A, IGJ, PIGR and RAC2) retained their respective Cox coefficients, and the distribution of corresponding coefficients for the gene signature is shown in Figure 2F. Finally, the APMrs formula was developed to quantify the risk assessment of BRCA patients, with their individual gene expression value weighted by the LASSO Cox coefficients.
Subsequently, we used the cBioPortal database to reveal the genomic alterations of the APMrs gene signature in TCGA-BRCA. The oncoplot in Figure 2G showed that somatic mutation rarely occurs in the gene signature, while copy number variation was frequently observed, especially in the PIGR gene. To explore the potential pathways underlying the established APMrs gene signature, DEGs between APMrs-low and -high samples were selected with a threshold p value of 0.0001, and GO enrichment analysis demonstrated that the three most important terms were labeled with “Lymphocyte activation”, “Leukocyte activation”, and “Adaptive immune response”, indicating the APMrs gene signature could reflect the dysfunction of immune response and induce the upstream regulation of leukocyte activation (Figure 2H). Furthermore, we observed that substantial genes were overlapped in the above-mentioned three biological processes (Figure 2I), which could partly interpret the critical role of the APMrs gene signature in the APM process and downstream immune cascade responses.
Based on the TISCH database, the UMAP plot was used to visualize the distribution of 11 different cell categories including B, CD4Tconv, CD8Tex, DC, endothelial, fibroblast, malignant, monocyte/macrophage, plasma, SMC and Tprolif cells in the BRCA TME (Figure 2J). As shown in Figure 2K & L, we assessed the expression patterns of the three most important “protective” genes named NFKBIA, TAPBPL and PARP12 in the 11 main cell clusters. We observed that all the three genes are enriched in the DC cells, which indicates that they are closely correlated with DCs activity and APM process.
Validation of APMrs in different independent BRCA cohorts
Next, we assessed the prognostic value of APMrs signature in BRCA patients. Kaplan-Meier analysis showed that patients with higher APMrs exhibited worse OS in the TCGA-BRCA cohort (HR = 3.344, 95% CI = 2.216 – 5.045, p < 0.0001). Similar results were displayed in PFS (HR = 2.425, 95% CI = 1.604 – 3.665, p < 0.0001) and CSS (HR = 3.291, 95% CI = 1.899 – 5.703, p < 0.0001) (Figure 3A). Furthermore, univariate and multivariate Cox regression analysis were performed and compare the prognostic capacity of APMrs and other traditional features including age, pathological stage and PAM50 subtypes (Basal-like, HER2 enriched, Luminal A, Luminal B and normal-like) in the pooled cohort. Three parameters (APMrs, age and stage) are independent risk factors for OS in TCGA-BRCA patients, and APMrs suggested the highest significance in univariate Cox regression analysis (HR = 3.746, 95% CI = 2.580-5.440, p < 0.001; Figure 3B) and multivariate Cox regression analysis (HR = 3.261, 95% CI = 2.177-4.884, p < 0.001; Figure 3C) among all the variables. To quantify risk assessment and predict 1-, 3- and 5-year OS probabilities for individual BRCA patients, a personalized scoring nomogram was constructed combining APMrs and other clinicopathological features (Figure 3D). Calibration curves of 1-year (green line), 3-year (blue line) and 5-year (red line) OS prediction were close to the ideal performance (45-degree line), indicating the nomogram had a high OS accuracy prediction (Figure 3E). In addition, the DCA analysis showed that the nomogram exhibited the highest 5-year overall survival net benefit when compared to other parameters including age, stage and PAM50 subtype (Figure 3F).
In addition, six datasets (GSE1456, GSE20685, GSE20711, GSE42568, GSE58812 and METABRIC) with overall survival information were used to validate the prognostic value of APMrs. Kaplan–Meier analysis demonstrated that BRCA patients with higher APMrs exhibited worse OS in each cohort (GSE1456: HR = 2.772, 95% CI = 1.018–7.542, p = 0.0035; GSE20685: HR = 3.373, 95% CI = 1.532–7.425, p < 0.0001; GSE20711: HR = 2.296, 95% CI = 1.013–5.204, p = 0.0322; GSE42568: HR = 2.069, 95% CI = 1.063–4.027, p = 0.0322; GSE58812: HR = 3.389, 95% CI = 0.9242–12.43, p = 0.0027; METABRIC: HR = 1.326, 95% CI = 1.161–1.513, p < 0.0001; Figure 4A-F, respectively). Furthermore, we selected five datasets (GSE20711, GSE22219, GSE45725, GSE162228 and METABRIC) to validate the RFS prediction of APMrs. Figure S4 showed similar results, with higher APMrs reflecting worse RFS (GSE20711: HR = 2.421, 95% CI = 1.218–4.812, p = 0.0053; GSE22219: HR = 2.399, 95% CI = 1.538–3.742, p = 0.0009; GSE45725: HR = 3.713, 95% CI = 0.8807–15.66, p = 0.0039; GSE162228: HR = 8.955, 95% CI = 1.029–77.90, p < 0.0001; METABRIC: HR = 1.248, 95% CI = 1.066–1.461, p = 0.0038).
Construction and validation of a survival decision tree to improve risk stratification
Subsequently, we aimed to build an integrated prognostic model to improve risk stratification and risk assessment for BRCA patients. Considering APMrs, age and stage were independent risk factors for overall survival, we included the three parameters and built an integrated survival decision tree with the “rpart” R package based on the recursive partition analysis. As shown in the decision tree (Figure 4G), three risk subgroups are defined based on two components including APMrs as the dominant branch along with pathological stage. Specifically, patients with low APMrs were defined as “low risk” group, whereas “intermediate risk” and “high risk” groups were defined as “High APMrs & stage I/II” and “High APMrs & stage III/IV”, respectively. Significant differences in OS were observed among the three risk subgroups (p < 0.001; Figure 4H). The survival decision tree was further validated in the METABRIC cohort (p < 0.001; Figure 4I), indicating its valuable discriminative capacity. Overall, the survival decision tree could greatly optimize risk stratification and survival prediction accuracy for BRCA patients.
Patterns of immune cell infiltration and immunomodulators expression in BRCA patients
The presence of immune infiltration in tumors is a marker of favorable prognosis, especially in basal-like and HER2-positive breast cancer [48]. To further evaluate the infiltrating levels of various immune cells involved in BRCA, three algorithms including xCell, CIBERSORT and TIMER were performed [46]. Three stacked barplots represented by three different algorithms illustrated the distinct patterns of the relative abundance of 36, 22 and 6 immune cell types, and significantly distinct immune patterns were observed between APMrs-low and APMrs-high samples (Figure 5A-C). Combining the three approaches, the APMrs-high group was closely associated with M0 Macrophages, M2 Macrophages and mast cells, corresponding to an immunosuppressive phenotype [49]. In contrast, the APMrs-low group showed increased infiltrations of CD8+ T cells, activated memory CD4+ T cells, follicular helper T cells, gamma delta T cells, activated myeloid dendritic cell, activated NK cells and M1 Macrophages, which were called the active-immune phenotype [50].
In addition, we compared the expression levels of four kinds of key immune regulatory molecules (including inhibitory immune checkpoints, stimulatory immune checkpoints, antigen presentation-related genes, and cytolytic activity markers) between two different APM risk groups. We observed that the APMrs-low group was marked by the significantly highest expression values of immunomodulatory molecules, particularly some classical immune checkpoints such as PDCD1, CD274, CTLA4, LAG3, TIGIT and TGFB1; while the APMrs-high group presented with the lowest expression levels (Figure 5D-G). A study has identified that breast cancer with high expression of the antigen presentation-related genes such as HLA-A and HLA-B, has an increased immune T cell activation and favorable outcomes [20]. When CD8+ T cell activation or during clinical responses to anti-CTLA-4 and anti-PD-L1 immunotherapies, immune cytolytic activity (CYT) based on three key cytolytic effectors, granzyme A (GZMA), granzyme B (GZMB) and perforin (PRF1) are significantly upregulated [47]. These results demonstrated that the APMrs-low group may benefit more from immunotherapy.
Mutational landscapes in different risk groups of BRCA patients
Considering immune infiltration is always strongly associated with mutation in solid tumors, we investigated the mutational landscape in different risk groups of BRCA. Oncoplots were respectively generated for the two different APM-related risk groups, and top 20 frequently mutated genes in each group were displayed (Figure 6A-B). In detail, TP53, PIK3CA and TTN were the three genes with the highest mutation frequency in both groups. Significantly mutated genes were identified in the APMrs-low group compared to the APMrs-high group. As shown in a forest plot (Figure 6C), CDH1 and PIK3CA are the two most frequently mutated genes in APMrs-low samples. In addition, the heatmaps illustrated the co-occurrence and mutually exclusive mutations of the top 20 frequently mutated genes in each cohort. More co-occurrence and mutually exclusive mutations could be observed in the APMrs-low group, which suggested that APMrs-low samples harbored more somatic mutations and higher immune infiltration in BRCA, thereby more likely to evoke immune responses and consequently have favorable prognosis (Figure 6D-E). Similar results were observed in the METABRIC cohort. In detail, among the top 20 genes with mutation frequency in each group, PIK3CA and TP53 are two most frequently mutated genes (Figure 6F & 6G). Compared to the APMrs-high group, TP53, AHNAK2 and AGTR2 are three most frequently mutated genes in APMrs-low group (Figure 6H). Similarly, the APMrs-low group in METRBRIC displayed more events of co-occurrence and mutually exclusive mutations in comparison with the APMrs-high group (Figure 6I-J). These evidences revealed distinct mutational landscapes in different APMrs groups.
Establishment and validation of an APM-related signature for immunotherapy prediction
However, since the presence of immune infiltration does not imply immune function, we applied the TIDE algorithm and estimated potential ICB responses in all BRCA samples, and classified them into responders and non-responders [42]. Totally 348 DEGs between responders and non-responders were identified with a threshold of FDR q < 0.0001. Next, RF algorithm [43] was used to screen for the most significant genes associated with ICB response in the TCGA-BRCA cohort (Figure 7A). The top 30 genes were ranked by importance according to the mean decreased accuracy and Gini ranking methods (Figure 7B), and 23 overlapping genes were further selected (Figure 7C). Subsequently, in the Lasso logistic regression analysis, 10-fold cross-validation was applied to overcome over-fitting effect (Figure 7D), and an optimal λ value of 0.0035 was selected (Figure 7E). Finally, a total of 22 genes (APBA2, APOL6, BATF2, CCDC102A, CCR10, CHN1, CXCR2P1, EMP3, GMFG, GPR82, HABP4, IFFO1, ITM2C, LBH, LOC728392, MFNG, RNF122, S100A3, ST3GAL2, STAT1, TGFB1, YPEL4) preserved their individual logistic coefficients (Figure 7F).
According to the established formula, the APM-related immunotherapeutic response score (APMis) was calculated for each sample based on the relative expression and Lasso logistic coefficient of individual genes. Figure 7G showed that the responders in the TCGA-BRCA cohort exhibited higher APMis compared with non-responders. In addition, we correlated APMis with TIDE scores in all TCGA-BRCA patients and found a significant negative correlation (R = −0.64, p < 0.001, Figure 7H). These results suggest that the lower the APMis, the higher the TIDE score, the greater the likelihood of immune dysfunction and immune escape, and the worse the immunotherapy effect. In the validation cohort GSE100797, APMis exhibited an AUC value of 0.727 (Figure 7I) to predict immunotherapeutic response. Furthermore, the validation cohort is divided into APMis-low and -high groups, and the Kaplan–Meier analysis demonstrated that patients with higher APMis exhibited better progression-free survival (PFS) in the GSE100797 (HR = 0.3134, 95% CI = 0.09832–0.9988, p = 0.0046; Figure 7J). In addition, we compared the expression profiles of inhibitory and stimulatory immune checkpoints between the APMis-low and -high groups of the validation cohort. Compared with APMis-low samples, some representative immune checkpoints including LAG3, TIGIT, PDCD1, and IFNG were significantly elevated in APMis-high ones (Figure 7K-L). In summary, APMis might serve as a promising biomarker for immunotherapy.
Screening of potential targets and applicable drugs for BRCA patients with high APMrs
Recent polypharmacology studies demonstrated that compounds which are actionable toward more than one gene or molecular pathway should be valued [51]. A list of the top 150 dysregulated genes in APMrs-high groups was submitted to CMap to identify potential drugs applicable for APMrs-high patients. CMap mode-of-action (MoA) analysis revealed 23 mechanisms of action shared by 30 compounds with the highest prediction scores (Figure 8A). In particular, five compounds, namely teniposide, irinotecan, amsacrine, etoposide and SN-38, shared the MoA of topoisomerase inhibitor. The Sankey diagram depicts the flow from representative compounds to potential targets and their expression profiles (Figure 8B). All these target genes have specifically high expressions in APMrs-high BRCA samples, suggesting that targeting these genes with corresponding compounds might be a promising strategy for the personalized treatment of high-risk subsets