Expression and genetic variation of NETs-associated genes contribute significantly to the prognosis of BC
A total of 69 genes associated with NETs were identified from the literature search (Table S1) [52, 53]. In TCGA-BRCA cohort, principal component analysis (PCA) on these selective genes completely distinguished tumor samples from the healthy controls. Importantly, 22 of these genes showed significant differential expression patterns (adjusted p-value ≤ 0.01) (Figs. 1A and B; TableS2; see Materials and Methods). Of these, ALPL, BST1, CD93, CREB5, CSF3, CTSG, F3, G0S2, IL6, MAPK1, MME, PIK3CA, SELP, SLC25A37, TECPR2 and TLR4 were significantly downregulated in tumor samples, to the contrary, AKT1, CSF3R, ENTPD4, HIST1H2BC, ITGB2 and SELPLG were significantly upregulated (Fig. 1B). Gene Ontology (GO) analysis revealed that these significantly differentially expressed genes (DEGs) were primarily associated with biological processes such as leukocyte response to microorganisms, leukocyte migration and adhesion, suggesting neutrophil immune function and innate immunity characteristics (Fig. 1C).
We next investigated the interaction of these 22 NETs-associated genes and their potential prognostic role in BC patients and identified their positive interactions with prognosis risk factors (e.g., PIK3CA, TLR4, BST1, CD93) and/or protective factors (e.g., AKT1, HIST1H2BC) (Fig. 1D). Concerning the mutation status, we found that 402 / 987 (40.73%) BC samples had mutations in the 22 NETs-associated genes, with missense mutations being the most common (Fig. 1E). In particular, PIK3CA exhibited the highest mutation frequency (34%) among all genes. In addition, overall survival (OS) analysis of the TCGA-BRCA cohort revealed that patients with low mutational burden of these 22 NETs-associated genes (p = 0.029, log-rank test) or low copy number variation (CNV) (p = 0.014, log-rank test) had a better prognosis (Figs. 1F and G). To evaluate the predictive performance of data types (NETs enrichment score/NES, copy number variations/CNV, tumor mutational burden/TMB) for the prognosis of BC patients, we introduced a generalized linear model, and found that NES exhibited better explanatory power for prognosis, thus can be used as an alternative mode of gene expression representation (Fig. 1H; see Materials and Methods). Taken together, the above analyses suggest that expression and genetic variation of NETs-associated genes play a significant role in the prognosis of BC.
NetPSig complements BC-related clinically valuable variables
In addition to the expression of NETs-associated genes for BC prognosis, we also took advantage of the signaling pathways that have previously been used to predict prognosis [54-56]. Using the combinatorial approach, we established a novel prognostic signature for BC patients (Fig. 1H). Primarily, we used the TCGA-BRCA cohort as a training set and established a signature, called NetPSig, where the NetPSig score represents a linear combination of ES (enrichment scores) using 29 prognostic paths, weighted by their estimated regression coefficients in a multivariate Cox regression analysis (Figs. 2A-C; Table S3; see Materials and Methods). When tested in the TCGA-BRCA cohort, the NetPSig conveniently stratified patients into high- and low-risk groups with significantly different OS (p < 0.0001, log-rank test) after the optimal cut point, indicating more deaths in the high-risk group (Figs. 2D and E; see Materials and Methods). The better prognostic ability of NetPSig was also evident by the area under the curve (AUC) analysis, which was 0.78, 0.793, 0.861, and 0.772 at 3, 5, 10, and 15 years of OS, respectively, with the AUC being highest for 10 years (Fig. 2F). We further broaden the utility of NetPSig by examining clinically valuable variables such as age, PAM50 subtypes (i.e., Normal-like/Normal, Luminal-A/LumA, Luminal-B/LumB, Her2-positive/Her2, and Basal), and tumor stage. Univariate COX regression analysis revealed that NetPSig (high-risk vs low-risk), age (> 60 vs ≤60), and stage (stage I/II vs III/IV) were valid prognostic indicators. Whereas, in the multivariate COX regression analysis, NetPSig (HR = 6.08, p < 0.001), age (HR = 2.20, p < 0.001), stage III (HR = 2.72, p < 0.001), and IV (HR = 8.99, p < 0.001) were found to be significantly associated with unfavorable prognosis and were independent prognostic indicators (Figs. 2G and H). Taken together, NetPSig relates favorably with the clinically valuable variables associated with BC.
NetPSig demonstrated better prognostic performance compared to other known gene-based signatures for BC
We next assessed the prognostic performance of NetPSig by evaluating it against six independent BC cohorts from the GEO database and from the literature such as, Metabric [19], Chin et al.'s [20], GSE31448 [21], GSE21653 [22], GSE20685 [23], and GSE19615 [24] (Table S2). Importantly, the background corrections and quantile normalization were applied to the expressions of each data set, before calculating the NetPSig scores (see Materials and Methods). For the comprehensive comparisons, we also applied univariate and multivariate COX regression analyses, with NetPSig (high-risk vs low-risk), age (> 60 vs ≤60), PAM50 (LumA/LumB/Normal vs Her2/Basal), and stage (stage I/II vs III/IV) as covariates. Of interest, we found that NetPSig was statistically significant (p ≤ 0.01) in all the included six cohorts (Table 1) and can be considered as a valid and independent prognostic factor for BC, especially when matched with age, PAM50, and stage. The performance of NetPSig was also validated against four publicly available BC gene-based signatures such as, Mo et al.’s (5-gene signature) [39], Sui et al.’s (immune cell infiltration-based signature) [40], Chen et al.’s (15-immune gene signature) [41], and Liu et al.’s (25-gene signature) [42]. In this particular section, we separately calculated the risk scores for BC patients using the TCGA-BRCA cohort combined with other six external datasets (hereafter referred to as the “merged” BC cohort or dataset) and four independent signatures. In this integrated analysis, the AUC determined by the NetPSig score was 0.684, the highest value for the 5-year OS (Fig. 3A; see Materials and Methods). Since, it is known that the concordance index (C-index) represents the probability that the predicted outcome matches the actual observed outcome. Therefore, we randomly selected the samples from the “merged” BC dataset and calculated the C-index corresponding to the optimal stratification of each signature risk score. The analysis was repeated 100 times, and the average C-index of the NetPSig was found to be the highest (Fig. 3B; see Materials and Methods). Thus, NetPSig showed better prognostic performance compared to the other known gene-based signatures for BC.
Biological disparities in high- and low-risk groups identified by NetPSig confer prognostic values
Next, we examined the biological differences between the high- and low-risk groups identified by NetPSig. Our analysis showed that the ES of immune cell types was significantly stronger in the low-risk group compared to the high-risk group, and the ES of most cell types was significantly different between the two groups (Fig. 4A). In addition, the high-risk group contained more elderly patients and deaths, consistent with our aforementioned analysis (Fig. S1). When the clinical characteristics (survival status, age, stage, and PAM50 subtypes) were considered, Normal-like and Luminal A were found to be the least aggressive subtypes with the lowest risk scores (as predicted by NetPSig), whereas HER2 and Basal were identified as the most aggressive subtypes, demonstrated the highest risk scores. Also, tumor stage and NetPSig score showed a positive trend in the “merged” cohort (Fig. 4B). Moreover, we used ESTIMATE [47] and CIBERSORT [48] to further confirm the difference in immune infiltration between the high- and low-risk groups. ESTIMATE predictions revealed lower tumor purity and higher immune infiltration in the low-risk group compared to the high-risk group in the fused BC cohort, with the differences being statistically significant (Fig. 4C; see Materials and Methods). While in CIBERSORT estimations, total immune infiltration was higher in the low-risk group compared to the high-risk group, with significant differences in several types of immune cells, such as B cells, T cells, and macrophages (Fig. 4D; see Materials and Methods). Besides, DEGs were identified in the high-risk and low-risk groups, and their gene ontology analysis showed that the high-risk group was involved in negative regulation of multicellular organismic processes and oxidative stress response, whereas in the low-risk group DEGs were more involved in leukocyte activation and immune regulation processes (Figs. 4E and F; Table S4; see Materials and Methods). The ES of immune checkpoint genes (ICGs) also revealed a significant negative correlation with the risk score, with significantly higher ES in the low-risk group (Figs. S2 and S3). When comparing the performance of NetPSig with ICGs in predicting prognosis, the AUC of NetPSig was found to be the highest at 5 years of OS (Fig. S4), with superior prognostic ability for BC patients. Overall, a lower NetPSig score corresponds to a higher degree of immune infiltration and has a better predictive performance for prognosis compared to ICGs.
Regarding the heterogeneity of mutations, only 10 genes with the highest mutation frequency were demonstrated in each group, and the mutations in these genes were observed in both the high-risk group (215 samples, 83.01%) and the low-risk group (615 samples, 84.25%) (Fig. 4G). Analysis of mutations with Fisher's exact test revealed that TP53 and MYH14 were significantly mutated in the high-risk group, whereas the contrary was the case for CDH1 and PIK3CA (Fig. 4H). Taken together, the biological differences between the high- and low-risk groups identified by NetPSig confer prognostic value.
NetPSig performed favorably for predicting the treatment response in BC patients
In order to investigate the predictive value of NetPSig for BC treatment, we compared the risk scores of patients before and after the adjuvant chemotherapy in the GSE5462 cohort [25, 26]. We observed that BC patients who received adjuvant chemotherapy had a lower risk score, with significant difference (p = 0.045, Wilcoxon test) (Fig. 5A). Subsequently, patients in the GSE41998 cohort [29] were divided into four groups according to their response to neoadjuvant chemotherapy: progressive disease (PD), stable disease (SD), partial response (PR), and complete response (CR). A significant reduction in the risk scores from BC patients with SD/PD to CR/PR was observed, as shown in Figure 5B (p = 0.0028; T-test). Besides, we also found significantly lower risk values for pathological complete response (pCR) compared to the residual disease (RD) (p = 0.016 for GSE25055 [28]; p = 0.00066 for GSE20194 [27]; p = 0.043 for GSE37946 [30]; Wilcoxon test) (Figs. 5C-E; Table S2).
Of importance, risk scores were significantly higher in the immunotherapy-resistant group than in the sensitive group for the GSE124821 cohort [31] (p = 0.00002; T-test; Table S2) (Fig. 5F). It is worth mentioning that not complete information about BC patients receiving immunotherapy was available. Therefore, we used urothelial carcinomas receiving anti-PD-L1 therapy (IMvigor210 [32]) to test the predictive value of NetPSig. Interestingly, the stratification of the IMvigor210 cohort, represented by NetPSig, showed significant differences, as indicated by the KM curve, with patients in the low-risk group having a significantly better prognosis than the high-risk group (Fig. 5G; Table S2). In addition, the distributions of CR/PR and SD/PD in the two groups were examined, and a better response to immunotherapy was observed in the low-risk group (Fig. 5H). Overall, these results showed that NetPSig performed well in predicting BC treatment with good prognostic value.
NetPSig identified potential drugs for the treatment of BC patients with lower risk scores
To further enhanced the ability of NetPSig for BC treatment, we calculated IC50 values of the obtained data from the “merged” BC cohort for 50 antitumor drugs using the R package "pRRophetic" (version 0.5) [49] (see Materials and Methods). When comparing the sensitivity of these anti-tumor drugs between the high- and low-risk groups, we found that the IC50 values of five drugs (Docetaxel, Bosutinib, AZD7762, DMOG, and Sunitinib) were significantly different and lower in the low-risk group. Notably, these drugs have a wide range of applications in the treatment of BC patients, effectively inhibiting the growth of cancer cells and promoting their apoptosis [57]. Thus, these results suggested that patients in the low-risk group were more sensitive to these five drugs, whose 3D chemical structures are also illustrated (Fig. 6A; see Materials and Methods).
Since the high- and low-risk groups of NetPSig were determined using the cutoff optimal point for the cohort, we introduced the XGBoost model to distinguish the potential high- and low-risk categories for individual BC patients (see Materials and Methods). On the basis of the “merged” BC cohort and the best classifier selected by 10-fold cross-validation, the results showed that the overall accuracy and AUC were 81.4% and 0.834, respectively in the training cohort, and 80.1% and 0.807, respectively, in the testing cohort. Thus, indicating favorable predictive power and potential value in supporting the treatment of individual BC patients (Fig. 6B and C; see Materials and Methods). In total, these results provided a new perspective for assessing treatment response and precise dosing.