Single-cell sequencing data analysis
After data processing and screening, a total of 38307 cells from 9 TNBC samples were retained for subsequent analysis. We conducted PCA using the top 2000 variable genes (Fig. 1A). We selected the top 9 principal components and identified 15 clusters using Seurat (Fig. 1B). We then annotated the cell identity of each cluster using typical cell surface markers (Fig. 1C and D). A total of 12088 cells in clusters 1, 4, and 5, which expressed epithelial markers, were defined as epithelial cells. Next, we identified 7127 malignant cells and 4961 non-malignant epithelial cells using the “infercnv” package (Fig. 1E). We identified 487 malignant cell-related genes, including 256 downregulated genes and 231 upregulated genes (Fig. 2A). We further analyzed transcription factor activity using the “SCENIC” package. As displayed in Fig. 2B, the activity of E2F4, ATF4, and CEBPB was upregulated in malignant cells, while the activity of SOX11, TFAP2B, and SOX4 was downregulated. To explore differences in cancer-related pathways between malignant cells and non-malignant epithelial cells, we performed GSVA. As shown in Fig. 2C, the MYC targets, NOTCH signaling, and WNT beta-catenin signaling pathways had higher GSVA scores in malignant cells than in non-malignant epithelial cells.
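The variable-gene selection that precedes PCA can be illustrated with a minimal sketch. It ranks genes by plain per-gene variance across cells, which is a simplification and not necessarily the method Seurat applied in this study; the function name `top_variable_genes` and the toy expression values are hypothetical.

```python
from statistics import pvariance


def top_variable_genes(expression, n_top):
    """Return the n_top genes with the largest variance across cells.

    expression: dict mapping gene name -> list of normalized expression
    values, one per cell (hypothetical input layout).
    Assumption: plain population variance is used as the ranking statistic.
    """
    variances = {gene: pvariance(values) for gene, values in expression.items()}
    # Sort gene names by variance, largest first.
    ranked = sorted(variances, key=variances.get, reverse=True)
    return ranked[:n_top]


# Toy data: four hypothetical genes measured in four cells.
toy_expression = {
    "GENE_A": [0.0, 0.0, 0.0, 0.0],
    "GENE_B": [1.0, 5.0, 1.0, 5.0],
    "GENE_C": [2.0, 3.0, 2.0, 3.0],
    "GENE_D": [0.0, 10.0, 0.0, 10.0],
}
selected = top_variable_genes(toy_expression, n_top=2)
```

In the analysis described above, the same idea would be applied with `n_top=2000` before running PCA on the selected genes.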
Construction and evaluation of a malignant cell-related signature in TNBC
The malignant cell-related genes were further assessed for their association with the survival of TNBC patients in the TCGA training set. We performed univariate Cox proportional hazards regression analysis and identified 34 malignant cell-related genes, which were then entered into a LASSO Cox analysis to select the optimal prognostic candidates. As the LASSO Cox analysis showed (Fig. 3A and B), 10 malignant cell-related genes (DSTN, FABP7, KDELR1, MUCL1, NANS, POLR2F, S100A16, SKP1, TP53I11, and VDAC1) were selected. We constructed a prognostic signature by combining the expression profiles of these 10 genes with their corresponding LASSO Cox regression coefficients. Using this signature, we calculated the risk score of TNBC patients in the TCGA training set and in the whole TCGA set, and divided them into a high-risk group and a low-risk group according to the median risk score. The survival curves showed that patients in the high-risk group had significantly worse OS than patients in the low-risk group (P < 0.0001, Fig. 3C and D). The prognostic power of the signature was evaluated by calculating the AUC. The ROC curves showed that the signature performed well in predicting the survival of TNBC patients (Fig. 3E and F). The distribution of risk scores and the survival status of TNBC patients in the TCGA set are displayed in Supplemental Fig. 1; mortality was much higher among patients with higher risk scores than among those with lower risk scores.
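The risk score described above is a coefficient-weighted sum of gene expression followed by a median split, which can be sketched as follows. The coefficient values, the patient identifiers, and the three-gene subset are placeholders chosen for illustration and are not the values from this study; assigning patients with a score exactly equal to the median to the low-risk group is also an assumption.

```python
from statistics import median

# Placeholder coefficients for three of the ten signature genes.
# These numbers are hypothetical, NOT the fitted LASSO Cox coefficients.
EXAMPLE_COEFFICIENTS = {"DSTN": 0.5, "FABP7": -0.2, "VDAC1": 0.3}


def risk_score(expression, coefficients):
    """Weighted sum: sum over genes of coefficient * expression level."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median score.

    Assumption: scores equal to the median are assigned to the low-risk group.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical expression values for four hypothetical patients.
patients = {
    "P1": {"DSTN": 2.0, "FABP7": 1.0, "VDAC1": 1.0},
    "P2": {"DSTN": 1.0, "FABP7": 3.0, "VDAC1": 0.5},
    "P3": {"DSTN": 3.0, "FABP7": 0.5, "VDAC1": 2.0},
    "P4": {"DSTN": 0.5, "FABP7": 2.0, "VDAC1": 1.0},
}
scores = {pid: risk_score(expr, EXAMPLE_COEFFICIENTS) for pid, expr in patients.items()}
groups = stratify_by_median(scores)
```

With the real signature, the coefficient dictionary would hold all 10 genes and their fitted coefficients, and the resulting groups would feed the survival comparison.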
Correlation of the 10-malignant cell-related signature with clinicopathological factors of TNBC
We further explored the correlations between the risk score and the clinicopathological characteristics of TNBC patients (age, stage, T stage, N stage, and M stage). The risk score was not correlated with patient age (Fig. 4A). The risk score showed an upward trend with increasing T stage (Fig. 4B). For N stage, the risk score was highest in patients with N3 disease, and patients with N2 disease had higher risk scores than patients with N0 or N1 disease (Fig. 4C). The risk score was higher in patients with M1 disease, but the difference was not statistically significant (Fig. 4D). Patients with higher overall stage tended to have higher risk scores (Fig. 4E). Specifically, stage III/IV patients had higher risk scores than stage I/II patients (Fig. 4F).
Validation of the 10-malignant cell-related signature in the external set
To verify the reliability of this signature in TNBC, we applied it to the external test set (GSE58812). Using the 10-malignant cell-related signature, we calculated the risk score for each patient in the test set and divided the patients into a high-risk group and a low-risk group. As presented in Fig. 5A, patients in the high-risk group had significantly worse OS than patients in the low-risk group (P = 0.0319). The AUCs of the 10-malignant cell-related signature for predicting 1-year, 3-year, and 5-year survival of TNBC patients in the test set were 0.87, 0.689, and 0.695, respectively, which indicated good performance (Fig. 5B).
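The 1-year, 3-year, and 5-year AUCs reported above come from time-dependent ROC analysis. A simplified sketch of the idea follows: at a given horizon, patients who died by that time are treated as cases, patients followed beyond it as controls, and the AUC is the fraction of case-control pairs in which the case has the higher risk score. This naive version drops patients censored before the horizon and applies no censoring weights, so it is an illustrative assumption and not the estimator used in the study; the function name and the toy records are hypothetical.

```python
def auc_at_horizon(records, horizon):
    """Naive time-specific AUC for a risk score.

    records: list of (risk_score, follow_up_time, died) tuples.
    Cases: patients who died at or before the horizon.
    Controls: patients still under follow-up after the horizon.
    Patients censored before the horizon are ignored (simplification).
    """
    cases = [s for s, t, died in records if died and t <= horizon]
    controls = [s for s, t, died in records if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                concordant += 1.0
            elif case_score == control_score:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


# Hypothetical records: (risk score, follow-up in years, death observed).
toy_records = [
    (2.0, 0.5, True),
    (1.5, 2.0, True),
    (0.2, 6.0, False),
    (1.8, 4.0, True),
    (0.1, 0.8, False),
    (0.4, 7.0, True),
]
auc_1_year = auc_at_horizon(toy_records, horizon=1.0)
auc_3_year = auc_at_horizon(toy_records, horizon=3.0)
```

An AUC of 0.5 corresponds to a score that ranks cases and controls no better than chance, and 1.0 to perfect separation at that horizon.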
Drug sensitivity
As shown in Fig. 6, patients in the high-risk group were more sensitive to AZD8055, AZD2014, AZD6482, PF-4708671, and Uprosertib, which target the PI3K/mTOR signaling pathway. Patients in the low-risk group were more sensitive to ABT737, WEHI539, and Sepantronium bromide, which target the apoptosis signaling pathway. Taken together, high-risk patients may benefit from therapy targeting the PI3K/mTOR signaling pathway, whereas low-risk patients may benefit from therapy targeting the apoptosis signaling pathway.