Classification of LUAD based on TME related genes
We downloaded data from TCGA dataset to calculate DEGs between tumor and normal tissues, and select genes related to TME. 2854 genes were identified, 903 of which were amplified in LUAD tissues (Fig. 1A). Then, NMF clustering analysis was used to divide all samples into two different clusters (C1 and C2). Compared with other classifications with more than two clusters, the consensus matrix heat map we selected maintains a better boundary (Fig. 1B). What’s more, it was not difficult to find that patients with C2 had a longer survival time (Fig. 1C) and a better progression survival time (Fig. 1D) than patients from C1.
Considering that immune cells play a important role in TME, we performed gene expression based method to estimate the abundance of specific immune cell types of the two subtypes. The immune scores of C1 and C2 cells were significantly different among neutrophils, NK cells, T cells, cytotoxic lymphocytes, endothelial cells, fibroblasts and myeloid dendritic cells (p < 0.05) (Figs. 2A–G). Previous studies have identified six immune subtypes associated with TME, and confirmed that patients with higher characteristics of inflammatory subtypes have a better prognosis (15). In our study, the inflammatory subtypes were also preferentially distributed in C2 (Fig. 2H).
Gene Signature For Lung Cancer Prognosis
At the optimal ratio of 7:3, 507 research samples were randomly turned into training set (n = 356) and testing set (n = 151). There was no significant difference in gender, age and TNM stage of patients at baseline (p > 0.05). We employed Univariate Cox analysis on the training cohort with 331 different TME related genes (p < 0.05). Lasso regression method was used to reduce the number of genes while maintaining high accuracy. In order to minimize the risk of over fitting, we performed LASSO regression algorithm to construct the best gene model (Fig. 3A-B). Then, Cox proportional risk model was applied to selected TME related genes with significant differences. Finally, five genes were screened out and the following equation is constructed:
Risk score=(C1QTNF6*0.365178453387755)+(PLEK2*0.284495496551624)+(FURIN*0.154361274455861)-(TM6SF1*0.396920071856817)+(IGF2BP1*0.208860117937174)
These five prognostic related genes played a key role in lung cancer progression. High risk genes (C1QTNF6, PLEK2, FURIN and IGF2BP1) indicated poorer prognosis. In contrast, TM6SF1 was associated with longer survival in LUAD patients.
Next, we studied the prediction effect of the model in training set and testing set respectively. In the training set, it could be clearly seen that the survival period of LUAD patients from the high-risk group was shorter (Fig. 3C). In order to further evaluate the validity and accuracy of the model, we carried out a time related ROC curve (Fig. 3E). The area under the curve of OS for 1-, 3- and 5-years was 0.698, 0.713 and 0.710, respectively. After that, we first tested the prognostic performance of the gene signature in the testing set. It could be clearly found that, consistent with the training set, patients in the high-risk and low-risk group showed different OS significantly (Fig. 3D). And The AUCs at 1-, 3- and 5-years of OS were all higher than 0.65 (Fig. 3F). Finally, in order to further verify the effectiveness of the model on different platforms, we downloaded data from GSE13213 (n = 117) to confirm the relevant results. We were surprised to find that the survival time of high-risk patients was still shorter than that of low-risk patients (Fig. 3G), while the AUC of OS in 1-year, 3-years and 5-years was 0.845, 0.602 and 0.622 respectively (Fig. 3H). These results were highly consistent with those observed in TCGA related datasets, proving the effectiveness of our prediction model.
Establishment Of Nomogram Associated With Clinicopathologic Factors
In order to integrate multiple LUAD clinical risk factors, we drew a nomogram to quantify the risk of patients. The 1-year, 3-year, and 5-year OS rates of LUAD patients were predicted by an nomogram based on gender, age, stage, T (tissue involvement), N (lymphoid involvement), and risk score, using five gene signatures (Fig. 4A). The corresponding calibration curve indicated that the predicted survival rate was closely related to the actual survival rate (Fig. 4B). Next, we used multi-index ROC curve to evaluate the accuracy of multiple risk indicators in predicting the 5-year survival rate. It was clear that nomograph and risk score showed better prediction accuracy (Fig. 4C). In addition, we evaluated the relationship between risk scores and related clinicopathological factors. Further stratification revealed that the risk score will increase with the progress of LUAD (Fig. 4D-F), and would not be affected by gender (female and male) and age ( < = 65 and > 65) (Fig. 5A-D). These results were sufficient enough to prove the accuracy and stability of the signature.
Comparison Of The Five-gene Signature Risk Model
We further compared our prognosis model with other models of the same type to prove the prediction performance, including Haiming Feng‘s model (16), Songming Chen‘s model (17) and Wei Wu‘s model (18). We calculated the risk score of each sample through multivariate Cox regression analysis as well. The ROC of all models were evaluated based on the corresponding genes. All of the study samples were then turned into high-risk group and low-risk group according to the median risk score. In the four models, the prognosis of low-risk group was all significantly better than that of high-risk group (p < 0.05) (Fig. 6A-D). However, the ROC of the other three models showed lower AUCs, indicating that our model had better prediction performance (Fig. 6E-H). We further calculated the C-index of the four models and found that our model had the highest value, 0.676 (Fig. 6I). Finally, RMS (root mean square) time was applied to estimate the prediction effect of the four models at different time points. At > 80 months, our model performed better than the other two gene signatures, except the signature of Songming Chen (Fig. 6J). All these proved that our model was superior to other models in predicting the 5-year and over-80-month survival rates of patients.
Gsea And Immune Correlation Analyses Of The Tme-related Signature
Finally, we conducted GSEA test on samples from two risk groups. The results showed that LUAD patients in the high-risk group were mainly concentrated in the following pathways and biological activities (Fig. 7A-B):
KEGG_CELL_CYCLE, KEGG_DNA_REPLICATION, KEGG_ECM_RECEPTOR_INTERACTION, KEGG_FOCAL_ADHESION, KEGG_PROTEASOME, GOBP_CHROMOSOME_SEGREGATION, GOBP_CORNIFICATION, GOBP_DNA_DEPENDENT_DNA_REPLICATION,
GOBP_DNA_REPLICATION, GOBP_EPIDERMAL_CELL_DIFFERENTIATION.
At present, immunotherapy has gradually become an important method to treat tumors, and tumor immune microenvironment is one of the important factors affecting the response of immunotherapy (19–20). Therefore, we performed MCP-counter counter algorithm to estimate the immune cell score of each LUAD patient, and then carried out their correlation with the risk score by Pearson method. The results indicated that the risk score was positively correlated with fibroblasts, and negatively correlated with T cells, monocyte lineages, myeloid dendritic cells, neutrophils and endothelial cells (Fig. 7D). Surprisingly, the risk score was also significantly positively correlated with the expression level of POLE2, FEN1 and MCM6 (DNA replication related genes) and LOXL2 (EMT related genes) (Fig. 7C).