In total, 949 patients were included. 507 LUAD patients from the TCGA cohort (235 [46.3%] male, mean [SD] age, 65.30 [10.03]), 442 patients from the validation cohort (223 [50.4%] male, mean [SD] age, 64.39 [10.09])
The construction and validation of novel prognostic model
After screening, 133 lysosome-related genes were obtained in TCGA and GSE68465 cohorts. Then, the univariate Cox analysis was used to explore 133 lysosome-related genes. To prevent model overfitting, LASSO penalized Cox regression modeling was conducted to screen the key lysosome-related genes associated with survival. With this method, a novel prognostic gene model with 26 genes was constructed (Figure1 A-B). And then, risk scores per sample were calculated using the following model formula:
In this formula, β is coefficient and X is the expression level of each prognostic gene i. The samples were divided into a high-risk group and a low-risk group according to the best cutoff value of the training cohort from TCGA-LUAD. As shown by the Kaplan–Meier analyses, patients in the high-risk group had worse OS than those in the low-risk group (P <0.0001) (Figure1 C-D). In the training cohort, the AUC values of the present risk model were 0.71, 0.71, and 0.71 for the 1-, 3-, and 5- year prognoses, respectively. We used GSE68465-cohort to validate this model. In the validation cohort, the AUC values of the present risk model were 0.70, 0.66, and 0.61 for the 1-, 3-, and 5- year prognoses, respectively (Figure1 E-F). The distribution plot of risk score and survival status revealed that the number of TCGA-LUAD patients with a status of deceased increased as the risk score in the training set rose. In GSE68465-cohort, the low-risk group maintained its superior survival status and longer survival time from the training set (Figure2 A-D). In addition, a heatmap demonstrated that the expression of 26 prognostic genes varied significantly among TCGA-LUAD patients with varying risk scores (Figure2 E-F).
Figure 1: Construction and validation of the prognostic model based on the lysosome-related gene signatures in lung adenocarcinoma (LUAD).
(A, B) LASSO analysis with minimal lambda value. (C)The Kaplan–Meier survival analysis showing the difference in overall survival (OS) between the high- and low-risk groups in the training, (D)and validation cohorts. (E)Time-dependent ROC curve analysis in the training, (F)Time-dependent ROC curve analysis in the validation cohorts.
Figure 2: Evaluation and validation of the utility of prognostic signature in the training set and validation set
(A, C) The distribution of risk score and survival status of LUAD patients with different risk scores in the training, (B, D) and validation cohorts. (E, F) Heatmap of the prognostic signatures expression profiles in the high- and low-risk groups in the training and validation cohorts, separately.
Independent Prognostic Factor Analysis and construction of Nomogram
Univariate and multivariate Cox regression analyses were performed by introducing age, gender, stage, TNM stage, and risk scores to assess the independence of risk scores in the survival prediction of LUAD patients. Among the samples in the training cohort, the results showed that Clinicopathologic stage, T stage, M stage, N stage, and risk score were identified as independent negative prognostic factors for patients with LUAD (Figure3 A-B). In addition, risk scores also showed significant differences in age, gender, Clinicopathologic stage, T stage, M stage and N stage (Figure4 A-P). Based on the training cohort, risk scores and clinical factors that were identified as independent negative prognostic factors were integrated to create a nomogram to improve the predictive power of survival in LUAD patients. Calibration plots for 1-, 3- and 5-years OS revealed good agreement between nomogram prediction and actual observations (Figure3 C-D).
Figure 3: Construction and evaluation of the novel nomogram
The univariate Cox regression analysis of the risk score and other clinical features in the training cohort, (B) The multivariate Cox regression analysis of the risk score and other clinical features in the training cohort. (C) A nomogram using risk scores combined with clinical characteristics. (D)The calibration plots of the nomogram for predicting OS probability for 1-, 3-, and 5- year in the training.
Figure 4: The overall survival analysis of risk score in each clinical subtype.
Risk Signature-Based Immune Cell Infiltration, Tumor Microenvironment Analyses
We used CIBERSORT to quantify immune cells, it was found that the expression of Plasma cells, T cells CD4 memory resting, T cells regulatory (Tregs), Dendritic cells resting, Mast cells resting was significantly higher in the low-risk group than in the high-risk group (Figure5 A). Immune-checkpoint related genes like CD44, CD276, TNFRSF9, TNFSF4, TNFSF9, CD70, DCD1LG2 and TMIGD2, were more lowly expressed in the low-risk group (Figure5 B). What’s more, with the increase of risk score, the high-risk group had a lower estimate score, immune score, and stromal score (Figure5 C). In addition, the TMB score of LUAD was higher in high risk group (Figure5 D).
Figure 5: Risk Signature-Based Immune Cell Infiltration, Tumor Microenvironment Analyses
(A) The differences in the scores of immune cells between high- and low-risk groups in the training. (B)The differentially expressed immune checkpoint-related genes between the high- and low-risk groups. (C)ESTIMATE, immune, and stromal scores between the high- and low-risk groups in the training. (D)The difference in tumor mutation burden (TMB) between the high- and low-risk groups in the training. *p < 0.05, **p < 0.01, ***p < 0.001.
Functional and Pathway Enrichment Analyses
502 differential expression genes (DEGs) included 264 up-regulated genes and 238 regulated genes were screened between high-risk and low-risk group in training cohort (Figure6 A-B). By considering the DEGs between the high- and low-risk groups from TCGA-LUAD, we conducted GO enrichment analysis, KEGG pathway analysis to explore the potential biological functions of these DEGs. “mitotic nuclear division”, “mitotic sister chromatid segregation”, “nuclear division”, “chromosome segregation” and “organelle fission” were the most enriched terms among the biological process categories. “condensed chromosome, centromeric region”, “condensed chromosome kinetochore”, “kinetochore”, “chromosome, centromeric region” and “spindle” were the most enriched terms among the cellular component categories, and “microtubule binding”, “tubulin binding”, “microtubule motor activity”, “peptidase inhibitor activity” and “enzyme inhibitor activity” ere the most enriched terms among the molecular function categories. “Cell cycle”, “p53 signaling pathway”, “Oocyte meiosis”, “ECM−receptor interaction” and “Progesterone−mediated oocyte maturation” were identified to be the most enriched among the KEGG pathways of the DEGs (Figure6 C-D).
Figure 6: Analysis of differences between high- and low-risk groups and functional and pathway enrichment analyses
Differential expression of ERG expression between high- and low-risk groups in the training. (B)The volcano plot exhibited both down- and up-regulated ERGs. (C)GO enrichment analysis and (D)KEGG pathway analysis based on the DEGs between the high- and low-risk groups in the training.
Drug sensitive
Differences in drug sensitivity of different risk subgroups were analyzed to investigate the clinical application value of the risk model. Results showed that Camptothecin, Cisplatin, Docetaxel, Doxorubicin, Etoposide, Gemcitabine, Paclitaxel,Vinorelbine and Vinblastine had good effects on patients in low-risk groups(Figure7 A-I).
Figure 7: Prediction of drug susceptibility in different risk groups.
(A–I) Sensitive drugs in low-risk groups.