3.1 Data source and pre-processing
The RNA-sequencing data and clinicopathological characteristics of 490 patients with LUAD were obtained from TCGA data. All patients were randomly allocated to a training set (n = 245) and a testing set (n = 245), and their corresponding clinicopathological characteristics are shown in Table 1.
Table 1
Clinical characteristics of the lung adenocarcinoma (LUAD) patients used in this Study
|
Training cohort
|
Test cohort
|
No. of patients
|
245
|
245
|
Age (%)
|
|
|
<=65
|
114(46.53%)
|
117(47.76%)
|
> 65
|
127(51.84%)
|
122(49.8%)
|
unknow
|
4(1.63%)
|
6(2.45%)
|
Gender (%)
|
|
|
Female
|
136(55.51%)
|
126(51.43%)
|
Male
|
119(48.57%)
|
119(48.57%)
|
TMN Stage(%)
|
|
|
Stage I-II
|
182(74.29%)
|
196(80%)
|
Stage III-IV
|
59(24.08%)
|
45(18.37%)
|
unknow
|
4(1.63%)
|
4(1.63%)
|
pT Stage(%)
|
|
|
T1−2
|
212(86.53%)
|
214(87.35%)
|
T3−4
|
30(12.24%)
|
31(12.65%)
|
unknow
|
/
|
3(1.22%)
|
pM Stage(%)
|
|
|
M0
|
158(64.49%)
|
166(67.76%)
|
M1
|
15(6.12%)
|
9(3.67%)
|
unknow
|
72(29.39%)
|
70(28.57%)
|
pN Stage(%)
|
|
|
N0
|
159(64.9%)
|
158(64.49%)
|
N1-3
|
77(31.43%)
|
85(34.69%)
|
unknow
|
9(3.67%)
|
2(0.82%)
|
3.2 Identification of prognostic CTA-related DEGs
A proportion of CTA-related genes (60/277, 21.7%) were differently expressed between tumor- and non-tumor tissues, and 16 of these were significantly associated with OS in the univariate Cox regression analysis (Fig. 1A). A total of 16 prognostic CTA-related DEGs are displayed in the clustering heatmap and forest plot in Fig. 1B, C, and these were used for the subsequent study.
3.3 Construction of a prognostic risk signature
A training cohort composed of 245 randomly selected patients was utilized to establish the prognostic signature, and the testing set was applied to identify its forecasting capability. We first conducted univariate Cox regression to screen prognostic values in the training cohort, and then multivariate Cox proportional hazards regression analysis to establish the risk signature. The detailed algorithm can be described as follows: risk score = (COX6B2*0.819) + (CEP55*0.47) + (LY6K*0.182) + (MAGEA8*0.484) + (TTK×-0.673) + (HEMGN×-6.878), with a risk score calculated for each patient. We then set the median risk score as the optimal cut-off value to divide the patients into high-risk and low-risk groups (Fig. 2A). As shown in Fig. 2B, patients in the low-risk group possessed a longer survival time than the high-risk group. Kaplan-Meier survival curves then revealed that patients with low risk had a better OS than those with high risk (Fig. 2C). To evaluate the prognostic power of the risk score for OS, we exploited time-dependent ROC curve analysis. The AUCs for one-, three-, and five-year overall survival predictions with respect to risk score were 0.671, 0.727, and 0.735, respectively (Fig. 2D). These findings suggested that the signature reflected a moderate potential prediction of survival in the training cohort.
3.4 Validation of the CTA-related prognostic risk signature
To further validate the robustness of the risk score model based on correlations with the 16 prognostic genes, we performed similar analyses on the testing set using the same algorithm described above. When the patients in the testing set were assigned to a low-risk or high-risk group (Fig. 3A), we ascertained that the majority of clinical characteristics between the different risk groups were not statistically significant except for the pN Stage (Table 2). OS was significantly different in the two predicted risk groups (Fig. 3B), and Kaplan-Meier survival curves revealed that the outcomes were significantly improved in low-risk patients when compared to patients at higher risk (Fig. 3C). In addition, the AUCs of the gene signature at one, three, and five years were 0.619, 0.543, and 0.645, respectively. (Fig. 3D). The results from our training and testing sets suggested that the gene signature efficiently predicted OS in patients with LUAD.
Table 2
Baseline characteristics of the patients in different risk groups
Characteristics
|
Training cohort
|
|
Test cohort
|
|
High risk
|
Low risk
|
P value
|
|
High Risk
|
Low Risk
|
P value
|
Age (%)
|
|
|
|
|
|
|
|
<=65
|
59(48.36%)
|
55(44.72%)
|
0.5685
|
|
52(44.07%)
|
65(51.18%)
|
0.2671
|
> 65
|
60(49.18%)
|
67(54.47%)
|
|
|
64(54.24%)
|
58(45.67%)
|
|
unknow
|
3(2.46%)
|
1(0.81%)
|
|
|
2(1.69%)
|
4(3.15%)
|
|
Gender (%)
|
|
|
|
|
|
|
|
Female
|
58(47.54%)
|
68(55.28%)
|
0.2780
|
|
60(50.85%)
|
76(59.84%)
|
0.1981
|
Male
|
64(52.46%)
|
55(44.72%)
|
|
|
58(49.15%)
|
51(40.16%)
|
|
TMN Stage(%)
|
|
|
|
|
|
|
|
Stage I-II
|
87(71.31%)
|
95(77.24%)
|
0.2453
|
|
93(78.81%)
|
103(81.1%)
|
0.4147
|
Stage III-IV
|
34(27.87%)
|
25(20.33%)
|
|
|
25(21.19%)
|
20(15.75%)
|
|
unknow
|
1(0.82%)
|
3(2.44%)
|
|
|
0(0%)
|
4(3.15%)
|
|
pT Stage(%)
|
|
|
|
|
|
|
|
T1-2
|
103(84.43%)
|
109(88.62%)
|
0.3294
|
|
101(85.59%)
|
113(88.98%)
|
0.5461
|
T3-4
|
18(14.75%)
|
12(9.76%)
|
|
|
17(14.41%)
|
14(11.02%)
|
|
unknow
|
1(0.82%)
|
2(1.63%)
|
|
|
|
|
|
pM Stage(%)
|
|
|
|
|
|
|
|
M0
|
74(60.66%)
|
84(68.29%)
|
0.8328
|
|
75(63.56%)
|
91(71.65%)
|
1
|
M1
|
8(6.56%)
|
7(5.69%)
|
|
|
4(3.39%)
|
5(3.94%)
|
|
unknow
|
40(32.79%)
|
32(26.02%)
|
|
|
39(33.05%)
|
31(24.41%)
|
|
pN Stage(%)
|
|
|
|
|
|
|
|
N0
|
73(59.84%)
|
86(69.92%)
|
0.0413
|
|
68(57.63%)
|
90(70.87%)
|
0.0414
|
N1-3
|
47(38.52%)
|
30(24.39%)
|
|
|
49(41.53%)
|
36(28.35%)
|
|
unknow
|
2(1.64%)
|
7(5.69%)
|
|
|
1(0.85%)
|
1(0.79%)
|
|
3.5 Independent prognostic value of the gene signature
To assess the influences of the clinicopathological factors of LUAD patients and of prognostic signature risk score on patient OS, we employed univariate and multivariate Cox regressions as illustrated using R language and forest plots in Fig. 4A, B. Using univariate Cox analysis, TNM stage (HR = 1.533, 95% CI = 1.263–1.862, p < 0.001; HR = 1.820, 95% CI = 1.474–2.248, p < 0.001, respectively) and risk score (HR = 2.674; 95% CI = 1.702–4.199, p < 0.001; HR = 1.715, 95% CI = 1.113–2.641, p = 0.014, respectively) were significantly associated with prognosis in both the training and testing sets. After multivariate adjustment of other clinical factors, the TNM stage (training set, HR = 1.524, 95% CI = 1.244–1.866, p < 0.001; testing set, HR = 1.805, 95% CI = 1.452–2.246, p < 0.001) and risk score (training set, HR = 2.612, 95% CI = 1.640–4.160, p < 0.001; testing set, HR = 1.570, 95% CI = 1.018–2.420, p = 0.041) acted as independent prognostic factors of OS in the multivariate Cox regression analysis.
3.6 Functional analyses of TCGA cohort data
To uncover the underlying biological functions and pathways correlated with risk score, GO-enrichment and KEGG-pathway analyses were performed based on the prognostically related DEGs between the high-risk and low-risk groups. As can be observed from Fig. 5A, DEGs were significantly enriched in biological processes (BP), including humoral immune response, protein maturation, and extracellular matrix organization. These findings were consistent with the natural immunogenic properties of CTAs. The top 10 cellular components (CC) with respect to collagen-containing extracellular matrix, endocytic vesicle, and collagen trimer were found to be related (Fig. 5A), and the top 10 molecular functions (MFs) included receptor ligand activity, signaling receptor activator activity, and extracellular matrix structural constituent (Fig. 5A). The results of KEGG-pathway analysis disclosed DEGs as primarily enriched in focal adhesion, cytokine-cytokine receptor interaction, and neuroactive ligand-receptor interaction (Fig. 5B).
3.7 Identification of prognosis-related tumor-infiltrating immune cells
To investigate whether the prognostic DEGs were associated with status of the tumor immune microenvironment, we compared tumor-infiltrating immune cells between the different risk groups (the abundance ratios of 22 immune cell types in the 490 LUAD patients as estimated by CIBERSORT are displayed in Fig. 6A). When we examined the 22 immune cell types, we noted that the infiltration levels of memory resting CD4 T cell, monocytes, and dendritic resting cells—as well as resting mast cells—were positively correlated with risk score (Fig. 6B), and that M0 macrophages and activated mast cells were negatively correlated with risk score (Fig. 6B). No significant differences were uncovered between the two groups in their numbers of plasma cells, naive B cells, memory B cells, naive CD4 + T cells, CD8 + T cells, activated memory CD4 + T cells, follicular helper or gamma delta T cells, monocytes, M1 or M2 macrophages, resting or activated NK cells, activated dendritic cells, eosinophils, or neutrophils. We also found that elevated levels of monocytes (Fig. 6C) and plasma cells (Fig. 6E) were associated with improved OS, while higher resting NK cells (Fig. 6D) were related to worse OS.
3.8 Risk signature and TMB
To probe the relationship between risk signature and mutation data, we calculated the number of mutations per million bases in the 490 LUAD patients as the TMB. When we then used the median TMB as the cut-off value to stratify the patients into high- and low-TMB groups, we observed that the number of TMB of patients at high risk was notably greater than for those patients at low risk. (Fig. 7A). However, when we applied Kaplan-Meier analysis to evaluate the potential correlation between TMB in lung cancer and patient OS (Fig. 7B), we noted no association (p = 0.986).
3.9 Immune signature and response to ICI treatment
Tumor immunology and immunotherapy have in recent years offered novel modalities in the treatment of cancer. Kalbasi et al. demonstrated that immune regulatory checkpoints used as targets for immunotherapy—including inhibitors of CTLA4 and PD-1 and PD-L1—exhibited great promise (Kalbasi et al. 2020). IPS is a machine learning-based classifier that can be used to predict response to ICIs in pan-cancer analyses. In the present study, we comprehensively explored the correlation between IPS and our immune signature (IS) in LUAD patients. The IS and potential responses to ICI therapy were evaluated using the ips_ctla4_neg_pd1_neg、ips_ctla4_neg_pd1_pos、ips_ctla4_pos_pd1_neg, and ips_ctla4_pos_pd1_pos scores, and we showed that these four types of IPS were higher in the low-risk group (Fig. 7C–F). These findings indicated that the low-risk group performed better when receiving anti-PD1 and anti-CTLA4 immunotherapy, or a combination of the two therapies. Additionally, we assessed whether the expression of the three genes differed between the high-risk and low-risk groups, and showed that CTLA4 gene expression was augmented in the low-risk group (p = 0.0096, Fig. 7G). We also noted a tendency for PD-1 and PD-L1 expression in low-risk patients to be elevated relative to high-risk patients, but not to a statistically significant extent (Fig. 7H, I).