3.1 Differential Gene Expression Analysis in LUSC Tumours versus Adjacent Parenchyma
The study flowchart is shown in Fig. 1. The top 50 genes that were most up- or down-regulated in LUSC tumours versus the adjacent healthy parenchyma were XXXisualized by generating heat maps and volcano plots (Supplementary Figs. 1A and 1B).
3.2 Identification of Two LUSC Immune and Metabolism-related Subtypes and Their Immune Characteristics
By analysing typing parameters (Figure S1C), we identified the graph with the steepest slope and determined the optimal k-value as k = 2. Subsequently, we examined the typing heatmap (Figure S1D) and found that both k = 2 and k = 3 met our criteria. Based on these considerations, we confirmed k = 2 as optimal and generated a heatmap (Figure S1E) illustrating two distinct types, C1 and C2.
We observed that type-C2 samples exhibited significantly higher OS (P < 0.001) and PFS (P < 0.05) than type-C1 samples (Figure S1F).
Significant differences were observed in StromalScores, ImmuneScores, and ESTIMATEScores between type-C1 and type-C2 (all P < 0.001). We noted that both StromalScores and ImmuneScores were higher in type C1, indicating higher levels of stromal and immune cells in type-C1 samples than in type-C2 samples (Figure S1H).
We observed that C1 was associated with the abundant infiltration of B-lineage cells, CD8 T cells, endothelial cells, fibroblasts, monocyte-lineage cells, myeloid dendritic cells, natural killer cells, and T cells (Supplementary Figs. 1I and 1J), indicating a closer association with immune activity.
3.3 Development and Validation of LUSC Immune and Metabolism-related Risk Model for Predicting Prognosis
Through model construction, we derived the formula based on gene expression levels and their coefficients (Table 1). The risk score was calculated as follows:
Table 1
Gene and coefficient of constructing model
Id | Coefficient | HR | HR.95L | HR.95H | P |
SERPIND1 | 0.152727378 | 1.208145351 | 1.086755421 | 1.343094464 | 0.000465454 |
LBP | 0.11868129 | 1.240198465 | 1.0904023 | 1.410573173 | 0.001046472 |
HBEGF | 0.193394236 | 1.163216977 | 1.015253579 | 1.33274461 | 0.029402358 |
NRTN | -0.132797244 | 0.816321497 | 0.695247582 | 0.958479833 | 0.013223633 |
TSLP | -0.190345011 | 0.828457243 | 0.704798259 | 0.973812571 | 0.022506822 |
RORB | -0.412926526 | 0.725818869 | 0.535955103 | 0.982942467 | 0.038343733 |
CYP2C18 | -0.133174677 | 0.853002358 | 0.753135214 | 0.966112072 | 0.012327665 |
PTGIS | 0.151833524 | 1.207013144 | 1.055986097 | 1.379640068 | 0.005803313 |
MIOX | -0.24277058 | 0.721446409 | 0.549405696 | 0.947359894 | 0.018822815 |
Risk score = ∑ (Gene expression × Coefficient)
To ensure robustness, we randomly divided the dataset once into a 70% training group and a 30% test group. Univariate Cox analysis of the training group revealed significant genes, which were further analysed using Lasso regression for feature selection. The resulting model (Figs. 2A and 2B) was cross-validated to determine optimal parameters, and the point with the minimum cross-validation error was found. We identified that 17 genes corresponded to this point. We used these genes to construct a prognostic Cox model and then optimized it to refine its formula. Survival differences between the high-risk and low-risk groups were evaluated using AUC values in both the training and test groups and in an external group derived from the GEO database. To obtain satisfactory results, we performed hundreds of random-data tests. The training and test groups did not significantly differ in any clinical trait (all P > 0.05), indicating that the grouping was not biased according to the clinical traits (Table 2).
Table 2
Covariates | Type | Total | Test | Train | P |
Age | <=65 | 189(38.18%) | 64(43.24%) | 125(36.02%) | 0.1758 |
> 65 | 300(60.61%) | 83(56.08%) | 217(62.54%) |
unknow | 6(1.21%) | 1(0.68%) | 5(1.44%) |
Gender | FEMALE | 129(26.06%) | 41(27.7%) | 88(25.36%) | 0.6659 |
MALE | 366(73.94%) | 107(72.3%) | 259(74.64%) |
Stage | Stage I | 242(48.89%) | 72(48.65%) | 170(48.99%) | 0.4402 |
Stage II | 159(32.12%) | 43(29.05%) | 116(33.43%) |
Stage III | 83(16.77%) | 30(20.27%) | 53(15.27%) |
Stage IV | 7(1.41%) | 3(2.03%) | 4(1.15%) |
unknow | 4(0.81%) | 0(0%) | 4(1.15%) |
T | T1 | 114(23.03%) | 33(22.3%) | 81(23.34%) | 0.6529 |
T2 | 288(58.18%) | 88(59.46%) | 200(57.64%) |
T3 | 70(14.14%) | 18(12.16%) | 52(14.99%) |
T4 | 23(4.65%) | 9(6.08%) | 14(4.03%) |
M | M0 | 407(82.22%) | 122(82.43%) | 285(82.13%) | 0.7483 |
M1 | 7(1.41%) | 3(2.03%) | 4(1.15%) |
unknow | 81(16.36%) | 23(15.54%) | 58(16.71%) |
N | N0 | 316(63.84%) | 90(60.81%) | 226(65.13%) | 0.3042 |
N1 | 128(25.86%) | 37(25%) | 91(26.22%) |
N2 | 40(8.08%) | 17(11.49%) | 23(6.63%) |
N3 | 5(1.01%) | 1(0.68%) | 4(1.15%) |
unknow | 6(1.21%) | 3(2.03%) | 3(0.86%) |
The coefficients of the nine identified genes were visualized according to the positive and negative symbols of each gene in the formula, with > 0 and < 0 presented in yellow and blue, respectively (Fig. 2C). The coefficients of four genes (HBEGF, SERPIND1, PTGIS, and LBP) were > 0, and the coefficients of five genes (NRTN, CYP2C18, TSLP, MIOX, and RORB) were < 0. In the four graphs obtained (Fig. 2D), the differences between survival rates in the high or low-risk groups in all TCGA data groups, the TCGA training group, the TCGA validation group, and the GEO validation group were all statistically significant (all P < 0.05). Additionally, the AUC values of 1, 3, and 5-year survival in all TCGA data groups, the TCGA training group, the TCGA validation group, and the GEO validation group were all > 0.5). These results show that the constructed model effectively distinguished the risk groups, indicating its predictive capability.
Risk curves (Fig. 2E) were generated using the “pheatmap” package in R, stratifying patients by risk score. As the patient’s risk value increased, the expression levels of HBEGF, SERPIND1, PTGIS, and LBP all increased, indicating that they were risk-associated genes. However, the expression levels of NRTN, CYP2C18, TSLP, MIOX, and RORB were reduced, indicating that they were protective genes.
We observed that in the two graphs (Fig. 3A), the P-value for survival probability in the high and low-risk groups in early-stage LUSC patients was < 0.001, indicating that the model we constructed was more suitable for early-stage LUSC samples than late-stage LUSC samples. Subsequent clinical treatment analysis showed that our model was more suitable for early-stage-LUSC samples.
3.4 Risk Scores Based on These Nine Genes Were Independent Prognostic Factors for LUSC Patients
We performed both univariate and multivariate Cox regression analysis to investigate the potential independence of the riskScore as a prognostic factor. The results indicated significant correlations between age, stage, and risk scores with the prognosis of LUSC patients (Fig. 3B, P < 0.05). The multivariate Cox regression analysis showed that age, stage, and risk scores could serve as independent prognostic factors for LUSC patients (Fig. 3B, P < 0.05).
The P-value for riskScores between Stage I and Stage II was < 0.05, indicating that the risk scores of patients in these two stages were different, whereas there was no difference among other stages (Fig. 3C). Similarly, the P-value for riskScores between N0 and N1 was < 0.05, indicating that the risk scores of patients in these two stages were different, whereas there was no difference among other stages. Additionally, the P-values for the riskScores of other clinical traits, including age, gender, T, and M, were all > 0.05, indicating no difference in the risk scores of these clinical traits. These findings highlight the substantial variation in riskScores among different clinical variable groups.
We constructed a nomogram incorporating the clinical factors related to survival as a quantitative method to predict the survival rate of LUSC patients (Fig. 3D). The accuracy of the nomogram was assessed using calibration curves (Fig. 3E) and the area under the ROC curve (Fig. 3F). The concordance index (C-index) was calculated as 0.689 (95% CI: 0.649 − 0.729). The nomogram demonstrated improved predictive accuracy (AUC = 0.743) compared to other clinical features and original risk scores. Additionally, the DCA results confirmed the better predictive accuracy of the nomogram compared to other predictive indexes (Fig. 3G).
3.5 Analysis of Immune Landscape in LUSC Based on Risk Scores
PDCD1, CTLA4, FAP, TAGLN, and LOXL2 showed positive correlations with patient risk scores (Figure S2A). Conversely, POLE2, FEN1, MCM6, POLD3, MSH6, and MSH2 exhibited negative correlations with patient risk scores.
T cells, CD8 T cells, Monocytic lineage, Endothelial cells, and Fibroblasts exhibited positive regulatory relationships with patient risk scores (Figure S2B). Additionally, according to the violin plot analysis, we observed differences in immune cell populations between high-risk and low-risk groups, including endothelial cells, fibroblasts, monocyte lineages, neutrophils, natural killer cells, T cells, and B lineages. These immune cells were all enriched in the high-risk groups. These results suggest a close relationship between riskScores and immune cells.
In the high-risk group, signalling pathways were significantly enriched, including complement and coagulation cascades, the intestinal immune network for IgA production, graft versus host disease, hematopoietic cell lineage, and cytokine-cytokine receptor interaction (Figure S2C). In contrast, the low-risk group was characterized by ascorbate and aldarate metabolism, pentose and glucuronate interconversions, and drug-metabolizing cytochrome P450, suggesting a stronger connection between the high-risk group and immunity and a stronger connection between the low-risk group and metabolism.
Figure S2D shows patient risk score, TMB, Pan-cancer Microsatellite Instability (MSI), and immune-cell names. From this graph, we noted that most immune cells exhibited a positive regulatory relationship with the patient risk score but a negative regulatory relationship with MSI, and the relationship between immune cells and TMB displayed a mixed pattern of positive and negative regulatory relationships.
We determined the optimal TMB cutoff as 2.55263157894737. According to this cutoff, we divided the samples into two groups based on high and low TMB. We observed a significant difference in survival between the high-TMB and low-TMB groups (P < 0.05, Figure S2E). Combining TMB with patient risk scores for survival analysis revealed a P-value < 0.05, indicating differences in survival rates among the four combinations of high-TMB or low-TMB and high-risk or low-risk (Figure S2F). These findings underscored the correlation of TMB with patient prognosis, highlighting the lowest survival rates in the low-TMB and high-risk groups.
We observed that the difference in TMB between the high-risk and low-risk groups was not significant (P > 0.05, Figure S2G). Furthermore, the correlation coefficient R was < 0, and the P-value of the correlation test was < 0.05, indicating a negative regulatory relationship between TMB and patient risk score (Figure S2H). This observation could have significant implications for the efficacy of immunotherapy, as higher TMB is known to correlate with a heightened response to immunotherapy.
3.6 Excellent Predictability of Our Model Compared with Other LUSC Prognostic Signatures
We compared our model with four previously reported models, named after their first authors HuXiaoshan[5], PuJiangtao[6], WangZhe[7], and ZhouYuting[8]. Among all the models, ours showed the highest AUC values (1 year, 0.653; 3 years, 0.706; and 5 years, 0.704) and C-index (0.655), indicating that our model outperformed the other models in predicting patient survival as a valuable tool in clinical practice (Figure S3A-C). The higher accuracy and robustness of riskScore, as supported by multiple evaluation metrics, indicate its significant contribution to LUSC prognosis prediction.
3.7 Evaluation of the Immunotherapy Response Based on Risk Scores
The P-value of the first two scores was < 0.05 (Fig. 4A), indicating differences in Immune cell Proportion Score (IPS) for two different immunotherapies between the high-risk and low-risk groups. Additionally, we observed that the IPS of samples in the high-risk group was higher than that in the low-risk group, suggesting a more favourable response to a certain immunotherapy in the high-risk group. We predicted different optimal immunotherapies for high-risk and low-risk patients. Patients in the high-risk group who received anti-PD-1 alone or a combination of anti-CTLA4 and anti-PD1 had better outcomes than those in the low-risk group.
According to the TIDE algorithm (Fig. 4B), the low-risk group exhibited a lower TIDE score than the high-risk group, suggesting that patients with a low riskScore benefit more from immunotherapy. Additionally, the microsatellite instability score was higher in the low-risk group, whereas the T-cell dysfunction score and Tumour Inflammation Signature (TIS) score were higher in the high-risk group.
Furthermore, the predictive value of riskScore for OS was assessed using ROC curves. We found that the AUC of riskScore was superior to that of TIDE or TIS, indicating that riskScore had a better predictive value for OS than TIDE or TIS (Fig. 4C).
3.8 Drug Sensitivity and Chemotherapy Drug Target Provide Strong Support for the Basis of Chemotherapy
For Ribociclib, SCH772984, Selumetinib, and SB505124 (Figure S4), the high-risk group exhibited lower drug-sensitivity scores. Conversely, for other drugs, the low-risk group exhibited lower drug sensitivity scores.
We found that targets of drugs such as tirrellizumab (targeting PDCD1), pabolizumab (targeting PDCD1), nabuliumab (targeting PDCD1), sindillizumab (targeting PDCD1), ipilimumab (targeting CTLA4), bevacizumab (targeting C1QA, C1QB, C1QC, FCGR3A, FCGR1A, FCGR2A, FCGR2B, and FCGR2C), androtinib (targeting KDR, PDGFRB, FGFR3, and KIT), and crizotinib (targeting ROS1 and MST1R), were higher in the high-risk group than in the low-risk group. Conversely, the expression levels of paclitaxel targets (TUBB and BCL2), gemcitabine target (TYMS), vinorelbine target (TUBB), and etoposide targets (TOP2A and TOP2B) were higher in the low-risk group (Fig. 4D).
3.9 Better Radiotherapy Effects in the Low-risk Group
The RSI score of the low-risk group was lower than that of the high-risk group, indicating that the low-risk group is expected to benefit more from radiotherapy than the high-risk group (Fig. 4E).
3.10 Genetic Variations of Nine Genes May Influence the Prognosis of Patients with LUSC
Among the 1256 LUSC patients (Fig. 5A), SERPIND1 had seven missense-mutation cases, 48 amplification cases, and 10 deep-deletion cases. LBP had 11 missense-mutation cases, two truncating-mutation cases, one splice-mutation case, and 26 amplification cases. HBEGF had three missense-mutation cases, two amplification cases, and four deep-deletion cases. NRTN had five amplification cases and one deep-deletion case. TSLP had one missense-mutation case, one amplification case, and six deep-deletion cases. RORB had eight missense-mutation cases, three truncating-mutation cases, three amplification cases, and nine deep-deletion cases. CYP2C18 had 10 missense-mutation cases, one splice-mutation case, two amplification cases, and seven deep-deletion cases. PTGIS had five missense-mutation cases, 14 amplification cases, and two deep-deletion cases. MIOX had four missense-mutation cases, one splice-mutation case, 10 amplification cases, and 13 deep-deletion cases. Deep deletion and amplification were the most common genetic alterations observed in these genes. SERPIND1 and LBP exhibited the highest genetic-variation frequencies among the genes studied.
All the nine genes showed mutations in the functional domains of their proteins (Fig. 5B). These mutations may impact the prognosis of LUSC patients (Fig. 5C).
TP53, TTN, CSMD3, MUC16, LRP1B, SYNE1, ZFHX4, FAM135B, KMT2D, SPTA1, PKHD1, PCLO, and PAPPA2 were more frequently mutated in the low-risk group than in the high-risk group. In contrast, RYR2, USH2A, NAV3, XIRP2, and RYR3 were more frequently mutated in the high-risk group than in the low-risk group. The mutation frequencies of CDH10 and PCDH15 were the same in both the high-risk and low-risk groups (Figs. 5D).
3.11 Airway Transmission and T Stage Independently Influence the Prognosis of Patients with Stage-II or Stage-III LUSC
For continuous variables (6 items in total, namely Age, Smoking index, Tumour volume, Smoking time, Smoking cessation time, and Smoking quantity), X-tile analysis identified the respective cutoff points (Age cutoff = 30, Smoking index cutoff = 370, Tumour volume cutoff = 88, Smoking time cutoff = 30, Smoking cessation time cutoff = 1, Smoking quantity cutoff = 10). These continuous variables were then transformed into categorical variables based on these cutoffs, and univariate Cox analysis was performed. The results revealed no statistical significance (all P > 0.05, Table 3).
Table 3
Univariate analysis |
Covariates | HR (95% CI) | P |
Age | 0.948 (0.683–1.317) | 0.752 |
Gender | 1.158 (0.636–2.106) | 0.632 |
Stage | | 0.005** |
Tumor volume(cm4) | | 0.202 |
Histological grading | | 0.723 |
Histological type | | 0.003** |
With necrosis | 0.795(0.488–1.293) | 0.355 |
Peripheral invasion | 0.981(0.627–1.535) | 0.935 |
Peripheral lesion | 2.480(1.010–6.088) | 0.048* |
Intravascular cancer thrombus | 1.004(0.612–1.648) | 0.987 |
Nerve invasion | 0.844(0.413–1.726) | 0.643 |
Pleural invasion | 1.812(0.728–4.510) | 0.202 |
Airway dissemination | 5.513(2.332–13.032) | 0.000*** |
Lymph node metastases | 1.122(0.803–1.567) | 0.500 |
T | | 0.001** |
N | | 0.000*** |
Tumor localization | 0.971(0.702–1.342) | 0.857 |
PDL1 Expression | | 0.000*** |
Smoking status | | 0.039* |
Recent smoking status | 1.509(1.081–2.107) | 0.016* |
Smoking time | | 0.815 |
Smoking quantity | | 0.844 |
Smoking index | 1.153(0.800-1.661) | 0.445 |
Smoking cessation time | | 0.092 |
However, among the categorical variables analysed (a total of 18, namely Stage, T, N, Gender, Smoking status, Recent smoking status, Tumour localization, Histological grade, Histological type, With necrosis, PDL1 expression, Peripheral invasion, Peripheral lesion, Intravascular cancer thrombus, Nerve invasion, Pleural invasion, Airway dissemination, and Lymph node metastasis), Stage, Histological type, Peripheral lesion, Airway dissemination, T, N, PDL1 expression, Smoking status, and Recent smoking status (nine items in total) indicated statistical significance (all P < 0.05, Table 3).
The Omnibus test for the model coefficients yielded P-values < 0.001, indicating that the variables included in the multifactor analysis were statistically significant. The P-values for airway dissemination (Fig. 6A) and T stage (Fig. 6B) were < 0.05, signifying their statistical significance (Table 4). These findings suggested that airway dissemination and T stage independently influence the prognosis of patients with stage-II or stage-III LUSC. For example, the risk of death in patients without airway dissemination was only 16.7% of those with airway dissemination (HR, 0.167; 95% CI, 0.064–0.435).
Table 4
Multivariate analysis |
Covariates | B | SE | Wald | Df. | P | HR | HR (95.0%) CI |
Low | High |
Stage | | | 4.020 | 3 | 0.259 | | | |
ⅡA | -2.391 | 2.373 | 1.015 | 1 | 0.314 | 0.092 | 0.001 | 9.592 |
ⅡB | -1.795 | 1.622 | 1.225 | 1 | 0.268 | 0.166 | 0.007 | 3.991 |
ⅢA | -1.423 | 1.010 | 1.987 | 1 | 0.159 | 0.241 | 0.033 | 1.744 |
ⅢB | 0 | | | | | | | |
Peripheral lesion | -0.923 | 0.516 | 3.204 | 1 | 0.073 | 0.397 | 0.145 | 1.092 |
Airway Dissemination | -1.792 | 0.490 | 13.388 | 1 | 0.000 *** | 0.167 | 0.064 | 0.435 |
T | | | 10.319 | 3 | 0.016 ** | | | |
T1 | 0.317 | 1.270 | 0.062 | 1 | 0.803 | 1.373 | 0.114 | 16.527 |
T2 | -0.146 | 1.304 | 0.013 | 1 | 0.911 | 0.864 | 0.067 | 11.123 |
T3 | -0.756 | 0.594 | 1.622 | 1 | 0.203 | 0.469 | 0.147 | 1.503 |
T4 | 0 | | | | | | | |
N | | | 7.063 | 3 | 0.070 | | | |
N0 | -1.129 | 2.592 | 0.190 | 1 | 0.663 | 0.323 | 0.002 | 52.001 |
N1 | -1.749 | 2.034 | 0.739 | 1 | 0.390 | 0.174 | 0.003 | 9.374 |
N2 | -2.195 | 1.466 | 2.242 | 1 | 0.134 | 0.111 | 0.006 | 1.970 |
N3 | 0 | | | | | | | |
Smoking status | | | 5.223 | 2 | 0.073 | | | |
Ever | -0.265 | 0.314 | 0.712 | 1 | 0.399 | 0.767 | 0.414 | 1.420 |
Still | 0.209 | 0.286 | 0.533 | 1 | 0.465 | 1.232 | 0.704 | 2.157 |
Never | 0 | | | | | | | |
Recent Smoking status | | | | 0 | | | | |
Moreover, we determined that the mean age of onset was 64.58 (± 7.131, SD) for patients with stage II–III LUSC (Fig. 6C). The age distribution of patients followed a normal curve, and the QQ plot confirmed normality (Fig. 6D).
3.12 SERPIND1, LBP, TSLP, and MIOX Were Higher in Tumour Tissues than in the Adjacent Parenchyma
SERPIND1 showed expression in the cytoplasm and cell membrane, LBP exhibited nuclear expression, TSLP displayed expression in the cell membrane and cytoplasm, and MIOX demonstrated cytoplasmic expression (Fig. 6E). The expression levels of these genes were higher in the tumour tissues than in the adjacent parenchyma. Among these genes, TSLP had the highest level in the tumour tissues (83.3%), whereas LBP had the lowest (2.4%).
We initially weighted the frequency by case and conducted a chi-square test on the 2 × 4 contingency table (Table 5) for multiple independent samples. We selected the Pearson chi-square test results where all theoretical frequencies were ≥ 5, yielding χ2 = 63.586 with 3 degrees of freedom, and P < 0.001. Additionally, we identified statistically significant differences in IHC results between SERPIND1 and LBP, SERPIND1 and TSLP, MIOX and LBP, and MIOX and TSLP.
Table 5
| | Antibody (100%) | Total | χ2 | p |
SERPIND1 | LBP | TSLP | MIOX |
IHC Results | (-) | 32(76.2%) a | 41 (97.6%) b | 7(16.7%) c | 22(52.4%) a | 102(60.7%) | 63.586 | 0.000 *** |
(+) | 10(23.8%) a | 1(2.4%) b | 35(83.3%) c | 20(47.6%) a | 66(39.3%) |
*** p < 0.001,df = 3。Each subscript letter indicates a subset of the Antibody class whose column proportions do not differ significantly from each other at the 0.05 level. |