Characteristics of patients
The analysis workflow is shown in the flow chart (Fig. 1). A total of 930 lung squamous cell cancer (LUSC) patients from TCGA and GEO data sets were enrolled in this study. The 458 LUSC patients from the GEO data sets were randomly divided into a training cohort and a testing cohort. The demographic and clinical features of patients in the training, testing, and validation cohorts are listed in Table 1. There was no significant difference in clinical features between the training and testing cohorts.
Construction of the IRGPI
A total of 1065 immune-related genes (IRGs) were detected by the platforms used in this study. From these, 566580 immune-related gene pairs (IRGPs) were constructed, and IRGPs with constant values (0 or 1) in each data set were excluded. The log-rank test was used to assess the association between the remaining IRGPs and the overall survival (OS) of patients. To avoid over-fitting of the prediction model, LASSO regression was performed to screen the top 14 OS-related IRGPs (Figs. 2a, b). The final 13 IRGPs and their LASSO coefficients are shown in Table 2. The 13 IRGPs, which contained 25 unique immune-related genes, were used to construct the IRGPI via L1-penalized Cox proportional hazards regression in the training cohort. Based on time-dependent ROC curve analysis, the optimal cutoff for IRGPI was 1.31 (Supplementary Fig. 1). The risk curve and scatterplot were used to show the IRGPI and vital status of each patient with LUSC. Patients in the high-risk group had higher mortality than patients in the low-risk group (Figs. 2c, d). Survival curves of the low- and high-risk groups were estimated using the Kaplan–Meier method and compared using the log-rank test (Fig. 2e). In the training cohort, the AUC of time-dependent ROC curves at 1, 3, and 5 years was 0.744, 0.775, and 0.803, respectively (Fig. 2f).
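The pair construction and the removal of constant pairs can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the comparison rule (a pair scores 1 when the first gene is expressed at a lower level than the second) is an assumption, because this section only states that pair values are 0 or 1, and the gene names and expression values are invented toy data.

```python
from itertools import combinations

import numpy as np


def build_gene_pairs(expr, genes):
    """Turn a genes-by-samples expression matrix into binary gene-pair features.

    Assumed rule (not stated in this section): a pair (A, B) scores 1 in a
    sample when expression of A is lower than expression of B, otherwise 0.
    """
    names, rows = [], []
    for i, j in combinations(range(len(genes)), 2):
        names.append(f"{genes[i]}|{genes[j]}")
        rows.append((expr[i] < expr[j]).astype(int))
    return names, np.array(rows)


def drop_constant_pairs(names, pairs):
    """Remove pairs that are all 0 or all 1 across the samples of a data set."""
    keep = pairs.min(axis=1) != pairs.max(axis=1)
    return [n for n, k in zip(names, keep) if k], pairs[keep]


# Toy data: 4 hypothetical genes (rows) measured in 3 samples (columns).
genes = ["G1", "G2", "G3", "G4"]
expr = np.array([
    [1.0, 5.0, 2.0],
    [2.0, 3.0, 4.0],
    [9.0, 9.0, 9.0],
    [0.5, 0.2, 0.1],
])
names, pairs = build_gene_pairs(expr, genes)
kept_names, kept_pairs = drop_constant_pairs(names, pairs)
```

With n genes this enumerates n(n-1)/2 pairs, so 1065 genes give 1065 × 1064 / 2 = 566580 pairs, which matches the count reported above. Because each feature depends only on the within-sample ordering of two genes, it does not require expression values to be on a common scale across platforms.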
Validation of IRGPI as an independent prognostic factor for LUSC
The same formula was used to calculate the IRGPI of patients in the testing and validation cohorts. Risk curves and scatterplots were used to show the IRGPI and vital status of each patient with LUSC in the testing and validation cohorts. Patients classified as high risk by IRGPI had higher mortality than low-risk patients in the testing and validation cohorts (Figs. 3a-d). Kaplan–Meier curves showed that the IRGPI stratified patients with LUSC into different prognostic groups in the testing and validation cohorts (Figs. 3e, g). In the testing cohort, the AUC of time-dependent ROC curves at 1, 3, and 5 years was 0.601, 0.648, and 0.691, respectively (Fig. 3f). In the validation cohort, the AUC of time-dependent ROC curves at 1, 3, and 5 years was 0.635, 0.698, and 0.677, respectively (Fig. 3h). A multivariate Cox proportional hazards regression model was fitted with the variables that were significantly associated with overall survival in univariate analysis (Table 3). Multivariate analysis showed that the hazard of death in the high-risk group was 3.4 times that of the low-risk group (HR=3.40; 95%CI [2.34-4.94]; p<0.001) in the training cohort. High risk based on IRGPI was an independent risk factor for poor prognosis of patients with LUSC in the testing cohort (HR=2.11; 95%CI [1.48-3.01]; p<0.001) and the validation cohort (HR=1.99; 95%CI [1.5-2.63]; p<0.001).
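Applying a fixed formula and cutoff to a new cohort can be sketched as below. The sketch assumes the IRGPI is the linear predictor of the Cox model, that is, a coefficient-weighted sum of the binary pair values, and that a score above the cutoff means high risk; neither detail is spelled out in this section. The coefficients and patients are hypothetical, and the real coefficients are those in Table 2.

```python
import numpy as np


def irgpi_score(pair_values, coefficients):
    """Weighted sum of binary pair values, one score per patient.

    pair_values: patients-by-pairs array of 0/1 values.
    coefficients: one coefficient per pair (hypothetical values below).
    """
    return np.asarray(pair_values) @ np.asarray(coefficients)


def risk_group(scores, cutoff=1.31):
    """Label patients; treating score > cutoff as high risk is an assumption."""
    return np.where(np.asarray(scores) > cutoff, "high", "low")


# Hypothetical coefficients for 3 pairs, not the values from Table 2.
coefs = np.array([0.5, 1.0, -0.25])
cohort = np.array([
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
])
scores = irgpi_score(cohort, coefs)
groups = risk_group(scores)
```

Keeping the coefficients and the cutoff frozen after training is what makes the testing and validation results an external check rather than a refit.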
Immune infiltration related to IRGPI
CIBERSORT was used to assess the infiltration of tumor-infiltrating immune cells (TIICs) in the low- and high-risk groups based on IRGPI in the TCGA validation cohort (Fig. 4). The infiltration of naïve B cells, plasma cells, CD8+ T cells, activated memory CD4+ T cells, gamma delta (γδ) T cells, M1 macrophages, and activated dendritic cells was lower in the high-risk group than in the low-risk group (p=0.002, p=0.004, p=0.027, p=0.024, p=0.004, p=0.008, p=0.001, respectively). Patients in the high-risk group had higher proportions of neutrophils, activated mast cells, and monocytes than patients in the low-risk group (p<0.001, p=0.002, p=0.043, respectively).
Comparison of biomarkers for LUSC
We summarized current biomarkers for LUSC and compared them in Table 4. Li et al. constructed a model with 6 lncRNAs from the TCGA LUSC cohort, and the area under the curve (AUC) of the 6-lncRNA signature for 3-year survival was 0.672 in the training cohort [11]. Hu et al. constructed a 3-lncRNA signature for LUSC, and the AUC of this model for 3-year survival was 0.629 in the training cohort [12]. Qi et al. identified 12 miRNAs closely related to the overall survival of patients with LUSC [13]. Yang et al. identified the diagnostic role of miRNA-486-5p in the TCGA LUSC cohort [25]. Li et al. found that 60 genes were statistically related to the overall survival rate in LUSC patients [10]. Shi et al. identified 6 methylation biomarkers for LUSC diagnosis [26]. Li et al. created prognostic predictors based on alternative splicing events for NSCLC patients, and the AUC of the prognostic predictor was over 0.8 in the TCGA LUSC cohort [9]. Choi et al. found that MLL2 mutations predicted poor prognosis in both TP53 mutant and wild-type LUSC [8]. Gao et al. identified a prognostic model containing 5 genes, and the AUC of the model for predicting survival at 1, 3, and 5 years was 0.692, 0.722, and 0.651, respectively [27]. Compared with other prognostic models for LUSC, IRGPI showed a robust ability to predict the overall survival of patients with LUSC despite using gene expression profiles from different sequencing platforms.
Combination of IRGPI and clinical characteristics
Since the IRGPI and tumor stage were independent risk factors for poor prognosis of patients with LUSC in all the cohorts, IRGPI and tumor stage were combined to fit a Cox proportional hazards regression model in the training cohort. A new score, the immune clinical prognostic index (ICPI), was built (ICPI=0.2709218*IRGPI + 0.2888875*stage). The optimal cutoff of ICPI for stratifying patients was determined to be 1.05 based on time-dependent ROC curve analysis in the derivation data set (Supplementary Fig. 2). Compared with IRGPI in continuous form, the continuous form of ICPI improved the prognostic accuracy for overall survival in the testing cohort (C-index=0.63, 95%CI [0.58-0.68] vs C-index=0.68, 95%CI [0.63-0.73]; p=0.013) and the validation cohort (C-index=0.62, 95%CI [0.58-0.66] vs C-index=0.66, 95%CI [0.62-0.70]; p=0.023) (Table 5).
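As a worked illustration of the ICPI formula, the short sketch below computes the score for two hypothetical patients and applies the 1.05 cutoff. It assumes that tumor stage is coded as an integer (for example stage I = 1, stage II = 2) and that a score above the cutoff is labelled high risk; the text does not state either convention, and the patient values are invented.

```python
def icpi(irgpi, stage):
    """ICPI as given in the text; `stage` is assumed to be coded numerically."""
    return 0.2709218 * irgpi + 0.2888875 * stage


def icpi_group(score, cutoff=1.05):
    """Treating score > cutoff as high risk is an assumption."""
    return "high" if score > cutoff else "low"


# Hypothetical patients given as (IRGPI, numeric stage).
patients = [(1.0, 1), (2.0, 3)]
icpi_scores = [icpi(irgpi, stage) for irgpi, stage in patients]
icpi_groups = [icpi_group(score) for score in icpi_scores]
```

For the first patient, ICPI = 0.2709218 × 1.0 + 0.2888875 × 1 ≈ 0.56, which falls below the cutoff; for the second, ICPI = 0.2709218 × 2.0 + 0.2888875 × 3 ≈ 1.41, which falls above it.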