Identification of m6ARLncRNAs in Patients with NSCLC
At first, we downloaded transcriptome information and clinical data from TCGA and GEO datasets and identified 14056 and 3811 lncRNAs, respectively. Thereafter, we extracted the expression levels of 23 m6A regulators from previously published articles. The Pearson correlation analysis was performed based on the expression level of lncRNAs and m6A regulators to identify the potential m6ARLncRNAs with the | Pearson R| > 0.4 and P < 0.001 in the TCGA dataset. As a result, 1651 m6ARLncRNAs were identified. Consequently, a total of 491 m6ARLncRNAs were selected by taking the intersection of GEO and TCGA datasets. Afterward, univariate Cox regression was performed to screen prognostic m6ARLncRNAs, and eventually, 41 m6ARLncRNAs were found to bear a significant association with the overall survival of NSCLC patients (P < 0.05, Fig. S1a). The correlations between the m6ARLncRNAs and 23 m6A regulators in the TCGA dataset are illustrated in Fig. 1b and Supplementary Table 1. Meanwhile, the difference in m6ARLncRNAs between NSCLC and normal tissues was depicted in Fig. S1b.
Identification of distinct molecular patterns mediated by 41 m6ARLncRNAs
To understand the effect of m6ARLncRNAs on the development of NSCLC, consensus unsupervised clustering analysis was conducted in the TCGA dataset based on the expression level of m6ARLncRNAs obtained from univariate Cox regression analysis. The plot showed the relative change in area under the cumulative distribution function (CDF) curve in the k-means (2 to 9) unsupervised clustering of NSCLC (Fig. S2a-b). We also showed the tracking plots of subgroups for k = 2 to k = 9 (Fig. S2c). Lastly, two m6ARLncRNAs-associated clusters were determined and were dubbed cluster 1 (n=705) and cluster 2 (n=258), respectively (Fig. S2d).
To further examine the clinical characteristics of subgroups identified by consensus clustering, we plotted a heatmap, which showed that the expression of m6ARLncRNAs varied and cluster 2 patients had relatively higher expression in most m6ARLncRNAs, such as AC011379.2, AC135050.6, PSMA3-AS1, and AL359921.1. Moreover, the results revealed significant differences in the clinical N stage, age, and gender among clusters (Fig. 2a). To put it simply, cluster 2 was significantly correlated with female sex (P < 0.001), lower N stage (P < 0.01), and younger age (P < 0.05) when compared to cluster 1, as shown in Fig. 2a. In addition, patients with cluster 2 had a longer OS than their counterparts with cluster 1 (P = 0.024) (Fig. 2b). Our findings suggested that the clustering subgroups were closely correlated to the clinical features and clinical outcomes of NSCLC patients.
We investigated the level of immune cell infiltration between the two clusters. The results showed that cluster 1 had a higher infiltration level of neutrophils, macrophages, and activated memory CD4 T cells (Fig. 2c and Fig. S3a-c). Conversely, cluster 2 was more correlated with the infiltration of monocytes, regulatory T cells, naive B cells, and activated NK cells (Fig. S3d-g).
Construction and validation of prognostic m6ARLncSig
The aforementioned results showed that the m6ARLncRNAs could divide patients into two distinct molecular clustering patterns, which were indicative of two significantly different outcomes in terms of survival. To further explore the prognostic implication of m6ARLncRNAs in NSCLC, the 963 patients were randomly assigned into a training dataset (N=483) and a testing dataset (N=480) patients at a 1:1 ratio. LASSO Cox regression analysis and multivariate Cox proportional hazard regression model for those 41 m6ARLncRNAs were conducted to further select a robust and effective risk model for prognosis prediction (Fig. 3a-b). As a result, twelve of 41 m6ARLncRNAs were chosen to construct a prognostic m6ARLncRNAs signature (m6ARLncSig) (Fig. 3c). On the basis of the expression levels of 12 m6ARLncRNAs and their coefficients using the following computational formula: m6ARLncSig score = (0.177 × Expression AC024060.2) + (0.308 × Expression LINC01138) + (0.455 × Expression AL034550.1)+ (0.577 × Expression AP001347.1) + (-0.947 × Expression SEPSECS-AS1) + (-0.895 × Expression ITGA9-AS1) + (-0.832 × Expression TSPOAP1-AS1) + ( -0.466 × Expression AL021328.1) + (-0.240 × Expression AC083843.2) + (-0.220 × Expression AL137003.1) + (-0.069 × Expression SNHG30) + (-0.056 × Expression SNHG12), we calculated the risk score of m6ARLncSig in the TCGA training and testing dataset (Fig. 3c and Table S2). In the m6ARLncSig model, four m6ARLncRNAs (AC024060.2, LINC01138, AL034550.1, and AP001347.1) were identified to be high-risk factors, with a value of hazard ratio (HR) more than 1, which is indicative of a poor NSCLC prognosis. On the other hand, eight lncRNAs (SNHG12, ITGA9-AS1, AC083843.2, TSPOAP1-AS1, SNHG30, AL021328.1, AL137003.1, and SEPSECS-AS1) with an HR less than 1 were found to be protective factors, indicating a better survival relevance of their upregulated expression.
Subsequently, with the median risk score as the cut-off, patients in the training and testing dataset were divided into a high- and low-risk group. The distribution of the risk scores, OS, OS status, and expression profiles of the m6ARLncRNAs in the TCGA training and test set was given in Fig. 4a-b. To assess the prognostic value of m6ARLncSig in NSCLC patients, the survival analysis was performed in the high-risk group (N=242) and the low-risk group (N=241) in the training dataset, whose median score was 1.070. The result revealed that patients in the low-risk group had more favorable outcomes than those in the high-risk group (P < 0.001, Fig. 4c) and the AUC of the ROC curves in the training dataset was 0.694 for the 1-year OS prediction (Fig. 4d). To verify the prognostic m6ARLncSig value of the risk score for survival prediction, we evaluated survival conditions in the testing dataset and the median risk score in the testing dataset was 1.125. Our findings confirmed that the clinical outcome of patients in the low-risk group (N=240) was better than that of patients in the high-risk group (N=240) in the testing dataset (P=0.030, Fig. 4e), with an AUC value of 0.631 (Fig. 4f). Collectively, our results suggest that m6ARLncSig promises to be a good survival predictor.
External and experimental validation of the prognostic significance of m6ARLncSig
To further determine the prognostic significance of the m6ARLncSig, we retrieved three independent datasets, including GSE37745, GSE31210, and GSE30219, from the GPL570 microarray platform in GEO datasets to conduct external validation of m6ARLncSig. According to the above-mentioned formula, Kaplan-Meier survival analysis indicated that all the patients in the low-risk group had a significantly longer survival time when compared with the high-risk group in GSE31210 (P < 0.001), GSE37745 (P = 0.002), and GSE30219 (P < 0.001) and AUC of three datasets was 0.881, 0.700, and 0.643, respectively (Fig. 5a-f). We also conducted qRT-PCR verification for twelve m6ARlncRNAs with our NSCLC tissue samples to validate the prognostic significance. The Kaplan-Meier analysis showed that patients in the low-risk group had better survival outcomes than patients in the high-risk group (P = 0.012, Fig. 5g). The AUC of the ROC curves of m6ARLncSig was 0.763 for the three-year survival prediction of patients (Fig. 5h). Taken together, these results further validated that m6ARLncSig had high validity for survival prediction in NSCLC.
Correlation between m6ARLncSig and clinical parameters
Then, we tried to find a correlation between m6ARLncSig and clinical features. Our results showed that m6ARLncSig was significantly correlated with age, tumor stage, and molecular clustering patterns (P < 0.05, Fig. 6a). Briefly, patients in the advanced-age group tended to have significantly higher m6ARLncSig scores than those in the young-patient group (P = 0.016, Fig. 6b). Patients at the advanced tumor stage had significantly higher m6ARLncSig scores than those at the early stage (P = 0.001, Fig. 6c). Moreover, patients in cluster 1 had significantly higher m6ARLncSig scores than patients in cluster 2 (P < 0.001, Fig. 6d).
Gene set enrichment analysis (GSEA) of the Two m6ARLncSig subgroups
To understand the regulatory mechanisms, GSEA was employed to conduct the signaling pathway enrichment of the two m6ARLncSig subgroups (high- and low-risk groups) in the TCGA dataset. The results revealed that pathways such as ECM receptor interaction (NES = 2.143, normalized P = 0.002), focal adhesion (NES = 2.052, normalized P = 0.002), adherence junction (NES = 1.605, normalized P = 0.032) were enriched in high m6ARLncSig group (P < 0.05, Fig. S4). These findings indicated that m6ARLncRNAs might participate in the progression of NSCLC via the above pathways, which partially explains the poor survival in the high score group and might help us gain insight into the implication of m6ARLncSig.
Assessment of independent prognostic significance of risk score and clinical stratification analysis
To explore whether m6ARLncSig score, as a prognostic factor, is independent of other clinical features, univariate and multivariate Cox regression analyses were conducted on four variables: age, gender, tumor stage, and m6ARLncSig (Table 2). The univariate Cox analysis revealed that m6ARLncSig (P < 0.001, HR = 1.260, 95% CI = 1.195-1.328) was significantly correlated with the OS in the training dataset. Multivariate Cox analysis further indicated that m6ARLncSig (P < 0.001, HR = 1.222, 95% CI = 1.156-1.291) was an independent risk factor of OS. These results were also validated in the testing dataset (P < 0.001, HR = 1.212, 95% CI = 1.103-1.332), GSE37745 (P < 0.001, HR = 1.626, 95% CI = 1.252-2.113), GSE31210 (P =0.001, HR = 1.006, 95% CI = 1.002-1.009), and GSE30219 (P = 0.006, HR = 1.396, 95% CI = 1.101-1.769) dataset (Table 2).
To assess whether m6ARLncSig is still of prognostic value in various subgroups, we conducted a clinical stratification analysis. Kaplan-Meier analysis showed that patients with high-risk m6ARLncSig scores exhibited worse prognosis compared with those with low-risk m6ARLncSig scores across all six subgroups, i.e., young-patient group (P = 0.002, Fig. 7a), old-patient group (P < 0.001, Fig. 7b), early-stage group (P < 0.001, Fig. 7c), late-stage group (P = 0.019, Fig. 7d), male-patient group (P < 0.001, Fig. 7e), and female-patient group (P = 0.004, Fig. 7f). Collectively, these results indicated that m6ARLncSig was not only strongly associated with clinical outcomes and clinical features, but also was an independent predictor for the prognosis of NSCLC.
Relationship between m6ARLncSig and tumor immunity and therapeutic sensitivity
To know the relationship between the m6ARLncSig and tumor immune microenvironment, we used the CIBERSORT algorithm to estimate the difference in tumor-infiltrating immune cells between high- and low-risk groups. The Wilcoxon signed-rank test showed that the m6ARLncSig score was positively correlated with the infiltration level of the neutrophils (P < 0.001, Fig. 8a), macrophages M0 (P < 0.001, Fig. 8b) and activated memory CD4 T cells (P = 0.049, Fig. 8c). And a significant negative correlation was observed between the m6ARLncSig score and the infiltration level of activated mast cells (P = 0.022, Fig. 8d), monocytes (P =0.02, Fig. 8e), and naive B cells (P =0.022, Fig. 8f). We also conducted a correlation analysis between immune cell subpopulations and related functions by using ssGSEA of TCGA-NSCLC. We found that T cell functions, including APC_co_stimulation, CCR, MHC_class_I, and parainflammation exhibited significant differences between the high-risk and low-risk groups (Fig. 8g). We further found significant differences in the expression at all the critical immune checkpoints between the two groups (P<0.05, Fig. 8h). Overall, these results confirmed that m6ARLncSig was associated with the immune microenvironment, immune response, and immune function.
Then, we examined if there existed an association between m6ARLncSig level and radiotherapeutic response and chemotherapeutic sensitivity in the treatment of NSCLC. The result showed that most patients with high m6ARLncSig did not respond or responded poorly to radiotherapy as compared to the low-risk group (P = 0.017, Fig. 9a). We further looked into the association between m6ARLncSig and half inhibitory concentration (IC50) of chemotherapeutics. m6ARLncSig was positively associated with half inhibitory concentration (IC50) of chemotherapeutics, such as lenalidomide and methotrexate (P < 0.001, Fig. 9b-c). On the other hand, the m6ARLncSig score was negatively correlated with IC50 of gefitinib, gemcitabine, paclitaxel, and docetaxel (P < 0.05, Fig. 9d-g). These findings indicated that the m6ARLncSig was intimately correlated with the radiotherapeutic response and chemotherapeutic sensitivity, suggesting that m6ARlncRNAs are involved in the mechanisms of radiotherapeutic and chemotherapeutic responses and might, to some extent, dictate the prognosis of NSCLC patients.
Construction and validation of a nomogram for survival prediction of NSCLC
To simplify the model for the clinical prediction of OS in NSCLC patients, we established a nomogram scoring system in the training dataset (Fig. 10a). Our results showed that the nomogram had an improved performance of OS prediction with a C-index of 0.66 and AUCs of ROC for 3-, and 5-year survival predictions of 0.715, 0.694, respectively (Fig. 10b). As expected, our findings were validated in the testing and the entire TCGA dataset with a C-index of 0.671 and 0.666, respectively. The AUCs of ROC for 3-, and 5-year survival predictions were 0.695, 0.636 in the testing dataset and 0.698, 0.666 in the entire TCGA dataset (Fig. 10c-d). Calibration plots showed that consistency was excellent between the observed and predicted values for the prediction of 3- and 5-year OS in the testing, training, and the entire TCGA dataset (Fig. 10e-j). Therefore, these findings indicated that the nomogram possessed a good prospect of clinical application for prognosis evaluation and treatment planning of NSCLC.