Cohort information and patient characteristic
In the present study, LGG cohort from TCGA was chosen as the training set. In total, 476 patients with survival > 30 days were included to screen prognosis-associated immune-related lncRNAs and construct prognostic model. However, 365 patients with compete clinical characteristics were used to perform multivariate Cox regression analysis to identify the independent variables. These clinical characteristics included age, gender, grade, chemotherapy, radiotherapy, status of IDH, and status of chromosome 1p and 19q (1p19q). As for validation sets, there were two LGG cohorts in CGGA which were termed cohort 1 and cohort 2 here. There were 125 patients eligible for survival analysis and multiple Cox regression analysis in cohort 1. In cohort 2, 201 patients with complete clinical characteristics were included to perform the forementioned analyses. The cohort information and patient characteristics of three cohorts are detailed in Table 1.
Construction of the Hypoxia-related LncRNA Prognostic Signature
In total, 980 hypoxia-related lncRNAs were identified in training set. Meanwhile, 1046 and 1063 lncRNAs were identified in cohort 1 and cohort 2, respectively. Thus, 241 hypoxia-related lncRNAs were intersected in three cohorts (Figure 1A). Subsequently, 120 prognosis-specific hypoxia-related lncRNAs were screened using univariate Cox regression analysis. To establish a hypoxia-related lncRNA prognostic signature, LASSO method was utilized on the 120 prognosis-related lncRNAs. Ultimately, 14 lncRNAs were selected to construct the prognostic signature (Figure 1B). The formula calculating risk score of patients was as follow: risk score = MYLK-AS1 × (-0.0194) + FAM181A-AS1 × 0.1775 + H19 × 0.0185 + DGCR9 × (-0.0192) + LINC00641× (-0.0072) + CRNDE × 0.1336 + FAM66C × (-0.01850) + SLC25A21-AS1 × (-0.0148) + SNHG18 × 0.1931 + NR2F1-AS1 × (-0.0118) + WAC-AS1 × (-0.1600) + PAXIP1-AS2 × 0.1318 + WDFY3-AS2 × (-0.0319) + TMEM254-AS1 × (-0.1876). Among the included lncRNAs, 9 (MYLK-AS1, DGCR9, LINC00641, FAM66C, SLC25A21-AS1, NR2F1-AS1, WAC-AS1, WDFY3-AS2, and TMEM254-AS1) were prognostic protective factors, while the other 5 (FAM181A-AS1, H19, CRNDE, SNHG18, and PAXIP1-AS2) were prognostic risk factors. Based on the formula, risk score was calculated for every patient in training set and patients were divided into high- and low-risk groups according to the median of risk scores (-1.244). Next, Kaplan-Meier method was used to plot survival curve and survival of two groups was compared using log-rank test (Figure 2A). The result showed that patients in low-risk group had better prognosis (median overall survival, 11.59 vs. 4.06 years, p<0.001). Besides, risk score of patients, correlation of risk score with survival status, and heat map of 14 lncRNAs were plotted (Figure 2B-D). In order to assess the prognostic accuracy of this model, ROC analysis was performed and AUC was calculated. As shown in Figure 2E, AUC of ROC predicting 1-, 3-, 5-, and 7-year survival were 0.904, 0.845, 0.763, and 0.774, respectively. Moreover, ROC assessing the prognostic accuracy of clinical characteristics showed that risk score had the highest AUC (Figure 2F). Univariate and multivariate Cox regression analyses were performed to determine whether the established lncRNA signature was an independent prognostic factor (Figure 3). Univariate Cox regression analysis showed that tumor grade, age, status of IDH, status of codeletion in 1p19q, chemotherapy, radiotherapy, and risk score were eligible for multivariate Cox analysis. Subsequently, multivariate Cox regression analysis showed grade, age, and risk score (hazard ratio: 4.253, 95% confidence interval: 2.714-6.664) were independent prognostic factors. Next, we compared risk scores between high- and low-risk groups to better elucidate the correlation between clinical characteristics and risk score. As shown in Figure 4, all clinical characteristics except gender were found to harbor different risk scores. Specifically speaking, patients with older age (> 50), higher grade (WHO Ⅲ), chemotherapy, radiotherapy, mutant IDH, and codeletion in 1p19q had higher risk scores. Heatmap describing the distribution of clinical characteristics in two risk groups is shown in Figure 4H.
Validation of the Hypoxia-Related LncRNA Signature in TCGA Cohorts
To verify the prognostic accuracy of the established lncRNA signature in external dataset, LGG cohorts in the CGGA were acquired. Patients were divided into high- and low-risk groups according to the same formula and cutoff. Kaplan-Meier survival analysis (Figure 5A, B) showed that low-risk group had better prognosis in both cohorts (median survival, cohort 1: 8.52 vs. 2.80 years, p<0.001; cohort 2: 8.17 vs 3.30 years, p<0,001). Risk score of patients, correlation of risk score with survival status, and heat map of 14 lncRNAs were also plotted (Figure 5C-H). ROC analysis showed that the lncRNA signature had good prognostic performance in both cohorts (Supplementary Figure S1A, B). In cohort 1, AUC of ROC predicting 1-, 3-, 5-, and 7-year survival were 0.864, 0.845, 0.817, and 0.798, respectively. In cohort 2, AUC of ROC predicting 1-, 3-, 5-, and 7-year survival were 0.797, 0.809, 0.722, and 0.706, respectively. Moreover, ROC analysis incorporating risk score and clinical characteristics showed that risk score harbored the best prognostic accuracy in both cohorts (Supplementary Figure S1C, D). Next, univariate and multivariate Cox regression analyses showed that the lncRNA signature was an independent prognostic factor in cohort 1 and 2 (Supplementary Figure S2). In cohort 1, higher age, grade Ⅲ LGG, chemotherapy, wildtype IDH, and non-codeletion in 1p19q were associated with higher risk score than their respective counterparts (Supplementary Figure S3). However, in cohort 2, we only found wildtype IDH, grade III LGG, and non-codeletion in 1p19q were associated with higher risk score (Supplementary Figure S4).
Construction and evaluation of a nomogram
We used independent prognostic factors in training set to establish a nomogram predicting 1-, 3-, 5-, and 7-year survival of patients. Every factor (patient age, tumor grade, and risk score) was used to obtain the corresponding score summary and the total score was calculated to prognosticate the survival probability of every patient (Figure 6A). Higher score denoted poorer prognosis. The C-index of the established nomogram was 0.854 (95% confidence interval: 0.811-0.897). Moreover, calibration curves predicting 1-, 3-, 5-, and 7-year survival were used to indicate the consistency between the actual observed survival probability and that predicted by the nomogram. As shown in Figure 6B-E, the calibration curves indicated a good fit for the nomogram.
We next assessed the prognostic accuracy of the established nomogram in cohort 1 and 2. In cohort 1, the C-index was 0.765 (95% confidence interval: 0.703-0.827). Moreover, calibration curves also indicated a good fit for the nomogram in predicting 1-, 3-, 5-, and 7-year survival (Supplementary Figure S5). In cohort 2, the C-index was 0.787 (95% confidence interval: 0.730-0.844). However, the calibration curves showed that the consistence between the actual survival probability and that predicted by the nomogram was not as good as the consistence observed in TCGA LGG cohort and cohort 1 (Supplementary Figure S6).
Gene Set Enrichment Analysis
With regard to KEGG pathways, GSEA showed that genes in high-risk group were enriched in apoptosis, cell adhesion molecules CAMs, chemokine signaling pathway, ECM receptor interaction, focal adhesion, JAK STAT signaling pathway, pathways in cancer, toll like receptor signaling pathway, VEGF signaling pathway, etc (Figure 7). No significant KEGG pathways were found in genes in low-risk group. The GO analysis included the biological process (BP), molecular function (MF), and cellular component (CC) categories. The BP category of genes in high-risk group included leukocyte proliferation, lymphocyte activation involved in immune response, negative regulation of interleukin production, etc. No significant BP category was enriched in genes in low-risk group. The MF category of genes in high-risk group included carboxypeptidase activity, growth factor binding, cytokine binding, etc. With regard to genes in low-risk group, enriched MF was ubiquitin-like protein-specific protease activity. The CC category of genes in high-risk group included vacuolar lumen, vesicle lumen, endoplasmic reticulum, etc. The genes in low-risk group were enriched in vesicle tethering complex regarding CC category. The respective top 10 KEGG pathways and GO terms in high-risk group are displayed in Supplementary Figure S7A-D. In order to investigate tumor-associated signaling pathway, we also performed GSEA with regard to hallmarks. The results showed that tumor-associated signaling pathways enriched in high-risk group included IL2-STAT signaling, PI3K-AKT MTOR signaling, epithelial mesenchymal transition, IL6-JAK-STAT3 signaling, p53 pathway, TNFα signaling via NFκB, etc (Figure 8). However, no significant hallmark was enriched in low-risk group. The top 10 hallmark terms are showed in Supplementary S7E.
Immune Cell Infiltration
In GSEA, we found that BP terms enriched in high-risk group were mainly involved in immune cells. Thus, we performed analysis of immune cell infiltration between high- and low-risk groups. The correlation coefficients of risk scores between B cell, CD4+ T cell, CD8+ T cell, dendritic cell, neutrophil and macrophage were 0.336, 0.385, 0.229, 0.624, 0.452 and 0.487, respectively (Figure 9). In addition, abundances of B cell, CD4+ T cell, CD8+, dendritic cell, neutrophil and macrophage were higher in high-risk group than those in low-risk group (Supplementary Figure S8).