Screening and identification of genome-instability related lncRNAs in LGG patients
The clinical and pathological characteristics of the LGG cohort used in our study were displayed in Table 1.Patients were divided into genomic unstable (GU) group and genomic stable (GS) group based on the cumulative number of somatic mutations. In order to evaluate the difference of genomic instability, we investigated the mutation types, mutation frequencies and copy number variations of the two groups. Top 20 genes with the highest alteration frequencies were analyzed (Fig. 1a and b). The results of the oncoplots and bar plots revealed that the alteration frequencies of most of the genes were significantly higher in patients of GU group than those of GS group including EGFR, FLG, HMCN1, LRP2, NOTCH1, OBSCN, PTEN, RYR2, TTN (The bar plots for FLG, HMCN1, LRP2, NOTCH1, OBSCN, RYR2 and TTN were displayed in Online Resource 5A, P < 0.05). However, the proportion of patients with IDH1 mutations was significantly higher in GS group (80%) compared with GU group (54%, p < 0.001) suggesting that IDH1 may play a crucial role in the tumorigenesis and work synergistically with other genes involved in the same pathway[24]. The analysis of CNVs of chromosomes was further performed to explore the distinct genomic variations between the two groups (Fig. 1c and d). Patients in GU group tended to bear a greater burden of copy number amplifications (P = 8.9 × e− 13) and deletions (P = 3.5×e− 7) than those in GS group. All these findings indicated a significant difference in the genomic instability between the two groups. We compared the lncRNA expression profiles between GU and GS groups. 102 genome-instability-related lncRNAs were identified with |Fold Change| >2.0 and adjusted p < 0.05. Heatmap of top 20 differentially expressed lncRNAs was shown in Fig. 1e.
Table 1
Clinical information for LGG patients in this study.
Covariates
|
Type
|
Total(n=503)
|
Test(n=249)
|
Train(n=254)
|
P-value |
Age
|
<=60
|
443(88.07%)
|
225(90.36%)
|
218(85.83%)
|
0.1524 |
>60
|
60(11.93%)
|
24(9.64%)
|
36(14.17%)
|
|
|
Sex
|
Female
|
224(44.53%)
|
107(42.97%)
|
117(46.06%)
|
0.5434 |
Male
|
279(55.47%)
|
142(57.03%)
|
137(53.94%)
|
|
|
Disease Free Time (Months)
|
<=24
|
284(56.46%)
|
141(56.63%)
|
143(56.3%)
|
0.9516 |
>24
|
184(36.58%)
|
90(36.14%)
|
94(37.01%)
|
|
|
unknow
|
35(6.96%)
|
18(7.23%)
|
17(6.69%)
|
|
|
Disease Free Status
|
Disease Free
|
306(60.83%)
|
155(62.25%)
|
151(59.45%)
|
0.5011 |
Recurred or
Progressed
|
162(32.21%)
|
76(30.52%)
|
86(33.86%)
|
|
|
unknow
|
35(6.96%)
|
18(7.23%)
|
17(6.69%)
|
|
|
Grade
|
G2
|
242(48.11%)
|
120(48.19%)
|
122(48.03%)
|
1 |
G3
|
260(51.69%)
|
129(51.81%)
|
131(51.57%)
|
|
|
unknow
|
1(0.2%)
|
0(0%)
|
1(0.39%)
|
|
|
Histologic Type
|
Astrocytoma
|
189(37.57%)
|
95(38.15%)
|
94(37.01%)
|
0.6627 |
Oligoastrocytoma
|
128(25.45%)
|
59(23.69%)
|
69(27.17%)
|
|
|
Oligodendroglioma
|
186(36.98%)
|
95(38.15%)
|
91(35.83%)
|
|
|
IDH1 Mutation
|
YES
|
91(18.09%)
|
48(19.28%)
|
43(16.93%)
|
0.201 |
NO |
32(6.36%) |
12(4.82%) |
20(7.87%) |
|
|
unknow
|
380(75.55%)
|
189(75.9%)
|
191(75.2%)
|
|
|
Therapy Outcome
|
Complete Remission or
Response
|
83(16.5%)
|
48(19.28%)
|
35(13.78%)
|
0.7811 |
Partial Remission or
Response
|
50(9.94%)
|
25(10.04%)
|
25(9.84%)
|
|
|
Stable Disease
|
59(11.73%)
|
30(12.05%)
|
29(11.42%)
|
|
|
Progressive Disease
|
36(7.16%)
|
20(8.03%)
|
16(6.3%)
|
|
|
unknow
|
275(54.67%)
|
126(50.6%)
|
149(58.66%)
|
|
|
Identification of two LGG clusters based on genome-instability related lncRNAs with distinct survival outcomes and clinical features
As shown in the Fig. 2a, unsupervised hierarchical clustering analysis was conducted for LGG patients based on 102 genome-instability related lncRNAs (GILncRNAs). The somatic mutation counts were significantly different between the two clusters (Fig. 2b, p < 2.22×e− 16). Thus, one cluster with higher mutation counts was defined as GU-like cluster, the other was defined as GS-like cluster. The expression level of UBQLN4, which is correlated with genomic instability, was significantly higher in patients of GS-like cluster than those of GU-like cluster (Fig. 2b, p = 1.4×e− 13). The overall survival (OS) analysis was employed between the two clusters, indicating that patients in GS-like cluster displayed better prognosis compared with GU-like cluster (Fig. 2c, p < 0.001). Heatmap of multi-clinicopathological characteristics was adopted to demonstrate the correlation between the clustering and clinical features (Fig. 2d). As shown in Fig. 2e, patients in GU-like cluster were significantly older than that in GS-like cluster (p < 0.001). Tumors in patients of GU-like cluster were more likely to recur or progress compared with GS-like cluster (p < 0.001). Histologic grade for patients in GU-like cluster was higher than that in GS-like cluster (p < 0.001). Patients in GU-like cluster tended to receive postoperative radiotherapy (p < 0.001). IDH1 for patients in GU-like cluster was more frequently mutated compared with GS-like cluster (p < 0.001). Karnofsky Performance Scores for patients in GU-like cluster were higher than that in GS-like cluster (p = 0.002), indicating self-care abilities for patients in GU-like cluster were more likely to be affected. Clinical outcomes for patients in GU-like cluster seemed to be worse than those in GS-like cluster (p = 0.01), with more patients in tumor-bearing status. Differentially expressed mRNAs (DEGs) were identified between the two clusters (|Fold Change| >4.0, adjusted p < 0.01). Heatmap exhibiting top 20 DEGs was shown in Fig. 2f. Protein-Protein interaction network was employed to further evaluate the interplay of the DEGs (Fig. 2g). 30 hub genes with the highest number of adjacent nodes were noted in Fig. 2h.
Enrichment analysis identified molecular mechanisms affected by DEGs between GS-like and GU-like clusters
To explore the underlying different molecular mechanisms between the two clusters, GO analysis was employed, regarding molecular function (MF), cellular component (CC), and biological process (BP). As shown in Online Resource 1A-C, the DEGs were significantly enriched in molecular functions associated with formation of genomic instability, including DNA-binding transcription activator activity, RNA polymerase II-specific. KEGG pathway analysis identified signaling pathways associated with tumorigenesis and tumor immunology including cytokine-cytokine receptor interaction pathway, ECM-receptor interaction pathway and transcriptional misregulation in cancer (Online Resource 1D-F). These results of functional enrichment analysis suggested that GU-like cluster differs in genome-instability and tumor-immunology associated functions compared with GS-like cluster, implying significant differences in tumor-immune environment and genomic instability.
Distinct tumor-immune microenvironment (TIME) and immunogenomic patterns were identified between GS-like and GU-like clusters
Based on the ESTIMATE algorithm, the compositions of the TIME for each sample were analyzed. Compared with those of GS-like cluster, the immune scores (p = 2.9xe− 13), stromal scores (p < 2.22 x e− 16) and ESTIMATE scores (p = 3.3 x e− 15) were significantly higher in patients of GU-like cluster, indicating higher abundances of immune and stromal cells and lower tumor purity (Fig. 3a). Furthermore, comparisons of the abundance of 23 tumor infiltrating immune cells in different clusters were carried out using ssGSEA. Most of the immune cells were significantly more abundant in patients of GU-like cluster (Fig. 3b). The heatmap of the infiltration levels of the immune cells was displayed in the Fig. 3c. As shown in Fig. 3d, the associated immune functions were more active in patients of GU-like cluster.
Patients of GU-like cluster were more sensitive to immunotherapy and PARP inhibitors
An increasing number of researches revealed that immunotherapy targeting immune check-points brought hope for clinical treatment of malignant tumors[25]. The expressions of immune check-points including PDCD1(PD1), CD274(PD-L1), PDCD1LG2(PD-L2), CTLA4, CD80, CD86 were analyzed between the two clusters (Fig. 3e). The expression levels of all the immune genes mentioned above were elevated in patients of GU-like cluster compared with GS-like cluster. The results of unsupervised subclass mapping analysis demonstrated that patients of GU-like cluster were more likely to respond to the immunotherapy of anti-PD1 (Fig. 3f, Bonferroni corrected p = 0.002). However, no positive results were found in the response to anti-CTLA4 therapy between the clusters. The estimated IC50 values of Rucaparib and Olaparib were significantly lower in patients of GU-like cluster (Fig. 3g, p = 3.1×e− 7, p = 1.7×e− 13, respectively), indicating that LGG patients of GS-like cluster tended to be more resistant to PARP-inhibitor therapy than those of GU-like cluster. However, there was no significant difference in sensitivity to Veliparib between the two clusters (p = 0.27).
Identification of genome-instability related lncRNA signature for prognosis
Univariate cox proportional risk regression was utilized to select the genome-instability related lncRNAs with prognostic values, combining the expression levels of genome-instability related lncRNAs and survival information of LGG patients. 33 genome-instability-related lncRNAs with predictive prognosis roles were determined with p value < 1.0 x 10− 9. The results were shown in the forest plot with hazard ratio (HR) of 95% confidence intervals (Fig. 4a). Patients with LGG were randomly divided into train and test group at a ratio of 1:1. Lasso regression analysis was applied to further evaluate and filter these lncRNAs in the train set and 9 lncRNAs were identified as the key factors in the risk model with 1000 replicates of cross validation (Fig. 4b and c), including LINC01831, CRNDE, AC025171.5, AL390755.1, AC012073.1, AL355974.2, AC010273.3, AC131097.3, AC092718.3. LncRNSs with associated coefficients were listed in Fig. 4d. The risk score was calculated by the following formula:
LGG patients were further divided into high-risk group and low-risk group with the median risk score as the cutoff, in the train and test set, respectively.
Evaluation of genome-instability related lncRNA signature for prognosis
Kaplan-Meier analysis showed that overall survival of patients in low-risk group were significantly higher compared with those in high-risk group in the train and test set (Fig. 4e and l, p < 0.001). ROC curve analysis of the risk model based on genome-instability related lncRNAs signature (GILncSig) over time for 1, 2, and 3 years in the train set showed AUC values with 0.892, 0.912, and 0.905, respectively (Fig. 4f). In the test set, AUC values over time for 1, 2, 3 years were 0.819, 0.899 and 0.900, separately (Fig. 4m). In the train set, the AUC value of GILncSig over time for 1 year is 0.870, which is higher than that of other clinicopathological factors including diagnosis age (AUC = 0.704), sex (AUC = 0.509), histologic type (AUC = 0.354) and histologic grade (AUC = 0.669). Similar results were obtained regarding the ROC curve analysis over time of 2, 3 and 5 years (Online Resource 5B). The expression levels of 9 genome-instability-related lncRNAs in the risk model were significantly higher in patients of high-risk group than those of low-risk group (Fig. 4g and n). The scatter plots of survival time showed that the survival time decreased with the increasing of the risk score and the mortality rate was higher in patients of high-risk group (Fig. 4h and o). As for the scatter plots of somatic mutations, it seemed that the somatic mutation counts were also higher in patients of high-risk group (Fig. 4i and p) and it was validated in the boxplot with p value less than 0.01 (Fig. 4k and r). The expression level of UBQLN4 was significantly higher in patients of low-risk group compared with high-risk group (Fig. 4l and s). Univariate and multivariate cox regression were performed to analyze whether the risk score calculated based on the above 9 genome-instability related lncRNAs were independent prognostic factors of clinical or pathological factors (age, sex, histologic grade, histologic type, and mutation count). The results in the train set showed that risk score either in univariate cox regression analysis (hazard ratio = 4.008, 95% CI [1.876–8.562], p < 0.001) or multivariate cox regression analysis (hazard ratio = 2.937, 95% CI [1.261–6.844], p = 0.013) were significantly correlated with overall survival of LGG patients (Online Resource 2A and B). Similar results were obtained in the test set (Online Resource 2C and D). Nomogram integrating risk score, age, sex, grade as variables was employed to predict the 1-, 3-, 5- year overall survival in LGG cohort (Online Resource 2E). The calibration plots of 3-, 5- years overall survival indicated that the predictive curves were close to the actual curves of overall survival, suggesting that the performance of the nomogram was good in predicting the overall survival of patients with LGG (Online Resource 2F).
Validation of genome-instability related lncRNA signature for prognosis
GBM dataset from TCGA database was used in this study to verify the performance of the risk model based on 9 genome-instability related lncRNAs. According to the expression levels of the 9 genome-instability related lncRNAs in GBM dataset, risk score for each sample was calculated, after which patients were divided into high and low-risk group according to the cutoff of median score. Kaplan-Meier curves showed that patients in the high-risk group had significantly lower overall survival probability compared with the low-risk group (Online Resource 2G, p < 0.001). As the standard chemotherapy for GBM postoperatively, temozolomide (TMZ) can significantly increase the overall survival of GBM patients according to the previous studies. The estimated IC50 values of TMZ were significantly lower in patients of low-risk group (p = 0.0071), indicating that GBM patients in high-risk group tended to be more resistant to TMZ therapy than those in low-risk group (Online Resource 2H). There was no statistical significance in the estimated IC50 values of 3 PARP inhibitors between patients in low- and high-risk groups (Online Resource 5C).
In addition, the predictive performance of our risk model based on genome-instability related lncRNAs signature (GILncSig) was compared with other glioma-associated prognostic lncRNAs signatures established in previous studies (Online Resource 2I). Although the predictive power of our GILncSig for 1 year rate was not better than Qiu’s model, the AUC values of our GILncSig were significantly higher than other three risk models over time for 2, 3 and 5 years, indicating that the risk model created by our group had better prognostic performance in predicting overall survival of LGG patients[18–20].
Distinct clinicopathological features, tumor-immune microenvironment (TIME), immunotherapy response and sensitivity to PARP inhibitors were identified between different risk groups
PCA and t-SNE analysis revealed that the 9 genome-instability related lncRNAs involved in our risk model can distinguish the samples with LGG (Online Resource 3A). As shown in the heatmap (Online Resource 3B), the distribution of diagnosis age, disease free time, disease free status, histologic grade, histologic type, IDH1 mutation status, Karnofsky performance score, postoperative radiotherapy status, person neoplasm status, and therapy outcome between the two groups were significantly different. The detailed results were demonstrated in boxplots to show the distribution of risk scores of patients with multi-clinicopathological characteristics, indicating that the risk scores of patients with worse clinicopathological features tended to be significantly higher (Online Resource 3C).
The distribution of patients in the two gene-clusters corresponding with the risk groups and survival status was shown in alluvial diagram (Online Resource 4A). Comparison between patients of high- and low-risk groups demonstrated that the immune scores (p = 8 x e− 5), stromal scores (p = 0.00018) and ESTIMATE scores (p = 8.2 x e− 5) were significantly higher in patients of high-risk group, indicating higher abundance of immune and stromal cells, while lower tumor purity (Online Resource 4B). Furthermore, most of the immune cells were more abundant in patients of high-risk group (Online Resource 4C and D). The associated immune functions were more active in patients of high-risk group (Online Resource 4E). The expression levels of all the immune checkpoints mentioned above were elevated in patients of high-risk group compared with low-risk group (Online Resource 4F). In addition, we found a positive correlation between the expression of these immune check-points and risk score (Online Resource 4G). The results of unsupervised subclass mapping analysis demonstrated that patients in high-risk group were more likely to respond to the immunotherapy of anti-PD1 (Fig. 5a, Bonferroni corrected p = 0.007). However, no positive results were found regarding the response to anti-CTLA4 therapy between the two groups. The estimated IC50 values of Rucaparib, Olaparib and Veliparib were significantly lower in patients of high-risk group (Fig. 5b-d, p = 1.4×e− 6, p = 8.3×e− 9, p = 0.0034, respectively), suggesting that patients in low-risk group tended to be more resistant to PARP inhibitor therapy than those in high-risk group.