1. Analysis of genes related to the prognosis of LGG patients after TMZ treatment
We obtained 171 low-grade gliomas (LGG) patients receiving temozolomide (TMZ) chemotherapy from the CGGA database as a training set to screen for genes associated with TMZ treatment prognosis (Figure 1A). Univariate Cox regression showed that tumor stage, radiation therapy status, IDH1 mutation status, and chr1p19q co-deletion status had an impact on overall survival (Figure 1B, Supplementary Figure 1). Therefore, we used Cox regression to correct these clinical variables to obtain genes associated with the prognosis of TMZ treatment. A total of 488 risk genes and 128 protective genes passed the screening threshold (p < 0.01; Figure 1C, Supplementary Table 1). Functional enrichment of these genes revealed that protective genes were associated with nervous system development, while risk genes were associated with multiple pathways associated with tumor progression including cell cycle, immune response, EMT, and extracellular matrix organization (Figure 1D, Supplementary Table 2).
We further analyzed the internal correlations of risk genes, protective genes, and non- significant genes, and found that the overall correlations of protective genes and risk genes were higher than those of non-significant genes, suggesting that these genes formed specific functional modules (Figure 1E). At the same time, the overall correlation of protective genes was significantly higher than that of risk genes, suggesting that risk genes were involved in multiple functions. In addition, we analyzed the protein network formed by risk and protective genes respectively. The degree distribution of the two networks showed that the network of risk genes reflected multi-modularity (Figure 1F). Moreover, the centrality measures of the two networks are different in distribution. The mean betweenness centrality of the network of risk genes was low (Figure 1G), and the mean closeness centrality was high (Figure 1H). Network module analysis revealed that risk genes formed multiple functional modules, including lipid metabolism, cell cycle, interferon response, and Wnt signaling pathway (Supplementary Figure 2). These results suggest that TMZ prognostic risk genes are involved in various carcinogenic processes.
2. Development of risk score associated with the benefit of TMZ treatment in LGG
We used consensus LASSO Cox regression to screen key genes for prognostic genes and repeated 1000 iterations to obtain the occurrence frequency of each gene (Figure 2A), of which 14 genes appeared more than 950 times. We then included high-frequency genes in the model in turn and used a 10-fold cross-validation method to evaluate the prognostic model performance by the area under time-dependent ROC at varying follow-up times. The results showed that the prognostic model had the highest predictive power when predicting 3-year survival, while its predictive power was relatively poor for 1-year survival (Figure 2B). Based on the overall prediction of different survival conditions, we selected a model consisting of 14 genes. For these 14 genes, we used Cox regression to determine the contribution of these genes based on a 10-fold cross-validation method. These features, ranked by median weight, included PAX3 (1.06), ADM2 (0.76), GRB14 (0.44), CYP4F12 (0.37), HELZ2 (0.24), RGS16 (0.19), IL13RA2 (0.15), CHRNA1 (0.06), IGF2BP3 (-0.08), HOXA4 (-0.15), IL34 (-0.28), KCNIP3 (-0.32), CACNA1H (-0.33), and GNAL (-0.52) (Figure 2C). Finally, the cumulative sum of the product of gene weight and corresponding expression was used as the TMZ prognosis score.
We assessed the prognostic efficacy of risk scores in training sets. The distribution of risk scores and survival status indicated that patients with lower risk scores generally had better survival than did those with higher risk scores (Figure 2D). When patients were divided into the high-risk groups based on the median risk score, significant differences were shown in survival time between the two groups, which median survival was 1680 days (95% CI 855-2696) for the high-risk group and not reach for the low-risk group (HR 8.58, 95% CI 4.32- 17.05; p<0.0001, log-rank test; Figure 2E). And we found the excellent prognostic accuracy of the risk scores with time-dependent ROC analysis at varying follow-up times, especially for 3- and 5-year survival predictions (Figure 2F).
To confirm that the risk scores had similar prognostic values in different populations, we applied it to the independent validation set of 65 patients from CGGA batch 2 data. There were also differences in the distribution of risk scores and survival status in this data set (Figure 2G). We observed a significant difference in survival time between the high and low-risk groups, which median survival was 1048 days (95% CI 827-1654) for the high-risk group and not reach for the low-risk group (HR 4.94, 95% CI 2.41-10.12; p<0.0001, log-rank test; Figure 2H). Risk scores also showed excellent predictive accuracy in independently validated cohorts, with the AUC of survival years at a different time above 0.8 (Figure 2I).
3. The prognostic power of risk scores was independent of clinicopathological factors
After multivariable adjustment by clinicopathological variables including grade, radio status, IDH mutation status, and chr1p19q co-deletion status, the risk score remained a powerful and independent factor in the training cohort (HR 2.73, 95% CI 2.09-3.58, p < 0.001; Figure 3A). We also noted similar results in the independent validation cohort (HR 1.28, 1.05-1.55, p = 0.013; Figure 3B). The IDH1 mutation status and chr1p19q co-deletion status are key genomic markers related to the prognosis of glioma treatment. When stratified by IDH1 mutation status, the risk score was still a clinically and statistically significant prognostic model in the train and independent validation cohort (Figure 3CD). We got a similar conclusion, in which patients with chr1p19q wild-type and the high-risk score had the worst survival when patients were stratified according to the co-deletion status of chr1p19q (Figure 3EF). We also focused on the prognostic effect of tumor stage on risk score, and the risk score can distinguish patients with different stages from two groups with significant differences in survival (Figure 3GH). Further, we explored the prognostic performance of risk scores in glioblastoma (GBM). Although there are certain differences between LGG and GBM in the genome and other aspects, we found that the risk score still showed certain prognostic ability, and patients with lower risk scores tended to have better survival benefits (p = 0.062 in train data sets and p = 0.015 in independent validation sets; Figure 3IJ).
4. The risk score was related to the immune cell infiltration of LGG
We then analyzed the relationship between risk score and immune microenvironment. We use a variety of algorithms including ESTIMATE, XCELL, and CIBERSORT to measure the composition of the tumor's immune microenvironment. We first analyzed the association of key genes in the model with tumor purity, immune score, stromal score, and different immune cell infiltration proportion. The HOXA4, CYP4F12, and ADM2 were positively correlated with the immune and stromal score but negatively correlated with tumor purity (Figure 4A). The immune cells that are positively associated with these genes include macrophages, naïve CD8+ T cells, and activated dendritic cells (Supplementary Figure 3, Figure 4B).
The GNAL, on the other hand, was negatively correlated with the immune score, stromal score, and some myeloid lymphocyte infiltration (Figure 4AB). We then assessed differences in immune microenvironments between high and low-risk groups. Patients with high-risk scores had higher immune and stromal scores and lower tumor purity (Figure 4C), indicating that unfavorable prognosis in the high-risk group may be associated with the variation in tumor immune microenvironment. Further immune cell infiltration analysis showed that the activated dendritic cells, M1 type macrophages, B cells, and Th2 subset of CD4+ T cells were higher in patients with high-risk scores, while the B cell plasma and naïve CD8+ T cell were lower in patients with high-risk score (Supplementary Figure 4, Figure 4D). These results indicated that high-risk patients showed activation of innate and humoral-type immunity.
5. Development of a nomogram for predicting the benefit of TMZ treatment in LGG
To better predict the prognosis of LGG patients after receiving TMZ therapy in the clinic, a prognostic nomogram was developed by integrating risk score and three independent predictors of mortality including grade, IDH1 mutation status, and chr1p19q co-deletion status from the above analyses into a multivariate Cox regression model (Figure 5A). The calibration plot showed that the nomogram performed well in predicting patient survival according to an ideal model in training and independent validation sets (Figure 5BC). To evaluate the clinical benefits of the nomogram model, we performed decision curve analysis in both the training and independent validation sets. With 3-year survival as the endpoint, the curve showed that nomogram presented more clinical net benefits than several competing intervention strategies, namely, intervention for all, intervention for none, and intervention based on different clinical indicators (Figure 5DE).