3.1 Variation Analysis and WGCNA of DEGs
Differential expression analysis revealed that we obtained 4311 DEGs, indicating that the amount of these genes that were up- and down-regulated compared to normal tissues was 2480 and 1831, respectively(Fig. 2A). The WGCNA analysis was based on 4311 DEGs performed. The minimum soft threshold for constructing a scale-free network is 10, so 10 can be chosen as the optimal soft threshold for subsequent analysis (Fig. 2B). We set minModuleSize = 30 and MEDissThres = 0.25 as clustering criteria and identified eight gene modules (Fig. 2C, D). After analysis, we displayed the blue module to have the highest association between normal and tumor tissue (r = ± 0:9, p = 2e − 216) (Fig. 2E).
The Venn diagram shows the intersection between the best relevant blue modules and ARGs for LGG (n = 77) for subsequent analysis (Fig. 2F).
3.2 Characteristics of the ARGs
We analyzed the expression patterns of these 77 ARGs. Most ARGs were highly expressed in LGG, but a few genes remained lowly expressed in LGG tissues, including SMARCB1, FYN, and ADGRG1 (Fig. 3A). In addition, we performed PPI network analysis using the STRING database to reveal further the linkage between these ARG-associated genes (Fig. 3B). To get a comprehensive picture of the relationship between ARGs in LGG, we conducted a regression analysis (Fig. 3C). The results showed strong correlations between BRMS1 and GSTP1 (r = 0.769, p < 0.001), NUDT1 and BRMS1 (r = 0.757, p < 0.001), and SLC39A6 and CSNK2A1 (r = 0.7, p < 0.001), and the findings suggest that the three pairs of genes are likely to have similar biological functions.
CNV alterations in ARGs were visible on chromosomes (Fig. 3D), and the survey showed a prevalence of CNV-associated mutations. Copy number increases were most common, such as MYC, SMARCE1, and NOTCH1, which showed extensive CNV amplification. In contrast, NOS3, CASP3, and SMAD4 showed copy number loss (Fig. 3E). and in addition, We investigated the incidence of somatic mutations in 77 ARGs in LGG. Among them, TP53 had the highest mutation rate (up to 45.3%), with higher mutation rates in NOTCH1 (7.4%), PIK3CA (7.4%), and EGFR (6.2%), while the other genes had relatively low mutation rates (Fig. 3F).
3.3 Anoikis Patterns in LGG
To determine which ARGs had prognostic value, we first performed a univariate Cox analysis, and this analysis screened 50 of the 77 ARGs (Fig. 4A). A composite illustration of the complex relationship between the 50 ARGs and the prognostic value of the LGG was displayed as a network diagram (Fig. 4B). The data suggest a significant positive correlation between ARGs with prognostic impact and that there may be complex crosstalk between ARGs, which has important implications for patient prognosis.
Next, we identified two modification patterns based on the expression of 50 ARGs for patients with qualitatively different Anoikis patterns using consensus clustering (Fig. 4C, D), including 383 cases in Cluster1 and 114 cases in Cluster2. Predictive analysis showed that Cluster1 had a better prognosis than Cluster2 (Fig. 4E). As shown in Fig. 4F, analysis of compositional differences regarding clinical characteristics showed that compared to C1, the C2 species had more deceased patients (p = 2.5e-11), tumor grade G3 patients (p = 3.6e-06), IDH1-wild type patients (p = 1.2e-69), and MGMT promoter unmethylated patients (p = 1.2e-23), all of which suggest that patients in the C2 group may have a worse prognosis. Next, we used heat maps to illustrate ARGs' expression patterns and show that ARGs were differentially expressed in the two clusters (Fig. 4G).
In addition, TME was calculated in C1 and C2 separately, and the results of Fig. 4H showed that C1 had a lower ImmuneScore, StromalScore, and EstimateScore, but higher tumor purity. Next, we assessed the difference in immune infiltration of immune cells between the two subtypes based on the gene expression profile of LGG, using the CIBERSORT algorithm combined with the LM22 signature matrix and thus inferring the proportion of 22 tumor-infiltrating immune cells in each subtype, of which we significantly infiltrated 13 immune cells between the two subtypes (Fig. 4I, J). The immune cell composition and immune profile between molecular subtypes were further assessed using four additional immune cell infiltration algorithms, MCPCOUNTER, QUANTISEQ, EPIC, and TIMER (Figure S1). These findings imply that the C1 subtype and the cold immune phenotype are correlated with the C2 subtype and immunological hot phenotype, respectively.
3.4 Building a prognostic risk model using ARGs
In this study, data from 497 cases of LGG were randomly split into a training set of 348 cases and a validation set of 149 cases in a 7:3 ratio. We included 50 significant genes with a significant association with the overall survival of patients in LASSO and RF regression. In the LASSO regression analysis, we set the Lambda value to 0.02547996, resulting in 15 genes (Fig. 5A); the RF algorithm screened 42 genes associated with clinical outcomes in LGG patients (Fig. 5B). We then obtained the intersection of these two algorithms, with 12 genes present simultaneously in the results of both regression analyses (Fig. 5C). We performed a multivariate Cox regression analysis of the crossover genes to reduce the dimensionality further, resulting in the selection of six genes (Fig. 5D). We then created a six-gene signature. The model developed was as follows: ARG RiskScore = 0.106966*expEGFR + 0.383904*expSIX1 + 1.353163*expSP1-0.22767*expANGPTL2-0.69252*expBMP2-0.45003*expPDCD4.
3.5 Predictive value of 6 genetic markers
We calculated the expression level of the risk score for each sample and divided the 348 patients into a high-risk group and a low-risk group using the median. The risk score distribution of the samples was also plotted, with higher risk scores on the OS graph than the low-risk scores, suggesting that samples with high RiskScore had a poorer prognosis. Among the different samples, changes in SP1, EGFR, and SIX1 gene expression increased the risk values, and therefore the high expression of these genes was identified as a risk factor. In contrast, changes in ANGPTL2, BMP2, and PDCD4 gene expression decreased the risk values, and therefore the high expression of these genes was identified as a protective factor (Fig. 5E). Survival analysis showed that high-risk patients had poorer OS (Fig. 5H). The prognostic grading of the risk score was also analyzed by ROC curves over time using the R software Time ROC package. We analyzed the classification efficiency for 1, 3, and 5-year predictions. We can see that the model has a high AUC offline region with an AUC above 0.8 and an AUC offline region of 0.87 for the 1-year prediction (Fig. 5K).
Subsequently, we used TCGA (Test1, Test2), CGGA-325 (Test3), CGGA-693 (Test4), and GSE-16011 (Test5) as the validation cohorts and similarly calculated the expression of risk scores for each sample. The 149 LGG patients in the Test1 cohort, 497 LGG patients in the Test2 cohort, 172 LGG patients in the Test3 cohort, 420 LGG patients in the Test4 cohort, and 106 LGG patients in the Test5 cohort were divided into two groups based on the median RiskScore of the validation cohort. The survival analysis results were similar to the previous ones, with patients in the high-risk group being more likely to have shorter OS and higher mortality. Risk score distribution plots display that a high-risk score leads to poor survival times. Heatmaps were created indicating the presence of genes with high expression levels (SP1, EGFR, and SIX1) and genes with diminished expression (ANGPTL2, BMP2, and PDCD4) in the high-risk subgroup. AUC analysis of the risk score ROC curves indicated a high diagnostic value of the risk score in these five validation cohorts. Kaplan-Meier survival curves, RiskScore distribution plots, and risk score ROC curves for the five validation cohorts are shown in Figs. 5F, G, I, J, L, M, and Figure S2A-L, respectively.
3.6 Characterisation between risk subtypes
As shown in Fig. 6A, the high-risk group had a higher proportion of deceased patients, tumor grade G3 patients, IDH1 wild-type patients, and MGMT promoter non-methylated patients (all P < 0.05), all of this suggests poorer prognosis for patients in the high-risk group. We have adopted the ESTIMATE algorithm to study the specific performance of ARGs in TME. Our calculations suggested increased levels of immune fraction, stromal fraction, and estimated fraction but reduced levels of tumor purity in the high-risk group (Fig. 6B).
To further investigate the prevalence of infiltrating immune cells in TME, we used five methods to assess the proportion of immune cell infiltration in high- and low-risk LGG populations. As Fig. 6C shows, according to the CIBERSORT algorithm, the scores of T cell CD4 + memory resting, Macrophage M1, and Neutrophil immune cell infiltration levels are higher in high-risk groups. In contrast, Neutrophil and B cell memory immune cell infiltration levels were relatively low. We also analyzed four other immune infiltration algorithms to understand the immune profile between risk subtypes comprehensively. The level of multiple immune cells infiltrates generally higher in the high-risk group in the TIMER algorithm (Fig. 6D), QUANTISEQ algorithm (Fig. 6E), EPIC algorithm (Fig. 6F), and MCPCOUNTER algorithm (Fig. 6H). The increased immune cell infiltration is most likely a compensatory consequence of the low local immune response. We confirmed this conjecture of ours not only in the multiple immune infiltration algorithm but also in the examination of 47 immune checkpoint gene expressions and human leukocyte antigen (HLA) gene expressions in both risk subtypes, with most HLA genes and immune checkpoints being upregulated in the high-risk group and, conversely, a trend towards downregulation in the low-risk group (Fig. 6H, I). These findings imply that the immunological hot type is linked to the high-risk group, whereas the immune cold type is linked to the low-risk group.
Given these significant immune-related biological features, we investigated the association between Anoikis-associated prognostic model genes and risk scores with the tumor microenvironment separately and further. First, we examined the relationship between the CIBERSORT immune infiltration algorithm score and the expression of six Anoikis-associated prognostic model genes (Fig. 7A), indicating that all six Anoikis-associated prognostic model genes have an impact on the immune microenvironment. Correlations between CIBERSORT scores and ARG risk scores were also assessed (Fig. 7B), and results yielded positive correlations between T.cells.CD4.memory.resting, Macrophages.M1, Macrophages.M0 and T.cells.regulatory.Tregs. and ARG risk scores (Fig. 7C, D). correlated (Fig. 7C,D,E,F); whereas NK.cells.activated, B.cells.memory, Mast.cells.activated, and Monocytes were negatively correlated with the ARG risk score (Fig. 7G, H, I, J).
The difference in the two groups was considerable, as evidenced by our estimated TMB scores (p < 0.05), moreover, TMB scores and risk scores were positively correlated, indicating good efficacy of our risk scores in assessing the tumor microenvironment (Fig. 7K). In addition, in looking at the tumor mutational load in different risk groups (Fig. 7L), we found that the highest frequently mutated in the high-risk group were IDH1, TPRB, and ATRX. The incidence of IDH1 mutations was significantly lower in patients in the high-risk group (61.7%) than in those in the low-risk group (91.4%).
3.7 Identification and biological functional analysis of DEGs between risk subtypes
To analyze the molecular biological processes guided by the six Anoikis-associated prognostic biomarkers, The analysis of differences first obtained 133 DEGs under our set fold change = 1 and P < 0.05, of which 85 were upregulated and 48 downregulated (Fig. 8A). GSEA results for risk subtypes suggest that the high-risk group was enriched for Fc epsilonri signaling pathway, Fc gammar mediated phagocytosis and Mismatch repair. In contrast, the low-risk group was enriched for Ribosome, Selenoamino acid metabolism, and Taurine and hypotaurine metabolosm (Fig. 8B). Subsequent KEGG analysis showed that these DEGs were enriched in numerous Cellular Processes, Environmental Information Processing, Human Diseases, and Organismal Systems (Fig. 8C, E, G).GO analysis showed that more DEGs were enriched in immune-related biological pathways (Fig. 8D, F, H). The results suggest that these numerous molecular biological pathways profoundly impact the anoikis biological pathway.
3.8 ARG risk predicts new chemotherapy regimens
To find chemotherapeutic agents that could improve the prognosis of LGG patients, on the one hand, we used the R software oncoPredict package to analyze the differences in sensitivity of commonly used chemotherapeutic agents between the two risk subtypes. We found that the eight chemotherapeutic agents with the most significant differences in drug sensitivity were LFM.A13, S.Trityl.L.cysteine, Rapamycin, Parthenolide, QS11, PF.562271, Roscovitine and MP470 (Fig. 9A).
On the other hand, based on screening robust DEGs between high and low subgroups, which DEGs imported into Cmap, we screened the four most statistically significant small molecule drugs, including Risperidone, Pipamperone, FR-180204, and Erastin (Fig. 9B). Notably, a negative Score indicated that the drug reversed the desired biological properties and had potential therapeutic value. We then used AutoDock Vina to molecularly dock these four small molecule compounds to their target macromolecular proteins. The highest binding affinity after molecular docking was then visualized using pymol for DRD2 and Risperidone (-8.1 kcal/mol), DRD2 and Pipamperone (-6.9 kcal/mol), MAPK1 and FR-180204 (-7.4 kcal/mol), VDAC2 and Erastin (-7.6 kcal/mol). The binding sites and interaction results for the best-selected conformations showed that three hydrogen bonds in the binding of DRD2 and Risperidone (Fig. 9C), DRD2 and Pipamperone (Fig. 9D), VDAC2 and Erastin (Fig. 9E); and four hydrogen bonds were found in the binding of MAPK1 and FR-180204 (Fig. 9F).
3.9 Building and evaluating survival models for nomograms
To test if RiskScore based on ARGs may be an independent prognostic indicator, multivariate Cox regression studies were done to create and assess a nomogram survival model. The analysis showed that RiskScore was an independent prognostic factor for LGG patients in the TCGA training set (HR = 1.063, 95% CI: 1.037–1.090, p < 0.001) and was validated in multiple validation cohorts (Fig. 10A). Multivariate Cox was used to build a nomogram model in the TCGA cohort to estimate OS at 1-, 3- and 5 years. Age, gender (male, female), tumor grade (GII, GIII), IDH1 status (mutated, wild), receipt of additional adjuvant therapy such as radiation and chemotherapy (radiation and chemotherapy, radiation or chemotherapy, no adjuvant therapy), ARGs-based RiskScore was included in the model. AUC analysis of the model's ROC curve showed that the nomogram incorporated factors of high diagnostic value, both in the TCGA training cohort and the remaining five validation cohorts, with a C-index value of 0.82 (95% CI: 0.77–0.87) for the training cohort model (Fig. 10B, D, F, H, J, L). The model's accuracy in forecasting the 1-, 3-, and 5-year survival rates was demonstrated using calibration curves. The results showed that the column line plot accurately predicted 1-, 3- and 5-year survival rates for LGG patients (Fig. 10C, E, G, I, K, M). Nomograms for the TCGA queue are shown. (Fig. 11A).
3.10 Immunohistochemical analysis of six Anoikis-related model genes
We retrieved IHC-stained images of six Anoikis model gene-associated proteins from the HPA database in LGG and normal brain tissue. We used them to determine whether these six ARGs exhibited differentially high protein expression levels in LGG. Consistent with the above findings, the analysis revealed that protein expression levels of EGFR, SIX1, SP1, ANGPTL2, and PDCD4 were significantly higher in LGG samples than in normal samples (Fig. 12A, B, C, D, E).
3.11 Single-cell analysis of six Anoikis-related model genes
We used the single-cell data GSE89567 from the TISCH database of gliomas to investigate the expression of six ARGs in TME. This dataset has 17 cell group annotations and four intermediate cell types (Fig. 13A, B). EGFR, SIX1, SP1, ANGPTL2 PDCD4 and BMP2 were all expressed in cancer cells, and notably PDCD4 was expressed in all four intermediate cell types (Fig. 13C,D)