3.1 Analysis of lung cancer mutation database in TCGA database and characteristics of LGG patients.
The single nucleotide variation data of LGG patients were obtained from TCGA database, the data processed by varscan software were selected, and the R software "maftools" package was used for visualization. A total of 488 (96.44%) of the 506 LGG patients had mutations, of which missense mutations accounted for the largest proportion. The mutation types were dominated by single nucleotide polymorphisms. In the SNV classification, C>T was the most important type, and IDH1, TP53 and ATRX had the highest mutation rates, as shown in Fig 1A. The waterfall chart (Fig 1B) showed the specific mutation type and mutation gene ratio in each sample. Interaction diagram (Fig 1C) showed the relationship between two mutant genes, in which green indicated positive correlation and brown indicated negative correlation.
The clinical information of 515 LGG samples was obtained from TCGA database. After sorting out the data, it was found that there were 285 male patients and 230 female patients, and the clinical baseline was detailed in Table 1.
Table 1 Clinical characteristics of 515 LGG patients in TCGA clinical database.
Variables
|
Number(%)
|
Vital status
|
|
Alive
|
406(78.83%)
|
Dead
|
109(21.17%)
|
Age
|
|
≤65
|
483(93.79%)
|
>65
|
32(6.21%)
|
Gender
|
|
Female
|
230(44.66%)
|
Male
|
285(55.34%)
|
Tumor grade
|
|
WHO II
|
249(48.35%)
|
WHO III
|
265(51.46%)
|
Unknown
|
1(0.19%)
|
3.2 Correlation between TMB score and survival outcome, clinical stage and primary tumor size.
529 LGG transcriptome data downloaded from TCGA were divided into TMB groups, including 243 cases in high TMB group, 276 cases in low TMB group, and 10 cases were unable to match the grouping due to lack of data. It was found that the survival rate of high TMB group was significantly higher than that of low TMB group, and the difference was statistically significant (P<0.001), which confirmed that the survival time of patients in high TMB group was longer. According to the analysis of clinical data, the correlation between TMB scores and age, gender and grade were explored respectively. The results showed that TMB value of LGG patients aged > 65 years was significantly higher than that of LGG patients aged ≤65 years, and TMB value of LGG patients with G3 grade was significantly higher than that of G1-2 grade, the difference was statistically significant (P < 0.001).
3.3 Gene expression differential analysis, GSEA analysis and PPI network of different TMB groups.
There were 318 differentially expressed genes between the high and low TMB groups, and the differentially expressed genes were identified by "limma" package, with |log2 FC|> 1 and FDR < 0.05 as screening criteria, and the top 10 differentially expressed genes were shown in Table 2. According to the differential gene FDR and its expression multiple, the volcano map was drawn by "ggplot2" package in Fig 3A. The heat map showed the expression level of DEGs by constructing the "pheatmap" package of R software, and the results were shown in Fig 3B. Venny diagram showed 1811 immune related genes and 318 differentially expressed genes, and 26 differentially expressed immune genes were obtained, as shown in Fig 3C. GO enrichment and KEGG pathway analysis of DEGs showed that BP was mainly related to pattern specification process, chromosome segregation, nuclear division and other pathways, while CC was mainly related to spindle, chromosome. Moreover, the MF was mainly related to DNA binding transcription activator activity-RNA polymerase II specific, microtubule binding and extracellular matrix structural constitution. KEGG pathway was related to cell cycle, proteoglycans in cancer, oocytemeios and other signal pathways. The results were shown in Fig 4A, B. In addition, GSEA results showed that the expression of adipocytokine signaling pathway, alanin aspartate and glutamate metabolism, aldosterone regulated sodium reabsorption and other signaling pathways were significantly enriched when TMB level was used as phenotype, as shown in Fig 4C.
Table 2 The top 10 DEGs in LGG
Gene
|
Low group
|
High group
|
LogFC
|
p-Value
|
FDR
|
SFRP2
|
77.20208
|
30.61399
|
-1.33445
|
5.86E-22
|
1.08E-17
|
SELL
|
47.09799
|
22.71912
|
-1.05176
|
2.61E-12
|
3.77E-10
|
USH1C
|
13.91831
|
6.485024
|
-1.1018
|
2.49E-14
|
1.07E-11
|
FSTL5
|
3.620375
|
1.527565
|
-1.24491
|
3.87E-20
|
2.38E-16
|
HPSE2
|
8.994663
|
3.922698
|
-1.19722
|
1.27E-16
|
1.67E-13
|
CHI3L1
|
27.82078
|
189.8005
|
2.770249
|
1.97E-11
|
1.81E-09
|
LTF
|
3.718816
|
24.61306
|
2.726509
|
4.72E-05
|
0.000281
|
FMOD
|
1.543689
|
11.3636
|
2.879966
|
7.54E-07
|
8.79E-06
|
AC004233.2
|
0.111767
|
5.472285
|
5.613576
|
1.50E-05
|
0.000108
|
OTP
|
0.05032
|
0.377971
|
2.909065
|
9.82E-18
|
1.88E-14
|
PPI network of DEGs was constructed by STRING v10, and visualized by Cytoscape. The result was shown in Fig 5A. The cytohubba software of Cytoscape was selected to determine the key nodes in the PPI network, and 10 key genes including CCNB1, AURKA, BUB1, AURKB, CDC20, BIRC5, TTK, CCNB2, CDCA8 and CENPE are obtained. The node degree score was generated according to Cytoscape, and the hub gene score was shown in Figure 5B.
3.4 Immune cells in high and low TMB group.
Based on the CIBERSORT algorithm, the proportion of 22 immune cells in each LGG patient is estimated, and the results are displayed in a block diagram, where different colors represent different cell subgroups, as shown in Fig 6A. The differences between the high and low TMB groups in 22 immune cells are illustrated in Fig 6B. Among them, the expression of the low mutation load group in Monocytes cells was significantly higher than that in the high mutation load group, and the difference was statistically significant. There was no significant difference in the expression of the two groups in the remaining cells.
3.5 Establishment of risk model
The related genes were obtained by univariate COX and multivariate COX analysis, a total of 9 genes, including CXCL11, GDF15, VEGFA, PLA2G2A, BIRC5, VAV3, IL9, TGFB2 and TNFRSF12A, were involved in the construction of the prognosis model. The survival time of the high expression group of 9 genes was significantly longer than that of the low expression group, and the difference was statistically significant (P < 0.05). The results were shown in Fig 7A. The LGG patients were divided into high-risk and low-risk groups. Comparing the survival time of high-risk patients with that of low-risk patients, it was found that the survival time of high-risk patients was significantly longer than that of low-risk patients, and the difference between the two groups was statistically significant (p<0.05), as shown in Fig 7B. The ROC curve constructed a model by calculating the area under the curve to predict the survival time of patients, and the AUC value of the curve was 0.863, indicating that the model had higher accuracy in predicting the survival time, as shown in Fig 7C.
3.6 TIMER database analysis
Immune cell analysis of high TMB and low TMB groups showed that the expression of low mutation load group was significantly higher than that of high mutation burden group in Monocytes cells, and 9 genes including CXCL11, GDF15, VEGFA, PLA2G2A, BIRC5, VAV3, IL9, TGFB2, TNFRSF12A participated in the model construction. The correlation between Monocytes cells and copy numbers of 9 genes was analyzed, and the results were shown in Fig 8A. Kruskal−Wallis analysis showed that among the 9 genes, the copy numbers of CXCL11, CDF15, PLA2G2A, TGFB2, VAV3 were correlated with the content of Monocytes cells.
Multivariate Cox regression analysis was performed on 26 immune-related DEGs by the "Survival" module of TIMER database. Nine genes were identified as highly correlated with the survival time of patients. The expression difference of these immune genes in LGG was verified by analyzing the expression of CXCL11, GDF15, VEGFA, PLA2G2A, BIRC5, VAV3, IL9, TGFB2 and TNFRSF12A in TIMER database. The results showed that SCNAs with different genes had influence on immune infiltration of LGG patients, as shown in Fig 8B. To illustrate the relationship between immune infiltrating cells and LGG survival rate, COX regression equation was used to calculate the expression levels of six immune infiltrating cells. Kaplan-Meier plots showed that the contents of 6 kinds of immune infiltrating cells were related to the survival and prognosis of patients, and the high levels of 6 kinds of cells could prolong the survival time of patients.
3.7 The correlation between BIRC5 gene and immune checkpoint
Ten Hub genes were constructed from PPI network and nine genes were constructed from survival model to screen BIRC5 as the target gene. 529 RNAseq data in LGG HTSeq-FPKM format were downloaded from TCGA database, and 1 case was deleted, with 528 cases in total. RNAseq data in FPKM format were transformed into log2, and the correlation between BIRC5 gene in LGG and CD274, PDCD1, CTLA4 immune checkpoints was analyzed by r software "ggplot2" package. The results indicated that BIRC5 gene expression was closely related to LGG prognosis, and BIRC5 gene expression was positively correlated with pdcd1 and ctla4, but negatively correlated with CD274.
3.8 The correlation between Birc5 expression and prognosis of LGG and clinical features of LGG.
The expression data of BIRC5 single gene was extracted, and the columnar difference map of BIRC5 expression was drawn by software "ggplot2". The result was shown in Fig10A, which showed that the expression of BIRC5 in tumor tissue was significantly higher than that in normal tissue. The LGG patients were divided into survival group and death group according to their survival status, and the correlation analysis was carried out between the two groups, as shown in Fig 10B. The results showed that the expression of BIRC5 in death group was significantly higher than that in survival group (p < 0.001), indicating that BIRC5 gene was a risk factor for LGG patients, and the higher the expression of BIRC5, the worse the prognosis of LGG patients. According to the median expression of BIRC5 mRNA, 528 LGG samples were divided into high expression group and low expression group. The relationship between BIRC5 expression and clinical characteristics of LGG patients was expounded. The results showed that BIRC5 expression was related to age and grade of LGG patients. The expression of BIRC5 in patients aged > 40 with grade G3 was significantly higher than that in patients aged ≤40 with G2 grade, and the difference was statistically significant, as shown in Fig 10C for details, which indicated that the expression of BIRC5 was correlated with advanced age and G3 stage.
3.9 GSEA enrichment analysis of BIRC5 gene
To further understand the biological process and function of BIRC5 in LGG and the related signal pathway, LGG patients were divided into high expression group and low expression group according to BIRC5 expression level. In this study, BIRC5 expression data set. get file and phenotypic data. els file were obtained by Perl language and imported into GSEA4.0.3 software. The result showed that the high expression of BIRC5 mRNA was related to P53 signaling pathway, DNA replication, cell cycle, phosphoinositol signaling system and neuroactive ligand receptor interaction signaling pathway. The results were shown in Fig 11 and Table 3.
Table 3 GSEA analysis of B1RC5 enrichment results
ID
|
ES
|
NES
|
p.adjust
|
FDR
|
CELL_CYCLE
|
0.797
|
2.771
|
0.013
|
0.009
|
P53_SIGNALING_PATHWAY
|
0.723
|
2.296
|
0.013
|
0.009
|
DNA_REPLICATION
|
0.807
|
2.306
|
0.013
|
0.009
|
PHOSPHATIDYLINOSITOL_SIGNALING_SYSTEM
|
-0.577
|
-2.008
|
0.013
|
0.009
|
NEUROACTIVE_LIGAND_RECEPTOR_INTERACTION
|
-0.565
|
-2.349
|
0.016
|
0.011
|