Clinicopathological, expression/methylation profile, and prognostic differences in TETs.
As indicated in Table S1, significant differences were observed in clinical pathological characteristics, such as age at diagnosis, WHO histological type, and Masaoka stage, among the low-risk, high-risk, and TC groups (P < 0.05). However, no sex differences were observed. The age at diagnosis in the low-risk group of TET patients was significantly higher than that in the high-risk group. However, no significant differences were observed between the low-risk and TC groups, as well as between the high-risk and TC groups (Fig. 1A). For tumor mutation burden (TMB), patients in the TC group had significantly higher TMB values than those in the low-risk and high-risk groups (P < 0.01) (Fig. 1B). However, MSI scores were similar, and no differences were observed.
The “limma” package was utilized to filter for DEGs and DMGs between distinct groups. In comparison to the high- and low-risk groups, the TC group demonstrated a significantly higher number of DEGs, with 2267 genes upregulated and 1857 genes downregulated. DEGs were also identified between the high- and low-risk groups, with 509 upregulated genes and 576 downregulated genes (Fig. S1 A-B and Table S2). This trend was also observed for DMGs (Fig. S1 C-D and Table S3). The heat map effectively illustrated the clustering patterns between the TC group and low/high-risk groups, including DEGs and DMGs (Fig. 1C and 1D). However, no clustering was observed between the high- and low-risk groups in terms of expression or methylation profiles.
Survival analysis revealed that patients in the TC group exhibited lower overall survival (OS), progression-free survival (PFS), and disease-free survival (DFS) rates than those in the low-risk and high-risk groups, especially DFS (P < 0.05) (Fig. 1E-1G). However, there was no significant difference in survival prognosis between the low- and high-risk groups.
Clinicopathological, expression profile, and prognostic differences between the GTF2I L424H mutation group and wild-type group
To further explore the association between the GTF2I mutation and pathological classification, we categorized the patients into two distinct groups: those harboring the GTF2I L424H mutation and those harboring the wild-type GTF2I. As illustrated in Figs. 2A and 2B and summarized in Table S3, patients in the GTF2I L424H mutation group exhibited significantly higher values for both age and TMB than those in the GTF2I wild-type group (p < 0.05). This pattern mirrors the observed distinctions between the low-risk and high-risk groups, implying a potential link between the GTF2I L424H mutation, advanced age, elevated TMB values, and the overall pathological classification.
“Limma” analysis revealed significant differences in expression and methylation profiles between the GFT2I mutation group and the wild-type group. In comparison to the GTF2I wild-type, the GTF2I L424H mutation group exhibited 1750 significantly upregulated genes, including 553 upregulated genes in the low-risk group compared to the high-risk group (553/576), and 880 significantly downregulated genes, including 406 downregulated genes in the low-risk group compared to the high-risk group (416/509) (Table S4 and Fig.S2). Interestingly, the heatmap results revealed distinct clustering between the two groups, regardless of whether the DEGs or DMGs were examined (Fig. 2C and 2D). Survival analysis revealed that individuals with the GTF2I L424H mutation had a longer survival rate than those in the GTF2I wild-type group (Fig. 2. E-G). This difference was particularly pronounced in terms of the PFS time (p < 0.05).
GTF2I L424H mutation potentially related to tumor suppression
We further explored the potential mechanism between GTF2I L424H mutation and the prognostic impact of TETs. We performed a comprehensive analysis of the gene mutation landscape. The waterfall plot depicted in Fig. 3A and 3B highlights GTF2I as the most frequently mutated gene, accounting for 44% (66 out of 151) of cases, primarily observed in the low-risk group, especially in type A and AB thymomas. In a comparative mutation analysis between the GTF2I L424H mutation and GTF2I wild-type groups, we observed significant differences in the mutation frequencies of two genes: HRAS (15% vs. 1%) and TP53 (0% vs. 8%) (Fig. 3C and 3D) (P < 0.05). HRAS mutations were notably associated with GTF2I and TTN mutations, while TP53 mutations demonstrated significant mutual exclusivity with GTF2I mutations (Fig. 3E).
Mutation analysis of the tumor signaling pathways was conducted to compare the GTF2I mutation and wild-type groups. The plot illustrating oncogenic signaling pathways (Fig. 3F) indicates that the P53 and Hippo signaling pathways are predominantly mutated in the GTF2I wild-type group. Detailed gene mutations in the P53 and Hippo signaling pathways are presented in Fig. S3. Notably, the top three enriched mutation pathways in the GTF2I mutation group were transcription factors, MAPK signaling, and genomic integrity, whereas the wild-type GTF2I group exhibited mutations in genomic integrity, MAPK signaling, and RTK signaling (Fig. 3G and 3H).
Differential gene expression profiles between the GTF2I L424H mutation group and wild-type group in thymoma patients
They considered the substantial overlap in DEGs between patients with and without GTF2I L424H mutation in thymoma patients (low-risk and high-risk groups), and their effective clustering into distinct groups. A “Limma” analysis was also performed between the GTF2I L424H mutation group and the wild-type group in thymomas. As shown in Fig. 4A and 4C and Table S4, 1870 genes were significantly upregulated in the GTF2I L424H group, among which 555 genes were also significantly upregulated in the low-risk group compared to the high-risk group (555/576). Additionally, 845 genes were significantly downregulated, including 419 genes that were also significantly downregulated in the low-risk group compared to the high-risk group (419/509). Cluster analysis revealed significant differences in the gene expression profiles between the GTF2I L424H mutation and wild-type groups (Fig. 4B). After eliminating interference from patients with TC, we observed that the GTF2I L424H mutation remained associated with a better prognosis, particularly in terms of PFS (Fig. 4D and Fig.S4).
GTF2I L424H mutation may regulate pathways linked to tumor development
To further study the potential effect of GTF2I L424H mutation on tumorigenesis, GO and KEGG enrichment analyses were performed using DEGs in thymoma patients. Enrichment analysis results between the GTF2I L424H mutation group and the wild-type group exhibited overlaps and differences compared to those between the low- and high-risk groups, as illustrated in Fig. 5A and 5B. Biological process (BP) enrichment analysis showed that GTF2I L424H mutation-related genes were mainly involved in cell-cell adhesion via plasma-membrane adhesion molecules, sensory organ morphogenesis, and synape organization. Molecular function (MF) enrichment analysis showed that the role of GTF2I L424H mutation in tumor pathogenesis was mainly related to the external matrix structure constituent, gated channel activity, and heparin binding. We found that GTF2I L424H mutation-related genes were enriched in the collagen-containing extracellular matrix, synaptic membrane, and ion channel complex in the cellular component (CC) enrichment analysis. In addition, KEGG pathway analysis showed that GTF2I L424H mutation participates in some pathways, such as the calcium signaling pathway, neuroactive ligand-receptor interaction, and ECM-receptor interaction (Fig. 5B). Subsequently, GSEA enrichment analysis revealed that GTF2I L424H mutation mainly participated in the TGFβ signaling pathway, ECM receptor interaction, focal adhesion, citrate cycle, TCA cycle, oxidative phosphorylation, etc. (Fig. 5C).
In contrast to the findings between the low-risk and high-risk groups, the cytoHubba plugin identified ten genes (CDH2, CXCL12, EGF, FGF5, FGF10, FGF17, FGF20, FN1, PXDNL and TWIST1) with the highest degree scores as hub genes for GTF2I L424H mutation-related TET, as depicted in Fig. 5D and 5E.
Estimation of immune cell infiltration and immune checkpoints between GTF2I mutant and wild-type thymoma patients
Given the discovery that the GTF2I L424H mutation is related to tumor immune microenvironment pathways such as the TGF-β pathway and ECM-receptor interaction [8, 9] in pathway analysis, we conducted analyses related to immune infiltration and immune checkpoints. There was no significant difference (p > 0.05) in the estimated scores between the GTF2I L424H mutation and wild-type groups (Fig. 6A). The stromal score in the GTF2I L424H mutation group was significantly higher than that in the GTF2I wild-type group (p < 0.01) but lower in the immune score (P < 0.001) (Fig. 6A). The timer score of macrophages in the GTF2I L424H mutation group was significantly higher than that in the GTF2I wild-type group (p < 0.001) but lower in the DC (P < 0.01) (Fig. 6A). The IPS scores of the CP (p < 0.001) and AZ (p < 0.05) in the GTF2I L424H mutation group were significantly higher than those in the GTF2I wild-type group (Fig. 6A). In terms of inhibitory immune checkpoints, CD276 (p < 0.0001), EDNRB (P < 0.0001), IL12A (P < 0.0001), KIR2DL1 (P < 0.05), KIR2DL3 (P < 0.01), and VEGFA (P < 0.0001) showed significantly higher expression levels in GTF2I L424H mutation than in the GTF2I wild-type group, but lower in ADORA2A (P < 0.01), BTLA (P < 0.0001), CD274 (P < 0.05), and PDCD1 (p < 0.001). In terms of stimulatory immune checkpoints, BTN3A1 (P < 0.05), CD40 (P < 0.0001), CX3CL1 (P < 0.001), CXCL10 (p < 0.001), CXCL9 (p < 0.01), ENTPD1 (p < 0.01), ICAM1 (p < 0.0001), IL1A (p < 0.05), TLR4 (p < 0.001), and TNFSF9 (p < 0.001) showed significantly higher expression levels in GTF2I L424H mutation group than in the GTF2I wild-type group, but lower expression of CD27 (p < 0.0001), CD28 (p < 0.0001), CD40LG (P < 0.0001), ICOS (p < 0.001), ITGB2 (p < 0.01), PRF1 (p < 0.05), and TNFSF4 (p < 0.05). The detailed expression profiles of immune checkpoints are shown in Fig. S5.
Survival analysis and construction of multivariate Cox regression model
Thymoma patients with GTF2I L424H mutation exhibited an upregulation of six hub genes and downregulation of four hub genes, as shown in Fig. 7A, relative to the GTF2I wild-type patients. Univariate Cox regression analysis of the expression of the 10 hub genes revealed significant correlations with the prognosis of thymoma (Fig. 7B). High expression levels of CXCL12, EGF, FGF10, FGF20, and PXDNL were identified as negatively affecting thymoma prognosis, particularly for EGF, FGF10, and FGF20 with p-values < 0.1. Conversely, the expression of CDH2, FGF17, FGF5, FN1, and TWIST1 was associated with a more favorable prognosis in thymoma. Notably, CDH2 and TWIST1 exhibited p-values of < 0.1. Kaplan-Meier survival curves demonstrated similar outcomes (Fig. S6). Diagnostic age, sex, WHO classification, Masaoka stage, GTF2I status, and TMB were included in univariate Cox regression analysis. However, only TMB exhibited a p-value of < 0.1 (Fig. S7).
The stepwise multivariable Cox method was used to assess the prognostic significance of the six features selected from the univariate Cox regression analysis in 106 patients with thymoma (Table S5). Notably, FGF20, FGF10, EGF, and TWIST1 were identified using stepwise multivariate Cox regression to construct a risk score model (Fig. 7C). The overall C-index of the Risk-Score model was 0.983, as evidenced by the log-rank test (P < 0.001), score test (P < 0.001), and Wald test (Wald test < 0.05). We then utilized the R package “pROC” (version 1.17.0.1) for ROC analysis to obtain the AUC. Using patient follow-up time and model scores, we applied the ROC function at 1095 and 1825 days, evaluating the AUC and confidence intervals with the CI function. The final AUC results were 0.99 for 1095 days and 1.0 for 1825 days OS (Fig. 7D) suggested successful model construction. The optimal risk score cutoff (6.018) was determined using the R package “maxstat” (maximally selected rank statistics with severe p-value approximations version 0.7–25). Patients were stratified into high- and low-risk groups based on this cut-off. Prognostic differences were analyzed using the survival function in the R package “survival.” Significance was assessed using the log-rank test, revealing a significantly poorer prognosis in the high-risk group than in the low-risk group (2721 days vs. 4575 days, p < 0.001) (Fig. 7E).
We concurrently examined the association between varied risk scores and patient follow-up duration, events, and gene expression alterations. As the risk scores increased, a noticeable decrease in patient survival rates was observed (Fig. 7F). TWIST1 has emerged as a protective factor, displaying downregulated expression with increasing risk scores. Conversely, FGF20, EGF, and FGF10 were identified as risk factors that exhibited an upregulation trend with escalating risk scores.