Identification and pan-cancer analysis of the DRF gene
Firstly, 340 ferroptosis-related genes associated with disulfidoptosis were identified through correlation studies. Using the "limma" package, TCGA-LUAD samples were categorized into tumor and normal tissues. Co-expression modules were constructed with the WGCNA package. Correlation analyses assessed the relationships between these modules and tumor versus normal tissues, leading to the identification of 13 modules with the optimal soft threshold power of 3 (Fig. 1A, B). Among these modules, 8 were significantly correlated with tumors (P < 0.05), with 5 showing positive correlations and 3 showing negative correlations (Fig. 1C). To identify relevant hub genes, we focused on the MEblue and MEbrown modules, which had the most significant positive and negative correlations (Pearson correlation coefficients of 0.52 and − 0.82, respectively; Fig. 1D, E). From these modules, 1,437 related genes were selected, with 613 genes from the MEblue module. After intersecting these genes with the disulfidoptosis ferroptosis (DRF) genes, 31 genes were obtained.
Kaplan-Meier survival analysis was used to screen for 10 candidate genes with strong prognostic relevance. SVM and Random Forest algorithms were then applied for hub gene screening (Fig. 2A). The intersection of these two algorithms narrowed the 10 candidate genes down to 6 (IL33, SLC2A1, CDCA3, KIF20A, FANCD2, RRM2; Supplementary Table 1, 2). These six DRF genes demonstrated favorable receiver operating characteristic (ROC) values in both the TCGA and GSE31210 cohorts (Supplementary Fig. 1), leading to their identification as DRF genes for LUAD. Subsequent survival and differential analyses for the GSE31210 cohort validated the reliability of these genes (Supplementary Figs. 2, 3).
The interrelationships among DRF genes in the genome-wide data for 33 cancers are illustrated in Fig. 2B. Expression analysis revealed differential expression in 18 cancer types, indicating that IL33 could act as a protective factor for most tumors, while the other five genes appeared as risk factors (Fig. 2C). Survival analysis results suggested that DRF genes can differentiate survival times across multiple cancers (Supplementary Table 3, P < 0.05). Notably, WGCNA was used to identify central regulators involved in the connectivity of DRF and disulfidoptosis regulators across 33 cancer types (Fig. 2D). The analysis showed a strong correlation among central DRF regulators (R = 0.65, P = 4.7e-05; Fig. 2E). In vitro experiments demonstrated significantly higher expression of CDCA3, KIF20A, FANCD2, RRM2, and SLC2A1 in the NSCLC (A549) cell line compared to normal lung epithelial cells (BEAS-2B), with statistically significant differences (Fig. 3), confirming the potential of DRF genes as tumor detection markers.
Mutational landscape and immune profile of the DRF gene
The chromosomal locations of DRF genes are shown in Fig. 4A. The study revealed that CNV deletions were more prevalent in IL33, CDCA3, and KIF20A, while SLC2A1, RRM2, and FANCD2 were more frequently amplified (Fig. 4B). These analyses demonstrated significant genetic and expression heterogeneity of DRF in lung adenocarcinoma, suggesting that imbalanced DRF expression plays a critical role in LUAD development. CIBERSORT analysis indicated that IL33 was positively correlated with various immune-activated cells and negatively correlated with immunosuppressive cells, such as regulatory T cells (Tregs) (Fig. 4C).
Clinical and immunological analysis of DRF subgroups
To further investigate the clinical value and functional biological patterns of these DRFs, we performed clustering analyses on all tumor samples in the TCGA LUAD cohort based on the expression levels of the six DRFs. We compared the expression of DRF-related genes with LUAD subtypes to explore their relationship. Setting the clustering value (K) from 2 to 10, the best aggregation stability was found at K = 2 (Fig. 4D). Consequently, the TCGA LUAD cohort was divided into cluster C1 (n = 347) and cluster C2 (n = 238) based on DRF expression. Heatmaps were used to display gene expression and TNM typing, staging, age (≤ 65 or > 65 years), and gender for the 585 LUAD samples. The IL33 gene expression level was significantly lower in the C1 group compared to the C2 group and was more closely associated with poor prognostic indicators (e.g., higher TNM staging and grading, older age), consistent with previous analyses (Fig. 4E). Kaplan-Meier analysis revealed that the C2 group had a more favorable OS compared to the C1 group, which showed a survival disadvantage consistent with clinical parameters (P = 0.003, Fig. 4F).
Given the importance of tumor immunity in tumor development, we explored the level of cellular infiltration in the tumor microenvironment. ssGSEA scores indicated that the C2 group had richer infiltration of immune cells, such as immature B cells and natural killer(NK) cells. The poorer clinical outcome in the C1 group may be related to immunosuppression induced by a Type 2 T helper cell (Th2) subpopulation in CD4 + T cells21 (Fig. 5A). The TME (tumor microenvironment), which includes tumor cells, mesenchymal stromal cells, immune cells, cytokines, and chemokines, was assessed. Figure 5B shows that the C2 group had higher StromalScore, ImmunityScore, and ESTIMATEScore, associated with lower tumor purity and better immune infiltration. This suggests that the C2 group may be linked to an immune microenvironment that promotes tumor death and is likely more sensitive to immunotherapy22. Analysis of the immune environment of the subgroups (Fig. 5C) revealed that Type_II_IFN_Response was more prevalent in C2, indicating characteristics of the immune response. Parainflammation, which is crucial for maintaining homeostasis, was more prominent in C1, potentially driven by p53 mutations, which can promote cancer. Given the sensitivity of parainflammation to NSAIDs, aspirin may offer better therapeutic effects for patients in the C1 group 23.
To further investigate the pathways dominated by DRFs, we conducted GSVA. Figure 5D illustrates that KEGG analysis revealed a significant enrichment of inflammatory metabolism-related pathways in the C2 group. In contrast, tumorigenic pathways, such as cell cycle and amino acid metabolism, were significantly enriched in the C1 group, which may be related to the previously mentioned parainflammation.
Functional and pathway enrichment analysis of DRF groupings
Using the limma package, 320 DEGs were identified from 37,992 genes, and these DEGs were used for subsequent enrichment analyses. GO analysis indicated that DRF-related genes were primarily associated with microtubule binding, extracellular matrix, and structural constituents in the cellular component (CC). These genes were related to condensed chromosomes, centromeric regions, and chromosomal regions. In terms of biological processes (BP), they were associated with nuclear chromosome segregation and mitotic nuclear division (Fig. 6A-D). KEGG pathway analysis highlighted the cell cycle and p53 signaling pathway as the primary DRF-enriched pathways. The p53 signaling pathway regulates cellular processes such as metabolism, antioxidant defense, and ferroptosis24 (Fig. 6E-G).
Validation of MLS risk model construction
For sample selection, we chose 572 samples with available outcomes and non-zero survival times from TCGA-LUAD as the training group. Using the Kaplan-Meier method, 42 genes significantly associated with OS (p < 0.001) were identified from the 320 DEGs. Among these, 40 genes that were expressed in both TCGA-LUAD and GSE31210 were selected to be included in the integrated framework for constructing the MLS risk model.
In the TCGA-LUAD training cohort, consistent models were constructed using 101 algorithm combinations. The average C-index for each model across all cohorts was calculated to evaluate their predictive ability, with the GSE31210 cohort used for model validation (Fig. 7A). Ultimately, the RSF + SuperPC algorithm, which achieved the highest average C-index, was identified as the most valuable model. This model was developed using 24 hub genes (FAM83A, KRT6A, SLC2A1, RHOV, ANGPTL4, METTL7A, GAPDH, ECT2, LYPD3, GJB2, NAPSA, CYP4B1, LOXL2, CTSH, SFTPB, TMPRSS2, BTG2, ANLN, SLC34A2, CYP2B7P, PFKP, UBE2S, C16orf89, KPNA2). Samples with MLS scores above the mean (n = 286) were classified as the high-risk group, while the remaining samples were categorized as the low-MLS group. All patients in the high-MLS group exhibited a poorer prognosis (Fig. 7B). The expression levels of the 24 genes showed statistical significance in both high and low scoring groups (Fig. 7C). The Mulberry diagram indicated that the C2 group, which comprised a larger portion of the low-MLS group, was associated with better survival outcomes, with a statistically significant difference in scores between the C1 and C2 groups (Fig. 7D, E). Furthermore, to comprehensively assess the predictive power of MLS, we included 77 different features in the study. MLS demonstrated a superior C-index in both TCGA-LUAD and GSE31210 datasets (12/78, 10/78; Fig. 7F), thereby confirming the reliability of the MLS model.
MLS immunological profile
The CIBERSORT algorithm was used to analyze the correlation between the 24 MLS genes, MLS scores, and various immune cells. The results indicated that most MLS genes and MLS scores were negatively correlated with immune-activated cells and positively correlated with immune-suppressed cells. Additional analyses using five other immune infiltration algorithms also showed that immune cell infiltration (including dendritic cells, eosinophils, monocytes, and NK cells) was significantly higher in patients with low MLS scores compared to those with high MLS scores, suggesting an immune activation state (Fig. 8A, B). This implies that LUAD with low MLS levels may be classified as "hot tumors." Conversely, T regulatory cells (Tregs) and tumor-associated fibroblasts (CAFs) were predominantly enriched in patients with high MLS scores, which likely contribute to tumor proliferation, invasion, drug resistance, and an immunosuppressive state, leading to poorer outcomes (Fig. 6B, C). This suggests that LUAD with high MLS levels might be classified as "cold tumors," which are less sensitive to immunotherapy. TIDE scores were negatively correlated with risk scores (Fig. 8C), indicating that low-risk patients may benefit more from immunotherapy. Increased TMB expression is associated with enhanced T cell activation. TMB has potential clinical utility in lung cancer and could serve as a biomarker for immunotherapy25. Our survival analyses showed that patients with lower MLS scores and higher TMB levels typically had significantly prolonged OS, suggesting that MLS could complement TMB in assessing patient prognosis (Fig. 8D, E).
GSEA analysis of MLS
GSEA analysis for subgroups (Fig. 8F) revealed that pathways enriched in low-scoring subgroups were associated with immune cell regulation and GnRH receptor signaling. These pathways, in addition to their role in regulating pituitary gonadotropin secretion, may predict survival and resistance to tumor proliferation, metastasis, and anti-angiogenesis in lung cancer patients26. The Hedgehog signaling pathway, utilized in developing tumor-targeting drugs, and VEGFR inhibitors, known for their effects on vascular smooth muscle and cancer cell proliferation, were also identified27. Interestingly, long-term depression was also enriched, suggesting a potential link between these pathways and tumor biology.
Drug sensitivity analysis of MLS
The drug sensitivity analysis identified nine drugs with varying effectiveness. The low-scoring subgroup samples showed greater sensitivity to drugs such as BMS-754807, Doramapimod, JQ1, OF-1, and SB505124. Conversely, the high-scoring subgroup samples were more sensitive to AZD7762, Dasatinib, Docetaxel, and WIKI4. These findings suggest that MLS may be useful in evaluating the efficacy of targeted therapies and chemotherapy (Fig. 9A-I).