1. Identification of Significant Module Genes in IS via WGCNA
A weighted scale-free co-expression network was constructed to identify key modules relevant to IS. We first clustered the samples in the GSE22255 dataset by the Euclidean distance between their gene expression profiles to detect outliers; three outlier samples were identified (Fig. 2A). After removing these outliers, we re-clustered the remaining samples, with white representing control samples and red representing IS samples (Fig. 2B). With the soft-threshold power set to 6, the scale-free topology fit index (R^2) reached 0.85 and the mean connectivity tended towards zero (Figs. 2C and 2D). The dynamic tree cut algorithm then identified eight modules within the co-expression network (Fig. 2E). Based on the module-trait relationships depicted in Fig. 2F, we selected three modules (black, red, and turquoise) with correlations greater than 0.2 for further analysis. Together, these three modules contained 4,467 IS-related genes, which were used in subsequent analyses.
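The soft-threshold step can be illustrated with a minimal sketch. It is not the WGCNA R implementation used in the study; it only shows, on randomly generated toy data, how raising the power shrinks connectivity and how a scale-free fit R^2 can be computed from binned connectivities. The function names, the bin count, and the toy expression matrix are assumptions made for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def connectivity(expr, power):
    """Per-gene connectivity k_i = sum over j != i of |cor(i, j)| ** power."""
    n = len(expr)
    k = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            a = abs(pearson(expr[i], expr[j])) ** power
            k[i] += a
            k[j] += a
    return k


def scale_free_r2(k, n_bins=10):
    """R^2 of a straight-line fit of log10 p(k) against log10 k over binned k."""
    lo, hi = min(k), max(k)
    if hi == lo:
        raise ValueError("all connectivities are equal; nothing to fit")
    width = (hi - lo) / n_bins
    sums, counts = [0.0] * n_bins, [0] * n_bins
    for v in k:
        b = min(int((v - lo) / width), n_bins - 1)
        sums[b] += v
        counts[b] += 1
    # Empty bins are skipped because log10(0) is undefined.
    xs = [math.log10(sums[b] / counts[b]) for b in range(n_bins) if counts[b]]
    ys = [math.log10(counts[b] / len(k)) for b in range(n_bins) if counts[b]]
    if len(set(ys)) == 1:
        return 0.0  # flat distribution: a line explains no variance
    return pearson(xs, ys) ** 2


# Toy data (hypothetical): 30 genes x 20 samples of Gaussian noise.
random.seed(0)
toy_expr = [[random.gauss(0, 1) for _ in range(20)] for _ in range(30)]
for power in (1, 6, 12):
    k = connectivity(toy_expr, power)
    print(power, sum(k) / len(k), scale_free_r2(k))
```

Because every absolute correlation is below 1, the mean connectivity necessarily falls as the power increases, which is the trend read off the mean-connectivity panel when choosing a power.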
2. Identification of Differentially Expressed IRGs in IS
To characterize gene expression differences between IS and normal conditions, we performed differential expression analysis. We identified 1,049 DEGs between the IS and control groups, comprising 564 upregulated and 485 downregulated genes (Fig. 3A). Intersecting these DEGs with the 4,467 genes identified through WGCNA yielded 383 IS-related DEGs. Enrichment analysis revealed that these genes were involved in terms such as regulation of translation, nuclear speckles, protein kinase activity, and the NF-kappa B signaling pathway (Figs. 3B and 3C). Finally, Venn diagram analysis showed that nine of these IS-related DEGs were IRGs; these nine genes were carried forward to feature selection (Fig. 3D).
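The successive intersections described above amount to plain set operations. The sketch below uses tiny toy gene sets: AHR, OSM, and NMUR1 are taken from the text, while every `GENE_*` symbol is a hypothetical placeholder and the helper name is an assumption. It also checks that the reported up- and downregulated counts sum to the DEG total.

```python
def intersect_gene_sets(*gene_sets):
    """Return the sorted list of genes present in every input set."""
    if not gene_sets:
        return []
    common = set(gene_sets[0])
    for genes in gene_sets[1:]:
        common &= set(genes)
    return sorted(common)


# Counts reported in the text: upregulated + downregulated = total DEGs.
n_up, n_down = 564, 485
n_degs = n_up + n_down

# Toy sets; GENE_* symbols are hypothetical placeholders, not real results.
degs = {"AHR", "OSM", "NMUR1", "GENE_A", "GENE_B"}
wgcna_module_genes = {"AHR", "OSM", "NMUR1", "GENE_A", "GENE_C"}
immune_related_genes = {"AHR", "OSM", "NMUR1", "GENE_D"}

is_related_degs = intersect_gene_sets(degs, wgcna_module_genes)
is_related_irgs = intersect_gene_sets(is_related_degs, immune_related_genes)
print(n_degs, is_related_degs, is_related_irgs)
```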
3. Identification of Hub IRGs Using Machine Learning Algorithms
Three machine learning algorithms were used to screen for reliable candidate hub genes in IS. LASSO regression identified five genes as diagnostic markers for IS (Figs. 4A and 4B). SVM-RFE selected the eight feature genes that gave the minimum error (Fig. 4C). Random forest analysis was used to rank the importance of the IRGs differentially expressed in IS (Figs. 4D and 4E). Finally, intersecting the genes selected by the three algorithms in a Venn diagram yielded aryl hydrocarbon receptor (AHR), oncostatin M (OSM), and neuromedin U receptor 1 (NMUR1) as common potential hub genes in IS (Fig. 4F).
4. Expression Characteristics of Hub IRGs
We further investigated the expression levels of the hub IRGs in IS patients. In the GSE22255 cohort, AHR was significantly upregulated, NMUR1 was significantly downregulated, and the expression difference of OSM was not significant in IS patients compared to normal controls (Fig. 5A). In the GSE58294 and GSE16561 cohorts, OSM was significantly upregulated, NMUR1 was significantly downregulated, and the expression difference of AHR was not significant (Figs. 5B and 5C). Correlation analysis showed that OSM was negatively correlated with NMUR1, whereas the correlations of AHR with the other genes were inconsistent across the cohorts and require further experimental validation.
5. Development and Evaluation of Nomogram
Using the identified feature genes (AHR, OSM, and NMUR1; Fig. 6A), we developed a diagnostic nomogram for ischemic stroke and assessed its predictive ability using calibration curves. The calibration curves showed minimal differences between the actual and predicted risks of ischemic stroke, indicating that the diagnostic nomogram was well calibrated (Fig. 6B). ROC analysis showed that the nomogram, AHR, OSM, and NMUR1 had AUCs of 0.906, 0.705, 0.398, and 0.734, respectively (Fig. 6C); the nomogram therefore showed excellent diagnostic performance, whereas the individual IRGs were less robust. Additionally, an external validation dataset (GSE58294) was used to assess the diagnostic capability of the feature genes for ischemic stroke in elderly women. ROC analysis showed that the nomogram, AHR, OSM, and NMUR1 had AUCs of 0.944, 0.563, 0.903, and 0.834, respectively (Fig. 6D), indicating excellent diagnostic performance of the nomogram, OSM, and NMUR1 in the validation set.
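For readers less familiar with the AUC values quoted here, the following minimal sketch computes an AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half. The function name and the toy scores are assumptions for illustration, and the sketch does not reproduce the software used in the study. One consequence of this definition is that reversing a marker's direction turns an AUC of a into 1 - a, so a value below 0.5 (such as 0.398, whose complement is 0.602) means the marker ranks cases and controls in the opposite direction from the one assumed.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison; labels are 1 (case) or 0 (control)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Toy scores (hypothetical): three cases followed by three controls.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_labels = [1, 1, 1, 0, 0, 0]

auc = roc_auc(toy_scores, toy_labels)
auc_reversed = roc_auc([-s for s in toy_scores], toy_labels)
print(auc, auc_reversed)
```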
6. Association of IRGs with IS Immune Infiltration
We further evaluated immune cell infiltration in IS patients and found that, compared to controls, IS patients had lower naive B cell, CD8 T cell, resting CD4 memory T cell, and activated NK cell infiltration, but higher plasma cell, activated CD4 memory T cell, resting NK cell, and neutrophil infiltration (Fig. 7A). Further analysis revealed that CD8 T cells had a significant negative correlation with AHR, OSM, and NMUR1. Additionally, the IRGs exhibited complex and diverse significant correlations with naive B cells, CD4 T cells, NK cells, monocytes, macrophages, dendritic cells, and neutrophils (Fig. 7B). These findings suggest that IRGs play a role in the formation and development of the immune microenvironment in IS patients.
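The gene-to-immune-cell associations can be sketched as a rank correlation between a gene's expression and an estimated cell fraction across samples. The text does not state which correlation coefficient was used, so Spearman correlation is an assumption here, as are the function names and the toy values.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Toy values (hypothetical): expression of one gene and an estimated
# CD8 T cell fraction in the same five samples.
toy_expression = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_cd8_fraction = [0.10, 0.08, 0.06, 0.03, 0.01]
print(spearman(toy_expression, toy_cd8_fraction))
```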
7. Screening of Small Molecule Drugs
To identify potential drugs for treating IS, we queried DGIdb and retrieved 76 candidate drugs (Fig. 8), of which 16 were approved and 60 were not approved (purple edges). The approved drugs were chrysin, methylcellulose, piperine, genistein, resveratrol, clioquinol, carbaryl, levothyroxine, niclosamide, phenazopyridine hydrochloride, tapinarof, nitazoxanide, thiabendazole, romiplostim, olanzapine, and omeprazole. A drug-gene network was constructed using Cytoscape; all approved drugs targeted AHR, and no drugs targeted NMUR1 or OSM.
8. Pathways Associated with Hub IRGs
We analyzed the potential mechanisms of action of the hub IRGs in IS using GSEA. Genes in the high-expression groups of AHR, OSM, and NMUR1 were all highly enriched in the ribosome pathway. In the AHR high-expression group, genes were enriched in the C-type lectin receptor signaling pathway and the Fanconi anemia pathway, whereas in the low-expression group, genes were enriched in the TNF signaling pathway and thyroid hormone synthesis, secretion, and action (Fig. 9A). Genes in the OSM high-expression group were enriched in the AGE-RAGE signaling pathway, aminoacyl-tRNA biosynthesis, and the C-type lectin receptor signaling pathway, while genes in the low-expression group were mainly enriched in the IL-17 and PPAR signaling pathways (Fig. 9B). Genes in the NMUR1 high-expression group were primarily enriched in antigen processing and presentation, glycosaminoglycan biosynthesis, and natural killer-mediated cytotoxicity pathways, whereas genes in the low-expression group were primarily enriched in the T cell receptor and VEGF signaling pathways (Fig. 9C).
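The enrichment statistic behind GSEA can be sketched as a running sum over a ranked gene list: the sum rises at genes that belong to the pathway and falls at genes that do not, and the enrichment score is the largest deviation from zero. The code below is a minimal illustration under stated assumptions; the ranking metric, the weight of 1, the function name, and the placeholder gene names `g1` to `g6` are all hypothetical, and it omits the permutation testing and normalization that a full GSEA run performs.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score.

    ranked: list of (gene, score) pairs sorted by score, highest first.
    gene_set: collection of gene names forming the pathway.
    """
    members = set(gene_set)
    # Hit weight for pathway members, None for non-members.
    steps = [abs(s) ** weight if g in members else None for g, s in ranked]
    hit_total = sum(h for h in steps if h is not None)
    n_miss = sum(1 for h in steps if h is None)
    if hit_total == 0:
        raise ValueError("no pathway genes with nonzero weight in the list")
    running, best = 0.0, 0.0
    for h in steps:
        if h is not None:
            running += h / hit_total  # step up at a pathway gene
        else:
            running -= 1.0 / n_miss   # step down at any other gene
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranked list (hypothetical): scores could be, for example, a
# difference statistic between high- and low-expression samples.
toy_ranked = [("g1", 3.0), ("g2", 2.0), ("g3", 1.0),
              ("g4", -1.0), ("g5", -2.0), ("g6", -3.0)]
print(enrichment_score(toy_ranked, {"g1", "g2"}))
print(enrichment_score(toy_ranked, {"g5", "g6"}))
```

A positive score indicates that the pathway's genes cluster at the top of the ranking (the high-expression side in this toy setup), and a negative score indicates that they cluster at the bottom.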