Characteristics of the pyroptosis-related genes (PRGs) in PCOS Fig. 1
PRGs were collected from the GeneCards website. The potential role of PRGs in PCOS was investigated by GO and KEGG enrichment analysis[17]. GO enrichment analysis showed that PRGs were particularly enriched in biological processes related to pyroptosis, nuclear membrane reassembly, and macroautophagy (Fig. 1A). In the KEGG enrichment analysis, PRGs were particularly enriched in the cytosolic DNA-sensing pathway, the NOD-like receptor signaling pathway, and the p53 signaling pathway (Fig. 1B). PPI analysis showed the interactions among the 44 PRGs, and Cytoscape identified eight hub genes: CHMP2A, CHMP2B, CHMP3, CHMP4A, CHMP4B, CHMP4C, CHMP5, and CHMP7 (Fig. 1C). The violin plot showed the differential expression of PRGs between the PCOS group and the control group: PYCARD, AIM2, and NOD2 were significantly more highly expressed in the PCOS group (Fig. 1D).
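The over-representation idea behind GO and KEGG enrichment can be illustrated with a one-sided hypergeometric test. The sketch below is only an illustration: the text does not state which enrichment tool or background set was used, and the function name and all numbers in the usage note are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_query, n_overlap):
    """One-sided over-representation p-value, P(X >= n_overlap).

    X is the number of query genes annotated to a term when n_query genes
    are drawn without replacement from n_background genes, of which
    n_in_term carry the annotation.
    """
    upper = min(n_in_term, n_query)
    total = comb(n_background, n_query)
    # Sum the hypergeometric probabilities over the upper tail.
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 background genes, a term with 100 genes,
# a 44-gene query list, 10 of which fall in the term.
p_value = hypergeom_enrichment_p(20000, 100, 44, 10)
```

In practice the p-values for all tested terms would also be corrected for multiple testing, which this sketch omits.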
Building a machine learning model to identify PCOS disease signature genes Fig. 2
Machine learning models were built with the RF, XGB, GLM, and SVM algorithms to identify PCOS disease signature genes. The models were constructed from 44 key pyroptosis-related genes to assess the risk of PCOS. The residual boxplot (Fig. 2A) showed that XGB had the lowest root mean square of residuals, while GLM had the highest. The residuals of the RF and SVM models fell between 0.25 and 0.5. The reverse cumulative distribution plot of residuals (Fig. 2B) agreed with these findings. The ROC curves (Fig. 2C) gave an area under the curve (AUC) of 0.867 for XGB, 0.800 for SVM, 0.733 for RF, and 0.533 for GLM. The GLM model had the smallest AUC, close to the chance level of 0.5, indicating poor discrimination and possibly overfitting. Gene importance analysis was performed for all four methods, yielding gene importance scores (Fig. 2D). In the GLM model, BAX, GPX4, CHMP2A, CASP8, and CHMP4B were the five highest-scoring genes. In the RF model, NLRC4, AIM2, PLCG1, NOD2, and PRKACA were the most important. PYCARD, CHMP2A, CASP8, NLRC4, and NOD2 were the most important genes in the SVM model, and CHMP4B, PYCARD, NLRP2, NLRC4, and AIM2 in the XGB model, which had an AUC of 0.867. Given these results, the XGB model, which had the highest AUC, was selected for subsequent analysis.
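The two quantities used to compare the models, the root mean square of residuals and the AUC, can be computed from true labels and predicted probabilities as in the minimal sketch below. The function names and toy inputs are illustrative assumptions and are not taken from the study's code or data.

```python
from math import sqrt


def root_mean_square_residual(y_true, y_prob):
    """Root mean square of residuals (observed label minus predicted probability)."""
    residuals = [y - p for y, p in zip(y_true, y_prob)]
    return sqrt(sum(r * r for r in residuals) / len(residuals))


def roc_auc(y_true, y_score):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair (Mann-Whitney formulation).
    """
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy example: 2 controls and 2 cases with predicted probabilities.
labels = [0, 0, 1, 1]
probabilities = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(labels, probabilities)
toy_rms = root_mean_square_residual(labels, probabilities)
```

An AUC of 0.5 corresponds to random ranking of cases and controls, which is why a value such as 0.533 indicates little discriminative ability.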
Evaluation of a predictive nomogram Fig. 3
The top five genes identified by the XGB model were designated as disease signature genes and used to construct a nomogram for predicting the incidence of PCOS (Fig. 3A). Each signature gene was assigned a score interval, and the scores of all genes were summed to give a total score, which was then mapped to the predicted incidence of PCOS. The predictive accuracy of the nomogram was evaluated with a calibration curve, in which closer agreement between the solid and dashed lines indicates higher accuracy (Fig. 3B). Furthermore, decision curve analysis showed that the model curve (yellow line) lay well away from the "All" curve, indicating its potential usefulness in clinical decision-making (Fig. 3C).
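Decision curve analysis compares the net benefit of acting on the model's predictions with that of treating all patients, across risk thresholds. The sketch below uses the standard net-benefit definition; the function names and toy data are hypothetical and do not reproduce the nomogram itself.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - false_pos / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the 'treat all' reference strategy."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy data: two cases and two controls with well-separated predictions.
labels = [1, 1, 0, 0]
risks = [0.9, 0.8, 0.2, 0.1]
model_nb = net_benefit(labels, risks, threshold=0.5)
all_nb = net_benefit_treat_all(labels, threshold=0.5)
```

A model is considered useful over the range of thresholds where its net benefit exceeds both the "treat all" curve and the "treat none" line at zero.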
Identification of novel subtypes based on PRGs Fig. 4
Using the set of 47 PRGs, the consensus clustering algorithm was applied to classify PCOS patients into distinct subtypes. When the number of clusters (k) was set to 2, the patients were well separated into two clusters (Fig. 4A). Cluster stability was highest at k = 2, as indicated by the fluctuation of the cumulative distribution function (CDF) curve within the range of 0 to 0.2 (Fig. 4B). Additionally, Fig. 4C illustrates the area under the CDF curve for k values from 2 to 9. Notably, at k = 2, the consistency scores of the subtypes were the highest and closest to each other. Consequently, the expression differences of pyroptosis regulators between Cluster 1 and Cluster 2 were evaluated to investigate the molecular characteristics distinguishing the clusters. Cluster 1 exhibited higher expression of CASP3, CASP6, CLRP2, and NLRP2, while Cluster 2 displayed higher expression of CHMP2A, CHMP4B, IL1B, CASP9, GPX4, IL6, NLRC4, NLRP1, NLRP3, PYCARD, and TNF. No significant differences were observed in the expression of BAK1, BAX, CASP1, CYCS, and other genes between the clusters (Fig. 4D, E). The GSVA plot highlighted differential pathways between the subtypes (Fig. 4F). Propanoate metabolism, cell cycle, pantothenate and CoA biosynthesis, cysteine and methionine metabolism, and the p53 signaling pathway were positively regulated in the C2 subgroup (shown in blue), while the ribosome, cytosolic DNA-sensing pathway, and hematopoietic cell lineage were positively regulated in the C1 subgroup (shown in yellow).
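The CDF curve and the area under it, which are used to choose k in consensus clustering, can be derived from a consensus matrix as sketched below. This is an illustration under assumptions: the consensus matrix is taken as given, the area is approximated with the trapezoid rule, and the function names and toy matrix are hypothetical.

```python
def consensus_cdf(consensus, grid):
    """Empirical CDF of the off-diagonal consensus values on a grid in [0, 1]."""
    n = len(consensus)
    # Each unordered sample pair is counted once (upper triangle only).
    values = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    return [sum(1 for v in values if v <= c) / len(values) for c in grid]


def area_under_cdf(grid, cdf):
    """Trapezoid-rule area under the CDF curve."""
    return sum(
        (grid[k + 1] - grid[k]) * (cdf[k] + cdf[k + 1]) / 2
        for k in range(len(grid) - 1)
    )


# Toy consensus matrix for 4 samples forming two perfectly stable clusters:
# samples 0 and 1 always co-cluster, as do samples 2 and 3.
toy_consensus = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
grid = [0.0, 0.5, 1.0]
toy_cdf = consensus_cdf(toy_consensus, grid)
toy_area = area_under_cdf(grid, toy_cdf)
```

A stable clustering yields consensus values near 0 or 1, so its CDF is flat over the middle of the range; repeating the calculation for each k and comparing the areas gives the kind of curve summarized in Fig. 4C.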
Immune landscape of PRGs-based subtypes Fig. 5
The histogram of immune cell infiltration revealed distinct patterns between the groups. Cluster 1 had higher levels of naïve B cells, memory B cells, and neutrophils, whereas Cluster 2 showed higher levels of monocytes and resting mast cells[18, 19]. The other immune cell types differed less between the groups, and further experimental verification is required to determine the reasons for these differences (Fig. 5A). The immune cell differential analysis plot corroborated these findings, showing statistically significant differences (p < 0.05) in naïve B cells, Tregs, monocytes, M0 macrophages, activated mast cells, and neutrophils between the subgroups (Fig. 5B). Additionally, to further examine the association between the PRGs and PCOS, the correlation between the two subtypes and cytokines was explored. The analysis revealed significant differences in cytokine expression between the gene clusters (Fig. 5C). Notably, IL1RN, IL6, IL10, IL16, IL32, TGFB1, SOCS3, CXCL10, and TNF were more highly expressed in Cluster 2 than in Cluster 1, in line with existing reports[20, 21]. These findings further support the close association between pyroptosis Cluster 2 and PCOS.
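A between-cluster comparison of an immune cell fraction can be illustrated with an exact two-sided permutation test on the difference in means. The text does not state which statistical test produced the reported p-values, so the permutation test here is a stand-in chosen for transparency, and the fractions below are invented toy values.

```python
from itertools import combinations


def exact_permutation_p(group_a, group_b, tol=1e-12):
    """Exact two-sided permutation p-value for a difference in group means.

    Enumerates every way of relabelling the pooled samples, so it is only
    practical for small groups.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    indices = range(len(pooled))

    def mean_diff(selected):
        chosen = set(selected)
        a = [pooled[i] for i in indices if i in chosen]
        b = [pooled[i] for i in indices if i not in chosen]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_diff(range(n_a)))
    splits = list(combinations(indices, n_a))
    # Count relabellings at least as extreme as the observed split.
    extreme = sum(1 for s in splits if abs(mean_diff(s)) >= observed - tol)
    return extreme / len(splits)


# Toy fractions of one immune cell type in two clusters (hypothetical values).
cluster1 = [0.30, 0.28, 0.35, 0.32, 0.31]
cluster2 = [0.10, 0.12, 0.09, 0.15, 0.11]
p_value = exact_permutation_p(cluster1, cluster2)
```

With 22 cell types tested, a multiple-testing correction would normally be applied to the resulting p-values as well.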
Functional enrichment analysis of signature genes Fig. 6
GSEA pathway enrichment analysis was conducted for the model genes. In the GO functional analysis, CHMP4B was positively correlated with defense response, extracellular vesicle, and immune response. NLRC4 was negatively correlated with defense response, response to biotic stimulus, and immune response. NLRP2 was negatively correlated with cell-cell adhesion, ion transport, and regulation of leukocyte activation. PYCARD was negatively correlated with cell activation, extracellular exosome, immune response, inflammatory response, and myeloid leukocyte activation (Fig. 6A). In the KEGG analysis, CHMP4B was positively correlated with necroptosis, Epstein-Barr virus infection, and diabetic cardiomyopathy. NLRP2 was positively correlated with metabolic pathways and negatively correlated with hematopoietic cell lineage and cytokine-cytokine receptor interaction. PYCARD was positively correlated with cell adhesion molecules, chemokine signaling pathway, lysosome, metabolic pathways, and phagosome (Fig. 6B).
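The sign of a GSEA result comes from a running-sum enrichment score computed over a ranked gene list. The sketch below shows that score in its weighted form; how genes were ranked for each signature gene is not stated in the text, so the ranking metric, function name, and toy genes are assumptions.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for a gene set over a ranked list.

    ranked_genes and scores must be sorted by descending score. Assumes the
    gene set has at least one hit and at least one miss in the list.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_miss = len(ranked_genes) - sum(hits)
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    running, best = 0.0, 0.0
    for s, h in zip(scores, hits):
        if h:
            running += abs(s) ** weight / hit_norm  # step up at a set member
        else:
            running -= 1.0 / n_miss  # step down at a non-member
        if abs(running) > abs(best):
            best = running  # keep the maximum deviation from zero
    return best


# Toy ranked list: genes "a" and "b" rank at the top, "c" and "d" at the bottom.
genes = ["a", "b", "c", "d"]
metric = [2.0, 1.0, -1.0, -2.0]
es_top = enrichment_score(genes, metric, {"a", "b"})
es_bottom = enrichment_score(genes, metric, {"c", "d"})
```

A positive score means the set's genes concentrate at the top of the ranking and a negative score means they concentrate at the bottom; significance is normally assessed by permutation, which is not shown here.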
Correlation analysis between immunocytes and signature genes Fig. 7
Correlation analysis was performed to explore the relationship between the model genes and 22 immune cell types. The distribution of immune cells did not differ significantly between the high and low AIM2 expression groups (Fig. 7A). In the CHMP4B group, the levels of CD8 T cells and activated dendritic cells were higher (Fig. 7B). The high NLRC4 expression group showed elevated levels of activated mast cells, while the low NLRC4 expression group showed higher levels of resting mast cells (Fig. 7C). The high NLRP2 expression group showed higher levels of M0 macrophages, whereas the low NLRP2 expression group showed higher levels of plasma cells (Fig. 7D). The expression levels of PYCARD and AIM2 did not correlate significantly with immune cell levels (Fig. 7E).
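Comparing immune cell levels between high and low expression groups of a gene requires first splitting the samples. The sketch below assumes a median split, which the text does not confirm, and uses hypothetical sample names and values.

```python
from statistics import mean, median


def split_by_median(expression):
    """Split sample IDs into high and low groups at the median expression.

    Samples exactly at the median are assigned to the low group (an
    arbitrary choice made for this sketch).
    """
    cutoff = median(expression.values())
    high = [sample for sample, value in expression.items() if value > cutoff]
    low = [sample for sample, value in expression.items() if value <= cutoff]
    return high, low


def group_means(fractions, high, low):
    """Mean immune cell fraction in the high and low expression groups."""
    return mean(fractions[s] for s in high), mean(fractions[s] for s in low)


# Hypothetical expression of one signature gene and one immune cell fraction.
gene_expression = {"s1": 1.0, "s2": 2.0, "s3": 3.0, "s4": 4.0}
cell_fraction = {"s1": 0.1, "s2": 0.2, "s3": 0.3, "s4": 0.5}
high_group, low_group = split_by_median(gene_expression)
high_mean, low_mean = group_means(cell_fraction, high_group, low_group)
```

The same split can be reused for other per-sample measurements, such as the expression of individual cytokine genes.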
Differential expression analysis of cytokines for signature genes Fig. 8
Cytokine analysis was conducted to explore the relationship between the model genes and cytokine expression. In the high AIM2 expression group, IL10, CISH, and TLR4 were more highly expressed, while IL4 was more highly expressed in the low AIM2 expression group (Fig. 8A). The low CHMP4B expression group showed higher expression of IL6ST (Fig. 8B). The high NLRC4 expression group showed elevated expression of IL6 and TNF (Fig. 8C). NLRP2 expression was negatively correlated with IL4 expression and positively correlated with SOCS7 expression (Fig. 8D). TGFβ1 and CISH expression were positively correlated with PYCARD expression (Fig. 8E).