Identification of Differentially Expressed Genes (DEGs) in Cryptorchidism Individuals and Control Group. In this study, we utilized data sets from the GEO database, specifically GSE149084, GSE16191, and GSE2551. Differential expression analysis of these data sets revealed the following results: GSE149084 comprised 6 cases of cryptorchidism and 3 healthy controls, with 2375 up-regulated genes and 3965 down-regulated genes. GSE145467 consisted of 16 cases of cryptorchidism and 4 healthy controls, with 2202 up-regulated genes and 3348 down-regulated genes. Furthermore, GSE25518 included 19 cases of cryptorchidism and 4 healthy controls, with 1440 up-regulated genes and 333 down-regulated genes (Fig. 1).
Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Enrichment Analysis. The DEGs (differentially expressed genes) that were up-regulated and down-regulated in GSE149084, GSE16191, and GSE2551 were merged, resulting in the identification of 75 up-regulated differential genes and 57 down-regulated differential genes. Additionally, 1222 genes exhibited up-regulation in at least two datasets, while 317 genes showed down-regulation in at least two datasets. Subsequently, GO and KEGG pathway enrichment analyses were performed on this combined set of 1539 genes (Fig. 2). The GO analysis results unveiled the predominant enrichment of biological processes related to protein-containing complex disassembly, cellular protein complex disassembly, and the development of primary sexual characteristics. In terms of cellular components, enrichment was primarily observed in the mitochondrial matrix, mitochondrial inner membrane, and mitochondrial protein complex. Molecular functions were predominantly associated with isomerase activity and metalloendopeptidase inhibitor activity. Furthermore, the KEGG pathway enrichment analysis highlighted significant pathways linked to neurodegeneration (multiple diseases), amyotrophic lateral sclerosis, prion disease, and more.
Weighted gene co-expression network analysis and module identification. Protein-coding genes were selected from the aforementioned set of 1539 DEGs. Subsequently, the data from GSE16191 were utilized for WGCNA analysis. To determine the appropriate soft threshold power, we employed the scale-free topology criterion and determined a value of 10 (Fig. 3B). The absence of outlier samples was confirmed through hierarchical clustering trees generated using dynamic mixed cutting, which differentiated patients with cryptorchidism from healthy controls (Fig. 3A). The co-expression module specific to cryptorchidism is depicted in Fig. 3C.
Figure 3 Sample clustering and network construction of weighted co-expressed genes based on cryptorchidism A. Clustering dendrogram of samples of cryptorchidism and control group. The color intensity is proportional to the disease state (unilateral or bilateral). B. Analysis of network topology under scale-free fitting index and various soft-threshold powers. Both scale independence and mean connectivity are shown. C. Co-expression module in cryptorchidism
In order to explore the correlation between DEGs and cryptorchidism, we selected protein-coding genes from a pool of 1539 genes. We conducted molecular cluster analysis on the sequence of GSE16191 using the WGCNA algorithm and constructed a co-expression module. We tested different power values to optimize average connectivity and clustered these genes, resulting in the construction of four gene co-expression modules. We then examined the correlation between these modules and clinical data, as illustrated in Fig. 4A. The gene network heat map provides a visual representation of the topological overlap matrix among all the genes analyzed. Darker colors in the map indicate greater overlap, as shown in Fig. 4B. Notably, the purple module, consisting of 1,286 genes, displayed the strongest association with cryptorchidism. To further explore this association, we utilized an adjacency heat map and scatter plots to illustrate the correlation between the module members of the cryptorchidism group and gene significance (correlation coefficient = 0.41, P = 2.9e-53). We also assessed the correlation between the module members of the control group and gene significance (correlation coefficient = 0.19, P = 6.6e-12)(Fig. 4C, Fig. 4D).
Figure 4 Key module in weighted gene co-expression network analysis: A. Heat map showing correlation between differential genes and cryptorchidism. Cell center displays correlation coefficient and p-value; color change reflects strength of correlation. B. Topological overlap matrix of selected protein-coding genes among differential genes. C. Topological overlap matrix of selected protein-coding genes among differential genes. D. Scatter chart depicting disease group and purple module
Functional enrichment analysis of key modules. Functional enrichment analysis was performed on the key modules using the clusterProfiler package, based on the GO and KEGG databases. Figure 5A displays the results of the functional enrichment analysis for the genes in the purple module. The analysis revealed enrichment in biological processes related to mitochondrial translational elongation and termination. Additionally, the cellular components enriched included mitochondrial protein complexes, mitochondrial matrix, mitochondrial ribosomes, organellar ribosomes, and mitochondrial inner membrane. In terms of KEGG analysis(Fig. 5B), the pathways showing the highest degree of enrichment were related to neurodegeneration-multiple diseases, followed by amyotrophic lateral sclerosis, Parkinson's disease, prion disease, and other pathways.
PPI network construction and Hub Gene Identification. We utilized the STRING online database to import the genes present in the purple module for constructing the protein-protein interaction (PPI) network. The resultant PPI network was then visualized using Cytoscape software (Fig. 6). To identify the most central 15 genes within the PPI network, we employed five different algorithms (Degree, MNC, Radiality, Stress, and Closeness) available in the CytoHubba plug-in (Fig. 7). By intersecting the results of these algorithms, we obtained a set of 10 hub genes (Fig. 8A). Subsequent correlation analysis and mapping of these 10 hub genes were performed using the data from cryptorchidism patients in the GSE16191 dataset (Fig. 8C). Moreover, we explored the expression of these hub genes in three GEO datasets (GSE149084, GSE16191, GSE25518) associated with this study. The expression profiles were visualized using a heat map (Fig. 8B).
Gene Set Enrichment Analysis of Hub Gene. Gene Set Enrichment Analysis of Hub Genes was conducted alongside GSEA enrichment analysis using the GSE16191 dataset. Initially, a correlation analysis between the hub genes and all other genes was performed. Subsequently, the 50 genes exhibiting the highest positive correlation with the expression of the hub genes were selected. Correlation heat maps were then generated to visualize the association between them (Fig. 9). These heat maps served as the basis for single-gene GSEA analysis, wherein each hub gene was associated with a gene set exhibiting a similar expression trend (P < 0.05). In a subsequent step, single-gene GSEA analysis was performed for the 10 hub genes and their closely correlated genes (Fig. 10). This analysis was conducted using the ClusterProfiler package in R software. The resulting plots depict cumulative enrichment scores obtained through gene set enrichment analysis. Genes are ranked from small to large on the x-axis, with changes in mountain height reflecting the variation in enrichment score trends within the corresponding pathway.