Identification of DEGs
After processing of the microarray data, 672 DEGs were screened from GSE16191 dataset, including 189 up-regulated genes and 483 down-regulated genes (Figure 1A, B), and 498 DEGs were screened from GSE25518 dataset, including 176 up-regulated genes and 322 down-regulated genes (Figure 1C, D). The DEGs of the two datasets were intersected using the “VennDiagram” package of R language, and the total number of overlapping genes was 438, of which 134 were co-upregulated and 304 were co-downregulated (Figure 1E).
Functional enrichment analysis of common DEGs
The common DEGs were analyzed for GO enrichment in three aspects: biological process (BP), cellular component (CC), and molecular function (MF). The results showed that the common DEGs were mainly enriched in the insulin-like growth factor receptor signaling pathway (ontology: BP), PcG protein complex (ontology: CC), and insulin receptor binding (ontology: MF), as shown in Figure 2 (A, B, C). KEGG pathway analysis revealed that the common DEGs were mainly enriched in axon guidance, spliceosome, nicotinate and nicotinamide metabolism, and mTOR signaling pathway (Figure 2D). Tables 1 and 2 show the enrichment results ranked in the top 10 by P-value.
PPI network construction and module analysis
Based on the STRING database, the PPI network of the common DEGs was mapped, consisting of 299 nodes and 600 edges (Figure 3A), from which 11 functional modules were screened, and the top two scores are shown in Figure 3(B, D). Further KEGG pathway analysis showed that genes in module 1 were mainly enriched in spliceosome and ferroptosis, while genes in module 2 were mainly enriched in IL-17 signaling pathway, progesterone-mediated oocyte maturation, cell cycle, oocyte meiosis, and ubiquitin mediated proteolysis (Figure 3C, E).
Identification and analysis of hub genes
Four algorithms (MCC, Degree, EPC, and MNC) were used in the PPI network to screen the top 20 ranked hub genes (Table 3), and then the Venn diagram of the hub genes obtained by the four algorithms was drawn, which led to nine common genes HNRNPM, SF1, U2SURP, SNRPA1, AQR, RBM39, PCBP2, RBM5, and HNRNPU (Figure 4A). KEGG enrichment analysis showed that the nine hub genes were mainly enriched in the spliceosome (Figure 4B), and the details are shown in Table 4. To verify the reliability of the expression levels of the hub genes, we put the hub genes into the original datasets for validation, and the results showed that the expression levels of hub genes in the two datasets were relatively consistent (Figure 4C, D).
To further explore the interactions and functions between genes, we constructed a co-expression network of hub genes using the GeneMANIA database, as shown in Figure 4E. These hub genes presented a complex PPI network with physical interactions of 41.98%, co-expression of 33.46%, pathway of 18.88%, co-localization of 2.99%, and shared protein domains of 2.57%, with most gene functions were closely related to the spliceosome. In addition, an analysis of the miRNA-mRNA regulatory network was performed by the OmicsBean database (Figure 4F). The results showed that hsa-miR-495 regulated the largest number of target genes and could be regarded as the key miRNA in the network; RBM5 was regulated by the largest number of miRNAs, which was considered as the key gene in the network.
Expression analysis of hub genes in TGCT
Given the hypothesis that the hub genes in the pathogenesis of cryptorchidism may be associated with TGCT, we verified the expression of hub genes in TGCT using the GEPIA database (Figure 5A). It was found that SF1, RBM5, HNRNPM, and AQR genes were significantly expressed in TGCT, with SF1 and RBM5 significantly down-regulated in tumor tissues and HNRNPM and AQR significantly up-regulated in tumor tissues. Figure 5B shows the expression levels of four hub genes in multiple tumor types, with HNRNPM and AQR being the most prominent in TGCT. Subsequently, the Spearman rank correlation test was used to perform correlation analysis on SF1, RBM5, HNRNPM, and AQR, with P-value less than 0.05 being statistically significant (Figure 5C, D). Correlation analysis indicated that most of the four genes were significantly correlated, and the Spearman correlation coefficient between the genes SF1 and HNRNPM was the highest (0.68). GO and KEGG enrichment analysis of these four genes (SF1, RBM5, HNRNPM, and AQR) showed that GO analysis was mainly enriched in spliceosomal complex, mRNA binding, mRNA splicing, and RNA binding (Figure 5E); KEGG analysis was mainly enriched in Spliceosome (Figure 5F). The enrichment results are detailed in Table 5 and Table 6.
The protein expression levels of SF1, RBM5, HNRNPM and AQR
To further verify the expression of SF1, RBM5, HNRNPM and AQR in TGCT, we reviewed the protein expression levels of these four genes in normal testis tissues and TGCT tissues using the HPA database (Figure 6A). According to the results of immunohistochemical (IHC) staining, SF1 and RBM5 were down-regulated in tumor tissues, while HNRNPM was up-regulated. The relevant IHC results of AQR were not included in the HPA database, so the relevant images are not shown. Meanwhile, we explored the localization information of the four genes in human cells at the subcellular level. Figure 6B shows the immunofluorescence (IF) profiles of the four genes, and all four genes were found to be localized in the nucleoplasm.
The association of SF1, RBM5, HNRNPM and AQR expression with immune cell infiltration
Based on the gene expression data of TGCT patients, we utilized the ssGSEA algorithm to explore the characteristics between the infiltration levels of 24 immune cells and the expression levels of four genes, as shown in Figure 7. Correlation coefficients with absolute values greater than 0.3 were considered as highly correlated. The results showed that both genes SF1 and HNRNPM were mainly negatively correlated with immune cell infiltration (Figure 7A, C). The gene RBM5 had both positive and negative correlations with 24 immune cells, among which 3 immune cells were highly positively correlated, including T helper cells, Tem, and Tcm; 6 immune cells were highly negatively correlated, including DCs, neutrophils, macrophages, Th1 cells, iDCs, and NK CD56+ cells (Figure 7B). The gene AQR was also positively and negatively correlated with 24 immune cells, among which 2 immune cells were highly positively correlated, including T helper cells and Tcm, and 6 immune cells were highly negatively correlated, including Tgd, DCs, pDCs, mast cells, macrophages, iDCs, NK CD56+ cells, and NK cells (Figure 7D).
Clinical significance of genes SF1, RBM5, HNRNPM and AQR in TGCT
First, the clinical correlations between the four genes and TGCT were analyzed using RNAseq data and clinical data of TGCT (Figure 8A). The results indicated that at the pathologic T stage, the genes SF1, HNRNPM, and AQR had no significant clinical correlation with TGCT, while the expression level of gene RBM5 at the T1 stage was significantly higher than that in T2 and T3 stages in TGCT. Secondly, we further validated the prognostic value of these four genes in terms of both OS and RFS in the Kaplan-Meier plotter. Survival analysis demonstrated that all four genes had no significance in OS (Figure 8B), while high expression of the genes SF1 and HNRNPM predicted a shorter RFS in TGCT patients (Figure 8C). The ROC curve analysis was conducted on the four genes (Figure 8D), and the results showed that all four genes had good diagnostic values (AUC>0.7), especially SF1 and RBM5 (AUC>0.9).