The DEGs among six GEO microarray datasets
The distribution of gene expression from each microarray dataset was presented in volcano plots (Figure 1A-F), and the top 100 significantly up- and down-regulated genes were displayed in the heatmaps (Figure 2A-F). Through RRA analysis of 6 expression microarrays, we identified 605 DEGs, which consist of 301 up-regulated and 304 down-regulated genes, and displayed the top 20 dysregulated genes by "pheatmap" R package in Figure 3. Next, we analyzed the expression profiles of TCGA and GETx, getting 2253 dysregulated genes. Intriguingly, when combined these 2253 DEGs with the 605 differentially expressed genes from GEO datasets, we found that 244 genes were commonly dysregulated in these two databases, with 164 up-regulated (Figure 4A) and 80 down-regulated genes (Figure 4B).
GO term and KEGG pathway enrichment analysis of DEGs
To study the potential biological function of the 244 DEGs, we performed biological pathway analysis and identified significantly enriched pathways via Enrichr web tools. In GO term (Figure 5A), for the BP group, the DEGs were mostly enriched in "regulation of attachment of spindle microtubes to kinetochore", "cellular response to laminar fluid shear stress" and "microtubule cytoskeleton organization involved in mitosis". In MF group, the dysregulated genes were highly correlated to "microtubule-binding", "microtubule motor activity" and "tubulin-binding". As for CC group, the DEGs were closely related to "condensed nuclear chromosome kinetochore" and "mitotic spindle". KEGG pathway analysis showed 244 DEGs highly enriched in "cell cycle" and "Alanine, aspartate, and glutamate metabolism" (Figure 5B).
PPI network construction and modules analysis
Using the STRING database and Cytoscape software, a totally 244 DEGs were mapped into the PPI network, which included 238 nodes and 1284 edges (Figure 6A). The PPI enrichment p-value were < 1.0e-16. The top three key modules (Figure 6b-d) within PPI network were then selected (Module 1, MCODE score= 43.391; Module 2, MCODE score= 4.8; Module 3, MCODE score= 3.667) and the biological function of Module 1, which consisted of 47 nodes and 998 edges, was further analyzed. GO analysis indicated that Module1 was mainly associated with "regulation of attachment of spindle microtubules to kinetochore", "condensed nuclear chromosome kinetochore" and "microtubule motor activity". KEGG analysis showed that "cell cycle" and "Oocyte meiosis" were the most highly enriched pathways (Figure S1).
The screen of Hub genes and their characteristics
The top ten hub genes with the highest degree of connectivity were CDC45, CDK1, TOP2A, CDC20, CCNB1, CEP55, UBE2C, HMMR, FOXM1, and TPX2 (Figure 6E). The co-expression analysis results of the hub genes demonstrated that these genes actively interacted with each other (Supplementary 2B). Besides, we established the interaction network of ten hub genes with their related genes and explored the biological role (Figure S2A, C-E) of the network by FunRich. The gene alteration type and frequency, as well as the 50 most frequently altered neighbor genes were also exhibited (Figure 7). Gene alteration frequency of 10 hub genes among 606 TCGA ovarian cancer samples was beyond 50%, with most genes showed amplified and multiple altered (Figure 7A-B). The top 3 most frequent altered genes were FOXM1, CDC20 and CCNB1, with FOXM1 and CDC20 largely amplified while CCNB1 deep deleted. Through analysis of ovarian cancer patients’ gene-set from TCGA, we found that CCNB1, UBE2C, CDK1, CEP55 as well as FOXM1 expressed significantly higher in high-grade tumors and predicted worse outcomes since patients overexpressed above genes owned lower overall survival (OS) and disease-free survival (DFS) rates (Figure 8). The Oncomine database showed results from various studies were consistent to our finding (Figure S3). The HPA website also demonstrate that proteins translated by such 5 hub genes were overexpressed in ovarian cancer tissues (Figure S4). HMMR and TPX2 were also negatively correlated to patients' prognosis while no expression difference was observed in diverse tumor grades and CDC20 was positively associated with tumor grade but not correlated to patients' outcomes.
Related small molecule drugs screening
In total, 244 DEGs were analyzed in CMap to screen small molecule drugs, and the candidate molecules with top ten connectivity score are listed in Table 1. Five of these molecules showed a negative correlation and suggested potential in clinical applications. Among them, Trichostatin A, pyrvinium and 8-azaguanine showed a significantly negative correlation and the stuctures of such candidate molecule drugs was found in the PubChem database and shown in Figure S5.
Construction of prognostic model and evaluation of its predictive ability
Univariate Cox regression analysis revealed that 13 of 240 DEGs were significantly correlated to patients' OS in the training cohort. (Table 2) The 13 OS-related genes were listed as follows: CCND1, SYNE4, CCDC80, TMC4, MCC, FOXQ1, KRTCAP3, CXCR4, IL4I1, DEFB1, CSGALNACT1, KLHL14 and MCUR1. Through LASSO Cox regression, we narrowed the number of 13 prognosis-associated genes into twelve according to the minimum criteria (Figure 9). Next, based on the multivariate Cox model, 8 of 12 candidate mRNAs retained their prognostic significance and were finally selected as independent remarkable prognostic factors, which are TMC4, KLHL14, CXCR4, CCDC80, KRTCAP3, DEFB1, SYNE4 and FOXQ1 (Table 3). To predict patient’s outcomes, we developed an individual’s risk score model as follows: risk score = (0.006809 × expression value of TMC4)+ (0.021258× expression value of KLHL14)+(-0.00839× expression value of CXCR4) + (0.031459 × expression value of CCDC80)+(-0.00903 × expression value of KRTCAP3) + (-0.00156 × expression value of DEFB1) + (0.070689 × expression value of SYNE4) + (0.006726 × expression value of FOXQ1). On the basis of the median risk score, patients were divided into high-risk or low-risk groups. Kaplan-Meier curve analysis showed that the overall survival time of the lower-risk group was significantly longer than the high-risk group (P =1.147e-07) (Figure 10E). ROC curve analysis revealed the area under the ROC curve (AUC) of the prognostic model was 0.815 (Figure 10D). Meanwhile, the risk scores (Figure 10A) of ovarian cancer patients in the training group were ranked for displaying their distribution and the survival status (Figure 10B) was marked on the dot plot. The expression pattern of 8 prognostic mRNAs between high and low-risk groups was also shown in the heatmap (Figure 10C). Univariate and multivariate Cox regression analysis concerning the risk score and clinical factors showed that the prognostic model was able to serve as an independent prognostic indicator (Figure 11A-B). ROC curve analysis also showed that the AUC value of the model was 0.820, much significantly higher than the tumor stage (AUC= 0.542), grade (AUC= 0.574) and patients’ age (AUC= 0.701) (Figure 11C). Interestingly, when combined the risk score with clinical factors, the ROC curve of combination model was much higher than each alone(Figure 11D).
As for the testing cohort, we divided the group into 78 high-risk and 108 low-risk individuals based on the training cohort' cut-off risk score. The outcomes of low- and high-risk group patients of the testing cohort were also measured and the overall survival time of the high-risk group was significantly shorter than the lower-risk group (P =1.721e-02) (Figure 12E). The area under the ROC curve (AUC) of the prognostic model was 0.641 (Figure 12D). The risk scores distribution (Figure 12A) and survival status (Figure 12B) of ovarian cancer patients, as well as the 8-prognostic gene expression heatmap (Figure 12C) in the testing group were also present. Meanwhile, the independency of the prognostic model was confirmed in testing cohort since univariate and multivariate cox regression analysis showed the model correctly predicted high- or low-risk factor group patients' outcomes without relying on any clinical factors (Figure 13A-B). ROC curve analysis showed that the prognostic model exhibited better sensitivity and specificity when compared to tumor stage, grade and patients’ age for the AUC value of the model was much higher than latter (Figure 13C). In accordance with results from training cohort, the combination of risk score and clinical factors showed better OS prediction capability (Figure 13D).
The Prognostic Signature correlating to immune cells infiltration
Through TIMER web-tool, we analyzed the relative gene expression levels of six types of immune cells for each patient and found that genes concerning macrophage fraction were expressing significantly higher in the high-risk group (P<0.05) compared to the low-risk group in training cohort (Figure 14). Interestingly, same result was also observed in the testing cohort (Figure 15).