Screening for DEGs
The study was constructed as shown in the flow chart in Figure 1. To identify prognostic genes that play a role in CC pathogenesis, we used CC tissues and normal cervical tissues (Table S1). We stabilized the error rate estimates and improved the reproducibility of the gene expression matrix using surrogate variables for removing batch effects (Fig. S1). PCA showed that the two groups of samples were obviously clustered (Fig. 2A). A total of 207 DEGs between the two groups were observed, with 106 being up regulated and 101 being down regulated (Fig. 2B, Table S2).
Functional and pathway enrichment analyses of DEGs
GO and KEGG analyses were performed using the DAVID database for annotation. Regarding GO terms, the DEGs were primarily enriched in extracellular exosome (58 proteins), serine-type endopeptidase activity (16 proteins), and peptide cross-linking (11 proteins) (Fig. 2C). KEGG analysis revealed that the DEGs were enriched in several pathways such as cytokine–cytokine receptor interaction (10 proteins), chemokine signaling (8 proteins), and tumor necrosis factor (TNF) signaling (6 proteins), as well as transcriptional dysregulation in various cancers such as bladder cancer (4 proteins) (Fig. 2D).
Identification of hub DEGs in CC
Although enrichment analyses reveal the biological processes and pathways related to DEGs, they do not provide information about interactions among the DEGs. Thus, we examined the interactions among the proteins using STRING, and visualized the PPI network using Cytoscape software. To construct the PPI network, 112 significantly enriched DEGs were submitted to STRING (Fig. 3A), and the PPI network was subsequently imported into Cytoscape to construct the sub-networks. Using the MCODE plug-in in Cytoscape, we analyzed the top three sub-modules (MCODE scores ≥ 10) of proteins to identify the hub DEGs. There were 40 hub DEGs in these modules, with those in Modules 1 and 2 primarily being up-regulated DEGs, and those in Module 3 primarily being down-regulated DEGs (Fig. 3B-D). The Oncomine co-expression analysis showed that the mRNA expression levels of 22 of the candidate hub DEGs were consistent with our initial analyses (Fig. 4A and Fig. S2).
To examine the hub DEGs in greater detail, TCGA CC samples (n = 304) were used for survival analyses of the individual hub DEGs (with a cutoff value of p < 0.05). The expression of eight hub DEGs (CXCL1, CXCL8, DSG2, MMP1, SPP1, MCM2, Lymphoid-specific helicase [HELLS], and Vascular cell adhesion molecule 1 [VCAM1]) was significantly associated with OS among CC patients. Interestingly, MCM2, HELLS, and VCAM1 up-regulation played protective roles (Fig. 4B and Fig. S3).
Cox proportional hazards model and risk score
The TCGA-CC samples (n = 304) were randomly divided into a training group (n = 152) and a verification group (n = 152). Among the 304 patients, 223 (73.4%) and 87 (28.3%) had complete follow-up data of clinical features for at least 1- and 3-years, respectively, but only 40 (13.2%) had detailed follow-up data for ≥ 5 years. There were no significant differences in age, race, tumor grade, FIGO stage, or LNM status between the training and verification groups (Table 1). Next, we assessed the significance of the eight above mentioned hub DEGs in a Cox proportional hazards model and consequently developed a novel four-gene prognostic signature. This signature allowed us to determine the high- and low-risk patients, as follows:
Riskscore = (0.58 * expression value of DSG2) + (0.27 * expression value of MMP1) +
(0.33 * expression value of SPP1) + (-0.48 * expression value of MCM2)
With validation using the verification and combination groups, we built the best fitting Cox proportional hazards model using a combination of four high-power prognostic genes (DSG2, MMP1, SPP1, and MCM2) (Fig. 5A). The ROC curves showed that this four-gene signature achieved an AUC value of 0.785 (95% CI: 0.670–0.879), 0.609 (95% CI: 0.507–0.711), and 0.686 (95% CI: 0.612–0.761) for the training, verification, and combination groups, respectively (Fig. 5B). These outcomes suggest that this four-gene signature demonstrates good performance regarding predicting OS among CC patients (Fig. 5C).
OS prediction and evaluation
To further evaluate whether the four-gene prognostic signature can serve as a prognostic factor, we performed univariate and multivariate Cox regression analyses comparing high- and low-risk CC patients. Covariates besides the risk score included clinical risk factors such as age, tumor grade, FIGO stage, and LNM status (Fig. S4). The univariate Cox regression analysis showed that risk score (hazard ratio [HR]: 3.186; 95% CI: 1.513–6.711; p = 0.003) and LNM status (HR: 2.886; 95% CI: 1.435–5.803; p = 0.003) were risk factors, while the multivariate Cox regression analysis confirmed that the risk score (HR: 2.743; 95% CI: 1.285–5.856; p = 0.009) and LNM status (HR: 2.660; 95% CI: 1.290–5.489; p = 0.008) were both independent risk factors (Table 2).
The risk score was then compared between the pairs of subgroups stratified by clinical features to explore whether it was significantly different between the various subgroups. It was only significantly different between LNM-negative and LNM-positive patients, being higher in the latter (1.056±0.053 vs 1.341±0.138, p = 0.019) (Fig. 6).
We also constructed a nomogram to predict 1- and 3-year OS for CC patients using the four-gene signature and LNM status (Fig. 7A). The calibration plots showed good agreement between the predicted and actual probabilities regarding 1- and 3-year but not 5-year OS based on the TCGA-CC cohort (Fig. 7B). The resulting AUC values regarding 1- and 3-year OS were 0.746 (95% CI: 0.635–0.857, n = 142) and 0.748 (95% CI: 0.551–0.944, n = 56), respectively, and the prognostic accuracy values were 69.01% and 83.93%, respectively (Fig. 7C).