The flowchart of this study is shown in Fig. 1. We divided the 367 patients from the TCGA dataset into two groups according to stage. A prognostic model was established from the EMT-related genes, and Cox regression analysis indicated that the risk model was an independent prognostic indicator of OC.
Gene Set Enrichment Analysis by GSEA
Of the 24,991 genes detected, 634 were upregulated and 543 were downregulated (Fig. 2A). The 367 specimens from TCGA were classified into two groups (OC patients in stage I-II and OC patients in stage III-IV). FDR < 0.05 and gene set size ≥ 100 were used as the cut-off criteria. GSEA revealed that 24 of the hallmark gene sets were enriched, for example, epithelial mesenchymal transition (EMT) (p < 0.001), oxidative phosphorylation (p < 0.001), adipogenesis (p < 0.001), myogenesis (p < 0.001), coagulation (p = 0.003), apoptosis (p = 0.037), and fatty acid metabolism (p = 0.038), which differed significantly between OC patients in stage I-II and OC patients in stage III-IV (Fig. 2B, Table 2). The top-ranking gene set, epithelial mesenchymal transition (p < 0.001), which comprises 197 mRNAs, was selected for further study.
Table 2
Gene sets enriched in ovarian cancer (367 samples).
Gene set | SIZE | ES | NOM P-value | Rank at MAX |
EPITHELIAL MESENCHYMAL TRANSITION | 197 | 0.6 | 0 | 5691 |
OXIDATIVE PHOSPHORYLATION | 183 | 0.53 | 0 | 7891 |
ADIPOGENESIS | 190 | 0.41 | 0 | 10392 |
MYOGENESIS | 199 | 0.39 | 0 | 10785 |
COAGULATION | 136 | 0.39 | 0.003 | 10967 |
APOPTOSIS | 159 | 0.32 | 0.037 | 5216 |
FATTY ACID METABOLISM | 157 | 0.32 | 0.038 | 10137 |
Identification of a Six-mRNA Prognostic Signature Predicts Survival of OC Patients
Eighty-three genes were selected because their expression profile was enriched in stage III–IV OC. A heatmap of the 83 common differentially expressed genes in the good and poor prognosis groups is shown in Additional file 1: Fig. S1. Univariate Cox regression analysis then identified 19 prognosis-associated mRNAs, and six of these (TGFBI, SFRP1, COL16A1, THY1, PPIB, BGN) were selected for multivariate Cox regression analysis to construct a risk assessment model. We classified these mRNAs as four risky (HR > 1) and two protective (0 < HR < 1) mRNAs (Table 3).
Table 3
Detailed information on the six prognostic mRNAs significantly associated with overall survival in patients with ovarian cancer.
mRNA | Ensembl ID | Location | HR | β (Cox) | P |
TGFBI | ENSG00000120708 | Chr5: 136,028,988 − 136,063,818 | 1.1324 | 0.1243 | 0.0175 |
SFRP1 | ENSG00000104332 | Chr8: 41,261,962 − 41,309,473 | 1.0707 | 0.0683 | 0.0217 |
COL16A1 | ENSG00000084636 | Chr1: 31,652,263 − 31,704,319 | 1.1528 | 0.1422 | 0.0249 |
THY1 | ENSG00000154096 | Chr11: 119,417,378 − 119,424,985 | 1.0853 | 0.0819 | 0.0504 |
PPIB | ENSG00000166794 | Chr15: 64,155,812 − 64,163,205 | 0.7530 | -0.2837 | 0.0069 |
BGN | ENSG00000182492 | ChrX: 153,494,980 − 153,509,546 | 0.8216 | -0.1965 | 0.0393 |
A prognostic model was developed to predict prognosis from the expression levels and regression coefficients of the six genes. Risk score = 0.1243 × expression of TGFBI + 0.0683 × expression of SFRP1 + 0.1422 × expression of COL16A1 + 0.0819 × expression of THY1 − 0.2837 × expression of PPIB − 0.1965 × expression of BGN. Each patient was assigned a risk score, the median risk score was used as the cut-off, and the patients were divided into low-risk and high-risk groups (Fig. 3A). The distribution of the patient relapse status is also shown in Fig. 3A. Mortality increased with increasing risk score among these patients. A heat map (Fig. 3B) was used to display the expression profiles of the six mRNAs. The expression of the risky mRNAs (TGFBI, SFRP1, COL16A1, THY1) was clearly upregulated as the risk score of OC patients increased, whereas the expression of the protective mRNAs (PPIB, BGN) was downregulated.
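The risk score and median split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the coefficients and hazard ratios are copied from Table 3, but the function names, the toy cohort and its expression values are hypothetical, the expression scale is not specified here, and the handling of scores exactly equal to the median (labelled low-risk below) is an assumption rather than something the text states.

```python
import math
from statistics import median

# Cox coefficients copied from the risk score formula (Table 3).
BETA = {
    "TGFBI": 0.1243,
    "SFRP1": 0.0683,
    "COL16A1": 0.1422,
    "THY1": 0.0819,
    "PPIB": -0.2837,
    "BGN": -0.1965,
}

# Hazard ratios copied from Table 3; each coefficient equals ln(HR) up to rounding.
HAZARD_RATIO = {
    "TGFBI": 1.1324,
    "SFRP1": 1.0707,
    "COL16A1": 1.1528,
    "THY1": 1.0853,
    "PPIB": 0.7530,
    "BGN": 0.8216,
}


def risk_score(expression):
    """Sum of the six expression values weighted by their Cox coefficients."""
    return sum(BETA[gene] * expression[gene] for gene in BETA)


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median risk score.

    Assumption: a score exactly equal to the median is labelled 'low'.
    """
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Hypothetical toy cohort; the expression values are invented for illustration.
baseline = {gene: 1.0 for gene in BETA}
cohort = {
    "P1": baseline,
    "P2": {**baseline, "TGFBI": 5.0},
    "P3": {**baseline, "PPIB": 4.0},
    "P4": {**baseline, "COL16A1": 3.0},
}

scores = {pid: risk_score(expr) for pid, expr in cohort.items()}
groups = split_by_median(scores)
print(groups)
```

In this toy cohort, raising a risky gene (TGFBI or COL16A1) moves a patient into the high-risk group, while raising a protective gene (PPIB) lowers the score, which mirrors the sign of each coefficient.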
The cBioPortal database was used to detect alterations in the six selected genes in the 367 patients. TGFBI was altered in 0.9% of cases, with 2 amplifications, 2 deep deletions and 1 missense mutation (unknown significance). SFRP1, COL16A1, THY1, PPIB and BGN were altered in 4%, 3%, 1%, 1.2% and 4% of cases, respectively (Fig. 3C). Similarly, the alteration types in the selected genes varied across OC pathologies (Fig. 3D).
Validation of the Six mRNAs for Survival Prediction by Kaplan-Meier Curves and the Expression Profile of the Six mRNAs
The Kaplan-Meier curves were used to validate the prognostic value of each mRNA for OC. The results showed that, besides SFRP1, the risk score discriminated significantly between the high- and low-risk groups (p < 0.0001). Patients in the high-risk group had a lower survival rate (Fig. 4A). The original expression data of the six mRNAs, without log transformation, in the high- and low-risk groups are displayed in Fig. 4B. Furthermore, the expression levels of the six selected genes were compared between normal ovarian tissues and OC tissues (Fig. 4C). In addition, the expression level of each mRNA was compared between ovarian cancer tissues and normal tissues, which indicated that risky genes such as TGFBI (p = 0.0002), SFRP1 (p = 0.0344), COL16A1 (p = 0.0449) and THY1 (p < 0.0001) were overexpressed in ovarian cancer tissues relative to normal tissues, whereas protective genes such as PPIB (p = 0.0128) and BGN (p = 0.0047) were underexpressed (Fig. 4D).
EMT Process was Validated to be Associated with the High-Risk Group of OC via GSVA
Using GSVA to compare the high- and low-risk score groups, we obtained a heatmap displaying various biological processes associated with OC, among which EMT was included (Fig. 5A). The prognosis-related pathways were evaluated by KEGG and GO enrichment analyses, which showed that the prognosis-related genes were enriched in ligand receptor activity, among other terms. Figure 5B and C display the GO and KEGG pathway enrichment plots, respectively, for OC.
The Risk Score was Identified as an Independent Prognostic Indicator in OC
Univariate and multivariate analyses were carried out to compare the risk score with other common clinicopathological parameters (Table 4). The results indicated that risk score, stage and cancer status were independent prognostic indicators, since their p values were < 0.05 in both the univariate and the multivariate analyses. Importantly, the risk score was clearly related to mortality: patients in the high-risk score group had a 1.408-fold higher hazard of death than those in the low-risk score group (univariate HR, Table 4).
Table 4
Univariable and multivariable analyses for each clinical feature.
Clinical feature | Number | Univariate HR | Univariate 95% CI of HR | Univariate P value | Multivariate HR | Multivariate 95% CI of HR | Multivariate P value |
Riskscore | 367 | 1.408 | 1.02–1.944 | 0.038 | 1.545 | 1.098–2.173 | 0.012 |
Stage | 367 | 0.449 | 0.27–0.747 | 0.002 | 0.475 | 0.265–0.849 | 0.012 |
Cancer Status | 318 | 0.431 | 0.309–0.603 | 0 | 0.421 | 0.300–0.594 | 0 |
Grade | 366 | 1.227 | 0.749–2.009 | 0.417 | | | |
age | 367 | 0.871 | 0.638–1.189 | 0.385 | | | |
venous invasion | 102 | 1.026 | 0.611–1.722 | 0.924 | | | |
lymphatic invasion | 147 | 1.086 | 0.686–1.72 | 0.725 | | | |
Validation of the Six-mRNA Signature for the Survival Prediction by Kaplan-Meier Curves
The Kaplan-Meier curves and the log-rank test were used to validate the prognostic value of the clinical parameters (risk score, stage, age, cancer status, grade, venous invasion and lymphatic invasion) for OC. The results indicated that patients with high risk scores had poor prognoses. Patients with tumors and patients in stage III-IV also had a high risk of poor prognosis, which further supported our analysis (Fig. 6A). The time-dependent ROC curve of each parameter showed the sensitivity and specificity of the 5-year OS prediction (Fig. 6B), and the AUC verified the accuracy of our prognostic model. Furthermore, we performed a stratification analysis on the entire cohort, in which the 367 patients were stratified by their clinical parameters. Consistent with the significant results above, patients with tumors and patients with stage III cancers of grade 3 were more likely to have a shorter survival time (Fig. 6C).
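For readers unfamiliar with the Kaplan-Meier method used here, the product-limit estimator can be sketched in plain Python. This is a minimal illustration, not the software used in the study: the function name and the toy follow-up data are hypothetical, and the log-rank comparison between groups is not included.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time of each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set.
        n_at_risk -= leaving
    return curve


# Hypothetical toy data: five patients, two of them censored.
toy_times = [2, 3, 3, 5, 8]
toy_events = [1, 1, 0, 1, 0]
toy_curve = kaplan_meier(toy_times, toy_events)
for t, s in toy_curve:
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk until their follow-up ends but never trigger a drop in the curve, which is why the estimate only steps down at observed death times.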
The Kaplan-Meier curves were also used to validate the prognostic value of the risk score in an independent GEO cohort (GSE9891). Although there are more than a dozen GEO datasets for ovarian cancer, we chose GSE9891 because it covered the six genes together with matched survival data. Patients with high risk scores had poor prognoses (p < 0.0001; Fig. 6D), which was consistent with the result obtained above. As described above, each patient was assigned a risk score, the median risk score was used as the cut-off, and the patients in the GEO cohort were likewise divided into low-risk and high-risk groups. The distribution of the patient relapse status is shown in Fig. 6E. The Kaplan-Meier curves were also used to test the risk score in colon cancer (Fig. 6F) and hepatocellular cancer (Fig. 6G). There, the risk score did not discriminate significantly between the high- and low-risk groups (p = 0.1158 and p = 0.3675), which indicated the specificity of the risk score model for ovarian cancer.
PPI networks obtained from the STRING database and visualized with Cytoscape software helped us to identify the hub genes among the prognosis-related genes. The core genes (degrees ≥ 15) (Additional file 2: Fig. S2B) were further submitted to PPI network analysis, which indicated that TGFBI, SFRP1, and COL16A1 were clearly at the center of the network (Additional file 2: Fig. S2A).
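The degree threshold used to pick core genes from the PPI network can be illustrated with a short Python sketch. It assumes the network is available as a list of undirected edges, for example exported from STRING; the function names, node labels and toy edges below are hypothetical placeholders and do not reproduce the actual network.

```python
from collections import Counter


def node_degrees(edges):
    """Count distinct interaction partners per node in an undirected network."""
    # Drop self-loops and treat (a, b) and (b, a) as the same edge.
    unique_edges = {frozenset(edge) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for edge in unique_edges:
        for node in edge:
            degree[node] += 1
    return degree


def hub_genes(edges, min_degree=15):
    """Return the nodes whose degree reaches the threshold, sorted by name."""
    degree = node_degrees(edges)
    return sorted(node for node, d in degree.items() if d >= min_degree)


# Hypothetical toy network with placeholder node names.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "A")]
print(hub_genes(toy_edges, min_degree=3))
```

The default threshold of 15 matches the cut-off quoted in the text; the toy call uses a lower value only because the example network is tiny.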