Clinical characteristics of the study patients
The TFs mRNA expression profiles of 564 SOC subjects in the TCGA database and 70 SOC subjects in the GEO database were screened. All patients from the TCGA database were randomly separated into training (n=282) and test (n=282) cohorts and there was no difference between two cohorts in age (p=0.4104), FIGO stage (p=0.6542), Grade (p=0.6890) and living status (p=0.6393) (Additional file 2: Table S2). Patients from the GEO database were used as the external validation cohort. The workflow indicated in Fig. 1 shows the process of the study.
Construction and validation of TFs-related risk signature for SOC
Univariate cox regression analysis was performed to construct a TFs-related risk signature of SOC; 88 TFs were selected (p<0.05) (Additional file 3: Table S3) and employed into the lasso cox regression model (Fig. 2a-b). Finally, a TFs-related risk signature comprising 17 different TFs (CBX5, CREB3, FOXJ1, FOXK2, IRF4, LHX2, RB1, SPDEF, STAT2, TBX2, TEAD1, TFAM, TRIM38, ZHX3, ZNF124, ZNF8 and ZXDB) was constructed in the training group (Fig. 2c). The risk score of each patient was calculated according to their 17-TFs expression. Patients were divided into high-risk and low-risk groups, based on the median risk value as a cutoff point (cutoff =-2.047), to investigate the association between the TFs-related risk signature and prognosis of SOC patients (Fig. 2d). The results showed that high-risk TFs-related risk signature was associated with poor overall survival (OS) in the training cohort (Fig. 2d-e). The prediction feature was assessed by the ROC curve and the AUC value of 0.803 (Fig. 2f). Similarly, patients with high-risk scores had a poor OS in the test cohort, sum cohort, and external validation cohort (p<0.0001, p<0.0001, p=0.031, respectively, Fig. 3). All these findings indicate that there was a significant association between the TFs-related risk signature and SOC patients’ prognosis.
The association between the risk signature and various clinicopathologic features including grade, FIGO stage, and residual tumor size was analyzed. The results showed that there was no difference in risk score among different grades (p>0.05, Additional file 4: Figure S1). Patients with FIGO Ⅳ and Ⅲ had higher risk scores compared with patients with FIGO Ⅰ and Ⅱ, (p<0.01, Additional file 4: Figure S1). There was also a significant association between residual tumor size and risk score (p<0.001, Additional file 4: Figure S1). A stratified analysis based on age, grade, FIGO grade, and residual tumor size was constructed. The high-risk group had shorter OS than the low-risk group in all these stratified cohorts, indicating that the TFs-related risk signature could be used to accurately distinguish the prognosis of SOC patients, without considering the clinical parameters (p<0.01, Fig. 4).
Association between the TFs-related risk signature and genetic alterations
Genetic alterations including somatic mutations and SCNAs were identified in the lowest risk (1st) and highest risk (4th) groups. There was no difference between the two groups in the mutation frequency of TP53, TIN, MUC16, DNH3, CSMD3, FAT3, FLG, and APOB (Fig. 5a-b). Patients in the lowest risk group had a higher probability of somatic mutation in the DNAH3 gene, while a higher mutation frequency of KMT2C and USH2A occurred in the highest risk group (Fig. 5a-b). We also analyzed the SCNAs between both groups (Fig. 5c-d). The G values between the highest-risk and lowest risk groups were significantly different, although the amplification (including 19q.12, 8q24.21, 3q36.2 and 11q14.1) and focal peaks (including 19p13.3, 22q13.32, 5q12.1, and 17q11.2) were detected. Additionally, several regions of amplifications harboring multiple oncogenes, such as 8q24.21 (PTV1, MYC)19q13.2 (NFKBIB), 19p13.12 (NOTCH3) and 15q26.3 (IGF1R) were only detected in the highest risk group. Moreover, focal deletion peaks were detected in the group with high-risk scores such as 9p21.3 (CDKN2A, CDKN2B) and 10q23.31 (FAS) (Fig. 5c-d).
Consensus clustering analysis to construct the cluster model
All SOC samples from the TCGA database were divided into different clusters using consensus clustering analysis; two, was found to be the optimal number of clusters (Additional file 5: Figure S2). The principal component analysis (PCA) explored the significant difference in mRNA expression of the 17 TFs between the two clusters (TF1 vs TF2) (Fig. 6a). There was a significant difference in OS between the two clusters (p=0.00023, Fig. 6b). The TF1 cluster was related to a lower risk score, while it had no association with age, FIGO stage, grade, and residual tumor size (Fig. 6b-c).
Pathways enrichment analysis to identify the biological function of TFs in SOC
Firstly, GSEA analysis was performed to identify the differences in biology function among patients with high-risk and low-risk scores. The results indicate differences in several pathways including; chronic inflammatory response, positive regulation of vascular endothelial growth factor, regulation of T cell-mediated cytotoxicity, tumor necrosis factor biosynthetic process, MyD88-dependent toll-like receptor signaling pathway, positive regulation of coagulation, and regulation of lymphocyte migration (Fig. 7a). Then, GSVA analysis was executed to reveal the functional enrichment status of the 17-TFs in SOC. Signal pathways with a high correlation coefficient and statistical significance were selected from GO gene sets and KEGG pathway analysis. The higher gene set enrichment scores of all the selected signaling pathways were associated with higher risk scores, except the glutathione metabolic process and IL-4 response pathways. Some pathways, including the TGF-β signaling and IL-4 response, were mainly related to the inflammatory response (Fig. 7b). Cell proliferation, invasion, and migration-related pathways including; PI3K signaling, cell cycle arrest, Wnt signaling, Notch signaling, mTOR signaling, MAPK signaling, and VEGF signaling, were also enhanced in the high-risk score group. The biological processes involved in each TF were also analyzed, indicating that they had different biological functions in SOC (Additional file 6: Figure S3).
Different immune infiltration, immunotherapy and chemotherapy response between high-risk score and low-risk score groups
There was a significant difference between the high-risk and the low-risk score groups in the regulation of lymphocyte migration and T cell-mediated cytotoxicity. Therefore, immune infiltration, immunotherapy, and chemotherapy responses were analyzed. Data showed that the distribution of aDC, B cells, NK CD56bright cells, NK CD56dim cells, T cells, TFH and Th2 cells was lower in the high-risk than the low-risk score group, whereas mast cells, NK cells, and Tcm cells infiltration was higher in the high-risk than the low-risk score group (Fig. 8a). The difference in immune infiltration between two clusters (TF1 vs TF2) was also determined (Additional file 7: Figure S4). The potential response to immunotherapy for each patient was assessed using the TIDE algorithm. The results suggest that patients with low-risk scores are more sensitive to immunotherapy than those with high-risk scores (p=0.0003, Fig. 8b). Subsequently, the response of anti-CTL4 and anti-PD-1 therapy was explored; patients in the low-risk score group were more likely to respond to anti-PD-1 therapy (p<0.001, Fig. 8c).
Chemotherapy is the main treatment method for SOC and it plays a key role in patients’ survival. The different response to chemotherapy between the high-risk and low-risk score groups was noted. The cell line data obtained from GDSC database was used to train the predictive model and the IC50 values for etoposide, paclitaxel, veliparib, gemcitabine, doxorubicin, docetaxel, and cisplatin of each patient with SOC were determined; the values were calculated by the predictive model. The results indicate that there was a significant difference in the estimated IC50 between the high-risk and low-risk score groups; patients with low-risk scores showed better response to etoposide, paclitaxel, and veliparib (p=3.33e-6; p=0.003; p=0.040, respectively, Fig. 8d). However, the response to gemcitabine, doxorubicin, docetaxel, and cisplatin (p=0.077; p=0.131; p=0.169; p=0.182, respectively) was similar for patients in the two groups (Additional file 8: Figure S5).
Modeling prognostic nomogram for overall survival
Only 503 SOC patients were included in modeling prognostic nomogram for overall survival due to the loss of clinical data. First, the independent prognostic factors were identified using univariate and multivariant cox regression (Table 1). The results indicated that risk score (p<0.001, HR=10.0841), residual tumor size (1-10 mm: p<0.01, HR=1.5889; 10-20 mm: p=0.008, HR=2.0247; >20 mm: p<0.001, HR=2.0447), and age (p=0.02, HR=0.7586) were all independent prognostic factor for 5-year overall survival. The FIGO stage was adopted as one of the independent factors given that the FIGO III+IV parameters were statistically significant in univariant cox regression (p=0.03, HR=2.884), and showed marginal statistical difference in multivariant cox regression (p=0.133, HR=1.7485). There was no association between grade, stage, and 5-year overall survival of patients. Therefore, risk score, residual disease, age, and FIGO stage were used to construct the nomogram, and each variable was assigned a score on the corresponding axis. The probabilities of the outcomes were determined by the location of the total score on the survival axes, which was calculated by adding up all the variables (Fig. 9a). The nomogram-predicted 5-year overall survival closely corresponded with the observations (Fig. 9b). Calibration curves were used to evaluate the accuracy of the nomogram with an AUC value of 0.736 (Fig. 9c). The prognostic model value was calculated according to prognostic factors and regression coefficient. Patients were divided into two groups by setting the median prognostic value as cutoff; there was a significant difference in the 5-year overall survival rate between the groups (p<0.001, Fig. 9d).