Identification of prognosis-related proteins
The study was conducted as shown in the flowchart (Fig. 1). We identified 77 proteins significantly related to the overall survival (OS) of KIRC patients, including 35 proteins with HR > 1 and 42 proteins with HR < 1 (Fig. 2A). An HR > 1 means that high expression of the protein was associated with poor patient survival.
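The reading of HR used here can be made explicit with the Cox proportional hazards model. This is a sketch that assumes the HRs were estimated by Cox regression on each protein's expression level x (the passage does not name the method for this step); the symbols h_0 and \beta are introduced only for illustration.

```latex
h(t \mid x) = h_0(t)\, e^{\beta x},
\qquad
\mathrm{HR} = \frac{h(t \mid x + 1)}{h(t \mid x)} = e^{\beta}
```

Under this assumption, HR > 1 corresponds to \beta > 0, so each unit increase in expression multiplies the hazard of death by a factor greater than one, and HR < 1 corresponds to \beta < 0.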
Construction of a five-protein prognostic model in the training set
A total of 433 patients with a follow-up period longer than one month were randomly divided into a training set (n = 216) and a testing set (n = 217). There was no significant difference in clinical characteristics between the two groups (Table 1). Lasso Cox regression analysis was performed in the training set (Fig. 2B), and five risk proteins were identified (Table 2): acetyl-CoA carboxylase 1 (ACC1), insulin-like growth factor binding protein 2 (IGFBP2), mitogen-inducible gene 6 (MIG6), proliferation and apoptosis adaptor protein 15 (PEA15) and RAD51 recombinase (RAD51). A protein-based prognostic model was then established from the five proteins. The risk score = 0.438 * Expression of ACC1 + 0.251 * Expression of IGFBP2 - 0.229 * Expression of MIG6 + 0.283 * Expression of PEA15 + 0.348 * Expression of RAD51. According to the optimal cut-off value of the risk score, patients were divided into high-risk and low-risk groups (Fig. 3A). Deaths were more frequent among patients with higher risk scores in the training set (Fig. 3D). A heatmap of the five prognostic proteins showed clearly different expression between the high-risk and low-risk groups (Fig. 3G). The Kaplan-Meier (KM) curve showed that patients in the high-risk group had significantly poorer OS than patients in the low-risk group (p < 0.001) (Fig. 4A). The areas under the ROC curve (AUC) for 1-year, 3-year and 5-year OS were 0.789, 0.753 and 0.785, respectively (Fig. 4D). Taken together, the results indicated that our five-protein model performed well in survival prediction for KIRC.
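The risk score formula above is a weighted sum and can be sketched in a few lines of Python. The coefficients are those reported in the text; the expression values, the cut-off of 0.0 and the rule that a score strictly above the cut-off counts as high risk are hypothetical choices made only for illustration, since the optimal cut-off value is not given here.

```python
# Coefficients as reported in the risk score formula and in Table 2.
COEFFICIENTS = {
    "ACC1": 0.438,
    "IGFBP2": 0.251,
    "MIG6": -0.229,
    "PEA15": 0.283,
    "RAD51": 0.348,
}


def risk_score(expression):
    """Return the weighted sum of one patient's protein expression values."""
    return sum(coef * expression[protein] for protein, coef in COEFFICIENTS.items())


def risk_group(score, cutoff):
    """Label a patient "high" if the score exceeds the cut-off, else "low".

    Treating a score equal to the cut-off as low risk is an assumption.
    """
    return "high" if score > cutoff else "low"


# Hypothetical expression values and cut-off, not taken from the study.
example_patient = {"ACC1": 0.5, "IGFBP2": 1.0, "MIG6": -0.4, "PEA15": 0.2, "RAD51": 0.1}
HYPOTHETICAL_CUTOFF = 0.0

example_score = risk_score(example_patient)
example_group = risk_group(example_score, HYPOTHETICAL_CUTOFF)
```

Because MIG6 carries a negative coefficient, higher MIG6 expression lowers the score, while higher expression of the other four proteins raises it.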
Table 1
Clinical features of KIRC patients in training set, testing set and whole set.
Characteristic | Whole set | Training set | Testing set | P-value |
Total | n = 433 | n = 216 | n = 217 | |
Dead (%) | 280 (64.7) | 135 (62.5) | 145 (66.8) | 0.367 |
Median age, years (range) | 60 (26–88) | 59 (26–88) | 61 (29–88) | 0.211 |
Age ≥ 60 years (%) | 228 (52.7) | 107 (49.5) | 121 (55.8) | |
Female (%) | 141 (32.6) | 63 (29.2) | 78 (35.9) | 0.151 |
Tumor grade | | | | 0.183 |
G1 | 8 (1.8) | 1 (0.5) | 7 (3.2) | |
G2 | 181 (41.8) | 89 (41.2) | 92 (42.4) | |
G3 | 171 (39.5) | 89 (41.2) | 82 (37.8) | |
G4 | 71 (16.4) | 36 (16.7) | 35 (16.1) | |
Unknown | 2 (0.4) | 1 (0.5) | 1 (0.5) | |
AJCC stage | | | | 0.754 |
Stage I | 210 (48.5) | 104 (48.1) | 106 (48.8) | |
Stage II | 43 (9.9) | 24 (11.1) | 19 (8.8) | |
Stage III | 103 (23.8) | 48 (22.2) | 55 (25.3) | |
Stage IV | 75 (17.3) | 39 (18.1) | 36 (16.6) | |
Unknown | 2 (0.5) | 1 (0.5) | 1 (0.5) | |
Abbreviations: KIRC, kidney renal clear cell carcinoma; AJCC, American Joint Committee on Cancer.
Table 2
Proteins contained in the prognostic model.
Protein | Description | Gene symbol | Coefficient | HR | 95% CI | P-value |
ACC1 | acetyl-CoA carboxylase 1 | ACACA | 0.438 | 2.9 | 1.9–4.4 | 2.60E-07 |
IGFBP2 | insulin-like growth factor binding protein 2 | IGFBP2 | 0.251 | 1.7 | 1.4–2.2 | 1.50E-06 |
MIG6 | mitogen-inducible gene 6 | ERRFI1 | -0.229 | 0.35 | 0.21–0.58 | 5.00E-05 |
PEA15 | proliferation and apoptosis adaptor protein 15 | PEA15 | 0.283 | 3.2 | 1.6–6.1 | 0.00055 |
RAD51 | RAD51 recombinase | RAD51 | 0.348 | 5.1 | 2.4–11 | 2.90E-05 |
Validation of the five-protein prognostic model in the testing set and whole set
Validation was conducted in the testing set and the whole set using the same method as above. Patients were again divided into high-risk and low-risk groups with the same cut-off value (Fig. 3B-C), and more death events occurred in the high-risk group (Fig. 3E-F). Different expression of the five proteins was also observed between the high-risk and low-risk groups (Fig. 3H-I). The KM curves showed that OS was significantly poorer in the high-risk group than in the low-risk group (both p < 0.001) (Fig. 4B-C). The AUCs for 1-year, 3-year and 5-year OS were 0.730, 0.705 and 0.725 in the testing set and 0.763, 0.732 and 0.755 in the whole set, respectively (Fig. 4E-F). All of these results were consistent with those in the training set, which supports the reliability of our model.
Moreover, the expression of the five proteins was also examined in the Human Protein Atlas online database. Immunohistochemistry (IHC) staining showed higher expression of ACC1, PEA15 and RAD51 in tumor tissue than in normal tissue (Fig. 2C). However, data for IGFBP2 and MIG6 were not found.
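The KM curves compared between risk groups rest on the Kaplan-Meier product-limit estimator, which can be sketched as follows. This is a minimal pure-Python illustration, not the software used in the study; the function name and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        # Patients still under observation just before time t.
        at_risk = sum(1 for ti in times if ti >= t)
        # Deaths observed exactly at time t.
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical follow-up times (months) and event indicators, not study data.
toy_times = [5, 8, 12, 12, 20, 30]
toy_events = [1, 0, 1, 1, 0, 1]
toy_curve = kaplan_meier(toy_times, toy_events)
```

Running the estimator separately on the high-risk and low-risk groups gives the two curves that are then compared, typically with a log-rank test.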
Independent prognostic value of the five-protein model in the whole set
Univariate and multivariate Cox regression analyses were performed to determine whether the risk score calculated by our five-protein model had prognostic value independent of other clinicopathological parameters, including age, gender, grade and American Joint Committee on Cancer (AJCC) stage. Univariate Cox regression analysis indicated that age, grade, AJCC stage and risk score were related to the OS of KIRC patients (Fig. 5A). However, only age, AJCC stage and risk score were confirmed as independent prognostic factors by multivariate Cox regression analysis (Fig. 5B). Moreover, stratified analysis revealed that patients in the high-risk group had significantly poorer OS than patients in the low-risk group within subgroups defined by age, gender, grade and AJCC stage (all p < 0.001) (Fig. 6A-D). Furthermore, the risk score increased significantly with advancing grade (p < 0.001), AJCC stage (p < 0.001) and age (p = 0.033), but did not differ by gender (p = 0.41) (Fig. 6E-H).
Construction and evaluation of the protein-based nomogram in the whole set
To construct a more reliable predictive model for clinical practice, a composite nomogram was established that included the three independent prognostic factors (age, AJCC stage and risk score) (Fig. 5C). The calibration plots indicated that the 1-year, 3-year and 5-year OS predicted by the nomogram were in good agreement with the actual outcomes (Fig. 7A-C). The AUCs of the nomogram were 0.855, 0.817 and 0.799 for 1-year, 3-year and 5-year OS, respectively, which were larger than those of age, AJCC stage and risk score alone (Fig. 7D-F). Decision curve analysis (DCA) demonstrated that the nomogram could improve clinical net benefit, especially for 5-year OS (Fig. 7G-I). Taken together, the results suggested that our protein-based nomogram performed best in predicting OS for KIRC.
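The net benefit plotted in a decision curve can be sketched with the standard formula, net benefit = TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability. The sketch below assumes a binary outcome at a fixed time point and ignores censoring, which a survival-based DCA would need to handle; the function name and toy data are hypothetical and are not study results.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of acting on every patient whose predicted risk >= threshold.

    predicted_risk -- predicted probability of death by the time point of interest
    outcome        -- 1 if the patient died by that time point, 0 otherwise
    threshold      -- threshold probability pt, strictly between 0 and 1
    """
    n = len(outcome)
    pairs = list(zip(predicted_risk, outcome))
    true_pos = sum(1 for p, y in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


# Hypothetical predicted risks and outcomes, for illustration only.
toy_risk = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
toy_outcome = [1, 1, 0, 1, 0, 0]
toy_nb = net_benefit(toy_risk, toy_outcome, 0.5)
```

Evaluating this function over a range of thresholds for each model traces one decision curve per model; the model with the higher curve at a given threshold offers more net benefit there.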
Correlation analysis showed that most of the co-expressed proteins were associated with RAD51 (Fig. 8A). Several cancer-promoting proteins, such as TIGAR and EEF2, were positively related to the five proteins in the model (Fig. 8B). Functional enrichment analysis of the co-expressed proteins showed enrichment of the apoptotic signaling pathway, the p53 signaling pathway and the PI3K-Akt signaling pathway (Fig. 8C-D).