Basic characteristics of patients
According to the inclusion and exclusion criteria, 34,957 patients diagnosed with HCC between 2004 and 2015 were included in the analysis (Figure 1). The entire cohort was then randomly divided into a training set (n = 24,469) and a validation set (n = 10,488) at a ratio of 7:3. The basic clinicopathological features of the entire cohort and of the training and validation sets are shown in Table 1. In the entire cohort, most patients were Caucasian (67.9%) and male (77.2%), and 46.0% were younger than 60 years. In terms of therapy for HCC, 67.3% of patients did not undergo surgery for the primary nodules, 13.2% were treated by local tumour destruction, 11.8% underwent liver resection, and 7.7% underwent liver transplantation. Regarding tumour characteristics, a tumour diameter smaller than 3 cm (33.3%) and AJCC stage I disease (41.9%) were most common. There were no significant differences in clinicopathological features between the training and validation sets.
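As an illustration of the 7:3 random split, the following is a minimal sketch rather than the procedure actually used in this study. The function name `split_cohort`, the fixed seed, and the use of integer patient identifiers are assumptions made for the example.

```python
import random


def split_cohort(patient_ids, train_fraction=0.7, seed=0):
    """Randomly split patient IDs into training and validation lists.

    The seed is an assumption, chosen only to make the sketch reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Round the training size down to a whole number of patients.
    n_train = int(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers 0..34956 standing in for the 34,957 patients.
train_ids, valid_ids = split_cohort(range(34957))
print(len(train_ids), len(valid_ids))
```

With 34,957 identifiers and a training fraction of 0.7 rounded down, this sketch yields set sizes of 24,469 and 10,488, matching the sizes reported above.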
Follow-up and survival analysis of patients
The median follow-up time was 63 months (range: 1–155 months). Of the 34,957 patients, 9,840 were alive at the end of follow-up, 21,044 died from HCC, and 4,073 died from other causes. For the training set, the 5-year OS, CSM, and mortality from other causes were 24.2%, 63.9%, and 11.9%, respectively. The 3- and 5-year cumulative incidences of mortality and the CIF curves for each clinicopathological variable are shown in Table 2 and Figure 2.
In the traditional survival analysis for CSM, the cumulative incidence of CSM estimated by the Kaplan-Meier method was higher than that estimated by the CIF: the 5-year cumulative incidence of CSM estimated by Kaplan-Meier analysis in the training set was 69.1% (Table S1).
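The reason the Kaplan-Meier estimate exceeds the CIF can be seen in a small worked example. The sketch below is not the software used in this study; it computes both quantities on a toy data set, assuming event codes 0 (censored), 1 (cause of interest), and 2 (competing cause). The Kaplan-Meier complement treats competing deaths as censored, whereas the CIF (Aalen-Johansen form) weights each cause-specific event by the probability of still being alive from any cause.

```python
def km_and_cif(times, events, cause=1):
    """Return (1 - KM, CIF) for `cause` at the last observed time.

    events: 0 = censored, 1 = cause of interest, 2 = competing cause
    (this coding is an assumption of the sketch).
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    km_surv = 1.0       # KM survival with competing events treated as censored
    overall_surv = 1.0  # survival from any cause, just before the current time
    cif = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data[i:] if tt == t]
        d_cause = sum(1 for e in tied if e == cause)
        d_any = sum(1 for e in tied if e != 0)
        # The CIF increment uses all-cause survival before time t.
        cif += overall_surv * d_cause / n_at_risk
        km_surv *= 1 - d_cause / n_at_risk
        overall_surv *= 1 - d_any / n_at_risk
        n_at_risk -= len(tied)
        i += len(tied)
    return 1 - km_surv, cif


# Toy data (hypothetical, not from the study cohort).
toy_times = [1, 2, 3, 4, 5, 6]
toy_events = [1, 2, 1, 2, 1, 0]
km_estimate, cif_estimate = km_and_cif(toy_times, toy_events)
print(km_estimate, cif_estimate)
```

On this toy data the Kaplan-Meier complement is larger than the CIF, because patients who die from the competing cause are removed from the risk set as if they could still experience the event of interest later.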
Identification of risk factors and construction of nomograms
Univariate and multivariate analyses were performed in the training set to identify independent risk factors associated with CSM and OS (Tables 3 and 4 and Table S3).
For CSM, multivariate analysis based on Fine-Gray’s competing risk analysis showed that age, race, surgical therapy, chemotherapy, radiotherapy, tumour diameter, and tumour stage were independent risk factors. Specifically, older age, white race, absence of surgical therapy, absence of radiotherapy, absence of chemotherapy, larger tumour diameter, and advanced tumour stage were associated with an increased probability of CSM. In the multivariate traditional Cox regression analysis, these seven variables were again identified as independent risk factors for CSM. Moreover, sex and marital status were identified as additional independent risk factors.
Multivariate Cox regression analysis showed that all nine aforementioned variables were independent risk factors for OS. Specifically, older age, male sex, white race, unmarried status, absence of surgical therapy, absence of radiotherapy, absence of chemotherapy, larger tumour diameter, and more advanced tumour stage were associated with poorer OS.
Based on the independent risk factors identified in the corresponding multivariate analyses, a nomogram for predicting OS and a competing risk nomogram for predicting CSM were constructed (Figure 3). A nomogram, which integrates various prognostic factors, is an easy-to-apply graphical tool for personalised prediction of patient survival probability. In the nomogram, each value of a variable corresponds to a number of points on the points scale. Summing the points of all variables gives the total points, which can be used to estimate the probability of the event by drawing a line downward from the position of the total points to the survival axes.
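The way a nomogram is read can be sketched in a few lines of code. Every number below is hypothetical: the point values, the variable levels, and the points-to-probability scale are invented for illustration and are not taken from Figure 3. The sketch only shows the mechanics of summing points and reading a probability off a scale by linear interpolation.

```python
# HYPOTHETICAL point values per variable level (not the values in Figure 3).
POINTS = {
    "age": {"<60": 0, "60-69": 10, ">=70": 25},
    "surgery": {"transplantation": 0, "resection": 30, "none": 100},
    "tumour_diameter": {"<3 cm": 0, "3-5 cm": 20, ">5 cm": 45},
}

# HYPOTHETICAL scale: (total points, 5-year event probability).
SCALE = [(0, 0.10), (100, 0.50), (200, 0.90)]


def total_points(patient):
    """Sum the points assigned to each variable level of one patient."""
    return sum(POINTS[variable][level] for variable, level in patient.items())


def probability(points):
    """Read a probability off the scale by linear interpolation."""
    if points <= SCALE[0][0]:
        return SCALE[0][1]
    for (x0, y0), (x1, y1) in zip(SCALE, SCALE[1:]):
        if points <= x1:
            return y0 + (y1 - y0) * (points - x0) / (x1 - x0)
    # Beyond the last tick the scale is clamped to its final value.
    return SCALE[-1][1]


example_patient = {"age": ">=70", "surgery": "none", "tumour_diameter": "3-5 cm"}
example_total = total_points(example_patient)
example_probability = probability(example_total)
print(example_total, example_probability)
```

In a published nomogram the point values are derived from the regression coefficients of the fitted model, so the tables above would be replaced by values read from the figure.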
Predictive performance of nomogram models
The predictive performance of the nomogram models was assessed using the C-index and calibration curves in the training and validation sets.
For the competing risk nomogram for CSM, the median unadjusted C-index of the model across different years reached 0.749 (range: 0.740–0.781) in the training set and 0.754 (range: 0.746–0.784) in the validation set. Furthermore, the median 10-fold cross-validation adjusted C-index in the entire cohort reached 0.750 (range: 0.741–0.781), indicating that the model is robust. The calibration plots also displayed good agreement between the predictions of the competing risk nomogram model and the observed probability of 3- and 5-year CSM in the training and validation sets (Figure 4 and Figure S1). For the traditional model, the median unadjusted C-index reached 0.758 (range: 0.753–0.783) in the training set and 0.759 (range: 0.754–0.780) in the validation set, and the adjusted C-index reached 0.755 (range: 0.751–0.779) in the entire set (Table S2).
For the OS nomogram, the C-index values were 0.745 (range: 0.740–0.768) in the training set and 0.743 (range: 0.740–0.764) in the validation set, and the adjusted C-index reached 0.742 (range: 0.738–0.764) in the entire set (Table S2). Calibration curves for 3- and 5-year OS were also well matched with the standard lines (Figure 4 and Figure S1).
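For readers unfamiliar with the C-index, the sketch below computes Harrell's concordance index for a single event type. It is an illustration of the general idea only: the time-dependent and competing risk variants reported above may be computed differently, and the function name and toy data are assumptions of the example.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C for a single event type (events: 1 = event, 0 = censored).

    A pair (i, j) is comparable when patient i had the event before
    patient j's observed time. It is concordant when patient i also has
    the higher predicted risk; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): higher risk scores go with earlier events.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.4, 0.2]
c_perfect = harrell_c_index(toy_times, toy_events, toy_risks)
c_reversed = harrell_c_index(toy_times, toy_events, toy_risks[::-1])
print(c_perfect, c_reversed)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported values of roughly 0.74 to 0.76 in context.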
Based on the nomograms, each patient was assigned total points for the probability of CSM and OS. The median total points calculated for CSM and OS were 148 (range: 7–254) and 169 (range: 16–298) in the training set and 148 (range: 7–254) and 169 (range: 16–296) in the validation set. Based on previously reported cut-off points (the 16th, 50th, and 84th percentiles of total points in the training set)[16], patients were categorised into four distinct risk groups. CIF and Kaplan-Meier analyses showed that the curves corresponding to the four risk groups were clearly separated in the training and validation sets (both p < 0.001), further supporting the good predictive performance of the nomogram models (Figure 5).
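The percentile-based grouping can be sketched as follows. This is a minimal illustration, not the code used in this study: the toy scores, the numeric group labels 0 to 3, the percentile interpolation method, and the handling of values that fall exactly on a cut-off are all assumptions. The key point it shows is that the cut-offs are derived from the training set only and then applied unchanged to any other set.

```python
import bisect
import statistics


def percentile_cutoffs(training_points, percentiles=(16, 50, 84)):
    """Cut-offs at the given percentiles of the training-set total points."""
    # statistics.quantiles with n=100 returns the 99 percentile cut points.
    cuts = statistics.quantiles(training_points, n=100)
    return [cuts[p - 1] for p in percentiles]


def risk_group(points, cutoffs):
    """Return 0 (lowest risk) to 3 (highest risk).

    A value exactly equal to a cut-off is placed in the higher group;
    this boundary rule is an assumption of the sketch.
    """
    return bisect.bisect_right(cutoffs, points)


# Toy training scores (hypothetical): the integers 1..100.
training_points = list(range(1, 101))
cutoffs = percentile_cutoffs(training_points)
groups = [risk_group(p, cutoffs) for p in training_points]
print(cutoffs, [groups.count(g) for g in range(4)])
```

With these cut-offs the two middle groups each hold about 34% of the training patients and the two outer groups about 16% each, which corresponds to one standard deviation either side of the median under a normal distribution.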