Baseline information
Among the 957 HCC patients, the median age was 50 years (IQR, 42–59 years). Most patients were male (84.3%, n = 807). BCLC stages 0 and 1 accounted for 79.7% (n = 763) of the patients, and stages 2 and 3 accounted for the remainder; no patient in this dataset was stage 4. Tumors were well differentiated in 56.9% (n = 545) of patients and poorly differentiated in 43.1%. Liver cirrhosis was present in 67.6% (n = 647) of patients. Liver capsule invasion was found in 54.9% (n = 525) of cases, MVI in 33.2% (n = 318), and satellite lesions in 15.2% (n = 145).
The median OS was 38 months (IQR, 19–58 months), and the median DFS was 24 months (IQR, 7–45 months). At year 1, the OS rate was 83.8% (95% CI, 81.5%–86.1%) and the DFS rate was 63.7% (95% CI, 60.6%–66.7%). At year 3, the OS rate was 62.0% (95% CI, 58.9%–65.1%) and the DFS rate was 43.5% (95% CI, 40.4%–46.7%). At year 5, the OS rate was 52.9% (95% CI, 49.8%–56.1%) and the DFS rate was 36.2% (95% CI, 33.2%–39.3%).
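The text does not state how the yearly OS and DFS rates were estimated. One common choice is the Kaplan-Meier product-limit estimator, and the following minimal sketch shows how a survival rate at a given year can be read from such a curve. The function names and the toy follow-up data are illustrative assumptions, not the study data.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (product-limit estimator)."""
    data = sorted(zip(times, events))  # order subjects by follow-up time
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects that share the same follow-up time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # events and censored subjects both leave the risk set
    return curve


def survival_at(curve, t):
    """Step-function lookup: S(t) is the last estimate at or before t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Toy data (made up, in months); event 1 = death, 0 = censored
times = [5, 8, 12, 12, 20, 30, 36, 40, 60, 60]
events = [1, 0, 1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for year in (1, 3, 5):
    print(f"year {year}: S = {survival_at(curve, 12 * year):.3f}")
```

Confidence intervals such as those reported above would additionally need a variance estimate (for example Greenwood's formula), which this sketch omits.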
Identification of risk factors
Before building the Cox proportional hazards models, the gender imbalance in the dataset needed to be substantially reduced. We applied the SMOTE method to oversample the female group, which added 666 synthetic female cases to the dataset. The OS and DFS models were then fitted on the oversampled dataset, which comprised 807 male and 816 female patients.
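To illustrate the oversampling step, here is a minimal sketch of the core SMOTE idea: each synthetic case is an interpolation between a minority-class case and one of its nearest minority-class neighbours. The function `smote`, its parameters, and the two-feature toy rows are assumptions made for illustration; the actual implementation and settings used in the study are not stated, and categorical covariates would need a variant such as SMOTE-NC.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared distance to the base row
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Group sizes reported in the text
n_total, n_male = 957, 807
n_female = n_total - n_male          # 150 original female cases
print(n_female + 666)                # size of the oversampled female group

# Toy minority rows (made up): [age in years, tumor size in cm]
females = [[45.0, 3.0], [52.0, 5.5], [60.0, 2.0], [38.0, 7.0], [49.0, 4.2]]
new_rows = smote(females, n_new=4, k=3)
for row in new_rows:
    print([round(v, 2) for v in row])
```

Because every synthetic row lies on a segment between two existing rows, each of its values stays within the observed range of the corresponding feature.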
After backward stepwise selection, age, gender, tumor size, BCLC stage, differentiation, tumor edge, lymph node metastasis, liver capsule invasion, MVI, satellite lesions, AST and HBV remained in the OS model, while age, gender, tumor size, BCLC stage, differentiation, tumor edge, lymph node metastasis, liver capsule invasion, MVI, satellite lesions, HBV and AFP remained in the DFS model.
For the OS model, male gender (HR = 1.701; 95% CI, 1.414–2.048), tumor size (HR = 1.197; 95% CI, 1.161–1.233), BCLC stage (stage 1 vs. stage 0: HR = 1.450; 95% CI, 1.192–1.764; stage 3 vs. stage 0: HR = 5.914; 95% CI, 4.273–8.183), tumor edge (HR = 2.406; 95% CI, 1.182–4.897), lymph node metastasis (HR = 6.642; 95% CI, 3.377–13.063), liver capsule invasion (HR = 1.418; 95% CI, 1.134–1.773), MVI (HR = 1.404; 95% CI, 1.181–1.669), and satellite lesions (HR = 2.205; 95% CI, 1.726–2.817) were identified as statistically significant risk factors for poorer OS (Fig. 1). For the DFS model, male gender (HR = 1.445; 95% CI, 1.246–1.676), tumor size (HR = 1.134; 95% CI, 1.111–1.157), BCLC stage (stage 1 vs. stage 0: HR = 1.523; 95% CI, 1.292–1.795; stage 2 vs. stage 0: HR = 1.279; 95% CI, 1.010–1.620; stage 3 vs. stage 0: HR = 4.275; 95% CI, 3.160–5.784), poorly differentiated tumor (HR = 1.337; 95% CI, 1.168–1.531), lymph node metastasis (HR = 2.953; 95% CI, 1.457–5.987), liver capsule invasion (HR = 1.215; 95% CI, 1.006–1.467), MVI (HR = 1.674; 95% CI, 1.446–1.937), and satellite lesions (HR = 1.912; 95% CI, 1.541–2.371) were also identified as statistically significant risk factors for poorer DFS (Fig. 1).
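As a reading aid for the hazard ratios above: in a Cox model the HR is the exponential of a regression coefficient, and a Wald-type 95% CI is symmetric on the log scale. The sketch below, which assumes that the reported intervals are Wald-type (not stated in the text), recovers the log-HR and its standard error from one reported estimate and rebuilds the interval. The helper names are illustrative.

```python
import math


def log_hr_and_se(hr, lower, upper, z=1.96):
    """Recover the Cox coefficient (log-HR) and its standard error,
    assuming a Wald-type interval that is symmetric on the log scale."""
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


def hr_with_ci(beta, se, z=1.96):
    """Turn a coefficient and standard error back into HR and 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Male gender in the OS model, as reported: HR = 1.701 (95% CI, 1.414-2.048)
beta, se = log_hr_and_se(1.701, 1.414, 2.048)
hr, lo, hi = hr_with_ci(beta, se)
print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"HR = {hr:.3f} (95% CI, {lo:.3f}-{hi:.3f})")
```

The rebuilt limits agree with the reported ones to within rounding, which is consistent with the symmetry assumption for this covariate.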
The analysis also showed that the effects of several variables changed over time. For example, the HR of lymph node metastasis decreased from 6.642 (95% CI, 3.377–13.063) to 3.073 (95% CI, 0.753–12.540) in the OS model (Table 1), whereas it increased from 2.953 (95% CI, 1.457–5.987) to 5.036 (95% CI, 1.210–20.957) in the DFS model (Table 2).
Model robustness and performance
Calibration plots at years 1, 3, and 5 for the OS and DFS models are displayed in Fig. 2. Using bootstrap resampling, agreement between predicted and observed survival was tested every 100 samples; the plots showed good consistency with no obvious sign of overfitting. In addition, the C-index values calculated for the two models both suggested good performance (0.748 for the OS model and 0.732 for the DFS model).
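For readers unfamiliar with the C-index, the following minimal sketch computes Harrell's concordance index for right-censored data: among comparable pairs, it is the fraction in which the patient with the earlier observed event also has the higher predicted risk. The function name and toy values are illustrative assumptions; the text does not say which C-index variant or software was used.

```python
def c_index(times, events, risk):
    """Harrell's C: share of comparable pairs ranked correctly by risk score.
    A pair (i, j) is comparable when i has an observed event before time j."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / comparable


# Toy data (made up): follow-up in months, event flag, predicted risk score
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.4, 0.6, 0.1]
print(c_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values of 0.748 and 0.732.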
To compare the performance of the proposed landmark-based model with that of the conventional Cox model, the predictive ability of the models was assessed using ROC curves, as shown in Fig. 3. For OS prediction, the area under the curve (AUC) was 0.683 at baseline (Cox model) and 0.714 after 1 year (landmark-based model), i.e., an estimated probability of 68.3% and 71.4%, respectively, of ranking a patient who died above one who survived. These results suggest that the landmark-based model had higher predictive ability than the conventional Cox model as follow-up time increased. Similarly, for DFS prediction, the landmark-based model (AUC = 0.710) outperformed the conventional Cox model (AUC = 0.660).
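The AUC values can be read as a ranking probability: the chance that a randomly chosen patient with the event receives a higher predicted risk than a randomly chosen patient without it. The sketch below computes the AUC directly from that definition for a fixed time point, ignoring censoring for simplicity; time-dependent ROC methods for survival data are more involved, and the method used in the study is not stated. The function name and toy scores are illustrative.

```python
def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative case
    (ties count as half), computed over all positive-negative pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): predicted risk and whether the event occurred by the horizon
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]
print(round(auc(scores, labels), 3))
```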
Prediction workflow instances
With the proposed models, OS and DFS can be predicted for individual patients. Given all the variables required by the OS or DFS model, survival probabilities at any time can be estimated. For example, consider a 36-year-old man with a well-differentiated, stage 1 HCC lesion measuring 10 cm. HBV and MVI were present, whereas lymph node metastasis, tumor edge, liver capsule invasion and satellite lesions were absent. Laboratory tests showed an AFP of 1211 and an AST of 74 (U).
Using the current models, his 3-year OS at baseline was estimated as 14.4% (95% CI, 5.4%–38.5%), and the estimate after 1 year increased to 49.1% (95% CI, 40.4%–59.7%). Similarly, his 3-year DFS at baseline was estimated as 10.9% (95% CI, 6.7%–17.7%), and the estimate after 1 year increased to 53.6% (95% CI, 33.7%–85.1%) (Fig. 4).
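The difference between the baseline and 1-year estimates reflects the landmark approach, in which the model is refitted on patients still at risk at the landmark time. A minimal sketch of the usual data-preparation step follows, under stated assumptions: the function `landmark_subset` is hypothetical, follow-up is measured in months, and the prediction window is counted from the landmark. The study may define the window differently.

```python
def landmark_subset(times, events, landmark, horizon):
    """Keep subjects still at risk at `landmark`, restart the clock there, and
    administratively censor follow-up at `horizon` months after the landmark."""
    rows = []
    for idx, (t, e) in enumerate(zip(times, events)):
        if t <= landmark:
            continue  # event or censoring before the landmark: not at risk
        t_new = t - landmark
        if t_new > horizon:
            t_new, e = horizon, 0  # still event-free at the end of the window
        rows.append((idx, t_new, e))
    return rows


# Toy data (made up): follow-up in months; event 1 = death, 0 = censored
times = [6, 14, 30, 50, 70]
events = [1, 1, 0, 1, 1]
subset = landmark_subset(times, events, landmark=12, horizon=36)
for row in subset:
    print(row)  # (original index, time since landmark, event flag)
```

A Cox model fitted on such a subset conditions on survival up to the landmark, which is one reason its estimates for the same patient can differ markedly from the baseline ones.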