1. Study design and patient data
Our study used two cohorts: an internal training cohort to build the signature and an external cohort to validate it. A total of 707 patients from Shanghai Jiao Tong University Affiliated Sixth People’s Hospital were screened for the internal cohort used to construct the signature. An external cohort of 493 patients from the Sixth Affiliated Hospital of Sun Yat-sen University was used for validation. In all, the two cohorts included 1200 patients (Fig. 1).
We collected clinical information and follow-up data from the patients in the internal training cohort; their characteristics are shown in Table 1. Of these patients, 431 (61.0%) were male and 276 (39.0%) were female. The median age was 65 years, and the median follow-up time was 47 months. The majority of patients had stage T4 disease (74.8%), while T1 and T2 together accounted for only 2.7%. Most patients had stage N0 disease (59.3%); 288 patients had lymph node metastasis, including 170 with N1 and 118 with N2. The mean values of CEA, CA125 and CA19-9 were 13.97 ng/ml, 21.62 U/ml and 23.98 U/ml, respectively. At the last follow-up, 495 patients were alive and 212 had died.
Table 1
Patient demographics and clinicopathologic data of the internal training cohort (NA, not available; CSA, cross-sectional area)
| Variables | CRC patients (N = 707) |
| --- | --- |
| Gender | |
| Male | 431 (61.0%) |
| Female | 276 (39.0%) |
| Median follow-up months | 47.0 |
| Median age (mean ± SD) | 65 (64.40 ± 11.78) |
| T stage | |
| T1 + T2 | 19 (2.7%) |
| T3 | 159 (22.5%) |
| T4 | 529 (74.8%) |
| N stage | |
| N0 | 419 (59.3%) |
| N1 | 170 (24.0%) |
| N2 | 118 (16.7%) |
| Tumor location | |
| Right colon | 224 (31.7%) |
| Left colon | 216 (30.6%) |
| Rectum | 267 (37.8%) |
| Tumor CSA (mean ± SD) | 21.91 ± 29.03 cm² |
| Pathological type | |
| Adenocarcinoma | 664 (93.9%) |
| Mucinous adenocarcinoma and signet-ring cell carcinoma | 43 (6.1%) |
| Degree of histologic differentiation | |
| Moderate and well | 420 (59.4%) |
| Poor | 287 (40.6%) |
| Lymphatic infiltration | |
| Present | 283 (40.0%) |
| Absent | 424 (60.0%) |
| Vascular infiltration | |
| Present | 78 (11.0%) |
| Absent | 629 (89.0%) |
| Nerve infiltration | |
| Present | 675 (95.5%) |
| Absent | 32 (4.5%) |
| Median Ki67 (range) | 60% (5%–90%) |
| NLR (mean ± SD) | 3.68 ± 5.01 |
| PLR (mean ± SD) | 179.30 ± 134.68 |
| CEA value (mean ± SD, ng/ml) | 13.97 ± 39.05 |
| CA125 value (mean ± SD, U/ml) | 21.62 ± 31.11 |
| CA19-9 value (mean ± SD, U/ml) | 23.98 ± 40.10 |
| Metastasis or recurrence | |
| Yes | 201 (28.4%) |
| No | 506 (71.6%) |
| Survival status | |
| Alive | 495 (70.0%) |
| Dead | 212 (30.0%) |
2. Univariate and multivariate analyses of clinical indicators for prognostic prediction in the internal training cohort
To select variables suitable for inclusion in our signature, we performed univariate Cox regression analyses. Nine variables were prognostic factors for overall survival (OS): T stage, N stage, pathological type, histologic differentiation, lymphatic infiltration, vascular infiltration, CEA, CA125 and CA19-9 (Table 2; p < 0.05).
Table 2
Univariate and multivariate Cox regression analyses for OS (training cohort)
| Variables | OS, HR (95% CI) | OS, p-value |
| --- | --- | --- |
| Univariate analysis | | |
| Gender (male vs. female) | 1.097 (0.826–1.456) | 0.522 |
| Age (>65 vs. ≤65) | 0.994 (0.756–1.307) | 0.966 |
| T stage (T4 vs. T3) | 1.656 (1.155–2.373) | 0.006 |
| N stage | | |
| N0 | 1 (Referent) | |
| N1 | 1.858 (1.324–2.606) | <0.001 |
| N2 | 3.514 (2.541–4.860) | <0.001 |
| Tumor location | | |
| Right colon | 1 (Referent) | |
| Left colon | 1.050 (0.741–1.487) | 0.784 |
| Rectum | 1.116 (0.802–1.552) | 0.516 |
| Tumor CSA | 0.997 (0.990–1.004) | 0.433 |
| Pathological type (adenocarcinoma vs. other) | 0.529 (0.340–0.823) | 0.005 |
| Histologic differentiation (poor vs. moderate and well) | 1.583 (1.204–2.081) | 0.001 |
| Lymphatic infiltration (present vs. absent) | 1.560 (1.159–2.099) | 0.003 |
| Vascular infiltration (present vs. absent) | 2.154 (1.509–3.077) | <0.001 |
| Nerve infiltration (present vs. absent) | 2.222 (0.826–5.978) | 0.114 |
| Ki67 | 0.997 (0.990–1.005) | 0.494 |
| NLR | 1.017 (0.994–1.040) | 0.159 |
| PLR | 1.001 (1.000–1.002) | 0.148 |
| CEA | 1.005 (1.003–1.007) | <0.001 |
| CA125 | 1.004 (1.000–1.007) | 0.024 |
| CA19-9 | 1.006 (1.003–1.008) | <0.001 |
| Multivariate analysis | | |
| T stage (T4 vs. T3) | 1.468 (1.019–2.116) | 0.039 |
| N stage | | |
| N0 | 1 (Referent) | |
| N1 | 1.891 (1.246–2.869) | 0.003 |
| N2 | 3.782 (2.471–5.787) | <0.001 |
| Pathological type (adenocarcinoma vs. other) | 0.567 (0.357–0.902) | 0.017 |
| Histologic differentiation (poor vs. moderate and well) | 1.332 (0.995–1.782) | 0.054 |
| Lymphatic infiltration (present vs. absent) | 0.631 (0.420–0.948) | 0.026 |
| Vascular infiltration (present vs. absent) | 1.597 (1.098–2.324) | 0.014 |
| CEA | 1.003 (1.001–1.005) | 0.012 |
| CA125 | 1.005 (1.002–1.009) | 0.002 |
| CA19-9 | 1.003 (1.001–1.006) | 0.005 |
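The hazard ratios, confidence intervals and p-values in Table 2 are linked through the Wald statistic of the Cox model. The following is a minimal sketch, assuming the reported intervals are Wald 95% intervals on the log-hazard scale (the source does not state this); the function name `wald_from_hr` is illustrative and not part of the study. It recovers the coefficient, its standard error and the two-sided p-value from the univariate T stage row.

```python
import math

def wald_from_hr(hr, lower, upper, z_crit=1.96):
    """Recover beta, its standard error, the Wald z and the two-sided p-value
    from a reported hazard ratio and its 95% confidence interval.

    Assumes a Wald interval: exp(beta +/- z_crit * se).
    """
    beta = math.log(hr)
    # The interval width on the log scale equals 2 * z_crit * se.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = beta / se
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p

# Univariate T stage (T4 vs. T3) from Table 2: HR 1.656 (1.155-2.373).
beta, se, z, p = wald_from_hr(1.656, 1.155, 2.373)
print(f"beta={beta:.3f}  se={se:.3f}  z={z:.2f}  p={p:.4f}")
```

Under this assumption the recovered p-value is about 0.006, in line with the value reported for that row. Small discrepancies on other rows would be expected because the published figures are rounded.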
We then used these variables as potential prognostic factors to construct a prognostic signature with the Cox proportional hazards model. Stepwise and backward selection based on the Akaike information criterion (AIC) were used for variable selection. Finally, 8 indicators were selected to construct a prognostic signature for OS in CRC: T stage, N stage, histologic differentiation, lymphatic infiltration, vascular infiltration, CEA, CA125 and CA19-9. Multivariate analysis showed that all of these variables except tumor histologic differentiation were independent prognostic factors (Table 3; p < 0.05). The C-index was 0.72, corrected with 1,000 permutations (supplementary table 1).
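To make the C-index concrete, here is a minimal sketch of Harrell's concordance index for right-censored data. The data are synthetic toy values, not study data, and the function name `harrell_c_index` is illustrative. The permutation-based correction mentioned above is not reproduced here.

```python
from itertools import combinations

def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored survival data.

    A pair is comparable when the patient with the shorter follow-up had an
    event. It is concordant when that patient also has the higher predicted
    risk; tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # shorter follow-up is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Synthetic toy data (NOT from the study).
times = [10, 24, 36, 47, 60]       # follow-up in months
events = [1, 1, 0, 1, 0]           # 1 = died, 0 = censored
risk = [2.1, 0.4, 1.5, 0.9, 0.2]   # hypothetical model risk scores
c_index = harrell_c_index(times, events, risk)
print(f"C-index = {c_index:.2f}")
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the reported 0.72 should be read.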
3. Evaluation and determination of the accuracy and predictive power of the signature
To evaluate patient prognosis more intuitively, we developed a nomogram based on the Cox regression model. Each variable in the nomogram has a weighted score, and the sum of the scores predicts the 3-year or 5-year survival outcome (Fig. 2). To further examine the importance of these variables and calculate the risk score, we applied a nonparametric approach, the random survival forest. CEA and CA125 were log-transformed. N stage had the largest influence on OS, with a positive variable importance (VIMP) of 0.7217, followed by vascular infiltration, tumor histologic differentiation, CA125, T stage, CEA and CA19-9 (supplementary Fig. 1). Time-dependent ROC analysis showed relatively high predictive accuracy for the OS signature: the AUC based on the risk score was 0.761 at 3 years and 0.741 at 5 years (Fig. 3a and 3b). The calibration curves showed close agreement between predicted and observed OS at 3 years and 5 years (Fig. 3c and 3d). Together, these results indicate that our signature has good accuracy and predictive ability.
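The weighted scores of a nomogram are usually the Cox coefficients rescaled so that the strongest effect spans 0 to 100 points. The sketch below illustrates that rescaling with the multivariate hazard ratios of the categorical predictors in Table 2. It is an assumption-laden illustration, not the published nomogram: the final 8-variable model may have different coefficients, continuous markers are omitted because their ranges are not reported here, and `nomogram_points` is a hypothetical helper.

```python
import math

# Multivariate hazard ratios for categorical predictors, copied from Table 2.
# Assumption: used only to illustrate the scaling; the coefficients of the
# final signature may differ.
hazard_ratios = {
    "T4 (vs. T3)": 1.468,
    "N1 (vs. N0)": 1.891,
    "N2 (vs. N0)": 3.782,
    "Vascular infiltration present": 1.597,
}

def nomogram_points(hrs):
    """Rescale log-hazard coefficients so the largest effect equals 100 points."""
    betas = {name: math.log(hr) for name, hr in hrs.items()}
    largest = max(abs(b) for b in betas.values())
    return {name: 100 * (b / largest) for name, b in betas.items()}

points = nomogram_points(hazard_ratios)
for name, pts in points.items():
    print(f"{name}: {pts:.1f} points")

# Total for a hypothetical patient with T4, N2 disease and vascular infiltration.
# N1 and N2 are levels of one variable, so only N2 contributes.
total = (points["T4 (vs. T3)"]
         + points["N2 (vs. N0)"]
         + points["Vascular infiltration present"])
print(f"Total: {total:.1f} points")
```

Converting a total score into a 3-year or 5-year survival probability additionally requires the baseline survival of the fitted model, which is not given in this section.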
4. Prognosis among groups with different risk scores
We scored patients using the risk score derived from the signature. Patients were then classified into three groups (high, intermediate, and low risk) by risk-score cutoffs. Kaplan-Meier curves were used to compare survival among the groups. Compared with the low-risk group, the intermediate- and high-risk groups had hazard ratios of 3.28 (95% CI, 2.37–4.52) and 8.67 (95% CI, 5.86–12.80), respectively (Fig. 4, p < 0.0001). The 5-year OS rate of the low-risk group was 77%, much higher than that of the intermediate-risk group (46%) and the high-risk group (8%).
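The stratification and survival comparison described above can be sketched as follows. The cutoffs and patient data are hypothetical, since the actual cutoff values are not given in this section, and `risk_group` and `kaplan_meier` are illustrative helper names.

```python
def risk_group(score, low_cutoff, high_cutoff):
    """Assign a risk group from a risk score and two cutoffs (hypothetical values)."""
    if score < low_cutoff:
        return "low"
    if score < high_cutoff:
        return "intermediate"
    return "high"

def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival probability) at each event time."""
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
    return curve

# Synthetic toy data for one risk group (NOT from the study).
times = [12, 20, 30, 45, 60]   # follow-up in months
events = [1, 0, 1, 1, 0]       # 1 = died, 0 = censored
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")

# Hypothetical cutoffs of 0.5 and 1.5 on the risk score.
groups = [risk_group(s, 0.5, 1.5) for s in (0.2, 0.9, 2.3)]
```

In practice one curve would be estimated per risk group and the groups compared with a log-rank test or a Cox model, which is how hazard ratios such as those above are obtained.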
5. Validation of our signature in an external cohort
The previous results showed that our signature had good accuracy. To verify whether it was applicable to other hospitals or centers, we collected data from an external cohort of 493 CRC patients for validation. In univariate analysis, all variables included in our OS signature except CA125 were prognostic factors (supplementary table 2, p < 0.05). The calibration curves showed close agreement between predicted and observed OS at 2 years and 3 years (Fig. 5a and 5b). In addition, we divided the external cohort into three groups (high, intermediate, and low risk) according to the scoring criteria of our signature and obtained similar results: OS differed significantly among the three groups (supplementary Fig. 2, p < 0.001).
6. Website for predicting the prognosis of stage II and III CRC patients
Based on our signature, developed and validated in 1200 patients from two cohorts, we built a website for predicting the prognosis of patients with stage II-III CRC (http://115.28.66.83/liuyuan/coad.php).