Clinical Characteristics
From 2004 to 2013, 4616 patients with synchronous colorectal carcinomas (SCC) were identified in the SEER database. Patient characteristics are shown in Table 1. Between the first and second synchronous tumors, there was a significant correlation in age and a slight correlation in pN, pM, examined lymph nodes, surgery of the primary site (Surg Prim Site), and chemotherapy. Patients with SCC were mostly older (>65 years), more often men, and likely to have a depth of invasion of T3. Tumors were mostly located in the cecum, ascending colon, and sigmoid colon.
The variables selected by Cox regression and by LASSO combined with Cox regression are listed in Table 2. By Cox regression, age at the first SCC diagnosis, sex, first-time tumor size, first-time surgery, second-time marital status, second-time grade, second-time chemotherapy, first- and second-time pT, pN, pM, and regional nodes examined, and site of disease were significantly associated with overall survival (OS). By LASSO combined with Cox regression, age at the first SCC diagnosis, sex, second-time chemotherapy, and first- and second-time pT, pN, pM, and regional nodes examined were significantly associated with OS.
The relations between first- and second-time pT, pN, pM, grade, and regional nodes are listed in Table 1.
Predictive Variable Selection
On the basis of the 4616 patients in the internal cohort, 31 variables were reduced to 11 potential predictors by LASSO combined with Cox regression and to 16 by Cox regression (Figure 2A, 2B, 2C). The retained variables were those with nonzero coefficients in the LASSO Cox regression model or those giving the minimum AIC in the Cox regression model (Figure 2D).
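As a rough illustration of why LASSO leaves only some variables with nonzero coefficients, the sketch below applies the soft-thresholding operator, which is the exact lasso solution only in the simplified case of a linear model with an orthonormal design. It is not the penalized Cox fit used in this study; the function names, coefficient values, and penalty are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def select_nonzero(coefs, lam):
    """Return the predictors whose shrunken coefficient is nonzero."""
    shrunk = {name: soft_threshold(b, lam) for name, b in coefs.items()}
    return {name: b for name, b in shrunk.items() if b != 0.0}


# Hypothetical unpenalized coefficients (not taken from Table 2).
example = {"age": 0.8, "race": 0.05, "pN": -0.4}
selected = select_nonzero(example, lam=0.1)  # "race" is dropped
```

A larger penalty removes more variables; in practice the penalty is usually chosen by cross-validation.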
Development of COX Model and LASSO Model
After variable selection by LASSO Cox regression or Cox regression, the selected variables (age, sex, marital status, race, site, pT, pN, pM, radiation, chemotherapy, surgery, nodes examined, etc.) were included in the multivariable Cox regression models. Hazard ratios with 95% CIs for the covariates are shown in Table 2.
Apparent Performance of the LASSO Model or COX Model in the Internal Cohort
Calibration curves comparing the predicted and observed probabilities of overall survival (OS) at 3 and 5 years in the internal cohort (Figure 3A, 3B, 3C, 3D) were plotted to assess the calibration of the COX model and LASSO model. They were accompanied by the Hosmer-Lemeshow test (a significant test statistic implies that the model does not calibrate perfectly).
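The idea behind a calibration curve can be sketched as follows: patients are grouped by predicted survival, and the mean prediction in each group is compared with the Kaplan-Meier estimate observed in that group. This is a minimal sketch under assumed inputs, not the procedure used to produce Figure 3; the function names and the grouping rule are hypothetical.

```python
def kaplan_meier_survival(times, events, horizon):
    """Kaplan-Meier estimate of the survival probability at `horizon`."""
    survival = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
    return survival


def calibration_table(times, events, predicted_survival, horizon, n_groups=2):
    """Return (mean predicted, observed KM) survival per group of predicted risk."""
    if len(times) < n_groups:
        raise ValueError("need at least one patient per group")
    order = sorted(range(len(times)), key=lambda i: predicted_survival[i])
    size = len(order) // n_groups
    rows = []
    for g in range(n_groups):
        # The last group absorbs any remainder.
        idx = order[g * size:] if g == n_groups - 1 else order[g * size:(g + 1) * size]
        mean_pred = sum(predicted_survival[i] for i in idx) / len(idx)
        observed = kaplan_meier_survival(
            [times[i] for i in idx], [events[i] for i in idx], horizon
        )
        rows.append((mean_pred, observed))
    return rows
```

Points lying close to the diagonal (mean predicted equal to observed) indicate good calibration.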
Validation of the LASSO Model and COX Model
Internal validation was performed in the internal cohort, and external validation was performed in the external cohort. The LASSO model was developed in the internal cohort and applied to all patients of the external cohort. The calibration curves at 3 and 5 years (Figure 4A, 4B) were derived on the basis of the regression analysis.
C-index and AIC
To quantify the discrimination performance of the COX model, LASSO model, TNM model, and TTNNMM model, Harrell's C-index and AIC were applied (Table 3). The C-indices for the COX model, LASSO model, TNM model, and TTNNMM model were 0.710 (95% CI, 0.703 to 0.717), 0.712 (95% CI, 0.705 to 0.719), 0.637 (95% CI, 0.631 to 0.644), and 0.651 (95% CI, 0.644 to 0.657), respectively, and were confirmed to be 0.710, 0.712, 0.637, and 0.651 by bootstrap validation. The AICs for the COX model, LASSO model, TNM model, and TTNNMM model were 33431, 33420, 34043, and 33994, respectively. The 1-, 3-, and 5-year AUCs for the four models are shown in Table 4.
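For readers unfamiliar with Harrell's C-index, the following minimal sketch shows how it can be computed from follow-up times, event indicators, and model risk scores. The function name and the toy data are hypothetical; the quadratic loop is for clarity only and would be slow on a cohort of several thousand patients.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of usable pairs in which the patient who died earlier
    also has the higher predicted risk (ties in risk count as 0.5)."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is usable when patient i has an observed event
            # strictly before patient j's follow-up time.
            if events[i] == 1 and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


# Toy data: follow-up in years, 1 = death observed, 0 = censored.
c = harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.3, 0.4, 0.5])
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.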
Predictive Accuracy of COX Model and LASSO Model
According to the survROC curves for 1-, 3-, and 5-year overall survival (OS) for the COX model, LASSO model, TNM model, and TTNNMM model (Figure 5A, 5B, 5C, 5D), the area under the ROC curve (a general measure of predictiveness) was greater at 3 and 5 years.
Comparison of the Performance of the LASSO and COX Models
TimeAUC
Time-dependent ROC curves were generated to compare the trends over time of the LASSO, COX, TNM, and TTNNMM models for OS. The time-dependent ROC curve of the LASSO model was consistently superior to those of the COX, TNM, and TTNNMM models (Figure 6).
BIC
The prognostic performances of the LASSO, COX, TNM, and TTNNMM models were compared using BIC, which measures the goodness of fit of an estimated statistical model while penalizing the number of parameters included in the model. As shown in Figure 7, after the bootstrap analysis there was no significant difference between the COX and LASSO models (BIC 4.49; 95% CI, -2.92 to 11.91), but there were significant differences between the TNM and LASSO models (BIC 1178.76; 95% CI, 1171.15 to 1186.37) and between the TTNNMM and LASSO models (BIC 1098.57; 95% CI, 1092.05 to 1105.09).
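The standard definitions of AIC and BIC can be written as a short sketch. For Cox models the log-likelihood is the partial log-likelihood, and software differs on whether the sample size in BIC is the number of patients or the number of events; which convention was used here is not stated, so `n_obs` below is an assumption, and the numbers are hypothetical.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 log L."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical values, not taken from Table 3 or Figure 7.
loglik, k, n = -100.0, 5, 100
penalty_gap = bic(loglik, k, n) - aic(loglik, k)  # k * (ln(n) - 2)
```

Lower values indicate a better trade-off between fit and complexity, and BIC penalizes each extra parameter more heavily than AIC once n exceeds about 7.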
Net Reclassification Improvement (NRI) and Integrated Discrimination Improvement (IDI)
The discriminative ability of the LASSO model, COX model, TNM model, and TTNNMM model was assessed using NRI and IDI (Table 5). Compared with the TNM model and TTNNMM model, the LASSO model had higher discrimination and reclassification indices (integrated discrimination improvement 0.072 and 0.064; p < 0.001; net reclassification improvement 0.525 and 0.466) (Table 5). In addition, compared with the COX model, the LASSO model did not significantly decrease the discrimination and reclassification indices (integrated discrimination improvement -0.002, p = 0.058; NRI -0.009) (Table 5).
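To make the two indices concrete, the sketch below computes the category-free NRI and the IDI for a binary outcome at a fixed time horizon. It ignores censoring, which the survival versions of these indices must handle, so it is a simplification rather than the method used for Table 5; the function name and inputs are hypothetical.

```python
def nri_idi(outcomes, p_old, p_new):
    """Category-free NRI and IDI for predicted event probabilities.

    outcomes: 1 = event by the horizon, 0 = no event.
    p_old, p_new: predicted event probabilities from the two models.
    """
    events = [i for i, y in enumerate(outcomes) if y == 1]
    nonevents = [i for i, y in enumerate(outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")

    def net_upward(idx):
        # Proportion moved up minus proportion moved down by the new model.
        up = sum(1 for i in idx if p_new[i] > p_old[i])
        down = sum(1 for i in idx if p_new[i] < p_old[i])
        return (up - down) / len(idx)

    def mean_change(idx):
        return sum(p_new[i] - p_old[i] for i in idx) / len(idx)

    nri = net_upward(events) - net_upward(nonevents)
    idi = mean_change(events) - mean_change(nonevents)
    return nri, idi
```

Positive values mean the new model moves predicted risk in the right direction more often (NRI) and separates events from non-events more widely on average (IDI).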
Clinical Use
Decision curve analysis was conducted to determine the clinical usefulness of the LASSO model by quantifying the net benefit at different threshold probabilities. Decision curves for the four models at 3 and 5 years were also plotted (Figure 8A, 8B).
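The net benefit plotted in a decision curve can be sketched as follows, using the standard definition for a binary outcome at a fixed horizon (true positives minus false positives weighted by the odds of the threshold). Censoring is ignored here, and the function names and toy data are hypothetical.

```python
def net_benefit(outcomes, predicted_risk, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcomes)
    pairs = list(zip(outcomes, predicted_risk))
    true_pos = sum(1 for y, p in pairs if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in pairs if p >= threshold and y == 0)
    return true_pos / n - (false_pos / n) * threshold / (1.0 - threshold)


def net_benefit_treat_all(outcomes, threshold):
    """Reference strategy: treat every patient regardless of the model."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Toy data: 1 = event, 0 = no event.
y = [1, 1, 0, 0, 0]
risk = [0.8, 0.6, 0.7, 0.2, 0.1]
nb_model = net_benefit(y, risk, 0.5)
nb_all = net_benefit_treat_all(y, 0.5)
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy (net benefit 0).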
Visualization of SCC Survival Prediction Model
A nomogram for survival prediction was established based on the factors selected by LASSO combined with Cox regression (Figure 9). The nomogram showed that age at first diagnosis contributed most to prognosis, followed by first- and second-time T stage, N stage, metastases, and examined lymph nodes. Sex had a modest effect on survival. Each level of each variable was assigned a score. To use the nomogram, the points assigned to each predictor are read from the 0–10 scale at the top and then added. The sum is located on the “Total Points” scale, and a straight line drawn down from it at each time point gives the estimated probability of survival, from which the corresponding 3- and 5-year predictions are recorded.
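The point assignment in a nomogram can be sketched from model coefficients: each predictor level is scored in proportion to its log hazard ratio, scaled so that the predictor with the widest range spans the full point scale. This is a generic sketch, not the construction behind Figure 9; the coefficients, level names, and function names are hypothetical, and mapping total points to a survival probability additionally requires the baseline survival, which is not shown.

```python
def nomogram_points(level_coefs, scale_max=10.0):
    """level_coefs: {predictor: {level: log hazard ratio vs. reference}}."""
    # Shift each predictor so that its lowest-risk level scores zero.
    shifted = {}
    for name, levels in level_coefs.items():
        lowest = min(levels.values())
        shifted[name] = {lvl: b - lowest for lvl, b in levels.items()}
    # The predictor with the widest range defines the full scale.
    widest = max(max(levels.values()) for levels in shifted.values())
    if widest == 0:
        raise ValueError("all coefficients are identical")
    return {
        name: {lvl: scale_max * b / widest for lvl, b in levels.items()}
        for name, levels in shifted.items()
    }


def total_points(points, patient):
    """Sum the points for one patient's predictor levels."""
    return sum(points[name][level] for name, level in patient.items())


# Hypothetical log hazard ratios (not taken from Table 2).
coefs = {
    "age": {"<=65": 0.0, ">65": 0.8},
    "sex": {"female": 0.0, "male": 0.2},
}
pts = nomogram_points(coefs)
score = total_points(pts, {"age": ">65", "sex": "male"})
```

In this toy example, age spans the full scale and sex contributes a quarter of it, mirroring how a modest predictor occupies a short axis on the nomogram.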