- Clinical characteristics of patients
Data from a total of 63836 patients were included in the study. Of these, 42558 patients were assigned to the training cohort and 21278 to the validation cohort. Figure 1 shows a schematic of the screening process. The mean ages of patients in the training and validation cohorts were 62.41 ± 11.62 years and 62.43 ± 11.58 years, respectively. Tumor size was 6.58 ± 15.63 cm in the training cohort and 6.56 ± 15.62 cm in the validation cohort. In the training cohort, most patients (88.45%) were negative for LNM. By age, 3.49% of patients were younger than 40 years, 10.11% were between 41 and 50 years, 30.79% were between 51 and 60 years, 31.86% were between 61 and 70 years, and 23.75% were older than 71 years. Most patients in both cohorts were white (82.40%) and married (53.79%), and the most common tumor size was between 2 cm and 5 cm (49.36%). The most common pathological characteristics were EEA (87.27%), no myometrial invasion (65.10%), no cervical stromal invasion (79.80%), and tumor grade 1 (41.50%). The two cohorts showed similar results for nearly all variables. Table 1 details the demographic and pathological characteristics of the patients in the two cohorts.
- Risk factors for lymph node metastasis
Univariate analysis of the training cohort data considered age at diagnosis, marital status, race, tumor size, histological type, myometrial invasion, cervical stromal invasion, and tumor grade as potential risk factors for LNM. Multiple logistic regression analysis then identified age at diagnosis, tumor size, histological type, myometrial invasion, cervical stromal invasion, and tumor grade as independent risk factors for LNM (Table 2). Among these independent risk factors, cervical stromal invasion was the strongest predictor (OR=6.09, 95% CI 5.50-6.76, P<0.001). The other predictors of LNM were age (OR=1.01, 95% CI 1.01-1.02, P<0.001), tumor size 2-5 cm (OR=1.51, 95% CI 1.34-1.70, P<0.001), tumor size 5-10 cm (OR=2.71, 95% CI 2.39-3.06, P<0.001), tumor size ≥10 cm (OR=3.38, 95% CI 2.90-3.95, P<0.001), histological type SEA (OR=1.78, 95% CI 1.61-1.97, P<0.001), histological types other than EEA or SEA (OR=1.33, 95% CI 1.18-1.50, P<0.001), positive myometrial invasion (OR=2.77, 95% CI 2.18-3.52, P<0.001), tumor grade 2 (OR=2.27, 95% CI 2.04-2.53, P<0.001), and tumor grade 3 (OR=4.68, 95% CI 4.21-5.20, P<0.001). Analysis of the validation cohort revealed the same independent risk factors for LNM.
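For readers less familiar with how odds ratios relate to a logistic regression fit, the following minimal sketch shows the standard conversion between a coefficient with its standard error and an OR with a Wald 95% CI. The function names are hypothetical, and the assumption that the reported intervals are Wald intervals (z = 1.96) is ours rather than something stated in the text.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient.

    Assumes a Wald interval: exp(beta +/- z * se).
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_report(or_value, lower, upper, z=1.96):
    """Back-calculate (beta, se) from a reported OR and its 95% CI."""
    beta = math.log(or_value)
    # On the log scale, a Wald interval has total width 2 * z * se.
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Reported values for cervical stromal invasion (OR=6.09, 95% CI 5.50-6.76).
beta, se = coefficient_from_report(6.09, 5.50, 6.76)
or_value, lower, upper = odds_ratio_with_ci(beta, se)
print(f"beta={beta:.3f}, SE={se:.3f}, OR={or_value:.2f} ({lower:.2f}-{upper:.2f})")
```

The reconstructed interval should land close to the reported one, although small differences are expected because the published values are rounded.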
- Design and validation of the nomogram
Based on the independent risk factors identified in the multivariate regression analysis, we designed a nomogram to predict LNM in EC patients (Figure 2). Among the variables in the predictive model, cervical stromal invasion was the most important predictive factor in the nomogram. Points were assigned to each variable in the nomogram, with the total score corresponding to a predicted probability of LNM. The performance of the final model was assessed through discrimination and calibration analyses. In the training cohort, the nomogram had an AUC of 0.848 (95% CI: 0.843-0.853), compared with 0.806 (95% CI: 0.801-0.812) for the Mayo criteria (P<0.01; Figure 3A). In the validation cohort, the AUC was 0.847 (95% CI: 0.840-0.857) for the nomogram and 0.804 (95% CI: 0.796-0.813) for the Mayo criteria (P<0.01; Figure 3B). The nomogram therefore showed discrimination superior to that of the Mayo criteria in both the training and validation cohorts. The calibration curves for predicting LNM demonstrated that the nomogram was well calibrated in both the training (Figure 4A) and validation cohorts (Figure 4B).
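As an illustration of what the AUC measures, the sketch below computes it as the probability that a randomly chosen LNM-positive patient receives a higher score than a randomly chosen LNM-negative patient (the Mann-Whitney formulation). This is not the study's analysis code; the function name is hypothetical and the scores and labels are toy values invented for the example.

```python
def auc_mann_whitney(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (not from the study): predicted probabilities and LNM status.
toy_scores = [0.10, 0.40, 0.35, 0.80]
toy_labels = [0, 0, 1, 1]
print(auc_mann_whitney(toy_scores, toy_labels))
```

The pairwise loop is quadratic in cohort size, so for cohorts as large as those described here a rank-based implementation from a statistics library would be the practical choice.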
- Optimal threshold of the nomogram
Each patient was assigned a score using the calibrated nomogram. An optimal cut-off value of 200 points was then selected from the ROC curve to maximize sensitivity and specificity. Patients in the training and validation cohorts were divided into low-risk (score < 200 points) and high-risk (score ≥ 200 points) groups. The performance of this nomogram stratification was compared with that of the Mayo criteria for predicting LNM. The nomogram showed better discrimination than the Mayo criteria in both the training (nomogram: AUC=0.754, 95%CI; Mayo: AUC=0.716, 95%CI; P<0.01; Figure 5A) and validation cohorts (nomogram: AUC=0.751, 95%CI; Mayo: AUC=0.714, 95%CI; P<0.01; Figure 5B). In the training cohort, the LNM rates were 4.80% and 34.0% in the low-risk and high-risk groups, respectively, according to the nomogram, and 5.7% and 26.4%, respectively, according to the Mayo criteria (Table 4). In the validation cohort, the predicted rates of LNM were 4.8% and 33.7% in the low-risk and high-risk groups, respectively, according to the nomogram, and 5.6% and 25.9%, respectively, according to the Mayo criteria (Table 5).
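One common way to choose a cut-off that balances sensitivity and specificity is to maximize Youden's J (sensitivity + specificity - 1) over candidate thresholds. The text does not name the criterion used, so treating it as Youden's J is an assumption; the function name is hypothetical and the scores below are toy values, not study data.

```python
def youden_cutoff(scores, labels):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    A patient is classified as high-risk when score >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy nomogram scores (points) and LNM status, invented for illustration.
toy_points = [120, 150, 180, 210, 230, 260]
toy_lnm = [0, 0, 0, 1, 0, 1]
cutoff, j = youden_cutoff(toy_points, toy_lnm)
risk_groups = ["high" if s >= cutoff else "low" for s in toy_points]
print(cutoff, j, risk_groups)
```

Other criteria, such as the point closest to the top-left corner of the ROC curve, can yield a different cut-off on the same data.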
- Decision curve analysis
The decision curve analysis results for the nomogram and the Mayo criteria are shown in Supplementary Figure 1A (training cohort) and Supplementary Figure 1B (validation cohort). For threshold probabilities between 0% and nearly 60%, the nomogram showed a positive net benefit in both cohorts.
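Net benefit in decision curve analysis is conventionally defined as the true-positive rate per patient minus the false-positive rate per patient weighted by the odds of the threshold probability. The sketch below applies that standard definition; the function names are hypothetical and the probabilities and outcomes are toy values, not study data.

```python
def net_benefit(probs, labels, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold)
    """
    if not 0 <= threshold < 1:
        raise ValueError("threshold must be in [0, 1)")
    n = len(labels)
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Toy predicted probabilities and outcomes, invented for illustration.
toy_probs = [0.1, 0.2, 0.6, 0.7]
toy_outcomes = [0, 1, 0, 1]
for pt in (0.2, 0.5):
    print(pt, net_benefit(toy_probs, toy_outcomes, pt),
          net_benefit_treat_all(toy_outcomes, pt))
```

A model is generally considered useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.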