Basic characteristics
The training set comprised 2249 patients (70%) with a mean age of 42.76 ± 11.71 years (619 males, 1630 females). The validation set comprised 964 patients (30%) with a mean age of 43.09 ± 11.79 years (260 males, 704 females). The test set included 533 patients with a mean age of 41.67 ± 11.21 years. The demographics, ultrasound image characteristics, and intraoperative findings of the patients with PTC in the three datasets are listed in Table 1. Consistency analyses for continuous variables showed that the datasets were balanced overall (P > 0.05).
Establishment of Traditional Nomogram Prediction Model
To investigate the effect of risk factors on total PLNM, we first examined the relationship between clinical characteristics and total PLNM by univariate analysis. In the univariate logistic regression analysis, the following variables were significantly associated with total PLNM: age ≤ 45 years, male gender, irregular tumor border, microcalcification, abundant tumor vascularization, abundant tumor peripheral blood flow, ETE, tumor size > 10 mm, multifocality, location in under, side of position, T stage T3-T4, prelaryngeal LNM, prelaryngeal LNMR, prelaryngeal NLNM, pretracheal LNM, pretracheal LNMR, pretracheal NLNM, prelaryngeal and pretracheal LNM, prelaryngeal and pretracheal LNMR, and prelaryngeal and pretracheal NLNM (Table 1, Figure 2a).
Variables with P < 0.05 in the univariate analysis were then entered into the multivariate analysis using LR forward stepwise selection. The results showed that age ≤ 45 years (OR=1.595, 95% CI=1.31-1.94, P<0.001), male gender (OR=1.75, 95% CI=1.42-2.15, P<0.001), microcalcification (OR=0.89, 95% CI=0.82-0.96, P=0.002), abundant tumor peripheral blood flow (OR=2.03, 95% CI=1.28-3.23, P=0.003), ETE (OR=1.88, 95% CI=1.22-2.88, P=0.004), tumor size > 10 mm (OR=2.06, 95% CI=1.68-2.51, P<0.001), multifocality (OR=1.51, 95% CI=1.16-1.96, P=0.002), prelaryngeal LNM (OR=1.51, 95% CI=1.08-2.11, P=0.017), pretracheal LNM (OR=3.17, 95% CI=1.97-5.11, P<0.001), and prelaryngeal and pretracheal LNM (OR=2.08, 95% CI=1.23-3.52, P=0.006) were significantly associated with total PLNM in PTC patients (Table 2).
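The odds ratios and confidence intervals above follow the usual logistic regression convention, in which OR = exp(β) and the 95% CI is exp(β ± 1.96·SE). The sketch below illustrates that relationship only; it is not the authors' analysis code. The function names are hypothetical, and the coefficient and standard error are back-calculated from the reported tumor size result rather than taken from the fitted model.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a logistic regression coefficient and its standard error
    into an odds ratio with a Wald-type confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def coef_from_reported(or_value, lower, upper, z=1.96):
    """Back-calculate (beta, se) from a reported OR and its 95% CI.
    Assumes the interval is symmetric on the log-odds scale."""
    beta = math.log(or_value)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se

# Reported values for tumor size > 10 mm: OR=2.06, 95% CI=1.68-2.51.
beta, se = coef_from_reported(2.06, 1.68, 2.51)
or_value, lower, upper = odds_ratio_with_ci(beta, se)
print(f"beta={beta:.3f}, SE={se:.3f}")
print(f"OR={or_value:.2f}, 95% CI={lower:.2f}-{upper:.2f}")
```

Because the published bounds are rounded, the round trip reproduces the interval only approximately.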
These variables were included in the nomogram prediction model, which showed good consistency (Figure 2c-d). The AUC values for the training, validation, and test sets were 0.85, 0.844, and 0.769, respectively (Figure 3a-b).
Machine learning model evaluation and visual interpretation using SHAP
A total of 27 variables covering clinical characteristics, ultrasound features, and intraoperative frozen pathology were used to establish 9 machine learning-based preoperative prediction models for total PLNM. In the training set, the XGBoost model achieved the highest AUC (AUC=0.935), the highest F1 score (F1 score=0.816), and the lowest FPR (FPR=0.051) (Table 3 and Figure 4 a1). In the validation set, the XGBoost model achieved a relatively high AUC (AUC=0.857), a relatively high F1 score (F1 score=0.667), and a relatively low FPR (FPR=0.104) (Table 3 and Figure 4 a2). In the test set, the XGBoost model again achieved a relatively high AUC (AUC=0.775), a relatively high F1 score (F1 score=0.610), and a relatively low FPR (FPR=0.161) (Table 3 and Figure 4 a3). We also built 9 machine learning-based preoperative prediction models each for ipsilateral PLNM and contralateral PLNM. In the training set, the XGBoost model achieved the highest AUC, the highest F1 score, and the lowest FPR; in the validation and test sets, it achieved a relatively high AUC, a relatively high F1 score, and a relatively low FPR (supplementary materials). Additionally, the calibration curves and Brier scores indicated low prediction error (Figure 4 c1-c3). The precision-recall curves showed that XGBoost has high precision and accuracy (Figure 4 d1-d3). Compared with the other models, XGBoost performed better on both the ROC curve and the decision curve, indicating better clinical applicability (Figure 4 a1-a3, Figure 4 b1-b3). We therefore identified XGBoost as the optimal predictive model.
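For readers less familiar with the metrics compared above, the following sketch shows how AUC, F1 score, FPR, and the Brier score can be computed from true labels and predicted probabilities. It is an illustration under stated assumptions, not the study's evaluation pipeline: the labels and probabilities are made-up toy values, the 0.5 decision threshold is assumed, and all function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn

def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN)."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN)."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return fp / (fp + tn) if (fp + tn) else 0.0

def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, scores):
    """Mean squared difference between predicted probability and outcome."""
    return sum((s - t) ** 2 for t, s in zip(y_true, scores)) / len(y_true)

# Toy data (hypothetical): 1 = PLNM present, 0 = PLNM absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

auc = roc_auc(y_true, scores)
f1 = f1_score(y_true, y_pred)
fpr = false_positive_rate(y_true, y_pred)
brier = brier_score(y_true, scores)
```

AUC and the Brier score depend only on the predicted probabilities, whereas F1 and FPR also depend on the chosen threshold, so the latter two are comparable across models only when the threshold rule is the same.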
Figure 5a shows the SHAP feature importance for the XGBoost model, with the assessed features ranked from the highest to the lowest mean absolute SHAP value. This ranking indicates the relative impact of each feature on the model’s predictions. The bar chart displays the distribution of lymph node metastasis (LNM) cases (red) and non-metastasis cases (blue) (Figure 5b). The top ten features for total PLNM are: prelaryngeal and pretracheal LNMR, tumor size, pretracheal LNMR, prelaryngeal and pretracheal LNM, age, tumor border, pretracheal LNM, pretracheal NLNM, side of position, and hyperechoic. The top ten features for ipsilateral PLNM are: prelaryngeal and pretracheal LNMR, tumor size, pretracheal LNMR, tumor border, age, prelaryngeal and pretracheal LNM, hyperechoic, pretracheal LNM, BMI, prelaryngeal and pretracheal NLNM, side of position (supplementary materials). The top ten features for contralateral PLNM are: ipsilateral central LNMR, tumor border, prelaryngeal and pretracheal LNMR, ipsilateral central NLNM, ipsilateral pretracheal LNMR, side of position, BMI, location, age, and ipsilateral pretracheal NLNM (supplementary materials). The SHAP summary plot reflects the relationship between the feature values and the predicted probability (Figure 5c). For example, higher values of prelaryngeal and pretracheal LNMR, tumor size, pretracheal LNMR, and prelaryngeal and pretracheal LNM are associated with a higher probability of total PLNM. Conversely, older age (>45 years) and a clear tumor border are associated with a lower probability of total PLNM. The SHAP interaction plot displays the interactions between different features and their impact on the model's predictions (Figure 5d).
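The ranking by average absolute SHAP value described above can be sketched as follows. This is a minimal illustration that assumes per-patient SHAP values are already available as a matrix with one row per patient and one column per feature. The matrix, the feature names, and the function name are hypothetical placeholders, not values from the study.

```python
def rank_by_mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean absolute SHAP value, highest first.

    shap_rows: list of per-patient lists, one SHAP value per feature.
    Returns a list of (feature_name, mean_abs_shap) tuples.
    """
    n = len(shap_rows)
    means = [
        sum(abs(row[j]) for row in shap_rows) / n
        for j in range(len(feature_names))
    ]
    return sorted(zip(feature_names, means), key=lambda item: item[1], reverse=True)

# Hypothetical SHAP matrix: 3 patients x 3 features.
toy_shap = [
    [0.30, -0.10, 0.05],
    [-0.50, 0.20, -0.05],
    [0.40, -0.30, 0.02],
]
toy_names = ["feature_a", "feature_b", "feature_c"]

ranking = rank_by_mean_abs_shap(toy_shap, toy_names)
for name, value in ranking:
    print(f"{name}: {value:.3f}")
```

Taking the absolute value before averaging means that a feature which pushes some patients' predictions up and others down still ranks as influential, which is why the importance plot conveys magnitude and the summary plot is needed for direction.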
The SHAP decision plot (Figure 6) illustrates how each key feature influences the final decision, providing a clear visualization of the relative importance of different features in the model’s predictions. Each colored line represents the prediction for an individual patient, and the sign of the SHAP value for each feature indicates whether that feature pushes the final prediction up or down. At the top, each line intersects the x-axis at the corresponding predicted value, which is the model’s final prediction probability. By locating a specific sample on the decision plot, we can analyze the key factors that led to its classification. This type of interpretable analysis helps open the "black box" of the model, making the prediction process more transparent and enhancing the credibility of the model in real-world applications.
Website-based tool
The contribution heat map and bar plot (Figure 7a-b) provide a visual representation of the contribution of each feature to the model output. The top 10 ranked feature variables are: Prelaryngeal and pretracheal LNMR (SHAP value = 0.4, contribution rate = 21.31%), Tumor size (SHAP value = 0.35, contribution rate = 18.80%), Pretracheal LNMR (SHAP value = 0.27, contribution rate = 14.43%), Prelaryngeal and pretracheal LNM (SHAP value = 0.21, contribution rate = 10.94%), Age (SHAP value = 0.16, contribution rate = 8.45%), Tumor border (SHAP value = 0.14, contribution rate = 7.55%), Pretracheal LNM (SHAP value = 0.1, contribution rate = 5.46%), Pretracheal NLNM (SHAP value = 0.07, contribution rate = 3.68%), Side of position (SHAP value = 0.07, contribution rate = 3.51%), and Calcification and BMI (SHAP value = 0.11, contribution rate = 5.87%), with a gradual decrease in the positive contribution to total PLNM (Figure 7c). A web-based calculator built on the top 10 feature variables is available at http://121.41.36.155:9002/static/html/index.html (Figure 7d). This tool allows clinicians to evaluate the risk of total PLNM and to visualize the interpretation of the results at the individual level.
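The text does not state how the contribution rates are derived from the SHAP values. One plausible reading, sketched below as an assumption, is that each rate is the feature's SHAP value divided by the sum over the listed entries. Applied to the rounded SHAP values reported above, this normalization gives rates close to, but not identical with, the published percentages, presumably because the published SHAP values are rounded to two decimals. The function name is hypothetical.

```python
def contribution_rates(shap_values):
    """Express each SHAP value as a percentage of the total.

    Assumed normalization: rate_i = 100 * value_i / sum(values).
    """
    total = sum(shap_values.values())
    return {name: 100.0 * value / total for name, value in shap_values.items()}

# Rounded SHAP values as reported in the text for total PLNM.
reported_shap = {
    "Prelaryngeal and pretracheal LNMR": 0.40,
    "Tumor size": 0.35,
    "Pretracheal LNMR": 0.27,
    "Prelaryngeal and pretracheal LNM": 0.21,
    "Age": 0.16,
    "Tumor border": 0.14,
    "Pretracheal LNM": 0.10,
    "Pretracheal NLNM": 0.07,
    "Side of position": 0.07,
    "Calcification and BMI": 0.11,
}

rates = contribution_rates(reported_shap)
for name, rate in rates.items():
    print(f"{name}: {rate:.2f}%")
```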