Characteristics of the training and validation cohorts
In total, 596 patients with SPNs were included in this retrospective study, of whom 406 came from Sun Yat-sen University Cancer Center; their clinical, CT imaging, and laboratory data are presented in Supplementary Table 1. These patients were randomly divided into a training cohort (n = 273) and an internal validation cohort (n = 133), and another 190 patients from Henan Tumor Hospital were used for external validation (Supplementary Table 2). The mean (SD) age of patients in the training cohort was 58.1 (10.0) years; 189 patients (69.2%) were men, and 177 (64.8%) were diagnosed with malignant SPNs (MSPNs), comprising 148 (83.6%) adenocarcinomas, 20 (11.3%) squamous cell carcinomas, and 9 (5.1%) other histologies. The corresponding numbers were 70 (81.4%), 13 (15.1%), and 3 (3.5%) in the internal validation cohort, and 102 (81.6%), 16 (12.8%), and 7 (5.6%) in the external validation cohort.
Predictor selection
LASSO logistic regression was used to select potential predictors of SPN malignancy. Figure 1A shows the coefficient trajectory of each variable. Ten-fold cross-validation was employed to tune the penalty parameter, and the confidence interval under each λ is presented in Figure 1B. According to the 1-SE criterion, λ = 0.037 was selected as the optimal value; at this penalty, 15 of the 63 candidate variables identified in the training cohort retained non-zero coefficients (gender, age, fever, chest pain, diameter, calcification, pleural thickening, pleural adhesion, VC, FEV1, DLCO1, LSR, TBA, CEA, and NSE), and their coefficients are presented in Figure 1C. The clinical and laboratory data for these selected predictors in the training, internal validation, and external validation cohorts are presented in Table 1.
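As a minimal illustration of this selection step, the following Python sketch fits an L1-penalized (LASSO) logistic regression over a grid of λ values with 10-fold cross-validation and applies the 1-SE rule; the data objects `X` (a data frame of the 63 candidate variables) and `y` (the malignancy label), as well as the λ grid, are placeholders rather than the study's actual data or software.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

lambdas = np.logspace(-3, 0, 50)                 # candidate penalty values (illustrative grid)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

mean_dev, se_dev = [], []
for lam in lambdas:
    # scikit-learn parameterizes the L1 penalty as C = 1 / lambda
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(penalty="l1", C=1.0 / lam, solver="liblinear"),
    )
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_log_loss")
    mean_dev.append(-scores.mean())              # mean cross-validated deviance
    se_dev.append(scores.std(ddof=1) / np.sqrt(cv.get_n_splits()))

mean_dev, se_dev = np.asarray(mean_dev), np.asarray(se_dev)
best = mean_dev.argmin()
# 1-SE rule: largest lambda whose CV deviance is within one SE of the minimum
lam_1se = lambdas[mean_dev <= mean_dev[best] + se_dev[best]].max()

final = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", C=1.0 / lam_1se, solver="liblinear"),
).fit(X, y)
coefs = final.named_steps["logisticregression"].coef_.ravel()
selected = [name for name, c in zip(X.columns, coefs) if c != 0]
```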
Construction and evaluation of the novel prediction model
To estimate each individual patient's malignancy risk, a risk score was calculated with the following formula:
Risk score = -1.014 - (0.131 × gender) + (0.025 × age) - (0.281 × fever) - (0.257 × chest pain) + (0.306 × diameter) - (0.793 × calcification) - (0.188 × pleural thickening) + (0.18 × pleural adhesion) - (0.227 × VC) - (0.083 × FEV1) + (0.036 × DLCO1) - (0.045 × LSR) - (0.006 × TBA) + (0.026 × CEA) + (0.037 × NSE).
Subsequently, the probability of malignancy was calculated as probability (P) = e^(risk score) / (1 + e^(risk score)), where e is the base of the natural logarithm. Continuous variables took the values recorded in the medical records; gender = 1 if the patient was male (otherwise 0); and fever, chest pain, calcification, pleural thickening, and pleural adhesion were each coded 1 if present and 0 otherwise.
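The formula can be applied directly, as in the following sketch; the coefficient values are taken from the formula above, while the example patient is entirely hypothetical.

```python
import math

# Coefficients copied from the risk-score formula; binary variables are 0/1,
# continuous variables take their recorded values.
COEF = {
    "intercept": -1.014, "gender": -0.131, "age": 0.025, "fever": -0.281,
    "chest_pain": -0.257, "diameter": 0.306, "calcification": -0.793,
    "pleural_thickening": -0.188, "pleural_adhesion": 0.18, "VC": -0.227,
    "FEV1": -0.083, "DLCO1": 0.036, "LSR": -0.045, "TBA": -0.006,
    "CEA": 0.026, "NSE": 0.037,
}

def malignancy_probability(features: dict) -> float:
    """Return P = e^(risk score) / (1 + e^(risk score))."""
    score = COEF["intercept"] + sum(
        COEF[k] * features[k] for k in COEF if k != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical patient, for illustration only (values are not from the study)
patient = {"gender": 1, "age": 60, "fever": 0, "chest_pain": 0,
           "diameter": 2.5, "calcification": 0, "pleural_thickening": 0,
           "pleural_adhesion": 1, "VC": 3.2, "FEV1": 2.5, "DLCO1": 20.0,
           "LSR": 1.0, "TBA": 4.0, "CEA": 3.0, "NSE": 12.0}
print(round(malignancy_probability(patient), 3))
```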
Finally, the calibration of the model was assessed with the Hosmer-Lemeshow (HL) test, which indicated good calibration (χ² = 10.673, P = 0.221; Supplementary Figure 1A). The AUC of the novel model was 0.799 (95% CI: 0.746-0.845). A probability of 0.64 was ultimately selected as the cut-off point, with P > 0.64 considered to indicate malignant disease. In the training cohort, the sensitivity of the model was 70.06% (62.7%-76.7%), the specificity 77.08% (67.4%-85.0%), the positive likelihood ratio (LR+) 3.06, and the negative likelihood ratio (LR-) 0.39.
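For reference, the sensitivity, specificity, and likelihood ratios at the 0.64 cut-off can be computed as in this sketch; the label and probability arrays are placeholders.

```python
import numpy as np

def classification_metrics(y_true, prob, cutoff=0.64):
    """Sensitivity, specificity, LR+ and LR- at a probability cut-off.

    y_true is 1 for malignant and 0 for benign; prob holds the predicted
    probabilities of malignancy."""
    y_true = np.asarray(y_true)
    pred = (np.asarray(prob) > cutoff).astype(int)
    tp = np.sum((pred == 1) & (y_true == 1))
    fn = np.sum((pred == 0) & (y_true == 1))
    tn = np.sum((pred == 0) & (y_true == 0))
    fp = np.sum((pred == 1) & (y_true == 0))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return sens, spec, sens / (1 - spec), (1 - sens) / spec
```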
Validation of the novel prediction model
The performance of the novel prediction model was validated in the internal and external validation cohorts. Using the formula derived in the training cohort, a risk score and probability of malignancy were calculated for each patient in the validation sets, and the discrimination and calibration of the model were assessed with ROC analysis, calibration curves, and the HL test. In the internal validation cohort, the AUC was 0.803 (95% CI: 0.726-0.867); at the cut-off probability of 0.64, the sensitivity, specificity, LR+, and LR- were 68.60%, 74.47%, 2.69, and 0.42, respectively. In the external validation cohort, the AUC was 0.719 (95% CI: 0.650-0.782), and the sensitivity, specificity, LR+, and LR- were 76.80%, 49.23%, 1.51, and 0.47, respectively. In addition, the calibration curves and HL test indicated that the new model predicted MSPNs accurately in both the internal validation cohort (χ² = 8.127, P = 0.421; Supplementary Figure 1B) and the external validation cohort (χ² = 12.04, P = 0.149; Supplementary Figure 1C).
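The HL statistic reported above compares observed and expected event counts across groups of predicted risk; a minimal implementation, assuming deciles of predicted risk and a chi-square reference distribution with g − 2 degrees of freedom, might look as follows (placeholder arrays).

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y_true, prob, groups=10):
    """Hosmer-Lemeshow chi-square statistic and P value.

    Patients are split into `groups` bins ordered by predicted risk, and the
    statistic sums (observed - expected)^2 / variance over the bins."""
    y_true, prob = np.asarray(y_true, float), np.asarray(prob, float)
    order = np.argsort(prob)
    stat = 0.0
    for idx in np.array_split(order, groups):
        obs, exp, n = y_true[idx].sum(), prob[idx].sum(), len(idx)
        stat += (obs - exp) ** 2 / (exp * (1 - exp / n))
    return stat, chi2.sf(stat, df=groups - 2)
```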
Assessment of the performance of our model, the PKUPH model, and the Mayo model for SPN screening using ROC analysis, DCA, NRI, and IDI
The data from the training, internal validation, and external validation cohorts were entered into our proposed model, the PKUPH model, and the Mayo model to generate the respective ROC curves (Figure 2 and Table 2). In the training cohort, the AUCs of the three models were 0.799, 0.616, and 0.524, respectively; the AUC of our model was significantly higher than those of the PKUPH and Mayo models (both P < 0.001). In the internal validation cohort, the AUCs were 0.803, 0.725, and 0.691, respectively, and the AUC of our model was significantly higher than those of the PKUPH model (P = 0.042) and the Mayo model (P = 0.026). In the external validation cohort, the AUCs were 0.719, 0.641, and 0.575, respectively, and the AUC of our model was again significantly higher than those of the PKUPH model (P = 0.049) and the Mayo model (P = 0.004).
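Such pairwise comparisons of correlated AUCs can be made, for example, with a paired bootstrap test of the AUC difference; the sketch below uses a normal approximation and placeholder arrays, and is not necessarily the exact test used in the study (DeLong's test is a common alternative).

```python
import numpy as np
from scipy.stats import norm
from sklearn.metrics import roc_auc_score

def compare_aucs_bootstrap(y, prob_a, prob_b, n_boot=2000, seed=0):
    """Approximate two-sided test of the difference between two AUCs
    computed on the same patients (paired bootstrap standard error)."""
    rng = np.random.default_rng(seed)
    y, prob_a, prob_b = map(np.asarray, (y, prob_a, prob_b))
    observed = roc_auc_score(y, prob_a) - roc_auc_score(y, prob_b)
    diffs = []
    while len(diffs) < n_boot:
        idx = rng.integers(0, len(y), len(y))
        if y[idx].min() == y[idx].max():        # resample must contain both classes
            continue
        diffs.append(roc_auc_score(y[idx], prob_a[idx]) -
                     roc_auc_score(y[idx], prob_b[idx]))
    z = observed / np.std(diffs, ddof=1)
    return observed, 2 * norm.sf(abs(z))
```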
DCA was employed to evaluate the clinical utility of the three models in the training, internal validation, and external validation cohorts (Figure 3). The x-axis of the decision curve is the threshold probability used to classify patients as having MSPNs or BSPNs, and the y-axis shows the net clinical benefit of decisions made at that threshold. The treat-all and treat-none schemes served as references. Our model (red) showed a higher overall net benefit than the PKUPH model (black) and the Mayo model (blue) in the training, internal validation, and external validation cohorts, indicating reasonably good clinical utility across the three datasets.
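The net benefit plotted on the y-axis follows the standard decision-curve definition, as in the following sketch (placeholder arrays); the treat-all reference curve is prevalence − (1 − prevalence) × pt / (1 − pt), and treat-none is zero everywhere.

```python
import numpy as np

def net_benefit(y_true, prob, thresholds):
    """Decision-curve net benefit of a model over a grid of threshold
    probabilities: TP/n - FP/n * pt / (1 - pt)."""
    y_true, prob = np.asarray(y_true), np.asarray(prob)
    n = len(y_true)
    benefits = []
    for pt in thresholds:
        treated = prob >= pt
        tp = np.sum(treated & (y_true == 1))
        fp = np.sum(treated & (y_true == 0))
        benefits.append(tp / n - fp / n * pt / (1 - pt))
    return np.asarray(benefits)
```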
The improvement in predictive accuracy of our proposed model relative to the PKUPH and Mayo models was estimated by calculating the NRI and IDI in the training, internal validation, and external validation cohorts (Table 3). Comparing our model with the PKUPH and Mayo models, the NRIs in the training, internal validation, and external validation cohorts were 0.301 (P < 0.001) and 0.469 (P < 0.001), 0.155 (P = 0.094) and 0.454 (P < 0.001), and 0.002 (P = 0.980) and 0.063 (P < 0.001), respectively. The corresponding IDIs were 0.011 (P = 0.679) and 0.123 (P < 0.001), -0.035 (P = 0.326) and 0.119 (P < 0.001), and -0.042 (P = 0.198) and 0.246 (P < 0.001), respectively. These results indicate that the new model can compensate for the deficiencies of the two existing models in predicting MSPNs.
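For clarity, the category-free NRI and the IDI can be computed from the predicted probabilities of the old and new models as in this sketch (placeholder arrays; the exact NRI variant used in the study may differ).

```python
import numpy as np

def nri_idi(y_true, prob_old, prob_new):
    """Category-free NRI and IDI of a new model versus an old one.

    NRI sums the net proportion of events whose predicted risk increases and
    of non-events whose predicted risk decreases; IDI is the change in the
    mean risk difference between events and non-events."""
    y = np.asarray(y_true).astype(bool)
    old, new = np.asarray(prob_old), np.asarray(prob_new)
    up, down = new > old, new < old
    nri = (up[y].mean() - down[y].mean()) + (down[~y].mean() - up[~y].mean())
    idi = (new[y].mean() - new[~y].mean()) - (old[y].mean() - old[~y].mean())
    return nri, idi
```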
Comparison of the sensitivity, specificity, positive likelihood ratio, and negative likelihood ratio of the three models analyzed in this study
The sensitivity, specificity, LR+, and LR- of the three models were compared in the three independent cohorts (Supplementary Table 3). The threshold of our model was 0.64, and the thresholds of the PKUPH and Mayo models were taken from the literature as 0.463 and 0.10, respectively. In the training cohort, our model achieved a sensitivity of 70.06% (95% CI: 62.7%-76.7%), specificity of 77.08% (95% CI: 67.4%-85.0%), LR+ of 3.06 (95% CI: 2.6-3.5), and LR- of 0.39 (95% CI: 0.3-0.6); the PKUPH model achieved a sensitivity of 85.88% (95% CI: 79.9%-90.6%), specificity of 30.21% (95% CI: 21.3%-40.4%), LR+ of 1.23 (95% CI: 0.9-1.7), and LR- of 0.47 (95% CI: 0.3-0.7); and the Mayo model achieved a sensitivity of 22.03% (95% CI: 16.2%-28.9%), specificity of 79.17% (95% CI: 69.7%-86.8%), LR+ of 1.06 (95% CI: 0.8-1.4), and LR- of 0.98 (95% CI: 0.7-1.5). Thus, the specificity, LR+, and LR- of our model were better than those of the PKUPH model, although its sensitivity was lower, while the sensitivity, LR+, and LR- of our model were better than those of the Mayo model, although its specificity was worse. Results in the internal and external validation cohorts were not entirely consistent, and comparison of the three models at their respective thresholds across the three cohorts was inconclusive: each model has its own merits and demerits in predicting MSPNs.
Building and validating the combined predictive nomogram
To combine the merits of each model in predicting MSPNs, a nomogram integrating our model, the PKUPH model, and the Mayo model was constructed to predict the malignancy of SPNs in the training, internal validation, and external validation cohorts (Figure 4A, B, and C, respectively). Each model score was assigned a number of points: for example, locate the risk score of our model and draw a line straight up to the "Points" axis to determine the points associated with that score; repeat the process for each model, sum the points, and locate the total on the "Total Points" axis; finally, draw a line straight down to read the patient's risk of malignancy. The AUC of the combined nomogram was 0.806 in the training cohort, 0.819 in the internal validation cohort, and 0.7193 in the external validation cohort, higher than that of each model alone. Calibration curves for the probability of malignancy were then used to assess the agreement between predicted and observed outcomes in the training, internal validation, and external validation cohorts (Figure 4D, E, and F, respectively) and showed good agreement between the nomogram predictions and the actual observations. Together, these results demonstrate improved discrimination of SPNs with the combined nomogram.
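Conceptually, the combined nomogram corresponds to a logistic regression that takes the three models' predicted probabilities as covariates, with each covariate's point scale proportional to its coefficient times the covariate's range; a minimal sketch, with placeholder arrays, is shown below.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# prob_ours, prob_pkuph, prob_mayo and y are placeholder arrays standing in
# for the three models' predicted probabilities and the malignancy labels.
scores = np.column_stack([prob_ours, prob_pkuph, prob_mayo])
combined = LogisticRegression().fit(scores, y)
combined_prob = combined.predict_proba(scores)[:, 1]
# Nomogram points for each covariate are proportional to coefficient * range,
# rescaled so that the largest single contribution equals 100 points.
```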
The correlation between the novel prediction, PKUPH, and Mayo models
Figure 5 and Supplementary Table 4 show the correlations between the novel prediction model, the PKUPH model, and the Mayo model in the training cohort (A), internal validation cohort (B), and external validation cohort (C). Pearson's correlation coefficients (PCCs) were computed to determine the interrelationships among the three models. The new prediction model was significantly and positively correlated with the PKUPH model (PCC: training cohort 0.571, P < 0.001; internal validation cohort 0.689, P < 0.001; external validation cohort 0.645, P < 0.001) and the Mayo model (PCC: training cohort 0.213, P < 0.001; internal validation cohort 0.373, P < 0.001; external validation cohort 0.278, P < 0.001), indicating that the predictions of our model are credible.
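Such pairwise correlations and their P values can be obtained directly with scipy, as in this sketch using placeholder probability arrays.

```python
from scipy.stats import pearsonr

# prob_ours, prob_pkuph and prob_mayo are placeholders for the predicted
# probabilities of the three models in a given cohort.
for name, other in [("PKUPH", prob_pkuph), ("Mayo", prob_mayo)]:
    pcc, p = pearsonr(prob_ours, other)        # coefficient and two-sided P value
    print(f"new model vs {name}: PCC = {pcc:.3f}, P = {p:.3g}")
```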