Findings were considered suggestive of pulmonary nodules if they measured 5–30 mm. The nodules appeared singly or in clusters in the right and/or left lung field. They were calcified or non-calcified and were distributed across the upper, mid, or lower zones of either lung.
According to the readers who established the ground truth, 103 of the 308 chest radiographs were positive for pulmonary nodules and the remaining 205 were negative. Table 1 summarizes the number of chest radiographs, the number of nodules per radiograph, and the radiograph projections.
Table 1
Patient information and nodule characteristics from the external test dataset
Characteristic | Numbers |
Patient information |
No. of chest radiographs | 308 |
No. of chest radiographs with nodules | 103 |
No. of chest radiographs without nodules | 205 |
No. of nodules per chest radiograph |
One nodule | 85 |
Two nodules | 11 |
Three nodules | 3 |
Four nodules | 2 |
Five nodules | 2 |
Radiograph projection |
Postero-Anterior (PA) | 220 |
Antero-Posterior (AP) | 88 |
Standalone performance of DxNodule AI Screen:
In standalone detection of pulmonary nodules, the model produced 80 true positives, 180 true negatives, 25 false positives, and 23 false negatives (Fig. 2). It achieved an accuracy of 0.83 (0.80, 0.88), a sensitivity of 0.78 (0.69, 0.85), a specificity of 0.88 (0.83, 0.92), and an AUC of 0.905 (0.87, 0.94). Table 2 summarizes the performance metrics of the DL model in the detection of lung nodules.
Table 2
Performance metrics of DxNodule AI Screen
S.No. | Metrics | Value [95% CI] |
1 | Accuracy | 0.83 [0.80, 0.88] |
2 | Sensitivity | 0.78 [0.69, 0.85] |
3 | Specificity | 0.88 [0.83, 0.92] |
4 | F1 score | 0.77 [0.70, 0.83] |
5 | NPV | 0.89 [0.84, 0.93] |
6 | PPV | 0.76 [0.68, 0.84] |
7 | AUC | 0.905 [0.87, 0.94] |
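The point estimates in Table 2 can be recomputed from the four confusion-matrix counts reported above. The following is a minimal sketch, not the authors' code; the function name `confusion_metrics` is hypothetical. It computes both overall and balanced accuracy. The reported accuracy of 0.83 agrees with the balanced form, (sensitivity + specificity) / 2, whereas overall accuracy from the same counts is about 0.84. Reading the reported value as balanced accuracy is therefore an inference from the numbers, not something stated at this point in the text.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Point estimates from confusion-matrix counts (no confidence intervals)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Assumption: the tabulated "Accuracy" is this balanced form.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


# Standalone counts reported in the text: 80 TP, 180 TN, 25 FP, 23 FN.
standalone = confusion_metrics(tp=80, tn=180, fp=25, fn=23)
for name, value in standalone.items():
    print(f"{name}: {value:.3f}")
```

The confidence intervals and the AUC cannot be recovered from the counts alone, so they are not computed here.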
Comparison between the reference standard and DxNodule AI Screen:
Bland-Altman (B-A) analysis was used to assess agreement between the reference standard (ground truth) and DxNodule AI Screen in identifying positive nodules. A mean difference close to zero indicates close agreement between DxNodule AI Screen and the reference standard. The analysis showed a mean difference of 0.036 (95% CI: -1.307, 1.379) for the test dataset (Fig. 3).
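For readers unfamiliar with the method, a Bland-Altman summary can be sketched as follows. The text does not state which per-radiograph quantity was compared, so this example assumes nodule counts per radiograph, and the two short lists are hypothetical illustration data, not study data. The reported interval is symmetric about the mean difference, which would be consistent with the usual mean ± 1.96 SD limits of agreement. That reading is also an assumption.

```python
from statistics import mean, stdev


def bland_altman(reference, test):
    """Return (mean difference, lower limit, upper limit) using mean +/- 1.96 SD."""
    diffs = [t - r for r, t in zip(reference, test)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread


# Hypothetical nodule counts per radiograph (illustration only).
ref_counts = [0, 1, 0, 2, 1, 0, 3, 0]
ai_counts = [0, 1, 1, 2, 0, 0, 3, 1]

bias, lower, upper = bland_altman(ref_counts, ai_counts)
print(f"mean difference {bias:.3f}, limits ({lower:.3f}, {upper:.3f})")
```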
Observer performance test:
All 11 readers assessed the radiographs both with and without the aid of DxNodule AI Screen. On average, unaided readers made 71.42 true positive, 178.58 true negative, 26.42 false positive, and 31.58 false negative identifications. With AI-aided interpretation, the mean numbers of false negative (24.64) and false positive (22.18) identifications decreased, and the mean numbers of true negative (182.82) and true positive (78.36) identifications increased (Table 3).
Table 3
Distribution of lung nodules with DxNodule AI Screen-unaided and DxNodule AI Screen-aided sessions
Reader | Unaided TP | Unaided TN | Unaided FP | Unaided FN | Aided TP | Aided TN | Aided FP | Aided FN |
R1 | 90 | 143 | 62 | 13 | 88 | 170 | 35 | 15 |
R2 | 24 | 184 | 21 | 79 | 84 | 185 | 20 | 19 |
R3 | 84 | 140 | 65 | 19 | 89 | 170 | 35 | 14 |
R4 | 83 | 191 | 14 | 20 | 77 | 192 | 13 | 26 |
R5 | 81 | 190 | 15 | 22 | 70 | 200 | 5 | 33 |
R6 | 53 | 196 | 9 | 50 | 56 | 202 | 3 | 47 |
R7 | 73 | 200 | 5 | 30 | 81 | 195 | 10 | 22 |
R8 | 95 | 126 | 79 | 8 | 97 | 129 | 76 | 6 |
R9 | 72 | 193 | 12 | 31 | 81 | 194 | 11 | 22 |
R10 | 72 | 199 | 6 | 31 | 59 | 202 | 3 | 44 |
R11 | 50 | 201 | 4 | 53 | 80 | 172 | 33 | 23 |
Mean (SD) | 71.42 (20.02) | 178.58 (26.50) | 26.42 (26.50) | 31.58 (20.02) | 78.36 (12.44) | 182.82 (21.76) | 22.18 (21.76) | 24.64 (12.44) |
With the aid of DxNodule AI Screen, the mean specificity, balanced accuracy, and PPV across the 11 readers improved significantly compared with unaided interpretation (p < 0.05), whereas the improvements in NPV (p = 0.054) and sensitivity (p = 0.208) did not reach statistical significance. The mean sensitivity and specificity of the readers improved from 0.69 (0.55, 0.82) and 0.87 (0.78, 0.96) in the unaided session to 0.76 (0.68, 0.84) and 0.89 (0.82, 0.96), respectively, in the aided session. The mean balanced accuracy, PPV, and NPV of the readers improved from 0.78 (0.72, 0.84), 0.77 (0.65, 0.88), and 0.86 (0.81, 0.90) in the unaided session to 0.83 (0.80, 0.85), 0.82 (0.73, 0.90), and 0.89 (0.86, 0.92), respectively, in the aided session (Table 4).
Table 4
Sensitivity, Specificity, Accuracy, NPV, and PPV of unaided and DxNodule AI Screen-aided interpretation modes for pulmonary nodule detection
Reader | Unaided Sensitivity | Unaided Specificity | Unaided Accuracy | Unaided NPV | Unaided PPV | Aided Sensitivity | Aided Specificity | Aided Accuracy | Aided NPV | Aided PPV |
R1 | 0.874 | 0.698 | 0.786 | 0.917 | 0.592 | 0.854 | 0.829 | 0.842 | 0.919 | 0.715 |
R2 | 0.233 | 0.898 | 0.565 | 0.700 | 0.533 | 0.816 | 0.902 | 0.859 | 0.907 | 0.808 |
R3 | 0.816 | 0.683 | 0.749 | 0.881 | 0.564 | 0.864 | 0.829 | 0.847 | 0.924 | 0.718 |
R4 | 0.806 | 0.932 | 0.869 | 0.905 | 0.856 | 0.748 | 0.937 | 0.842 | 0.881 | 0.856 |
R5 | 0.786 | 0.927 | 0.857 | 0.896 | 0.844 | 0.680 | 0.976 | 0.828 | 0.858 | 0.933 |
R6 | 0.515 | 0.956 | 0.735 | 0.797 | 0.855 | 0.544 | 0.985 | 0.765 | 0.811 | 0.949 |
R7 | 0.709 | 0.976 | 0.842 | 0.870 | 0.936 | 0.786 | 0.951 | 0.869 | 0.899 | 0.890 |
R8 | 0.922 | 0.615 | 0.768 | 0.940 | 0.546 | 0.942 | 0.629 | 0.786 | 0.956 | 0.561 |
R9 | 0.699 | 0.941 | 0.820 | 0.862 | 0.857 | 0.786 | 0.946 | 0.866 | 0.898 | 0.880 |
R10 | 0.699 | 0.971 | 0.835 | 0.865 | 0.923 | 0.573 | 0.985 | 0.779 | 0.821 | 0.952 |
R11 | 0.485 | 0.980 | 0.733 | 0.791 | 0.926 | 0.777 | 0.839 | 0.808 | 0.882 | 0.708 |
Mean (95% CI) | 0.69 (0.55, 0.82) | 0.87 (0.78, 0.96) | 0.78 (0.72, 0.84) | 0.86 (0.81, 0.90) | 0.77 (0.65, 0.88) | 0.76 (0.68, 0.84) | 0.89 (0.82, 0.96) | 0.83 (0.80, 0.85) | 0.89 (0.86, 0.92) | 0.82 (0.73, 0.90) |
p-value | | | | | | 0.208 | 0.021 | 0.009 | 0.054 | 0.008 |
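The aided-session summary row of Table 4 can be traced back to the counts in Table 3. The sketch below is illustrative and not the authors' analysis code. It assumes that each reader's sensitivity and specificity are TP/103 and TN/205, that the "Accuracy" column is balanced accuracy, and that the 95% CI of the mean is a Student t interval across the 11 readers (critical value 2.228 for 10 degrees of freedom). The helper name `mean_ci` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

N_POS, N_NEG = 103, 205  # nodule-positive and nodule-negative radiographs

# Aided-session (TP, TN) for readers R1..R11, copied from Table 3.
aided = [(88, 170), (84, 185), (89, 170), (77, 192), (70, 200), (56, 202),
         (81, 195), (97, 129), (81, 194), (59, 202), (80, 172)]

sens = [tp / N_POS for tp, _ in aided]
spec = [tn / N_NEG for _, tn in aided]
bal_acc = [(s + p) / 2 for s, p in zip(sens, spec)]

T_CRIT_DF10 = 2.228  # two-sided 95% Student t quantile, 10 degrees of freedom


def mean_ci(values, t_crit=T_CRIT_DF10):
    """Mean across readers with a t-based interval (assumed CI method)."""
    m = mean(values)
    half_width = t_crit * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


for name, values in [("sensitivity", sens), ("specificity", spec),
                     ("balanced accuracy", bal_acc)]:
    m, lo, hi = mean_ci(values)
    print(f"{name}: {m:.2f} ({lo:.2f}, {hi:.2f})")
```

Under these assumptions, the sensitivity and specificity lines reproduce the aided values in Table 4, 0.76 (0.68, 0.84) and 0.89 (0.82, 0.96), which suggests the tabulated intervals describe variation across readers rather than across radiographs.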
Readers performed significantly better when aided by DxNodule AI Screen than when unaided. Diagnostic performance in the two sessions was compared using ROC curves. Each reader marked an image as N (no nodule), L (low confidence in detecting a nodule), or H (high confidence in detecting a nodule), and these three labels were converted to scores of 0, 0.5, and 1, respectively. ROC curves were generated from these scores for each reader in both sessions (aided and unaided). To obtain the average ROC curve across readers, the TPR was averaged at every FPR according to the method described by Chen et al. [15]. The average AUC of the readers improved from 0.798 [0.74, 0.86] to 0.846 [0.82, 0.88] when aided by DxNodule AI Screen (p-value = 0.013). Standalone DxNodule AI Screen achieved an AUC of 0.905 [0.87, 0.94] in identifying pulmonary nodules in the test dataset (Fig. 4).
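The scoring and averaging procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [15]. The N/L/H score mapping is taken from the text, while the linear interpolation between operating points, the 101-point FPR grid, and all function names are hypothetical choices. The two readers' labels are invented illustration data.

```python
SCORE = {"N": 0.0, "L": 0.5, "H": 1.0}  # mapping stated in the text


def roc_points(truth, marks):
    """ROC operating points for a three-level rating: thresholds at H, then L-or-H."""
    scores = [SCORE[m] for m in marks]
    n_pos = sum(truth)
    n_neg = len(truth) - n_pos
    points = [(0.0, 0.0)]
    for threshold in (1.0, 0.5):
        tp = sum(1 for y, s in zip(truth, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(truth, scores) if y == 0 and s >= threshold)
        points.append((fp / n_neg, tp / n_pos))
    points.append((1.0, 1.0))
    return points


def auc(points):
    """Trapezoidal area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def tpr_at(points, fpr):
    """Highest TPR reached at a given FPR, interpolating linearly between points."""
    best = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= fpr <= x1:
            y = y1 if x1 == x0 else y0 + (y1 - y0) * (fpr - x0) / (x1 - x0)
            best = max(best, y)
    return best


def average_roc(curves, grid_size=101):
    """Vertical averaging: mean TPR across readers at each FPR on a common grid."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return [(f, sum(tpr_at(c, f) for c in curves) / len(curves)) for f in grid]


# Hypothetical example: 4 positive and 4 negative radiographs, two readers.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
reader_a = roc_points(truth, ["H", "L", "H", "N", "N", "L", "N", "N"])
reader_b = roc_points(truth, ["H", "H", "L", "L", "N", "N", "H", "N"])
mean_curve = average_roc([reader_a, reader_b])
print(auc(reader_a), auc(reader_b), auc(mean_curve))
```

With only three rating levels, each reader contributes at most two operating points between (0, 0) and (1, 1), so the shape of the averaged curve depends heavily on how those points are interpolated.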
Lung nodule detection with the aid of DxNodule AI Screen:
DxNodule AI Screen helped the radiologists identify nodules that were otherwise missed because of factors such as overlying rib or scapular shadows and small nodule size (Fig. 5). It helped readers not only identify radiographs with nodules but also locate the nodules more accurately. Figure 6 shows a clinical case in which, without DxNodule AI Screen, only 1 reader marked both nodules correctly, 7 readers marked only one nodule, 2 readers marked nodules at more than two locations, and 1 reader marked the radiograph as negative. With the help of DxNodule AI Screen, 4 readers correctly located both nodules in the radiograph and there were no false-positive findings.