In this study, a deep learning model was trained to detect the pectoral muscle in images from the MIAS database. The trained model was then applied to a separately constructed test set to assess its performance. Figure 3 compares the ground truth from the test data with the results automatically extracted by the trained model.
We performed 5-fold cross validation to confirm that the model was robust to data dependency. In each fold, 20% of the total data was held out as test data, and each data point served exactly once as test data, without duplication. For each fold, the model was evaluated using 4 statistical indices: sensitivity, specificity, accuracy, and Dice similarity coefficient (DSC). The results extracted by the deep learning model were compared pixel-by-pixel with the ground truth data; the true positive (TP), false positive (FP), true negative (TN), and false negative (FN) counts were obtained; and the statistical indices were calculated using the equations below. Across the 5 folds, the mean sensitivity was 95.55%, the mean specificity was 99.88%, the mean accuracy was 99.67%, and the mean DSC was 95.88% (Table 1).
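The four indices follow the standard pixel-wise definitions computed from the TP, FP, TN, and FN counts. The helper below is a minimal sketch of this evaluation, not the authors' implementation; masks are assumed to be flat sequences of 0/1 pixel labels.

```python
def evaluate_masks(pred, truth):
    """Compare a predicted binary mask with a ground-truth mask pixel-by-pixel
    and return the four statistical indices described in the text."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return {
        "sensitivity": tp / (tp + fn),          # TP / (TP + FN)
        "specificity": tn / (tn + fp),          # TN / (TN + FP)
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "dsc": 2 * tp / (2 * tp + fp + fn),     # Dice similarity coefficient
    }
```

In practice the masks would be flattened 2-D image arrays; the per-fold indices in Table 1 are averages of these values over the test images of each fold.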
Table 1
Results of 5-fold cross validation for the deep learning-based pectoral muscle detection method
| | Sensitivity | Specificity | Accuracy | DSC |
|---|---|---|---|---|
| CV1 | 94.01 | 99.91 | 99.57 | 94.50 |
| CV2 | 96.40 | 99.87 | 99.68 | 96.78 |
| CV3 | 95.65 | 99.90 | 99.72 | 96.45 |
| CV4 | 95.97 | 99.83 | 99.66 | 96.18 |
| CV5 | 95.73 | 99.89 | 99.71 | 95.49 |
| Total | 95.55 | 99.88 | 99.67 | 95.88 |

CV, cross validation; DSC, Dice similarity coefficient
The deep learning-based pectoral muscle detection algorithm was assessed using the same method as in our previous study of an image processing-based method using the random sample consensus (RANSAC) algorithm, and the results of the two models were compared. We assessed the differences between the automated detection results of the deep learning model and the manually drawn ground truth data. Concordance ≥ 90% between the automated and manual detection images was defined as “good”, concordance ≥ 50% and < 90% as “acceptable”, and concordance < 50% as “unacceptable”. The previous RANSAC-based method yielded 264 “good” results, whereas the deep learning model yielded 322 “good” results (Table 2). The FP and FN rates of the previous method were 4.51 ± 6.53% and 5.68 ± 8.57%, respectively (Table 3), compared with 2.88 ± 6.05% and 4.27 ± 8.72%, respectively, for the deep learning method.
Table 2
Comparison of performance (categorical) between the deep learning-based pectoral muscle detection method and the image processing-based method using RANSAC
| | Good | Acceptable | Unacceptable |
|---|---|---|---|
| RANSAC method | 264 | 36 | 22 |
| Deep learning method | 322 | 0 | 0 |

RANSAC, random sample consensus
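The three-way categorization used above can be sketched as a simple threshold function (a hypothetical helper, not part of the authors' pipeline), applying the cutoffs defined in the text:

```python
def classify_concordance(concordance_pct):
    """Map a concordance percentage between automated and manual detection
    to the categories defined in the text: >= 90% good, 50-90% acceptable,
    < 50% unacceptable."""
    if concordance_pct >= 90:
        return "good"
    if concordance_pct >= 50:
        return "acceptable"
    return "unacceptable"
```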
Table 3
Comparison of performance (detection accuracy) between the deep learning-based pectoral muscle detection method and the image processing-based method using RANSAC
| Category | RANSAC method (%) | Deep learning method (%) |
|---|---|---|
| FP | 4.51 ± 6.53 | 2.88 ± 6.05 |
| FN | 5.68 ± 8.57 | 4.27 ± 8.72 |
| FP < 5% and FN < 5% | 56.5 | 71.0 |
| 5% < FP < 15%, 5% < FN < 15% | 31.5 | 20.7 |
| 15% < FP, 15% < FN | 12.0 | 8.3 |

FN, false negative; FP, false positive; RANSAC, random sample consensus