Models such as MobileNet, ResNet-152, and ViT achieve excellent performance on lung radiograph classification
MobileNet [28,29], ResNet-152 [30], ViT [31,32], an ordinary convolutional neural network [33], and AlexNet [34] are widely used for image classification and achieve high accuracy when classifying lung X-ray images for disease. Our classification task assigns lung radiographs to two categories, 'disease' and 'no disease' (Figure 1b). The 'disease' category covers common lung diseases such as Atelectasis, Cardiomegaly, Effusion, Infiltration, Mass, Nodule, Pneumonia, and Pneumothorax, alone or in combination. We trained the lightweight MobileNet (Figure 1a), ResNet-152, ViT, the ordinary convolutional neural network, and AlexNet on the dataset provided by the ChestX-ray8 database, and the resulting classifiers achieved high accuracy. The mean area under the receiver operating characteristic curve (AUROC) reaches 0.964 for MobileNet (Table 1), 0.956 for ResNet-152, 0.952 for ViT, 0.761 for the ordinary CNN, and 0.785 for AlexNet; MobileNet attains the highest accuracy. These models also generalize well: applied to external validation sets, the classifiers correctly categorize typical images based on the scores assigned to the X-ray photographs. Overall, these models classify disease very well and maintain a high level of accuracy.
Table 1: AUROC of each model in lung radiograph classification

| Model | MobileNet | ResNet-152 | Vision Transformer | Ordinary CNN | AlexNet |
|-------|-----------|------------|--------------------|--------------|---------|
| AUROC | 0.964     | 0.956      | 0.952              | 0.761        | 0.785   |
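As an illustration of the evaluation pipeline, the following is a minimal sketch (assuming PyTorch, torchvision, and scikit-learn; the data loader and training loop are omitted, and all names are placeholders rather than our exact code) of scoring a MobileNet binary classifier with AUROC:

```python
# Minimal sketch: adapt torchvision's MobileNetV2 for binary chest X-ray
# classification and compute AUROC over a validation loader.
import torch
import torch.nn as nn
from torchvision import models
from sklearn.metrics import roc_auc_score

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = models.mobilenet_v2(weights="IMAGENET1K_V1")
model.classifier[1] = nn.Linear(model.last_channel, 2)  # 'disease' / 'no disease'
model = model.to(device)

@torch.no_grad()
def evaluate_auroc(model, loader):
    """Score every image and compute AUROC against the binary labels."""
    model.eval()
    scores, labels = [], []
    for x, y in loader:
        logits = model(x.to(device))
        # probability assigned to the 'disease' class
        scores.extend(torch.softmax(logits, dim=1)[:, 1].cpu().tolist())
        labels.extend(y.tolist())
    return roc_auc_score(labels, scores)
```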
Adversarial attacks have a large impact on the models
We apply a variety of adversarial attacks [27] to the models to ensure the rigor of the experiments. These methods include the white-box attacks [39] FGSM [35] (Figure 2c), FAB [36], PGD [37], and AutoAttack [38]; the black-box attack [41] AdvDrop [40]; and Square [42], a query-based, gradient-free attack. The effect of these attacks is clearly visible in the images (Figure 2a): after an attack, the amount of noise in the image rises significantly, and the amount of noise depends on the attack strength ε. The larger ε is, the more noise there is (Figure 2d), and this interference inevitably degrades the accuracy of the classifier. We attack each model with the strategies described above at different attack strengths (ε = 0.5e-3, 1.0e-3, 1.5e-3) to measure the impact on each model. Under the FGSM attack, as the attack strength increases, the AUROC of MobileNet falls to 0.895, 0.764, and 0.455; ResNet-152 falls to 0.847, 0.618, and 0.375; ViT falls to 0.931, 0.873, and 0.798; the ordinary convolutional neural network falls to 0.614, 0.446, and 0.215; and AlexNet falls to 0.642, 0.496, and 0.287 (Table 2). The accuracy of every model clearly drops substantially under adversarial attack. In our experiments, ViT's accuracy decreases the least, which we attribute to its architecture: ViT is based on the transformer, whereas the other models are convolutional neural networks. Among the convolutional models, MobileNet is the most robust. Collectively, these models are significantly affected by adversarial attacks.
Table 2: AUROC of each model under the default adversarial attack (FGSM)

| ε        | MobileNet | ResNet-152 | Vision Transformer | Ordinary CNN | AlexNet |
|----------|-----------|------------|--------------------|--------------|---------|
| 5.00E-04 | 0.895     | 0.847      | 0.931              | 0.614        | 0.642   |
| 1.00E-03 | 0.764     | 0.618      | 0.873              | 0.446        | 0.496   |
| 1.50E-03 | 0.455     | 0.375      | 0.798              | 0.215        | 0.287   |
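For reference, FGSM admits a very compact implementation. The sketch below (assuming a PyTorch model and inputs scaled to [0, 1]; not our exact experimental code) perturbs each image by ε times the sign of the loss gradient; libraries such as torchattacks expose FGSM, PGD, FAB, Square, and AutoAttack behind a similar interface.

```python
# Minimal FGSM sketch (Goodfellow et al.): perturb each image by the sign of
# the loss gradient with respect to the input, scaled by the strength eps.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, eps=1.0e-3):
    """Return adversarial examples x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()  # keep pixels in the valid range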
The robustness of convolutional neural network models can be improved by adversarial training
Since the accuracy of the convolutional neural networks can drop sharply under strong perturbations, we tried to address this problem with adversarial training. In adversarial training, attacks such as FGSM are applied directly to the X-ray images in the training set, so that the noise is learned as part of model training; the model thereby learns to ignore such noise during subsequent image recognition, improving accuracy (Figure 2b). For FGSM at different attack strengths (ε = 0.5e-3, 1.0e-3, 1.5e-3), adversarial training raises the AUROC of MobileNet to 0.905, 0.783, and 0.469 and that of ResNet-152 to 0.851, 0.636, and 0.385 compared with before. The ordinary convolutional neural network increases to 0.821, 0.568, and 0.239, and AlexNet to 0.845, 0.710, and 0.399 (Table 3). These data show that adversarial training can improve the robustness of the convolutional neural network models to some extent.
Table 3: AUROC of each adversarially trained model under the default adversarial attack (FGSM)

| ε        | MobileNet | ResNet-152 | Ordinary CNN | AlexNet |
|----------|-----------|------------|--------------|---------|
| 5.00E-04 | 0.905     | 0.851      | 0.821        | 0.845   |
| 1.00E-03 | 0.783     | 0.636      | 0.568        | 0.710   |
| 1.50E-03 | 0.469     | 0.385      | 0.239        | 0.399   |
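A minimal sketch of this training scheme, assuming PyTorch and reusing the fgsm_attack helper sketched earlier (a simplified illustration, not our exact experimental code; one could also mix clean and adversarial batches):

```python
# Minimal adversarial-training sketch: each batch is perturbed with FGSM
# before the usual training step, so the model learns to classify noisy
# images correctly. Assumes fgsm_attack from the earlier sketch.
import torch
import torch.nn.functional as F

def adversarial_train_epoch(model, loader, optimizer, eps=1.0e-3, device="cpu"):
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_adv = fgsm_attack(model, x, y, eps=eps)  # craft adversarial batch on the fly
        optimizer.zero_grad()  # clear stray gradients left by the attack step
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```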
MobileNet is the convolutional neural network model with the best robustness against adversarial attacks
Among the convolutional neural network models (MobileNet, ResNet-152, the ordinary convolutional neural network, and AlexNet), MobileNet's accuracy decreases noticeably less than the others under the various adversarial attacks. To validate this result, we used the six adversarial attacks described above and selected 1000 images from the dataset for an attack success rate (ASR) test. Both without an adversarial attack and under attacks of different strengths (ε = 0.5e-3, 1.0e-3, 1.5e-3), and for both the baseline and the adversarially trained models, MobileNet has a lower ASR than the other models in most of the experiments (Table 4), suggesting that MobileNet is less susceptible to attack and more stable. We also tested the adversarial training proposed above and found that the ASR of every adversarially trained model is reduced (Table 5); here MobileNet retains some advantage over the other models, although it is not especially pronounced. We speculated that these differences might stem from the models' differing parameter counts. We therefore ran the same experiments on GhostNet [43], a lightweight convolutional neural network like MobileNet but with more parameters. The results show that GhostNet, with its larger parameter count, has a higher ASR than MobileNet and achieves lower accuracy under attacks of different strengths. Because the attack strengths used so far are small, we also tested a stronger attack with ε = 0.1 (Table 4, Table 5). This attack severely degraded the accuracy of all models, but attacks at this strength are of limited practical relevance because the perturbation is strong enough to be visible to the naked eye. By contrast, our earlier attack strengths (ε = 0.5e-3, 1.0e-3, 1.5e-3) are difficult to detect with the naked eye and are therefore much more interesting to study.
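For clarity, the sketch below shows one common definition of ASR, the fraction of initially correctly classified images that an attack flips, assuming PyTorch; the exact bookkeeping in our experiments may differ.

```python
# Minimal ASR sketch: among images the model classifies correctly before the
# attack, count the fraction it misclassifies after the attack.
import torch

@torch.no_grad()
def _predict(model, x):
    return model(x).argmax(dim=1)

def attack_success_rate(model, attack, loader, device="cpu"):
    flipped, correct = 0, 0
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        clean_pred = _predict(model, x)
        mask = clean_pred == y               # only count initially correct images
        x_adv = attack(model, x, y)          # e.g. the fgsm_attack sketch above
        adv_pred = _predict(model, x_adv)
        flipped += (adv_pred[mask] != y[mask]).sum().item()
        correct += mask.sum().item()
    return flipped / max(correct, 1)
```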
Reasons for MobileNet's better robustness against adversarial attacks
To investigate why MobileNet is more stable, we measured the adversarial noise produced by white-box attacks such as FGSM for both MobileNet and ResNet-152. Quantitatively, the gradient magnitude of MobileNet is significantly lower than that of ResNet-152; qualitatively, MobileNet's adversarial noise pattern is more ordered, whereas ResNet-152's noise appears more disorganized. This suggests that, unlike ResNet-152, MobileNet focuses more on overall feature learning than on local details (e.g., edges and lines), which reduces its sensitivity to high-frequency perturbations. We also used principal component analysis (PCA) to reduce the dimensionality of the deep activations of ResNet-152 and MobileNet and examine their latent spatial structure. On the raw images of the lung disease classification task, MobileNet shows tighter within-class clustering and larger between-class distances. After an adversarial attack this difference becomes even more pronounced: the latent space of ResNet-152 becomes more dispersed, while MobileNet maintains better clustering. In other words, compared with ResNet-152, MobileNet distinguishes the features of lung radiographs more effectively. We then used the Grad-CAM method to visualize the high-importance regions of ResNet-152 and MobileNet in the input images. In the baseline case, ResNet-152 concentrates on a single portion of the input image, while MobileNet assigns high importance to multiple regions, attending more comprehensively. Under adversarial attack, as the attack strength rises, ResNet-152's region of attention scatters and includes more irrelevant parts of the image, while MobileNet's region of attention remains almost unchanged. Based on these observations, we conclude that among the convolutional neural network models, MobileNet shows better robustness under white-box attacks because it separates the features of lung radiographs more effectively and the important regions it attends to are more stable.
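As an illustration, deep activations can be captured with a forward hook and projected with PCA. The following is a minimal sketch assuming PyTorch and scikit-learn; the layer choice (model.features[-1] for torchvision's MobileNetV2) is given only as an example, not as our exact setup.

```python
# Minimal sketch: project a layer's activations to 2-D with PCA to compare
# the latent structure of clean vs. adversarial batches.
import torch
from sklearn.decomposition import PCA

def extract_features(model, layer, x):
    """Capture one layer's activations with a forward hook."""
    feats = []
    handle = layer.register_forward_hook(
        lambda mod, inp, out: feats.append(out.flatten(1).detach().cpu()))
    with torch.no_grad():
        model(x)
    handle.remove()
    return feats[0]

def pca_2d(features):
    return PCA(n_components=2).fit_transform(features.numpy())

# Example usage (layer choice is illustrative):
# layer    = model.features[-1]
# clean_2d = pca_2d(extract_features(model, layer, x))
# adv_2d   = pca_2d(extract_features(model, layer, x_adv))
```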
Table 4: Attack success rate (ASR) of each model under different adversarial attacks

| Attack     | ε        | MobileNet | ResNet-152 | Ordinary CNN | AlexNet | GhostNet |
|------------|----------|-----------|------------|--------------|---------|----------|
| FGSM       | 5.00E-04 | 11.53%    | 14.61%     | 19.85%       | 17.94%  | 13.69%   |
| FGSM       | 1.00E-03 | 29.64%    | 34.75%     | 40.11%       | 38.62%  | 35.78%   |
| FGSM       | 1.50E-03 | 44.59%    | 46.13%     | 51.68%       | 50.21%  | 48.98%   |
| FGSM       | 1.00E-01 | 67.01%    | 66.71%     | 73.53%       | 71.75%  | 69.34%   |
| PGD        | 5.00E-04 | 15.29%    | 13.45%     | 16.24%       | 17.35%  | 14.28%   |
| PGD        | 1.00E-03 | 36.78%    | 32.94%     | 39.48%       | 38.73%  | 33.76%   |
| PGD        | 1.50E-03 | 53.23%    | 56.32%     | 59.11%       | 55.12%  | 54.29%   |
| PGD        | 1.00E-01 | 67.98%    | 68.82%     | 70.18%       | 67.94%  | 68.26%   |
| Square     | 5.00E-04 | 5.84%     | 4.56%      | 9.46%        | 8.32%   | 7.22%    |
| Square     | 1.00E-03 | 13.96%    | 14.39%     | 17.83%       | 15.47%  | 15.85%   |
| Square     | 1.50E-03 | 25.92%    | 24.85%     | 28.37%       | 23.81%  | 27.95%   |
| Square     | 1.00E-01 | 56.74%    | 55.21%     | 60.16%       | 58.24%  | 54.24%   |
| FAB        | 5.00E-04 | 12.18%    | 14.27%     | 16.25%       | 18.26%  | 17.31%   |
| FAB        | 1.00E-03 | 26.36%    | 25.88%     | 29.64%       | 25.14%  | 30.54%   |
| FAB        | 1.50E-03 | 40.83%    | 47.54%     | 48.17%       | 47.57%  | 41.73%   |
| FAB        | 1.00E-01 | 58.91%    | 57.73%     | 55.78%       | 57.06%  | 57.65%   |
| AutoAttack | 5.00E-04 | 15.87%    | 14.25%     | 14.74%       | 18.39%  | 20.06%   |
| AutoAttack | 1.00E-03 | 35.19%    | 38.24%     | 43.48%       | 39.85%  | 40.17%   |
| AutoAttack | 1.50E-03 | 44.05%    | 48.33%     | 55.94%       | 49.17%  | 50.29%   |
| AutoAttack | 1.00E-01 | 54.38%    | 56.73%     | 58.23%       | 57.96%  | 57.89%   |
| AdvDrop    | 20       | 59.14%    | 57.24%     | 74.25%       | 62.84%  | 58.12%   |
| AdvDrop    | 40       | 67.87%    | 65.82%     | 89.13%       | 81.68%  | 70.64%   |
| AdvDrop    | 60       | 78.14%    | 63.52%     | 88.54%       | 89.25%  | 79.26%   |

For AdvDrop, the strength column gives its own perturbation parameter (20, 40, 60) rather than ε; AdvDrop was not evaluated at ε = 1.00E-01.
Table 5: Attack success rate (ASR) of each adversarially trained model under different adversarial attacks

| Attack     | ε        | MobileNet | ResNet-152 | Ordinary CNN | AlexNet | GhostNet |
|------------|----------|-----------|------------|--------------|---------|----------|
| FGSM       | 5.00E-04 | 11.53%    | 14.61%     | 19.85%       | 17.94%  | 13.69%   |
| FGSM       | 1.00E-03 | 29.64%    | 34.75%     | 40.11%       | 38.62%  | 35.78%   |
| FGSM       | 1.50E-03 | 44.59%    | 46.13%     | 51.68%       | 50.21%  | 48.98%   |
| FGSM       | 1.00E-01 | 67.01%    | 66.71%     | 73.53%       | 71.75%  | 69.34%   |
| PGD        | 5.00E-04 | 15.29%    | 13.45%     | 16.24%       | 17.35%  | 14.28%   |
| PGD        | 1.00E-03 | 36.78%    | 32.94%     | 39.48%       | 38.73%  | 33.76%   |
| PGD        | 1.50E-03 | 53.23%    | 56.32%     | 59.11%       | 55.12%  | 54.29%   |
| PGD        | 1.00E-01 | 67.98%    | 68.82%     | 70.18%       | 67.94%  | 68.26%   |
| Square     | 5.00E-04 | 5.84%     | 4.56%      | 9.46%        | 8.32%   | 7.22%    |
| Square     | 1.00E-03 | 13.96%    | 14.39%     | 17.83%       | 15.47%  | 15.85%   |
| Square     | 1.50E-03 | 25.92%    | 24.85%     | 28.37%       | 23.81%  | 27.95%   |
| Square     | 1.00E-01 | 56.74%    | 55.21%     | 60.16%       | 58.24%  | 54.24%   |
| FAB        | 5.00E-04 | 12.18%    | 14.27%     | 16.25%       | 18.26%  | 17.31%   |
| FAB        | 1.00E-03 | 26.36%    | 25.88%     | 29.64%       | 25.14%  | 30.54%   |
| FAB        | 1.50E-03 | 40.83%    | 47.54%     | 48.17%       | 47.57%  | 41.73%   |
| FAB        | 1.00E-01 | 58.91%    | 57.73%     | 55.78%       | 57.06%  | 57.65%   |
| AutoAttack | 5.00E-04 | 15.87%    | 14.25%     | 14.74%       | 18.39%  | 20.06%   |
| AutoAttack | 1.00E-03 | 35.19%    | 38.24%     | 43.48%       | 39.85%  | 40.17%   |
| AutoAttack | 1.50E-03 | 44.05%    | 48.33%     | 55.94%       | 49.17%  | 50.29%   |
| AutoAttack | 1.00E-01 | 54.38%    | 56.73%     | 58.23%       | 57.96%  | 57.89%   |
| AdvDrop    | 20       | 59.14%    | 57.24%     | 74.25%       | 62.84%  | 58.12%   |
| AdvDrop    | 40       | 67.87%    | 65.82%     | 89.13%       | 81.68%  | 70.64%   |
| AdvDrop    | 60       | 78.14%    | 63.52%     | 88.54%       | 89.25%  | 79.26%   |

As in Table 4, the AdvDrop strength column gives its own perturbation parameter (20, 40, 60) rather than ε, and AdvDrop was not evaluated at ε = 1.00E-01.