A number of modifications were made to the CNN structure. In the CNN model, we first investigated the effect of image size on learning. Secondly, we investigated the impact of max and average pooling on training and testing accuracy. Thirdly, we investigated the impact of changing the size of the Conv2D layers. Fourthly, we investigated the impact of adding padding to the max-pooling layer. Finally, we examined the impact of the dropout layer in the CNN network. The dataset was divided into training, validation, and testing sets in a 70:15:15 ratio, with 1618, 346, and 346 images, respectively.
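The 70:15:15 split can be sketched as follows. This is an illustrative sketch only, since the paper does not give its splitting code or random seed; with 2310 images and the validation/test fractions rounded to whole images, the counts come out to 1618, 346, and 346:

```python
import random

def split_indices(n, val_frac=0.15, test_frac=0.15, seed=42):
    """Shuffle image indices and split them 70:15:15.

    The validation and test counts are rounded to whole images and
    the training set receives the remainder.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = round(n * val_frac)
    n_test = round(n * test_frac)
    n_train = n - n_val - n_test
    return (idx[:n_train],
            idx[n_train:n_train + n_val],
            idx[n_train + n_val:])

train, val, test = split_indices(2310)
print(len(train), len(val), len(test))  # 1618 346 346
```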
4.1 Result of Changing the Resolution of the Images in Model Learning:
CNN models No.1, No.2, No.3, No.4, and No.5 all have the same structure, as described in Table 7: the first convolutional layer has 32 filters, the second and third convolutional layers have 64, and the remaining convolutional layers have 128. Dropout layers were used, padding was not applied to the pooling layers, and max pooling was used in all pooling layers. The only difference between these CNN models was the size of the images in the dataset.
Table 7
The structures of the first five CNN models.
CNN Structure | No.1, No.2, No.3, No.4, No.5 |
Conv2D | 32 |
Padding for Conv2D | same |
Max Pooling | (3,3) |
Padding for Max Pooling | NA |
Dropout | 0.25 |
Conv2D | 64 |
Conv2D | 64 |
Max Pooling | (2,2) |
Dropout | 0.25 |
Conv2D | 128 |
Conv2D | 128 |
Max Pooling | (2,2) |
Dropout | 0.25 |
Dropout | 0.5 |
Epoch | 25 |
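The structure in Table 7 can be sketched in Keras roughly as follows. The kernel size (3, 3), the ReLU activations, the Dense head size, and `num_classes` are assumptions, since the table lists only the filter counts, pooling sizes, padding, and dropout rates:

```python
from tensorflow.keras import layers, models

def build_model(num_classes=4, input_shape=(256, 256, 3)):
    """Rough Keras sketch of the Table 7 structure (not the authors' code)."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((3, 3)),           # no padding on the pooling layer
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
        layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),  # assumed head size (not in the table)
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_model()
```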
As shown in Table 8, CNN model No.3, with an image size of 256 × 256 pixels, achieved the highest test accuracy of 90.65%.
Table 8
The result of changing the resolution of the images in the CNN model.
CNN Structure | No.1 | No.2 | No.3 | No.4 | No.5 |
Image size | (352,352) | (288,288) | (256, 256) | (224,224) | (128,128) |
Train accuracy | 0.9396 | 0.9538 | 0.9570 | 0.9448 | 0.9428 |
Train loss | 0.1786 | 0.1440 | 0.1352 | 0.1630 | 0.1633 |
Validation accuracy | 0.7329 | 0.6318 | 0.9079 | 0.8484 | 0.8087 |
Validation loss | 1.0281 | 1.7132 | 0.2285 | 0.7101 | 0.7974 |
Test accuracy | 0.7194 | 0.5899 | 0.9065 | 0.8201 | 0.8345 |
Test loss | 1.0712 | 1.6284 | 0.2956 | 0.7649 | 0.7589 |
Time (minutes) | 58 | 41 | 34 | 24 | 9 |
In terms of test accuracy, model No.3 performed best, with an image size of 256 × 256 pixels, while model No.2 performed worst, with an image size of 288 × 288 pixels, as shown in Figure 10.
4.2 The Effect of using the Average Pooling on CNN Learning:
Max pooling is preferred when the goal is for the CNN model to learn a collection of distinctive features in the image, whereas average pooling is better suited to capturing all of the features in the image [30]. CNN model No.6 had the same structure as CNN model No.3, except that every max-pooling layer was replaced with an average-pooling layer, to observe the effect of using average pooling instead of max pooling, as shown in Table 9. With average pooling, the test accuracy dropped to 0.7698 and the test loss rose to 0.9022. Since this research targets plant diseases that appear as distinct spots on plant leaves, max pooling is the better choice here.
Table 9
The effect of using the average pooling on the CNN model.
CNN Structure | No.6 |
Conv2D | 32 |
Padding for Conv2D | same |
Average Pooling | (3,3) |
Padding for Average Pooling | NA |
Dropout | 0.25 |
Conv2D | 64 |
Conv2D | 64 |
Average Pooling | (2,2) |
Dropout | 0.25 |
Conv2D | 128 |
Conv2D | 128 |
Average Pooling | (2,2) |
Dropout | 0.25 |
Dropout | 0.5 |
Image size | (256, 256) |
Epoch | 25 |
Train accuracy | 0.9416 |
Train loss | 0.1678 |
Validation accuracy | 0.8303 |
Validation loss | 0.5665 |
Test accuracy | 0.7698 |
Test loss | 0.9022 |
Time (minutes) | 30 |
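The contrast between the two pooling modes can be illustrated on a toy feature map: max pooling keeps the strongest response in each window (e.g., a distinct lesion spot), while average pooling dilutes it across the window. A minimal NumPy sketch (not the paper's code):

```python
import numpy as np

def pool2d(x, k, mode="max"):
    """Non-overlapping k×k pooling over a 2-D feature map (no padding)."""
    h, w = x.shape[0] // k * k, x.shape[1] // k * k
    blocks = x[:h, :w].reshape(h // k, k, w // k, k)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

# A mostly-dark feature map with one bright "lesion" activation.
fmap = np.zeros((4, 4))
fmap[1, 1] = 8.0
print(pool2d(fmap, 2, "max"))   # the top-left window keeps the full 8.0
print(pool2d(fmap, 2, "mean"))  # the top-left window dilutes it to 2.0
```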
4.3 The Effect of Changing the Size of Conv2d Layers on CNN Learning:
Here, we investigated the effect of the size of the convolutional layers on the CNN model's performance. As shown in Table 10, the best performance was obtained by model No.10, in which the first convolutional layer had 32 filters and the remaining convolutional layers had 128. Its test accuracy was 94.24% and its test loss was 0.2170.
Table 10
The effect of the size of the convolutional layers on the CNN model's performance.
CNN Structure | No.7 | No.8 | No.9 | No.10 |
Conv2D | 64 | 64 | 128 | 32 |
Padding for Conv2D | same | same | same | same |
Max Pooling | (3,3) | (3,3) | (3,3) | (3,3) |
Padding for Max Pooling | NA | NA | NA | NA |
Dropout | 0.25 | 0.25 | 0.25 | 0.25 |
Conv2D | 64 | 128 | 128 | 128 |
Conv2D | 64 | 128 | 128 | 128 |
Max Pooling | (2,2) | (2,2) | (2,2) | (2,2) |
Dropout | 0.25 | 0.25 | 0.25 | 0.25 |
Conv2D | 128 | 512 | 128 | 128 |
Conv2D | 128 | 512 | 128 | 128 |
Max Pooling | (2,2) | (2,2) | (2,2) | (2,2) |
Dropout | 0.25 | 0.25 | 0.25 | 0.25 |
Dropout | 0.5 | 0.5 | 0.5 | 0.5 |
Image size | (256, 256) | (256, 256) | (256, 256) | (256, 256) |
Epoch | 25 | 25 | 25 | 25 |
Train accuracy | 0.9512 | 0.9570 | 0.9615 | 0.9660 |
Train loss | 0.1389 | 0.1123 | 0.0986 | 0.1004 |
Validation accuracy | 0.7924 | 0.8755 | 0.8755 | 0.9513 |
Validation loss | 0.9515 | 0.3045 | 0.3891 | 0.1728 |
Test accuracy | 0.7914 | 0.8705 | 0.8561 | 0.9424 |
Test loss | 1.2720 | 0.3465 | 0.4874 | 0.2170 |
Time (minutes) | 43 | 62 | 79 | 37 |
4.4 The Impact of Adding Padding to The Maxpooling Layer:
Models No.11 and No.12 were used to analyze the impact of adding padding to the max-pooling layers on the performance of the CNN model. As shown in Table 11, accuracy did not increase when padding was added to the max-pooling layers. As a result, we did not use padding on the max-pooling layers, because it did not improve on the outcomes of the prior networks.
Table 11
The impact of adding padding to the Maxpooling layer.
CNN Structure | No.11 | No.12 |
Conv2D | 32 | 32 |
Padding for Conv2D | same | same |
Maxpooling | (3,3) | (3,3) |
Padding for Maxpooling | same | same |
Dropout | 0.25 | 0.25 |
Conv2D | 64 | 128 |
Conv2D | 64 | 128 |
Maxpooling | (2,2) | (2,2) |
Dropout | 0.25 | 0.25 |
Conv2D | 128 | 128 |
Conv2D | 128 | 128 |
Maxpooling | (2,2) | (2,2) |
Dropout | 0.25 | 0.25 |
Dropout | 0.5 | 0.5 |
Image size | (256, 256) | (256, 256) |
Epoch | 25 | 25 |
Train accuracy | 0.9383 | 0.9351 |
Train loss | 0.1783 | 0.1665 |
Validation accuracy | 0.8267 | 0.8773 |
Validation loss | 0.6059 | 0.3300 |
Test accuracy | 0.8561 | 0.8489 |
Test loss | 0.4562 | 0.3198 |
Time (minutes) | 29 | 35 |
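The main effect of `padding='same'` on a pooling layer is on the output spatial size. A small sketch of the Keras size arithmetic (assuming, as Keras does by default, that the stride equals the pool size):

```python
import math

def pooled_size(n, pool, stride=None, padding="valid"):
    """Spatial size after pooling, following the Keras convention.

    'valid' discards any leftover border; 'same' pads so every input
    position is covered.
    """
    s = stride or pool
    if padding == "valid":
        return (n - pool) // s + 1
    return math.ceil(n / s)  # 'same'

print(pooled_size(256, 3))                  # 85 : (3,3) pooling, no padding
print(pooled_size(256, 3, padding="same"))  # 86 : (3,3) pooling with padding
print(pooled_size(256, 2))                  # 128: (2,2) pooling, no padding
```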
4.5 The Impact of the Dropout Layer in the CNN Network:
In models No.13, No.14, and No.15, the effect of the dropout layers on the CNN model was investigated, as shown in Table 12. In model No.13, all dropout layers were set to 0.25, which produced satisfactory results: a test accuracy of 90.65% and a test loss of 0.3599. In model No.14, several dropout layers were removed, resulting in a test accuracy of 87.05% and a test loss of 0.7843. Comparing model No.13 to model No.14 shows that deleting many dropout layers from the network decreased performance. In model No.15, all dropout layers were eliminated from the network; as a result, it had the lowest test accuracy of the three models, 82.01%, with a test loss of 0.6857.
With the 0.25 dropout layers in model No.13, there was only a small gap between the training accuracy (95.12%) and the test accuracy (90.65%). In model No.15, where no dropout layers were used, the gap was much larger: 96.66% training accuracy versus 82.01% test accuracy. Model No.15 also had the highest training accuracy of the three models, which indicates that the dropout layer is critical for avoiding overfitting in the CNN network.
Table 12
The impact of the dropout layer in the CNN network.
CNN Structure | No.13 | No.14 | No.15 |
Conv2D | 32 | 32 | 32 |
Padding for Conv2D | same | same | same |
Maxpooling | (3,3) | (3,3) | (3,3) |
Padding for Maxpooling | NA | NA | NA |
Dropout | 0.25 | NA | NA |
Conv2D | 128 | 128 | 128 |
Conv2D | 128 | 128 | 128 |
Maxpooling | (2,2) | (2,2) | (2,2) |
Dropout | 0.25 | NA | NA |
Conv2D | 128 | 128 | 128 |
Conv2D | 128 | 128 | 128 |
Maxpooling | (2,2) | (2,2) | (2,2) |
Dropout | 0.25 | NA | NA |
Dropout | 0.25 | 0.5 | NA |
Image size | (256, 256) | (256, 256) | (256, 256) |
Epoch | 25 | 25 | 25 |
Train accuracy | 0.9512 | 0.9608 | 0.9666 |
Train loss | 0.1499 | 0.1144 | 0.1063 |
Validation accuracy | 0.8989 | 0.8574 | 0.8394 |
Validation loss | 0.3194 | 0.6463 | 0.6402 |
Test accuracy | 0.9065 | 0.8705 | 0.8201 |
Test loss | 0.3599 | 0.7843 | 0.6857 |
Time (minutes) | 34 | 33 | 33 |
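The regularizing mechanism behind these results can be illustrated with inverted dropout, the variant Keras implements: during training, a fraction of the activations is zeroed and the remainder is rescaled so that the expected activation is unchanged. A minimal NumPy sketch (not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, rate, training=True):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest."""
    if not training or rate == 0.0:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

acts = np.ones(10000)
out = dropout(acts, 0.25)
print(out.mean())  # close to 1.0: the expected activation is preserved
```

At inference time (`training=False`) the layer is an identity, which is why no rescaling is needed when the trained network is evaluated.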
CNN model No.10, which used max pooling, multiple dropout layers, 32 filters in the first convolutional layer and 128 in the remaining convolutional layers, and was trained for 25 epochs, was the best CNN structure, providing the highest accuracy and lowest loss.
Here, we applied early stopping and increased the maximum number of epochs for CNN model No.10 to 100. With early stopping, training stopped at epoch 42. Table 13 displays the results for model No.10.
Table 13
The performance for model No.10.
Test accuracy | Test loss | Precision | Recall | F1- score | Epoch | Early Stop |
0.9712 | 0.0783 | 0.97 | 0.96 | 0.96 | 100 | 42 |
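The early-stopping rule can be sketched as follows: training halts once the validation loss has not improved for `patience` consecutive epochs. The patience value below is an assumption for illustration, since the paper does not state the one it used:

```python
def early_stop_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training stops: either when the
    validation loss has not improved for `patience` consecutive epochs
    after its best value, or when the epoch budget runs out."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch
    return len(val_losses)

# Synthetic curve: improves, then plateaus, so training stops early.
print(early_stop_epoch([1.0, 0.8, 0.7, 0.75, 0.72, 0.74], patience=3))  # 6
```

In Keras this behaviour corresponds to the `EarlyStopping` callback monitoring `val_loss` with a `patience` argument.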
Fig. 11, which shows the training and validation accuracy over the 42 epochs for model No.10, indicates that the model improves after epoch 20.
Figure 12 shows the training and validation losses when the maximum epoch number is 100 and the model was stopped at epoch 42 by early stopping. Owing to the dropout layers, there is no overfitting and no dispersion in the results.
4.6 Comparison With Other Approaches
In 2021, a CNN model was applied to the PlantVillage dataset [9] to classify 38 types of healthy and unhealthy plants, with an overall accuracy of 0.88. Our proposed CNN model No.10, used to classify the same 38 classes of PlantVillage, provided a training accuracy of 94.88% and a validation accuracy of 92.54%. In 2020, the study in [31] used a CNN to classify 15 categories of healthy and unhealthy plants, with 5032 images for training and 1220 for validation from the PlantVillage dataset; their CNN model achieved a training accuracy of 83.73%. Applying our proposed CNN to the same 15 classes of PlantVillage yielded a training accuracy of 95.94% and a validation accuracy of 94.80%. The study in [32] used a CNN model on the PlantVillage dataset to classify three maize diseases, with a validation accuracy of 94.63%; applying our proposed CNN to the same three classes yielded a validation accuracy of 94.72%. We found that using the proposed CNN model and resizing the images to 256 × 256 pixels produces better results.
In Table 14, we compare our proposed CNN model with a set of results from other researchers on the PlantVillage dataset. According to the results, the proposed CNN model performs better than previous studies on the same dataset.
Table 14
Comparing the performance for model No.10 with previous research works in PlantVillage dataset.
Reference | Classes | Model | Train Accuracy | Validation Accuracy |
[9] | 38 | CNN | NA | 0.88 | |
[31] | 15 | CNN | 0.8373 | 0.8273 | |
[32] | 3 | CNN | 0.9964 | 0.9463 | |
Proposed CNN | 38 | CNN | 0.9488 | 0.9254 | |
Proposed CNN | 15 | CNN | 0.9594 | 0.9480 | |
Proposed CNN | 3 | CNN | 0.9650 | 0.9472 | |