5.2 Other Classification Algorithms
For comparison, we apply transfer learning: each model is initialized with ImageNet weights, its internal layers are frozen, and its top layers are replaced with new layers designed for food-dish classification. VGG19, EfficientNet, and ResNet are the base models employed in this comparison.
VGG (Visual Geometry Group): A basic CNN architecture known for its simplicity and efficiency. It is composed of several convolutional layers with small convolutional filters, followed by fully connected layers.
ResNet (Residual Network): A deep CNN architecture that popularized residual connections. By mitigating the vanishing gradient problem, it makes training extremely deep networks feasible.
EfficientNet: A family of CNN architectures that combines state-of-the-art accuracy with savings in computational resources. It employs compound scaling to jointly balance network depth, width, and resolution.
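The transfer-learning setup described above can be sketched in TensorFlow/Keras as follows. This is a minimal sketch, not the paper's exact code: the input size, dense-layer width, and dropout rate are illustrative assumptions.

```python
# Illustrative sketch of the transfer-learning setup: load ImageNet weights,
# freeze the backbone, and attach a new classification head.
# The head width (256) and dropout rate (0.3) are assumptions.
import tensorflow as tf

def build_transfer_model(base_cls, num_classes, weights="imagenet"):
    """Load a pretrained backbone, freeze it, and add a new classifier head."""
    base = base_cls(include_top=False, weights=weights, pooling="avg",
                    input_shape=(224, 224, 3))
    base.trainable = False  # freeze the internal (pretrained) layers
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Dense(256, activation="relu"),
        tf.keras.layers.Dropout(0.3),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

# e.g. for the VGG19 variant with ~20 food classes:
# model = build_transfer_model(tf.keras.applications.VGG19, num_classes=20)
```

The same builder applies to the other backbones (e.g. `tf.keras.applications.ResNet50` or `tf.keras.applications.EfficientNetB0`), since only the frozen base differs between the compared models.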
The networks above were trained for 50 epochs with an early-stopping condition [24]. The learning rate was set to 1e-4 and reduced by a factor of 0.1 during training. Softmax is used as the output activation, as it produces mutually exclusive class probabilities suitable for multi-class classification.
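The training schedule above (50 epochs, early stopping, learning-rate reduction by a factor of 0.1) could be configured in Keras roughly as follows. The patience values are assumptions, since the paper does not state them.

```python
# Sketch of the training schedule: early stopping plus a learning-rate
# drop by a factor of 0.1. Patience values are assumed, not from the paper.
import tensorflow as tf

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5,              # assumed patience
    restore_best_weights=True)
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.1, patience=3)  # reduce LR to 0.1x

# model.fit(train_ds, validation_data=val_ds,
#           epochs=50, callbacks=[early_stop, reduce_lr])
```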
RESULTS
In this research we assess the models using the most common evaluation metrics: accuracy and the confusion matrix. The proposed Vision Transformer achieves a test accuracy of 92%.
Table 2
Accuracy of different models
Models | Accuracy |
ResNet50 | 34% |
ResNet50 with regularization | 40% |
VGG16 | 56% |
VGG19 | 65% |
Custom CNN | 81% |
ViT | 92% |
Table 2 presents the accuracy scores of the various models used in the study. The CNN models showed promising results, reaching a maximum accuracy of 81%, while others fell short of expectations. The Vision Transformer (ViT) achieved a markedly higher accuracy of 92%.
Figure 3 shows the Train Loss vs. Test Loss and Epochs vs. Accuracy curves.
Train vs. Test Loss: This graph plots both the training loss and the test loss on the same y-axis against epochs on the x-axis. The sharp decrease in both training loss (from 2.9 to 0.2) and test loss (from 0.9 to 0.4) within just 4 epochs indicates that the model is efficiently learning to fit the data.
Epochs vs. Accuracy: This graph plots epochs on the x-axis against training accuracy on the y-axis. The increase in training accuracy from 87% to 93% over 4 epochs shows the model steadily improving at classifying the Indian food images in the training data.
The confusion matrix helps in understanding the performance of a classification model. It provides a breakdown of how many predictions were correct and incorrect for each class in the dataset. Misclassifications are most apparent between visually similar dishes such as butter naan and chapati. Color intensity indicates the proportion of predictions: darker cells represent higher values, making it easy to identify which classes are being predicted correctly or incorrectly.
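A confusion matrix of this kind can be computed with scikit-learn. The labels below are a toy illustration only; they do not reproduce the paper's predictions.

```python
# Toy illustration of building a confusion matrix with scikit-learn.
# The labels here are invented for demonstration only.
from sklearn.metrics import confusion_matrix

classes = ["butter_naan", "chapati", "idli"]
y_true = ["butter_naan", "butter_naan", "chapati", "chapati", "idli", "idli"]
y_pred = ["butter_naan", "chapati", "chapati", "butter_naan", "idli", "idli"]

# Rows are true classes, columns are predicted classes; off-diagonal
# entries expose confusions such as butter_naan vs. chapati.
cm = confusion_matrix(y_true, y_pred, labels=classes)
print(cm)
# → [[1 1 0]
#    [1 1 0]
#    [0 0 2]]
```

Such a matrix is typically rendered as a heatmap (e.g. with `sklearn.metrics.ConfusionMatrixDisplay`), which produces the color-intensity view described above.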
Table 3: Performance of Indian Food Classes
Indian Food Classes | Precision | Recall | F1-score |
burger | 0.98 | 1.0 | 0.99 |
butter_naan | 0.92 | 0.83 | 0.87 |
chai | 0.97 | 0.97 | 0.97 |
chole_bhature | 1.0 | 0.99 | 0.99 |
dal_makhani | 0.95 | 0.91 | 0.93 |
dhokla | 0.96 | 0.91 | 0.94 |
fried_rice | 0.97 | 1.0 | 0.99 |
idli | 0.94 | 0.92 | 0.93 |
jalebi | 0.98 | 0.96 | 0.97 |
kadhai_paneer | 0.87 | 0.94 | 0.91 |
Table 3 reports the precision, recall, and F1-score for each class in the dataset.
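Per-class precision, recall, and F1-score of the kind reported in Table 3 can be obtained with scikit-learn. The labels below are invented for illustration and do not reproduce the paper's results.

```python
# Toy example of computing per-class precision, recall, and F1-score.
# The labels are invented for illustration; they do not reproduce Table 3.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 1, 1, 2, 2]

# Precision = TP / (TP + FP), recall = TP / (TP + FN),
# F1 = harmonic mean of precision and recall, computed per class.
prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1, 2])
```

For a formatted per-class summary like Table 3, `sklearn.metrics.classification_report` prints the same quantities in one table.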
Table 4
Comparison with state-of-the-art systems
System | Accuracy |
D. Pandey [1] | 91% |
S. Joo [3] | 88% |
K. Srigurulekha [13] | 86.85% |
S. Mezgec [4] | 86.72% |
Proposed Model | 92% |
Table 4 compares the proposed model with other state-of-the-art systems. Our research investigates food image recognition using a comprehensive dataset of approximately 20 food items, achieving an accuracy of around 92%. Compared with these systems, our broader dataset and wider variety of food items strengthen the generalizability and robustness of our findings.
In Fig. 5, the proposed model correctly predicts the food item as kadhai paneer and lists its allergens.