Achieving High Accuracy in Lemon Quality Classification: A Comparative Study of Deep Learning and Transformer Models

doi:10.21203/rs.3.rs-2993626/v1

Download PDF

Research Article

Achieving High Accuracy in Lemon Quality Classification: A Comparative Study of Deep Learning and Transformer Models

https://doi.org/10.21203/rs.3.rs-2993626/v1

This work is licensed under a CC BY 4.0 License

Version 1

posted

You are reading this latest preprint version

Agricultural product quality assessment is important for the efficiency and marketability of production. Quality assessment improves industry standards, increasing sales and reducing crop loss. Maintaining quality is of paramount importance for all processes, from production to sales. Artificial intelligence has recently been frequently used for product quality assessment in the agricultural field. Both in the literature and in practice, deep learning and machine learning methods are used to process images of agricultural products and evaluate their quality. They are classified according to specified standards. In this study, firstly, data augmentation operations were performed on the lemon dataset consisting of two classes, bad quality and good quality, by using rescaling, random zoom, random flip, and random rotation methods. Afterward, eight different deep-learning methods and two different transformer methods were used for classification. As a result of the study calculated the most successful result as 99.84% accuracy, 99.95% recall, and 99.66% precision with the ViT method. This value is the highest accuracy value in the literature. When the experimental results are evaluated, it shows that lemon classification processes are successfully performed using the ViT method.

Lemon quality

Deep learning

Vision Transformer

Swin Transformer

Determining the classification of agricultural products according to their physical characteristics, such as shape, size, color, and quality status, is a common process both nationally and internationally. Lemon is an important agricultural product that requires appropriate classification due to its annual production cycle and nutritional value as a rich vitamin C and antioxidant source. Lemon production in Turkey has been steadily increasing over the years and has been experiencing the fastest growth in recent years. According to the data obtained, approximately 750,550 tons of lemons were produced in 2015, 1,188,517 tons in 2020, and 1,550,000 tons in 2021. With 1,188,517 tons of lemon production in 2020, Turkey accounted for 41.1% of European lemon production. Of this lemon production, 45% was used for domestic consumption and 54.8% for exports. Accurate grading of lemons and early detection of diseases is very important, as disease and poor quality cause a decline in the plant market [1]. Traditional manual methods of classification and detection are not only slower, more laborious, and less efficient, but can also be easily affected by external factors such as fatigue, experience, and the psychological state of the experts. This can lead to misclassification and detection, which can reduce the market value of the product. To overcome these challenges, artificial intelligence, and computer vision technologies are now being used to improve the accuracy of classifications, reduce erroneous processes and increase efficiency in operations [2]. This approach also allows experts to focus on other areas of their expertise, leading to increased economic prosperity for the country.

Machine learning and deep learning have made significant progress due to the availability of large amounts of data. These methods have become important in numerous research areas, such as image processing and data classification. As a result, machine learning and deep learning methods have been applied in various fields, including medicine, industrial applications, energy systems, and agriculture [3, 4, 5]. Farmers can use machine learning and deep learning-based applications to monitor crop production processes in natural and greenhouse environments. These applications are widely used in areas such as plant quality diagnostics, soil analysis, insect diagnostics, disease detection, and treatment to improve agricultural productivity [6].

Classification of agricultural products is an important issue, especially in understanding the impact of irregularly shaped products on consumer preferences and wastage issues. In this context, artificial intelligence-based studies using the shape and quality of products offer significant potential in this field. In previous studies, statistical methods such as Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA) have been used to classify product quality using agricultural product images such as carrots [7] and apricots [8]. When these classification processes are analyzed in general, it is seen that the QDA method is more successful.

In addition, fast and accurate classification processes such as disease detection, quality, and variety determination in agricultural products are among the factors that will contribute to increasing agricultural productivity. In this context, many studies have been conducted to classify the quality and type of products using artificial intelligence methods. In some of these studies, datasets consisting directly of product images were used [9, 10], while in some studies, classification processes were performed using statistical features obtained using product images [11–16]. These image and numerical data sets are used with various machine learning and deep learning algorithms to classify products quickly and accurately.

There are many classification studies in the literature using lemon images, as we have used in this study. These studies use various deep learning and machine learning methods to perform successful and fast classification processes [17, 18]. In classification processes, whether the input image is grayscale or color is among the factors that will affect the success of the process [19]. In addition to the images of lemons, the detection of diseases from the leaf images of the lemon plant in the agricultural field before the harvest is collected is one of the important factors for early treatment/spraying [20]. With the classification of leaf images taken from the field, more efficient agricultural activities can be realized with lower costs. In deep learning methods, the number of data in the data set also contributes to successful results. For this reason, synthetic data can be obtained by using data augmentation methods such as Generative Adversarial Networks (GANs) to obtain more successful results in case of insufficient data. By increasing the number of data with these methods, data scarcity problems can be solved. Thus, more successful results can be obtained in studies on small data sets [21].

Motivation and our model

The quality of the products is evaluated by size, color, shape, presence of disease or rot. In this case, the external appearance of the products can be considered as the main factor affecting the market. Therefore, the correct classification of products is of great importance. Performing the classification processes performed manually by experts with computer vision methods will not only increase product efficiency and market value, but also ensure better quality management of experts' working time. In addition, since the work of experts is more easily affected by external factors, the use of computer systems will increase the success rate of classification processes. In this study, various transformer methods and deep learning methods were used to classify lemon images.

In the study, the number of data was increased by performing rescaling, random zooming, random flipping, and random rotation operations on the dataset before training. Afterwards, Transformer methods such as Vision Transformer (ViT), Swin Transformer and deep learning methods such as Xception, ResNet50, InceptionV3, NASNetMobile, EfficientNetB5, InceptionResNetV2, ResNet152, DenseNet201 were applied to the obtained data set. As a result of the study, it was observed that more successful results were obtained with Transformer methods.

Novelties and contributions

The novelties of our study are as follows:

In order to increase the success of the study, the data set was augmented using four different methods: rescaling, random zoom, random flip, and random rotation.
For the first time in this study, transformer methods were used for classification on the lemon data set.
Within the scope of the study, classification processes were performed with deep learning methods and transformer methods, and it was observed that higher success was achieved when transformer methods were used.
The success value obtained with the ViT method used in the study is 99.84%, which is the highest accuracy value in the literature.

The rest of the paper, Section 2, Material and Methods, provides information about the dataset and methods used in the study. Section 3, Results and Discussion, contains all the analysis and discussion of the results obtained from the study. The full summary of the study is given in section 4, Conclusions.

The Lemon Quality Dataset [22] is utilized in this study to classify the quality of lemons. The dataset comprises 2076 images, with each image being 300x300 pixels in size. The images depict lemons of varying sizes and qualities, with 951 images classified as bad and 1125 as good quality.

2.1. Deep Learning Methods

In the study, eight different deep-learning models were used. EfficientNet-B5 and DenseNet201, the two models with the highest accuracy, are mentioned in this section.

- EfficentNet-B5: EfficientNet is one of the most effective artificial neural network models developed by Google and used in the field of deep learning. There are 7 versions in total, and the most important feature between versions is the number of layers used. The model is determined according to the size of the input image [23].

EfficientNet-B5 is the medium-sized version of the EfficientNet family, and they are especially used in image classification operations. EfficientNet-B5 achieves higher accuracy than the smaller EfficientNet-B4 but requires less computing power than the larger EfficientNet-B6. EfficientNet-B5 performs particularly well when working with high-resolution images [24].

EfficientNet balances CNN characteristics such as width, depth, and resolution with a technique called composite scaling. The EfficientNet family offers models in various sizes, and each model is optimized for hardware with a specific computing power [25]. The architecture of the EfficientNet-B5 model used in the study is presented in Fig. 1 [26].

- DenseNet201: DenseNet is a convolutional neural network with direct feed-forward connections from each layer to all other layers. Its main advantage is that it minimizes the lost gradient resolution and overfitting in deep inspection problems that can occur with small training datasets [27]. In this architecture, the output from each layer receives all the outputs from the previous layer and processes it together with a unique "growth rate" parameter. Thus, the number of inputs used in each layer increases and better performance can be achieved by using fewer parameters [28]. DenseNet201, a member of the DenseNet Family with 201 layers, uses a condensed network to build models that are easy to train and highly efficient. It shows high performance due to the fact that the current layer can directly access the feature maps of all previous layers [29]. The architectural model of the DenseNet201 method used in this study is shown in Fig. 2 [27].

2.2. Transformer Methods

In this study, Transformer methods, which are Vision Transformer and Swin Transformer, are used. The explanations of these methods are given in this section.

- Vision Transformer (ViT): Vision Transformer (ViT), introduced by Dosovitskiy et al. [30], is a deep learning model used in the field of machine learning and computer vision. ViT uses the Transformer architecture, replacing the previously widely used Convolutional Neural Network (CNN) architecture, and uses a set of attention mechanisms to process images. ViT processes images by making them into "patches". This reduces the size of images and makes it possible to process higher dimensional images compared to previous models. It also uses attention mechanisms instead of CNNs in learning the features of images. In this way, it can provide training with higher accuracy and less data [31].

ViT is a way to extend the traditional Transformer application to image categorization without including any data-specific design to generalize non-textual modalities. ViT models a series of image segments into a semantic tag using Transformer's encoder module for classification. While traditional CNN designs often use filters with a limited receptive field, ViT's attention mechanism focuses on different parts of the image and interprets information throughout the entire image. Thanks to these features, ViT is the first image recognition model to beat traditional CNN designs (eg limited filter usage). ViT architecture consists of Embedding Layer, Encoder and Final classifier head layers. [32]. The Vit architecture model is presented in Fig. 3 [33].

- Swin Transformer: Swin Transformer is a method developed to facilitate the learning of large-scale image classification models [34]. It is used for various purposes such as region-level object detection, pixel-level semantic segmentation, and image-level image classification [35]. Swin Transformer employs a different approach to reduce memory usage in previous image classification models. This approach divides image blocks into block groups and then uses a "shift mechanism" to transfer information between blocks within each block group. As a result, Swin Transformer uses less memory and provides higher scalability compared to other large-scale image classification models. The simple architecture of Swin Transformer is given in Fig-2 [36].

In the study, a data set consisting of a total of 2076 images, 1125 good quality, and 951 bad quality, was used to determine the lemon quality. Before the training with deep learning and transformer methods, data augmentation is applied to images. These methods are rescaling, random zoom, random flip, and random rotation. In order to determine the quality of lemons Transformer methods which are Vision Transformer (ViT), Swin Transformer, and deep-learning methods, which are Xception, ResNet50, InceptionV3, NASNetMobile, EfficientNetB5, InceptionResNetV2, ResNet152, DenseNet201 methods are used. For the performance evaluation of machine learning and deep learning models, the data set is divided into 70% training (5812 images with augmented) and 30% testing (623 images). The block diagram of our proposed model, which includes data augmentation, and deep learning methods, is shown in Fig. 5.

The hybrid models implemented in Figure-5 were tested using Python on a computer with an i9 12950 processor, RTX 3080TI graphics card, and 32 GB RAM.

3.1. Evaluation Criteria

In the field of machine learning, evaluating model performance is crucial for assessing the effectiveness and generalization capabilities of trained models. There are several metrics for performance evaluation for classification methods which are validation accuracy, validation loss, precision, recall, and F1 score [37]. In this study, validation accuracy, validation loss, precision, and recall metrics were used to evaluate the performance of deep learning methods. These metrics provide valuable insights into the model's ability to make accurate predictions on unseen data and are widely employed in model selection and performance comparison.

Validation loss refers to the measurement of the discrepancy between the model's predicted output and the true target values on a validation dataset, which consists of examples that were not used during the model training phase. The validation loss is typically computed using a specific loss function that quantifies the dissimilarity between predicted and true values [38]. By monitoring the validation loss, researchers and practitioners can gauge the model's ability to generalize to unseen data and detect signs of overfitting. A low validation loss indicates that the model is performing well on the validation set, implying that it is effectively capturing the underlying patterns and regularities in the data. A high validation loss, on the other hand, suggests that the model may be struggling to generalize or is overfitting to the training data [39]. The goal is to minimize the validation loss, as it reflects the model's performance on unseen instances and serves as a proxy for its performance in real-world scenarios. Loss calculation is given in Eq. 5.

\(Loss=\frac{1}{N}{\sum }_{i}^{N}f({ŷ}_{i},{y}_{i} )\)

(Eq. 5)

where N is the number of the data and f is the loss function.

Accuracy, on the other hand, measures the proportion of correctly predicted instances from the total number of examples in the dataset. It is a metric that is particularly relevant in classification tasks, where the model's output is a class label or a probability distribution over classes [39]. The accuracy provides a measure of how well the model can classify unseen data, offering insights into its overall predictive capabilities. A high accuracy implies that the model is making accurate predictions on the validation set, correctly assigning instances to their respective classes. Conversely, a low validation accuracy suggests that the model may struggle with generalization or encounter difficulties in distinguishing between different classes. Similar to loss, the objective is to maximize the accuracy, indicating that the model is performing well on unseen data [40]. Accuracy calculation is given in Eq. 6.

\(Accuracy=\frac{TP+TN}{TP+FP+FN+TN}x100\)		(Eq. 6)
\(Recall=\frac{TP}{TP+FN}x100\)	(Eq. 7)

\(Precision=\frac{TP}{TP+FP}x100\)

(Eq. 8)

Included in the equations are TP true positive, TN true negative, FP false positive, and FN false negative. Loss and accuracy are complementary metrics that provide a comprehensive evaluation of a trained model's performance. While loss quantifies the model's prediction errors in a continuous manner, accuracy provides a more interpretable measure of classification correctness. Precision is the proportion of correctly predicted positive instances (true positives) out of the total instances predicted as positive. It measures the accuracy of positive predictions, indicating how reliable the model is when it identifies positive samples. Recall, also known as sensitivity or true positive rate, is the proportion of correctly predicted positive instances (true positives) out of all actual positive instances. It measures the model's ability to identify all positive samples, indicating how effectively it captures the relevant instances. These metrics play a vital role in model evaluation, enabling researchers and practitioners to compare different models, assess their generalization capabilities, and make informed decisions about model selection and hyperparameter tuning [41].

After applying data augmentation techniques to the data set consisting of lemon images, eight different deep learning models, namely Xception, ResNet50, InceptionV3, NASNetMobile, EfficientNetB5, InceptionResNetV2, ResNet152, DenseNet201, were applied. The training parameters used while applying these models are given in Table 1.

Table 1

Hyperparameters of Deep learning models
Hyperparameters	Value
Epoch	20
Learning-Rate	0.01
Batch-size	8
Input-Shape	300x300
Optimizer	Adam
Dropout	0.1
Activation Function	ReLU
Output Function	Softmax

As a result of experimental tests, the values with high classification accuracy and low loss value were chosen as the training parameters for the deep learning models shown in Table 1. The value "20" was selected for the Epoch, which displays how many times deep learning models have been trained using the training data set. A value of "0.01" was chosen for the Learning-rate, which affects the learning capacity and the learning time. The Batch-size value used to update the weights at each training step was found to be "8" when computing the loss function. The Optimizer function "Adam" was selected to enhance the weights. The Dropout value, which breaks the connection between neurons, was set at "0.1" to avoid over-fitting. The outcomes of the deep learning models are provided in Table 2 as a result of the training settings chosen. The best accuracy and lowest loss values that each deep learning model was able to achieve after training are shown in this table.

Table 2

The results of the experimental studies carried out in the study
Models	Epoch	Recall	Precision	Accuracy (%)	Loss
Xception	20	98.35	97.81	98.02	0.0736
ResNet50		96.53	95.96	96.17	0.1123
InceptionV3		98.02	97.59	97.83	0.0822
NASNetMobile		96.95	96.38	96.62	0.1086
EfficientNetB5		99.29	98.86	99.03	0.0382
InceptionResNetV2		98.82	98.19	98.43	0.0694
ResNet152		97.21	96.66	96.84	0.0985
DenseNet201		98.96	98.49	98.65	0.0563

As seen in Table 2, as a result of the application of eight different deep learning models to lemon images, it is seen that the EfficientNetB5 and DenseNet201 models have higher accuracy values than the other deep learning models given in the table. In addition to the accuracy value, recall and precision values are also calculated. Considering these values, it has been observed that the recall value is higher than the accuracy value, and the precision value is lower than the accuracy value. The Recall value gives the proportion of correctly classified positive samples. Because false negative classifications can cause serious problems, the Recall value is very important in classification processes. False negative classifications can overlook what the object is and create obstacles to making the right decision. Precision value is an important evaluation metric used in classification processes where false positive classification is a priority. The fact that the recall value is higher than the accuracy value in this study indicates that good-quality lemons are classified with high accuracy. Considering the products used in fruit juice factories, it is thought that it is meaningful that the recall value is higher than the accuracy value in our study since it is used in fruit juice production in medium-quality products. In addition to deep learning models, recently popular vision transformer models have also been applied to lemon images in order to increase the accuracy values. Training parameters used when applying transformer models are given in Table 3.

Table 3

Hyperparameters of Transformer models
Hyperparameters	Vision Transformer	Swin Transformer
Epoch	100	100
Learning-Rate	0.0001	0.0001
Batch Size	8	8
Optimizer	Adam	Adam
Input Shape	300x300	300x300
Patch Size	15	10
Projection Dimension	225	200
MLP Units	1800,900	1024
Number of Transformer Layers	5	-
Number of Heads	45	8
Window Size	-	5
Shift Size	-	1
Label Smoothing	-	0.1
Activation Function	ReLU	ReLU
Output Function	Softmax	Softmax

In the vision transformer, we train the model by splitting the image into patches. Patch size refers to the size of these patches. Projection dimension refers to the length of the vector that we project these separated patches with the linear projection method. After the projection, the vectors we have obtained are placed in the multi-head attention layers in the transformer encoders, and it is decided how much attention should be paid to the result, considering how much it affects the result. The number of Heads parameter refers to the number of heads in the multi-head attention layers. A transformer layer includes normalization, multi-head attention, and MLP layers. The Number of Transformer Layer parameters indicates the number of these transformer layers. After the transformers come the MLP layers, and the MLP Units parameter refers to the size of these MLP layers. Swin transformer has a shifted-window structure compared to vision transformer. This shifted-window mechanism processes the image by selecting windows on the image that we have divided into patches and shifting these windows. The Window Size parameter expresses the size of the windows on these patches. Shift size refers to how many pixels these windows will be shifted. The label smoothing parameter in Swin Transformer refers to a correction factor that is used to smooth the sharp target distribution, usually caused by the hard coding of the labels. This factor usually takes a value in the range [0, 1]. 0 means that label smoothing is not applied, while a value of 1 means maximum label smoothing [34]. The evaluation metrics obtained as a result of applying the transformer models and the two most successful deep learning models with the parameters specified in Table 3 to the data set consisting of lemon images are given in Table 4.

Table 4

The results of the experimental studies carried out in the study
	Recall	Precision	Accuracy (%)	Loss
EfficientNetB5	99.29	98.86	99.03	0.0382
DenseNet201	98.96	98.49	98.65	0.0563
VisionTransformer	99.95	99.66	99.84	0.0070
SwinTransformer	99.38	99.12	99.23	0.0174

As seen in Table 5, it is seen that transformer models are more successful than deep learning models. Among the transformer models, it is seen that the VisionTransformer model performs a more successful classification than the Swin Transformer model with an accuracy value of 99.84%. In order to show the consistency of the accuracy and loss values of these four models, box-plot graphs are drawn and shown in Figs. 6 and 7.

Figures 6 and 7 show the average loss and accuracy values for the dataset prepared to determine lemon quality. Experimental evaluations were carried out on EfficientNetB5, DenseNet201 deep learning architectures and VisionTransformer, Swin Transformer transformer architectures. In the light of the results obtained, the VisionTransformer method has the best average loss and accuracy values compared to other methods. The accuracy and loss values of the VisionTransformer method are between 0.9871–0.9984 and 0.0070–0.0076, respectively. As seen in Figs. 6 and 7, the boxplot of the Vision Transformer architecture is much smaller than other architectures. In addition, the distance between the extreme values in the boxplot for the Vision Transformer architecture is very small and the difference in accuracy rates is very small. It is seen that the box drawing lengths of the Vision Transformer architecture are shorter than the box drawing lengths of other architectures, the distance of the whiskers to the box is closer, and the median value is in the middle of the box. According to the results, it is seen that the Vision Transformer architecture offers more stable results in the data set prepared for determining lemon quality compared to other architectures. In order to show the contribution of the Vision Transformer method, which is the most successful method we proposed in the study, to the literature, comparisons were made with the studies conducted on the same data set and the results are shown in Table 5.

Table-5 Comparison of the studies with Lemon Quality Dataset on literature

Authors	Year	Microarray Data	Number of Data	Method	Accuracy (%)
He et al.	2021	Lemon	1847	VGG16	95.44
Pramanik et al.	2021	Lemon	314	Xception	94.34
Hernandez et al.	2021	Lemon	913	CNN	92
Bird et al.	2022	Lemon	2690	VGG16	83.77
Bird et al.	2022	Lemon	2690 + 400	VGG16 after CGAN	88.75
Sharma et al.	2022	Lemon	3000	CNN + LSTM	94.2
Yılmaz et al.	2023	Lemon	2076	SAE- CNN	98.96
Proposed Method	2023	Lemon	2076	VisionTransformer	99.84

As seen in Table 5, when the studies on the quality evaluation of the lemon product were examined, the Vision Transformer model used in the study provided a higher success rate than other studies. Fruit diseases are one of the serious major problems in lemon cultivation. Therefore, the detection of these diseases is of vital importance for the cultivation of lemons and other fruits. Lemon is a fruit that is frequently consumed in many parts of the world. Since it is a potential therapeutic for diseases such as cancer and tumors, and also because the vitamins it contains are extremely important for human health, lemon quality and detection of lemon diseases is an important issue. Previously, the detection of these diseases could only be done by observation. Today, these diseases can be detected automatically with image processing methods. In this study, various deep-learning methods were used to classify lemon quality. Vision Transformer and Swin Transformer methods, which are new methods in the literature, and ready-made models such as EfficentNet-B5 and DenseNet-201 were used and the performance of these models was compared. The proposed Vision Transformer model performed better than the other models. This study makes a successful contribution to the literature for lemon quality classifications, as seen in Table 5.

The lemon dataset comprises 2076 images of lemons captured on a concrete surface, which were preprocessed using image processing techniques. Data augmentation such as rescaling, random zoom, random flip, and random rotation was performed before training with Transformer methods which are Vision Transformer (ViT), Swin Transformer, and deep-learning methods, which are Xception, ResNet50, InceptionV3, NASNetMobile, EfficientNetB5, InceptionResNetV2, ResNet152, DenseNet201. Our Transformer methods performed better than other deep learning methods. Our Vision Transformer model showed 99.84% of accuracy, and our Swin Transformer method showed success on the problem with 99.23% of accuracy. As a result of the study, transformer models have taken their place in the literature as the most successful methods of lemon quality classification. With the incorporation of the decision support system proposed in this study, a versatile device can be created which is applicable in various fields such as plant cultivation, agricultural product classification, disease diagnosis, and productivity enhancement. As a result, it would provide benefits such as improved and sustainable plant growth, increased speed and accuracy in agricultural product classification, early diagnosis of plant diseases to minimize product loss, and increased product quality and yield through effective crop monitoring.

Şahin, G. (2022). Türkiye Limon Yetiştiriciliğinin Ziraat Coğrafyası Perspektifinde Analizi. Ahi Evran Akademi, 3 (2), 54–78. Retrieved from
P. Durgapal, D. Rana, S. Aggarwal and A. Gautam, "Defective Fruit Classification using Variations of GAN for Augmentation," 2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON), Prayagraj, India, 2022, pp. 1–6, doi: 10.1109/UPCON56432.2022.9986472.
Kemal Adem, Impact of activation functions and number of layers on detection of exudates using circular Hough transform and convolutional neural networks, Expert Systems with Applications, Volume 203, 2022, 117583
Hekim, Mahmut; Cömert, Onur; And Adem, Kemal (2020) "A Hybrid model based on the convolutional neural network model and artificial bee colony or particle swarm optimization-based iterative thresholding for the detection of bruised apples," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 28: No. 1, Article 5.
P. N, P. R. K. G, P. Chanduru N M, K. N and N. V, "Fruit Disease Classification using Convolutional Neural Network," 2022 3rd International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2022, pp. 1052–1057, doi: 10.1109/ICESC54411.2022.9885440.
Q. Cheng, J. Li, G. Shen and Q. Du, "Digital Image Soil Analysis based on Machine Learning," 2021 7th Annual International Conference on Network and Information Systems for Computers (ICNISC), Guiyang, China, 2021, pp. 673–677, doi: 10.1109/ICNISC54316.2021.00127.
Jahanbakhshi A, Kheiralipour K (2020) Evaluation of image processing technique and discriminant analysis methods in postharvest processing of carrot fruit. Food Sci Nutr 8(7):3346–3352
Khojastehnazhand M, Mohammadi V, Minaei S (2019) Maturity detection and volume estimation of apricot using image processing technique. Sci Hortic 251:247–251
Unal Y, Taspinar YS, Cinar I et al (2022) Application of pre-trained deep convolutional neural networks for coffee beans species detection. Food Anal Methods 15:3232–3243
Adem, K., Ozguven, M.M. & Altas, Z. A sugar beet leaf disease classification method based on image processing and deep learning. Multimed Tools Appl 82, 12577–12594 (2023).
Koklu M, Ozkan IA (2020) Multiclass classification of dry beans using computer vision and machine learning techniques, Comput Electron Agricult 174: 105507
Hasan MM, Islam MU, Sadeq MJ A deep neural network for multi-class dry beans classification. In: 2021 24th international conference on computer and information technology (ICCIT), 2021, pp. 1–5.
Avuçlu, E., Taşdemir, Ş. & Köklü, M. A new hybrid model for classification of corn using morphological properties. Eur Food Res Technol 249, 835–847 (2023). https://doi.org/10.1007/s00217-022-04181-x
Murat Koklu, Ramazan Kursun, Yavuz Selim Taspinar, Ilkay Cinar, "Classification of Date Fruits into Genetic Varieties Using Image Analysis", Mathematical Problems in Engineering, vol. 2021, Article ID 4793293, 13 pages, 2021. https://doi.org/10.1155/2021/4793293
Koklu, M., Sarigil, S. & Ozbek, O. The use of machine learning methods in classification of pumpkin seeds (Cucurbita pepo L.). Genet Resour Crop Evol 68, 2713–2726 (2021). https://doi.org/10.1007/s10722-021-01226-0
Kılıçarslan, S. (2022). Kurum Üzüm Tanelerinin Sınıflandırılması İçin Hibrit Bir Yaklaşım. Mühendislik Bilimleri ve Araştırmaları Dergisi, 4 (1), 62–71. DOI: 10.46387/bjesr.1084590
He Y, Zhu T, Wang M, Lu H (2021) On lemon defect recognition with visual feature extraction and transfers learning. J Data Anal Inform Process 9:233–248
Sharma R, Kukreja V (2022) Amalgamated convolutional long term network (CLTN) model for lemon citrus canker disease multi-classification. In: 2022 International conference on decision aid sciences and applications (DASA) (pp. 326–329). IEEE.
Hernández, A., Ornelas-Rodríguez, F. J., Hurtado-Ramos, J. B., & González-Barbosa, J. J. (2021). Accuracy Comparison Between Deep Learning Models for Mexican Lemon Classification. In Telematics and Computing: 10th International Congress, WITCOM 2021, Virtual Event, November 8–12, 2021, Proceedings 10 (pp. 62–73). Springer International Publishing.
Pramanik, A., Khan, A. Z., Biswas, A. A., & Rahman, M. (2021, July). Lemon Leaf Disease Classification Using CNN-based Architectures with Transfer Learning. In 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT) (pp. 1–6). IEEE.
Jordan JB, Barnes CM, Manso LJ, Ekárt A, Faria DR (2022) Fruit quality and defect image classification with conditional GAN data augmentation, Scientia Horticulturae 293: 110684, ISSN 0304–4238
Yusuf Emir Koroglu, 2022, “Lemon Quality Dataset”, https://www.kaggle.com/datasets/yusufemir/lemon-quality-dataset
M. M. Shahriar Maswood, T. Hussain, M. B. Khan, M. T. Islam and A. G. Alharbi, "CNN Based Detection of the Severity of Diabetic Retinopathy from the Fundus Photography using EfficientNet-B5," 2020 11th IEEE Annual Information Technology, Electronics and Mobile Communication Conference (IEMCON), Vancouver, BC, Canada, 2020, pp. 0147–0150, doi: 10.1109/IEMCON51383.2020.9284944.
S. Wu, J. Wang, Y. Ping and X. Zhang, "Research on Individual Recognition and Matching of Whale and Dolphin Based on EfficientNet Model," 2022 3rd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), Xi’an, China, 2022, pp. 635–638, doi: 10.1109/ICBAIE56435.2022.9985881.
R. N. Lazuardi, N. Abiwinanda, T. H. Suryawan, M. Hanif and A. Handayani, "Automatic Diabetic Retinopathy Classification with EfficientNet," 2020 IEEE REGION 10 CONFERENCE (TENCON), Osaka, Japan, 2020, pp. 756–760, doi: 10.1109/TENCON50793.2020.9293941.
M. Tan, Q.V. Le, 2019, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, https://arxiv.org/abs/1905.11946
Y. Altaf, A. Wahid and M. M. Kirmani, "Deep Learning Approach for Sign Language Recognition Using DenseNet201 with Transfer Learning," 2023 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS), Bhopal, India, 2023, pp. 1–6, doi: 10.1109/SCEECS57921.2023.10063044.
P. Padhi and M. Das, "Hand Gesture Recognition using DenseNet201-Mediapipe Hybrid Modelling," 2022 International Conference on Automation, Computing and Renewable Systems (ICACRS), Pudukkottai, India, 2022, pp. 995–999, doi: 10.1109/ICACRS55517.2022.10029038.
A. D. J. Abadicio et al., "Ground-level Post-Disaster Image Classification using DenseNet201 for Disaster Damage Assessment," 2023 International Conference On Cyber Management And Engineering (CyMaEn), Bangkok, Thailand, 2023, pp. 132–137, doi: 10.1109/CyMaEn57228.2023.10050981.
A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” in International Conference on Learning Representations, 2021. [Online]. Available: https://openreview.net/forum?id=YicbFdNTTy
İ. Tuncel, A. Albayrak ve M. Akın, "Öz Dikkat Mekanizması Tabanlı Görü Dönüştürücü Kullanılarak Sıtma Parazit Tespiti", Dicle Üniversitesi Mühendislik Fakültesi Mühendislik Dergisi, c. 13, sayı. 2, ss. 271–277, Haz. 2022, doi:10.24012/dumf.1120289
M. A. -E. Zeid, K. El-Bahnasy and S. E. Abo-Youssef, "Multiclass Colorectal Cancer Histology Images Classification Using Vision Transformers," 2021 Tenth International Conference on Intelligent Computing and Information Systems (ICICIS), Cairo, Egypt, 2021, pp. 224–230, doi: 10.1109/ICICIS52592.2021.9694125.
M. T. Mali, E. Hancer, R. Samet, Z. Yıldırım and N. Nemati, "Detection of Colorectal Cancer with Vision Transformers," 2022 Innovations in Intelligent Systems and Applications Conference (ASYU), Antalya, Turkey, 2022, pp. 1–6, doi: 10.1109/ASYU56188.2022.9925335.
Liu, Z., Lin, Y., Cao, Y., Hu, H., Wei, Y., Zhang, Z., & Lin, S. (2021). Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv preprint arXiv:2103.14030.
Ze Liu, Han Hu, Yutong Lin, Zhuliang Yao, Zhenda Xie, Yixuan Wei, Jia Ning, Yue Cao, Zheng Zhang, Li Dong, Furu Wei, Baining Guo; Swin Transformer V2: Scaling Up Capacity and Resolution. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 12009–12019
L.-H. Li and R. Tanone, "Disease Identification in Potato Leaves using Swin Transformer," 2023 17th International Conference on Ubiquitous Information Management and Communication (IMCOM), Seoul, Korea, Republic of, 2023, pp. 1–5, doi: 10.1109/IMCOM56909.2023.10035609.
Powers, D. M. (2020). Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation. arXiv preprint arXiv:2010.16061.
Keskar, N. S., Mudigere, D., Nocedal, J., Smelyanskiy, M., & Tang, P. T. P. (2016). On large-batch training for deep learning: Generalization gap and sharp minima. arXiv preprint arXiv:1609.04836.
Novaković, J. D., Veljović, A., Ilić, S. S., Papić, Ž., & Tomović, M. (2017). Evaluation of classification models in machine learning. Theory and Applications of Mathematics & Computer Science, 7(1), 39.
Ferdinandy, B., Gerencsér, L., Corrieri, L., Perez, P., Újváry, D., Csizmadia, G., & Miklósi, Á. (2020). Challenges of machine learning model validation using correlated behaviour data: Evaluation of cross-validation strategies and accuracy measures. PloS one, 15(7), e0236092.
Hossin, M., & Sulaiman, M. N. (2015). A review on evaluation metrics for data classification evaluations. International journal of data mining & knowledge management process, 5(2), 1.

No competing interests reported.

Download PDF

Version 1

posted

You are reading this latest preprint version

Achieving High Accuracy in Lemon Quality Classification: A Comparative Study of Deep Learning and Transformer Models

Status:

Version 1

Abstract

Figures

1. Introduction

2. Material and Methods

2.1. Deep Learning Methods

2.2. Transformer Methods

3. Results and Discussions

3.1. Evaluation Criteria

4. CONCLUSION

References

Additional Declarations

Status:

Version 1