3.1. Dataset description
The experiments in this study use the APTOS-19 dataset, a freely available Kaggle dataset that gained attention through its use in a competition on automatic blindness detection related to DR. The dataset consists of 3,662 fundus photographs, each fully labeled with one of five grades (0, 1, 2, 3, and 4) denoting the severity of DR, from 0 for a normal, healthy retina to 4 for the most severe stage. The grades are labeled normal (0), mild (1), moderate (2), severe (3), and proliferative (4). A sample of images from the dataset is shown in Fig. 2.
The dataset contains 1805 images of grade 0, 370 of grade 1, 999 of grade 2, 193 of grade 3, and 295 of grade 4. It is partitioned into a training set of 2930 images and a validation set of 732 images.
3.2. Pre-processing
The dataset includes images classified into five distinct classes based on the severity of DR. Table 1 presents the distribution of sample images among the classes, which is significantly imbalanced. Training deep neural networks on imbalanced data can result in biased classification. Figure 3 displays the pre-processing steps applied to the input images before they are fed into a machine-learning model. As part of the first pre-processing step illustrated in Fig. 3(a), each input image is resized to 337 × 224 (refer to Fig. 3(b)) while maintaining the aspect ratio, reducing the training overhead of the deep networks.
Up-sampling and down-sampling have also been employed to address the dataset imbalance (Roychowdhury et al., 2014). For up-sampling, the minority classes are augmented by randomly cropping 224 × 224 patches (refer to Fig. 3(c)), followed by flipping and 90° rotation (refer to Fig. 3(e)), to balance the sample distribution, enlarge the dataset, and mitigate overfitting. For down-sampling, redundant instances of the majority classes are removed to match the cardinality of the smallest class. In the resulting distributions (before flipping and rotation), each image is mean normalized (refer to Fig. 3(d)) to mitigate feature bias and accelerate training. The dataset is partitioned into training and validation subsets of 80% and 20%, respectively; the validation set is used during training to assess and mitigate overfitting. The learning rate is adjusted adaptively from 0.01 down to 0.0001, based on the observed improvement in validation loss, to prevent overfitting. Image augmentation is implemented with the Keras ImageDataGenerator, using a rescale factor of 1/255, shear and zoom ranges of 0.2, and horizontal and vertical flips enabled. The generator performs the data augmentation automatically on the fly during training.
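The crop/flip/rotate/mean-normalize steps above can be sketched in NumPy (a minimal illustration, not the authors' exact code; the Keras generator performs equivalent operations on the fly):

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img: np.ndarray) -> np.ndarray:
    """Random 224x224 crop, random flip, random 90-degree rotation,
    then per-channel mean normalization, as in Fig. 3(c)-(e)."""
    h, w, _ = img.shape
    # Random 224x224 crop from the resized image (Fig. 3(c)).
    top = rng.integers(0, h - 224 + 1)
    left = rng.integers(0, w - 224 + 1)
    patch = img[top:top + 224, left:left + 224]
    # Random horizontal flip and 90-degree rotation (Fig. 3(e)).
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    patch = np.rot90(patch, k=rng.integers(0, 4))
    # Mean normalization to mitigate feature bias (Fig. 3(d)).
    return patch - patch.mean(axis=(0, 1))

img = rng.random((337, 224, 3))  # stands in for a resized fundus image
out = augment(img)
print(out.shape)  # (224, 224, 3)
```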
Table 1
No. of samples of different classes present in the APTOS-19 dataset
Grade             | No DR | Mild | Moderate | Severe | Proliferative
Training images   | 1434  | 300  | 808      | 154    | 234
Validation images | 371   | 70   | 191      | 39     | 61
Total images      | 1805  | 370  | 999      | 193    | 295
3.3. Ensemble model
An ensemble method is a meta-algorithm that consolidates multiple machine-learning techniques into a single predictive model. Ensemble methods can serve various purposes, including reducing variance (bagging), reducing bias (boosting), or improving prediction accuracy (stacking). Stacking leverages multiple predictive models to create a new model by aggregating information from each of them. The stacked approach often performs better than any individual model because it smooths over their individual errors: it emphasizes the strengths of each base model where it excels while downplaying its weaknesses where it underperforms. Stacking achieves the best results when the base models are notably diverse from one another. To enhance the predictive performance of our model, we employed stacking, which is evident from the observed results.
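As a minimal illustration of stacking (using only the two classical base learners on synthetic data; the CNN members and the actual DRDEL ensembling network are omitted here), sklearn's `StackingClassifier` can combine base-model class probabilities under a meta-learner:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic 5-class data standing in for extracted fundus-image features.
X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
                           n_classes=5, random_state=0)

stack = StackingClassifier(
    estimators=[("svm", SVC(probability=True, random_state=0)),
                ("tree", DecisionTreeClassifier(random_state=0))],
    final_estimator=LogisticRegression(max_iter=1000),
    stack_method="predict_proba",  # meta-learner consumes class probabilities
)
stack.fit(X, y)
preds = stack.predict(X[:3])
print(preds.shape)  # (3,)
```

Setting `stack_method="predict_proba"` makes the meta-learner operate on the base models' class probabilities, mirroring the probability-level fusion used by DRDEL.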
As part of the proposed approach, ResNet50, DenseNet121, SqueezeNet1_0, SVM, and decision tree have been combined as base learners in an ensemble method. Let \({\Omega }\) = {ResNet50, DenseNet121, SqueezeNet1_0, SVM, decision tree} denote the set of base learners used in the experiment. Each learner is fine-tuned on a dataset (X; Y) of N fundus images X of size 224 × 224 and their corresponding labels \(Y = \{y \mid y \in \{0, 1, 2, 3, 4\}\}\), where the grades denote normal (0), mild (1), moderate (2), severe (3), and proliferative (4). The training set (Xtrain; Ytrain) is divided into mini-batches of size n = 8, denoted as (Xi; Yi) ∈ (Xtrain; Ytrain), i = 1, 2, …, \(\frac{N}{n}\), and each CNN model h ∈ \({\Omega }\) is iteratively optimized (fine-tuned) to minimize the empirical loss.
$$L\left(w,{X}_{i}\right)= \frac{1}{n}\sum _{x\in {X}_{i},\, y\in {Y}_{i}} l\left(h\left(x,w\right),y\right) \qquad \left(1\right)$$
In the equation, \(h\left(x,w\right)\) denotes the CNN model that predicts class y for input \(x\) given the weights w, and \(l(\cdot)\) is the categorical cross-entropy loss function. The training process updates the learning parameters using Nesterov-accelerated Adaptive Moment Estimation (Nadam) (Reyad et al., 2023).
$${w}_{t+1}= {w}_{t}-\frac{\alpha }{\sqrt{{\widehat{v}}_{t}}+\epsilon }\left({\beta }_{1}{\widehat{m}}_{t}+\frac{\left(1-{\beta }_{1}\right)\frac{\partial }{\partial {w}_{t}}L\left({w}_{t},{X}_{i}\right)}{1-{\beta }_{1}^{t}}\right) \qquad \left(2\right)$$
In the equation, \(\widehat{v}\), \(\alpha\), and \(\widehat{m}\) represent the second-order moment of the gradient, the learning rate, and the first-order moment, respectively. The decay rates of the moment estimates, denoted \({\beta }_{1}\) and \({\beta }_{2}\), are both initially set to 0.9. The Nesterov momentum term provides directional guidance for the next step and damps fluctuations. The initial weights \({w}_{t}\) at time t = 0 are set to the weights of the model h ∈ \({\Omega }\) learned through transfer learning (Yosinski et al., 2014). Each model h ∈ \({\Omega }\) applies softmax as the activation function of its output layer to produce the probabilities relating the input to the five classes (normal, mild, moderate, severe, and PDR). The learning rate \(\alpha\) starts at 0.01 and is reduced by a factor of 0.1 until it reaches 0.00001. Training runs for 50 epochs with early stopping to prevent overfitting.
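Equation (2) can be checked with a small NumPy implementation of one Nadam step (a sketch using the paper's β values of 0.9; the ε value and the toy quadratic objective are illustrative assumptions):

```python
import numpy as np

def nadam_step(w, grad, m, v, t, alpha=0.01, beta1=0.9, beta2=0.9, eps=1e-8):
    """One Nadam update of Eq. (2): bias-corrected first/second moments
    plus a Nesterov look-ahead term on the current gradient."""
    m = beta1 * m + (1 - beta1) * grad        # first-order moment
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-order moment
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w -= alpha / (np.sqrt(v_hat) + eps) * (
        beta1 * m_hat + (1 - beta1) * grad / (1 - beta1 ** t))
    return w, m, v

# Toy check: minimize f(w) = w**2 (gradient 2w) starting from w = 1.
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 101):
    w, m, v = nadam_step(w, 2 * w, m, v, t)
print(abs(w))  # close to the minimizer w = 0
```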
3.4. Algorithm
The image dataset is represented as \(I_{3662 \times (224 \times 224 \times 3)}\), and \(L_{3662 \times 1}\) holds the true labels of the corresponding image samples, with each \(L_i \in \{0,1,2,3,4\}\). I is further divided into two subsets, \(IT_{3112 \times (224 \times 224 \times 3)}\) and \(IV_{550 \times (224 \times 224 \times 3)}\), the training and validation sets of the proposed model, respectively. Similarly, \(LT_{3112 \times 1}\) and \(LV_{550 \times 1}\) hold the true labels of the training and validation sets. \(ST_{3112 \times 350}\), a fully structured training set for training the decision tree and SVM, is obtained by passing IT through a pre-trained VGG-16 network. The algorithm produces a hypothesis F that maps each \(IV_i\) to \(\{0,1,2,3,4\}\), yielding a \(550 \times 1\) prediction vector L′. The step-by-step algorithm of the DRDEL model is given in Fig. 4.
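The construction of the 350-attribute structured set ST can be sketched as follows (a hypothetical illustration: random features stand in for the VGG-16 activations, PCA is shown as one plausible reduction method since the exact one is not restated here, and sizes are shrunk for speed):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Stand-in for deep features extracted by a pre-trained VGG-16 from the
# training images IT: (n_samples, n_deep_features), shrunk for this demo.
deep_features = rng.random((400, 1000))

# Reduce to 350 selected attributes to form the structured training set ST,
# which is then used to fit the SVM and decision-tree base learners.
pca = PCA(n_components=350)
ST = pca.fit_transform(deep_features)
print(ST.shape)  # (400, 350)
```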
3.5. Initialization and parameter setting
The CNNs utilized in this context are pre-trained on the ImageNet dataset, while the built-in SVM and decision tree from the sklearn library serve as the remaining base learners. The DRDEL model is implemented in Python 3.6, and the CNNs are trained on Google Colab using an NVIDIA Tesla K80 GPU with a consistent batch size of 16. The Fastai library, compatible with Python 3.6, is used to manage the CNNs. The three CNNs are first trained for 20 epochs, and a learning rate versus training loss plot is generated for each network. After analyzing these plots, the networks are reloaded, unfrozen, and trained for an additional 10 epochs with learning rates adjusted according to the "loss vs. learning rate" graphs. This process continues until each CNN has reached 50 epochs. Following this approach, four sub-models are obtained per CNN: the first trained for 20 epochs, and the second, third, and fourth trained for an additional 10 epochs each (30, 40, and 50 epochs in total). The training loss is assessed for each sub-model, and the best-performing sub-model of each CNN is chosen for the subsequent prediction and ensembling tasks. The chosen sub-models are slightly modified to output the prediction probabilities of each class for each sample.
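The staged schedule can be summarized programmatically (the per-stage learning rates are illustrative assumptions; in practice each is re-chosen from the corresponding loss-versus-learning-rate plot):

```python
# Staged training: 20 epochs first, then three 10-epoch continuations,
# yielding four sub-models per CNN at 20, 30, 40, and 50 total epochs.
stages = [(20, 1e-2), (10, 1e-3), (10, 1e-4), (10, 1e-5)]

sub_models = []
total_epochs = 0
for epochs, lr in stages:
    total_epochs += epochs
    # Each entry marks a checkpoint from which a sub-model is saved.
    sub_models.append({"total_epochs": total_epochs, "stage_lr": lr})

print([s["total_epochs"] for s in sub_models])  # [20, 30, 40, 50]
```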
Features were extracted from the 3112 training images, each of size 224 × 224, to train the multiclass SVM and multiclass decision tree. After dimensionality reduction of the 3112 samples with 21,055 attributes, a set of 350 selected attributes is used to train the two multiclass classifiers. Each classifier's performance is evaluated for statistical analysis, and prediction probabilities are computed in the same manner as for the CNN sub-models. The class-prediction probabilities are fed into the input layer of a neural network trained to calculate impact weights; this network, which performs the final ensemble task, updates the impact weights via backpropagation. It is trained on a separate labeled dataset of 3112 samples and 25 attributes, where the attributes are the class-wise probabilities assigned to each sample by the five learners. The model's accuracy stabilized after 150 epochs, and DRDEL has been evaluated on the validation set of 550 samples.
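A minimal sketch of this final ensembling step, assuming a small fully connected network over the 25 probability attributes (5 learners × 5 classes); synthetic probabilities stand in for the real base-learner outputs, and sklearn's `MLPClassifier` stands in for the authors' network:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 600

# Synthetic stand-in: 5 base learners x 5 class probabilities per sample.
y = rng.integers(0, 5, size=n)
probs = rng.random((n, 5, 5))
# Make each learner's probabilities weakly peak at the true class.
probs[np.arange(n), :, y] += 2.0
probs /= probs.sum(axis=2, keepdims=True)
X = probs.reshape(n, 25)  # the 25-attribute meta-dataset

# Backprop-trained meta-learner combining the base predictions.
meta = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
meta.fit(X, y)
print(round(meta.score(X, y), 2))
```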