This study proposes two approaches to designing the DR grading system: a standard operating procedure (SOP) for preprocessing the fundus image, and a revised structure of ResNet-50, described in the following subsections. Finally, the DR grading system is implemented as a website, allowing users to check a fundus image themselves. Figure 1 shows the flowchart of the proposed DR grading system.
A. Dataset
In order to verify the accuracy that a deep learning system achieves on a DR dataset, the data must be divided into training, validation, and testing sets. The DR dataset from Kaggle (https://www.kaggle.com/competitions/aptos2019-blindness-detection/data) includes 35,126 fundus images, of which 25,805 are normal (without disease). Only 9,321 fundus images exhibit DR, which is divided into four stages [10]: mild nonproliferative diabetic retinopathy (NPDR), moderate NPDR, severe NPDR, and proliferative diabetic retinopathy (PDR). The imbalanced proportion of normal and DR images in big data has been identified as one of the main challenges for learning algorithms. It commonly causes overfitting [11]: high DR grading performance on the training data but low performance on the testing data.
Figure 2 shows the method used to select the training data from the 35,126 fundus images. The validation and testing data follow a similar selection method, each comprising 300 fundus images, none of which appear in the training data.
B. SOP for fundus image preprocessing
Preprocessing is a very important stage in image recognition; it can eliminate noise and variation in the retinal fundus image and improve the image's quality and contrast. Consequently, the trained models obtain more credible and accurate results. The steps of the proposed SOP for preprocessing fundus images (Fig. 1, step 1) are introduced in turn below.
1) Remove the black border of the fundus image:
The Kaggle dataset contains many types of fundus images, captured with different fundus photography equipment and under different conditions. For instance, the black border of the fundus image (Fig. 3 (a)) would affect the performance of DR grading. This study adopts the auto-cropping method [12] to crop out the uninformative black areas, as shown in Fig. 3 (b). The auto-cropping method, implemented by the crop_image_from_gray function [12], proceeds as follows:
- Convert the image (RGB format) to grayscale using the OpenCV library. A pixel value is 255 where the image is white, and 0 where it is black.
- Produce the clipping mask, which contains 0 and 1 values: where a pixel value > tolerance, the mask value is 1 (True); where a pixel value ≦ tolerance, the mask value is 0 (False), as shown in Fig. 3 (c). The default tolerance is 7.
- Find the rectangular area spanning the rows and columns that contain 1 values (red square in Fig. 3 (c)).
- Extract the rectangular area from the image (RGB format).
2) Create a circular crop around the center of the fundus image: After removing the black border of the fundus image, some of the information is also removed, as shown in Fig. 3 (b), and the fundus image is no longer circular. Even if we resize the fundus image (Fig. 3 (b)), the fundus will be deformed. In order to create a circular crop around the center of the fundus image, as shown in Fig. 3 (d), this study adopts the following processing steps:
- Find the height (H) and width (W) of the fundus image (H×W).
- Find the longest side (L), either the height or the width.
- Resize the fundus image (Fig. 3 (b)) to L×L.
- Produce the circular mask: a mask of radius L/2 centered on the image, whose value is one inside the circle and zero outside.
- Combine the fundus image (Fig. 3 (b)) with the circular mask using cv2.bitwise_and (OpenCV).
- Remove the black border of the fundus image again, as described above.
3) Assess quality of the fundus image: In order to obtain the most important features from fundus images, this study adopts the Eye-Quality (EyeQ) library [13, 14] to assess image quality with three labels: reject, usable, and good (Fig. 4). The EyeQ library was developed from the EyePACS dataset (https://www.kaggle.com/c/diabetic-retinopathy-detection) to provide fundus image quality assessment, using a multiple color-space fusion network (MCF-Net) based on DenseNet-121. Only usable- and good-quality fundus images are retained to train and test the performance of DR grading.
4) Equalize the histogram of the fundus image:
This study equalizes the histogram of the fundus image, changing the intensity distribution toward a uniform distribution to enhance the contrast and make features relatively clear. Histogram equalization is applied to the good- and usable-quality images of Fig. 4. An image in RGB format should be converted to YCrCb or HSV format before equalizing the histogram. In YCrCb format, Y is the luma component, and Cr and Cb are the red-difference and blue-difference chroma components. HSV format is an alternative representation of the RGB color model, with three components: hue, saturation, and value. This study converts the image from RGB (Fig. 5 (a)) to HSV format first, and then equalizes the histograms of the hue and value channels of the fundus image [15] (Fig. 5 (b)).
C. Revised structure of ResNet-50
To solve classification problems, many different types of ResNets are used, with different numbers of layers: specifically, 18, 34, 50, 101, and 152 layers [16]. The current deep learning framework for detecting and grading DR is ResNet-50 [8, 9]. However, ResNet-50 suffers from overfitting and fluctuations in accuracy, which limit its accuracy in detecting DR. This study proposes three strategies to improve the performance of ResNet-50, as follows:
1) Adaptive learning rate in ResNet-50: The learning rate is a particular issue in deep learning. A learning rate that is too high causes excessively large weight updates, so the model's performance oscillates over training epochs. A learning rate that is too low may cause training never to converge, or to get stuck in a local solution. Thus, this study adopts an adaptive learning rate for ResNet-50, as follows:
- Set the learning rate (\(lr = 0.01\)) and the factor (\(factor = 0.5\)).
- Set the lower bound of \(lr\), where \(lr > 0\).
- If the performance of ResNet-50 fails to change for two consecutive epochs, the learning rate is adjusted according to Eq. (1):

$${lr}^{\prime} = lr \times factor \quad (1)$$
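The rule above can be sketched in plain Python (names are illustrative; in Keras, the ReduceLROnPlateau callback with factor=0.5, patience=2, and a min_lr bound provides the same behavior):

```python
def adapt_lr(lr, history, factor=0.5, min_lr=1e-6, patience=2):
    """Apply Eq. (1): multiply lr by `factor` when the monitored metric
    has not improved over the last `patience` epochs, keeping lr > 0."""
    if len(history) > patience and max(history[-patience:]) <= max(history[:-patience]):
        lr = max(lr * factor, min_lr)      # lr' = lr * factor, bounded below
    return lr
```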
2) Regularization: Regularization can be employed to minimize the overfitting of the training model [17]. There are two common methods: L1 and L2 regularization. This study applies both L1 and L2 regularization, with kernel_regularizer, which applies a penalty on the layer’s kernel [18, 19], and activity_regularizer, which applies a penalty on the layer’s output [20].
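The penalty terms can be illustrated directly (a NumPy sketch of what Keras' kernel_regularizer=l1_l2(l1, l2) adds to the loss for a layer's kernel, and what activity_regularizer computes on its output; the coefficient values are illustrative):

```python
import numpy as np

def l1_l2_penalty(weights, l1=1e-4, l2=1e-4):
    """Combined L1/L2 penalty added to the training loss:
    l1 * sum(|w|) + l2 * sum(w^2)."""
    w = np.asarray(weights, dtype=float)
    return l1 * np.abs(w).sum() + l2 * np.square(w).sum()
```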
3) Obtain suitable features from conv5_block1_out and conv5_block2_out in ResNet-50: A visualization tool can be applied to observe the features in different layers of ResNet-50 [21–23]. In the conv5_block1_out and conv5_block2_out layers, Fig. 6 (a) and (b) show distinctive features that indicate the bleeding region in red. However, the bleeding region does not appear clearly in the final layer of ResNet-50 (Fig. 6 (c)). If the features of the two layers are combined (Fig. 6 (d)), the accuracy of DR grading should improve. Therefore, this study evaluates different operations for combining the features of conv5_block1_out with those of conv5_block2_out.
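As an illustration, such a combination can be an element-wise operation on two equally shaped feature maps (a NumPy sketch; in Keras, this corresponds to merging the outputs of conv5_block1_out and conv5_block2_out with an Add() layer, with the exact operation chosen empirically):

```python
import numpy as np

def combine_features(f1, f2):
    """Merge two same-shaped feature maps by element-wise addition
    followed by ReLU (one of several candidate operations)."""
    assert f1.shape == f2.shape
    return np.maximum(f1 + f2, 0.0)        # add, then ReLU
```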
D. Online DR grading system
This study adopts Python, HTML, and JavaScript for web development. Functions include a web application framework, sitemap management, and interactive web design. The online DR grading system is accessed via "POST," a request method supported by HTTP on the World Wide Web. Users can upload a fundus image through the online DR grading system; the trained model on the server grades the image to evaluate whether DR is present, and the results are then returned and shown on the website. Figure 7 shows the flowchart of the online DR grading system.
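A minimal sketch of the server-side POST endpoint, assuming Flask and a hypothetical grade_fundus() wrapper around the trained model (route and field names are illustrative):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def grade_fundus(image_bytes):
    # Placeholder for the trained model's prediction (0 = no DR ... 4 = PDR).
    return 0

@app.route("/grade", methods=["POST"])
def grade():
    f = request.files["fundus"]            # fundus image uploaded via HTTP POST
    stage = grade_fundus(f.read())         # server-side grading
    return jsonify({"stage": stage})       # result returned to the website
```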