2.1 VARIABLES
- Independent Variables: artificial intelligence-based code; retinal fundus images (raw data)
- Dependent Variables: accuracy, sensitivity, specificity
- Control Variables: programming language (Python); software frameworks (Keras, PyTorch, TensorFlow)
2.2 DATASETS
RIM-ONE (Figure 4).
RIM-ONE is an open retinal fundus image dataset with accurate gold standards of the optic nerve head provided by different experts. It includes images from healthy eyes as well as from eyes with glaucoma at different stages. A variety of measurements by zones of the optic disc is also provided for validation purposes. Three Spanish hospitals contributed to the development of this dataset: Hospital Universitario de Canarias, Hospital Clínico San Carlos, and Hospital Universitario Miguel Servet. The dataset contains 485 stereo eye fundus images with a resolution of 2144 x 1424. Two sets of ground truths for the optic disc and optic cup are available. The first set is commonly used for training and testing; the second set, which acts as a "human" baseline, was produced with DCSeg, a tool for optic disc and cup segmentation of stereo and monocular retinal fundus images.
Drishti-GS (Figure 4).
Drishti-GS is an online retinal fundus image dataset for glaucoma analysis and the study of optic nerve head segmentation. It provides manual segmentations of the optic disc and optic cup, along with the CDR and a label for each image as glaucomatous or healthy. The dataset contains 101 eye fundus images with varying resolutions. Different imaging modalities, such as optical coherence tomography, Heidelberg retina tomography, and fundus imaging, were used to assess glaucoma. Aravind Hospital contributed to the development of this dataset.
Table 2
Overview of datasets used for evaluating methods

Dataset    | Total Size | Healthy | Glaucomatous | Train | Validate | Test
RIM-ONE    | 485        | 313     | 172          | 296   | 21       | 168
Drishti-GS | 101        | 31      | 70           | 48    | 4        | 49
2.3 PROCEDURES
Convolutional neural networks, a class of artificial neural networks, are applied to analyze visual imagery.
2.3.1 Data Partitioning
The first stage, data partitioning, divides the dataset into training and test sets. Specifically, 3/5 of the circumpapillary images constitute the training set, while the remaining 2/5 form the test set.
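The 3/5-2/5 split described above can be sketched as follows. This is a minimal illustration only: the shuffling strategy and the fixed seed are assumptions, not details from the study.

```python
import random

def partition(images, train_fraction=3 / 5, seed=42):
    """Shuffle the image list and split it into training and test sets."""
    rng = random.Random(seed)  # fixed seed: an assumption, for reproducibility
    indices = list(range(len(images)))
    rng.shuffle(indices)
    cut = round(len(images) * train_fraction)
    train = [images[i] for i in indices[:cut]]
    test = [images[i] for i in indices[cut:]]
    return train, test

# With the 485 RIM-ONE images, 3/5 gives 291 training and 194 test images.
train_set, test_set = partition(list(range(485)))
print(len(train_set), len(test_set))  # 291 194
```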
2.3.2 Segmentation
A U-Net-like architecture was used to learn pixel-level features. The U-Net was modified to accept multiple inputs so that the network receives more of the original raw pixel information during training. In this way, the risk of overfitting was reduced and the network's learning capability was enhanced. Hence, with specific training, the technology was able to identify glaucoma-affected eyes pixel-wise (Figure 3).
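One common way to realize a multi-input U-Net is to feed downsampled copies of the raw image into each encoder level, so every scale sees original pixel information. The sketch below illustrates only that input-pyramid idea in NumPy; the actual architecture details (number of levels, how the copies are concatenated) are not specified in the source, so this is an assumed construction.

```python
import numpy as np

def avg_pool2(img):
    """2x2 average pooling: halves each spatial dimension."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def input_pyramid(img, levels=4):
    """Build downsampled copies of the raw image, one per encoder level,
    so raw pixel information can be injected at every scale."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2(pyramid[-1]))
    return pyramid

fundus = np.random.rand(256, 256)  # stand-in for one fundus image channel
scales = input_pyramid(fundus, levels=4)
print([p.shape for p in scales])  # [(256, 256), (128, 128), (64, 64), (32, 32)]
```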
2.3.3 Regression
The segmentation task is treated as image regression rather than as a pixel classification problem. Deep learning usually transforms low-level pixel information into high-level features, but for segmentation tasks, low-level pixel-wise features are more important. In contrast to learning to classify the pixels, directly mapping a retinal image to its corresponding label keeps more low-level pixel-wise features.
2.3.4 Loss function
Due to the major pixel-wise similarities among training images, the mean absolute error (MAE) was adopted as the loss function to measure the pixel-wise difference between the label and the prediction.
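The MAE loss above has a standard definition: the mean of the absolute pixel-wise differences between label and prediction. A minimal NumPy version:

```python
import numpy as np

def mae_loss(label, prediction):
    """Mean absolute error: average of pixel-wise |label - prediction|."""
    label = np.asarray(label, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    return np.abs(label - prediction).mean()

# Toy 2x2 "images": absolute differences are 1, 1, 2, 0, so the mean is 1.0.
label = np.array([[0.0, 1.0], [2.0, 0.0]])
pred = np.array([[1.0, 2.0], [0.0, 0.0]])
print(mae_loss(label, pred))  # 1.0
```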
2.3.5 Classification
After training the algorithm using pixel-wise segmentation, classification was applied so that the technology could identify and distinguish glaucomatous eyes. To do so, the optic cup/disc region and its surrounding area, which contain key pixel-wise features such as the vertical disc diameter, the oval shape of the disc/cup, the ISNT rule, and the yellow-orange rim, were used, as they provide accurate results when distinguishing glaucoma.
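One of the features listed above, the vertical cup-to-disc ratio, can be computed directly from the binary cup and disc segmentation masks. The sketch below is illustrative (the function names and synthetic masks are assumptions, not the study's code):

```python
import numpy as np

def vertical_diameter(mask):
    """Vertical extent (in pixels) of a binary mask."""
    rows = np.any(mask, axis=1)  # which rows contain at least one foreground pixel
    if not rows.any():
        return 0
    idx = np.where(rows)[0]
    return int(idx[-1] - idx[0] + 1)

def vertical_cdr(cup_mask, disc_mask):
    """Vertical cup-to-disc ratio; larger values are associated with glaucoma."""
    return vertical_diameter(cup_mask) / vertical_diameter(disc_mask)

# Synthetic masks: a 40-pixel-tall disc containing a 28-pixel-tall cup.
disc = np.zeros((100, 100), dtype=bool)
cup = np.zeros((100, 100), dtype=bool)
disc[30:70, 30:70] = True
cup[36:64, 40:60] = True
print(vertical_cdr(cup, disc))  # 0.7
```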
Glaucoma Diagnosis Using CNNs versus a Traditional Method: A Comparative Study
Additionally, a comparative study was conducted to compare the CNN-based detection method with the traditional testing methods used at King Abdullah Medical City in Saudi Arabia and at other tertiary care setups.
Table 3
CNN vs Traditional Methods Comparative Study

Approach | Time | Accuracy | Accessibility | MAE | Mean OC Dice | Mean OD Dice | Efficiency
CNN-based Method | 30 seconds to 1 minute | 98.9% | Accessible to patients worldwide, offering sufficient help to those in remote areas | 0.0012 | 0.978 | 0.996 | Highly efficient: provides early, accessible, and accurate diagnosis
Traditional Method | Varies for primary glaucoma; approximately 20 to 45 minutes | 86.2% | Most facilities are available only in tertiary care setups, not in primary or secondary hospitals | 1.03 | 0.743 | 0.813 | Efficient, but somewhat unreliable for early diagnosis, as it is prone to clinician bias or fatigue; can also be inaccessible to certain demographics in remote areas
2.3.6 Data Preprocessing
First, the variance between training and validation images was reduced by cropping 600 x 600 ROI (region of interest) patches using the previously trained localization models. This processing allows the model to focus on learning the necessary pixel-wise information needed for accurate prediction (Figure 5).
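The 600 x 600 ROI cropping step can be sketched as follows. In practice the crop center would come from the pre-trained localization model; here the center coordinates are an assumption for illustration, and the crop is clamped to stay inside the image.

```python
import numpy as np

def crop_roi(image, center, size=600):
    """Crop a size x size patch around `center`, clamped to the image bounds."""
    h, w = image.shape[:2]
    cy, cx = center
    half = size // 2
    top = min(max(cy - half, 0), max(h - size, 0))
    left = min(max(cx - half, 0), max(w - size, 0))
    return image[top:top + size, left:left + size]

# Full-resolution RIM-ONE image (2144 x 1424); the disc center here is assumed.
image = np.zeros((1424, 2144, 3), dtype=np.uint8)
patch = crop_roi(image, center=(700, 1000))
print(patch.shape)  # (600, 600, 3)
```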
2.3.7 Data Augmentation
Augmentation techniques, such as image rotation (90/180/270 degrees) and image flipping, were also used to increase the number of training images. This increase is necessary to ensure the network's receptive field is sufficient and accurate. For the classification task, the ROIs were cropped for training and testing; the cropped regions were then resized to several different sizes for multiple deep learning training runs, and the resulting predictions were averaged to obtain the final (most accurate) prediction. The training platforms used were Python 2.7, TensorFlow, PyTorch, and Keras.
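The rotation and flipping augmentations described above can be sketched in a few lines of NumPy. Combining the four rotations (0/90/180/270 degrees) with a horizontal flip of each yields eight variants per training image; whether the study used all eight combinations is an assumption.

```python
import numpy as np

def augment(image):
    """Generate rotated (0/90/180/270 degrees) and flipped copies of an image."""
    variants = [np.rot90(image, k) for k in range(4)]  # the four rotations
    variants += [np.fliplr(v) for v in variants]       # horizontal flip of each
    return variants

patch = np.arange(16).reshape(4, 4)
augmented = augment(patch)
print(len(augmented))  # 8
```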
For segmentation, on the training set the mean Optic Cup Dice is 0.900, the mean Optic Disc Dice is 0.937, and the MAE of the CDR is 0.0012. On the validation set, the mean Optic Cup Dice is 0.978, the mean Optic Disc Dice is 0.996, and the MAE of the CDR is 0.0043. On the testing set, the mean Optic Cup Dice is 0.989, the mean Optic Disc Dice is 0.999, and the MAE of the CDR is 0.0026; best online rank: 4th. For classification, on the training set the AUC is 1.0 and the sensitivity is 1.0; on the validation set, the AUC is 0.9803 and the sensitivity is 0.99; on the testing set, the AUC is 1.0 and the sensitivity is 0.99; latest online rank: 1st (Figure 6).
2.3.8 Data Analysis
The data were assessed and validated for accuracy, sensitivity, and precision, and the MAE function was used to assess potential errors. For automated localization of the optic disc, the model was trained for 100,000 iterations on the images previously employed for training and validation. Once trained and evaluated on the Drishti-GS and RIM-ONE datasets, the model was also tested on other publicly available databases, and the results were compared with state-of-the-art methods developed specifically for those datasets. The results highlight the comparative performance of the fully automated method against state-of-the-art algorithms. The accuracies of the methods discussed in this paper are taken at 70% Intersection over Union (IoU), whereas the results reported by the state-of-the-art algorithms are for IoU > 0.

This diagnostic approach performed significantly better than existing methods, which means it was able to learn a discriminative representation of the OD. It should be noted that most existing methods are designed with a particular dataset in focus, whereas the algorithm proposed in this work was not designed to conform to any specific dataset; nevertheless, its performance is superior to methods tailored specifically for those individual datasets. Accuracy alone, however, does not portray the true performance of the algorithm, so other performance metrics, such as precision and sensitivity, also need to be assessed. Mathematical definitions of all these performance metrics are given below.

The results reveal a significant improvement in the accuracy, sensitivity, and specificity of glaucoma diagnosis: traditional testing methods achieved 72.3% accuracy, 73.1% sensitivity, and 82.1% specificity, while this diagnostic approach showed 98.9% accuracy, 98.8% sensitivity, and 99.1% specificity.
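The metrics above have standard definitions in terms of true/false positives and negatives. A minimal Python sketch (the counts in the example are illustrative, not results from the study):

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    """True positive rate: fraction of glaucomatous eyes correctly detected."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """True negative rate: fraction of healthy eyes correctly identified."""
    return tn / (tn + fp)

# Illustrative counts: 95 true positives, 99 true negatives,
# 1 false positive, 5 false negatives.
print(accuracy(95, 99, 1, 5))  # 0.97
print(sensitivity(95, 5))      # 0.95
print(specificity(99, 1))      # 0.99
```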
Graphs: