In this study, we aimed to develop and validate a generalized glaucoma screening model. The best-performing model achieved promising accuracy, highlighting the potential of our approach to transform glaucoma screening. To the best of our knowledge, no prior study has attempted to use extensive glaucomatous fundus data spanning diverse demographics and ethnicities, with images captured by various fundus cameras at different resolutions. Most deep-learning-based models were trained on a limited dataset of healthy and glaucomatous fundus images from a single institution, which made them non-generalizable across populations and settings.9 Our training dataset included 7,498 glaucoma cases and 10,869 healthy cases gathered from 19 different datasets. This dataset, one of the most extensive collections of fundus images ever used to develop a generalized glaucoma screening model, represents a wide range of ethnic groups and fundus cameras, which could improve our model's performance and make it more applicable globally. Because choosing the right deep-learning architecture for a specific task is critically important,57 the best-performing model was selected from 20 pre-trained models (eFigure 2 in the Supplementary). Pairing this extensive, diverse dataset with a well-suited deep-learning architecture can enhance the model's generalizability, making it a versatile and practical tool for glaucoma screening in diverse populations.
Our best-performing model exhibited exceptional discriminative ability between glaucomatous and healthy discs, indicating that it learned glaucomatous features from heterogeneous data. The vgg19_bn attained an AUROC of 99.2%, exceeding the reported performance of ophthalmologists (82.0%) and previous deep-learning systems (97.0%),58 demonstrating its potential for practical use in glaucoma screening. Li et al. trained and validated their model on 31,745 and 8,000 fundus images, respectively.59 Their model performed exceptionally well, achieving an AUC of 0.986 with a sensitivity of 95.6%, a specificity of 92.0%, and an accuracy of 92.9% for identifying referable glaucomatous optic neuropathy. Our model maintained balance across all performance metrics, as shown in Table 2, for both glaucoma and healthy cases. The model was unbiased towards any particular class, making it a reliable screening tool across wider populations.
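To illustrate how the metrics discussed above relate to one another, a minimal plain-Python sketch computing sensitivity, specificity, and a rank-based AUROC; the scores and labels below are hypothetical examples, not our study data:

```python
def confusion_counts(labels, preds):
    """Count TP, FP, TN, FN, treating 1 = glaucoma as the positive class."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    return tp, fp, tn, fn

def auroc(labels, scores):
    """Rank-based AUROC (Mann-Whitney U): the probability that a random
    glaucoma case scores higher than a random healthy case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical model scores (predicted probability of glaucoma) and labels.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.92, 0.85, 0.40, 0.30, 0.10, 0.55]
preds = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(labels, preds)
sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
```

A balanced model, as reported in Table 2, keeps sensitivity and specificity close to each other rather than trading one for the other.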
Furthermore, we implemented the DenseNet201, ResNet101, and DenseNet161 architectures (eTable 2 in the Supplementary). DenseNet201 demonstrated a classification accuracy of 96%, with an AUROC of 99%. Steen et al. employed the same DenseNet201 architecture, but their model achieved an accuracy of 87.60%, precision of 87.55%, recall of 87.60%, and an F1 score of 87.57% on publicly available datasets containing 7,299 non-glaucoma and 4,617 glaucoma images.60 A unique strength of our study is its balance of sensitivity and specificity, evident from our model's high AUROC values, which is a significant advantage in real-world clinical settings. Many previous models had difficulty maintaining this balance, resulting in high false-positive or false-negative rates.15,16,61
Liu et al. trained a CNN algorithm for automatically detecting glaucomatous optic neuropathy using a massive dataset of 241,032 images from 68,013 patients, and the model's performance was impressive.62 However, the model struggled with multiethnic data (n = 7,877) and images of varying quality (n = 884), showing drops in AUC of 7.3% and 17.3%, respectively. In comparison, our model showed a modest decline in accuracy, approximately 9.6%, when tested on the DRISHTI-GS dataset. We suspect that part of this performance shift stems from inconsistencies and the lack of a clearly defined protocol for glaucoma classification across the publicly available datasets. We discovered specific variances in the classification criteria for glaucoma within the datasets (Fig. 2), which may have contributed to the drop in accuracy. Despite this, the model's accuracy remained high, indicating strong generalization capability. Nevertheless, evaluating the model's performance across additional datasets would further confirm its reliability and generalizability.
Investigating our model's top losses yielded two significant insights. First, the model did not perform well on borderline cases, suggesting a need for advanced training techniques to handle such intricacies. Second, we identified potential mislabeling of fundus images within our dataset. Such mislabeling could introduce confusion during the model's learning phase, thereby degrading performance. Both findings highlight the need for robust data quality checks and expert verification during dataset preparation. To improve the generalizability of the CNN model for glaucoma screening, accurate labeling by glaucoma experts based on clinical tests should take priority over simply expanding fundus data from multiethnic populations.
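The top-loss analysis above amounts to ranking validation images by their per-example loss; confidently wrong predictions surface likely mislabels, while losses near the decision boundary flag borderline cases. A minimal sketch in plain Python (the filenames, probabilities, and labels below are hypothetical):

```python
import math

def cross_entropy(prob_pos, label):
    """Per-example binary cross-entropy: high loss means the model was
    confidently wrong (possible mislabel) or very uncertain (borderline)."""
    p = prob_pos if label == 1 else 1.0 - prob_pos
    return -math.log(max(p, 1e-12))  # clamp to avoid log(0)

def top_losses(items, k=3):
    """Return the k highest-loss (filename, loss) pairs for expert review."""
    scored = [(name, cross_entropy(p, y)) for name, p, y in items]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

# Hypothetical validation items: (filename, predicted P(glaucoma), label).
items = [
    ("img_001.png", 0.97, 1),
    ("img_002.png", 0.04, 0),
    ("img_003.png", 0.95, 0),  # confident but "wrong": candidate mislabel
    ("img_004.png", 0.55, 1),  # near-boundary prediction: borderline case
]
worst = top_losses(items, k=2)
```

Images surfaced this way can then be routed to glaucoma experts for relabeling or exclusion before retraining.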
We explored the decision-making process of our deep-learning model by employing Grad-CAM to create heatmaps for the input fundus images. These heatmaps highlighted the regions of the fundus images that the model weighed when determining the presence or absence of glaucoma. Interestingly, the model's areas of emphasis align well with those that ophthalmologists would typically examine, such as the optic disc and cup, strengthening the clinical relevance of our model. These visual insights add a layer of transparency to our deep-learning model and provide a key link between automated classification and clinical understanding. Insights from the Grad-CAM heatmaps will be invaluable for ensuring that the model's decision-making process correlates with the clinical indicators of glaucoma, which can build clinicians' trust in these algorithms and allow wide adoption in clinical practice.
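The core Grad-CAM computation can be sketched as follows, assuming the last convolutional layer's activations and the gradients of the glaucoma score with respect to them have already been extracted via hooks; the toy NumPy tensors below stand in for real feature maps:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Combine a conv layer's activations (C, H, W) with the gradients of
    the class score w.r.t. those activations into an H x W heatmap.
    Per Grad-CAM, channel weights are the global-average-pooled gradients."""
    weights = gradients.mean(axis=(1, 2))             # (C,) channel importances
    cam = np.tensordot(weights, activations, axes=1)  # weighted sum -> (H, W)
    cam = np.maximum(cam, 0.0)                        # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1] for overlay
    return cam

# Toy tensors standing in for hooked feature maps and their gradients.
rng = np.random.default_rng(0)
acts = rng.random((8, 7, 7))   # channels x height x width
grads = rng.random((8, 7, 7))
heatmap = grad_cam(acts, grads)
```

In practice the resulting low-resolution map is upsampled to the fundus image size and overlaid, so regions such as the optic disc and cup light up when they drive the prediction.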
Although our study demonstrates promising results, there are several limitations. First, we observed that our dataset contained mislabeled fundus images, which could impact our model's learning process and accuracy. We employed a data-cleaning procedure to address this, removing 1,306 images using ImageClassifierCleaner.29 This process yielded a cleaner and more reliable dataset, on which we re-trained our model; the refinement considerably enhanced the model's robustness and improved its ability to generalize to unseen data. Second, we observed that class imbalance could reduce the model's effectiveness; we therefore applied class-weight balancing techniques. Furthermore, the data augmentation techniques used during training may produce images that differ from actual clinical images. Next, our Grad-CAM heatmaps indicated that the model occasionally focused on non-relevant regions when making classification decisions, implying that it might be learning from noise or artifacts within the images; in most cases, however, the heatmaps confirmed that the model based its predictions on clinically interpretable features. Finally, our model's external validation was conducted solely on the DRISHTI-GS dataset. We also acknowledge that glaucomatous fundus data from the African continent were publicly unavailable for our model's training and validation (eFigure 1 in the Supplementary). Incorporating glaucoma datasets from African countries could further enhance our model's generalizability, especially in under-resourced areas. Future studies should aim to validate the model across multiple datasets, diverse populations, and varied imaging devices to ensure broader applicability. Additionally, our model did not integrate clinical data, such as patients' glaucoma history, IOP measurements, or visual field data, which could further enhance its predictive capabilities.
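The class-weight balancing mentioned above can be illustrated with inverse-frequency weights derived from our class counts; this particular weighting scheme is a common choice and is shown here as an assumption, not necessarily the exact formula used:

```python
def class_weights(counts):
    """Inverse-frequency weights: weight_c = N / (K * n_c), so the rarer
    class contributes proportionally more to a weighted cross-entropy loss."""
    total = sum(counts.values())   # N: total training images
    k = len(counts)                # K: number of classes
    return {c: total / (k * n) for c, n in counts.items()}

# Class counts from our training set: 7,498 glaucoma, 10,869 healthy.
weights = class_weights({"glaucoma": 7498, "healthy": 10869})
```

The minority (glaucoma) class receives a weight above 1 and the majority class a weight below 1, discouraging the model from simply favoring the more frequent class.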
Despite these limitations, the potential of our refined model for automated glaucoma screening remains significant and provides exciting prospects for future enhancements.
Our study used fundus images to establish a robust computer vision model for glaucoma screening. The best-performing model achieved high values across multiple evaluation metrics for both glaucoma and healthy cohorts, demonstrating its robustness. Our approach promises a fast, cost-effective, and highly accurate tool that can assist ophthalmologists and optometrists in the decision-making process, ultimately improving patient outcomes and reducing the socioeconomic burden of glaucoma. However, the model's accuracy dropped when evaluated on unseen data, indicating potential inconsistencies among the datasets; the model therefore needs to be refined and validated on larger, more diverse datasets to ensure reliability and generalizability. Prospective work will involve validating the model across different datasets, integrating clinical data, and refining the model's architecture.