Optical coherence tomography (OCT) has become a valuable imaging tool in medical diagnostics, especially in ophthalmology, providing detailed cross-sectional images of biological tissues. The increasing availability of OCT data has spurred interest in efficient methods for OCT image classification to improve diagnostic accuracy and streamline clinical decision-making. Classifying OCT images is challenging due to intricate structural patterns and subtle tissue variations. Researchers have explored a range of approaches, including traditional machine learning methods, deep learning architectures, and hybrid models, to distinguish between normal and pathological conditions. Early work relied on handcrafted feature extraction to derive relevant information from OCT images for training classifiers. More recently, deep learning methods, particularly convolutional neural networks (CNNs), have gained traction for their ability to learn hierarchical representations automatically from raw OCT data.
Wali et al. [22] conducted experiments on an OCT dataset, exploring various built-in feature extraction methods across three distinct machine learning classifiers. Their study aimed to identify the combination of feature extraction method and classifier that categorizes OCT images most accurately, testing methods that capture texture, shape, and other distinctive features. They found that pairing histogram of oriented gradients (HOG) features with a support vector machine (SVM) classifier yielded superior performance compared to the other combinations. The SVM is a powerful classification algorithm known for finding the optimal hyperplane that best separates classes in a high-dimensional feature space. This suggests that HOG features combined with SVM classification were particularly well suited to the characteristics of the OCT dataset, producing 78.8% accuracy and reliable categorization of the images.
Srinivasan et al. [23] applied machine learning techniques to the diagnosis and analysis of age-related macular degeneration (AMD) and diabetic macular edema (DME) in OCT images, extracting HOG features and using SVMs for classification and recognition.
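The HOG-plus-SVM pipeline used in [22] and [23] can be illustrated with a short sketch. The example below, using scikit-image and scikit-learn, substitutes random placeholder data for real labeled OCT scans; the image size, HOG parameters, and linear kernel are illustrative assumptions rather than the settings of the cited studies.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def hog_features(images, size=(128, 128)):
    """Resize each grayscale scan and compute its HOG descriptor."""
    feats = []
    for img in images:
        img = resize(img, size, anti_aliasing=True)
        feats.append(hog(img, orientations=9, pixels_per_cell=(16, 16),
                         cells_per_block=(2, 2), feature_vector=True))
    return np.array(feats)

# Placeholder data standing in for labeled grayscale OCT scans.
rng = np.random.default_rng(0)
images = rng.random((200, 128, 128))
labels = rng.integers(0, 2, 200)

X_train, X_test, y_train, y_test = train_test_split(
    hog_features(images), labels, test_size=0.2, random_state=0)
clf = SVC(kernel="linear").fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```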
The study by Schmidt-Erfurth et al. [24] compared unsupervised and supervised learning methods for binary classification in patients with macular degeneration and diabetic retinopathy, applying deep learning to a dataset of approximately 20,000 images and achieving up to 97% accuracy in distinguishing healthy individuals from those with AMD. The team employed a deep denoising autoencoder (DDAE) trained on healthy samples to identify features that differentiate normal tissue from anomalies in spectral-domain OCT (SD-OCT) scans, used SVMs to model the probability distribution of normal data, and applied a clustering technique to detect inconsistencies in the data. Retinal experts then evaluated the identified categories: some matched known retinal structures, while others represented novel anomalies not previously associated with known structures. These novel categories were also linked to disease, highlighting the potential of such methods for disease detection and classification.
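The anomaly-detection idea behind the DDAE can be sketched as follows: an autoencoder is trained to reconstruct clean healthy patches from noisy versions, and inputs that reconstruct poorly are flagged as anomalous. The flattened 32x32 patches, layer sizes, and noise level below are illustrative assumptions, not the architecture of [24].

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

# Placeholder data: flattened 32x32 patches from healthy scans only.
x_healthy = np.random.rand(1000, 32 * 32).astype("float32")
x_noisy = np.clip(x_healthy + 0.1 * np.random.randn(1000, 32 * 32), 0, 1)
x_noisy = x_noisy.astype("float32")

# Denoising autoencoder: reconstruct clean patches from noisy inputs.
dae = models.Sequential([
    layers.Input(shape=(1024,)),
    layers.Dense(256, activation="relu"),
    layers.Dense(64, activation="relu"),    # bottleneck representation
    layers.Dense(256, activation="relu"),
    layers.Dense(1024, activation="sigmoid"),
])
dae.compile(optimizer="adam", loss="mse")
dae.fit(x_noisy, x_healthy, epochs=10, batch_size=64, verbose=0)

# A high reconstruction error marks tissue that deviates from the learned
# "healthy" manifold, i.e., a potential anomaly.
def anomaly_score(x):
    return np.mean((dae.predict(x, verbose=0) - x) ** 2, axis=1)
```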
Lee et al. [25] developed a 21-layer convolutional neural network (CNN) for grading AMD, achieving 93% accuracy in binary classification between AMD and normal cases. Kermany et al. [26] presented a CNN solution based on the Inception V3 model with transfer learning, reaching 96.6% accuracy on a dataset of approximately 84,000 samples categorized into drusen, choroidal neovascularization (CNV), and DME classes. Huang et al. [27] introduced a CNN-based classification approach for distinguishing normal retina, CNV, DME, and drusen, achieving an accuracy of 89.9%. Another study by Huang et al. [28] proposed a method based on the Inception V3 model. Tsuji et al. [29] reported high accuracies of 99.6% and 99.8% using a capsule network and Inception V3, respectively. Prabhakaran introduced the OctNET model, which achieved 99.69% accuracy on the Kermany database with a relatively lightweight architecture suitable for fast computation. In a separate study, Latha and Aruna Priya [30] assessed the efficacy of various machine learning classifiers for detecting glaucoma in retinal images; their approach integrated Gabor transforms with computationally efficient classification, using support vector machine (SVM), neural network (NN), and adaptive neuro-fuzzy inference system (ANFIS) classifiers to evaluate the glaucoma retinal image classification system.
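Several of the studies above rely on transfer learning from Inception V3. A minimal sketch of that setup, assuming ImageNet weights, a frozen convolutional base, and the four Kermany classes (CNV, DME, drusen, normal), might look as follows; the head layers and hyperparameters are illustrative, not those of the cited works.

```python
import tensorflow as tf
from tensorflow.keras import layers, models
from tensorflow.keras.applications import InceptionV3

# ImageNet-pretrained convolutional base, frozen during initial training.
base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(299, 299, 3), pooling="avg")
base.trainable = False

# New classification head for the four OCT classes.
model = models.Sequential([
    base,
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(4, activation="softmax"),   # CNV, DME, drusen, normal
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)  # with a tf.data pipeline
```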
Roychowdhury et al. [31] developed a method to localize cysts in OCT images of patients with DME. The approach identified six subretinal layers in each image via an iterative high-pass filtering technique, after which dark regions were flagged as potential cystoid regions. To estimate the area of these regions, features such as solidity and the mean and maximum pixel values of the negative OCT image were analyzed for each candidate cystoid region. The algorithm delineated the boundaries of contiguous cysts, allowing large merged cysts to be subdivided. On 120 OCT images from 25 DME patients, the system detected cystoid areas with a mean error of 4.6% and a standard deviation of 6.6%, and achieved 100% sensitivity and 75% specificity in distinguishing images with cysts from those without. The manually segmented area and the cystoid area identified by the algorithm were 90% correlated. The algorithm located cysts in the inner plexiform layer with 88% accuracy, in the inner nuclear layer with 86% accuracy, and in the outer nuclear region with 80% accuracy.
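The candidate-region scoring step can be illustrated with a short sketch: dark regions become bright in the negative image, are thresholded into connected components, and each component is described by solidity and intensity features. The Otsu threshold and area filter below are illustrative simplifications, not the iterative high-pass filtering of [31].

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

def candidate_cystoid_features(oct_img):
    """Score dark candidate regions of a B-scan scaled to [0, 1].

    Returns (solidity, mean, max) intensity features per candidate,
    computed on the negative image, where cysts appear bright.
    """
    neg = 1.0 - oct_img
    mask = neg > threshold_otsu(neg)        # illustrative global threshold
    feats = []
    for region in regionprops(label(mask), intensity_image=neg):
        if region.area < 20:                # discard speckle-sized components
            continue
        feats.append((region.solidity,
                      region.mean_intensity,
                      region.max_intensity))
    return feats
```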
Zhou [32] presented an automated system for diabetic retinopathy (DR) detection using a deep learning approach with two main stages: image preprocessing and deep learning classification. In the preprocessing stage, morphological operations, adaptive histogram equalization, and vessel segmentation were employed to enhance image quality and reduce noise. For classification, a pretrained EfficientNet-B4 model was fine-tuned on a DR fundus image dataset to assign images to five levels of DR severity. Data augmentation methods such as random rotation, flipping, and cropping were used to improve the model's generalizability. Evaluated on two public datasets, the system achieved high accuracy and outperformed existing state-of-the-art techniques in DR detection.
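As a sketch of the augmentation strategy described above, the Keras preprocessing layers below apply random rotation, flipping, and cropping on the fly during training; the rotation range and crop size are illustrative assumptions rather than the settings of [32].

```python
import tensorflow as tf
from tensorflow.keras import layers

# Random rotation, flipping, and cropping, applied on the fly in training.
augment = tf.keras.Sequential([
    layers.RandomRotation(0.1),        # up to ~±36 degrees (0.1 of a full turn)
    layers.RandomFlip("horizontal"),
    layers.RandomCrop(224, 224),       # random crop from a larger input
])

# Example: augment a batch of 256x256 fundus images to 224x224 crops.
batch = tf.random.uniform((8, 256, 256, 3))
augmented = augment(batch, training=True)
```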
Li [33] developed a deep learning system for DR detection and classification involving four key stages: initial image processing, image enhancement, subtraction, and classification. The process began by enhancing retinal image quality through morphological operations and contrast-limited adaptive histogram equalization (CLAHE), followed by binary thresholding for segmentation. The enhancement phase used a dual attention mechanism that attends to both channel and spatial relationships. Features were extracted with an EfficientNet-B4 model fine-tuned on a DR fundus imaging dataset, and a multiclass SVM classifier assigned images to five DR severity levels. Evaluated on two public datasets, the system achieved high accuracy and surpassed existing state-of-the-art methods, underscoring its potential as a tool for diagnosing and classifying DR. Fu [34] developed a deep learning method for automatically diagnosing DR from fundus images, achieving 92.9% accuracy, 93.5% sensitivity, and 92.5% specificity on the test data, with high precision, recall, and F1 scores indicating accurate DR identification with few false positives. Comparative analyses with experts showed the system's superior accuracy, highlighting the potential of deep learning for DR diagnosis.
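The CLAHE-plus-thresholding preprocessing used in these pipelines can be sketched with OpenCV as follows; the clip limit, tile grid size, and Otsu binarization are illustrative choices, not the exact parameters of [32] or [33].

```python
import cv2

def preprocess_fundus(path):
    """Enhance a retinal image with CLAHE, then binarize for segmentation."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)
    # Otsu's method selects the binary threshold automatically.
    _, binary = cv2.threshold(enhanced, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return enhanced, binary
```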
Ting et al. [35] provided a comprehensive review of artificial intelligence and deep learning advancements and applications in ophthalmology, covering the use of deep learning for image analysis, disease detection, classification, and treatment planning across ocular diseases, as well as the potential impact of AI on diagnostic accuracy, patient outcomes, and healthcare delivery. Gargeya and Leng [36] presented a deep learning-based system for the automated identification of diabetic retinopathy, describing the dataset, model architecture, training procedure, and performance evaluation metrics, and discussing the clinical implications of automated DR screening, including its benefits for early detection and disease management. Hwang et al. [37] proposed a method based on the Inception V3 model with preprocessed images, achieving 96.9% accuracy. Tasnim et al. [38] investigated deep learning techniques for analyzing retinal OCT images; among the models explored, MobileNetV2 achieved 99.17% accuracy on the Kermany dataset, which consists of 84,484 samples categorized into four groups.
The objective of this study was to address the current research gap in applying machine learning classification techniques to OCT data. Our methodology operates on the actual or translated image pixels themselves rather than relying solely on pixel metadata, and we employ a random forest classifier to extract meaningful insights from OCT datasets.
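A minimal sketch of this pixel-based random forest approach follows, using scikit-learn with random placeholder data in place of the OCT subset; the flattened 64x64 inputs and forest size are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Placeholder data: one flattened grayscale image per row, four class labels.
rng = np.random.default_rng(42)
X = rng.random((400, 64 * 64))
y = rng.integers(0, 4, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
print(classification_report(y_test, rf.predict(X_test)))
```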
OCT dataset
In this research, we used the OCT dataset provided by Kermany et al. [26] on Kaggle, which contains 84,484 images. Due to hardware constraints, a subset of 4,000 images was randomly drawn from the training set, with 1,000 images per class so that every class was equally represented. This produced a manageable, diverse, and balanced sample while keeping the experimental analysis computationally feasible.
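Assuming the Kaggle release's directory layout (one subfolder per class, as in the Kermany OCT2017 dataset), such a class-balanced subset can be drawn with a short script like the sketch below; the root path and file extension are assumptions about the local setup.

```python
import random
from pathlib import Path

random.seed(0)
root = Path("OCT2017/train")          # assumed Kaggle layout: one folder per class
classes = ["CNV", "DME", "DRUSEN", "NORMAL"]

subset = {}
for cls in classes:
    files = sorted((root / cls).glob("*.jpeg"))
    subset[cls] = random.sample(files, 1000)   # 1,000 images per class

print({cls: len(paths) for cls, paths in subset.items()})
```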