Recent advances in DL technology in the ophthalmic field have allowed rapid and accurate diagnosis of several retinal diseases. These advances have also enabled prediction of prognosis and the identification of systemic markers of disease. Importantly, the diagnostic performance of DL algorithms has been equivalent to, or even better than, that of trained clinicians.
Currently, the reported DL algorithms for analyzing anterior segment slit-lamp images appear to have been developed with the intention of screening for common diseases in place of clinicians. An inception-based algorithm developed by Gu et al classified a broad range of diseases, including cataracts, neoplasms, non-infectious and infectious disorders, and corneal dystrophy.5 This type of AI does not need to surpass the capabilities of clinicians, only to match them. Another type of AI is intended to surpass the diagnostic ability of clinicians; however, reports of this type of AI for corneal diseases have been scarce.
Predicting the causative pathogen in infectious keratitis is one representative task for which AI needs to surpass the ability of clinicians. Because infectious keratitis threatens vision, prompt and accurate diagnosis will benefit both clinicians and patients. Proper diagnosis will also reduce the socio-economic burden on medical resources.
In clinical practice, a definitive diagnosis is made after the results of culture, smear, and response to treatment are known. The initial diagnosis is based on the medical history and slit-lamp examination, and it is generally modified after the culture and smear results become available. However, the diagnosis largely depends on the experience of the clinician, and diagnostic algorithms have not been standardized.
Using a DL algorithm, we calculated the pathogen probability scores after feature normalization, and then constructed a diagnostic algorithm using GBDT, a hierarchical series of decision trees (Fig. 4). GBDT is a machine learning algorithm that learns through a successive series of decision trees. In GBDT, each new tree is fit to the residual errors left by the trees before it: the output of the first tree is corrected by the second tree, which is further corrected by a third tree, and so on. GBDT is well recognized for its high accuracy and efficiency in classification problems, and it has been used widely in machine learning competitions, including those hosted on Kaggle. Thus, an effective classification algorithm learned in this way should help clinicians diagnose more accurately.
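To make the idea of successive correction concrete, the following is a minimal pure-Python sketch of gradient boosting with depth-1 trees (stumps) and a squared-error loss. It is not the implementation used in this study: the four-score feature layout, the one-vs-rest "bacterial" target, the toy data, and the hyperparameters `n_trees` and `learning_rate` are illustrative assumptions. Practical GBDT libraries use deeper trees, classification losses such as log-loss, and multiclass extensions.

```python
# Minimal gradient boosting with depth-1 regression trees (stumps) and squared-error loss.
# ASSUMPTIONS (illustrative only, not the study's implementation): each row holds four
# hypothetical DL probability scores [bacterial, fungal, acanthamoeba, HSV], and the
# target is 1.0 for "bacterial" and 0.0 otherwise (one-vs-rest).


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue  # a split must send rows both ways
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # no valid split (all rows identical): predict the mean residual
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_gbdt(X, y, n_trees=20, learning_rate=0.3):
    """Each new stump is fit to the residuals left by the ensemble built so far."""
    base = sum(y) / len(y)  # initial constant prediction
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # For (half) squared error, the residual is exactly the negative gradient.
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, stumps, learning_rate


def predict_gbdt(model, row):
    base, stumps, learning_rate = model
    return base + sum(learning_rate * predict_stump(s, row) for s in stumps)


if __name__ == "__main__":
    X = [[0.90, 0.05, 0.03, 0.02], [0.80, 0.10, 0.05, 0.05], [0.20, 0.60, 0.10, 0.10],
         [0.10, 0.20, 0.10, 0.60], [0.85, 0.10, 0.03, 0.02], [0.15, 0.70, 0.10, 0.05]]
    y = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
    model = fit_gbdt(X, y)
    print(predict_gbdt(model, [0.88, 0.06, 0.03, 0.03]))  # near 1.0: bacterial-like profile
```

The learning rate shrinks each tree's correction so that later trees refine, rather than overwrite, earlier ones; this is the same sequential structure that Fig. 4 visualizes, although the study's trees are deeper and multiclass.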
In the decision-making process implemented by GBDT (Fig. 4), we found that the bacterial probability score was the most important feature for the initial decision (Fig. 4, 1st tree). At this first stage, the GBDT used a combination of bacterial probability scores to reach a correct diagnosis. Multiple probability scores can augment the information to be learned because images taken with different illuminations, angles, or fluorescein staining play complementary roles. This resembles the clinical decision-making process.
Clinically, an alternative to the bacterial probability score can be obtained by laboratory testing, including the outcomes of culture and smear tests. These outcomes are not probability scores, but GBDT can handle them together with the scores if necessary. For example, we have reported the effectiveness of real-time PCR quantification of bacteria using 16S rDNA for the diagnosis of bacterial keratitis.6 Although the outcomes of laboratory tests are not available immediately, incorporating these variables into GBDT could greatly improve the diagnostic accuracy of the algorithm. Thus, the GBDT platform is versatile: it can seamlessly incorporate general patient characteristics and laboratory test results, and it is not limited to probability scores or decision-level features.
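Because each tree split compares a single feature with a threshold, probability scores and laboratory results on very different scales can share one input row without rescaling. The sketch below shows one hypothetical way to assemble such a row. The feature order, the availability flags used to encode pending results, and the function name `build_features` are assumptions for illustration, not the study's feature set; some GBDT libraries, such as XGBoost and LightGBM, can instead handle missing values natively.

```python
from typing import Optional

# ASSUMPTIONS: hypothetical feature order and encoding; not the study's feature set.
PATHOGENS = ("bacterial", "fungal", "acanthamoeba", "HSV")


def build_features(scores: dict, culture_positive: Optional[bool] = None,
                   pcr_copies: Optional[float] = None) -> list:
    """Assemble one GBDT input row from DL probability scores and optional lab results."""
    row = [float(scores[p]) for p in PATHOGENS]  # probability scores in [0, 1]
    # Culture result plus an availability flag, so "negative" differs from "not yet known".
    row.append(1.0 if culture_positive else 0.0)
    row.append(0.0 if culture_positive is None else 1.0)
    # PCR copy numbers live on a very different scale from the scores; because each
    # tree split thresholds one feature at a time, no rescaling is required.
    row.append(float(pcr_copies) if pcr_copies is not None else 0.0)
    row.append(0.0 if pcr_copies is None else 1.0)
    return row


# Images only (lab results pending) versus images plus culture and PCR results.
scores = {"bacterial": 0.7, "fungal": 0.1, "acanthamoeba": 0.1, "HSV": 0.1}
print(build_features(scores))
print(build_features(scores, culture_positive=True, pcr_copies=1500.0))
```

Keeping explicit availability flags lets the trees learn different decision paths for early (image-only) and later (image plus laboratory) stages of the workup.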
The second tree in the GBDT separated fungal from HSV keratitis using the fungal probability score (Fig. 4, 2nd tree). The 3rd tree then separated fungal keratitis from HSV-suspected images, again using the fungal probability score. Clinically, this differential diagnosis is facilitated by calcofluor or Fungiflora staining of the smear because of their specificity. However, in our hands, incorporating the staining results into our slit-lamp image-based GBDT algorithm did not appreciably improve the overall diagnostic accuracy. We interpret this to mean that the low sensitivity of these stains compromises their value despite their high specificity.
The fourth tree first classified acanthamoeba keratitis (Fig. 4) and then ruled out the possibility of HSV infection using the HSV probability score. Non-acanthamoeba images were re-examined using the acanthamoeba probability score. This process mirrors the differential diagnosis of acanthamoeba and HSV keratitis. For example, in the early stage of acanthamoeba keratitis, pseudodendritic lesions that masquerade as herpetic keratitis are often observed, which can lead to inappropriate use of antiviral drugs or steroids. However, the acanthamoeba probability score of the slit-lamp images captured the characteristics of acanthamoeba infection well, and a high AUC was obtained (Fig. 3).
A relatively lower AUC was obtained for the diagnosis of fungal keratitis (Fig. 3), and the second tree of the GBDT indicated that differentiation from HSV keratitis was required. Thus, incorporating HSV real-time PCR as an additional feature could significantly improve diagnostic accuracy, although access to PCR is limited in most clinical settings.
Generally, the accuracy of identifying the causative pathogen by slit-lamp examination is low for general ophthalmologists. We were surprised by the low diagnostic accuracy of expert ophthalmologists (Fig. 1b), and the same has been reported for corneal specialists.7 In our setting, the accuracy of identifying the four categories of pathogens averaged about 40% for board-certified ophthalmologists. This reflects the difficulty of identifying the causative pathogen in the real-world setting of a tertiary referral hospital. Moreover, it may be difficult to improve diagnostic performance through training. AI support for identifying the causative pathogen could therefore greatly improve clinical practice, and the benefit would not be limited to corneal specialists.
For corneal diseases, the available literature on DL algorithms is still very limited,5,7 in marked contrast to the abundance of AI studies of retinal imaging. Compared with retinal images, the development of AI for anterior segment images is hampered by several difficulties, arising from differences in image acquisition as stated earlier and from the large number of clinical signs that need to be learned.8 For example, when infectious keratitis images were assessed, VGG16, a well-established DL architecture, performed insufficiently when trained on whole images, with an overall accuracy of only 55.24%.7
Several factors might explain why determining the causative pathogen from slit-lamp images alone was so difficult. One is the difficulty of extracting sufficient information from a single image.9 Another arises from differences in illumination or recording angle. This is in marked contrast to fundus imaging, in which images are obtained at the same angle with similar quality and can be acquired by non-experts.
Several approaches have been used to overcome these difficulties and improve classification accuracy. One approach is patch-level feature learning. Li et al reported deep learning based on segmentation of anatomical structures and annotation of pathological features.8 They used 54 pathological features, including the presence of corneal edema, ulcer, corneal opacity, neovascularization, hypopyon, pterygium, and cataract.8 However, it remains unclear whether this anatomical feature-based classification can improve the accuracy of classifying infectious keratitis.
Xu et al applied patch-level learning to classify infectious keratitis into bacterial, fungal, and herpetic stromal keratitis. For this, the infectious lesion, conjunctival injection, and anterior chamber inflammation were annotated by manual drawing.7 Using the patch-level classification outcomes, VGG16 achieved an accuracy of 52.5%.7 To improve classification accuracy, smaller regions were randomly sampled from each patch, and the resulting sequence was used as sequential input to a long short-term memory (LSTM) network. Using this inner-outer sequential order patch algorithm, the accuracy of classifying bacterial keratitis improved to 75.29%, with an AUC of 0.92.
However, the patch-level learning model has some drawbacks. One significant drawback is that clinicians must manually draw the patches. This requires considerable effort by the examiners and may introduce bias into patch detection. Another is generalizability, including robustness to low-resolution images. For example, it remains unclear whether the sequential order patch algorithm performs equally well on fluorescein-stained images. In addition, their softmax-based calculations are not robust to low-resolution images, photographs obtained with different illumination angles, or distractors.4 When softmax is used, image quality is encoded in the norm of the deep feature vector, that is, the distance of the vector from the origin. Unless the norm is constrained, poor resolution or image quality can significantly degrade learning performance.10
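The dependence of softmax outputs on the feature norm can be illustrated with a toy example. This is a minimal sketch under simplified assumptions: a bias-free linear classifier, hypothetical 2-D features, and hypothetical class weights. For a fixed feature direction, shrinking the norm (as may happen with low-quality images) pushes the softmax output toward a uniform distribution while the predicted class stays the same, so the model's confidence tracks image quality rather than pathology.

```python
import math

# ASSUMPTION: a bias-free linear classifier on hypothetical 2-D deep features.
# For a fixed feature direction, only the feature norm changes between runs.


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(weights, feature, scale=1.0):
    """Softmax over logits W·(scale * f); 'scale' rescales the feature norm only."""
    scaled = [scale * x for x in feature]
    logits = [sum(w * x for w, x in zip(row, scaled)) for row in weights]
    return softmax(logits)


W = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]  # hypothetical class weight vectors
f = [0.8, 0.6]                              # unit-norm feature direction

for s in (0.2, 1.0, 5.0):  # small norm ~ "low-quality" image, large norm ~ "high-quality"
    p = class_probs(W, f, s)
    print(f"norm={s}: top class={p.index(max(p))}, max prob={max(p):.3f}")
```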
Deep learning-based recognition has been used in many practical applications, and face recognition is an important one in the field of security. However, face recognition is challenging because few training samples are available per individual and the quality, pose angle, and illumination of face images vary. To overcome this problem, feature normalization using Ring loss was developed.4 Ring loss normalizes deep features by softly constraining their norm toward a target value that is learned during training. Preserving convexity in the loss function is generally known to be important for effective optimization of a network. Because Ring loss imposes the norm constraint through a convex penalty added to the softmax loss, rather than through hard normalization, convexity is preserved and learning is effective.4 Moreover, the Ring loss approach has been robust to large numbers of distractors, low resolution, and extreme poses (angles). Thus, Ring loss with softmax is a simple approach that does not require meticulous annotation by manual drawing.
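A minimal sketch of the Ring loss term, following the formulation in ref. 4: L_R = (λ/2m) Σ_i (‖F(x_i)‖₂ − R)², where F(x_i) are the deep features of a batch of m samples and R is the learned target norm; the term is added to the softmax cross-entropy loss. The plain-list batch, the value of λ (`lam`), and the stand-alone gradient-descent update of R are illustrative assumptions. In practice the term is implemented in a DL framework, where automatic differentiation updates both R and the network weights.

```python
import math

# Ring loss (ref. 4): L_R = (lam / 2m) * sum_i (||F(x_i)||_2 - R)^2, added to the softmax loss.
# ASSUMPTIONS: plain Python lists stand in for a batch of deep features; lam and the
# gradient-descent update of R are illustrative. A DL framework would use autograd.


def l2_norm(v):
    return math.sqrt(sum(x * x for x in v))


def ring_loss(features, R, lam):
    m = len(features)
    return lam / (2 * m) * sum((l2_norm(f) - R) ** 2 for f in features)


def ring_loss_grad_R(features, R, lam):
    """dL_R/dR = -(lam / m) * sum_i (||F(x_i)||_2 - R); R is a learned parameter."""
    m = len(features)
    return -lam / m * sum(l2_norm(f) - R for f in features)


def softmax_cross_entropy(logits, label):
    """Numerically stable -log softmax(logits)[label]."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def total_loss(batch_logits, labels, features, R, lam):
    ce = sum(softmax_cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / len(labels)
    return ce + ring_loss(features, R, lam)


features = [[3.0, 4.0], [6.0, 8.0]]  # hypothetical features with norms 5 and 10
R = 1.0
for _ in range(100):
    R -= 0.5 * ring_loss_grad_R(features, R, lam=1.0)
print(round(R, 4))  # R settles at the mean feature norm, 7.5, when only R is updated
```

In full training the same penalty also pulls each feature norm toward R, which is what removes the image-quality signal from the norm without the hard, non-convex constraint ‖F(x)‖₂ = R.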
One research group postulated that the limiting factor in improving algorithm performance is the quality of information in the images.9 However, the quality of information in an image varies with the recording conditions and staining. In addition, particular recording conditions are better suited to particular pathogens. For example, dendritic keratitis is better diagnosed with fluorescein staining. This suggests the need for decision-level adjustment according to the causative pathogen or the use of fluorescein staining.
Generally, integrating multiple modalities can improve classification efficacy.11 This integration can be conducted at the score level or the decision level. Score-level and decision-level fusion schemes have been shown to greatly improve discrimination in multi-biometric verification, such as authentication in banking.11 To improve classification efficacy in this study, the probability scores and decision steps were integrated using GBDT, which proved versatile and efficient.
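The difference between the two fusion levels can be shown with a small sketch. The assumptions are loudly hypothetical: a fixed class order, three made-up probability vectors from images of one eye, simple averaging for score-level fusion, and majority voting for decision-level fusion. In this study GBDT learns the combination instead of applying a fixed rule; the example only shows that the two levels can disagree on the same input.

```python
from collections import Counter

# ASSUMPTIONS: hypothetical class order and fixed fusion rules (mean of scores for
# score-level fusion, majority vote for decision-level fusion). The study instead
# learns the combination with GBDT.
CLASSES = ("bacterial", "fungal", "acanthamoeba", "HSV")


def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def score_level_fusion(prob_vectors):
    """Average the per-class probability scores across images, then pick the top class."""
    n = len(prob_vectors)
    fused = [sum(p[k] for p in prob_vectors) / n for k in range(len(CLASSES))]
    return CLASSES[argmax(fused)], fused


def decision_level_fusion(prob_vectors):
    """Each image votes for its own top class; ties go to the first class voted for."""
    votes = [CLASSES[argmax(p)] for p in prob_vectors]
    return Counter(votes).most_common(1)[0][0]


# Hypothetical scores from three images of one eye (e.g., diffuse, slit, fluorescein views).
images = [
    [0.40, 0.35, 0.05, 0.20],
    [0.30, 0.45, 0.05, 0.20],
    [0.45, 0.40, 0.05, 0.10],
]
print(score_level_fusion(images)[0])   # fungal (highest mean score)
print(decision_level_fusion(images))   # bacterial (two of three votes)
```

Because a fixed rule can flip the answer depending on the fusion level, a learned combiner such as GBDT, which can weigh scores and intermediate decisions jointly, is a reasonable alternative.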
Our study has some limitations. Our algorithm was developed using more than 4000 images; however, the number of cases may still be limited, and the algorithm may not generalize to geographic regions with epidemiologically different pathogenic species. Another limitation is that our algorithm classified only four categories of pathogens. For therapeutic purposes, this classification appears appropriate. However, we are aware that some pathogens have clinical characteristics that resemble those of other organisms. For example, the clinical characteristics of mycobacterial infection are somewhat similar to those of fungal keratitis. This difficulty could be overcome by training with more detailed categories, which our algorithm can readily accommodate.
In conclusion, we have developed an AI algorithm that can identify the causative pathogens of infectious keratitis with an accuracy exceeding that of clinicians. This DL algorithm may become the basis for future automated diagnosis by slit-lamp examination and for efficient telemedicine platforms for anterior segment diseases.