In recent years, artificial intelligence (AI) technology based on classic machine learning (ML) or deep learning (DL) has been widely used in screening for a variety of fundus diseases, including DR. Gulshan et al. used a deep learning algorithm to screen for DR and obtained extremely high sensitivity and specificity. [5] Takahashi et al. used a modified deep learning model for the screening and grading of DR and obtained grading results similar to those of ophthalmologists. [6] However, even though AI has achieved very high accuracy in DR screening and grading, its final results can only be used as a diagnostic reference. Training junior ophthalmologists to read DR images accurately, and to reach that level quickly, therefore remains an important part of ophthalmologist training. If junior ophthalmologists can quickly master DR reading through centralized training, this not only supports their professional development but also builds a reserve of physicians able to label AI training sets. It is therefore of great significance to find an efficient DR reading training method. No standard approach to DR reading training has previously been discussed, and no published work has explored the use of an AI reading labeling system for reading training. In this study, an AI reading labeling system was used for the DR reading training of junior ophthalmology residents and medical students. Compared with traditional reading training, which requires at least 1,500 to 2,000 pictures, training with the AI labeling system required only 500 pictures to reach a high diagnostic accuracy. It therefore showed clear advantages over traditional reading training in both time and number of pictures.
In this DR reading training, after eight rounds of reading, the mean Kappa score of the 13 participants increased from 0.67 in the first round to 0.81 in the eighth round. The mean Kappa score was 0.77 over the first four rounds, indicating substantial agreement, and 0.81 over the last four rounds, indicating that the participants' overall reading accuracy was significantly improved after training. The Kappa score did not increase steadily from round to round, which may be because the difficulty of the pictures loaded in each round could not be kept completely consistent, introducing bias into the results.
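For readers unfamiliar with the statistic, the following is a minimal sketch of how a Kappa score can be computed for one reader in one round. It assumes an unweighted Cohen's Kappa between the reader's grades and the reference grades; the text does not state which Kappa variant (unweighted or weighted) was used, and the function name and the example grades are hypothetical.

```python
from collections import Counter


def cohen_kappa(reader_grades, reference_grades):
    """Unweighted Cohen's Kappa between two equal-length lists of grades.

    Assumption: the study's exact Kappa variant is not specified, so the
    unweighted form is used here purely for illustration.
    """
    if len(reader_grades) != len(reference_grades) or not reader_grades:
        raise ValueError("grade lists must be non-empty and of equal length")

    n = len(reader_grades)

    # Observed agreement: fraction of images given the same grade.
    observed = sum(a == b for a, b in zip(reader_grades, reference_grades)) / n

    # Chance agreement: for each grade, the product of the two marginal
    # frequencies, summed over all grades that occur.
    reader_counts = Counter(reader_grades)
    reference_counts = Counter(reference_grades)
    grades = set(reader_counts) | set(reference_counts)
    expected = sum(reader_counts[g] * reference_counts[g] for g in grades) / (n * n)

    if expected == 1.0:
        # Both lists contain a single identical grade; agreement is perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical example: six images, one disagreement on the last image.
reader = [0, 0, 1, 1, 2, 2]
reference = [0, 0, 1, 1, 2, 1]
print(round(cohen_kappa(reader, reference), 2))  # prints 0.75
```

A mean Kappa score for a round, as reported above, would then be the average of this value across all participants in that round.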
The trainees were also divided into two groups for analysis. The first group consisted of junior ophthalmology residents who had some basic knowledge of ophthalmology and who also attended ophthalmology courses and took part in clinical ophthalmology work. The second group consisted of medical students who had not learned the basics of ophthalmology before the reading training began and who did not take part in ophthalmology courses or clinical work during the training. The initial Kappa scores reflected the difference in knowledge base between the two groups: 0.71 in group 1 and 0.62 in group 2, showing that baseline reading accuracy was higher in group 1 than in group 2. As the training progressed, the difference between the two groups gradually narrowed, and by the eighth round the Kappa scores were 0.76 in group 1 and 0.84 in group 2, with the larger increase in the medical student group. In group 1, the mean Kappa score was 0.77 over the first four rounds and 0.81 over the last four rounds; in group 2, it was 0.71 and 0.82, respectively. This also shows that the gap in reading accuracy between the two groups narrowed and that, after reading training, even medical students without a background in ophthalmology could become familiar with the principles of DR reading and achieve a certain diagnostic accuracy.
The harmonic mean values for the three determinations (presence or absence of DR, referable DR, and severe DR) showed that the harmonic mean was lowest for the presence or absence of DR and relatively higher for referable DR and severe DR. This may be because the presence or absence of microaneurysms cannot always be judged accurately from color fundus photographs alone: small microaneurysms in a picture may be confused with artifacts caused by poor image quality, leading to incorrect conclusions. This also suggests that the fundus photographs used for reading training should be selected with care, choosing pictures of good quality wherever possible to eliminate confounding factors caused by poor image capture.
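The harmonic mean comparison above can be illustrated with a short sketch. It assumes that the harmonic mean combines sensitivity and specificity for each binary determination, and that each determination is derived by thresholding a 0 to 4 grade; neither assumption is confirmed by the text (if precision and recall were used instead, the same formula gives the F1 score). The function names, thresholds, and example grades are hypothetical.

```python
def sensitivity_specificity(reader_grades, reference_grades, threshold):
    """Sensitivity and specificity for the question 'grade >= threshold?'.

    Assumption: threshold 1 stands for 'any DR' and threshold 2 for
    'referable DR'; the study's actual cut-offs are not given in the text.
    """
    tp = fn = tn = fp = 0
    for reader, reference in zip(reader_grades, reference_grades):
        predicted = reader >= threshold
        actual = reference >= threshold
        if actual and predicted:
            tp += 1
        elif actual and not predicted:
            fn += 1
        elif not actual and not predicted:
            tn += 1
        else:
            fp += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def harmonic_mean(a, b):
    """Harmonic mean of two non-negative rates; 0.0 if both are zero."""
    return 2 * a * b / (a + b) if (a + b) else 0.0


# Hypothetical grades: the reader confuses grade 0 and grade 1 twice,
# but agrees with the reference on every image graded 2 or higher.
reader = [0, 1, 2, 3, 0, 2]
reference = [0, 0, 2, 3, 1, 2]

any_dr = harmonic_mean(*sensitivity_specificity(reader, reference, threshold=1))
referable = harmonic_mean(*sensitivity_specificity(reader, reference, threshold=2))
print(round(any_dr, 2))     # prints 0.6
print(round(referable, 2))  # prints 1.0
```

In this made-up example, errors confined to the boundary between no DR and mild DR lower the harmonic mean for the presence or absence of DR while leaving the referable DR determination unaffected, which mirrors the pattern described above.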
This study also has some limitations. The labeling system used in the training was originally designed to label data for training AI deep learning models, not to train physicians in reading. As a result, the system cannot show the correct grade immediately after a picture is labeled, and the picture grading had to be reviewed retrospectively in a single session after each round of labeling, which may reduce learning efficiency. In addition, the number of participants was small, so the reported means may be subject to some error. To make the system better suited to reading training, a function that displays the AI reading result as a prompt could be added, and the gold standard could be shown for comparison after each round of labeling, which could improve training efficiency and strengthen the effect of the reading training.
In conclusion, the use of an AI-based DR reading labeling system can effectively improve the DR reading ability of junior ophthalmologists, allowing them to reach a certain reading accuracy in a short time and with fewer pictures. It is a feasible method for reading training.