Purpose: Expertise for auditing AI systems in the medical domain is only now being accumulated. Conformity assessment procedures will require AI systems: i) to be transparent, ii) not to base decisions solely on algorithms, and iii) to include safety assurance cases in the documentation to facilitate technical audit. We are interested here in obtaining transparency in the case of machine learning (ML) applied to the classification of retina conditions. Achieving high performance metrics with ML has become common practice. However, in the medical domain, algorithmic decisions need to be sustained by explanations. We aim at building a support tool for ophthalmologists able to: i) explain the algorithmic decision to the human agent by automatically extracting rules from the learned ML models; ii) include the ophthalmologist in the loop by formalising expert rules and integrating this expert knowledge into the argumentation machinery; iii) build safety cases by creating assurance argument patterns for each diagnosis.
Methods: For the learning task, we used a dataset consisting of 699 optical coherence tomography (OCT) images: 126 of the Normal class, 210 with Diabetic Retinopathy (DR), and 363 with Age-Related Macular Degeneration (AMD). The dataset contains patients from the Ophthalmology Department of the County Emergency Hospital of Cluj-Napoca. All ethical norms and procedures, including anonymisation, have been followed. We applied three machine learning algorithms: decision tree (DT), support vector machine (SVM), and artificial neural network (ANN). From each learned model we automatically extract diagnosis rules. For formalising expert knowledge, we relied on the normative dataset [13]. For arguing between agents, we used the Jason multi-agent platform. We assume different knowledge bases and reasoning capabilities for each agent. Each agent has its own OCT images, on which it applies a distinct machine learning algorithm. The learned model is used to extract diagnosis rules. With distinct learned rules, the agents engage in an argumentative process. The resolution of the debate outputs a diagnosis that is then explained to the ophthalmologist by means of assurance cases.
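As a minimal sketch of the rule-extraction step, the paths of a trained decision tree can be rendered as human-readable diagnosis rules. The snippet below uses scikit-learn on synthetic stand-in data; the feature names are hypothetical OCT-derived measurements, since the abstract does not specify the actual feature set or library.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical OCT-derived features (illustrative only).
feature_names = ["retinal_thickness", "drusen_area", "fluid_volume"]

# Synthetic stand-in data covering the three diagnosis classes
# (0 = Normal, 1 = DR, 2 = AMD).
rng = np.random.default_rng(0)
X = rng.random((300, 3))
y = rng.integers(0, 3, size=300)

dt = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each root-to-leaf path of the tree is a candidate diagnosis rule that can
# be shown to the ophthalmologist and handed to the arguing agents.
print(export_text(dt, feature_names=feature_names))
```

Analogous extraction procedures (for instance, rule induction from SVM or ANN models) would feed each agent its own rule set before the argumentative process starts.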
Results: For diagnosing the retina condition, our AI solution deals with the following three issues. First, the learned models are automatically translated into rules. These rules are then used to build an explanation by tracing the reasoning chain supporting the diagnosis. Hence, the proposed AI solution complies with the requirement that "the algorithmic decision should be explained to the human agent". Second, the decision is not based solely on ML algorithms. The proposed architecture includes expert knowledge. The diagnosis is reached by exchanging arguments between the ML-based agents and the expert knowledge. The conflict resolution among arguments is verbalised, so that the ophthalmologist can supervise the diagnosis. Third, assurance cases are generated to facilitate technical audit. The assurance cases structure the evidence along various safety goals such as machine learning methodology, transparency, or data quality. For each dimension, the auditor can check the provided evidence against current best practices or safety standards.
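One possible way to represent the goal-to-evidence structure of such an assurance case is sketched below. The goal names mirror the dimensions mentioned above (ML methodology, transparency, data quality); the evidence strings and the report format are illustrative assumptions, not the paper's actual pattern.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SafetyGoal:
    # A safety goal with the evidence an auditor checks against standards.
    name: str
    evidence: List[str] = field(default_factory=list)

@dataclass
class AssuranceCase:
    # An assurance case attached to one algorithmic diagnosis.
    diagnosis: str
    goals: List[SafetyGoal] = field(default_factory=list)

    def report(self) -> str:
        lines = [f"Assurance case for diagnosis: {self.diagnosis}"]
        for goal in self.goals:
            lines.append(f"  Goal: {goal.name}")
            lines.extend(f"    Evidence: {item}" for item in goal.evidence)
        return "\n".join(lines)

# Hypothetical instance for an AMD diagnosis.
case = AssuranceCase(
    diagnosis="AMD",
    goals=[
        SafetyGoal("ML methodology", ["rules extracted from DT, SVM, ANN models"]),
        SafetyGoal("Transparency", ["verbalised argumentation trace"]),
        SafetyGoal("Data quality", ["anonymised OCT dataset of 699 images"]),
    ],
)
print(case.report())
```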
Conclusion: We developed a multi-agent system for diagnosing retina conditions in which algorithmic decisions are sustained by explanations. The proposed tool goes beyond most software in the medical domain, which focuses only on performance metrics. Our approach helps the technical auditor to approve software in the medical domain. Interleaving knowledge extracted from ML models with expert knowledge is a step towards balancing the benefits of ML with explainability, aiming at engineering reliable medical applications.