Design, study population and TB diagnosis
This retrospective diagnostic pilot study included a set of 498 posteroanterior (PA) CXR images from patients enrolled in a cluster randomized clinical trial conducted in Ethiopia and Guinea-Bissau from January 2017 until the end of March 2018 (17, 18) (Figure 1, flow diagram). The patients included were adults seeking care for symptoms indicative of TB at primary health care centers. They were part of a follow-up group seen two weeks after inclusion for TB-presumptive findings, in whom initial sputum analysis had not been performed or was negative. The CXR and Xpert analyses were performed within the same week and prior to the clinical assessment, and were carried out only for patients who had persistent symptoms at the two-week follow-up.
Clinical TB was diagnosed by experienced physicians based on clinical signs and symptoms, CXR findings and a trial of antibiotics (17). This diagnostic standard was included because it is a common method for diagnosing TB in low-resource settings (19). The clinicians assessing the patients were blinded to the radiologists’ assessments, as these were performed after the study had concluded. All PTB patients were assessed for treatment outcome, and all included participants were visited one year after inclusion to assess survival.
The CAD CXR
The software qXR was developed by qure.ai (Mumbai, India) as an artificial intelligence (AI)-based CAD tool for analysing PA CXRs for abnormalities.(16) The AI algorithm was trained on a multihospital dataset of 2.3 million chest X-rays and their corresponding radiology reports to identify abnormal X-rays and specific abnormalities.
qXR assigns a score between 0 and 1 to indicate the presence of a given finding. It also outputs a pixel map indicating the location of the target abnormality (see Appendix). This study used a threshold of 0.5, which is considered the default threshold recommended for most settings (20).
In the present study, qXR version 3.2 was used to analyse the CXRs.
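As an illustration of how a finding score is turned into a binary result at the 0.5 threshold, a minimal sketch follows. The function and constant names are hypothetical and not part of qXR, and treating a score exactly equal to the threshold as positive is an assumption.

```python
# Hypothetical helper for illustration only; not part of the qXR software.
STUDY_THRESHOLD = 0.5  # threshold used in this study


def finding_present(score: float, threshold: float = STUDY_THRESHOLD) -> bool:
    """Return True when a finding score in [0, 1] counts as present.

    Assumption: a score exactly equal to the threshold is treated as positive.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    return score >= threshold
```

For example, `finding_present(0.73)` evaluates to `True` and `finding_present(0.2)` to `False`.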
The radiologists
Both radiologists have longstanding experience in analysing CXRs for TB in a high-incidence setting.
Radiologist A is an assistant professor of radiology at the College of Medical and Health Sciences, University of Gondar (CMHS UOG), and obtained his medical degree at CMHS UOG and his radiology specialty certificate at Addis Ababa University (AAU). In addition, he has served in an academic position and as a consultant radiologist at Gondar University Hospital (GUH). He has served as the head of the Department of Radiology at GUH for two years. He has 10 years of experience in the field of radiology.
Radiologist B is an associate professor of radiology at the Department of Radiology at CMHS UOG. He has worked as a consultant radiologist at GUH since 2010. He obtained his medical degree and specialty certificate in radiology at AAU. He currently serves as head of the Department of Radiology at CMHS UOG. He has 14 years of experience in the field of radiology.
We also compared the pooled radiologists' assessments with those of the CAD CXR.
The radiologists and the AI team were blinded to all clinical information and microbiological test results.
Assessment of chest radiographs
The PA CXRs were placed on a light box in daylight and photographed using commonly available digital or mobile phone cameras (Sony Cyber-shot (20.1 MP), Samsung Galaxy A3 (13.0 MP), Canon EOS 70D (20.2 MP); resolution varying from 72 to 350 dpi), following the qXR guide (Appendix). The images were then anonymized, numbered in random order and sent digitally to qure.ai and the radiologists.
Classification of findings
The findings assessed by CAD CXR and the radiologists are presented in Table 1.
CXRs were classified as abnormal if at least one of the following findings was identified: consolidation, cavitation, fibrosis, nodule, blunted cardiopulmonary angle, pleural effusion, hilar lymphadenopathy, or tracheal shift.
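The classification rule above can be sketched as follows. The names are hypothetical, and the finding labels are copied from the list above; the labels actually exchanged between readers and software are not specified here.

```python
# Findings that make a CXR abnormal, as listed in the text above.
ABNORMAL_FINDINGS = frozenset({
    "consolidation",
    "cavitation",
    "fibrosis",
    "nodule",
    "blunted cardiopulmonary angle",
    "pleural effusion",
    "hilar lymphadenopathy",
    "tracheal shift",
})


def is_abnormal(findings) -> bool:
    """Return True if any identified finding belongs to the abnormal list.

    `findings` is an iterable of finding names reported for one CXR
    (illustrative input format).
    """
    return any(finding in ABNORMAL_FINDINGS for finding in findings)
```

A CXR with only `"cavitation"` reported would be classified as abnormal, while a CXR with no listed finding would not.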
Treatment outcome
Following WHO definitions, treatment outcome was classified as unsuccessful in the case of death during treatment, loss to follow-up and/or treatment failure (i.e., a patient whose sputum smear or culture was positive at month 5 or later during treatment), and as successful in the case of treatment completion or cure.(21)
Analysis
All data analyses were performed using Stata version 11 (Stata Corporation, College Station, Texas, USA). Kappa coefficients with associated 95% confidence intervals (CI) were used to investigate the interrater reliability of the radiologists and the AI. The following scale was used to interpret kappa coefficients: <0, poor; 0–0.20, slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; and 0.81–1.00, almost perfect (22). To assess the overall accuracy of the AI, receiver operating characteristic (ROC) analysis was used to determine the area under the curve (AUC) with the “roctab” command in Stata 11, which provides a nonparametric estimate of the ROC curve and Bamber and Hanley confidence intervals for the AUC. Sensitivity and specificity were also calculated.
The same accuracy measures were also used to compare the radiologists’ assessments with that of the CAD CXR, and the results were plotted together with the ROC curve of the CAD CXR.
The sensitivity, specificity, positive predictive value and negative predictive value were calculated using microbiological confirmation (by GeneXpert MTB/RIF (GX)) and/or clinical diagnosis as the reference standards.
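For readers who wish to reproduce the agreement statistic outside Stata, the following is a minimal sketch, assuming Cohen's kappa for two raters (the specific kappa variant is our assumption). The function names are hypothetical, confidence intervals are not computed, and values falling between two bands of the scale (e.g., 0.205) are assigned to the higher band.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal proportions.
    expected = sum(
        (list(rater_a).count(c) / n) * (list(rater_b).count(c) / n)
        for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when expected agreement equals 1")
    return (observed - expected) / (1.0 - expected)


def interpret_kappa(kappa):
    """Map a kappa value to the interpretation scale given in the text."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

As a usage example with invented readings, two raters who agree on 8 of 10 images, each calling 6 images abnormal, obtain a kappa of about 0.58, which falls in the moderate band.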
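The nonparametric AUC can be illustrated with a short sketch. It uses the equivalence between the area under the empirical ROC curve (trapezoidal rule) and the Mann-Whitney statistic: the proportion of positive-negative pairs in which the positive case has the higher score, with ties counted as one half. The function name is hypothetical, and the sketch does not reproduce the Bamber or Hanley confidence intervals.

```python
def auc_nonparametric(scores, labels):
    """Area under the empirical ROC curve via pairwise comparison.

    `labels` holds 1 for reference-positive and 0 for reference-negative cases;
    `scores` holds the corresponding test scores (higher = more suspicious).
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                concordant += 1.0   # positive case ranked above negative case
            elif p == q:
                concordant += 0.5   # ties contribute one half
    return concordant / (len(positives) * len(negatives))
```

With a binary reading (0 or 1) in place of a continuous score, the same function returns the mean of sensitivity and specificity, which is how a single reader's operating point relates to an ROC area.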
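The four accuracy measures follow directly from the 2x2 table of test result against reference standard, as in the sketch below. The function name and input format are hypothetical; the reference argument can hold whichever reference standard is being applied.

```python
def diagnostic_accuracy(test_positive, reference_positive):
    """Sensitivity, specificity, PPV and NPV from paired boolean results."""
    if len(test_positive) != len(reference_positive):
        raise ValueError("inputs must have the same length")
    tp = fp = fn = tn = 0
    for test, ref in zip(test_positive, reference_positive):
        if test and ref:
            tp += 1          # true positive
        elif test and not ref:
            fp += 1          # false positive
        elif not test and ref:
            fn += 1          # false negative
        else:
            tn += 1          # true negative

    def ratio(numerator, denominator):
        # Undefined measures (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

For instance, with invented data giving 3 true positives, 1 false positive, 1 false negative and 5 true negatives, sensitivity and PPV are both 3/4, and specificity and NPV are both 5/6.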