Our study demonstrates that the assessed AI tool-building platform allows physicians without any prior training in data science or computer programming to train AI models with acceptable performance. All five AI models had high sensitivity, specificity, and accuracy for identifying suboptimal CXR due to non-included anatomy, improper exposure, substantial patient rotation, low lung volumes, and overlying anatomy obscuring visibility of the lungs or mediastinum. Performance of all five AI models was consistent across different patient age groups, sexes, and radiographic projections. Using CXR data from our internal multi-institutional consortium, we trained the five AI models end to end. This process spanned data identification, report curation, and standard-of-reference labeling through model training and testing on a subset of 3278 CXR, without help from any data scientists or engineers, over a span of less than 8 weeks. At the same time, we were able to restrict our training datasets to three sites and establish "local" generalizability on CXR from the remaining two sites.
To the best of our knowledge, there is one peer-reviewed report on the use of a convolutional neural network for quality control of chest radiographs, from Nousiainen et al. The authors evaluated three causes of suboptimal CXR (non-inclusion of entire lungs, patient rotation, and low lung volumes) based on the European guidelines for radiographic quality assessment rather than the ACR-SPR-STR guidelines 7. For these three overlapping suboptimality causes, our AI models (respective AUCs of 0.87, 0.94, and 0.94) were comparable to or better than the corresponding AUCs of 0.88, 0.70, and 0.79 for lung exclusion, patient rotation, and low lung volume reported in the prior study 4. Such differences in performance might be related to differences in the deployed neural networks, training and test datasets, complexity of the CXR, or labeling of CXR as suboptimal versus optimal. While Nousiainen et al. included any degree of rotational discrepancy, we opted for substantial rotation given its higher clinical impact on diagnostic interpretation and the lower need to reject or repeat CXR with only minor degrees of patient rotation 7.
The AI models that we trained have several clinical implications. First, with such algorithms implemented at the point of care, acquisition devices could identify suboptimal CXR and prompt repeat acquisition targeted at correcting the specific cause (such as inadequate inspiratory effort, patient rotation, or exclusion of the lung apices). Repeat acquisition involves additional radiation dose to the patient but avoids delays in diagnostic interpretation and patient care, which can be key in critically ill or unstable patients. Several radiography vendors are also working on such AI applications to improve quality and decrease reject rates and repeat acquisitions 13.
The incidence of suboptimal CXR reported in the literature varies considerably, with some studies reporting it to be as high as 50–96% 3,4. With over 5000 CXR per week at our healthcare sites, even the lower bound of this range would correspond to more than 2500 suboptimal CXR per week, making a manual auditing process in the daily workflow impractical and prone to undercounting the true incidence of suboptimal CXR. Beyond point-of-care devices, the AI models that we trained can help automate and expedite auditing of CXR for quality assurance and improvement purposes, which is currently a manual process in which a technologist reviews all radiographs that required repeat acquisition. An automated process using the trained AI models can help track such information in near real time and provide targeted, large-scale feedback to technologists and the department on specific causes of suboptimality. Constant feedback can help initiate mitigating educational activities and reduce the frequency of suboptimal CXR as well as repeat rates. A decrease in repeat radiography rates will translate into both radiation dose and cost savings. Indeed, Poggenborg et al reported that feedback on image quality resulted in substantial improvement in CXR quality 3.
There are a few commercially available AI algorithms, such as those from Annalise.ai and Qure.ai, that can identify some causes of suboptimal CXR. However, these were not approved for clinical use by the United States Food and Drug Administration (FDA) at the time of preparation of our manuscript. From the diagnostic interpretation perspective, once implemented in the routine clinical workflow, AI models such as the ones we trained have the potential to trigger and insert automated, editable report text pertaining to the presence and cause of suboptimal CXR. This can both alert the radiologist and save the time spent describing or dictating such limitations.
There are limitations in our study. First, we used enriched, retrospective datasets of optimal and suboptimal CXR rather than a continuous, prospective stream of CXR. Given the sheer volume of CXR, a prospective trial would be impractical and would require evaluation of thousands of additional CXR to obtain a balanced representation of all suboptimality causes. Second, we did not perform an a priori or post-hoc power analysis to determine the sample size. However, the high performance of all AI models is reassuring that our study was adequately powered. Third, we did not stratify the suboptimality causes into those that could be avoided or mitigated with repeat CXR versus those that cannot be corrected. Examples of the latter would be patients with complex anatomy (scoliosis and/or kyphosis resulting in rotation or an oblique projection) or extremely sick or ventilated patients (with low lung volumes). Fourth, we did not assess the impact of our AI models on real-world chest radiography, which was not possible because the models are not yet part of our clinical workflow or radiography equipment. Fifth, although we included CXR from five healthcare sites in our data, all sites were affiliated with one consortium and located in the northeastern United States. This limits the generalizability of our models to different geographic practices or population demographics. Finally, we did not include an uncertainty analysis for CXR with different extents or uncertainty levels of suboptimality 14.
In summary, the assessed deep learning platform can enable physicians to train and test successful AI models without any background in data science or computer programming. The trained AI models can identify and classify different causes of suboptimal CXR. Implementation of such AI models to identify suboptimal CXR can help provide feedback on suboptimal exams, enabling continuous quality control and thereby improving the quality of care.