In the present study, an attention-based MIL model was developed to identify spirometry-defined COPD patients using a large and highly heterogeneous collection of CT scans acquired with multiple scanner manufacturers and slice thicknesses at multiple institutions in China. It is also a ‘real-world’ dataset, containing participants recruited from outpatient, inpatient, and physical examination settings. Implemented with novel DL networks, our model achieved an AUC of 0.934 in the test group. This DL-based approach also showed satisfactory robustness across scanner models and the slice thicknesses used to reconstruct CT scans, with AUCs of 0.8 and above. The generalizability of the model was externally validated on a separate dataset of LDCT scans drawn from a large cohort (NLST), with an AUC of 0.866 (95% CI: 0.805, 0.928). A multi-channel 3D ResNet50 network was further trained to predict GOLD stages for confirmed COPD patients, achieving an accuracy above 0.8 for every stage. To our knowledge, the proposed model offers the best performance reported to date for detecting COPD and predicting GOLD stage. It is also the first attempt to apply DL-based approaches to COPD case-finding in a natural Chinese population.
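For readers less familiar with the metric, the following minimal sketch shows how an AUC and an accompanying 95% confidence interval can be computed from binary labels and model scores. It is illustrative only: the function names `roc_auc` and `bootstrap_ci` are hypothetical, the toy data are invented, and the percentile bootstrap is an assumption, since the text does not state how the reported interval was obtained.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative one.

    labels: list of 0/1 values (1 = COPD in this illustration)
    scores: list of model outputs, higher meaning more likely positive
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    roc_auc(labels, scores)  # fail early if one class is missing
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that happen to contain a single class.
        if 0 < sum(ys) < n:
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented toy data, not taken from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_ci = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
```

The pairwise formulation is quadratic in the number of cases and is chosen here for readability rather than efficiency.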
Although the heterogeneous pathological nature of COPD has been understood for decades, patients are currently diagnosed primarily by spirometry, a history of exposure (smoking or other environmental factors), and respiratory symptoms at the time of presentation. Over the last few years, it has become evident that patients who have no spirometric abnormalities yet experience COPD-like respiratory symptoms or acute exacerbation events (with significant pulmonary structural abnormalities) are common (25, 26). Carpo et al. presented an analysis of baseline phenotyping and 5-year longitudinal progression in the COPDGene study, demonstrating that spirometric criteria alone were insufficient to characterize COPD participants among current and former heavy smokers (27). Their results also indicated that quantitative CT metrics outperformed spirometry in predicting disease progression and mortality. Thus, CT scans could be used to improve COPD case-finding and evaluation beyond spirometry alone.
The development of artificial intelligence for large-scale data processing has increasingly led to the use of ML-based techniques to establish a direct link between diagnostic images and disease categorization (14, 36). This approach overcomes the limitations of conventional manual CT image inspection, such as inter- and intra-observer variability and heavy workloads. It also removes the need for prior knowledge of radiographic features, which quantitative CT analysis requires. Previous studies by Gonzàlez and Lisa et al. (23, 24) have explored the application of ML-based methods to CT image analysis for COPD detection and evaluation. The present study differed from these in patient selection and disease spectrum distribution. Most notably, we adopted a novel attention-based MIL strategy, which increased the proportion of lesion-related information and achieved high robustness with a relatively limited training set. A multi-channel 3D ResNet50 network allowed the model to extract spatial information between slices and to identify abnormal images exhibiting relatively small regions of interest (ROIs), further improving staging performance (see Supplemental Appendix 2–4).
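The general idea behind attention-based MIL pooling can be sketched as follows. This is a generic illustration, not the network used in this study: each CT slice is treated as an instance with a feature vector, a small scoring function assigns each instance a weight, and the bag-level (scan-level) representation is the weighted sum. The function name `attention_mil_pool`, the parameters `V` and `w`, and the toy values are all hypothetical; in a real model the parameters would be learned and the features would come from a convolutional encoder.

```python
import math


def attention_mil_pool(instances, V, w):
    """Pool per-slice feature vectors into one bag-level vector.

    instances: list of K feature vectors (one per slice), each of length D
    V: hypothetical projection matrix, given as L rows of length D
    w: hypothetical scoring vector of length L
    Returns (weights, bag_vector); weights are non-negative and sum to 1.
    """
    scores = []
    for h in instances:
        # Hidden representation tanh(V h), then scalar score w . tanh(V h).
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(a * b for a, b in zip(w, hidden)))

    # Softmax over instances; subtracting the max keeps exp() stable.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Bag representation: attention-weighted sum of instance features.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return weights, bag


# Invented toy bag: two near-normal slices and one with strong features.
toy_slices = [[0.1, 0.0], [0.0, 0.2], [2.0, 1.5]]
toy_V = [[1.0, 0.0], [0.0, 1.0]]
toy_w = [1.0, 1.0]
toy_weights, toy_bag = attention_mil_pool(toy_slices, toy_V, toy_w)
```

In this toy bag the third slice receives the largest weight, which illustrates how slices carrying lesion-related features can dominate the scan-level representation even when most slices appear normal.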
This study offers several clinical benefits. The deep learning model was trained using subjects recruited from both respiratory clinics and health management centers, and therefore included participants with normal spirometry and normal CT results. This scenario is representative of the diverse clinical situations in which COPD could be detected among the general population. Previous DL algorithms for COPD detection have mostly been trained on cohorts of former and current smokers, which may not truly reflect case-finding in real-world settings. While researchers from the COPDGene and ECLIPSE cohorts have reported favorable COPD imaging results, it is crucial to extend this work to a Chinese population, as only a very small proportion of subjects in these studies were ethnically Chinese (11). Furthermore, the increased use of LDCT for pulmonary nodule assessment and lung cancer screening has created an opportunity to apply the present model to COPD detection, with subsequent confirmation by spirometry. This is particularly relevant because our model generalized to LDCT scans in the NLST subset.
The present study has several limitations. First, spirometry was used to diagnose COPD instead of symptoms or radiographs, which may prevent our algorithm from being generalized to the detection of COPD in patients without airflow limitation, such as those with paraseptal emphysema. This was a result of the relatively objective criteria used for enrollment. However, focusing on this participant group remains practical, as it is associated with increased morbidity and mortality and could benefit from early detection and proper management (2). Second, our cohort is relatively small compared with other large cohorts, such as COPDGene and ECLIPSE. Although we adopted strategies to improve the efficiency of detection and staging, performance was unsatisfactory in particular subgroups, partly because too few patients were enrolled. We are currently recruiting more participants and hope to optimize our cohort in the future. Third, the ability of DL to detect and stage COPD without specification of clinical or radiographic characteristics could be both a strength and a weakness. The ‘black box’ nature of DL may severely limit its utility in clinical situations, as it does not provide clinicians with sufficient information about its decision-making process. Future work is urgently needed to elucidate the decision path.