We demonstrated the feasibility of predicting healthcare costs and classifying top-50% spenders by using deep learning models based on chest radiographs (CXR) that are widely available in clinics and hospitals. The models were developed to identify patients who are likely to incur high healthcare expenditure and predict their subsequent amount of healthcare spending within 1, 3, and 5 years. Unlike physicians who are trained to identify only a handful of imaging biomarkers known to medical literature, our deep learning algorithm is able to take into account thousands of imaging features of weak to moderate correlations with healthcare spending as presented in the training set. When a CXR is evaluated by the deep learning algorithm, its pixels are aggregated, transformed, and passed through many layers of filters with each layer extracting different lines, angles, patterns, and associations. As those extracted features are then passed upstream to higher-level filters, they are compared to the thousands of CXR that the algorithm was trained on. All these numbers finally converge to the estimated cost.21,22 Considering that CXR tend to be standardized, deep learning algorithms are trained to be extremely sensitive to details that clinical radiologists may not typically recognize.
From a data scientist perspective, the ability of deep learning algorithms to predict healthcare expenditure from CXR is a testament to the vast amounts of information hidden in imaging data that can be leveraged with data science. The addition of other demographic and clinical variables to the imaging data resulted in minimal improvements to the model, despite the baseline models showing that sex, age, and zip code median income are individually associated with healthcare expenditures. This again affirms the presence of rich information within imaging data and the ability of deep learning models to extract them. It is important to note that deep learning algorithms are, at large, approximations based on a large volume of data.23 The causality for each prediction cannot be definitely deduced.24 As of any machine learning predictions, it cannot be used as definitive proof of a patient’s health or future health expenditure. In addition, there remains ethical concerns as well if the algorithm is used to deny coverage by insurance companies. Nevertheless, the deep learning algorithm can be potentially used by government or insurance companies to identify high-risk individuals and take appropriate actions to secure their health and reduce cost. Such predictions can provide an important starting point in identifying high risk patients to achieve reduction in their healthcare spending and encouraging lifestyle modifications and more intensive medical management to achieve better medical and financial outcomes.
From a clinical perspective, the deep learning algorithm takes into account a combination of demographic factors (age, sex), baseline health factors (weight, bone health), as well as clinical diseases (e.g., enlarged heart, osteophytes, etc) that are inferred from CXR. For example, having hemodialysis access or enlarged heart from congestive heart failure could be strong indicators of higher healthcare spending predicted by the algorithm. Having replaced hardware or numerous osteophytes could be indicative of older age, which in itself is a predictor of higher healthcare spending as well. While the algorithm does not explicitly give these medical diagnoses when it arrives at its final spending prediction, the algorithm is able to incorporate numerous weak to moderately associated cost predictors in the CXR and assemble them into the final cost predictions. Our algorithm could be used in outpatient settings to estimate approximate future healthcare costs such that patients, doctors, and insurance companies would have a reliable indicator to consider when making patient treatment and financial decisions. The identified high-risk patients could be subject to more intensive preventive medical interventions and close follow-up visits to modify patient outcomes. The algorithm could also be used to identify patients with CXR that appear normal according to current clinical radiological standards but are still at risk for high medical costs. Similar to most deep learning algorithms, the application of ours can potentially be automatic, fast, scalable, and relatively low cost when compared to other services in the healthcare system.
Several limitations to the study should be noted, primarily related to selection bias inherent in this particular dataset. First, the performance differences between 1, 3, and 5-year models were observed, which can be attributed to both drastic differences in sample size as well as inherent loss of predictive information about the future. The 3-year expenditure model performed slightly worse than the 1-year expenditure model using 56.7% of the sample size. The 5-year model used only 8.1% of the sample size but still achieved reasonably accurate classification and regression results. We believe that with more data the 5-year expenditure model would show even more promise. Second, the development and testing of the model involved data originating from a single hospital system and most lived in the San Francisco Bay Area in the United States healthcare system. The model will likely not generalize to the non-American healthcare system due to the particular structure of healthcare expenses. However, a similar approach can be undertaken to build a new model with any local dataset. Third, missing data (mainly due to missing financial information) constituted 37% of the originally extracted dataset and they were missing not at random. For example, homeless patients may not have had a zip code available. Fourth, patient death information was not available and could have had a variable impact on healthcare costs. Fifth, the dataset did not include inpatient cases and portable CXR.