We designed this systematic review according to the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) and relevant research guidance by Debray et al.(15, 16)
Literature search
We will systematically search PubMed and Embase from inception to September 16, 2021, to identify studies reporting the development, update, or external validation of an individualized patient prediction model using any ML method to predict health care spending, or changes in health care spending, over any time period and in any setting.
We will use the following search algorithm: ("machine learning" OR "statistical learning" OR "ensemble" OR "superlearner" OR "transfer learning" OR "classification and regression tree" OR "decision tree" OR "random forest" OR "naive bayes" OR "neural network*" OR "support vector machine" OR "gradient boosting machine" OR "K nearest neighbour" OR "clustering" OR "deep learning" OR "reinforced learning") AND ("high cost*" OR "medical cost*" OR "medical care cost*" OR "health care cost*" OR "healthcare cost*" OR "cost of care" OR "costs of care" OR "per capita cost*" OR "cost bloom" OR "patient spending*" OR "health care spending*" OR "healthcare spending*" OR "medical care spending*" OR "medical spending*" OR "high utilizer*" OR "high need*" OR "super utilizer*" OR "payment*" OR "expenditure*" OR "reimbursement*" OR "risk adjustment"). We will also screen the reference lists of all eligible articles to identify additional studies.
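As an illustration only (not part of the search protocol), the following sketch shows how the two term blocks above could be assembled programmatically into a single Boolean string, so that exactly the same query is applied in both the PubMed and Embase interfaces; the list names and the helper function are assumptions made for this example.

```python
# Illustrative sketch: assemble the Boolean search string from the two term blocks
# defined in the protocol, so the identical query can be reused across databases.

ML_TERMS = [
    "machine learning", "statistical learning", "ensemble", "superlearner",
    "transfer learning", "classification and regression tree", "decision tree",
    "random forest", "naive bayes", "neural network*", "support vector machine",
    "gradient boosting machine", "K nearest neighbour", "clustering",
    "deep learning", "reinforced learning",
]

COST_TERMS = [
    "high cost*", "medical cost*", "medical care cost*", "health care cost*",
    "healthcare cost*", "cost of care", "costs of care", "per capita cost*",
    "cost bloom", "patient spending*", "health care spending*", "healthcare spending*",
    "medical care spending*", "medical spending*", "high utilizer*", "high need*",
    "super utilizer*", "payment*", "expenditure*", "reimbursement*", "risk adjustment",
]

def or_block(terms):
    """Quote each term and join with OR, wrapped in parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Combine the ML-methods block and the cost-outcomes block with AND.
search_string = or_block(ML_TERMS) + " AND " + or_block(COST_TERMS)
print(search_string)
```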
Eligibility criteria
Table 1 shows a detailed description of the Population, Intervention, Comparator, Outcome, Timing, and Setting (PICOTS) for this systematic review. To consider a study as eligible, we will follow the definition of a prediction modelling study as proposed by the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) statement.(17, 18) Accordingly, a study will be eligible if it reports on the development, update, or external validation of a model/algorithm used for predicting an individual’s health care spending as a function of two or more covariates. We will include prediction models that were developed, updated, or validated using ML techniques in patients with any medical condition and in any care setting or time period. We will include models examining binary, continuous, or categorical outcomes relevant to health care costs. We will consider as eligible any observational study (e.g., prospective or retrospective cohort studies and case-control studies), but we will not include randomized or observational studies designed to evaluate the impact of ML-based prediction models on health care spending.
Table 1
Key items for framing aim, search strategy, and study inclusion and exclusion criteria following the PICOTS framework

Item | Definition
Population | Patients with documented costs of health care services in any setting
Intervention | Any prediction model designed to predict individual patient probabilities for incurring costs of health care services in any setting, or to predict probabilities for any changes in patient costs over time
Comparator | Not applicable
Outcomes | Any cost-related outcome as reported by prediction models
Timing | Any prediction horizon as reported by prediction models
Setting | Any health care setting
We will exclude articles (a) describing ML-based prediction models that use ecological data to predict aggregate-level health care spending (e.g., county-level or country-level); (b) building ML-based models with a primary goal of causal inference, i.e., models that aim to estimate the change in an individual’s health care costs if a covariate of interest (e.g., insurance) changed from one level (e.g., commercial insurance) to a different level (e.g., public insurance); (c) applying traditional statistical methods, such as linear regression, logistic regression, or Cox regression, for prediction purposes; (d) presenting a systematic review of prediction models; (e) describing prediction models that use radiomics or speech parameters; (f) describing models that use biomarkers that are not clinically validated (e.g., genetic polymorphisms); and (g) performing cost-effectiveness analyses without predicting individual-level health care spending. Additionally, we will exclude conference abstracts, because they do not present a detailed description of their methods and results, which would hinder a thorough methodological assessment. We will also exclude methodological articles that present a novel ML approach to prediction modelling without aiming to build an ML-based prediction model for health care spending.
Data extraction
To facilitate the data extraction process, we will construct a standardized form following the CHARMS checklist, previously published research, and relevant recommendations.(15, 19–22) From each eligible article, we will extract the population characteristics, geographic location, sample size (and number of events for binary outcomes), study design, predicted outcome and its definition, prediction horizon, and measures of model performance (discrimination, calibration, classification, overall performance). We will also extract the ML methods used in the final prediction model, whether the study included development, internal validation, and/or external validation of the model, and whether any model presentation was available in the eligible studies. We will specifically evaluate whether the authors reported only the apparent performance of a prediction model or examined overfitting using internal validation. We will also examine whether a shrinkage method was applied in eligible studies and, if so, which method was used. We will consider that the authors adjusted sufficiently for optimism if they both re-evaluated the performance of a model in internal validation and applied shrinkage. We will additionally record the data source of predictors, whether any routinely collected molecular predictors were included, and whether there were any criteria for manually including or excluding predictors from the final prediction model. Additionally, we will categorize external validation efforts into temporal and geographic validation.(23) For each eligible study, we will also examine whether the authors reported the presence of missing data on the examined outcomes and/or the predictors included in the prediction models; if so, we will record how missing data were treated. We will also extract information on how continuous predictors were handled and whether non-linear trends for continuous predictors were assessed.
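As a minimal sketch (the field names and types below are assumptions made for illustration; the actual form will follow the CHARMS checklist and the cited recommendations), the extraction items listed above could be organized as a structured record along the following lines:

```python
# Illustrative sketch of a structured record for the standardized extraction form;
# field names and types are assumptions, not the final CHARMS-based form.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ExtractionRecord:
    study_id: str
    population: str                        # population characteristics
    location: str                          # geographic location
    sample_size: int
    n_events: Optional[int]                # number of events (binary outcomes only)
    study_design: str
    outcome_definition: str                # predicted outcome and its definition
    prediction_horizon: str
    performance: dict = field(default_factory=dict)   # discrimination, calibration, classification, overall
    ml_methods: list = field(default_factory=list)     # ML methods used in the final model
    development: bool = False
    internal_validation: bool = False                   # apparent performance only vs. internal validation
    external_validation: Optional[str] = None           # "temporal", "geographic", or None
    model_presentation: bool = False                    # whether any model presentation is available
    shrinkage_method: Optional[str] = None
    predictor_data_source: Optional[str] = None
    molecular_predictors: bool = False                  # routinely collected molecular predictors included
    predictor_selection_criteria: Optional[str] = None  # manual inclusion/exclusion of predictors
    missing_data_handling: Optional[str] = None
    continuous_predictor_handling: Optional[str] = None
    nonlinearity_assessed: Optional[bool] = None        # non-linear trends for continuous predictors
```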
Risk of bias and reproducibility assessment
We will appraise the risk of bias in studies developing, updating, or validating a prediction model by using the Prediction model Risk Of Bias Assessment Tool (PROBAST), a tool designed for systematic reviews of diagnostic and prognostic prediction models.(24) It contains multiple questions categorized into four domains: participants, predictors, outcome, and statistical analysis. Question responses are categorized as “yes”, “probably yes”, “probably no”, “no”, or “no information”, depending on the characteristics of the study. If a domain contains at least one question answered “no” or “probably no”, it is considered high risk. To be considered low risk, all questions in a domain must be answered “yes” or “probably yes”. Overall risk of bias is graded as low when all domains are considered low risk, and as high when at least one domain is considered high risk.
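The domain-level and overall grading rules described above can be expressed as a short sketch (the function names are illustrative, and the treatment of domains that are neither clearly high nor clearly low risk as “unclear” is an assumption, since the text above defines only the high- and low-risk rules):

```python
# Illustrative sketch of the PROBAST grading rules described above.
# Answer codes follow the text; handling of intermediate cases as "unclear"
# is an assumption made for this example.

def domain_rating(answers):
    """Rate one domain (participants, predictors, outcome, or analysis)."""
    if any(a in ("no", "probably no") for a in answers):
        return "high"
    if all(a in ("yes", "probably yes") for a in answers):
        return "low"
    return "unclear"  # e.g., some answers are "no information"

def overall_rating(domain_ratings):
    """Grade overall risk of bias across the four domains."""
    if any(r == "high" for r in domain_ratings.values()):
        return "high"
    if all(r == "low" for r in domain_ratings.values()):
        return "low"
    return "unclear"

# Example: one "probably no" in the analysis domain makes the study high risk overall.
ratings = {
    "participants": domain_rating(["yes", "probably yes"]),
    "predictors": domain_rating(["yes"]),
    "outcome": domain_rating(["probably yes", "yes"]),
    "analysis": domain_rating(["yes", "probably no", "yes"]),
}
print(overall_rating(ratings))  # -> "high"
```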
Moreover, we will appraise the computational reproducibility of the eligible studies by following recently published reproducibility standards.(19, 20, 25) This assessment will be based on the availability of data, models, source code and dependencies, and the analysis plan. We will grade eligible articles into three categories reflecting varying degrees of rigor in computational reproducibility.