Population and Data Sources
This was a retrospective cohort study based on anonymized records of the entire population of Catalonia, a region in north-eastern Spain with approximately 7.5 million inhabitants. The regional Health Department of Catalonia provides universal healthcare to the Catalan population through a network of 64 general hospitals, 27 psychiatric hospitals, 375 primary care centres, 91 skilled nursing facilities for intermediate care, and 130 ambulatory mental health facilities. Data were retrieved from the CHSS, which stores clinical and resource utilization information from various registries, including hospitalization, primary care visits, emergency department visits, skilled nursing facilities, palliative care, mental health services, pharmacy dispensation, outpatient visits to specialists, home hospitalization, medical transportation (urgent and non-urgent), ambulatory rehabilitation, respiratory therapies, and dialysis. The source registries have an automated data validation system to identify inconsistencies and are periodically audited by external auditors to ensure the accuracy of provider payments. All data used for the analysis were recorded in the source registries during 2017 and were retrieved in July 2019. The study protocol was approved by the Independent Ethics Committee of the IDIAP Jordi Gol (Spain).
The performance of the multimorbidity tools was assessed in 10 subpopulations: all adults (i.e., aged >17 years), people aged >64 years, people aged >64 years institutionalized in a nursing home for long-term care, and people with specific diagnosis codes of the International Classification of Diseases (9th and 10th revisions, Clinical Modification; ICD-9-CM and ICD-10-CM; all converted to ICD-9-CM) for each of the following conditions: ischemic heart disease, cirrhosis, dementia, diabetes mellitus, heart failure, chronic kidney disease, and chronic obstructive pulmonary disease.
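The overlapping subpopulation definitions above can be illustrated as a per-person membership function. This is a minimal sketch under stated assumptions: the record fields (`age`, `nursing_home`, `codes`) are hypothetical, diagnosis codes are assumed to be ICD-9-CM strings with a decimal point, the prefixes are commonly used ICD-9-CM categories chosen for illustration rather than the study's own code lists, and restricting the diagnosis-based groups to adults is an assumption.

```python
from typing import Dict, Iterable, Mapping, Set, Tuple

# Illustrative ICD-9-CM prefixes (assumption): NOT the code lists used in the study.
DIAGNOSIS_PREFIXES: Dict[str, Tuple[str, ...]] = {
    "ischemic_heart_disease": ("410", "411", "412", "413", "414"),
    "cirrhosis": ("571.2", "571.5", "571.6"),
    "dementia": ("290", "294.1", "331.0"),
    "diabetes_mellitus": ("250",),
    "heart_failure": ("428",),
    "chronic_kidney_disease": ("585",),
    "copd": ("490", "491", "492", "494", "496"),
}


def has_condition(codes: Iterable[str], prefixes: Tuple[str, ...]) -> bool:
    """True if any code starts with one of the prefixes (str.startswith accepts a tuple)."""
    return any(code.startswith(prefixes) for code in codes)


def subpopulations(person: Mapping) -> Set[str]:
    """Return every subpopulation a person belongs to; the groups overlap."""
    groups: Set[str] = set()
    age = person["age"]
    if age <= 17:
        # Assumption: all subpopulations, including diagnosis-based ones, are adults only.
        return groups
    groups.add("adults")
    if age > 64:
        groups.add("aged_over_64")
        if person["nursing_home"]:
            groups.add("aged_over_64_nursing_home")
    for name, prefixes in DIAGNOSIS_PREFIXES.items():
        if has_condition(person["codes"], prefixes):
            groups.add(name)
    return groups


# Hypothetical example: an institutionalized 78-year-old with heart failure,
# diabetes, and an acute pneumonia code (486) that defines no subpopulation.
example = {"age": 78, "nursing_home": True, "codes": ["428.0", "250.00", "486"]}
example_groups = subpopulations(example)
```

Because each person can belong to several subpopulations, the performance analysis would be repeated on each group separately rather than on a partition of the population.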
Tools for Multimorbidity Assessment
The analysis included five quantitative measurements of multimorbidity (Figure 1): the Charlson index score [11], the count of chronic diseases according to three different proposals (i.e., the QOF [8], HCUP [13], and Karolinska Institutet [12]), and the multimorbidity index score of the GMA tool. Briefly, the Charlson index was designed as a tool for predicting life expectancy from a list of 17 comorbidities weighted according to their 1-year risk of death. The QOF was intended as a tool for incentivizing the care of patients with chronic diseases and defines multimorbidity as the presence of more than one diagnosis from a list of 17 important chronic conditions. The HCUP measures multimorbidity by counting the number of chronic diseases among all conditions codified in the chronic condition indicator (CCI) and grouped with the clinical classification software (CCS) [25]. The HCUP defines a chronic condition based on two criteria: (a) the disease places limitations on self-care, independent living, and social interactions, and (b) it results in the need for ongoing intervention with medical products, services, and special equipment [26]. The Karolinska proposal is a clinically driven measure of multimorbidity based on the count of chronic diseases from a list of 918 ICD-10 codes (one- to four-digit level) [12]. In the Karolinska proposal, chronic diseases are selected using the following criteria, tailored to older populations: prolonged duration and either (a) residual disability or worsening quality of life or (b) the need for a long period of care, treatment, or rehabilitation. The GMA tool considers all chronic diagnoses (identified using the CCI of the HCUP) present at a given time and the acute diagnoses reported during the study period. The GMA index score is computed by adding the weights of each diagnosis group (defined using the CCS of the HCUP system). Supplementary file 1 provides further details on the GMA algorithm.
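The measures described above follow two broad patterns: counting distinct conditions (QOF, HCUP, Karolinska) and summing condition weights (Charlson, GMA). The sketch below contrasts the two in a deliberately simplified form; it is not the GMA algorithm (see Supplementary file 1) or any official implementation. All names, the code-to-group mapping, the chronic flags, and the weights are hypothetical values invented for illustration.

```python
from typing import Iterable, Mapping, Sequence, Set


def count_listed_conditions(codes: Iterable[str], condition_prefixes: Mapping[str, Sequence[str]]) -> int:
    """List-based count (in the spirit of QOF or Karolinska): number of distinct
    conditions on a predefined list for which the person has at least one code."""
    codes = list(codes)
    return sum(
        1
        for prefixes in condition_prefixes.values()
        if any(code.startswith(tuple(prefixes)) for code in codes)
    )


def count_chronic_groups(codes: Iterable[str], code_to_group: Mapping[str, str], chronic_codes: Set[str]) -> int:
    """HCUP-style count: keep codes flagged as chronic (CCI-like flag), map each
    to its diagnosis group (CCS-like grouping), and count distinct groups."""
    return len({code_to_group[c] for c in codes if c in chronic_codes and c in code_to_group})


def weighted_group_score(codes: Iterable[str], code_to_group: Mapping[str, str], group_weights: Mapping[str, float]) -> float:
    """Weighted-sum score (simplified, in the spirit of Charlson or GMA): add the
    weight of each distinct group once; acute groups can contribute too."""
    groups = {code_to_group[c] for c in codes if c in code_to_group}
    return sum(group_weights.get(g, 0.0) for g in groups)


# Hypothetical inputs: three chronic diagnoses plus one acute pneumonia code.
CODES = ["250.00", "428.0", "496", "486"]
CONDITION_PREFIXES = {"diabetes": ["250"], "heart_failure": ["428"], "copd": ["490", "491", "492", "494", "496"]}
CODE_TO_GROUP = {"250.00": "diabetes", "428.0": "heart_failure", "496": "copd", "486": "pneumonia"}
CHRONIC_CODES = {"250.00", "428.0", "496"}
GROUP_WEIGHTS = {"diabetes": 1.0, "heart_failure": 2.0, "copd": 1.5, "pneumonia": 0.5}  # invented weights

listed_count = count_listed_conditions(CODES, CONDITION_PREFIXES)
chronic_count = count_chronic_groups(CODES, CODE_TO_GROUP, CHRONIC_CODES)
weighted_score = weighted_group_score(CODES, CODE_TO_GROUP, GROUP_WEIGHTS)
```

With these inputs both counts equal 3, while the weighted score (5.0) also reflects the acute diagnosis and the relative severity encoded in the weights, which is the main conceptual difference between count-based and weight-based measures.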
Study Outcomes
We investigated the contribution of each multimorbidity measure to explaining eight outcomes associated with chronic patients: all-cause death, hospitalization, non-scheduled hospitalization, number of primary care visits (including visits to general practitioners, nurses, and social workers, whether at the primary care facility, at home, or via teleconsultation), visits to the emergency room (ER), medication use, admission to a skilled nursing facility for intermediate care, and high expenditure, defined as total expenditure above the 95th percentile in our area [27]. All outcomes were assessed over a 1-year time frame, from January 1 to December 31, 2017.
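The 95th-percentile definition of high expenditure (the same cut-off rule is applied to other continuous outcomes in the statistical analysis) can be sketched with the Python standard library. This is an illustrative sketch, not the study's code: `method="inclusive"` interpolates linearly between order statistics, which corresponds to R's default `quantile()` (type 7); the section does not state which percentile definition was used. The cost values are hypothetical, and in the study the cut-off is computed within each target population.

```python
import statistics
from typing import List, Sequence, Tuple


def p95_cutoff(values: Sequence[float]) -> float:
    """95th percentile using linear interpolation (equivalent to R's type 7)."""
    # quantiles(n=20) returns the 19 cut points at 5%, 10%, ..., 95%; keep the last.
    return statistics.quantiles(values, n=20, method="inclusive")[-1]


def flag_above_p95(values: Sequence[float]) -> Tuple[float, List[bool]]:
    """Return the cut-off and a binary flag per person (strictly above the cut-off)."""
    cutoff = p95_cutoff(values)
    return cutoff, [v > cutoff for v in values]


# Hypothetical annual costs for 100 people: 1, 2, ..., 100 euros.
costs = [float(c) for c in range(1, 101)]
cutoff, high_expenditure = flag_above_p95(costs)
```

Here the cut-off is 95.05 (interpolated between the 95th and 96th order statistics), so the five people with costs of 96 to 100 are flagged as high expenditure. Using a strict "greater than" comparison mirrors the wording of the thresholds reported in the statistical analysis (e.g., cost above € 4315.1).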
Statistical Analysis
The characteristics of the study population were described as absolute and relative frequencies, and as rates for all investigated outcomes. Continuous outcomes were dichotomized using the 95th percentile of the given variable in the target population as the cut-off: admission to a skilled nursing facility for intermediate care (i.e., one or more admissions), hospital admission and emergency room visits (i.e., one or more), visits to primary care services (i.e., more than 21 visits), medication use (i.e., dispensing of more than 13 drugs belonging to different 5-digit groups of the Anatomical Therapeutic Chemical classification), and expenditure (i.e., healthcare cost above € 4315.1). To assess the performance of the multimorbidity measures, we built six logistic regression models for each investigated outcome, all adjusted for age, gender, and socioeconomic status: a baseline model (i.e., age, gender, and socioeconomic status as independent variables, plus all first-order interactions between them) and five models that each added one multimorbidity measure to the baseline model. Socioeconomic status was stratified into four categories of pharmaceutical co-payment: very low (recipients of social welfare aid), low (annual income < € 18 000), moderate (annual income € 18 000 to € 100 000), and high (annual income > € 100 000).
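The six model specifications per outcome can be made concrete by building one row of each design matrix. The sketch below assumes dummy (treatment) coding with "very low" as the reference socioeconomic level and female as the reference gender, and uses hypothetical column names for the five measures; these choices are illustrative, not taken from the study. Under those assumptions, the baseline model has 13 terms: an intercept, five main-effect columns, and seven interaction columns (age × gender, age × three SES dummies, gender × three SES dummies). For comparison, in R, which the study used, the formula `y ~ (age + gender + ses)^2` passed to `glm(..., family = binomial)` expands to the same set of terms under default contrasts when `gender` and `ses` are factors.

```python
from itertools import combinations, product
from typing import Dict, List, Mapping, Optional

SES_LEVELS = ("very_low", "low", "moderate", "high")  # first level = reference (assumption)
MEASURES = ("charlson", "qof", "hcup", "karolinska", "gma")  # hypothetical column names


def main_effects(person: Mapping) -> Dict[str, Dict[str, float]]:
    """Dummy-coded main-effect columns, grouped by the variable they encode."""
    return {
        "age": {"age": float(person["age"])},
        "gender": {"male": 1.0 if person["gender"] == "male" else 0.0},
        "ses": {f"ses_{lvl}": 1.0 if person["ses"] == lvl else 0.0 for lvl in SES_LEVELS[1:]},
    }


def design_row(person: Mapping, measure: Optional[str] = None) -> Dict[str, float]:
    """Intercept, main effects, all two-way interactions between *different*
    variables, and (optionally) one multimorbidity measure."""
    groups = main_effects(person)
    row: Dict[str, float] = {"intercept": 1.0}
    for cols in groups.values():
        row.update(cols)
    # Pair up variables (age-gender, age-ses, gender-ses); dummies of the same
    # factor are never multiplied together.
    for cols_a, cols_b in combinations(groups.values(), 2):
        for (name_a, val_a), (name_b, val_b) in product(cols_a.items(), cols_b.items()):
            row[f"{name_a}:{name_b}"] = val_a * val_b
    if measure is not None:
        row[measure] = float(person[measure])
    return row


# Baseline model (None) plus one model per multimorbidity measure.
MODEL_SPECS: List[Optional[str]] = [None, *MEASURES]

person = {"age": 70, "gender": "female", "ses": "moderate", "gma": 12.5}  # hypothetical record
baseline_row = design_row(person)
gma_row = design_row(person, "gma")
```

Each extended model differs from the baseline by exactly one column, so any change in the performance statistics between the two can be attributed to the added multimorbidity measure.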
The performance of each model was assessed using four statistics. For the primary analysis, we chose the area under the receiver operating characteristic curve (AUCROC), which assesses the discrimination capacity of the model across all classification thresholds and ranges from 0.5 (discrimination no better than chance) to 1 (perfect discrimination). Additionally, we conducted secondary analyses using the Akaike information criterion (AIC), pseudo-R squared (pR2), and the area under the precision-recall curve (AUC-PR). The AIC estimates the in-sample prediction error by balancing goodness of fit against model complexity, thus penalizing overfitting; AIC values have no fixed range and depend on the study sample, so they are only comparable between models fitted to the same data, with lower values indicating better performance [28]. The pR2 assesses goodness of fit as an analogue of the proportion of variability explained by the model and ranges from 0 (poor fit) to 100 (very good fit). The precision-recall curve shows the trade-off between precision (i.e., the proportion of predicted positives that are true positives) and recall (i.e., the proportion of actual positives that are correctly identified); its area ranges from 0 to 1 and, unlike the AUCROC, is less prone to overestimating performance for low-frequency outcomes [29]. All analyses were performed using the R statistical software (version 3.6.2) [30].
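The four performance statistics can be computed from observed binary outcomes and a model's predicted probabilities. This standard-library sketch is illustrative rather than the study's R code, and it rests on assumptions the section does not settle: McFadden's pseudo-R² is used (multiply by 100 for the 0 to 100 scale above), with the null model predicting the outcome prevalence; AUC-PR is estimated as average precision (the step-wise estimator behind scikit-learn's `average_precision_score`), and interpolated estimators give slightly different values; AUCROC is computed by pairwise comparison, which is O(n²) and suited only to small examples.

```python
import math
from typing import Sequence


def log_likelihood(y: Sequence[int], p: Sequence[float], eps: float = 1e-15) -> float:
    """Bernoulli log-likelihood of outcomes y given predicted probabilities p."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)  # clip to avoid log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total


def aic(loglik: float, n_params: int) -> float:
    """AIC = 2k - 2 log L; lower is better, only comparable on the same data."""
    return 2 * n_params - 2 * loglik


def mcfadden_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden's pseudo-R2 (assumed variant): 1 - log L_model / log L_null."""
    return 1 - loglik_model / loglik_null


def auc_roc(y: Sequence[int], p: Sequence[float]) -> float:
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def auc_pr(y: Sequence[int], p: Sequence[float]) -> float:
    """Average precision: sum over thresholds of (recall gain) x precision."""
    n_pos = sum(y)
    pairs = sorted(zip(p, y), key=lambda t: -t[0])
    ap, prev_recall, tp, fp, i = 0.0, 0.0, 0, 0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Process all tied scores at this threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            tp += pairs[i][1]
            fp += 1 - pairs[i][1]
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


# Hypothetical outcomes and predicted probabilities.
y_obs = [0, 0, 1, 1]
p_hat = [0.1, 0.4, 0.35, 0.8]
roc = auc_roc(y_obs, p_hat)
pr = auc_pr(y_obs, p_hat)
ll_model = log_likelihood(y_obs, p_hat)
ll_null = log_likelihood(y_obs, [sum(y_obs) / len(y_obs)] * len(y_obs))
```

For this toy example the AUCROC is 0.75 and the average precision is about 0.83. In the study's setting, the same functions would be applied to each of the six models per outcome, with `n_params` in `aic` equal to the number of estimated coefficients.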