2.1 Samples
Data were collected online by an independent polling company (Ipsos) in April and May 2015. Quota sampling was employed to obtain samples representative of the general population with respect to the marginal distributions of sex, age, occupation, region, and population density of the UK (n=1,509), France (n=1,501), and Germany (n=1,502). Sample weights were calculated using the random iterative method (RIM) to match the latest data available in each country (census 2011 for the UK and Germany, census 2012 for France).
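The RIM weighting step can be illustrated with a minimal raking sketch. This is not the polling company's implementation: the function name `rake`, the toy respondents, and the target proportions are hypothetical, and only two weighting variables are shown instead of the five used in the study.

```python
def rake(rows, targets, max_iter=100, tol=1e-8):
    """Iteratively rescale weights until weighted margins match the targets.

    rows    : list of dicts, one per respondent (hypothetical toy data)
    targets : dict mapping variable -> {category: population proportion}
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_change = 0.0
        for var, target in targets.items():
            total = sum(weights)
            # Current weighted share of each category of this variable
            share = {cat: 0.0 for cat in target}
            for row, w in zip(rows, weights):
                share[row[var]] += w / total
            # Rescale each weight by (target share / current share)
            for i, row in enumerate(rows):
                factor = target[row[var]] / share[row[var]]
                max_change = max(max_change, abs(factor - 1.0))
                weights[i] *= factor
        if max_change < tol:
            break
    # Normalise so that the weights average to 1
    mean_w = sum(weights) / len(weights)
    return [w / mean_w for w in weights]


def weighted_share(rows, weights, var, cat):
    """Weighted proportion of respondents with rows[var] == cat."""
    hit = sum(w for row, w in zip(rows, weights) if row[var] == cat)
    return hit / sum(weights)


# Hypothetical toy sample and census margins (not the study data)
rows = (
    [{"sex": "f", "age": "young"}] * 3
    + [{"sex": "f", "age": "old"}] * 1
    + [{"sex": "m", "age": "young"}] * 2
    + [{"sex": "m", "age": "old"}] * 4
)
targets = {"sex": {"f": 0.5, "m": 0.5}, "age": {"young": 0.4, "old": 0.6}}
weights = rake(rows, targets)
print(weighted_share(rows, weights, "sex", "f"))
print(weighted_share(rows, weights, "age", "young"))
```

Each pass matches one margin exactly and slightly disturbs the others, which is why the adjustment is repeated until all rescaling factors are close to 1.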
We briefly summarize only the most important differences among the three samples here. The interested reader is referred to Table A.1 (Appendix) for a comprehensive overview of the marginal distributions of sex, age, educational level, occupational status, and income in the three samples. Participants in the German sample (mean age = 50.0 years) were slightly older than participants in the French (48.4 years) and UK samples (47.8 years). Participants in the German sample were more likely to have a low educational background (23.4%) than participants in the French (7.6%) and UK samples (8.1%). Participants in the French sample were more likely to be unemployed/inactive (48.4%) than participants in the German (41.5%) and UK samples (39.4%).
As participants could only proceed through the survey by answering each item, there were no missing data.
2.2 Measures
PROMIS domains and item banks
We used the PROMIS-29 v2.0 Profile to assess seven core domains of health: physical function, fatigue, pain, anxiety, depression, sleep disturbance, and the ability to participate in social roles and activities (referred to as participation in the remainder of this article)(27). The visual analogue scale (VAS) item expressing pain intensity on a scale ranging from 0 to 10 was not used in this study. Each domain is assessed with four items, and the domain scores are expressed as T-scores (M = 50, SD = 10) with the US general population as a reference. Note that due to the invariance property of IRT, T-scores obtained from the PROMIS-29 are on the same metric as the scores Revicki used in his analysis, even though these scores were generated using different items. For desirable domains (e.g., physical function), higher T-scores indicate better health, whereas for undesirable domains (e.g., depression), higher T-scores indicate poorer health.
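The T-score metric can be made concrete with a short sketch. It assumes that a latent trait estimate `theta` (standardized to the US general population, mean 0 and SD 1) is already available; the IRT estimation of `theta` from item responses is not shown, and the function name is hypothetical.

```python
def t_score(theta):
    """Linear rescaling of a standardized trait estimate to the T-score metric."""
    return 50.0 + 10.0 * theta


# A respondent 1.5 SD below the reference population mean
print(t_score(-1.5))
```

Whether a T-score of 35 signals better or poorer health depends on the direction of the domain, as described above.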
The psychometric properties of the PROMIS-29 profile, including evidence of construct and criterion validity, have been reported elsewhere(28–31). An earlier analysis of the data used in this study revealed that scores on the seven health domains of the PROMIS-29 are measurement invariant across the UK, France, and Germany except for one item(32). Hence, the predictor scores of self-reported health that we used in this study are invariant with respect to nationality.
EQ-5D-5L
The EQ-5D-5L is a standardized, patient-reported, preference-based instrument for measuring generic health(3–8). It covers five health dimensions: mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. Each dimension has five levels (i.e., response options): “No problems” (1), “Slight problems” (2), “Moderate problems” (3), “Severe problems” (4), and “Extreme problems” (5). Together, these define 5⁵ = 3,125 different health states. The value assigned to each health state is determined by so-called value sets, developed by EuroQoL using time trade-off (TTO) and visual analogue scale (VAS) as preference elicitation methods(4,8). The maximum value for a health state is 1.00, or “full health”. The minimum value depends on the value set applied and can be negative, in which case the state is considered “worse than dead”. For example, the pattern 11111 translates to a health state value of 1, whereas the pattern 54545 may correspond to -0.2. Note that people in different countries value health states differently, so the EQ-5D index value is country-specific(8,9,11,12,25).
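The number of health states follows directly from five dimensions with five levels each. The sketch below enumerates the state patterns only; it does not assign values, because that requires a country-specific value set, which is not reproduced here.

```python
from itertools import product

DIMENSIONS = (
    "mobility",
    "self-care",
    "usual activities",
    "pain/discomfort",
    "anxiety/depression",
)
LEVELS = (1, 2, 3, 4, 5)

# One digit per dimension, e.g. "11111" (no problems on any dimension)
states = [
    "".join(str(level) for level in combo)
    for combo in product(LEVELS, repeat=len(DIMENSIONS))
]

print(len(states))            # 3125
print(states[0], states[-1])  # 11111 55555
```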
EQ-5D index values can be derived from the EQ-5D-5L using either the crosswalk to the 3L value set or the new 5L value sets(8). Crosswalks to the 3L value sets are available for ten countries, including the US, the UK, France, and Germany(4,8). A 5L value set is available for Germany(12). There is also one for England, which is not equivalent to the UK, and none yet for France(9,10). We therefore used the 3L crosswalk value sets for all three samples, thereby ensuring comparability across our samples and with Revicki’s model, which used the 3L value set for the US(8,24,25).
2.3 Statistical analysis
2.3.1 Relationships among individual health domains and health utility across the UK, France, and Germany
To obtain a first impression of the form of the relationships between the individual health domains and health utility (HU), and to judge whether these relationships are stable across the three countries under investigation, we plotted the seven domain scores against HU in the UK, France, and Germany.
2.3.2 Optimal models for predicting health utility in the three countries
We applied stepwise regression with backward selection to find the best models to predict the EQ-5D score for the UK, France, and Germany, starting with full models that incorporated linear, quadratic, and cubic effects for the same seven PROMIS domains as Revicki. Because sociodemographic factors such as age and sex are known to be useful in predicting HU, they were also entered as possible predictors(13).
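Backward selection guided by an information criterion can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: the data are simulated, the predictor names (`pf`, `pf_sq`, `dep`, `noise`) are hypothetical, only four candidate terms are used, and the BIC is computed for an ordinary least-squares fit as n·ln(RSS/n) + k·ln(n).

```python
import math
import random


def ols_rss(columns, y):
    """Residual sum of squares of an OLS fit of y on an intercept plus columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y stored as an augmented matrix
    A = [
        [sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
        + [sum(X[i][r] * y[i] for i in range(n))]
        for r in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for k in range(p):
        pivot = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[pivot] = A[pivot], A[k]
        for r in range(p):
            if r != k:
                f = A[r][k] / A[k][k]
                A[r] = [a - f * b for a, b in zip(A[r], A[k])]
    beta = [A[k][p] / A[k][k] for k in range(p)]
    return sum(
        (y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in range(n)
    )


def bic(columns, y):
    """BIC of a Gaussian linear model: n*ln(RSS/n) + k*ln(n)."""
    n, k = len(y), len(columns) + 1  # +1 for the intercept
    return n * math.log(ols_rss(columns, y) / n) + k * math.log(n)


def backward_select(predictors, y):
    """Drop one term at a time for as long as doing so lowers the BIC."""
    kept = dict(predictors)
    best = bic(list(kept.values()), y)
    while kept:
        trials = {
            name: bic([v for other, v in kept.items() if other != name], y)
            for name in kept
        }
        drop = min(trials, key=trials.get)
        if trials[drop] >= best:
            break
        best = trials[drop]
        del kept[drop]
    return sorted(kept)


# Simulated standardized scores (hypothetical, not the study data)
rng = random.Random(0)
n = 200
pf = [rng.gauss(0, 1) for _ in range(n)]
dep = [rng.gauss(0, 1) for _ in range(n)]
noise = [rng.gauss(0, 1) for _ in range(n)]
y = [0.8 + 0.10 * a - 0.05 * b + rng.gauss(0, 0.05) for a, b in zip(pf, dep)]

predictors = {"pf": pf, "pf_sq": [a * a for a in pf], "dep": dep, "noise": noise}
kept = backward_select(predictors, y)
print(kept)
```

Note that this naive version may drop a linear term while retaining its higher-order counterpart; whether such a constraint was enforced in the reported analyses is not stated in the text.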
The Bayesian information criterion (BIC) was used to guide the inclusion and exclusion of predictors in the stepwise regression analyses(33). To minimize the risk of chance findings, we used 10-fold cross-validation for each model estimated(34). With this in-sample cross-validation technique, the initial dataset is randomly split into 10 subsamples of approximately equal size. One of these subsamples is held out for validation, while the other nine subsamples are used for parameter estimation. This process is repeated ten times, so that each subsample serves once as the validation set, and the results are averaged across repetitions.
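The cross-validation procedure described above can be sketched in a few lines. The example assumes a single-predictor linear model and simulated data purely for brevity; the function names, the seed, and the toy coefficients are hypothetical and do not correspond to the models reported in this article.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return my - slope * mx, slope


def k_fold_rmse(x, y, k=10, seed=1):
    """Average held-out RMSE over k folds of approximately equal size."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    rmses = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Estimate parameters on the nine training folds only
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        # Evaluate on the fold that was held out
        errors = [y[i] - (a + b * x[i]) for i in held_out]
        rmses.append((sum(e * e for e in errors) / len(errors)) ** 0.5)
    return statistics.fmean(rmses), folds


# Simulated data (hypothetical): utility as a linear function of one score
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(150)]
y = [0.8 + 0.1 * xi + rng.gauss(0, 0.05) for xi in x]

mean_rmse, folds = k_fold_rmse(x, y)
print(round(mean_rmse, 3))
```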
The root mean square error (RMSE) and the mean absolute error (MAE) were used as measures of prediction precision. Note that we deliberately chose criteria different from those used by Revicki because measures of precision and bias, such as the RMSE and the MAE, are preferred over either R²-based or information-based (AIC and BIC) criteria(35). In addition, we determined the width between the 95% empirical limits of agreement and compared it to the width between the 95% theoretical limits of agreement (i.e., ±1.96 × SD(residuals)). To check the prediction performance along the HU continuum, especially for low levels of HU, Bland-Altman plots were used. We used R version 3.4.1, IBM SPSS Statistics version 23, and Microsoft Excel version 15 to run the analyses.
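The precision measures can be computed as in the following sketch. One assumption is made explicit: the empirical 95% limits of agreement are taken here to be the 2.5th and 97.5th percentiles of the residuals, which the text does not specify. The function name and the toy values are hypothetical.

```python
import statistics


def precision_summary(observed, predicted):
    """RMSE, MAE, and widths of the 95% limits of agreement of the residuals."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    rmse = (sum(r * r for r in residuals) / n) ** 0.5
    mae = sum(abs(r) for r in residuals) / n
    # Theoretical limits: +/- 1.96 * SD(residuals), so the width is 2 * 1.96 * SD
    theoretical_width = 2 * 1.96 * statistics.stdev(residuals)
    # Empirical limits (assumed): 2.5th and 97.5th percentiles of the residuals
    cuts = statistics.quantiles(residuals, n=40, method="inclusive")
    empirical_width = cuts[-1] - cuts[0]
    return {
        "rmse": rmse,
        "mae": mae,
        "theoretical_width": theoretical_width,
        "empirical_width": empirical_width,
    }


# Hypothetical observed and predicted utility values
observed = [1.0, 0.8, 0.6, 0.4]
predicted = [0.9, 0.9, 0.5, 0.5]
summary = precision_summary(observed, predicted)
print(summary)
```

An empirical width that clearly exceeds the theoretical one suggests heavier tails in the residuals than a normal distribution would imply.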
2.3.3 Impact of misspecified mapping functions on the prediction performance
To the best of our knowledge, as of February 2020, the mapping function reported by Revicki was the only one available for predicting EQ-5D scores from the PROMIS-29(24). Hence, we were interested in quantifying the detrimental effect of applying this foreign mapping function to the data collected in Europe. Note that application of Revicki’s model to the data collected in the UK, France, and Germany (i) disregards the country specificity of the EQ-5D, (ii) does not utilize the potential predictive value of the PROMIS-29 health domains not used by Revicki, (iii) does not take higher-order effects into account, and, in combination with the foregoing, (iv) disregards the country dependency of the form of the relationships (i.e., the specific values of the regression coefficients used).
Because we were also interested in which factor is mainly responsible for the differences in prediction performance, we moved stepwise from Revicki’s model to our models as follows: First, we used the five health domains of Revicki’s model, but with regression coefficients optimized towards the data collected in each country separately. Second, we investigated the incremental value of adding either sleep disturbance, participation, or both to the prediction equation. Third, we allowed for incorporation of quadratic and/or cubic effects (M3).