Our study provides evidence on the performance of the EQ-5D-5L and SF-6D index scores in measuring health utility in people living with HIV/AIDS (PLWHIV) and shows a moderate correlation between the two measures. Both showed discriminative capacity and validity in measuring the health status of PLWHIV. However, although the two measures overlapped considerably, their performance differed significantly. This is in accordance with previous studies of the differences between the EQ-5D and SF-6D in the general population and in specific groups, such as rural residents in China and patients with Pompe disease, diabetes, mental health problems, chronic low back pain, stroke and breast cancer [11,17,25,26,28−30].
In our sample of PLWHIV, the mean and median EQ-5D-5L values exceeded the corresponding SF-6D values in the whole sample and in every subgroup, with a mean difference of 0.124 and a median difference of 0.180. This result is consistent with previous studies that reported a similar difference in values [11,17,25,26,28−30]. The ICC for the whole sample was 0.59, indicating moderate agreement. We consider this an acceptable but not very good level of agreement between the two measures, particularly at the severe and mild ends of the scales. The two kinds of plots also showed in detail where the two measures differed markedly. This lack of agreement highlights the importance of understanding the reasons behind the differences when assessing the suitability of the instruments for PLWHIV, which matters for health technology assessment and policy making.
Previous studies have explored why the EQ-5D and SF-6D differ in measuring health utility. We discuss three main points. First, the valuation methods may explain part of the difference. The EQ-5D-5L is based on the time trade-off (TTO) method, whereas the SF-6D uses the standard gamble (SG) technique [26]. Previous studies have shown that the SG technique produces higher values than the TTO method [11,17,29], although a crossover has also been reported in which TTO values for milder states were higher than the SG values [8,25,30]. Our results are in accordance with this crossover. HIV/AIDS has become a chronic disease, and with scaled-up ART, PLWHIV can maintain good physical health. We therefore consider that PLWHIV generally occupy milder health states than patients with disabling diseases.
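The agreement statistics discussed above can be illustrated with a minimal sketch. It assumes a two-way random-effects, absolute-agreement, single-measure ICC, often written ICC(2,1), and Bland-Altman limits of agreement. The study does not state which ICC form or which plots were used, so both choices, the function names and the example values are assumptions made purely for illustration.

```python
import numpy as np

def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` is an (n_subjects, k_measures) array, e.g. one column of
    EQ-5D-5L index scores and one column of SF-6D index scores.
    The ICC form is an assumption; the study does not specify it.
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between measures
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for the paired differences a - b."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    bias = d.mean()
    sd = d.std(ddof=1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical index scores, not data from the study.
eq5d = [1.00, 0.90, 0.85, 0.60]
sf6d = [0.85, 0.80, 0.70, 0.55]
print(icc_2_1(np.column_stack([eq5d, sf6d])))
print(bland_altman(eq5d, sf6d))
```

Because this ICC form penalises a systematic offset between the two measures, a consistent gap such as EQ-5D-5L values exceeding SF-6D values lowers the ICC even when the two scores rank respondents identically.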
Second, in our study, both the EQ-5D-5L and SF-6D performed better in monitoring changes in social and psychological aspects than in physical aspects for PLWHIV, and the SF-6D appeared to detect more changes and had larger effect sizes than the EQ-5D-5L. This result is not surprising, in that the richer descriptive system of the SF-6D might make it easier to identify changes in psychological aspects, which are often smaller and less noticeable than changes in physical aspects. Based on the ROC curves and AUCs, both measures discriminated health status well, and the SF-6D seemed the more sensitive of the two. One previous study demonstrated that the difference in SE was inherently driven by the smaller SD of the SF-6D, which was a consequence of the narrower range of its index scores [26]. We consider that the reason lies in the differing content of the descriptive systems. When all participants in a sample complete both measures at the same time, the EQ-5D-5L describes their health status in five dimensions (mobility, self-care, usual activities, pain/discomfort and anxiety/depression), while the SF-6D describes it in six (physical functioning, role limitations, bodily pain, vitality, social functioning and mental health). These differences in descriptive content determine where each instrument is applicable and appropriate. The EQ-5D-5L places more emphasis on the physical aspect of health, while the SF-6D places more emphasis on mental health and social adaptation. With combined antiretroviral therapy (cART) greatly improving the survival of PLWHIV, HIV/AIDS has changed from a terminal illness to a chronic disease. A growing challenge for this population is achieving full health, which requires more attention to mental health and to family and social rehabilitation.
Therefore, the results imply that researchers should choose between the two instruments according to how well each descriptive system fits the severity of the problems that the patient group is likely to encounter. From this perspective, we prefer the SF-6D for measuring health utility in PLWHIV receiving cART.
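The effect size and AUC comparisons referred to under the second point can be sketched as follows. This is a minimal illustration that assumes Cohen's d with a pooled standard deviation as the effect size and the rank-based (Mann-Whitney) form of the AUC. The study does not state which effect size was used, and the function names and example values are hypothetical.

```python
import numpy as np

def cohens_d(group_a, group_b):
    """Standardized mean difference with a pooled SD (assumed effect size)."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (
        len(a) + len(b) - 2
    )
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

def auc(scores_better, scores_worse):
    """Probability that a respondent in better health scores higher than one
    in worse health, counting ties as one half (Mann-Whitney form of the AUC)."""
    hi = np.asarray(scores_better, dtype=float)[:, None]
    lo = np.asarray(scores_worse, dtype=float)[None, :]
    return float(np.mean(hi > lo) + 0.5 * np.mean(hi == lo))

# Hypothetical index scores for respondents without and with depressive symptoms.
no_symptoms = [0.95, 0.90, 0.85, 0.80]
symptoms = [0.85, 0.75, 0.70, 0.60]
print(cohens_d(no_symptoms, symptoms))
print(auc(no_symptoms, symptoms))
```

For the same mean difference, a smaller pooled SD gives a larger standardized effect, which is the mechanism that [26] attributes to the narrower range of the SF-6D index.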
Third, we also consider that the different scoring algorithms contributed partly to the discrepancy between the two measures [19,31]. The valuation algorithms for the EQ-5D-5L and SF-6D are presented in Tables 1 and 2 in the Methods section. The two kinds of algorithm affect how the index score is generated. For the same health status, the different scoring algorithms assign different index scores: the worst health status measured by the EQ-5D-5L has a score of −0.391 (worse than death), while the lowest SF-6D index score is 0.331. These variations result from the different descriptive systems and the different theories underlying the scoring systems. One previous study showed that the interpretation of the constant term and the interaction terms were the two key factors [8]. The SF-6D interprets the constant as an expected value equal to one, whereas the EQ-5D interprets the difference between the constant and one as ‘any move away from full health’. For the interaction effects, the SF-6D has a single dummy named ‘MOST’, whose coefficient is applied to the score if any dimension is at the ‘most severe’ level. The EQ-5D has a dummy named N3, which is similar to ‘MOST’. MOST has a coefficient of −0.085, while N3 has a coefficient of 0 in the Chinese value set for the EQ-5D-5L.
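The role of a ‘most severe’ dummy such as MOST or N3 can be illustrated with a minimal sketch of an additive scoring algorithm. Only the MOST coefficient of −0.085 and the N3 coefficient of 0 come from the text above. The dimension names, level decrements and function name are hypothetical placeholders and are not the values in Tables 1 and 2.

```python
def additive_utility(levels, decrements, worst_level, most_penalty):
    """Start from full health (1.0), subtract a decrement per dimension level,
    and subtract one extra penalty if any dimension is at its worst level."""
    score = 1.0
    for dim, level in levels.items():
        score -= decrements[dim][level]
    if any(levels[dim] == worst_level[dim] for dim in levels):
        score -= most_penalty
    return score

# Hypothetical two-dimension descriptive system with three levels each.
DECREMENTS = {
    "pain": {1: 0.00, 2: 0.05, 3: 0.10},
    "mental": {1: 0.00, 2: 0.04, 3: 0.12},
}
WORST = {"pain": 3, "mental": 3}

state = {"pain": 3, "mental": 2}
# Penalty of 0.085 mirrors the MOST coefficient quoted in the text.
print(additive_utility(state, DECREMENTS, WORST, most_penalty=0.085))
# Penalty of 0 mirrors the N3 coefficient quoted for the Chinese EQ-5D-5L set.
print(additive_utility(state, DECREMENTS, WORST, most_penalty=0.0))
```

With identical level decrements, the two calls differ only by the dummy's coefficient, which shows how the interaction term alone can shift the index score for states that include a worst-level dimension.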
In addition, the preferences of the source population may also contribute to the difference. The EQ-5D-5L values reflect Chinese patient preferences, while the SF-6D values reflect UK patient preferences.
Based on these factors, users need to pay close attention to the characteristics of the target population. We can summarize some principles for choosing between the instruments. First, for the general population or for patients with mild disease and generally good health, the EQ-5D-5L and SF-6D are likely to perform similarly, but for a sicker population, the two measures appear to perform differently. Second, for patients whose mental health is greatly affected and whose physical health is only mildly or minimally affected, we suggest the SF-6D; such patients could include those with mental health problems, HIV/AIDS or early-stage breast cancer, and patients whose disease is in a controlled period. Conversely, for patients whose physical health is greatly affected, we suggest the EQ-5D-5L; such populations could include patients who have lost functional capacity because of disease and patients with advanced disease. Third, when using cost-utility analysis to inform local decisions, we should also consider the availability of the scoring algorithm, the origin of the population used for the value set, the extent of change in health status, and resource allocation.
Our study has some limitations. First, the results are limited to our sample of PLWHIV who were on successful ART, and thus may not be generalizable to all PLWHIV, including patients whose ART has failed. Second, we used the SF-12 as the gold standard for the comparisons; however, the SF-6D scores are themselves derived from the SF-12, which could bias the results to some extent. Third, we conducted a cross-sectional study and so could not capture the responsiveness of the two measures over time. Fourth, depressive and anxiety symptoms were measured by self-report, which could over- or underestimate these symptoms.