Data
The assessment of the psychometric properties of the LBoT measure is based on three telephone surveys of Victorians with a managed TAC claim for injury in a transport accident: four waves of a Longitudinal Study survey (2012-2016) with 1,556 respondents in wave 1; and two repeated cross-sectional surveys conducted on behalf of the TAC from October 2011 to 2017, the Client Outcome Survey (COS) with 5,238 respondents and the Client Experience Survey (CES) with 1,964 interviews from 604 unique respondents. While the Longitudinal Study and the COS focused on recovery outcomes, the CES focused on perceptions of service and included respondents who had more enduring disabilities. The three surveys are described in detail in Electronic Supplementary Material 1.
The Life Back on Track (LBoT) Measure
Respondents were asked to rate whether they considered their lives to be back on track as:
“In other research, TAC clients often talk about trying to 'GET THEIR LIFE BACK ON TRACK' following a transport accident. This can mean different things to different people. Thinking about your own circumstances right now (today), how would you rate the extent to which you have been able to 'get your life back on track', on a scale from 1 to 10, where 1 means ‘not at all’, and 10 means ‘completely back on track’?” In the CES for respondents who had more enduring disabilities, the recall time was 2 weeks preceding the interview.
Analyses
Construct Validity
Construct validity is the degree to which the LBoT measure captures what it intends to measure – in this case general subjective wellbeing in recovery from a transport accident. A known-groups validation was conducted, based on the principle that certain specified groups of TAC clients are expected to score LBoT differently from others, and the LBoT measure should be sensitive to these differences.
The known groups were identified based on a construct analysis of qualitative survey data (see Supplementary Material 2) and represented by the following indicators: (a) self-reported injury severity (a 5-point scale: Very Severe, Severe, Moderate, Slight and Very Slight); (b) the depression subscale of the Depression Anxiety Stress Scales (DASS)-21 (Lovibond & Lovibond, 1995; Henry & Crawford, 2005), which contains 7 items, each with a 4-point severity scale (none of the time, some of the time, a lot of the time, most of the time); (c) pain; (d) financial ability to get by (a 4-point scale: with great difficulty, with some difficulty, fairly easily, very easily); (e) expected time to recovery (4 response levels: already recovered as much as possible, will be in the next few months or so, will be within a year, and will take longer than a year); (f) ability to cope with their injuries given their nature (a 5-point Likert scale: very poor, poor, moderate, good, and very good) (Lazarus, 1993; Hepp, et al., 2011); and (g) ability to bounce back from the accident (a 10-point scale ranging from strongly disagree to strongly agree).
Pain was measured using the Numerical Rating Scale (NRS) in the TAC Longitudinal Study. This validated scale asks respondents to rate their level of pain on a scale of 0 to 10, where 0 is no pain at all and 10 is the worst possible pain (Sendbeck, et al., 2015; Thong, et al., 2018). The rating was then recoded as none (0), mild (1–2), moderate (3–5), strong (6–8), or severe (9–10). In the COS, respondents were asked about “the amount of bodily pain they had in the past 7 days” and chose one of 6 options: 1 – none, 2 – very mild, 3 – mild, 4 – moderate, 5 – severe, and 6 – very severe.
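The NRS recoding described above can be sketched as a simple lookup. This is an illustrative sketch only; the function name `recode_nrs_pain` is hypothetical and does not come from the study's own analysis code.

```python
def recode_nrs_pain(score: int) -> str:
    """Map a 0-10 NRS pain rating to the five bands given in the text."""
    if not 0 <= score <= 10:
        raise ValueError("NRS pain score must be between 0 and 10")
    if score == 0:
        return "none"
    if score <= 2:
        return "mild"      # 1-2
    if score <= 5:
        return "moderate"  # 3-5
    if score <= 8:
        return "strong"    # 6-8
    return "severe"        # 9-10


# Hypothetical ratings, one per band
print([recode_nrs_pain(s) for s in (0, 2, 3, 8, 9)])
```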
The Kruskal-Wallis H test was used to test for statistically significant overall differences between groups, and Dunn’s test for pairwise differences between groups (presented in the Supplementary Material Table 3). A p-value <0.05 was taken as statistically significant in this and all other hypothesis tests in the paper.
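For readers unfamiliar with the Kruskal-Wallis test, the H statistic can be computed from pooled ranks as in the sketch below. It uses the textbook formula with a tie correction, which matters for a 1 to 10 rating with many tied scores. The function names and the example data are hypothetical, and the sketch stops at the statistic: the p-value would be obtained by comparing H with a chi-square distribution with (number of groups − 1) degrees of freedom, and Dunn's pairwise test is not shown.

```python
from collections import Counter


def average_ranks(values):
    """Return 1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of score lists."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical LBoT scores for three severity groups
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```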
Criterion Validity
Criterion validity, in this context, refers to the extent to which the LBoT measure correlates with an external standard measure. Criterion validity is commonly assessed by investigating the concurrent validity and predictive validity of the measure. Concurrent validity involves comparing the LBoT measure to a standard measure of wellbeing in recovery at the same time point. There is no gold standard for wellbeing in recovery, so, based on the availability of data, we focused on the closely related concept of HRQoL as measured by the EQ-5D-3L, the most widely used preference-based health-related quality of life instrument in the world (Richardson, et al., 2014). The EQ-5D-3L contains five dimensions (mobility, usual activities, self-care, pain/discomfort, and anxiety/depression), each measured using one item, as well as a single-item visual analogue scale, the EQ-VAS (EuroQoL Group, 1990). All five dimensions were assessed using three response levels (no problems, some problems, and extreme problems). The EQ-VAS lies on a scale of 0 (worst imaginable health state) to 100 (best imaginable health state). The EQ-5D-3L asked respondents about their health ‘today’ (i.e. on the day of the interview) and was scored using the original UK tariff (Dolan, 1997). Concurrent validity was measured using Spearman correlation coefficients between LBoT scores and overall HRQoL scores, namely the EQ-5D-3L utility scores and the EQ-VAS.
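As an illustration of the concurrent validity calculation, the Spearman coefficient is the Pearson correlation of the two sets of ranks, with tied scores given average ranks. The sketch below is a minimal pure-Python version; the function names and the paired scores are hypothetical and are not taken from the surveys.

```python
from statistics import mean


def rank(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    # A value first appearing at index i and occurring c times occupies
    # positions i+1 .. i+c, whose mean is i + (c + 1) / 2.
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical paired scores: LBoT (1-10) and EQ-VAS (0-100)
lbot = [3, 5, 7, 8, 10]
eq_vas = [40, 55, 50, 80, 90]
print(spearman_rho(lbot, eq_vas))
```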
Predictive validity was assessed using LBoT scores in wave 1 of the Longitudinal Study in a logit model predicting return to the same job with the same duties and employer, conditional on being in employment prior to the accident and controlling for injury severity, age, gender, education, employment type and country of birth. Validity was assessed as the ability of the logit regression to classify post-accident employment status correctly, using the c statistic (area under the receiver operating characteristic curve). A value closer to 1 and further from 0.5 suggests greater discrimination and therefore stronger validity (Hosmer & Lemeshow, 2000).
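The c statistic has a direct interpretation: it is the share of (returned, did not return) respondent pairs in which the respondent who returned to work was given the higher predicted probability, with ties counted as half. The sketch below computes it from predicted probabilities that are assumed to come from an already fitted logit model; the function name and the example values are hypothetical.

```python
def c_statistic(predicted, outcome):
    """Concordance (area under the ROC curve) for a binary outcome.

    predicted: model probabilities; outcome: 1 for event, 0 for non-event.
    """
    events = [p for p, y in zip(predicted, outcome) if y == 1]
    non_events = [p for p, y in zip(predicted, outcome) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    # Score each event/non-event pair: 1 if concordant, 0.5 if tied, else 0
    concordant = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in events
        for q in non_events
    )
    return concordant / (len(events) * len(non_events))


# Hypothetical predicted probabilities of returning to the same job
print(c_statistic([0.9, 0.4, 0.7, 0.3], [1, 1, 0, 0]))
```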
Reliability
Reliability refers to the extent to which LBoT scores are affected by random error. We focus on whether LBoT is consistent across time. The Longitudinal Study was used to identify clients who were in a stable condition across two survey time periods (waves 1 and 2). As the time interval between the two waves was relatively long (i.e. ≥ 3 months), we constructed samples of TAC clients who were in a relatively stable condition between waves, as indicated by individual scores, and combinations of scores, on measures of pain, financial ability to get by, DASS score, a single-item global health rating from the Short-Form 12 Health Survey (SF-12) (Ware, et al., 1996), main labour market activity, and vocational status.
Clients were defined as being in a stable condition if they gave the same pain rating, the same DASS group (Normal, Mild, Moderate, Severe, Extremely Severe), the same SF-12 rating and the same vocational status in waves 1 and 2. Reliability was measured by the intraclass correlation coefficient (ICC). An ICC of 0.75–0.90 is generally classified as good, while an ICC larger than 0.9 is classified as high or excellent reliability (Koo & Li, 2016).
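The text does not state which form of the ICC was used. Purely as an illustration, the sketch below computes one common choice for test-retest data, the two-way, absolute-agreement, single-measurement ICC, from the mean squares of a respondents-by-waves table. The choice of form, the function name and the example scores are all assumptions.

```python
def icc_two_way_agreement(scores):
    """Two-way, absolute-agreement, single-measurement ICC.

    scores: one row per respondent, one column per wave.
    The ICC form is an assumption; the source does not specify it.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)              # between respondents
    ms_cols = ss_cols / (k - 1)              # between waves
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical LBoT scores for stable respondents in waves 1 and 2
print(icc_two_way_agreement([[1, 2], [3, 4], [5, 6]]))
```

In this example every respondent scores one point higher in wave 2, so the rank order is perfectly preserved but the systematic shift between waves lowers the absolute-agreement ICC below 1.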
Sensitivity and Responsiveness
Sensitivity of a measure is the ability to detect differences between groups, while responsiveness is the ability to detect changes over time. Sensitivity was evaluated as the extent of the difference in response between those who reported that they had recovered and those who said they had not. We chose pain levels and employment status to measure responsiveness; that is, respondents who were considered to be in a stable condition across two waves (same pain rating and same vocational status) were excluded. To adjust for potential bias due to non-response, scores were weighted by the inverse probability of non-response from wave 1 to each of the subsequent waves. Two sets of inverse probability weights were used, one based on the average non-response to that time, and the other based on the probability of non-response for all three periods after wave 1 (see Supplementary Table 3.4). The probability of non-response was estimated using a logit model with age, gender, education, area of residence (rural vs metro), longitudinal survey cohort, involvement in accident (road user), injury type, injury severity, recovery expectation, language spoken at home and country of birth as covariates.
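The weighting step can be illustrated as follows. The sketch assumes the usual inverse probability weighting convention, in which each retained respondent is weighted by the reciprocal of their estimated probability of remaining in the sample, that is 1 / (1 − probability of non-response). This convention is an assumption about how the weights were formed; the function name and the probabilities are hypothetical, and the logit model that would produce the probabilities is not shown.

```python
def ipw_weighted_mean(scores, p_nonresponse):
    """Mean score weighted by 1 / (1 - estimated probability of non-response).

    Assumes the standard IPW convention; respondents who resemble
    drop-outs (high p_nonresponse) receive larger weights.
    """
    if any(not 0 <= p < 1 for p in p_nonresponse):
        raise ValueError("non-response probabilities must lie in [0, 1)")
    weights = [1.0 / (1.0 - p) for p in p_nonresponse]
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)


# Hypothetical LBoT scores and estimated non-response probabilities
print(ipw_weighted_mean([8, 4], [0.5, 0.75]))
```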
To measure both sensitivity and responsiveness, the effect size and the standardised response mean were used. A minimum effect size of 0.41 is recommended, with an effect size of 1.15 considered moderate and 2.70 considered strong (Ferguson, 2009). A standardised response mean of 0.5 is generally considered to indicate moderate responsiveness, with a value of 0.8 and above indicating strong responsiveness (Ferguson, 2009).
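The text does not give the formulas behind these two statistics. The sketch below assumes the common definitions: the effect size as the difference in group means divided by the pooled standard deviation (Cohen's d), and the standardised response mean as the mean change between waves divided by the standard deviation of that change. Both definitions, the function names and the example scores are assumptions made for illustration.

```python
from statistics import mean, stdev, variance


def cohens_d(group_a, group_b):
    """Difference in means divided by the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * variance(group_a) + (nb - 1) * variance(group_b)
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


def standardised_response_mean(before, after):
    """Mean within-person change divided by the SD of the change."""
    change = [b - a for a, b in zip(before, after)]
    return mean(change) / stdev(change)


# Hypothetical LBoT scores: recovered vs not recovered (sensitivity)
print(cohens_d([8, 9, 10], [4, 5, 6]))
# Hypothetical LBoT scores for the same respondents in two waves (responsiveness)
print(standardised_response_mean([4, 5, 6, 5], [6, 6, 9, 7]))
```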
The existence of ceiling and/or floor effects can threaten responsiveness and cause measurement inaccuracy. The potential ceiling and floor effects of the LBoT measure were examined based on the full sample as well as by gender and age groups. Ceiling or floor effects are taken as evident if ≥ 15% of respondents scored the best or the worst possible value of a measure (McHorney & Tarlov, 1995).
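The 15% rule can be checked with a few lines of code. The sketch below applies it to the 1 to 10 LBoT scale; the function name and the example scores are hypothetical.

```python
def floor_ceiling_effects(scores, lowest=1, highest=10, threshold=0.15):
    """Share of respondents at each end of the scale, flagged at >= threshold."""
    n = len(scores)
    floor_share = sum(s == lowest for s in scores) / n
    ceiling_share = sum(s == highest for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share >= threshold,
        "ceiling_effect": ceiling_share >= threshold,
    }


# Hypothetical LBoT scores for ten respondents
print(floor_ceiling_effects([10, 10, 9, 8, 7, 5, 3, 1, 6, 7]))
```

The same function can be applied to subsets of the sample (for example by gender or age group) to reproduce the subgroup checks described above.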