Study design and participants
This was a bi-centre, prospective, cross-sectional translation and validation study with repeated measures, conducted in two phases. In Phase 1, the forward-backward translation, the establishment of face and content validity and the cross-cultural adaptation of the pre-final USE-MS-G were performed. In Phase 2, the construct validity and reliability of the USE-MS-G were examined. The study was conducted at the outpatient MS-Clinic of the Clinical Department of Neurology, Medical University of Innsbruck, Austria, and the Department of Neurology, Clinic for Rehabilitation Münster, Austria, from 12.02.2019 to 15.06.2020. Ethical approval was received from the Ethics Committee of the Medical University of Innsbruck (reference number EK1260/2018; 13.12.2018).
Information brochures and invitations for study participation were displayed in the MS-Clinic, the Clinic for Rehabilitation, the Austrian MS Society patient magazine and on their website; they were also forwarded to MS support groups. Upon agreement, severely disabled PwMS (Expanded Disability Status Scale (EDSS)(8) ≥ 8) were visited at home to facilitate their participation. Additionally, during their regular visits, PwMS were notified about the study by Clinic staff. All procedures followed the tenets of the Declaration of Helsinki and written informed consent was obtained from all participants. Research data are available on reasonable request ([email protected]).
A random cross-sectional cohort of patients with clinically definite MS, according to the version of the McDonald criteria (9) valid at the time of diagnosis, and with any MS phenotype was recruited into this study. PwMS of any ethnicity with very good German language skills, aged ≥ 18 years and with different levels of functioning were included (EDSS scores from 0 (no disability) to 9.0 (severe disability); see (10) for a detailed study protocol).
Exclusion criteria were comorbidities potentially affecting subjective self-efficacy ratings (e.g., malignant diseases, other neurological or psychiatric disorders), a relapse of MS within 2 months prior to the study or any change in medication within 4 weeks of the study commencement. A relapse between test and retest required the exclusion of the participant.
The Austrian dataset was also pooled with a dataset from the UK development sample to test for invariance by language.
Outcome measures
Demographic data (gender, age) and disease-specific data (disease duration, MS phenotype, disease-modifying treatment) were retrieved from patients’ files. The current EDSS was assessed by neurologists. Questionnaire data were collected twice within a 14–21-day period (test, retest).
The original USE-MS has been shown to be reliable and valid for assessing self-efficacy in PwMS (5). The summary score is the sum of all 12 items, with items 5, 7–9 and 11 reverse scored. Each item is rated on a 4-point Likert scale (0 = strongly disagree to 3 = strongly agree). A higher summary score signifies stronger self-efficacy beliefs.
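The scoring rule can be illustrated with a minimal Python sketch. It assumes that reverse scoring maps a response x to 3 − x, which is the usual convention for a 0–3 scale but is not spelled out here; the function name and the example responses are hypothetical.

```python
# Items that are reverse scored (1-based item numbers, as named in the text).
REVERSED_ITEMS = {5, 7, 8, 9, 11}
MAX_CATEGORY = 3  # responses run from 0 (strongly disagree) to 3 (strongly agree)


def use_ms_total(responses):
    """Return the summary score for one completed 12-item questionnaire.

    Assumption: a reverse-scored response x contributes MAX_CATEGORY - x.
    """
    if len(responses) != 12:
        raise ValueError("expected 12 item responses")
    total = 0
    for item_number, value in enumerate(responses, start=1):
        if value not in (0, 1, 2, 3):
            raise ValueError(f"item {item_number}: response must be 0, 1, 2 or 3")
        if item_number in REVERSED_ITEMS:
            total += MAX_CATEGORY - value
        else:
            total += value
    return total


# Hypothetical respondent who answers "strongly agree" to every item.
print(use_ms_total([3] * 12))
```

Under this assumption the summary score ranges from 0 to 36, and uniform agreement with every item does not yield the maximum because five items are reversed.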
Validated questionnaires used to evaluate external construct validity of the USE-MS-G were recommended by governmental or patient organisations (11, 12). These included the German language versions of the General Self-Efficacy Scale (GSE) (13), Resilience Scale (RS-13) (14), Multiple Sclerosis International Quality of Life (MusiQoL) (15), Hospital Anxiety and Depression Scale (HADS) (16, 17) and Neurological Fatigue Index Multiple Sclerosis (NFI-MS) (18). Scoring and psychometric properties of these scales are described in detail in the study protocol (10).
Sample size
Phase 1
Patients were recruited until saturation was achieved, indicating that no further information could be obtained from additional interviews.
Phase 2
For Rasch analysis, a sample size of 243 participants has been shown to provide accurate estimates of item and person locations irrespective of the scale targeting (19). Moreover, with polytomous items, ≥ 10 observations per category are recommended (20). It is further relevant to collect a wide range of responses across the latent trait under consideration, i.e., self-efficacy (19).
Translation, face and content validity and cultural adaptation
In Phase 1, following guidelines for the cross-cultural adaptation of patient-reported outcomes (21, 22) and their enhanced version from the University of Leeds, UK, a forward-backward translation process was conducted by 6 bilingual translators, 3 native in German and 3 in English. This comprised a synthesis of translations and expert committee consensus. Pretesting (Test 1, T1) and face-to-face cognitive interviews regarding the questionnaire wording were carried out in male and female PwMS across the disability range. After 30 recorded interviews, saturation was achieved. Cross-cultural equivalence between the USE-MS and USE-MS-G was accomplished in the semantic, idiomatic, experiential and conceptual areas (21, 22). Qualitative content analysis of the verbatim interview transcriptions was performed (described in detail elsewhere (10)). During all stages of the iterative adaptation process of the USE-MS-G, consensus was reached with the original scale developers (5).
Statistical analyses
External validity and test-retest reliability
Correlational analyses were performed between the USE-MS-G and other measures to determine convergent and discriminant construct validity. We hypothesised moderate to high positive correlations of the USE-MS-G with the GSE, RS-13 and MusiQoL and moderate to high negative correlations with the HADS and NFI-MS. Spearman’s rank correlation coefficients of 0.3–0.49 were considered low, 0.5–0.69 moderate and ≥ 0.7 strong (23); they were calculated with their 95% confidence intervals (CI), and p-values were corrected for multiple comparisons using a Bonferroni correction. Descriptive statistics and external validity estimates were calculated with IBM SPSS software (IBM SPSS Statistics; Version 26.0. Armonk, NY: IBM Corp.) or GraphPad Prism Version 8 (GraphPad Software, La Jolla, CA). Statistical significance was defined as a two-tailed p-value < 0.05.
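As an illustration of the correlation bands and the Bonferroni correction, the following Python sketch computes Spearman’s coefficient as the Pearson correlation of average ranks, labels its magnitude using the thresholds above and multiplies each p-value by the number of comparisons. It is not the SPSS or GraphPad Prism procedure used in the study; the function names are hypothetical, confidence intervals are omitted, and coefficients below 0.3 are given a label of our own because the text does not name that range.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def strength(rho):
    """Label the magnitude of a coefficient using the bands given in the text."""
    size = abs(rho)
    if size >= 0.7:
        return "strong"
    if size >= 0.5:
        return "moderate"
    if size >= 0.3:
        return "low"
    return "below the low band"  # label not defined in the text


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

For example, `strength(-0.62)` returns "moderate", since the bands apply to the absolute value and the sign only indicates the direction of the association.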
Test–retest reliability was determined using Lin’s concordance correlation coefficient (rc) (24) between Test 2 (T2) and Test 3 (T3). The rc (0–1; 95% CI) was used to estimate the amount of agreement between the test and retest USE-MS-G data. The Pearson correlation coefficient was calculated as a measure of precision and the bias correction factor (Cb) as a measure of accuracy (24). MedCalc software (https://www.medcalc.org/) was used to determine the rc.
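The relationship between rc, precision and accuracy can be sketched in Python as follows. This is not the MedCalc implementation: it assumes variances and covariance computed with a divisor of n, does not compute the 95% CI, and uses a hypothetical function name and made-up scores.

```python
import math


def lin_ccc(test, retest):
    """Return (rc, pearson_r, cb) for paired test and retest scores.

    rc = 2 * s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), and rc = r * Cb.
    Assumption: moments use a divisor of n.
    """
    n = len(test)
    mean_t = sum(test) / n
    mean_r = sum(retest) / n
    var_t = sum((a - mean_t) ** 2 for a in test) / n
    var_r = sum((b - mean_r) ** 2 for b in retest) / n
    cov = sum((a - mean_t) * (b - mean_r) for a, b in zip(test, retest)) / n

    denominator = var_t + var_r + (mean_t - mean_r) ** 2
    pearson_r = cov / math.sqrt(var_t * var_r)       # precision
    cb = 2 * math.sqrt(var_t * var_r) / denominator  # accuracy (bias correction)
    rc = 2 * cov / denominator                       # concordance
    return rc, pearson_r, cb


# Made-up scores: the retest is the test shifted upwards by 2 points.
rc, r, cb = lin_ccc([10, 20, 30], [12, 22, 32])
print(round(rc, 3), round(r, 3), round(cb, 3))
```

In this example the two score sets are perfectly correlated, so the precision is 1, but the constant shift lowers the accuracy and therefore the concordance, which is the distinction the two components are meant to capture.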
Internal validity: Rasch analysis
Rasch analysis uses the mathematical Rasch model to assess whether a summary score for a scale can be calculated with confidence (7). Internal construct validity of the USE-MS-G was determined by examining the deviations from model expectations, i.e. the way in which persons are expected to interact with test items to produce linear measurement (7). The model expects that the probability of a person providing a certain answer to an item is a logistic function of the difference between the person ‘ability’ (perceived self-efficacy) and the item ‘difficulty’. This is checked visually by inspection of item characteristic curves and numerically by the analysis of variance (ANOVA) fit statistics (uniform DIF; non-uniform DIF (25)). The USE-MS-G contains 4 response categories and hence, the polytomous Rasch model was chosen for the current study (26).
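To make the logistic relationship concrete, the following Python sketch evaluates category probabilities under the partial credit form of the polytomous Rasch model for a single item with 4 response categories. It is illustrative only and is not the RUMM2030 estimation procedure; the function names and the threshold values are hypothetical and were not estimated from the study data.

```python
import math


def category_probabilities(theta, thresholds):
    """Probability of each response category for one item.

    theta:      person location (perceived self-efficacy) in logits
    thresholds: item threshold locations in logits; k thresholds give k + 1 categories
    Category x has weight exp(sum of (theta - threshold_j) for j = 1..x),
    with an empty sum (0) for category 0.
    """
    exponents = [0.0]
    for tau in thresholds:
        exponents.append(exponents[-1] + (theta - tau))
    peak = max(exponents)  # subtracting the maximum keeps exp() numerically stable
    weights = [math.exp(e - peak) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, thresholds):
    """Model-expected item score, the quantity plotted in an item characteristic curve."""
    probabilities = category_probabilities(theta, thresholds)
    return sum(k * p for k, p in enumerate(probabilities))


# Hypothetical thresholds for one 4-category item.
item_thresholds = [-1.0, 0.0, 1.0]
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(expected_score(theta, item_thresholds), 2))
```

The expected score rises monotonically with the person location, and comparing such expected values with the observed mean scores of groups of respondents is what the inspection of item characteristic curves amounts to.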
Using different chi-square (χ²) fit statistics, USE-MS-G data were tested against the model expectations of unidimensionality. That is, the ‘ability’ and ‘difficulty’ are required to relate to the same construct of self-efficacy (described in detail elsewhere (26)). Ideal values for the different fit statistics and unidimensionality are provided in Table 3. The expectation of local independence was examined using a residual correlation matrix between all items. Item residuals represent the difference between an item’s expected and observed values, divided by its standard deviation for standardisation. Residual correlations more than 0.2 above the mean correlation of the total matrix indicate local dependence (27). This denotes a confounding factor inducing an association between items, or multidimensionality (28, 29). In the presence of item dependency, two “super-items” can be created from alternative items and compared with each other using a robust conditional chi-square test of fit (28). The proportion of common to total variance retained in a bi-factor equivalent solution corresponds to the explained common variance (ECV) (30). For a unidimensional scale, the ECV should be > 0.9, indicating that > 90% of the variance is common and retained in the latent estimate (28).
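The local dependence criterion can be sketched in Python as follows. The sketch assumes that the mean correlation of the matrix is taken over the off-diagonal entries only; the function name and the example matrix are hypothetical, and in practice the residual correlations would come from the Rasch software.

```python
def flag_local_dependence(residual_corr, margin=0.2):
    """Return (cutoff, flagged_pairs) for a symmetric residual correlation matrix.

    A pair of items is flagged when its residual correlation exceeds the mean
    off-diagonal correlation by more than `margin` (0.2 in the text).
    Assumption: the diagonal is excluded from the mean.
    """
    n = len(residual_corr)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    mean_corr = sum(residual_corr[i][j] for i, j in pairs) / len(pairs)
    cutoff = mean_corr + margin
    flagged = [
        (i + 1, j + 1, residual_corr[i][j])  # report 1-based item numbers
        for i, j in pairs
        if residual_corr[i][j] > cutoff
    ]
    return cutoff, flagged


# Hypothetical residual correlations for three items.
example = [
    [1.0, 0.4, -0.2],
    [0.4, 1.0, -0.2],
    [-0.2, -0.2, 1.0],
]
cutoff, flagged = flag_local_dependence(example)
print(cutoff, flagged)
```

Expressing the cutoff relative to the mean correlation matters because residual correlations are negative on average in short scales, so a fixed absolute cutoff would be too lenient.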
Table 3
Model fit of the USE-MS-G to the Rasch model
| Analysis | Item residual, mean | Item residual, SD | Person residual, mean | Person residual, SD | (Cond.)¹ Chi-Square² value (df) | p | PSI³ | Alpha | Unidimensionality⁴, % tests > 5% | Unidimensionality⁴, 95% CI |
| 12-item scale | 0.05 | 2.19 | -0.31 | 1.33 | 138.46 (48) | 0.000 | 0.86 | 0.87 | 23.0 | 5–9.9 |
| 2 super-items | 0.15 | 0.88 | -0.59 | 1.01 | 29.24 (22) | 0.138 | 0.85 | 0.86 | 4.2 | 1.8–6.6 |
| Ideal values | 0.00 | 1.00 | 0.00 | 1.00 | | > 0.01* | > 0.70 | > 0.70 | < 5.0 | LCI < 5 |
Abbreviations: T1/2: testlet, or super-item 1/2; Cond.: conditional; df: degrees of freedom; PSI: person separation index; Alpha: Cronbach’s alpha; SD: standard deviation; CI: confidence interval; LCI: lower bound of the 95% CI
¹ Conditional Chi-Square: only applicable for the super-item solution; for the item-based solution, the Chi-Square is shown
² Chi-Square of T1: 3.392 (4), p = 0.494; Chi-Square of T2: 1.054 (4), p = 0.901. Ideal p-values are > 0.004* (Bonferroni adjusted)
³ The PSI indicates the reliability and differentiation of strata
⁴ Based on independent t-tests to compare person residuals which are positively and negatively loading on the first principal component
* Bonferroni adjusted and variable with the number of items
The property of invariance means that items have the same relative difficulty for all participants, regardless of their level of self-efficacy (31). If certain groups of participants, e.g. males and females, respond differently to items, the assumption of invariance is violated; this is called differential item functioning (DIF) (31). The USE-MS-G and original USE-MS data (N = 485) were pooled and tested for invariance by language (English; German) to equate the language versions. Every item was examined for absence of DIF by gender (male; female), age (quartile groups), disease duration (quartile groups), timepoint (test; retest) and centre (Innsbruck; Münster). Bonferroni adjustment was performed wherever appropriate for the number of tests undertaken. Reliability was evaluated by the Person Separation Index (PSI, range 0–1) and Cronbach’s alpha, which should be ≥ 0.85 for individual use or ≥ 0.70 for group use (32). Scale precision was examined by the standard error of measurement (SEM) and minimum detectable change (MDC) based on a 95% CI (33). Rasch analysis was conducted with RUMM2030 software (http://www.rummlab.com.au/), using the unrestricted, or partial credit, model (34).
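The precision indices can be illustrated with a short Python sketch. It assumes the conventional formulas SEM = SD × √(1 − reliability) and MDC95 = 1.96 × √2 × SEM; the exact computation used in the study follows (33) and may differ. The function names and input values are hypothetical.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability), with reliability between 0 and 1."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def minimum_detectable_change(sem, z=1.96):
    """MDC = z * sqrt(2) * SEM; z = 1.96 corresponds to a 95% CI.

    The factor sqrt(2) reflects that a change score involves two measurements,
    each carrying measurement error.
    """
    return z * math.sqrt(2.0) * sem


# Hypothetical values: score SD of 10 points and a reliability of 0.84.
sem = standard_error_of_measurement(10.0, 0.84)
mdc = minimum_detectable_change(sem)
print(round(sem, 2), round(mdc, 2))
```

Under these assumptions, a change in an individual’s score smaller than the MDC cannot be distinguished from measurement error with 95% confidence.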