Study participants
The study sample consisted of 58 patients with debilitating UL spasticity due to SCI (n=31), stroke (n=25), TBI (n=4), or other diagnosis (n=6). Eight patients had undergone spasticity-correcting surgery on both the right and left arms, on different occasions, for a total of 66 interventions. The mean age of the patients was 57 years (range 19–79). The mean time since the injury was 8.1 years (range 1–26). Preoperative allocation to a treatment regimen was based on the residual volitional motor control in the UL and on cognitive ability: 25 patients were assigned to HFR (38%), 30 to LFR (45%), and 11 to NFR (17%). Demographic and clinical characteristics of the study participants are listed in Table 2. Of 66 collected questionnaires, five were excluded because the planned surgeries were postponed, leaving a maximum of 61 questionnaires for analysis. Fifty-one patients completed the questionnaire twice for test-retest reliability; however, three completed questionnaires were excluded because of missing answers, leaving 48 questionnaires for the test-retest analyses. The mean interval between surveys 1 and 2 was 6.7 days (range 4–10 days).
Translation and adaptation
Initial forward translation of the ‘tryout’ version of ArmA: In our search for a questionnaire that is sensitive to change in a heterogeneous population of patients with neurological injuries, we chose ArmA. To make a preliminary feasibility assessment of ArmA in a Swedish clinical setting, the original English version of the questionnaire was first translated into Swedish by two bilingual clinicians using a forward translation procedure. This first version is referred to as the tryout-ArmA-S. Because ArmA was originally developed for patients with unilateral hemiplegia, testing of the tryout-ArmA-S revealed that patients with bilateral UL motor impairment after SCI were confused by the term ‘the affected arm’, as they had bilateral UL spasticity. Further confusion arose from the original instructions, which specified that the response option 0 (no difficulty) be selected if the activity is never done. However, this has nothing to do with the patient’s affected/treated arm, which caused difficulties in choosing between option 0 (no difficulty) and option 4 (unable to do) and increased the need for face-to-face explanation. Patients thought that most questionnaire items were meaningful. After using the tryout-ArmA-S for 18 months in our clinical setting, we decided to proceed with psychometric evaluation despite its shortcomings. We therefore conducted a proper back-translation procedure.
Back-translation procedure: The guide to completion and the questionnaire items in ArmA were easily translated from English to Swedish. Item 10 in section B (‘handle a home telephone’) was changed because such phones are rarely used in Sweden nowadays. It was replaced by the item ‘handle your phone’. Some additional minor adjustments were made in the demographic part of ArmA-S: SCI was added as a neurological condition, and information about the caregiver was expanded to include hours and type of assistance (caregiver or professional). To facilitate completion of the questionnaire by patients with bilateral UL motor impairment, the term ‘affected arm’ in the ArmA-S was clarified by adding ‘the arm that will be, or is treated’. The most significant modification in the Swedish version of ArmA was made to minimize the risk of faulty or misleading responses when a specific activity was never done by the patient. Misinterpretation could lead to false-negative results if the patient states that the activity was never done before surgery (which equals score 0, no difficulty) even though the true reason is the severely impaired UL, and the postsurgery score is 1 (mild difficulty) to 4 (maximum difficulty). In that case, although the patient improved after surgery, the scoring indicates the opposite. To help patients select proper responses, the option ‘never done’ (score 0) was added to ArmA-S, resulting in a six-point Likert scoring system. Further, instead of presenting the response options as digits (0–4), we changed the Likert scale to verbal statements describing the degree of difficulty, ranging from ‘no difficulty’ to ‘unable to perform’, which are converted to scores 0–4. Instead of circling a response digit, the respondent is asked to mark with an X the appropriate verbal statement for each activity.
Validity of the final ArmA-S
The content validity of the final version of ArmA-S was judged to be good based on opinions from both patients and expert clinicians. Further, this version was recognized as having good face validity in the sense of being clear, understandable, and easy to complete. All patients except one (5%) responded that the final version was easy or moderately easy to complete, and all patients thought the questions were moderately to very relevant. Without exception, the clinicians responded that the measurement tool would be useful in clinical settings for patients with hemiplegia, but also for other patient groups with neurological injuries. Of the 20 patients who were timed, 10 (50%) completed the questionnaire in less than 10 minutes and 90% in less than 20 minutes. In the analyses of floor and ceiling effects, the baseline score before surgery was used. One (1.6%) of 61 completed questionnaires had the best possible score (0 points) on section A, one (1.6%) had the best possible score (0 points) on section B, and six (9.8%) had the worst possible score (52 points) on section B. The median (interquartile range) scores for sections A and B were 12 (8–17) and 46 (37–49), respectively. Thus, there were no floor or ceiling effects. The analyses of construct validity revealed wide variation in the correlation between ArmA-S sections A and B and the other outcome measures (Table 3). The GRT had the highest correlation with section B of the final ArmA-S (rS=0.59; p<0.001), whereas DASH had little or no correlation with sections A and B (rS=0.05, p=0.75 for both correlations).
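The floor and ceiling check described above can be illustrated with a minimal sketch. It assumes the commonly used criterion that an effect is present when more than 15% of respondents have the best or worst possible total score; the text does not state which threshold was applied, so the `threshold` default and the function names are assumptions. The example data are synthetic: only the counts at 0 and 52 points follow the reported section B figures.

```python
def extreme_score_shares(totals, best, worst):
    """Return the fraction of respondents at the best and at the worst possible total."""
    n = len(totals)
    at_best = sum(1 for t in totals if t == best) / n
    at_worst = sum(1 for t in totals if t == worst) / n
    return at_best, at_worst


def has_floor_or_ceiling(totals, best, worst, threshold=0.15):
    """Flag an effect when more than `threshold` of respondents sit at either extreme.

    The 15% default is an assumed convention, not a value taken from the study.
    """
    return any(share > threshold for share in extreme_score_shares(totals, best, worst))


# Synthetic section B totals: one respondent at 0 and six at 52, as reported;
# the 54 values of 40 are placeholders, not study data.
section_b = [0] + [52] * 6 + [40] * 54
best_share, worst_share = extreme_score_shares(section_b, best=0, worst=52)
print(f"{best_share:.1%} at best score, {worst_share:.1%} at worst score")
print("floor or ceiling effect:", has_floor_or_ceiling(section_b, best=0, worst=52))
```

With these counts, both shares stay below the assumed threshold, which is consistent with the conclusion drawn in the text.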
Reliability of the final ArmA-S
The internal consistency of the final ArmA-S version was high, with a Cronbach’s alpha coefficient of 0.94 for section A and 0.93 for section B (n=61). Test-retest reliability, analysed for 48 patients, resulted in a quadratic weighted Cohen’s kappa coefficient of 0.86 (95% confidence interval [CI], 0.78–0.95) for section A and 0.83 (95% CI, 0.67–1.00) for section B. Responsiveness was analysed in patients who completed the survey before their spasticity-correcting surgery and 3 months afterward (n=55). As hypothesized, assessment of longitudinal validity revealed little or low correlation between the mean change in the total score on section A and the mean change in all other outcome measures (DASH: rS=0.2, p=0.32; MAS: rS=0.2, p=0.14; GRT: rS=0.2, p=0.30). The equivalent analysis for section B revealed little or low positive or negative correlation with the other measures (DASH: rS=0.3, p=0.13; MAS: rS=-0.2, p=0.21; GRT: rS=-0.2, p=0.001). The analysis of the mean change in final ArmA-S total score from the pre-surgical survey to the 3-month follow-up (Table 4) showed significant improvements in both section A and section B (p<0.001). Eleven patients who reported little or no use of the hand before surgery (section B score, 49–52) had some active use of the hand, as captured by lower section B scores 3 months after surgery (30–48 in this subgroup). Significant improvements were also shown in the mean change in GRT and MAS (p<0.001) but not DASH (p=0.732).
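For readers unfamiliar with the two reliability statistics reported above, the following is a minimal sketch of how Cronbach’s alpha and a quadratic weighted Cohen’s kappa can be computed. It is not the analysis code used in the study: the function names are hypothetical, the text does not state whether kappa was computed on item responses or on section totals, and the confidence intervals are not reproduced here.

```python
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item scores per respondent (all rows the same length)."""
    k = len(rows[0])
    # Sample variance of each item (column) and of the respondents' total scores.
    item_variances = [variance(col) for col in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def quadratic_weighted_kappa(first, second, n_categories):
    """Cohen's kappa with quadratic weights for paired ratings coded 0..n_categories-1."""
    n = len(first)
    observed = [[0] * n_categories for _ in range(n_categories)]
    for a, b in zip(first, second):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    disagreement_observed = 0.0
    disagreement_expected = 0.0
    for i in range(n_categories):
        for j in range(n_categories):
            # Quadratic weight: 0 on the diagonal, 1 for the largest possible gap.
            weight = (i - j) ** 2 / (n_categories - 1) ** 2
            disagreement_observed += weight * observed[i][j]
            disagreement_expected += weight * row_totals[i] * col_totals[j] / n
    return 1 - disagreement_observed / disagreement_expected
```

Identical test and retest ratings give a kappa of 1, while agreement no better than chance gives a value near 0. The quadratic weights penalize a two-step disagreement four times as heavily as a one-step disagreement, which suits ordered response options such as the difficulty scale used here.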
MIC was estimated with a distribution-based method and a criterion-based method (Table 5). For the study population as a whole, the distribution-based MIC was 3.2 points for section A and 6.8 points for section B. The criterion-based method, applied across the whole study population (n=55), resulted in a decrease of 6.1 points in section A and a decrease of 6.5 points in section B.
When inspecting the data for the analyses of responsiveness (pre- and postsurgical items of sections A and B of ArmA-S), we noted some highly questionable responses to questionnaire items, mainly in section B. Specifically, even though we had added the response option ‘never done’ to the scoring system, quite a few patients selected ‘never done’ before surgery (scored as no difficulty) but selected one of the response options no, mild, moderate, or severe difficulty, or even unable to do the activity, after surgery. Taken at face value, this indicates an unsuccessful outcome, which was not in accordance with the empirical experience of patients’ capabilities after the surgical intervention. Thus, the content of the translated version of ArmA still seemed to entail uncertainty. In complementary explorative data analyses, we therefore applied a score transformation to the data, based on known characteristics of the patients.
Complementary explorative data analysis:
In complementary analyses, we compared the original scores with the transformed scores. The scores were transformed as follows. If the pre-surgical score was 0 (never done) and the postsurgical score was 0 (no difficulty) or between 1 and 4 (various degrees of difficulty), the pre-surgical score was considered an error and was changed to 4 (unable to do). This transformation required that the functional status before surgery clearly indicated that the patient was unable to do the specific activity.
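The transformation rule can be written out as a short sketch. The response labels, the `SCORES` mapping, and the function name are illustrative assumptions; the clinical judgement that the patient was clearly unable to do the activity before surgery is represented by a boolean supplied by the assessor and is not derived from the questionnaire.

```python
# Verbal response options mapped to item scores (labels are illustrative).
# "never done" is scored 0, the same as "no difficulty".
SCORES = {
    "never done": 0,
    "no difficulty": 0,
    "mild difficulty": 1,
    "moderate difficulty": 2,
    "severe difficulty": 3,
    "unable to do": 4,
}


def transformed_pre_score(pre, post, unable_before_surgery):
    """Return the pre-surgical item score after the explorative transformation.

    pre, post: verbal responses before and after surgery.
    unable_before_surgery: True only if functional status before surgery clearly
    indicated that the patient could not do the activity (assessor's judgement).
    """
    rated_after_surgery = post != "never done"
    if pre == "never done" and rated_after_surgery and unable_before_surgery:
        return SCORES["unable to do"]  # treat the pre-surgical answer as an error
    return SCORES[pre]


# A patient who answered "never done" before surgery and "mild difficulty" afterward:
print(transformed_pre_score("never done", "mild difficulty", True))   # 4
print(transformed_pre_score("never done", "mild difficulty", False))  # 0
```

Keeping the clinical criterion as an explicit argument makes clear that the recoding is not applied automatically to every ‘never done’ response.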
Across all test-retest questionnaires, 40% of patients made this error. The corrected analysis substantially narrowed the CI for the adjusted scale, resulting in a quadratic weighted kappa coefficient of 0.91 (95% CI 0.85–0.97) for section A and 0.96 (95% CI 0.93–0.99) for section B. Therefore, in the recommended Swedish version, the guidance was clarified and the scale was altered to make it easier to complete the questionnaire correctly and independently and to minimize the identified errors without changing the original instructions of the scale. See appendix 2 for the recommended ArmA-S.
A complementary analysis was made in which the participants were split into two sub-cohorts based on diagnosis (SCI, n=18; stroke, n=20). Splitting the cohort resulted in a quadratic weighted Cohen’s kappa coefficient for section A of 0.92 (95% confidence interval [CI], 0.85–0.99) in the SCI group and 0.79 (95% CI, 0.62–0.97) in the stroke group. The corresponding figures for section B were 0.79 (95% CI, 0.51–1.07) and 0.82 (95% CI, 0.67–1.0), respectively. The analysis of the mean change in final ArmA-S total score from pre-intervention to the 3-month follow-up showed significant improvements for both groups in both sections A and B. When the mean change in final ArmA-S total score was compared between the subgroups, a significant difference in favour of the SCI group was demonstrated for section B (p=0.016), but not for section A (p=0.116).