We investigated concurrent validity and responsiveness for clinical upper limb measures and sensor-based AU metrics in the acute, subacute, and chronic phases after stroke. These measures cover the distinct ICF domains of upper limb body functions and capacity, perceived arm use performance, and real-life arm use performance. For the first time, we distinguished between activity types and specified AU performance by excluding gait activities that bias correlations between capacity and performance. Regarding concurrent validity, we found moderate to strong correlations between capacity measures and AU metrics, as well as between capacity measures and perceived performance in all phases.
We evaluated responsiveness by means of the longitudinal validity between observed and perceived changes. Across all UL measures, responsiveness was highest within the first month poststroke. Compared to clinical upper limb measures, correlations were considerably lower between changes in AU metrics and changes in perceived performance. Intensity and duration metrics were most consistently related to measures of body functions/activities in these early stages. Symmetry metrics were related to body functions/activity capacity and perceived performance measures in all periods, although to a lesser extent.
Minimal important change cut-off values could be estimated for eight clinical measures and nine AU metrics within the first three months poststroke, especially in the period between 10 and 28 days. Beyond this 3-month timeframe, MIC could only be calculated for the FMA-UE and ARAT. Overall, the discriminative power of the MICs ranged from acceptable to good.
Concurrent validity
We investigated concurrent validities for the clinical measures FMA-UE, ARAT, BBT, and MAL, as they are most frequently used in stroke rehabilitation research78 and recommended for research and clinical stroke rehabilitation8–10.
Within the construct of body functions and activities' capacity, our results presented very strong associations across these clinical measures in the post-acute, early subacute, and chronic phases after stroke. Hence, including coherent recovery phases, our data suggest that high concurrent validity remains independent of time within the first year after stroke. Previous work investigating singular time points found similar associations in the subacute28,53 and chronic phases30,37,79,80. Although each outcome assesses capacity differently, our results support that their underlying constructs share broad similarities. This underpins the understanding that upper limb function is a prerequisite to carrying out activities81–83. Irrespective of their strong interrelations, the FMA-UE and ARAT should not be used interchangeably since they have distinct diagnostic and prognostic value49,84–88.
Revealing associations between capacity and perceived performance, we found predominantly strong associations between the perceived arm use amount/quality and capacity scores across time points (range rs 0.66-0.90). Therefore, it can be assumed that participants recalled using their abilities during daily life to a large extent. Other studies reported lower correlation coefficients between patient-perceived arm use and capacity measures in the subacute (range rs 0.43-0.52)28 and chronic phases after stroke (range rs 0.37-0.62)30. A weaker relationship between the FMA and the MAL-AOU (rs 0.35), but comparably high correlations for MAL-QOM (rs 0.76) were shown in a cross-sectional study including 223 individuals89. High variability of associations with perceived arm use may be related to the individual and cultural context90.
Objective real-world AU metrics captured by wearable sensors are well-suited auxiliary measures to complement patient-reported arm use. Interestingly, the correlations between perceived performance (MAL) and real-life arm use performance (AU-i/AU-d) of the affected upper limb were weaker than the relation between perceived performance and capacity measures. This discrepancy could be due to some MAL items requiring a specific motor capacity, such as dexterity and force, but involving only minor movement amplitudes and, hence, low acceleration. For instance, the MAL items holding a book (item 1), handwriting (item 7), stabilising (item 8), and buttoning clothes (item 14) could fall below the thresholds applied for functional movement. However, discrepancies between the MAL and AU metrics were also observed with the very low conventional thresholds in AUconv metrics (supplement Figure S1).
Discrepancies between outcome domains might be inherent to their different measurement constructs. Designated as the match-mismatch paradox, discrepancies between capacity and perceived/real-life performance of the upper limb have been investigated extensively14,91–95. The disparity between capacity and perceived performance or between perceived performance and AU metrics could be due to under- or overestimation in the individuals' perceptions. However, considering the context-dependent nature of performance measures, accurate judgment on over- or underestimation requires a valid criterion (ground truth). The same principle applies to real-life arm use performance metrics: to date, it remains unclear whether the current convention on the AU metrics' calculation and aggregation over 24 hours adequately reflects the actual arm use within an individual context and environment. In a small validation sample of stroke survivors with mild-to-severe upper limb impairments, we previously demonstrated that estimated thresholds correctly identified functional unilateral and bilateral movement in 80% when excluding whole-body movements. For daily tasks performed in a home environment, optimised thresholding increased specificity by 15% compared to AUconv methods. However, in this prospective cohort study, we applied these methods to data derived from settings such as intensive care, acute hospitals, and rehabilitation clinics. In the acute hospitalisation phase (D3 and D10), we observed a less pronounced relation between AU-i metrics (affected and bilateral movements) and capacity measures (range rs 0.44-0.60), which could be linked to the acute care setting with its environmental restraints. Factors other than hemiparesis, such as infusion lines and cables for vital parameter measurement, can limit movement amplitude and speed. In the subacute and chronic phases, we observed higher concurrent validity of total aff. AU-i metrics and capacity measures (range rs 0.72-0.84) than previously reported for the respective phases (range rs 0.56-0.70)96–98.
Specifying AU metrics to functional movements during non-gait activities revealed high concurrent validities with upper limb capacity and perceived performance. Across time points, we found high proportions of explained variance between affected AU-d, symmetry metrics, and capacity assessments.
Notably, correlations between capacity (FMA-UE, ARAT) and total affected AU-d metrics (and duration ratio), ranging from 0.75 to 0.92 (D3, D10, D28 and D90), were considerably higher than previously reported (range 0.44 to 0.66) for the subacute98–100 and chronic phase97. In line with our expectations, removing gait sequences and applying optimised thresholds led to higher correlations, for total aff. and bilateral AU-d metrics in particular. Previous work that similarly removed gait and non-functional upper limb movements101 reported comparable magnitudes in the correlation between the use duration ratio and the ARAT (r = 0.82) for the chronic phase after stroke. However, Geed et al. (2023) reported a lower correlation between the duration ratio and the 28-item MAL-AOU (r = 0.61) compared to our results (rs = 0.78)101. Concurrent validity is often evaluated between unilateral capacity and the arm use symmetry between the affected and nonaffected arm21,93,95,99. Interestingly, our data show that symmetry metrics are consistently more strongly associated with unilateral capacity than unilateral aff. AU metrics. This might be because daily activities are less frequently carried out with the affected arm, whereas bilateral activity makes up the largest proportion of arm use patterns102,103. Compensatory arm use strategies, seen in our data as inverse associations between impairment and unilateral nonaff. AU-d in the late subacute and chronic phases, were also reported in previous investigations102–104.
Clinical implications of concurrent validity
A large proportion of variance (>50%) is explained between function/activities capacity and the duration of real-life arm use (rs >0.7) across recovery phases when specifying sensor-based physical activities. However, a proportion remains unexplained between these different measurement constructs, which justifies a holistic assessment of upper limb functioning. Optimally, each measurement construct should be assessed individually to draw conclusions regarding the transfer of functional abilities into real-life activities. Lower correlations between capacity and AU reveal discrepancies that require further investigation on individual or subgroup levels. Notably, although reluctantly seen in research, lower correlations suggest an added value entailed by unique information. In this regard, metrics quantifying bilateral and unilateral AU might add information not captured by assessments of one-sided upper limb function and activity capacity. These should be accounted for when implementing sensor-based activity monitoring into clinical stroke rehabilitation.
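As a rough check of the stated proportion, and assuming that the squared rank correlation is read as the share of variance explained in the ranks, the arithmetic can be written as follows. A coefficient of exactly 0.7 sits marginally below the 50% mark.

```latex
r_s > 0.7 \;\Rightarrow\; r_s^2 > 0.49,
\qquad
r_s^2 > 0.50 \;\Leftrightarrow\; |r_s| > \sqrt{0.50} \approx 0.71
```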
Responsiveness
Using perceived change as a criterion to assign meaningfulness to change magnitudes, upper limb capacity measures presented strong interrelations in the period D10-D28. Still, they remained at moderate levels in later periods and within the first days poststroke. A linear relationship between the GRPC scores, the FMA-UE105, and the ARAT43 was previously only reported for the first few weeks following stroke. Therefore, our findings, including longitudinal correlations of multiple outcome measures, crucially contribute to this body of evidence. Responsiveness can be negatively affected by floor and ceiling effects106, which are particularly known for the FMA-UE and ARAT87,100,107–109. We observed floor effects for the FMA-UE only at the timepoint D3 (17%), but no ceiling effects were identified. Observing the score distribution of the ARAT, we found that floor effects decreased from D10 (36%) to D90 (17%), and ceiling effects increased from D10 (12%) to D90 (30%).
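The floor and ceiling proportions reported above can be illustrated with a minimal sketch. It assumes that a floor or ceiling effect is counted as the share of participants scoring exactly the scale minimum or maximum; the definition applied in the study may differ. The function name and the scores are hypothetical and are not study data; only the ARAT range of 0 to 57 points is taken as given.

```python
def floor_ceiling_share(scores, min_score, max_score):
    """Return (floor %, ceiling %): share of scores at the scale extremes."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    # Multiply before dividing to keep the percentages exact for small samples.
    floor_pct = 100 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100 * sum(s == max_score for s in scores) / n
    return floor_pct, ceiling_pct


# Hypothetical ARAT scores (scale range 0-57), for illustration only.
arat_scores = [0, 0, 3, 12, 25, 40, 57, 57, 51, 0]
floor_pct, ceiling_pct = floor_ceiling_share(arat_scores, 0, 57)
print(f"floor: {floor_pct:.0f}%, ceiling: {ceiling_pct:.0f}%")
```

Comparing these two shares across time points shows how a measure can lose room to register improvement at the upper end while regaining it at the lower end.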
Responsiveness was moderate for the MAL and low for AU metrics at periods D10-D28, whereas at D39-D90, only changes in total aff. and bilateral AU-i presented a weak interrelation to perceived changes (GRPC scores). Although we found linear relationships between perceived change and observed changes in sensor-based AU, these correlations were highly time-dependent. Furthermore, the diminished associations of change between measurement constructs illustrate that change in upper limb capacity, to some extent, translates to changes in performance in the first weeks after stroke. Inpatient rehabilitation, which applied to 80% of our sample in this period, might have affected these associations through physical exercise and behavioural education. Future work should address the influence of potential modulators on the responsiveness of sensor-based AU metrics. In summary, our results indicate that the responsiveness of clinical measures and AU metrics highly depends on the investigated period poststroke.
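The responsiveness analysis discussed in this section relates observed change scores to perceived change. A minimal sketch of such a longitudinal correlation is given below, assuming Spearman's rank correlation between the change in a measure and the GRPC rating. The helper names and all values are hypothetical and are not study data.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long samples with n >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical change scores between two time points and GRPC ratings.
fma_change = [0, 2, 3, 5, 8, 12]
grpc = [0, 1, 1, 3, 4, 5]
print(f"rho = {spearman_rho(fma_change, grpc):.2f}")
```

Computing this coefficient separately for each period makes the time dependence of responsiveness directly comparable across measures.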
Minimally important change
We present eight MIC cut-off values for clinical upper limb outcome measures and nine MICs for sensor-based AU metrics that apply to conventional stroke rehabilitation within the first three months after stroke. Responsiveness of the outcome measure is a prerequisite to estimating MIC cut-off values that accurately distinguish between important and nonimportant change.
Clinical upper limb measures
MICs for the FMA-UE could be estimated for all three investigated periods. MICs ranging between 4 and 7 points within the first three months after stroke illustrate that relevant change is dynamic rather than constant over time. By gradually expanding the duration between measurement time points, our study design adhered to the characteristic recovery curve of upper limb capacity85,110 and AU metrics111, in which most change occurs within the first months after stroke. Therefore, the median change in FMA-UE and AU-d, for instance, remained relatively constant at periods D10-D28 and D28-D90 (Table 4 and Table 5), although the latter period has a two-fold duration.
The reported cut-off values need to be interpreted regarding their classification performance, which correctly distinguished between important and nonimportant FMA-UE score changes in 66% to 75% of the sample. Major factors influencing the magnitude of MIC values, such as the duration of the investigated period, baseline severity, the patient's intrinsic expectation of recovery, type of intervention, and type of criterion, should be considered for interpretation13,41,112–116. For instance, an MIC of ≥4 points for the FMA-UE was reported concerning a 3-week conventional rehabilitation in patients with mild-to-moderate impairment117. In contrast, considerably higher MICs were reported in severely impaired patients exhibiting larger change rates during four weeks of repetitive task training (≥9 points)44 and during eight weeks of conventional rehabilitation (MIC ≥13 points)105. Hence, the MICs presented in this study should be interpreted based on the period's baseline conditions and duration, respectively.
For the ARAT, BBT, and MAL, we provide the first anchor-based MICs that apply to a conventional rehabilitation setting. These cut-off values show high accuracy for the BBT, ARAT, and MAL within the period D10-D28, ranging from 81% to 87%. In our sample, an increase of ≥5 points in ARAT score benchmarks important change in the first weeks after stroke. This relatively small MIC still exceeds the measurement error (limits of agreement) estimated at 2.3 points in the early subacute recovery phase54.
Sensor-based AU-metrics
We provide the first MIC cut-off values for sensor-based AU metrics for clinical stroke rehabilitation. Seven MIC estimates cover the different AU components of intensity, duration, and symmetry for the first month poststroke. In this period, a change in total aff. AU-d of ≥32 minutes correctly classified important change in only 59% but correctly classified nonimportant change below this cut-off in 73% of our sample. It is important to note that this total aff. AU-d includes both unilateral and bilateral movements that exceed functional thresholds. Therefore, change rates and the MIC were lower for unilateral aff. AU-d. The negative MIC on unilateral affected AU-i needs to be treated with caution. This MIC takes a negative value because sensitivity and specificity were weighted equally and because, in some cases, negative change values coincided with a perceived important change. The MIC estimate for unilateral aff. AU-iconv also showed a very low value (≥1 activity count) and lower discriminative performance (accuracy of 67%). A low change rate or even a reduction in arm use could be linked to most of our cohort being discharged from the rehabilitation clinic between D28 and D90 (69% of cases). The discharge could perturb behaviour and promote the development of learned non-use, as movement therapy and self-reflection of movement behaviour might be reduced.
We estimated MICs based on recommended minimal correlation coefficients of ≥0.3 between perceived and objective change73. However, the change values between the groups perceiving and not perceiving change in a certain period were variable and partly overlapped.
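How an anchor-based cut-off of this kind can be derived, and why overlapping groups limit its accuracy, is shown in the following minimal sketch. It assumes that the MIC is the change value that maximises Youden's index (sensitivity + specificity - 1), which weights both properties equally. The exact estimation procedure of the study is not restated here, and the function name and data are hypothetical.

```python
def mic_by_youden(changes, important):
    """Return (cut-off, sensitivity, specificity) maximising Youden's index.

    A change is classified as important when change >= cut-off.
    `important` holds the anchor: True if the patient perceived important change.
    """
    pos = [c for c, imp in zip(changes, important) if imp]
    neg = [c for c, imp in zip(changes, important) if not imp]
    if not pos or not neg:
        raise ValueError("both anchor groups must be non-empty")
    best = None
    for cut in sorted(set(changes)):
        sens = sum(c >= cut for c in pos) / len(pos)
        spec = sum(c < cut for c in neg) / len(neg)
        youden = sens + spec - 1
        # Keep the first (lowest) cut-off in case of ties.
        if best is None or youden > best[0]:
            best = (youden, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical change scores with partly overlapping anchor groups.
changes = [-1, 0, 1, 2, 3, 4, 5, 6, 8, 10]
important = [False, False, False, True, False, False, True, True, True, True]
cut, sens, spec = mic_by_youden(changes, important)
print(f"MIC >= {cut}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

In this toy sample, one patient perceiving important change falls below the cut-off, so sensitivity cannot reach 100% without sacrificing specificity. The more the two groups overlap, the lower the attainable accuracy.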
The weak relationship between perceived change and change in AU illustrates that methods evaluating meaningful change in real-life arm use need to be adapted. It would be important to evaluate whether patients are aware of change globally or specifically. We revealed discrepancies between absolute change magnitude (of related constructs) and important change from a patient's perspective across recovery phases (Figure 5). One reasonable approach would be associating change with specific ADL tasks that are, per se, meaningful and executable with different impairment levels. Tasks like drinking or eating could be used, focusing on change more specifically. Detecting such meaningful tasks in real-life data and associating them with specified perceptions could improve the face validity and could consequently lead to robust MIC estimates.
Clinical implications of responsiveness and MICs
Our findings demonstrate that the relationship of score changes between two time points cannot be inferred from respective cross-sectional correlations.
Perception of change applies differently to the distinct measurement constructs of capacity, perceived arm use, and real-life arm use. Our MICs indicate how to interpret the relevance of change for the individual. We disclosed the diagnostic properties of estimated MIC values and illustrated associations of change in clinical outcome and AU performance concerning the patient's perception (Figure 5). Therefore, we highlight the need for further investigation of factors that could potentially influence the preconception of change relevance and awareness/ meaning of change in daily life. Discrepancies between the magnitude of observed change and its perceived relevance are frequent91 and should be further determined at the individual and subgroup level. Sensor-based AU metrics can provide valuable feedback to patients and therapists118–120 for effectively tailoring behavioural techniques to individual needs121. Our presented MICs for AU symmetry demonstrate balanced classification rates and serve as a valuable benchmark for evaluation and goal setting in stroke rehabilitation.
Limitations
The following limitations should be considered when interpreting our findings:
Firstly, sample sizes varied across time points and were lower, particularly at D28 and D365. However, we evaluated differences in the distributions of patients' characteristics between participants with complete and partially missing data. Amongst characteristics, we only found differences in the proportions of the dominant affected side at D28 and D365, which could influence AU behaviour.
Secondly, arm use metrics derived from wrist sensors reflect central tendencies of end-effector accelerations but are not sensitive to more distal activities such as hand dexterity. Fine motor tasks might evoke only low acceleration magnitudes falling below functional thresholds. Wearable devices that enable quantification of hand dexterity in clinical and home environments are needed to specify upper limb performance further. In addition, a main limitation of sensor-based metrics in general is that it remains unclear if movements (even above thresholds) were indeed purposeful or completed successfully. Validation of movement classifiers in larger stroke samples and various environments is needed, allowing for accurate detection of purposeful movements.
Thirdly, we did not evaluate the proportional effects of applied activity classification methods. Aiming to provide reference data on concurrent validity and responsiveness for future and current evidence, we included highly specified AU metrics (functional movement, no-gait) and conventional non-specific arm use, including all activities. Future research needs to evaluate the partial effect of functional movement, gait exclusion, or no specification.
Lastly, since the study design did not include a test-retest scenario in stable patient conditions, no standard errors of measurement could be estimated within our sample. These measurement errors are needed to enable the clinical interpretability of MIC cut-off values. Values ≥1.96 of the presented Guyatt's response ratio (GRR) indicate that the MIC exceeds measurement error if it can be assumed that the subsample perceiving nonimportant change remained constant. Our results showed large variabilities and overlapping distributions between groups (important/nonimportant); hence, the GRR should not be seen as such an indicator.
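The reasoning behind the GRR threshold can be sketched as follows. The sketch assumes that the ratio is the MIC divided by the standard deviation of change scores in the group perceiving nonimportant change, a common formulation of Guyatt's index. The exact formula behind the presented values is not restated here, and the function name and data are hypothetical.

```python
from statistics import stdev


def guyatt_response_ratio(mic, stable_changes):
    """MIC divided by the SD of change scores in the presumed-stable group."""
    if len(stable_changes) < 2:
        raise ValueError("need at least two change scores")
    sd = stdev(stable_changes)  # sample standard deviation (n - 1)
    if sd == 0:
        raise ValueError("ratio is undefined when all changes are equal")
    return mic / sd


# Hypothetical change scores of patients perceiving nonimportant change.
stable_changes = [-2, -1, 0, 0, 1, 2]
grr = guyatt_response_ratio(5, stable_changes)
print(f"GRR = {grr:.2f}, exceeds 1.96: {grr >= 1.96}")
```

The ratio is only informative if the denominator reflects measurement noise. When the presumed-stable group in fact changed, as the overlapping distributions suggest, the standard deviation is inflated by true change and the ratio is biased.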