All procedures were approved by the Northwestern University Institutional Review Board. To evaluate the accuracy of each tracking method, we recorded data using the IMUs, Vive, and Vicon from one healthy participant. To demonstrate the utility of the IMUs and Vive, we also recorded data from eight participants with stroke using these sensors in the laboratory during reaching tasks (detailed in System Evaluation).
Common Background
We provide a general overview of the common and specific aspects of both the IMU- and Vive-based methods in the following sections, starting with the commonalities. Further details are in the Appendix. Prior to taking any sensor measurements, we performed the following critical steps to help ensure proper estimation of wrist position: measuring the length of the arm and forearm and aligning the sensors on the limbs. We measured the arm from the acromion to the antecubital fossa, and we measured the forearm from the antecubital fossa to the center of the ventral aspect of the wrist. We placed the arm and forearm sensors proximal to the elbow and wrist, respectively, and aligned one axis of each sensor with the long axis of the respective limb (Figure 1a). This step ensured that the directions of the arm and forearm were always known, regardless of each limb’s orientation in space.
After taking sensor measurements, we determined the 3D orientations of the arm and forearm sensors to estimate each limb’s 3D orientation. We then used these quantities, as well as the measured limb lengths, to construct a serial kinematic chain that models the upper extremity (UE) and estimates the wrist position with respect to the shoulder, denoted as the position vector Pwriw. The general characteristics and recording procedures for each method are discussed below.
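The serial-chain wrist estimate can be illustrated with a short sketch. This is a minimal Python version (the study's analysis used MATLAB); the function name, the example segment lengths, and the assumption that each sensor's local x-axis points distally along its limb segment are all illustrative, not taken from the study's code.

```python
import numpy as np

def wrist_position(R_arm, R_forearm, arm_len, forearm_len):
    """Estimate wrist position relative to the shoulder via a two-segment
    serial kinematic chain.

    R_arm, R_forearm : 3x3 rotation matrices mapping each segment's local
        frame into the world frame (from the sensor orientation estimates).
    arm_len, forearm_len : measured segment lengths in meters.
    Assumption: the local x-axis of each sensor is aligned with the long
    axis of its limb segment, pointing distally (the alignment step above).
    """
    x_local = np.array([1.0, 0.0, 0.0])                 # segment long axis
    elbow = R_arm @ x_local * arm_len                   # shoulder -> elbow
    wrist = elbow + R_forearm @ x_local * forearm_len   # elbow -> wrist
    return wrist                                        # wrist w.r.t. shoulder

# Hypothetical example: arm hanging in the reference orientation, elbow
# flexed 90 degrees about the world y-axis, with 0.30 m arm, 0.25 m forearm.
Ry = lambda a: np.array([[np.cos(a), 0.0, np.sin(a)],
                         [0.0, 1.0, 0.0],
                         [-np.sin(a), 0.0, np.cos(a)]])
p = wrist_position(np.eye(3), Ry(np.pi / 2), 0.30, 0.25)
```

The magnitude of the returned vector is the endpoint distance used in System Evaluation below.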
IMU-based Tracking
For the IMU-based method, we used a 9-axis IMU (Figure 1b) containing a 3-axis accelerometer, 3-axis magnetometer, and 3-axis gyroscope (Trigno IM Sensor, Delsys Inc.) to record raw inertial data. We imported the raw IMU data into MATLAB and resampled the accelerometer and gyroscope data to the magnetometer’s sampling frequency of 74 Hz for analysis.
The accelerometer measures the sum of gravitational and inertial acceleration in the world frame. The gravity component, which points directly downward, and the magnetometer measurements, which point along the Earth’s magnetic field, together provide a stable reference frame that defines a zero orientation for orientation estimation. However, both references are susceptible to distortions that must be accounted for. For accelerometers, non-gravitational accelerations (e.g., from ballistic movements) partially mask gravity. In motor-impaired stroke survivors, this distortion is usually small because gravitational acceleration is relatively strong and these individuals are generally unable to produce large non-gravitational accelerations. Metal alloys in nearby objects, such as walls, floors, beds, and tables, can distort the magnetic field and thus the orientation estimates[33]. Because these distortions can reach a substantial fraction of the Earth’s magnetic field strength, the magnetometer is much more susceptible to measurement distortion than the accelerometer. It is therefore critical to calibrate the magnetometer whenever the local magnetic environment changes. We calibrated the magnetometer by recording while rotating the IMUs in 90° increments four times about the sensor’s positive and negative x, y, and z axes, which typically took approximately three minutes. We performed this calibration before placing the IMUs on the arm and forearm. We then used a least-squares ellipsoid fitting method to find the parameters that transformed the recorded magnetometer measurements into a sphere[35,36]. Afterwards, we used the accelerometer recordings to calibrate the accelerometer and remove any cross-axis misalignment between the accelerometer and magnetometer sensor axes[37].
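The ellipsoid-to-sphere step can be sketched as follows. This is a simplified, axis-aligned version (offset plus per-axis scale) for illustration only; the cited method[35,36] fits a general ellipsoid, which also captures cross-axis soft-iron effects. All names are illustrative.

```python
import numpy as np

def calibrate_mag_axis_aligned(m):
    """Fit an offset (hard-iron bias) and per-axis scale that map raw
    magnetometer samples onto a sphere, via linear least squares.

    m : (N, 3) raw magnetometer samples spanning many orientations.
    Returns (offset, scale) such that (m - offset) * scale lies near a
    unit sphere. Simplified axis-aligned case; the full method fits a
    general ellipsoid.
    """
    # Fit the quadric a*x^2 + b*y^2 + c*z^2 + d*x + e*y + f*z = 1
    D = np.column_stack([m[:, 0]**2, m[:, 1]**2, m[:, 2]**2,
                         m[:, 0], m[:, 1], m[:, 2]])
    coef, *_ = np.linalg.lstsq(D, np.ones(len(m)), rcond=None)
    a, b, c, d, e, f = coef
    offset = np.array([-d / (2 * a), -e / (2 * b), -f / (2 * c)])
    # Recenter the quadric and recover the semi-axis lengths
    g = 1 + a * offset[0]**2 + b * offset[1]**2 + c * offset[2]**2
    radii = np.sqrt(g / np.array([a, b, c]))
    return offset, 1.0 / radii
```

Applying the returned offset and scale to subsequent magnetometer samples normalizes their locus from an ellipsoid back to a sphere, which is what the orientation filter assumes.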
Following calibration, we fused the accelerometer, gyroscope, and magnetometer measurements to estimate sensor orientation by implementing a modified version of the improved explicit complementary filter[24,28]. Complementary filters (CFs) are widely used to compute orientation from 9-axis IMU measurements[24,28–30] because they retain only the low-frequency, stable components of the accelerometer/magnetometer estimations and the high-frequency, drift-free components of the gyroscope estimations. We used this specific version for several important features: no singularities, flexible gain selection, gyroscope bias compensation, and decoupling of the magnetometer’s influence from the roll and pitch estimations[24,28]. In general, a higher filter gain raises the cutoff frequency and weights the accelerometer/magnetometer estimations more heavily, reducing signal drift but introducing more noise, whereas a lower gain weights the gyroscope estimations more heavily, reducing noise but increasing drift. We generally selected the lowest possible gains after considering the calibration and raw measurement quality.
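The gain trade-off can be illustrated with a one-dimensional toy complementary filter on a single tilt angle. The improved explicit CF used in the study operates on full 3D orientations with gyroscope bias compensation; this sketch omits those features and is not the study's implementation.

```python
def complementary_filter(gyro_rate, accel_angle, dt, gain):
    """1-DOF complementary filter illustrating the gain trade-off.

    The gyroscope rate is integrated (high-frequency, drift-prone over
    time due to bias), while the accelerometer-derived angle pulls the
    estimate back toward the stable low-frequency reference.

    gyro_rate   : sequence of angular rates (rad/s).
    accel_angle : sequence of accelerometer-derived angles (rad).
    dt          : sample period (s).
    gain        : blend weight in [0, 1]; higher trusts the accelerometer
                  more (less drift, more noise), lower trusts the gyro
                  more (less noise, more drift).
    """
    angle = accel_angle[0]              # initialize from the stable reference
    estimates = []
    for w, a in zip(gyro_rate, accel_angle):
        angle = (1.0 - gain) * (angle + w * dt) + gain * a
        estimates.append(angle)
    return estimates
```

With a small constant gyroscope bias and a stationary sensor, a nonzero gain bounds the drift at a small steady-state offset, whereas a zero gain (pure integration) drifts without limit, which is exactly the trade-off described above.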
A typical recording session using the IMUs proceeded as follows. Prior to recording participant data, we calibrated the sensors in the testing environment. Afterwards, we performed the limb measurement and sensor alignment steps as described in Common Background. At the start of data collection, the participant performed a brief calibration procedure necessary for developing the kinematic chain model (i.e., data calibration procedure, see Appendix for details). This consisted of holding a static, neutral pose for 30–60 seconds to collect orientation estimations for baseline orientation removal, followed by four passive flexion-extension (FE) movements without pronation-supination (PS) motion. After the data calibration procedure, the participant relaxed and then performed the assessment tasks.
Vive-based Tracking
In this study, we used first-generation lighthouses and trackers. Each Vive lighthouse sweeps alternating horizontal and vertical IR lasers that are detected by the photodiode-containing trackers (Figure 1c) within the play area. The time delay from the onset of each sweep emitted from the fixed lighthouses to detection by the photodiodes allows determination of the position and orientation of any tracker in the play area. For evaluating arm kinematics, the required play area is relatively small (1.5 m × 1.5 m) since participants can perform arm movements while standing or sitting in a fixed location.
To estimate elbow angle using the Vive, we positioned the lighthouses such that the trackers were always visible to them. A common acceptable setup had two lighthouses approximately two meters apart facing each other on an oblique plane relative to the patient. This placement was critical to obtain valid pose estimations from the Vive, as failure to do so would increase the chance of occlusion (i.e., a lighthouse could not detect the trackers) and therefore introduce potentially large, unpredictable error in pose estimation. Since the required play area was relatively small, it was easy to find an appropriate lighthouse setup. Once placed on the limb, each tracker had an associated position and orientation estimation, expressed with respect to a common world frame. We neglected the position estimations because they can sometimes become unstable during movement[38]. Using the SteamVR Unity Plugin (found in the Unity assets store), we loaded virtual representations of the Vive trackers into Unity 3D to observe their position and orientation in real time. We then extracted the orientation estimations of each tracker, along with time stamps synchronized with the computer’s clock (necessary because the sampling rate varied with the frame rate of Unity), and recorded them into text files using the Unity Scripting Application Programming Interface. We subsequently imported these files into MATLAB and interpolated the data to a sampling rate of 50 Hz.
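Because the Unity frame rate varies, the recorded samples are irregularly spaced and must be resampled onto a uniform grid before analysis. A minimal sketch of this step for a scalar channel (in Python rather than the MATLAB used in the study; note that quaternion orientations would normally require spherical interpolation rather than component-wise linear interpolation):

```python
import numpy as np

def resample_to_uniform(timestamps, values, fs=50.0):
    """Linearly interpolate irregularly sampled data onto a uniform grid.

    timestamps : (N,) sample times in seconds, strictly increasing
                 (here, the time stamps synchronized with the PC clock).
    values     : (N,) scalar samples recorded at those times.
    fs         : target uniform sampling rate in Hz.
    Returns the uniform time grid and the interpolated samples.
    """
    t_uniform = np.arange(timestamps[0], timestamps[-1], 1.0 / fs)
    return t_uniform, np.interp(t_uniform, timestamps, values)
```

Each orientation component (or, better, the full quaternion via slerp) would be resampled this way so that IMU and Vive streams share a common, uniform time base.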
A typical recording session using the Vive proceeded as follows. We first positioned two lighthouses to capture a sufficiently large play area. We confirmed this by moving the trackers in the play area and observing the real-time movement of their virtual representations in Unity without recording data. If the virtual representations ceased to move simultaneously with tracker movement, signifying tracker occlusion, we adjusted the position of the lighthouses until they redetected the trackers. Afterwards, we performed the limb measurement and sensor alignment steps as described in Common Background. Because there was no required data calibration step, the participant simply proceeded with the assessment once the data recording began.
System Evaluation
For each task, we report the endpoint distance (EPD), i.e., the scalar distance from the shoulder to the wrist, computed as the magnitude of the position vector Pwriw, as estimated by the IMU-based and Vive-based methods.
The first task (reaching task) assessed each method’s estimation accuracy in comparison to the gold standard Vicon motion capture system. The Vicon system was set up in an enclosed room with eight Vicon Vantage cameras surrounding the workspace. We calibrated the cameras using the system’s specified procedure[21]. We placed reflective markers corresponding to the Vicon UE model onto a healthy human participant (Figures 2a and 2b)[20]. This participant then performed a calibration procedure (static pose with shoulder abducted and elbow extended, Figure 2a) optimized for the Vicon system prior to performing the desired task[20,21]. All Vicon data were sampled at 200 Hz and imported into MATLAB for analysis.
Due to signal interference between the Vicon Vantage cameras and our first generation Vive lighthouse and tracker setup, it was impossible for us to record valid EPD estimations from the Vicon and Vive simultaneously. Therefore, we designed a sequence of multiplanar reaching movements that a healthy human participant could reliably replicate across recording sessions. Specifically, the participant stood at a fixed position relative to a platform with six different marked targets within reaching distance (Figure 3). Starting with the UE extended and at the side, the participant reached towards and touched the first target with the tip of the third finger, then maintained the reach without any radial or ulnar deviation for several seconds, and subsequently returned the UE to the initial position. This sequence was repeated for the remaining five targets. We averaged all samples while maintaining a reach at a target to estimate the EPD at that target.
We collected data from this healthy participant across four different recording sessions: one with Vicon only (four reaches to target 1, five reaches to each remaining target), one with both IMU and Vicon (five reaches to all targets), one with both IMU and Vive (four reaches to all targets), and one with Vive only (five reaches to all targets). The combined Vicon data contained nine reaches to target 1 and 10 reaches to each remaining target (59 total reaches). We averaged the estimated EPDs per target to obtain the ground truth EPD estimation for each target. We also computed the active range of motion (AROM), defined as the difference between the EPD prior to a reach and the EPD at the target, for all reaches per target and averaged them to obtain the ground truth AROM for each reach. Next, we determined the IMU and Vive’s estimated EPD at each target for all 54 (nine per target) reaches and subtracted them from the corresponding ground truth EPD. Likewise, we computed the IMU and Vive’s estimated AROM for each reach and subtracted them from the corresponding ground truth AROM. We then computed the mean and standard deviation of the absolute values of these differences in estimated EPD and AROM. Finally, we computed the differences in the estimated mean AROM between the reaches to targets 1, 2, and 3 for all three methods. These differences allowed us to measure and compare how sensitive each method was at detecting changes in AROM in a given direction and simulated how the methods could detect improved AROM from a course of therapy.
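The absolute-error summary described above amounts to the following computation, sketched in Python (the study used MATLAB; the function name and example values are illustrative):

```python
import numpy as np

def abs_error_stats(estimates, ground_truth):
    """Mean and standard deviation of absolute differences between a
    method's per-reach estimates and the matching ground-truth values,
    as computed for EPD and AROM above.

    estimates    : (n_reaches,) method estimates (e.g., IMU or Vive EPD).
    ground_truth : (n_reaches,) ground-truth values for the same reaches.
    """
    err = np.abs(np.asarray(estimates) - np.asarray(ground_truth))
    return err.mean(), err.std()
```

The same function applies unchanged to the AROM differences, with one ground-truth value per target repeated across that target's reaches.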
The second task (sweep task) demonstrated each method’s ability to provide data to derive clinically relevant kinematic metrics in stroke survivors. We placed IMUs and Vive trackers on the limbs of eight chronic stroke survivors with UE motor impairment (Figure 2c) enrolled in an ongoing clinical trial (NCT03401762) investigating a six-week training protocol of myoelectric computer interface therapy for stroke[12,39]. Four participants performed this task once, and four participants performed this task at two different times (one on weeks 4 and 6 of training, one on weeks 0 and 4, and two on weeks 0 and 6). Starting with the affected hand resting on the ipsilateral thigh while sitting, the participant attempted to abduct the affected shoulder to 90° while extending the affected elbow to 180°, then subsequently horizontally swept to the contralateral side (internally rotated the shoulder) as far as possible. This was repeated three times, yielding three “sweeps” per session. If performed by a motor-intact person, each sweep should trace out a semicircle in the horizontal plane with a radius equal to the sum of the arm and forearm lengths, as the elbow should be fully extended throughout each sweep.
We derived two different clinically relevant metrics from the wrist position: horizontal sweep area and smoothness. We computed the horizontal sweep area by projecting Pwriw onto the horizontal plane and then using MATLAB’s polyarea function. This horizontal sweep area is a task-specific scalar and serves as an example kinematic metric that can be derived from the wrist position. Furthermore, the sweep area is effectively a 2D range of motion to all sides of the body, or effectively a “workspace,” which makes it a functionally relevant measure[40]. We also estimated the smoothness of each sweep, a measure characteristic of unimpaired movements that has been shown to increase with stroke recovery[41,42]. After differentiating all Pwriw estimations from the initiation (relaxed state immediately prior to shoulder abduction) to termination (relaxed state immediately after maximal shoulder internal rotation) of the sweep using smooth noise-robust differentiators[43], we computed the magnitude of the resulting 3D velocity vector to obtain the endpoint speed (EPS). Afterwards, we extracted the peak envelope of the EPS using MATLAB’s envelope function (spline interpolation over local maxima) to reduce noise introduced by filter gain selection and numerical differentiation while maintaining the overall shape of the speed profile. We applied the same peak envelope to both the IMU and Vive data. From the peak envelope of the EPS, we computed the normalized mean endpoint speed (i.e., smoothness), defined as the max EPS divided by the mean EPS during the sweep[41]. After estimating both kinematic metrics for each sweep, we calculated and compared the mean of each metric over all sweeps. We computed Pearson’s correlation coefficient between the IMU and Vive’s estimated mean for each metric. For the four chronic stroke participants who performed the task at two different times, we also computed the change over time in the estimated metrics for each method and compared them.
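The two metrics can be sketched as follows. This is a simplified Python illustration: the shoelace formula stands in for MATLAB's polyarea, and simple finite differences stand in for the noise-robust differentiators and peak-envelope step used in the study; the function name and example trajectory are illustrative.

```python
import numpy as np

def sweep_metrics(p_wrist, fs):
    """Compute horizontal sweep area and smoothness from wrist positions.

    p_wrist : (N, 3) wrist positions relative to the shoulder (m), with
              the first two columns spanning the horizontal plane.
    fs      : sampling rate in Hz.
    Returns (area, smoothness), where smoothness = max EPS / mean EPS.
    """
    # Horizontal sweep area: project onto the horizontal plane, then apply
    # the shoelace formula to the (implicitly closed) traced polygon.
    x, y = p_wrist[:, 0], p_wrist[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

    # Endpoint speed (EPS): finite-difference velocity magnitude. The study
    # instead used noise-robust differentiators plus a peak envelope.
    vel = np.gradient(p_wrist, 1.0 / fs, axis=0)
    eps = np.linalg.norm(vel, axis=1)
    smoothness = eps.max() / eps.mean()
    return area, smoothness
```

For the ideal sweep described above (a semicircle of radius equal to the summed limb lengths, traced at constant speed), the area approaches half the disc area and the smoothness ratio approaches 1; jerky, impaired movement inflates the ratio.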