Given the complexity of stroke-related complications, individual needs for stroke rehabilitation vary. To enable personalized rehabilitation, the Up-limb Rehabilitation Device and Utility System (UarDus) is proposed together with three HMI strategies. UarDus consists of four main subsystems: mechanical, actuation, sensory, and control (Fig. 1a). The mechanical, actuation, and control subsystems form the hardware foundation shared by the three HMI strategies and are therefore explained first. The sensory requirements differ between strategies: the robot-in-charge strategy is purely passive and requires no sensory subsystem, whereas distinct sensory subsystems are developed for the therapist-in-charge and patient-in-charge strategies. The sensory subsystems are therefore discussed in the subsection for each HMI strategy.
The base mechanical structure of UarDus consists of a wearable upper-limb exoskeleton, which powers the motion of the shoulder, arm, forearm, and hand, and a 3-axis translational motion platform, which supports the weight of the exoskeleton and helps coordinate the patient's SHR with the exoskeleton during rehabilitation. The arm, forearm, and hand segments are adjustable in size to fit different users. The whole system has up to 14 DoFs and provides dynamic wearing comfort for users of different body sizes (Fig. 1b).
Mechanical structure and actuation mechanism of the base exoskeleton
Understanding and mimicking the working principle of the human upper limb is a key challenge in exoskeleton design. Anatomically, the human upper limb can be divided into three major joints: the shoulder, elbow, and wrist (Fig. 2a-c) [16]. Shoulder motion mainly involves four bones (thorax, clavicle, scapula, and humerus) and the three corresponding joints SC, AC, and GH. All three joints share a similar working mechanism and can be treated as ball-and-socket joints [18]. Considering the overall SHR of the shoulder [19] and its range of rotation (Table 1), a 6-DoF shoulder mechanism is proposed, consisting of three translational and three rotational joints. The three mutually perpendicular translational joints (P1, P2, and P3 in Fig. 2d) compensate for the motion caused by the AC and SC joints and assist the movement of the GH center, while the three rotational joints (R6, R7, and R8 in Fig. 2d) mimic rotation at the GH joint.
The angle between the clavicle and the Z-axis in Fig. 2c is defined as \(\gamma_{s}\). \(\gamma_{a}\) denotes the angle between the AC-GH line and the clavicle, and \(\gamma_{g}\) the angle between the humerus and the AC-GH line. The angle between the humerus and the Z-axis is denoted \(\theta_{1}\). The length of the clavicle is \(l_{c}\), and the distance between the centers of the AC and GH joints (Fig. 2c) is \(l_{s}\). By establishing a mathematical relationship between the displacement of the GH joint center and \(\theta_{1}\), the three translational joints can be controlled to move the exoskeleton arm and thereby compensate for the wearer's SHR [19].
Bone length can be assumed to be proportional to the height h of the exoskeleton user [53]. The ratio of the wearer's height to the height of the measured volunteer is therefore used as a scaling factor in the formula, making the identified patterns applicable across users. The value of \(\theta_{1}\) can be obtained from an IMU worn on the wearer's forearm. The resulting functional relationships are given in Eqs. (1)-(3). Adopting the empirical data from [19], \(\gamma_{s0}=99^\circ\), \(\gamma_{a0}=143^\circ\), \(h_{0}=170\ \text{cm}\), \(\alpha_{1}=70^\circ\), \(\alpha_{2}=150^\circ\), \(\alpha_{3}=180^\circ\), \(\beta_{1}=80^\circ\), \(\beta_{2}=140^\circ\), and \(\beta_{3}=180^\circ\). For more accurate personalized parameters such as \(\gamma_{s0}\), \(\gamma_{a0}\), \(l_{s}\), and \(l_{c}\), CT scans can be performed to improve the accuracy of the GH joint center displacement trajectory given by Eqs. (1)-(3):
$$\begin{pmatrix} Y_{GH} \\ Z_{GH} \end{pmatrix} = \begin{pmatrix} l_{c}\sin\left(\gamma_{s}(\theta_{1})+\gamma_{s0}\right) - l_{s}\sin\left(\gamma_{a}(\theta_{1})+\gamma_{s}(\theta_{1})+\gamma_{a0}+\gamma_{s0}\right) \\ -l_{c}\cos\left(\gamma_{s}(\theta_{1})+\gamma_{s0}\right) + l_{s}\cos\left(\gamma_{a}(\theta_{1})+\gamma_{s}(\theta_{1})+\gamma_{a0}+\gamma_{s0}\right) \end{pmatrix}\frac{h}{h_{0}} \tag{1}$$
$$\gamma_{a}(\theta_{1}) = \begin{cases} 0^\circ, & 0^\circ \le \theta_{1} < \alpha_{1} \\ 35\left(10\left(\dfrac{\theta_{1}-\alpha_{1}}{\alpha_{2}-\alpha_{1}}\right)^{3} - 15\left(\dfrac{\theta_{1}-\alpha_{1}}{\alpha_{2}-\alpha_{1}}\right)^{4} + 6\left(\dfrac{\theta_{1}-\alpha_{1}}{\alpha_{2}-\alpha_{1}}\right)^{5}\right), & \alpha_{1} \le \theta_{1} < \alpha_{2} \\ 35^\circ, & \alpha_{2} \le \theta_{1} < \alpha_{3} \end{cases} \tag{2}$$
$$\gamma_{s}(\theta_{1}) = \begin{cases} 15\left(10\left(\dfrac{\theta_{1}}{\beta_{1}}\right)^{3} - 15\left(\dfrac{\theta_{1}}{\beta_{1}}\right)^{4} + 6\left(\dfrac{\theta_{1}}{\beta_{1}}\right)^{5}\right), & 0^\circ \le \theta_{1} < \beta_{1} \\ 15^\circ, & \beta_{1} \le \theta_{1} < \beta_{2} \\ 15^\circ + 9\left(10\left(\dfrac{\theta_{1}-\beta_{2}}{\beta_{3}-\beta_{2}}\right)^{3} - 15\left(\dfrac{\theta_{1}-\beta_{2}}{\beta_{3}-\beta_{2}}\right)^{4} + 6\left(\dfrac{\theta_{1}-\beta_{2}}{\beta_{3}-\beta_{2}}\right)^{5}\right), & \beta_{2} \le \theta_{1} < \beta_{3} \end{cases} \tag{3}$$
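To make Eqs. (1)-(3) concrete, the sketch below evaluates them with the empirical constants quoted above. The helper names (`blend`, `gamma_a`, `gamma_s`, `gh_displacement`) are illustrative, and the reference lengths `l_c` and `l_s` are hypothetical placeholders because the text gives no numeric values for them. Angles are in degrees, and the functions reject \(\theta_1\) outside the interval [0°, 180°) on which Eqs. (2)-(3) are defined.

```python
import math

# Constants quoted in the text from [19]; angles in degrees, heights in cm.
GAMMA_S0 = 99.0
GAMMA_A0 = 143.0
H0_CM = 170.0
ALPHA = (70.0, 150.0, 180.0)  # alpha_1, alpha_2, alpha_3
BETA = (80.0, 140.0, 180.0)   # beta_1, beta_2, beta_3


def blend(u: float) -> float:
    """Quintic blend 10u^3 - 15u^4 + 6u^5: 0 at u = 0, 1 at u = 1, zero slope at both ends."""
    return 10 * u**3 - 15 * u**4 + 6 * u**5


def _check_range(theta1: float) -> None:
    # Eqs. (2) and (3) are defined only for 0 <= theta1 < 180 degrees.
    if not 0.0 <= theta1 < ALPHA[2]:
        raise ValueError("theta1 must lie in [0, 180) degrees")


def gamma_a(theta1: float) -> float:
    """Eq. (2): gamma_a (deg) as a function of theta1 (deg)."""
    _check_range(theta1)
    a1, a2, _ = ALPHA
    if theta1 < a1:
        return 0.0
    if theta1 < a2:
        return 35.0 * blend((theta1 - a1) / (a2 - a1))
    return 35.0


def gamma_s(theta1: float) -> float:
    """Eq. (3): gamma_s (deg) as a function of theta1 (deg)."""
    _check_range(theta1)
    b1, b2, b3 = BETA
    if theta1 < b1:
        return 15.0 * blend(theta1 / b1)
    if theta1 < b2:
        return 15.0
    return 15.0 + 9.0 * blend((theta1 - b2) / (b3 - b2))


def gh_displacement(theta1: float, h: float, l_c: float, l_s: float) -> tuple:
    """Eq. (1): (Y_GH, Z_GH) for wearer height h (cm); l_c, l_s are reference lengths."""
    gs, ga = gamma_s(theta1), gamma_a(theta1)
    inner = math.radians(gs + GAMMA_S0)
    outer = math.radians(ga + gs + GAMMA_A0 + GAMMA_S0)
    scale = h / H0_CM
    y = scale * (l_c * math.sin(inner) - l_s * math.sin(outer))
    z = scale * (-l_c * math.cos(inner) + l_s * math.cos(outer))
    return y, z


# Hypothetical reference lengths in metres (placeholders, not values from the source).
print(gh_displacement(theta1=90.0, h=180.0, l_c=0.15, l_s=0.04))
```

The quintic blend has zero slope at both ends of each transition, so \(\gamma_a\) and \(\gamma_s\) join their constant segments smoothly, which is what lets the translational joints follow the GH center without velocity jumps.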
Compared with the shoulder, the mechanism of the elbow (EL) joint is simple. As shown in Fig. 2b, the EL joint connects the distal end of the humerus with the proximal ends of the radius and ulna, and is modeled in this work as a 1-DoF uniaxial hinge joint (joint R10). Forearm rotation occurs at the distal ends of the radius and ulna. Together with the two DoFs of the wrist, this rotation yields three uniaxial joints (R12, R13, and R14 in Fig. 2d) that are mechanically equivalent to a ball-and-socket joint. For this combined joint, a design similar to the GH joint mechanism is adopted (Fig. 3).
Table 1
Rotation ranges of the upper-limb joints [54]
Joint | Shoulder | Elbow | Wrist |
Flexion | + 180° | + 145° | + 85° |
Extension | -80° | -10° | -70° |
Abduction | + 180° | / | 20° |
Adduction | -50° | / | -40° |
Internal rotation | + 90° | / | / |
External rotation | -90° | / | / |
Supination | / | / | + 90° |
Pronation | / | / | -90° |
To support personalized training of both arms, passive joints R4 and R5 (Fig. 2d) allow bilateral training, and translational joints P9 and P11 allow the lengths of the exoskeleton linkages to be adjusted. To further improve wearing comfort and training outcomes, joints P1, P2, and P3 can also be adjusted for users with different body dimensions. For a kinematic description of the exoskeleton, its DH parameters are established (Table 2).
Table 2
The DH parameters of the exoskeleton
link \(i\) | \(\alpha_{i}\) (deg) | \(a_{i}\) (m) | \(d_{i}\) (m) | \(\theta_{i}\) (deg) | offset (deg) | \(\theta_{min}\) (deg) | \(\theta_{max}\) (deg) |
1 | 60 | 0 | 0 | \(\theta_{1}\) | -24.8 | 0 | 120 |
2 | -60 | 0 | 0 | \(\theta_{2}\) | 85.49 | 0 | 38 |
3 | 0 | \(a_{1}\) | 0 | \(\theta_{3}\) | 114.8 | -35 | 120 |
4 | 0 | \(a_{2}\) | 0 | \(\theta_{4}\) | 0 | 0 | 120 |
5 | -90 | 0 | 0 | \(\theta_{5}\) | 90 | -40 | 40 |
6 | 90 | 0 | 0 | \(\theta_{6}\) | 90 | -90 | 90 |
7 | 0 | 0 | 0 | \(\theta_{7}\) | 0 | -90 | 90 |
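The forward kinematics implied by Table 2 can be sketched as a chain of per-link homogeneous transforms. This sketch assumes the standard (distal) DH convention, \(R_z(\theta_i + \text{offset}_i)\,T_z(d_i)\,T_x(a_i)\,R_x(\alpha_i)\), because the table does not state which convention is used; it also assumes that \(\theta_{min}\)/\(\theta_{max}\) bound the joint variable before the offset is added. The link lengths \(a_1\) and \(a_2\) are not given numerically, so they are function parameters, and the values in the usage line are hypothetical.

```python
import numpy as np


def dh_rows(a1: float, a2: float):
    """Rows of Table 2: (alpha [deg], a [m], d [m], offset [deg], theta_min, theta_max [deg])."""
    return [
        (60.0, 0.0, 0.0, -24.8, 0.0, 120.0),
        (-60.0, 0.0, 0.0, 85.49, 0.0, 38.0),
        (0.0, a1, 0.0, 114.8, -35.0, 120.0),
        (0.0, a2, 0.0, 0.0, 0.0, 120.0),
        (-90.0, 0.0, 0.0, 90.0, -40.0, 40.0),
        (90.0, 0.0, 0.0, 90.0, -90.0, 90.0),
        (0.0, 0.0, 0.0, 0.0, -90.0, 90.0),
    ]


def dh_transform(alpha_deg: float, a: float, d: float, theta_deg: float) -> np.ndarray:
    """Standard (distal) DH link transform: Rz(theta) Tz(d) Tx(a) Rx(alpha)."""
    al, th = np.radians(alpha_deg), np.radians(theta_deg)
    ca, sa, ct, st = np.cos(al), np.sin(al), np.cos(th), np.sin(th)
    return np.array([
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ])


def forward_kinematics(q_deg, a1: float, a2: float) -> np.ndarray:
    """4x4 base-to-end transform for joint variables theta_1..theta_7 (deg)."""
    rows = dh_rows(a1, a2)
    if len(q_deg) != len(rows):
        raise ValueError("expected 7 joint angles")
    T = np.eye(4)
    for q, (alpha, a, d, offset, q_min, q_max) in zip(q_deg, rows):
        # Assumption: the limits apply to the joint variable before the offset is added.
        if not q_min <= q <= q_max:
            raise ValueError(f"joint angle {q} outside [{q_min}, {q_max}]")
        T = T @ dh_transform(alpha, a, d, q + offset)
    return T


# Hypothetical link lengths a1 = 0.30 m and a2 = 0.25 m, all joint variables at zero.
T = forward_kinematics([0.0] * 7, a1=0.30, a2=0.25)
print(np.round(T[:3, 3], 4))
```

Only links 3 and 4 carry a nonzero \(a_i\), so the end-point distance from the base can never exceed \(a_1 + a_2\); this is a quick sanity check when plugging in measured link lengths.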
HMI strategies
Stroke patients require personalized rehabilitation therapy to recover from complex complications, and even the same patient has different needs at different rehabilitation stages [55]. Rehabilitation treatment should therefore be adapted accordingly. To address this, HMI strategies are employed to adjust the rehabilitation therapy to individual needs. Three modes are designed for rehabilitation and daily-life assistance: robot-in-charge, therapist-in-charge, and patient-in-charge.
The robot-in-charge strategy aims to assist patients who are unable to move or have limited mobility. In this mode, the upper-limb exoskeleton guides the patient's limb along a predefined path recommended by a physician. The therapist-in-charge strategy is suitable for patients at all stages. The patient-in-charge strategy is intended for patients capable of low-intensity exercise; in this mode, patients achieve both rehabilitation and assistance with daily activities on their own. The following subsections describe the three modes in more detail.
Robot-in-charge strategy
Figure 5 shows the control logic flowchart for the three rehabilitation modes; the robot-in-charge mode is represented by the green blocks. In this mode, the rehabilitation therapy suggested by the therapist (the motion of each joint) is collected, and the motion trajectory of each exoskeleton joint is planned and executed. The mode uses dual closed-loop control (Fig. 6, upper panel). First, the servo motors used in this study integrate internal PID control loops, which allow precise control of motor position. Second, an additional closed loop is built with IMUs placed at the end of each joint (Fig. 8b) to eliminate errors caused by mechanical structures such as harmonic gears. The discrete PID control law is:
$$U_{k} = k_{p}e_{k} + k_{i}\sum_{j=0}^{k} e_{j} + k_{d}\left(e_{k} - e_{k-1}\right) \tag{5}$$
where \(U_{k}\) is the control output at time step k, \(e_{k}\) is the error between the expected and actual values at time step k, and \(k_{p}\), \(k_{i}\), and \(k_{d}\) are the proportional, integral, and derivative gains, respectively. The lower-left panel of Fig. 6 indicates that this control system tracks the joint angles accurately.
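A minimal sketch of Eq. (5) as a positional discrete PID, plus a toy outer loop illustrating why an IMU-based correction with an integral term can remove a constant structural error. The plant in `outer_loop_demo` is hypothetical: it assumes the servo's internal loop reaches its command exactly while the link, as measured by the IMU, lags by a fixed offset (a stand-in for gear-related error). The gains are illustrative, not values from the source.

```python
class DiscretePID:
    """Positional discrete PID of Eq. (5)."""

    def __init__(self, kp: float, ki: float, kd: float) -> None:
        self.kp, self.ki, self.kd = kp, ki, kd
        self.error_sum = 0.0   # sum of e_j for j = 0..k
        self.prev_error = 0.0  # e_{k-1}; taken as 0 before the first sample

    def update(self, expected: float, actual: float) -> float:
        e = expected - actual
        self.error_sum += e
        u = self.kp * e + self.ki * self.error_sum + self.kd * (e - self.prev_error)
        self.prev_error = e
        return u


def outer_loop_demo(target: float, gear_offset: float, steps: int = 200) -> float:
    """Hypothetical outer loop: the IMU-measured link angle corrects the servo command.

    Toy plant: the servo reaches its command exactly, but the link angle seen
    by the IMU lags it by a constant 'gear_offset' (degrees).
    """
    pid = DiscretePID(kp=0.2, ki=0.1, kd=0.0)  # illustrative gains, not from the source
    link_angle = 0.0
    for _ in range(steps):
        correction = pid.update(target, link_angle)
        servo_command = target + correction
        link_angle = servo_command - gear_offset
    return link_angle


print(outer_loop_demo(target=30.0, gear_offset=2.0))  # approaches 30 thanks to the integral term
```

With a purely proportional outer loop the toy plant would settle with a residual error; the accumulated-error term is what drives the measured joint angle to the planned one.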
Therapist-in-charge strategy
The control logic of the therapist-in-charge mode is represented by the orange blocks in Fig. 5. In this mode, the therapist wears a motor-less master exoskeleton whose mechanism is similar to that of the base exoskeleton. As shown in Fig. 7, two mapping methods are proposed, one based on IMUs and one based on encoders.
In the IMU-based method, IMUs 1–3 capture the therapist's arm motion in real time (Fig. 7). At the start of the therapist-in-charge mode, all IMU readings are reset to zero. During operation, the roll, pitch, and yaw angles from IMU 1 control exoskeleton joints 1–3; the pitch of IMU 2 relative to IMU 1 controls joint 4; and the roll, pitch, and yaw angles of IMU 3 relative to IMU 2 control joints 5–7 of the rehabilitation exoskeleton. This mapping is given by:
$$\begin{cases} \begin{bmatrix} \text{Joint 1} \\ \text{Joint 2} \\ \text{Joint 3} \end{bmatrix}_{\text{Shoulder}} = \begin{bmatrix} \text{Roll} \\ \text{Pitch} \\ \text{Yaw} \end{bmatrix}_{\text{IMU1}} \\[6pt] \left[\text{Joint 4}\right]_{\text{Elbow}} = \left[\text{Pitch}\right]_{\text{IMU2}} - \left[\text{Pitch}\right]_{\text{IMU1}} \\[6pt] \begin{bmatrix} \text{Joint 5} \\ \text{Joint 6} \\ \text{Joint 7} \end{bmatrix}_{\text{Wrist}} = \begin{bmatrix} \text{Roll} \\ \text{Pitch} \\ \text{Yaw} \end{bmatrix}_{\text{IMU3}} - \begin{bmatrix} \text{Roll} \\ \text{Pitch} \\ \text{Yaw} \end{bmatrix}_{\text{IMU2}} \end{cases} \tag{6}$$
In the encoder-based method, Encoders 1–7 capture the therapist's arm motion in real time (Fig. 7). The therapist exoskeleton and the upper-limb rehabilitation exoskeleton are calibrated against each other using the absolute encoder values. During operation, the real-time readings of Encoders 1–7 are used to control exoskeleton joints 1–7:
$$\text{Joint}_{i} = \text{Encoder}_{i}, \quad i = 1, \dots, 7 \tag{7}$$
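Both master-to-patient mappings can be sketched in a few lines. The function names are hypothetical; each IMU reading is assumed to be a (roll, pitch, yaw) tuple in degrees, already zeroed at the start of the mode, and joint-limit clamping is left out.

```python
from typing import List, Sequence

# Each IMU reading is (roll, pitch, yaw) in degrees, zeroed at the start of the mode.


def imu_mapping(imu1: Sequence[float], imu2: Sequence[float], imu3: Sequence[float]) -> List[float]:
    """Eq. (6): targets for exoskeleton joints 1-7 from the master's IMUs 1-3."""
    shoulder = [float(v) for v in imu1]                # joints 1-3: roll, pitch, yaw of IMU 1
    elbow = [float(imu2[1] - imu1[1])]                  # joint 4: pitch of IMU 2 minus pitch of IMU 1
    wrist = [float(b - a) for a, b in zip(imu2, imu3)]  # joints 5-7: IMU 3 minus IMU 2
    return shoulder + elbow + wrist


def encoder_mapping(encoders: Sequence[float]) -> List[float]:
    """Eq. (7): joint i of the patient exoskeleton follows encoder i of the master."""
    if len(encoders) != 7:
        raise ValueError("expected 7 encoder readings")
    return [float(v) for v in encoders]


print(imu_mapping((10, 20, 30), (0, 50, 0), (5, 60, 10)))
```

A design note: component-wise subtraction of Euler angles, as in Eq. (6), equals the true relative rotation only when the segments rotate about shared axes. A more general variant would compose rotation matrices (for example \(R_{\text{IMU1}}^{\mathsf T} R_{\text{IMU2}}\)) and extract angles from the result; this is an alternative, not the method described here.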
Patient-in-charge strategy
Deep learning models are known for their strong predictive capabilities, which makes them adaptable to a wide range of users. In this study, a patient-in-charge training strategy is developed based on deep learning, targeting post-stroke patients in the chronic phase without symptoms such as clonus or flexor spasms, because these symptoms tend to generate false intentions. Intention prediction follows four steps: data acquisition, data preprocessing, intention recognition model construction, and model performance evaluation.
Given the geometric complexity of the human upper limb, capturing and measuring HMI forces effectively is crucial for exoskeleton operation. To address this challenge, an interaction sensor matrix for the arm and forearm is designed (Fig. 8a-f), containing 3 IMUs and 21 piezoelectric pressure sensors. The essential part of the mechanism is the sensor block (Fig. 8d), which contains three pressure sensor units. The mounting position of the sensor blocks in the arm/forearm holder can be adjusted over 0–30 mm in one direction, and each pressure sensor unit can be compressed by up to 8 mm, so that the matrix fits people with diverse arm sizes.
Based on the Arm Motor Ability Test (AMAT) [56] and the upper-limb joint ranges of motion in Table 1, 15 rehabilitation training actions are designed (Fig. 9). In this study, motion intention is captured and analyzed from the triggering patterns of the sensor matrix. The correlation between each action and the active sensors is also highlighted.
Four healthy volunteers (authors of this work) participate in data collection, each repeating every action 250 times, which results in a total of 15,000 action samples. Of these, 80% are randomly selected for training and the remaining 20% form the test set, with each of the 15 actions equally represented in both sets.
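The split described above is a stratified 80/20 split: 4 × 15 × 250 = 15,000 samples give 1,000 samples per action, hence 800 training and 200 test samples per action. A minimal stdlib sketch, assuming the action index is the class label; the function name and seed are illustrative.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices so that every class keeps the same train/test ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        n_train = round(len(idxs) * train_fraction)
        train_idx.extend(idxs[:n_train])
        test_idx.extend(idxs[n_train:])
    return train_idx, test_idx


# 4 volunteers x 15 actions x 250 repetitions = 15,000 labelled samples (action = class).
labels = [action for _ in range(4) for action in range(15) for _ in range(250)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))           # 12000 3000
print(Counter(labels[i] for i in test_idx)[0])  # 200 test samples of action 0
```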
Before data collection, both the IMUs and the pressure sensors are calibrated. The sampling frequency is fixed at 40 Hz. In practice, the recorded data patterns of individual sensors vary considerably even for the same posture or action; to support effective learning, diverse data are deliberately collected for each action.
Data processing techniques such as normalization and feature extraction are essential for deep learning. Because the sensors have different measurement ranges and each dataset has its own distribution, training without normalization may reduce training efficiency and make convergence of the model more difficult. Normalization involves two steps. First, all values in each column are scaled to the range [0, 1] using \(\frac{x-x_{min}}{x_{max}-x_{min}}\). Second, each column is standardized to zero mean and unit variance using \(\frac{x-\mu}{\sigma}\). Figure 10(b) shows the triggering patterns of the pressure sensors for the 15 actions after this normalization.
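A NumPy sketch of the two normalization steps, applied per sensor channel (the last axis of a samples × time × channels array). It assumes the statistics are computed over all samples and time steps of the data passed in, which the text does not specify, and guards against constant channels to avoid division by zero.

```python
import numpy as np


def minmax_then_zscore(x: np.ndarray) -> np.ndarray:
    """Column-wise min-max scaling to [0, 1], then z-score standardisation (mean 0, std 1)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)  # constant columns: avoid division by zero
    scaled = (x - x_min) / span
    mu, sigma = scaled.mean(axis=0), scaled.std(axis=0)
    sigma = np.where(sigma > 0, sigma, 1.0)
    return (scaled - mu) / sigma


def normalise_samples(data: np.ndarray) -> np.ndarray:
    """Apply both steps per sensor channel (last axis) of (samples, time, channels) data."""
    flat = data.reshape(-1, data.shape[-1])
    return minmax_then_zscore(flat).reshape(data.shape)


rng = np.random.default_rng(0)
demo = rng.normal(size=(10, 200, 27)) * rng.uniform(1, 100, size=27)  # channels with different ranges
out = normalise_samples(demo)
print(out.reshape(-1, 27).mean(axis=0).round(6), out.reshape(-1, 27).std(axis=0).round(6))
```

Because a z-score is unchanged by a positive rescaling and shift of a column, the output of this exact two-step form equals z-scoring the raw column directly; the min-max step only changes the intermediate values.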
Deep learning methods are widely used for intention recognition. However, most of them, such as MobileNet [57], are optimized predominantly for computer vision applications. In addition, model size and real-time performance are often neglected, which largely constrains their deployment in wearable devices [10]. For these reasons, the model shown in Fig. 11 is proposed, designed to meet the requirements of high prediction accuracy, good real-time performance, and lightweight deployment.
As shown in Fig. 11a, the model takes a tensor \(X \in R^{N\times L\times W}\) as input, where N, L, and W denote the batch size, length, and width, respectively. X passes through Block 0 to Block 3, producing the final output tensor \(Y \in R^{N\times M}\), where M is the number of classes. Given the data format, L = 200, W = 27, and M = 15. The hyperparameters of each module are determined with a genetic algorithm (Appendix Table 1).
To handle time-series data with convolutional neural networks, the data must be preprocessed and features constructed [13, 14]. Feature construction is fundamental for deep learning models to extract relevant features and significantly influences model performance. The Block 0 module integrates feature construction into the proposed model, so that features are learned from the data and parameters are constructed from them. As shown in Fig. 11b, Block 0 is the model's first layer: it takes the two-dimensional \(200\times 27\) raw data matrix as input and outputs a \(3\times 200\times 27\) feature tensor. Specifically, Block 0 uses three parallel branches to obtain the data's embedding [47], local features, and global features, and combines their outputs by channel-wise concatenation. This approach acts as a "rough filter" on the data, helping the model learn significant features (e.g., local features) quickly and speeding up convergence. The following paragraph explains the working principles of the three branches of Block 0.
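The data flow of Block 0 (three parallel branches on a \(200\times 27\) sample, stacked into \(3\times 200\times 27\)) can be sketched in NumPy. The branches here are deliberately simple stand-ins, not the modules used in the model: a moving average replaces TimesBlock, a linear projection replaces the token/segment/position embedding, and plain single-head scaled dot-product self-attention (quadratic in L, unlike the linear-complexity encoder cited in [59]) replaces the Encoder. All function names and weights are hypothetical.

```python
import numpy as np

L, W = 200, 27  # time steps and sensor channels of one sample, as stated in the text


def local_branch(x: np.ndarray, k: int = 5) -> np.ndarray:
    """Stand-in for TimesBlock: k-point moving average along time, output (L, W)."""
    kernel = np.ones(k) / k
    return np.stack([np.convolve(x[:, c], kernel, mode="same") for c in range(x.shape[1])], axis=1)


def embedding_branch(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Stand-in for the embedding: a linear projection of every time step (W -> W)."""
    return x @ weight


def global_branch(x: np.ndarray) -> np.ndarray:
    """Stand-in for the Encoder: single-head scaled dot-product self-attention over time."""
    scores = x @ x.T / np.sqrt(x.shape[1])      # (L, L) similarity between time steps
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability for the softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # each row sums to 1
    return attn @ x                              # (L, W)


def block0(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Run the three branches in parallel and concatenate them channel-wise: (L, W) -> (3, L, W)."""
    branches = [local_branch(x), embedding_branch(x, weight), global_branch(x)]
    return np.stack(branches, axis=0)


rng = np.random.default_rng(0)
sample = rng.normal(size=(L, W))
print(block0(sample, rng.normal(size=(W, W)) / np.sqrt(W)).shape)  # (3, 200, 27)
```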
In the first branch, local features of the original data are obtained with the TimesBlock module [58], whose core idea is to decompose complex time-series variations into different periods using the Fourier transform. In the second branch, the original data tensor is projected into a high-dimensional space by an embedding consisting of Token Embedding, Segment Embedding, and Position Embedding [47]. In the third branch, global features of the original data are obtained with the Encoder module [47]. This module is based on the self-attention mechanism, and its deployment achieves linear complexity, which accelerates computation [59]. The pointwise convolution in this branch is used only to adjust dimensions. The outputs of the three branches are concatenated along the channel dimension, so that the resulting tensor carries both local and global information. The output tensor of Block 0 can be represented as \(X_{O} \in R^{C\times H\times W}\).
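The period-discovery idea behind TimesBlock can be illustrated with NumPy: take the FFT along time, average amplitudes over channels, pick the strongest non-DC frequencies, convert them to periods, and fold the 1-D series into a 2-D (cycles × period) layout that a 2-D convolution could then process. This is a simplified sketch of that single step under the stated assumptions (function names hypothetical), not the full TimesBlock.

```python
import numpy as np


def dominant_periods(x: np.ndarray, k: int = 2):
    """Top-k periods of an (L, W) series from FFT amplitudes averaged over channels."""
    amplitude = np.abs(np.fft.rfft(x, axis=0)).mean(axis=1)  # (L // 2 + 1,) spectrum
    amplitude[0] = 0.0                                       # ignore the DC (mean) component
    top_freqs = np.argsort(amplitude)[::-1][:k]
    top_freqs = np.maximum(top_freqs, 1)                     # guard against frequency index 0
    return x.shape[0] // top_freqs, amplitude[top_freqs]


def fold_by_period(x: np.ndarray, period: int) -> np.ndarray:
    """Zero-pad (L, W) to a multiple of `period` and fold it into (cycles, period, W)."""
    length, width = x.shape
    pad = (-length) % period
    x = np.concatenate([x, np.zeros((pad, width))], axis=0)
    return x.reshape(-1, period, width)


t = np.arange(200)
signal = np.sin(2 * np.pi * t / 20) + 0.5 * np.sin(2 * np.pi * t / 50)
x = np.repeat(signal[:, None], 27, axis=1)  # same toy signal on all 27 channels
periods, _ = dominant_periods(x, k=2)
print(periods)                                   # [20 50]
print(fold_by_period(x, int(periods[0])).shape)  # (10, 20, 27)
```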
The feature extraction structure designed for this dataset must balance accuracy against lightweight requirements. To meet this goal, Block 1 and Block 2 are designed based on two classic feature extraction structures, ResNet and MobileNet [57]. In addition, a Squeeze-and-Excitation (SE) attention mechanism is incorporated into both blocks to further reduce redundancy. The output tensor of Block 1 can be represented as \(X_{O} \in R^{C\times H\times W}\). Block 2 (Fig. 11b) is an inverted residual structure composed mainly of an SE module and a depthwise separable convolution module [60]. The core idea of SE is to evaluate the importance of each feature channel: based on this evaluation, informative channels are emphasized and less useful ones are suppressed. The squeeze and excitation operations in SE are expressed as follows:
$$\begin{cases} Z_{c} = F_{sq}(u_{c}) = \dfrac{1}{W\times H}\displaystyle\sum_{i=1}^{W}\sum_{j=1}^{H} u_{c}(i,j) \\ S = F_{ex}(z, W) = \sigma\left(g(z, W)\right) = \sigma\left(W_{2}\,\delta\left(W_{1}z\right)\right) \\ X_{c} = F_{scale}(u_{c}, s_{c}) = s_{c}\,u_{c} \end{cases} \tag{8}$$
where \(Z_{c}\) is the squeezed value of channel c, and these values together form \(z \in R^{C\times 1\times 1}\); \(u_{c}\), \(S\), and \(X_{c}\) denote the input feature map of channel c, the computed channel weight vector, and the output of channel c, respectively. \(g\) denotes the fully connected layers with weights \(W_{1}\) and \(W_{2}\), \(\delta\) the ReLU activation, and \(\sigma\) the sigmoid activation function. The core idea of depthwise separable convolution is to split a standard convolution into two steps: depthwise convolution and pointwise convolution. Compared with standard convolution, this reduces the number of parameters by 30% [57]. The output tensor of Block 2 can be represented as \(X_{O} \in R^{C\times H\times W}\).
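A NumPy sketch of Eq. (8) on a single feature map, together with a weight count for standard versus depthwise separable convolution. The reduction ratio, channel counts, and random weights are hypothetical. Ignoring biases, a k × k depthwise separable layer needs \(C_{in}k^2 + C_{in}C_{out}\) weights versus \(C_{in}C_{out}k^2\) for a standard one, a ratio of \(1/C_{out} + 1/k^2\), so the actual saving depends on the kernel size and channel counts of each layer.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def squeeze_excitation(u, w1, w2):
    """Eq. (8) on one feature map u of shape (C, H, W).

    w1: (C // r, C) and w2: (C, C // r) are the two fully connected layers;
    delta is ReLU and sigma is the sigmoid.
    """
    z = u.mean(axis=(1, 2))                    # squeeze: Z_c, global average per channel
    s = sigmoid(w2 @ np.maximum(0.0, w1 @ z))  # excitation: channel weights in (0, 1)
    return u * s[:, None, None]                # scale: X_c = s_c * u_c


def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in, c_out, k):
    """Depthwise k x k (one filter per input channel) + pointwise 1 x 1 (bias ignored)."""
    return c_in * k * k + c_in * c_out


rng = np.random.default_rng(0)
C, r = 16, 4  # hypothetical channel count and reduction ratio
u = rng.normal(size=(C, 200, 27))
out = squeeze_excitation(u, rng.normal(size=(C // r, C)), rng.normal(size=(C, C // r)))
print(out.shape)                                                       # (16, 200, 27)
print(depthwise_separable_params(32, 64, 3) / conv_params(32, 64, 3))  # ratio 1/64 + 1/9
```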
Block 3 consists of convolution, batch normalization, and pooling layers. The convolution layers extract features from specific data segments, batch normalization stabilizes gradient propagation during backpropagation and alleviates the vanishing-gradient problem, and the pooling layer reduces the H and W dimensions of the tensor to 1. Pointwise convolution is used instead of fully connected layers. The activation functions used in this study are:
$$\begin{cases} \mathrm{ReLU}(x) = \max(0, x) \\ \mathrm{swish}(x) = x\,\mathrm{sigmoid}(\beta x) \end{cases} \tag{9}$$
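A NumPy sketch of the activation functions in Eq. (9) and of the pooling-plus-pointwise-convolution head described for Block 3. Where the activations sit inside Block 3 is not specified, so the head below only shows pooling to \(1\times 1\) followed by a \(1\times 1\) convolution; a \(1\times 1\) convolution on a \(1\times 1\) map reduces to a matrix-vector product over channels, which is why it can replace a fully connected layer. Shapes and weights are hypothetical.

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def swish(x, beta=1.0):
    """Eq. (9): x * sigmoid(beta * x)."""
    return x / (1.0 + np.exp(-beta * x))


def classifier_head(features, weight, bias):
    """Global average pooling to (C, 1, 1) followed by a 1 x 1 (pointwise) convolution.

    features: (C, H, W); weight: (M, C, 1, 1); bias: (M,). Returns M class logits.
    """
    pooled = features.mean(axis=(1, 2), keepdims=True)  # (C, 1, 1)
    # A 1 x 1 convolution on a 1 x 1 map is a matrix-vector product over channels.
    logits = np.einsum("mc,chw->mhw", weight[:, :, 0, 0], pooled) + bias[:, None, None]
    return logits[:, 0, 0]


rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 200, 27))  # hypothetical Block 2 output, C = 32
w = rng.normal(size=(15, 32, 1, 1))     # M = 15 classes
logits = classifier_head(feats, w, np.zeros(15))
print(logits.shape)                     # (15,)
```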