Twenty-one participants (CS = 10; HC = 11) completed EEG recordings while performing a VR-guided motor task. The CS and HC groups did not differ with regard to sex (χ²[1] < 0; p > 0.05). CS participants (63.3 ± 10.2 years) had a higher mean age than HC participants (46.3 ± 11.3 years; t[18] = 2.31, p = 0.0319). Six CS participants had a left-sided infarct and four had a right-sided infarct. From this dataset, a total of 4030 models were trained, with a total compute time of 103 hours.
We used 13 different ML algorithms (see Methods) to explore the effect of disease state, stimulation state, movement state, frequency band, time period and electrode location on classification accuracy. The time periods examined were pre-stimulation, intra-stimulation at 5 and 15 minutes after stimulation began, and post-stimulation, hereafter referred to as ‘Pre’, ‘Intra5’, ‘Intra15’ and ‘Post’. Features were tested in limited combinations to 1) limit the exponential growth in the number of models and 2) keep the parameter space clinically interpretable. Note that all accuracies reported are classification accuracies on the validation set; no samples in the validation set were used to train any of the models.
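As a rough illustration of why limited combinations keep the model count manageable, the sketch below compares a fully crossed grid with a design that varies one factor at a time. The algorithm names are placeholders, and restricting the electrode list to C3 and C4 is an assumption made only for this example; it does not reproduce the study's actual model count.

```python
from itertools import product

# Factor levels named in the text. Algorithm names are placeholders, and the
# two-electrode list is an illustrative assumption, not the study montage.
ALGORITHMS = [f"alg_{i}" for i in range(13)]
TIME_PERIODS = ["Pre", "Intra5", "Intra15", "Post"]
BANDS = ["delta", "theta", "alpha", "beta", "gamma"]
ELECTRODES = ["C3", "C4"]


def full_grid_size(*factors):
    """Number of models if every level of every factor were crossed."""
    size = 1
    for levels in factors:
        size *= len(levels)
    return size


def limited_grid(algorithms, time_periods, varied_factor):
    """Cross algorithms and time periods with a single varied factor."""
    return list(product(algorithms, time_periods, varied_factor))


full = full_grid_size(ALGORITHMS, TIME_PERIODS, BANDS, ELECTRODES)
by_band = limited_grid(ALGORITHMS, TIME_PERIODS, BANDS)
by_electrode = limited_grid(ALGORITHMS, TIME_PERIODS, ELECTRODES)

print(f"fully crossed: {full} models")
print(f"one factor at a time: {len(by_band) + len(by_electrode)} models")
```

Each additional fully crossed factor multiplies the grid size, whereas varying one factor at a time only adds to it.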
To investigate the baseline model accuracy of discriminating between healthy and chronic stroke participants, as a control we compared each algorithm using all electrodes, frequency bands, and movement states grouped together. We observed that the mean accuracy of classifying HC and CS participants was 71.1% for the sham group and 83.4% for the stim group during the pre-stimulation time period, likely due to the increased hemispheric asymmetry evident in the recordings relative to healthy controls (p < 0.0016). A higher mean classification accuracy for stim versus sham groups persisted throughout all intra- and post-stimulation time periods and was greatest at the intra5 time period (stim: 80.4 ± 11.7% versus sham: 58.5 ± 9.1%; t[24] = 6.3653, p < 1.4e-6, Fig. 1).
For the intra5, intra15 and post-stimulation time periods, the five ensemble models (global, hard, me, train, uni) converged to produce similar accuracies: intra5 (93.6%), intra15 (90.0%), and post-stimulation (93.8%) (Supplementary Table 1).
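The five ensemble variants are defined in Methods and are not reproduced here. As a minimal sketch only, assuming that "hard" denotes hard (majority) voting over the base classifiers, the following shows how per-model predictions can be combined and scored on a validation set. The labels and predictions are made-up toy values, and `hard_vote` and `accuracy` are hypothetical helper names.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority label per sample across models.

    predictions: list of per-model label lists, all of equal length.
    Ties go to the label that appears first among the models in list order.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    voted = []
    for sample_votes in zip(*predictions):
        voted.append(Counter(sample_votes).most_common(1)[0][0])
    return voted


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy validation labels and three toy base-model predictions (not study data).
y_val = ["CS", "HC", "CS", "HC"]
model_a = ["CS", "HC", "HC", "HC"]
model_b = ["CS", "CS", "CS", "HC"]
model_c = ["HC", "HC", "CS", "HC"]

ensemble = hard_vote([model_a, model_b, model_c])
print(accuracy(y_val, model_a), accuracy(y_val, ensemble))
```

In this toy case each base model misclassifies a different sample, so the majority vote corrects all three errors; real base classifiers usually make correlated errors and gain less.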
To investigate the baseline model accuracy in discriminating between sham and stimulation states, as a control we compared each algorithm using all electrodes, frequency bands, and movement states grouped together. We observed that the mean accuracy of classifying stim versus sham state was 70.4% for the HC group and 86.8% for the CS group during the pre-stimulation time period (p < 0.00023). A higher classification accuracy for the CS versus HC groups persisted throughout all time periods and was greatest at the intra15 time period (CS: 80.3 ± 10.8% versus HC: 64.0 ± 8.4%; t[24] = 5.4557, p < 4.5e-5, Fig. 2).
For the intra5, intra15 and post-stimulation time periods, the five ensemble models (global, hard, me, train, uni) converged to produce similar accuracies: intra5 (92.3%), intra15 (92.0%), and post-stimulation (92.0%) (Supplementary Table 2).
To investigate the accuracy of each algorithm in discriminating between hold and reach movement states, we created models using all electrodes and frequency bands (Fig. 3). We observed that the mean accuracy of classifying hold versus reach was 68.6 ± 0.2% for the CS sham group, 72.2 ± 0.5% for the CS stim group, 79.6 ± 0.3% for the HC sham group, and 71.6 ± 0.6% for the HC stim group at the pre-stimulation time period (Fig. 3C). A higher mean classification accuracy for sham groups persisted throughout all time periods except for the CS cohorts at the pre and intra15 time periods. At the intra15 time period, the mean accuracy for classifying hold versus reach was 75.3 ± 1.3% for the CS stim group and 71.5 ± 1.5% for the CS sham group (t[23] = 9.7250; p = 8.45e-10, Fig. 3, Supplementary Table 3).
In the CS stim group, LR, LDA, and DT each performed this classification with 76.1% accuracy at the shortest average training time of 0.77 seconds per model (lines superimposed in Fig. 3). By comparison, XGBoost performed this classification with 75.2% accuracy at the longest average training time of 1 minute 3.8 seconds per model.
To investigate the accuracy of each algorithm in discriminating between hold and reach movement states by frequency band, we created models using all electrodes recorded in the CS stim group. We observed that the mean accuracy of classifying hold versus reach was consistently higher in the intra-stimulation and post-stimulation time periods (Fig. 4A; note that for the pre-stimulation period, RF classification accuracy was below 65% for all frequency bands and is not shown). Interestingly, 10 out of 13 algorithms showed no differences in accuracy between frequency bands. At the intra15 time period, LR, LDA, and DT classified hold versus reach with the same accuracy, the highest observed (76.1%, Fig. 4B), for all frequency bands.
In contrast, the five ensemble models (global, hard, me, train, uni) showed the highest classification accuracy for the gamma frequency band at the intra15 time period in comparison to all other bands (alpha = 74.5%, beta = 75.0%, theta = 75.1%, delta = 75.2%, and gamma = 75.6%; Supplementary Table 4), although this difference was not statistically significant (p = 0.37).
To investigate the accuracy of each algorithm in discriminating between hold and reach states by electrode laterality, we created models using all frequency bands. We chose the electrodes overlying primary motor cortex (C3 and C4) and labeled them according to the hand used to perform the reaching task. That is, if the right hand performed the task, then the C3 electrode was labeled as the ipsi-stimulated electrode while the C4 electrode was labeled as the contra-stimulated electrode, and vice versa. We observed that the mean accuracy of classifying hold versus reach was consistently higher in the contra-stimulated electrode for the sham groups (i.e., HC and CS) except for the pre-stimulation period in the HC sham cohort. In contrast, in the CS stim group, classification accuracy was highest in the ipsi-stimulated electrode during the stimulation periods only (intra5: t[29] = -4.26, p = 0.0003; intra15: t[23] = -3.72, p = 0.0011, Fig. 5A, Supplementary Table 5).
Notably, the classification accuracy was similar in both the contra-stimulated and ipsi-stimulated electrodes in the HC stim group, suggesting that the higher accuracy in the ipsi-stimulated electrode in the CS stim group is likely not driven purely by stimulation artifact. Moreover, the classification accuracy in the ipsi-stimulated electrode in the CS stim group is constant relative to the pre-stimulation state, again suggesting a physiological response to stimulation that peaks at the intra15 time period (Fig. 5A, Supplementary Table 5).
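The electrode labelling rule used for the laterality analysis (right hand: C3 ipsi-stimulated and C4 contra-stimulated; left hand: the reverse) can be written as a small lookup. This is a sketch for illustration; the function name `label_motor_electrodes` and the dictionary keys are hypothetical and do not come from the study code.

```python
def label_motor_electrodes(task_hand):
    """Map the hand used for the reaching task to C3/C4 laterality labels.

    Follows the rule stated in the text: if the right hand performed the
    task, C3 is the ipsi-stimulated electrode and C4 the contra-stimulated
    electrode; for the left hand the labels are swapped.
    """
    hand = task_hand.strip().lower()
    if hand == "right":
        return {"ipsi_stimulated": "C3", "contra_stimulated": "C4"}
    if hand == "left":
        return {"ipsi_stimulated": "C4", "contra_stimulated": "C3"}
    raise ValueError(f"task_hand must be 'left' or 'right', got {task_hand!r}")


right_labels = label_motor_electrodes("right")
left_labels = label_motor_electrodes("Left")
print(right_labels, left_labels)
```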