We begin by laying out the structure of our results. Subjects came into the clinic and performed routine activities and clinical assessments while wearing a motion capture suit. We analyzed the movement data and extracted kinematic features from the suit recordings. We regressed these kinematic features against the clinical scales and used them to predict future disease progression. We then used the kinematic features to predict FXN gene expression, a key FA molecular biomarker. Finally, we compared the longitudinal predictions from our features with predictions from the clinical scores of a large cohort, and show here that our features predict an individual's disease progression better than the clinical scales do, which we attribute to the richer data source and more informative features.
Nine FA patients and 3 healthy controls participated in the study. We used a motion-capture suit to record the behavior of both groups. All subjects stayed a day and a night at our clinical research facility, performing a range of standardized clinical tasks and wearing the suit during their daily routine in the clinic; none of the subjects had issues with wearing the suit. To monitor FA disease progression longitudinally, the trial comprised 4 clinical measurement points: the day-1 visit, 3 weeks, 3 months, and 9 months.
To benchmark how FA patients' movement differs from normal, we analyzed two sub-assessments of the SCAFI scale: the 8 Meter Walk (8MW) and the 9 Hole Peg Test (9HPT). Currently, clinicians use only the duration of these activities, a crude measure, to quantify disease severity [21]. We defined a series of kinematic features of patient performance (see Table 1) that can be used to objectively distinguish the behavior of FA patients from that of the control population. More specifically, for the 8MW we focused on full-body kinematics, whereas for the 9HPT we focused on upper-body kinematics, since subjects were seated during the task (Fig. 1c-e). The latter is important because current scales exhibit ceiling effects [9], which result in the exclusion of wheelchair-bound patients from clinical trials. These features were inspired by the currently used clinical scales, standard gait analysis methods, work on other neurodegenerative diseases with similar movement disorders [32–37], and direct clinical experience. Several of these features are intuitively meaningful, perhaps even visible by eye, while others capture complex and subtle spatiotemporal patterns that may escape even very experienced clinicians. A detailed explanation of all these features is presented in the supplementary methods section.
Table 1. Suit features from the 8MW task and 9HPT task which were used to train the Gaussian Process regression algorithm against the clinical scales.

| ID | Feature name | Description | Num. of features |
| --- | --- | --- | --- |
| Suit features from the 8MW task | | | |
| F1 | Coefficient of variation in the walk cycle duration | Ratio of the standard deviation to the mean of the walk cycle duration | 1 |
| F2 | Workspace probability density volume & entropy | Volume occupied by the joints calculated using the 3D location of the joints | 2 |
| F3 | Lower body joint variability | Average variability of the hip and knee joint velocities | 6 |
| F4 | Walk autocorrelation & decay | Peaks of the three principal components and delay of the autocorrelation of the joint angular velocities | 4 |
| F5 | Channel-delay cross-correlation | Eigen spectrum values (1, 5 & 35) of the channel-delay cross-correlation matrix | 3 |
| F6 | Extremities velocity | Average peak velocities of the extremities (wrists and ankles) | 4 |
| F7 | Walk complexity | Human movement complexity metric and degrees of freedom to explain 90% variance | 2 |
| F8 | Leg’s root mean square power spectrum | Average energy per walk cycle of the hip and knee joint velocities | 6 |
| F9 | Joint velocities correlation coefficient | Pearson’s correlation coefficients between lower body joints | 11 |
| F10 | Head - spine movement plane area | Area and variability of the head movements on the frontal and sideways plane | 3 |
| Suit features from the 9HPT task | | | |
| F11 | Average upper body joint velocity | Average joint angular velocities of the shoulder and elbow joints | 5 |
| F12 | Upper body complexity | Human movement complexity metric and degrees of freedom to explain 90% variance of the upper body joint velocities | 2 |
| F13 | Workspace probability density volume & entropy | Volume occupied by the joints calculated using the 3D location of the joints of the upper body | 2 |
| F14 | Upper body autocorrelation full-width at half-maximum | The width of the autocorrelation curve (of the joint angular velocities of the upper body joints) at the point when it reaches a value of 0.5 | 5 |
| F15 | Channel-delay cross-correlation | Eigen spectrum values (1, 5, 30 & 300) of the channel-delay cross-correlation matrix | 4 |
| F16 | Arm root mean square power spectrum | Average energy of the shoulder and elbow joint angular velocities | 5 |
| F17 | Wrist average velocity | Average velocities of the wrist in space | 1 |
| F18 | Upper body joints’ correlations | Pearson’s correlation coefficients between upper body joints | 3 |
| F19 | Logistic fit on upper body joints’ velocity | Scale parameter of the logistic distribution of upper body joint’s angular velocities and wrist’s velocity in space | 9 |
| F20 | Head spine movement plane area | Area of the head movements on the frontal and sideways plane | 1 |
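To make the simplest entry of Table 1 concrete, the sketch below computes feature F1 as described there: the ratio of the standard deviation to the mean of the walk cycle duration. It is an illustration only. The function name, the use of the sample standard deviation, and the cycle durations are assumptions and are not taken from the study.

```python
import statistics


def coefficient_of_variation(cycle_durations):
    """Ratio of the standard deviation to the mean of the walk cycle durations.

    Assumption: the sample standard deviation (n - 1 denominator) is used;
    the source does not say which estimator was applied.
    """
    if len(cycle_durations) < 2:
        raise ValueError("need at least two walk cycles")
    mean = statistics.fmean(cycle_durations)
    return statistics.stdev(cycle_durations) / mean


# Hypothetical walk cycle durations in seconds (not study data).
steady_walk = [1.00, 1.02, 0.98, 1.01, 0.99]
irregular_walk = [0.80, 1.30, 0.95, 1.45, 1.00]

cv_steady = coefficient_of_variation(steady_walk)
cv_irregular = coefficient_of_variation(irregular_walk)
print(f"steady CV = {cv_steady:.3f}, irregular CV = {cv_irregular:.3f}")
```

A more irregular gait yields a larger coefficient of variation, which is why a duration-independent measure like this can carry information that the total walk time does not.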
Most of these features differ significantly between FA patients and controls, thereby quantitatively capturing the effects of FA on movement. To develop a clinically useful and improved measure of deterioration, we set out to predict the continuous values of the clinical scales. We calculated Pearson’s correlation coefficient of each feature with the SARA and SCAFI scales; most of the features showed absolute correlations in the range of 0.3–0.5 with the two clinical scales. Because these correlations are only moderate, no single feature can be used on its own to monitor disease progression. However, a more robust prediction can potentially be achieved by combining all these behavioral features, in the same way that individual items are combined in the standard clinical scales.
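The per-feature screening described above can be sketched as follows: compute Pearson's correlation coefficient between each feature and a clinical scale, then inspect the absolute values. The feature names and all numbers below are hypothetical placeholders, not study data.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical SARA scores and feature values, one entry per recording.
sara = [8.0, 12.5, 15.0, 20.0, 24.5, 30.0]
features = {
    "walk_cycle_cv": [0.04, 0.09, 0.06, 0.12, 0.10, 0.15],
    "wrist_avg_velocity": [0.90, 0.80, 0.85, 0.60, 0.70, 0.50],
}

# Absolute correlation of each feature with the clinical scale.
abs_corr = {name: abs(pearson_r(values, sara)) for name, values in features.items()}
for name, value in abs_corr.items():
    print(f"{name}: |r| = {value:.2f}")
```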
The relationships linking our features and the clinical scales are non-linear. Hence, we used Gaussian Process (GP) regression to find the mapping between the extracted behavioral features and the SARA and SCAFI clinical scales. GP regression is a state-of-the-art non-linear regression method that captures predictive uncertainty in a principled manner, even when the data are highly variable [38].
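For readers unfamiliar with GP regression, a minimal sketch of the posterior computation follows. It assumes a squared-exponential (RBF) kernel, fixed hyperparameters, and a constant mean equal to the training average. None of these choices is stated in the text, and the data are hypothetical. A practical analysis would normally use an established library and optimize the hyperparameters.

```python
import math


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel with unit signal variance (an assumed choice)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-0.5 * sq_dist / length_scale ** 2)


def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L


def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(X_train, y_train, x_new, noise_var=0.1, length_scale=1.0):
    """Posterior mean and variance of the latent function at x_new."""
    n = len(X_train)
    y_mean = sum(y_train) / n
    # Kernel matrix of the training inputs plus observation noise on the diagonal.
    K = [[rbf_kernel(X_train[i], X_train[j], length_scale)
          + (noise_var if i == j else 0.0) for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_cholesky(L, [y - y_mean for y in y_train])
    k_star = [rbf_kernel(x, x_new, length_scale) for x in X_train]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_cholesky(L, k_star)
    var = rbf_kernel(x_new, x_new, length_scale) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, max(var, 0.0)  # clip tiny negative values caused by rounding


# Hypothetical one-dimensional feature and clinical score (not study data).
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 0.8, 0.9, 0.1]
mean_near, var_near = gp_predict(X_train, y_train, [1.5])
mean_far, var_far = gp_predict(X_train, y_train, [10.0])
print(mean_near, var_near, mean_far, var_far)
```

The predictive variance grows as the query point moves away from the training data, which is the uncertainty estimate referred to in the text.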
First, we performed a cross-sectional prediction of the clinical scales using the suit features from the corresponding visits; the leave-one-subject-out cross-validated results are shown in Fig. 2. The algorithm achieved a coefficient of determination (R²) of 0.69 and a root mean square error (RMSE) of 2.54 when predicting SARA from the 8MW suit features, and an R² of 0.45 and an RMSE of 5.10 when using the 9HPT suit features. When predicting SCAFI, the algorithm’s performance improved for both the 8MW and 9HPT suit features, with R² values of 0.82 and 0.77, respectively. It should be noted that one subject could not perform the 8MW test and completed only the 9HPT. This establishes that our methodology can also be used to predict the clinical scales for non-ambulatory patients. The challenge with leave-one-subject-out cross-validation is that the algorithm is always tested on a new subject with completely new dynamics. Nevertheless, the suit features still predict the disease state of the patients with good accuracy. The features selected by the feature selection algorithm for predicting SARA and SCAFI are presented in Supplementary Figs. 15 and 16.
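The evaluation protocol can be sketched as below: all recordings of one subject are held out, a model is fitted on the remaining subjects, and the pooled held-out predictions are scored with R² and RMSE. A 1-nearest-neighbour predictor stands in for the GP regressor purely to keep the example short. The record layout, function names, and numbers are assumptions, and R² is taken here as one minus the ratio of residual to total sum of squares.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Coefficient of determination (1 - SS_res / SS_tot) and RMSE."""
    n = len(y_true)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


def nearest_neighbour_predict(train_rows, x):
    """Stand-in model: return the score of the closest training recording."""
    return min(train_rows, key=lambda row: math.dist(row[0], x))[1]


def leave_one_subject_out(records):
    """records: list of (subject_id, feature_vector, clinical_score)."""
    y_true, y_pred = [], []
    for held_out in sorted({subject for subject, _, _ in records}):
        # Every visit of the held-out subject is excluded from training.
        train_rows = [(x, y) for s, x, y in records if s != held_out]
        for s, x, y in records:
            if s == held_out:
                y_true.append(y)
                y_pred.append(nearest_neighbour_predict(train_rows, x))
    return r2_and_rmse(y_true, y_pred)


# Hypothetical recordings: two visits for each of four subjects (not study data).
records = [
    ("S1", [0.05, 0.9], 9.0), ("S1", [0.06, 0.9], 10.0),
    ("S2", [0.08, 0.8], 13.0), ("S2", [0.09, 0.7], 15.0),
    ("S3", [0.12, 0.6], 21.0), ("S3", [0.13, 0.6], 22.5),
    ("S4", [0.10, 0.7], 17.0), ("S4", [0.11, 0.6], 19.0),
]
r2, rmse = leave_one_subject_out(records)
print(f"R2 = {r2:.2f}, RMSE = {rmse:.2f}")
```

Splitting by subject rather than by recording keeps repeated visits of the same person from appearing on both sides of the split, which would otherwise inflate the apparent accuracy.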
Since our patients do not cover the whole range of the clinical scores (0–40), the algorithm predicts SARA scores at the higher end of the scale less accurately (e.g., the two isolated values at the top right corner of the plot). This suggests that the GP regression performance could be improved with a larger dataset. It can also be observed that both the 8MW and 9HPT suit features predict SCAFI better than SARA. This is not surprising: both 8MW and 9HPT are part of the SCAFI test suite, so the suit features of these subtasks are expected to carry more predictive power for the SCAFI score.
We then analyzed how accurately the kinematic features extracted from the 8MW and 9HPT suit data predict the longitudinal disease progression in FA patients, compared with scores obtained from conventional assessment by clinicians (Fig. 3). First, we examined how the clinical scales change over a year as a function of their day-1 value. In Fig. 3a and 3d, we plot the change in the SARA and SCAFI scales, respectively, against their day-1 values for the FA patients of our study and for patients from the EFACTS study [9] (a two-year longitudinal study with a larger cohort).
We used GP regression to predict the month-9 SARA and SCAFI scales of the subjects from our study using the suit features from the day-1 8MW and the day-1 9HPT. We also predicted the month-12 SARA and SCAFI scales of the EFACTS study using the day-1 SARA and SCAFI scores from the EFACTS study. The results of the longitudinal prediction for SARA and SCAFI are presented in Fig. 3b and Fig. 3e, respectively. For the longitudinal predictions of SARA, the day-1 suit features of the 8MW and 9HPT achieved leave-one-subject-out cross-validated RMSEs of 1.70 and 1.16, respectively, compared with an RMSE of 2.31 using day-1 SARA. Likewise, for the longitudinal predictions of SCAFI, the 8MW and 9HPT day-1 suit features outperformed day-1 SCAFI (RMSEs of 0.09 and 0.11 vs 0.25). See Supplementary Fig. 17 for a plot of the R² of the results. This implies that our suit features contain sufficiently rich information not only to score the current disease state of the patient, but also to predict how the disease will evolve. The features selected by the feature selection algorithm for the longitudinal predictions of SARA and SCAFI are presented in Supplementary Figs. 18 and 19.
We then plotted the RMSE of the longitudinal predictions as a function of the number of subjects used to build the machine learning model (Fig. 3c and Fig. 3e). The model using suit features achieved better performance with a small number of subjects (n = 7) than the model using the clinical scales with a larger cohort from the EFACTS study (n = 425). This establishes that a small population is sufficient to build accurate prediction models when using the rich set of suit features, which could substantially reduce the number of patients required in drug development.
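One way to obtain such a curve is sketched below: for each training-set size, repeatedly draw that many subjects at random, fit a model, and average the RMSE on the remaining subjects. The resampling scheme, the straight-line stand-in model, the function names, and the data are all assumptions made for illustration. The text does not describe how the curve in Fig. 3 was computed.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares line, used as a stand-in for the GP regressor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x


def learning_curve(data, sizes, repeats=200, seed=0):
    """data: dict subject_id -> (baseline_feature, follow_up_score).

    Returns a dict mapping each training-set size to the mean held-out RMSE.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    subjects = sorted(data)
    curve = {}
    for n in sizes:
        if not 2 <= n < len(subjects):
            raise ValueError("each size must leave at least one test subject")
        errors = []
        for _ in range(repeats):
            train_ids = rng.sample(subjects, n)
            test_ids = [s for s in subjects if s not in train_ids]
            slope, intercept = fit_line([data[s][0] for s in train_ids],
                                        [data[s][1] for s in train_ids])
            squared = [(data[s][1] - (slope * data[s][0] + intercept)) ** 2
                       for s in test_ids]
            errors.append(math.sqrt(sum(squared) / len(squared)))
        curve[n] = sum(errors) / len(errors)
    return curve


# Hypothetical baseline feature and follow-up score per subject (not study data).
example = {
    "S1": (0.04, 9.5), "S2": (0.06, 12.0), "S3": (0.07, 14.5), "S4": (0.09, 16.0),
    "S5": (0.10, 19.5), "S6": (0.12, 21.0), "S7": (0.14, 25.5), "S8": (0.15, 27.0),
}
for n, value in learning_curve(example, sizes=[2, 4, 6]).items():
    print(f"n = {n}: mean held-out RMSE = {value:.2f}")
```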
FA is caused by a GAA-repeat expansion in the FXN gene, which leads to transcriptional repression of FXN. The length of the GAA repeat has been shown to correlate inversely with age of onset and directly with disease severity. We therefore hypothesized that FXN blood levels might inversely correlate with disease severity. We predicted the FXN mRNA levels of the subjects using 4 sets of predictors: 8MW suit features, 9HPT suit features, SARA, and SCAFI. The leave-one-subject-out cross-validated results are presented in Fig. 4a-d. Scatterplots of the measured versus predicted FXN levels are presented in the first two columns: the first column shows the predictions using the 8MW and 9HPT suit features, and the second column shows the predictions using the SARA and SCAFI clinical scales. The 8MW and 9HPT suit features achieved R² values of 0.60 and 0.53 (RMSEs of 0.53 and 0.62), respectively, under leave-one-subject-out cross-validation. In comparison, both SARA and SCAFI achieved R² values close to zero (with RMSE values of 0.89 and 1.00). (See Supplementary Fig. 20a and 20b for the features selected by the feature selection algorithm for the prediction of FXN, and Supplementary Fig. 20c for a scatter plot of FXN against SARA and SCAFI.)
The total scores of SARA and SCAFI might predict poorly because they contain less information. We reasoned that using the individual components of the SARA and SCAFI scales as predictors would improve the predictive capacity (see Supplementary Fig. 21). Although this led to an improvement (the R² of predictions using the components of SARA increased to 0.33 and that of SCAFI increased to 0.19), the predictions using the suit features of the 8MW (R² of 0.60) and 9HPT (R² of 0.53) still outperform the individual components of SARA and SCAFI in predicting FXN levels.