In this study, we proposed a three-step approach to assess the outcome of an AI model that was established to predict lower extremity joint moments during walking in patients with CP 10, with the aim of evaluating the model’s feasibility for clinical routine use. In the first step, we calculated the error (RMSE) between the measured and the corresponding predicted moments. Next, we proposed new thresholds to classify the moments into three labels (Green: acceptable, Yellow: acceptable with caution, and Red: unacceptable) based on the computed error. Additionally, we grouped the AI model inputs (joint kinematics during walking in the sagittal plane) according to their corresponding joint moment labels in order to investigate how changes in kinematics affect the predicted moments. In the last step, we attempted to establish an LDA model to predict the label of a joint moment newly predicted by the AI model.
With significant differences in RMSE (P values < 0.001) between all labels (Table 1), our proposed thresholds (LKL and UKL) successfully classified the joint moments in the presence of deformities (Fig. 2). The advantage of this labeling is that LKL and UKL consider not only the measurement error but also clinical relevance. To the best of our knowledge, this is the first time that such thresholds have been introduced for evaluating the validity of gait kinetic measures.
Technically, if the relative error between the predicted and measured moment (e.g., RMSE) is below a certain threshold (LKL), we expect it to have no (or only a slight) influence on our interpretation of the predicted moment. Assessed individually, the AI model performed best for the knee joint moment, with 73% labeled as Green (acceptable) and only 4% as Red (unacceptable). Conversely, for the ankle joint moment, with 14% labeled as Red and only 34% as Green, the performance could be considered poor. For the hip joint moment, the population was more evenly distributed between Green and Yellow (44% and 49%, respectively), with only 7% labeled as Red, suggesting moderate performance. Considering the average results for the entire training group population (50.6% Green, 41.3% Yellow, and 8.3% Red), the overall performance of the AI model was rated as moderate.
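The error computation and labeling described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the function names are hypothetical, the threshold values must be supplied by the caller, and the handling of values exactly at LKL or UKL is an assumption, since the text only states that an error below LKL is considered acceptable.

```python
import math


def rmse(measured, predicted):
    """Root-mean-square error between two equally long moment curves."""
    if len(measured) != len(predicted) or len(measured) == 0:
        raise ValueError("curves must be non-empty and of equal length")
    squared = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def label_moment(error, lkl, ukl):
    """Map an error value to a color label using two thresholds.

    Assumption: error < LKL is Green, LKL <= error <= UKL is Yellow,
    and error > UKL is Red. The boundary handling is illustrative only.
    """
    if lkl > ukl:
        raise ValueError("LKL must not exceed UKL")
    if error < lkl:
        return "Green"
    if error <= ukl:
        return "Yellow"
    return "Red"
```

In such a sketch, each predicted joint moment curve would be passed through `rmse` together with its measured counterpart, and the resulting error would be passed to `label_moment` with the joint-specific thresholds.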
The results for the kinematic features presented in Table 2 were in line with the joint moment label results. For the hip moment, the kinematic differences between labels could be distinguished by six features (P values < 0.05), with ROM pelvic tilt as the most prominent feature, having the lowest P values between labels (< 0.001). This finding agrees with the work of Wolf et al. 19, who identified pelvic tilt ROM during stance as the most relevant feature out of more than 3000 features to characterize CP gait. It is noteworthy that ROM pelvic tilt increased with the severity of the patients' conditions (from 6.1 to 9.5 degrees from Green to Red). The most prominent feature for the knee moment was MAX knee flexion (Green to Red: 34.8 to 51.9 degrees). This feature, along with MEAN knee flexion (Green to Red: 19.8 to 41.4 degrees), indicated a shift from mild to severe crouch gait in patients with CP 20. For the ankle moment, no feature exhibited a significant difference for Yellow vs. Red.
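As an illustration of the feature types named above (ROM, MAX, and MEAN), the following Python sketch derives them from a single joint angle curve. The function name is hypothetical, and the sketch assumes that each feature is computed over whatever portion of the gait cycle is passed in; the exact window used for each feature in this study is not restated here.

```python
def kinematic_features(angle_curve):
    """Summarize one joint angle curve (in degrees) by three scalar features.

    Assumption: the curve covers the window of interest, for example
    one full gait cycle or the stance phase only.
    """
    if len(angle_curve) == 0:
        raise ValueError("angle curve must not be empty")
    highest = max(angle_curve)
    lowest = min(angle_curve)
    return {
        "ROM": highest - lowest,                     # range of motion
        "MAX": highest,                              # peak angle
        "MEAN": sum(angle_curve) / len(angle_curve), # average angle
    }
```

Grouping these scalar features by the Green, Yellow, and Red labels of the corresponding joint moment gives the kind of between-label comparison reported in Table 2.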
Another prominent feature (P value < 0.001) was the GPS value. The averaged GPS values increased significantly from Green to Red (from 12.7 to 16.4 degrees, 12.9 to 19.9 degrees, and 12.7 to 14 degrees for the hip, knee, and ankle moments, respectively), except for the ankle moment in the Yellow vs. Red comparison. The GPS values indicated that the kinematics tended to deviate further from the typically developed reference from Green to Red for all joint moments. Overall, comparing the kinematic results with the distribution of labels, it can be concluded that with more severe deformities and a more strongly varying gait pattern, the probability of the predicted joint moment being labeled as Red increased.
A linear discriminant model was established to investigate whether the AI model could be applied in clinical routine use. Since the model accuracy, defined as the ratio of all correctly predicted labels to the test population, is a very general measure, the label sensitivity is presented in Table 3 as the probability of correctly predicted labels within each true label population. In line with the previously discussed results, the LDA model for the knee joint moment had the highest accuracy (75%) with a Green sensitivity of 94.7%, while the performance for the ankle joint moment was poor. Additionally, both the model accuracy and the label sensitivity for all joint moments increased significantly when the Green and Yellow labels were combined. Overall, the Green and Red labels can be accepted as they are for all joint moments. For the knee joint moment, Yellow can be accepted with good confidence, considering that only a few cases may belong to Red, while for the ankle joint moment, the Yellow label should be considered with high caution, as it indicates a high tendency toward Red.
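The two evaluation measures defined above, and the effect of combining Green and Yellow, can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the evaluation code of this study: the function names and the merged label name "Green/Yellow" are hypothetical.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Ratio of all correctly predicted labels to the test population."""
    if len(true_labels) != len(predicted_labels) or len(true_labels) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def label_sensitivity(true_labels, predicted_labels):
    """Per-label share of correct predictions within each true-label group."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be of equal length")
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


def merge_green_yellow(labels):
    """Collapse Green and Yellow into one label (hypothetical name)."""
    return ["Green/Yellow" if label in ("Green", "Yellow") else label
            for label in labels]
```

Applying `merge_green_yellow` to both the true and the predicted labels before calling `accuracy` and `label_sensitivity` reproduces the two-label comparison: confusions between Green and Yellow no longer count as errors, which is one reason both measures can rise after merging.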
There are limitations to this study. The AI model used kinetics and kinematics in all three planes, while for this study, only the sagittal plane moments and angles were considered. This limitation may affect the accuracy of the LDA model, indicating a scope for improvement in the label predictor model. Future work should focus on enhancing the AI model to achieve better predictions of joint moments in the presence of severe deformities. Additionally, developing the LDA model to incorporate kinetics and kinematics of all planes would be beneficial.
Overall, the three-step assessment based on labeling the joint moments appears to be a helpful approach. The three-color coded system is simple yet practical for classifying the data. Using the AI model to predict the joint moments in the sagittal plane can be recommended for patients with CP of mild or moderate severity, particularly for the knee and hip joint moments. However, the general performance is still rated as moderate. Although AI models can be time- and cost-effective and can facilitate clinical applications, they still require further development before they can be considered an adequate substitute for daily clinical gait measurement routines.