In this study, we included 82 patients with cervical spinal cord injury (SCI). We found that the feature fusion model had greater predictive capability than models built on radiomics or CNN features alone. The Rad + ResNet-50 model had the highest prognostic predictive value for cervical SCI, with AUCs of 0.940 and 0.922 in the training and test cohorts, respectively.
Contributions of This Study and Comparison with Previous Studies
Functional prognosis after cervical spinal cord injury (SCI), particularly regaining functional independence, is a primary focus in rehabilitation. Predicting recovery after SCI is challenging, but the ASIA grade is a key benchmark for assessing clinical recovery and long-term prognosis [22]. In our study, 82 cervical SCI patients were divided into two groups according to whether their ASIA grade improved after one year. Table 1 shows the preoperative and postoperative ASIA grades, with p < 0.05 indicating a statistically significant difference. Yann Facchinello et al. likewise highlighted that the severity of the neurological deficit at admission, as indicated by the ASIA grade, is a critical predictor of recovery [23].
In addition to the ASIA score, MRI is crucial for prognosis, with MRI-derived metrics shown to be reproducible and predictive of clinical outcomes [24]. Hyun-Joon Yoo et al. used machine learning algorithms to build a prediction model for SCI patients based solely on clinical variables, achieving promising results [25]. However, their approach did not incorporate the prognostic value of imaging, particularly MRI.
The BASIC (Brain and Spinal Injury Center) score, developed to predict neurological improvement from high signal intensity on axial T2-weighted MRI, has proven effective [10, 25]. However, this score relies on a limited set of manually measured MRI features, such as injury length and spinal canal compromise, and overlooks many other potentially informative features.
Our study leverages deep learning radiomics to extract and model MRI features. This approach reduces the risk of missing informative features, as evidenced by the improved performance of the combined model (Rad + ResNet-50, test AUC 0.922).
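As an illustration of how feature-level fusion can be implemented, the following minimal sketch concatenates handcrafted radiomics features with deep features and fits a logistic-regression classifier; the synthetic arrays, feature dimensions, and choice of classifier are assumptions for this example rather than the exact pipeline used in this study.

```python
# Minimal sketch of feature-level fusion ("Rad + ResNet-50"); all data here are
# synthetic and the classifier choice is an illustrative assumption.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_patients = 82
hcr = rng.normal(size=(n_patients, 30))    # handcrafted radiomics (HCR) features
dtl = rng.normal(size=(n_patients, 128))   # deep transfer learning (DTL) features
y = rng.integers(0, 2, size=n_patients)    # 1 = ASIA grade improved at 1 year

fused = np.hstack([hcr, dtl])              # simple concatenation of the two feature sets

clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
auc = cross_val_score(clf, fused, y, cv=5, scoring="roc_auc").mean()
print(f"cross-validated AUC on synthetic data: {auc:.3f}")
```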
We noted that, in another deep learning study, Okimatsu and colleagues developed a CNN model to quantify radiographic characteristics and predict 1-month neurological outcomes in acute SCI patients, achieving 71% accuracy [26]. We extended the follow-up period to one year and achieved higher predictive performance (AUC of 0.922 in the test cohort). Integrating handcrafted radiomics (HCR) and deep transfer learning (DTL) features for cervical SCI prognosis is novel, and our approach relies on standard image data without special training, highlighting its potential.
Training deep learning models on small datasets has inherent limitations, although deep transfer learning can mitigate them. André Wirries suggested that training deep learning models on small datasets for clinical applications is practical [27]. To increase data diversity, we applied random shifts, rotations, and horizontal flips during training [28]. Expanding the effective dataset and controlling the number of features further improved results, and we implemented rigorous feature selection in our experiments [29, 30].
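As a sketch of the augmentation described above, the following transform pipeline applies random shifts, rotations, and horizontal flips; the specific parameter ranges, input size, and use of torchvision are illustrative assumptions rather than the study's exact settings.

```python
# Illustrative training-time augmentation: random shift, rotation, horizontal flip.
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.Resize((224, 224)),                               # resize slices to the network input size
    transforms.RandomAffine(degrees=10, translate=(0.1, 0.1)),   # random rotation and shift
    transforms.RandomHorizontalFlip(p=0.5),                      # random horizontal flip
    transforms.ToTensor(),
])
```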
ResNet's architecture uses shortcut connections that fuse identity mappings with residual blocks, making deep networks easier to train effectively [31]. To evaluate different depths, we tested ResNet-18, ResNet-50, and ResNet-101. In both the deep learning and combined models, ResNet-50 performed best. Despite having fewer layers and higher Top-1 and Top-5 error rates on ImageNet than ResNet-101 [31], ResNet-50's moderate depth proved advantageous in this study. This indicates that, although deeper networks may offer greater learning capacity, choosing an appropriate network depth for a specific task can yield better model fitting as well as other benefits, such as reduced computational cost and training time. Additionally, ResNet-18 underperformed compared with ResNet-50, suggesting that shallower ResNet variants may lack sufficient fitting capacity.
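For illustration, a pretrained ResNet-50 can serve as a deep-feature extractor roughly as follows; the ImageNet weights, the 224 x 224 three-channel input, and taking features just before the final fully connected layer are assumptions of this sketch (using the newer torchvision weights API), not a description of the study's exact implementation.

```python
# Sketch of deep transfer learning feature extraction with a pretrained ResNet-50.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
feature_extractor = nn.Sequential(*list(backbone.children())[:-1])  # drop the final fc layer
feature_extractor.eval()

with torch.no_grad():
    x = torch.randn(1, 3, 224, 224)                    # one preprocessed MRI slice (placeholder)
    dtl_features = feature_extractor(x).flatten(1)     # 2048-dimensional deep feature vector
```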
Deep Learning Model Interpretability
Although deep learning shows promise in disease diagnosis and prognosis, the limited interpretability of these models can hinder their broader application [32, 33]. Some studies use post-hoc methods or supervised machine learning models to interpret the outputs of deep learning algorithms [34]. For example, Yixin Wang visualized SVM features using a SHAP-based method [34]. In our study, we generated Grad-CAM images to enhance model interpretability, similar to how Kim and colleagues used heatmaps to identify regions crucial for prognosis prediction [35].
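The Grad-CAM computation can be sketched as follows; this is an illustrative re-implementation using forward and backward hooks on the last convolutional block of ResNet-50, not the code used in this study, and the input shape, target layer, and class-score choice are assumptions.

```python
# Minimal Grad-CAM sketch: gradient-weighted activation map from ResNet-50's last conv block.
import torch
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1).eval()
activations, gradients = {}, {}

def fwd_hook(module, inp, out):
    activations["value"] = out.detach()          # feature maps of the target layer

def bwd_hook(module, grad_in, grad_out):
    gradients["value"] = grad_out[0].detach()    # gradients flowing back into the target layer

target_layer = model.layer4[-1]
target_layer.register_forward_hook(fwd_hook)
target_layer.register_full_backward_hook(bwd_hook)

x = torch.randn(1, 3, 224, 224)                  # one preprocessed slice (placeholder)
score = model(x)[0].max()                        # score of the predicted class
model.zero_grad()
score.backward()

weights = gradients["value"].mean(dim=(2, 3), keepdim=True)    # global-average-pooled gradients
cam = torch.relu((weights * activations["value"]).sum(dim=1))  # weighted sum of activation maps
cam = cam / (cam.max() + 1e-8)                                 # normalize to [0, 1] for heatmap overlay
```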
Limitations
This study has several limitations that should be further studied and addressed in future work. First, the cervical SCI sample size was small; although our findings reflect the predictive ability of deep learning features to a certain extent, a larger dataset would be more convincing. Second, this was a retrospective study, and more prospective data are needed to verify the effectiveness of the model.