Of the 4961 study participants, 1704 (34.3%) experienced one or more rehospitalization events during the first post-injury year. Of those 1704 rehospitalized participants, 421 (24.7%) had a prolonged stay (≥ 17 days), 1254 (73.6%) had a non-prolonged stay, and the remaining 29 (1.7%) were excluded from analysis because their LOS was unknown.
Table 1 displays the demographic characteristics of rehospitalized versus non-rehospitalized participants. The mean age was about 42 years, 79% were male and about 30% were of minority race. The majority were discharged to a private residence. About two thirds of each group were working, and a similar proportion had a high school degree or less.
The mean number of rehospitalization events during the first year after injury was 1.7 ± 1.1, ranging from 1 to ≥ 7. More than one third of rehospitalized participants had at least one readmission during the first year post-injury because of UTI (36.9%), followed by PI (11.3%) (Table 1).
Table 2 shows the performance of the seven ML classification models for predicting rehospitalization. In terms of accuracy, the least accurate models were SVM, DT and NB, followed by LR and then the ensemble models (RF, AdaBoost and XGBoost). In terms of sensitivity, SVM was the least able to correctly identify rehospitalized patients (61.4%), while NB was the most able (79.9%). Specificity was higher than sensitivity for all models except DT and NB. The lowest specificity was for NB (52.6%), while the highest was for RF (80.7%).
Most evaluation metrics were highest for the RF model, making it the best-performing model. After RFE was applied to the RF model, accuracy and sensitivity increased, while the other evaluation metrics remained almost the same or decreased slightly.
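For reference, accuracy, sensitivity, specificity and F1 score can all be computed from the four cells of a binary confusion matrix. The following is a minimal Python sketch of those standard definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study data.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall: share of actual positives that were detected
    specificity = tn / (tn + fp)  # share of actual negatives that were correctly rejected
    precision = tp / (tp + fp)    # share of predicted positives that were correct
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts for illustration only (not the study's confusion matrix).
example = classification_metrics(tp=70, fp=20, tn=80, fn=30)
for name, value in example.items():
    print(f"{name}: {value:.1%}")
```

With these made-up counts, sensitivity is 70% and specificity is 80%, which shows how a model can be better at ruling out the outcome than at detecting it.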
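For the prolonged LOS model, Table 3 shows that the lowest AUC was for DT and NB (55.1% each), followed by XGBoost (58.0%), SVM (58.4%), RF (59.5%) and LR (62.1%). AdaBoost had an AUC of 61.8% with the full feature set and the highest AUC overall (64.7%) after RFE. Applying RFE to the AdaBoost model increased sensitivity, specificity and AUC, while accuracy decreased from 69.6% to 66.9%.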
Figure 1 shows the confusion matrices for the best-performing models for rehospitalization and prolonged LOS prediction after RFE.
The variables selected by RFE dimensionality reduction for building the RF rehospitalization model and the AdaBoost prolonged LOS model are summarized in Table 4.
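As an illustration of the general idea behind RFE, the sketch below repeatedly re-scores the remaining features, drops the weakest one, and assigns rank 1 to the features that survive. It is a toy under stated assumptions, not the study's pipeline: the feature names `x1` to `x5`, the `toy_importance` function and its scores are hypothetical stand-ins for refitting a real model (such as RF or AdaBoost) at every round.

```python
from typing import Callable, Dict, List, Tuple


def recursive_feature_elimination(
    features: List[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    n_keep: int,
) -> Tuple[List[str], Dict[str, int]]:
    """Drop the least important feature one at a time until n_keep remain."""
    remaining = list(features)
    ranking: Dict[str, int] = {}
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)  # re-score using only the remaining features
        weakest = min(remaining, key=lambda name: scores[name])
        ranking[weakest] = len(remaining) - n_keep + 1  # earlier removal -> larger rank
        remaining.remove(weakest)
    for name in remaining:
        ranking[name] = 1  # surviving features are the "rank-one" variables
    return remaining, ranking


# Hypothetical setup: x1 and x2 are redundant and split the credit for one signal.
SIGNAL_GROUP = {"x1": "g12", "x2": "g12", "x3": "g3", "x4": "g4", "x5": "g5"}
GROUP_SIGNAL = {"g12": 0.40, "g3": 0.30, "g4": 0.25, "g5": 0.05}


def toy_importance(remaining: List[str]) -> Dict[str, float]:
    """Stand-in for model refitting: a group's signal is shared among its members."""
    scores = {}
    for name in remaining:
        group = SIGNAL_GROUP[name]
        sharers = sum(1 for other in remaining if SIGNAL_GROUP[other] == group)
        scores[name] = GROUP_SIGNAL[group] / sharers
    return scores


kept, ranking = recursive_feature_elimination(
    ["x1", "x2", "x3", "x4", "x5"], toy_importance, n_keep=2
)
print(kept)     # ['x2', 'x3']
print(ranking)
```

In this toy, `x2` survives only because its score is recomputed after its redundant partner `x1` is removed; a single one-shot ranking would have kept `x3` and `x4` instead, which is the motivation for re-scoring at each round.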
Table (1) Demographic characteristics of study participants.
| Socio-demographic characteristics at time of injury | Non-rehospitalized (n = 3257) | Rehospitalized (n = 1704) | Test of significance** (p-value) |
|---|---|---|---|
| Age at time of injury | | | |
| Mean (SD) | 41.5 (18.0) | 42.8 (17.4) | 0.003* |
| Median [IQR] | 39.0 [25, 56] | 42.0 [27, 57] | |
| Sex | | | |
| Male | 2573 (79.0) | 1353 (79.4) | 0.914 |
| Female | 681 (20.9) | 350 (20.5) | |
| Other, Transgender | 3 (0.1) | 1 (0.1) | |
| Place of residence at discharge | | | |
| Private | 3017 (92.6) | 1484 (87.1) | < 0.001* |
| Nursing Home | 158 (4.9) | 159 (9.3) | |
| Hospital | 43 (1.3) | 30 (1.8) | |
| Assisted Living | 12 (0.4) | 13 (0.8) | |
| Other | 12 (0.4) | 6 (0.4) | |
| Race | | | |
| White | 2264 (69.5) | 1180 (69.2) | 0.152 |
| Black | 726 (22.3) | 407 (23.9) | |
| Other | 240 (7.4) | 100 (5.9) | |
| Marital status | | | |
| Single | 1493 (45.8) | 706 (41.4) | 0.001* |
| Married | 1229 (37.7) | 652 (38.3) | |
| Other | 528 (16.2) | 338 (19.8) | |
| Type of insurance | | | |
| Private | 1733 (53.2) | 824 (48.4) | 0.004* |
| Medicare/Medicaid | 1087 (33.4) | 649 (38.1) | |
| Other | 382 (11.7) | 196 (11.5) | |
| Occupation | | | |
| Working | 2127 (65.3) | 1082 (63.5) | < 0.001* |
| Not working | 768 (23.6) | 471 (27.6) | |
| Student | 305 (9.4) | 105 (6.2) | |
| Other | 47 (1.4) | 38 (2.2) | |
| Level of education | | | |
| High school or less | 2111 (64.8) | 1150 (67.5) | 0.120 |
| Associate or Bachelor | 799 (24.5) | 366 (21.5) | |
| Postgraduate studies | 277 (8.5) | 149 (8.7) | |
| Other/unknown | 70 (2.1) | 39 (2.3) | |
| Reasons of rehospitalization*** | | | |
| Urinary tract infection | - | 630 (36.9) | - |
| Pressure injuries | - | 192 (11.3) | - |
| Other causes | - | 882 (51.8) | - |
| Total LOS (n = 1675)**** | | | |
| Mean (SD) | - | 17.5 (36.2) | - |
| Median [IQR] | - | 7 [3, 17] | - |

*Statistically significant.
**Test of significance: Mann-Whitney test for age, Monte Carlo test for sex, and Chi-square test for the remaining variables.
***Reasons of rehospitalization were derived by checking the reason for hospital admission at each rehospitalization event in the first year post-injury and merging them into one variable.
****Total LOS information is unknown for 29 participants.
Table (2) Performance of machine learning models in the prediction of rehospitalization during the first-year post TSCI
| Classification model | Accuracy score (5-fold cross-validation) | F1 score | Sensitivity (Recall) | Specificity | Area under curve (AUC) |
|---|---|---|---|---|---|
| Support Vector Machine | 63.4 ± 0.6% | 62.5% | 61.4% | 63.7% | 62.6% |
| Decision Tree | 63.0 ± 2.1% | 66.4% | 67.0% | 64.2% | 65.6% |
| Naïve Bayes | 59.2 ± 2.4% | 70.7% | 79.9% | 52.6% | 66.3% |
| Logistic regression | 69.8 ± 2.7% | 67.6% | 63.2% | 75.2% | 69.3% |
| Extreme gradient boost (XG Boost) | 72.7 ± 1.0% | 73.2% | 69.4% | 80.1% | 74.4% |
| AdaBoost | 74.2 ± 2.0% | 73.6% | 69.4% | 78.7% | 74.8% |
| Random forest* | | | | | |
| Full model (53) | 72.0 ± 2.1% | 75.8% | 72.5% | 80.7% | 76.6% |
| Reduced model** (26) | 75.5 ± 1.1% | 75.7% | 73.4% | 79.0% | 76.2% |

* The best model to be used is the Random Forest classification model
**Reduced after doing dimensionality reduction using Recursive Feature Elimination (RFE)
Table (3) Performance of machine learning models in the prediction of prolonged stay during rehospitalization in the first year post TSCI.
| Classification model | Accuracy score (5-fold cross-validation) | Sensitivity (Recall) | Specificity | Area under curve (AUC) |
|---|---|---|---|---|
| Decision Tree | 61.8 ± 5.9% | 30.3% | 78.4% | 55.1% |
| Support Vector Machine | 77.3 ± 0.6% | 43.4% | 64.1% | 58.4% |
| Naïve Bayes | 30.1 ± 3.2% | 14.5% | 91.9% | 55.1% |
| Logistic regression | 74.3 ± 4.5% | 53.9% | 68.3% | 62.1% |
| Random forest | 77.3 ± 0.6% | 21.1% | 93.8% | 59.5% |
| Extreme gradient boost (XG Boost) | 70.1 ± 2.8% | 34.2% | 88.4% | 58.0% |
| AdaBoost* | | | | |
| Full model (55) | 69.6 ± 2.4% | 55.3% | 68.3% | 61.8% |
| Reduced model** (27) | 66.9 ± 2.0% | 59.2% | 70.3% | 64.7% |

* The best model to be used is the AdaBoost classification model
**Reduced after doing dimensionality reduction using Recursive Feature Elimination (RFE)
Table (4): The rank-one variables in RF Rehospitalization and Adaboost prolonged LOS models after Recursive feature elimination.
| Variables | RF Rehospitalization model | Adaboost prolonged LOS model |
|---|---|---|
| Socio-demographic variables at time of injury | | |
| Marital status: Married | √ | - |
| Marital status: Single | √ | - |
| Residence: Private | - | √ |
| Residence: Hospital | - | √ |
| Residence: Assisted living | - | √ |
| Gender | √ | √ |
| Family income | √ | √ |
| Type of insurance: Medicare/Medicaid | √ | - |
| Type of insurance: Private | √ | - |
| Age at time of injury | √ | √ |
| Occupation: Working | √ | - |
| Education: High school or less | √ | - |
| Education: Associate or Bachelor | √ | - |
| Clinical and neurological assessment | | |
| Body mass index (BMI) | √ | √ |
| Number of days from injury to initial rehabilitation admission | √ | √ |
| Etiology of trauma: Vehicular | √ | √ |
| Etiology of trauma: Fall or hit by flying object | √ | - |
| Etiology of trauma: Violence | - | √ |
| Etiology of trauma: Sport/Recreation | - | √ |
| Performance of spinal surgery during initial hospitalization | √ | √ |
| Associated vertebral injury | √ | √ |
| Associated injuries with SCI event | √ | √ |
| Level of spinal injury: Lumbar | - | √ |
| Level of spinal injury: Sacral | - | √ |
| Anal reflexes: Voluntary anal contraction | √ | √ |
| Anal reflexes: Deep anal pressure | √ | - |
| ASIA motor score | √ | √ |
| ASIA sensory scores: Pin Prick | √ | √ |
| ASIA sensory scores: Light Touch | √ | √ |
| Bladder management using intermittent catheterization program | √ | - |
| Functional Independence Measure (FIM) total motor score | √ | √ |
| Medical history one-year prior to injury | | |
| Diagnosis of depression by a health professional prior to the SCI | √ | √ |
| Anxiety diagnosed by a health professional: Panic disorder | - | √ |
| Anxiety diagnosed by a health professional: PTSD | - | √ |
| Anxiety diagnosed by a health professional: Multiple diagnosis | - | √ |
| Number of times a participant drank alcohol during the year before spinal cord injury | √ | - |
| Reasons of rehospitalization | | |
| Pressure injuries (PI) | N/A | √ |
| Urinary tract infection (UTI) | N/A | √ |
The SHAP values of the most important variables in model prediction were plotted (Figures 2–3). The summary plot ranks the variables by SHAP value magnitude in descending order and illustrates how the top 15 variables affected the model output. Feature values are color-coded: red indicates high values and blue indicates low values. A SHAP value to the right of the midline denotes a positive impact on the prediction, while a value to the left denotes a negative impact. The plot thus displays the variables in order of predictive influence and shows how different values of each variable influenced the predicted outcome (26).
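The ordering used in such a summary plot can be sketched as ranking features by their mean absolute SHAP value across participants. The snippet below is a minimal pure-Python illustration of that ranking step only; the function name `rank_by_mean_abs_shap`, the feature names and the SHAP values are made up for the example and do not come from the study.

```python
from typing import List, Sequence, Tuple


def rank_by_mean_abs_shap(
    shap_rows: Sequence[Sequence[float]],
    feature_names: Sequence[str],
    top_k: int = 15,
) -> List[Tuple[str, float]]:
    """Return the top_k features ordered by mean |SHAP| (largest first)."""
    n_rows = len(shap_rows)
    mean_abs = {
        name: sum(abs(row[j]) for row in shap_rows) / n_rows
        for j, name in enumerate(feature_names)
    }
    ordered = sorted(mean_abs.items(), key=lambda item: item[1], reverse=True)
    return ordered[:top_k]


# Hypothetical SHAP values: one row per participant, one column per feature.
names = ["feature_a", "feature_b", "feature_c"]
shap_values = [
    [0.2, -0.5, 0.0],
    [-0.1, 0.4, 0.1],
    [0.3, -0.6, -0.1],
]

top = rank_by_mean_abs_shap(shap_values, names, top_k=2)
for name, score in top:
    print(f"{name}: mean |SHAP| = {score:.3f}")
```

Note that the ranking uses magnitudes only: `feature_b` comes first even though most of its values are negative, since the sign indicates the direction of the effect on the prediction rather than its importance.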
Figure 3 shows the 15 variables with the highest SHAP values for rehospitalization prediction. Among the sociodemographic characteristics, higher family income, older age at time of injury, working, holding a bachelor's degree, having private insurance and being single were associated with a lower risk of rehospitalization. Among the clinical and neurological assessments, a higher FIM total motor score, higher ASIA motor and sensory scores and preservation of deep anal pressure were also associated with reduced risk.
In the rehospitalized group, Fig. 4 shows that the presence of PI as a reason for rehospitalization was a top predictor. As with the rehospitalization predictors, FIM and ASIA scores, age at time of injury, BMI, frequency of alcohol drinking in the year before injury and the time from injury to initial rehabilitation admission were important predictors.