Study design and population
This prediction score was developed in accordance with the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) guidelines[13]. The study protocol was approved by the Institutional Review Board of Minhang Hospital, affiliated to Fudan University. Written informed consent for data collection and subsequent analysis was obtained from all patients or their guardians. The data that support the findings of this study are available from the corresponding author upon reasonable request.
This was a retrospective study using a prospectively collected, hospital-based dataset. We included all patients with ischemic stroke, confirmed by computed tomography (CT) or magnetic resonance imaging (MRI), who were admitted to our stroke center within 24 hours of symptom onset from January 2018 to July 2019. Seventy percent of the patients were randomly selected as the model development dataset, and the remaining 30% formed the model validation dataset. For patients with an unknown symptom onset time, we defined the time last known well as the onset time. Patients with hemorrhagic stroke, transient ischemic attack, or pre-stroke disability were excluded from this study. An unfavorable outcome at 3 months was defined as a modified Rankin Scale (mRS) score >2. It was evaluated at a routine outpatient visit or by structured telephone interview, conducted by experienced neurologists or trained nurses who were blinded to the patients' records. Patients who were lost to follow-up or who had severe complications during hospitalization were also excluded. Ischemic stroke subtype was classified according to the Trial of Org 10172 in Acute Stroke Treatment (TOAST) criteria by experienced neurologists at our center[14].
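The 70%/30% random split and the outcome definition described above can be illustrated with a short Python sketch. This is not the authors' code: the function names, the fixed seed, and the use of integer patient IDs are assumptions made for illustration, and the source does not state how the randomization was implemented.

```python
import random


def split_cohort(patient_ids, dev_fraction=0.7, seed=2019):
    """Randomly split patient IDs into development and validation sets.

    The seed value is hypothetical; it only makes the split reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_dev = round(len(ids) * dev_fraction)
    return ids[:n_dev], ids[n_dev:]


def unfavorable_outcome(mrs_3_month):
    """Return 1 if the 3-month mRS score is > 2, otherwise 0."""
    return int(mrs_3_month > 2)


# Toy example with 100 made-up patient IDs.
dev_ids, val_ids = split_cohort(range(1, 101))
print(len(dev_ids), len(val_ids))
```

Every patient lands in exactly one of the two sets, so the validation set is never seen during model development.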
Predictors included in the multivariate logistic regression
Stroke severity was assessed on admission with the National Institutes of Health Stroke Scale (NIHSS). Risk factors such as alcohol consumption and smoking were defined as binary variables (0 = No, 1 = Yes); patients who had ever had these habits were coded 1 (Yes), regardless of whether they had since quit. Other risk factors, such as hypertension and atrial fibrillation (AF), were defined according to standard clinical criteria and confirmed within 24 hours after admission. Laboratory values were measured from blood samples within 24 hours after admission. All laboratory tests (white blood cell count, low-density lipoprotein, homocysteine, glycated hemoglobin, creatinine, and fibrinogen) were treated as binary variables in our model. Recanalization therapy, including intravenous thrombolytic therapy (IVT) and endovascular thrombectomy (EVT), was also treated as a binary variable (0 = Yes, 1 = No).
Strategies for developing a novel and parsimonious prediction score
To facilitate the use of our prediction model, we transformed it into a scoring system. All continuous predictors were dichotomized into binary variables at the cut-point that maximized their Youden index. The optimal cut-points were 71 years for age, 6 for stroke severity (NIHSS), 6.83×10⁹/L for white blood cell count (WBC), 2.93 mmol/L for low-density lipoprotein (LDL), 13 μmol/L for homocysteine (HCY), 6.2% for glycated hemoglobin (GH), 81 μmol/L for creatinine, and 2.98 g/L for fibrinogen. These dichotomized variables were entered into the multivariate logistic regression. Forward and backward stepwise logistic regression was applied using the likelihood ratio test, with Akaike's information criterion (AIC) as the stopping rule. The P-value thresholds for variable selection were set at <0.1 and <0.2. The model with the smallest AIC value, taken to indicate the best predictive ability, was chosen as the target model. The β-coefficient of each predictor in the target model was rounded to the nearest integer to give that predictor's score. The S2AFI score was calculated as the sum of these individual scores. The optimal threshold of the novel score was determined by receiver operating characteristic (ROC) curve analysis.
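As a minimal sketch of the two steps described above, the Python code below picks the cut-point that maximizes the Youden index J = sensitivity + specificity − 1 and converts β-coefficients into integer points. All data, predictor names, and coefficient values are made up for illustration and are not taken from the study. Treating values at or above the cut-point as test-positive and rounding ties upward are also assumptions.

```python
import math


def youden_cutpoint(values, outcomes):
    """Return (cutpoint, J) maximizing J = sensitivity + specificity - 1.

    A value >= cutpoint is treated as test-positive (an assumption).
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, outcomes) if v >= cut and y == 1)
        tn = sum(1 for v, y in zip(values, outcomes) if v < cut and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


def points_from_betas(betas):
    """Round each beta-coefficient to the nearest integer (ties round up)."""
    return {name: math.floor(beta + 0.5) for name, beta in betas.items()}


def total_score(patient, points):
    """Sum the points of every binary predictor coded 1 for this patient."""
    return sum(p for name, p in points.items() if patient.get(name, 0) == 1)


# Toy data: a continuous predictor and a binary outcome (1 = unfavorable).
toy_values = [55, 62, 68, 71, 74, 80, 83, 59]
toy_outcomes = [0, 0, 0, 1, 1, 1, 1, 0]
cut, j = youden_cutpoint(toy_values, toy_outcomes)

# Hypothetical coefficients for three hypothetical binary predictors.
toy_betas = {"age_high": 0.9, "nihss_high": 1.6, "af": 0.7}
points = points_from_betas(toy_betas)
score = total_score({"age_high": 1, "nihss_high": 1, "af": 0}, points)
print(cut, j, points, score)
```

Integer points give up some precision relative to the fitted coefficients in exchange for a score that can be added up at the bedside.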
Statistical analysis
Because all continuous variables were dichotomized, Fisher's exact test was used to compare differences between groups. The analysis of these variables as continuous variables is shown in the online supplementary material.
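For a dichotomized predictor, the group comparison reduces to a 2×2 table. The sketch below computes a two-sided Fisher's exact P value from the hypergeometric distribution using only the Python standard library. The function name and the toy counts are illustrative assumptions, and the two-sided rule used here (summing all tables no more probable than the observed one) is one common convention, not necessarily the one applied by the authors' software.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Toy table: rows = predictor (1/0), columns = outcome (unfavorable/favorable).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```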
All predictors included in the multivariate logistic regression analysis were confirmed to have no strong collinearity (variance inflation factors <2). The Youden index of those predictors was calculated by the Liu method[15]. The performance of the novel prediction score was tested with ROC curve analysis. After the prediction score was developed, we validated its performance in two datasets: the 30% validation dataset and the full cohort of all patients included in this study. The area under the curve (AUC) of the score, as both a continuous predictor and a dichotomized predictor, was calculated in each of these two datasets. Calibration of the score was assessed for goodness of fit by plotting the estimated probability on the x-axis against the observed probability on the y-axis and comparing the result with the diagonal line, which represents perfect calibration. A P for trend test was applied to assess whether a higher score was associated with a higher risk of an unfavorable outcome at 3 months. Statistical analysis was performed with Stata (version 15.0; StataCorp, College Station, Texas, USA) and R (version 3.5.3; The R Foundation for Statistical Computing). A two-tailed P value less than 0.05 was considered statistically significant.
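To show what evaluating the score "as a continuous predictor and dichotomized predictor" amounts to, the sketch below computes the AUC as the probability that a randomly chosen patient with an unfavorable outcome has a higher score than a randomly chosen patient with a favorable outcome, counting ties as one half. The study itself used Stata and R; this Python code, the toy scores, and the threshold of 3 are illustrative assumptions only.

```python
def auc(scores, outcomes):
    """AUC via pairwise comparison of positives and negatives (ties = 0.5)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def dichotomize(scores, threshold):
    """Code a score as 1 if it is >= threshold, otherwise 0."""
    return [int(s >= threshold) for s in scores]


# Toy data: integer scores and binary outcomes (1 = unfavorable at 3 months).
toy_scores = [0, 1, 1, 2, 3, 3, 4, 5]
toy_outcomes = [0, 0, 1, 0, 0, 1, 1, 1]

auc_continuous = auc(toy_scores, toy_outcomes)
auc_binary = auc(dichotomize(toy_scores, threshold=3), toy_outcomes)
print(auc_continuous, auc_binary)
```

For a dichotomized score the AUC equals the mean of sensitivity and specificity at the chosen threshold, so comparing it with the continuous AUC shows how much discrimination is lost by collapsing the score to a single cut-off.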