Study design and patients
The study design is shown in Fig. 1. We conducted a retrospective, multicenter cohort study at Saitama Medical University Hospital and Self-Defense Forces Central Hospital in Japan, both of which are designated medical institutions for infectious diseases under the Infectious Disease Control Law of Japan. We enrolled adult patients (≥ 18 years old) with COVID-19 confirmed by molecular diagnostic methods (quantitative reverse transcription polymerase chain reaction [RT-qPCR] or loop-mediated isothermal amplification [LAMP]) who were hospitalized for isolation and treatment under the Infectious Disease Control Law. Patients who did not undergo routine blood examinations (complete blood count, serum biochemical tests, and coagulation tests) within 10 days of initial symptom onset were excluded, as were patients treated with oxygen therapy before hospitalization.
First, patients hospitalized from February to October 2020 were enrolled as a derivation dataset. Then, patients hospitalized from November 2020 to March 2021 were enrolled as a temporal validation dataset. In Japan, four waves of COVID-19 occurred from February 2020 to May 2021; patients admitted during the first and second waves were included in the derivation dataset, and those admitted during the third and fourth waves in the temporal validation dataset (Appendix 1). A comparison dataset for comparing risk scoring models included all patients admitted during the study period. Clinical information, including clinical records and laboratory findings, was retrospectively collected from the hospitals' electronic medical records. The primary outcome was in-hospital clinical deterioration within 14 days of hospitalization.
Definitions
Clinical characteristics and laboratory findings at admission were used to derive and validate the risk scoring model. Clinical deterioration was defined as the administration of oxygen therapy for SpO2 < 93% on room air during hospitalization. The observation period was defined as the period from admission to discharge or to 14 days after admission, whichever came first. The day of initial symptom onset was defined as the day symptoms appeared according to the patients or their family members; for asymptomatic patients, it was defined as the day of hospitalization. Disease severity was classified by an infectious disease physician with 8 years of experience (KI) according to the 8-category ordinal scale recommended by the World Health Organization (30).
Statistical analysis
Continuous variables are expressed as the mean and standard deviation or median and interquartile range (IQR) and were compared using a t-test or Wilcoxon rank-sum test for parametric or non-parametric data, respectively. Categorical variables are presented as frequency and percentage (%) and were compared using a chi-square test or Fisher’s exact test, as appropriate. A two-sided p value < 0.05 was considered statistically significant. All statistical analyses were conducted using R (v 4.0.2; R Foundation for Statistical Computing, Vienna, Austria; http://www.R-project.org/).
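As an illustration of the categorical comparisons described above, a two-sided Fisher's exact test for a 2 × 2 contingency table can be computed from hypergeometric probabilities. The analyses themselves were performed in R (e.g., with `fisher.test`); the sketch below reimplements the standard two-sided definition (summing tables no more probable than the observed one) in pure Python for illustration only.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]].

    Sums hypergeometric probabilities of all tables with the same margins
    whose probability does not exceed that of the observed table.
    """
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def p_table(x):
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p_table(a)
    lo = max(0, col1 - row2)
    hi = min(col1, row1)
    # Small tolerance guards against floating-point ties.
    return sum(p for x in range(lo, hi + 1)
               if (p := p_table(x)) <= p_obs * (1 + 1e-9))
```

This matches R's default two-sided behavior for 2 × 2 tables; for larger samples the chi-square approximation mentioned above is used instead.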
Candidate predictor selection and model development
Based on the literature, 12 candidate predictor variables were selected from clinical characteristics and potential biomarkers associated with clinical deterioration. Self-reported clinical symptoms were excluded for better objectivity, and variables with values unavailable for at least 25% of the patients in the derivation dataset were also excluded. Finally, 9 candidate predictor variables—age, sex at birth, body mass index [BMI], comorbid diabetes mellitus, comorbid hypertension, neutrophil-to-lymphocyte ratio [NLR], blood urea nitrogen [BUN], lactate dehydrogenase [LDH], and C-reactive protein [CRP]—were selected for analysis by consensus at a team meeting during the derivation phase. There were no missing values for these 9 candidate predictor variables in the derivation dataset (Appendix 2).
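The missing-value screening rule above (dropping any variable unavailable for at least 25% of patients) can be sketched as follows; the record structure and field names are illustrative assumptions, not the study's actual data format.

```python
def screen_predictors(records, candidates, max_missing=0.25):
    """Keep candidate predictors whose missing-value fraction is
    strictly below max_missing.

    `records` is a list of per-patient dicts; a value of None counts
    as missing. Field names here are hypothetical.
    """
    n = len(records)
    kept = []
    for var in candidates:
        missing = sum(1 for r in records if r.get(var) is None)
        if missing / n < max_missing:
            kept.append(var)
    return kept
```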
The model building process for developing the risk score followed the method reported by Knight et al. (31) with minor modifications. In the first step, generalized additive models (GAMs) were fitted in a Cox regression framework, incorporating continuous variables with P-spline smoothers in combination with categorical variables as linear components. A criterion-based approach to variable selection was applied based on the deviance explained and restricted maximum likelihood. Second, optimal cutoff values for continuous variables were selected by visual inspection of the plots of the component continuous variables with P-spline smoothers. Third, final models using the categorized variables were specified with least absolute shrinkage and selection operator (LASSO) Cox regression. L1-penalized coefficients were derived using 10-fold cross-validation in the derivation dataset to select the value of lambda that minimized the cross-validated error. The shrunk coefficients were converted to points with appropriate scaling to create the risk scoring model.
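The final step, converting shrunk coefficients to integer points, can be sketched as below. A common scaling scheme is assumed here (dividing each coefficient by the smallest nonzero absolute coefficient and rounding, so the weakest retained predictor scores 1 point); the exact scaling used in the study is not specified in this section.

```python
def coefficients_to_points(coefs, scale=None):
    """Convert shrunk Cox coefficients for categorized variables to
    integer points.

    Assumed scheme: each coefficient is divided by the smallest nonzero
    absolute coefficient (or an explicit `scale`) and rounded.
    """
    nonzero = [abs(c) for c in coefs.values() if c != 0]
    if not nonzero:
        return {name: 0 for name in coefs}
    scale = scale or min(nonzero)
    return {name: round(c / scale) for name, c in coefs.items()}
```

Variables whose coefficients were shrunk to zero by the LASSO penalty naturally receive 0 points and drop out of the score.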
Discrimination of the developed risk scoring model—named the Age, BMI, CRP, LDH [ABCD] Risk Score—was evaluated using the area under the receiver operating characteristic (ROC) curve and the concordance statistic (C-statistic) in the derivation dataset. The 95% confidence interval (95% CI) of the C-statistic was calculated by bootstrap resampling (2,000 samples). Calibration of the ABCD Risk Score was assessed using a calibration plot and the Brier score.
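The two evaluation metrics can be sketched in pure Python (the study's computations were done in R). For a binary outcome, the C-statistic coincides with the ROC AUC: the probability that a randomly chosen deteriorating patient has a higher score than a non-deteriorating one, with ties counted as one half; the Brier score is the mean squared difference between predicted probability and outcome.

```python
def c_statistic(scores, events):
    """Concordance statistic (ROC AUC for a binary outcome): fraction of
    event/non-event pairs in which the event patient scores higher,
    with ties counted as 0.5."""
    pos = [s for s, e in zip(scores, events) if e]
    neg = [s for s, e in zip(scores, events) if not e]
    conc = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return conc / (len(pos) * len(neg))

def brier_score(probs, events):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - e) ** 2 for p, e in zip(probs, events)) / len(probs)
```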
Model validation
A temporal validation dataset of patients was used to validate the ABCD Risk Score obtained in the derivation phase. The same clinical and laboratory data were available for analysis in both cohorts, and there were no missing values for the ABCD Risk Score in the temporal validation dataset. Discrimination and calibration were evaluated in the temporal validation dataset. The cutoff values of the ABCD Risk Score for three risk groups—low, intermediate, and high—were determined by consensus at a team meeting. Kaplan–Meier survival curves for the patients in each risk group were generated to illustrate the partitioning of the risk of disease deterioration, and differences in clinical deterioration between risk groups were assessed by the log-rank test.
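The stratification and survival-curve steps can be sketched as follows. The two cutoff parameters are placeholders (the study's actual cutoffs were fixed by team consensus), and the analyses themselves were performed in R; the Kaplan–Meier estimator below is a minimal product-limit implementation.

```python
def assign_risk_group(score, low_max, intermediate_max):
    """Map an ABCD Risk Score to a risk group using two cutoffs.
    Cutoff values are illustrative parameters, not the study's."""
    if score <= low_max:
        return "low"
    if score <= intermediate_max:
        return "intermediate"
    return "high"

def kaplan_meier(times, events):
    """Product-limit estimate: returns (time, survival) steps at event
    times. `events` is 1 for deterioration, 0 for censoring."""
    order = sorted(zip(times, events))
    n_at_risk = len(order)
    surv, steps = 1.0, []
    i = 0
    while i < len(order):
        t = order[i][0]
        at = d = 0
        while i < len(order) and order[i][0] == t:
            at += 1
            d += order[i][1]
            i += 1
        if d:  # survival drops only at event times
            surv *= (n_at_risk - d) / n_at_risk
            steps.append((t, surv))
        n_at_risk -= at  # events and censorings both leave the risk set
    return steps
```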
Comparison with other risk scoring models of clinical deterioration in COVID-19
The ABCD Risk Score was compared with previously reported risk scoring models in the comparison dataset. Sixteen risk scoring models for clinical deterioration were extracted from the literature; 12 were excluded because they required clinical symptoms, CT findings, or ultrasound findings that were unavailable in the comparison dataset (13–15, 17–19, 21–24, 27, 28). Finally, four risk scoring models were selected for evaluation in this study (16, 20, 25, 26). Discrimination, calibration, and decision curve analysis of each risk scoring model were evaluated in the comparison dataset (Fig. 1). Because 20% of D-dimer values were missing in the comparison of the risk scoring models, the missing values were imputed by a random forest imputation method.
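Decision curve analysis summarizes each model's net benefit across threshold probabilities, where net benefit at threshold pt is TP/n − FP/n × pt/(1 − pt). A minimal pure-Python sketch of that formula (the study's analysis was performed in R):

```python
def net_benefit(probs, events, threshold):
    """Net benefit at probability threshold pt:
    TP/n - FP/n * pt / (1 - pt), treating patients with predicted
    probability >= pt as test-positive."""
    n = len(probs)
    tp = sum(1 for p, e in zip(probs, events) if p >= threshold and e)
    fp = sum(1 for p, e in zip(probs, events) if p >= threshold and not e)
    return tp / n - fp / n * threshold / (1 - threshold)
```

Plotting net benefit over a range of thresholds, alongside the "treat all" and "treat none" strategies, yields the decision curves compared across the four selected models.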