2.1 Study design, setting, and participants
We established a multidisciplinary team of emergency physicians, data scientists, information engineers, nurse practitioners, and quality managers for this project (Figure 1). After a literature review, we adopted a previous study on predicting mortality in older ED patients with influenza as the main reference [4]. From the EMRs of three hospitals (Chi Mei Medical Center; Chi Mei Hospital, Liouying; and Chi Mei Hospital, Chiali), we identified all older patients (≥65 years old) with influenza who visited the ED between January 1, 2009, and December 31, 2018. The hospitals in the present study are not those in which the GID score was developed. Influenza was defined as an International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) diagnosis code of 487 or 488, or a prescription of oseltamivir, peramivir, or zanamivir (Relenza) at the index ED visit.
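The inclusion criteria above can be expressed as a simple filter. The following is a minimal sketch, not the study's actual extraction code; the record layout (keys `age`, `visit_date`, `icd9_codes`, `prescriptions`) and the function name `is_eligible` are hypothetical.

```python
from datetime import date

# Study window and influenza criteria as stated in the text.
STUDY_START = date(2009, 1, 1)
STUDY_END = date(2018, 12, 31)
INFLUENZA_ICD9_PREFIXES = ("487", "488")
ANTIVIRALS = {"oseltamivir", "peramivir", "relenza"}


def is_eligible(visit):
    """Return True if an ED visit meets the inclusion criteria.

    `visit` is a dict with hypothetical keys: 'age' (years),
    'visit_date' (datetime.date), 'icd9_codes' (list of str),
    and 'prescriptions' (list of drug names).
    """
    if visit["age"] < 65:
        return False
    if not (STUDY_START <= visit["visit_date"] <= STUDY_END):
        return False
    # Either an influenza diagnosis code or an antiviral prescription qualifies.
    has_diagnosis = any(
        code.startswith(INFLUENZA_ICD9_PREFIXES) for code in visit["icd9_codes"]
    )
    has_antiviral = any(
        drug.lower() in ANTIVIRALS for drug in visit["prescriptions"]
    )
    return has_diagnosis or has_antiviral
```

Matching on code prefixes is an assumption of this sketch; it lets subcodes such as 487.1 count toward the 487 category.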
2.2 Definitions of variables
We included the following variables in this study: age; sex; vital signs; past histories of hypertension (ICD-9-CM: 401-405), diabetes (ICD-9-CM: 250), COPD (ICD-9-CM: 496), CAD (ICD-9-CM: 410-414), stroke (ICD-9-CM: 436-438), malignancy (ICD-9-CM: 140-208), congestive heart failure (CHF, ICD-9-CM: 428), and dementia (ICD-9-CM: 290); bedridden status; nasogastric tube feeding; nursing home residence; and laboratory data, including white blood cell count (WBC), bandemia, hemoglobin, platelet count, serum creatinine, CRP, procalcitonin, glucose, Na, K, GOT, and GPT. We adopted the 10 potential predictors proposed in the previous study as the feature variables for the ML models [4]: (1) tachypnea (respiratory rate >20/min); (2) severe coma (GCS ≤8); (3) history of hypertension; (4) history of CAD; (5) history of malignancy; (6) bedridden status; (7) leukocytosis (WBC >12,000 cells/mm³); (8) bandemia (>10% band cells); (9) anemia (hemoglobin <12 g/dL); and (10) elevated CRP (>10 mg/dL). Patients without a record of subsequent follow-up were excluded. Missing values were replaced with normal values (i.e., respiratory rate: 12/min, GCS: 15, WBC: 7000 cells/mm³, band form: 0%, hemoglobin: 12 g/dL, and CRP: 2.5 mg/dL).
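As an illustration of how the normal-value imputation and the 10 binary predictors fit together, the following is a minimal sketch. The field names (`respiratory_rate`, `gcs`, `wbc`, `band_pct`, `hemoglobin`, `crp`, and the history flags) and both function names are hypothetical; only the thresholds and default values come from the text.

```python
# Normal values substituted for missing measurements, as listed in the text.
NORMAL_DEFAULTS = {
    "respiratory_rate": 12,  # breaths per minute
    "gcs": 15,               # Glasgow Coma Scale
    "wbc": 7000,             # white blood cell count
    "band_pct": 0,           # band form, percent
    "hemoglobin": 12,
    "crp": 2.5,
}

HISTORY_FLAGS = ("hypertension", "cad", "malignancy", "bedridden")


def impute_normals(record):
    """Return a copy of `record` with missing (absent or None) values filled."""
    filled = dict(record)
    for key, default in NORMAL_DEFAULTS.items():
        if filled.get(key) is None:
            filled[key] = default
    return filled


def derive_predictors(record):
    """Map one patient record to the 10 binary predictors (0 or 1)."""
    r = impute_normals(record)
    predictors = {
        "tachypnea": int(r["respiratory_rate"] > 20),
        "severe_coma": int(r["gcs"] <= 8),
        "leukocytosis": int(r["wbc"] > 12000),
        "bandemia": int(r["band_pct"] > 10),
        "anemia": int(r["hemoglobin"] < 12),
        "elevated_crp": int(r["crp"] > 10),
    }
    # History and status flags are taken as already binary; absent means 0.
    for flag in HISTORY_FLAGS:
        predictors[flag] = int(bool(r.get(flag, False)))
    return predictors
```

Because every default lies on the normal side of its threshold, a record with a missing measurement always yields 0 for the corresponding predictor.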
2.3 Outcome measurements
The outcome measurements were binary coded as follows: (1) hospitalization; (2) complication with pneumonia (ICD-9-CM: 480-486); (3) complication with sepsis or septic shock (ICD-9-CM: 038, 790.7, 995.91, 995.92, 785.52); (4) admission to the intensive care unit (ICU); and (5) death.
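The binary coding of the five outcomes could look roughly like the sketch below. The record keys (`icd9_codes`, `admitted`, `icu`, `died`) and the function name are hypothetical, and the inclusion of 995.92 (severe sepsis) in the sepsis code list is an assumption of this sketch.

```python
# Pneumonia: ICD-9-CM 480 through 486, matched by prefix.
PNEUMONIA_PREFIXES = tuple(str(code) for code in range(480, 487))

# Sepsis or septic shock; 995.92 is an assumption of this sketch.
SEPSIS_PREFIXES = ("038", "790.7", "995.91", "995.92", "785.52")


def code_outcomes(visit):
    """Return the five outcomes for one visit, each coded 0 or 1."""
    codes = visit.get("icd9_codes", [])
    return {
        "hospitalization": int(bool(visit.get("admitted", False))),
        "pneumonia": int(any(c.startswith(PNEUMONIA_PREFIXES) for c in codes)),
        "sepsis_or_septic_shock": int(
            any(c.startswith(SEPSIS_PREFIXES) for c in codes)
        ),
        "icu_admission": int(bool(visit.get("icu", False))),
        "death": int(bool(visit.get("died", False))),
    }
```

Each outcome then serves as a separate binary target, so one visit can be positive for several outcomes at once.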
2.4 Ethical statement
The present study was approved by the institutional review board of the Chi Mei Medical Center. Informed consent from the participants was waived because this study is retrospective and uses de-identified information, which does not affect the rights and welfare of the participants.
2.5 Data processing, comparison, and application in the HIS
First, we extracted, transformed, and validated the data from the HIS into a data mart. Missing and ambiguous data were carefully processed at this step. Second, because the samples were imbalanced, we applied the synthetic minority oversampling technique (SMOTE) as a preprocessing step for model training and testing. Third, we compared accuracy, precision, sensitivity, specificity, positive predictive value, negative predictive value, F1 score, and the area under the curve (AUC) among random forest, logistic regression, support vector machine (SVM), K-nearest neighbors (KNN), and light gradient boosting machine (LightGBM) models, and selected the random forest algorithm because it performed best on most outcomes. All five models were trained and tested on a randomly partitioned 70%/30% split of the data. Fourth, we deployed the model in the AI web service and integrated it with the HIS in the ED. After two months of pilot testing and validation, we launched the condition prediction application in the HIS to assist physicians' decision-making in real time.
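To make the comparison step concrete, the following minimal sketch shows a random 70%/30% partition and the threshold-based metrics computed from a confusion matrix. It is an illustration in plain Python, not the study's pipeline; the function names and the fixed seed are assumptions. Note that precision and positive predictive value are given by the same formula, and that AUC is omitted here because it requires predicted probabilities rather than class labels.

```python
import random


def split_70_30(rows, seed=0):
    """Randomly partition rows into 70% training and 30% testing subsets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(round(0.7 * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred):
    """Compute evaluation metrics from true and predicted 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    precision = _ratio(tp, tp + fp)    # identical to positive predictive value
    sensitivity = _ratio(tp, tp + fn)  # also called recall
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "ppv": precision,
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }
```

In such a comparison, each candidate model would be fitted on the training subset and `binary_metrics` applied to its predictions on the testing subset, once per outcome.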
2.6 Patient and public involvement
Patients and the public were not involved in this study.