2.1 Data Extraction and Inclusion Criteria
This study qualified for a UCLA IRB exemption (waiver of consent) because there was no direct contact with patients and all data in this study were de-identified. Data were extracted from the Perioperative Data Warehouse, which was developed by the UCLA Department of Anesthesiology and Perioperative Medicine and distributed to multiple centers across the United States 19. All patients with an ICD-10 code of I60 (nontraumatic SAH) or S06.6 (traumatic SAH) between 2013 and 2022 at UCLA were included.
De-identified clinical data spanning the entire hospital admission were extracted for each patient. This information included basic demographics, vitals, routinely collected clinical labs (complete blood count with differential, basic metabolic panel, arterial blood gas, hemoglobin A1c (HbA1c), and cerebrospinal fluid (CSF) analysis), intracranial pressure (ICP), respiratory variables (O2 flow, FiO2, EtCO2, airway grade, intubation attempts, nitric oxide), fluid status variables (volume of maintenance intravenous (IV) fluid, urine output, blood loss, blood administered), feeding (gastric feeding, emesis), and saturation (pulse oximetry, cerebral saturation). Finally, the verapamil administration time was recorded if verapamil was administered. A complete list of clinical predictor variables is provided in Supplemental Table I.
2.2 Feature Extraction
For all patients, clinical data collected before ICU admission, as well as data collected after the verapamil injection time or the time of discharge from the ICU (depending on whether verapamil was given), were excluded. ICP and mean arterial pressure (MAP) time-series were encoded into a 20-dimensional feature vector containing the values of the 5th, 10th, …, 95th, and 100th percentiles over the examined time period. Serum values such as complete blood count (CBC) results were encoded as a 5-dimensional feature vector representing the minimum, maximum, median, mean, and count of measurements. Additional “dependent variables,” such as “total count of ICP measurements,” were derived from the originally extracted clinical variables.
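As a concrete illustration, the two encodings described above can be sketched as follows. This is a minimal sketch; the function names are hypothetical and not taken from the study code.

```python
import numpy as np

def encode_timeseries(values):
    """Encode a time-series (e.g. ICP or MAP) as a 20-dimensional
    percentile feature vector: 5th, 10th, ..., 95th, 100th percentiles."""
    percentiles = np.arange(5, 101, 5)  # 5, 10, ..., 100
    return np.percentile(values, percentiles)

def encode_lab(values):
    """Encode repeated lab measurements (e.g. serial CBC results) as a
    5-dimensional summary: min, max, median, mean, count."""
    v = np.asarray(values, dtype=float)
    return np.array([v.min(), v.max(), np.median(v), v.mean(), v.size])
```

The measurement count in the lab encoding also illustrates how derived variables such as “total count of ICP measurements” can be obtained from the raw extracts.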
2.3 Predictive Model Architecture, Training, and Cross Validation
We developed two models to aid in the prediction and interpretation of CVRV risk: a “prospective” predictive model and a “retrospective” predictive model. The prospective model was developed to provide a clinically useful tool to assess CVRV risk in ICU patients. Prospective model predictions were performed using 4 hours, 1 day, 3 days, 5 days, 7 days, and 10 days of ICU data starting from the time of ICU admission. A “binary” prospective model was trained to predict between two outcomes, namely whether or not the patient would need verapamil prior to ICU downgrade. A more advanced “trinary” prospective model was trained to predict among three outcomes: “will never get verapamil,” “will get verapamil within three days,” or “will get verapamil after three days” from the time of prediction.
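The trinary outcome assignment can be sketched as follows. This is a minimal sketch: the function name and the integer label encoding are illustrative assumptions, not taken from the study.

```python
from datetime import datetime, timedelta

def trinary_label(verapamil_time, prediction_time):
    """Assign the trinary prospective outcome at a given prediction time.

    verapamil_time is None if the patient never received verapamil.
    Hypothetical label encoding:
      0 = will never get verapamil
      1 = will get verapamil within three days of the prediction time
      2 = will get verapamil more than three days after the prediction time
    """
    if verapamil_time is None:
        return 0
    if verapamil_time - prediction_time <= timedelta(days=3):
        return 1
    return 2
```

The binary model corresponds to collapsing labels 1 and 2 into a single “will need verapamil prior to ICU downgrade” class.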
Retrospective models were developed to explore the events preceding CVRV by analyzing variable importance scores in groups of patients at the same chronological stage prior to vasospasm. For the retrospective model, predictions were performed using all ICU data up until 4 hours, 1 day, 3 days, 5 days, 7 days, and 10 days before the prediction target: verapamil injection or ICU downgrade.
Each model was tested with two different predictor sets: an “institutional” set and a “conservative” set (Supplemental Table I). The institutional predictor set contained all clinical variables, whereas the conservative predictor set contained only variables that are measured in a highly standardized manner across medical institutions, such as vital signs, routine lab values, and ICP.
The candidate model types tested were Logistic Regression, K-Nearest Neighbors, Naïve Bayes, Decision Trees, Support Vector Machines, Gaussian Process Classifier, Ridge Classifier, Random Forest Classifier, Quadratic Discriminant Analysis, AdaBoost Classifier, Gradient Boosting Classifier, Extra Trees Classifier, Extreme Gradient Boosting, Light Gradient Boosting Machine (LightGBM), and CatBoost Classifier.
Models were trained and tuned using the PyCaret 20 ML library (v3.1.0). We employed a stratified five-fold cross-validation scheme to report the average performance of models when trained on different subsets of the entire dataset.
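The mechanics of the cross-validation scheme can be illustrated with the following sketch. Note the assumptions: the study used PyCaret with the classifiers listed above on the clinical feature matrix, whereas this stand-in uses scikit-learn with a logistic regression and synthetic data purely to show stratified five-fold AUC reporting.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

# Synthetic, imbalanced stand-in for the de-identified feature matrix.
X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.8, 0.2], random_state=0)

# Stratified five-fold split: each fold preserves the class proportions.
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = []
for train_idx, test_idx in skf.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores = clf.predict_proba(X[test_idx])[:, 1]
    aucs.append(roc_auc_score(y[test_idx], scores))

# Average performance across folds, reported as mean +/- SD.
print(f"AUC = {np.mean(aucs):.3f} +/- {np.std(aucs):.3f}")
```

PyCaret performs this stratified k-fold evaluation internally; the sketch only makes explicit what “average performance across folds” means.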
To correct class imbalance in the training set, the Synthetic Minority Over-Sampling Technique (SMOTE) 21 was utilized, while the class distribution of the validation folds was left unchanged. Model hyperparameters were further tuned using a grid search over each model type to maximize AUC, with predicted probabilities calibrated to the observed outcome across propensity scores. The model with the highest AUC was selected as the final model; in all analyses, this was a LightGBM 22 model. We reported ROC curves and AUC values for each fold and calculated the average AUC (± 1 standard deviation (SD)), where 1 = perfect classifier and 0.5 = random classifier. We also created a precision-recall (PR) curve for each time point with an average precision (AP) value (± 1 SD). Lastly, we reported “Variable Importance Scores” (relative rankings of the weights the model placed on each predictor) for the retrospective models, to help interpret which characteristics the models focused on at different prediction time points.
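The core interpolation idea behind SMOTE can be sketched as follows. This is a simplified illustration on synthetic data, not the full SMOTE algorithm or the study's implementation; the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def smote_like_oversample(X_min, n_new, k=5):
    """Simplified sketch of SMOTE's core step: synthesize new minority-class
    samples by interpolating between a minority sample and one of its
    k nearest minority-class neighbors."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # Distances from sample i to every minority sample.
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # k nearest, excluding self
        j = rng.choice(neighbors)
        gap = rng.random()                   # interpolation fraction in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

# Oversampling is applied only to the training split; validation folds
# keep their original class distribution, as in the study.
X_min = rng.normal(size=(20, 4))             # minority-class training samples
X_new = smote_like_oversample(X_min, n_new=30)
print(X_new.shape)                           # (30, 4)
```

Because synthetic points lie on segments between real minority samples, the classifier sees a denser minority class during training without any change to the evaluation data.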