Study design and population
A total of 2,858 pediatric patients who underwent congenital heart surgery between December 2015 and December 2018 at the Children’s Hospital of Zhejiang University School of Medicine were initially enrolled in the present analysis. Patients were excluded if they died during surgery, lacked intraoperative anesthesia records, or underwent surgery without CPB; the selection process for eligible participants is shown in Supplementary Fig. S1. The final cohort included 1,964 patients, who were randomly split into a development set (n=1,375, 70%) and a validation set (n=589, 30%). All methods were carried out in accordance with relevant guidelines and regulations. This retrospective study was approved by the institutional review board of the Children’s Hospital, Zhejiang University School of Medicine, with a waiver of informed consent (2018_IRB_078).
Data collection and pre-processing
The following data elements were requested: gender, age, height, and weight of patients; diagnoses and types of procedures; surgical time, CPB time and aortic cross-clamping time; surgical access route; preoperative and postoperative oxygen saturation; intraoperative anesthetic record data; and postoperative complications.
The most challenging part of data preprocessing was handling the intraoperative time-series vital signs, which differ in length between patients and therefore cannot be used directly to construct the prediction model. The evidence-based literature on temperature management in cardiac surgery suggests that mild (32℃-35℃), moderate (28℃-32℃), or deep (< 28℃) hypothermia is used to protect the brain and other vital organs during cardiopulmonary bypass20. First, we divided each surgery into three phases according to the changes in temperature, namely the pre-hypothermic (normal temperature to 35℃), intra-hypothermic (< 35℃), and post-hypothermic (35℃ to normal temperature) periods. Blood pressure variability, comprising the coefficient of variation and the slope, was used to measure blood pressure fluctuations in the different phases of surgery (Fig. 1). The coefficient of variation was defined as the standard deviation divided by the mean of each blood pressure sequence. In addition, the average changes (the slope) were also calculated as follows:
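As an illustration of the phase-wise variability features, the following Python sketch splits one patient's record at the 35℃ threshold and computes the coefficient of variation for each phase. The slope function here is an assumption: it uses an ordinary least-squares trend against the sample index, which may differ from the formula used in this study. The function names, the use of the population standard deviation, and the synthetic readings are likewise hypothetical.

```python
from statistics import mean, pstdev


def split_phases(temps, pressures, threshold=35.0):
    """Split a blood pressure series into pre-, intra-, and post-hypothermic phases.

    The intra phase runs from the first to the last reading below `threshold`.
    """
    cold = [i for i, t in enumerate(temps) if t < threshold]
    if not cold:  # no hypothermic period recorded
        return {"pre": list(pressures), "intra": [], "post": []}
    first, last = cold[0], cold[-1]
    return {
        "pre": list(pressures[:first]),
        "intra": list(pressures[first:last + 1]),
        "post": list(pressures[last + 1:]),
    }


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD is an assumption)."""
    return pstdev(values) / mean(values)


def slope(values):
    """ASSUMPTION: least-squares slope of the values against their sample index."""
    n = len(values)
    x_bar = (n - 1) / 2
    y_bar = mean(values)
    num = sum((i - x_bar) * (y - y_bar) for i, y in enumerate(values))
    den = sum((i - x_bar) ** 2 for i in range(n))
    return num / den


# Synthetic readings for one hypothetical patient
temps = [36.5, 36.0, 34.0, 30.0, 33.0, 35.5, 36.4]
bp = [60, 62, 55, 50, 52, 58, 61]

phases = split_phases(temps, bp)
features = {
    name: (coefficient_of_variation(seq), slope(seq))
    for name, seq in phases.items()
    if len(seq) >= 2  # both statistics need at least two readings
}
```

Each phase then contributes a fixed number of features regardless of how long the surgery lasted, which is what makes sequences of different lengths comparable.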
To further capture the dynamic temporal pattern of blood pressure during surgery, we used a k-means algorithm to cluster the patterns of blood pressure change into distinct trajectories (Fig. 1). In time-series analyses, the smoothed formulation of dynamic time warping (soft-DTW) is used to measure the similarity between two temporal sequences that may vary in length and speed21. To cluster the blood pressure trajectories, we constructed a matrix R whose element Ri,j is the soft-DTW similarity between the blood pressure trajectories of patient i and patient j. Next, we performed k-means clustering on the similarity matrix R; the number of clusters was determined by maximizing the average silhouette coefficient and minimizing the within-cluster sum of squares (a more detailed description of how the optimal number of clusters was determined is given in Supplementary Fig. S2). Collectively, the extracted items, including blood pressure variability and clustered trajectories, were combined with patient characteristics and summarized into 45 features (details shown in Table 1), which were subsequently used to construct the machine learning prediction model.
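The soft-DTW recursion and the pairwise matrix R can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in this study: the squared-difference local cost, the smoothing parameter gamma=1.0, the function names, and the synthetic series are all assumptions.

```python
import math


def soft_min(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma)))."""
    m = min(values)  # shift by the minimum for numerical stability
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in values))


def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW discrepancy between two 1-D sequences of possibly different length."""
    n, m = len(x), len(y)
    inf = float("inf")
    r = [[inf] * (m + 1) for _ in range(n + 1)]
    r[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2  # assumed local cost
            r[i][j] = cost + soft_min(
                (r[i - 1][j], r[i][j - 1], r[i - 1][j - 1]), gamma
            )
    return r[n][m]


def pairwise_soft_dtw(series, gamma=1.0):
    """Symmetric matrix R with R[i][j] = soft-DTW between series i and series j."""
    k = len(series)
    mat = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i, k):
            mat[i][j] = mat[j][i] = soft_dtw(series[i], series[j], gamma)
    return mat


# Synthetic trajectories of different lengths (hypothetical values)
trajectories = [
    [60, 58, 55, 52, 55, 59],
    [61, 59, 56, 53, 52, 54, 58, 60],
    [50, 55, 62, 66, 70],
]
R = pairwise_soft_dtw(trajectories)
```

Each row of R can then be treated as a feature vector for a k-means implementation such as scikit-learn's `KMeans`, with `silhouette_score` used to compare candidate numbers of clusters. Note that soft-DTW is a discrepancy, so smaller values indicate more similar trajectories.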
Missing values were imputed using the Multivariate Imputation by Chained Equations (MICE) package in R22. Class imbalance is also a problem in this study, since in some scenarios the number of patients with postoperative complications is small in comparison with the number without complications. It is therefore important to choose metrics and methods that suit the imbalanced setting23. In this study, different weights were given to positive and negative samples during classification.
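One common way to weight positive and negative samples is by inverse class frequency, sketched below. The exact weighting scheme used in this study is not stated, so the formula, the function name, and the synthetic 90/10 label split are assumptions for illustration.

```python
def class_weights(labels):
    """Inverse-frequency weights so that both classes carry equal total weight.

    ASSUMPTION: this 'balanced' scheme is illustrative, not the study's own.
    """
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}


# Synthetic labels: 10 patients with complications, 90 without
labels = [1] * 10 + [0] * 90
weights = class_weights(labels)

# Ratio of negatives to positives, the value typically passed to
# XGBoost's `scale_pos_weight` parameter for binary tasks
scale_pos_weight = labels.count(0) / labels.count(1)
```

With these weights the 10 positive samples and the 90 negative samples each contribute a total weight of 50, so the minority class is not swamped during training.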
Postoperative complication labels
Labels indicating whether a patient had any complication after surgery, and of what kind, were collected by clinicians through review of the medical records. Based on more than 30 defined complications (detailed definitions of the types of complications are listed in Supplementary Table S1), we grouped complications into five classes: lung, cardiac, rhythm, infectious, and other complications24. A cardiac complication is a complication of the heart other than arrhythmia, such as cardiac dysfunction resulting in low cardiac output or pulmonary hypertension. A rhythm complication is any cardiac rhythm other than normal sinus rhythm. An infectious complication is defined as the successful invasion and growth of organisms in the tissues of the host, such as sepsis, urinary tract infection, and wound infection. Other complications are complications in organs other than the lung and heart, such as thrombosis, liver dysfunction, and ascites. A patient can experience multiple postoperative complications. We therefore defined two tasks: binary classification, to predict whether a patient has any complication, and multi-label classification, to predict which classes of complication occur.
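The two prediction targets can be derived from a patient's list of recorded complications as in the sketch below. The mapping is a hypothetical excerpt built only from the examples named above; the full list of more than 30 complications is in Supplementary Table S1 and is not reproduced here.

```python
CLASSES = ["lung", "cardiac", "rhythm", "infectious", "other"]

# Hypothetical excerpt of the complication-to-class mapping
COMPLICATION_CLASS = {
    "low cardiac output": "cardiac",
    "pulmonary hypertension": "cardiac",
    "arrhythmia": "rhythm",
    "sepsis": "infectious",
    "urinary tract infection": "infectious",
    "wound infection": "infectious",
    "thrombosis": "other",
    "liver dysfunction": "other",
    "ascites": "other",
}


def encode(complications):
    """Return (binary label, multi-label indicator vector) for one patient."""
    multi = [0] * len(CLASSES)
    for name in complications:
        multi[CLASSES.index(COMPLICATION_CLASS[name])] = 1
    binary = int(any(multi))
    return binary, multi


print(encode(["sepsis", "arrhythmia"]))  # (1, [0, 0, 1, 1, 0])
print(encode([]))                        # (0, [0, 0, 0, 0, 0])
```

Because a patient can fall into several classes at once, the multi-label target is an indicator vector rather than a single class index.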
Statistical analysis
The patients were categorized according to whether they had experienced postoperative complications. Categorical variables were presented as counts and percentages, and continuous variables as medians with interquartile ranges (IQR; 25th and 75th percentiles). The Chi-square test was used to compare categorical variables between patients with and without this outcome, and continuous variables were compared using the Mann-Whitney U test. All tests were two-sided, and statistical significance was set at P < 0.05 for all analyses. Differences among multiple clusters were tested using the Kruskal-Wallis H-test. Data analyses were performed using published packages in the Python (version 3.7) programming environment.
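The tests named above are available in SciPy; the sketch below shows how they might be called. The choice of SciPy and NumPy is an assumption (the text only says published Python packages were used), and all numbers are synthetic.

```python
import numpy as np
from scipy.stats import chi2_contingency, kruskal, mannwhitneyu

# Categorical variable: 2x2 contingency table (rows: outcome, columns: category)
table = [[30, 70], [45, 55]]
chi2, p_chi2, dof, expected = chi2_contingency(table)

# Continuous variable compared between patients without and with complications
no_complication = [1, 2, 3, 4, 5]
complication = [6, 7, 8, 9, 10]
u_stat, p_mw = mannwhitneyu(no_complication, complication, alternative="two-sided")

# Continuous variable compared among more than two trajectory clusters
cluster_a, cluster_b, cluster_c = [1, 2, 3], [4, 5, 6], [7, 8, 9]
h_stat, p_kw = kruskal(cluster_a, cluster_b, cluster_c)

# Median with interquartile range (25th and 75th percentiles)
q25, median, q75 = np.percentile(complication, [25, 50, 75])

ALPHA = 0.05  # two-sided significance threshold stated in the text
significant = {"chi2": p_chi2 < ALPHA, "mann_whitney": p_mw < ALPHA, "kruskal": p_kw < ALPHA}
```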
Model development and evaluation
XGBoost implements machine learning algorithms under the gradient boosting framework and provides parallel tree boosting that solves many data science problems quickly and accurately25. This model has been validated in a previous study18. To understand how individual features relate to the model output, we used SHAP values, which are suited to complex models such as neural networks and gradient-boosting machines26. The impact of each feature on the model is represented using Shapley values, which originate from game theory and provide a theoretically justified method for allocating a coalition’s output among its members26.
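To make the coalition idea concrete, the sketch below computes exact Shapley values by brute force for a three-feature toy game. This is not how SHAP computes attributions for tree ensembles in practice (it uses far more efficient algorithms); the feature names and the value function are hypothetical.

```python
from itertools import permutations


def shapley_values(players, value):
    """Average each player's marginal contribution over all join orders."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    return {p: t / len(orders) for p, t in totals.items()}


def toy_value(coalition):
    """Hypothetical model output when only the features in `coalition` are known."""
    v = 0.0
    if "cpb_time" in coalition:
        v += 10
    if "age" in coalition:
        v += 4
    if {"cpb_time", "age"} <= coalition:
        v += 2  # interaction term, shared equally between the two features
    return v


phi = shapley_values(["cpb_time", "age", "weight"], toy_value)
print(phi)  # {'cpb_time': 11.0, 'age': 5.0, 'weight': 0.0}
```

The attributions sum to the output of the full coalition (16), and a feature that never changes the output receives zero, which are two of the properties that make the allocation theoretically justified.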
During training, we used 5-fold cross-validation on the development set to tune the hyperparameters for each classification task. The optimal model parameters were determined by a random search over 500 different combinations of XGBoost hyperparameters. The final binary classification model used a learning rate of 0.01, 292 gradient-boosted trees, a maximum tree depth of 3, and a minimum child weight of 5. For the final multi-label classification model, these parameters were 0.02, 140, 5, and 4, respectively.
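The search procedure can be outlined in plain Python as below. The two dictionaries record the final values reported above under the parameter names of XGBoost's scikit-learn interface. Everything else is an assumption: the search ranges are hypothetical because the text does not state them, and `score_fn` is a placeholder callback that would train a model on the training indices and return a validation score.

```python
import random

# Final values reported in the text
FINAL_BINARY = {"learning_rate": 0.01, "n_estimators": 292, "max_depth": 3, "min_child_weight": 5}
FINAL_MULTILABEL = {"learning_rate": 0.02, "n_estimators": 140, "max_depth": 5, "min_child_weight": 4}

# ASSUMPTION: hypothetical search ranges, not taken from the study
SEARCH_SPACE = {
    "learning_rate": lambda rng: round(rng.uniform(0.01, 0.3), 2),
    "n_estimators": lambda rng: rng.randint(50, 500),
    "max_depth": lambda rng: rng.randint(2, 8),
    "min_child_weight": lambda rng: rng.randint(1, 10),
}


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k shuffled folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def random_search(score_fn, n_samples, n_iter=500, k=5, seed=0):
    """Return the sampled parameters with the best mean cross-validated score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: draw(rng) for name, draw in SEARCH_SPACE.items()}
        scores = [score_fn(params, tr, va) for tr, va in k_fold_indices(n_samples, k, seed)]
        avg = sum(scores) / k
        if avg > best_score:
            best_params, best_score = params, avg
    return best_params, best_score


# Dummy scorer for demonstration only: prefers a maximum depth of 3
def dummy_score(params, train_idx, val_idx):
    return -abs(params["max_depth"] - 3)


best_params, best_score = random_search(dummy_score, n_samples=50)
```

Reusing the same fold assignment for every candidate keeps the comparison between hyperparameter combinations fair.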
The accuracy, area under the receiver operating characteristic curve (AUC), recall, and F1 score were the metrics used to evaluate binary classification performance. The accuracy, micro-recall, micro-F1 score, and macro-AUC were the metrics used to evaluate multi-label classification performance. The F1 score is the harmonic mean of precision and recall. The micro average calculates metrics globally by counting the total true positives, false negatives, and false positives, whereas the macro average calculates the metric for each label and takes the unweighted mean. We compared the performance of our prediction model with the four risk adjustment models mentioned above in both binary and multi-label classification. For patients undergoing multiple procedures, the procedure with the highest level was scored. We assessed the RACHS-1 category, the ABC score, and the STS-EACTS mortality and morbidity scores as predictors of postoperative complications, each in a separate univariable logistic regression.
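The difference between micro and macro averaging can be seen in the short sketch below, which works on 0/1 label vectors. The function names and the synthetic labels are hypothetical, and macro-averaged recall and F1 are shown only to illustrate macro averaging (the study reports macro-AUC, which requires predicted scores rather than hard labels).

```python
def per_label_counts(y_true, y_pred):
    """Count true positives, false positives, and false negatives per label."""
    n_labels = len(y_true[0])
    tp, fp, fn = [0] * n_labels, [0] * n_labels, [0] * n_labels
    for t_row, p_row in zip(y_true, y_pred):
        for k, (t, p) in enumerate(zip(t_row, p_row)):
            tp[k] += int(t == 1 and p == 1)
            fp[k] += int(t == 0 and p == 1)
            fn[k] += int(t == 1 and p == 0)
    return tp, fp, fn


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def f1(tp, fp, fn):
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Synthetic labels: 4 patients, 3 complication classes
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]]

tp, fp, fn = per_label_counts(y_true, y_pred)

# Micro: pool the counts over all labels, then compute the metric once
micro_recall = recall(sum(tp), sum(fn))
micro_f1 = f1(sum(tp), sum(fp), sum(fn))

# Macro: compute the metric per label, then take the unweighted mean
macro_recall = sum(recall(a, c) for a, c in zip(tp, fn)) / len(tp)
macro_f1 = sum(f1(a, b, c) for a, b, c in zip(tp, fp, fn)) / len(tp)
```

Micro averaging is dominated by frequent labels, while macro averaging gives rare complication classes the same influence as common ones.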