Patient population
This retrospective study included all fetuses diagnosed with CHD from January 2012 to December 2021 at a single tertiary centre in Canada (Sunnybrook Health Sciences Centre). The hospital performs approximately 20,000 prenatal ultrasounds and up to 500 fetal echocardiograms per year. Peri- and postnatal outcomes were collected from Sunnybrook as well as from the affiliated obstetric and neonatal centres (Mount Sinai Hospital, Michael Garron Hospital and The Hospital for Sick Children, all in Toronto), depending on the ultimate location of delivery and postnatal care. The study was approved by the Research Ethics Board of all participating institutions as well as by Johns Hopkins University, where the analysis was performed. The requirement for individual patient consent was waived given the retrospective study design.
Clinical characteristics and neonatal outcomes (Table 1) were collected through chart review. The severity of congenital heart disease was defined according to the Hoffman criteria as mild, moderate or severe (See Appendix A) [19].
Machine learning algorithms were then developed to predict the following outcomes of interest: 1) in utero demise/stillbirth or death within 72 hours of birth despite planned active care; 2) need for high-level neonatal care (delivery at a tertiary care hospital, prostaglandins, neonatal intensive care or other intensive care admission, mechanical ventilation, or neonatal surgical or catheter intervention within 30 days of life); and 3) favourable postnatal outcome, defined as survival without severe developmental delay at last follow-up, as extracted from the patient's chart.
Predictive features and clinical outcomes
The feature set consisted of 70 potential predictors. Of these, 62 were included in all three models and comprised demographics, comorbidities, medical management, and fetal structural findings, including cardiac anatomy, from the fetal echocardiogram. An additional 7 predictors, including labour induction, mode of delivery, sex, gestational age at birth, birth weight and Apgar score, were used in the models predicting the need for high-acuity neonatal care and favourable outcome (69 predictors total for these two models). Finally, the model predicting favourable (versus adverse) postnatal outcome also included information on postnatal cardiac intervention (surgical or catheter-based) in addition to the 69 variables listed above, for a total of 70 predictors, as illustrated in the sketch below.
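As an illustration only, the three predictor sets can be thought of as nested lists; the variable names below are placeholders and do not reproduce the study's actual 70-variable list.

```python
# Illustrative sketch of the three nested predictor sets (placeholder names only).
CORE_PREDICTORS = ["maternal_age", "chd_severity"]               # stands in for the 62 shared predictors
PERINATAL_PREDICTORS = ["labour_induction", "mode_of_delivery",
                        "sex", "gestational_age_at_birth",
                        "birth_weight", "apgar_score"]           # stands in for the 7 perinatal predictors
POSTNATAL_INTERVENTION = ["postnatal_cardiac_intervention"]

FEATURES_DEMISE = CORE_PREDICTORS                                            # 62 predictors
FEATURES_HIGH_ACUITY = CORE_PREDICTORS + PERINATAL_PREDICTORS                # 69 predictors
FEATURES_FAVOURABLE_OUTCOME = FEATURES_HIGH_ACUITY + POSTNATAL_INTERVENTION  # 70 predictors
```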
Data preprocessing
Missing values imputation
For every ML model we generated an analysis dataset restricted to patients with a recorded value for the corresponding outcome. Three separate analysis datasets were therefore constructed, each matched to its outcome and associated set of predictors. To address missing information within each dataset, we employed a predictive imputation method that exploits similarity between patients [20]. An iterative imputation algorithm was run for up to 50 cycles; in each cycle, a decision tree regressor was fit to the observed data to model relationships between the predictors and estimate the missing measurements.
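A minimal sketch of this step is given below, assuming scikit-learn's IterativeImputer with a DecisionTreeRegressor as the estimator; the paper does not name the specific implementation, so the library choice and parameter settings should be read as assumptions.

```python
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (activates IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor

def impute_predictors(predictors: pd.DataFrame) -> pd.DataFrame:
    """Iteratively impute missing predictor values (up to 50 cycles).

    Assumes predictors are numerically coded at this stage.
    """
    imputer = IterativeImputer(
        estimator=DecisionTreeRegressor(random_state=0),  # learns relationships between predictors
        max_iter=50,                                      # up to 50 imputation cycles
        random_state=0,
    )
    imputed = imputer.fit_transform(predictors)
    return pd.DataFrame(imputed, columns=predictors.columns, index=predictors.index)
```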
After imputation of missing values in the three analysis datasets, predictor variables with more than two categories were transformed using one-hot encoding.
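The encoding step could look like the sketch below, using pandas.get_dummies; the column names and categories are hypothetical examples, not the study's actual variables.

```python
import pandas as pd

# Toy analysis dataset standing in for one of the three imputed datasets.
analysis_df = pd.DataFrame({
    "mode_of_delivery": ["vaginal", "caesarean_planned", "caesarean_emergency"],
    "birth_weight_kg": [3.1, 2.8, 3.4],
})

# One-hot encode predictors with more than two categories; binary and
# continuous predictors are left unchanged.
encoded = pd.get_dummies(analysis_df, columns=["mode_of_delivery"])
print(encoded.head())
```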
Tree-based machine learning model induction and evaluation
The XGBoost tree-based ML algorithm [21] was applied to each of these datasets. XGBoost classifies patients into one of two outcome groups and can capture non-linear relationships between the predictors and each outcome. Model performance was evaluated using the area under the receiver-operating characteristic curve (AUC). The XGBoost algorithm underwent hyperparameter tuning [22] using 5-fold cross-validation (CV) with Bayesian optimization [23] over a predefined search space to identify the combination of parameters that maximized the AUC.
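The sketch below shows one way to set up such a search, assuming scikit-optimize's BayesSearchCV wrapped around an XGBClassifier; the optimization library, search ranges and number of iterations are illustrative assumptions rather than the study's exact configuration.

```python
import numpy as np
from skopt import BayesSearchCV
from skopt.space import Integer, Real
from xgboost import XGBClassifier

def tune_xgboost(X: np.ndarray, y: np.ndarray) -> BayesSearchCV:
    """Bayesian hyperparameter search for XGBoost, maximizing AUC under 5-fold CV."""
    search_space = {                      # illustrative ranges, not the study's exact grid
        "max_depth": Integer(2, 8),
        "learning_rate": Real(0.01, 0.3, prior="log-uniform"),
        "n_estimators": Integer(50, 500),
        "subsample": Real(0.5, 1.0),
        "colsample_bytree": Real(0.5, 1.0),
    }
    search = BayesSearchCV(
        estimator=XGBClassifier(eval_metric="logloss"),
        search_spaces=search_space,
        scoring="roc_auc",                # optimize the AUC
        cv=5,                             # 5-fold cross-validation
        n_iter=50,                        # Bayesian optimization iterations (illustrative)
        random_state=0,
    )
    search.fit(X, y)                      # best parameters available in search.best_params_
    return search
```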
We employed the SHAP (SHapley Additive exPlanations) method [25] to gain insight into the influence of individual features on the model's predictions. SHAP values were calculated for each predictor across all patients, and the impact of each feature on the model's log-odds prediction was illustrated with a beeswarm plot. Features with larger absolute SHAP values contribute more to the model's prediction and are displayed further from the centre, regardless of whether they increase or decrease the predicted outcome.
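A minimal sketch of this analysis with the shap package follows; the synthetic data and fitted classifier are placeholders standing in for the study's trained models and analysis datasets.

```python
import numpy as np
import shap
from xgboost import XGBClassifier

# Toy data standing in for one analysis dataset (illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = XGBClassifier(eval_metric="logloss").fit(X, y)

explainer = shap.TreeExplainer(model)   # exact SHAP values for tree ensembles
shap_values = explainer(X)              # per-patient, per-feature contributions (log-odds scale)
shap.plots.beeswarm(shap_values)        # features ordered by mean |SHAP value|
```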
To estimate the 95% confidence interval (CI) for the AUC, bootstrapping was performed with 500 resamples per fold across the 5-fold CV, yielding a total of 2500 bootstrap samples per model. The CI was then derived from the standard error of the distribution of bootstrapped AUC values. All analyses were implemented in Python version 3.9.12.
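The sketch below illustrates this procedure for a single fold and the subsequent normal-approximation interval; the function names are hypothetical and the use of a 1.96 standard-error interval is an assumption consistent with the description above.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def bootstrap_aucs(y_true: np.ndarray, y_prob: np.ndarray,
                   n_boot: int = 500, seed: int = 0) -> np.ndarray:
    """Bootstrapped AUC values for one fold's held-out predictions.

    Assumes y_true contains both outcome classes.
    """
    rng = np.random.default_rng(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, n)               # resample patients with replacement
        if len(np.unique(y_true[idx])) < 2:       # AUC requires both classes in the resample
            continue
        aucs.append(roc_auc_score(y_true[idx], y_prob[idx]))
    return np.asarray(aucs)

# Pooling the 5 folds (500 resamples each) gives 2500 bootstrap AUCs per model;
# the 95% CI then uses the standard error of that pooled distribution.
def auc_ci_95(pooled_aucs: np.ndarray, auc_point_estimate: float) -> tuple:
    se = pooled_aucs.std(ddof=1)
    return auc_point_estimate - 1.96 * se, auc_point_estimate + 1.96 * se
```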