Explainable machine learning framework for dynamic monitoring of disease prognostic risk

doi:10.21203/rs.3.rs-4549551/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Patients’ conditions continue to change after diagnosis, with each patient showing a different time course. Here, we propose a dynamic prognostic risk assessment framework based on longitudinal data during hospitalization, using coronavirus disease (COVID-19) as an example. We extracted electronic medical records of 382 COVID-19 cases treated at Tokyo Shinagawa Hospital between 27 January and 30 September 2020. Gradient boosting decision trees were used to predict the maximum clinical deterioration, including death, from the data at initial diagnosis. Random survival forests were then used to calculate a 7-day cumulative hazard function to dynamically assess each patient's mortality risk on each day of hospitalization. SurvSHAP(t) was applied to provide a time-dependent explanation of the contribution of each variable to the prediction. The prediction at initial diagnosis agreed well with the actual severity (area under the receiver operating characteristic curve, 0.717–0.970), but some cases showed discrepancies between actual and predicted prognosis. The dynamic mortality risk assessment during hospitalization could discriminate between deceased and surviving patients 1–2 weeks before the outcome. Early in hospitalization, C-reactive protein (CRP) was an important risk factor for mortality; in the middle period, peripheral oxygen saturation (SpO₂) gained importance; and immediately before death, platelets and β-D-glucan were the main risk factors. Dynamic risk assessment that considers heterogeneous risk factors and time-to-event is useful for the early detection of patients who deteriorate rapidly after hospitalization. This framework provides healthcare professionals with explainable real-time guidance for clinical decision-making during hospitalization.

COVID-19

initial risk classification

dynamic risk assessment

machine learning

survival analysis

During hospitalization, various adverse events and complications arise from the disease and its treatment and sometimes lead to death [1]. Therefore, healthcare professionals take great care not to overlook signs of deterioration in blood tests and vital sign measurements [2,3]. Owing to the increase in hospital admissions from coronavirus disease (COVID-19) and the aging population, healthcare professionals, who are responsible for numerous critically ill patients, face an intensified burden [4,5]. Predicting patient outcomes and issuing timely alerts can significantly reduce this burden and allow optimal medical resources to be prepared in advance. However, the heterogeneity of risk factors and signs of deterioration makes uniform risk assessment challenging, even for patients with the same disease [6]. Taking COVID-19 as an example, various causes of death have been reported, including pneumonia, organ failure such as heart and renal failure, and exacerbation of pre-existing diseases [7]. To promptly detect the various types of deterioration that may lead to death, it is desirable to develop a personalized monitoring method that tracks changes in multiple prognostic biomarkers over time.

Recent reports indicate that machine learning, utilizing electronic health records, can accurately predict individual deterioration and mortality, thereby aiding healthcare professionals in clinical decision-making [8,9]. In particular, deep learning models, applied to extensive longitudinal intensive care unit (ICU) data, have demonstrated the ability to dynamically assess the risks of mortality and acute kidney injury [10,11]. While predictive models using machine learning and deep learning can handle large volumes of heterogeneous clinical information effectively, their “black box” nature — which obscures how decisions are made — restricts their acceptance in clinical practice [12]. Recent algorithms that explain and visualize the rationale behind individual predictions are likely to facilitate healthcare professionals’ acceptance of machine learning models [13,14]. Another important aspect of the predictive model for deterioration and mortality is the time horizon of the prediction. There is a considerable time lag between when the data is acquired and when patients deteriorate or die, compounded by the individual variability in each patient’s disease course [15]. Establishing an appropriate time horizon of the prediction is crucial for detecting serious events early and implementing preemptive interventions [10].

In this study, we first confirm the predictability of prognosis at the initial diagnosis with a machine learning model to predict the highest severity level a COVID-19 patient might reach, based solely on data from the initial diagnosis. We then propose a dynamic prognostic risk assessment framework that continuously evaluates hospitalized patients using their longitudinal data (Fig. 1). In the dynamic prognostic risk assessment, we implement machine learning survival analysis to account for the time lag between data acquisition and events. Survival analysis is a statistical framework for analyzing the duration until an event such as death or recurrence occurs [16]. Recently, several methods have been proposed to integrate machine learning into survival analysis that can consider the heterogeneity of a population and the interactions between variables [17-19]. We introduce Random Survival Forests (RSF), a survival analysis using random forest, to achieve both predictive performance and explainability. We further apply SurvSHAP(t) [20], an extension of SHapley Additive exPlanations, to our machine learning survival model to extract and visualize the factors impacting individual risk for mortality.

We propose a machine learning framework that provides explainable, dynamic prognostic assessments during hospitalization. This framework is designed to enhance healthcare professionals’ understanding of patient conditions, enabling precise treatment allocation and optimal use of medical resources.

2.1 Study design, setting, and population

This retrospective cohort study screened 382 patients with COVID-19 treated between January 2020 and September 2020 at the Department of Respiratory Medicine, Tokyo Shinagawa Hospital. Patients aged <18 years and those with unknown severity changes during hospitalization or unknown clinical outcomes were excluded from the analysis. The latter group comprised cases in which the prognosis was unknown because of hospital transfer or observation censoring, and patients for whom the point at which severity changed could not be sufficiently extracted from the records. Inpatients with a discrepancy of >4 days between the date of the first diagnosis and the date of admission were excluded from the prognostic screening but were included in the dynamic risk assessment. The included cases were randomly divided into training and validation datasets at a 2:1 ratio using stratified sampling to preserve the distribution of the severity outcomes. The same data split was used in all analyses.
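
The stratified 2:1 split described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code used in the study; the function name `stratified_split`, the seed, and the toy severity labels are assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Return (train_idx, valid_idx) preserving the label distribution.

    labels: one outcome label per patient (here, hypothetical severity scores).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train_idx, valid_idx = [], []
    for label in sorted(by_label):
        members = by_label[label]
        rng.shuffle(members)
        # Each stratum contributes about two thirds of its cases to training.
        n_train = round(len(members) * train_fraction)
        train_idx.extend(members[:n_train])
        valid_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(valid_idx)


# Hypothetical toy cohort: 30 mild, 9 moderate, and 6 fatal cases.
severity = [1] * 30 + [3] * 9 + [6] * 6
train_idx, valid_idx = stratified_split(severity)
```

Splitting within each severity level, rather than over the whole cohort at once, keeps rare outcomes such as death represented in both datasets.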

For risk classification at initial diagnosis, patients who had severity changes were included in the dataset. Overall, 201 outpatients and 161 inpatients were included. We extracted data from the electronic medical records, including 84 variables, such as symptoms, background information, and blood and urine biomarkers, for prognostic screening at initial diagnosis (Table S1).

For the dynamic prognostic risk assessment framework, the dataset included 182 inpatients with longitudinal data collected during hospitalization. From the electronic medical records, 72 variables, including blood and urine biomarkers, vital information, and background information during hospitalization, were extracted for dynamic risk assessment (Table S2). The median length of stay for inpatients was 11 days.

2.2 Severity classification

We adopted the oxygen-support status scores proposed by Grein et al. [21] to classify patient severity. The score is an ordinal scale of 1–6 based on the type of oxygen support: 1, discharged or not hospitalized; 2, room air; 3, low-flow oxygen support; 4, non-invasive intervention, including nasal high-flow oxygen therapy and/or noninvasive positive pressure ventilation; 5, invasive mechanical ventilation and/or ECMO; and 6, death. In addition to the original score, where 1 indicated discharge status, outpatients were also given a score of 1.
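
As an illustration, the six-level scale and the "highest severity reached" outcome can be encoded as below. The class and function names are hypothetical, and building one binary target per threshold (severity ≥2 through ≥6) is an assumption based on how the results are reported, not a confirmed detail of the implementation.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Ordinal oxygen-support status, as listed in Section 2.2."""
    DISCHARGED_OR_OUTPATIENT = 1
    ROOM_AIR = 2
    LOW_FLOW_OXYGEN = 3
    NONINVASIVE_SUPPORT = 4  # nasal high-flow oxygen and/or NPPV
    INVASIVE_OR_ECMO = 5
    DEATH = 6


def max_severity(daily_status):
    """Highest severity reached over a patient's recorded course."""
    return max(daily_status)


def binary_targets(peak):
    """Targets 'peak severity >= k' for k = 2..6, one per classifier."""
    return {k: int(peak >= k) for k in range(2, 7)}


# Hypothetical course: admitted on room air, escalated, then discharged.
course = [
    Severity.ROOM_AIR,
    Severity.LOW_FLOW_OXYGEN,
    Severity.NONINVASIVE_SUPPORT,
    Severity.LOW_FLOW_OXYGEN,
    Severity.DISCHARGED_OR_OUTPATIENT,
]
peak = max_severity(course)
targets = binary_targets(peak)
```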

2.3 Machine learning for prognostic classification at initial diagnosis

Gradient boosting decision trees (GBDT) [22,23] were used to predict the highest severity from data at initial diagnosis. GBDT is a tree-ensemble method related to random forest that uses boosting to improve prediction performance. Among the various GBDT implementations, the Python package LightGBM was used because of its superior computing speed. We used all variables with <15 % missing values, given that COVID-19 severity and mortality are affected by various factors. Although 265 biomarkers were available at the first visit, most were not measured in non-hospitalized patients. We selected the tests performed in more than half of the non-hospitalized patients, resulting in 53 biomarkers obtained from 4 days before to 1 day after the initial diagnosis. Missing values in the training data were imputed before learning to reduce bias, whereas prediction was performed on validation data that still contained missing values. Missing values were imputed using the missForest package in R (The R Project for Statistical Computing) [24]. As a pre-processing step, among variables whose absolute Spearman correlation coefficient with one another was ≥0.85 in the training data, only the variable most strongly correlated with severity was retained. Hyperparameter tuning by Bayesian optimization with cross-validation, using the Optuna framework, followed by training with the optimal parameters, was repeated 100 times. The ensemble average of the models obtained from these iterations was used for the final prediction. To evaluate the importance of variables in the prediction, we used Shapley Additive exPlanations (SHAP) values [13] for the validation data.
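
The correlation-based pre-processing step can be sketched as a greedy filter. This is one plausible reading of the procedure, written with the standard library only; the greedy ordering, the function names, and the toy data are assumptions rather than the study's implementation.

```python
from statistics import mean


def ranks(values):
    """Average ranks (tied values share the mean of their positions)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks.

    Assumes neither input is constant (otherwise the variance is zero).
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def drop_correlated(features, target, threshold=0.85):
    """Visit features from most to least target-correlated and keep one
    only if it is not highly correlated with an already kept feature."""
    strength = {n: abs(spearman(v, target)) for n, v in features.items()}
    kept = []
    for name in sorted(features, key=strength.get, reverse=True):
        if all(abs(spearman(features[name], features[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept


# Hypothetical training data: "crp_copy" is a monotone transform of "crp".
features = {
    "crp": [1, 2, 3, 4, 5, 6],
    "crp_copy": [2, 4, 6, 8, 10, 13],
    "age": [50, 30, 60, 20, 70, 40],
}
severity = [1, 1, 2, 3, 3, 6]
kept = drop_correlated(features, severity)
```

Computing the filter on the training data only, as the text specifies, avoids leaking information from the validation set into variable selection.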

2.4 Dynamic risk assessment of mortality during hospitalization

We used random survival forests (RSFs) [17] to evaluate patient mortality risk during hospitalization. The RSF calculates the final hazard function as an ensemble of hazard functions estimated by the individual survival trees. Like random forest, RSF is robust to outliers and enables an accurate risk assessment of event occurrence. To dynamically predict mortality risk, 72 variables were used in the analysis, including background factors, e.g., age and body mass index (BMI), as well as biomarkers and vital information measured over time during hospitalization. We used variables that were measured in most patients (>180 patients for biomarkers and >150 patients for background and vital information). Missing values were imputed by carrying the most recent value forward until the next measurement. If no earlier value was available, the median (for numerical variables) or mode (for categorical variables) of the values measured across all patients was applied. For BMI, height, and weight, missing values with no previous measurements were imputed using the sex-specific median values. The Python package scikit-survival, which contains an RSF implementation, was used for the analysis. As an index of dynamic mortality risk, we used the 7-day cumulative hazard function (CHF; i.e., the estimated probability of death within 7 days) calculated using the RSF. The 7-day CHF was calculated for each patient on each day of hospitalization.
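
The imputation rule for longitudinal measurements can be illustrated with a short sketch. The function name, the CRP values, and the choice to compute the fallback median over all observed values in the toy cohort are assumptions for the example; the study's exact handling may differ.

```python
from statistics import median


def impute_series(values, fallback):
    """Carry the last observation forward; use `fallback` before the first one.

    values: one entry per hospital day, None where nothing was measured.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last if last is not None else fallback)
    return filled


# Hypothetical CRP series for two patients, one entry per hospital day.
crp = {
    "patient_a": [None, 8.0, None, None, 3.5],
    "patient_b": [1.0, None, 2.0],
}
# Fallback value: median over all measured values in the toy cohort.
observed = [v for series in crp.values() for v in series if v is not None]
cohort_median = median(observed)
imputed = {pid: impute_series(s, cohort_median) for pid, s in crp.items()}
```

Because values are only ever carried forward, each daily record uses information that would have been available on that day, which matters for a model intended for real-time use.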

In clinical practice, it is important to identify the factors that increase the risk of mortality. The variable importance of the RSF models was calculated for the validation data using SurvSHAP(t) [20], which uses SHAP to provide a time-dependent explanation of the predicted survival function. SurvSHAP(t) allowed us to determine how each variable affects the survival function at each time point. We obtained the mean aggregated SurvSHAP(t) over all days of hospitalization, which summarizes how strongly each variable contributed to the predictions for each patient.

3.1 Patient demographics

The median age of the 382 patients with COVID-19 was 39 years; 233 were male and 149 were female (Table 1). Of the 51 inpatients who required oxygen, 30 required low-flow oxygen, three needed high-flow oxygen, eight were treated with invasive ventilation/ECMO, and 10 died.

3.2 Prognostic predictability of COVID-19 at initial diagnosis

To evaluate the predictability of COVID-19 prognosis at initial diagnosis, we constructed prognostic models using the following information available at initial diagnosis: symptoms, background information, and biomarkers. We predicted the maximum level of severity [21] that patients would reach. The AUC was 0.717 for predicting whether the severity would be ≥2, 0.878 for ≥3, 0.951 for ≥4, 0.952 for ≥5, and 0.970 for 6 (Fig. 2a). When we examined the concordance between the predicted probability and the actual severity outcome, we found that most patients with an actual severity of ≥3 had a high predicted probability of having a severity of ≥2, but patients with an actual severity of 1 or 2 showed a wide distribution from low to high predicted probability, indicating that these two groups could not be clearly distinguished (Fig. 2b, top panel). For the other predictions, the predicted probabilities and actual severity showed good concordance; however, in some cases the risk of severity was not properly determined, e.g., patients who ended up with mild disease but had a high predicted probability of becoming severe (Fig. 2b).
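
For readers less familiar with the metric, the threshold-wise AUC reported above can be computed as the probability that a patient who reached the target severity receives a higher predicted probability than one who did not. The sketch below uses hypothetical values and a standard-library implementation; it is not the evaluation code of the study.

```python
def auc(scores, labels):
    """AUC as the probability that a positive case outranks a negative one
    (ties count one half); equivalent to the Mann-Whitney U statistic."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical peak severities and predicted probabilities of severity >= 3.
peak_severity = [1, 1, 2, 2, 3, 4, 6]
prob_ge3 = [0.05, 0.40, 0.10, 0.30, 0.35, 0.90, 0.80]

# Dichotomize the ordinal outcome at the threshold being evaluated.
labels_ge3 = [int(s >= 3) for s in peak_severity]
auc_ge3 = auc(prob_ge3, labels_ge3)
```

In this toy example, one mild case with a predicted probability of 0.40 outranks one case that actually reached severity 3, which is the kind of discordance discussed for Fig. 2b.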

Subsequently, we evaluated the variable importance of each prediction. The presence of pneumonia was identified as the most important predictor of whether severity would be ≥2 (Fig. 2c, leftmost panel). Most hospitalized patients with a severity of ≥2 showed pneumonia at initial diagnosis, and all patients with a severity of ≥3 showed pneumonia (Fig. 2d). As the target severity level of prediction increased, the importance of symptoms decreased, and biomarkers such as the lymphocyte count, prothrombin time (PT), and C-reactive protein (CRP), creatinine, and amylase levels became the top predictors (Fig. 2c). BMI was also an important factor in predicting whether the severity was ≥4 or ≥5. Age was an important factor for all severity levels. Among the important predictors, we examined the distribution of age and biomarkers according to the actual severity prognosis and found that age and BMI were higher in patients with a severity of ≥3 than in others, and biomarkers such as PT, CRP, creatinine, red blood cell volume distribution width, and blood glucose values were higher in patients with a severity of ≥4 than in others (Fig. 2e). Lymphocyte counts, estimated glomerular filtration rate (eGFR)-creatinine, and albumin values were lower in more severe cases than in less severe cases, and platelet counts were specifically low in mortality cases (Fig. 2e). The amylase level showed a peculiar distribution, being low in patients with severity levels of 4, 5, and 6 and very high in some patients with a severity level of 6 (Fig. 2e).

3.3 Dynamic mortality risk assessment based on longitudinal data during hospitalization

Outcome screening based on information from the initial diagnosis was accurate but incomplete. Therefore, the patient's prognosis is not entirely determined at the initial diagnosis, and there is room for change in prognosis depending on the disease course and treatment after admission. When we investigated the changes in severity status after admission, we found that oxygen administration and noninvasive ventilation were initiated within 5 days after admission, whereas invasive ventilation was often introduced >5 days after admission; death occurred >20 days after admission (Fig. 3a). Additionally, many deaths occurred without invasive ventilation (Fig. 3a, top panel). This is because most mortality cases involved elderly individuals who were not eligible for invasive ventilation or ECMO even when their condition deteriorated. These observations suggest that during the few weeks between admission and death or discharge, patients undergo changes in their condition that cannot be fully assessed based on their oxygen-support status.

We then used RSF [17] to evaluate the mortality risk of the patients during hospitalization. For all four mortality cases in the validation dataset, an increase in mortality risk was observed approximately 1–2 weeks before the outcome, and the cumulative hazard function (CHF; i.e., the estimated probability of death within 7 days) exceeded approximately 0.3 at the time of death (Fig. 3b). Conversely, in patients recovering from invasive or noninvasive ventilation, the CHF increased at approximately 1 week after admission, as in the mortality cases, but then decreased, and the CHF rarely exceeded 0.2. In mild cases that progressed with oxygen administration or room air, there was little increase in CHF. Similar changes in CHF were observed in the training dataset, which included six deaths, suggesting that the predictive model generalizes between the training and validation datasets (Fig. 3c).
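
As a reading aid for the CHF values quoted above, the standard survival-analysis identity S(t) = exp(−H(t)) links a cumulative hazard H to an event probability 1 − exp(−H). The short sketch below applies this identity to a few CHF values; it assumes the identity holds for the RSF estimate and is not part of the study's pipeline. For small values the two quantities are close, which is consistent with reading the 7-day CHF as an approximate probability.

```python
import math


def death_probability(chf):
    """Convert a cumulative hazard H into an event probability 1 - exp(-H)."""
    return 1.0 - math.exp(-chf)


# CHF levels mentioned in the text (0.2 and 0.3) plus a low-risk value.
for chf in (0.05, 0.2, 0.3):
    print(f"CHF {chf:.2f} -> probability {death_probability(chf):.3f}")
```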

The average contribution of the variables to the RSF prediction for each patient was then evaluated using the mean aggregated SurvSHAP(t). In addition to background factors, such as age and BMI, blood test items that were measured multiple times during hospitalization, such as platelets, amylase, and β-D-glucan, had the highest importance in mortality prediction (Fig. 3d). Age, amylase level, and platelet count were identified as important predictors of mortality at initial diagnosis. Most of the important variables showed high contributions only in mortality cases; however, some, such as age and BMI, showed a high mean aggregated SurvSHAP(t) in milder cases, suggesting that they are nonspecific factors. Conversely, important blood test items such as β-D-glucan, platelets, and calcium (Ca) also contributed less in some mortality cases, suggesting heterogeneity of deterioration. In the training data, these predictors were also of high contribution, especially in mortality cases, although there was some shuffling of the rankings (Fig. 3e).

3.4 Explanation of the rationale for estimated mortality risk in severe cases

For the mortality risk assessment of COVID-19, a machine-learning model based on blood markers was proposed in Nature Machine Intelligence (NMI) 2020 [25]. This model was built using samples immediately prior to death or discharge, and lactate dehydrogenase (LDH), lymphocytes, and CRP were identified as key features. Mortality risk assessment using a decision tree based on these three features has been proposed in the study to facilitate clinical application (Fig. 4a). After about 10 days of hospitalization, the RSF and NMI models showed equivalent performance as measured by accuracy and F1-score. However, immediately following hospitalization, the RSF model outperformed the NMI model, indicating its superiority in early prognostic prediction (Fig. 4b and 4c).
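
The two comparison metrics can be computed from daily binary predictions as sketched below. How the RSF output was dichotomized for this comparison is not stated, so the CHF cut-off of 0.2, the variable names, and the toy values are purely illustrative assumptions.

```python
def accuracy_and_f1(predicted, actual):
    """Accuracy and F1-score for binary labels (1 = death, 0 = survival)."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    correct = sum(p == a for p, a in zip(predicted, actual))
    accuracy = correct / len(actual)
    # F1 is undefined when there are no true positives; report 0.0 instead.
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


# Hypothetical 7-day CHF values on one hospital day and the final outcomes.
chf_today = [0.02, 0.35, 0.10, 0.28, 0.01, 0.22]
died = [0, 1, 0, 1, 0, 0]
THRESHOLD = 0.2  # illustrative cut-off, not taken from the study

predicted = [int(h > THRESHOLD) for h in chf_today]
accuracy, f1 = accuracy_and_f1(predicted, died)
```

Repeating this calculation for each day since admission yields performance curves over time of the kind compared in Fig. 4b and 4c.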

To examine the factors contributing to mortality risk over time, we calculated SurvSHAP(t) daily for the severe cases included in the validation dataset. The patients and times had different combinations of factors associated with CHF changes (Fig. 4d). For example, in young patients #2 and #6, saturation of peripheral oxygen (SpO₂) was associated with an increase in CHF immediately after admission. However, the contribution of SpO₂ decreased with the application of ECMO/invasive ventilation. In contrast, in patients #3 and #4, who were elderly and not eligible for ECMO/invasive ventilation, the contribution of SpO₂ was higher >10 days after hospitalization. β-D-glucan had a higher contribution 1–2 weeks before death than at any other time in patients #3, #4, and #6. Platelets showed a high contribution in patients #4 and #6, and BMI also contributed to the mortality risk in these patients. Of note, in patient #4, BMI was measured for the first time on day 7, so there was a rapid increase in the contribution of BMI on day 7. Ca and blood amylase levels were elevated immediately before death, suggesting electrolyte abnormalities and multi-organ failure. CRP, which is also a major predictor of mortality at initial diagnosis, contributed immediately after hospitalization.

In this study, we developed a framework to assess the prognostic risks of individual patients with COVID-19. Although we were able to predict the prognosis of COVID-19 with reasonable accuracy based on symptoms and blood test results at initial diagnosis, the predicted risk and actual prognosis deviated for some patients, suggesting that the prognostic risk may change during hospitalization. Our proposed RSF model, which updates predictions at 1-day intervals using longitudinal data during hospitalization, showed consistently high performance, with an F1-score of >0.95 from immediately after hospitalization. Visualization of predictors using SurvSHAP(t) made the model explainable for each patient, and the predictors were common to both the training and validation data. Such a dynamic prognostic framework would provide healthcare providers with the real-time guidance they need to make clinical decisions during hospitalization.

The partial predictability of the prognostic model at initial diagnosis can be explained by key variables. CRP levels are important for identifying moderate and severe cases that may require oxygen administration. There was a clear difference in CRP levels between severity levels 3 and 4, but there was little difference among severity levels of ≥4 (Fig. 2e). The same was true for eGFR-creatinine and lymphocyte values, with a clear difference between severity levels 3 and 4 but little difference among higher severities. This suggests that inflammation reflected by CRP and lymphocytes and renal dysfunction reflected by eGFR-creatinine are entry points for severe disease, but further progression of severity and mortality are mainly influenced by other factors. Platelet and amylase levels are the most likely factors associated with the progression of disease severity and mortality, given their importance in prognostic models during hospitalization. Platelets were characteristically lower in a few mortality cases than in other severe cases, and amylase levels were very high in some mortality cases (Fig. 2e). Mortality cases characterized by these factors may be cases in which signs of mortality are already present at initial diagnosis. However, it should be noted that some deaths did not show such signs at initial diagnosis and are difficult to distinguish from severe or even mild cases.

Several RSF predictors that varied early in the case of death were identified. Platelet and PT values were among the top 10 early predictors of the RSF model for COVID-19 mortality. Patients with COVID-19 are prone to arterial and venous thrombosis, and in severe cases, disseminated intravascular coagulation (DIC) can complicate their condition and lead to death [26]. DIC was observed in only 0.6 % of COVID-19 survivors, whereas it occurred frequently in 71.4 % of mortality cases [27]. Our results support the idea that coagulation abnormalities, which may lead to DIC, are early predictors of COVID-19 mortality. Amylase has also been identified as an important predictor of the dynamic mortality risk. Amylase is mainly secreted by the pancreas and salivary glands. Previous studies have reported elevated blood amylase levels in patients with severe COVID-19 [28]. Blood amylase levels are regulated by the balance between amylase production and clearance, and elevated blood amylase levels indicate damage to the producing tissues or kidneys related to clearance. Damage to these tissues may be a key event in the death of patients with COVID-19. Changes in blood cell composition were included in the RSF predictors, but they were not specific for patients who died (Fig. 4). Platelets form platelet-neutrophil complexes when activated, triggering neutrophil release via neutrophil-extracellular traps (NETs) [29,30]. Neutrophilia is not an independent risk factor for mortality but may act as a predisposing factor for DIC, leading to death. β-D-glucan is not only an indicator of fungal infection but also an indicator of sepsis-related gastrointestinal leakage [31]. Even if there is no fungemia or bacteremia, blood β-D-glucan levels during COVID-19 infection induce NETs and may be associated with hypercytokinemia and severe inflammation [32]. By integrating these dynamically changing risk factors using RSF, we could accurately assess how close each patient was to death at each time point.

A previous study built models using data obtained immediately before outcomes such as death and hospital discharge and then applied them to other time points to predict prognosis [25]. Although this approach can identify factors directly associated with mortality and has good predictive accuracy immediately before the outcome, it has not performed well in early prognostic prediction. Even in a single disease, COVID-19, at least 10 factors are associated with death, and their importance changes over time, with CRP and SpO₂ being more important in the relatively early phases, while β-D-glucan, platelets, and calcium are more important just before death. In COVID-19, viral proliferation is thought to be the primary pathogenesis in the first few days after onset, while inflammatory response and coagulation abnormalities due to host immunity are thought to be the main pathogenesis from one week after onset [33]. This requires a decision to apply antiviral drugs and neutralizing antibodies in the early onset phase and steroids and anticoagulation therapy after the first week [34,35]. For clinical decision-making according to the phase of hospitalization, it is necessary to present the risk of mortality with different rationales for each patient and phase.

Our prognostic framework can be used for various aspects of disease prevention and treatment. The early prognostic prediction just after hospitalization, which screens for potentially severe cases with high sensitivity, will be applicable to estimate the number of hospital beds and the amount of medical equipment required. Additionally, owing to the recent shortage of hospital beds, the number of patients treated at home is increasing, and cases of sudden aggravation or death during home treatment have become a cause of concern. By using the RSF model to identify early signs of death, it is possible to reduce the number of deaths during home treatment and ensure safer home treatment. Coagulation factors, key predictors of the RSF model of mortality, are potent targets for early intervention. Heparin administration reportedly improves the prognosis of severe DIC in patients with COVID-19 [36]. The RSF model may be used to determine the appropriate timing of anticoagulation therapy and evaluate treatment effects over time.

This study had several limitations. First, the study cohort was small, and the data were obtained from a single hospital. Because machine learning is affected by the data size, the model will require updates as we expand this study to multiple centers and increase the sample size. Second, we included only patients with COVID-19 diagnosed in 2020. Therefore, it is necessary to examine whether these models can be applied to the newer COVID-19 strains and other infectious diseases. Yet, these data were obtained at a time when vaccination was rarely practiced and can be considered valuable data unaffected by vaccination. Third, the study also considers the daily records of the same patient as independent data for training, which is a technical challenge for RSF. Considering the risk in the previous day may allow for more accurate predictions.

We proposed an explainable dynamic prognostic framework for COVID-19 that updates the risk of mortality on a day-by-day basis and provides healthcare professionals with real-time guidance for clinical decision-making. Implementing this framework in clinical practice will require more extensive validation to ensure its general applicability, but it has the potential to enable early identification of patients at risk of death or deterioration.

GBDT

Gradient Boosting Decision Trees

SHAP

Shapley Additive exPlanations

RSF

Random Survival Forests

SurvSHAP(t)

SHapley Additive exPlanations for survival data

CHF

cumulative hazard function

ECMO

extracorporeal membrane oxygenation

Conference Presentation:

This study was presented as a poster at the 6th Workshop on Virus Dynamics in July 2023.

Acknowledgments:

We would like to thank Elsevier Language Editing services for editing to eliminate possible grammatical or spelling errors and to conform to correct scientific English.

Funding:

This study was funded by the Japan Society for the Promotion of Science KAKENHI grants (JP20K21837 and JP21K02356 to T.I.), Japan Science and Technology Agency (JST) Moonshot R&D Grant (JPMJMS2025 to E.K.), JST CREST Grant (JPMJCR20H4 to E.K.), and Japan Agency for Medical Research and Development grants (JP21wm0325007, JP20fk0108412, JP20fk0108413, 243fa627003h0003 and JP21gm5010003 to E.K.). The funder played no role in study design, data collection, analysis and interpretation of data, or the writing of this manuscript.

Conflict of Interest Statement:

All authors declare no financial or non-financial competing interests.

Data Availability:

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Code availability:

The underlying code for this study and training/validation datasets is not publicly available but may be made available to qualified researchers on reasonable request from the corresponding author.

Author Contributions:

T.I., M. Shinoda, K.S., E.K., and M. Shinkai designed the study. M. Shinoda, K.A., S.O., K.K., and M.S. collected the data. T.I., M.O., and E.K. participated in the data analysis. All authors interpreted the data, participated in the writing and critical review of the manuscript, and approved the final version.

Ethics Approval:

This study was approved by the local institutional review board of RIKEN and Tokyo Shinagawa Hospital (approval number: 20-A-06). This study was conducted in accordance with the 1964 Declaration of Helsinki and its later amendments, or comparable ethical standards. The requirement for informed consent was waived because of the retrospective nature of this study.

