Artificial Intelligence to predict mortality in COVID-19 hemodialysis patients relying on demographics and clinical data

doi:10.21203/rs.3.rs-1574373/v1

Research Article

This work is licensed under a CC BY 4.0 License

Version 1

Background: Hemodialysis patients represent a major proportion of the chronically ill population worldwide with an increased risk for severe forms of COVID-19.

Methods: In this multicentre retrospective cohort study, we used the Brazilian Society of Nephrology's database. The study comprised two steps: a training-test subset and a temporal validation subset. To predict mortality in COVID-19 hemodialysis patients, we evaluated three algorithms: Random Forest (RF), Support Vector Machine, and TabNet. Performance was compared using the area under the receiver operating characteristic curve (AUROC).

Results: The overall mortality was 21.46%. The prediction models performed with an AUROC of at least 0.69. In the model containing the six most important variables according to the mRMR algorithm to predict 90-day mortality, the RF model showed an AUROC of 0.80, a sensitivity of 0.62, and a specificity of 0.79.

Conclusion: A large-scale temporal validation demonstrated good performance in predicting mortality and potential for identifying patients at risk of death.

COVID-19

Hemodialysis

Hemodialyses

Chronic Kidney Failure

Machine Learning

Deep Learning

In December 2019, a novel coronavirus named severe acute respiratory syndrome (SARS) coronavirus 2 (SARS-CoV-2) emerged in Wuhan, China, and spread rapidly around the globe. By March 2020, the World Health Organization (WHO) had declared the disease caused by this coronavirus (COVID-19) a pandemic [1]. The clinical presentation of SARS-CoV-2 infection ranges from asymptomatic to critical disease with multi-organ failure and death.

The prevalence of chronic kidney disease (CKD) is about 9.1% worldwide. The disease, which is frequently associated with comorbidities and high mortality risk, affected 687.5 million patients globally according to 2017 estimates, representing a substantial global economic burden [2–3]. The Brazilian Dialysis Census estimated 139,691 patients on a chronic dialysis program by July 2019, ranking the country third in the world in the number of patients on kidney replacement therapy by dialysis [4].

Patients undergoing hemodialysis (HD) have an abnormal immune response to viral infections due to their uremic state [5–6]. Also, they often present major comorbidities such as cardiovascular disease, diabetes, and cerebrovascular disease [7–10], and are at risk of outbreaks or cross-contamination in the dialysis centers [11–12]. Accordingly, they have an increased risk for the severe forms of COVID-19 infection and a high COVID-19 related mortality [13–15].

Machine Learning (ML) algorithms have been used over the last few decades to predict and classify all sorts of data, including medical data. As they get access to new data, they adapt internal parameters to improve their performance, hence the “learning”. When faced with unique challenges, ML algorithms can aid physicians in predicting mortality by analyzing massive quantities of data quickly and with minimal human intervention. Additionally, in the context of risk stratification, ML models can achieve better performance than traditional prediction models.

Our research team wondered if it was possible to predict mortality in COVID-19 HD patients without the need for laboratory tests or invasive physiological data, using ML models. The present study provides an unprecedented attempt to develop and test the capabilities of ML models to predict mortality in patients with COVID-19 on HD relying only on demographics/comorbidities and clinical findings.

Data source and preparation

This is a multicenter observational retrospective cohort study that resorted to different ML algorithms to predict mortality in COVID-19 HD patients.

To properly develop and employ ML models, the data used for training and testing should reflect as closely as possible the setting in which the model will be used after validation. The database used in this research was provided by the Brazilian Society of Nephrology (BSN) [15]. It was built through an online form made available to 52 BSN-affiliated dialysis centers (http://censo-sbn.org.br/reglgCovid19). In case of unclear or ambiguous information, a researcher of the BSN Registry would contact the dialysis center for data validation.

We selected the records of adult patients (≥ 18 years) with kidney failure undergoing HD for at least 3 months, registered between August 2020 and September 2021, with a confirmed diagnosis of COVID-19 (by PCR, rapid antigen testing, or serology). A total of 1809 records were retrieved. The database contains sociodemographic data and information related to hemodialysis and COVID-19 infection. Despite BSN efforts, some data were still missing (Supplementary material, Table S2). Instead of removing those records, we handled them by imputing a value of -1 for each missing variable. This decision was taken because in real-life scenarios not all variables are available or filled in.
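
The sentinel-based handling of missing values described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the study's code; the field names and the example record are hypothetical.

```python
MISSING = -1  # sentinel value described in the text


def fill_missing(record, fields):
    """Return a copy of `record` in which absent or None fields are set to MISSING."""
    filled = dict(record)
    for name in fields:
        if filled.get(name) is None:
            filled[name] = MISSING
    return filled


# Hypothetical field names and record, for illustration only.
FIELDS = ["age", "diabetes", "hypertension", "dyspnea"]
patient = {"age": 63, "diabetes": 1, "dyspnea": None}
completed = fill_missing(patient, FIELDS)
```

Keeping incomplete records with an explicit sentinel lets tree-based models treat "not recorded" as its own category, at the cost of mixing a code value with real measurements for numeric fields such as age.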

We selected the patients' sociodemographic characteristics (such as age, gender, and race), comorbidities, clinical findings (CF) of the COVID-19 infection at the time of diagnosis, and mortality. Aiming for simpler applicability, we narrowed our choice of variables to two main groups: demographics/comorbidities (D/C) and CF. Using subsets of the database, three scenarios were evaluated: only the D/C data, only the CF data, and the combination of both D/C and CF data (Table 1). Additional information on the variables composing the two groups can be found in the Supplementary material.

Data availability

The main data supporting the results of this study are available in the manuscript and its Supplementary Information. The raw datasets from the BSN-affiliated dialysis centers cannot be made publicly available due to recent Brazilian data protection laws, which guarantee the anonymity and protection of patients' medical and personal information. Anonymized data are available for research purposes from the corresponding authors on reasonable request.

Table 1

General characteristics of the subsets of the database
	Training set n = 1196	Test set n = 300	Temporal-validation set n = 312
Demographics/Comorbidities (D/C)
Age, years	59 ± 15	58 ± 15	60 ± 14
Male sex	717 (60.0)	176 (58.86)	181 (58.01)
Skin color
White	624 (52.21)	159 (53.18)	165 (53.18)
Nonwhite	571 (47.79)	140 (46.82)	147 (46.82)
Obesity (BMI > 30kg/m²)	211 (17.76)	45 (15.05)	58 (18.83)
Diabetes	482 (42.32)	120 (41.10)	94 (30.13)
Previous stroke	43 (3.78)	14 (4.79)	9 (2.88)
Hypertension	1050 (87.64)	252 (86.30)	223 (71.47)
Positive HIV serology	8 (0.70)	4 (1.37)	1 (0.32)
Heart failure	177 (15.54)	37 (12.67)	33 (10.58)
Chronic liver disease	18 (1.58)	2 (0.68)	3 (0.96)
Chronic obstructive pulmonary disease	31 (2.72)	10 (3.42)	8 (2.56)
Peripheral arterial obstructive disease	71 (6.23)	17 (5.82)	11 (3.53)
Previous myocardial infarction	68 (5.97)	11 (3.77)	29 (9.29)
Previous or current neoplasia	38 (3.34)	7 (2.40)	23 (7.37)
Current smoking	26 (2.82)	8 (2.74)	6 (1.92)
Previous smoking	83 (7.29)	21 (7.19)	21 (6.73)
Use of RAAS inhibitors	555 (48.73)	129 (44.18)	189 (60.58)
Clinical findings (CF)
Fever	690 (57.69)	170 (56.67)	200 (64.10)
Cough	646 (54.01)	151 (50.33)	184 (58.97)
Dyspnea	460 (38.46)	108 (36.0)	137 (43.91)
Fatigue and malaise	352 (29.43)	82 (27.33)	136 (43.59)
Myalgia	291 (21.33)	79 (26.33)	87 (27.88)
Gastrointestinal symptoms	193 (16.14)	50 (16.67)	33 (10.58)
Altered mental status	47 (3.93)	6 (2.0)	16 (5.13)
No signs or symptoms	138 (11.54)	50 (16.67)	24 (7.69)
Values are n (%) or mean ± SD
BMI: body mass index; HIV: human immunodeficiency virus; RAAS: renin-angiotensin-aldosterone system.

We divided the dataset into two main subsets, one for training-test and another for temporal validation. The training-test subset was randomly split into a training set (80%) and a testing set (20%).
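
The two-level split can be sketched as follows: records registered after a cutoff date form the temporal-validation subset, and the earlier records are shuffled and divided 80/20. This is a simplified illustration with toy records; the cutoff date, the `registered` field, and the seed are assumptions, since the text does not state them.

```python
import random
from datetime import date, timedelta


def split_records(records, cutoff, test_fraction=0.2, seed=0):
    """Return (train, test, temporal_validation) lists of records."""
    development = [r for r in records if r["registered"] <= cutoff]
    temporal = [r for r in records if r["registered"] > cutoff]
    shuffled = development[:]
    random.Random(seed).shuffle(shuffled)  # reproducible random split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test], temporal


# Toy data: 100 records registered on consecutive days (hypothetical).
start = date(2020, 8, 1)
records = [{"id": i, "registered": start + timedelta(days=i)} for i in range(100)]
cutoff = start + timedelta(days=79)  # hypothetical cutoff date

train, test, temporal = split_records(records, cutoff)
```

Because the temporal subset is selected by date before any shuffling, no record collected after the cutoff can leak into training.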

We evaluated three supervised ML algorithms: Random Forest (RF), Support Vector Machine (SVM), and a Deep Neural Network (DNN) for tabular data, TabNet. Most ML algorithm frameworks are available in modern programming languages. We used Python (version 3) and the scikit-learn library for coding [16].

Random Forest builds an ensemble of decision trees, each slightly different from the others, and the final output is decided by a vote over the outputs of the individual trees. It inherits the benefits of the decision tree algorithm while being less prone to overfitting [17].

Support Vector Machine is based on statistical learning theory and searches for hyperplanes that divide the data into two or more classes. It creates new dimensions from the input ones to draw more distinct boundaries between classes, so it performs well on high-dimensional data (multiple variables), such as our dataset. A downside is that it requires careful preprocessing of the data and tuning of the parameters [18].

Deep neural networks have become popular in the last few years. In 2019, Google researchers proposed TabNet, an application of DNNs to tabular data built from three kinds of blocks. The first, feature transformers, process the features over several steps and work to eliminate irrelevant ones. The second, attentive transformers, select the most relevant features at each step through a weighted combination of the encoded input vectors. Finally, the third block, the mask, ensures that the model focuses on the most relevant features generated by the attentive transformers [19].

Model optimising

During the training step, we elected the area under the ROC curve (AUROC) as the optimizing metric for all models to assess the overall classification performance. Other metrics such as sensitivity and specificity were opportunely used as complementary information.
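
For readers less familiar with these metrics, the sketch below computes them from scratch on toy data. The AUROC is obtained as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one (ties count one half); the 0.5 decision threshold and the example labels and scores are assumptions for illustration.

```python
def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (1 = died) and predicted probabilities, for illustration only.
y_true = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]

area = auroc(y_true, scores)
sens, spec = sensitivity_specificity(y_true, scores)
```

Unlike sensitivity and specificity, the AUROC does not depend on a decision threshold, which is one reason it is a convenient optimization target.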

We performed 5-fold stratified cross-validation to reduce overfitting and improve the model's generalization. The stratified version was chosen to minimize the impact of the unbalanced ratio of the outcomes (4.66:1).
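
A minimal sketch of stratified k-fold splitting follows, written in plain Python so that the mechanics are visible; in practice a library routine such as scikit-learn's `StratifiedKFold` would normally be used. The toy labels (a 4:1 class ratio) and the seed are assumptions.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) with class proportions kept per fold."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    rng = random.Random(seed)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold gets its share.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        yield train, test


# Toy outcome vector: 20 deaths and 80 survivors (hypothetical).
labels = [1] * 20 + [0] * 80
folds = list(stratified_kfold(labels, k=5))
```

Each of the five test folds here contains 4 deaths and 16 survivors, so every validation round sees the same outcome ratio as the full sample.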

Machine learning models have some internal parameters that are set before training and do not change during training-test or temporal validation. These values have a significant impact on the model. They are known as hyperparameters, and much research has been done on finding the best values with less computational time. To tune the hyperparameters, we used the Tree-structured Parzen Estimator algorithm as implemented in the Optuna library [20]. The chosen hyperparameter ranges are available in the Supplementary material.

Following the training-test step, we applied the mRMR (Maximum Relevance — Minimum Redundancy), a feature selection algorithm that identifies the most relevant features among all previously selected while ignoring the redundant ones - hence its name [21]. After that, we used the SHAP (SHapley Additive exPlanations) algorithm to estimate the impact of each of the six features previously identified by the mRMR algorithm [22].
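
The greedy logic of mRMR can be illustrated on toy binary features. The sketch below assumes the common "difference" form of the criterion (mutual information with the outcome minus mean mutual information with the features already selected); the text does not state which variant was used, and the feature names and values are invented for the example.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr(features, target, k):
    """Greedily pick k feature names by relevance minus mean redundancy."""
    relevance = {name: mutual_information(col, target) for name, col in features.items()}
    selected, remaining = [], list(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: eight patients, hypothetical feature values.
death = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "dyspnea": [1, 1, 1, 0, 0, 0, 0, 0],
    "dyspnea_duplicate": [1, 1, 1, 0, 0, 0, 0, 0],  # fully redundant copy
    "age_over_70": [1, 0, 1, 1, 1, 0, 0, 0],
    "noise": [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the outcome
}
top_two = mrmr(features, death, k=2)
```

In this toy case the duplicated column is as relevant as the original but is passed over in the second round because it adds no new information, which is the behavior the method's name refers to.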

Finally, we proceeded with the validation step with the records collected after the end date chosen for the training-test step. In this way, a temporal property was added to the validation. To simplify and validate our findings so far, the validation used the best performing ML model and only the six variables selected by the mRMR.

The bootstrap method is a resampling technique used to estimate population statistics by sampling a dataset with replacement and performing ML prediction and evaluation on each resample. For the bootstrap validation, the AUC was computed as the mean of the AUCs of 1000 iterations, some of which fell outside the 95% CI. The confidence interval was calculated using DeLong's method with concatenated predictions [23].
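
The resampling step can be sketched as below. Note one deliberate substitution: the study computed its confidence interval with DeLong's method, whereas this sketch reports a simpler percentile interval of the bootstrap distribution. The toy labels, scores, and seed are assumptions.

```python
import random


def auroc(y_true, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc(y_true, scores, n_iterations=1000, seed=0):
    """Return (mean AUROC, (lower, upper)) over resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_iterations:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:  # AUROC is undefined without both classes
            continue
        aucs.append(auroc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int(0.025 * len(aucs))]
    upper = aucs[int(0.975 * len(aucs)) - 1]
    return sum(aucs) / len(aucs), (lower, upper)


# Toy labels and predicted probabilities (hypothetical).
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
scores = [0.8, 0.3, 0.6, 0.7, 0.2, 0.9, 0.4, 0.1, 0.35, 0.5, 0.25, 0.15]
mean_auc, (ci_low, ci_high) = bootstrap_auroc(y_true, scores)
```

Individual resamples can fall outside the reported interval by construction, since the interval covers only the central 95% of the bootstrap distribution.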

An overview of the study, from the data source to the use of different ML models in the training-test and validation subsets, is depicted in Fig. 1.

A flowchart of the handling of the database is in Fig. 2. In the training-test subset, 1497 patients from 68 different dialysis centers were included in the final analysis (1134 diagnosed by RT-PCR testing of a nasopharyngeal swab, 124 by rapid antigen testing, and 239 by serologic tests). Of the 312 patients comprising the validation subset, 307 were diagnosed by RT-PCR, 1 by rapid antigen testing, and 5 by serologic tests.

The clinical characteristics of the patients at the time of diagnosis are in Table 1. The mean age of the patients was 59 years (standard deviation of 15 years), 893 (59.7%) patients were men and 601 (40.4%) women.

The average duration of dialysis was 5 years. As to the comorbidities, a history of diabetes was present in 696 (38.49%), cardiovascular disease in 1525 (84.34%), and chronic lung disease in 49 (2.71%), with 1682 (93.03%) having at least one major coexisting illness recorded. Overall, the most common comorbidity was heart failure affecting 247 (13.66%) patients. Two-hundred and twelve (11.72%) infected patients were asymptomatic.

Regarding the management of the disease, 1286 (71.68%) patients were treated on an outpatient basis, 522 (28.31%) needed hospitalization, and 489 (27.04%) were admitted to an intensive care unit (ICU). The overall mortality was 21.46% (388). Other epidemiological or medical data related to the dataset can be found in Table 1.

The performance of the mortality prediction models was assessed in three datasets. In the first dataset, based solely on demographics/comorbidities, the best model was the RF, with an AUROC of 0.69, a sensitivity of 0.57, and a specificity of 0.70. Corresponding values for the SVM and TabNet were 0.65, 0.60 and 0.70, and 0.59, 0.28 and 0.89, respectively (Table 2).

In the subset where only the clinical findings were taken into account, TabNet exhibited the best performance, with an AUROC of 0.75, sensitivity of 0.67, and specificity of 0.75. The corresponding values for the SVM and RF were 0.71, 0.71 and 0.72, and 0.66, 0.51 and 0.71, respectively (Table 2).

In the combined (aggregated) dataset containing the six most important variables to predict 90-day mortality according to mRMR (dyspnea, altered mental status, diabetes, asymptomatic status, hypertension, and age), the RF had the best performance, with an AUROC of 0.80, sensitivity of 0.62, and specificity of 0.79. In this setting, the corresponding values for the SVM and TabNet were 0.71, 0.71 and 0.72, and 0.76, 0.72 and 0.78, respectively (Table 2).

We next interpreted the six most important variables using an RF model in the combined dataset according to their SHAP values. Dyspnea was found to be the most important variable, increasing the log-odds by around 0.10 (Fig. 3A). Age was the second most important variable, with values above 70 years implying a log-odds of around 0.5, and altered mental status, the fifth most important, had a higher log-odds, around 0.15, whereas the absence of these features showed a log-odds near 0. The SHAP values of the other variables are in Fig. 3A.
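
To make the notion of a per-feature log-odds contribution concrete, the sketch below computes exact Shapley values for a tiny hypothetical model by enumerating all feature subsets. It is not the study's fitted RF, and it is not how the SHAP library handles tree ensembles (which relies on specialized algorithms); the coefficients, feature names, and all-zero baseline are assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `model` at `instance` relative to `baseline`."""
    names = list(instance)
    n = len(names)

    def value(present):
        # Features outside `present` are replaced by their baseline value.
        x = {m: (instance[m] if m in present else baseline[m]) for m in names}
        return model(x)

    phi = {}
    for name in names:
        others = [m for m in names if m != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                present = set(subset)
                total += weight * (value(present | {name}) - value(present))
        phi[name] = total
    return phi


def toy_log_odds(x):
    """Hypothetical log-odds model with an interaction term."""
    return -2.0 + 0.8 * x["dyspnea"] + 0.6 * x["age_over_70"] + 0.4 * x["dyspnea"] * x["age_over_70"]


patient = {"dyspnea": 1, "age_over_70": 1}
reference = {"dyspnea": 0, "age_over_70": 0}
contributions = shapley_values(toy_log_odds, patient, reference)
```

The contributions sum to the difference between the patient's log-odds and the reference log-odds, and the interaction term is shared equally between the two features involved.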

The next step of our study comprised the assessment of mortality. First, we estimated the probability of 90-day mortality both in the training-test subset (Fig. 3C) and in the temporal validation one (Fig. 3D). We noticed no overlapping of the 80% confidence interval of the probability strata during this period.

To better explore mortality interactions in our model, we performed a subgroup analysis accounting for the patients' vaccination status, the regional location of the dialysis center in the country, and the type of hospital management (public, private, or philanthropic).

In the temporal-validation subset, the RF using the combined D/C and CF scenario achieved an AUROC of 0.78 (95% CI: 0.73–0.83). The algorithm's performance in patients vaccinated with at least one dose of a COVID-19 vaccine and in unvaccinated patients revealed AUROC values of 0.75 (95% CI: 0.67–0.82) and 0.83 (95% CI: 0.85–0.89), respectively, with significant overlap of their 95% confidence intervals, showing some predictive robustness regarding the increase in vaccination.

According to the geographic region, the AUROC in the Midwest region of Brazil was 0.48 (95% CI: 0.29–0.67). There was no considerable difference between the South and Southeast regions, with AUROCs of 0.78 (95% CI: 0.69–0.86) and 0.79 (95% CI: 0.72–0.84), respectively. Regarding the type of hospital management, philanthropic and private hospitals revealed AUROCs of 0.74 (95% CI: 0.59–0.87) and 0.79 (95% CI: 0.73–0.85), respectively. However, worse performance was observed in public facilities, with an AUROC of 0.48 (95% CI: 0.29–0.67).

Table 2

Performance of the models by type of variables included in the training and test datasets
Data used	Model	Sensitivity	Specificity	PPV	NPV	LR+	AUROC
Demographics/ comorbidities	Random Forest	0.57	0.70	0.31	0.87	1.9	0.69
	Support Vector Machine	0.60	0.70	0.32	0.88	2	0.65
	TabNet	0.28	0.86	0.33	0.83	2.33	0.59
Clinical findings	Random Forest	0.51	0.71	0.31	0.86	1.75	0.66
	Support Vector Machine	0.71	0.72	0.38	0.91	2.53	0.71
	TabNet	0.67	0.75	0.39	0.75	2.68	0.75
Aggregated data	Random Forest	0.62	0.79	0.40	0.90	2.95	0.80
	Support Vector Machine	0.71	0.72	0.38	0.91	2.53	0.71
	TabNet	0.62	0.78	0.39	0.89	2.81	0.76
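
The derived columns of Table 2 follow from sensitivity, specificity, and outcome prevalence, as the sketch below shows for the Random Forest on aggregated data. The positive likelihood ratio needs only the first two. The predictive values also need a prevalence, which is assumed here to be 1/(1 + 4.66), reading the 4.66:1 outcome ratio quoted in the Methods as survivors to deaths; that reading is an assumption.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from Bayes' rule for a given outcome prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


# Random Forest, aggregated data (values taken from Table 2).
sensitivity, specificity = 0.62, 0.79
prevalence = 1.0 / (1.0 + 4.66)  # assumption: 4.66 survivors per death

lr_plus = positive_likelihood_ratio(sensitivity, specificity)
ppv, npv = predictive_values(sensitivity, specificity, prevalence)
```

Under these assumptions LR+ rounds to the tabulated 2.95, while PPV and NPV come out at about 0.39 and 0.91, close to the tabulated 0.40 and 0.90; small differences are expected because the table entries are rounded and the exact test-set prevalence is not stated.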

In this multicentre observational retrospective cohort study, we aimed to evaluate the feasibility and utility of ML models to predict mortality in HD patients with COVID-19. Unlike most mortality studies [24], the present study is based only on demographics/comorbidities and clinical findings collected by several dialysis centers. It is an innovative initiative, using one of the largest databases of patients with COVID-19 on dialysis in the world.

In a meta-analysis by Chen et al., the mortality rate in COVID-19 HD patients was 22.4% (95% CI: 17.9–27.1%), with significant statistical heterogeneity among the studies (I² = 87.1%, p < 0.001) but no publication bias [11]. Also, according to the same authors, patients from non-Asian countries had a higher mortality rate (26.7%, 95% CI: 22.5–31.0%), and in studies considered to be of good quality, mortality was estimated to be 23.8% (95% CI: 20.2–27.6%), which is consistent with the overall mortality of 21.46% in the present study.

We observed a noticeable difference in the 90-day mortality in the presence of the following variables: dyspnea, advanced age, diabetes, absence of symptoms (asymptomatic), altered mental status, and arterial hypertension. It is worth mentioning that dyspnea is reported as one of the most prevalent clinical findings in several studies on COVID-19 HD patients, right after fever.

According to Chen et al., dyspnea was present in 16 studies involving HD patients with COVID-19, affecting 438 of 1246 patients (35.2%; 95% CI: 16.9–36.6%) [11]. In the present study, dyspnea emerged not only as a frequent finding but also as the most relevant variable associated with mortality in HD patients. This result is consistent with the work dating from the beginning of the pandemic by Zou et al., which found that dyspnea was an independent risk factor for death (OR = 1.146; 95% CI: 1.026 to 1.875; p = 0.034) [24].

Interestingly, as can be seen in Fig. 3B, the presence of dyspnea in patients over 60 years did not increase the probability of death. In contrast, higher odds ratio values for mortality were observed in patients under 60 years with dyspnea. We believe that such findings should be further explored in the future.

In the temporal validation subset, we observed an increase in the number of patients vaccinated with at least one dose. In addition, substantial advances as to the treatment protocols and understanding of this disease had emerged by that time. These changes caused a dataset shift. The increased vaccination rates generated a prior probability shift, as it modified the mortality distribution of patients with COVID-19.

Furthermore, the vaccination altered the distribution of latent covariates when compared to observable covariates generating a simple covariate shift. The progress in therapeutics changed the presentation of clinical findings, and impacted on mortality, as commented in the RECOVERY study [25]. However, even in the face of all these disease modifiers, our algorithm performed consistently.

It is worth noting that several algorithms proposed to predict mortality in patients with COVID-19 rely on invasive physiological and laboratory data, such as SAPS II and APACHE II [26], which consume a large amount of financial as well as human resources. The application of these scores can be particularly challenging in remote regions with limited access to laboratory testing.

To improve the performance of the models without invasive data, we used hyperparameter search algorithms, despite literature claiming that such a strategy may not be necessary when using RF [27]. Of note, even without invasive data, the AUROC was not inferior to that of previously proposed algorithms.

Traditional models resorting either to scores with a sum of cutoff points or to logistic regression may not be able to reflect the non-linear relationship between the dependent variable and the predictors, since they are essentially linear. Furthermore, models such as logistic regression are less able to capture interaction effects than decision trees. In a large-scale benchmark, the RF outperformed logistic regression in approximately 70% of the datasets [28], and the advantage is maintained in medical and biomedical datasets. ML models are more flexible but also more susceptible to overfitting, since they learn directly from the data, generating hyperplanes with high variance [29]. In contrast, regression models are based on assumptions and a priori knowledge, showing less variance and overfitting [30].

Using Artificial Intelligence, we have successfully explored a large Brazilian database applying different validation processes. All ML models used exhibited excellent performance in predicting 90-day mortality, especially the ones using combined data (D/C and CF). These models have shown consistent performance in internal and temporal validation, even in the presence of data shifts, ensuring high reproducibility, fidelity, uniformity, and possible future clinical implementation. It should be underscored that the multicenter nature of the study increases its external validity, especially considering the high COVID-19 variability between different populations.

One of the highlights of this study is the use of simple and easy-to-obtain variables, strictly clinical, without involving laboratory or imaging testing, which would entail a high cost. Also, to the best of our knowledge, no previous studies tried to predict mortality in COVID-19 HD patients using ML.

Despite these strengths and original design, the study portrays some limitations, such as the requirement of preprocessing the data, and a safe and stable internet connection to ensure the security of the patients' data. A further large-scale external validation in different populations is warranted for clinical use deployment.

In summary, our study is the first attempt to develop ML models to predict mortality in patients with COVID-19 on HD relying on demographics/comorbidities and clinical features. We resorted to three different ML models (Random Forest, Support Vector Machine, and TabNet) and reported their performance. Despite the study limitations, considering the impact of this ongoing pandemic, our findings and conclusions are noteworthy and could help in the management of such a harmful disease in HD patients worldwide. In the future, the proposed model could allow fast and effective screening of COVID-19 HD patients to guide appropriate interventions and improve their prognosis while reducing costs.

Ethics approval and consent to participate

All methods were carried out in accordance with relevant guidelines and regulations and with the ethics committee's requirements. This study was approved by the Institutional Review Committee of the Federal University of Sao Paulo under the number 4.454.227 and performed in accordance with the Declaration of Helsinki. Due to the retrospective nature of the study, written informed consent for participation was waived by the Federal University of Sao Paulo Ethics Committee. All data were anonymized before use.

Availability of data and materials

All data generated or analysed during this study are included in this published article or its supplementary information files.

Consent for publication

Not Applicable (NA)

Competing interests

All other authors declare no competing interests.

Funding

No funding.

Authors' contributions

RFRJ conceived and designed the study. RFRJ and ARVFA trained and tested the models and analyzed the datasets. MAMP wrote the manuscript. RFRJ, MAMP and ARVFA produced the graphs and figures. JRL and RS contributed to expert review of the manuscript. All authors edited, reviewed, and approved the final manuscript for submission, and were involved in the general design of the methodology, interpretation of the data, and critical revision of the manuscript. We ensured that all the authors had access to all the raw datasets. RFRJ and ARVFA independently reverified all the datasets. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

Acknowledgments

We thank the Fundação de Amparo à Pesquisa do Estado do Rio de Janeiro (FAPERJ), Rio de Janeiro, Brazil for the scholarship to MAMP and the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for the scholarship to ARVFA.

Novel Coronavirus Pneumonia Emergency Response Epidemiology Team. The epidemiological characteristics of an outbreak of 2019 novel coronavirus diseases (COVID-19) in China. Zhonghua Liu Xing Bing Xue Za Zhi. 2020; 41: 145–151.
Carney EF. The impact of chronic kidney disease on global health. Nat. Rev. Nephrol 2020, 16, 251.
GBD Chronic Kidney Disease Collaboration. Global, regional, and national burden of chronic kidney disease, 1990–2017: A systematic analysis for the Global Burden of Disease Study 2017. Lancet 2020 395: 709–733.
Neves PDMM, Sesso RCC, Thomé FS, et al. Brazilian dialysis inquiry 2019. Braz J Nephrol. 2021; 43(2):217–227.
Betjes MG: Immune cell dysfunction and inflammation in end-stage renal disease. Nat Rev Nephrol 2013; 9: 255–265.
Vaziri ND, Pahl MV, Crum A, et al. Effect of uremia on structure and function of immune system. Ren Nutr 2012; 22: 149–156.
Rombola G, Brunini F. COVID-19 and dialysis: why we should be worried. J Nephrol 2020; 33(3): 401–403.
Saran R, Robinson B, Abbott KC et al. US renal data system 2017 Annual data report: Epidemiology of kidney disease in the United States. Am J Kidney Dis 2018; 71: 501.
Zhang L, Zhao MH, Zuo L, et al. CK-NET Work Group: China Kidney Disease Network (CK-NET) 2015 Annual data report. Kidney Int Suppl 2020; 10: e95.
Zhang J, Cao F, Wu SK, et al. Clinical characteristics of 31 hemodialysis patients with 2019 novel coronavirus: a retrospective study. Ren Fail 2020; 42(1): 726–732.
Chen CY, Shao SC, Chen YT, et al. Incidence and Clinical Impacts of COVID-19 Infection in Patients with Hemodialysis: Systematic Review and Meta-Analysis of 396,062 Hemodialysis Patients. Healthcare 2021; 9,47.
Xiong F, Tang H, Liu L et al. Clinical Characteristics of and Medical Interventions for COVID-19 in Hemodialysis Patients in Wuhan, China. J Am Soc Nephrol 2020; 31(7): 1387–139.
Yang X, Yu Y, Xu J et al. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: A single-centered, retrospective, observational study. Lancet Respir Med 2020; 8: e26.
Aydin Bahat K, Parmaksiz E, Sert S. The clinical characteristics and course of COVID-19 in hemodialysis patients. Hemodial Int 2020; 24(4): 534–540.
Lugon JR, Neves PD, Pio-Abreu A, et al. Evaluation of central venous catheter and other risk factors for mortality in chronic hemodialysis patients with COVID-19 in Brazil. Int Urol Nephrol 2022; 54: 193–199.
Pedregosa F, Weiss R, Brucher M. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 2011; 12: 2825–30.
Breiman L. Random forests. Machine Learning. 2001; 45(1): 5–32.
Cortes C, Vapnik V. Support-vector networks. Machine Learning. 1995; 20(3): 273–97.
Arik SO, Pfister T. Tabnet: Attentive interpretable tabular learning. arXiv 2019. arXiv preprint arXiv:1908.07442.
Akiba T, Sano S, Yanase T, Ohta T, and Koyama M. 2019. Optuna: A Next-generation Hyperparameter Optimization Framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery; Data Mining (KDD '19). Association for Computing Machinery, New York, NY, USA, 2623–2631.
Peng H, Long F, Ding C. Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy. IEEE Transactions on pattern analysis and machine intelligence 2005; 27(8):1226–38.
Lundberg SM, S-I Lee. "A unified approach to interpreting model predictions." Advances in neural information processing systems 2017; 30: 4765–4774.
DeLong ER et al. Comparing the areas under two or more correlated receiver operating characteristic curves: A nonparametric approach. Biometrics 1988; 44: 837–845.
Zou R, Chen F, Chen D, et al. Clinical characteristics and outcome of hemodialysis patients with COVID-19: a large cohort study in a single Chinese center. Ren Fail 2020; 42(1): 950–957.
Quiñonero C, Sugiyama M, Schwaighofer A, Lawrence ND. Dataset Shift in Machine Learning. The MIT Press, 2009.
Nielsen AB, Thorsen-Meyer HC, et al. Survival prediction in intensive-care units based on aggregation of long-term disease history and acute physiology: a retrospective study of the Danish National Patient Registry and electronic patient records. Lancet Digital Health 2019; 1: e78–89.
Probst P, Bischl B, Boulesteix A-L. Tunability: Importance of hyperparameters of machine learning algorithms. Journal of Machine Learning Research 2019; 1–32.
Couronné R, Probst P, Boulesteix AL. Random forest versus logistic regression: a large-scale benchmark experiment. BMC Bioinformatics 2018; 19, 270.
Deo RC, Nallamothu BK. Learning about machine learning: The promise and pitfalls of big data and the electronic health record. Circ Cardiovasc Qual Outcomes 2016; 9:618e20.
Goldstein BA, Navar AM, Carter RE. Moving beyond regression techniques in cardiovascular risk prediction: applying machine learning to address analytic challenges. Eur Heart J 2017; 38:1805e14.
