In the present study, LR, DT, and SVM as classic supervised ML classifiers and four ensemble classifiers (RF, BC, AdaBoost, and GBDT) were implemented to predict severity status and death in hospitalized children with COVID-19. The performance of the algorithms was measured by accuracy, AUC, F1 score, precision, sensitivity, specificity, and NPV. According to these indices, all of the methods showed reasonable performance, but the ensemble ML algorithms demonstrated greater predictive efficiency than the others. DT and RF had the highest accuracy in predicting severity status, and GBDT and AdaBoost performed best for mortality prediction. Based on the optimal predictive models, respiratory distress and cough at the time of admission could be considered the key factors for estimating the likelihood of severe status and death, respectively.
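For readers less familiar with these indices, the following minimal Python sketch shows how the threshold-based measures listed above follow from the four cells of a binary confusion matrix. It is an illustration only and not the code used in this study; the names `ConfusionCounts` and `classification_metrics` and the example counts are hypothetical. AUC is omitted because it requires predicted scores rather than counts.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # positive cases predicted positive (e.g., severe predicted severe)
    fp: int  # negative cases predicted positive
    tn: int  # negative cases predicted negative
    fn: int  # positive cases predicted negative


def safe_div(num, den):
    # Return NaN instead of raising when a denominator is zero.
    return num / den if den else float("nan")


def classification_metrics(c):
    sensitivity = safe_div(c.tp, c.tp + c.fn)  # recall, true positive rate
    specificity = safe_div(c.tn, c.tn + c.fp)  # true negative rate
    precision = safe_div(c.tp, c.tp + c.fp)    # positive predictive value
    npv = safe_div(c.tn, c.tn + c.fn)          # negative predictive value
    accuracy = safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "npv": npv,
        "f1": f1,
    }


if __name__ == "__main__":
    # Made-up counts for illustration; they are not results from this study.
    example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
    for name, value in classification_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Reporting sensitivity, specificity, and NPV alongside accuracy is useful when the outcome classes are imbalanced, because accuracy alone can be dominated by the majority class.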
A number of ML algorithms have been proposed to predict outcomes in COVID-19 studies (7, 20–25). One study investigated the early detection of COVID-19 in 5664 children admitted to medical centers, based on laboratory findings and using supervised learning techniques. ANN, RF, SVM, DT, and GBDT were used to identify COVID-19, and a standard 10-fold cross-validation procedure was used to evaluate the performance of the five ML algorithms. The results revealed that classification and regression tree (CART) models had the highest accuracy (92.5%) for binary classification based on laboratory outcomes. Leukocytes, monocytes, potassium, and eosinophils were the most important features for predicting COVID-19 in admitted children (20). Similarly, in the current study, DT with the CART algorithm had the highest accuracy in predicting severity status in hospitalized children with COVID-19; however, none of the laboratory findings were identified as important features for predicting critical or severe status.
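The k-fold cross-validation procedure mentioned above can be sketched as follows. This is a generic, standard-library-only illustration and not the implementation of the cited study; the function names, the random seed, and the placeholder majority-class model are assumptions made for the example, and any classifier exposing comparable `fit` and `predict` functions could be substituted.

```python
import random
from collections import Counter


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Assumes n >= k so that no test fold is empty.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k nearly equal-sized folds
    for f in range(k):
        test = folds[f]
        train = [i for g in range(k) if g != f for i in folds[g]]
        yield train, test


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean accuracy over k folds; each sample is tested exactly once."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(preds, test))
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Placeholder model (hypothetical): always predicts the majority training class.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, X_test):
    return [model] * len(X_test)
```

Because every sample serves as test data in exactly one fold, the averaged score uses all of the available data for evaluation while keeping training and test samples separate within each fold.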
In 2020, Ma et al. predicted chest CT results among RT-PCR-positive pediatric patients aged 16 years or younger using a Bayesian optimization process. In that study, the researchers used the clinical symptoms and laboratory results of 102 children with normal and 142 children with abnormal CT findings to compare the performance of the suggested approach with that of regular techniques in learning models. Based on the presented results, Bayesian optimization achieved an AUC of 0.84, with an accuracy of 0.82 and a sensitivity of 0.84, in predicting CT outcomes. Their results showed that age, lymphocyte, neutrophil, ferritin, and C-reactive protein were the features most closely related to CT findings in pediatric patients with positive RT-PCR testing (21). Radiographic findings were omitted when predicting severity status in the present study, because an abnormal CT result could itself be considered a crucial predictor of severe and critical status in pediatric patients with COVID-19. Based on the results, clinical symptoms at the time of admission, including respiratory distress, cough, and fever, were identified as the most important features for predicting severity status in hospitalized children. Although the children’s ages were nearly the same in both studies, the non-parametric models used for predicting severity yielded substantially different results.
A cross-sectional study was conducted on 556 children in Serbia between 2020 and 2022. The research included 280 pediatric patients with PCR-confirmed COVID-19 and 286 children with respiratory symptoms and a negative PCR result. The researchers used six ML techniques (RF, SVM, linear discriminant analysis, ANN, K-nearest neighbors, and DT) to help healthcare providers detect children with COVID-19 during rapid triage. According to the performance indices, RF and SVM showed the highest accuracy, 85% and 82.1%, respectively, and the most prominent features for predicting COVID-19 at an early stage were mean platelet volume (MPV), WBC, mean corpuscular hemoglobin concentration (MCHC), platelet distribution width (PDW), and absolute lymphocyte count (LYM) (7). Although the outcomes of the present study differed, tree-based algorithms (especially DT and RF for severity status, and GBDT and AdaBoost for mortality prediction) were identified as the most effective ML techniques. However, the features recognized for predicting the outcomes were dissimilar.
In a study at a children's hospital in China between 2022 and 2023, 544 hospitalized children with COVID-19 participated, 243 in the mild and 301 in the severe category. Potential attributes, including patient characteristics and medical information, were considered as inputs to prediction algorithms including LR, RF, XGBoost, AdaBoost, categorical boosting (CatBoost), and light gradient boosting machine (LightGBM). The performance of each ML model was evaluated using 5-fold cross-validation (24). The study identified the RF + TomekLinks model, using the 10 most significant variables, as the better-performing method, with an AUC of 82.1%, which was consistent with the findings of the current study.
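To clarify the resampling step named above, a Tomek link is a pair of samples from different classes that are each other's nearest neighbor; removing the majority-class member of each such pair cleans the class boundary before a classifier such as RF is trained. The brute-force sketch below illustrates the idea under simple assumptions (Euclidean distance, numeric features, small data). It is not the implementation used in the cited study, and the function names are hypothetical.

```python
import math


def nearest_neighbor(i, X):
    """Index of the sample closest to X[i] (Euclidean), excluding i itself."""
    best, best_d = None, math.inf
    for j, x in enumerate(X):
        if j == i:
            continue
        d = math.dist(X[i], x)
        if d < best_d:
            best, best_d = j, d
    return best


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest neighbors
    and carry different class labels."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if j is not None and i < j and nn[j] == i and y[i] != y[j]
    ]


def remove_majority_in_links(X, y, majority_label):
    """Drop the majority-class member of every Tomek link."""
    drop = set()
    for i, j in tomek_links(X, y):
        drop.add(i if y[i] == majority_label else j)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]
```

In practice a library implementation with an efficient neighbor search would normally be preferred; the quadratic search here is kept only for readability.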
A 2024 systematic review investigated the research methodologies, computational modeling strategies, and performance assessment standards used by studies employing ML techniques to establish clinical predictive models for children and adolescents infected with COVID-19. Ten studies published from 01/01/2020 to 10/25/2023 were included, and the most widely used ML methods were tree-based models, such as XGBoost, DT, and CatBoost, and artificial neural networks (ANNs), such as the multilayer perceptron (25). The review demonstrated that ML models could potentially yield accurate clinical predictive models that improve patient care by identifying high-risk individuals who may benefit from early interventions or personalized treatments. Despite these successful results, the consistency of reporting on model development and validation approaches was unsatisfactory; such reporting was provided in the present research.
Several important considerations in this study should be acknowledged. Model training was restricted to hospitalized children who made up a significant proportion of COVID-19 hospital admissions. Furthermore, because multiple factors might be related to the severity status and mortality rate of the disease in children, basic characteristics, clinical symptoms, biochemical features, and radiographic findings were included in the ML algorithms. Utilizing ensemble ML algorithms, particularly boosting models, together with SMOTE enhanced the model validation indices significantly. This improvement enabled the identification of key features associated with severe disease outcomes. Ultimately,
However, this study had certain limitations that should be considered in future research. First, it was an observational investigation based on data from a single-center referral hospital during the initial stage of the global COVID-19 outbreak. As a result, a limited number of children were included. Additionally, due to inadequate documentation, some parameters influencing the prognosis of the illness were not accurately recorded from the onset. Second, the blood counts, biomarker parameters, and clinical symptoms used in model training were obtained at the time of admission. Incorporating the dynamic changes in important laboratory findings and clinical symptoms could improve model validation, resulting in a potentially better clinical predictive model. Third, more complex algorithms such as ANNs, which were not considered in this study, could potentially offer better predictive capabilities than tree-based models.