Machine learning to predict adverse drug reaction or event based on electronic health records: a systematic review and meta-analysis

doi:10.21203/rs.3.rs-4081881/v1


Research Article



This work is licensed under a CC BY 4.0 License

Version 1


Introduction

Machine learning (ML) for adverse drug reaction or event (ADR/ADE) prediction is emerging as a promising method for improving medical quality. We aimed to conduct a systematic review to comprehensively summarize ML prediction of ADRs/ADEs based on electronic health records (EHRs).

Materials and methods

We systematically searched the PubMed, Web of Science, Embase, and IEEE Xplore databases from inception to 21 Nov. 2023 to identify eligible studies. Any study that developed an ML model to predict multiple ADRs/ADEs based on EHRs was included in the final review. The pooled sensitivity and specificity and their 95% CIs were calculated. Binary accuracy data were extracted for meta-analysis to derive the area under the curve (AUC).

Results

A total of 5704 studies were identified, of which 10 met the inclusion criteria. Among the 20 ML methods reported in the included studies, Random Forest (RF) was reported most often (n=9), followed by AdaBoost (n=4), eXtreme Gradient Boosting (n=3), and support vector machine (n=3). The mean AUC for ML prediction was 75.71% (range 26.00–94.57). The mean AUC of RF was 82.92%, and RF combined with resampling-based approaches might achieve a higher AUC (94.48–94.57%). Length of stay, number of drugs, admission type, age, and use of high-risk drugs, such as antiviral agents and rifamycin, were the common risk factors for ADR/ADE prediction. The pooled AUC of the summary receiver operating characteristic curve was 72.00% (68.00–75.00).

Conclusions

ML algorithms showed acceptable performance in predicting ADRs/ADEs. More rigorous reporting standards and new ML methods that take into account the unique challenges of ML research could improve future studies and help the application of ML models in clinical practice.

Machine learning

Adverse drug event

Prediction model

Electronic health record

With the benefits of medication therapy comes the potential for medication-related injuries. Medication-related injuries are associated with a significantly prolonged length of stay, additional economic burden, and an almost 2-fold increased risk of death [1, 2]. In 2017, the World Health Organization therefore launched an initiative to promote and implement actions that improve medication safety and reduce the number of preventable adverse drug reactions or adverse drug events (ADRs/ADEs) [3]. ADRs/ADEs in hospitalized patients are common, may be accompanied by substantial harm, and some of them are preventable [2]. Consequently, several studies have focused on the prediction of ADRs/ADEs.

Research on ADR/ADE prediction has mainly drawn on drug-drug interactions (DDIs) [3], the chemical structure of drugs [4], spontaneous reporting systems (SRSs), and health records [5]. DDIs can produce different types of responses through pharmaceutical, pharmacokinetic, and pharmacodynamic mechanisms, including increased or decreased drug concentrations and an increased risk of liver or kidney damage [6, 7]. DDIs are therefore considered a critical aspect of drug research, because they may have adverse effects on patients and lead to serious consequences [8]. The chemical structure of a drug can carry structural alerts for adverse effects, and the combination of chemical structures, target proteins, substituents, and enriched pathways may predict the occurrence of adverse events [9]. However, these methods do not take into account the patient's condition, including age, gender, length of stay, doses per patient, diagnosis, and so on. Compared with DDI- and structure-based approaches, SRSs can provide more detailed patient information, such as age and gender; however, some information, such as length of stay, doses per patient, and the total number of patients using a certain medication, is still unavailable, and spontaneously reported events often lack clarity regarding the diagnosis of the adverse event [10].

Health records contain data covering the entire hospitalization period of patients. Tools for ADR/ADE prediction in hospitalized patients could help clinicians prevent ADRs/ADEs in a timely manner at the patient level [11]. In addition, the insights such tools provide into why and when ADEs occur during hospitalization can promote improvements in medication safety and help reduce the number of ADRs/ADEs.

As electronic health record (EHR) systems become increasingly widespread, the hospitalization data they contain can be reused. EHR data are real-world clinical data that are easy to collect and can be used to develop ADR/ADE prediction models for inpatients. EHR data can be scanned automatically for (potential) ADRs/ADEs using computerized algorithms, an attractive alternative to laborious manual chart review [12]. Active EHR-based surveillance is more effective, detecting 10 times as many ADRs/ADEs as voluntary reporting [13]. Consequently, a growing number of studies have focused on predicting ADRs/ADEs from EHRs. In clinical practice, the causes of drug-related injuries are often complex: a given adverse event may result from a combination of drugs, while a given drug may cause multiple related events, sequentially or simultaneously. Studies that predict multiple adverse events are therefore more in line with clinical needs and of greater practical value. However, traditional statistical methods struggle to produce satisfactory predictions in the presence of multiple risk factors; for example, logistic regression (LR) achieved F1 scores of only 17–36% [2, 15, 16].

Machine learning (ML), a branch of artificial intelligence, is an interdisciplinary field involving statistics, computer science, and many other disciplines. ML can handle complex nonlinear relationships between variables and outcomes, which can yield stronger generalization and improved accuracy [17]. With advances in computer technology and the development of medical databases, ML has become a research hotspot in medicine and has shown great capability in disease diagnosis [18], prescription analysis [19], complication monitoring [20], and disease prediction [5, 16]. However, no detailed overview or critical appraisal of the ML prediction models developed for ADRs/ADEs has been provided. Therefore, we aimed to conduct a systematic review to comprehensively summarize ML prediction of ADRs/ADEs based on EHRs.

This systematic review was conducted according to the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) Guidelines [21]. The review protocol was registered as a systematic review at PROSPERO under registration number CRD42023464771.

Search strategy and selection criteria

The PubMed, Web of Science, Embase, and IEEE Xplore databases were systematically searched for studies that met the inclusion criteria. Keywords were derived from MeSH and Emtree terms, their synonyms, and keywords identified in initially reviewed papers. Terms such as machine learning, deep learning, artificial intelligence, predict, and adverse drug reaction were used in the search strategy. The search strategy for each database is listed in Supplement Table 1. The search was last updated on 21 November 2023.
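A search strategy of this kind usually groups synonyms into concept blocks, joins the synonyms within a block with OR, and joins the blocks with AND. The Python sketch below illustrates only that structure. The term lists are a hypothetical subset built from the terms named above, and field tags, truncation rules, and MeSH/Emtree syntax (which differ between PubMed, Web of Science, Embase, and IEEE Xplore) are deliberately left out; the strategies actually used are those in Supplement Table 1.

```python
# Hypothetical concept blocks built from the terms named in the text;
# the real database-specific strategies are given in Supplement Table 1.
CONCEPTS = {
    "method": ["machine learning", "deep learning", "artificial intelligence"],
    "task": ["predict*"],
    "outcome": ["adverse drug reaction*", "adverse drug event*"],
}


def quote(term: str) -> str:
    """Wrap multi-word phrases in double quotes; leave single words bare."""
    return f'"{term}"' if " " in term else term


def build_query(concepts: dict[str, list[str]]) -> str:
    """OR the synonyms within each concept block, then AND the blocks together."""
    blocks = ["(" + " OR ".join(quote(t) for t in terms) + ")"
              for terms in concepts.values()]
    return " AND ".join(blocks)


if __name__ == "__main__":
    print(build_query(CONCEPTS))
```

Keeping the blocks as data makes it easy to adapt the same concepts to each database's own syntax.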

Table 1

Characteristics of included studies
Study ID	Location	No. of patients	No. of patients with ADR/ADE (%)	Population	Database	Age	Male (%)	No. of ADRs/ADEs	No. of drugs causing ADRs/ADEs	No. of ADR/ADE types	No. of ML algorithms	Optimal model
Hu, 2022 [5]	China	1800	234 (13.00%)	Older inpatients	Self-built database	Total: 69.84 ± 8.14; no ADR/ADE: 70.12 ± 8.19; ADR/ADE: 67.95 ± 7.55	58.56%	296	21	32	7	AdaBoost
Yu, 2021 [26]	China	1746	221 (12.70%)	Pediatric inpatients	Self-built database	Total: 3.84 ± 3.89; no ADR/ADE: 3.86 ± 3.85; ADR/ADE: 3.72 ± 4.12	Total: 65.00%; no ADR/ADE: 64.70%; ADR/ADE: 67.40%	247	27	28	7	GBDT
Langenberge, 2023 [16]	USA	210181	10957 (5.21%)	All patients	MIMIC-IV (version 2.1)	No ADR/ADE: 59.8 ± 19.7; ADR/ADE: 63.3 ± 16.9	No ADR/ADE: 49.40%; ADR/ADE: 51.50%	27667	N	102	6	GBM
Karlsson, 2014 [27]	Sweden	16287	4128 (25.34%)	All patients	Stockholm EPR corpus	N	N	N	N	14	4	RF-ren
Ponraj, 2021 [28]	India	5000	N	All patients	Self-built database	N	N	N	N	N	3	DT
Karlsson, 2016 [29]	Sweden	35711	16062 (44.98%)	All patients	Stockholm EPR corpus	N	N	N	N	27	5	No significant difference
Zhao, 2021 [30]	China	30703	N	All patients	An EHR database in China	N	N	N	N	N	6	XGBoost, AdaBoost, and RF
Zhao, 2015a [31]	Sweden	14303	2807 (19.62%)	All patients	Stockholm EPR corpus	N	N	N	N	27	9	RF
Zhao, 2015b [32]	Sweden	14696	2928 (19.92%)	All patients	Stockholm EPR corpus	N	N	N	N	14	3	RF-BWE
Zhao, 2016 [33]	Sweden	38709	5733 (14.81%)	All patients	Stockholm EPR corpus	N	N	N	N	19	4	RF-LWS
ADR/ADE, adverse drug reaction or adverse drug event; ML, machine learning; MIMIC, Medical Information Mart for Intensive Care; EHR, electronic health record; EPR, electronic patient record; XGBoost, eXtreme Gradient Boosting; GBDT, gradient boosting decision tree; RF, random forest; GBM, gradient boosting machine; DT, decision tree; BWE, bag of weighted events; LWS, learned weights in the weighted sampling; N, not reported.

The inclusion criteria were: 1) the paper focused on predicting ADRs/ADEs using ML algorithms based on EHRs; 2) the paper focused on predicting multiple ADRs/ADEs using ML algorithms; 3) the paper explained its research findings sufficiently; and 4) the paper applied machine learning to establish prediction models. Studies were excluded if any of the following applied: 1) the full text was unavailable; 2) the paper was a review; 3) the paper was not in English; 4) the paper focused on medical safety events other than ADRs/ADEs, such as radiation dermatitis or transfusion reactions; 5) the paper focused on identifying, rather than predicting, ADRs/ADEs; 6) the paper applied traditional algorithms, such as logistic regression, rather than ML; or 7) the paper was based on an adverse drug reaction database, such as the FDA Adverse Event Reporting System (FAERS).

Study selection and data extraction

After duplicates were removed, two reviewers (QZ Hu and CQ Li) independently screened the titles and abstracts of the retrieved studies, and disagreements were resolved by consensus. Full-text articles were retrieved for studies whose title or abstract met the inclusion criteria. The two reviewers then independently reviewed the full texts against the inclusion and exclusion criteria, again resolving disagreements by consensus.

The author, year of publication, study setting, number of patients, population, database, data period, ML algorithms, predictive performance parameters, including accuracy, sensitivity (recall), specificity, precision, F1 score, and the area under the receiver operating characteristic curve (AUC), and important risk factors were extracted from the included publications.
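For reference, all of the threshold-based performance parameters listed above can be derived from a 2 × 2 confusion matrix. The Python sketch below uses hypothetical counts (not data from any included study) and shows why accuracy alone can look favorable when ADRs/ADEs are rare.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # ADR/ADE predicted and observed
    fp: int  # ADR/ADE predicted but not observed
    tn: int  # no ADR/ADE predicted and none observed
    fn: int  # ADR/ADE observed but not predicted


def ratio(num: float, den: float) -> float:
    """Division that returns 0.0 instead of raising when the denominator is 0."""
    return num / den if den else 0.0


def performance(cm: ConfusionMatrix) -> dict[str, float]:
    sensitivity = ratio(cm.tp, cm.tp + cm.fn)  # also called recall
    specificity = ratio(cm.tn, cm.tn + cm.fp)
    precision = ratio(cm.tp, cm.tp + cm.fp)
    accuracy = ratio(cm.tp + cm.tn, cm.tp + cm.fp + cm.tn + cm.fn)
    # F1 is the harmonic mean of precision and sensitivity (recall).
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical cohort of 1000 patients, 100 of whom had an ADR/ADE.
    cm = ConfusionMatrix(tp=40, fp=60, tn=840, fn=60)
    for name, value in performance(cm).items():
        print(f"{name}: {value:.3f}")
```

With these hypothetical counts (10% prevalence), accuracy is 88% even though only 40% of patients with an ADR/ADE are detected, which is why precision, sensitivity, and F1 score are reported alongside accuracy.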

Quality evaluation

Two tools were used to evaluate study quality. Two researchers independently evaluated the quality of the studies, and any disagreements were resolved by discussion. The Prediction model Risk Of Bias ASsessment Tool (PROBAST) was used to assess risk of bias and concerns about applicability [22]. It consists of 20 signaling questions covering four domains: participants, predictors, outcome, and analysis. The first three domains were also assessed for concern about applicability (low, high, or unclear).
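PROBAST combines domain-level ratings into an overall judgment: a study is rated high risk if any domain is high, unclear if no domain is high but at least one is unclear, and low only if every domain is low; the same rule applies to applicability. The Python sketch below encodes that simplified rule. It omits PROBAST's further guidance on downgrading models that lack external validation, and the example ratings are hypothetical.

```python
from enum import Enum


class Rating(Enum):
    LOW = "low"
    HIGH = "high"
    UNCLEAR = "unclear"


def overall_judgment(domains: dict[str, Rating]) -> Rating:
    """Combine domain ratings: any HIGH -> HIGH; else any UNCLEAR -> UNCLEAR; else LOW."""
    ratings = set(domains.values())
    if Rating.HIGH in ratings:
        return Rating.HIGH
    if Rating.UNCLEAR in ratings:
        return Rating.UNCLEAR
    return Rating.LOW


if __name__ == "__main__":
    # Hypothetical ratings for a single study.
    risk_of_bias = {
        "participants": Rating.LOW,
        "predictors": Rating.LOW,
        "outcome": Rating.UNCLEAR,
        "analysis": Rating.HIGH,
    }
    applicability = {
        "participants": Rating.LOW,
        "predictors": Rating.LOW,
        "outcome": Rating.LOW,
    }
    print("risk of bias:", overall_judgment(risk_of_bias).value)
    print("applicability concern:", overall_judgment(applicability).value)
```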

The Checklist for the Assessment of Medical AI (ChAMAI) was used to assess the quality of the included artificial intelligence studies [23]. ChAMAI has 30 items divided into six dimensions: problem understanding, data understanding, data preparation, modeling, validation, and deployment [23]. Items are classified as high or low priority, and each item is rated as OK (adequately addressed), mR (sufficient but improvable), or MR (inadequately addressed) [23]. OK, mR, and MR were assigned 0, 1, and 2 points for high-priority items and 0, 0.5, and 1 point for low-priority items [24]. The maximum score was 50 points.
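The scoring rule above can be written as a small Python function. The split of the 30 items into 20 high-priority and 10 low-priority items is not stated in the text; it is an assumption inferred from the 50-point maximum (solving h + l = 30 and 2h + l = 50 gives h = 20 and l = 10) and should be checked against the ChAMAI publication [23]. The example ratings are hypothetical.

```python
# Points per rating, as described in the text.
HIGH_PRIORITY_POINTS = {"OK": 0.0, "mR": 1.0, "MR": 2.0}
LOW_PRIORITY_POINTS = {"OK": 0.0, "mR": 0.5, "MR": 1.0}

# Assumed item split, inferred from 30 items and a 50-point maximum.
N_HIGH_PRIORITY, N_LOW_PRIORITY = 20, 10


def chamai_score(high_ratings: list[str], low_ratings: list[str]) -> float:
    """Sum the points for the high-priority and low-priority item ratings."""
    return (sum(HIGH_PRIORITY_POINTS[r] for r in high_ratings)
            + sum(LOW_PRIORITY_POINTS[r] for r in low_ratings))


def max_score(n_high: int = N_HIGH_PRIORITY, n_low: int = N_LOW_PRIORITY) -> float:
    """Score obtained when every item receives the highest-point rating (MR)."""
    return chamai_score(["MR"] * n_high, ["MR"] * n_low)


if __name__ == "__main__":
    # Hypothetical ratings for one study.
    high = ["OK"] * 12 + ["mR"] * 5 + ["MR"] * 3
    low = ["OK"] * 6 + ["mR"] * 2 + ["MR"] * 2
    print(chamai_score(high, low), "of", max_score())
```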

Statistical analysis

Pooled effects and their 95% confidence intervals (95% CIs) were estimated using a random-effects model [25]. The pooled sensitivity and specificity and their 95% CIs were calculated from the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The summary receiver operating characteristic (SROC) curve and the area under the SROC curve (AUC) were used to assess the overall performance of ML, and P < 0.05 was considered statistically significant. Heterogeneity was assessed using the Q statistic and I², with I² > 50% indicating substantial heterogeneity [25]. The risk of publication bias was assessed using a funnel plot and a regression test. Statistical analyses were performed using Stata 16.0.
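To make the pooling step concrete, the Python sketch below applies a univariate DerSimonian-Laird random-effects model to logit-transformed sensitivities computed from hypothetical 2 × 2 counts, and reports Cochran's Q and I² = max(0, (Q - df)/Q). It is an illustration only: diagnostic accuracy meta-analyses usually model sensitivity and specificity jointly with a bivariate random-effects model, and the text does not state which Stata routine was used, so this is not a reproduction of the authors' analysis.

```python
import math


def logit_sensitivity(tp: float, fn: float) -> tuple[float, float]:
    """Logit of sensitivity and its approximate variance.

    Adds 0.5 to both cells when either is zero (a common continuity correction).
    """
    if tp == 0 or fn == 0:
        tp, fn = tp + 0.5, fn + 0.5
    return math.log(tp / fn), 1.0 / tp + 1.0 / fn


def expit(x: float) -> float:
    """Inverse logit: maps a logit back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def dersimonian_laird(effects: list[float], variances: list[float]):
    """Random-effects pooling; returns (pooled, ci_low, ci_high, q, i2)."""
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation beyond chance
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se, q, i2


if __name__ == "__main__":
    # Hypothetical (TP, FN) counts from three studies.
    counts = [(40, 60), (120, 80), (30, 70)]
    effects, variances = zip(*(logit_sensitivity(tp, fn) for tp, fn in counts))
    pooled, lo, hi, q, i2 = dersimonian_laird(list(effects), list(variances))
    print(f"pooled sensitivity {expit(pooled):.3f} "
          f"(95% CI {expit(lo):.3f}-{expit(hi):.3f}), Q = {q:.2f}, I2 = {i2:.1%}")
```

Pooling on the logit scale and back-transforming with `expit` keeps the pooled proportion and its interval inside (0, 1).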

Characteristics of the included studies

The database search identified 5704 potentially eligible studies. After duplicates were removed, titles and abstracts were screened. The full texts of 36 studies were assessed, and 10 met the inclusion criteria. The selection procedure is detailed in the PRISMA flowchart in Fig. 1.

Of these studies, three used self-built databases to develop ADR/ADE prediction models [5, 26, 28], one used the Medical Information Mart for Intensive Care (MIMIC)-IV (version 2.1) [16], one used an EHR database in China [30], and five used the Stockholm EPR corpus [27, 29, 31–33]. The studies were conducted in China [5, 26, 30], the United States [16], India [28], and Sweden [27, 29, 31–33]. The study populations comprised older inpatients [5], pediatric inpatients [26], and all patients [16, 27–33].

Studies often applied several ML algorithms at once to compare model performance. Among the 20 ML methods reported in the included studies, Random Forest (RF) was reported most often (n = 9), followed by AdaBoost (n = 4), eXtreme Gradient Boosting (XGBoost) (n = 3), and support vector machine (SVM) (n = 3). Four studies modified the ML algorithms, for example by adjusting the configuration, learned weights, or tree sizes, to determine the appropriate model [27, 29, 32, 33]. Details of the included studies are shown in Table 1 and Supplement Table 2.

Evaluating the quality of studies

The PROBAST assessment showed that most studies had a high risk of bias but low concern regarding applicability. Of the 10 included studies, seven had a high risk of bias and three had high concern regarding applicability. Only three studies had both a low risk of bias and low concern regarding applicability [5, 16, 26]. Detailed results and a diagram of the PROBAST assessment are shown in Supplement Table 3 and Supplement Fig. 1.

The ChAMAI assessment yielded an overall mean score of 23.7 (range 20.00 to 32.50), below 50% of the maximum score. Three studies had high scores, ranging from 25.00 to 32.50. The mean scores for the six dimensions (problem understanding, data understanding, data preparation, modeling, validation, and deployment) were 6.30, 2.40, 0.90, 6.00, 6.75, and 1.35, respectively. Three dimensions (data understanding, data preparation, and deployment) had low mean scores, each below 50% of the dimension's maximum. Problem understanding, modeling, and validation had high mean scores, with modeling receiving full marks. Detailed results and a diagram of the ChAMAI assessment are provided in Supplement Table 4 and Supplement Fig. 2.

ADR/ADE

Eight studies reported the types of ADRs/ADEs observed, with the number of types ranging from 14 to 102 [5, 16, 26, 27, 29, 31–33]; details are given in Supplement Table 5. Drug allergy, including angioneurotic oedema, Stevens-Johnson syndrome, anaphylactic shock, and contact dermatitis, was mentioned in eight studies. Oversedation/hypotension, hypoglycemia, thrombocytopenia, cardiomyopathy due to drugs and external agents, nephrotoxicity/creatinine disorder, and drug-induced adrenocortical insufficiency were also commonly reported, being mentioned in 7, 5, 5, 5, 4, and 4 studies, respectively.

Convulsions, hyperglycemia, respiratory depression, bronchospasm, and dyspnea were mentioned only in the pediatric study [26], and bradycardia only in the study of older patients [5]. The incidence of each of these ADRs/ADEs was low, below 1% [5, 26].

Predictive performance for different predictive methods

AUC was the main indicator of model performance. Seven studies reported AUC, which ranged from 26.00% to 94.57% [5, 16, 26, 27, 31–33]. LightGBM, AdaBoost, CatBoost, XGBoost, and TPOT achieved high AUCs, with mean values above 90%. The mean AUC of RF was 82.92%, but RF combined with resampling-based approaches may achieve higher AUCs, ranging from 94.48% to 94.57%.
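AUC can be read as the probability that a randomly chosen patient with an ADR/ADE receives a higher predicted risk than a randomly chosen patient without one. The short Python sketch below computes it directly from that definition using hypothetical scores; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
def auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    ties contribute one half. O(n_pos * n_neg), which is fine for illustration.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


if __name__ == "__main__":
    # Hypothetical predicted risks for patients with and without an ADR/ADE.
    with_adr = [0.9, 0.8, 0.4]
    without_adr = [0.7, 0.3, 0.2, 0.1]
    print(f"AUC = {auc(with_adr, without_adr):.3f}")
```

Here 11 of the 12 positive-negative pairs are ranked correctly, giving an AUC of about 0.917. An AUC of 0.5 corresponds to chance-level ranking, and a value below 0.5, such as the 26.00% at the low end of the reported range, means the model ranked patients without ADRs/ADEs above those with them more often than the reverse.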

Accuracy, precision, sensitivity, specificity, and F1 score were also used to evaluate model performance and were reported in 7, 4, 5, 5, and 7 studies, respectively. Their mean values were 80.77%, 41.48%, 46.02%, 63.37%, and 51.16%, respectively. The predictive performance is summarized in Fig. 2.

Meta-regression

Contingency tables were extracted from three studies [5, 16, 26] and are shown in Supplement Table 6. The pooled AUC of the SROC curve was 72.00% (68.00–75.00) (Fig. 3), the pooled sensitivity was 40.00% (31.00–50.00) (I² = 99.33%), and the pooled specificity was 92.00% (87.00–96.00) (I² = 99.96%). Compared with LR, meta-regression suggested that ML algorithms achieved higher specificity [ML vs. LR: 93.00% (88.00–96.00) vs. 65.00% (65.00–66.00)] but lower sensitivity [ML vs. LR: 38.00% (29.00–49.00) vs. 68.00% (66.00–69.00)] (Fig. 4 and Supplement Fig. 3). Because heterogeneity in the pooled estimates was high (I² > 99%), a subgroup analysis by population was performed (Supplement Table 7). Funnel plots for visual inspection of publication bias are shown in Supplement Fig. 4.

Electronic health records are digital versions of patients' paper charts together with the associated health information systems, and they are widely used in hospitals to record diseases, therapeutic regimens, test results, and radiological images [34]. Predicting ADRs/ADEs from EHRs can help improve the quality of health care; however, the challenge of identifying ADEs from these observational data deserves attention. Because EHRs combine structured and unstructured data, traditional statistical methods have difficulty predicting adverse events from them. Minor textual changes, such as altering the order of instructions or expressions, may strongly affect prediction results. ML algorithms, which are suited to complex data environments, have been widely applied in medicine, including in prognosis prediction. Several advanced ML algorithms, such as XGBoost, Light Gradient Boosting Machine (LightGBM), CatBoost, Gradient Boosting Decision Tree (GBDT), and RF, have been developed and offer refined modeling techniques. Several systematic reviews have examined ML-based signal detection of patient safety events; however, they usually focused on identifying and diagnosing safety events, and reviews of ADR/ADE prediction studies are still lacking. In this study, we conducted a systematic review and meta-analysis of the use of ML techniques and algorithms to predict drug-related harm from EHRs and clinical notes.

Among the 10 included studies, the most frequently used database was the Stockholm EPR corpus [27, 29, 31–33]. This database, from Karolinska University Hospital in Stockholm, Sweden, contains large amounts of diagnosis information, drug administrations, clinical measurements, and free-text clinical notes [32]. MIMIC-IV, the result of a collaboration between Beth Israel Deaconess Medical Center and the Massachusetts Institute of Technology, was also used to establish prediction models. MIMIC-IV data are deidentified, transformed, and made available to researchers who have completed training in human research and signed a data use agreement [35]. The available information includes patient measurements, diagnoses, procedures, treatments, and deidentified free-text clinical notes [31]. One study used a Chinese database containing demographic data, procedures, and clinical notes (including personal history, diagnoses, and medications), but the study gave no further details, such as a reference for the database or information on the hospital, patients, and data period [30]. In the Chinese database, the Stockholm EPR corpus, and MIMIC-IV, patient data were described by both unstructured clinical narratives and structured data [27, 30, 36]. Three studies, conducted in older adults, children, and all patients, were based on self-built databases [5, 26, 28] containing structured clinical data extracted by the researchers from hospital EHRs. International Classification of Diseases, 10th Revision (ICD-10) codes were used as the standard terminology for diagnoses in the studies based on the Stockholm EPR corpus and MIMIC-IV [16, 27, 29, 31–33].

Twenty types of ML algorithms were applied to build ADR/ADE prediction models. Among them, ensemble learning was the most commonly used category. Ensemble learning combines the predictions of multiple weak learners to obtain more reliable and generalizable predictions [37]. Compared with a single weak learner, combining weak learners, by majority voting in classification problems or by linear combination in regression problems, can yield superior predictions [38]. Ensemble learning methods include bagging and boosting algorithms. Boosting trains a weak learner, computes its predictions, and identifies the misclassified training samples; it then trains the next weak learner on an updated training set that includes the instances misclassified in the previous round [39]. Boosting algorithms, including GBDT, XGBoost, and LightGBM, have been applied in various sectors. Five included studies reported the performance of boosting algorithms [5, 16, 26, 30, 31], and their overall performance was good, with an average AUC of 85.20% (72.00–92.00) and an average precision of 46.78% (10.10–68.57). Among boosting algorithms, AdaBoost and XGBoost might perform better than LightGBM, gradient boosting machine (GBM), and GBDT in terms of AUC, F1 score, and precision, although meta-regression could not be performed because contingency tables were unavailable.
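AdaBoost, one of the boosting algorithms most often reported in the included studies, implements the idea of concentrating on misclassified samples by re-weighting the training set rather than literally rebuilding it. The minimal pure-Python sketch below uses one-split decision stumps as weak learners and labels coded as +1 (ADR/ADE) and -1 (none); the features and data are hypothetical. Gradient boosting methods such as GBDT, XGBoost, and LightGBM follow a different scheme, fitting each new tree to the gradient of a loss function, and are not shown here.

```python
import math


def stump_predict(row, feature, threshold, polarity):
    """Weak learner: predict `polarity` if row[feature] >= threshold, else -polarity."""
    return polarity if row[feature] >= threshold else -polarity


def fit_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = (float("inf"), 0, 0.0, 1)
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, thr, pol) != yi)
                if err < best[0]:
                    best = (err, j, thr, pol)
    return best


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}; returns a list of weighted stumps."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    model = []
    for _ in range(rounds):
        err, j, thr, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        model.append((alpha, j, thr, pol))
        # Increase the weight of misclassified samples, decrease the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, thr, pol))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, j, thr, pol) for alpha, j, thr, pol in model)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Hypothetical features: [length of stay (days), number of drugs].
    X = [[2, 3], [3, 2], [4, 4], [12, 9], [15, 11], [9, 14]]
    y = [-1, -1, -1, 1, 1, 1]
    model = adaboost(X, y, rounds=5)
    print([adaboost_predict(model, row) for row in X])
```

On this toy data a single stump already separates the classes, so every round selects the same rule; on realistic EHR data the re-weighting makes later stumps focus on the patients that earlier stumps misclassified.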

Bagging, which aggregates the predictions of multiple decision trees, differs from boosting. It resamples data from the training set, drawing samples of the same cardinality as the original set [40], and can therefore reduce the classifier's variance and overfitting. Its representative algorithm is RF, which was also the most frequently reported algorithm, appearing in nine studies [5, 16, 26, 28, 29–32]. RF improves on the predictions of its base decision-tree classifiers through bagging [41]. It is widely recognized that EHR data with large numbers of sparse features can lower the predictive performance of RF [27]. Some included studies therefore modified RF by adjusting its configuration, learned weights, or tree sizes, or by combining it with different resampling approaches. The average AUC and precision were 80.87% (74.30–94.00) and 35.89% (9.60–75.00) for unmodified RF, and 83.48% (74.64–94.57) and 51.13% (48.49–52.70) for modified RF. Among the modified RFs, the variant that resampled until an informative feature was found, or until no features remained, might perform best, with an AUC of 94.57% and an F1 score of 88.89% [27]. Again, meta-regression could not be performed for the same reason.
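The bagging procedure described above (bootstrap samples of the same size as the training set, one model per sample, majority vote) can be sketched in a few lines of Python. As a simplification, each base model here is a one-split decision stump rather than a full tree, labels are 0/1, and the data are hypothetical; RF additionally restricts each split to a random subset of features, which this sketch omits.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit the one-split rule (feature, threshold, polarity) with the fewest errors."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, 0):  # pol is the label predicted when row[j] >= thr
                errors = sum((pol if row[j] >= thr else 1 - pol) != yi
                             for row, yi in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, thr, pol)
    _, j, thr, pol = best
    return lambda row: pol if row[j] >= thr else 1 - pol


def bagging(X, y, n_estimators=25, seed=0):
    """Train one stump per bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # same cardinality as the training set
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, row):
    """Majority vote across the ensemble (odd n_estimators avoids ties for 0/1 labels)."""
    return Counter(model(row) for model in models).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical single feature: number of concurrently prescribed drugs.
    X = [[1], [2], [3], [10], [11], [12]]
    y = [0, 0, 0, 1, 1, 1]
    models = bagging(X, y)
    print([bagging_predict(models, row) for row in X])
```

Because each stump sees a different bootstrap sample, the stumps place their thresholds differently, and aggregating their votes is what reduces variance.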

Support vector machine, one of the most popular ML algorithms, has been extensively applied in face recognition, disease prediction, image retrieval, data mining, and other fields. Its basic idea is to find the maximum-margin hyperplane that separates the training data in the input space [42]. It can handle pattern recognition problems involving small samples, nonlinearity, and high dimensionality, especially classification problems [43]. However, SVM did not perform as well as ensemble learning, with an average AUC of only 63.00% (59.00–67.00). This might be because the included studies enrolled large numbers of patients, which may have resulted in the poor performance of SVM [29–31].
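The maximum-margin idea can be illustrated with a linear SVM trained by Pegasos-style stochastic sub-gradient descent on the regularized hinge loss. This pure-Python sketch assumes labels in {-1, +1}, centered features so that no bias term is needed, and hypothetical data; it omits the optional projection step of the original Pegasos algorithm and the kernels that SVMs use for nonlinear problems. For real data, a library implementation such as scikit-learn's `SVC` or `LinearSVC` would normally be used.

```python
import random


def pegasos(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM (no bias term) via Pegasos-style stochastic sub-gradient descent.

    Minimizes lam/2 * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)), labels in {-1, +1}.
    Assumes features are centered so the separating hyperplane passes through the origin.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # step on the regularizer
            if margin < 1.0:  # hinge loss active: inside the margin or misclassified
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def svm_predict(w, row):
    """Classify by the side of the hyperplane <w, x> = 0 on which the point falls."""
    return 1 if sum(wj * xj for wj, xj in zip(w, row)) >= 0 else -1


if __name__ == "__main__":
    # Hypothetical centered features for six patients.
    X = [[2, 2], [3, 2], [2, 3], [-2, -2], [-3, -2], [-2, -3]]
    y = [1, 1, 1, -1, -1, -1]
    w = pegasos(X, y)
    print(w, [svm_predict(w, row) for row in X])
```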

Some studies also reported results for LR, a common traditional statistical method. In some studies, LR performed no worse than non-LR methods such as SVM, artificial neural network (ANN), K-nearest neighbor (KNN), or naïve Bayes (NB) [16, 30, 31]. Meta-regression showed that non-LR methods had higher specificity but lower sensitivity than LR. Researchers should therefore not place blind faith in novel ML algorithms, because LR may also perform well; algorithms should be selected according to the specific research question and application scenario.
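When comparing LR with other methods on sensitivity and specificity, it helps to remember that both values depend on the probability threshold used to turn a predicted risk into a yes/no prediction: lowering the threshold raises sensitivity at the cost of specificity. The Python sketch below shows this with hypothetical risk scores for ten patients, four of whom had an ADR/ADE.

```python
def sens_spec(scores: list[float], labels: list[int], threshold: float) -> tuple[float, float]:
    """Sensitivity and specificity when an ADR/ADE is predicted for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical predicted risks; label 1 = ADR/ADE observed.
    scores = [0.9, 0.7, 0.45, 0.3, 0.6, 0.4, 0.35, 0.2, 0.1, 0.05]
    labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    for threshold in (0.5, 0.25):
        sens, spec = sens_spec(scores, labels, threshold)
        print(f"threshold {threshold}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Pooled sensitivity and specificity therefore partly reflect the threshold each study chose, which is one reason the SROC curve, which summarizes performance across thresholds, is reported alongside them.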

Eight studies reported the types of ADRs/ADEs [5, 16, 26, 27, 29, 31–33]. The incidence of different adverse reactions varied widely owing to differences in population and database. Allergy was the most commonly mentioned, with an incidence between 1% and 6%. Drug allergy can be associated with any kind of medication and can be classified as IgE-mediated, cytotoxic, immune complex-mediated, or cell-mediated [44]. Most allergies are transient, but some can lead to serious consequences, such as drug reaction with eosinophilia and systemic symptoms or Stevens-Johnson syndrome. Oversedation/hypotension was also common and was associated with blood pressure medications, sedative hypnotics, or anesthetics. Its incidence ranged from 0.3% to 2% and might be higher in pediatric inpatients [26], possibly because the required doses of sedative hypnotics and anesthetics vary greatly among children, as does children's sensitivity to these drugs [45]. In addition, respiratory depression, bronchospasm, and dyspnea were found only in children. These events might also be related to sedative hypnotics or anesthetics, so these drugs should be used with caution in children.

Four studies reported risk factors for ADRs/ADEs [5, 16, 26, 30]. Although the risk factors varied across studies and databases, length of stay, number of drugs, age, and use of high-risk drugs were commonly mentioned [5, 16, 26, 30]. Cross-sectional studies have shown that patients who experienced drug-related injury received more medications during hospitalization and had longer stays [2, 46, 47]. Age was also an important risk factor. Older patients are more likely to experience drug-related events because of multiple comorbid illnesses, polypharmacy, difficulty monitoring prescribed medications, and age-related changes in pharmacokinetics and pharmacodynamics [48]. However, among older people, the younger were more likely than the older to experience ADEs because they received high-risk medications [5, 14]. Frequently mentioned high-risk drugs included glucocorticoids, anticoagulants, non-steroidal anti-inflammatory drugs, and chemotherapeutic drugs, all of which have been shown to carry a high risk of ADEs [49]. Consistent with this, surgery was a protective factor against ADRs/ADEs: surgical patients commonly received only intravenous fluid therapy during surgery, were rarely treated with high-risk drugs, and therefore had a lower risk of ADEs. These results suggest that shorter hospital stays, simplified treatment regimens, and avoiding high-risk drugs could help prevent ADRs/ADEs. Although these risk factors are often difficult to avoid in clinical treatment, physicians, nurses, and pharmacists should pay close attention to patients with these risk factors and promptly treat ADRs/ADEs when they occur.

This study has several limitations. First, the quality of the included studies was generally low. The PROBAST and ChAMAI assessments showed that only three studies had a low risk of bias and scored higher than 25 (out of a maximum of 50) [5, 16, 26]. Reporting the data processing procedure is an important part of model development that improves reproducibility, transparency, and explainability; however, these dimensions scored low across studies. Most included studies did not describe the data or the data preprocessing procedures. In addition, the studies based on self-built databases or the Chinese database did not apply standard terminologies, although standard terminologies/ontologies might improve model performance and generalization [5, 26, 30]. High-quality research is therefore needed to identify the optimal prediction model and risk factors for ADRs/ADEs. Second, contingency tables, which allow predictive performance to be compared directly through pooled estimates, were available for only three studies, involving 13 models. Third, substantial heterogeneity was observed across the included studies owing to differences in databases, predictors, ML algorithms, hyperparameters, and populations. In prediction-model research, heterogeneity can be avoided only under restrictive conditions, such as similar diseases, ages, or regions of patients and similar ML models and parameters, so low heterogeneity is difficult to achieve. Fourth, ML methods can improve prediction performance, but certain input variables, such as the chief complaint, are better processed with advanced preprocessing methods such as natural language processing. Unfortunately, no studies have combined this approach with ML for ADR/ADE prediction. Finally, new ML and deep learning methods, such as convolutional neural networks, recurrent neural networks, and bidirectional long short-term memory with conditional random field algorithms, have rarely been applied to predicting drug safety events from EHRs. Researchers should therefore pay more attention to innovative ML algorithms to improve the predictive ability of models and promote the clinical application of research findings.

This study systematically reviewed studies that used machine learning to predict adverse drug reactions or events based on electronic health records. A total of 10 studies were included in the review and meta-analysis, and ML achieved satisfactory results in most of them. The meta-analysis suggests that ML is a promising technology; integrating ML into EHRs for ADR/ADE prediction could improve the quality of patient care and prevent some drug-related harm. More rigorous reporting standards and new ML methods that take into account the unique challenges of ML research could improve future studies and help the application of ML models in clinical practice.

Funding

This work was supported by the Sichuan Science and Technology Support Program (grant number: 2023NSFSC1696) and the Science and Technology Project of the Chengdu Health Commission (grant number: 2022020).

Data availability

The authors declare that all the data included in this study are available within the paper and its Supplementary Information files.

Ethics approval and consent to participate

Not applicable.

Consent for publication

Yes.

Competing interests

The authors declare no competing interests.

Author contributions

All the authors were involved in the study. QZH and TX conceived the study. QZH, DZ and XQL wrote the initial protocol, did the literature search, and screened articles. QZH and ZYH wrote the initial manuscript. QZH and XQL did the statistical analysis. All authors provided crucial feedback on the study protocol and contributed important intellectual content to the manuscript, including revisions.

Acknowledgements

We thank the participants in our study.

References

1. Classen DC. Adverse drug events in hospitalized patients. Excess length of stay, extra costs, and attributable mortality. JAMA. 1997;277(4):301–6.
2. Amelung S, Meid AD, Nafe M, Thalheimer M, Hoppe-Tichy T, Haefeli WE, Seidling HM. Association of preventable adverse drug events with inpatients' length of stay: a propensity-matched cohort study. Int J Clin Pract. 2017;71(10). doi:10.1111/ijcp.12990. Epub 2017 Sep 5. PMID: 28873271.
3. Medication without harm. In: WHO global patient safety challenge. Geneva, Switzerland: World Health Organization; 2017. https://apps.who.int/iris/bitstream/10665/255263/1/WHO-HIS-SDS-2017.6-eng.pdf?ua=1&ua=1 (accessed 23 Oct 2023).
4. Gong Y, Teng D, Wang Y, Gu Y, Wu Z, Li W, Tang Y, Liu G. In silico prediction of potential drug-induced nephrotoxicity with machine learning methods. J Appl Toxicol. 2022;42(10):1639–50.
5. Hu Q, Wu B, Wu J, Xu T. Predicting adverse drug events in older inpatients: a machine learning study. Int J Clin Pharm. 2022;44(6):1304–11.
6. Huang J, Niu C, Green CD, Yang L, Mei H, Han JD. Systematic prediction of pharmacodynamic drug-drug interactions through protein-protein-interaction network. PLoS Comput Biol. 2013;9(3):e1002998.
7. Zitnik M, Agrawal M, Leskovec J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics. 2018;34(13):i457–66.
8. Zhang Y, Deng Z, Xu X, Feng Y, Shang J. Application of artificial intelligence in drug-drug interactions prediction: a review. J Chem Inf Model. 2023.
9. Zheng Y, Peng H, Zhang X, Zhao Z, Yin J, Li J. Predicting adverse drug reactions of combined medication from heterogeneous pharmacologic databases. BMC Bioinformatics. 2018;19(Suppl 19):517.
10. Vallano A, Cereza G, Pedròs C, et al. Obstacles and solutions for spontaneous reporting of adverse drug reactions in the hospital. Br J Clin Pharmacol. 2005;60:653–8.
11. Shojania KG, Thomas EJ. Trends in adverse events over time: why are we not improving? BMJ Qual Saf. 2013;22(4):273–7.
12. Klopotowska JE, Wierenga PC, Stuijt CC, et al.; WINGS Study Group. Adverse drug events in older hospitalized patients: results and reliability of a comprehensive and structured identification strategy. PLoS ONE. 2013;8(8):e71045.
13. Griffin FA, Resar RK. IHI Global Trigger Tool for Measuring Adverse Events. 2nd ed. IHI Innovation Series white paper [Internet]. Cambridge, MA: Institute for Healthcare Improvement; 2009. https://www.ihi.org/resources/Pages/Tools/IntrotoTriggerToolsforIdentifyingAEs.aspx (accessed 23 Oct 2023).
14. Hu Q, Qin Z, Zhan M, Chen Z, Wu B, Xu T. Validating the Chinese geriatric trigger tool and analyzing adverse drug event associated risk factors in elderly Chinese patients: a retrospective review. PLoS ONE. 2020;15(4):e0232095.
15. Ji HH, Song L, Xiao JW, et al. Adverse drug events in Chinese pediatric inpatients and associated risk factors: a retrospective review using the Global Trigger Tool. Sci Rep. 2018;8(1):2573.
16. Langenberger B. Machine learning as a tool to identify inpatients who are not at risk of adverse drug events in a large dataset of a tertiary care hospital in the USA. Br J Clin Pharmacol. 2023;89(12):3523–38.
17. Deo RC. Machine learning in medicine. Circulation. 2015;132(20):1920–30.
18. Chen B, Chen C, Wang J, Teng Y, Ma X, Xu J. Differentiation of low-grade astrocytoma from anaplastic astrocytoma using radiomics-based machine learning techniques. Front Oncol. 2021;11:521313.
19. Hu Q, Tian F, Jin Z, Lin G, Teng F, Xu T. Developing a warning model of potentially inappropriate medications in older Chinese outpatients in tertiary hospitals: a machine-learning study. J Clin Med. 2023;12(7):2619.
20. Lei G, Wang G, Zhang C, Chen Y, Yang X. Using machine learning to predict acute kidney injury after aortic arch surgery. J Cardiothorac Vasc Anesth. 2020;34(12):3321–8.
21. Moher D, Liberati A, Tetzlaff J, Altman DG; PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. BMJ. 2009;339:b2535.
22. Moons KGM, Wolff RF, Riley RD, et al. PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Ann Intern Med. 2019;170:W1–33.
23. Cabitza F, Campagner A. The need to separate the wheat from the chaff in medical informatics: introducing a comprehensive checklist for the (self)-assessment of medical AI studies. Int J Med Inf. 2021;153:104510.
24. Zhou Y, Ge YT, Shi XL, et al. Machine learning predictive models for acute pancreatitis: a systematic review. Int J Med Inf. 2022;157:104641.
25. Borenstein M, Hedges LV, Higgins JP, Rothstein HR. A basic introduction to fixed-effect and random-effects models for meta-analysis. Res Synth Methods. 2010;1(2):97–111.
26. Yu Z, Ji H, Xiao J, et al. Predicting adverse drug events in Chinese pediatric inpatients with the associated risk factors: a machine learning study. Front Pharmacol. 2021;12:659099.
27. Karlsson I, Boström H. Handling sparsity with random forests when predicting adverse drug events from electronic health records. IEEE International Conference on Healthcare Informatics. 2014;17–22.
28. Ponraj TE, Balan RVS, Vignesh K. Analysis and prediction of adverse reaction of drugs with machine learning models for tracking the severity. Arab J Sci Eng. 2021.
29. Zhao YX, Yuan H, Wu Y. Prediction of adverse drug reaction using machine learning and deep learning based on an imbalanced electronic medical records dataset. 2021 5th International Conference on Medical and Health Informatics (ICMHI). 2021;17–21.
30. Zhao J, Henriksson A, Asker L, Boström H. Predictive modeling of structured electronic health records for adverse drug event detection. BMC Med Inf Decis Mak. 2015;15(Suppl 4):S1.
31. Zhao J, Henriksson A, Kvist M, Asker L, Boström H. Handling temporality of clinical events for drug safety surveillance. AMIA Annu Symp Proc. 2015;2015:1371–80.
32. Zhao J, Henriksson A. Learning temporal weights of clinical events using variable importance. BMC Med Inf Decis Mak. 2016;16(Suppl 2):71.
33. Alzu'bi AA, Watzlaf VJM, Sheridan P. Electronic health record (EHR) abstraction. Perspect Health Inf Manag. 2021;18(Spring):1g.
34. Johnson AEW, Bulgarelli L, Shen L, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023;10(1):1.
35. Giesa N, Heeren P, Klopfenstein S, Flint A, Agha-Mir-Salim L, Poncette A, Balzer F, Boie S. MIMIC-IV as a clinical data schema. Stud Health Technol Inf. 2022;294:559–60.
36. Ahn JM, Kim J, Kim K. Ensemble machine learning of gradient boosting (XGBoost, LightGBM, CatBoost) and attention-based CNN-LSTM for harmful algal blooms forecasting. Toxins (Basel). 2023;15(10):608.
37. Muflikhah L, Widodo N, Mahmudy WF, et al. Prediction of liver cancer based on DNA sequence using ensemble method. 3rd Int Seminar Res Inform Technol Intell Syst (ISRITI). 2020;37–41.
38. Mienye ID, Sun Y. A survey of ensemble learning: concepts, algorithms, applications, and prospects. IEEE Access. 2022;10:99129–49.
39. Sisodia DS, Verma A. Prediction performance of individual and ensemble learners for chronic kidney disease. IEEE. 2018;1027–31.
40. Basar MD, Akan A. Detection of chronic kidney disease by using ensemble classifiers. 2017 10th International Conference on Electrical and Electronics Engineering (ELECO). 2017;544–7.
41. Wang H, Shao Y, Zhou S, Zhang C, Xiu N. Support vector machine classifier via L0/1 soft-margin loss. IEEE Trans Pattern Anal Mach Intell. 2022;44(10):7253–65.
42. Wang J, He F, Sun S. Construction of a new smooth support vector machine model and its application in heart disease diagnosis. PLoS ONE. 2023;18(2):e0280804.
43. Joint Task Force on Practice Parameters; American Academy of Allergy, Asthma and Immunology; American College of Allergy, Asthma and Immunology; Joint Council of Allergy, Asthma and Immunology. Drug allergy: an updated practice parameter. Ann Allergy Asthma Immunol. 2010;105(4):259–73.
44. Lyttle MD, Rainford NEA, Gamble C, Messahel S, Humphreys A, Hickey H, Woolfall K, Roper L, Noblet J, Lee ED, Potter S, Tate P, Iyer A, Evans V, Appleton RE; Paediatric Emergency Research in the United Kingdom & Ireland (PERUKI) collaborative. Levetiracetam versus phenytoin for second-line treatment of paediatric convulsive status epilepticus (EcLiPSE): a multicentre, open-label, randomised trial. Lancet. 2019;393(10186):2125–34.
45. Marcum ZA, Arbogast KL, Behrens MC, Logsdon MW, Francis SD, Jeffery SM, Aspinall SL, Hanlon JT, Handler SM. Utility of an adverse drug event trigger tool in Veterans Affairs nursing facilities. Consult Pharm. 2013;28(2):99–109.
46. Toscano Guzmán MD, Banqueri MG, Otero MJ, Fidalgo SS, Noguera IF, Guerrero MCP. Validating a trigger tool for detecting adverse drug events in elderly patients with multimorbidity (TRIGGER-CHRON). J Patient Saf. 2021;17(8):e976–82.
47. Gurwitz JH, Field TS, Harrold LR, Rothschild J, Debellis K, Seger AC, Cadoret C, Fish LS, Garber L, Kelleher M, Bates DW. Incidence and preventability of adverse drug events among older persons in the ambulatory setting. JAMA. 2003;289(9):1107–16.
48. McElnay JC, McCallion CR, Al-Deagi F, Scott MG. Development of a risk model for adverse drug events in the elderly. Clin Drug Investig. 1997;13(1):47–55.