Major findings
In this study, we simultaneously constructed two ANFIS models, an imbalanced model and a balanced model, to predict the warfarin maintenance dose, based on a retrospective multicenter database of 15,108 patients after HVR from 35 centers. The major findings were as follows: (I) the imbalanced ANFIS model, trained on 12,086 cases, accurately predicted the warfarin maintenance dose for Chinese patients undergoing HVR, with an ideal prediction percentage of 74.39%–78.16%, an MAE of 0.37 mg/day, and an MSE of 0.39 mg/day; (II) the balanced ANFIS model, which used equal random stratified sampling and was trained on 2,820 cases, also predicted the warfarin maintenance dose accurately (ideal prediction percentage: 73.46%–75.31%; MAE: 0.42 mg/day; MSE: 0.43 mg/day); (III) compared with the imbalanced model, the balanced model had significantly higher prediction accuracy in the low-dose warfarin group (internal validation: 14.46% vs. 3.01%; P < 0.001) and the high-dose warfarin group (34.71% vs. 23.14%; P = 0.047); (IV) the external validation results were in line with the internal validation results, strengthening the conclusion that the ANFIS model improves prediction performance.
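For clarity, the evaluation metrics quoted above can be sketched as follows. The 20% relative-error tolerance used here to define an "ideal" prediction is an assumption for illustration only, not a threshold stated in this study:

```python
import numpy as np

def dose_metrics(actual, predicted, tol=0.20):
    """Evaluation metrics for a dose-prediction model.

    `tol` is the relative tolerance defining an "ideal" prediction;
    the 20% value is an illustrative assumption, not the study's
    stated cut-off.
    """
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    err = np.abs(predicted - actual)
    ideal_pct = np.mean(err <= tol * actual) * 100  # % of cases within tolerance
    mae = np.mean(err)                              # mean absolute error (mg/day)
    mse = np.mean(err ** 2)                         # mean squared error
    return ideal_pct, mae, mse
```

With daily doses as inputs, the MAE is directly comparable to the 0.37–0.42 mg/day figures reported above.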
Summary of models
Table S4 summarizes the current warfarin prediction models. In 2004, Gage et al. created the first warfarin dosage prediction model, based on 369 patients [19]. Using an MLR model, this study explored eight variables (age, sex, body surface area [BSA], race, amiodarone use, simvastatin use, INR, and CYP2C9) and explained 39% of the variance in the warfarin maintenance dose; of note, the CYP2C9*2 and CYP2C9*3 alleles carried a dominant weight in this model. Since then, six further studies have sought higher predictive accuracy in Caucasian populations [3, 20-22, 2, 23]. Although these studies achieved considerable predictive ability (R2: 47%–73%) by incorporating pharmacogenomic information (e.g., CYP2C9, VKORC1, GGCX), they had two main limitations: small sample sizes (< 350 patients), which limited how well they represented the population, and a lack of external validation, which limited the extrapolation of the models to large patient populations in real-world practice. In 2008, Gage et al. developed another pharmacogenetic algorithm based on 1,015 patients and nine predictors (age, BSA, smoking, race, amiodarone use, current thrombosis, CYP2C9, VKORC1, and target INR) [24]; this model explained 53%–54% of the variability in the warfarin dose in the derivation and validation cohorts. Furthermore, a nonprofit website (www.WarfarinDosing.org) was developed to facilitate the use of this pharmacogenetic and clinical equation. The following year, the International Warfarin Pharmacogenetics Consortium (IWPC) created a novel pharmacogenetic algorithm based on 4,043 patients from 21 research groups in nine countries and eight factors (age, weight, height, race, amiodarone status, enzyme inducers, CYP2C9, and VKORC1) [25].
This model explained 43%–47% of the variability in the derivation and validation populations and provided accurate dose estimates, as evidenced by a low MAE (8.3 mg/week). In addition, differences in model performance across the low-dose (≤ 21 mg/week), medium-dose (21–49 mg/week), and high-dose (≥ 49 mg/week) groups were evaluated. Although the Gage and IWPC models addressed the above limitations, it may not be appropriate to extrapolate these results directly to a Chinese population, given the variation in warfarin sensitivity across ethnic groups (weight, dietary habits, drug interactions, genotype, adherence, etc.). These inherent issues have fueled the development of warfarin prediction models for the Chinese population. However, Chinese medical insurance currently covers genetic testing for warfarin dosage prediction only for patients at high risk of bleeding or with labile INR values, which is a barrier to its utilization. Given this limitation, the existing models for Chinese populations have combined clinical and pharmacogenomic variables in small samples, which limits their generalizability [26-32]. Therefore, developing an optimal warfarin dose prediction model based on explicable clinical variables is a challenging task.
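As a concrete illustration of the IWPC dose strata mentioned above, a minimal classifier might look like the following; since the quoted ranges overlap at their edges, the handling of the boundary values 21 and 49 here is an assumption:

```python
def dose_group(weekly_dose_mg: float) -> str:
    """Classify a weekly warfarin dose (mg/week) into the IWPC
    low (<= 21), medium (21-49), and high (>= 49) groups.
    Boundary assignment (21 -> low, 49 -> high) is an assumption,
    as the published ranges overlap at the cut-offs."""
    if weekly_dose_mg <= 21:
        return "low"
    if weekly_dose_mg < 49:
        return "medium"
    return "high"
```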
The MLR method presents certain irreconcilable issues, such as its poor handling of non-linear relationships between variables; thus, MLR is unlikely to be an optimal method for predicting the warfarin dose [33]. Recently, several artificial intelligence modeling technologies, including support vector machines and the general regression neural network, have been used for warfarin dosage prediction [34, 35]; however, these models showed a relatively low predictive ability, with ideal prediction percentages below 50%. Our study team has made numerous attempts at warfarin model development and achieved 63% predictive accuracy with BPGA and ANFIS models [10, 36, 12]. In this study, we further included 15,108 patients who underwent HVR at 35 centers and preprocessed the training set with the equal random stratified sampling method to balance it. Compared with the IWPC model, both the imbalanced and the balanced ANFIS models performed better in terms of ideal prediction percentage (73.46%–74.39% for ANFIS vs. 45.5% for IWPC) and MAE (2.59–2.95 mg/week for ANFIS vs. 8.5 mg/week for IWPC) in the external validation cohorts. Hence, the ANFIS method based on big data is a feasible and optimal modeling technology for improving the prediction of the warfarin maintenance dose.
Reasons for improved prediction property in low- and high-dose groups
Patients receiving low or high warfarin doses are more vulnerable to thromboembolic and bleeding events because their INR is more difficult to control. To date, no study has been specifically designed to address this concern. Our previous studies found extremely low prediction accuracy in the low-dose group (0.0% with BPNN [11] and 9.1% with ANFIS [12]) and the high-dose group (0.0% with BPGA [10]). Considering the distribution of patients across dose groups in the training set, the proportion in the medium-dose group was far higher than in the low- and high-dose groups (low-dose: 10.41%; medium-dose: 81.81%; high-dose: 7.78%). This explains why our previous models performed well in the medium-dose group but poorly in the low- and high-dose groups: the model learns too little from the smaller-scale categories, resulting in an unsatisfactory prediction effect for them. This is known as the class-imbalanced learning (CIL) problem [37]. To address it, we used the equal random stratified sampling method, which balances the number of patients in each group through random sampling [38]. The model trained on the balanced training set showed increased prediction accuracy compared with the imbalanced model (low-dose: 14.46%–24.34% vs. 3.01%–3.62%; high-dose: 29.58%–34.71% vs. 21.12%–23.14%).
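The balancing step can be sketched as follows. This is a minimal illustration of equal random stratified sampling under the assumption that each dose stratum is downsampled, without replacement, to the size of the smallest stratum; it is not the study's exact implementation:

```python
import numpy as np

def equal_random_stratified_sample(strata_masks, seed=None):
    """Return indices of a balanced sample: an equal number of cases
    drawn at random, without replacement, from each stratum.

    strata_masks: dict mapping stratum name -> boolean array marking
    the cases in that stratum. A sketch of the balancing idea, not
    the study's exact procedure.
    """
    rng = np.random.default_rng(seed)
    # Downsample every stratum to the size of the smallest one
    n = min(int(mask.sum()) for mask in strata_masks.values())
    return np.concatenate([
        rng.choice(np.flatnonzero(mask), size=n, replace=False)
        for mask in strata_masks.values()
    ])
```

For example, with daily doses binned into low, medium, and high strata, the returned index set contains the same number of patients from each group, so the training set no longer over-represents the medium-dose group.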
Clinical relevance
When genetic information is lacking in the clinical setting, this ANFIS method can provide highly accurate warfarin dose estimates on the basis of clinical variables alone (age, disease, weight, tricuspid valve disease, albumin level, creatinine level, usage of the first dose, and dosage of the first dose). This could aid physicians and pharmacists in identifying patients likely to be suited to low or high warfarin doses, allowing earlier and more aggressive intervention to control the INR.
Strengths and limitations
The main strengths of this study were as follows: first, it used a large sample of 15,108 Chinese patients from 35 centers who received warfarin after HVR to develop and validate the models; second, we applied the equal random stratified sampling method to address the CIL problem that caused the low predictive ability in the low- and high-dose groups; and third, we validated the models in both internal and external validation cohorts. However, this study also had some limitations. First, it was a retrospective study and may carry a certain selection bias. Second, some possible determinants of the warfarin dose, such as dietary information and patient genotypes (CYP2C9 and VKORC1), were not available in our study, which may limit model performance. Third, clinical adverse events related to warfarin use were not examined. Given these limitations, further prospective studies using machine learning techniques and incorporating more potential predictors are needed to further improve the models' performance.