Of the 17,844 included cases, 8,729 (49.0%) were male and 9,115 (51.0%) were female. The mean age was 62.3 years (SD = 11.9) overall, 60.4 years (SD = 11.8) for males, and 64.2 years (SD = 11.7) for females. The 45- to 64-year-old age group had the most cases (8,507), followed by those aged 65 years or older (6,966). The mean HbA1c was 7.6% (SD = 1.7). HbA1c was higher than 6.5% in 13,346 cases and 6.5% or lower in 4,498 cases (Table 1).
Table 1
Demographic data of training cases
| Sex | Male | Female | All |
| --- | --- | --- | --- |
| Case number (%) | 8,729 (49.0) | 9,115 (51.0) | 17,844 |
| Mean age (SD) | 60.4 (11.8) | 64.2 (11.7) | 62.3 (11.9) |
| Age group | | | |
| < 25 years old | 22 | 20 | 42 |
| 25–44 years old | 732 | 373 | 1,105 |
| 45–64 years old | 4,531 | 3,976 | 8,507 |
| ≥ 65 years old | 2,882 | 4,084 | 6,966 |
| Unknown age | 562 | 662 | 1,224 |
| Mean HbA1c (SD) | 7.6% (1.8) | 7.6% (1.7) | 7.6% (1.7) |
| HbA1c ≤ 6.5% (%) | 2,337 (26.8%) | 2,161 (23.7%) | 4,498 (25.2%) |
| HbA1c > 6.5% (%) | 6,392 (73.2%) | 6,954 (76.3%) | 13,346 (74.8%) |
The data included 11 types of drugs. Three were compound drugs combined with metformin: glimepiride (item code 25719), pioglitazone (25720), and vildagliptin (25726), with metformin doses of 500, 850, and 1000 mg, respectively. Nateglinide was available in dosages of 60 and 120 mg.
The most important feature was metformin, with a mean decrease in MSE of 171.8, followed by glimepiride (156.6), acarbose (151.8), pioglitazone (148.1), glibenclamide (143.7), gliclazide (114.1), repaglinide (93.3), nateglinide (80.6), sitagliptin (74.0), vildagliptin (71.1), and linagliptin (21.9) (Table 2).
Table 2
Mean decrease in mean square error (MSE), item codes, and dosages of oral hypoglycemic agents (OHA)
| OHA | ₸ Mean Decrease MSE | Item code(s) | Dosage |
| --- | --- | --- | --- |
| Metformin | 171.8 | 25703 | 500 mg |
| Glimepiride | 156.6 | #25709/25719 | 2 mg |
| Acarbose | 151.8 | 25721 | 100 mg |
| Pioglitazone | 148.1 | #25722/25720 | 15 mg |
| Glibenclamide | 143.7 | 25708 | 5 mg |
| Gliclazide | 114.1 | 25713 | 30 mg |
| Repaglinide | 93.3 | 25701 | 1 mg |
| Nateglinide | 80.6 | §25712/25714 | 60 mg/120 mg |
| Sitagliptin | 74.0 | 25718 | 100 mg |
| Vildagliptin | 71.1 | #25724/25726 | 50 mg |
| Linagliptin | 21.9 | 25727 | 5 mg |

# These three compound drugs are all combined with metformin. Glimepiride (25719), pioglitazone (25720), and vildagliptin (25726) have 500, 850, and 1000 mg of metformin added, respectively.
§ Nateglinide has two dosages: 60 and 120 mg.
₸ MSE: mean square error
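This section does not state how the mean decrease in MSE was computed. One common way to obtain a score of this kind is permutation importance: shuffle one feature column, re-score the model, and record how much the MSE worsens. The sketch below illustrates that idea on toy data only. The function names, the stand-in predictor `toy_predict`, and the synthetic features are all hypothetical and are not taken from the study.

```python
import random


def mse(y_true, y_pred):
    """Mean square error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in MSE when each feature column is shuffled.

    A larger value means the model relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    """Hypothetical stand-in model: strong use of feature 0, weak use of
    feature 1, and no use of feature 2."""
    return 2.0 * row[0] + 0.5 * row[1]


# Synthetic data (assumption): three features drawn uniformly from [0, 1).
data_rng = random.Random(42)
X = [[data_rng.random() for _ in range(3)] for _ in range(200)]
y = [toy_predict(row) for row in X]

scores = permutation_importance(toy_predict, X, y)
for name, score in zip(["feature 0", "feature 1", "feature 2"], scores):
    print(f"{name}: mean increase in MSE = {score:.3f}")
```

Features the model depends on heavily produce a large increase in MSE when shuffled, while a feature the model ignores scores zero, which is the ordering a ranking like Table 2 expresses.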
This study treated each season (quarter) from 2013 Q1 to 2015 Q1 as ground truth, yielding nine datasets with different sample sizes. For example, the 2014 Q4 dataset had 12,677 training cases and 3,169 test cases. Using the data from preceding seasons as independent variables, we designed three kinds of models. The first used two seasons of data to predict drug usage in the third season; for example, model 9 (2015 Q1) used 2014 Q3 and 2014 Q4 to predict 2015 Q1. The other two used three or four seasons to predict the drugs of the fourth or fifth season, respectively.
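The windowing scheme described above can be sketched as follows. This is a minimal illustration that assumes each season is a calendar quarter represented as a `(year, quarter)` tuple; the helper names `quarter_range` and `input_window` are hypothetical and not from the study.

```python
def quarter_range(start, end):
    """List (year, quarter) tuples from start to end, inclusive."""
    year, quarter = start
    quarters = []
    while (year, quarter) <= end:
        quarters.append((year, quarter))
        year, quarter = (year, quarter + 1) if quarter < 4 else (year + 1, 1)
    return quarters


def input_window(target, n_seasons):
    """Return the n_seasons quarters immediately preceding the target."""
    year, quarter = target
    window = []
    for _ in range(n_seasons):
        year, quarter = (year, quarter - 1) if quarter > 1 else (year - 1, 4)
        window.append((year, quarter))
    return window[::-1]  # oldest quarter first


def label(season):
    """Format a (year, quarter) tuple as, e.g., '2015 Q1'."""
    return f"{season[0]} Q{season[1]}"


# Ground-truth seasons named in the text: 2013 Q1 through 2015 Q1.
targets = quarter_range((2013, 1), (2015, 1))

for n_seasons in (2, 3, 4):
    for model_number, target in enumerate(targets, start=1):
        inputs = ", ".join(label(s) for s in input_window(target, n_seasons))
        print(f"{n_seasons}-season model {model_number}: {inputs} -> {label(target)}")
```

With these assumptions the range contains nine target seasons, and the two-season window for 2015 Q1 is 2014 Q3 and 2014 Q4, matching the model 9 example in the text.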
This study also evaluated differences in HbA1c between seasons. For example, the differences in mean HbA1c between 2015 Q1 and 2014 Q4, 2014 Q3, and 2014 Q2 were 0.87%, 0.98%, and 1.09%, respectively. Longer intervals between seasons were associated with greater differences in HbA1c (Table 3).
We compared Bi-LSTM and SVM in the two-, three-, and four-season models. The two-season models had the best RMSE (Bi-LSTM = 1.05 ± 0.07 and SVM = 1.05 ± 0.17), followed by the three-season models (Bi-LSTM = 1.12 ± 0.03 and SVM = 1.10 ± 0.25) and four-season models (Bi-LSTM = 1.16 ± 0.04 and SVM = 1.09 ± 0.21). The sensitivity and specificity of the two-season Bi-LSTM model were 0.88 ± 0.03 and 0.68 ± 0.05, respectively.
The sensitivity of the Bi-LSTM models did not differ significantly (two seasons: 0.88 ± 0.03, three seasons: 0.88 ± 0.02, four seasons: 0.89 ± 0.02). The sensitivity of the SVM models decreased gradually, though not significantly, as more seasons were included (two seasons: 0.83 ± 0.16, three seasons: 0.80 ± 0.21, four seasons: 0.77 ± 0.23), and was lower than that of the Bi-LSTM models.
The specificity of the Bi-LSTM models gradually decreased as more seasons were included (two seasons: 0.68 ± 0.05, three seasons: 0.64 ± 0.05, four seasons: 0.59 ± 0.04). The specificity of the SVM models did not differ significantly across approaches (two seasons: 0.69 ± 0.32, three seasons: 0.71 ± 0.31, four seasons: 0.71 ± 0.30). According to the MCC evaluation, there were no significant differences among the six models. The two-season Bi-LSTM model had the shortest run time (0.39 ± 0.06), followed by the three-season (0.47 ± 0.09) and four-season (0.52 ± 0.06) Bi-LSTM models. The SVM models had significantly longer run times (two seasons: 3.39 ± 0.64, three seasons: 5.18 ± 1.07, four seasons: 5.36 ± 1.17) than the Bi-LSTM models (Table 4).
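For readers unfamiliar with the metrics reported above, the following sketch shows how RMSE, sensitivity, specificity, and MCC can be computed from predicted and observed HbA1c values. It assumes that the classification metrics come from binarizing HbA1c at 6.5% (the cut-off used in Table 1); this section does not state how they were actually derived, and the example values are invented for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error of the predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def confusion_counts(y_true, y_pred, threshold=6.5):
    """Count (tp, fp, tn, fn), treating values above the threshold as positive.

    The 6.5% threshold is an assumption borrowed from Table 1.
    """
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        actual, predicted = t > threshold, p > threshold
        if actual and predicted:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity(tp, fn):
    """True positive rate: tp / (tp + fn)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: tn / (tn + fp)."""
    return tn / (tn + fp)


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented example values (not study data): observed and predicted HbA1c in %.
observed = [7.2, 6.0, 8.1, 6.4, 9.0, 5.8]
predicted = [7.0, 6.8, 7.5, 6.2, 8.4, 6.1]

tp, fp, tn, fn = confusion_counts(observed, predicted)
print(f"RMSE        = {rmse(observed, predicted):.2f}")
print(f"Sensitivity = {sensitivity(tp, fn):.2f}")
print(f"Specificity = {specificity(tn, fp):.2f}")
print(f"MCC         = {mcc(tp, fp, tn, fn):.2f}")
```

MCC uses all four cells of the confusion matrix, which makes it a useful complement to sensitivity and specificity when the classes are imbalanced, as they are here (about three quarters of cases have HbA1c above 6.5%).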