A. Dataset
The experiments were performed on a dataset of patients who were hospitalized for selective coronarography (SC) at the East Slovak Institute of Cardiovascular Diseases in Košice between June 2017 and March 2018. During hospitalization, a complete personal and family history and essential physical characteristics were recorded for each patient, and laboratory tests, echocardiography, and an ECG examination were performed. Patients in our clinical trial were treated according to the European Society of Cardiology guidelines for coronary artery disease. All patient information was stored in electronic health records. Using a tool that we created and tuned with our colleagues [24][25], we converted the patients' data into a structured form. An overview of the dataset's structure, statistics, and attribute values is given in our previous research [14]. That publication also provides more detailed information on patient selection, inclusion and exclusion criteria, and the dataset preparation method (completion of missing data, treatment of extreme values, and other steps necessary to obtain a well-prepared dataset).
Our dataset was prepared as a two-class dataset: the attribute Nalez (finding of SC) was aggregated from six classes into two, where class 0 corresponds to no finding and class 1 to any positive finding of SC. The dataset is unbalanced (38.61% of class 0, 61.39% of class 1). However, it should be noted that the class with a finding covers several degrees of severity, from mild (coronary narrowing from 10%) to severe (narrowing of the coronary vessels of up to 100%).
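The aggregation of the target attribute can be sketched as follows. This is a minimal Python illustration rather than the authors' code (the study itself was carried out in R). It assumes the six original classes are coded 0 to 5 with 0 meaning no finding; the function names and the sample labels are hypothetical and are not the study data.

```python
from collections import Counter

def binarize_finding(severity: int) -> int:
    """Map the original six-class finding (0-5) to two classes."""
    if not 0 <= severity <= 5:
        raise ValueError(f"unexpected severity class: {severity}")
    return 0 if severity == 0 else 1

def class_shares(labels):
    """Return the percentage share of each class, rounded to 2 decimals."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: round(100 * n / total, 2) for cls, n in sorted(counts.items())}

# Hypothetical sample of original six-class labels (not the study data).
original = [0, 3, 0, 1, 5, 2, 0, 4, 1, 0]
binary = [binarize_finding(s) for s in original]
print(class_shares(binary))
```

Checking the class shares after aggregation, as `class_shares` does, is how an imbalance such as the one reported above becomes visible.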
B. Selected machine learning methods
Machine learning methods provide a wide range of options, and different types of data call for different methods. It is often difficult to say in advance which method will be most suitable for a particular dataset, and no single machine learning algorithm can be called the best. The suitability of an algorithm depends on several factors, such as the size of the dataset, the nature of the attributes, the range of attribute values, and the desired outcome or specific goal. For this reason, we decided to perform experiments using different methods and approaches.
We also chose methods based on the state of the art presented above. We focused on classification algorithms, as our goal was not to predict the final value of the coronarography finding but to determine the significance of the monitored attributes with respect to their impact on the target attribute, based on the classification of patients according to the severity of the coronarography finding. Random Forest and Logistic Regression were a clear choice for us. As a third algorithm, we chose the CART decision tree in order to compare the performance of a single decision tree with that of the many decision trees contained in a Random Forest model.
All the following experiments and analyses were carried out in the R programming language in the RStudio environment. We divided the dataset into train and test sets in an 80:20 ratio. We then applied 10-fold cross-validation on the train set for all methods in order to fine-tune their hyperparameters. Each model was evaluated by its accuracy on both the train and the test set.
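The splitting scheme can be illustrated with a short, standard-library Python sketch. It is not the authors' R code: the function names, the random seed, the dataset size of 800 records, and the use of a plain (non-stratified) shuffle are all assumptions made for illustration.

```python
import random

def train_test_split(indices, test_ratio=0.2, seed=42):
    """Shuffle the indices and split them into train and test parts."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def k_folds(indices, k=10):
    """Yield (fit, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, validation

# Hypothetical dataset size, chosen only to make the 80:20 split easy to read.
train, test = train_test_split(range(800))
print(len(train), len(test))
for fit, validation in k_folds(train):
    # Each record of the train set is validated exactly once across the folds.
    print(len(fit), len(validation))
```

Hyperparameters would be tuned only on the folds derived from `train`, so that `test` remains untouched until the final evaluation.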
The importance of the attributes was evaluated using the varImp function from the caret package. For classification tasks, this function reports, for each predictor variable, the difference between the prediction accuracy recorded on the out-of-bag portion of the data and the accuracy recorded after permuting that variable. These differences are then averaged over all trees and normalized by the standard deviation [26].
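The idea behind permutation-based importance can be sketched in a few lines of Python. This is a simplified illustration, not the caret implementation: it uses a single stand-in model instead of out-of-bag samples of individual trees, it omits the normalization step, and the toy data, the `model` function, and all names are hypothetical.

```python
import random

def accuracy(model, rows, labels):
    """Share of rows for which the model predicts the true label."""
    hits = sum(model(row) == label for row, label in zip(rows, labels))
    return hits / len(rows)

def permutation_importance(model, rows, labels, feature, repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    drops = []
    for _ in range(repeats):
        column = [row[feature] for row in rows]
        rng.shuffle(column)
        permuted = [dict(row, **{feature: value}) for row, value in zip(rows, column)]
        drops.append(baseline - accuracy(model, permuted, labels))
    return sum(drops) / len(drops)

# Hypothetical toy data: the label depends on "x" only, "noise" is irrelevant.
rows = [{"x": i % 2, "noise": i % 3} for i in range(60)]
labels = [row["x"] for row in rows]
model = lambda row: row["x"]  # stand-in for a fitted classifier

print(permutation_importance(model, rows, labels, "x"))
print(permutation_importance(model, rows, labels, "noise"))
```

A variable the model relies on loses accuracy when shuffled, while an irrelevant one does not. On real data, chance effects can make the drop for an irrelevant variable slightly negative.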
As an example of a simple model with good performance, FIGURE 1 presents the resulting CART model (its accuracy is 68.92% on the train set and 68.94% on the test set).
Although only four attributes (ECG_QRS, ECG_STE, ESC, P_CAD) were used to construct the model, a somewhat larger number of attributes have non-zero importance. An overview of the significance of the attributes is given in Table 3.
Random Forest (RaF) is characterized by its ability to deal with high-dimensional data and unbalanced classes and by its robustness to outliers; it also provides information about the importance of individual variables. The number of trees in our model was set to 350, and we reached 71.57% accuracy on the train set and 71.42% accuracy on the test set. FIGURE 2 shows variable importance (VI) via Mean Decrease Accuracy (MDA, interpreted as the number or proportion of observations that are incorrectly classified when the feature is removed) and Mean Decrease Gini (MDG, which expresses how much each variable contributes to the homogeneity of the nodes and leaves in the resulting RaF).
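To make the notion of node homogeneity behind MDG more concrete, the following Python sketch computes the Gini impurity of a node and the impurity decrease achieved by one split. MDG accumulates such decreases over all splits made on a variable and averages them over the trees. The functions and the example node are hypothetical illustrations and are not taken from the study.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_decrease(parent, left, right):
    """Impurity decrease achieved by splitting parent into left and right."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

# Hypothetical node with 4 records of class 0 and 6 records of class 1.
parent = [0] * 4 + [1] * 6
left, right = [0, 0, 0, 1], [0, 1, 1, 1, 1, 1]
print(round(gini(parent), 4), round(gini_decrease(parent, left, right), 4))
```

The larger the decrease, the purer the child nodes are compared with the parent, and the more the splitting variable contributes to MDG.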
In addition, the VI of the generated model can also be obtained via the caret package. The range of VI was wide and included positive, zero, and negative values. In the subsequent research, we worked with all VI values of this model. Table 3 gives a comprehensive overview of the VI of the RaF model; in addition to the importance value itself, it lists the rank of each attribute within each model.
The last algorithm chosen for this part of the experiment is Logistic Regression (LR). The highest accuracy we achieved on the train set was 71.24%, and this LR model achieved 68.94% accuracy on the test set. An overview of the VI of the LR model is given in Table 3.
C. Evaluating experiments
Before we state the significance of individual attributes within the model and some final evaluation of the relevance, we will focus on evaluating the "success" of the models as such.
The evaluation presented here is based on the confusion matrices of the individual models applied to the test set. We assessed the models using a larger number of statistics. The values of the individual statistics were comparable among the models, but we were able to identify small differences. An overview of these statistics is given in Table 2.
Table 2
Overview of the statistics of the built models
| Statistic | CART | RaF | LR |
|---|---|---|---|
| Positive (class 0) | 62 | 62 | 62 |
| Negative (class 1) | 99 | 99 | 99 |
| Prevalence | 38.51% | 38.51% | 38.51% |
| Predicted positive | 30 | 48 | 62 |
| Predicted negative | 131 | 113 | 99 |
| TP | 21 | 32 | 37 |
| FP | 9 | 16 | 25 |
| FN | 41 | 30 | 25 |
| TN | 90 | 83 | 74 |
| Accuracy | 68.94% | 71.42% | 68.94% |
| Positive predictive value (PPV) | 70% | 66.67% | 59.68% |
| Negative predictive value (NPV) | 68.70% | 73.45% | 74.75% |
| False omission rate | 31.30% | 26.55% | 25.25% |
| False discovery rate | 30% | 33.33% | 40.32% |
| F1 score | 0.4565 | 0.5818 | 0.5968 |
| True positive rate (TPR), Sensitivity | 33.87% | 51.61% | 59.68% |
| True negative rate (TNR), Specificity | 90.91% | 83.83% | 74.75% |
| False positive rate (FPR) | 9.09% | 16.16% | 25.25% |
| False negative rate (FNR) | 66.13% | 48.39% | 40.32% |
| Positive likelihood ratio (LR+) | 3.7258 | 3.1935 | 2.3632 |
| Negative likelihood ratio (LR-) | 0.7274 | 0.5771 | 0.5395 |
| Informedness | 0.2478 | 0.3545 | 0.3442 |
| Markedness | 0.3870 | 0.4012 | 0.3442 |
| Matthews correlation coefficient (MCC) | 0.3097 | 0.3771 | 0.3442 |
| Diagnostic odds ratio (DOR) | 5.1220 | 5.5333 | 4.3808 |
It is clear from the table above that no single model is best in all aspects; each of the created models has its specifics. Accuracy ranks RaF (71.42%) as the best model, whereas the F1 score indicates that the best model is LR (0.5968). However, for unbalanced datasets (such as ours, where the prevalence of absence of narrowing of the coronary arteries is 38.51%), it is considered better to focus on the Matthews correlation coefficient (MCC), which takes into account the ratio between the positive and negative class in binary classification [27]. In this case, too, the RaF model comes out on top (MCC 0.3771).
The CART model is characterized by the power to correctly classify negative cases (TNR − 90.91%), which in the context of our data corresponds to the presence of narrowing on the coronary vessels. On the contrary, the LR model classifies the positive cases best (TPR − 59.68%; class 0 corresponding to the absence of narrowing on the coronary vessels). However, its strength is not as significant as the strength of the CART model for coronary narrowing prediction.
The quality of the models should also be assessed according to their awareness of positive and negative cases (Informedness) and the level of confidence of the model concerning the prediction of values (Markedness). These metrics place the RaF model above the other two (Informedness 0.3545, Markedness 0.4012). The diagnostic odds ratio (DOR) is above 1 for all models.
In terms of the rate of correct predictions, the CART model achieved a higher success rate for positive cases (PPV = 70%; absence of narrowing) and the LR model for negative cases (NPV = 74.75%; presence of narrowing).
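The statistics discussed above follow directly from the four confusion-matrix counts. The Python sketch below uses the standard definitions and the TP, FP, FN, and TN values reported in Table 2, with class 0 (absence of narrowing) treated as the positive class. The function name is hypothetical, and this is an illustration rather than the code used in the study.

```python
from math import sqrt

def confusion_metrics(tp, fp, fn, tn):
    """Derive selected Table 2 statistics from the confusion-matrix counts."""
    tpr = tp / (tp + fn)  # sensitivity
    tnr = tn / (tn + fp)  # specificity
    ppv = tp / (tp + fp)  # positive predictive value
    npv = tn / (tn + fn)  # negative predictive value
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tpr,
        "specificity": tnr,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "informedness": tpr + tnr - 1,
        "markedness": ppv + npv - 1,
        "mcc": mcc,
        "dor": (tp * tn) / (fp * fn),
    }

# Counts (TP, FP, FN, TN) as reported in Table 2.
counts = {"CART": (21, 9, 41, 90), "RaF": (32, 16, 30, 83), "LR": (37, 25, 25, 74)}
for name, (tp, fp, fn, tn) in counts.items():
    metrics = confusion_metrics(tp, fp, fn, tn)
    print(name, {key: round(value, 4) for key, value in metrics.items()})
```

Recomputing the statistics in this way is a quick consistency check on a table such as Table 2, because every derived value must agree with the same four counts.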
The DOR value does not indicate the strength of the individual models but suggests that they can be considered functional. Therefore, from our point of view, it would be wrong to exclude the results of any of the models. Still, we think it appropriate to take the "performance" of the individual models into account when recommending the order of significance of the monitored attributes. For this purpose, we chose sensitivity, specificity, and accuracy as the metrics that determine the weight of each model's influence. The weighted value is the product of the VI of an individual attribute and the sensitivity, specificity, and accuracy of the model, computed for each model separately. The weighted VI values are then summed for each attribute, which gives the final order of importance of the monitored attributes. We call this calculation a new weighted agglomerative attribute importance metric. Its mathematical form is as follows:
\(VI = \sum_{i \in \{CART,\, RaF,\, LR\}} norm\left(VI_i\right) \cdot Acc_i \cdot Sens_i \cdot Spec_i,\) where \(norm(VI_i)\) is the normalized VI value of the final model \(i\) (CART, RaF, LR), \(Acc_i\) is the accuracy, \(Sens_i\) the sensitivity, and \(Spec_i\) the specificity of that model.
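The calculation can be sketched in Python as follows. The text does not state how norm is defined, so the sketch assumes min-max scaling of the VI values within each model; this is an assumption, although for the rows shown it yields values close to the overall VI reported in Table 3. The per-model weights are the accuracy, sensitivity, and specificity from Table 2, the VI values are a subset of Table 3, and the function and variable names are hypothetical.

```python
# Per-model weights: (accuracy, sensitivity, specificity) from Table 2.
MODEL_STATS = {
    "CART": (0.6894, 0.3387, 0.9091),
    "RaF": (0.7142, 0.5161, 0.8383),
    "LR": (0.6894, 0.5968, 0.7475),
}

# Subset of the per-model VI values from Table 3. It contains each model's
# minimum and maximum, so min-max scaling matches scaling over the full table.
VI = {
    "ECG_STE": {"CART": 29.2528, "RaF": 10.6827, "LR": 5.8683},
    "Gender": {"CART": 38.9403, "RaF": 5.4696, "LR": 3.7910},
    "F_HT": {"CART": 0.0, "RaF": -1.3286, "LR": 0.6542},
    "F_Hyperch": {"CART": 0.0, "RaF": 0.0, "LR": 0.0},
}

def weighted_vi(vi, stats):
    """Sum min-max normalized VI over models, weighted by Acc * Sens * Spec."""
    overall = {attr: 0.0 for attr in vi}
    for model, (acc, sens, spec) in stats.items():
        values = [scores[model] for scores in vi.values()]
        low, high = min(values), max(values)
        for attr, scores in vi.items():
            normalized = (scores[model] - low) / (high - low)  # assumed norm()
            overall[attr] += normalized * acc * sens * spec
    return overall

ranking = sorted(weighted_vi(VI, MODEL_STATS).items(), key=lambda kv: -kv[1])
for attr, value in ranking:
    print(f"{attr}: {value:.4f}")
```

Because each model's contribution is multiplied by its own accuracy, sensitivity, and specificity, a model with low sensitivity (such as CART here) influences the final ranking less than its raw VI values would suggest.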
The following Table 3 provides an overview of the significance of individual attributes for particular models as well as the overall VI calculated by the formula above.
Table 3
Overall overview of the models' variable importance
| Attribute | VI CART | Seq. CART | VI RaF | Seq. RaF | VI LR | Seq. LR | Overall VI | Overall Seq. |
|---|---|---|---|---|---|---|---|---|
| ESC | 27.4563 | 4 | 9.1835 | 2 | 2.179 | 9 | 0.5343 | 3 |
| Age | 9.2887 | 7 | 4.3432 | 5 | 3.8243 | 2 | 0.3970 | 5 |
| Gender | 38.9403 | 1 | 5.4696 | 3 | 3.7910 | 3 | 0.5858 | 2 |
| F_CAD | 0 | - | 0.2077 | 32 | 0.9193 | 32 | 0.0876 | 31 |
| F_Stroke | 0 | - | 0.4146 | 26 | 1.8694 | 12 | 0.1427 | 22 |
| F_MI | 0 | - | 1.9323 | 12 | 1.7917 | 14 | 0.1777 | 12 |
| F_Hyperch | 0 | - | 0 | 33 | 0 | 50 | 0.0342 | 47 |
| F_HT | 0 | - | -1.3286 | 50 | 0.6542 | 44 | 0.0342 | 48 |
| F_DM | 0 | - | -0.5568 | 41 | 0.0026 | 49 | 0.0199 | 50 |
| F_AoS | 0 | - | 0 | 33 | 0 | 50 | 0.0342 | 47 |
| P_CAD | 35.8718 | 2 | 4.335 | 6 | 1.6291 | 19 | 0.4266 | 4 |
| P_Stroke | 0 | - | -0.1003 | 35 | 2.1731 | 10 | 0.1454 | 21 |
| P_MI | 23.1907 | 5 | 4.6572 | 4 | 0.9477 | 31 | 0.3300 | 6 |
| P_Hyperch | 0 | - | 0.4980 | 23 | 0.5375 | 46 | 0.0751 | 38 |
| P_HT | 0 | - | -1.2422 | 49 | 0.7267 | 42 | 0.0402 | 46 |
| P_DM | 0 | - | 0.3978 | 27 | 2.2989 | 7 | 0.1648 | 16 |
| P_AoS | 0 | - | 1.272 | 17 | 1.0917 | 28 | 0.1240 | 24 |
| Smoking | 0 | - | 1.7714 | 13 | 1.8538 | 13 | 0.1768 | 14 |
| S_Freq | 0 | - | -0.2764 | 37 | 1.4324 | 23 | 0.1020 | 29 |
| S_Duration | 0 | - | 0.4391 | 25 | 0.8313 | 34 | 0.0889 | 30 |
| Alcohol | 0 | - | -0.9126 | 46 | 1.2303 | 26 | 0.0751 | 37 |
| Weight | 0 | - | 0.3448 | 29 | 1.2563 | 24 | 0.1088 | 25 |
| Height | 0 | - | 1.3322 | 15 | 1.7671 | 15 | 0.1610 | 18 |
| BMI | 0 | - | 2.0448 | 10 | 1.4493 | 22 | 0.1627 | 17 |
| BP | 6.1589 | 8 | 0.2921 | 30 | 1.4591 | 21 | 0.1516 | 20 |
| Urea | 0 | - | -0.8061 | 43 | 0.9670 | 30 | 0.0640 | 41 |
| Creat | 0 | - | 0.2689 | 31 | 1.7142 | 16 | 0.1309 | 23 |
| AST | 0 | - | -0.4358 | 39 | 0.7823 | 37 | 0.0639 | 42 |
| Sodium | 3.5982 | 11 | 1.3828 | 14 | 1.9578 | 11 | 0.1919 | 11 |
| Potassium | 0 | - | -1.0540 | 47 | 0.6576 | 43 | 0.0414 | 45 |
| Chol | 0 | - | 0.3651 | 28 | 0.5565 | 45 | 0.0726 | 39 |
| TG | 3.4221 | 12 | -1.1236 | 48 | 0.7419 | 41 | 0.0627 | 43 |
| HDL | 0 | - | 1.2960 | 16 | 1.6749 | 18 | 0.1552 | 19 |
| LDL | 0 | - | -0.8406 | 44 | 0.1873 | 48 | 0.0222 | 49 |
| CRP | 0 | - | 0.8448 | 20 | 0.9739 | 29 | 0.1069 | 27 |
| Chloride | 3.8274 | 10 | 3.1369 | 7 | 0.8001 | 35 | 0.1776 | 13 |
| FBG | 0 | - | 2.0364 | 11 | 1.6987 | 17 | 0.1755 | 15 |
| HIV | 0 | - | 0 | 33 | 0 | 50 | 0.0342 | 47 |
| HBs | 0 | - | 0 | 33 | 0 | 50 | 0.0342 | 47 |
| ECG_HR | 0 | - | -0.0269 | 34 | 0.7988 | 36 | 0.0752 | 36 |
| ECG_Rhythm | 0 | - | 2.0886 | 9 | 2.6793 | 4 | 0.2283 | 8 |
| ECG_PQ | 3.2619 | 13 | 0.4401 | 24 | 0.7503 | 39 | 0.1025 | 28 |
| ECG_QRS | 4.7327 | 9 | 0.7525 | 21 | 2.5476 | 6 | 0.2128 | 9 |
| ECG_QT | 0 | - | -0.2473 | 36 | 1.5298 | 20 | 0.1079 | 26 |
| ECG_LBBB | 0 | - | 0 | 33 | 0 | 50 | 0.0342 | 47 |
| ECG_RBBB | 0 | - | 0.523 | 22 | 0.7434 | 40 | 0.0865 | 32 |
| ECG_VES | 0 | - | -0.7926 | 42 | 1.2421 | 25 | 0.0788 | 34 |
| ECG_SVES | 0 | - | -0.4026 | 38 | 1.1705 | 27 | 0.0851 | 33 |
| ECG_STD | 0 | - | 1.2300 | 18 | 0.2054 | 47 | 0.0765 | 35 |
| ECG_STE | 29.2528 | 3 | 10.6827 | 1 | 5.8683 | 1 | 0.7761 | 1 |
| ECG_T | 0 | - | -0.4868 | 40 | 0.9166 | 33 | 0.0696 | 40 |
| ECHO_EF | 16.8490 | 6 | 2.1336 | 8 | 2.2466 | 8 | 0.2986 | 7 |
| ECHO_PH | 0 | - | 0.8875 | 19 | 2.6073 | 5 | 0.1936 | 10 |
| Muscle_bridge | 0 | - | -0.8668 | 45 | 0.7540 | 38 | 0.0513 | 44 |
As the reader can see, the significance of the attributes initially drops sharply and then decreases only slowly from one attribute to the next. All well-known risk factors (RF) appear among the most significant attributes according to our combined ranking formula, which supports the suitability of our calculation [15][18]. Moreover, several less well-known RF also appear near the top, although less prominently. Examples are Sodium, Chloride, fibrinogen (FBG), and the length of the interval from the beginning of the P wave to the beginning of the ventricular complex in milliseconds (ECG_PQ).
D. Identifying potential new risk factors of CVD
Our goal was to calculate the importance of the individual attributes and possibly also to identify new interesting relations. We focus on several RF for which there is only limited awareness, or doubt, as to whether they could affect CVD. Based on this and on the conclusions of our previous research [14], we formulated the following research questions:
1. Is there a link between a higher level of fibrinogen and a more severe coronary finding, respectively increased cardiovascular risk?
2. Does fibrinogen have the potential to be an atherosclerosis risk factor compared to traditional RF?
To answer these questions, we trained new models in which we no longer considered the impact of all attributes. Instead, we built models based on selected attributes only, in line with our second research target. We selected the method with regard to its coverage of the negative class, which in the context of our data corresponds to the presence of narrowing of the coronary vessels. This is captured mainly by specificity, which reached its highest value for the CART model. To obtain comparable results, we retained the same split into train and test sets, as well as the same assignment of records to the folds of the 10-fold cross-validation. The algorithm parameters remained unchanged.
The combinations of attributes for building the models were chosen so that they include both well-known and less well-known RF. The specific attribute settings for the models are listed below:
1. Coronary_findings ~ P_CAD + FBG
2. Coronary_findings ~ P_CAD + FBG + HDL
As can be seen, we also included attributes that are significant in coronary risk assessment or whose effect is not yet clear. For example, some sources point to a cardioprotective effect of HDL: the higher the HDL level, the lower the risk of CAD. The ESC parameter expresses the patient's CVD risk classification and is therefore itself indicative of CVD.
The first combination did not give a sufficiently clear picture of how FBG levels relate to CF severity. The distribution of patients with positive and negative CF was highly fragmented (seven different intervals of FBG values with alternating target classes), without a clear determination of CF severity (the percentage of CF in the intervals ranged approximately between 40% and 70%).
The combination of the previous RF enriched by HDL (P_CAD, FBG, and HDL) pointed out an interesting fact. Classification based on P_CAD and on FBG values is comparable: the model classifies as coronary narrowing both patients with previously confirmed CAD and patients who have not yet been diagnosed with CAD but whose fibrinogen level is 3.875 or higher. The remaining conclusions resulting from the described model are shown in FIGURE 4.
As is evident, even in the absence of previously diagnosed CAD, FBG levels may indicate whether the patient is at risk of coronary constriction (greater than 70% prediction of coronary constriction). In addition, when HDL levels are considered within the same FBG range, patients with higher HDL levels appear less likely to have coronary vasoconstriction (greater than 60% probability). With low HDL levels and lower FBG levels, there is also a higher probability of coronary vasoconstriction (over 60%).
Regarding the importance of the attributes within this model, it was surprising that the previous presence of CAD in the patient (VI of P_CAD = 13.99) is less significant than the attributes describing FBG (VI of FBG = 21.71) or HDL (VI of HDL = 14.20). It was also surprising that, despite the low number of attributes included, the model's accuracy on the test set increased slightly (69.57%), as did its sensitivity (46.77%). However, the specificity decreased (83.84%).
Based on the above analysis, we can say that there exists a relationship between fibrinogen levels and cardiovascular risk. However, we cannot say with certainty whether the relationship of direct proportion applies, and thus that high levels of FBG lead to severe CF. For this reason, we performed the same experiments on the dataset with the original classification of the CF (class 0–5 depending on the increasing severity of the CF). The first combination of RF containing only previously diagnosed CAD and FBG levels was again quite unclear in its conclusions. Despite dividing FBG into five intervals, it was not demonstrably possible to determine whether a higher FBG level could be related to a more severe finding on the coronary vessels.
By applying the second combination of attributes (P_CAD + HDL + FBG), we obtained a relatively branched tree over fibrinogen and HDL intervals. If patients had not previously been diagnosed with CAD, the model classified almost half of the cases into the class without a CF. If CAD had been previously diagnosed, the result of the CF varied depending on the different intervals of FBG and HDL levels. FIGURE 5 shows the percentage representation of each class for the different intervals of FBG and HDL levels.
CART decision tree branch for P_CAD = 1 (presence of earlier diagnostics of CAD)
Before drawing conclusions from this figure, we must again note that claims about the relationship between HDL and cardiovascular risk are vague. A higher HDL level is assumed to have a cardioprotective effect, so the cardiovascular risk level should be lower (in our case, we derive the cardiovascular risk level from the severity of the CF). Therefore, we focused on both HDL and FBG levels [28].
We start from the upper left corner of FIGURE 5 (lower HDL level, higher FBG level). The first cell indicates a higher prevalence of more severe stenosis at high FBG levels (above 4.475) and low HDL levels (below 1.135). Once the FBG level decreases to the interval [3.425; 4.475), the overall percentage of more severe CFs also decreases. A change is also evident at the same FBG level but a different HDL level (cut-off value 1.098), where the share of negative findings of coronary vascular stenosis increases. We see a similar phenomenon at FBG levels in the interval [3.2; 3.425), where the CF differs depending on whether the HDL level is lower or higher (cut-off value 1.345). Based on the experiments performed, it might seem that even a high HDL level is not entirely beneficial. However, it may have some significance in reducing cardiovascular risk. A good example is the cells on the right side of the figure. With the same HDL level (above 1.632) and a different FBG level, there is a noticeable difference in the severity of CF. At a higher FBG value (above 3.9745), the overall rate of worse findings is higher than at a lower FBG value, where, on the contrary, there is a lower prevalence of adverse findings on coronary vessels. Thus, it can be inferred that both FBG levels and HDL levels can affect the severity of coronary vascular stenosis.
Based on the results of the experiments, we can provide answers to our research questions:
1. Is there a link between a higher level of fibrinogen and a more severe coronary finding, respectively increased cardiovascular risk?
Fibrinogen alone shows a direct dependence with increased cardiovascular risk and the severity of CF. However, this dependence is not as significant as when fibrinogen is combined with other parameters, which strengthens the evidence.
2. Does fibrinogen have the potential to be an atherosclerosis risk factor compared to traditional RF?
Fibrinogen has the potential to be considered an RF of interest for atherosclerosis, but further research on larger patient samples is needed to confirm its equivalence to traditional RF.