Ethics statement
This study was approved by the Ethics Committee of Hasheminejad Clinical Research Development Center. All patients initially provided informed consent for access to their medical records for research purposes. All methods were performed in accordance with the relevant guidelines and regulations.
Settings
In this study, the archived clinical data of ESRD patients who underwent AVF surgery from 2015 to 2018 at Hasheminejad Kidney Center (HKC) in Iran were analyzed using a predictive approach. Patients' status was analyzed under two conditions: during surgery and after surgery.
Participants and Variables
The following inclusion and exclusion criteria were applied to 300 patients who had undergone radiocephalic AVF (Cimino fistula) creation (Figure 1). Seventy of these patients could not be followed up, and 67 had died, so no records of their AVF status were available; both groups were excluded. In addition, 50 patients with more than 50% missing values were removed from processing to limit data noise. As a result, only 113 of the original 300 patients were analyzed.
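The missing-value criterion can be illustrated with a minimal Python sketch. This is not the procedure used in the study (data cleaning was done in SPSS); the field names, the toy records, and the choice to count missingness over four laboratory attributes are assumptions made only for illustration.

```python
# Illustrative sketch only: field names and records are hypothetical.
# None marks a missing value.

def missing_fraction(record, fields):
    """Return the fraction of the listed fields that are missing in a record."""
    missing = sum(1 for f in fields if record.get(f) is None)
    return missing / len(fields)

def drop_sparse_records(records, fields, max_missing=0.5):
    """Keep records whose missing fraction does not exceed max_missing."""
    return [r for r in records if missing_fraction(r, fields) <= max_missing]

FIELDS = ["HB", "SI", "TIBC", "ferritin"]  # assumed attribute set
records = [
    {"id": 1, "HB": 10.2, "SI": 60, "TIBC": 250, "ferritin": 420},
    {"id": 2, "HB": None, "SI": None, "TIBC": None, "ferritin": 300},  # 75% missing
    {"id": 3, "HB": 9.1, "SI": None, "TIBC": None, "ferritin": 150},   # exactly 50% missing
]

kept = drop_sparse_records(records, FIELDS)
print([r["id"] for r in kept])  # [1, 3]: only records with more than 50% missing are dropped

# Cohort bookkeeping from the text: lost to follow-up, deceased, sparse records.
analyzed = 300 - 70 - 67 - 50
print(analyzed)  # 113
```

Records with exactly 50% missing values are retained here because the criterion is stated as "more than 50%".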
Survival of the arteriovenous fistula was considered the target variable and was classified into two groups (Table 1) based on medical records and consultation with surgeons. At the initial stage of the study, attributes such as ESR, CRP, RBC, WBC, PT, PTT, INR, HB, SI, TIBC and serum ferritin were evaluated for these patients. Of these parameters, PTT, PT, INR, WBC and RBC were excluded from processing because they had no significant effect on the target variable, and ESR and CRP were excluded because of 80% missing values.
Serum ferritin was categorized into three subgroups, as shown in Table 2, following international guidelines for the management of iron deficiency anemia (IDA) in CKD, which state that the upper limit of serum ferritin should be maintained at <500-800 ng/ml [10, 21-29]. Moreover, other study groups in Europe and the US recommend that ferritin be kept at 400-600 ng/ml [30] and at 200-1200 ng/ml in HD patients [17]. Because of the data available for the study population, only a limited set of HB cutoff levels could be analyzed. HB was therefore classified into only two groups (Table 3), based on the HB levels proposed by the NKF-DOQI guidelines (11-12 g/dl) [31] and on studies that detected a relation between AVF failure and HB <8 g/dl, rather than with the other three strata (8-10, 10-12 and >12 g/dl) [6]. The SI/TIBC classification (Table 4) was selected based on the recommendations of the Kidney Disease Outcome Quality Initiative (K/DOQI) of the National Kidney Foundation [32] and the Best Practice Guidelines of the European Renal Association [23], which include a cutoff level of 20-50% for ISAT.
Table 1
Table 2
Table 3
Table 4
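As a hedged illustration of how a continuous laboratory value can be mapped to a category, the sketch below bins the SI/TIBC ratio (iron saturation) using the 20-50% range mentioned above. The three-level split, the handling of boundary values, and all function and label names are assumptions; the actual strata are those given in Table 4. The same helper could be reused for serum ferritin and HB with the cutoffs of Tables 2 and 3.

```python
from bisect import bisect_right

def categorize(value, cutoffs, labels):
    """Map a numeric value to a label.

    cutoffs must be ascending; a value equal to a cutoff falls into the
    higher bin (an assumed convention, not taken from the study).
    """
    return labels[bisect_right(cutoffs, value)]

def iron_saturation_percent(serum_iron, tibc):
    """Iron saturation as SI/TIBC expressed in percent."""
    return 100.0 * serum_iron / tibc

# Assumed three-level split around the 20-50% range cited in the text.
ISAT_CUTOFFS = [20.0, 50.0]
ISAT_LABELS = ["<20%", "20% to <50%", ">=50%"]

# Hypothetical SI and TIBC values, for illustration only.
normal = categorize(iron_saturation_percent(60, 250), ISAT_CUTOFFS, ISAT_LABELS)
low = categorize(iron_saturation_percent(30, 300), ISAT_CUTOFFS, ISAT_LABELS)
print(normal, "|", low)
```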
Data Mining Process
Real-world data in large databases and data warehouses usually suffer from three problems: they are incomplete, noisy and inconsistent. Preprocessing is therefore a pivotal step in knowledge discovery in databases (KDD), because high-quality data lead to high-quality decisions [18]. Accordingly, the database was constructed and the data were cleaned and integrated in IBM SPSS Statistics version 22. A decision tree was built in RapidMiner Studio version 9 to analyze the data. This approach was chosen because, when there is no linear relationship between the attributes, rule mining and decision trees help clarify how the variables affect one another, for example by identifying the range of a variable's values that influences the target variable.
Different decision tree (DT) algorithms can be used to classify data whose target variable is, for example, patients' final status. With the aid of the rules extracted by the decision tree, the data mining system can be trained to learn the rules that govern patients' final status. The system can then apply these rules to make predictions for patients whose final status has not been provided to it [33]. A decision tree has a flowchart-like structure consisting of three parts: 1) nodes, which denote tests on attribute values, the topmost node being the root node; 2) branches, which represent the test outcomes; and 3) leaves, which represent classes or class distributions [18]. We used the Chi-squared Automatic Interaction Detection (CHAID) decision tree algorithm, which is based on the chi-squared test of attribute relevance [34].
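The idea of chi-squared attribute relevance can be sketched in a few lines of Python. This is a simplified illustration, not the RapidMiner CHAID operator used in the study: full CHAID also merges attribute categories and compares adjusted p-values, whereas the sketch below only ranks attributes by the raw Pearson chi-squared statistic, which is a fair comparison only when the attributes have the same number of categories. The attribute names and the toy records are invented.

```python
from collections import Counter

def chi_squared(rows, attribute, target):
    """Pearson chi-squared statistic of the attribute-by-target contingency table."""
    observed = Counter((r[attribute], r[target]) for r in rows)
    attr_totals = Counter(r[attribute] for r in rows)
    target_totals = Counter(r[target] for r in rows)
    n = len(rows)
    stat = 0.0
    for a, a_count in attr_totals.items():
        for t, t_count in target_totals.items():
            expected = a_count * t_count / n  # count expected under independence
            stat += (observed[(a, t)] - expected) ** 2 / expected
    return stat

def best_split(rows, attributes, target):
    """Pick the attribute most strongly associated with the target."""
    return max(attributes, key=lambda a: chi_squared(rows, a, target))

# Synthetic toy data: group "A"/"B" values and outcomes are made up.
rows = [
    {"ferritin_group": "A", "hb_group": "A", "avf_status": "patent"},
    {"ferritin_group": "A", "hb_group": "B", "avf_status": "patent"},
    {"ferritin_group": "A", "hb_group": "A", "avf_status": "patent"},
    {"ferritin_group": "A", "hb_group": "B", "avf_status": "patent"},
    {"ferritin_group": "B", "hb_group": "A", "avf_status": "failed"},
    {"ferritin_group": "B", "hb_group": "B", "avf_status": "failed"},
    {"ferritin_group": "B", "hb_group": "A", "avf_status": "failed"},
    {"ferritin_group": "B", "hb_group": "B", "avf_status": "failed"},
]

root = best_split(rows, ["ferritin_group", "hb_group"], "avf_status")
print(root)  # ferritin_group
```

In this toy table the first attribute separates the two outcomes perfectly (statistic 8.0) while the second is independent of the outcome (statistic 0.0), so the first would be chosen for the root node.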
The confusion matrix was used to measure the accuracy of the prediction method. This representation is commonly used for supervised algorithms; each row of the confusion matrix represents the true class, and each column represents the predicted class [35].
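A minimal sketch of this evaluation, following the row and column convention described above, is given below. The class labels "patent" and "failed" and the example predictions are hypothetical and do not reproduce the study's results.

```python
def confusion_matrix(actual, predicted, labels):
    """Build a matrix whose rows are true classes and columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Fraction of correctly classified samples (diagonal sum over total)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

LABELS = ["patent", "failed"]  # hypothetical class names
actual = ["patent", "patent", "patent", "failed", "failed", "failed"]
predicted = ["patent", "patent", "failed", "failed", "failed", "patent"]

matrix = confusion_matrix(actual, predicted, LABELS)
print(matrix)  # [[2, 1], [1, 2]]
print(round(accuracy(matrix), 3))  # 0.667
```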
After preprocessing the data, CHAID decision trees were generated in RapidMiner Studio version 9 and are shown in Figures 2, 3, 4 and 5. In all four cases, AVF survival was the target attribute.