Study Area
This study was carried out at Oba Adejuyigbe General Hospital and Primary Health Care, Okeyinmi, Ado-Ekiti.
Study Design
The study employed a case-control research design.
Ethical Approval
Ethical approval was obtained from the Ethics Committee, College of Medicine and Health Sciences, Afe Babalola University, Ado-Ekiti, Nigeria. Informed consent was obtained from all participants before the commencement of the study.
Sample Size Determination
The sample size (n) was calculated using Fisher’s formula (Safranek, 2018):
$$n = \frac{Z^{2}\, p\, q}{d^{2}}$$
Where: n = the desired sample size (when the population is greater than 10,000)
Z = a constant given as 1.96 (or, more simply, 2.0), which corresponds to the 95% confidence level
p = prevalence (3.9%) (IDF, 2021)
q = 1.0 - p
d = acceptable error (5%)
However, 123 participants were recruited for this study, comprising 59 apparently healthy subjects without T2DM (controls) and 63 subjects with T2DM (tests).
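As a check on the formula above, the calculation can be sketched in Python using the stated values (Z = 1.96, p = 0.039, d = 0.05). The function name `fisher_sample_size` is illustrative and not part of the study's materials. Rounding up to a whole participant is an assumption about how the minimum was derived.

```python
import math


def fisher_sample_size(z: float, p: float, d: float) -> float:
    """Return the unrounded sample size n = z^2 * p * q / d^2, with q = 1 - p."""
    q = 1.0 - p
    return (z ** 2) * p * q / (d ** 2)


# Values stated in the text: 95% confidence, 3.9% prevalence, 5% acceptable error
n_raw = fisher_sample_size(z=1.96, p=0.039, d=0.05)

# Assumption: round up, since a fraction of a participant cannot be recruited
n_min = math.ceil(n_raw)

print(f"unrounded n = {n_raw:.2f}; minimum sample size = {n_min}")
```

Under these inputs the formula gives a minimum of 58 participants, which the recruited sample exceeds.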
Inclusion and Exclusion Criteria
The inclusion criteria for this study were non-pregnant and non-lactating women, participants who gave their consent, and participants within the age range of 20-95 years. The exclusion criteria were pregnant women, nursing mothers, and those who did not give their consent.
Sample Collection and Statistical Analysis
Five millilitres (5 mL) of blood was collected from all subjects by venipuncture into lithium heparin and fluoride oxalate tubes. Fasting blood glucose was analyzed immediately. Electrolytes were estimated using an ion-selective electrode (ISE). BNP and troponin I were determined by enzyme-linked immunosorbent assay (ELK Biotechnology) according to the manufacturer’s instructions. Lactate dehydrogenase (LDH), aspartate aminotransferase (AST), alanine aminotransferase (ALT), creatine phosphokinase (CPK), fasting blood sugar (FBS), total cholesterol, and triglyceride were quantified using the enzymatic rate method following the manufacturer’s instructions.
Machine Learning Analysis
The dataset was modeled on previously published datasets that reported conventional risk factors such as age, glycemic parameters, lipid profile, and demographic data. However, additional biochemical parameters were added, including biomarkers of heart failure. In the present study, the dataset consisted of 123 samples with 34 attributes. Data preprocessing was done by label encoding. Lemmatization was used to transform the data into categorical variables, and grouping categories were created using factorize to group the dataset into T2DM patients and non-T2DM patients. The T2DM group was labeled 1, while the non-T2DM group was labeled 0. Following this, the data were thoroughly checked for missing and incorrect values, which affect the quality of the model. To reduce the influence of missing and incorrect values on model performance, the means from the data were applied in their place.
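The preprocessing steps described above (group labeling, factorizing a categorical column, and mean substitution) might look roughly like the following pandas sketch. The toy values and the column names `group`, `sex`, `fbs`, and `bnp` are hypothetical and are not taken from the study dataset. Treating "factorize" as `pandas.factorize` is also an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical toy records; the real dataset has 123 samples and 34 attributes
df = pd.DataFrame({
    "group": ["T2DM", "non-T2DM", "T2DM", "non-T2DM"],
    "sex": ["F", "M", "F", "M"],
    "fbs": [9.8, 4.9, np.nan, 5.1],      # one missing value
    "bnp": [310.0, 95.0, 280.0, np.nan],  # one missing value
})

# Target label as described in the text: T2DM = 1, non-T2DM = 0
df["label"] = (df["group"] == "T2DM").astype(int)

# Encode a categorical column as integer codes (order of first appearance)
codes, uniques = pd.factorize(df["sex"])
df["sex_code"] = codes

# Replace missing numeric values with the mean of the corresponding column
numeric_cols = ["fbs", "bnp"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].mean())

print(df)
```

In a modeling pipeline, column means are often computed on the training portion only, so that information from the test portion does not leak into the imputed values. Whether that was done here is not stated in the text.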
Following data preprocessing, several machine learning classification models, namely K-nearest neighbors (K-NN), Logistic Regression (LR), Support Vector Machine (SVM), Linear Discriminant Analysis (LDA), Classification and Regression Trees (CART), Naive Bayes (NB), and Random Forest, were applied to the dataset and implemented in Python. To carry out the ML analysis, the data were split into training (70%), validation (20%), and test (30%) sets. The SHAP framework (shap version 0.37.0) was used to interpret additive feature importance in an ensemble Random Forest model. SHAP is a method that explains predictions made by complex ML and DL models (Christoph, 2019). SHAP values explain the relative contribution of each feature to an ML model's prediction (Jangili et al., 2023). To assess the predictive power of these models, we considered the confusion matrix, from which precision, accuracy, F1 score, and recall are derived. The “confusion_matrix” function from the Python library “sklearn.metrics” was used for model evaluation (Pedregosa et al., 2011). This function takes the actual and predicted values as inputs and returns a confusion matrix.
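The model comparison and confusion-matrix evaluation described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the study's code. The data are synthetic (generated with `make_classification`), a single 70/30 train/test split is assumed for simplicity, and all hyperparameters are library defaults or arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the study data (assumption: 33 predictors + 1 label)
X, y = make_classification(n_samples=123, n_features=33, n_informative=8,
                           random_state=0)

# Assumed 70/30 split, stratified so both classes appear in each part
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

models = {
    "K-NN": KNeighborsClassifier(),
    "LR": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "LDA": LinearDiscriminantAnalysis(),
    "CART": DecisionTreeClassifier(random_state=0),
    "NB": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Rows are actual classes, columns are predicted classes
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    results[name] = {
        "tn": int(tn), "fp": int(fp), "fn": int(fn), "tp": int(tp),
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }

for name, r in results.items():
    print(f"{name}: acc={r['accuracy']:.2f} prec={r['precision']:.2f} "
          f"rec={r['recall']:.2f} f1={r['f1']:.2f}")
```

Each metric follows from the four confusion-matrix counts: accuracy is (TP + TN) / total, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is the harmonic mean of precision and recall.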