This section discusses existing research efforts on healthcare prediction using various ML and DL techniques. The survey gives a detailed discussion of the methods and algorithms used for prediction, the performance metrics reported, and the tools used to build the models.
2.1.1 ML-based Healthcare Prediction
In [109], the authors utilized a framework to create and assess ML classification models such as Logistic Regression, KNN, SVM, and RF for the prediction of diabetes patients. The ML methods were implemented on the Pima Indian Diabetes Database (PIDD), which has 768 rows and 9 columns. The prediction accuracy reached 83 percent, and the results indicate that Logistic Regression outperformed the other ML algorithms. The study has several limitations: only a structured dataset was used and unstructured data were not considered; the model should also be applied to other healthcare domains such as heart disease and COVID-19; and other factors, such as family history of diabetes, smoking habits, and physical inactivity, should be considered for diabetes prediction.
In [110], the authors developed a diagnosis system based on four prediction algorithms (RF, SVM, NB, and DT) to predict diabetes using two different databases (one from Frankfurt Hospital in Germany and the PIDD provided by the UCI ML repository). The SVM algorithm achieved an accuracy of 83.1 percent. Some aspects of this study need improvement: a DL approach to diabetes prediction may achieve better results, and the model should be tested in other healthcare domains such as heart disease and COVID-19 prediction.
In [111], the authors proposed three ML methods (Logistic Regression, DT, and Boosted RF) to assess the COVID-19 OpenData Resources from Mexico and Brazil. To predict recovery and death, the proposed model incorporates only the COVID-19 patient's geographical, social, and economic conditions, as well as clinical risk factors, medical reports, and demographic data. On the dataset used, the model for Mexico has an accuracy of 93 percent and an F1 score of 0.79, whereas the model for Brazil has an accuracy of 69 percent and an F1 score of 0.75. Of the three ML algorithms examined, Logistic Regression gave the best results. The authors should also address authentication and privacy management of the created data.
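Accuracy and F1 score can diverge noticeably, and one common reason is class imbalance. The following is a minimal sketch of how the two metrics are computed from a binary confusion matrix. The counts are hypothetical and chosen only for illustration; they are not taken from [111].

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical, imbalanced counts (NOT from [111]): 100 positives, 900 negatives.
tp, fp, fn, tn = 80, 30, 20, 870
print(f"accuracy = {accuracy(tp, fp, fn, tn):.2f}")  # 0.95
print(f"F1       = {f1_score(tp, fp, fn):.2f}")      # 0.76
```

Because the many true negatives count toward accuracy but not toward F1, a model can score high on the former while remaining moderate on the latter.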
In [112], the authors introduced a new model for predicting type 2 diabetes using a network approach together with ML techniques (Logistic Regression, SVM, NB, KNN, Decision Tree, RF, XGBoost, and ANN). To predict the risk of type 2 diabetes, the healthcare data of 1,028 type 2 diabetes patients and 1,028 non-type 2 diabetes patients were extracted from de-identified data. The experimental findings show the models' effectiveness, with an Area Under Curve (AUC) ranging from 0.79 to 0.91. The RF model achieved higher accuracy than the others. This study relies only on a dataset of hospital admission and discharge summaries from one insurance company; external hospital visits and information from other insurance companies are missing for people with multiple insurance providers.
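The AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition. The labels and scores are hypothetical and are not data from [112]; the function name is illustrative.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores for two diabetic (1) and two non-diabetic (0) patients.
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

This quadratic-time form is meant for clarity; sorting-based implementations give the same value more efficiently on large cohorts.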
In [113], the author proposed a healthcare management system that patients could use to schedule appointments with doctors and verify their prescriptions. It uses ML to detect ailments and determine medicines. ML models including DT, RF, logistic regression, and NB classifiers are applied to datasets on diabetes, heart disease, chronic kidney disease, and liver disease. On the heart dataset, logistic regression had the highest accuracy of all models at 98.5 percent, while the DT classifier had the lowest at 92 percent. On the liver dataset, logistic regression had the highest accuracy at 75.17 percent. On the chronic kidney disease dataset, logistic regression, RF, and Gaussian NB all performed well, with an accuracy of 1. On the diabetes dataset, random forest had the highest accuracy at 83.67 percent. The authors should include a hospital directory so that various hospitals and clinics can be accessed through a single portal. Additionally, image datasets should be included to allow image processing of reports and the deployment of DL to detect diseases.
In [114], the authors developed an ML model to predict the occurrence of type 2 diabetes in the following year (Y + 1) using factors from the present year (Y). The dataset was obtained as electronic health records from a private medical institute, covering 2013 to 2018. The authors applied logistic regression, RF, SVM, XGBoost, and ensemble ML algorithms to predict three outcome classes: non-diabetic, prediabetes, and diabetes. Feature selection was applied to discriminate the three classes efficiently. The selected features included FPG, HbA1c, triglycerides, BMI, gamma-GTP, gender, age, uric acid, smoking, drinking, physical activity, and family history. According to the experimental results, the maximum accuracy was 73 percent from RF, while the lowest was 71 percent from the logistic regression model. The authors presented a model that used only one dataset; additional data sources should therefore be used to verify the models developed in this study.
In [115], the authors classified the diabetes dataset using SVM and NB algorithms coupled with feature selection to enhance the accuracy of the model. The PIDD, taken from the UCI Repository, was used for analysis. For training and testing, the authors employed K-fold cross-validation. The SVM classifier performed better than the NB method, offering around 91 percent correct predictions. However, the authors acknowledge that they need to extend the work to a more recent dataset containing additional attributes and rows.
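K-fold cross-validation partitions the data into K folds and uses each fold once for testing while training on the remaining folds. The sketch below generates such index splits for a dataset of 768 rows, the size of the PIDD. The value K = 10 and the absence of shuffling are assumptions made for illustration; [115] is not stated here to have used these settings.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# 768 rows as in the PIDD; k = 10 is an assumed, commonly used value.
for fold, (train, test) in enumerate(k_fold_indices(768, 10)):
    print(fold, len(train), len(test))
```

Each row appears in exactly one test fold, so averaging the per-fold accuracy uses every record for evaluation once.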
In [116], the authors applied an unsupervised ML algorithm, K-means clustering, to the UCI heart disease dataset to detect heart disease at an early stage. PCA is used for dimensionality reduction. The method achieves early cardiac disease prediction with 94.06 percent accuracy. The authors should evaluate the proposed technique with more than one algorithm and on more than one dataset.
In [117], the authors constructed a predictive model for the classification of diabetes data using the logistic regression classification technique. The dataset includes 459 patients for training and 128 cases for testing. Logistic regression achieved a prediction accuracy of 92 percent. The main limitation of this research is that the authors did not compare the model with other diabetes prediction algorithms, so its relative performance cannot be confirmed.
In [118], the authors developed a prediction model that analyses the user's symptoms and predicts the disease using ML algorithms (DT, RF, and NB classifiers), with the aim of allowing professionals to predict diseases at an early stage. The dataset is a sample of 4920 patient records covering 41 diagnosed illnesses, which form the dependent variable. All of the algorithms achieved the same accuracy score of 95.12%. The authors noticed that overfitting occurred when all 132 symptoms from the original dataset were used instead of 95 symptoms; i.e., the tree appears to memorize the dataset provided and thus fails to classify new data. As a result, only the 95 best symptoms were retained during the data-cleansing process.
In [119], the authors built a decision-making system that helps practitioners anticipate cardiac problems through accurate classification with a simpler method, delivering automated predictions about the condition of the patient’s heart. They implemented four algorithms (KNN, RF, DT, and NB), all applied to the Cleveland Heart Disease dataset. The accuracy varies across classification methods; the maximum accuracy, almost 94 percent, was obtained with the KNN algorithm combined with the correlation factor. The authors should extend the presented technique to more than one dataset and to the prediction of different diseases.
In [120], the authors applied four classification methods (NB, SVM, DT, and KNN) to the Cleveland dataset, which consists of 303 cases and 76 attributes. Of these 76 attributes, only 14 were chosen for testing. The authors performed data preprocessing to remove noisy data. KNN obtained the highest accuracy at 90.79 percent. To improve the accuracy of early heart disease prediction, the authors need to use more sophisticated models.
In [121], the authors proposed a model to predict heart disease using a cardiovascular dataset classified with supervised ML algorithms (DT, NB, Logistic Regression, RF, SVM, and KNN). The results reveal that the DT classification model predicted cardiovascular disorders better than the other algorithms, with an accuracy of 73 percent. The authors highlighted that ensemble ML techniques applied to the CVD dataset could produce a better disease prediction model.
In [122], the authors attempted to increase the accuracy of heart disease prediction by applying Logistic Regression to a healthcare dataset to determine whether patients have heart disease. The dataset was acquired from an ongoing cardiovascular study of residents of the town of Framingham, Massachusetts. The model reached a prediction accuracy of 87 percent. The authors acknowledge that the model could be improved with more data and the use of more ML models.
In [123], the author introduced an accurate classification approach to examine breast cancer data with a total of 569 rows and 32 columns, motivated by the fact that breast cancer affects one in every 28 women in India. Using a heart disease dataset and a lung cancer dataset as well, this research offered a novel approach to feature selection, based on genetic algorithms combined with SVM classification. The reported classifier results are 81.8182 for lung cancer and 78.9272 for diabetes. The size, kind, and source of the data used are not indicated.
In [124], the authors predicted the risk factors that cause heart disease using the K-means clustering algorithm and analyzed the results with a visualization tool. The study uses the Cleveland heart disease dataset (76 features of 303 patients) and works with 209 records and 8 attributes such as age, chest pain type, blood pressure, blood glucose level, resting ECG, and heart rate, as well as four types of chest pain. The authors forecast cardiac disease by considering only the primary characteristics of the four types of chest pain. K-means clustering is a common unsupervised ML technique.
In [125], the authors aimed to report on the benefits of various DM methods and proven heart disease survival prediction models. From their observations, the authors proposed that Logistic Regression and NB achieve the highest accuracy on a high-dimensional dataset (the Cleveland hospital dataset), while DT and RF produce better results on low-dimensional datasets. RF delivers higher accuracy than the DT classifier because it is an optimized learning algorithm. The authors mentioned that this work can be extended to other ML algorithms and that the model could be developed in a distributed environment such as Map-Reduce, Apache Mahout, or HBase.
In [126], the authors proposed a single algorithm, named hybridization, that combines the techniques used into one algorithm. The presented method has three phases: a preprocessing phase, a classification phase, and a diagnosis phase. They employed the Cleveland database and the algorithms NB, SVM, KNN, NN, J4.8, RF, and GA. NB and SVM consistently perform better than the others, whose performance depends on the selected features. The results attained an accuracy of 89.2 percent. The authors need to improve accuracy, which is the key goal. Note that the dataset is small, so the system could not be trained adequately, which limited the accuracy of the method.
In [127], the authors presented a study on the use of clinical data for liver disease prediction, investigating several ways of representing such data using six algorithms: Logistic Regression, KNN, DT, SVM, NB, and RF. The original dataset was collected from the northeast of Andhra Pradesh, India, and includes data on 583 liver patients, of whom 75.64 percent are male and 24.36 percent are female. The analysis indicated that the Logistic Regression classifier delivers the highest classification accuracy, 75 percent based on the F1 measure, for predicting liver disease, while NB gives the lowest, 53 percent. The authors studied only a few prominent supervised ML algorithms; more algorithms could be considered to build a more accurate model of liver disease prediction and steadily improve performance.
In [128], the authors aimed to predict coronary heart disease (CHD) from historical medical data using ML technology. The goal of the study is to use three supervised learning approaches, NB, SVM, and DT, to find correlations in CHD data that could help improve prediction rates. The dataset, obtained from KEEL, contains a retrospective sample of males from a high-risk heart disease region of the Western Cape, South Africa. NB achieved the highest accuracy among the three models. SVM and DT J48 outperformed NB with a specificity rate of 82 percent but showed an inadequate sensitivity rate of less than 50 percent.
In [129], the authors applied data mining and network analysis techniques to hospital admission and discharge data to analyze the disease or comorbidity footprints of chronic patients. A chronic disease risk prediction framework was created and evaluated in the context of the Australian healthcare system to predict type 2 diabetes risk, using a private healthcare fund dataset from Australia that spans six years and three different predictive algorithms (regression, parameter optimization, and DT). The prediction accuracy ranges from 82 to 87 percent. Because the dataset is drawn from hospital admission and discharge summaries, it does not provide information about general physician visits or future diagnoses.
2.1.2 DL-based Healthcare Prediction
In [130], the authors proposed a system for predicting patients with the more common chronic diseases using DL algorithms: a CNN for automatic feature extraction and disease prediction, together with KNN for distance calculation to locate the closest match in the dataset and produce the final disease prediction. The dataset was structured as a combination of disease symptoms, the person's living habits, and details of doctor consultations, which are suitable for this general disease prediction. The study used the Indian chronic kidney disease dataset, which comprises 400 instances, 24 attributes, and 2 classes and was retrieved from the UCI ML repository. Finally, the study presented a comparison of the proposed system with other algorithms such as NB, DT, and logistic regression. The findings showed that the proposed system gives an accuracy of 95 percent, which is higher than that of the other methods. The proposed technique should be evaluated on more than one dataset.
In [131], the authors developed a DL approach that uses chest radiography images to differentiate between patients with mild, pneumonia, and COVID-19 infections, providing a valid mechanism for COVID-19 diagnosis. Image-enhancement techniques were used in the proposed system to increase the intensity of the chest X-ray image and eliminate noise. Two distinct DL approaches based on a pretrained neural network model (ResNet-50) for COVID-19 identification from chest X-ray (CXR) images are proposed in this work to minimize overfitting and increase the overall capabilities of the suggested DL systems. The authors emphasized that tests on a large and challenging dataset encompassing many COVID-19 cases are necessary to establish the efficacy of the suggested system.
In [132], the authors presented a Cuckoo search-based deep LSTM classifier for disease prediction. The deep convLSTM classifier is combined with cuckoo search optimization, a nature-inspired method, to predict disease accurately by transferring information and thereby reducing time consumption. The PIMA dataset, provided by the National Institute of Diabetes and Digestive and Kidney Diseases, is used to predict the onset of diabetes. The dataset is made up of independent variables including insulin level, age, and BMI, as well as one dependent variable. The new technique was compared to traditional methods, and the results showed that it achieved 97.591 percent accuracy, 95.874 percent sensitivity, and 97.094 percent specificity. The authors noted that more datasets are needed, as well as new approaches to improve the classifier's effectiveness.
In [133], the authors presented a wavelet-based convolutional neural network to handle data limitations during the rapid emergence of COVID-19. By investigating the influence of discrete wavelet transform decomposition up to 4 levels, the model demonstrated the capability of multi-resolution analysis for detecting COVID-19 in chest X-rays. The wavelet sub-bands are the CNN's inputs at each decomposition level. COVID-Chest X-ray-12 (COVID-CXR-12) is a collection of 1,944 chest X-ray images divided into 12 groups, compiled from two open-source datasets (one from the National Institutes of Health containing X-rays of several pneumonia-related diseases, and a COVID-19 dataset collected from the Radiological Society of North America). The proposed model, COVID-Neuro wavelet, was trained alongside other well-known ImageNet pre-trained models on COVID-CXR-12. The authors state that they hope to investigate the effects of other wavelet functions besides the Haar wavelet.
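To illustrate what a multi-level Haar decomposition produces, the sketch below applies the orthonormal Haar transform to a one-dimensional signal for four levels. This is a simplification for illustration only: [133] works on two-dimensional X-ray images, and the function names and the toy signal here are assumptions, not part of that work.

```python
import math


def haar_step(signal):
    """One Haar level: pairwise scaled sums (approximation) and differences (detail)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_decompose(signal, levels):
    """Repeatedly split the approximation band; return it plus per-level details."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


# Toy 16-sample signal (hypothetical), decomposed to 4 levels.
approx, details = haar_decompose([float(v) for v in range(16)], 4)
print(len(approx), [len(d) for d in details])  # 1 [8, 4, 2, 1]
```

Each level halves the resolution of the approximation band, which is why sub-bands from different levels can serve as inputs at different scales.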
In [134], the authors proposed a CNN framework for COVID-19 identification using computed tomography (CT) images. The proposed framework employs a public CT dataset of 2482 CT images from patients of both classes. The system attained an accuracy of 96.16 percent and a recall of 95.41 percent after training on only 20 percent of the dataset. The authors stated that the use of the framework should be extended to multimodal medical images in the future.
In [135], the authors performed multi-disease prediction for intelligent clinical decision support by deploying a long short-term memory (LSTM) network and enhancing it with two processes to conduct multi-label classification based on patients’ clinical visit records. The data consist of a massive set of electronic health records collected from a prominent hospital in southeast China. According to the model evaluation results, the suggested LSTM approach outperforms several standard and DL models in predicting future disease diagnoses: the F1 score rises from 78.9 percent and 86.4 percent for the state-of-the-art conventional and DL models, respectively, to 88.0 percent with the suggested technique. The authors stated that prediction performance may be enhanced further by including new input variables, and that the method uses only one data source in order to reduce computational complexity.
In [136], the authors introduced an approach to building a supervised ANN structure based on subnets (groups of neurons) instead of layers, which predicted disease effectively in cases with small datasets. The model was evaluated on textual data and compared to Multilayer Perceptrons (MLPs) as well as LSTM recurrent neural network models using three small-scale publicly accessible benchmark datasets. On the Iris dataset, the model reached 97 percent classification accuracy, compared to 92 percent for a three-layer RNN (LSTM). On the diabetic dataset, the model had a lower error rate, 81, than RNN (LSTM) and MLP, while RNN (LSTM) had a high error rate of 84. For larger datasets, however, the method is of limited use, because the model has not been implemented on large textual and image datasets.
In [137], the authors presented a novel AI and Internet of Things (IoT) convergence-based disease detection model for a smart healthcare system. The proposed model comprises data collection, preprocessing, classification, and parameter optimization stages. IoT devices, such as wearables and sensors, collect data, which AI algorithms then use to diagnose diseases. The forest technique is then used to remove any outliers found in the patient data. Healthcare data was used to assess the performance of the CSO-LSTM model, which achieved a maximum accuracy of 96.16 percent on heart disease diagnosis and 97.26 percent on diabetes diagnosis. The method offered higher prediction accuracy for heart disease and diabetes diagnosis, but it has no feature selection mechanism and hence requires extensive computation.
In [138], the authors focused on the coronavirus epidemic, which constitutes a daily threat to global health. Most of their research was aimed at detecting disease in people whose X-rays had been selected as potential COVID-19 candidates. The dataset includes chest X-rays of people with COVID-19, people with viral pneumonia, and healthy people. The study compared the performance of two DL algorithms, namely CNN and RNN. DL techniques were used to evaluate a total of 657 chest X-ray images for the diagnosis of COVID-19. VGG19 is the most successful model, with a 95% accuracy rate, and successfully distinguishes COVID-19 patients, healthy individuals, and viral pneumonia cases. The least successful approach on the dataset is InceptionV3. According to the authors, the success rate can be improved by improving data collection; in addition to chest radiography, lung tomography can be used, and the success rate and performance can be enhanced by creating additional DL models.
In [139], the authors developed a method based on the RNN algorithm for predicting the blood glucose levels of diabetics up to one hour into the future, which requires the patient's glucose level history. The Ohio T1DM dataset for blood glucose level prediction, which includes blood glucose values for six people with type 1 diabetes, was used to train and assess the approach. Further experiments examined the characteristics of the predicted distribution and showed that the procedure also provides an estimate of the certainty of its predictions. The authors point out that they can only evaluate prediction targets with sufficient glucose level history; they therefore cannot predict the initial levels after a gap in the data, which does not improve the quality of the predictions.
In [140], the authors used an 18-layer residual CNN pre-trained on ImageNet together with a separate anomaly detection mechanism to construct a new deep anomaly detection model for fast, reliable COVID-19 screening. On the X-ray dataset, which contains 100 images from 70 COVID-19 subjects and 1431 images from 1008 non-COVID-19 pneumonia subjects, the model obtains either a sensitivity of 90.00 percent with a specificity of 87.84 percent, or a sensitivity of 96.00 percent with a specificity of 70.65 percent. The authors noted that the model still has certain flaws, such as missing 4% of COVID-19 cases and having a false-positive rate of about 30%. In addition, more clinical data are required to confirm and improve the model's usefulness.
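The trade-off between the two reported operating points can be made concrete with the definitions of sensitivity and specificity. In the sketch below, the image totals (100 COVID-19, 1431 non-COVID-19) come from the description above, but the individual true/false counts are back-calculated from the reported percentages as an assumption; they are not figures stated in [140].

```python
def sensitivity(tp, fn):
    """True-positive rate: share of COVID-19 images that are flagged."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True-negative rate: share of non-COVID-19 images that are cleared."""
    return tn / (tn + fp)


# Assumed counts, back-calculated from the reported rates (not from [140]).
operating_points = {
    "balanced":       {"tp": 90, "fn": 10, "tn": 1257, "fp": 174},
    "high-sensitivity": {"tp": 96, "fn": 4, "tn": 1011, "fp": 420},
}

for name, c in operating_points.items():
    sens = 100 * sensitivity(c["tp"], c["fn"])
    spec = 100 * specificity(c["tn"], c["fp"])
    print(f"{name}: sensitivity {sens:.2f}%, specificity {spec:.2f}%, "
          f"false-positive rate {100 - spec:.2f}%")
```

Under these assumed counts, raising sensitivity from 90 to 96 percent leaves 4 COVID-19 images missed but more than doubles the number of false alarms, which matches the flaws the authors describe.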
In [141], the authors developed COVIDX-Net, a novel DL framework that helps radiologists diagnose COVID-19 in X-ray images automatically. Seven models (MobileNetV2, ResNetV2, VGG19, DenseNet201, InceptionV3, Inception, and Xception) were evaluated using a small dataset of 50 images. Each deep neural network model can classify the patient's status as a negative or positive COVID-19 case based on the normalized intensities of the X-ray image. The f1-scores for the VGG19 and Dense Convolutional Network (DenseNet) models were 0.89 and 0.91, respectively. With an f1-score of 0.67, the InceptionV3 model has the weakest classification performance.
In [142], the authors created a DL approach for delivering 30-minute predictions of future glucose levels based on a Dilated RNN (DRNN). The performance of the DRNN models was evaluated using data from two electronic health record datasets: OhioT1DM from clinical trials and the in-silico dataset from the UVA-Padova simulator. The approach outperformed established glucose prediction approaches such as Neural Networks (NNs), Support Vector Regression (SVR), and autoregressive models (ARX). The results demonstrated that it significantly improved glucose prediction performance, although some limitations remain: the authors built a data-driven model that relies heavily on past EHR data, so the quality of the data has a significant impact on prediction accuracy. Clinical datasets are limited in number and often restricted, and because certain data fields are entered manually, they are occasionally incorrect.
In [143], the authors used a deep neural network to predict stroke death based on medical history and human behaviors, using large-scale electronic health information on 15,099 stroke patients. The data were collected by the Korea Centers for Disease Control and Prevention from 2013 to 2016 and cover around 150 hospitals in the country, all having more than 100 beds. The 11 variables in the DL model were gender, age, type of insurance, mode of admission, necessary brain surgery, area, length of hospital stay, hospital location, number of hospital beds, stroke kind, and CCI. To automatically create features from the data and identify risk factors for stroke, the researchers used a DNN with scaled Principal Component Analysis (PCA): the DNN is used to examine the variables of interest, while scaled PCA is used to improve the DNN's continuous inputs. The data were divided into a training set (66%) and a testing set (34%), with 30 percent of the training samples used for validation. The sensitivity, specificity, and AUC values of this study were 64.32 percent, 85.56 percent, and 83.48 percent, respectively.
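The nested split described above can be worked through numerically. The sketch below applies the stated percentages to the 15,099 patients; the use of integer floor division for rounding and the function name are assumptions, since the exact subset sizes are not given here.

```python
def split_counts(n_total, train_pct=66, val_pct_of_train=30):
    """Return (fit, validation, test) sizes for a nested percentage split.

    Rounding down via integer division is an assumption made for illustration.
    """
    n_train = n_total * train_pct // 100          # 66% of all samples
    n_test = n_total - n_train                    # remaining 34%
    n_val = n_train * val_pct_of_train // 100     # 30% of the training set
    n_fit = n_train - n_val                       # samples actually used to fit
    return n_fit, n_val, n_test


fit, val, test = split_counts(15099)
print(fit, val, test)  # 6976 2989 5134
```

Under this reading, fewer than half of the patients (6976 of 15,099) are used to fit the network weights, while the rest serve validation and testing.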
In [144], the authors proposed a glucose forecasting approach called GluNet, which uses a personalized DNN to forecast the probabilistic distribution of short-term glucose measurements for people with type 1 diabetes based on their historical data, including insulin doses, meal information, glucose measurements, and various other factors. It applies recent DL techniques and consists of four components: data pre-processing, label transform/recovery, a dilated Convolutional Neural Network (CNN), and post-processing. The authors ran the models on subjects from the OhioT1DM dataset. A comprehensive comparison showed significant improvements over previous methods in terms of Root Mean Square Error (RMSE) and time lag at a prediction horizon (PH) of 60 minutes, and of RMSE with a small time lag at the other prediction horizons, for the virtual adult participants. If the PH is properly matched to the lag between input and output, the user may learn the control of the system more readily and the approach achieves good performance. Additionally, GluNet was validated on two clinical datasets, where RMSE and time lag were reported for both the 60-minute and the 30-minute PH. The authors point out that the model does not incorporate physiological knowledge, and that they need to test GluNet with longer prediction horizons and use it to predict overnight hypoglycemia.
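A dilated convolution widens the span of past samples a filter can see without adding weights, which is useful for time series such as glucose traces. The following is a minimal sketch of a one-dimensional causal dilated convolution with zero padding. The causal form, the filter weights, and the toy input are assumptions for illustration and are not taken from the GluNet implementation.

```python
def dilated_causal_conv1d(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation], treating x[i] = 0 for i < 0."""
    out = []
    for t in range(len(x)):
        total = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:  # samples before the start of the series are zero-padded
                total += w * x[idx]
        out.append(total)
    return out


def receptive_field(kernel_size, dilation):
    """Number of time steps spanned by one dilated filter."""
    return (kernel_size - 1) * dilation + 1


# Toy series and a two-tap filter (both hypothetical).
print(dilated_causal_conv1d([1, 2, 3, 4, 5], [1.0, 1.0], dilation=2))
# [1.0, 2.0, 4.0, 6.0, 8.0]
print(receptive_field(kernel_size=2, dilation=2))  # 3
```

Stacking such layers with growing dilation lets the network cover a long input history with few parameters.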
In [145], the authors proposed VMD-IPSO-LSTM, a short-term blood glucose prediction model. First, the Variational Modal Decomposition (VMD) technique was used to decompose the blood glucose concentration into Intrinsic Modal Functions (IMFs) in various frequency bands. A long short-term memory (LSTM) network then built a prediction model for each blood glucose IMF component. Because the time window length, learning rate, and neuron count are difficult to set, an improved PSO (IPSO) approach was used to optimize these parameters. The optimized LSTM network predicted each IMF, and the predicted subsequences were superimposed in the final step to obtain the final prediction result. The data of 56 participants were chosen as experimental data from among 451 diabetes mellitus patients. The experiments showed that the model improved prediction accuracy at 30, 45, and 60 minutes, with RMSE and MAPE lower than those of VMD-PSO-LSTM, VMD-LSTM, and LSTM, indicating that the suggested model is effective. The longer prediction horizon and higher prediction accuracy give patients and doctors more time to improve the effectiveness of diabetes therapy and manage blood glucose levels. The authors noted that challenges remain, such as an increase in computation volume and running time; reducing the time needed for short-term glucose prediction is left for future work.
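RMSE and MAPE, the two error measures used in this comparison, can be sketched as follows. The glucose readings and predictions are hypothetical values chosen for illustration and are not results from [145].

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the measurements."""
    squared = [(p - a) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error; requires non-zero actual values."""
    ratios = [abs(p - a) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical glucose readings and predictions (mg/dL).
actual = [100.0, 120.0, 150.0]
predicted = [110.0, 114.0, 147.0]
print(f"RMSE = {rmse(actual, predicted):.2f} mg/dL")  # 6.95
print(f"MAPE = {mape(actual, predicted):.2f} %")      # 5.67
```

RMSE penalizes large individual errors more heavily, while MAPE expresses the error relative to the glucose level itself, so the two are usually reported together.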
In [146], the authors presented a framework for primary COVID-19 detection based on radiological review of chest radiography (chest X-rays), to reduce diagnosis time and human error. The researchers used a dataset of chest X-rays from verified COVID-19 patients (408 images), confirmed pneumonia patients (4273 images), and healthy people (1590 images) to perform three-class image classification. The dataset contains 6271 images in total. To address this image classification problem, the authors used a CNN and transfer learning. Across all folds of the data, the model's accuracy ranged from 93.90 percent to 98.37 percent; even the lowest accuracy, 93.90 percent, is still quite good. The authors note a limitation, particularly with respect to adopting such a model on a large scale for practical use.
In [147], the authors proposed DL models for predicting the number of COVID-19 positive cases in Indian states. The Ministry of Health and Family Welfare dataset contains 32 individual time series of confirmed COVID-19 cases, one for each of the states (28) and union territories (4), starting from March 14, 2020. This dataset was used to conduct an exploratory analysis of the increase in the number of positive cases in India. RNN-based LSTMs are used as prediction models. Deep LSTM, convolutional LSTM, and bi-directional LSTM models were tested on the 32 states/union territories, and the model with the best accuracy was chosen based on absolute error. Bi-directional LSTM produced the best performance in terms of prediction error, while convolutional LSTM produced the worst. Daily and weekly forecasts were calculated for all states, and bi-LSTM produced accurate results (error less than 3%) for short-term prediction (1–3 days).
In [148], the authors suggested a new type 1 diabetes prediction technique based on CNNs and DL to improve the robustness and accuracy of type 1 diabetes prediction. The approach centers on extracting behavioral patterns; numerous observations of identical behaviors were used to fill in gaps in the data. The suggested model was trained and validated using data from 759 people with type 1 diabetes who visited Sheffield Teaching Hospitals between 2013 and 2015. Each item in the training set consisted of a subject's type 1 diabetes test, demographic data (age, gender, years with diabetes), and the final 84 days (12 weeks) of Self-Monitored Blood Glucose (SMBG) measurements preceding the test. According to the authors, prediction accuracy deteriorates in the presence of insufficient data and certain physiological specificities.
In [149], the authors constructed a machine learning technique using the PIDD from the NIDDK. The PIDD's participants are all female and at least 21 years old. The PIDD comprises 768 instances, with 268 samples diagnosed as diabetic and 500 samples not diagnosed as diabetic. The eight most important attributes were used for diabetes prediction. The accuracy of functional classifiers such as ANN, NB, DT, and DL is between 90 and 98 percent. On the PIMA dataset, DL had the best results for diabetes onset among the four, with an accuracy of 98.07 percent. The technique uses a variety of classifiers to accurately predict the disease, but it failed to diagnose it at an early stage.