To determine whether or not a patient has diabetes, we used six different predictive models: Logistic Regression [27], K-Nearest Neighbors [28], Classification Tree [29], Random Forest [30], AdaBoost Classifier [31], and ANN [32].
4.1. Logistic Regression
A full model was constructed, with Outcome as the response variable and the remaining eight variables as predictors. The most important variables were identified through stepwise variable selection, using the AIC as the selection criterion. The final logistic regression model achieved the lowest AIC value of 593.85, as shown in the table below.
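As an illustration, stepwise selection by AIC can be sketched in Python with statsmodels. The file name diabetes.csv, the backward-elimination strategy, and the helper function below are assumptions for illustration, not the authors' exact procedure.

```python
# A minimal sketch of backward stepwise selection by AIC; the file name
# and the helper function are illustrative, not the authors' exact code.
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("diabetes.csv")                 # hypothetical file name
y = df["Outcome"]
X = df.drop(columns=["Outcome"])

def backward_stepwise_aic(X, y):
    """Drop one predictor at a time while doing so lowers the AIC."""
    features = list(X.columns)
    best_aic = sm.Logit(y, sm.add_constant(X[features])).fit(disp=0).aic
    improved = True
    while improved and len(features) > 1:
        improved = False
        for f in list(features):
            trial = [c for c in features if c != f]
            aic = sm.Logit(y, sm.add_constant(X[trial])).fit(disp=0).aic
            if aic < best_aic:                   # removing f improves the fit
                best_aic, features = aic, trial
                improved = True
                break                            # restart the scan
    return features, best_aic

selected, aic = backward_stepwise_aic(X, y)
print(f"Selected predictors: {selected}, AIC = {aic:.2f}")
```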
4.2. K-Nearest Neighbors (k-NN)
The K-Nearest Neighbors (k-NN) algorithm is simple but provides excellent results. It is a lazy, nonparametric, instance-based method that applies equally to classification and regression problems. In classification, k-NN assigns a new, unclassified object to a class by examining its 'k' nearest neighbors, where 'k' is typically chosen to be odd to avoid tied votes. The distance between the new object and the training points is computed with a metric such as the Euclidean, Hamming, Manhattan, or Minkowski distance, and the class of the new object is decided by a majority vote of its 'k' closest neighbors. k-NN predicts the outcome with a high level of accuracy.
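A minimal k-NN sketch with scikit-learn is shown below; the choice of k = 5, the Euclidean metric, the 80/20 split, and the diabetes.csv file name are illustrative assumptions, not values reported in the paper.

```python
# A minimal k-NN sketch; k=5 and the Euclidean metric are illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("diabetes.csv")                 # hypothetical file name
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Scaling matters for k-NN because it is distance-based.
knn = make_pipeline(StandardScaler(),
                    KNeighborsClassifier(n_neighbors=5, metric="euclidean"))
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))
```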
4.3. Decision Tree
A decision tree is a prediction model commonly used in operations research, particularly in decision analysis, and in machine learning. It captures the relationships between the values of the features and the target by arranging conditions in a tree-like model of conditional control statements, which makes it possible to display different conditions and their possible consequences.
An example of a decision tree model is shown below. Each node represents a test on a single attribute, and each branch represents one of its possible values. A decision tree has three types of nodes: decision nodes, chance nodes, and end nodes. Each path from a decision node to an end node represents one possible scenario, with each variable taking a specific value along that path. Traversing the tree therefore amounts to performing a sequence of "tests" on the attributes, with each branch representing one test outcome.
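The tree-traversal idea can be sketched as follows with scikit-learn; max_depth=4 and the file name are illustrative assumptions, and export_text prints the attribute test at each node.

```python
# A minimal decision tree sketch; max_depth=4 is an illustrative cap
# to limit the tree's growth, not a value taken from the paper.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("diabetes.csv")                 # hypothetical file name
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)
# Each internal node tests one attribute; each branch is one test outcome.
print(export_text(tree, feature_names=list(X.columns)))
```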
4.4. Random forests
The decision tree is a well-known technique in the field of machine learning. However, because a decision tree can expand indefinitely, it can achieve low bias at the cost of very high variance, resulting in overfitting of datasets. Random forests were developed as a classification method that combines multiple decision trees in order to correct for the tendency of individual trees to overfit their training data.
Random forests are constructed by training each tree on a random sample of the data and a randomly sampled subset of the candidate variables, producing a large number of decision trees with relatively uncorrelated models. Because of this low correlation, the group of decision trees outperforms any of the individual constituent models: the combination of uncorrelated models produces more accurate predictions than any single prediction because the models protect each other from their individual errors. Even if some trees are incorrect, a large number of other trees will be correct, so the ensemble as a whole trends toward the right prediction. Because of the design of the random forests method, it can handle extremely high-dimensional (many-feature) data even when no dimensionality reduction or feature selection is performed, and it can do so at a relatively fast rate. Furthermore, it can estimate the relative importance of different features and the mutual influence between them. Since the trees within a forest are independent, the method is straightforward to parallelize. More importantly, even if a significant portion of the feature values is missing, accuracy can still be maintained in most cases.
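A minimal random forest sketch under the same illustrative assumptions (hyper-parameter values and file name are not from the paper) is given below; feature_importances_ corresponds to the relative importance of features mentioned above.

```python
# A minimal random forest sketch; n_estimators=100 and max_features="sqrt"
# are illustrative defaults, not values reported in the paper.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("diabetes.csv")                 # hypothetical file name
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

# Each tree sees a bootstrap sample and a random subset of features,
# which keeps the individual trees relatively uncorrelated.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            random_state=42)
rf.fit(X_train, y_train)
print("Test accuracy:", rf.score(X_test, y_test))
# Relative importance of each predictor, as discussed above.
for name, imp in zip(X.columns, rf.feature_importances_):
    print(f"{name}: {imp:.3f}")
```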
4.5. Adaptive Boosting classifier
The AdaBoost (Adaptive Boosting) classifier is one of the most straightforward boosting algorithms available. Initially, AdaBoost assigns equal weights to every training observation in order to minimize bias. It then trains a sequence of weak models, giving higher weights to the observations that were misclassified in previous iterations. Because it makes use of multiple weak models, it can combine the decision boundaries reached across those iterations. As the misclassified observations receive more attention, the accuracy of the ensemble improves with each iteration. Diabetes is a disease that develops as a result of a sustained high concentration of sugar in the blood. In this paper, various classifiers are discussed, and a decision support system is proposed that uses the AdaBoost algorithm with a decision stump as the base classifier. It is worth noting the accuracy obtained by AdaBoost with a decision stump as its base classifier.
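The decision-stump configuration can be sketched as follows; n_estimators=50 and the file name are illustrative assumptions, and the estimator keyword was named base_estimator in scikit-learn versions before 1.2.

```python
# A minimal AdaBoost sketch with a decision stump (a depth-1 tree) as the
# base classifier, as described above; n_estimators=50 is illustrative.
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("diabetes.csv")                 # hypothetical file name
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

stump = DecisionTreeClassifier(max_depth=1)      # the decision stump
# "estimator" was called "base_estimator" in scikit-learn < 1.2.
ada = AdaBoostClassifier(estimator=stump, n_estimators=50, random_state=42)
ada.fit(X_train, y_train)
print("Test accuracy:", ada.score(X_test, y_test))
```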
4.6. Artificial neural network (ANN)
The present study also employed an artificial neural network (ANN), which mimics certain functions of the human brain. In simple terms, an ANN can be understood as a collection of nodes known as artificial neurons, each of which can pass information to other nodes in the network. A neuron's activation can be visualized as a value such as 0 or 1, and each connection carries a weight that represents its relative strength or importance in the overall system. The structure of an ANN is organized into multiple layers, starting with the input layer and continuing through the hidden layers to the output layer, where each layer processes the data and passes on a useful output.
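A minimal ANN sketch using scikit-learn's MLPClassifier is shown below; the single hidden layer of 16 units and the other settings are illustrative assumptions, not the architecture used in the paper.

```python
# A minimal ANN sketch: input layer -> one hidden layer -> output layer,
# as described above. The architecture is illustrative.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("diabetes.csv")                 # hypothetical file name
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)

ann = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(16,), max_iter=1000,
                                  random_state=42))
ann.fit(X_train, y_train)
print("Test accuracy:", ann.score(X_test, y_test))
```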
4.7. Stepwise proposed method
Step 1: Dataset/Inputs: First, the dataset was checked for missing or null values as a starting point for data cleaning; no null or empty values were found. The next step was to determine whether there were any outliers in the data. To this end, a joint grid plot was created for each feature in the dataset. The grid plot revealed that the features Blood-Pressure, BMI, Glucose, and Skin-Thickness contained values of zero, which is physiologically implausible. These outlier values were removed directly from the dataset, and the resulting modified dataset was provided as input and divided into training and testing datasets.
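A minimal sketch of this cleaning step is given below; the file name and exact column names are assumptions based on the standard Pima Indians Diabetes dataset.

```python
# A minimal cleaning sketch for Step 1; zero-removal mirrors the outlier
# handling described above, and the column names are assumed.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("diabetes.csv")                 # hypothetical file name
print(df.isnull().sum())                         # confirm no null values

# Zero is not a physiologically valid reading for these features.
invalid_zero = ["Glucose", "BloodPressure", "SkinThickness", "BMI"]
df = df[(df[invalid_zero] != 0).all(axis=1)]     # drop rows with zeros

X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
```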
Step 2: Tuning the hyper-parameters: Tuning the hyper-parameters is critical in building the model because it has the potential to make or break the model [33]. To obtain the optimal set of parameter values, GridSearchCV was employed: grid search trains a model with every possible combination of hyper-parameters and extracts the best-performing one [34]. Table 1 lists the parameters tuned by the grid search and their selected values.
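A sketch of the grid search is shown below; the candidate grid values are illustrative (the actual tuned values are those listed in Table 1), and X_train and y_train are assumed from the Step 1 sketch.

```python
# A minimal GridSearchCV sketch for Step 2; the candidate values in the
# grid are illustrative, not the paper's.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {
    "n_estimators": [50, 100, 200],              # illustrative candidates
    "learning_rate": [0.01, 0.1, 1.0],
}
search = GridSearchCV(
    AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=1),
                       random_state=42),
    param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)                     # split from Step 1
print("Best parameters:", search.best_params_)
```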
Step 3: Fitting the classifiers: The classifiers, including AdaBoost, were then fitted using the optimal values obtained through the hyper-parameter tuning, and the diabetes prediction analysis was conducted.
Step 4: AdaBoost: At each iteration, the weights of the misclassified observations are increased, which improves the overall accuracy across iterations [35]. The AdaBoost classifier trains a series of models on the re-weighted samples and, based on each model's error, assigns it an Alpha confidence coefficient. A low error results in a large Alpha, meaning that model's vote carries greater weight.
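For reference, the Alpha coefficient in standard AdaBoost (a textbook formula, not one quoted in the paper) is computed at iteration $t$ from the weighted error $\varepsilon_t$ of that iteration's weak model:

$$\alpha_t = \frac{1}{2}\,\ln\!\left(\frac{1-\varepsilon_t}{\varepsilon_t}\right)$$

As $\varepsilon_t$ approaches zero, $\alpha_t$ grows, so a model with low error carries more weight in the final vote, consistent with the description above.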
Step 5: Results prediction: The test outcomes are estimated as a number between 0 and 1, with 0 denoting non-diabetic and 1 denoting diabetic.
Step 6: Evaluation of prediction: Performance metrics such as accuracy, the confusion matrix, and the classification report are used to assess the model's overall performance. The confusion matrix reports the counts of true-positive, true-negative, false-positive, and false-negative predictions [30]. Accuracy is the percentage of cases classified correctly out of the total number of cases. The classification report displays the model's precision, recall, F1, and support scores.
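A minimal evaluation sketch for this step, assuming the fitted classifier ada and the test split from the earlier sketches:

```python
# A minimal evaluation sketch for Step 6.
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix)

y_pred = ada.predict(X_test)                     # ada from Step 4 sketch
print("Accuracy:", accuracy_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
# Precision, recall, F1, and support per class.
print(classification_report(y_test, y_pred))
```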
Step 7: Apply K-Fold Validation: This works by dividing the dataset into k parts, each called a "fold." The classifier is trained on k-1 folds and evaluated on the remaining held-out fold, and the process is repeated so that each fold serves as the test set exactly once. The value of k used in this case is 5. The result is a more reliable estimate of the algorithm's performance on new data, because the algorithm is trained and tested on different portions of the data several times.
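A minimal 5-fold cross-validation sketch, assuming the tuned classifier ada and the cleaned X and y from the Step 1 sketch:

```python
# A minimal 5-fold cross-validation sketch for Step 7.
from sklearn.model_selection import cross_val_score

scores = cross_val_score(ada, X, y, cv=5, scoring="accuracy")
print("Fold accuracies:", scores)
print("Mean accuracy: %.3f (+/- %.3f)" % (scores.mean(), scores.std()))
```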