Diabetes is an increasingly prevalent disease affecting various age groups, including young individuals [1]. It is characterized by elevated blood sugar (glucose) levels. The condition is commonly categorized into two main types: type 1 diabetes and type 2 diabetes. Type 1 diabetes is an autoimmune disorder in which the body mistakenly attacks and destroys the cells that produce insulin, the hormone that allows cells to absorb glucose and use it for energy.
Type 1 diabetes often manifests during childhood or adolescence and can develop regardless of obesity status. Obesity refers to having a body mass index (BMI) above the normal range [2]. Type 2 diabetes, in contrast, primarily affects adults, particularly those who are obese. In this type, the body either becomes resistant to insulin or fails to produce enough of it. Type 2 diabetes is most prevalent among middle-aged and older individuals [1]. Several other factors can also contribute to the development of diabetes, including bacterial or viral infections, exposure to toxic or chemical substances in food, autoimmune reactions, poor diet, lifestyle changes, and environmental pollution. Diabetes can lead to various complications, such as cardiovascular issues, renal problems, retinopathy, and foot ulcers [1].
Data analytics is the systematic exploration of large datasets to identify concealed patterns and draw informed conclusions. Within the healthcare sector, this process uses machine learning algorithms to analyze medical data and construct models that support medical diagnosis. Machine learning, a branch of artificial intelligence (AI), enables systems to learn from data and build knowledge models capable of making predictions and decisions about unseen data or about the labels associated with given data.
Machine learning algorithms can be broadly classified into three categories: supervised learning, unsupervised learning, and semi-supervised learning. Supervised learning algorithms, which learn from labeled examples, are employed in scenarios where human expertise is lacking, where human expertise is difficult to explain (such as speech recognition), where solutions must adapt over time (such as routing in computer networks), or where customized solutions are needed for specific cases (such as user biometrics). Various types of supervised learning algorithms exist, including probability-based, function-based, rule-based, tree-based, and instance-based algorithms. Unsupervised learning, on the other hand, is a descriptive type of learning that aims to characterize or summarize data. Examples of unsupervised learning techniques include clustering and association rule mining. Semi-supervised learning combines elements of both supervised and unsupervised learning.
In this paper, a prediction system for diagnosing diabetes is presented. A supervised learning algorithm is used to learn from diabetes data and build the prediction system, and the accuracy of the system is further improved through pre-processing techniques.
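To make the combination of pre-processing and supervised learning described above more concrete, the following is a minimal sketch only, not the system presented in this paper. It assumes an instance-based learner (k-nearest neighbours) with min-max normalization as the pre-processing step. The feature choice (glucose and BMI), the toy records, and all function names are hypothetical and introduced purely for illustration.

```python
import math

# Hypothetical toy records: (glucose in mg/dL, BMI in kg/m^2) -> label
# (1 = diabetic, 0 = non-diabetic). Values are invented for illustration
# and are not taken from any real dataset.
TRAIN = [
    ((85.0, 22.0), 0),
    ((90.0, 24.5), 0),
    ((100.0, 27.0), 0),
    ((150.0, 33.0), 1),
    ((165.0, 30.5), 1),
    ((180.0, 36.0), 1),
]


def fit_min_max(rows):
    """Pre-processing, step 1: learn per-feature (min, max) from training rows."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def scale(row, bounds):
    """Pre-processing, step 2: rescale features using the training bounds.

    Training rows map into [0, 1]; a constant feature maps to 0.0.
    """
    return tuple(
        (x - lo) / (hi - lo) if hi > lo else 0.0
        for x, (lo, hi) in zip(row, bounds)
    )


def knn_predict(train, bounds, query, k=3):
    """Instance-based supervised prediction: majority vote of the k nearest
    labeled training records, measured by Euclidean distance on scaled features."""
    q = scale(query, bounds)
    neighbours = sorted(
        train, key=lambda item: math.dist(scale(item[0], bounds), q)
    )
    votes = [label for _, label in neighbours[:k]]
    return 1 if sum(votes) * 2 > len(votes) else 0


bounds = fit_min_max([features for features, _ in TRAIN])
prediction_high = knn_predict(TRAIN, bounds, (170.0, 34.0))
prediction_low = knn_predict(TRAIN, bounds, (88.0, 23.0))
```

Normalization matters in this sketch because glucose values span a much wider numeric range than BMI values. Without rescaling, the distance computation would be dominated by glucose alone, which is one simple way in which pre-processing can affect the accuracy of an instance-based learner.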
The rest of the paper is organized as follows: Section 2 reviews the literature. Section 3 presents the diabetes prediction system. Section 4 details the experimental setup and procedure. Section 5 discusses the results, and Section 6 concludes the paper.