Data and Variables
This study is based on secondary data extracted from the 2017–2018 Bangladesh Demographic and Health Survey (BDHS), the eighth nationwide survey to provide information on the demographics and health status of women and children. The 2017–2018 BDHS is a nationally representative survey covering approximately 20,250 randomly selected households. All ever-married women aged 15–49 who were usual residents of the selected households, or who had spent the night before the survey in those households, were eligible for individual interviews. The survey was designed to provide accurate estimates of key indicators at the national level, for urban and rural areas, and for each of the eight divisions: Barishal, Chattogram, Dhaka, Khulna, Mymensingh, Rajshahi, Rangpur, and Sylhet.
The World Health Organization (WHO) recommends an evidence-based, cost-effective model of care in which pregnant women with uncomplicated pregnancies receive a minimum of four ANC visits, with the first visit occurring before 14 weeks of gestation. This recommendation is based on the landmark cluster-randomized WHO Antenatal Care Trial (WHOACT) conducted in 2001 and a careful review of the effectiveness of various models of maternal health care (Villar et al., 2001; Villar et al., 2003). According to the WHOACT, reducing the number of ANC visits to a minimum of four did not increase the risk of adverse outcomes for mothers and newborns compared with standard Western ANC packages, and it had the potential to save money (Villar et al., 2001). Therefore, in this study the outcome variable, antenatal care (ANC) visits, was measured as a binary outcome: "Yes" (four or more ANC visits, coded as 1) or "No" (fewer than four ANC visits, coded as 0) for all models. The predictors (features) used in this study include mother's current age; division (Barishal, Chattogram, Dhaka, Khulna, Mymensingh, Rajshahi, Rangpur, Sylhet); age in 5-year groups; respondent's educational level (No education, Primary, Secondary, Higher); wealth index (Poorest, Poorer, Middle, Richer, Richest); type of residence (Urban, Rural); partner's educational level (No education, Primary, Secondary, Higher); respondent's employment status (Employed, Not employed/housewife); total number of children in the household; health insurance (Yes, No); sex of household head (Male, Female); and household size.
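As a minimal illustration of the coding rule above (the study's analysis was done in R; the function and variable names here are hypothetical):

```python
def code_anc(anc_visits):
    # "Yes" (1) for four or more antenatal care visits, "No" (0) otherwise
    return 1 if anc_visits >= 4 else 0

print([code_anc(v) for v in [0, 2, 4, 7]])  # [0, 0, 1, 1]
```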
Models
This study aimed to identify the determinants of mothers' ANC utilization and to predict the factors associated with ANC visits in Bangladesh using different machine learning (ML) algorithms: logistic regression, K-nearest neighbors (KNN), random forest (RF), classification and regression tree (CART), lasso regression, light gradient boosting machine (LightGBM), and support vector machine (SVM). The Boruta algorithm was used for feature selection before running the ML algorithms, to determine the risk factors for predicting ANC visits in Bangladesh. The selection of machine learning models was motivated by the literature (Bitew et al., 2020). Eighty percent of the total sample was randomly selected for training, and 10-fold cross-validation on the training data was used to fine-tune each model's parameters. Model performance was then evaluated on test data consisting of the remaining 20% of the sample. To show how effectively the models predict antenatal care visits, accuracy measures including sensitivity, specificity, positive predictive value, and negative predictive value were generated. To assess how well each model separated the "Yes" and "No" cases, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) were also used. The R programming language (version 4.1.2) and the caret package (Kuhn, 2008) were used for the machine learning analysis.
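The evaluation pipeline described above can be sketched as follows. The study itself used R's caret package; this Python sketch, with illustrative function names, only shows the random 80/20 split and the four accuracy measures computed from a confusion matrix:

```python
import random

def train_test_split(rows, test_frac=0.2, seed=1):
    # Random hold-out split: 80% training, 20% test (as in the text)
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = int(len(rows) * test_frac)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test

def performance(y_true, y_pred):
    # Confusion-matrix counts for a binary outcome coded 0/1
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }
```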
Boruta Algorithm
To extract the pertinent risk indicators for ANC visits in Bangladesh, the Boruta algorithm was used. Boruta is built around the RF classifier: it determines which features are relevant and significant with respect to the outcome variable by comparing their importance with that of randomly permuted "shadow" copies (Kursa et al., 2010).
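A minimal sketch of Boruta's shadow-feature idea, assuming absolute correlation as a simple stand-in for the random-forest importance score that Boruta actually uses (function names are illustrative):

```python
import random

def importance(xs, ys):
    # Stand-in importance score: absolute Pearson correlation between
    # one feature and the outcome (Boruta uses RF importance instead)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (sx * sy))

def boruta_step(features, y, seed=0):
    # One Boruta-style iteration: permute each feature to build a
    # "shadow" copy, then keep features whose importance beats the
    # best importance achieved by any shadow
    rng = random.Random(seed)
    shadow_best = 0.0
    for xs in features.values():
        shadow = xs[:]
        rng.shuffle(shadow)
        shadow_best = max(shadow_best, importance(shadow, y))
    return [name for name, xs in features.items()
            if importance(xs, y) > shadow_best]
```

The full algorithm repeats this comparison over many RF runs and uses a statistical test to confirm or reject each feature.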
Logistic Regression
When analyzing binary data, logistic regression is frequently employed as an inferential tool in population health research, but it can also be used as a binary classification model. Its goal is to assess the association between a set of attributes and the probability of a specific outcome, and to predict the likelihood of a categorical dependent variable. In logistic regression, the dependent variable is binary, taking values such as yes/no, true/false, or success/failure (Dreiseitl and Ohno-Machado, 2002).
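The model's probability prediction and 0/1 classification can be sketched as follows; the coefficient values shown are hypothetical stand-ins for fitted estimates:

```python
import math

def predict_prob(x, coefs, intercept):
    # Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b·x)))
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, coefs, intercept, threshold=0.5):
    # Binary classification by thresholding the predicted probability
    return 1 if predict_prob(x, coefs, intercept) >= threshold else 0

print(predict_prob([1.0, 0.0], [0.8, -0.3], -0.5))  # ≈ 0.574
```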
K-Nearest Neighbor (KNN)
The KNN model was chosen for its capacity to identify both linear and nonlinear group boundaries. The primary idea of KNN is to measure the distances between the test sample and the training samples in order to find its nearest neighbors. The key determining factor in this classifier is k, the number of nearest neighbors considered; classifying a new observation therefore depends on choosing a suitable value of k. When k = 1, the new data object is simply assigned the class of its closest neighbor. Although several numerical metrics exist, the "nearness" of observations is frequently measured by the Euclidean distance between them (Ali et al., 2019; Larose, 2015).
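A minimal KNN classifier along these lines, using Euclidean distance and majority voting; the toy data are purely illustrative:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label) pairs.
    # Sort training points by Euclidean distance to the query,
    # then take a majority vote among the k nearest labels.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

train = [([0, 0], 0), ([0, 1], 0), ([5, 5], 1), ([6, 5], 1)]
print(knn_predict(train, [0.5, 0.5], k=1))  # 0
print(knn_predict(train, [5.5, 5.0], k=3))  # 1
```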
Random Forest (RF)
The RF model is frequently used in machine learning because it is adaptable and provides good predictive accuracy. The RF model grows decision trees by repeatedly resampling the observations in the training data set (bootstrap sampling) and randomly selecting a subset of predictor variables at each split. Once a large number of such trees has been grown, their predictions are aggregated, typically by majority vote. These collections of relatively uncorrelated trees can generate ensemble forecasts that are more precise than any individual prediction, because the trees protect against one another's errors (as long as they do not all consistently err in the same direction) (Segal, 2004).
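A bagging sketch in the spirit of the description above, assuming one-dimensional data and decision stumps in place of full trees (a full random forest additionally randomizes which predictors each split may consider):

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    # Minimal "tree": a one-split decision stump on a single feature,
    # choosing the threshold and orientation with the fewest errors
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errs = sum(1 for x, y in zip(xs, ys)
                       if (left if x <= t else right) != y)
            if best is None or errs < best[0]:
                best = (errs, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def random_forest_1d(xs, ys, n_trees=25, seed=2):
    # Bagging: train each stump on a bootstrap resample of the
    # observations, then combine predictions by majority vote
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: Counter(t(x) for t in trees).most_common(1)[0][0]
```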
Classification and Regression Tree (CART)
CART is a machine-learning prediction approach in which the values of the target variable are predicted from other factors. Each fork of the decision tree splits on a predictor variable, and each terminal node carries a prediction for the target variable. Nodes of the decision tree are divided into sub-nodes based on the threshold value of an attribute. The root node holds the training set, which is divided into two subsets by choosing the best attribute and threshold value. The subsets are then divided according to the same logic, and this continues until the tree reaches pure subsets or has produced all of its potential leaves (Lewis, 2000).
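The best-threshold selection can be illustrated with a single split chosen by Gini impurity, one common CART splitting criterion; the toy data are illustrative:

```python
def gini(labels):
    # Gini impurity of a set of 0/1 class labels
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = labels.count(1) / n
    return 1.0 - p1 ** 2 - (1 - p1) ** 2

def best_split(xs, ys):
    # Pick the threshold minimising the weighted Gini impurity of the
    # two child nodes, as for a single CART split on one attribute
    n = len(xs)
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

print(best_split([1, 2, 3, 8, 9], [0, 0, 0, 1, 1]))  # (3, 0.0)
```

A score of 0.0 means the split produces two pure child nodes, at which point that branch stops growing.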
LASSO Regression
Lasso regression is used in machine learning to reduce prediction error by adding an L1 penalty that shrinks the regression coefficients. Because the penalty can set coefficients exactly to 0, lasso can also be used for feature selection. Lasso regression is also known as L1-norm regularization (Muthukrishnan and Rohini, 2016).
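The shrinkage behaviour can be illustrated with the soft-thresholding operator that the L1 penalty induces on a coefficient (this operator underlies coordinate-descent lasso solvers); `lam` denotes the penalty strength:

```python
def soft_threshold(b, lam):
    # Shrink a coefficient toward zero by lam; coefficients whose
    # magnitude is below lam are set to exactly zero (feature dropped)
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

print([soft_threshold(b, 0.5) for b in [2.0, 0.3, -0.1, -1.5]])
# [1.5, 0.0, 0.0, -1.0]
```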
Light Gradient Boosting Machine (LightGBM)
LightGBM is a gradient-boosting classifier that uses tree-based learning techniques. It is designed to be distributed and efficient, with faster training speed, higher efficiency, lower memory usage, and better accuracy (Gou et al., 2023).
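A minimal sketch of the gradient-boosting principle that LightGBM builds on, assuming squared-error loss and one-dimensional regression stumps in place of LightGBM's leaf-wise trees:

```python
def fit_reg_stump(xs, ys):
    # One-split regression stump: predict the mean of each child node,
    # choosing the threshold with the smallest squared error
    best = None
    for t in sorted(set(xs))[:-1]:
        l = [y for x, y in zip(xs, ys) if x <= t]
        r = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(l) / len(l), sum(r) / len(r)
        sse = sum((y - ml) ** 2 for y in l) + sum((y - mr) ** 2 for y in r)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def boost(xs, ys, n_rounds=20, lr=0.5):
    # Gradient boosting on squared error: each new stump is fitted to
    # the residuals left by the current ensemble, and added with a
    # learning rate lr
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        resid = [y - p for y, p in zip(ys, pred)]
        s = fit_reg_stump(xs, resid)
        stumps.append(s)
        pred = [p + lr * s(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)
```

For classification, LightGBM applies the same residual-fitting idea to the gradient of a logistic loss rather than to raw residuals.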
Support Vector Machine (SVM)
The SVM is used because it enables the identification of complex relationships in the data without the need for extensive manual manipulation. It is well suited to small data sets with very large numbers of features (tens of thousands to hundreds of thousands). Because of its capacity to handle small, complex data sets, it frequently produces more accurate results than other algorithms (Jakkula, 2006).
Receiver Operating Characteristic (ROC) Curve
The ROC curve is another effective visualization tool for classifiers. It plots the classifier's true positive rate against its false positive rate across classification thresholds. A well-performing classifier achieves a high true positive rate at low false positive rates, and its area under the ROC curve approaches 1 (Igual et al., 2017).
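The AUC can be computed directly from its probabilistic interpretation: it equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. A minimal sketch with illustrative scores:

```python
def roc_auc(y_true, scores):
    # AUC as the fraction of (positive, negative) pairs in which the
    # positive case scores higher; ties count as one half
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the "Yes" and "No" cases.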