Data and Variables
This study is based on secondary data extracted from the 2017–2018 Bangladesh Demographic and Health Survey (BDHS), the eighth nationwide survey to provide information on the demographics and health status of women and children. The 2017–2018 BDHS is a nationally representative survey covering approximately 20,250 randomly selected households. All ever-married women aged 15–49 who were usual residents of the selected households, or who had spent the night before the survey in those households, were eligible for individual interviews. The survey was designed to provide accurate estimates of key indicators at the national level, for urban and rural areas, and for each of the eight divisions: Barishal, Chattogram, Dhaka, Khulna, Mymensingh, Rajshahi, Rangpur, and Sylhet.
The World Health Organization (WHO) recommends an evidence-based, cost-effective model of care in which pregnant women with uncomplicated pregnancies receive a minimum of four ANC visits, with the first visit occurring before 14 weeks of gestation. This recommendation is based on the landmark cluster-randomized WHO Antenatal Care Trial (WHOACT) conducted in 2001 and a careful review of the effectiveness of various models of maternal health care (Villar et al., 2001; Villar et al., 2003). According to the WHOACT, reducing the number of ANC visits to a minimum of four did not increase the risk of adverse outcomes for mothers and newborns compared with standard Western ANC packages, and it had the potential to save money (Villar et al., 2001). Therefore, in this study the outcome variable, antenatal care (ANC) visits, was measured as a binary outcome: "Yes" (four or more ANC visits, coded as 1) or "No" (fewer than four ANC visits, coded as 0) for all models. The predictors (features) used in this study include mother's current age; division (Barishal, Chattogram, Dhaka, Khulna, Mymensingh, Rajshahi, Rangpur, Sylhet); age in 5-year groups; respondent's educational level (No education, Primary, Secondary, Higher); wealth index (Poorest, Poorer, Middle, Richer, Richest); type of residence (Urban, Rural); partner's educational level (No education, Primary, Secondary, Higher); respondent's employment status (Employed, Not employed/housewife); total number of children in the household; health insurance (Yes, No); sex of household head (Male, Female); and household size.
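As a minimal illustration of the coding rule above (the study's analysis was done in R; the function and variable names here are hypothetical):

```python
def code_anc(anc_visits):
    # "Yes" (1) for four or more antenatal care visits, "No" (0) otherwise
    return 1 if anc_visits >= 4 else 0

print([code_anc(v) for v in [0, 2, 4, 7]])  # [0, 0, 1, 1]
```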
Models
This study aimed to identify the determinants of mothers' ANC utilization and to predict the factors associated with ANC visits in Bangladesh using different machine learning (ML) algorithms: logistic regression, K-nearest neighbors (KNN), random forest (RF), classification and regression tree (CART), lasso regression, light gradient boosting machine (LightGBM), and support vector machine (SVM). The Boruta algorithm was used for feature selection before running the ML algorithms, to determine the risk factors for predicting ANC visits in Bangladesh. The selection of machine learning models was motivated by the literature (Bitew et al., 2020). Eighty percent of the total sample was randomly selected for training, and 10-fold cross-validation on the training data was used to fine-tune each model's parameters. Model performance was then evaluated on test data consisting of the remaining 20% of the sample. To show how effectively the models predict antenatal care visits, accuracy measures including sensitivity, specificity, positive predictive value, and negative predictive value were generated. To assess how well each model separated the "Yes" and "No" cases, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) were also used. The R programming language (version 4.1.2) and the caret package (Kuhn, 2008) were used for the machine learning analysis.
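The evaluation pipeline described above can be sketched as follows. The study itself used R's caret package; this Python sketch, with illustrative function names, only shows the random 80/20 split and the four accuracy measures computed from a confusion matrix:

```python
import random

def train_test_split(rows, test_frac=0.2, seed=1):
    # Random hold-out split: 80% training, 20% test (as in the text)
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    n_test = int(len(rows) * test_frac)
    test = [rows[i] for i in idx[:n_test]]
    train = [rows[i] for i in idx[n_test:]]
    return train, test

def performance(y_true, y_pred):
    # Confusion-matrix counts for a binary outcome coded 0/1
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }
```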
Boruta Algorithm
To extract the pertinent risk indicators for ANC visits in Bangladesh, the Boruta algorithm was used. Boruta is built around the RF classifier: it determines which features are relevant and significant with respect to the outcome variable by comparing their importance with that of randomly permuted "shadow" copies (Kursa et al., 2010).
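A minimal sketch of Boruta's shadow-feature idea, assuming absolute correlation as a simple stand-in for the random-forest importance score that Boruta actually uses (function names are illustrative):

```python
import random

def importance(xs, ys):
    # Stand-in importance score: absolute Pearson correlation between
    # one feature and the outcome (Boruta uses RF importance instead)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return abs(cov / (sx * sy))

def boruta_step(features, y, seed=0):
    # One Boruta-style iteration: permute each feature to build a
    # "shadow" copy, then keep features whose importance beats the
    # best importance achieved by any shadow
    rng = random.Random(seed)
    shadow_best = 0.0
    for xs in features.values():
        shadow = xs[:]
        rng.shuffle(shadow)
        shadow_best = max(shadow_best, importance(shadow, y))
    return [name for name, xs in features.items()
            if importance(xs, y) > shadow_best]
```

The full algorithm repeats this comparison over many RF runs and uses a statistical test to confirm or reject each feature.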
Logistic Regression
When analyzing binary data, logistic regression is frequently employed as an inferential tool in population health research, but it can also be used as a binary classification model. Its goal is to assess the association between a set of attributes and the probability of a specific outcome, and to predict the likelihood of a categorical dependent variable. In logistic regression, the dependent variable is binary, taking values such as yes/no, true/false, or success/failure (Dreiseitl and Ohno-Machado, 2002).
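The model's probability prediction and 0/1 classification can be sketched as follows; the coefficient values shown are hypothetical stand-ins for fitted estimates:

```python
import math

def predict_prob(x, coefs, intercept):
    # Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(b0 + b·x)))
    z = intercept + sum(b * xi for b, xi in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

def classify(x, coefs, intercept, threshold=0.5):
    # Binary classification by thresholding the predicted probability
    return 1 if predict_prob(x, coefs, intercept) >= threshold else 0

print(predict_prob([1.0, 0.0], [0.8, -0.3], -0.5))  # ≈ 0.574
```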
K-Nearest Neighbor (KNN)
The KNN model was chosen for its capacity to identify both linear and nonlinear group boundaries. The primary idea of KNN is to measure the distances between the test sample and the training samples in order to find its nearest neighbors. The key determining factor in this classifier is k, the number of nearest neighbors considered; classifying a new observation therefore depends on choosing a suitable value of k. When k = 1, the new data object is simply assigned the class of its closest neighbor. Although several numerical metrics exist, the "nearness" of observations is frequently measured by the Euclidean distance between them (Ali et al., 2019; Larose, 2015).
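A minimal KNN classifier along these lines, using Euclidean distance and majority voting; the toy data are purely illustrative:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label) pairs.
    # Sort training points by Euclidean distance to the query,
    # then take a majority vote among the k nearest labels.
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

train = [([0, 0], 0), ([0, 1], 0), ([5, 5], 1), ([6, 5], 1)]
print(knn_predict(train, [0.5, 0.5], k=1))  # 0
print(knn_predict(train, [5.5, 5.0], k=3))  # 1
```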
Random Forest (RF)
The RF model is frequently used in machine learning because it is adaptable and provides good predictive accuracy. The RF model grows decision trees by repeatedly resampling the observations in the training data set (bootstrap sampling) and randomly selecting a subset of predictor variables at each split. Once a large number of such trees has been grown, their predictions are aggregated, typically by majority vote. These collections of relatively uncorrelated trees can generate ensemble forecasts that are more precise than any individual prediction, because the trees protect against one another's errors (as long as they do not all consistently err in the same direction) (Segal, 2004).
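A bagging sketch in the spirit of the description above, assuming one-dimensional data and decision stumps in place of full trees (a full random forest additionally randomizes which predictors each split may consider):

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    # Minimal "tree": a one-split decision stump on a single feature,
    # choosing the threshold and orientation with the fewest errors
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errs = sum(1 for x, y in zip(xs, ys)
                       if (left if x <= t else right) != y)
            if best is None or errs < best[0]:
                best = (errs, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def random_forest_1d(xs, ys, n_trees=25, seed=2):
    # Bagging: train each stump on a bootstrap resample of the
    # observations, then combine predictions by majority vote
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: Counter(t(x) for t in trees).most_common(1)[0][0]
```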
Classification and Regression Tree (CART)
CART is a machine-learning prediction approach in which the values of the target variable are predicted from other factors. Each fork of the decision tree splits on a predictor variable, and each terminal node carries a prediction for the target variable. Nodes of the decision tree are divided into sub-nodes based on the threshold value of an attribute. The root node holds the training set, which is divided into two subsets by choosing the best attribute and threshold value. The subsets are then divided according to the same logic, and this continues until the tree reaches pure subsets or has produced all of its potential leaves (Lewis, 2000).
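The best-threshold selection can be illustrated with a single split chosen by Gini impurity, one common CART splitting criterion; the toy data are illustrative:

```python
def gini(labels):
    # Gini impurity of a set of 0/1 class labels
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = labels.count(1) / n
    return 1.0 - p1 ** 2 - (1 - p1) ** 2

def best_split(xs, ys):
    # Pick the threshold minimising the weighted Gini impurity of the
    # two child nodes, as for a single CART split on one attribute
    n = len(xs)
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

print(best_split([1, 2, 3, 8, 9], [0, 0, 0, 1, 1]))  # (3, 0.0)
```

A score of 0.0 means the split produces two pure child nodes, at which point that branch stops growing.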
LASSO Regression
Lasso regression is used in machine learning to reduce prediction error by adding an L1 penalty that shrinks the regression coefficients. Because the penalty can set coefficients exactly to 0, lasso can also be used for feature selection. Lasso regression is also known as L1-norm regularization (Muthukrishnan and Rohini, 2016).
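The shrinkage behaviour can be illustrated with the soft-thresholding operator that the L1 penalty induces on a coefficient (this operator underlies coordinate-descent lasso solvers); `lam` denotes the penalty strength:

```python
def soft_threshold(b, lam):
    # Shrink a coefficient toward zero by lam; coefficients whose
    # magnitude is below lam are set to exactly zero (feature dropped)
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

print([soft_threshold(b, 0.5) for b in [2.0, 0.3, -0.1, -1.5]])
# [1.5, 0.0, 0.0, -1.0]
```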
Light Gradient Boosting Machine (LightGBM)
LightGBM is a gradient-boosting classifier that uses tree-based learning techniques. It is designed to be distributed and efficient, with faster training speed, higher efficiency, lower memory usage, and better accuracy (Gou et al., 2023).
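A minimal sketch of the gradient-boosting principle that LightGBM builds on, assuming squared-error loss and one-dimensional regression stumps in place of LightGBM's leaf-wise trees:

```python
def fit_reg_stump(xs, ys):
    # One-split regression stump: predict the mean of each child node,
    # choosing the threshold with the smallest squared error
    best = None
    for t in sorted(set(xs))[:-1]:
        l = [y for x, y in zip(xs, ys) if x <= t]
        r = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = sum(l) / len(l), sum(r) / len(r)
        sse = sum((y - ml) ** 2 for y in l) + sum((y - mr) ** 2 for y in r)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

def boost(xs, ys, n_rounds=20, lr=0.5):
    # Gradient boosting on squared error: each new stump is fitted to
    # the residuals left by the current ensemble, and added with a
    # learning rate lr
    pred = [0.0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        resid = [y - p for y, p in zip(ys, pred)]
        s = fit_reg_stump(xs, resid)
        stumps.append(s)
        pred = [p + lr * s(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)
```

For classification, LightGBM applies the same residual-fitting idea to the gradient of a logistic loss rather than to raw residuals.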
Support Vector Machine (SVM)
The SVM is used because it enables the identification of complex relationships in the data without the need for extensive manual manipulation. It is well suited to small data sets with very large numbers of features (tens of thousands to hundreds of thousands). Because of its capacity to handle small, complex data sets, it frequently produces more accurate results than other algorithms (Jakkula, 2006).
Receiver Operating Characteristic (ROC) Curve
The ROC curve is another effective visualization tool for classifiers. It plots the classifier's true positive rate against its false positive rate across classification thresholds. A well-performing classifier achieves a high true positive rate at low false positive rates, and its area under the ROC curve approaches 1 (Igual et al., 2017).
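The AUC can be computed directly from its probabilistic interpretation: it equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. A minimal sketch with illustrative scores:

```python
def roc_auc(y_true, scores):
    # AUC as the fraction of (positive, negative) pairs in which the
    # positive case scores higher; ties count as one half
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the "Yes" and "No" cases.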