Mduma et al. (2019) explored factors that reduce secondary-school student dropout, namely the main source of household income, the boys' pupil-to-latrine ratio, whether the school has a girls' privacy room, student gender, and whether a parent checks his/her child's exercise book once a week, among others. Their results showed LR = 89.7%, MLP = 86.5%, NB = 78.4%, and RF = 88.8%; when the traditional ML algorithms were instead trained with the under-sampling technique, accuracies were LR = 75%, MLP = 76%, RF = 75%, and KNN = 73%, and with over-sampling, LR = 78%, MLP = 64%, RF = 50%, and KNN = 55%. However, their study focused on student-based and school-based factors, whereas the studies of Mirza & Hassan (2020) and Lee & Chung (2019) considered socio-demographic and socio-economic factors to establish the influential features that lead to dropout. Prediction was further enhanced by an ensemble classifier that combined Logistic Regression and a Multilayer Perceptron to predict secondary students' dropout. Nevertheless, the interval set for checking a student's exercise book (once a week) is too long, given that students are assigned daily activities and homework to practise, and the pupil-teacher ratio has risen from 1:45 to 1:51 under the fee-free education policy. Moreover, Mduma et al. (2019) evidenced improved prediction accuracy after tuning hyperparameters to avoid the under-fitting and over-fitting problems of machine learning prediction.
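The under- and over-sampling referred to above rebalance an imbalanced dropout data set before training. As a minimal sketch (Python standard library only; the function name and toy labels are illustrative, not from Mduma et al.), random under-sampling drops majority-class records while random over-sampling duplicates minority-class ones:

```python
import random
from collections import Counter

def rebalance(rows, labels, mode="under", seed=0):
    """Randomly rebalance a binary data set so both classes are equal.

    rows   : list of feature tuples
    labels : parallel list of 0/1 labels (say 1 = dropout)
    mode   : "under" drops majority-class rows;
             "over" duplicates minority-class rows
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority

    if mode == "under":
        # Keep every minority row; sample an equal number of majority rows
        kept = rng.sample(by_class[majority], len(by_class[minority]))
        balanced = ([(r, minority) for r in by_class[minority]]
                    + [(r, majority) for r in kept])
    else:  # "over"
        # Duplicate minority rows (with replacement) up to the majority count
        extra = rng.choices(by_class[minority],
                            k=len(by_class[majority]) - len(by_class[minority]))
        balanced = ([(r, minority) for r in by_class[minority] + extra]
                    + [(r, majority) for r in by_class[majority]])

    rng.shuffle(balanced)
    new_rows, new_labels = zip(*balanced)
    return list(new_rows), list(new_labels)
```

On a toy set of 10 dropout and 90 non-dropout records, `mode="under"` returns 20 rows (10 per class) and `mode="over"` returns 180 (90 per class).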
Hilmarsson (2019) predicted the likelihood of upper-secondary student dropout using machine learning algorithms. Results revealed that Gradient Boosting achieved the best accuracy at 84.2%, followed by Random Forest at 83.1% and AdaBoost at 82.1%. It is interesting to note that each algorithm's accuracy improved when subjected to a different set of factors: Gradient Boosting performed best with average grade and age; AdaBoost increased prediction accuracy with school distance and class size; and Random Forest performed best with average grade and absence. Based on these results, it is hard to single out the factors that contribute most to student dropout.
The findings of the contextual analysis portray factors that contribute to student dropout, such as age, gender, residence, family composition, family stress, family income, time for self-study, teacher-student relationship, marriage, peer influence, extra-curricular activities, stream in higher secondary, student performance, and infrastructure (Pant, 2018). The most promising and highly correlated dropout factors were selected by correlation-based feature selection (CBFS) and then fed to the Iterative Dichotomiser 3 (ID3) decision tree algorithm to analyse the prediction results. Results showed that the ID3 decision tree algorithm achieved an accuracy of 98 percent.
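ID3 grows its tree by repeatedly splitting on the categorical feature with the highest information gain, i.e. the largest reduction in label entropy. A minimal pure-Python sketch of that root-split step, using hypothetical toy data rather than Pant's actual features:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list -- the impurity measure ID3 uses."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature_idx):
    """Entropy reduction from splitting on one categorical feature."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature_idx], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels):
    """Feature index ID3 would choose at the root: highest information gain."""
    return max(range(len(rows[0])), key=lambda i: information_gain(rows, labels, i))

# Hypothetical records: (residence, gender) -> dropout status
rows = [("rural", "F"), ("rural", "M"), ("urban", "F"), ("urban", "M")]
labels = ["drop", "drop", "stay", "stay"]
```

Here residence perfectly separates the labels (information gain 1.0 bit) while gender carries no information (gain 0.0), so `best_split` selects feature 0; ID3 would then recurse into each branch until the leaves are pure.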
Sivakumar et al. (2016) used a decision tree-based model to investigate the root causes of student dropout. Their model included residence, family type, stream in senior secondary, family stress, school infrastructure, participation in extra-curricular activities, family problems, syllabus, family annual income, father's education, mother's education, father's occupation, mother's occupation, home-sickness, and teacher-student ratio. Their results revealed the contribution of each factor to dropout: family 10.25%, school 7.58%, low placement rate 4.62%, personal problems 4.92%, and home-sickness 4.86%.
Sembiring et al. (2011) investigated how (a) psychometric factors such as interest, study behaviour, engagement time, family support, and beliefs, and (b) demographic factors such as gender, age, family background, and disability affect students' performance and lead to dropout. Their results revealed that family support contributes 52.6% to student dropout. The study showed that the Smooth Support Vector Machine provided better prediction results than K-Means clustering. However, the Support Vector Machine cannot guarantee realistic results on large data sets, since it is best suited to handling small data sets (Cervantes et al., 2020; Nalepa & Kawulok, 2019).
To sum up, researchers have conducted studies to predict student dropout in secondary schools employing a plethora of machine learning algorithms, and most of the proposed prediction models achieved notable results. However, their predictions were hindered by improper selection of the relevant features, the algorithm, and the corresponding hyperparameters for an optimal model (Wen et al., 2020). This study uses the Bayesian hyperparameter optimization technique to project the severity of the secondary-school student dropout problem in Sub-Saharan African countries.
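Bayesian hyperparameter optimization builds a cheap surrogate model of the validation score and uses an acquisition function to decide which hyperparameter value to evaluate next, spending far fewer expensive model trainings than grid search. The loop below is a deliberately simplified stand-in (a 1-nearest-neighbour surrogate with an upper-confidence-bound acquisition, pure Python); a real study would use a Gaussian-process or tree-structured surrogate via a library such as scikit-optimize or Optuna, and the toy objective here merely stands in for cross-validated accuracy:

```python
import random

def bayes_opt_sketch(objective, bounds, n_init=5, n_iter=25, kappa=1.0, seed=0):
    """Toy sequential model-based (Bayesian-style) optimisation loop.

    Surrogate   : predicted score = score of nearest evaluated point.
    Uncertainty : grows with distance to that nearest point.
    Acquisition : upper confidence bound, prediction + kappa * uncertainty.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    observed = []  # list of (hyperparameter value, score) pairs

    def ucb(x):
        nearest_x, nearest_score = min(observed, key=lambda p: abs(p[0] - x))
        return nearest_score + kappa * abs(nearest_x - x)

    # Initial random design: a few points evaluated up front
    for _ in range(n_init):
        x = rng.uniform(lo, hi)
        observed.append((x, objective(x)))

    # Iteratively evaluate the candidate the acquisition function prefers
    for _ in range(n_iter):
        candidates = [rng.uniform(lo, hi) for _ in range(100)]
        x = max(candidates, key=ucb)
        observed.append((x, objective(x)))

    return max(observed, key=lambda p: p[1])

# Hypothetical objective: validation score as a function of one
# hyperparameter, peaking at 0.7 (e.g. a regularisation strength)
best_x, best_score = bayes_opt_sketch(lambda x: -(x - 0.7) ** 2, (0.0, 1.0))
```

The UCB acquisition trades off exploitation (candidates near good observed scores) against exploration (candidates far from anything evaluated), which is the essential mechanism shared with full Gaussian-process-based Bayesian optimization.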