Mduma et al. (2019) explored factors that reduce secondary-school student dropout, namely the main source of household income, the boys' pupil-to-latrine ratio, whether the school has a girls' privacy room, student gender, and whether a parent checks his/her child's exercise book once a week, among others. Their results showed LR = 89.7%, MLP = 86.5%, NB = 78.4%, and RF = 88.8%; when the traditional ML algorithms were instead trained with the under-sampling technique, accuracies were LR = 75%, MLP = 76%, RF = 75%, and KNN = 73%, and with over-sampling, LR = 78%, MLP = 64%, RF = 50%, and KNN = 55%. However, their study focused on student-based and school-based factors, whereas the studies of Mirza & Hassan (2020) and Lee & Chung (2019) considered socio-demographic and socio-economic factors to establish the influential features that lead to dropout. Prediction was further enhanced by an ensemble classifier that combined Logistic Regression and a Multilayer Perceptron to predict secondary students' dropout. Nevertheless, the interval set for checking a student's exercise book (once a week) is too long, given that students are assigned daily activities and homework to practise, and the pupil-teacher ratio has risen from 1:45 to 1:51 under the fee-free education policy. Moreover, Mduma et al. (2019) evidenced improved prediction accuracy after tuning hyperparameters to avoid the under-fitting and over-fitting problems of machine learning prediction.
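The under- and over-sampling referred to above rebalance an imbalanced dropout data set before training. As a minimal sketch (Python standard library only; the function name and toy labels are illustrative, not from Mduma et al.), random under-sampling drops majority-class records while random over-sampling duplicates minority-class ones:

```python
import random
from collections import Counter

def rebalance(rows, labels, mode="under", seed=0):
    """Randomly rebalance a binary data set so both classes are equal.

    rows   : list of feature tuples
    labels : parallel list of 0/1 labels (say 1 = dropout)
    mode   : "under" drops majority-class rows;
             "over" duplicates minority-class rows
    """
    rng = random.Random(seed)
    by_class = {0: [], 1: []}
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    minority = min(by_class, key=lambda c: len(by_class[c]))
    majority = 1 - minority

    if mode == "under":
        # Keep every minority row; sample an equal number of majority rows
        kept = rng.sample(by_class[majority], len(by_class[minority]))
        balanced = ([(r, minority) for r in by_class[minority]]
                    + [(r, majority) for r in kept])
    else:  # "over"
        # Duplicate minority rows (with replacement) up to the majority count
        extra = rng.choices(by_class[minority],
                            k=len(by_class[majority]) - len(by_class[minority]))
        balanced = ([(r, minority) for r in by_class[minority] + extra]
                    + [(r, majority) for r in by_class[majority]])

    rng.shuffle(balanced)
    new_rows, new_labels = zip(*balanced)
    return list(new_rows), list(new_labels)
```

On a toy set of 10 dropout and 90 non-dropout records, `mode="under"` returns 20 rows (10 per class) and `mode="over"` returns 180 (90 per class).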
Hilmarsson (2019) predicted the likelihood of upper-secondary student dropout using machine learning algorithms. Results revealed that Gradient Boosting achieved the best accuracy at 84.2%, followed by Random Forest at 83.1% and AdaBoost at 82.1%. It is interesting to note that each algorithm's accuracy improved when subjected to a different set of factors: Gradient Boosting performed best with average grade and age; AdaBoost increased prediction accuracy with school distance and class size; and Random Forest performed best with average grade and absence. Based on these results, it is hard to single out the factors that contribute most to student dropout.
The findings of the contextual analysis portray factors that contribute to student dropout, such as age, gender, residence, family composition, family stress, family income, time for self-study, teacher-student relationship, marriage, peer influence, extra-curricular activities, stream in higher secondary, student performance, and infrastructure (Pant, 2018). The most promising and highly correlated dropout factors were selected by correlation-based feature selection (CBFS) and then fed to the Iterative Dichotomiser 3 (ID3) decision tree algorithm to analyse the prediction results. Results showed that the ID3 decision tree algorithm achieved an accuracy of 98 percent.
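ID3 grows its tree by repeatedly splitting on the categorical feature with the highest information gain, i.e. the largest reduction in label entropy. A minimal pure-Python sketch of that root-split step, using hypothetical toy data rather than Pant's actual features:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list -- the impurity measure ID3 uses."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature_idx):
    """Entropy reduction from splitting on one categorical feature."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature_idx], []).append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_split(rows, labels):
    """Feature index ID3 would choose at the root: highest information gain."""
    return max(range(len(rows[0])), key=lambda i: information_gain(rows, labels, i))

# Hypothetical records: (residence, gender) -> dropout status
rows = [("rural", "F"), ("rural", "M"), ("urban", "F"), ("urban", "M")]
labels = ["drop", "drop", "stay", "stay"]
```

Here residence perfectly separates the labels (information gain 1.0 bit) while gender carries no information (gain 0.0), so `best_split` selects feature 0; ID3 would then recurse into each branch until the leaves are pure.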
Sivakumar et al. (2016) used a decision tree-based model to investigate the root causes of student dropout. Their model included residence, family type, stream in senior secondary, family stress, school infrastructure, participation in extra-curricular activities, family problems, syllabus, family annual income, father's education, mother's education, father's occupation, mother's occupation, home-sickness, and teacher-student ratio. Their results revealed the contribution of each factor to dropout: family 10.25%, school 7.58%, low placement rate 4.62%, personal problems 4.92%, and home-sickness 4.86%.
Sembiring et al. (2011) investigated how (a) psychometric factors such as interest, study behaviour, engagement time, family support, and beliefs, and (b) demographic factors such as gender, age, family background, and disability affect students' performance and lead to dropout. Their results revealed that family support contributes 52.6% to student dropout. The study showed that the Smooth Support Vector Machine provided better prediction results than K-Means clustering. However, the Support Vector Machine cannot guarantee realistic results on large data sets, since it is best suited to handling small data sets (Cervantes et al., 2020; Nalepa & Kawulok, 2019).
To sum up, researchers have conducted studies to predict student dropout in secondary schools employing a plethora of machine learning algorithms, and most of the proposed prediction models achieved notable results. However, their predictions were hindered by improper selection of the relevant features, the algorithm, and the corresponding hyperparameters for an optimal model (Wen et al., 2020). This study uses the Bayesian hyperparameter optimization technique to project the severity of the secondary-school student dropout problem in Sub-Saharan African countries.
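Bayesian hyperparameter optimization builds a cheap surrogate model of the validation score and uses an acquisition function to decide which hyperparameter value to evaluate next, spending far fewer expensive model trainings than grid search. The loop below is a deliberately simplified stand-in (a 1-nearest-neighbour surrogate with an upper-confidence-bound acquisition, pure Python); a real study would use a Gaussian-process or tree-structured surrogate via a library such as scikit-optimize or Optuna, and the toy objective here merely stands in for cross-validated accuracy:

```python
import random

def bayes_opt_sketch(objective, bounds, n_init=5, n_iter=25, kappa=1.0, seed=0):
    """Toy sequential model-based (Bayesian-style) optimisation loop.

    Surrogate   : predicted score = score of nearest evaluated point.
    Uncertainty : grows with distance to that nearest point.
    Acquisition : upper confidence bound, prediction + kappa * uncertainty.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    observed = []  # list of (hyperparameter value, score) pairs

    def ucb(x):
        nearest_x, nearest_score = min(observed, key=lambda p: abs(p[0] - x))
        return nearest_score + kappa * abs(nearest_x - x)

    # Initial random design: a few points evaluated up front
    for _ in range(n_init):
        x = rng.uniform(lo, hi)
        observed.append((x, objective(x)))

    # Iteratively evaluate the candidate the acquisition function prefers
    for _ in range(n_iter):
        candidates = [rng.uniform(lo, hi) for _ in range(100)]
        x = max(candidates, key=ucb)
        observed.append((x, objective(x)))

    return max(observed, key=lambda p: p[1])

# Hypothetical objective: validation score as a function of one
# hyperparameter, peaking at 0.7 (e.g. a regularisation strength)
best_x, best_score = bayes_opt_sketch(lambda x: -(x - 0.7) ** 2, (0.0, 1.0))
```

The UCB acquisition trades off exploitation (candidates near good observed scores) against exploration (candidates far from anything evaluated), which is the essential mechanism shared with full Gaussian-process-based Bayesian optimization.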