2.1 Human-Machine Collaboration
Contemporary human-machine collaboration refers to the synergistic partnership between humans and intelligent machines, which can take various forms, including automated systems, autonomous agents, robots, algorithms, and artificial intelligence (AI) entities [15, 37]. This collaborative approach enhances performance by leveraging the complementary strengths of intelligent machines and human intelligence while compensating for their respective limitations [2, 34, 38]. Intelligent machines excel at processing vast amounts of information and generating rational outcomes without succumbing to cognitive biases (e.g., availability bias, representativeness bias, and the anchoring effect) or being swayed by internal and external factors (e.g., ability, cognitive style, emotions, workload, fatigue, and time pressure) [32]. Conversely, humans possess unique advantages in employing intuition and experience to discern critical factors, adapt to novel conditions, and rapidly learn and apply reasoning to navigate high uncertainty or tackle new, complex, and rare challenges.
In light of these benefits, human-machine collaboration has seen growing application across diverse domains, including engineering [29], healthcare [21, 31], and business [11, 15]. For example, Wilson and Daugherty [33] analyzed 1,500 companies spanning 12 industries and found that the most substantial performance enhancements occurred when humans collaborated with machines. Moreover, research indicates that human-machine collaboration surpasses operations involving only humans or only machines. For instance, Xiong et al. [36] compared the performance of human-only, machine-only, and human-machine joint teams in a sequential risky decision-making task and found that the joint teams outperformed both of the others. With the machine as a partner, human decision-makers had to cede some control and coordinate, and their pumping decisions became more conservative and more variable.
The trajectory of human-machine collaboration also extends to medical disciplines. A noteworthy example of successful human-machine collaboration in healthcare is cancer detection through the analysis of lymph node cell images [31]. The study demonstrated that combining predictions from a deep learning system with diagnoses from a human pathologist achieved an area under the receiver operating characteristic curve (AUC) of 0.995, surpassing the AUC of the deep learning system alone (0.925) and that of the pathologist alone (0.966). This integration reduced the error rate by at least 85%. Beyond cancer detection, collaborative frameworks have been instrumental in areas such as personalized medicine, where the integration of machine-generated insights with clinical expertise allows for tailored treatment plans based on individual patient characteristics (e.g., Khan et al. [12]).
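The mechanics of such a combination can be illustrated with a small, self-contained sketch: averaging a model's score with a human reviewer's score and computing the ROC AUC via the Mann-Whitney formulation. The data below are synthetic, and simple score averaging is only one possible combination strategy, not necessarily the scheme used in [31]; the toy example is constructed so that each rater misses one case the other catches.

```python
def roc_auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic: the probability that a
    randomly chosen positive case outscores a randomly chosen negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Synthetic scores: 1 = cancer present, 0 = absent.
labels = [1, 1, 0, 0]
model = [0.9, 0.2, 0.4, 0.1]   # the model misses the second positive
human = [0.3, 0.8, 0.1, 0.4]   # the human misses the first positive
combined = [(m + h) / 2 for m, h in zip(model, human)]

# Averaging recovers both cases: the combined AUC exceeds either alone.
print(roc_auc(model, labels), roc_auc(human, labels), roc_auc(combined, labels))
# → 0.75 0.75 1.0
```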
2.2 Explainable AI
Human-machine collaboration can be enhanced by enabling machines to explain their reasoning in a way that humans can understand and trust [6]. This can be achieved through the integration of explainable AI (XAI) [22]. XAI, a sub-field of AI, provides human-interpretable explanations regarding the rationale, strengths, weaknesses, and anticipated behavior of AI systems [9, 23]. In recent years, the significance of XAI has grown with the widespread application of advanced AI techniques such as deep learning models. Despite their remarkable accuracy in prediction and classification, these models are often characterized as "black box" models [23, 27]. This label stems from the reliance of machine learning (ML) models on mathematical constructs with an extensive array of abstract numerical parameters, often numbering in the millions or even billions. These parameters are learned from training data, making it difficult to offer profound insights into the intricate dependencies, causal relationships, and internal structures of the models [3, 18]. The opaqueness inherent in these black box models introduces the potential for misleading users [20], raising substantial concerns, particularly in sensitive domains such as healthcare and other applications that involve human life, rights, finances, and privacy [3].
To enhance the interpretability of AI outputs, researchers have proposed various XAI methods. According to the latest comprehensive review conducted by Minh et al. [16], XAI methods fall into three main categories: pre-modeling explainability, interpretable models, and post-modeling explainability. The pre-modeling explainability method involves a set of data processing approaches applied to gain insights into datasets used for training ML models. This includes data analysis, summarization, and transformation. On the other hand, interpretable models refer to those that can be understood by humans through examination of the model summary or parameters, such as linear models, decision trees, k-nearest neighbors, and rule-based models. Lastly, the post-modeling explainability method aims to enhance the interpretability of existing black-box ML models by employing various techniques.
Given their widespread application, Minh et al. [16] categorized post-modeling explainability techniques into four main types. First, textual justification generates explanatory text in the form of phrases or sentences. Second, visualization provides clarity through visual images, utilizing techniques such as layer-wise relevance propagation (LRP) and Local Interpretable Model-agnostic Explanations (LIME). Third, simplification creates a new, simpler system from a complex ML model, employing techniques such as local explanation and example generation. Fourth, feature relevance quantifies the importance of input variables, incorporating techniques such as SHapley Additive exPlanations (SHAP). According to Minh et al.'s [16] summary, visualization, simplification, and feature relevance are the three most commonly used XAI methods, underscoring their role in rendering AI systems more transparent and understandable.
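As a concrete illustration of the feature-relevance family, the sketch below computes exact Shapley values for a toy two-feature setting. The "model" here is a hypothetical value function over feature subsets (with made-up feature names), not a trained classifier; libraries such as SHAP approximate this same computation efficiently for real models, where exact enumeration over all subsets would be intractable.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values: each feature's weighted average marginal
    contribution over all subsets of the remaining features."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi

# Toy value function: prediction rises 0.3 with GPA present, 0.1 with
# attendance, plus a 0.05 interaction when both are present.
def toy_model(s):
    v = 0.0
    if "gpa" in s:
        v += 0.3
    if "attendance" in s:
        v += 0.1
    if {"gpa", "attendance"} <= s:
        v += 0.05
    return v

print(shapley_values(["gpa", "attendance"], toy_model))
```

The interaction term is split evenly between the two features, and the values sum to the full model's output minus the empty baseline (the efficiency property that makes SHAP attributions additive).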
2.3 Predicting High-School Dropout
The issue of high school dropout has long been a focal point in education. For example, research conducted in Wisconsin revealed that approximately 3,000 students discontinued their education before reaching the 12th grade, with around 1,500 of these dropouts occurring during the 9th and 10th grades (Knowles, 2015). In response to this concerning trend, researchers, policymakers, and school administrators have directed efforts toward developing early warning systems powered by ML models. These systems aim to identify students at risk of dropping out of high school and to uncover actionable predictors that can guide future interventions and policy adjustments (Allensworth et al., 2018; Bowers, 2021).
To date, numerous studies have harnessed ML models to forecast high school dropout based on various background and demographic characteristics, including low grades, aggressive behavior, student poverty, and high absenteeism. For instance, Sara et al. (2015) employed the Random Forest (RF) algorithm to predict the dropout status of Danish high school students, utilizing demographic and school-related variables such as gender, school and class size, and teacher-pupil ratio. Chung and Lee (2019) similarly utilized RF to anticipate the dropout status of Korean high school students. In contrast, Sansone (2019) examined dropout among American students, employing Support Vector Machine, Boosted Regression, and Post-LASSO algorithms. Interestingly, this study found that GPA, rather than demographic variables, emerged as the most accurate predictor.
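To make the modeling setup concrete, the sketch below trains a Random Forest on synthetic data with hypothetical features (GPA, absence rate, class size). It illustrates the general approach of the studies cited above, not any particular study's data, features, or pipeline; the dropout labels are generated so that only GPA and absenteeism carry signal.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000
# Hypothetical student features: GPA, absence rate, class size.
X = np.column_stack([
    rng.normal(2.8, 0.6, n),    # gpa
    rng.beta(2, 8, n),          # absence_rate
    rng.integers(18, 35, n),    # class_size (pure noise here)
])
# Synthetic dropout label: risk rises with low GPA and high absenteeism.
logit = -1.5 * (X[:, 0] - 2.8) + 6.0 * (X[:, 1] - 0.2)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", round(clf.score(X_te, y_te), 3))
print("feature importances:", clf.feature_importances_.round(3))
```

With labels constructed this way, the forest's impurity-based importances concentrate on GPA and absence rate rather than the noise feature, mirroring the kind of predictor ranking these studies report.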
While ML models have been extensively employed in dropout prediction, only a limited number of studies have integrated XAI to understand high school or college dropout [14, 15, 19]. For example, Krüger et al. [14] investigated dropout factors within the Brazilian technical school system using the XAI methods SHAP and LIME. The findings highlighted the year of elementary school completion, the family's minimum wages, and the mother's education and work characteristics as important predictors of dropout. Similarly, Nagy and Molontay [18] employed SHAP and LIME, revealing that a higher GPA in high school or higher marks in the mathematics section of the matura exam significantly reduced the likelihood of college dropout.
A significant drawback in prior research exploring high school dropouts through ML models or XAI lies in the substantial reliance on immutable predictors rather than actionable predictors. Immutable predictors encompass variables over which students, teachers, administrators, and family or community members possess limited or no control—examples include gender, ethnicity, and socioeconomic status. On the other hand, actionable predictors, also known as malleable predictors, denote variables that are recent or real-time, adaptable, and amenable to intervention. These predictors can be utilized to implement tailored interventions or modify the current education system. Examples of actionable predictors include orientation to the future and academic habits of mind, such as self-regulation, self-efficacy, and time management (Ben-Avie & Darrow, 2018; Bowers, 2021).
2.4 Current Study
While prior studies have made significant contributions to the domains of human-machine collaboration, explainable AI, and high school dropout prediction, notable gaps warrant further exploration. Firstly, despite the burgeoning use of human-machine collaboration in fields such as engineering, business, and healthcare, its application within education remains underexplored. Yet the potential benefits of incorporating human-machine collaboration in education are substantial: it can enhance efficiency and accuracy, allowing educators to dedicate more time to personalized teaching, and the integration of AI can yield outputs that are more user-friendly, ultimately helping teachers improve student engagement and achievement. Therefore, this study seeks to exemplify the implementation of human-machine collaboration in education, using high school dropout prediction as a case study. Secondly, previous research on predicting high school dropout through ML or XAI techniques has primarily concentrated on achieving higher prediction accuracy based on immutable predictors, which offer limited guidance for conducting interventions or modifying the current education system. Consequently, this study aims to address these dual gaps by identifying actionable factors for predicting high school dropout through the human-machine collaboration paradigm.