Study Design
The research employed a quantitative, cross-sectional analytical study design. This approach was chosen to examine the relationship between predictor variables (demographics, academic performance, study habits) and the assessment outcomes of pre-clinical MBBS/BDS students at a specific point in time. The quantitative nature of the study facilitated statistical modelling and the application of machine learning algorithms for objective measurement and analysis. The study was planned in four phases, as shown in Figure 2.
Setting
The study was conducted at two institutions in Rawalpindi, Pakistan: Riphah International University and its affiliated Islamic International Medical College. These institutions were selected based on their accessibility, willingness to participate, and the diversity of their pre-clinical MBBS/BDS student populations, which enhanced the generalizability of the findings.
Sample Selection
The study population comprised all 4th-year MBBS and BDS students enrolled at Riphah International University during the data collection period, capturing the full spectrum of academic abilities and backgrounds. The impact of sample size on model accuracy has been explored using various methods, including curve-fitting, cross-validation, and linear discriminant analysis (12–14), but results have been inconsistent owing to factors such as the bias-variance tradeoff, over-sampling, choice of algorithm, and feature-set size. Although reducing the number of samples can, in theory, increase variance, removing the samples that contribute most to data variance may instead decrease it. The study therefore employed a census sampling technique, including every eligible 4th-year MBBS/BDS student. Sample size determination was guided by a comprehensive literature review and the guidelines of Rajput et al. (2023), resulting in the inclusion of 144 students in the final analysis.
Inclusion Criteria:
- Enrollment: Students must be actively enrolled in the 4th year of either the MBBS or BDS program at Riphah International University. This criterion ensured that the participants were at a similar stage in their academic journey, allowing for meaningful comparisons and predictions of assessment outcomes.
Exclusion Criteria:
- Incomplete Questionnaire: Students who failed to complete all mandatory items in the questionnaire were excluded from the study. This measure was implemented to maintain data integrity and ensure the reliability of the analysis. Incomplete questionnaires could introduce bias and compromise the accuracy of the predictive models.
This sample selection process aimed to create a homogeneous yet representative sample of pre-clinical medical and dental students, enhancing the validity and generalizability of the study's findings.
Ethical Approval
Prior to data collection, the study protocol underwent rigorous review and received ethical approval from the Institutional Review Board (IRB) of Riphah International University (Appendix 1). This ensured that the research adhered to stringent ethical guidelines and safeguarded the rights and well-being of the participants.
Questionnaire Development Process
A structured questionnaire was meticulously developed to gather data on factors influencing assessment outcomes (Appendix 2). The process involved:
- Literature and Expert Review: An initial draft was created based on extensive literature review and then scrutinized by experts in medical education, artificial intelligence, and questionnaire design. Their feedback ensured content validity and clarity of questions.
- Pilot Testing: The questionnaire was pilot-tested with a small sample of 15 4th-year MBBS/BDS students to assess its clarity, identify ambiguities, and estimate completion time. Feedback from the pilot phase led to further refinements in question phrasing and response options.
- Academic Records: In addition to the questionnaire data, relevant academic records were obtained from the examination departments of the respective institutions. These records included exam scores, attendance records, and other pertinent academic performance data, and were collected with strict adherence to confidentiality protocols to protect student privacy.
These methodological steps, particularly the careful questionnaire development and pilot testing, contributed significantly to the validity and reliability of the study. By ensuring the collection of high-quality data and employing robust validation techniques, the research aimed to generate findings that are both accurate and generalizable to the broader context of medical education.
Data Utilization in Python
The data, organized and anonymized in Microsoft Excel, was imported into the Python programming language (version 3.9.1) environment for further analysis and model development. This transition was facilitated by the pandas library, a tool renowned for its data manipulation and analysis capabilities.
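The import step can be sketched as follows. This is a minimal illustration, not the study's actual script: the workbook name `student_data.xlsx` and the column names are assumptions, and a small inline DataFrame stands in for the real sheet.

```python
# Minimal sketch of moving the anonymized Excel sheet into Python with
# pandas (file name and column names are assumptions, not the study's own).
import pandas as pd

# In the study, the workbook would be loaded directly, e.g.:
# df = pd.read_excel("student_data.xlsx")

# Small inline stand-in with the kind of columns described in the text.
df = pd.DataFrame({
    "surrogate_id": ["S001", "S002", "S003"],
    "attendance_pct": [92.5, 78.0, 85.5],
    "exam_score": [74.0, 61.5, 88.0],
})

print(df.dtypes)
print(df.describe())
```

Loading the sheet into a DataFrame makes the subsequent cleaning, feature engineering, and model-fitting steps straightforward to express with the libraries listed below.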
Libraries and Tools
The analysis and model development leveraged the capabilities of several key Python libraries, each playing a crucial role in achieving the research objectives:
- NumPy: This fundamental library for numerical computations provided the necessary tools for efficient array manipulation and mathematical operations, enabling seamless handling and transformation of the student data.
- Pandas: This powerful data analysis library offered data structures and functions for data cleaning, manipulation, and exploration. It streamlined the process of organizing and preparing the dataset for model development.
- Scikit-learn: This comprehensive machine-learning library provided a wide range of algorithms and tools for model development, training, and evaluation. It facilitated the implementation of various classification algorithms, including Random Forest, AdaBoost, Logistic Regression, Support Vector Machine (SVM), and XGBoost, enabling a comparative analysis of their performance in predicting student outcomes.
- Matplotlib: This versatile plotting library was used to create visualizations of the data and model results. It enabled the generation of informative graphs and charts, such as ROC curves, confusion matrices, and feature importance plots, facilitating the interpretation and communication of the findings.
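As an illustration of how these libraries work together, the sketch below trains a Random Forest on synthetic data and renders a feature importance plot of the kind described above. The feature names are placeholders, not the study's actual predictors.

```python
# Illustrative feature-importance plot (synthetic data; feature names are
# placeholders, not the study's real predictor variables).
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display required
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=144, n_features=5, random_state=0)
names = ["attendance", "quiz_avg", "study_hours", "sleep_hours", "prior_gpa"]

rf = RandomForestClassifier(random_state=0).fit(X, y)
importances = rf.feature_importances_  # normalized to sum to 1

fig, ax = plt.subplots()
ax.barh(names, importances)
ax.set_xlabel("Mean decrease in impurity")
ax.set_title("Random Forest feature importance (illustrative)")
fig.savefig("feature_importance.png", bbox_inches="tight")
```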
Machine Learning Model Development
The core of this study lies in the application of machine learning algorithms to identify patterns within the data and predict student assessment outcomes. Five algorithms were employed:
- Random Forest: An ensemble learning method that constructs multiple decision trees and outputs the mode of the classes (classification) or mean prediction (regression) of the individual trees. Its ability to handle high-dimensional data and resistance to overfitting makes it suitable for predicting student outcomes in medical education.
- AdaBoost (Adaptive Boosting): An ensemble technique that combines multiple weak learners to create a strong learner. It iteratively trains weak learners, focusing on misclassified instances in each round. AdaBoost's adaptability makes it suitable for predicting student outcomes where various factors interplay.
- Logistic Regression: A statistical model that predicts the probability of a binary outcome based on one or more predictor variables. Its simplicity and interpretability make it a popular choice for predicting binary outcomes, such as pass/fail, in educational settings.
- Support Vector Machine (SVM): A supervised learning model that finds the optimal hyperplane separating data into different classes. SVM's ability to handle high-dimensional and non-linear data makes it suitable for predicting student outcomes where complex relationships may exist.
- XGBoost (Extreme Gradient Boosting): An advanced implementation of the gradient boosting algorithm. It builds an ensemble of decision trees sequentially, with each tree correcting the errors of the previous ones. XGBoost's high performance and efficiency make it suitable for handling large and complex datasets in medical education.
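A comparative run of the five algorithms can be sketched as below. The data is a synthetic stand-in sized like the study's 144-student sample, and the hyperparameters are scikit-learn defaults rather than the study's tuned settings; if the `xgboost` package is unavailable, the sketch falls back to scikit-learn's gradient boosting as a stand-in.

```python
# Sketch of comparing the five classifiers via 5-fold cross-validation.
# Synthetic data stands in for the 144-student dataset; defaults, not the
# study's tuned hyperparameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

try:
    from xgboost import XGBClassifier
    xgb = XGBClassifier(eval_metric="logloss")
except ImportError:  # fall back to sklearn's gradient boosting as a stand-in
    from sklearn.ensemble import GradientBoostingClassifier
    xgb = GradientBoostingClassifier(random_state=42)

X, y = make_classification(n_samples=144, n_features=10, random_state=42)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(random_state=42),
    "XGBoost": xgb,
}

cv_means = {name: cross_val_score(m, X, y, cv=5, scoring="accuracy").mean()
            for name, m in models.items()}
for name, score in cv_means.items():
    print(f"{name}: {score:.3f}")
```

Running all five under the same cross-validation protocol is what makes the comparative analysis described above meaningful: each algorithm sees the same folds and is scored on the same metric.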
Feature Engineering and Model Optimization
The raw data was transformed to enhance the models' predictive power:
- Data Preprocessing: Data cleaning addressed missing values, outliers, and inconsistencies. Missing values were imputed, and outliers were handled to minimize their influence. Data was normalized to ensure consistent input ranges for the models.
- Feature Selection: Techniques like recursive feature elimination and feature importance analysis were used to identify the most impactful predictors of student outcomes, reducing dimensionality and focusing on salient factors.
- Feature Creation: New features, such as interaction terms, were created to capture additional information and potentially enhance predictive accuracy.
- Model Training and Validation: Models were trained and validated using a portion of the dataset and k-fold cross-validation to ensure robustness and generalizability.
- Hyperparameter Tuning: GridSearchCV was used to optimize model parameters based on performance metrics.
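The steps above can be chained as a single scikit-learn pipeline, which keeps imputation, scaling, and feature selection inside the cross-validation loop. The sketch below is illustrative: the parameter grid, the choice of logistic regression as the final estimator, and the fold count are assumptions, not the study's actual configuration.

```python
# Sketch of the preprocessing -> feature selection -> tuning chain.
# Grid values and the final estimator are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold

X, y = make_classification(n_samples=144, n_features=12, random_state=1)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),         # missing values
    ("scale", StandardScaler()),                          # normalization
    ("select", RFE(LogisticRegression(max_iter=1000))),   # recursive feature elimination
    ("model", LogisticRegression(max_iter=1000)),
])

grid = GridSearchCV(
    pipe,
    param_grid={"select__n_features_to_select": [4, 8],
                "model__C": [0.1, 1.0]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=1),
    scoring="accuracy",
)
grid.fit(X, y)
best_score = grid.best_score_
print(grid.best_params_, round(best_score, 3))
```

Fitting the preprocessing inside each fold, rather than on the full dataset beforehand, avoids leaking information from the validation folds into the training data.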
Achieving Objective 2: Establishing Model Validity and Reliability
The study employed a multi-faceted approach to establish the validity and reliability of the AI-based models:
- Validity: Assessed through k-fold cross-validation and multiple evaluation metrics (accuracy, precision, recall, F1-score, MCC, AUC-ROC, AUC-PRC). This ensured the models generalized well to unseen data and accurately predicted student outcomes.
- Reliability: Evaluated through repeated k-fold cross-validation runs and the use of MCC, demonstrating consistent performance and a strong correlation between predicted and actual outcomes.
- Bias Control: Strategies were implemented to mitigate potential biases, including selection bias, sampling bias, algorithmic bias, and overfitting, ensuring fairness and transparency in the AI system.
These rigorous methods ensured that the developed AI models were not only accurate and reliable but also fair, transparent, and ethically sound, thus contributing to their trustworthiness and potential for effective implementation in medical education.
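The metric suite named above can be computed with `sklearn.metrics` as sketched below; the data is a synthetic hold-out split, not the study's, and a single train/test split stands in for the repeated cross-validation runs.

```python
# Sketch of the evaluation-metric suite (accuracy, precision, recall, F1,
# MCC, AUC-ROC, AUC-PRC) on a synthetic hold-out split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef, roc_auc_score,
                             average_precision_score)

X, y = make_classification(n_samples=144, n_features=10, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=7)

clf = RandomForestClassifier(random_state=7).fit(X_tr, y_tr)
pred = clf.predict(X_te)
prob = clf.predict_proba(X_te)[:, 1]  # probability of the positive class

metrics = {
    "accuracy": accuracy_score(y_te, pred),
    "precision": precision_score(y_te, pred),
    "recall": recall_score(y_te, pred),
    "f1": f1_score(y_te, pred),
    "mcc": matthews_corrcoef(y_te, pred),          # robust to class imbalance
    "auc_roc": roc_auc_score(y_te, prob),
    "auc_prc": average_precision_score(y_te, prob),  # PR-curve summary
}
for k, v in metrics.items():
    print(f"{k}: {v:.3f}")
```

Reporting threshold-based metrics (accuracy, F1, MCC) alongside ranking metrics (AUC-ROC, AUC-PRC) gives a fuller picture than any single score, particularly when the two achievement classes are imbalanced.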
Methodology to Achieve Objective 3: Predicting Assessment Outcomes
The final objective of the study was to utilize the developed AI models to predict the assessment outcomes of pre-clinical MBBS/BDS students. This involved applying the best-performing model, identified in Objective 2, to the preprocessed dataset and generating predictions for each student.
Data Preparation and Prediction
The preprocessed dataset, which included the relevant predictor variables and the target variable (high achiever or low achiever), was fed into the chosen AI model. The model, trained on historical data, utilized its learned patterns and relationships to generate predictions for each student in the dataset. These predictions indicated the likelihood of each student being classified as a 'high achiever' or a 'low achiever' based on their individual characteristics and academic performance.
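Generating per-student likelihoods from a fitted classifier can be sketched as follows; synthetic data and a Random Forest stand in for the study's preprocessed dataset and chosen best model.

```python
# Sketch of per-student predictions (synthetic stand-in data; in the study
# the tuned best model and the real preprocessed dataset would be used).
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=144, n_features=8, random_state=3)
model = RandomForestClassifier(random_state=3).fit(X, y)

results = pd.DataFrame({
    "surrogate_id": [f"S{i:03d}" for i in range(len(X))],
    "p_high_achiever": model.predict_proba(X)[:, 1],  # likelihood estimate
    "predicted_class": model.predict(X),  # 1 = high achiever, 0 = low achiever
})
print(results.head())
```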
Classification and Interpretation
Students were classified into two categories based on their assessment outcomes: high achievers and low achievers. High achievers were those who scored more than 70% in their assessments, while students scoring 70% or below were classified as low achievers. A new column was added to the Excel sheet to label each student as either a high achiever (1) or a low achiever (0). This binary classification was used as the target variable for the machine learning models.
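Deriving that label column is a one-liner in pandas, as the hedged sketch below shows (the column names are assumptions; the 70% cut-off is the one stated above).

```python
# Sketch of deriving the binary target from assessment scores.
# Column names are assumptions; the 70% cut-off is from the text.
import pandas as pd

df = pd.DataFrame({"surrogate_id": ["S001", "S002", "S003"],
                   "assessment_pct": [82.0, 55.0, 70.0]})

# 1 = high achiever (scored more than 70%), 0 = low achiever (70% or below)
df["high_achiever"] = (df["assessment_pct"] > 70).astype(int)
print(df)
```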
The confusion matrix, a table that compares the model's predictions to the actual outcomes, was generated to assess the accuracy of the classification.
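A minimal confusion-matrix computation with scikit-learn, on toy labels rather than the study's predictions:

```python
# Minimal confusion-matrix sketch (toy labels, not the study's results).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # actual: 1 = high achiever
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # model predictions

cm = confusion_matrix(y_true, y_pred)  # rows: actual, columns: predicted
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TP={tp} TN={tn} FP={fp} FN={fn}")
```

The four cells separate correct predictions (true positives and true negatives) from the two kinds of error, which is what makes the table more informative than a single accuracy figure.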
Identification of At-Risk Students and High Achievers
Based on the model's predictions, students classified as 'low achievers' were identified as potentially at-risk. These students were flagged for further attention and potential interventions to improve their academic performance. Conversely, students classified as 'high achievers' were recognized for their strong academic potential.
Anonymity and Confidentiality
Throughout the prediction and classification process, the anonymity and confidentiality of student data were strictly maintained. Surrogate IDs were used to represent individual students, ensuring that their personal identities remained protected.
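One simple way to implement the surrogate-ID step is sketched below; the column names are assumptions, and a real pipeline would keep the name-to-ID mapping in a separate, access-controlled location.

```python
# Sketch of replacing direct identifiers with surrogate IDs (column names
# are assumptions; the mapping would be stored separately and secured).
import pandas as pd

raw = pd.DataFrame({"name": ["Student A", "Student B", "Student C"],
                    "exam_score": [74.0, 61.5, 88.0]})

raw["surrogate_id"] = [f"S{i+1:03d}" for i in range(len(raw))]
anonymized = raw.drop(columns=["name"])       # strip direct identifiers
mapping = raw[["name", "surrogate_id"]]       # kept apart, access-controlled
print(anonymized)
```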