Research on Student Performance Prediction Based on Deep Learning

doi:10.21203/rs.3.rs-4967448/v1


This work is licensed under a CC BY 4.0 License


Student performance serves as a crucial indicator for assessing learning outcomes across various educational institutions. Conducting reasonable predictions and analyses can provide insights into students' future academic achievements. Predicting student performance is a significant research direction in the current exploration of educational data analysis patterns. Accurate predictions can assist educational administrators and teachers in promptly identifying academic risks, optimizing the allocation of scarce educational resources, and consequently improving education quality and learning outcomes. This paper establishes a Convolutional Neural Network (CNN) prediction model, constructing training and testing datasets using PyTorch tensors based on existing student information data. By extracting target information columns, the model identifies patterns within the data and ultimately predicts a student's score in a specific subject for their next exam. Upon completing the predictions, the results are displayed in a chart alongside existing scores for analysis, providing a reference for developing subsequent training programs.

Physical sciences/Mathematics and computing/Computational science

Physical sciences/Mathematics and computing/Computer science

Physical sciences/Mathematics and computing/Information technology

Deep learning

Grade prediction

Convolutional neural network

Feature extraction

In recent years, China's education sector has been undergoing extensive reform and development, with a significant increase in the number of newly enrolled students. For relevant functional departments, the advantages of large data storage capacity and diverse collection channels make it possible to use data mining techniques to more quickly and easily discover relationships within the data. This, in turn, provides scientific and rational references for decision-making, while also enhancing the credibility of national education policies¹.

In today's information era, the continuous development of relevant information technology provides solutions for various educational institutions². Traditional statistical methods are insufficient to fully explore the latent patterns within the multidimensional data, including existing student information, academic performance, classroom behavior, and psychological states. Instead, leveraging multiple technologies such as machine learning, deep learning, and image processing can facilitate the feature extraction and analysis of complex high-dimensional data. By using a Convolutional Neural Network (CNN) model to analyze and predict student performance, educators can utilize the interpretability inherent in these learning methods to clearly understand the learning progress differences among students in a particular class. This enables the formulation of differentiated teaching plans, optimizing the allocation of educational resources, and improving education quality. Conducting in-depth analysis and evaluation of campus data related to student performance, and identifying valuable attribute features for performance analysis and prediction, has become a current research focus.

As one of the earliest research directions in educational data mining, student performance prediction has been extensively studied by numerous scholars who have made significant contributions. The methods used in these studies can be categorized into two main types: traditional machine learning-based approaches and deep learning-based approaches.

Grade Prediction Based on Traditional Machine Learning Methods. Wang et al.³ combined students' gender, educational background, subject background, and online learning behavior data, and fed these into a decision tree to predict academic performance. Xiaoli Wang et al.⁴ proposed a weighted naive Bayes classification method based on mutual information and Bayesian classification algorithms to predict students' computer proficiency test scores. Ashenafi et al.⁵ extracted features capturing student activity from classroom discussion questions and answer ratings, and used a multiple linear regression model to predict course grades. Jintao Hu⁶ used decision trees to analyze all course grades, identified the impact of basic course grades on specific professional course grades, and used the generated rules to build a predictive model for professional course grades. Hillebrand et al.⁷ used Bagging, an ensemble learning meta-algorithm, to reduce generalization error and improve prediction accuracy by combining different models. Ahmed⁸ proposed using the GBDT algorithm to predict college students' performance in final exams.

Grade Prediction Based on Deep Learning Methods. Okubo⁹ proposed a method for predicting students' final grades by feeding log data stored in educational systems, such as attendance, video watching, and reports, into a recurrent neural network. Kalyani¹⁰ applied convolutional neural networks (CNNs) to predict student performance, considering that CNNs can mimic the human brain's behavior in analyzing and processing information, which helps solve problems beyond human capabilities. Hongjiang Cao¹¹ addressed the sequential nature of students' historical grades and the forgetting that occurs during learning by introducing an LSTM network to model the state of students' knowledge structure. According to the experimental results, integrating emotional and behavioral features significantly improved the accuracy of the grade prediction model. Qu et al.¹² constructed a grade prediction framework that uses an attention mechanism to adjust the weights of partial time behaviors and behavior patterns. Experiments showed that this model is more accurate than other methods. Aljaloud¹³ tackled the complex features and high processing difficulty of online learning systems by using a CNN to extract learning features from time-series data, which were then fed into an LSTM neural network for grade prediction, demonstrating the feasibility of this method. Tao Fang¹⁴ proposed an Att-LSTM model that filters key information out of a large number of input features, focusing on the features that significantly affect student performance, to predict grades.

Related technologies

Principles of Data Processing

Perform the following operations on the existing data. First, anonymize student information by replacing real names with specific non-numeric strings. Use letter codes as identifiers that indicate whether a student has certain special attributes, and assign values accordingly. Second, after importing the dataset, create a mapping dictionary that replaces every non-numeric string in the dataset with a numeric value. After replacement, define a data cleaning function that checks the converted numerical data for missing and abnormal values, issues warnings based on the check results, and applies corrections to ensure data quality. Finally, perform one-hot encoding on the verified data. The purpose is to convert the relevant feature columns into numerical values that deep learning algorithms can process, to keep the feature columns independent, and to improve the model's ability to learn from them. The principle of one-hot encoding is shown in Fig. 1.
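The mapping and cleaning steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the letter codes, the column names (borrowed from Table 1), and the assumed valid grade range of 0 to 100 are all hypothetical.

```python
import pandas as pd

# Hypothetical letter codes; the paper does not list the actual mapping.
GENDER_MAP = {"M": 0, "F": 1}
KEYCLASS_MAP = {"N": 0, "Y": 1}


def encode_strings(df: pd.DataFrame) -> pd.DataFrame:
    """Replace letter codes with integers using mapping dictionaries."""
    out = df.copy()
    out["Gender"] = out["Gender"].map(GENDER_MAP)      # unknown codes become NaN
    out["Keyclass"] = out["Keyclass"].map(KEYCLASS_MAP)
    return out


def check_quality(df: pd.DataFrame, low: float = 0.0, high: float = 100.0) -> pd.DataFrame:
    """Return rows with a missing value or a grade outside [low, high] (assumed range)."""
    missing = df.isna().any(axis=1)
    abnormal = ~df["Grade"].between(low, high)
    return df[missing | abnormal]


raw = pd.DataFrame({
    "Name": ["stu_a", "stu_b", "stu_c"],   # anonymized placeholder names
    "Gender": ["M", "F", "X"],             # "X" is not in the mapping
    "Keyclass": ["Y", "N", "N"],
    "Grade": [88.5, 130.0, 92.0],          # 130.0 is outside the assumed range
})
encoded = encode_strings(raw)
flagged = check_quality(encoded)
print(flagged)
```

Here the second row is flagged for an out-of-range grade and the third for an unmapped gender code. How flagged rows are corrected (dropped, imputed, or fixed by hand) is left open, since the paper does not specify it.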

Convolutional Neural Network

This study employs the Convolutional Neural Network (CNN) algorithm for deep learning analysis. Due to their efficiency in extracting implicit features and strong capability to hierarchically capture spatial structure information, some researchers use CNNs to extract latent features in students' learning activities¹⁵. This research uses the CNN feedforward algorithm as the foundational structure for student performance prediction analysis. The model structure is specifically divided into an input layer, convolution layer, pooling layer, fully connected layer, and output layer, as shown in Fig. 2.

Ethics declarations

This study confirms that all methods were conducted in accordance with relevant guidelines and regulations. This study confirms that all experimental protocols have been approved by Qingdao New Oriental School and Qingdao University of Science and Technology. This study confirms that informed consent has been obtained from all participants.

Data Sources

The dataset used in this study comes from Qingdao New Oriental School, containing historical exam scores and other relevant information about campus students, such as gender, age, and school attended. This dataset includes information on a total of 2,225 students from numerous schools, with the chosen subject for performance prediction being junior high school mathematics. To avoid involving personal privacy information, this study anonymized the relevant columns in the dataset before processing the information, ensuring the dataset's authenticity and validity. This, in turn, aims to achieve the most accurate results possible in subsequent prediction processes. Detailed student information is shown in Table 1.

Table 1

Student information data
Attribute Number	Attribute Name	Attribute Type
1	Name	varchar
2	Gender	varchar
3	Age	char
4	School	varchar
5	Keyclass	char
6	Grade	float

Data Processing

In this study, both the training and testing datasets must be converted into PyTorch tensors before the CNN model can process them. One-Hot Encoding transforms each category of data into a fixed-length binary vector, ensuring that the input data has a consistent shape and dimension. One-Hot Encoding maps discrete feature data into a high-dimensional space. The principle is to use an N-bit state register to encode N states, with exactly one bit set to 1 at any given time. This converts discrete variables into independent binary vectors and avoids misrepresenting the data during training. Let X be a categorical variable with k possible values, denoted as {x₁, x₂, x₃, ..., x_k}. The One-Hot Encoding for X can be represented as:

$$OneHot(X)=[{v_1},{v_2},{v_3}, \cdots ,{v_k}]$$

where each component v_i (i = 1, 2, ..., k) is defined as:

$$v_i=\begin{cases} 1, & x={x_i} \\ 0, & x \ne {x_i} \end{cases}$$

The vectors generated by One-Hot Encoding are used in a sparse representation. This method can significantly reduce computational complexity and memory consumption while enhancing the model's ability to distinguish between different categories and capture subtle differences between features.
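As a small worked instance of the definition above, the sketch below encodes one value of a categorical variable in plain Python. The category names are placeholders, not values from the study's dataset.

```python
def one_hot(value, categories):
    """Return the one-hot vector of `value` over an ordered list of categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    # Component i is 1 only where the value equals the i-th category.
    return [1 if value == category else 0 for category in categories]


schools = ["school_a", "school_b", "school_c"]  # hypothetical category values
vec = one_hot("school_b", schools)
print(vec)  # [0, 1, 0]
```

Each vector has length k (here 3) and contains exactly one 1, so no artificial ordering between categories is introduced.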

To mitigate the large differences in magnitude caused by varying scales, the dataset must be normalized by scaling the data into a similar range, which improves accuracy. Min-Max scaling¹⁶ is one of the most widely used normalization methods: the minimum value of each feature is mapped to 0, the maximum value to 1, and all other values to decimals between 0 and 1. In this study, Min-Max normalization is applied to the student information, scaling all previously encoded numerical features into this range. The processed feature values are given by Eq. (3):

$$x'=\frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where x is an original feature value, x_min and x_max are the minimum and maximum of that feature, and x' is the normalized value.
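The Min-Max scaling described above can be sketched with NumPy as follows. The function name and the sample values are illustrative assumptions; the handling of constant columns is a design choice added here to avoid division by zero, not something the paper discusses.

```python
import numpy as np


def min_max_scale(features: np.ndarray) -> np.ndarray:
    """Scale each column of a 2-D float array into [0, 1]."""
    col_min = features.min(axis=0)
    col_max = features.max(axis=0)
    span = col_max - col_min
    span[span == 0] = 1.0  # constant columns map to 0 instead of dividing by zero
    return (features - col_min) / span


# Hypothetical rows of (age, grade)
ages_and_grades = np.array([[12.0, 60.0], [13.0, 90.0], [14.0, 75.0]])
scaled = min_max_scale(ages_and_grades)
print(scaled)
```

In practice the minimum and maximum are usually computed on the training split only and then reused for the test split, so that no information from the test data leaks into training.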

In this study, one-hot encoding transformed the categorical features of the original data into numerical features. These categorical features are specific columns of the dataset; encoding them preserves their independence and makes them compatible with convolutional neural network models. After normalization, all data in the dataset can be converted into two-dimensional PyTorch tensors (for example with `torch.tensor()`), and their dimensions can be checked with the tensor's `.size()` method, in preparation for model training. The normalized feature data and the label data from the original dataset are stored separately in two tensors, X and Y. After the data is divided into training and testing sets, the dimensions are as follows: the training set features `train_X` are torch.Size([1780,7]), the training set labels `train_Y` are torch.Size([1780,1]), the testing set features `test_X` are torch.Size([445,7]), and the testing set labels `test_Y` are torch.Size([445,1]).
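The tensor shapes listed above correspond to an 80/20 split of 2,225 rows. The sketch below reproduces those shapes with random placeholder data; the random shuffling and the seed are assumptions, since the paper does not say how the split was drawn.

```python
import torch

torch.manual_seed(0)
n_students, n_features = 2225, 7
X = torch.rand(n_students, n_features)   # placeholder for the normalized features
Y = torch.rand(n_students, 1)            # placeholder for the score labels

n_train = n_students * 4 // 5            # 80% of 2225 = 1780
perm = torch.randperm(n_students)        # assumed: random shuffle before splitting
train_idx, test_idx = perm[:n_train], perm[n_train:]

train_X, train_Y = X[train_idx], Y[train_idx]
test_X, test_Y = X[test_idx], Y[test_idx]
print(train_X.size(), train_Y.size(), test_X.size(), test_Y.size())
```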

Experimental environment and parameters

In this study, the PyTorch deep learning framework and the CNN (Convolutional Neural Network) algorithm were employed. PyTorch tensors, used instead of NumPy's ndarray data structure, offer automatic differentiation capabilities. This feature enables automatic computation of tensor gradients, which plays a crucial role in training neural networks. The environment parameters configuration is detailed in Table 2.

Table 2

Parameter configuration of Python
Parameter	Configuration
Pandas	1.1.5
Numpy	1.19.5
Torch	1.10.2
Matplotlib	3.3.4
Scikit-learn	0.24.2
Keras	2.10.0

Training parameters

The parameters for training the CNN network model in this article are shown in Table 3.

Table 3

CNN network dataset parameter configuration
Parameter	Configuration
data_frame	[2225,13]
features_columns	[2225,8]
train_X.shape	torch.Size([1780,7])
train_y.shape	torch.Size([1780,1])
test_X.shape	torch.Size([445,7])
test_y.shape	torch.Size([445,1])
batch_size	32
num_epochs	100

Training Process

Before constructing the CNN neural network model, it is essential to set up a crucial component: the data loader. The data loader divides input data into multiple small batches, loading only one batch at a time during each training iteration. This approach ensures faster gradient descent and parameter updates when using batched data. Data loaders are used to manage and store data for both the training and testing sets, ensuring efficient handling and utilization of data throughout the model training process.
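A data loader of the kind described above can be sketched with PyTorch's `DataLoader` and `TensorDataset`. The batch size of 32 and the training set shape come from Table 3; the random tensors and the choice to shuffle are placeholders and assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

train_X = torch.rand(1780, 7)   # placeholder features with the shape from Table 3
train_Y = torch.rand(1780, 1)   # placeholder labels

# shuffle=True is an assumption; the paper does not state whether batches are shuffled.
train_loader = DataLoader(TensorDataset(train_X, train_Y), batch_size=32, shuffle=True)

batch_shapes = [tuple(xb.shape) for xb, _ in train_loader]
n_batches = len(train_loader)
print(n_batches, batch_shapes[0], batch_shapes[-1])
```

With 1,780 samples and a batch size of 32, each epoch consists of 55 full batches plus one final batch of 20 samples.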

During the training process of the CNN network model, an outer loop first iterates over the number of training epochs, setting the model to training mode to enable specific layers such as dropout and batch normalization. Throughout training, backpropagation is repeatedly used to compute gradients for each parameter. However, repeated backpropagation can lead to gradient explosion, causing the model to fail to train properly and producing overflow errors in numerical computations. Gradient clipping is therefore introduced to prevent such distortions. After training is completed, the test data is fed into the trained model to compute the output. The loss function is then used to measure the loss between the predicted values and the true labels. The model's performance is ultimately validated using the Mean Squared Error (MSE). A lower MSE value, approaching zero, indicates better model performance. The formula for calculating MSE is shown in Eq. (4), where n is the number of samples, $y_i$ is the true value of the i-th sample, and $\hat{y}_i$ is the predicted value of the i-th sample.

$$MSE=\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
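The training procedure described above (epoch loop, training mode, backpropagation, gradient clipping, MSE evaluation) can be sketched as follows. This is an illustration under stated assumptions: the small fully connected network is only a stand-in for the paper's CNN, and the synthetic data, Adam optimizer, learning rate, clipping threshold, and 20 epochs (the paper uses 100) are all hypothetical choices made to keep the example short.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X = torch.rand(256, 7)                        # synthetic features
Y = X.sum(dim=1, keepdim=True) / 7            # synthetic target in [0, 1]
loader = DataLoader(TensorDataset(X, Y), batch_size=32, shuffle=True)

# Stand-in model, not the paper's CNN.
model = nn.Sequential(nn.Linear(7, 16), nn.ReLU(), nn.Linear(16, 1))
criterion = nn.MSELoss()                      # mean squared error, as in Eq. (4)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)


def evaluate(net: nn.Module, inputs: torch.Tensor, targets: torch.Tensor) -> float:
    """Compute the MSE in evaluation mode without tracking gradients."""
    net.eval()
    with torch.no_grad():
        return criterion(net(inputs), targets).item()


mse_before = evaluate(model, X, Y)
for epoch in range(20):
    model.train()                             # enables dropout / batch norm if present
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()                       # backpropagation
        # Rescale gradients whose total norm exceeds 1.0 (assumed threshold).
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
mse_after = evaluate(model, X, Y)
print(mse_before, mse_after)
```

Clipping by total norm keeps the direction of the gradient while bounding its size, which is one common way to limit the effect of occasional very large gradients.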

In this study, the constructed CNN model incorporates an adaptive pooling layer, which automatically calculates the size of each pooling window based on the input feature map size and the target output size. This simplifies the feature extraction process. After passing through the adaptive pooling layer, the lengths of the two sets of tensor sequences are adjusted to 10. These tensors are then flattened according to the channel number and sequence length and sequentially fed into two fully connected layers. Each set of feature values is combined with the initial weights of the input and output nodes and mapped to the predicted values. Consequently, the shapes of the tensors for the training and testing sets are adjusted accordingly.
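One possible reading of this architecture is sketched below: a 1-D convolution, an adaptive pooling layer that fixes the sequence length at 10, a flatten step, and two fully connected layers. The class name, channel count, kernel size, hidden width, activation, and the use of average rather than max pooling are assumptions; the paper specifies only the pooled length and the number of fully connected layers.

```python
import torch
from torch import nn


class ScoreCNN(nn.Module):
    """Hypothetical 1-D CNN: conv -> adaptive pooling -> two fully connected layers."""

    def __init__(self, channels: int = 16, pooled_len: int = 10, hidden: int = 32):
        super().__init__()
        self.conv = nn.Conv1d(1, channels, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveAvgPool1d(pooled_len)   # window size derived from input length
        self.fc1 = nn.Linear(channels * pooled_len, hidden)
        self.fc2 = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.unsqueeze(1)                  # (batch, 7) -> (batch, 1, 7)
        x = torch.relu(self.conv(x))        # (batch, channels, 7)
        x = self.pool(x)                    # (batch, channels, 10)
        x = x.flatten(start_dim=1)          # (batch, channels * 10)
        x = torch.relu(self.fc1(x))
        return self.fc2(x)                  # (batch, 1) predicted score


model = ScoreCNN()
batch = torch.rand(32, 7)                   # one batch of 7 normalized features
pooled = model.pool(torch.relu(model.conv(batch.unsqueeze(1))))
prediction = model(batch)
print(pooled.shape, prediction.shape)
```

Because the pooling layer is adaptive, the first fully connected layer always receives `channels * 10` inputs, regardless of how many feature columns the input has.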

Model innovation

Compared to Recurrent Neural Networks (RNNs), this study opts to use Convolutional Neural Networks (CNNs) for data training and prediction for several reasons:

High Spatial Structure Utilization: The convolutional layers in CNN models can effectively capture spatial local features in the input data. For two-dimensional tensors, the convolution operation achieves weight sharing spatially. As the convolution kernel slides systematically over the input data, the model can extract features from various parts in a directional sequence.

Parameter Sharing in Convolutions: During convolution operations, the same convolutional kernel is applied across the entire input image, with the weights on the convolution kernel remaining identical. This significantly reduces the number of parameters the model needs to process, thereby enhancing computational efficiency and the model's generalization ability. Additionally, parameter sharing confers translational invariance to the convolutional layers, allowing the convolutional kernel to correctly identify features regardless of where they appear in the data.

Multi-Scale Feature Extraction: Through multiple layers of convolution and pooling operations, CNN models can progressively extract multi-scale features from the data. Low-level convolutions can capture edge features in the input data, while high-level convolutions can extract object features contained within the input data.
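The effect of parameter sharing can be made concrete by counting parameters. The sketch below compares a 1-D convolution with a fully connected layer that produces the same number of outputs from a length-7 input; the layer sizes are illustrative and not taken from the paper.

```python
import torch
from torch import nn


def count_params(module: nn.Module) -> int:
    """Total number of trainable values in a module."""
    return sum(p.numel() for p in module.parameters())


conv = nn.Conv1d(in_channels=1, out_channels=8, kernel_size=3)  # one 3-wide kernel per channel
dense = nn.Linear(7, 8 * 5)                                     # same output size, no sharing

x = torch.rand(4, 1, 7)
conv_out = conv(x)                     # each kernel slides over 5 positions
print(tuple(conv_out.shape), count_params(conv), count_params(dense))
```

The convolution needs 8 × 3 weights plus 8 biases (32 parameters) because each kernel is reused at every position, while the dense layer needs 7 × 40 weights plus 40 biases (320 parameters) for the same output size.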

Recurrent neural networks (RNNs) are commonly used to analyze time series data. To show more intuitively whether a CNN or an RNN is better suited to processing student data and making predictions, this study uses an LSTM-based algorithm, xLSTM, for comparison¹⁷.

The xLSTM algorithm extends upon LSTM to enhance model performance and the capability to handle complex sequential data. It introduces an exponential gating mechanism to provide the model with more dynamic information filtering capabilities. Additionally, it incorporates additional normalization and stabilization techniques to mitigate numerical stability issues arising from the exponential gating. The design objective of xLSTM is to address limitations of traditional LSTM in handling large-scale data and long sequences, such as poor parallelism and limited storage capacity. By introducing new gating mechanisms and memory structures, xLSTM aims to be more competitive in deep learning applications. The prediction results of xLSTM are depicted in Fig. 3.

Based on the parameters described above, the CNN model is constructed and training is started. The training is set to run for 100 epochs so that the results are as accurate as possible. The CNN algorithm is implemented in Python: the target student's data is stored as a NumPy array and then passed through feature extraction, Min-Max normalization, regularization, and tensor conversion. Finally, the transformed data tensors are fed into the CNN model, which computes and outputs the predicted values and updates the original dataset. The prediction results of the CNN model are shown in Fig. 4.

The mean squared error (MSE) and root mean squared error (RMSE) of the CNN model and the xLSTM algorithm are compared in Fig. 5.
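For reference, the two metrics can be computed as in the sketch below. The scores are made-up values chosen for easy arithmetic and are not results from the study.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between two equally long, non-empty sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the same unit as the scores."""
    return math.sqrt(mse(y_true, y_pred))


actual = [80.0, 90.0, 70.0]       # hypothetical true scores
predicted = [82.0, 88.0, 71.0]    # hypothetical predictions
print(mse(actual, predicted), rmse(actual, predicted))
```

The squared errors here are 4, 4, and 1, so the MSE is 3.0 and the RMSE is the square root of 3, roughly 1.73 score points.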

In comparison to the predictions of the xLSTM algorithm, the CNN model performs better in both numerical prediction and model performance validation. Despite xLSTM's strong performance in handling time series data, in this study, the CNN model excels in predicting and analyzing student grade data over an entire academic year. The fixed two-dimensional tensor dimensions are advantageous for convolutional operations in the CNN model. Conversely, models built on xLSTM require the decomposition of two-dimensional tensors into one-dimensional tensors for training and analysis, potentially leading to feature loss during intermediate processes, thereby affecting prediction outcomes.

This paper analyzes student performance in the large-scale exams of one school year, incorporating personal information such as age, gender, and school to conduct a multidimensional performance analysis. The model is a CNN built in the PyTorch environment. Compared with recurrent neural networks (RNNs) and Transformers, CNNs are better suited to spatially structured, large-scale data. Their parameter sharing, automatic feature extraction, and strong transfer learning capabilities distinguish CNNs among these technologies and make them widely applicable across domains. This study therefore chooses a CNN model for performance prediction, which ensures the stability and accuracy of the predictions to a certain extent. After multiple training iterations, the predicted results are validated with the mean squared error, which lends a degree of confidence to the reliability of using this network model for predictive analysis.

The model still has its limitations. For instance, the performance data only considers scores from large-scale exams within the current school year, which spans a significant period. The study interval of one month between exams may lead to substantial fluctuations in students' day-to-day learning states, thereby questioning the reliability of the chosen exam scores. Additionally, the collected personal information about students, including age, school, and enrollment in specialized classes, are factors that could significantly influence performance but are immutable. The study did not explore potentially influential mutable factors such as classroom behavior and psychological activities of the students. Considering these variables could potentially alter the predictive outcomes. These are aspects that warrant further investigation.

Author Contribution

W.X. proposed the research directions and innovative points, and analyzed the experimental results. C.G. provided the dataset and wrote the code for the experimental part of the paper. H.Y. was responsible for writing, polishing, and formatting the paper. All authors have read and agreed to the published version of the manuscript.

Data Availability

The dataset generated and analyzed in this study cannot be made public as it contains information that may compromise the privacy of research participants. However, it can be obtained from the corresponding authors upon reasonable request.

References

1. Wang Zang. Research on Learning Status and Grade Prediction Method Based on Deep Learning [D]. Guizhou University (2023). DOI: 10.27047/d.cnki.ggudu.2023.000655
2. Sayed, A. R., Khafagy, M. H., Ali, M. & Mohamed, M. H. Predict student learning styles and suitable assessment methods using click stream.
3. Wang, G. H., Zhang, J. & Fu, G. S. Predicting student behaviors and performance in online learning using decision tree. 2018 Seventh International Conference of Educational Innovation through Technology (EITT 2018), 214–219 (2018).
4. Wang Xiaoli & Yuan Junhong. Grade Prediction Model Based on Weighted Naive Bayes Classification Method. Electronics Technology and Software Engineering (19), 225–226 (2013).
5. Ashenafi, M. M., Riccardi, G. & Ronchetti, M. Predicting students' final exam scores from their course activities. Frontiers in Education Conference (FIE), 372–380 (2015).
6. Hu Jintao. Research and Implementation of Student Performance Prediction Teaching System Based on C4.5 Decision Tree [D]. Southwest Jiaotong University (2017).
7. Hillebrand, E., Lukas, M. & Wei, W. Bagging weak predictors. Int. J. Forecast. 37 (1), 237–254 (2021).
8. Ahmed, D. M., Abdulazeez, A. M., Zeebaree, D. Q. & Ahmed, F. Y. H. Predicting university's students performance based on machine learning techniques. 2021 IEEE International Conference on Automatic Control & Intelligent Systems (I2CACIS) (2021).
9. Okubo, F., Yamashita, T., Shimada, A. & Ogata, H. A neural network approach for students' performance prediction. Seventh International Learning Analytics & Knowledge Conference (LAK'17), 598–599 (2017).
10. Kalyani, B. S., Harisha, D., Ramyakrishna, V. & Manne, S. Evaluation of students performance using neural networks. Intelligent Computing, Information and Control Systems (ICICCS 2019) 1039, 499–505 (2020).
11. Cao Hongjiang & Xie Jin. Research on Learning Performance Prediction Based on LSTM and Its Influencing Factors. Journal of Beijing University of Posts and Telecommunications (Social Sciences Edition) 22 (06), 90–100 (2020).
12. Qu, S. J., Li, K., Wu, B., Zhang, S. H. & Wang, Y. C. Predicting student achievement based on temporal learning behavior in MOOCs. Appl. Sciences-Basel 9 (24) (2019).
13. Aljaloud, A. S. et al. A deep learning model to predict student learning outcomes in LMS using CNN and LSTM. IEEE Access 10, 85255–85265 (2022).
14. Fang Tao. Research on Student Performance Prediction Model Based on Ensemble Learning Algorithms [D]. Guilin University of Electronic Technology (2022).
15. Wang, C. Research on Performance Prediction Method Based on Student Online Learning Behavior [D]. Guilin University of Electronic Technology (2022). DOI: 10.27049/d.cnki.ggldc.2022.000414
16. Deng, J., Huang, X. & Ren, X. A multidimensional analysis of self-esteem and individualism: A deep learning-based model for predicting elementary school students' academic performance.
17. Sun Ruiqi. Research on Prediction Model of US Stock Index Price Trends Based on LSTM Neural Network [D]. Capital University of Economics and Business (2016).

No competing interests reported.


Editorial decision: Revision requested
30 Oct, 2024
Reviews received at journal
27 Oct, 2024
Reviewers agreed at journal
15 Oct, 2024
Reviews received at journal
07 Oct, 2024
Reviewers agreed at journal
13 Sep, 2024
Reviewers agreed at journal
12 Sep, 2024
Reviewers invited by journal
10 Sep, 2024
Editor assigned by journal
10 Sep, 2024
Editor invited by journal
09 Sep, 2024
Submission checks completed at journal
06 Sep, 2024
First submitted to journal
24 Aug, 2024
