The problem was studied, and the research carried out, following the Data Science Process. This process was chosen because it provides all the basic steps needed to predict students’ academic performance from their grades.
3.1 Research Goal
To understand the problem more formally, a literature review was performed to study existing work and related studies. Many researchers have analyzed student performance. Most of them studied the effect of variables such as attendance, personal effort and study time on performance. Some also considered students’ current CGPA, but none of them used students’ grades in individual courses as attributes.
The primary goal of higher education institutions is to provide quality education to their students. An accurate prediction of students’ performance helps identify low-performing students early in the learning process. One way to achieve this is to discover knowledge in past academic records that can be used to predict students’ performance. This is done using machine learning and data science tools and techniques. The model is intended to identify and extract potentially valuable knowledge for foreseeing poor and good performance in different courses.
The main objective of this research is to predict students’ future grades from their past grades and the grades of students who have already graduated, in order to lower the dropout rate and to inform students of their expected grades and performance in a course.
3.2 Data Collection
The data was collected from FAST – National University of Computer and Emerging Sciences. The university administration initially provided three years of data, in which each student had been encoded to preserve the anonymity of students’ grades and the confidentiality of the data. The data provided has 13,956 rows and 11 columns; each row records one student’s registration in one course. It covers the bachelor’s degree programs in Computer Science and Electrical Engineering.
The attributes of the data collected are the following:
- Roll No: The student’s identification number in the university record.
- Program: The degree program in which the student is enrolled.
- Course: The code of the course in which the student is registered.
- Course Title: The title of the course in which the student is registered.
- Credit Hours: The number of credit hours of the course, i.e., the weekly hours the student has to complete.
- GP: The grade points the student scored in the course.
- Grade: The student’s grade in the course.
- Sem.: The year and semester in which the course was taken.
- Batch: The batch (year) of the student’s enrollment.
The collected data is highly redundant, and each student’s observations are scattered throughout the raw data. After cleaning and transformation, the final data contained only 500 rows to which the classifiers could be applied. This was sufficient for the Naïve Bayes algorithm, but 500 observations were not enough for the Neural Network classifier. Therefore, to apply the classifiers and make better predictions from past data, more data about students’ past grades and registrations was collected; the new data contains 34,476 rows. It has the same structure as the previously collected data but contains different records. After transformation and cleaning, it yielded the grades of 839 students in different courses.
3.3 Data Preparation
As discussed above, the acquired data contains redundant and irrelevant information that is not needed for prediction and modelling. Furthermore, its structure must be changed, because in its raw form it cannot be used as input to classification models.
3.3.1 Data Integration
The two datasets had to be combined for the Neural Network and SVM classifiers. Microsoft Excel and its tools were used to integrate them into a single dataset.
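The integration itself was done in Excel. As an illustration only, the same step can be sketched in plain Python, assuming each extract is a list of rows with identical fields. The field names (`roll_no`, `course`, `grade`) and the course code `CS101` are hypothetical, and dropping exact duplicate rows is an added precaution rather than a documented step of the study.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]  # one registration record; field names are hypothetical


def integrate(first: List[Row], second: List[Row]) -> List[Row]:
    """Append `second` to `first`, keeping only the first copy of any identical row."""
    seen: set = set()
    combined: List[Row] = []
    for row in first + second:
        key: Tuple[Tuple[str, str], ...] = tuple(sorted(row.items()))  # whole-row identity
        if key not in seen:
            seen.add(key)
            combined.append(row)
    return combined


batch_a = [{"roll_no": "1", "course": "CS101", "grade": "3.33"}]
batch_b = [
    {"roll_no": "2", "course": "CS101", "grade": "2.67"},
    {"roll_no": "1", "course": "CS101", "grade": "3.33"},  # exact duplicate of batch_a's row
]
combined = integrate(batch_a, batch_b)
print(len(combined))  # 2
```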
3.3.2 Data Transformation
As discussed above, the acquired data is not in the required form and is not structurally suitable for modelling. More precisely, the information needed for a single student’s row is scattered across the whole dataset: in the acquired data, each student-course pair (i.e., each registration) is a separate record, whereas modelling requires each student to occupy a single row containing all of his or her courses. This transformation was performed using Power Pivot, a Microsoft Excel add-in for data modelling and analysis. Tables 1 and 2 illustrate the transformation.
Table 1 - Data before Transformation
Roll No. | Course Title | Grade
1 | Computer Programming | 3.33
2 | Data Structures | 2.33
1 | Discrete Structures | 2.00
1 | Data Structures | 2.67
2 | Computer Programming | 3.33
Table 2 - Data after Transformation
Roll No. | Computer Programming | Data Structures | Discrete Structures
1 | 3.33 | 2.67 | 2.00
2 | 3.33 | 2.33 |
After this transformation, the appropriate course columns are selected for the modelling process.
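Tables 1 and 2 describe a long-to-wide pivot followed by column selection. A minimal sketch in plain Python, using exactly the rows of Table 1, might look as follows. The function names are hypothetical, and keeping the last registration when a course appears twice is an assumption, since the text does not say how repeated courses were handled.

```python
from typing import Dict, List, Optional, Tuple

# Long format, one registration per row, exactly as in Table 1.
registrations: List[Tuple[int, str, float]] = [
    (1, "Computer Programming", 3.33),
    (2, "Data Structures", 2.33),
    (1, "Discrete Structures", 2.00),
    (1, "Data Structures", 2.67),
    (2, "Computer Programming", 3.33),
]


def pivot(rows: List[Tuple[int, str, float]]) -> Dict[int, Dict[str, float]]:
    """Group registrations by student: {roll_no: {course_title: grade}}."""
    wide: Dict[int, Dict[str, float]] = {}
    for roll_no, course, grade in rows:
        # If a course were repeated, the later row would overwrite the earlier one (assumption).
        wide.setdefault(roll_no, {})[course] = grade
    return wide


def select_columns(wide: Dict[int, Dict[str, float]],
                   courses: List[str]) -> Dict[int, List[Optional[float]]]:
    """Keep only the chosen course columns; None marks a missing grade."""
    return {roll_no: [grades.get(c) for c in courses] for roll_no, grades in wide.items()}


courses = ["Computer Programming", "Data Structures", "Discrete Structures"]
table2 = select_columns(pivot(registrations), courses)
for roll_no, row in sorted(table2.items()):
    print(roll_no, row)
# 1 [3.33, 2.67, 2.0]
# 2 [3.33, 2.33, None]
```

In pandas, `DataFrame.pivot_table(index=..., columns=..., values=...)` performs the same reshaping; note that its default `aggfunc` averages repeated registrations instead of keeping the last one.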
3.3.3 Data Cleaning
After transformation, the data has many missing values and some other errors. The cleaning process is described below step by step.
Missing Values
After the transformation, the dataset contains missing values (see Table 3). A student may have a missing grade in a course because:
- The student is enrolled in the other degree program (i.e., Electrical Engineering).
- The student has not completed the degree or has dropped out of the university.
- The course was renamed or discontinued in later semesters.
In case 3, the observations were merged with those of the renamed course. In cases 1 and 2, observations were removed if the student had not completed the chain of courses or was enrolled in another degree program.
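The cleaning rules for the three cases can be sketched as follows. This is a hypothetical illustration: the rename mapping, the course titles and the program codes `CS` and `EE` are placeholders, not the values actually used in the study.

```python
from typing import Dict, List, Optional

Grades = Dict[str, Optional[float]]

# Hypothetical mapping from a discontinued course title to its renamed successor.
RENAMED = {"Old Course Title": "New Course Title"}


def merge_renamed(grades: Grades) -> Grades:
    """Case 3: move a grade recorded under an old title to the renamed course."""
    merged = dict(grades)
    for old, new in RENAMED.items():
        old_grade = merged.pop(old, None)
        if old_grade is not None and merged.get(new) is None:
            merged[new] = old_grade
    return merged


def clean(students: Dict[int, Grades], programs: Dict[int, str],
          chain: List[str], program: str = "CS") -> Dict[int, Grades]:
    """Cases 1 and 2: drop students from another program or with an incomplete chain."""
    kept: Dict[int, Grades] = {}
    for roll_no, grades in students.items():
        grades = merge_renamed(grades)
        if programs.get(roll_no) != program:
            continue  # case 1: enrolled in another degree program
        if any(grades.get(course) is None for course in chain):
            continue  # case 2: chain of courses not completed
        kept[roll_no] = grades
    return kept


students = {
    1: {"Old Course Title": 3.0, "Data Structures": 2.67},
    2: {"Data Structures": 2.33},
    3: {"New Course Title": 3.67, "Data Structures": 3.0},
}
programs = {1: "CS", 2: "CS", 3: "EE"}
cleaned = clean(students, programs, chain=["New Course Title", "Data Structures"])
print(cleaned)  # {1: {'Data Structures': 2.67, 'New Course Title': 3.0}}
```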
3.3.4 Exploratory Data Analysis
Exploratory data analysis is necessary because information is much easier to grasp when it is shown visually. The goal of this phase is not only to clean the data but also to discover irregularities or exceptions that were missed earlier.
Graphs, plots and charts are used to explore the data, and they are combined to provide further insight.
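Plots are the main exploration tool here, but a simple numeric check can complement them. The sketch below, with hypothetical course keys (`CP`, `DS`) and hypothetical grades, computes the Pearson correlation between two course columns over the students who have grades in both; a high value would be consistent with the course-chain dependency used for variable selection.

```python
from math import sqrt
from typing import Dict, List, Optional


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def course_correlation(table: Dict[int, Dict[str, Optional[float]]], a: str, b: str) -> float:
    """Correlate grades in courses a and b over students who have both grades."""
    pairs = [(g[a], g[b]) for g in table.values()
             if g.get(a) is not None and g.get(b) is not None]
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return pearson(xs, ys)


# Hypothetical grades; student 4 is skipped because the DS grade is missing.
table = {
    1: {"CP": 3.33, "DS": 2.67},
    2: {"CP": 2.00, "DS": 1.67},
    3: {"CP": 4.00, "DS": 3.67},
    4: {"CP": 2.67, "DS": None},
}
r = course_correlation(table, "CP", "DS")
print(round(r, 2))
```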
3.4 Building Model and Evaluation
3.4.1 Model Selection
The methods were selected because they can classify students into four categories based on their previous academic performance at the university.
The models selected are:
1. Naïve Bayes:
In machine learning, Naïve Bayes is a probabilistic classifier that applies Bayes’ theorem with strong (naïve) independence assumptions between the features. The Naïve Bayes classifier was selected because the grades of future courses in a particular course chain must be predicted given the grades of the previous courses. Naïve Bayes is well suited to such conditional (“given that”) problems, because it directly estimates the probability of each class given the observed features.
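For example, the grade in the Database course (DB) can be predicted from the grades of the preceding courses in its chain (ITC, CP and DS) by applying Bayes’ theorem; the second equation follows from the naïve assumption that the three grades are conditionally independent given DB: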
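$$P(\mathrm{DB} \mid \mathrm{ITC}, \mathrm{CP}, \mathrm{DS}) = \frac{P(\mathrm{ITC}, \mathrm{CP}, \mathrm{DS} \mid \mathrm{DB})\,P(\mathrm{DB})}{P(\mathrm{ITC}, \mathrm{CP}, \mathrm{DS})}$$
$$P(\mathrm{DB} \mid \mathrm{ITC}, \mathrm{CP}, \mathrm{DS}) = \frac{P(\mathrm{ITC} \mid \mathrm{DB})\,P(\mathrm{CP} \mid \mathrm{DB})\,P(\mathrm{DS} \mid \mathrm{DB})\,P(\mathrm{DB})}{P(\mathrm{ITC}, \mathrm{CP}, \mathrm{DS})}$$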
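The second formula can be turned into a small categorical Naïve Bayes classifier. The sketch below is written from scratch for illustration; the letter-grade labels, the toy training rows and the use of Laplace (add-one) smoothing are assumptions, not details from the study. The denominator P(ITC, CP, DS) is obtained by normalizing the scores over all classes.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with add-one (Laplace) smoothing."""

    def fit(self, X: List[Sequence[str]], y: List[str]) -> "CategoricalNaiveBayes":
        self.n = len(y)
        self.class_counts = Counter(y)
        n_features = len(X[0])
        # counts[j][c][v]: number of rows with class c whose feature j equals v
        self.counts = [defaultdict(Counter) for _ in range(n_features)]
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(X, y):
            for j, v in enumerate(row):
                self.counts[j][c][v] += 1
                self.values[j].add(v)
        return self

    def posterior(self, row: Sequence[str]) -> Dict[str, float]:
        scores: Dict[str, float] = {}
        for c, n_c in self.class_counts.items():
            score = n_c / self.n  # prior P(DB = c)
            for j, v in enumerate(row):
                k = len(self.values[j])  # number of distinct values seen for feature j
                score *= (self.counts[j][c][v] + 1) / (n_c + k)  # smoothed P(x_j = v | DB = c)
            scores[c] = score
        evidence = sum(scores.values())  # estimate of P(ITC, CP, DS)
        return {c: s / evidence for c, s in scores.items()}

    def predict(self, row: Sequence[str]) -> str:
        post = self.posterior(row)
        return max(post, key=post.get)


# Hypothetical toy data: columns are (ITC, CP, DS), label is the DB grade.
X = [("A", "A", "B"), ("A", "B", "B"), ("B", "B", "C"),
     ("C", "C", "C"), ("B", "C", "C"), ("A", "A", "A")]
y = ["A", "B", "B", "C", "C", "A"]
model = CategoricalNaiveBayes().fit(X, y)
print(model.predict(("A", "A", "A")))  # A
```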
2. Neural Networks:
Neural networks in machine learning are built from units based on the perceptron, which takes multiple binary inputs and produces a single binary output. Each input has a weight that represents its importance, so different attributes can be weighted differently when making a decision. A neural network was selected because the input grades can be weighted, allowing the influence of each prerequisite course on the predicted course grade to be represented.
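A single perceptron can be sketched in a few lines. The inputs, weights and threshold below are hypothetical, chosen only to show how a heavily weighted input dominates the decision.

```python
from typing import Sequence


def perceptron(inputs: Sequence[int], weights: Sequence[float], threshold: float) -> int:
    """Output 1 if the weighted sum of the binary inputs exceeds the threshold, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


# Hypothetical inputs: 1 if a prerequisite course was passed with a good grade, else 0.
weights = [0.7, 0.3]  # the first prerequisite is assumed to matter more
print(perceptron([1, 0], weights, threshold=0.5))  # 1
print(perceptron([0, 1], weights, threshold=0.5))  # 0
```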
The neurons use activation functions such as sigmoid, tanh (hyperbolic tangent) and ReLU (rectified linear unit). ReLU is used in the hidden layers of our model; because its output is not a probability, the output layer uses the softmax function to compute the class probabilities.
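The forward pass of a network with one ReLU hidden layer and a softmax output layer can be sketched as below. The layer sizes and weight values are arbitrary, untrained placeholders (four output neurons, matching the four categories mentioned earlier); in practice the weights are learned during training, for example with backpropagation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row of weights per neuron


def dense(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Weighted sum plus bias for each neuron in a fully connected layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + bias for row, bias in zip(W, b)]


def relu(v: Vector) -> Vector:
    return [max(0.0, z) for z in v]


def softmax(v: Vector) -> Vector:
    m = max(v)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in v]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x: Vector, W1: Matrix, b1: Vector, W2: Matrix, b2: Vector) -> Vector:
    hidden = relu(dense(x, W1, b1))        # hidden layer with ReLU
    return softmax(dense(hidden, W2, b2))  # output layer: class probabilities


# Untrained placeholder weights: 3 input grades -> 2 hidden neurons -> 4 classes.
W1 = [[0.5, 0.3, 0.2], [-0.4, 0.1, 0.6]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [0.5, 0.5], [-0.5, 1.0], [0.0, 0.2]]
b2 = [0.0, 0.0, 0.0, 0.0]
probs = forward([3.33, 2.67, 2.00], W1, b1, W2, b2)
print([round(p, 3) for p in probs])
```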
3. Support Vector Machine:
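Support Vector Machines are based on the concept of separating hyperplanes: a hyperplane divides the instances so that those belonging to different classes lie on different sides of it, and the SVM chooses the hyperplane with the largest margin between the classes. The simplest SVM is linear, separating the classes with a straight line (a hyperplane in higher dimensions). For more accurate classification of data that is not linearly separable, kernel functions implicitly map the instances into a higher-dimensional space in which a separating hyperplane can be found. In our classification model, we used two kernel functions: linear and RBF (radial basis function).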
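The two kernels and the resulting decision function can be sketched as follows. The support vectors, dual coefficients, bias and `gamma` value are hypothetical stand-ins for what training would produce. A trained SVM is a binary classifier, so a four-category problem combines several such decision functions (for example one-vs-one or one-vs-rest).

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, z: Vector) -> float:
    """K(x, z) = x . z"""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x: Vector, z: Vector, gamma: float = 0.5) -> float:
    """K(x, z) = exp(-gamma * ||x - z||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision(x: Vector, support_vectors: List[Vector], dual_coefs: List[float],
             bias: float, kernel) -> float:
    """f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b; the sign of f(x) gives the class."""
    return sum(c * kernel(s, x) for s, c in zip(support_vectors, dual_coefs)) + bias


# Hypothetical trained parameters for two grade features.
support_vectors = [[3.5, 3.0], [1.5, 1.0]]
dual_coefs = [1.0, -1.0]  # alpha_i * y_i, with y = +1 / -1
x = [3.3, 3.0]
print(decision(x, support_vectors, dual_coefs, 0.0, linear_kernel) > 0)  # True
print(decision(x, support_vectors, dual_coefs, 0.0, rbf_kernel) > 0)     # True
```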
4. Decision tree classifier:
As its name suggests, a decision tree is a tree-like structure in which attributes are tested at the nodes from the root down towards the leaves. Each branch corresponds to an outcome of an attribute test, and each leaf node (end node) represents a class (category or output variable). Our model is based on the C4.5 decision tree algorithm, which builds the tree from the training dataset using information entropy: at each node it splits on the attribute with the highest gain ratio (information gain normalized by the split information).
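The split criterion can be sketched as below. This covers only how C4.5 scores a candidate categorical attribute (entropy, information gain and gain ratio); handling of continuous attributes, missing values and pruning is omitted, and the toy data is hypothetical.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def partition(rows, labels, j) -> Dict[Hashable, List[Hashable]]:
    """Group the labels by the value of attribute j."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[j], []).append(label)
    return groups


def information_gain(rows, labels, j) -> float:
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in partition(rows, labels, j).values())
    return entropy(labels) - remainder


def gain_ratio(rows, labels, j) -> float:
    """C4.5's criterion: information gain divided by the split information."""
    n = len(labels)
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in partition(rows, labels, j).values())
    return information_gain(rows, labels, j) / split_info if split_info > 0 else 0.0


# Hypothetical data: (prerequisite grade, grade in an unrelated course) -> outcome.
rows = [("A", "B"), ("A", "C"), ("B", "B"), ("B", "C")]
labels = ["pass", "pass", "fail", "fail"]
print(gain_ratio(rows, labels, 0), gain_ratio(rows, labels, 1))  # 1.0 0.0
```

Here the first attribute separates the classes perfectly, so C4.5 would split on it first.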
3.4.2 Variable Selection
Several variable selection techniques were used, each checking a different criterion. The most frequently used is the course chain technique. The techniques are:
1. Course Chain Technique:
A course chain is a series of courses in which each course is a prerequisite of the next course(s) or is closely related to it. Succeeding in the next course therefore depends strongly on having passed the previous course in the chain and having good knowledge of the related field. Hence, there is a dependency between these courses, and the grade obtained in CS Course-2 is likely to be close to the grade obtained in CS Course-1.
So, the variables selected in this case are the grades of all the preceding courses in the chain, which are used to classify students by their grade in the last course of the chain.
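Building the predictors and the class for one chain can be sketched as follows, assuming the transformed data is a mapping from roll number to course grades. The course keys reuse the abbreviations from the Naïve Bayes example; the chain order and the grade values are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Table = Dict[int, Dict[str, Optional[float]]]  # roll_no -> {course: grade}


def chain_dataset(table: Table, chain: List[str]) -> Tuple[List[List[float]], List[float]]:
    """Predictors: grades of all but the last course in the chain; class: the last course's grade."""
    *predictors, target = chain
    X: List[List[float]] = []
    y: List[float] = []
    for grades in table.values():
        if all(grades.get(course) is not None for course in chain):  # complete chains only
            X.append([grades[course] for course in predictors])
            y.append(grades[target])
    return X, y


table = {
    1: {"ITC": 3.00, "CP": 3.33, "DS": 2.67, "DB": 3.00},
    2: {"ITC": 2.00, "CP": 2.33, "DS": None, "DB": 2.00},  # incomplete chain, skipped
}
X, y = chain_dataset(table, ["ITC", "CP", "DS", "DB"])
print(X, y)  # [[3.0, 3.33, 2.67]] [3.0]
```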
2. Previous Semester to Next Course Technique:
This technique predicts the grade of a course in the next semester using the grades of the courses in the previous semester as predictors. It is used for first-year students, who have not yet completed enough courses in any chain to obtain good predictions with the course chain technique.
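A sketch of this feature selection follows, assuming the `Sem.` attribute has been converted into a per-student semester index (1, 2, ...). The course codes `C1`, `C2`, `C3` and the grades are hypothetical placeholders.

```python
from typing import Dict, List, Tuple

Registration = Tuple[int, int, str, float]  # (roll_no, semester_index, course, grade)


def semester_dataset(regs: List[Registration], prev_courses: List[str],
                     target_course: str, target_sem: int) -> Tuple[List[List[float]], List[float]]:
    """Predict the grade in target_course (taken in target_sem) from courses taken in target_sem - 1."""
    by_student: Dict[int, Dict[Tuple[int, str], float]] = {}
    for roll_no, sem, course, grade in regs:
        by_student.setdefault(roll_no, {})[(sem, course)] = grade
    X: List[List[float]] = []
    y: List[float] = []
    for roll_no, grades in sorted(by_student.items()):
        features = [grades.get((target_sem - 1, c)) for c in prev_courses]
        label = grades.get((target_sem, target_course))
        if label is not None and all(f is not None for f in features):
            X.append(features)
            y.append(label)
    return X, y


regs = [
    (1, 1, "C1", 3.00), (1, 1, "C2", 2.67), (1, 2, "C3", 3.33),
    (2, 1, "C1", 2.00), (2, 2, "C3", 2.33),  # student 2 has no C2 grade, so is skipped
]
X, y = semester_dataset(regs, ["C1", "C2"], "C3", target_sem=2)
print(X, y)  # [[3.0, 2.67]] [3.33]
```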
3. Prerequisite Pass Check:
This technique predicts whether a student will pass a course based on the grade in its prerequisite course. The prerequisite grade is used as the attribute, and the outcome in the current course is the class.
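The attribute and class for this technique can be derived as sketched below. The pass threshold of 1.0 grade points, the course keys and the treatment of `CP` as the prerequisite of `DS` are assumptions for illustration, since the text does not state the passing grade or the course pairs used.

```python
from typing import Dict, List, Optional, Tuple

PASS_THRESHOLD = 1.0  # hypothetical minimum grade points for a pass


def prerequisite_dataset(table: Dict[int, Dict[str, Optional[float]]], prereq: str, course: str,
                         threshold: float = PASS_THRESHOLD) -> Tuple[List[List[float]], List[str]]:
    """Attribute: prerequisite grade; class: 'pass' or 'fail' in the current course."""
    X: List[List[float]] = []
    y: List[str] = []
    for grades in table.values():
        pre, cur = grades.get(prereq), grades.get(course)
        if pre is None or cur is None:
            continue  # both grades are needed to form an example
        X.append([pre])
        y.append("pass" if cur >= threshold else "fail")
    return X, y


table = {1: {"CP": 3.33, "DS": 2.67}, 2: {"CP": 1.00, "DS": 0.00}, 3: {"CP": 2.00}}
X, y = prerequisite_dataset(table, "CP", "DS")
print(X, y)  # [[3.33], [1.0]] ['pass', 'fail']
```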