The implementation of our proposed recommender system comprises five phases: mining data from the standard OULAD database, data preprocessing, constructing a model that combines deep learning networks (LSTM, MLP, GRU, and BiLSTM) improved with the attention technique, weighting the parameters, and training. Finally, the trained network is used to recommend resources to users.
The proposed model (Fig. 1) consists of two independent blocks that process long-term and short-term interests. Dataset records are divided into short-term and long-term sections along a specific time axis. Separating user interests in this way, and valuing them differently according to their position on the time axis, increases the system's accuracy without introducing excessive error and allows suitable, personalized resources to be suggested according to students' needs and tastes. Short-term interests play a more effective role than long-term interests in recommending educational resources.
The compression algorithm presented in this treatise is then applied to the long-term part, and the data is compressed in both the row and the column dimensions. For users in the database who have had no activity in the short-term period, we treat their last activity in the long-term section as their activity in the short-term section. After the model is designed and built, training begins. At this stage, for each user, the first record of his activity in the short-term section is repeated for all of his activities in the long-term section, and this is done for every record in the short-term section.
Finally, the remaining records of the database are used as the test set to measure the model's loss and accuracy.
2.1. Data Preprocessing
The steps in the preprocessing phase can be seen in Fig. 2. First, we extract the provided resources, student features, courses held, and student performance and evaluation in each course from the OULAD standard database. After combining these tables, we categorize and map the features, delete empty or incorrect data, and normalize the features.
Dataset
As input for the training and testing phases of our proposed model, we have used the Open University Learning Analytics Dataset (OULAD), a standard learning-analytics database stored in CSV format [47, 45].
The database contains sample data about students, including demographic data, the courses they attended, their study activities during each course, and the final result of each course. It covers more than 30,000 students interacting with the virtual learning environment (VLE) across 22 courses, where each course indicates the learning subject and the set of sessions that ended with a test. The interaction data is collected as a daily summary of student clicks on the various resources. In OULAD, the tables are linked to each person through unique identifiers. The files used are briefly described below:
Assessments
Each course includes several assessments and one final exam; their data is available in this file.
Student Info
This file holds data about students' demographics along with their results. In addition, each student can have several records.
Student Vle
Includes students' clicks and interactions with the resources available in the VLE, which can be in HTML, PDF, and other formats.
Student Assessment
Keeps each student's results for the assessments made during the course.
For more information, see [44, 45].
After merging the data, we normalize the features to the range [0, 1] using Formula 1.
$${x}^{*}=\frac{x-{x}_{min}}{{x}_{max}-{x}_{min}}$$
1
where \({x}_{min}\) is the minimum value of the feature, \({x}_{max}\) is its maximum value, \({x}^{*}\) is the normalized value, and \(x\) is the original data. Converting the string values in the database to numeric values is another step of preprocessing. Table 1 shows a sample of the values in the database, and Table 2 shows how these values are mapped to numbers.
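As an illustration, the min-max scaling of Formula 1 and the string-to-number mapping of Table 2 could be implemented as follows. This is a minimal sketch using pandas; the column names and mapping values are taken from Tables 1 and 2, but the helper function names are ours.

```python
import pandas as pd

def min_max_normalize(series: pd.Series) -> pd.Series:
    """Formula 1: x* = (x - x_min) / (x_max - x_min), scaling values to [0, 1]."""
    return (series - series.min()) / (series.max() - series.min())

# Example categorical-to-numeric mappings taken from Table 2.
gender_map = {"F": 0.1, "M": 0.2}
final_result_map = {"Distinction": 0.1, "Fail": 0.2, "Pass": 0.3, "Withdrawn": 0.4}

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Map string-valued features to numbers.
    df["gender"] = df["gender"].map(gender_map)
    df["final_result"] = df["final_result"].map(final_result_map)
    # Normalize continuous features such as the mean assessment score and click counts.
    for col in ("score_mean", "sum_click"):
        df[col] = min_max_normalize(df[col])
    # Drop empty or incorrectly mapped rows.
    return df.dropna()
```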
Table 1
A sample of database data before mapping
code_module | code_presentation | id_student | gender | highest_education | age_band | final_result | score_mean | id_site | date | sum_click
AAA | 2013J | 11391 | M | HE Qualification | 55<= | Pass | 82 | 546669 | -5 | 16
AAA | 2013J | 11391 | M | HE Qualification | 55<= | Pass | 82 | 546662 | -5 | 44
AAA | 2013J | 11391 | M | HE Qualification | 55<= | Pass | 82 | 546652 | -5 | 1
AAA | 2013J | 11391 | M | HE Qualification | 55<= | Pass | 82 | 546668 | -5 | 2
AAA | 2013J | 11391 | M | HE Qualification | 55<= | Pass | 82 | 546652 | -5 | 1
AAA | 2013J | 11391 | M | HE Qualification | 55<= | Pass | 82 | 546670 | -7 | 2
AAA | 2013J | 11391 | M | HE Qualification | 55<= | Pass | 82 | 546671 | -7 | 2
Table 2
The values of features mapped to the number
code_module | mapped | code_presentation | mapped | age_band | mapped | gender | mapped | highest_education | mapped | final_result | mapped
AAA | 0.1 | 2013B | 540 | 0–35 | 0.1 | F | 0.1 | A Level or Equivalent | 0.1 | Distinction | 0.1
BBB | 0.2 | 2013J | 720 | 35–55 | 0.2 | M | 0.2 | HE Qualification | 0.2 | Fail | 0.2
CCC | 0.3 | 2014B | 180 | 55<= | 0.3 | - | - | Lower Than A Level | 0.3 | Pass | 0.3
DDD | 0.4 | 2014J | 360 | - | - | - | - | No Formal quals | 0.4 | Withdrawn | 0.4
EEE | 0.5 | - | - | - | - | - | - | Post Graduate Qualification | 0.5 | - | -
FFF | 0.6 | - | - | - | - | - | - | - | - | - | -
GGG | 0.7 | - | - | - | - | - | - | - | - | - | -
2.2. Records Labeling
Since the available data do not have a defined label, in this research, we have labeled the records as follows:
For each student, from the set of activities recorded in the student's common courses, we take the course in which he obtained the highest assessment score (which can reflect the effect of the resources studied) and choose the resource with the most clicks in that course (which can indicate the student's taste and interest) as the label.
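One possible realization of this labeling rule, sketched in pandas, is shown below; the column names follow Table 1, and since the paper does not specify tie-breaking or the exact aggregation, this is only an illustration.

```python
import pandas as pd

def label_records(df: pd.DataFrame) -> pd.Series:
    """For each student, pick the course with the highest mean assessment score,
    then use the most-clicked resource (id_site) in that course as the label."""
    labels = {}
    for student, rows in df.groupby("id_student"):
        # Course (code_module, code_presentation) with the highest mean score.
        best_course = (rows.groupby(["code_module", "code_presentation"])["score_mean"]
                           .mean()
                           .idxmax())
        course_rows = rows[(rows["code_module"] == best_course[0]) &
                           (rows["code_presentation"] == best_course[1])]
        # Resource with the largest total number of clicks in that course.
        labels[student] = course_rows.groupby("id_site")["sum_click"].sum().idxmax()
    return pd.Series(labels, name="label")
```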
The result of data labeling was the separation of sources into 562 categories. Their frequency can be seen in Fig. 3. At the end of the preprocessing phase, the data is divided into training, test, and validation sets.
Investigating the correlation between variables and labels
To investigate the possible correlation between the label and the variables used as model inputs, we applied a correlation test; as shown in Fig. 4, there is no significant correlation between the variables and the label.
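A minimal sketch of such a check with pandas is given below; it assumes the label has already been mapped to a numeric code, and the interpretation threshold mentioned in the comment is ours, not the paper's.

```python
import pandas as pd

def label_correlations(df: pd.DataFrame, label_col: str = "label") -> pd.Series:
    """Pearson correlation between each numeric input variable and the label."""
    corr = df.corr(numeric_only=True)[label_col].drop(label_col)
    # Sort by absolute correlation; values well below e.g. |r| = 0.3 would be
    # considered weakly correlated with the label.
    return corr.sort_values(key=abs, ascending=False)
```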
2.3.1. Symbols
In this research, our main goal is to extract users' interests and priorities by considering user-item interactions; for example, each click on a learning resource is treated as a user action. Let \(U\) and \(V\) denote the sets of users and items, respectively. Each user \(u\in U\) has a sequence of consecutive time windows \({W}^{u}=\{{w}_{1}^{u},{w}_{2}^{u},\dots ,{w}_{t}^{u}\}\), where \(t\) is the total number of time windows. Each window consists of smaller time units, \({w}_{t}^{u}=\{{d}_{1},{d}_{2},\dots ,{d}_{x}\}\), where \(x\) is the length of the window, and \({w}_{t}^{u}\) also denotes the items related to user \(u\) in window \(t\). Each time window contains a number of events \(\{{e}_{t,i}^{u}\in {R}^{m}\mid i=1,2,\dots ,|{w}_{t}^{u}|\}\), where \({e}_{t,i}^{u}\) describes event \(i\) within the time units \(({d}_{x})\) of the window. In each event, user \(u\) interacts with resources \({v}_{i}\in V\).
For a time step \(t\), the session \({S}_{t}^{u}\) represents the user's short-term interests at time \(t\), and the sessions before time step \(t\) represent the user's long-term interests, defined as \({L}_{t-1}^{u}={S}_{1}^{u}\cup {S}_{2}^{u}\cup \dots \cup {S}_{t-1}^{u}\). Our goal is to predict the next learning resource \({e}_{t,i+1}^{u}\) in the session \({S}_{t}^{u}\).
1. Short-term Interests Layer:
The task of the attention-based short-term layer is to generate recommendations that take the user's long-term interests into account while processing the sequence of learning resources in the current session to extract short-term interests. The user's short-term interests are essential for recommending the next resource, and they vary across different categories of educational resources.
In many past works, researchers have treated short-term interests as a fixed feature and therefore assigned the same weight to all items; as a result, the diversity of short-term interests has not been properly captured. In the proposed architecture, the attention technique assigns weights to both the long-term and the short-term sessions, so the characteristics of user \(u\) are fully taken into account. In addition, a bidirectional LSTM is used so that the prediction can look at both the past and the future and remain sensitive to variations in the user's short-term interests in both directions; in this way, learners' behavioral changes are covered.
To discover these features, we propose a module based on bidirectional LSTM networks that extracts periodic features and captures the temporal dependence of the input features.
In this research, we have evaluated single-layer and double-layer models.
The input of length 12, \(\text{I}=\{{i}_{1},{i}_{2},\dots ,{i}_{12}\}\), is fed into the proposed model, and the output \({y}_{t}\) is calculated according to Formula 2.
$${h}_{t}^{f}=\text{t}\text{a}\text{n}\text{h}({W}_{xh}^{f}{x}_{t}+{W}_{hh}^{f}{h}_{t-1}^{f}+{b}_{h}^{f})$$
$${h}_{t}^{b}=\text{t}\text{a}\text{n}\text{h}\left({W}_{xh}^{b}{x}_{t}+{W}_{hh}^{b}{h}_{t+1}^{b}+{b}_{h}^{b}\right)$$
2
$${y}_{t}={W}_{hy}^{f}{h}_{t}^{f}+{W}_{hy}^{b}{h}_{t}^{b}+{b}_{y}$$
As shown in Table 6, the results obtained from the two-layer BiLSTM network, in which the output \({y}_{t}\) of the lower layer is fed as the input of the upper layer, are more favorable.
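A Keras sketch of such a stacked two-layer BiLSTM for the short-term branch is shown below; the layer size and feature dimension are illustrative assumptions, since the paper does not fix them at this point.

```python
from tensorflow.keras import layers, models

def build_short_term_encoder(seq_len: int = 12, n_features: int = 1, units: int = 64):
    """Stacked (two-layer) BiLSTM: the sequence output y_t of the lower layer
    is fed as the input of the upper layer (Formula 2)."""
    inputs = layers.Input(shape=(seq_len, n_features))
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(inputs)
    x = layers.Bidirectional(layers.LSTM(units, return_sequences=True))(x)
    return models.Model(inputs, x, name="short_term_bilstm")
```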
The main idea of the attention technique is to learn to assign accurate (normalized) weights to a set of features, so that higher weights indicate that the corresponding feature carries more important information for the given task (Fig. 5).
Attention techniques fall into two categories according to how the attention scores are calculated: 1) standard (vanilla) attention and 2) collaborative attention. Vanilla attention uses a parameterized context vector, while collaborative attention learns attention weights from two sequences. In this treatise, the first method is used (Formula 3 [163]).
$${u}_{t}=\text{t}\text{a}\text{n}\text{h}(\text{W}{ĥ}_{t}+\text{b})$$
$${\alpha }_{t}= \frac{\text{e}\text{x}\text{p}\left({\text{u}}_{t}^{T}u\right)}{\sum _{t}\text{e}\text{x}\text{p}\left({\text{u}}_{t}^{T}u\right)}$$
3
$$v=\sum _{t}{\alpha }_{t}{ĥ}_{t}$$
\({u}_{t}\) : the vector that scores (values) the features
\({\alpha }_{t}\) : the normalized feature weights obtained with the softmax function
\(v\) : the context vector that summarizes the input information, computed as the weighted sum of the \({ĥ}_{t}\) with \({\alpha }_{t}\) as the corresponding weights
The vector \(v\) is then fed into a fully connected layer with softmax activation to perform the final classification. The recommendation output is a vector \(y\in {R}^{2}\) holding the probabilities of the significant and non-significant classes; using argmax, we select the class with the highest probability as the model's recommendation.
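Formula 3 corresponds to standard additive (vanilla) attention; a hedged Keras implementation might look as follows. The layer name and variable names are ours, and the layer simply realizes the three equations above.

```python
import tensorflow as tf
from tensorflow.keras import layers

class VanillaAttention(layers.Layer):
    """u_t = tanh(W h_t + b); alpha_t = softmax(u_t^T u); v = sum_t alpha_t h_t (Formula 3)."""

    def build(self, input_shape):
        dim = int(input_shape[-1])
        self.W = self.add_weight(name="W", shape=(dim, dim), initializer="glorot_uniform")
        self.b = self.add_weight(name="b", shape=(dim,), initializer="zeros")
        self.u = self.add_weight(name="u", shape=(dim, 1), initializer="glorot_uniform")

    def call(self, h):                                        # h: (batch, time, dim)
        u_t = tf.tanh(tf.tensordot(h, self.W, axes=1) + self.b)
        scores = tf.tensordot(u_t, self.u, axes=1)            # (batch, time, 1)
        alpha = tf.nn.softmax(scores, axis=1)                 # normalized weights alpha_t
        return tf.reduce_sum(alpha * h, axis=1)               # context vector v
```

The resulting vector \(v\) is what the fully connected softmax layer described above receives for the final classification.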
2. Long-term Interests Layer:
The task of the long-term layer is to supply the user's long-term interest information to the attention-based short-term layer before that layer processes the current session. The user's last k sessions are considered first. The attention technique calculates the importance of each item in the set of short-term items for a specific user, while the compression algorithm, in turn, assigns weights to the resources in different time windows according to a defined policy. This information is fed into the system to represent the user's interests and priorities.
In the long-term interests layer, we apply compression in both the row and the column dimensions because of the large volume of records belonging to each student. The main innovation of the long-term interests section lies in this compression phase of users' long-term data.
After applying compression to the long-term data with a window length of 7, the data shrinks from 12 × 10,543,682 to 11 × 5,167,599 and is then fed into a two-layer multi-cell GRU network, whose output \({y}_{t}\) is calculated according to Formula 4.
$${z}_{t}=\sigma \left({W}_{z}\cdot \left[{h}_{t-1},{x}_{t}\right]\right)$$
$${r}_{t}=\sigma \left({W}_{r}\cdot \left[{h}_{t-1},{x}_{t}\right]\right)$$
4
$${ĥ}_{t}=\text{tanh}\left(W\cdot \left[{r}_{t}*{h}_{t-1},{x}_{t}\right]\right)$$
$${h}_{t}=\left(1-{z}_{t}\right)*{h}_{t-1}+{z}_{t}*{ĥ}_{t}$$
The outputs of the short-term and long-term branches are concatenated and fed into an MLP layer with the ReLU activation function. After applying dropout = 0.25, the result is passed to a final MLP layer with the softmax activation function. The cost function used in this structure is categorical cross-entropy. Because of the large volume of input data, training uses mini-batches of size 1028.
To achieve better feature extraction, the model parameters in each layer were adjusted many times, and the model was repeatedly trained, tested, and evaluated. In the final structure, a learning rate of 0.0001 and ReLU activation functions in the MLP layers worked best. The dropout layer is used to counteract the drawbacks of fully connected layers and prevent overfitting of the network; its value was likewise obtained by trying different settings and retraining and testing the model.
In similar problems where there are more than two classes, the softmax activation function and the categorical cross-entropy cost function are used in the output layer of the model.
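Putting the two branches together, the fusion described above (concatenation, an MLP with ReLU, dropout of 0.25, a softmax output, categorical cross-entropy, a learning rate of 0.0001, and mini-batches of 1028) could be sketched in Keras as follows. The two-layer GRU encodes the long-term branch per Formula 4, the VanillaAttention layer is the one sketched earlier, and the hidden sizes and class count are illustrative assumptions.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

def build_recommender(short_len=12, long_len=11, n_features=1, n_classes=562):
    # Short-term branch: stacked BiLSTM followed by the attention layer (Formulas 2-3).
    short_in = layers.Input(shape=(short_len, n_features))
    s = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(short_in)
    s = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(s)
    s = VanillaAttention()(s)  # defined in the earlier sketch

    # Long-term branch: two-layer GRU over the compressed sequence (Formula 4).
    long_in = layers.Input(shape=(long_len, n_features))
    l = layers.GRU(64, return_sequences=True)(long_in)
    l = layers.GRU(64)(l)

    # Fusion: concatenate, MLP with ReLU, dropout 0.25, softmax classifier.
    x = layers.Concatenate()([s, l])
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dropout(0.25)(x)
    out = layers.Dense(n_classes, activation="softmax")(x)

    model = models.Model([short_in, long_in], out)
    model.compile(optimizer=Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Training would then use mini-batches of the size reported in the paper:
# model.fit([x_short, x_long], y, batch_size=1028, epochs=...)
```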
1- Compression at the Feature Level
To select the best features for a supervised learning model, "supervised feature selection methods" are available; these algorithms use labeled data to choose the subset of features that yields the best performance of the supervised model. In the absence of labeled data, "unsupervised feature selection methods" are used instead; they score features according to criteria such as variance, entropy, and the ability of a feature to preserve the data structure related to local similarities, among others.
In this research, given the nature of the network's input data and the useful properties of the correlation-matrix approach, the correlation matrix has been used as the feature selection algorithm.
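A minimal sketch of correlation-matrix-based feature selection is given below; the 0.9 threshold and the drop-one-of-each-pair policy are assumptions, since the paper does not state them.

```python
import pandas as pd

def drop_highly_correlated(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
    """Drop one feature from every pair whose absolute Pearson correlation
    exceeds the threshold, keeping the rest of the columns unchanged."""
    corr = df.corr(numeric_only=True).abs()
    to_drop = set()
    cols = corr.columns
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if corr.loc[a, b] > threshold and b not in to_drop:
                to_drop.add(b)
    return df.drop(columns=list(to_drop))
```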
2- Compression at the Record Level
A lesser-known way to reduce the amount of data is to compress rows. In our proposed method (Fig. 6), none of the rows that record user interaction with the learning environment is deleted. Moreover, to improve the recommender's results, within the time vector of user behavior we value an action more the closer it lies to the present.
We consider one window (or a limited number of windows) as a background and treat user interactions in the other time windows as objects moving in the foreground. In addition, we assign weights to the time intervals so that each user action gains a weight proportional to its position in the relevant time window.
Very distant time windows receive lower weights and very recent time windows receive higher weights. The repetition of features within the studied period also matters: each weighted feature is assigned a new weight according to how many times it is repeated within the background. In the end, each feature appears only once in the session used as the background, and in the final compressed table a new field holds the frequency of the merged feature.
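Since the exact weighting policy is described only at a high level, the following pandas sketch shows one way such row compression could work: interactions with the same resource inside each 7-day window are merged, a frequency field records how many rows were merged, and a recency weight grows toward the present. The window length follows the paper; everything else is an assumption.

```python
import pandas as pd

def compress_records(df: pd.DataFrame, window_len: int = 7) -> pd.DataFrame:
    """Merge a user's interactions with the same resource inside each time window.

    Each merged row keeps the total clicks, a 'frequency' field with the number
    of merged rows, and a recency weight that increases for windows closer to
    the present."""
    df = df.copy()
    df["window"] = df["date"] // window_len
    grouped = (df.groupby(["id_student", "window", "id_site"], as_index=False)
                 .agg(sum_click=("sum_click", "sum"),
                      frequency=("id_site", "size")))
    # Recency weight: later windows (closer to the present) get larger weights.
    w_min, w_max = grouped["window"].min(), grouped["window"].max()
    grouped["recency_weight"] = (grouped["window"] - w_min + 1) / (w_max - w_min + 1)
    return grouped
```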
2. Methods And Tools Of Data Analysis
We seek to predict the best educational resources; to evaluate the performance of the proposed method, the following criteria are used:
$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
5
Accuracy indicates what percentage of the test records are correctly classified.
$$Precision=\frac{\sum _{x\in X}|R\left(x\right)\cap H\left(x\right)|}{\sum _{x\in X}\left|R\left(x\right)\right|}$$
6
$$Recall=\frac{\sum _{x\in X}|R\left(x\right)\cap H\left(x\right)|}{\sum _{x\in X}\left|H\left(x\right)\right|}$$
7
$$F1=\frac{2*Precision* Recall}{Precision+Recall}$$
8
Here, \(x\) is a student from the set of all students \(X\), \(R(x)\) denotes the learning resources recommended to student \(x\), and \(H(x)\) denotes the learning resources actually viewed by learner \(x\) [53].
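For completeness, Formulas 6-8 can be computed directly from the recommended and observed resource sets; a small sketch follows, where the function and argument names are ours.

```python
def precision_recall_f1(recommended: dict, observed: dict):
    """Formulas 6-8: set-based precision, recall, and F1 over all students.

    recommended[x] and observed[x] are the sets R(x) and H(x) for student x."""
    hits = sum(len(recommended[x] & observed[x]) for x in recommended)
    rec_total = sum(len(recommended[x]) for x in recommended)
    obs_total = sum(len(observed[x]) for x in recommended)
    precision = hits / rec_total if rec_total else 0.0
    recall = hits / obs_total if obs_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```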