A. Dataset and Features
This section presents an analysis of the proposed approach, its performance measures, and its results. The hybrid system presented here was built specifically for movie recommendation using TensorFlow in Python. The experiments draw on the MovieLens "ml-25m" dataset, which combines 5-star ratings and free-text tagging activity collected by the MovieLens recommendation service. It contains 25,000,095 ratings and 1,093,360 tag applications across 62,423 movies, contributed by 162,541 users between January 9, 1995, and November 21, 2019, together with tag genome data comprising 15 million relevance scores over 1,129 tags. The dataset was generated on November 21, 2019, from a random sample of users who had each rated at least 20 movies; users are identified only by an ID, and no demographic information is included. The dataset was divided into training and testing sets, with the proposed model trained on 75% of the data and evaluated on the remaining 25%. The hybrid machine learning approach uses this information in both stages to generate movie recommendations. User and movie ratings, which are central to recommendation generation, are shown in Figs. 1 and 2.
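The 75/25 split described above can be sketched as follows. The (userId, movieId, rating) triples here are synthetic stand-ins for the ml-25m ratings file, so the counts and values are illustrative only.

```python
import random

# Hypothetical stand-in for the ml-25m ratings table: (userId, movieId, rating) triples.
random.seed(42)
ratings = [(u, m, random.choice([0.5 * k for k in range(1, 11)]))
           for u in range(100) for m in random.sample(range(500), 20)]

# 75/25 train/test split, as described above.
random.shuffle(ratings)
cut = int(0.75 * len(ratings))
train, test = ratings[:cut], ratings[cut:]

print(len(train), len(test))  # 1500 500
```

In practice the split would be applied to the full 25-million-rating table, typically stratified per user so every user appears in the training set.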
As Fig. 2 shows, November has the largest average number of ratings (about 70,000), followed by August (around 50,000) and December (almost 30,000). These figures are approximate, since only two months of data are available for 2003. We next examine the total number of ratings received each month across the full period.
As Fig. 3 shows, there is a pronounced spike in ratings in November 2000. This surge, which accounts for nearly 90% of the total ratings, largely explains the high averages observed for November, August, and December. A downward trend then becomes evident in 2001 and 2002; with only two months of data available for 2003, firm conclusions cannot be drawn from this pattern.
We next examine how the different rating values are distributed, to better understand overall user rating behavior.
As Fig. 4 shows, a rating of 4 is the most frequent value, with approximately 350,000 occurrences. In our one-million-rating sample, around 35% of ratings were 4 stars, while approximately 26% and 21% were 3 and 5 stars, respectively. These estimates may contain small errors because of the limited data available for 2003; nevertheless, it is clear that a considerable proportion of users give ratings of four or above.
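The percentage breakdown above can be reproduced with a simple frequency count. The ratings list below is a toy sample constructed to mirror the proportions in Fig. 4, not the actual data.

```python
from collections import Counter

# Toy ratings list whose proportions mirror Fig. 4; the paper computes
# the same distribution over the full dataset.
ratings = [4] * 35 + [3] * 26 + [5] * 21 + [2] * 10 + [1] * 8

counts = Counter(ratings)
total = len(ratings)
shares = {r: round(100 * c / total) for r, c in sorted(counts.items())}
print(shares)  # {1: 8, 2: 10, 3: 26, 4: 35, 5: 21}
```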
A similar analysis can be carried out for each year in the dataset.
Fig. 5 shows comparable distributions for each year. The monthly distribution of rating values can then be analyzed.
Fig. 6 shows a distribution resembling those in the annual and overall graphs, with an average rating of about 3.6 across all categories. We then examine the temporal fluctuation of the ratings, using standard deviation to depict lower and upper bounds.
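The standard deviation-based bounds mentioned above can be sketched as follows, using hypothetical per-year rating samples in place of the real time series.

```python
from statistics import mean, stdev

# Hypothetical per-year rating samples; the paper applies the same idea
# to each time window of the real data.
yearly = {
    1996: [3, 4, 4, 5, 3, 4],
    1997: [2, 3, 4, 4, 3, 5],
}

bounds = {}
for year, vals in yearly.items():
    m, s = mean(vals), stdev(vals)
    bounds[year] = (m - s, m, m + s)  # (lower bound, average, upper bound)

for year, (lo, avg, hi) in sorted(bounds.items()):
    print(year, round(lo, 2), round(avg, 2), round(hi, 2))
```

Plotting the middle value with the two bounds as a shaded band produces a chart of the kind described, where the average stays between 3 and 4 over time.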
Over time, the average rating has consistently fluctuated between 3 and 4.
Figure 7 shows strong interest in the Comedy and Film-Noir genres in the late 1990s.
Figure 8 shows that the Film-Noir and Horror genres consistently have the highest and lowest average ratings, respectively, with occasional extreme values. A density plot of ratings by genre is examined next.
Fig. 9 shows that all genres exhibit a left-skewed distribution with a mean of approximately 3.5, except for Horror, which is characterized by low ratings.
The matrix output generated for movie recommendations is shown in Fig. 10, where each row represents a user ID and each column a movie ID. This matrix, produced by the proposed hybrid system, encodes recommendations of the users' preferred films: each cell holds the recommendation score of a given movie for a given user.
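A minimal sketch of building such a user-movie matrix from rating triples, using toy IDs and scores rather than the actual output of the hybrid system:

```python
# Toy (userId, movieId, score) triples standing in for the hybrid system's output.
triples = [(0, 1, 4.0), (0, 3, 3.5), (1, 0, 5.0), (2, 2, 2.0)]

n_users, n_movies = 3, 4
matrix = [[0.0] * n_movies for _ in range(n_users)]  # 0.0 marks "no score"
for user, movie, score in triples:
    matrix[user][movie] = score

for row in matrix:
    print(row)
```

Each row can then be sorted by score to produce a per-user ranking of candidate movies.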
The resulting movie recommendations are shown in Fig. 11. These suggestions are derived from the estimates produced by combining the hybrid approach with K-means++ and IKSOM. The output can include movie titles, release years, reviews, and other relevant details.
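The K-means++ clustering stage can be sketched with scikit-learn, which supports K-means++ seeding via `init="k-means++"`. The IKSOM stage and the real user profiles are not reproduced here, so the rating vectors below are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical user rating vectors (one row per user, one column per movie).
user_profiles = np.array([
    [5.0, 4.5, 1.0, 0.5],   # users who favor the first two movies
    [4.5, 5.0, 0.5, 1.0],
    [1.0, 0.5, 5.0, 4.5],   # users who favor the last two movies
    [0.5, 1.0, 4.5, 5.0],
])

# K-means++ seeding groups similar users; recommendations are then drawn
# from highly rated movies within each cluster.
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(user_profiles)
print(labels)
```

With this toy data the first two users land in one cluster and the last two in the other, so each user can be recommended movies liked by cluster-mates.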
A variety of performance metrics were employed to assess the effectiveness of the proposed methodology: accuracy, precision, recall, F1-score, mean absolute error (MAE), and root mean square error (RMSE). Each metric is described below.
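RMSE and MAE follow directly from their definitions; the actual and predicted rating lists below are illustrative.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between held-out ratings and predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error between held-out ratings and predictions."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Toy held-out ratings versus model predictions.
actual = [4.0, 3.5, 5.0, 2.0]
predicted = [3.8, 3.9, 4.6, 2.3]

print(round(rmse(actual, predicted), 3), round(mae(actual, predicted), 3))
```

Because RMSE squares each error, it penalizes large misses more heavily than MAE does, which is why the two are reported together.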
Fig. 12 shows the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE). Using content-driven K-nearest neighbors (KNN) with cosine similarity, the hybrid approach presented in this work addresses the cold-start problem and yields an RMSE of 0.410 and an MAE of 0.256. The technique makes rating prediction possible even with limited data; by mitigating data sparsity and cold-start issues, the proposed hybrid strategy reduces both RMSE and MAE.
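A minimal sketch of cold-start rating prediction via cosine similarity, assuming a small toy user-movie matrix and k = 2 neighbors; the paper's full pipeline is not reproduced here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two rating vectors (0 marks 'unrated')."""
    num = sum(x * y for x, y in zip(a, b))
    da = math.sqrt(sum(x * x for x in a))
    db = math.sqrt(sum(y * y for y in b))
    return num / (da * db) if da and db else 0.0

def predict(target, neighbors, movie, k=2):
    """Similarity-weighted average of the k most similar users' ratings for `movie`."""
    sims = [(cosine(target, n), n[movie]) for n in neighbors if n[movie] > 0]
    sims.sort(reverse=True)
    top = sims[:k]
    wsum = sum(s for s, _ in top)
    return sum(s * r for s, r in top) / wsum if wsum else 0.0

# Toy user-movie matrix; the new user has no rating for movie 2.
neighbors = [[5, 4, 4, 1], [4, 5, 5, 2], [1, 1, 2, 5]]
new_user = [5, 4, 0, 1]

pred = predict(new_user, neighbors, movie=2)
print(round(pred, 2))
```

Because the prediction borrows ratings from the most similar users, a sensible score is produced even though the target user never rated the movie, which is the essence of the cold-start mitigation described above.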
The following formulas were used to evaluate the proposed hybrid technique's precision, recall, and F1-score.
$$Precision=\frac{\text{relevant movies recommended}}{\text{all movies recommended}}\quad \left(1\right)$$

$$Recall=\frac{\text{relevant movies recommended}}{\text{all relevant movies}}\quad \left(2\right)$$

$$F1\text{-}score=\frac{2\cdot Precision\cdot Recall}{Precision+Recall}\quad \left(3\right)$$
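Equations (1)-(3) translate directly into code; the movie-ID lists below are hypothetical.

```python
def precision(relevant, recommended):
    """Eq. (1): relevant movies recommended / all movies recommended."""
    hits = len(set(relevant) & set(recommended))
    return hits / len(recommended)

def recall(relevant, recommended):
    """Eq. (2): relevant movies recommended / all relevant movies."""
    hits = len(set(relevant) & set(recommended))
    return hits / len(relevant)

def f1_score(p, r):
    """Eq. (3): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Toy example: 5 recommended movie IDs, 4 of which the user found relevant.
relevant = [10, 20, 30, 40, 50, 60]
recommended = [10, 20, 30, 40, 99]

p, r = precision(relevant, recommended), recall(relevant, recommended)
print(round(p, 2), round(r, 2), round(f1_score(p, r), 2))
```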
Fig. 13 shows the performance metrics for the proposed method. The proposed hybrid technique, which combines content-driven K-nearest neighbors (KNN), IKSOM, and K-means++ clustering, produced strong results: 91.41% accuracy, 93.09% precision, 93.82% recall, and a 92.44% F1-score. The approach successfully addressed the data sparsity and scalability issues that frequently arise in movie recommendation. The quality of the recommendations was also evaluated in terms of precision, recall, and Root Mean Squared Error (RMSE). As shown in the figure, the hybrid system attained an RMSE of 0.410, compared with prior RMSE values of 1.151 for user-based KNN, 1.044 for item-based KNN, 1.157 for Slope One, 1.143 for co-clustering, and 1.136 for NMF. A detailed comparison between the proposed approach and several existing techniques is given in Table 2.
Table 2
COMPARISON OF PROPOSED METHOD WITH OTHER EXISTING METHODS.
| Methods | RMSE | MAE |
| --- | --- | --- |
| User-based KNN [2] | 1.151 | 0.889 |
| Item-based KNN [6] | 1.044 | 0.819 |
| Slope One [8] | 1.157 | 0.893 |
| Co-clustering [10] | 1.143 | 0.893 |
| Non-negative matrix factorization (NMF) [12] | 1.136 | 0.894 |
| Proposed hybrid method | 0.410 | 0.256 |
The study above concludes that existing content-based filtering algorithms are unsuitable on their own, as they are laborious and inefficient when applied to diverse datasets. A hybrid strategy that combines content-based filtering with collaborative techniques is therefore proposed as a remedy. Combining matrix factorization with nearest-neighbor selection has been shown to be the most effective approach for rating prediction. Cosine similarity is used to measure user similarity based on rated films and to recommend movies favored by similar users, thereby resolving the cold-start problem. On the rating-prediction task, the proposed hybrid system outperforms previous approaches in both the overall and cold-start scenarios on key evaluation criteria such as RMSE and MAE. Its ability to learn complex patterns effectively yields high predictive accuracy. The proposed hybrid system thus maintains high accuracy, precision, recall, and F1-score while achieving low RMSE and MAE, and delivers top-N movie recommendations that are more accurate than those of existing methods.