The surge of short video content across platforms has established short videos as a popular new form of media. However, the sheer volume and complexity of short video data make effective video recommendation challenging. Short videos carry rich multimodal information across both temporal and spatial dimensions, and users engage with them in varied ways: focusing on the content of a particular shot, following the storyline, or enjoying the accompanying music. Conventional video recommendation systems typically recommend a single type of content, namely entire videos, which may not fully satisfy these nuanced user preferences. In multimodal data, each modality carries information that complements the others, creating correlations between them. For video data, which combines image, speech, and text, understanding the relationships among these three media types is crucial for effective multimodal content-based video recommendation. In this paper, we exploit the consistency of multimodal features to understand multimedia content, aiming to derive a robust representation from the inherent characteristics of short videos. Unlike previous studies that concentrate primarily on a single modality in short video recommendation, our approach capitalizes on the multimodality of short video content and adopts a multimodal recommendation strategy. By extracting and fusing information from multiple modalities, we obtain a more comprehensive analysis of short video content, which forms the basis of our recommendation method.
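To make the extract-and-fuse idea concrete, the following is a minimal late-fusion sketch: per-modality feature vectors (image, speech, text) are projected into a shared space and averaged into one video representation. All dimensions, the random projections, and the averaging strategy are illustrative assumptions, not the method proposed in this paper.

```python
import numpy as np

# Hypothetical feature dimensions for each modality (illustrative values only).
DIMS = {"image": 512, "speech": 128, "text": 300}
FUSED_DIM = 64

rng = np.random.default_rng(0)

# One linear projection per modality, mapping its features into a shared space.
# In practice these would be learned; random matrices stand in for them here.
projections = {
    m: rng.standard_normal((d, FUSED_DIM)) / np.sqrt(d)
    for m, d in DIMS.items()
}

def fuse(features):
    """Project each modality into the shared space and average the results.

    Averaging is one simple fusion strategy; concatenation or attention-based
    weighting are common alternatives.
    """
    projected = [features[m] @ projections[m] for m in DIMS]
    fused = np.mean(projected, axis=0)
    # L2-normalise so fused representations are comparable across videos.
    return fused / np.linalg.norm(fused)

# Toy per-modality features for one short video.
video = {m: rng.standard_normal(d) for m, d in DIMS.items()}
rep = fuse(video)
print(rep.shape)  # (64,)
```

The fused vector can then be compared across videos (e.g. by cosine similarity) to drive content-based recommendation.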