Neighbor Similarity and Multimodal Alignment based Product Recommendation Study
Proceedings of the Fortieth Conference on Uncertainty in Artificial Intelligence, PMLR 244:4209-4218, 2024.
Abstract
Existing multimodal recommendation research still faces challenges, such as failing to fully mine the implicit relevance information of neighbor nodes and allocating weights unreasonably to imbalanced nodes. To address these challenges, this paper introduces a new multimodal recommendation model called NSMAR+. First, the model constructs a neighbor similarity graph convolutional network to capture implicit relevance information and assigns attention weights through a graph attention mechanism. Second, the model introduces a modal alignment and fusion mechanism that uses a multilayer perceptron (MLP) to map image and text features into a shared space for comparison and fusion. In addition, the model constructs a user co-interaction graph and an item semantic graph from the original information and performs graph convolution on them to enhance the preference information of users and items and to better capture the interactions between users and items and their internal features. Finally, an MLP aggregates the user and item representations and predicts personalized recommendation rankings. To validate the model's effectiveness, this paper compares it with several leading multimodal recommendation models on public datasets, where it achieves a performance improvement of 1% to 3%. The experimental results indicate that the proposed model is more accurate than these baselines in multimodal recommendation tasks.
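The modal alignment and fusion step can be illustrated with a minimal sketch. It is not the NSMAR+ implementation: the abstract does not specify layer sizes, the comparison function, or the fusion rule, so the two-layer ReLU MLPs, the cosine similarity, the mean fusion, the feature dimensions, and all function names below are assumptions chosen for illustration, and the weights are random rather than trained.

```python
import math
import random


def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Return random (untrained) weights for a two-layer MLP. Hypothetical helper."""
    def layer(rows, cols):
        scale = 1.0 / math.sqrt(cols)
        return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]
    return layer(hidden_dim, in_dim), layer(out_dim, hidden_dim)


def matvec(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def mlp_forward(params, vec):
    """Apply linear -> ReLU -> linear to map a feature vector into the shared space."""
    w1, w2 = params
    hidden = [max(0.0, h) for h in matvec(w1, vec)]
    return matvec(w2, hidden)


def cosine(a, b):
    """Cosine similarity, used here as the (assumed) comparison in the shared space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def align_and_fuse(image_feat, text_feat, image_mlp, text_mlp):
    """Project both modalities into one space, compare them, and fuse by averaging."""
    img = mlp_forward(image_mlp, image_feat)
    txt = mlp_forward(text_mlp, text_feat)
    similarity = cosine(img, txt)
    fused = [(i + t) / 2.0 for i, t in zip(img, txt)]  # assumed fusion rule
    return fused, similarity


# Toy item with an 8-dim image feature and a 5-dim text feature (made-up sizes).
rng = random.Random(0)
image_mlp = make_mlp(in_dim=8, hidden_dim=6, out_dim=4, rng=rng)
text_mlp = make_mlp(in_dim=5, hidden_dim=6, out_dim=4, rng=rng)
image_feat = [rng.random() for _ in range(8)]
text_feat = [rng.random() for _ in range(5)]
fused, similarity = align_and_fuse(image_feat, text_feat, image_mlp, text_mlp)
```

In a trained model the similarity between the projected image and text vectors would typically feed an alignment loss, so that the two modalities of the same item end up close together in the shared space before fusion.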