Refining Contrastive Learning and Homography Relations for Multi-Modal Recommendation
August 19, 2025
Authors: Shouxing Ma, Yawen Zeng, Shiqing Wu, Guandong Xu
cs.AI
Abstract
Multi-modal recommender systems use the rich modal information of items
(e.g., images and textual descriptions) to improve recommendation
performance. Current methods have achieved remarkable success thanks to the
powerful structure modeling capability of graph neural networks. However, these
methods are often hindered by sparse data in real-world scenarios. Although
contrastive learning and homography (i.e., homogeneous graphs) are employed to
address the data sparsity challenge, existing methods still suffer from two main
limitations: 1) Simple multi-modal feature contrasts fail to produce effective
representations, causing noisy modal-shared features and loss of valuable
information in modal-unique features; 2) The lack of exploration of the
homography relations between user interests and item co-occurrence results in
incomplete mining of user-item interplay.
To address the above limitations, we propose a novel framework for
REfining multi-modAl contRastive learning and hoMography relations (REARM).
Specifically, we complement multi-modal contrastive learning by employing
meta-network and orthogonal constraint strategies, which filter out noise in
modal-shared features and retain recommendation-relevant information in
modal-unique features. To mine homogeneous relationships effectively, we
integrate a newly constructed user interest graph and an item co-occurrence
graph with the existing user co-occurrence and item semantic graphs for graph
learning. Extensive experiments on three real-world datasets demonstrate the
superiority of REARM over various state-of-the-art baselines. Our visualization
further shows that REARM better distinguishes modal-shared from modal-unique
features. Code is available at https://github.com/MrShouxingMa/REARM.
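
The abstract names an item co-occurrence graph and an orthogonal constraint without giving their exact form. The following is a minimal illustrative sketch in Python with NumPy, not the authors' implementation. It assumes that co-occurrence is counted as the number of users who interacted with both items, that each item keeps only its `top_k` strongest neighbors, and that the orthogonal constraint is a squared Frobenius norm penalty on the cross-product of shared and unique features. The function names, the `top_k` parameter, and both formulas are assumptions made for illustration only.

```python
import numpy as np


def item_cooccurrence_graph(interactions: np.ndarray, top_k: int = 2) -> np.ndarray:
    """Build a weighted item-item adjacency matrix (hypothetical helper).

    interactions: binary matrix of shape (num_users, num_items).
    Assumption: the edge weight is the number of users who interacted
    with both items, and each item keeps at most top_k neighbors.
    """
    # counts[i, j] = number of users who interacted with both item i and item j
    counts = interactions.T @ interactions
    np.fill_diagonal(counts, 0)  # an item is not its own neighbor

    adjacency = np.zeros_like(counts)
    for item in range(counts.shape[0]):
        # Stable sort so ties are broken by the lower item index.
        neighbors = np.argsort(-counts[item], kind="stable")[:top_k]
        # Drop neighbors that never co-occur with this item.
        neighbors = neighbors[counts[item, neighbors] > 0]
        adjacency[item, neighbors] = counts[item, neighbors]
    return adjacency


def orthogonality_penalty(shared: np.ndarray, unique: np.ndarray) -> float:
    """Squared Frobenius norm of shared^T @ unique (assumed form).

    shared, unique: feature matrices of shape (batch, dim).
    The value is zero when every shared dimension is orthogonal to
    every unique dimension across the batch.
    """
    return float(np.sum((shared.T @ unique) ** 2))
```

With three users and three items, `item_cooccurrence_graph(np.array([[1, 1, 0], [1, 1, 1], [0, 0, 1]]), top_k=1)` links items 0 and 1 to each other with weight 2, because two users interacted with both. A penalty of this kind would typically be added to the training loss with a small weight; whether REARM does so in this form is not stated in the abstract.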