Refining Contrastive Learning and Homography Relations for Multi-Modal Recommendation
August 19, 2025
Authors: Shouxing Ma, Yawen Zeng, Shiqing Wu, Guandong Xu
cs.AI
Abstract
Multi-modal recommender systems focus on utilizing the rich modal information
(e.g., images and textual descriptions) of items to improve recommendation
performance. Current methods have achieved remarkable success thanks to the
powerful structure-modeling capability of graph neural networks. However, these
methods are often hindered by sparse data in real-world scenarios. Although
contrastive learning and homography (i.e., homogeneous graphs) are employed to
address the data sparsity challenge, existing methods still suffer from two main
limitations: 1) simple multi-modal feature contrasts fail to produce effective
representations, leaving noise in modal-shared features and losing valuable
information in modal-unique features; 2) insufficient exploration of the
homography relations between user interests and item co-occurrence results in
incomplete mining of user-item interplay.
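For readers unfamiliar with the baseline that limitation 1 criticizes: a "simple multi-modal feature contrast" is commonly implemented as an InfoNCE loss that treats an item's image and text embeddings as a positive pair and the other items in the batch as negatives. The pure-Python sketch below assumes that formulation; the function names and the temperature `tau = 0.2` are illustrative choices, not details taken from REARM.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(image: List[Vector], text: List[Vector], tau: float = 0.2) -> float:
    """Image-to-text InfoNCE over a batch of items (illustrative sketch).

    Row i of `image` and row i of `text` describe the same item (positive pair);
    the text views of all other items in the batch act as negatives.
    """
    n = len(image)
    total = 0.0
    for i in range(n):
        logits = [cosine(image[i], text[j]) / tau for j in range(n)]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / n


# Usage: matched views give a much lower loss than mismatched ones.
aligned = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Because this loss only rewards agreement between the two views, nothing in it explicitly separates shared signal from shared noise or protects information that appears in only one modality, which is the gap REARM's refinements target.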
To address the above limitations, we propose a novel framework for
REfining multi-modAl contRastive learning
and hoMography relations (REARM). Specifically, we complement
multi-modal contrastive learning by employing meta-network and orthogonal
constraint strategies, which filter out noise in modal-shared features and
retain recommendation-relevant information in modal-unique features. To mine
homogeneous relationships effectively, we integrate a newly constructed user
interest graph and an item co-occurrence graph with the existing user
co-occurrence and item semantic graphs for graph learning. Extensive
experiments on three real-world datasets demonstrate the superiority of REARM
over various state-of-the-art baselines. Our visualizations further show that
REARM better distinguishes between modal-shared and modal-unique features.
Code is available at https://github.com/MrShouxingMa/REARM.
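The abstract does not give REARM's formulas for its meta-network or orthogonal constraint. The sketch below shows one common reading of each, under stated assumptions. The orthogonal constraint is taken as a penalty on the squared dot product between an item's modal-shared and modal-unique vectors. The meta-network is taken as a small network that generates item-specific parameters, here a per-dimension sigmoid gate that scales modal-shared features. All names (`orthogonal_penalty`, `meta_gate`, `filter_shared`, `meta_weights`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthogonal_penalty(shared: List[Vector], unique: List[Vector]) -> float:
    """Mean squared dot product between each item's modal-shared and modal-unique
    vectors; it is zero exactly when every pair is orthogonal."""
    return sum(dot(s, u) ** 2 for s, u in zip(shared, unique)) / len(shared)


def meta_gate(context: Vector, meta_weights: List[Vector]) -> Vector:
    """Hypothetical meta-network: maps an item-specific context vector to one
    sigmoid gate value in (0, 1) per feature dimension."""
    return [1.0 / (1.0 + math.exp(-dot(w, context))) for w in meta_weights]


def filter_shared(shared: Vector, context: Vector, meta_weights: List[Vector]) -> Vector:
    """Scale each modal-shared dimension by its generated gate
    (a gate near 0 suppresses that dimension as presumed noise)."""
    gate = meta_gate(context, meta_weights)
    return [g * x for g, x in zip(gate, shared)]


# Usage
orth = orthogonal_penalty([[1.0, 0.0]], [[0.0, 1.0]])     # 0.0: orthogonal pair
overlap = orthogonal_penalty([[1.0, 0.0]], [[1.0, 1.0]])  # 1.0: dot product is 1
filtered = filter_shared([2.0, 2.0], [1.0], [[0.0], [100.0]])  # [1.0, 2.0]
```

In training, a penalty like this would typically be added, with a weighting hyperparameter, to the recommendation and contrastive losses; that weighting scheme is likewise an assumption rather than something the abstract states.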
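The abstract names four homogeneous graphs (user co-occurrence, user interest, item co-occurrence, item semantic) without describing how they are built. This sketch assumes a common construction for co-occurrence graphs, which the source does not confirm. Two items are linked when the same user interacted with both, and each node keeps its top-k neighbors by co-occurrence count. Running the same routine on (item, user) pairs yields a user co-occurrence graph. The function `cooccurrence_topk` and its top-k rule are illustrative. The sketch does not attempt the user interest or item semantic graphs, because the abstract leaves their construction unspecified.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def cooccurrence_topk(pairs: Iterable[Tuple[int, int]], k: int) -> Dict[int, List[int]]:
    """Link two nodes when they co-occur under the same anchor (e.g., two items
    interacted with by the same user) and keep each node's k most frequent
    co-occurring neighbors. Input pairs are (anchor, node)."""
    nodes_by_anchor: Dict[int, Set[int]] = defaultdict(set)
    for anchor, node in pairs:
        nodes_by_anchor[anchor].add(node)

    # Count, for every node pair, how many anchors they share.
    counts: Dict[int, Counter] = defaultdict(Counter)
    for nodes in nodes_by_anchor.values():
        for a, b in combinations(sorted(nodes), 2):
            counts[a][b] += 1
            counts[b][a] += 1

    # Nodes with no co-occurring partner never enter `counts` and are omitted.
    return {node: [nbr for nbr, _ in c.most_common(k)] for node, c in counts.items()}


# Usage: (user, item) interactions give an item graph; swapping gives a user graph.
interactions = [(0, 10), (0, 11), (1, 10), (1, 11), (1, 12), (2, 12), (2, 13)]
item_graph = cooccurrence_topk(interactions, k=1)
user_graph = cooccurrence_topk([(i, u) for u, i in interactions], k=1)
```

Keeping only the top-k neighbors bounds each node's degree, which keeps message passing over the resulting graph sparse.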