CM^3: Calibrating Multimodal Recommendation

August 2, 2025
Authors: Xin Zhou, Yongjie Wang, Zhiqi Shen
cs.AI

Abstract

Alignment and uniformity are fundamental principles within the domain of contrastive learning. In recommender systems, prior work has established that optimizing the Bayesian Personalized Ranking (BPR) loss contributes to the objectives of alignment and uniformity. Specifically, alignment aims to draw together the representations of interacting users and items, while uniformity mandates a uniform distribution of user and item embeddings across a unit hypersphere. This study revisits the alignment and uniformity properties within the context of multimodal recommender systems, revealing a proclivity among extant models to prioritize uniformity to the detriment of alignment. Our hypothesis challenges the conventional assumption of equitable item treatment through a uniformity loss, proposing a more nuanced approach wherein items with similar multimodal attributes converge toward proximal representations within the hyperspherical manifold. Specifically, we leverage the inherent similarity between items' multimodal data to calibrate their uniformity distribution, thereby inducing a more pronounced repulsive force between dissimilar entities within the embedding space. A theoretical analysis elucidates the relationship between this calibrated uniformity loss and the conventional uniformity function. Moreover, to enhance the fusion of multimodal features, we introduce a Spherical Bézier method designed to integrate an arbitrary number of modalities while ensuring that the resulting fused features are constrained to the same hyperspherical manifold. Empirical evaluations conducted on five real-world datasets substantiate the superiority of our approach over competing baselines. We also show that the proposed method can achieve up to a 5.4% increase in NDCG@20 performance via the integration of MLLM-extracted features. Source code is available at: https://github.com/enoche/CM3.
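
The abstract names three ingredients: an alignment term, a uniformity term, and a fusion step that stays on the hypersphere. The sketch below illustrates them in plain Python under stated assumptions; it is not the authors' implementation. The alignment and uniformity functions follow the commonly used formulation from the contrastive learning literature (mean squared distance of positive pairs, and the log of the mean Gaussian potential with temperature `t = 2`). The `spherical_bezier` function assumes that a Bézier curve on the sphere can be evaluated by de Casteljau recursion with spherical linear interpolation (slerp) replacing linear interpolation. All function names, the choice of `t`, and the toy vectors are hypothetical, and the paper's calibrated uniformity loss is not reproduced here.

```python
import math

def normalize(v):
    """Scale a vector to unit length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def alignment_loss(users, items):
    """Mean squared distance between embeddings of interacting (user, item) pairs."""
    return sum(sq_dist(u, i) for u, i in zip(users, items)) / len(users)

def uniformity_loss(points, t=2.0):
    """Log of the mean Gaussian potential over distinct pairs (lower = more uniform)."""
    n = len(points)
    total = sum(math.exp(-t * sq_dist(points[a], points[b]))
                for a in range(n) for b in range(a + 1, n))
    return math.log(total / (n * (n - 1) / 2))

def slerp(p, q, t):
    """Spherical linear interpolation between unit vectors p and q."""
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(p, q))))
    omega = math.acos(dot)
    s = math.sin(omega)
    if s < 1e-8:
        # Nearly identical or antipodal points: interpolation is degenerate,
        # so this sketch simply falls back to p.
        return list(p)
    w_p = math.sin((1 - t) * omega) / s
    w_q = math.sin(t * omega) / s
    return [w_p * x + w_q * y for x, y in zip(p, q)]

def spherical_bezier(control_points, t):
    """De Casteljau recursion with slerp in place of linear interpolation
    (an assumed construction, not necessarily the paper's exact method)."""
    pts = [normalize(p) for p in control_points]
    while len(pts) > 1:
        pts = [slerp(pts[k], pts[k + 1], t) for k in range(len(pts) - 1)]
    return pts[0]

# Toy example: fuse three hypothetical modality embeddings of one item.
modalities = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
fused = spherical_bezier(modalities, 0.5)
fused_norm = math.sqrt(sum(x * x for x in fused))  # stays (numerically) at 1
```

Because every slerp step returns a unit vector, the fused feature remains on the same unit hypersphere as its inputs regardless of how many modalities are supplied, which is the property the abstract attributes to the fusion step.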