CM^3: Calibrating Multimodal Recommendation

Abstract

Alignment and uniformity are fundamental principles within the domain of contrastive learning. In recommender systems, prior work has established that optimizing the Bayesian Personalized Ranking (BPR) loss contributes to the objectives of alignment and uniformity. Specifically, alignment aims to draw together the representations of interacting users and items, while uniformity mandates a uniform distribution of user and item embeddings across a unit hypersphere. This study revisits the alignment and uniformity properties within the context of multimodal recommender systems, revealing a proclivity among extant models to prioritize uniformity to the detriment of alignment. Our hypothesis challenges the conventional assumption of equitable item treatment through a uniformity loss, proposing a more nuanced approach wherein items with similar multimodal attributes converge toward proximal representations within the hyperspherical manifold. Specifically, we leverage the inherent similarity between items' multimodal data to calibrate their uniformity distribution, thereby inducing a more pronounced repulsive force between dissimilar entities within the embedding space. A theoretical analysis elucidates the relationship between this calibrated uniformity loss and the conventional uniformity function. Moreover, to enhance the fusion of multimodal features, we introduce a Spherical Bézier method designed to integrate an arbitrary number of modalities while ensuring that the resulting fused features are constrained to the same hyperspherical manifold. Empirical evaluations conducted on five real-world datasets substantiate the superiority of our approach over competing baselines. We also show that the proposed methods can achieve up to a 5.4% increase in NDCG@20 performance via the integration of MLLM-extracted features. Source code is available at: https://github.com/enoche/CM3.
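
For reference, the conventional alignment and uniformity objectives mentioned above can be sketched in NumPy as follows. This is a minimal illustration of the standard definitions from the contrastive learning literature, not the calibrated uniformity loss proposed in this work. The function names, the temperature t = 2, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Project each row onto the unit hypersphere."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def alignment_loss(users: np.ndarray, items: np.ndarray) -> float:
    """Mean squared distance between interacting (user, item) pairs.

    Row k of `users` and row k of `items` form one observed interaction.
    """
    u, v = l2_normalize(users), l2_normalize(items)
    return float(np.mean(np.sum((u - v) ** 2, axis=1)))


def uniformity_loss(embeddings: np.ndarray, t: float = 2.0) -> float:
    """Log of the mean Gaussian potential over all distinct pairs.

    Lower values mean the points are spread more evenly over the sphere.
    """
    x = l2_normalize(embeddings)
    # Pairwise squared Euclidean distances via broadcasting.
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(len(x), k=1)  # count each unordered pair once
    return float(np.log(np.mean(np.exp(-t * sq_dists[upper]))))


if __name__ == "__main__":
    # Toy data (hypothetical): items are noisy copies of their users.
    rng = np.random.default_rng(0)
    users = rng.normal(size=(8, 16))
    items = users + 0.1 * rng.normal(size=(8, 16))
    print("alignment:", alignment_loss(users, items))
    print("uniformity:", uniformity_loss(items))
```

In this sketch, a smaller alignment value means interacting users and items sit closer together, while a more negative uniformity value means the embeddings are more evenly spread. A model that favors uniformity at the expense of alignment would show a low uniformity value alongside a comparatively high alignment value.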
