

RecGOAT: Graph Optimal Adaptive Transport for LLM-Enhanced Multimodal Recommendation with Dual Semantic Alignment

January 31, 2026
Authors: Yuecheng Li, Hengwei Ju, Zeyu Song, Wei Yang, Chi Lu, Peng Jiang, Kun Gai
cs.AI

Abstract

Multimodal recommendation systems typically integrate user behavior with multimodal item data, thereby capturing user preferences more accurately. With the rise of large models (LMs), multimodal recommendation increasingly leverages their strengths in semantic understanding and contextual reasoning. However, LM representations are inherently optimized for general semantic tasks, while recommendation models rely heavily on sparse user/item unique identity (ID) features. Existing work overlooks this fundamental representational divergence between large models and recommendation systems, resulting in incompatible multimodal representations and suboptimal recommendation performance. To bridge this gap, we propose RecGOAT, a novel yet simple dual semantic alignment framework for LLM-enhanced multimodal recommendation with theoretically guaranteed alignment capability. RecGOAT first employs graph attention networks to enrich collaborative semantics by modeling item-item, user-item, and user-user relationships, leveraging user/item LM representations and interaction history. Furthermore, we design a dual-granularity progressive multimodality-ID alignment framework, which achieves instance-level and distribution-level semantic alignment via cross-modal contrastive learning (CMCL) and optimal adaptive transport (OAT), respectively. Theoretically, we demonstrate that the unified representations derived from our alignment framework exhibit superior semantic consistency and comprehensiveness. Extensive experiments on three public benchmarks show that RecGOAT achieves state-of-the-art performance, empirically validating our theoretical insights. Additionally, deployment on a large-scale online advertising platform confirms the model's effectiveness and scalability in industrial recommendation scenarios. Code available at https://github.com/6lyc/RecGOAT-LLM4Rec.
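The two alignment granularities described above can be illustrated with a minimal sketch: instance-level alignment via a standard InfoNCE-style contrastive loss between LM-derived multimodal embeddings and ID embeddings, and distribution-level alignment via entropic optimal transport solved with Sinkhorn iterations. This is a generic illustration of the named techniques, not the paper's implementation; the function names, the choice of InfoNCE, and the plain Sinkhorn solver are assumptions (the paper's OAT is an adaptive variant not detailed in the abstract).

```python
import numpy as np

def cross_modal_infonce(z_mm: np.ndarray, z_id: np.ndarray, tau: float = 0.07) -> float:
    """Instance-level alignment: contrast each item's multimodal (LM) embedding
    with its own ID embedding against all other ID embeddings in the batch."""
    a = z_mm / np.linalg.norm(z_mm, axis=1, keepdims=True)   # L2-normalize rows
    b = z_id / np.linalg.norm(z_id, axis=1, keepdims=True)
    logits = a @ b.T / tau                                    # cosine similarities / temperature
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))                # matched pairs on the diagonal

def sinkhorn_plan(cost: np.ndarray, eps: float = 0.1, n_iter: int = 200) -> np.ndarray:
    """Distribution-level alignment: entropy-regularized optimal transport between
    the two embedding distributions (uniform marginals), via Sinkhorn scaling."""
    n, m = cost.shape
    r, c = np.full(n, 1.0 / n), np.full(m, 1.0 / m)           # uniform marginals
    K = np.exp(-cost / eps)                                   # Gibbs kernel
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                                   # alternating marginal projections
        u = r / (K @ v)
        v = c / (K.T @ u)
    return u[:, None] * K * v[None, :]                        # transport plan P
```

A batched training loop would minimize `cross_modal_infonce` jointly with the transport cost `(plan * cost).sum()`, where `cost` holds pairwise distances between the two embedding sets.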
February 5, 2026