
R^2ec: Towards Large Recommender Models with Reasoning

May 22, 2025
Authors: Runyang You, Yongqi Li, Xinyu Lin, Xin Zhang, Wenjie Wang, Wenjie Li, Liqiang Nie
cs.AI

Abstract

Large recommender models have extended LLMs into powerful recommenders via encoding or item generation, and recent breakthroughs in LLM reasoning have in parallel motivated the exploration of reasoning in recommendation. Current studies usually position LLMs as external reasoning modules that yield auxiliary thoughts to augment conventional recommendation pipelines. However, such decoupled designs incur significant resource costs and suffer from suboptimal joint optimization. To address these issues, we propose R^2ec, a unified large recommender model with intrinsic reasoning capabilities. First, we reconceptualize the model architecture to facilitate interleaved reasoning and recommendation in the autoregressive process. We then propose RecPO, a corresponding reinforcement learning framework that optimizes R^2ec's reasoning and recommendation capabilities simultaneously in a single policy update. RecPO introduces a fused reward scheme that leverages recommendation labels alone to simulate the reasoning capability, eliminating any dependency on specialized reasoning annotations. Experiments on three datasets against various baselines verify the effectiveness of R^2ec, showing relative improvements of 68.67% in Hit@5 and 45.21% in NDCG@20. Code is available at https://github.com/YRYangang/RRec.
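Since the abstract is the only technical detail available here, the following is a minimal, hypothetical sketch of what a fused reward driven solely by recommendation labels could look like. The function name `fused_reward`, the `rank_weight` blend, and the reciprocal-rank shaping are illustrative assumptions, not the paper's actual RecPO implementation.

```python
from typing import List


def fused_reward(ranked_items: List[str], target_item: str,
                 k: int = 5, rank_weight: float = 0.5) -> float:
    """Score one rollout's recommendation list against a single label.

    Fuses a hard hit signal (was the target retrieved in the top-k?)
    with a soft rank signal (how high was it placed?), so that a
    policy-gradient update can distinguish near-misses from total
    misses without any reasoning annotations.
    """
    hit = 1.0 if target_item in ranked_items[:k] else 0.0
    if target_item in ranked_items:
        rank = ranked_items.index(target_item)  # 0-based position
        soft = 1.0 / (rank + 1)                 # reciprocal-rank shaping
    else:
        soft = 0.0
    return (1.0 - rank_weight) * hit + rank_weight * soft


# Example: the target sits at rank 2 of the generated list.
print(fused_reward(["item_42", "item_7", "item_13"], "item_7"))  # 0.75
```

In a single policy update, such a scalar would credit the whole interleaved rollout, reasoning tokens and item tokens alike, which is consistent with the abstract's claim that no separate reasoning reward is needed.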