R^2ec: Towards Large Recommender Models with Reasoning

Abstract

Large recommender models have extended LLMs (large language models) into powerful recommenders via encoding or item generation, and recent breakthroughs in LLM reasoning have in turn motivated the exploration of reasoning in recommendation. Current studies usually position LLMs as external reasoning modules that generate auxiliary thoughts to augment conventional recommendation pipelines. However, such decoupled designs incur significant resource costs and make joint optimization of reasoning and recommendation suboptimal. To address these issues, we propose R^2ec, a unified large recommender model with intrinsic reasoning capabilities. First, we reconceptualize the model architecture to interleave reasoning and recommendation within the autoregressive process. Second, we propose RecPO, a corresponding reinforcement learning framework that optimizes both the reasoning and the recommendation capabilities of R^2ec simultaneously in a single policy update. RecPO introduces a fused reward scheme that leverages only recommendation labels to simulate the reasoning capability, eliminating the dependency on specialized reasoning annotations. Experiments on three datasets against various baselines verify the effectiveness of R^2ec, showing relative improvements of 68.67% in Hit@5 and 45.21% in NDCG@20. Code is available at https://github.com/YRYangang/RRec.
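For readers unfamiliar with the reported metrics, the following is a minimal sketch of how Hit@K, NDCG@K, and a relative improvement are commonly computed. It assumes a single ground-truth item per user (the usual next-item setting); the abstract does not state the evaluation protocol, so this setting, the function names, and the example values are illustrative assumptions and are not taken from the R^2ec codebase.

```python
import math


def hit_at_k(ranked_items, target, k):
    """Return 1.0 if the target item is among the top-k ranked items, else 0.0."""
    return 1.0 if target in ranked_items[:k] else 0.0


def ndcg_at_k(ranked_items, target, k):
    """NDCG@k assuming exactly one relevant item (an assumption of this sketch).

    With one relevant item the ideal DCG is 1, so NDCG reduces to
    1 / log2(rank + 1), where rank is the 1-based position of the target.
    """
    top_k = ranked_items[:k]
    if target not in top_k:
        return 0.0
    rank = top_k.index(target) + 1
    return 1.0 / math.log2(rank + 1)


def relative_improvement(new_score, baseline_score):
    """Relative improvement of new_score over baseline_score, in percent."""
    return (new_score - baseline_score) / baseline_score * 100.0


# Hypothetical ranking for one user; the item IDs are made up for illustration.
ranking = ["item_7", "item_3", "item_9", "item_1"]
hit = hit_at_k(ranking, "item_3", k=5)
ndcg = ndcg_at_k(ranking, "item_3", k=20)
```

In practice both metrics are averaged over all test users, and the relative improvement is then computed between the averaged scores of two models.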
