RecGPT-V2 Technical Report

Abstract

Large language models (LLMs) have demonstrated remarkable potential in transforming recommender systems from implicit behavioral pattern matching to explicit intent reasoning. While RecGPT-V1 successfully pioneered this paradigm by integrating LLM-based reasoning into user interest mining and item tag prediction, it suffers from four fundamental limitations: (1) computational inefficiency and cognitive redundancy across multiple reasoning routes; (2) insufficient explanation diversity in fixed-template generation; (3) limited generalization under supervised learning paradigms; and (4) simplistic outcome-focused evaluation that fails to match human standards. To address these challenges, we present RecGPT-V2 with four key innovations. First, a Hierarchical Multi-Agent System restructures intent reasoning through coordinated collaboration, eliminating cognitive duplication while enabling diverse intent coverage. Combined with Hybrid Representation Inference that compresses user-behavior contexts, our framework reduces GPU consumption by 60% and improves exclusive recall from 9.39% to 10.99%. Second, a Meta-Prompting framework dynamically generates contextually adaptive prompts, improving explanation diversity by 7.3%. Third, constrained reinforcement learning mitigates multi-reward conflicts, achieving a 24.1% improvement in tag prediction and a 13.0% improvement in explanation acceptance. Fourth, an Agent-as-a-Judge framework decomposes assessment into multi-step reasoning, improving human preference alignment. Online A/B tests on Taobao demonstrate significant improvements: +2.98% CTR, +3.71% IPV, +2.19% TV, and +11.46% NER. RecGPT-V2 establishes both the technical feasibility and commercial viability of deploying LLM-powered intent reasoning at scale, bridging the gap between cognitive exploration and industrial utility.