Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning
November 18, 2025
Authors: Mingyue Cheng, Jie Ouyang, Shuo Yu, Ruiran Yan, Yucong Luo, Zirui Liu, Daoyu Wang, Qi Liu, Enhong Chen
cs.AI
Abstract
Large Language Models (LLMs) are increasingly being explored for building Agents capable of active environmental interaction (e.g., via tool use) to solve complex problems. Reinforcement Learning (RL) is considered a key technology with significant potential for training such Agents; however, the effective application of RL to LLM Agents is still in its nascent stages and faces considerable challenges. Currently, this emerging field lacks in-depth exploration of RL approaches specifically tailored to the LLM Agent context, and flexible, easily extensible training frameworks designed for this purpose are scarce. To help advance this area, this paper first revisits and clarifies RL methodologies for LLM Agents by systematically extending the Markov Decision Process (MDP) framework to comprehensively define the key components of an LLM Agent. Second, we introduce Agent-R1, a modular, flexible, and user-friendly training framework for RL-based LLM Agents, designed for straightforward adaptation across diverse task scenarios and interactive environments. We conducted experiments on multi-hop QA benchmark tasks, providing initial validation of the effectiveness of our proposed methods and framework.