
Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning

November 18, 2025
Authors: Mingyue Cheng, Jie Ouyang, Shuo Yu, Ruiran Yan, Yucong Luo, Zirui Liu, Daoyu Wang, Qi Liu, Enhong Chen
cs.AI

Abstract

Large Language Models (LLMs) are increasingly being explored for building agents that solve complex problems by actively interacting with their environment (e.g., through tool use). Reinforcement Learning (RL) is widely regarded as a promising technique for training such agents; however, applying RL effectively to LLM agents is still in its early stages and faces considerable challenges. This emerging field currently lacks in-depth study of RL approaches tailored to the LLM agent setting, as well as flexible, easily extensible training frameworks designed for this purpose. To help advance the area, this paper first revisits and clarifies RL methodologies for LLM agents by systematically extending the Markov Decision Process (MDP) framework to comprehensively define the key components of an LLM agent. Second, we introduce Agent-R1, a modular, flexible, and user-friendly training framework for RL-based LLM agents, designed to adapt easily to diverse task scenarios and interactive environments. Experiments on multi-hop question answering (QA) benchmarks provide initial validation of the effectiveness of the proposed methods and framework.
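To make the MDP framing concrete, here is a minimal sketch of one common way to cast multi-turn tool use as an episode. It is an illustration under stated assumptions, not the paper's formulation and not Agent-R1's API: every name (`AgentState`, `parse_action`, `rollout`, the `CALL`/`ANSWER` action format) is hypothetical. The state is the full context so far, each model turn is treated as one action (a real system may instead treat each generated token as an action), a tool call triggers an environment transition that appends an observation, and an exact-match outcome reward is assigned when the agent answers. The policy here is a scripted stand-in for an LLM, and no RL update is shown.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases: a policy maps the rendered context to the next action
# (one model turn); a tool maps a string argument to an observation string.
Policy = Callable[[str], str]
Tools = Dict[str, Callable[[str], str]]


@dataclass
class AgentState:
    """State s_t: everything the policy conditions on (hypothetical layout)."""
    question: str
    history: List[str] = field(default_factory=list)  # actions and observations, in order

    def render(self) -> str:
        return "\n".join([self.question, *self.history])


def parse_action(action: str) -> Tuple[str, str]:
    """Hypothetical action format: 'CALL <tool>: <arg>' or 'ANSWER: <text>'."""
    if action.startswith("ANSWER:"):
        return "answer", action[len("ANSWER:"):].strip()
    if action.startswith("CALL "):
        name, _, arg = action[len("CALL "):].partition(":")
        return name.strip(), arg.strip()
    return "invalid", action


def exact_match(prediction: str, gold: str) -> float:
    """Outcome reward: 1.0 if the normalized answer matches the gold answer."""
    return float(prediction.strip().lower() == gold.strip().lower())


def rollout(question: str, gold: str, policy: Policy, tools: Tools,
            max_turns: int = 4) -> Tuple[AgentState, float]:
    """Run one episode and return the final state and its terminal reward."""
    state = AgentState(question)
    for _ in range(max_turns):
        action = policy(state.render())          # a_t ~ pi(. | s_t)
        state.history.append(action)
        kind, payload = parse_action(action)
        if kind == "answer":
            return state, exact_match(payload, gold)  # terminal transition
        if kind in tools:
            observation = tools[kind](payload)         # environment step via tool
            state.history.append(f"OBSERVATION: {observation}")
        else:
            state.history.append("OBSERVATION: invalid action")
    return state, 0.0  # turn budget exhausted without an answer


# Toy two-hop example with a scripted policy standing in for an LLM.
kb = {
    "director of Inception": "Christopher Nolan",
    "birthplace of Christopher Nolan": "London",
}


def search(query: str) -> str:
    return kb.get(query, "no result")


script = iter([
    "CALL search: director of Inception",
    "CALL search: birthplace of Christopher Nolan",
    "ANSWER: London",
])


def scripted_policy(context: str) -> str:
    return next(script)


final_state, reward = rollout("Where was the director of Inception born?", "London",
                              scripted_policy, {"search": search})
print(reward)  # 1.0
```

Because the reward arrives only at the end of a multi-turn episode that interleaves model actions with tool observations, credit must flow back across several turns; this is one reason end-to-end RL for agents differs from single-turn RL fine-tuning.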
December 1, 2025