**ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents**

Abstract

Multi-turn LLM agents are increasingly important for solving complex, interactive tasks, and reinforcement learning (RL) is a key ingredient for improving their long-horizon behavior. However, RL training requires generating large numbers of sandboxed rollout trajectories, and existing infrastructures often couple rollout orchestration with the training loop, which makes these systems hard to migrate and maintain. Following the rollout-as-a-service philosophy, we present ProRL Agent, a scalable infrastructure that serves the full agentic rollout lifecycle through an API service. ProRL Agent also provides standardized and extensible sandbox environments that support diverse agentic tasks in rootless HPC settings. We validate ProRL Agent through RL training on software engineering, math, STEM, and coding tasks. ProRL Agent is open-sourced and integrated as part of NVIDIA NeMo Gym.