ProRL Agent: Rollout-as-a-Service for RL Training of Multi-Turn LLM Agents
March 19, 2026
Authors: Hao Zhang, Mingjie Liu, Shaokun Zhang, Songyang Han, Jian Hu, Zhenghui Jin, Yuchi Zhang, Shizhe Diao, Ximing Lu, Binfeng Xu, Zhiding Yu, Jan Kautz, Yi Dong
cs.AI
Abstract
Multi-turn LLM agents are increasingly important for solving complex, interactive tasks, and reinforcement learning (RL) is a key ingredient for improving their long-horizon behavior. However, RL training requires generating large numbers of sandboxed rollout trajectories, and existing infrastructures often couple rollout orchestration with the training loop, making systems hard to migrate and maintain. Under the rollout-as-a-service philosophy, we present ProRL Agent, a scalable infrastructure that serves the full agentic rollout lifecycle through an API service. ProRL Agent also provides standardized and extensible sandbox environments that support diverse agentic tasks in rootless HPC settings. We validate ProRL Agent through RL training on software engineering, math, STEM, and coding tasks. ProRL Agent is open-sourced and integrated as part of NVIDIA NeMo Gym.
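To illustrate the rollout-as-a-service decoupling the abstract describes, the sketch below shows a trainer that only submits a request and receives a trajectory, while a service object owns the multi-turn environment loop. This is a minimal, hypothetical illustration: the class and field names (`RolloutRequest`, `RolloutService`, `Trajectory`, etc.) are invented for this example and are not the actual ProRL Agent API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RolloutRequest:
    """What a trainer would send to the rollout service (illustrative schema)."""
    task_id: str
    policy_endpoint: str   # where the service would query the policy for actions
    max_turns: int = 8

@dataclass
class Turn:
    action: str
    observation: str
    reward: float

@dataclass
class Trajectory:
    task_id: str
    turns: List[Turn] = field(default_factory=list)

class RolloutService:
    """Stand-in for a remote rollout-as-a-service endpoint: it owns the
    sandbox/environment loop, so the training loop never touches
    orchestration code and can be migrated independently."""

    def submit(self, req: RolloutRequest) -> Trajectory:
        traj = Trajectory(task_id=req.task_id)
        for t in range(req.max_turns):
            # In a real service these would come from the policy endpoint
            # and a sandboxed environment; here they are stubbed out.
            action = f"step-{t}"
            observation = f"sandbox-obs-{t}"
            done = t == req.max_turns - 1
            traj.turns.append(Turn(action, observation, reward=1.0 if done else 0.0))
            if done:
                break
        return traj

# Trainer side: pure request/response, no environment or sandbox code.
service = RolloutService()
traj = service.submit(RolloutRequest(task_id="swe-001",
                                     policy_endpoint="http://policy:8000",
                                     max_turns=3))
print(len(traj.turns), sum(t.reward for t in traj.turns))
```

The point of the sketch is the interface boundary: because the trainer sees only `submit()`, the rollout side can be scaled, sandboxed, or replaced without changing the training loop, which is the portability claim the abstract makes.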