

Agent Lightning: Train ANY AI Agents with Reinforcement Learning

August 5, 2025
Authors: Xufang Luo, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, Yuqing Yang
cs.AI

Abstract

We present Agent Lightning, a flexible and extensible framework that enables Reinforcement Learning (RL)-based training of Large Language Models (LLMs) for any AI agent. Unlike existing methods that tightly couple RL training with the agent or rely on sequence concatenation with masking, Agent Lightning achieves complete decoupling between agent execution and training, allowing seamless integration with existing agents developed in diverse ways (e.g., using frameworks like LangChain, OpenAI Agents SDK, and AutoGen, or built from scratch) with almost ZERO code modifications. By formulating agent execution as a Markov decision process, we define a unified data interface and propose a hierarchical RL algorithm, LightningRL, which contains a credit assignment module, allowing us to decompose trajectories generated by ANY agent into training transitions. This enables RL to handle complex interaction logic, such as multi-agent scenarios and dynamic workflows. For the system design, we introduce a Training-Agent Disaggregation architecture and bring agent observability frameworks into the agent runtime, providing a standardized agent finetuning interface. Experiments across text-to-SQL, retrieval-augmented generation, and math tool-use tasks demonstrate stable, continuous improvements, showcasing the framework's potential for real-world agent training and deployment.
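The core idea of the data interface can be illustrated with a minimal sketch: each LLM call inside an agent run becomes an RL transition, and a full trajectory is decomposed into a list of such transitions. All names below are hypothetical, and the broadcast-style credit assignment is a simplifying stand-in for LightningRL's actual credit assignment module, whose details the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class Transition:
    """One LLM call recast as an RL transition (state, action, reward)."""
    state: str      # the prompt/context the LLM saw at this call
    action: str     # the text the LLM produced
    reward: float   # credit assigned to this call

def decompose(calls: list[dict[str, Any]], final_reward: float) -> list[Transition]:
    """Split a recorded trajectory of LLM calls into training transitions.

    Credit assignment here simply broadcasts the episode-level reward to
    every call; the paper's hierarchical scheme is more elaborate.
    """
    return [
        Transition(state=c["prompt"], action=c["completion"], reward=final_reward)
        for c in calls
    ]

# Example: a two-step tool-use episode that ended in success (reward 1.0).
trajectory = [
    {"prompt": "Q: 2+2? Use the calculator tool.", "completion": "calc(2+2)"},
    {"prompt": "Tool returned 4. Final answer:", "completion": "4"},
]
transitions = decompose(trajectory, final_reward=1.0)
print(len(transitions))  # 2
```

Because the agent only needs to log its prompts and completions (e.g., via an observability hook), the executing agent framework never has to know about the trainer, which is what makes the near-zero-modification integration plausible.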