Agent Lightning: Train ANY AI Agents with Reinforcement Learning
August 5, 2025
Authors: Xufang Luo, Yuge Zhang, Zhiyuan He, Zilong Wang, Siyun Zhao, Dongsheng Li, Luna K. Qiu, Yuqing Yang
cs.AI
Abstract
We present Agent Lightning, a flexible and extensible framework that enables
Reinforcement Learning (RL)-based training of Large Language Models (LLMs) for
any AI agent. Unlike existing methods that tightly couple RL training with
the agent or rely on sequence concatenation with masking, Agent Lightning
achieves complete decoupling between agent execution and training, allowing
seamless integration with existing agents developed in diverse ways (e.g.,
with frameworks such as LangChain, OpenAI Agents SDK, and AutoGen, or built
from scratch) with almost ZERO code modifications. By formulating agent
execution as a Markov decision process (MDP), we define a unified data
interface and propose a hierarchical RL algorithm, LightningRL, which contains
a credit assignment module that decomposes trajectories generated by ANY agent
into training transitions. This enables RL to handle complex interaction
logic, such as multi-agent scenarios and dynamic workflows. For the system
design, we introduce a Training-Agent Disaggregation architecture and bring
agent observability frameworks into the agent runtime, providing a
standardized agent finetuning interface. Experiments across text-to-SQL, retrieval-augmented
generation, and math tool-use tasks demonstrate stable, continuous
improvements, showcasing the framework's potential for real-world agent
training and deployment.
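The central idea of the abstract is that any agent run can be read as an MDP episode, regardless of how the agent was built. In that episode, each LLM call is one action. A trajectory can therefore be cut into independent single-turn transitions rather than treated as one long masked sequence. The Python sketch below illustrates only that decomposition step. Everything in it is a hypothetical illustration, not Agent Lightning's actual API or LightningRL's exact rule: the record types `LLMCall` and `Transition`, the function `decompose`, the `trainable` agent filter, and the discounted credit-assignment rule.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class LLMCall:
    """One LLM invocation captured while the agent runs (hypothetical record type)."""
    prompt: str      # everything the LLM saw for this call: treated as the MDP state
    response: str    # the text it generated: treated as the MDP action
    agent_name: str  # which (sub-)agent issued the call, e.g. in a multi-agent workflow


@dataclass
class Transition:
    """A single-turn training sample: (state, action, reward)."""
    state: str
    action: str
    reward: float


def decompose(
    calls: List[LLMCall],
    final_reward: float,
    gamma: float = 1.0,
    trainable: Optional[Set[str]] = None,
) -> List[Transition]:
    """Split one episode into per-call transitions.

    Assumed credit-assignment rule (illustrative only): call t (0-based, out of n)
    receives final_reward * gamma ** (n - 1 - t). With gamma = 1.0, every call
    gets the same episode reward. If `trainable` is given, only calls from those
    agents are emitted, so the other agents in the workflow stay frozen.
    """
    n = len(calls)
    out: List[Transition] = []
    for t, call in enumerate(calls):
        # Skip calls made by agents we are not optimizing.
        if trainable is not None and call.agent_name not in trainable:
            continue
        # Discount is based on the call's position in the full episode.
        credit = final_reward * gamma ** (n - 1 - t)
        out.append(Transition(call.prompt, call.response, credit))
    return out


if __name__ == "__main__":
    # A toy two-agent text-to-SQL episode: a writer proposes SQL, a checker reviews it.
    episode = [
        LLMCall("Question + schema. Write SQL.", "SELECT name FROM users;", "writer"),
        LLMCall("Check this SQL: SELECT name FROM users;", "Correct.", "checker"),
        LLMCall("Question + schema + feedback. Rewrite SQL.", "SELECT name FROM users;", "writer"),
    ]
    # Train only the writer; assume the episode succeeded, so final_reward = 1.0.
    for tr in decompose(episode, final_reward=1.0, trainable={"writer"}):
        print(tr.reward, tr.action)
```

Each resulting `Transition` is an ordinary single-turn (prompt, response, reward) sample. Under this sketch's assumptions, that is the shape of input a standard single-turn RL optimizer consumes. The agent code that produced the `LLMCall` records never needs to know it is being trained.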