Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
August 6, 2025
作者: Weizhen Li, Jianbo Lin, Zhuosong Jiang, Jingyi Cao, Xinpeng Liu, Jiayu Zhang, Zhenqiang Huang, Qianben Chen, Weichen Sun, Qiexiang Wang, Hongxuan Lu, Tianrui Qin, Chenghao Zhu, Yi Yao, Shuying Fan, Xiaowan Li, Tiannan Wang, Pai Liu, King Zhu, He Zhu, Dingfeng Shi, Piaohong Wang, Yeyi Guan, Xiangru Tang, Minghao Liu, Yuchen Eleanor Jiang, Jian Yang, Jiaheng Liu, Ge Zhang, Wangchunshu Zhou
cs.AI
Abstract
Recent advances in large language models (LLMs) and multi-agent systems have
demonstrated remarkable capabilities in complex problem-solving tasks such as
deep research, vibe coding, and mathematical reasoning. However, most existing
multi-agent systems are built upon manual prompt/workflow engineering with
sophisticated agent frameworks, making them computationally inefficient, less
capable, and unable to benefit from data-centric learning. In this work, we
introduce Chain-of-Agents (CoA), a novel paradigm of LLM reasoning that enables
native end-to-end complex problem-solving in the same way as a multi-agent
system (i.e., multi-turn problem solving with multiple tools and multiple
agents) within one model. In chain-of-agents problem-solving, the model
dynamically activates different tool agents and role-playing agents to simulate
multi-agent collaboration in an end-to-end fashion. To elicit end-to-end
chain-of-agents problem-solving abilities in LLMs, we introduce a multi-agent
distillation framework to distill state-of-the-art multi-agent systems into
chain-of-agents trajectories for agentic supervised fine-tuning. We then use
agentic reinforcement learning on verifiable agentic tasks to further improve
the models' capabilities on chain-of-agents problem solving. We call the
resulting models Agent Foundation Models (AFMs). Our empirical studies
demonstrate that AFM establishes new state-of-the-art performance across
diverse benchmarks in both web agent and code agent settings. We fully
open-source the entire research, including the model weights, the training and
evaluation code, and the training data, offering a solid starting point for
future research on agent models and agentic RL.
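
To make the chain-of-agents idea concrete, below is a minimal, hypothetical sketch of a single-model rollout loop in Python: one model emits the entire multi-agent-style trace, and a thin harness only executes the tool calls it requests. The tag format, tool names, and function signatures are illustrative assumptions, not the paper's actual interface.

```python
import re
from typing import Callable, Dict

def run_chain_of_agents(
    generate: Callable[[str], str],          # one LLM continuation step (assumed interface)
    tools: Dict[str, Callable[[str], str]],  # e.g. {"search": ..., "code": ...} (illustrative)
    task: str,
    max_turns: int = 10,
) -> str:
    """Single-model rollout that simulates multi-agent collaboration end to end."""
    trace = f"Task: {task}\n"
    for _ in range(max_turns):
        step = generate(trace)               # the model may role-play a sub-agent or request a tool
        trace += step
        call = re.search(r"<tool_call>(\w+):(.*?)</tool_call>", step, re.S)
        if call is None:                     # no tool request: treat this step as the final answer
            return step.strip()
        name, arg = call.group(1), call.group(2).strip()
        result = tools.get(name, lambda _: "unknown tool")(arg)
        trace += f"\n<observation>{result}</observation>\n"  # feed the observation back into the trace
    return trace                             # fall back to the full trace if no answer emerged
```

Under this framing, multi-agent distillation amounts to collecting such traces from a strong multi-agent system and fine-tuning the single model on them, after which agentic RL optimizes the same rollout loop against verifiable task rewards.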