Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
August 6, 2025
Authors: Weizhen Li, Jianbo Lin, Zhuosong Jiang, Jingyi Cao, Xinpeng Liu, Jiayu Zhang, Zhenqiang Huang, Qianben Chen, Weichen Sun, Qiexiang Wang, Hongxuan Lu, Tianrui Qin, Chenghao Zhu, Yi Yao, Shuying Fan, Xiaowan Li, Tiannan Wang, Pai Liu, King Zhu, He Zhu, Dingfeng Shi, Piaohong Wang, Yeyi Guan, Xiangru Tang, Minghao Liu, Yuchen Eleanor Jiang, Jian Yang, Jiaheng Liu, Ge Zhang, Wangchunshu Zhou
cs.AI
Abstract
Recent advances in large language models (LLMs) and multi-agent systems have
demonstrated remarkable capabilities in complex problem-solving tasks such as
deep research, vibe coding, and mathematical reasoning. However, most existing
multi-agent systems are built upon manual prompt/workflow engineering with
sophisticated agent frameworks, making them computationally inefficient, less
capable, and unable to benefit from data-centric learning. In this work, we
introduce Chain-of-Agents (CoA), a novel paradigm of LLM reasoning that enables
native end-to-end complex problem-solving in the same way as a multi-agent
system (i.e., multi-turn problem solving with multiple tools and multiple
agents) within one model. In chain-of-agents problem-solving, the model
dynamically activates different tool agents and role-playing agents to simulate
multi-agent collaboration in an end-to-end fashion. To elicit end-to-end
chain-of-agents problem-solving abilities in LLMs, we introduce a multi-agent
distillation framework to distill state-of-the-art multi-agent systems into
chain-of-agents trajectories for agentic supervised fine-tuning. We then use
agentic reinforcement learning on verifiable agentic tasks to further improve
the models' capabilities on chain-of-agents problem solving. We call the
resulting models Agent Foundation Models (AFMs). Our empirical studies
demonstrate that AFM establishes new state-of-the-art performance across
diverse benchmarks in both web agent and code agent settings. We fully
open-source the entire work, including the model weights, the training and
evaluation code, and the training data, offering a solid starting point for
future research on agent models and agentic RL.
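
As a rough, purely illustrative sketch of the chain-of-agents rollout described above: a single model generates the entire multi-turn trajectory, and whenever it emits a tool-agent call, the corresponding tool is executed and its observation is appended to the context before the next model turn. All names here (`call_model`, the `<tool>`/`<answer>`/`<observation>` tags, the `TOOLS` registry) are hypothetical placeholders, not APIs taken from the paper or its released code.

```python
import re

# Hypothetical tool "agents" the model can activate during a rollout.
# The actual AFM setup may expose different tools and role-playing agents.
TOOLS = {
    "search": lambda query: f"[search results for: {query}]",
    "code":   lambda src:   f"[execution output of: {src}]",
}

def call_model(prompt: str) -> str:
    """Placeholder for the agent foundation model.

    A trained AFM would generate either a tool/agent call such as
    <tool name="search">some query</tool> or a final <answer>...</answer>.
    Here we return a canned answer so the sketch runs end to end.
    """
    return "<answer>illustrative final answer</answer>"

def chain_of_agents_rollout(task: str, max_turns: int = 8) -> str:
    """Single-model, multi-turn rollout that mimics multi-agent collaboration."""
    transcript = f"Task: {task}\n"
    for _ in range(max_turns):
        step = call_model(transcript)
        transcript += step + "\n"

        # Final answer: the rollout terminates.
        answer = re.search(r"<answer>(.*?)</answer>", step, re.S)
        if answer:
            return answer.group(1).strip()

        # Tool-agent call: execute it and append the observation so the
        # next model turn can condition on the result.
        call = re.search(r'<tool name="(\w+)">(.*?)</tool>', step, re.S)
        if call and call.group(1) in TOOLS:
            observation = TOOLS[call.group(1)](call.group(2).strip())
            transcript += f"<observation>{observation}</observation>\n"
    return "no answer within budget"

if __name__ == "__main__":
    print(chain_of_agents_rollout("What year was the transistor invented?"))
```

In the AFM recipe summarized above, trajectories of this shape are first distilled from strong multi-agent systems to serve as supervised fine-tuning data, and the model is then further trained with agentic RL on tasks whose final answers can be verified automatically.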