Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL
August 6, 2025
Authors: Weizhen Li, Jianbo Lin, Zhuosong Jiang, Jingyi Cao, Xinpeng Liu, Jiayu Zhang, Zhenqiang Huang, Qianben Chen, Weichen Sun, Qiexiang Wang, Hongxuan Lu, Tianrui Qin, Chenghao Zhu, Yi Yao, Shuying Fan, Xiaowan Li, Tiannan Wang, Pai Liu, King Zhu, He Zhu, Dingfeng Shi, Piaohong Wang, Yeyi Guan, Xiangru Tang, Minghao Liu, Yuchen Eleanor Jiang, Jian Yang, Jiaheng Liu, Ge Zhang, Wangchunshu Zhou
cs.AI
Abstract
Recent advances in large language models (LLMs) and multi-agent systems have
demonstrated remarkable capabilities in complex problem-solving tasks such as
deep research, vibe coding, and mathematical reasoning. However, most existing
multi-agent systems are built upon manual prompt/workflow engineering with
sophisticated agent frameworks, making them computationally inefficient, less
capable, and unable to benefit from data-centric learning. In this work, we
introduce Chain-of-Agents (CoA), a novel paradigm of LLM reasoning that enables
native end-to-end complex problem-solving in the same way as a multi-agent
system (i.e., multi-turn problem solving with multiple tools and multiple
agents) within one model. In chain-of-agents problem-solving, the model
dynamically activates different tool agents and role-playing agents to simulate
multi-agent collaboration in an end-to-end fashion. To elicit end-to-end
chain-of-agents problem-solving abilities in LLMs, we introduce a multi-agent
distillation framework to distill state-of-the-art multi-agent systems into
chain-of-agents trajectories for agentic supervised fine-tuning. We then use
agentic reinforcement learning on verifiable agentic tasks to further improve
the models' capabilities on chain-of-agents problem solving. We call the
resulting models Agent Foundation Models (AFMs). Our empirical studies
demonstrate that AFM establishes new state-of-the-art performance across
diverse benchmarks in both web agent and code agent settings. We fully
open-source the entire work, including the model weights, the training and
evaluation code, and the training data, offering a solid starting point for
future research on agent models and agentic RL.
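
As a rough, purely illustrative sketch of the chain-of-agents rollout described above: a single model generates the entire multi-turn trajectory, and whenever it emits a tool-agent call, the corresponding tool is executed and its observation is appended to the context before the next model turn. All names here (`call_model`, the `<tool>`/`<answer>`/`<observation>` tags, the `TOOLS` registry) are hypothetical placeholders, not APIs taken from the paper or its released code.

```python
import re

# Hypothetical tool "agents" the model can activate during a rollout.
# The actual AFM setup may expose different tools and role-playing agents.
TOOLS = {
    "search": lambda query: f"[search results for: {query}]",
    "code":   lambda src:   f"[execution output of: {src}]",
}

def call_model(prompt: str) -> str:
    """Placeholder for the agent foundation model.

    A trained AFM would generate either a tool/agent call such as
    <tool name="search">some query</tool> or a final <answer>...</answer>.
    Here we return a canned answer so the sketch runs end to end.
    """
    return "<answer>illustrative final answer</answer>"

def chain_of_agents_rollout(task: str, max_turns: int = 8) -> str:
    """Single-model, multi-turn rollout that mimics multi-agent collaboration."""
    transcript = f"Task: {task}\n"
    for _ in range(max_turns):
        step = call_model(transcript)
        transcript += step + "\n"

        # Final answer: the rollout terminates.
        answer = re.search(r"<answer>(.*?)</answer>", step, re.S)
        if answer:
            return answer.group(1).strip()

        # Tool-agent call: execute it and append the observation so the
        # next model turn can condition on the result.
        call = re.search(r'<tool name="(\w+)">(.*?)</tool>', step, re.S)
        if call and call.group(1) in TOOLS:
            observation = TOOLS[call.group(1)](call.group(2).strip())
            transcript += f"<observation>{observation}</observation>\n"
    return "no answer within budget"

if __name__ == "__main__":
    print(chain_of_agents_rollout("What year was the transistor invented?"))
```

In the AFM recipe summarized above, trajectories of this shape are first distilled from strong multi-agent systems to serve as supervised fine-tuning data, and the model is then further trained with agentic RL on tasks whose final answers can be verified automatically.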