Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Abstract

Recent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks such as deep research, vibe coding, and mathematical reasoning. However, most existing multi-agent systems are built on manual prompt and workflow engineering with sophisticated agent frameworks, which makes them computationally inefficient, less capable, and unable to benefit from data-centric learning. In this work, we introduce Chain-of-Agents (CoA), a novel paradigm of LLM reasoning that enables native end-to-end complex problem solving within a single model, in the same way as a multi-agent system (i.e., multi-turn problem solving with multiple tools and multiple agents). In Chain-of-Agents problem solving, the model dynamically activates different tool agents and role-playing agents to simulate multi-agent collaboration in an end-to-end fashion. To elicit end-to-end Chain-of-Agents problem-solving abilities in LLMs, we introduce a multi-agent distillation framework that distills state-of-the-art multi-agent systems into Chain-of-Agents trajectories for agentic supervised fine-tuning. We then apply agentic reinforcement learning on verifiable agentic tasks to further improve the models' Chain-of-Agents problem-solving capabilities. We call the resulting models Agent Foundation Models (AFMs). Our empirical studies demonstrate that AFMs establish new state-of-the-art performance across diverse benchmarks in both web agent and code agent settings. We fully open-source the entire research, including the model weights, the code for training and evaluation, and the training data, which offers a solid starting point for future research on agent models and agentic RL.
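The abstract does not say how agent activation is represented inside the model, so the following is only a minimal sketch of one possible reading: a single policy is called once per turn, names the agent it wants to activate, and tool agents return observations that are appended to the same trajectory. Every name here (`Step`, `TOOLS`, `run_chain`, `scripted_model`, and the agent labels) is hypothetical and not taken from the paper or its released code; the scripted policy stands in for an actual LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    agent: str    # which agent the model activated at this turn (hypothetical labels)
    content: str  # text produced by the model, or the observation returned by a tool


# Hypothetical tool agents: plain functions standing in for real tools.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"[stub search results for: {query}]",
    "code": lambda source: f"[stub execution output for: {source}]",
}


def run_chain(
    model: Callable[[List[Step]], Step],
    question: str,
    max_turns: int = 8,
) -> List[Step]:
    """Roll out one trajectory in which a single model picks the next agent each turn."""
    trajectory = [Step("user", question)]
    for _ in range(max_turns):
        step = model(trajectory)
        trajectory.append(step)
        if step.agent == "answer":
            break  # the model decided it is done
        if step.agent in TOOLS:
            # Tool agents produce an observation that is fed back to the model.
            observation = TOOLS[step.agent](step.content)
            trajectory.append(Step("observation", observation))
    return trajectory


def scripted_model(trajectory: List[Step]) -> Step:
    """Stand-in for the LLM: a fixed script keyed on the current trajectory length."""
    script = {
        1: Step("plan", "Look the fact up, then answer."),
        2: Step("search", "example query"),
        4: Step("answer", "example answer"),
    }
    return script[len(trajectory)]


if __name__ == "__main__":
    for step in run_chain(scripted_model, "example question"):
        print(f"{step.agent}: {step.content}")
```

In this reading, a trajectory of this shape is the kind of record that could be collected from an existing multi-agent system and used as a supervised fine-tuning target, with the final `answer` step checked against a verifiable reward during reinforcement learning. Whether the released AFM pipeline is organized this way is an assumption.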
