Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL

Abstract

Recent advances in large language models (LLMs) and multi-agent systems have demonstrated remarkable capabilities in complex problem-solving tasks such as deep research, vibe coding, and mathematical reasoning. However, most existing multi-agent systems are built on manual prompt and workflow engineering with sophisticated agent frameworks, which makes them computationally inefficient, less capable, and unable to benefit from data-centric learning. In this work, we introduce Chain-of-Agents (CoA), a novel paradigm of LLM reasoning that enables native end-to-end complex problem-solving within one model, in the same way as a multi-agent system (i.e., multi-turn problem solving with multiple tools and multiple agents). In chain-of-agents problem-solving, the model dynamically activates different tool agents and role-playing agents to simulate multi-agent collaboration in an end-to-end fashion. To elicit end-to-end chain-of-agents problem-solving abilities in LLMs, we introduce a multi-agent distillation framework that distills state-of-the-art multi-agent systems into chain-of-agents trajectories for agentic supervised fine-tuning. We then use agentic reinforcement learning on verifiable agentic tasks to further improve the models' capabilities in chain-of-agents problem-solving. We call the resulting models Agent Foundation Models (AFMs). Our empirical studies demonstrate that AFM establishes new state-of-the-art performance across diverse benchmarks in both web agent and code agent settings. We fully open-source the entire research, including the model weights, the code for training and evaluation, and the training data, which offers a solid starting point for future research on agent models and agentic RL.
