AgentFugue: Scaling Agents for Long-Horizon Tasks via Collective Reasoning

Abstract

Recent progress on long-horizon agentic tasks has been driven largely by scaling up individual agents through stronger models, better tools, and more effective scaffolding. In contrast, much less is understood about scaling out: whether multiple peer agents, all targeting the same task, can become an additional source of capability without relying on explicit role specialization or workflow orchestration. We study this question and propose AgentFugue, a collective reasoning framework built around a shared reasoning hub. As peer agents explore the same task in parallel, the hub records concise notes on what each agent has established, attempted, or ruled out, and lets each agent selectively access what other agents have discovered, in a form useful for its current search. This design turns otherwise isolated trajectories into a connected ecology of reusable intermediate reasoning without requiring centralized planning. We instantiate the hub as a plug-in communication layer, trained with supervised fine-tuning and end-to-end reinforcement learning. Across the challenging long-horizon settings we study, AgentFugue improves over strong baselines. Our results suggest that collective reasoning can turn scaling out peer-agent systems into a distinct source of capability gains, rather than merely a way of spending more compute.
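
The record-and-retrieve behavior described for the shared reasoning hub can be illustrated with a minimal sketch. This is not the AgentFugue implementation: the abstract describes the hub as a trained communication layer, whereas the sketch below substitutes a simple keyword-overlap ranking. The names `Note`, `SharedReasoningHub`, `record`, and `retrieve` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# The three note kinds mirror the abstract: what an agent has
# established, attempted, or ruled out.
KINDS = ("established", "attempted", "ruled_out")


@dataclass(frozen=True)
class Note:
    agent_id: str
    kind: str
    text: str


@dataclass
class SharedReasoningHub:
    """Hypothetical stand-in for the hub; the real one is a trained model."""

    notes: list = field(default_factory=list)

    def record(self, agent_id: str, kind: str, text: str) -> Note:
        """Store a concise note written on behalf of one peer agent."""
        if kind not in KINDS:
            raise ValueError(f"unknown note kind: {kind!r}")
        note = Note(agent_id, kind, text)
        self.notes.append(note)
        return note

    def retrieve(self, agent_id: str, query: str, k: int = 3) -> list:
        """Return up to k notes from OTHER agents that share words with the query.

        Assumption: word overlap is a placeholder for the learned,
        selective access that the abstract describes.
        """
        query_words = set(query.lower().split())
        scored = []
        for note in self.notes:
            if note.agent_id == agent_id:
                continue  # an agent already knows its own findings
            overlap = len(query_words & set(note.text.lower().split()))
            if overlap > 0:
                scored.append((overlap, note))
        # Stable sort: ties keep the order in which notes were recorded.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [note for _, note in scored[:k]]


hub = SharedReasoningHub()
hub.record("agent_a", "ruled_out", "the config file is not in /etc")
hub.record("agent_b", "established", "the config file is loaded from the home directory")
hub.record("agent_a", "attempted", "searching logs for startup errors")

# agent_a only sees agent_b's finding, never its own notes.
for note in hub.retrieve("agent_a", "where is the config file"):
    print(note.agent_id, note.kind, note.text)
```

No component in this sketch plans or assigns work; each agent decides when to write a note and when to query, which is consistent with the peer setting without centralized planning described above.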