SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research

June 8, 2026
Authors: Pu Ning, Quan Chen, Kun Tao, Xinyu Tang, Tianshu Wang, Qianggang Cao, Xinyu Kong, Zujie Wen, Zhiqiang Zhang, Jun Zhou
cs.AI

Abstract

Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recent work explores a paradigm where a main agent decomposes tasks and dispatches subtasks to subagents, which execute and return only summarized results, conserving the main agent's context budget. However, performing this well requires delegation intelligence: the ability to decompose complex tasks, determine when and what to delegate, and integrate returned results into the ongoing workflow. Training data for this capability is scarce in naturally occurring text, and to our knowledge, how to synthesize such data and train models to acquire this capability remains largely unexplored in the open-source community. To bridge this gap, we present a preliminary exploration targeting deep research, a representative long-horizon agent task. Specifically, we design a harness that guides the model toward high-quality task decomposition and delegation, while constraining subagents to return results properly to support the main agent's workflow. The harness-guided trajectories naturally encode correct delegation decisions, which we use as supervised fine-tuning data to internalize delegation intelligence into model weights. Our resulting model, SearchSwarm-30B-A3B, achieves 68.1 on BrowseComp and 73.3 on BrowseComp-ZH, the best results among all models of comparable scale. We will release our harness, model weights, and training data to facilitate future research.
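The delegation paradigm described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' harness: the names `MainAgent`, `run_subagent`, `SubagentResult`, `fake_search`, and the 200-character summary cap are all hypothetical, chosen only to show how a subagent's verbose work can stay out of the main agent's context.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SubagentResult:
    subtask: str
    summary: str      # the only text handed back to the main agent
    steps_used: int   # how much raw work stayed inside the subagent


def run_subagent(subtask: str,
                 search: Callable[[str], List[str]],
                 max_summary_chars: int = 200) -> SubagentResult:
    """Do the verbose work locally and return only a bounded summary."""
    scratchpad = search(subtask)  # raw findings never leave this function
    # Assumption: truncation stands in for a real model-written summary.
    summary = " | ".join(scratchpad)[:max_summary_chars]
    return SubagentResult(subtask, summary, steps_used=len(scratchpad))


@dataclass
class MainAgent:
    context: List[str] = field(default_factory=list)

    def delegate(self, subtasks: List[str],
                 search: Callable[[str], List[str]]) -> List[str]:
        """Dispatch each subtask and keep only its summary in context."""
        for subtask in subtasks:
            result = run_subagent(subtask, search)
            self.context.append(f"{result.subtask}: {result.summary}")
        return self.context


def fake_search(query: str) -> List[str]:
    """Hypothetical stand-in for a web search tool with long output."""
    return [f"note {i} about {query}" for i in range(50)]


if __name__ == "__main__":
    agent = MainAgent()
    for entry in agent.delegate(["founding year", "lead author"], fake_search):
        print(len(entry), entry[:60])
```

In this sketch the subtask list is supplied by hand. Deciding how to decompose a task, and when delegation is worthwhile at all, is the part the paper aims to train into the model, and the sketch does not attempt it.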