SearchSwarm: Towards Delegation Intelligence in Agentic LLMs for Long-Horizon Deep Research
June 8, 2026
Authors: Pu Ning, Quan Chen, Kun Tao, Xinyu Tang, Tianshu Wang, Qianggang Cao, Xinyu Kong, Zujie Wen, Zhiqiang Zhang, Jun Zhou
cs.AI
Abstract
Large language models are increasingly expected to handle complex, long-horizon real-world tasks whose context demands can grow without bound, yet model context windows remain inherently finite. Recent work explores a paradigm where a main agent decomposes tasks and dispatches subtasks to subagents, which execute and return only summarized results, conserving the main agent's context budget. However, performing this well requires delegation intelligence: the ability to decompose complex tasks, determine when and what to delegate, and integrate returned results into the ongoing workflow. Training data for this capability is scarce in naturally occurring text, and to our knowledge, how to synthesize such data and train models to acquire this capability remains largely unexplored in the open-source community. To bridge this gap, we present a preliminary exploration targeting deep research, a representative long-horizon agent task. Specifically, we design a harness that guides the model toward high-quality task decomposition and delegation, while constraining subagents to return results properly to support the main agent's workflow. The harness-guided trajectories naturally encode correct delegation decisions, which we use as supervised fine-tuning data to internalize delegation intelligence into model weights. Our resulting model, SearchSwarm-30B-A3B, achieves 68.1 on BrowseComp and 73.3 on BrowseComp-ZH, the best results among all models of comparable scale. We will release our harness, model weights, and training data to facilitate future research.
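The delegation paradigm described in the abstract can be illustrated with a minimal sketch. This is not the authors' harness: the class names `MainAgent` and `SubAgent`, the semicolon-based `decompose` step, and the fixed five "observations" per subtask are all hypothetical stand-ins chosen for illustration. The only point it aims to show is that each subagent accumulates raw observations in its own context and hands back a short summary, so the main agent's context grows by one entry per subtask.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SubAgent:
    """Hypothetical subagent that works inside its own private context."""

    context: list[str] = field(default_factory=list)

    def run(self, subtask: str) -> str:
        # Stand-in for many search/browse steps; everything stays local.
        for step in range(5):
            self.context.append(f"[{subtask}] raw observation {step}")
        # Only a short summary is handed back to the caller.
        return f"summary of '{subtask}' ({len(self.context)} steps)"


@dataclass
class MainAgent:
    """Hypothetical main agent whose context holds only subtask summaries."""

    context: list[str] = field(default_factory=list)

    def decompose(self, task: str) -> list[str]:
        # Placeholder decomposition (assumption): split on semicolons.
        # A real system would ask the model to propose subtasks.
        return [part.strip() for part in task.split(";") if part.strip()]

    def solve(self, task: str) -> list[str]:
        for subtask in self.decompose(task):
            summary = SubAgent().run(subtask)  # fresh context per subtask
            self.context.append(summary)       # integrate the summary only
        return self.context


if __name__ == "__main__":
    main = MainAgent()
    for line in main.solve("find the author; find the publication year"):
        print(line)
```

In this sketch the main agent never sees a subagent's raw observations, which is the context-saving property the abstract attributes to the paradigm. Deciding when and what to delegate, the part the paper calls delegation intelligence, is deliberately left out here.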