WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

February 4, 2026
Authors: Zelai Xu, Zhexuan Xu, Ruize Zhang, Chunyang Zhu, Shi Yu, Weilin Liu, Quanlu Zhang, Wenbo Ding, Chao Yu, Yu Wang
cs.AI

Abstract

Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability. In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks. Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B. Furthermore, WideSeek-R1-4B exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of width scaling.
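The abstract describes a lead-agent/subagent organization in which a lead agent plans subtasks and parallel subagents, each with an isolated context and its own tools, execute them concurrently. The sketch below is only an illustration of that orchestration pattern, not the paper's implementation; the helper names (`call_llm`, `search_tool`) are hypothetical placeholders, and the real system trains both roles jointly with multi-agent RL rather than relying on a fixed prompt pipeline.

```python
# Minimal sketch (assumptions, not the paper's code) of a lead-agent/subagent
# fan-out: the lead agent decomposes a broad information-seeking task, parallel
# subagents work in isolated contexts with a search tool, and the lead agent
# aggregates their findings. `call_llm` and `search_tool` are placeholders.
from concurrent.futures import ThreadPoolExecutor


def call_llm(prompt: str) -> str:
    """Placeholder for a call to the shared LLM backbone."""
    raise NotImplementedError


def search_tool(query: str) -> str:
    """Placeholder for a subagent's specialized search tool."""
    raise NotImplementedError


def subagent(subtask: str) -> str:
    """One subagent run: isolated context, tool call, then a concise answer."""
    evidence = search_tool(subtask)
    return call_llm(f"Subtask: {subtask}\nEvidence: {evidence}\nAnswer concisely.")


def lead_agent(task: str, num_subagents: int = 4) -> str:
    """Lead agent: plan subtasks, fan out to parallel subagents, aggregate."""
    plan = call_llm(f"Split this broad task into {num_subagents} subtasks:\n{task}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()][:num_subagents]

    # Width scaling: subagents run in parallel, each in its own isolated context.
    with ThreadPoolExecutor(max_workers=num_subagents) as pool:
        results = list(pool.map(subagent, subtasks))

    report = "\n".join(f"- {s}: {r}" for s, r in zip(subtasks, results))
    return call_llm(f"Task: {task}\nSubagent findings:\n{report}\nCompile the final answer.")
```

Increasing `num_subagents` is the knob that corresponds to the width-scaling behavior reported in the abstract: more subagents cover more of a broad query in parallel, at the cost of a larger aggregation step for the lead agent.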