

WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

February 4, 2026
作者: Zelai Xu, Zhexuan Xu, Ruize Zhang, Chunyang Zhu, Shi Yu, Weilin Liu, Quanlu Zhang, Wenbo Ding, Chao Yu, Yu Wang
cs.AI

Abstract

Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability. In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks. Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B. Furthermore, WideSeek-R1-4B exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of width scaling.
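The lead-agent/subagent design described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `shared_llm`, `subagent`, and `lead_agent` are hypothetical names, the LLM is stubbed out, and the task decomposition is a trivial split where WideSeek-R1 learns orchestration via MARL. It only shows the structural idea of one lead agent dispatching parallel subagents that share a model but keep isolated contexts.

```python
from concurrent.futures import ThreadPoolExecutor

def shared_llm(context):
    # Stand-in for the shared LLM backbone: every agent calls the same
    # model, but each passes its own isolated message context.
    return f"result[{context[-1]}]"

def subagent(subtask):
    # Each subagent starts from a fresh context containing only its
    # subtask, so no state leaks between parallel workers.
    context = [subtask]
    return shared_llm(context)

def lead_agent(broad_task, num_subagents=4):
    # The lead agent decomposes the broad information-seeking task into
    # subtasks (trivially sharded here) and dispatches them in parallel.
    subtasks = [f"{broad_task} / shard {i}" for i in range(num_subagents)]
    with ThreadPoolExecutor(max_workers=num_subagents) as pool:
        partials = list(pool.map(subagent, subtasks))
    # The lead agent then aggregates the partial findings.
    return partials

if __name__ == "__main__":
    print(lead_agent("collect specs of recently launched satellites", 3))
```

Increasing `num_subagents` widens the search in this sketch the same way the paper's width scaling adds parallel subagents, while the lead agent remains the single point of orchestration and aggregation.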