REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents
February 15, 2026
作者: Zheng Chu, Xiao Wang, Jack Hong, Huiming Fan, Yuqi Huang, Yue Yang, Guohai Xu, Chenxiao Zhao, Cheng Xiang, Shengchao Hu, Dongdong Kuang, Ming Liu, Bing Qin, Xing Yu
cs.AI
Abstract
Large language models are transitioning from general-purpose knowledge engines to real-world problem solvers, yet optimizing them for deep search tasks remains challenging. The central bottleneck lies in the extreme sparsity of high-quality search trajectories and reward signals, arising from the difficulty of scalable long-horizon task construction and the high cost of interaction-heavy rollouts involving external tool calls. To address these challenges, we propose REDSearcher, a unified framework that co-designs complex task synthesis, mid-training, and post-training for scalable search-agent optimization. Specifically, REDSearcher introduces the following improvements: (1) We frame task synthesis as a dual-constrained optimization, where task difficulty is precisely governed by graph topology and evidence dispersion, allowing scalable generation of complex, high-quality tasks. (2) We introduce tool-augmented queries to encourage proactive tool use rather than passive recall. (3) During mid-training, we strengthen core atomic capabilities (knowledge, planning, and function calling), substantially reducing the cost of collecting high-quality trajectories for downstream training. (4) We build a local simulated environment that enables rapid, low-cost algorithmic iteration for reinforcement learning experiments. Across both text-only and multimodal search-agent benchmarks, our approach achieves state-of-the-art performance. To facilitate future research on long-horizon search agents, we will release 10K high-quality complex text search trajectories, 5K multimodal trajectories, and a 1K text RL query set, together with code and model checkpoints.
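To make point (1) concrete, below is a minimal sketch of how a dual-constrained difficulty filter could look, assuming the reasoning structure of a synthesized task is represented as a DAG over evidence snippets. The class and function names, the difficulty proxies (longest reasoning chain for graph topology, count of distinct source documents for evidence dispersion), and the threshold bands are all illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a dual-constrained task filter: a synthesized task is
# kept only if both its graph-topology proxy (longest reasoning chain) and its
# evidence-dispersion proxy (distinct source documents) fall in target bands.
from dataclasses import dataclass, field
from functools import lru_cache


@dataclass(frozen=True)
class EvidenceNode:
    node_id: str
    source_doc: str  # document the evidence snippet was drawn from


@dataclass
class SyntheticTask:
    question: str
    nodes: dict = field(default_factory=dict)   # node_id -> EvidenceNode
    edges: list = field(default_factory=list)   # (parent_id, child_id) reasoning dependencies


def longest_chain(task: SyntheticTask) -> int:
    """Number of hops on the longest reasoning chain in the task's DAG."""
    children = {nid: [] for nid in task.nodes}
    for parent, child in task.edges:
        children[parent].append(child)

    @lru_cache(maxsize=None)
    def depth(nid: str) -> int:
        return 1 + max((depth(c) for c in children[nid]), default=0)

    return max((depth(nid) for nid in task.nodes), default=1) - 1


def evidence_dispersion(task: SyntheticTask) -> int:
    """Number of distinct source documents the evidence is spread across."""
    return len({node.source_doc for node in task.nodes.values()})


def accept_task(task: SyntheticTask,
                hop_range=(3, 6),
                dispersion_range=(3, 8)) -> bool:
    """Dual-constrained acceptance: both proxies must lie in their bands
    (the specific bands here are placeholder values)."""
    hops = longest_chain(task)
    spread = evidence_dispersion(task)
    return (hop_range[0] <= hops <= hop_range[1]
            and dispersion_range[0] <= spread <= dispersion_range[1])


if __name__ == "__main__":
    # A 3-hop task whose evidence spans four documents passes the filter.
    task = SyntheticTask(
        question="Which award did the founder of company X's parent company win in 2019?",
        nodes={
            "a": EvidenceNode("a", "doc_company_x"),
            "b": EvidenceNode("b", "doc_parent_company"),
            "c": EvidenceNode("c", "doc_founder_bio"),
            "d": EvidenceNode("d", "doc_award_list"),
        },
        edges=[("a", "b"), ("b", "c"), ("c", "d")],
    )
    print(longest_chain(task), evidence_dispersion(task), accept_task(task))  # 3 4 True
```

Gating on both constraints at once, rather than on a single difficulty score, is what lets the generator raise reasoning depth and evidence spread independently; how REDSearcher parameterizes these controls in practice is detailed in the paper, not in this sketch.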