REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents
February 15, 2026
Authors: Zheng Chu, Xiao Wang, Jack Hong, Huiming Fan, Yuqi Huang, Yue Yang, Guohai Xu, Chenxiao Zhao, Cheng Xiang, Shengchao Hu, Dongdong Kuang, Ming Liu, Bing Qin, Xing Yu
cs.AI
Abstract
Large language models are transitioning from general-purpose knowledge engines to real-world problem solvers, yet optimizing them for deep search tasks remains challenging. The central bottleneck is the extreme sparsity of high-quality search trajectories and reward signals, which arises from the difficulty of constructing long-horizon tasks at scale and the high cost of interaction-heavy rollouts involving external tool calls. To address these challenges, we propose REDSearcher, a unified framework that co-designs complex task synthesis, mid-training, and post-training for scalable search-agent optimization. Specifically, REDSearcher introduces the following improvements: (1) We frame task synthesis as a dual-constrained optimization in which task difficulty is precisely governed by graph topology and evidence dispersion, enabling scalable generation of complex, high-quality tasks. (2) We introduce tool-augmented queries to encourage proactive tool use rather than passive recall. (3) During mid-training, we strengthen core atomic capabilities (knowledge, planning, and function calling), substantially reducing the cost of collecting high-quality trajectories for downstream training. (4) We build a local simulated environment that enables rapid, low-cost algorithmic iteration for reinforcement learning experiments. Across both text-only and multimodal search-agent benchmarks, our approach achieves state-of-the-art performance. To facilitate future research on long-horizon search agents, we will release 10K high-quality complex text search trajectories, 5K multimodal trajectories, and a text RL query set of 1K queries, together with code and model checkpoints.
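The two difficulty knobs in point (1), graph topology and evidence dispersion, can be illustrated with a simple rejection filter that keeps a candidate task only when both constraints hold. This is a hypothetical sketch, not REDSearcher's formulation: it assumes each candidate task is an entity graph plus a map from each entity to the source holding its evidence, and it uses the shortest-path hop count from the question entity to the answer as the topology measure and the number of distinct evidence sources as the dispersion measure. The `CandidateTask` class, both metrics, and the default thresholds are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class CandidateTask:
    """Hypothetical task representation (assumption, not from the paper)."""
    edges: list[tuple[str, str]]      # relations between entities in the task graph
    start: str                        # entity mentioned in the question
    answer: str                       # entity the agent must reach
    evidence_source: dict[str, str]   # entity -> source (e.g. web domain) holding its evidence


def hop_count(task: CandidateTask) -> int:
    """Topology measure: fewest relation hops from start to answer (undirected BFS)."""
    adjacency: dict[str, set[str]] = {}
    for u, v in task.edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    frontier = deque([(task.start, 0)])
    seen = {task.start}
    while frontier:
        node, depth = frontier.popleft()
        if node == task.answer:
            return depth
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return -1  # answer unreachable, so the task is ill-formed


def dispersion(task: CandidateTask) -> int:
    """Dispersion measure: number of distinct sources the evidence is spread across."""
    return len(set(task.evidence_source.values()))


def accept(task: CandidateTask, min_hops: int = 3, min_sources: int = 3) -> bool:
    """Keep a task only if it satisfies BOTH difficulty constraints."""
    return hop_count(task) >= min_hops and dispersion(task) >= min_sources


# Usage: a one-hop, single-source task is rejected; a three-hop task whose
# evidence is scattered over four sources is kept.
easy = CandidateTask(edges=[("A", "B")], start="A", answer="B",
                     evidence_source={"A": "wiki", "B": "wiki"})
hard = CandidateTask(edges=[("A", "B"), ("B", "C"), ("C", "D")], start="A", answer="D",
                     evidence_source={"A": "wiki", "B": "news", "C": "forum", "D": "gov"})
print(accept(easy), accept(hard))  # False True
```

A filter only discards tasks after the fact; a generator built on the same idea could instead grow a candidate graph (adding hops or moving evidence to new sources) until both constraints are met, which is closer in spirit to treating difficulty as an optimization target.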