Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window

October 9, 2025
Authors: Qiaoyu Tang, Hao Xiang, Le Yu, Bowen Yu, Yaojie Lu, Xianpei Han, Le Sun, WenJuan Zhang, Pengbo Wang, Shixuan Liu, Zhenru Zhang, Jianhong Tu, Hongyu Lin, Junyang Lin
cs.AI

Abstract

While recent advances in reasoning models have demonstrated cognitive behaviors through reinforcement learning, existing approaches struggle to elicit deep reasoning capabilities in multi-turn agents with long-horizon interactions. We propose DeepMiner, a novel framework that elicits such abilities by introducing high-difficulty training tasks and a dynamic context window. DeepMiner uses a reverse construction method to generate complex but verifiable question-answer pairs from authentic web sources, which keeps the training data both challenging and reliable while injecting cognitive capabilities into multi-turn reasoning scenarios. We further design a simple yet effective dynamic context management strategy for both training and inference. It relies on a sliding window mechanism and removes the dependency on external summarization models, allowing the model to handle continuously expanding long-horizon contexts efficiently. Through reinforcement learning on Qwen3-32B, we develop DeepMiner-32B, which achieves substantial performance improvements across multiple search agent benchmarks. DeepMiner attains 33.5% accuracy on BrowseComp-en, surpassing the previous best open-source agent by almost 20 percentage points, and demonstrates consistent improvements on BrowseComp-zh, XBench-DeepSearch, and GAIA. Notably, our dynamic context management enables sustained interactions of nearly 100 turns within a standard 32k context length, effectively addressing the context limitations that constrain existing multi-turn interaction systems.
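The abstract does not spell out how the sliding window operates, so the following is only a minimal sketch of one plausible reading: older tool outputs are replaced with a short placeholder while the most recent ones stay intact, and no summarization model is involved. The names `Message`, `PLACEHOLDER`, `apply_sliding_window`, and the `window` parameter are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical marker that stands in for a dropped tool output.
PLACEHOLDER = "[earlier tool output omitted]"


@dataclass
class Message:
    role: str     # "user", "assistant", or "tool"
    content: str


def apply_sliding_window(history: List[Message], window: int) -> List[Message]:
    """Return a copy of `history` in which only the last `window` tool
    messages keep their content; older tool messages are replaced by a
    placeholder. User and assistant messages are left untouched.

    This is an assumed policy for illustration, not the paper's exact rule.
    """
    if window < 0:
        raise ValueError("window must be non-negative")

    tool_indices = [i for i, m in enumerate(history) if m.role == "tool"]
    # Slicing with [-0:] would keep everything, so handle zero explicitly.
    keep = set(tool_indices[-window:]) if window > 0 else set()

    trimmed: List[Message] = []
    for i, message in enumerate(history):
        if message.role == "tool" and i not in keep:
            trimmed.append(Message(message.role, PLACEHOLDER))
        else:
            trimmed.append(message)
    return trimmed
```

Under this assumption the prompt grows mainly with the number of turns rather than with the total size of every retrieved page, which is one way a long interaction could fit in a fixed context budget. How many recent outputs to keep, and whether the window slides every turn or in larger steps, are design choices the abstract leaves open.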