Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window

Abstract

While recent advances in reasoning models have demonstrated cognitive behaviors through reinforcement learning, existing approaches struggle to invoke deep reasoning capabilities in multi-turn agents with long-horizon interactions. We propose DeepMiner, a novel framework that elicits such abilities by introducing high-difficulty training tasks and a dynamic context window. DeepMiner presents a reverse construction method to generate complex but verifiable question-answer pairs from authentic web sources, which ensures the challenge and reliability of the training data while injecting cognitive capabilities into multi-turn reasoning scenarios. We further design an elegant yet effective dynamic context management strategy for both training and inference. It uses a sliding window mechanism and removes the dependency on external summarization models, so the model can efficiently handle continuously expanding long-horizon contexts. Through reinforcement learning on Qwen3-32B, we develop DeepMiner-32B, which achieves substantial performance improvements across multiple search agent benchmarks. DeepMiner attains 33.5% accuracy on BrowseComp-en, surpassing the previous best open-source agent by almost 20 percentage points, and demonstrates consistent improvements on BrowseComp-zh, XBench-DeepSearch, and GAIA. Notably, our dynamic context management enables sustained interactions of nearly 100 turns within a standard 32k context length, effectively addressing the context limitations that constrain existing multi-turn interaction systems.
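To make the sliding window idea concrete, the following is a minimal sketch and not the DeepMiner implementation. The abstract states only that a sliding window is used and that no external summarization model is required. Everything else here is an assumption made for illustration: the `Message` type, the `apply_sliding_window` helper, the `OMITTED` placeholder text, and the choice to blank out older tool outputs while leaving assistant turns untouched.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical placeholder text; not taken from the paper.
OMITTED = "[earlier tool output omitted]"


@dataclass
class Message:
    role: str      # "user", "assistant", or "tool"
    content: str


def apply_sliding_window(history: List[Message], window: int) -> List[Message]:
    """Return a copy of `history` in which only the most recent `window`
    tool messages keep their content; older tool messages are replaced by
    a short placeholder. User and assistant messages are left unchanged.

    Assumption: tool outputs dominate context growth, so trimming them is
    what keeps a long interaction inside a fixed context budget.
    """
    if window < 0:
        raise ValueError("window must be non-negative")

    # Positions of all tool messages, oldest first.
    tool_indices = [i for i, m in enumerate(history) if m.role == "tool"]

    # Note: a slice of [-0:] would keep everything, so handle 0 explicitly.
    keep = set(tool_indices[-window:]) if window > 0 else set()

    return [
        Message(m.role, OMITTED) if m.role == "tool" and i not in keep else m
        for i, m in enumerate(history)
    ]
```

Under these assumptions, a loop that calls `apply_sliding_window(history, window)` before each model call keeps the number of full tool outputs in the prompt bounded by `window`, no matter how many turns have elapsed, and it does so without invoking a separate summarization model.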
