OpenResearcher: A Fully Open Pipeline for Long-Horizon Deep Research Trajectory Synthesis

Abstract

Training deep research agents requires long-horizon trajectories that interleave search, evidence aggregation, and multi-step reasoning. However, existing data collection pipelines typically rely on proprietary web APIs, making large-scale trajectory synthesis costly, unstable, and difficult to reproduce. We present OpenResearcher, a reproducible pipeline that decouples one-time corpus bootstrapping from multi-turn trajectory synthesis and executes the search-and-browse loop entirely offline over a 15M-document corpus, using three explicit browser primitives: search, open, and find. Using GPT-OSS-120B as the teacher model, we synthesize over 97K trajectories, including a substantial long-horizon tail with 100+ tool calls. Supervised fine-tuning of a 30B-A3B backbone on these trajectories achieves 54.8% accuracy on BrowseComp-Plus, a +34.0 point improvement over the base model, while remaining competitive on BrowseComp, GAIA, and xbench-DeepSearch. Because the environment is offline and fully instrumented, it also enables controlled analysis; our study yields practical insights into deep research pipeline design, including data filtering strategies, agent configuration choices, and how retrieval success relates to final answer accuracy. We release the pipeline, synthesized trajectories, model checkpoints, and the offline search environment at https://github.com/TIGER-AI-Lab/OpenResearcher.
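
To make the three browser primitives concrete, the following is a minimal illustrative sketch of an offline search-and-browse environment. It is not the released implementation: the class name `OfflineBrowser`, its method signatures, the term-overlap ranking, and the three-document toy corpus are all assumptions introduced here for illustration. The call log stands in for the instrumentation that makes controlled analysis possible.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OfflineBrowser:
    """Toy offline environment (hypothetical; not the released API)."""

    corpus: dict[str, str]                       # doc_id -> full text
    current: Optional[str] = None                # doc_id of the last opened page
    log: list[tuple[str, str]] = field(default_factory=list)  # (tool, argument)

    def search(self, query: str, k: int = 3) -> list[str]:
        """Rank documents by the number of distinct query terms they contain."""
        self.log.append(("search", query))
        terms = set(query.lower().split())
        scores = {
            doc_id: len(terms & set(text.lower().split()))
            for doc_id, text in self.corpus.items()
        }
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return [d for d in ranked if scores[d] > 0][:k]

    def open(self, doc_id: str) -> str:
        """Load a document from the local corpus and make it the current page."""
        self.log.append(("open", doc_id))
        self.current = doc_id
        return self.corpus[doc_id]

    def find(self, pattern: str, width: int = 20) -> list[str]:
        """Return a snippet around each case-insensitive match in the current page."""
        self.log.append(("find", pattern))
        if self.current is None:
            return []
        text = self.corpus[self.current]
        return [
            text[max(0, m.start() - width): m.end() + width]
            for m in re.finditer(re.escape(pattern), text, re.IGNORECASE)
        ]


# Toy corpus standing in for the 15M-document collection (assumption).
corpus = {
    "doc-a": "The Eiffel Tower was completed in 1889 in Paris.",
    "doc-b": "Paris is the capital of France.",
    "doc-c": "Mount Fuji is the highest mountain in Japan.",
}

browser = OfflineBrowser(corpus)
hits = browser.search("Eiffel Tower completed")   # search: rank candidate documents
page = browser.open(hits[0])                      # open: read the top hit
snippets = browser.find("1889")                   # find: locate evidence in the page
```

Because every call is served from the local corpus and appended to `browser.log`, a trajectory in this sketch can be replayed exactly, and each tool call can be inspected after the fact.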