Marco DeepResearch: Unlocking Efficient Deep Research Agents via Verification-Centric Design

Abstract

Deep research agents autonomously conduct open-ended investigations, integrating complex information retrieval with multi-step reasoning across diverse sources to solve real-world problems. To sustain this capability on long-horizon tasks, reliable verification is critical during both training and inference. A major bottleneck in existing paradigms is the lack of explicit verification mechanisms in QA data synthesis, trajectory construction, and test-time scaling: errors introduced at each stage propagate downstream and degrade overall agent performance. To address this, we present Marco DeepResearch, a deep research agent optimized with a verification-centric framework design at three levels: (1) QA data synthesis: we introduce verification mechanisms into graph-based and agent-based QA synthesis to control question difficulty while ensuring that answers are unique and correct; (2) trajectory construction: we design a verification-driven trajectory synthesis method that injects explicit verification patterns into training trajectories; and (3) test-time scaling: we use Marco DeepResearch itself as a verifier at inference time, which effectively improves performance on challenging questions. Extensive experimental results demonstrate that Marco DeepResearch significantly outperforms 8B-scale deep research agents on most of the challenging benchmarks, such as BrowseComp and BrowseComp-ZH. Crucially, under a maximum budget of 600 tool calls, Marco DeepResearch even surpasses or approaches several 30B-scale agents, such as Tongyi DeepResearch-30B.
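
The abstract does not spell out how the agent verifies its own answers at inference time, so the following is only a minimal sketch of one possible reading: the agent retries a question until a verifier accepts the answer or the tool-call budget is spent. The names `Attempt`, `run_agent`, `verify`, and `answer_with_verification` are hypothetical and do not come from the paper; only the 600-call budget is taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    """One agent rollout (hypothetical structure, not from the paper)."""
    answer: str
    tool_calls: int  # tool calls consumed by this rollout


def answer_with_verification(
    question: str,
    run_agent: Callable[[str, int], Attempt],
    verify: Callable[[str, str], bool],
    max_tool_calls: int = 600,
) -> Optional[str]:
    """Retry until the verifier accepts an answer or the budget is spent.

    `run_agent(question, remaining_budget)` and `verify(question, answer)`
    are assumed callables; in the setting the abstract describes, both
    roles would be played by the same model.
    """
    used = 0
    last_answer: Optional[str] = None
    while used < max_tool_calls:
        attempt = run_agent(question, max_tool_calls - used)
        # Charge at least one call per attempt so the loop always terminates.
        used += max(1, attempt.tool_calls)
        last_answer = attempt.answer
        if verify(question, attempt.answer):
            return attempt.answer
    # Budget exhausted: fall back to the most recent unverified answer.
    return last_answer
```

With stub callables, the loop stops at the first accepted answer and otherwise returns the last attempt once the budget runs out. How rejected attempts should inform later ones (for example, by feeding the verifier's critique back to the agent) is left open here because the abstract does not describe it.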
