QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks

Abstract

Deep research agents extend the role of search engines from retrieving keyword-matched pages to synthesizing knowledge, fundamentally changing how humans interact with information. However, frontier systems remain proprietary, while existing open agents often generalize poorly across task types, leaving it unclear how to train a broadly capable deep research agent. We release QUEST, a family of open models (2B to 35B parameters) that serve as general-purpose deep research agents for a wide range of long-horizon search tasks, with strong capabilities in fact seeking, citation grounding, and report synthesis. To build QUEST, we propose an effective training recipe that combines mid-training, supervised fine-tuning, and reinforcement learning. Central to this recipe is a curated data synthesis pipeline built on unified rubric trees, which applies across task types and synthesizes training data with verifiable rewards without human annotation. In addition, QUEST incorporates a built-in context management mechanism that supports effective long-horizon reasoning and knowledge synthesis. Using only 8K synthesized tasks, QUEST approaches or even surpasses frontier closed-source agents on eight deep research benchmarks spanning diverse task types, and achieves the best overall performance among recent open-weight agents. We release everything: models, data, and training scripts.
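The abstract does not say how a rubric tree is turned into a verifiable reward. The sketch below is purely illustrative and assumes one plausible reading: leaves are binary programmatic checks on the agent's response, and each internal node takes a weighted average of its children, giving a reward in [0, 1]. `RubricNode`, `score`, and the example checks are hypothetical names, not QUEST's released code or data format.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RubricNode:
    """A node in a hypothetical rubric tree (illustrative, not QUEST's code)."""

    name: str
    weight: float = 1.0
    # Leaves carry a programmatic verifier; internal nodes leave this as None.
    check: Optional[Callable[[str], bool]] = None
    children: List[RubricNode] = field(default_factory=list)

    def score(self, response: str) -> float:
        """Return a reward in [0, 1] for `response`."""
        if not self.children:
            if self.check is None:
                raise ValueError(f"leaf {self.name!r} has no check")
            # A leaf is binary: the check either passes or it does not.
            return 1.0 if self.check(response) else 0.0
        total = sum(child.weight for child in self.children)
        if total <= 0:
            raise ValueError(f"node {self.name!r} has non-positive total weight")
        # An internal node is the weighted average of its children's scores.
        return sum(c.weight * c.score(response) for c in self.children) / total


# Hypothetical rubric for a short fact-seeking answer that must cite a source.
rubric = RubricNode(
    "task",
    children=[
        RubricNode("correct_year", weight=2.0, check=lambda r: "1969" in r),
        RubricNode(
            "citation",
            weight=1.0,
            children=[
                RubricNode("has_marker", check=lambda r: "[1]" in r),
                RubricNode("has_url", check=lambda r: "https://" in r),
            ],
        ),
    ],
)

# Correct year, citation marker but no URL: (2 * 1.0 + 1 * 0.5) / 3, about 0.833.
print(rubric.score("Apollo 11 landed in 1969 [1]."))
```

Weighting the final answer above the citation checks is only one design choice here; an actual reward would depend on how the paper's rubric trees are constructed for each task type.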