QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks

Abstract

Deep research agents extend the role of search engines from retrieving keyword-matched pages to synthesizing knowledge, fundamentally changing how humans interact with information. However, frontier systems remain proprietary, while existing open agents often generalize poorly across task types, leaving it unclear how to train a broadly capable deep research agent. We release QUEST, a family of open models (ranging from 2B to 35B) that serve as general-purpose deep research agents. They are designed to handle a wide range of long-horizon search tasks, with strong capabilities in fact seeking, citation grounding, and report synthesis. To build QUEST, we propose an effective training recipe that combines mid-training, supervised fine-tuning, and reinforcement learning. Central to this recipe is a curated data synthesis pipeline based on unified rubric trees, which applies across task types and makes it possible to synthesize training data with verifiable rewards without human annotation. In addition, QUEST incorporates a built-in context management mechanism that enables effective long-horizon reasoning and knowledge synthesis. Using only 8K synthesized tasks, QUEST approaches or even surpasses frontier closed-source agents across eight deep research benchmarks spanning diverse task types, and achieves the best overall performance among recent open-weight agents. We release everything: models, data, and training scripts.