QUEST: Training Frontier Deep Research Agents with Fully Synthetic Tasks

May 22, 2026
Authors: Jian Xie, Tianhe Lin, Zilu Wang, Yuting Ning, Yuekun Yao, Tianci Xue, Zhehao Zhang, Zhongyang Li, Kai Zhang, Yufan Wu, Shijie Chen, Boyu Gou, Mingzhe Han, Yifei Wang, Vint Lee, Xinpeng Wei, Xiangjun Wang, Yu Su, Huan Sun
cs.AI

Abstract

Deep research agents extend the role of search engines from retrieving keyword-matched pages to synthesizing knowledge, fundamentally changing how humans interact with information. However, frontier systems remain proprietary, while existing open agents often generalize poorly across task types, leaving it unclear how to train a broadly capable deep research agent. We release QUEST, a family of open models (ranging from 2B to 35B parameters) that serve as general-purpose deep research agents, designed to handle a wide range of long-horizon search tasks with strong capabilities in fact seeking, citation grounding, and report synthesis. To build QUEST, we propose an effective training recipe combining mid-training, supervised fine-tuning, and reinforcement learning. Central to this recipe is a curated data synthesis pipeline based on unified rubric trees, which applies to different task types and enables synthesizing training data with verifiable rewards without human annotation. In addition, QUEST incorporates a built-in context management mechanism that enables effective long-horizon reasoning and knowledge synthesis. Using only 8K synthesized tasks, QUEST approaches or even surpasses frontier closed-source agents across eight deep research benchmarks spanning diverse task types, and achieves the best overall performance among recent open-weight agents. We have released everything: models, data, and training scripts.
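The abstract says that verifiable rewards come from "unified rubric trees" but does not describe their structure. Purely as an illustration of how a tree of checks can yield a single scalar reward, the sketch below assumes that leaves are binary, programmatically verifiable checks and that each internal node takes a weighted mean of its children. `RubricNode`, its fields, the weighted-mean rule, and the example checks are all hypothetical and are not QUEST's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RubricNode:
    """Hypothetical rubric-tree node (illustrative assumption, not QUEST's code).

    Leaves hold a check that returns True/False for a candidate answer;
    internal nodes aggregate their children's scores.
    """
    name: str
    weight: float = 1.0
    check: Optional[Callable[[str], bool]] = None
    children: List["RubricNode"] = field(default_factory=list)

    def score(self, answer: str) -> float:
        # Leaf: a binary, verifiable check maps to 1.0 (pass) or 0.0 (fail).
        if not self.children:
            if self.check is None:
                raise ValueError(f"leaf {self.name!r} has no check")
            return 1.0 if self.check(answer) else 0.0
        # Internal node: weighted mean of child scores (assumed aggregation rule).
        total_weight = sum(child.weight for child in self.children)
        if total_weight <= 0:
            raise ValueError(f"node {self.name!r} has non-positive total weight")
        return sum(child.weight * child.score(answer) for child in self.children) / total_weight


# Example: a fact-seeking answer judged on one fact and two citation checks.
rubric = RubricNode("report", children=[
    RubricNode("states_year", weight=2.0, check=lambda a: "1969" in a),
    RubricNode("citations", children=[
        RubricNode("has_url", check=lambda a: "http" in a),
        RubricNode("cites_wikipedia", check=lambda a: "wikipedia" in a.lower()),
    ]),
])

answer = "Apollo 11 landed on the Moon in 1969 (source: https://example.org/apollo11)."
reward = rubric.score(answer)
print(round(reward, 4))  # 0.8333: (2 * 1.0 + 1 * 0.5) / 3
```

Because every leaf is a deterministic check, the resulting reward is reproducible without a human grader; the weighted mean is just one simple choice, and a real pipeline might use different aggregation or partial-credit leaves.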