Tongyi DeepResearch Technical Report

Abstract

We present Tongyi DeepResearch, an agentic large language model designed specifically for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable, fully automatic data synthesis pipeline that supports all training stages without relying on costly human annotation. By constructing customized environments for each stage, our system enables stable and consistent interactions across the entire process. Tongyi DeepResearch has 30.5 billion total parameters, of which only 3.3 billion are activated per token, and achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES, and xbench-DeepSearch-2510. We open-source the model, framework, and complete solutions to empower the community.