ChatPaper.aiChatPaper

通义深度研究技术报告

Tongyi DeepResearch Technical Report

October 28, 2025
作者: Tongyi DeepResearch Team, Baixuan Li, Bo Zhang, Dingchu Zhang, Fei Huang, Guangyu Li, Guoxin Chen, Huifeng Yin, Jialong Wu, Jingren Zhou, Kuan Li, Liangcai Su, Litu Ou, Liwen Zhang, Pengjun Xie, Rui Ye, Wenbiao Yin, Xinmiao Yu, Xinyu Wang, Xixi Wu, Xuanzhong Chen, Yida Zhao, Zhen Zhang, Zhengwei Tao, Zhongwang Zhang, Zile Qiao, Chenxi Wang, Donglei Yu, Gang Fu, Haiyang Shen, Jiayin Yang, Jun Lin, Junkai Zhang, Kui Zeng, Li Yang, Hailong Yin, Maojia Song, Ming Yan, Peng Xia, Qian Xiao, Rui Min, Ruixue Ding, Runnan Fang, Shaowei Chen, Shen Huang, Shihang Wang, Shihao Cai, Weizhou Shen, Xiaobin Wang, Xin Guan, Xinyu Geng, Yingcheng Shi, Yuning Wu, Zhuo Chen, Zijian Li, Yong Jiang
cs.AI

摘要

我们推出通义深度研究(Tongyi DeepResearch)——一款专为长周期深度信息检索任务设计的智能体大语言模型。为激发自主深度研究能力,该模型通过融合智能体中期训练与智能体后期训练的端到端训练框架开发,实现了跨复杂任务的可扩展推理与信息检索。我们设计了高度可扩展的全自动数据合成流程,无需依赖昂贵的人工标注即可支撑所有训练阶段。通过为每个阶段构建定制化交互环境,系统确保了全流程稳定一致的交互体验。通义深度研究模型总参数量达305亿,每令牌仅激活33亿参数,在包括"人类终极考试"、BrowseComp、BrowseComp-ZH、WebWalkerQA、xbench-DeepSearch、FRAMES及xbench-DeepSearch-2510等一系列智能体深度研究基准测试中均达到业界领先水平。我们将开源模型、框架及完整解决方案,以赋能研究社区。
English
We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep research agency, Tongyi DeepResearch is developed through an end-to-end training framework that combines agentic mid-training and agentic post-training, enabling scalable reasoning and information seeking across complex tasks. We design a highly scalable data synthesis pipeline that is fully automatic, without relying on costly human annotation, and empowers all training stages. By constructing customized environments for each stage, our system enables stable and consistent interactions throughout. Tongyi DeepResearch, featuring 30.5 billion total parameters, with only 3.3 billion activated per token, achieves state-of-the-art performance across a range of agentic deep research benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp-ZH, WebWalkerQA, xbench-DeepSearch, FRAMES and xbench-DeepSearch-2510. We open-source the model, framework, and complete solutions to empower the community.
PDF954December 1, 2025