Scaling Agents via Continual Pre-training
September 16, 2025
Authors: Liangcai Su, Zhen Zhang, Guangyu Li, Zhuo Chen, Chenxi Wang, Maojia Song, Xinyu Wang, Kuan Li, Jialong Wu, Xuanzhong Chen, Zile Qiao, Zhongwang Zhang, Huifeng Yin, Shihao Cai, Runnan Fang, Zhengwei Tao, Wenbiao Yin, Chenxiong Qian, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou
cs.AI
Abstract
Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches built upon general-purpose foundation models consistently underperform on agentic tasks, particularly in open-source implementations. We identify the root cause: the absence of robust agentic foundation models forces models to simultaneously learn diverse agentic behaviors during post-training while aligning them to expert demonstrations, creating a fundamental optimization tension. To this end, we are the first to propose incorporating Agentic Continual Pre-training (Agentic CPT) into the training pipeline of deep research agents to build powerful agentic foundation models. Based on this approach, we develop AgentFounder, a deep research agent model. We evaluate AgentFounder-30B on 10 benchmarks and achieve state-of-the-art performance while retaining strong tool-use ability, notably 39.9% on BrowseComp-en, 43.3% on BrowseComp-zh, and 31.5% Pass@1 on HLE.
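
The core proposal is an extra training stage: before any post-training, the base model is continually pre-trained with the standard next-token objective on agentic data, so that post-training only needs to align behaviors the model has already acquired. Below is a minimal sketch of what such a CPT stage could look like using the Hugging Face Trainer API. It is an illustration under stated assumptions, not the authors' implementation: the base checkpoint (`Qwen/Qwen3-30B-A3B`), the corpus file `agentic_cpt_corpus.jsonl` (agentic trajectories serialized as a single `text` field), and all hyperparameters are assumptions, since the abstract does not specify them.

```python
# Hedged sketch of an agentic continual pre-training (CPT) stage.
# Assumptions (not from the paper): base checkpoint, data file, hyperparameters.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

BASE = "Qwen/Qwen3-30B-A3B"  # illustrative base; the abstract only says "30B"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Assume agentic trajectories (tool calls, observations, reasoning steps)
# have been flattened into one plain-text field per example.
data = load_dataset("json", data_files="agentic_cpt_corpus.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

data = data.map(tokenize, batched=True, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="agentic-cpt",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=64,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=data,
    # Plain causal-LM objective: next-token prediction over whole
    # trajectories. Alignment to expert demonstrations is deferred to the
    # separate post-training stage (SFT/RL), which is the point of the
    # two-stage pipeline the abstract argues for.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The design choice this sketch encodes is the ordering: behavior acquisition happens here under a broad, unsupervised objective on large agentic corpora, so the later post-training stage is no longer forced to solve both problems at once.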