
Scaling Agents via Continual Pre-training

September 16, 2025
作者: Liangcai Su, Zhen Zhang, Guangyu Li, Zhuo Chen, Chenxi Wang, Maojia Song, Xinyu Wang, Kuan Li, Jialong Wu, Xuanzhong Chen, Zile Qiao, Zhongwang Zhang, Huifeng Yin, Shihao Cai, Runnan Fang, Zhengwei Tao, Wenbiao Yin, Chenxiong Qian, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou
cs.AI

Abstract

Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches built upon general-purpose foundation models consistently underperform on agentic tasks, particularly in open-source implementations. We identify the root cause: the absence of robust agentic foundation models forces models during post-training to simultaneously learn diverse agentic behaviors while aligning them to expert demonstrations, creating a fundamental optimization tension. To address this, we are the first to propose incorporating Agentic Continual Pre-training (Agentic CPT) into the deep research agent training pipeline to build powerful agentic foundation models. Based on this approach, we develop a deep research agent model named AgentFounder. We evaluate AgentFounder-30B on 10 benchmarks and achieve state-of-the-art performance while retaining strong tool-use ability, notably 39.9% on BrowseComp-en, 43.3% on BrowseComp-zh, and 31.5% Pass@1 on HLE.