Scaling Agents via Continual Pre-training
September 16, 2025
Authors: Liangcai Su, Zhen Zhang, Guangyu Li, Zhuo Chen, Chenxi Wang, Maojia Song, Xinyu Wang, Kuan Li, Jialong Wu, Xuanzhong Chen, Zile Qiao, Zhongwang Zhang, Huifeng Yin, Shihao Cai, Runnan Fang, Zhengwei Tao, Wenbiao Yin, Chenxiong Qian, Yong Jiang, Pengjun Xie, Fei Huang, Jingren Zhou
cs.AI
Abstract
Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches built upon general-purpose foundation models consistently underperform on agentic tasks, particularly in open-source implementations. We identify the root cause: the absence of robust agentic foundation models forces models to simultaneously learn diverse agentic behaviors during post-training while aligning them to expert demonstrations, creating a fundamental optimization tension. To this end, we are the first to propose incorporating Agentic Continual Pre-training (Agentic CPT) into the training pipeline of deep research agents to build powerful agentic foundation models. Based on this approach, we develop AgentFounder, a deep research agent model. We evaluate AgentFounder-30B on 10 benchmarks and achieve state-of-the-art performance while retaining strong tool-use ability, notably 39.9% on BrowseComp-en, 43.3% on BrowseComp-zh, and 31.5% Pass@1 on HLE.
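
The core proposal is an extra training stage: before any post-training, the base model is continually pre-trained with the standard next-token objective on agentic data, so that post-training only needs to align behaviors the model has already acquired. Below is a minimal sketch of what such a CPT stage could look like using the Hugging Face Trainer API. It is an illustration under stated assumptions, not the authors' implementation: the base checkpoint (`Qwen/Qwen3-30B-A3B`), the corpus file `agentic_cpt_corpus.jsonl` (agentic trajectories serialized as a single `text` field), and all hyperparameters are assumptions, since the abstract does not specify them.

```python
# Hedged sketch of an agentic continual pre-training (CPT) stage.
# Assumptions (not from the paper): base checkpoint, data file, hyperparameters.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

BASE = "Qwen/Qwen3-30B-A3B"  # illustrative base; the abstract only says "30B"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)

# Assume agentic trajectories (tool calls, observations, reasoning steps)
# have been flattened into one plain-text field per example.
data = load_dataset("json", data_files="agentic_cpt_corpus.jsonl")["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

data = data.map(tokenize, batched=True, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="agentic-cpt",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=64,
        learning_rate=1e-5,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=data,
    # Plain causal-LM objective: next-token prediction over whole
    # trajectories. Alignment to expert demonstrations is deferred to the
    # separate post-training stage (SFT/RL), which is the point of the
    # two-stage pipeline the abstract argues for.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The design choice this sketch encodes is the ordering: behavior acquisition happens here under a broad, unsupervised objective on large agentic corpora, so the later post-training stage is no longer forced to solve both problems at once.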