Scaling Agents via Continual Pre-training

Abstract

Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches built on general-purpose foundation models consistently underperform on agentic tasks, particularly in open-source implementations. We identify the root cause: without a robust agentic foundation model, a model must learn diverse agentic behaviors during post-training while simultaneously aligning them to expert demonstrations, which creates a fundamental optimization tension. To address this, we are the first to propose incorporating Agentic Continual Pre-training (Agentic CPT) into the training pipeline for deep research agents in order to build powerful agentic foundation models. Based on this approach, we develop a deep research agent model named AgentFounder. We evaluate AgentFounder-30B on 10 benchmarks and achieve state-of-the-art performance while retaining strong tool-use ability, notably 39.9% on BrowseComp-en, 43.3% on BrowseComp-zh, and 31.5% Pass@1 on HLE.
