Scaling Agents via Continual Pre-training

Abstract

Large language models (LLMs) have evolved into agentic systems capable of autonomous tool use and multi-step reasoning for complex problem-solving. However, post-training approaches built on general-purpose foundation models consistently underperform on agentic tasks, particularly in open-source implementations. We identify the root cause: without robust agentic foundation models, a model must simultaneously learn diverse agentic behaviors and align them to expert demonstrations during post-training, which creates fundamental optimization tensions. To address this, we are the first to propose incorporating Agentic Continual Pre-training (Agentic CPT) into the deep research agent training pipeline to build powerful agentic foundation models. Based on this approach, we develop a deep research agent model named AgentFounder. We evaluate AgentFounder-30B on 10 benchmarks, where it achieves state-of-the-art performance while retaining strong tool-use ability, notably 39.9% on BrowseComp-en, 43.3% on BrowseComp-zh, and 31.5% Pass@1 on HLE.
