

A^2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning

October 13, 2025
Authors: Qianben Chen, Jingyi Cao, Jiayu Zhang, Tianrui Qin, Xiaowan Li, King Zhu, Dingfeng Shi, He Zhu, Minghao Liu, Xiaobo Liang, Xin Gui, Ge Zhang, Jian Yang, Yuchen Eleanor Jiang, Wangchunshu Zhou
cs.AI

Abstract

Large language models split into two families: reasoning-centric LLMs, which strengthen internal chain-of-thought reasoning but cannot invoke external tools, and agentic LLMs, which learn to interact with environments and leverage tools but often lag in deep reasoning. This divide arises from fundamentally different training objectives, leading to mismatched strengths and inefficiency on simple queries, where both families tend to overthink or over-call tools. In this work, we present the Adaptive Agent Foundation Model (A^2FM), a unified framework that follows a route-then-align principle: the model first learns task-aware routing and then aligns mode-specific trajectories under a shared backbone. To address the inefficiency gap, we introduce a third mode, instant, that handles simple queries directly, preventing unnecessary reasoning or tool calls while complementing the agentic and reasoning modes. To jointly enhance accuracy and efficiency, we propose Adaptive Policy Optimization (APO), which enforces adaptive sampling across modes and applies a cost-regularized reward. At the 32B scale, A^2FM achieves 13.4% on BrowseComp, 70.4% on AIME25, and 16.7% on HLE, setting a new SOTA among comparable models and performing competitively with frontier LLMs across agentic, reasoning, and general benchmarks. Notably, adaptive execution achieves a cost-of-pass of only $0.00487 per correct answer, cutting cost by 45.2% relative to the reasoning mode and 33.5% relative to the agentic mode, thus delivering substantially higher cost efficiency while maintaining comparable accuracy.
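The route-then-align and cost-regularized-reward ideas above can be pictured with a minimal sketch. The three mode names (instant, reasoning, agentic) come from the abstract; everything else here, including the router thresholds, the linear penalty form, and the function names, is an illustrative assumption rather than the paper's implementation:

```python
from enum import Enum

class Mode(Enum):
    INSTANT = "instant"      # answer directly, no reasoning trace or tools
    REASONING = "reasoning"  # internal chain-of-thought, no tool calls
    AGENTIC = "agentic"      # tool calls / environment interaction

def route(query_difficulty: float) -> Mode:
    # Hypothetical task-aware router; thresholds are illustrative only.
    if query_difficulty < 0.3:
        return Mode.INSTANT
    if query_difficulty < 0.7:
        return Mode.REASONING
    return Mode.AGENTIC

def cost_regularized_reward(correct: bool, cost_usd: float, lam: float = 10.0) -> float:
    """Illustrative APO-style reward: task reward minus a linear cost penalty."""
    return (1.0 if correct else 0.0) - lam * cost_usd

# Under such a reward, a correct rollout at the paper's adaptive cost-of-pass
# ($0.00487) outscores an equally correct rollout at the implied reasoning-mode
# cost ($0.00487 / (1 - 0.452) ≈ $0.00889), so cheaper modes are preferred
# whenever they still answer correctly.
r_adaptive = cost_regularized_reward(True, 0.00487)
r_reasoning = cost_regularized_reward(True, 0.00889)
assert r_adaptive > r_reasoning
```

The key design point this sketch captures is that correctness and cost enter a single scalar objective, so routing pressure toward the cheapest sufficient mode emerges from reward maximization rather than from a hand-tuned dispatch rule.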
October 20, 2025