

Agent models: Internalizing Chain-of-Action Generation into Reasoning models

March 9, 2025
作者: Yuxiang Zhang, Yuqi Yang, Jiangming Shu, Xinyan Wen, Jitao Sang
cs.AI

Abstract

Traditional agentic workflows rely on external prompts to manage interactions with tools and the environment, which limits the autonomy of reasoning models. We propose Large Agent Models (LAMs) that internalize the generation of Chain-of-Action (CoA), enabling the model to autonomously decide when and how to use external tools. Our proposed AutoCoA framework combines supervised fine-tuning (SFT) and reinforcement learning (RL), allowing the model to seamlessly switch between reasoning and action while efficiently managing environment interactions. Its main components are step-level action triggering, trajectory-level CoA optimization, and an internal world model that reduces real-environment interaction costs. Evaluations on open-domain QA tasks demonstrate that AutoCoA-trained agent models significantly outperform ReAct-based workflows in task completion, especially on tasks that require long-horizon reasoning and multi-step actions. Code and dataset are available at https://github.com/ADaM-BJTU/AutoCoA
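To make the contrast with prompt-driven workflows concrete, the loop below is a minimal illustrative sketch (not the AutoCoA implementation) of an agent in which the policy itself decides at each step whether to keep reasoning, trigger an external tool action, or answer — the "when and how to act" decision the abstract describes as internalized. All names here (`run_agent`, `model_step`, `toy_policy`) are hypothetical.

```python
# Sketch of a chain-of-action loop: the model chooses the step type itself
# instead of an external prompt scaffold dictating when tools are called.
from typing import Callable, Dict, List, Tuple

def run_agent(model_step: Callable[[List[str]], Tuple[str, str]],
              tools: Dict[str, Callable[[str], str]],
              max_steps: int = 8) -> str:
    """Alternate between reasoning and tool actions until the model
    emits a final answer or the step budget is exhausted."""
    context: List[str] = []
    for _ in range(max_steps):
        kind, payload = model_step(context)   # model picks the step type
        if kind == "think":                   # internal reasoning step
            context.append(f"THOUGHT: {payload}")
        elif kind == "act":                   # model-triggered tool call
            tool_name, _, arg = payload.partition(":")
            observation = tools[tool_name](arg)
            context.append(f"ACTION: {payload}")
            context.append(f"OBSERVATION: {observation}")
        elif kind == "answer":                # terminate with final answer
            return payload
    return "no answer within budget"

# Toy policy standing in for a trained model: think, search once, answer.
def toy_policy(context: List[str]) -> Tuple[str, str]:
    if not context:
        return ("think", "I need the capital of France.")
    if not any(c.startswith("OBSERVATION") for c in context):
        return ("act", "search:capital of France")
    return ("answer", "Paris")

tools = {"search": lambda q: "Paris is the capital of France."}
print(run_agent(toy_policy, tools))  # -> Paris
```

In a ReAct-style workflow the branching above lives in an external prompt template; the paper's point is to train the model so this switching behavior is produced by the model itself, with an internal world model standing in for some real-environment observations during training.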
