ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration
January 11, 2026
Authors: Yifei Chen, Guanting Dong, Zhicheng Dou
cs.AI
Abstract
Large Language Models (LLMs) can extend beyond their parametric knowledge limits by adopting the Tool-Integrated Reasoning (TIR) paradigm. However, existing LLM-based agent training frameworks often focus on answer accuracy while overlooking targeted alignment of behavior patterns. Consequently, agents frequently exhibit ineffective actions during TIR tasks, such as redundant or insufficient tool calls. How to calibrate erroneous behavioral patterns while still exploring effective trajectories remains an open problem. In this paper, we propose ET-Agent, a training framework that calibrates an agent's tool-use behavior through two synergistic components: a Self-Evolving Data Flywheel and Behavior Calibration Training. Specifically, we introduce a self-evolving data flywheel to generate enhanced data, which is used to fine-tune the LLM and improve its exploration ability. Building on this, we implement a two-phase behavior-calibration training framework designed to progressively steer erroneous behavioral patterns toward optimal behaviors. In-depth experiments confirm the superiority of ET-Agent across multiple dimensions, including correctness, efficiency, reasoning conciseness, and tool execution accuracy. Our ET-Agent framework provides practical insights for research in the TIR field. Code is available at https://github.com/asilverlight/ET-Agent.
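The abstract describes penalizing inefficient tool-use behaviors (redundant or missing calls) alongside answer correctness. The minimal Python sketch below illustrates one way such a behavior-calibrated reward could be shaped; the `Trajectory` fields, penalty weights, and function name are illustrative assumptions for exposition, not the paper's actual formulation.

```python
# Hypothetical sketch of a behavior-calibrated reward: answer correctness
# minus penalties for redundant and insufficient tool use. All names and
# weights here are assumptions, not ET-Agent's actual implementation.
from dataclasses import dataclass


@dataclass
class Trajectory:
    answer_correct: bool   # did the final answer match the reference?
    tool_calls: int        # number of tool calls the agent issued
    useful_calls: int      # calls whose results were actually used in reasoning


def behavior_calibrated_reward(traj: Trajectory,
                               alpha: float = 0.2,
                               beta: float = 0.2) -> float:
    """Correctness reward with penalties for redundant or missing tool calls."""
    correctness = 1.0 if traj.answer_correct else 0.0
    redundant = max(traj.tool_calls - traj.useful_calls, 0)          # wasted calls
    insufficient = 1.0 if (not traj.answer_correct and traj.tool_calls == 0) else 0.0
    return correctness - alpha * redundant - beta * insufficient


# A correct answer reached with two wasted tool calls scores lower (0.6)
# than a correct answer with no redundancy (1.0).
print(behavior_calibrated_reward(Trajectory(True, 5, 3)))
print(behavior_calibrated_reward(Trajectory(True, 3, 3)))
```

In this kind of shaping, the relative weights alpha and beta control how strongly the training signal trades off tool-call efficiency against raw correctness.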