Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning

Abstract

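Managing thoughts and observations during multi-turn agent-environment interaction is an emerging strategy for improving agent effectiveness. However, existing work treats the entire interaction trajectory uniformly, overlooking how the necessity of thought and the utility of observations vary from turn to turn. To address this, we first quantitatively study how thoughts and observations affect agent effectiveness and efficiency. Based on these findings, we propose Agent-Omit, a unified training framework that enables LLM agents to adaptively omit redundant thoughts and observations. Specifically, we first synthesize a small amount of cold-start data covering both single-turn and multi-turn omission scenarios and use it to fine-tune the agent for omission behavior. We then propose an omit-aware agentic reinforcement learning method that strengthens the agent's adaptive omission ability through a dual sampling mechanism and a tailored omission reward. Theoretically, we prove that the deviation of the omission policy is upper-bounded by a KL divergence. Experiments on five agent benchmarks show that our Agent-Omit-8B model achieves performance comparable to seven frontier LLM agents and attains the best effectiveness-efficiency trade-off in comparison with seven efficient LLM agent methods. Code and data are open-sourced at https://github.com/usail-hkust/Agent-Omit.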

English

Managing agent thought and observation during multi-turn agent-environment interactions is an emerging strategy to improve agent efficiency. However, existing studies treat entire interaction trajectories uniformly, overlooking that thought necessity and observation utility vary across turns. To this end, we first conduct quantitative investigations into how thought and observation affect agent effectiveness and efficiency. Based on our findings, we propose Agent-Omit, a unified training framework that empowers LLM agents to adaptively omit redundant thoughts and observations. Specifically, we first synthesize a small amount of cold-start data, including both single-turn and multi-turn omission scenarios, to fine-tune the agent for omission behaviors. Furthermore, we introduce an omit-aware agentic reinforcement learning approach, incorporating a dual sampling mechanism and a tailored omission reward to incentivize the agent's adaptive omission capability. Theoretically, we prove that the deviation of our omission policy is upper-bounded by the KL divergence. Experimental results on five agent benchmarks show that our constructed Agent-Omit-8B achieves performance comparable to seven frontier LLM agents and the best effectiveness-efficiency trade-off when compared with seven efficient LLM agent methods. Our code and data are available at https://github.com/usail-hkust/Agent-Omit.

Agent-Omit: Training Efficient LLM Agents for Adaptive Thought and Observation Omission via Agentic Reinforcement Learning
