A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks

October 7, 2025
Authors: Shuzheng Si, Haozhe Zhao, Kangyang Luo, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun
cs.AI

Abstract

In long-horizon tasks, agents based on large language models (LLMs) lack global planning, so they tend to fall into blind trial-and-error and generate hallucinated actions. In this paper, we introduce a plan-and-execute framework and propose EAGLET, an efficient and effective planner training method that enhances the executor agent's planning abilities without human effort. Specifically, we train a plug-and-play global planner in two steps. First, we synthesize high-quality plans from an advanced LLM using our proposed homologous consensus filtering strategy and apply fine-tuning as a cold start. Second, we further improve the planner in a rule-based reinforcement learning stage with a novel executor capability gain reward, ensuring that it can handle task instructions of varying difficulty. Experiments on three long-horizon agent tasks show that executor agents equipped with our planner outperform existing methods and achieve new state-of-the-art performance. Meanwhile, EAGLET reduces training costs by 8x compared to RL-based baselines and requires neither manual effort nor extra training data, offering an efficient and effective solution.
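To make the plan-and-execute idea concrete, here is a minimal sketch of how a plug-and-play global planner could sit in front of an executor agent. It is an illustration under stated assumptions, not the paper's implementation: `run_episode`, `toy_planner`, `toy_executor`, and `make_toy_env` are hypothetical names, the prompt format is invented, and the toy stubs stand in for real LLM calls and a real task environment.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not taken from the paper).
Planner = Callable[[str], str]                 # task instruction -> global plan
Executor = Callable[[str, List[str]], str]     # (prompt, history) -> next action
EnvStep = Callable[[str], Tuple[str, bool]]    # action -> (observation, done)


def run_episode(
    instruction: str,
    planner: Planner,
    executor: Executor,
    env_step: EnvStep,
    max_steps: int = 10,
) -> Tuple[bool, List[str]]:
    """Ask the planner once for a global plan, then let the executor act step by step."""
    plan = planner(instruction)
    # The plan is simply prepended to the executor's prompt, so the executor
    # itself needs no retraining (assumed meaning of "plug-and-play").
    prompt = f"Task: {instruction}\nPlan: {plan}"
    history: List[str] = []
    for _ in range(max_steps):
        action = executor(prompt, history)
        observation, done = env_step(action)
        history.append(f"{action} -> {observation}")
        if done:
            return True, history
    return False, history


# Toy stand-ins for an LLM planner, an LLM executor, and an environment.
def toy_planner(instruction: str) -> str:
    return "1. open drawer 2. take key"


def toy_executor(prompt: str, history: List[str]) -> str:
    steps = ["open drawer", "take key"]
    return steps[min(len(history), len(steps) - 1)]


def make_toy_env() -> EnvStep:
    def env_step(action: str) -> Tuple[str, bool]:
        return "ok", action == "take key"
    return env_step


if __name__ == "__main__":
    print(run_episode("get the key", toy_planner, toy_executor, make_toy_env()))
```

In this sketch the planner is called once per task and the executor only consumes its output as text, which is one simple way to keep the two components independently replaceable.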