A Goal Without a Plan Is Just a Wish: Efficient and Effective Global Planner Training for Long-Horizon Agent Tasks
October 7, 2025
Authors: Shuzheng Si, Haozhe Zhao, Kangyang Luo, Gang Chen, Fanchao Qi, Minjia Zhang, Baobao Chang, Maosong Sun
cs.AI
Abstract
Agents based on large language models (LLMs) struggle with brainless
trial-and-error and generating hallucinatory actions due to a lack of global
planning in long-horizon tasks. In this paper, we introduce a plan-and-execute
framework and propose EAGLET, an efficient and effective planner training
method to enhance the executor agent's planning abilities without human effort.
Specifically, we train a plug-and-play global planner through a two-step
process: we first synthesize high-quality plans from an advanced LLM using our
proposed homologous consensus filtering strategy, and apply fine-tuning as a
cold start. We then improve the planner with a rule-based
reinforcement learning stage using a novel executor capability gain reward,
ensuring it can handle task instructions of varying difficulty. Experiments on
three long-horizon agent tasks show that executor agents equipped with our
planner outperform existing methods, achieving new state-of-the-art
performance. Meanwhile, EAGLET reduces training costs by 8x compared to
RL-based baselines, and it does not require manual effort or extra training
data, offering an efficient and effective solution.