ATLaS: Agent Tuning via Learning Critical Steps
March 4, 2025
Authors: Zhixun Chen, Ming Li, Yuxuan Huang, Yali Du, Meng Fang, Tianyi Zhou
cs.AI
Abstract
Large Language Model (LLM) agents have demonstrated remarkable generalization
capabilities across multi-domain tasks. Existing agent tuning approaches
typically employ supervised finetuning on entire expert trajectories. However,
behavior-cloning of full trajectories can introduce expert bias and weaken
generalization to states not covered by the expert data. Additionally, critical
steps, such as planning, complex reasoning for intermediate subtasks, and
strategic decision-making, are essential to success in agent tasks, so learning
these steps is the key to improving LLM agents. For more effective and
efficient agent tuning, we propose ATLaS that identifies the critical steps in
expert trajectories and finetunes LLMs solely on these steps with reduced
costs. By steering the training's focus to a few critical steps, our method
mitigates the risk of overfitting entire trajectories and promotes
generalization across different environments and tasks. In extensive
experiments, an LLM finetuned on only 30% critical steps selected by ATLaS
outperforms the LLM finetuned on all steps and recent open-source LLM agents.
ATLaS maintains and improves base LLM skills as generalist agents interacting
with diverse environments.
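The core idea in the abstract — score the steps of an expert trajectory, keep only the top ~30% as finetuning targets — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-step `score` is assumed to be given by some external scorer (ATLaS identifies critical steps with its own selection procedure, not shown here), and the example names (`Step`, `select_critical_steps`, `build_sft_examples`) are hypothetical.

```python
# Hedged sketch: keep only the top-`ratio` fraction of trajectory steps
# (by an externally supplied importance score) as supervised targets.
from dataclasses import dataclass


@dataclass
class Step:
    observation: str
    action: str
    score: float  # importance score from an assumed external scorer


def select_critical_steps(trajectory: list[Step], ratio: float = 0.3) -> list[int]:
    """Return indices of the highest-scoring `ratio` fraction of steps."""
    k = max(1, round(len(trajectory) * ratio))
    ranked = sorted(range(len(trajectory)),
                    key=lambda i: trajectory[i].score, reverse=True)
    return sorted(ranked[:k])  # restore temporal order for training


def build_sft_examples(trajectory: list[Step], critical: list[int]) -> list[dict]:
    """Condition on the full history, but supervise only critical actions."""
    examples = []
    for i in critical:
        history = " ".join(f"{s.observation} -> {s.action}"
                           for s in trajectory[:i])
        examples.append({
            "prompt": f"{history} {trajectory[i].observation}".strip(),
            "target": trajectory[i].action,  # loss only on this step's action
        })
    return examples


traj = [Step(f"obs{i}", f"act{i}", s)
        for i, s in enumerate([0.1, 0.9, 0.2, 0.8, 0.3])]
idx = select_critical_steps(traj, ratio=0.3)
print(idx)  # 30% of 5 steps -> the 2 highest-scoring indices
```

In a real pipeline the resulting examples would feed a standard SFT loop with the loss masked to the target action tokens; here the point is only that the non-selected 70% of steps never become training targets.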