
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

August 22, 2025
Authors: Huichi Zhou, Yihang Chen, Siyuan Guo, Xue Yan, Kin Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, Jun Wang
cs.AI

Abstract

In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates to the LLM's parameters. In contrast, our method enables low-cost continual adaptation via memory-based online reinforcement learning. We formalise this as a Memory-augmented Markov Decision Process (M-MDP), equipped with a neural case-selection policy to guide action decisions. Past experiences are stored in an episodic memory, either differentiable or non-parametric. The policy is continually updated based on environmental feedback through a memory-rewriting mechanism, while policy improvement is achieved through efficient memory reading (retrieval). We instantiate our agent model in the deep research setting, namely AgentFly, which attains top-1 on GAIA validation (87.88% Pass@3) and 79.40% on the test set. It reaches 66.6% F1 and 80.4% PM on the DeepResearcher dataset, outperforming the state-of-the-art training-based method, while case-based memory adds 4.7 to 9.6 absolute points on out-of-distribution tasks. Our approach offers a scalable and efficient pathway for developing generalist LLM agents capable of continuous, real-time learning without gradient updates, advancing machine learning towards open-ended skill acquisition and deep research scenarios. The code is available at https://github.com/Agent-on-the-Fly/AgentFly.
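
To make the mechanism concrete, below is a minimal sketch of the non-parametric variant the abstract describes: an episodic memory of past (state, action, reward) cases, similarity-based memory reading with a softmax case-selection policy, and a memory-rewriting step that stores new experience after environmental feedback. Everything here (the `EpisodicMemory` class, its `write`/`read` methods, the cosine-similarity scoring, the temperature value) is an illustrative assumption rather than the authors' implementation; the actual code is in the linked repository.

```python
import numpy as np


class EpisodicMemory:
    """Non-parametric episodic memory: keys are state embeddings,
    values are (state, action, reward) cases."""

    def __init__(self, dim: int):
        self.keys = np.empty((0, dim))
        self.cases = []

    def write(self, key: np.ndarray, case: tuple) -> None:
        # "Memory rewriting": append a new experience after environmental feedback.
        self.keys = np.vstack([self.keys, key[None, :]])
        self.cases.append(case)

    def read(self, query: np.ndarray, k: int = 4, temperature: float = 0.5):
        # "Memory reading": retrieve the k most similar cases, then sample one
        # through a softmax over cosine similarity (the case-selection policy).
        if not self.cases:
            return None
        sims = self.keys @ query / (
            np.linalg.norm(self.keys, axis=1) * np.linalg.norm(query) + 1e-8
        )
        top = np.argsort(sims)[-k:]
        probs = np.exp(sims[top] / temperature)
        probs /= probs.sum()
        return self.cases[np.random.choice(top, p=probs)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mem = EpisodicMemory(dim=8)
    for i in range(5):  # store a few past experiences keyed by state embeddings
        mem.write(rng.normal(size=8), (f"state-{i}", f"action-{i}", float(i % 2)))
    print(mem.read(rng.normal(size=8), k=3))
```

Under this reading of the abstract, a retrieved case would condition the frozen LLM's next action (for example, as in-context guidance), and the resulting reward is written back into memory, so the agent adapts continually without any gradient updates to the LLM itself.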