AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
August 22, 2025
Authors: Huichi Zhou, Yihang Chen, Siyuan Guo, Xue Yan, Kin Hei Lee, Zihan Wang, Ka Yiu Lee, Guchun Zhang, Kun Shao, Linyi Yang, Jun Wang
cs.AI
Abstract
In this paper, we introduce a novel learning paradigm for adaptive Large Language Model (LLM) agents that eliminates the need for fine-tuning the underlying LLMs. Existing approaches are often either rigid, relying on static, handcrafted reflection workflows, or computationally intensive, requiring gradient updates of LLM model parameters. In contrast, our method enables low-cost continual adaptation via memory-based online reinforcement learning. We formalise this as a Memory-augmented Markov Decision Process (M-MDP), equipped with a neural case-selection policy to guide action decisions. Past experiences are stored in an episodic memory, either differentiable or non-parametric. The policy is continually updated based on environmental feedback through a memory-rewriting mechanism, whereas policy improvement is achieved through efficient memory reading (retrieval). We instantiate our agent model in the deep research setting, namely AgentFly, which attains top-1 on the GAIA validation set (87.88% Pass@3) and 79.40% on the test set. It reaches 66.6% F1 and 80.4% PM on the DeepResearcher dataset, outperforming the state-of-the-art training-based method, while case-based memory adds 4.7 to 9.6 absolute percentage points on out-of-distribution tasks. Our approach offers a scalable and efficient pathway for developing generalist LLM agents capable of continuous, real-time learning without gradient updates, advancing machine learning towards open-ended skill acquisition and deep research scenarios. The code is available at https://github.com/Agent-on-the-Fly/AgentFly.
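To make the mechanism concrete, below is a minimal sketch of the memory-based adaptation loop the abstract describes: a frozen LLM acts, the outcome is written to an episodic case memory, and later decisions retrieve similar past cases instead of updating model weights. All names here (Case, EpisodicMemory, embed, frozen_llm, env) are illustrative assumptions for this sketch, not AgentFly's actual interfaces; see the repository linked above for the real implementation.

```python
# Illustrative sketch of memory-based online adaptation around a frozen LLM.
# Names and interfaces are assumptions for this sketch, not AgentFly's API.
import math
from dataclasses import dataclass


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


@dataclass
class Case:
    state: str      # task description / observation
    action: str     # plan or tool call the agent produced
    reward: float   # environment feedback (e.g., task success)


class EpisodicMemory:
    """Non-parametric case bank: learning happens by writing, not by gradients."""

    def __init__(self, embed):
        self.embed = embed  # callable: str -> list[float]
        self.cases: list[Case] = []

    def write(self, case: Case) -> None:
        # "Memory rewriting": the policy update is appending verified experience.
        self.cases.append(case)

    def read(self, state: str, k: int = 4) -> list[Case]:
        # "Memory reading": policy improvement via retrieval of similar cases.
        q = self.embed(state)
        ranked = sorted(self.cases,
                        key=lambda c: cosine(q, self.embed(c.state)),
                        reverse=True)
        return ranked[:k]


def step(memory: EpisodicMemory, frozen_llm, env, state: str) -> None:
    """One M-MDP step: retrieve cases, act with the frozen LLM, store feedback."""
    cases = memory.read(state)
    context = "\n".join(
        f"Past case (reward={c.reward:.1f}): {c.state} -> {c.action}"
        for c in cases
    )
    action = frozen_llm(f"{context}\nCurrent task: {state}\nNext action:")
    reward = env(state, action)          # environment feedback signal
    memory.write(Case(state, action, reward))
```

The point of this design is that "learning" reduces to a memory append plus a similarity search, so the agent adapts online at retrieval cost rather than gradient-update cost, which is what makes continual, real-time adaptation cheap relative to fine-tuning.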