Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks

Abstract

Large Language Models have demonstrated remarkable capabilities across diverse domains, yet significant challenges persist when deploying them as AI agents for real-world long-horizon tasks. Existing LLM agents suffer from a critical limitation: they are static at test time and cannot learn from experience, so they lack the ability to accumulate knowledge and continuously improve on the job. To address this challenge, we propose MUSE, a novel agent framework that introduces an experience-driven, self-evolving system centered on a hierarchical Memory Module. MUSE organizes experience at multiple levels and leverages it to plan and execute long-horizon tasks across multiple applications. After each sub-task execution, the agent autonomously reflects on its trajectory, converts the raw trajectory into structured experience, and integrates it back into the Memory Module. This mechanism enables the agent to evolve beyond its static pretrained parameters, fostering continuous learning and self-evolution. We evaluate MUSE on the long-horizon productivity benchmark TAC. Using only the lightweight Gemini-2.5 Flash model, it achieves new SOTA performance by a significant margin. Extensive experiments demonstrate that as the agent autonomously accumulates experience, it exhibits increasingly superior task completion capabilities, as well as robust continuous learning and self-evolution capabilities. Moreover, the experience accumulated by MUSE exhibits strong generalization properties, enabling zero-shot improvement on new tasks. MUSE establishes a new paradigm for AI agents capable of automating real-world productivity tasks.
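The execute, reflect, and integrate loop described in the abstract can be illustrated with a minimal sketch. This is not the MUSE implementation: the names `Experience`, `MemoryModule`, `reflect`, `run_task`, and `toy_execute` are hypothetical, the memory here is a flat dictionary rather than the hierarchical structure the abstract describes, and reflection is a string-formatting stand-in for what would presumably be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Experience:
    """Structured record distilled from one sub-task trajectory (hypothetical schema)."""
    subtask: str
    succeeded: bool
    lesson: str


@dataclass
class MemoryModule:
    """Toy memory keyed by sub-task name (assumption: flat, not hierarchical)."""
    store: Dict[str, List[Experience]] = field(default_factory=dict)

    def retrieve(self, subtask: str) -> List[Experience]:
        # Return prior experiences for this sub-task, or an empty list.
        return self.store.get(subtask, [])

    def integrate(self, exp: Experience) -> None:
        # Append the new experience under its sub-task key.
        self.store.setdefault(exp.subtask, []).append(exp)


def reflect(subtask: str, trajectory: List[str], succeeded: bool) -> Experience:
    """Stand-in for reflection: condense a raw trajectory into one lesson string."""
    verdict = "worked" if succeeded else "failed"
    return Experience(subtask, succeeded, f"{' -> '.join(trajectory)} ({verdict})")


Executor = Callable[[str, List[Experience]], Tuple[List[str], bool]]


def run_task(subtasks: List[str], execute: Executor, memory: MemoryModule) -> List[bool]:
    """Run sub-tasks in order, writing one experience back to memory after each."""
    results = []
    for subtask in subtasks:
        hints = memory.retrieve(subtask)                 # use accumulated experience
        trajectory, succeeded = execute(subtask, hints)  # act on the sub-task
        memory.integrate(reflect(subtask, trajectory, succeeded))
        results.append(succeeded)
    return results


def toy_execute(subtask: str, hints: List[Experience]) -> Tuple[List[str], bool]:
    # Hypothetical executor: succeeds only when a prior experience is available.
    if hints:
        return [f"reuse lesson for {subtask}"], True
    return [f"explore {subtask}"], False


memory = MemoryModule()
first = run_task(["login", "export"], toy_execute, memory)
second = run_task(["login", "export"], toy_execute, memory)
print(first, second)
```

In this toy setup the first pass fails on both sub-tasks because the memory is empty, while the second pass succeeds because each sub-task now has a stored experience to draw on. That mirrors, in a deliberately simplified way, the claim that performance improves as experience accumulates without any change to model parameters.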

