仕事を通じた学習：長期タスクのための経験駆動型自己進化エージェント

要旨

大規模言語モデル（LLM）は多様な領域で顕著な能力を発揮しているが、現実世界の長期的タスクにおけるAIエージェントとして展開する際には依然として重大な課題が存在する。既存のLLMエージェントは、テスト時に静的であり、経験から学習することができないという重大な制約を抱えている。これにより、知識を蓄積し、継続的に業務を改善する能力が欠如している。この課題に対処するため、我々はMUSEという新しいエージェントフレームワークを提案する。MUSEは、階層的なメモリモジュールを中心とした経験駆動型の自己進化システムを導入する。MUSEは多様なレベルの経験を組織化し、それらを活用して複数のアプリケーションにわたる長期的タスクを計画・実行する。各サブタスクの実行後、エージェントは自律的にその軌跡を振り返り、生の軌跡を構造化された経験に変換し、それをメモリモジュールに統合する。このメカニズムにより、エージェントは静的に事前学習されたパラメータを超えて進化し、継続的な学習と自己進化を促進する。我々はMUSEを長期的生産性ベンチマークTACで評価した。軽量なGemini-2.5 Flashモデルのみを使用して、大幅な差で新しいSOTA性能を達成した。十分な実験により、エージェントが自律的に経験を蓄積するにつれて、タスク完了能力が向上し、堅牢な継続的学習と自己進化能力を示すことが実証された。さらに、MUSEから蓄積された経験は強力な汎化特性を示し、新しいタスクに対するゼロショット改善を可能にする。MUSEは、現実世界の生産性タスク自動化が可能なAIエージェントの新しいパラダイムを確立する。

English

Large Language Models have demonstrated remarkable capabilities across diverse domains, yet significant challenges persist when deploying them as AI agents for real-world long-horizon tasks. Existing LLM agents suffer from a critical limitation: they are test-time static and cannot learn from experience, lacking the ability to accumulate knowledge and continuously improve on the job. To address this challenge, we propose MUSE, a novel agent framework that introduces an experience-driven, self-evolving system centered around a hierarchical Memory Module. MUSE organizes diverse levels of experience and leverages them to plan and execute long-horizon tasks across multiple applications. After each sub-task execution, the agent autonomously reflects on its trajectory, converting the raw trajectory into structured experience and integrating it back into the Memory Module. This mechanism enables the agent to evolve beyond its static pretrained parameters, fostering continuous learning and self-evolution. We evaluate MUSE on the long-horizon productivity benchmark TAC. It achieves new SOTA performance by a significant margin using only a lightweight Gemini-2.5 Flash model. Sufficient Experiments demonstrate that as the agent autonomously accumulates experience, it exhibits increasingly superior task completion capabilities, as well as robust continuous learning and self-evolution capabilities. Moreover, the accumulated experience from MUSE exhibits strong generalization properties, enabling zero-shot improvement on new tasks. MUSE establishes a new paradigm for AI agents capable of real-world productivity task automation.

仕事を通じた学習：長期タスクのための経験駆動型自己進化エージェント

Learning on the Job: An Experience-Driven Self-Evolving Agent for Long-Horizon Tasks

要旨

Support