MUSE-Autoskill: Self-Evolving Agents via Skill Creation, Memory, Management, and Evaluation

Abstract

Large language model (LLM) agents rely on reusable skills to solve complex tasks. However, existing skill creation approaches treat skills as isolated and static artifacts, limiting their reusability, reliability, and long-term improvement. We propose MUSE-Autoskill Agent (Memory-Utilizing Skill Evolution), a skill-centric agent framework that lets agents continuously improve their task-solving capability by creating, reusing, and refining skills under a unified lifecycle (creation, memory, management, evaluation, and refinement). Our framework enables agents to create skills on demand, store and reuse them across tasks, organize and select them efficiently, and evaluate them through unit tests and runtime feedback for continuous refinement. We further introduce skill-level memory that accumulates experience for each skill across tasks, enabling more effective reuse and adaptation over time. Experiments on SkillsBench provide initial evidence that lifecycle-managed skills can improve task success, efficiency, reuse, and cross-agent transfer, highlighting the importance of treating skills as long-lived, experience-aware, and testable assets.
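The lifecycle named above (creation, memory, management, evaluation, refinement) can be illustrated with a small sketch. This is not the MUSE-Autoskill implementation: the names `Skill` and `SkillLibrary`, the record layout, and the word-overlap selection rule are all hypothetical, chosen only to show how unit tests can gate skill creation and refinement while per-skill memory accumulates across tasks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Skill:
    """A reusable skill with its unit tests and per-skill memory (hypothetical schema)."""
    name: str
    description: str
    run: Callable[..., object]
    # Each unit test is a pair (positional_args, expected_result).
    tests: List[Tuple[tuple, object]] = field(default_factory=list)
    # Skill-level memory: one record per task in which the skill was used.
    memory: List[dict] = field(default_factory=list)

    def evaluate(self) -> bool:
        """Evaluation: the skill passes only if every unit test passes."""
        for args, expected in self.tests:
            try:
                if self.run(*args) != expected:
                    return False
            except Exception:
                return False
        return True

    def success_rate(self) -> float:
        """Fraction of recorded uses that succeeded (0.0 if never used)."""
        if not self.memory:
            return 0.0
        return sum(m["success"] for m in self.memory) / len(self.memory)


class SkillLibrary:
    """Stores skills and applies the lifecycle steps (illustrative only)."""

    def __init__(self) -> None:
        self.skills: Dict[str, Skill] = {}

    def create(self, skill: Skill) -> bool:
        """Creation: store a new skill only if it passes its own tests."""
        if not skill.evaluate():
            return False
        self.skills[skill.name] = skill
        return True

    def select(self, task: str) -> Optional[Skill]:
        """Management: pick the skill whose description shares the most words
        with the task, breaking ties by recorded success rate (an assumed rule)."""
        words = set(task.lower().split())

        def score(s: Skill) -> Tuple[int, float]:
            overlap = len(words & set(s.description.lower().split()))
            return (overlap, s.success_rate())

        best = max(self.skills.values(), key=score, default=None)
        if best is None or score(best)[0] == 0:
            return None
        return best

    def record(self, name: str, task: str, success: bool, note: str = "") -> None:
        """Memory: append runtime feedback from one task to the skill's memory."""
        self.skills[name].memory.append(
            {"task": task, "success": success, "note": note}
        )

    def refine(self, name: str, new_run: Callable[..., object],
               new_test: Optional[Tuple[tuple, object]] = None) -> bool:
        """Refinement: swap in a new implementation only if it passes the old
        tests plus an optional regression test; memory is carried over."""
        old = self.skills[name]
        tests = old.tests + ([new_test] if new_test is not None else [])
        candidate = Skill(old.name, old.description, new_run, tests, old.memory)
        if not candidate.evaluate():
            return False
        self.skills[name] = candidate
        return True


# Usage: create a skill, reuse it, log a failure, then refine it.
lib = SkillLibrary()
lib.create(Skill("mean", "compute the mean of a list of numbers",
                 lambda xs: sum(xs) / len(xs), tests=[(([1, 2, 3],), 2.0)]))
chosen = lib.select("find the mean of these numbers")
lib.record("mean", "empty input", success=False, note="ZeroDivisionError on []")
refined = lib.refine("mean", lambda xs: sum(xs) / len(xs) if xs else 0.0,
                     new_test=(([],), 0.0))
```

In this sketch the failure recorded in memory motivates a new regression test, and the refined implementation replaces the old one only after passing both the original and the new test. A real system would presumably use a stronger retrieval method than word overlap and richer memory records.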